Download - Stop Worrying & Love the SQL - A Case Study
![Page 1: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/1.jpg)
Stop WorryingLove the SQL! (the Quepid story)
![Page 2: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/2.jpg)
OpenSource Connections
Me/UsDoug Turnbull@softwaredoug
Likes: Solr, Elasticsearch, Cassandra, Postgres
OpenSource Connections@o19s
Search, Discovery and Analytics
Let us introduce you to freelancing!
![Page 3: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/3.jpg)
OpenSource Connections
Most Importantly we do...Make my search results more relevant!
“Search Relevancy”
What database works best for problem X?“(No)SQL Architect/Trusted Advisor”
![Page 4: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/4.jpg)
OpenSource Connections
How products actually get built
Rena: Doug, John can you come by this afternoon?
One of our Solr-based products needs some urgent relevancy work
Its Friday, it needs to get done today!
Us: Sure!
The Client(Rena!)smart cookie!
![Page 5: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/5.jpg)
OpenSource Connections
A few hours laterUs: we’ve made a bit of progress!
image frustration-1081 by jseliger2
Rena: but everytime we fix something, we break an existing search!
Us: yeah! we’re stuck in a whack-a-mole-game
other image: whack a mole by jencu
![Page 6: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/6.jpg)
OpenSource Connections
Whack-a-MoleWhat search relevancy work actually looks like
![Page 7: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/7.jpg)
OpenSource Connections
I HAVE AN IDEA● Middle of the afternoon, I stop doing search
work and start throwing together some python
from flask import Flaskapp = Flask(__name__)
Everyone: Doug, stop that, you have important search work to do!
Me: We’re not making any progress!WE NEED A WAY TO REGRESSION TEST OUR RELEVANCY AS WE TUNE!
Everyone: You’re nuts!
![Page 8: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/8.jpg)
OpenSource Connections
What did I make?Focus on gathering stakeholder (ie Rena) feedback on search, coupled w/ workbench tuning against that feedback
Today we have customers...
… forget that, tell me about your failures!
![Page 9: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/9.jpg)
OpenSource Connections
Our war storyMy mistakes:
● Building a product● Selling a product● As a user experience engineer● As an Angular developer● At choosing databases
![Page 10: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/10.jpg)
OpenSource Connections
Quepid 0.0.0.0.0.0.1Track multiple user searches
for this query (hdmi cables) Rena rates this document as a good/bad search result
need to store:<search> -> <id for search result> -> <rating 1-10>“hdmi cables” -> “doc1234” -> “10”
*Actual UI may have been much uglier
![Page 11: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/11.jpg)
OpenSource Connections
Data structure selection under duress
● What’s simple, easy, and will persist our data?
● What plays well with python?
● What can I get working now in Rena’s office?
![Page 12: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/12.jpg)
OpenSource Connections
Redis● In memory “Data Structure Server”
○ hashes, lists, simple key-> value storage
● Persistent -- write to disk every X minutes
![Page 13: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/13.jpg)
OpenSource Connections
Redis
from redis import Redisredis = Redis()redis.set("foo", "bar")redis.get("foo") # gets ‘bar’
$ pip install redis
Easy to install and go! Specific to our problem:
from redis import Redisredis = Redis()
ratings = {“doc1234”: “10”, “doc532”: “5”}searchQuery = “hdmi cables”
redis.hsetall(searchQuery, ratings)
Store a hash table at “hdmi cables” with:
“doc1234” -> “10”“doc532” -> “5”
![Page 14: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/14.jpg)
OpenSource Connections
Success!● My insanity paid off that afternoon
● Now we’re left with a pile of hacked together (terrible) code -- now what?
![Page 15: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/15.jpg)
OpenSource Connections
Adding some features● Would like to add multiple “cases”
(different search projects that solve different problems)
● Would like to add user accounts
● Still a one-off for Silverchair
![Page 16: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/16.jpg)
OpenSource Connections
CasesTuning a cable shopping site... … vs state laws
![Page 17: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/17.jpg)
OpenSource Connections
Cases in Redis?
from redis import Redisredis = Redis()
ratings = {“doc1234”: “10”, “doc532”: “5”}searchQuery = “hdmi cables”
redis.hset(searchQuery, ratings)
Recall our existing implementation“data model”
Out of the box, redis can deal with 2 levels deep:{
“hdmi cables”: {“doc1234”: “10”,“doc532”: “5”
},“ethernet cables”...
}
Can’t add extra layer (redis hash only one layer)
{“cable site”: {“hdmi cables”: {...}“ethernet cables”: {...}
}“laws site: {...}}
![Page 18: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/18.jpg)
OpenSource Connections
Time to give up Redis?“All problems in computer science can be solved by another level of indirection” -- David Wheeler
Crazy Idea: Add dynamic prefix to query keys to indicate case, ie:{
“case_cablestore_hdmi cables”: {“doc1234”: “10”,“doc532”: “5”
},“case_cablestore_ethernet cables”: {… },“case_statelaws_car tax”: { …}
}
Queries for “Cable Store” case
Query for “State Laws” case
redis.keys(“case_cablestore*”)
To Fetch:
![Page 19: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/19.jpg)
OpenSource Connections
Store other info about cases?New problem: we need to store some information about cases, case name, et
{“case_cablestore_hdmi cables”: {
“doc1234”: “10”,“doc532”: “5”
},“case_cablestore_ethernet cables”: {… },“case_statelaws_car tax”: { …}
}
Where would it go here?{
“case_cablestore” {“name”: “cablestore”,“created” “20140101”
},“case_cablestore_query_hdmi cables”: {
“doc1234”: “10”,“doc532”: “5”
},“case_cablestore_query_ethernet cables”:
{… },“case_statelaws_query_car tax”: { …}
}
![Page 20: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/20.jpg)
OpenSource Connections
Oh but let’s add usersExtrapolating on past patterns {
“user_doug” {“name”: “Doug”,“created_date”: “20140101”
},“user_doug_case_cablestore” {
“name”: “cablestore”,“created_date” “20140101”
},“user_doug_case_cablestore_query_hdmi cables”: {
“doc1234”: “10”,“doc532”: “5”
},“user_doug_case_cablestore_query_ethernet cables”:
{… },“user_tom_case_statelaws_query_car tax”: { …}
}image: Rage Wallpaper from Flickr user Thoth God of Knowledge
You right now!
![Page 21: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/21.jpg)
OpenSource Connections
Step BackWe ask ourselves: Is this tool a product? Is it useful outside of this customer?
What level of software engineering helps us move forward?
● Migrate to RDMS?● “NoSQL” options?● Clean up use of Redis somehow?
![Page 22: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/22.jpg)
OpenSource Connections
SubRedis
Operationalizes hierarchy inside of redis
https://github.com/softwaredoug/subredis
from redis import Redisfrom subredis import SubRedisredis = Redis()
sr = SubRedis(“case_%s” % caseId , redis)
ratings = {“doc1234”: “10”, “doc532”: “5”}searchQuery = “hdmi cables”
sr.hsetall(searchQuery, ratings)
Create a redis sandbox for this case
Interact with this case’s queries with redis sandbox specific to that case
Behind the scenes, subredis queries/appends the case_1 prefix to everything
![Page 23: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/23.jpg)
OpenSource Connections
SubRedis == composable
userSr = SubRedis(“user_%s” % userId , redis)
caseSr = SubRedis(“case_%s” % caseId , userSr)
# Sandbox redis for queries about userratings = {“doc1234”: “10”, “doc532”: “5”}searchQuery = “hdmi cables”
caseSr.hsetall(searchQuery, ratings)
SubRedis takes any Redis like thing, and works safely in that sandbox
Now working on sandbox, within a sandbox
![Page 24: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/24.jpg)
OpenSource Connections
Does something reasonable under the hood
{
“user_1_name”: “Doug”,“user_1_created_date”: “Doug”,“user_1_case_1_name”: “name”: “cablestore”“user_1_case_1_hdmi cables”: {
“doc1234”: “10”,“doc532”: “5”
},“user_2_name”, “Rena”,...
}
AllRedis
user_1 subred.
case_1subred.
![Page 25: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/25.jpg)
OpenSource Connections
We reflect again● Ok we tried this out as a product. Launched.
● Paid off *some* tech debt, but wtf are we doing
● Works well enough, we’ve got a bunch of new features, forge ahead
![Page 26: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/26.jpg)
OpenSource Connections
We reflect again● We have real customers
● Our backend is evolving away from simple key-value storage○ user accounts? users that share cases? stored
search snapshots? etc etc
![Page 27: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/27.jpg)
OpenSource Connections
Attack of the relationalGiven our current set of tools, how would we solve the problem“case X can be shared between multiple users”?
{
“user_1_name”: “Doug”,“user_1_created_date”: “Doug”,“user_1_case_1_name”: “name”: “cablestore”“user_1_case_1_hdmi cables”: {
“doc1234”: “10”,“doc532”: “5”
},“user_2_name”, “Rena”,“user_2_case_1_name”: “name”: “cablestore”“user_2_case_1_hdmi cables”: {
“doc1234”: “10”,“doc532”: “5”
},}
Could duplicate the data? This stinks!
● Updates require visiting many (every?) user, looking for this case
● Bloated database
Duplicate the data?
![Page 28: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/28.jpg)
OpenSource Connections
Attack of the relationalGiven our current set of tools, how would we solve the problem“case X can be shared between multiple users”?
{
“user_1_name”: “Doug”,“user_1_created_date”: “Doug”,“user_1_cases”: [1, ...]“case_1_name”: “name”: “cablestore”“case_1_hdmi cables”: {
“doc1234”: “10”,“doc532”: “5”
},“user_2_name”, “Rena”,“user_2_cases”: [1, ...]...
}
User 1
Case 1
User 2
Store list of owned cases
Break out cases to a top-level record?
![Page 29: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/29.jpg)
OpenSource Connections
SudRedisRelational?{
“user_1_name”: “Doug”,“user_1_created_date”: “Doug”,“user_1_cases”: [1, ...]“case_1_name”: “name”: “cablestore”“case_1_hdmi cables”: {
“doc1234”: “10”,“doc532”: “5”
},“user_2_name”, “Rena”,“user_2_cases”: [1, ...]...
}
We’ve actually just normalized our data.
Why was this good?● We want to update case 1 in isolation
without anomalies● We don’t want to visit every user to
update case 1!● We want to avoid duplication
We just made our “NoSQL” database a bit relational
![Page 30: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/30.jpg)
OpenSource Connections
Other Problems● Simple CRUD tasks like “delete a case”
need to be coded up
● We’re managing our own record ids
● Is any of this atomic? does it occur in isolation?
![Page 31: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/31.jpg)
OpenSource Connections
What’s our next DB?● These problems are hard, we need a new
DB
● We also need better tooling!
![Page 32: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/32.jpg)
OpenSource Connections
Irony● This is the exact situation we warn clients
about in our (No)SQL Architect Roles.○ Relational == General Purpose○ Many-many, many-one, one-many, etc○ Relational == consistent tooling
○ NoSQL == solve specific problems well
![Page 33: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/33.jpg)
OpenSource Connections
So we went relational!● Took advantage of great tooling: MySQL,
Sqlalchemy (ORM), Alembic (migrations)
● Modeled our data relationships exactly like we needed them to be modeled
![Page 34: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/34.jpg)
OpenSource Connections
Map db Python classes
class SearchQuery(Base): __tablename__ = 'query' id = Column(Integer, primary_key=True) search_string = Column(String) ratings = relationship("QueryRating")
class QueryRating(Base): __tablename__ = 'rating' id = Column(Integer, primary_key=True) doc_id = Column(String) rating = Column(Integer)
Can model my domain in coder-friendly classes class SearchQuery(Base):
__tablename__ = 'query' id = Column(Integer, primary_key=True) search_string = Column(String) ratings = relationship("QueryRating")
class QueryRating(Base): __tablename__ = 'rating' id = Column(Integer, primary_key=True) doc_id = Column(String) rating = Column(Integer)
![Page 35: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/35.jpg)
OpenSource Connections
Easy CRUDq = SearchQuery(search_string=”hdmi cable”)db.session.add(q)db.session.commit()
del q.ratings[0]db.session.add(q)db.session.commit()
q = SearchQuery.query.filter(id=1).one()q.search_string=”foo”db.session.add(q)db.session.commit()
Create!
Delete!
Update!
![Page 36: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/36.jpg)
OpenSource Connections
Migrations are good
alembic revision --autogenerate -m "name for tries"alembic upgrade headalembic downgrade 0ab51c25c
How do you upgrade your database to add/move/reorganize data?
● Redis this was always done manually/scripted
● Migrations with RDMS are a very robust/well-understood way to handle this
SQLAlchemy has “alembic” to help:
![Page 37: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/37.jpg)
OpenSource Connections
Modeling Users ←→ Casesassociation_table = Table(case2users, Base.metadata, Column('case_id', Integer, ForeignKey('case.id')), Column('user_id', Integer, ForeignKey('user.id')))
class User(Base): __tablename__ = 'user' id = Column(Integer, primary_key=True) cases = relationship("Case", secondary=association_table)
class Case(Base): __tablename__ = 'case' id = Column(Integer, primary_key=True)
Can model many-many relationships
![Page 38: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/38.jpg)
OpenSource Connections
Ultimate Query Flexibilityfor user in User.query.all(): for case in user.cases: print case.caseName
for user in User.query.filter(User.isPaying==True): for case in user.cases: print case.caseName
Print all cases:
Cases from paying members:
![Page 39: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/39.jpg)
OpenSource Connections
Lots of things easier● backups● robust hosting services (RDS)● industrial strength ACID with flexible
querying● 3rd-party tooling (ie VividCortex for MySQL)
![Page 40: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/40.jpg)
OpenSource Connections
When NoSQL?● Solve specific problems well
○ Optimize for specific query patterns○ Full-Text Search (Elasticsearch, Solr)○ Caching, shared data structure (Redis)
● Optimize for specific scaling problems○ Provide a denormalized “view” of your data for
specific task
![Page 41: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/41.jpg)
OpenSource Connections
Final ThoughtsSometimes RDMS’s have harder initial hurdle for setup, figuring out migrations; data modeling; etc
Why isn’t the easy path the wise path?
![Page 42: Stop Worrying & Love the SQL - A Case Study](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a59a361a28abb9578b4573/html5/thumbnails/42.jpg)
OpenSource Connections
In conclusion