NoSQL Databases and Polyglot Applications
Agustín Magaña Falconi
AGENDA
• A bit of history
• Definition of NoSQL
• Types of NoSQL
• Why polyglot applications?
• Example using Redis
Edgar Frank "Ted" Codd
1980’s
Trend: Less uniformity
Sparse data: relational databases
Trend: Exponential data growth
Not Only SQL
• Do not use SQL as the primary query language
• Do not require fixed structures such as tables
• No JOINs
• ACID is not guaranteed
• Scale well horizontally
Types of NoSQL Databases
• Key/Value stores • Document Databases • Graph Databases • Column Oriented Databases
Key/Value Stores
• Apache Cassandra • Google's BigTable • Amazon's Dynamo • LinkedIn's Voldemort • Memcached • Oracle NoSQL Database • Redis
Graph Databases
• Neo4j • DEX • AllegroGraph • OrientDB • InfiniteGraph • InfoGraph • HyperGraphDB
Document Databases
• MongoDB • CouchDB • BaseX • Djondb • SimpleDB • Terrastore
Column Oriented Databases
• Cassandra • BigTable • HBase • Hypertable
Polyglot Applications?
Food To Go Architecture
Order taking
Restaurant Management
MySQL Database
CONSUMER RESTAURANT OWNER
Limitations of Relational Databases
• Scalability
• Distribution
• Schema modification
• O/R impedance mismatch
• Handling semi-structured data
Solution: Spend a lot of money
Solution: Use NoSQL
Benefits
• High performance
• High scalability
• Rich data model
• Schema-less
Drawbacks
• Limited transactions
• Limited queries
• Consistency not guaranteed
• Data without constraints
Redis
• Advanced key-value store
• Very fast (100K reqs/sec)
• Optional persistence
• Transactions with optimistic locking
• Master/slave data replication
Sorted sets
Key      Value   Score
myset    a       5.0
         b       10.0
Members are sorted by score
Adding members to a sorted set
zadd myset 5.0 a     (key, score, value)
Redis server: myset = [a: 5.0]
zadd myset 10.0 b
Redis server: myset = [a: 5.0, b: 10.0]
zadd myset 1.0 c
Redis server: myset = [a: 5.0, b: 10.0, c: 1.0]
Retrieving members by index range
zrange myset 0 1     (key, start index, end index)
Redis server: myset = [a: 5.0, b: 10.0, c: 1.0]
Result: c, a
Retrieving members by score
zrangebyscore myset 1 6     (key, min value, max value)
Redis server: myset = [a: 5.0, b: 10.0, c: 1.0]
Result: c, a
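The three commands above can be mimicked with a small in-memory model. The sketch below is not a Redis client and does not talk to a server; the class name `SortedSetSketch` and its methods are hypothetical, and it only assumes the documented semantics: members are ordered by ascending score (ties broken lexicographically) and both range bounds are inclusive. Negative indexes, which Redis also accepts, are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of ONE sorted set; illustration only, not a Redis client.
final class SortedSetSketch {
    private final Map<String, Double> scores = new HashMap<>();

    // Like ZADD: insert the member, or update its score if it already exists.
    void zadd(double score, String member) {
        scores.put(member, score);
    }

    // Members ordered by ascending score; ties are broken lexicographically.
    private List<String> ordered() {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort((a, b) -> {
            int byScore = Double.compare(scores.get(a), scores.get(b));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return members;
    }

    // Like ZRANGE with non-negative indexes: start and stop are both inclusive.
    List<String> zrange(int start, int stop) {
        List<String> members = ordered();
        int to = Math.min(stop, members.size() - 1);
        if (start > to) {
            return new ArrayList<>();
        }
        return new ArrayList<>(members.subList(start, to + 1));
    }

    // Like ZRANGEBYSCORE: min and max are both inclusive.
    List<String> zrangeByScore(double min, double max) {
        List<String> result = new ArrayList<>();
        for (String member : ordered()) {
            double score = scores.get(member);
            if (score >= min && score <= max) {
                result.add(member);
            }
        }
        return result;
    }
}
```

With a = 5.0, b = 10.0 and c = 1.0 added as on the slides, `zrange(0, 1)` and `zrangeByScore(1, 6)` both return `[c, a]`, because c has the lowest score.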
Redis is good, but it has drawbacks
• Lookups by primary key only
• Limited transaction model:
• Reads first, then executes updates as a batch.
• Data must fit in memory
• Lacks access-control features
Caching with Redis
Order taking
Restaurant Management
MySQL Database
CONSUMER RESTAURANT OWNER
Redis Cache
First (Redis cache), Second (MySQL database)
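Reading the diagram as "look in the Redis cache first, go to MySQL second", the lookup order can be sketched as a cache-aside read. This is only an illustration under that assumption: a `HashMap` stands in for Redis, a `Function` stands in for the MySQL query, and the names `CacheFirstLookup`, `find` and `databaseHits` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside sketch: a Map stands in for Redis, a Function for the MySQL query.
final class CacheFirstLookup<K, V> {
    private final Map<K, V> cache = new HashMap<>();  // stand-in for the Redis cache
    private final Function<K, V> database;            // stand-in for the MySQL lookup
    int databaseHits = 0;                              // how often the fallback was used

    CacheFirstLookup(Function<K, V> database) {
        this.database = database;
    }

    V find(K key) {
        V cached = cache.get(key);        // first: look in the cache
        if (cached != null) {
            return cached;
        }
        databaseHits++;
        V loaded = database.apply(key);   // second: fall back to the database
        if (loaded != null) {
            cache.put(key, loaded);       // populate the cache for the next read
        }
        return loaded;
    }
}
```

Calling `find` twice with the same key reaches the database stand-in only once; the second call is served from the cache.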
Domain object to key-value mapping?
Restaurant
TimeRange MenuItem
ServiceArea
K1 V1
K2 V2
... ...
Finding available restaurants
Available restaurants = Serve the zip code of the delivery address
AND Are open at the delivery time
public interface AvailableRestaurantRepository {
  List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
  ...
}
Food to Go – Domain model (partial)
class Restaurant {
  long id;
  String name;
  Set<String> serviceArea;
  Set<TimeRange> openingHours;
  List<MenuItem> menuItems;
}
class MenuItem {
  String name;
  double price;
}
class TimeRange {
  long id;
  int dayOfWeek;
  int openTime;
  int closeTime;
}
Database schema
RESTAURANT table
ID   Name                …
1    Ajanta
2    Montclair Eggshop
RESTAURANT_ZIPCODE table
Restaurant_id   zipcode
1               94707
1               94619
2               94611
2               94619
RESTAURANT_TIME_RANGE table
Restaurant_id   dayOfWeek   openTime   closeTime
1               Monday      1130       1430
1               Monday      1730       2130
2               Tuesday     1130       …
Finding available restaurants on Monday, 6:15 pm for zip code 94619
Straightforward three-way join
select r.*
from restaurant r
  inner join restaurant_time_range tr on r.id = tr.restaurant_id
  inner join restaurant_zipcode sa on r.id = sa.restaurant_id
where '94619' = sa.zip_code
  and tr.day_of_week = 'monday'
  and tr.openingtime <= 1815
  and 1815 <= tr.closingtime
BUT how to implement findAvailableRestaurants() with Redis?!
K1 V1
K2 V2
... ...
Where we need to be
ZRANGEBYSCORE myset 1 6
=
select value, score
from sorted_set
where key = 'myset' and score >= 1 and score <= 6
sorted_set table columns: key, value, score
We need to denormalize
Think materialized view
Simplification #1: Denormalization
Restaurant_id Day_of_week Open_time Close_time Zip_code
1 Monday 1130 1430 94707
1 Monday 1130 1430 94619
1 Monday 1730 2130 94707
1 Monday 1730 2130 94619
2 Monday 0700 1430 94619
…
SELECT restaurant_id
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = 94619
  AND 1815 < close_time
  AND open_time < 1815
Simpler query
• No joins
• Two = and two <
Simplification #2: Application filtering
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = 94619
  AND 1815 < close_time
The open_time < 1815 condition moves out of the query: the application checks it against the returned open_time.
Even simpler query
• No joins
• Two = and one <
Simplification #3: Eliminate multiple =’s with concatenation
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE zip_code_day_of_week = '94619:Monday'
  AND 1815 < close_time
Restaurant_id Zip_dow Open_time Close_time
1 94707:Monday 1130 1430
1 94619:Monday 1130 1430
1 94707:Monday 1730 2130
1 94619:Monday 1730 2130
2 94619:Monday 0700 1430
…
Zip_dow is the key; Close_time is the range
Simplification #4: Eliminate multiple RETURN VALUES with concatenation
SELECT open_time_restaurant_id
FROM time_range_zip_code
WHERE zip_code_day_of_week = '94619:Monday'
  AND 1815 < close_time
zip_dow open_time_restaurant_id close_time
94707:Monday 1130_1 1430
94619:Monday 1130_1 1430
94707:Monday 1730_1 2130
94619:Monday 1730_1 2130
94619:Monday 0700_2 1430
...
Using a Redis sorted set as an index
Row "94619:Monday 0700_2 1430" becomes entry 0700_2 with score 1430 under key 94619:Monday
Key             Sorted Set [ Entry:Score, … ]
94619:Monday    [0700_2:1430, 1130_1:1430, 1730_1:2130]
94707:Monday    [1130_1:1430, 1730_1:2130]
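The mapping from a denormalized row to a sorted-set entry can be sketched as follows. This is an in-memory illustration only: the class `AvailabilityIndexBuilder` and its `addRow` method are hypothetical, times are assumed to be HHMM integers as in the tables, and a real implementation would presumably issue one ZADD per entry instead of filling a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the index entries in memory; illustration only, nothing is sent to Redis.
final class AvailabilityIndexBuilder {
    // key (zip:day) -> member (openTime_restaurantId) -> score (closeTime)
    final Map<String, Map<String, Integer>> entries = new LinkedHashMap<>();

    // One call per row of the denormalized time_range_zip_code table.
    void addRow(long restaurantId, String dayOfWeek, int openTime, int closeTime, String zipCode) {
        // Simplification #3: concatenate the two equality columns into the key.
        String key = zipCode + ":" + dayOfWeek;
        // Simplification #4: concatenate the two returned values into the member.
        String member = String.format("%04d_%d", openTime, restaurantId);
        // The range column (close time) becomes the score.
        entries.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(member, closeTime);
    }
}
```

For example, `addRow(2, "Monday", 700, 1430, "94619")` produces the entry `0700_2` with score 1430 under the key `94619:Monday`, which corresponds to `ZADD 94619:Monday 1430 0700_2`.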
Querying with ZRANGEBYSCORE
ZRANGEBYSCORE 94619:Monday 1815 2359     (key = delivery zip and day, min score = delivery time)
Result: {1730_1}
1730 is before 1815, so Ajanta is open
Key             Sorted Set [ Entry:Score, … ]
94619:Monday    [0700_2:1430, 1130_1:1430, 1730_1:2130]
94707:Monday    [1130_1:1430, 1730_1:2130]
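Putting the query and the application-side filter together, a lookup could look like the sketch below. It is a rough in-memory stand-in, not the presenter's implementation: the class `AvailableRestaurantFinder` and its methods are hypothetical, the nested map replaces the Redis sorted sets, and both comparisons are treated as inclusive, which matches ZRANGEBYSCORE and the original three-way join.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for the sorted-set index; illustration only.
final class AvailableRestaurantFinder {
    // key (zip:day) -> member (openTime_restaurantId) -> score (closeTime)
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    void zadd(String key, int score, String member) {
        index.computeIfAbsent(key, k -> new HashMap<>()).put(member, score);
    }

    // Stand-in for ZRANGEBYSCORE key min max (inclusive); result order is not preserved here.
    List<String> zrangeByScore(String key, int min, int max) {
        List<String> members = new ArrayList<>();
        Map<String, Integer> set = index.getOrDefault(key, Collections.<String, Integer>emptyMap());
        for (Map.Entry<String, Integer> entry : set.entrySet()) {
            if (entry.getValue() >= min && entry.getValue() <= max) {
                members.add(entry.getKey());
            }
        }
        return members;
    }

    // Ids of restaurants that serve zipCode and are open at deliveryTime (HHMM) on dayOfWeek.
    Set<Long> findAvailableRestaurantIds(String zipCode, String dayOfWeek, int deliveryTime) {
        Set<Long> ids = new TreeSet<>();
        // The score range keeps entries whose close time is at or after the delivery time.
        for (String member : zrangeByScore(zipCode + ":" + dayOfWeek, deliveryTime, 2359)) {
            String[] parts = member.split("_");            // "1730_1" -> ["1730", "1"]
            int openTime = Integer.parseInt(parts[0]);
            if (openTime <= deliveryTime) {                // application-side filter
                ids.add(Long.parseLong(parts[1]));
            }
        }
        return ids;
    }
}
```

With the three entries shown for 94619:Monday, a delivery time of 1815 yields only restaurant 1 (Ajanta), while 1600 yields nothing because the 1730 opening has not started yet.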
Sorry Ted
The future is polyglot
IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
e.g. Netflix • RDBMS • SimpleDB • Cassandra • Hadoop/HBase
Some NoSQL users
Thank you