webinar: strongly typed languages and flexible schemas

Upload: mongodb

Post on 15-Aug-2015

Category:

Technology


TRANSCRIPT

Page 1: Webinar: Strongly Typed Languages and Flexible Schemas
Page 2: Webinar: Strongly Typed Languages and Flexible Schemas

Strongly Typed Languages and Flexible Schemas

Page 3: Webinar: Strongly Typed Languages and Flexible Schemas

Agenda

Strongly Typed Languages

Flexible Schema Databases

Change Management

Strategies

Tradeoffs

Page 4: Webinar: Strongly Typed Languages and Flexible Schemas

Strongly Typed Languages

Page 5: Webinar: Strongly Typed Languages and Flexible Schemas

"A programming language that requires a variable to be defined, as well as the type of variable it is"

Page 6: Webinar: Strongly Typed Languages and Flexible Schemas

Flexible Schema Databases

Page 7: Webinar: Strongly Typed Languages and Flexible Schemas

Traditional RDBMS

create table users (id int, firstname text, lastname text);

Table definition

Column structure

Page 8: Webinar: Strongly Typed Languages and Flexible Schemas

Traditional RDBMS

Table with checks

create table cat_pictures (
    id int not null,
    size int not null,
    picture blob not null,
    user_id int,
    primary key (id),
    foreign key (user_id) references users(id));

Null checks

Foreign and Primary key checks

Page 9: Webinar: Strongly Typed Languages and Flexible Schemas

Traditional RDBMS

users (1) --- (N) cat_pictures

Page 10: Webinar: Strongly Typed Languages and Flexible Schemas

Is this Flexible?

• What happens when we need to change the schema?
  – Add new fields
  – Add new relations
  – Change data types

• What happens when we need to scale out our data structure?

Page 11: Webinar: Strongly Typed Languages and Flexible Schemas

Flexible Schema Database

Document, Graph, Key-Value

Page 12: Webinar: Strongly Typed Languages and Flexible Schemas

Flexible Schema

• No mandatory schema definition
• No structure restrictions
• No schema validation process

Page 13: Webinar: Strongly Typed Languages and Flexible Schemas

We start from code

public class CatPicture {
    int size;
    byte[] blob;
}

public class User {
    int id;
    String firstname;
    String lastname;
    CatPicture[] cat_pictures;
}

Page 14: Webinar: Strongly Typed Languages and Flexible Schemas

Document Structure

{
  _id: 1234,
  firstname: 'Juan',
  lastname: 'Olivo',
  cat_pictures: [
    { size: 10, picture: BinData("0x133334299399299432") }
  ]
}

Rich Data Types

Embedded Documents

Page 15: Webinar: Strongly Typed Languages and Flexible Schemas

Flexible Schema Databases

• Challenges
  – Different Versions of Documents
  – Different Structures of Documents
  – Different Value Types for Fields in Documents

Page 16: Webinar: Strongly Typed Languages and Flexible Schemas

Different Versions of Documents

The same document changes over time in how it represents its data

First Version

{ "_id" : 174, "firstname": "Juan" }

Second Version

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

Third Version

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "cat_pictures": [{"size": 10, "picture": BinData("0x133334299399299432")}]}

Page 17: Webinar: Strongly Typed Languages and Flexible Schemas

Different Versions of Documents

The same document changes over time in how it represents its data

{ "_id" : 174, "firstname": "Juan" }

{ "_id" : 174, "name": { "first": "Juan", "last": "Olivo"} }

Different Structure
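
One way application code might cope with both shapes is a small reader that looks for the nested "name" sub-document first and falls back to the flat "firstname" field. This is only a minimal sketch: it assumes documents arrive as plain Map<String, Object> values (standing in for a driver's document type), and the class name UserNameReader is hypothetical.

```java
import java.util.Map;

// Hypothetical helper: reads a first name from either document shape.
final class UserNameReader {
    private UserNameReader() {}

    /** Returns the first name from either shape, or null if it is absent. */
    static String firstName(Map<String, Object> doc) {
        Object name = doc.get("name");
        if (name instanceof Map) {
            // Newer shape: { "name": { "first": ..., "last": ... } }
            Object first = ((Map<?, ?>) name).get("first");
            return first instanceof String ? (String) first : null;
        }
        // Older shape: { "firstname": ... }
        Object first = doc.get("firstname");
        return first instanceof String ? (String) first : null;
    }
}
```

Keeping this check in one place means the rest of the code base does not need to know which version of the document it was handed.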

Page 18: Webinar: Strongly Typed Languages and Flexible Schemas

Different Structures of Documents

Different documents coexisting in the same collection

{ "_id" : 175, "brand": "Ford", "model": "Mustang", "date": ISODate("XXX") }

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

Within the same collection

Page 19: Webinar: Strongly Typed Languages and Flexible Schemas

Different Data Types for Fields

Different documents coexisting in the same collection

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "bdate": 1224234312}

{ "_id" : 175, "firstname": "Paco", "lastname": "Hernan", "bdate": "2015-06-27"}

{ "_id" : 176, "firstname": "Tomas", "lastname": "Marce", "bdate": ISODate("2015-06-27")}

Same field, different data type
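
A strongly typed application usually wants a single type for "bdate" regardless of what is stored. The sketch below shows one possible normalizer. It rests on assumptions that the slides do not state: numeric values are treated as epoch seconds in UTC (which fits the sample value 1224234312), strings are expected in ISO yyyy-MM-dd form, and ISODate values are assumed to surface as java.util.Date. The class name BirthDateNormalizer is hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical helper: maps the three stored representations to one Java type.
final class BirthDateNormalizer {
    private BirthDateNormalizer() {}

    static LocalDate toLocalDate(Object raw) {
        if (raw instanceof Number) {
            // Assumption: numbers are epoch seconds, interpreted in UTC.
            long seconds = ((Number) raw).longValue();
            return Instant.ofEpochSecond(seconds).atZone(ZoneOffset.UTC).toLocalDate();
        }
        if (raw instanceof String) {
            // Assumption: strings use the ISO-8601 date form, e.g. "2015-06-27".
            return LocalDate.parse((String) raw);
        }
        if (raw instanceof Date) {
            return ((Date) raw).toInstant().atZone(ZoneOffset.UTC).toLocalDate();
        }
        throw new IllegalArgumentException("Unsupported bdate value: " + raw);
    }
}
```

Throwing on an unknown type is a deliberate choice here: a new, unexpected representation is surfaced immediately instead of being silently misread.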

Page 20: Webinar: Strongly Typed Languages and Flexible Schemas

Change Management

Page 21: Webinar: Strongly Typed Languages and Flexible Schemas

Change Management

Versioning: How to set correct data format versioning?

Class Loading: What mechanisms are out there to make this work?

Page 22: Webinar: Strongly Typed Languages and Flexible Schemas

Strategies

Page 23: Webinar: Strongly Typed Languages and Flexible Schemas

Strategies

• Decoupling Architectures
• ODMs
• Versioning
• Data Migrations

Page 24: Webinar: Strongly Typed Languages and Flexible Schemas

Decoupled Architectures

Page 25: Webinar: Strongly Typed Languages and Flexible Schemas

Strongly Coupled

Page 26: Webinar: Strongly Typed Languages and Flexible Schemas

Becomes a mess in your hair…

Page 27: Webinar: Strongly Typed Languages and Flexible Schemas

Coupled Architectures

Application A, Application B, Application C -> Database

Application B: "Let me perform some schema changes!"

Page 28: Webinar: Strongly Typed Languages and Flexible Schemas

Decoupled Architecture

Application A, Application B, Application C -> API -> Database

Page 29: Webinar: Strongly Typed Languages and Flexible Schemas

Decoupled Architectures

• Allows the business logic to evolve independently of the data layer

• Decouples the underlying storage / persistence option from the business service

• Changes are "requested" and not imposed across all applications

• Better versioning control of each request and its mapping
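
As a rough illustration of the points above, applications can code against a small interface while the storage details live behind it. The following is a sketch only: the names UserApi and DocumentBackedUserApi are hypothetical, and an in-memory map stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** The contract applications code against; it says nothing about storage. */
interface UserApi {
    Optional<String> fullName(int userId);
}

/** One possible backend: documents kept as maps, standing in for a database. */
final class DocumentBackedUserApi implements UserApi {
    private final Map<Integer, Map<String, Object>> users = new HashMap<>();

    void store(int userId, Map<String, Object> doc) {
        users.put(userId, doc);
    }

    @Override
    public Optional<String> fullName(int userId) {
        Map<String, Object> doc = users.get(userId);
        if (doc == null) {
            return Optional.empty();
        }
        // Only this class knows the field names used in the stored document.
        return Optional.of(doc.get("firstname") + " " + doc.get("lastname"));
    }
}
```

If the stored document later moves to a nested "name" field, only the backend class has to change; callers of UserApi are unaffected.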

Page 30: Webinar: Strongly Typed Languages and Flexible Schemas

ODMs

Page 31: Webinar: Strongly Typed Languages and Flexible Schemas

ODM

• Reduce impedance between code and databases
• Data management facilitator
• Hides complexity of operators
• Tries to decouple business complexity with "magic" recipes

Page 32: Webinar: Strongly Typed Languages and Flexible Schemas

Spring Data

• POJO centric model
• MongoTemplate || CrudRepository extensions to make the connection to the repositories
• Uses annotations to override default field names and even data types (data type mapping)

public interface UserRepository extends MongoRepository<User, Integer> {
}

public class User {
    @Id
    int id;

    @Field("first_name")
    String firstname;
    String lastname;
    ...
}

Page 33: Webinar: Strongly Typed Languages and Flexible Schemas

Spring Data Document Structure

{
  "_id": 1,
  "first_name": "first",
  "lastname": "last",
  "catpictures": [
    { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") }
  ]
}

Page 34: Webinar: Strongly Typed Languages and Flexible Schemas

Spring Data Considerations

• Data formats, versions and types still need to be managed
• Does not solve issues like type validation out of the box
• Can make things more complicated but more "controllable"

@Field("first_name")
String firstname;

Page 35: Webinar: Strongly Typed Languages and Flexible Schemas

Morphia

• Data source centric
• Will do all the discovery of POJOs for a given package
• Also uses annotations to perform overrides and deal with object mapping

@Entity("users")
public class User {
    @Id
    int id;
    String firstname;
    String lastname;
    ...
}

morphia.mapPackage("examples.odms.morphia.pojos");

Datastore datastore = morphia.createDatastore(new MongoClient(), "morphia_example");
datastore.save(user);

Page 36: Webinar: Strongly Typed Languages and Flexible Schemas

Morphia Document Structure

{
  "_id": 1,
  "className": "examples.odms.morphia.pojos.User",
  "firstname": "first",
  "lastname": "last",
  "catpictures": [
    { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") }
  ]
}

Class Definition: the "className" field

Page 37: Webinar: Strongly Typed Languages and Flexible Schemas

Morphia Considerations

• Enables better control at class loading
• Also facilitates, like Spring Data, field overriding (tags to define field keys)
• Better support for object polymorphism

Page 38: Webinar: Strongly Typed Languages and Flexible Schemas

Versioning

Page 39: Webinar: Strongly Typed Languages and Flexible Schemas

Versioning

Versioning of data structures (especially documents) can be very helpful

Recreate documents over time

Flow Control

Data / Field Multiversion Requirements

Archiving and History Purposes

Page 40: Webinar: Strongly Typed Languages and Flexible Schemas

Versioning – Option 0

Change existing document each time there is a write with monotonically increasing version number inside

{ "_id" : 174, "v" : 1, "firstname": "Juan" }

{ "_id" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }

{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

> db.users.update( {"_id": 174}, { "$set": { ... }, "$inc": { "v": 1 } } )

Increment field value
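
The effect of combining "$set" with "$inc" on "v" can be mimicked in plain Java to make the behaviour concrete. This is a sketch under assumptions, not driver code: the document is a Map<String, Object>, and the class name VersionedUpdate is hypothetical.

```java
import java.util.Map;

// Hypothetical in-memory imitation of Option 0: set fields and bump "v" together.
final class VersionedUpdate {
    private VersionedUpdate() {}

    static void apply(Map<String, Object> doc, Map<String, Object> changes) {
        if (changes.containsKey("v")) {
            // The version counter is owned by the update, not by the caller.
            throw new IllegalArgumentException("\"v\" is managed by the update itself");
        }
        doc.putAll(changes);                 // plays the role of $set
        Object v = doc.get("v");
        int current = v instanceof Number ? ((Number) v).intValue() : 0;
        doc.put("v", current + 1);           // plays the role of $inc
    }
}
```

Because the previous state is overwritten, this option keeps only a version counter and no history, which fits the N/A under "Recover if Fail" in the comparison later on.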

Page 41: Webinar: Strongly Typed Languages and Flexible Schemas

Versioning – Option 1

Store full document each time there is a write with monotonically increasing version number inside

{ "docId" : 174, "v" : 1, "firstname": "Juan" }

{ "docId" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }

{ "docId" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

> db.users.insert( {"docId":174 …})

> db.users.find({"docId":174}).sort({"v":-1}).limit(-1);

Always find the latest version
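
The "sort by v descending, take one" query has a direct equivalent over an in-memory list, which may help when reasoning about the read path. A minimal sketch, assuming each stored version is a Map<String, Object> with an Integer "docId" and a numeric "v"; the class name LatestVersionFinder is hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: picks the highest "v" among the versions of one docId.
final class LatestVersionFinder {
    private LatestVersionFinder() {}

    static Optional<Map<String, Object>> latest(List<Map<String, Object>> docs, int docId) {
        return docs.stream()
                .filter(d -> Integer.valueOf(docId).equals(d.get("docId")))
                .max(Comparator.comparingInt(
                        (Map<String, Object> d) -> ((Number) d.get("v")).intValue()));
    }
}
```

Every read pays for this selection, so in a real collection an index covering docId and v would likely be needed to keep it cheap.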

Page 42: Webinar: Strongly Typed Languages and Flexible Schemas

Versioning – Option 2

Store all document versions inside a single document.

> db.users.update( {"_id": 174}, { "$set": { "current": ... }, "$inc": { "current.v": 1 }, "$addToSet": { "prev": { ... } } } )

Current value

{ "_id" : 174, "current" : { "v" :3, "attr1": 184, "attr2" : "A-1" }, "prev" : [ { "v" : 1, "attr1": 165 }, { "v" : 2, "attr1": 165, "attr2": "A-1" } ]}

Previous values

Page 43: Webinar: Strongly Typed Languages and Flexible Schemas

Versioning – Option 3

Keep one collection for the "current" version and another for past versions

Current collection

> db.users.find( {"_id": 174 })

{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

Previous versions collection

> db.users_past.find( {"pid": 174 })

{ "pid" : 174, "v" : 1, "firstname": "Juan" }

{ "pid" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }

Page 44: Webinar: Strongly Typed Languages and Flexible Schemas

Versioning

Schema                    | Fetch 1       | Fetch Many     | Update       | Recover if Fail
0) Increment Version      | Easy, Fast    | Fast           | Easy, Medium | N/A
1) New Document           | Easy, Fast    | Not Easy, Slow | Medium       | Hard
2) Embedded in Single Doc | Easy, Fastest | Easy, Fastest  | Medium       | N/A
3) Separate Collection    | Easy, Fastest | Easy, Fastest  | Medium       | Medium, Hard

Page 45: Webinar: Strongly Typed Languages and Flexible Schemas

Migrations

Page 46: Webinar: Strongly Typed Languages and Flexible Schemas

Migrations

Several types of "Migrations":

Add/Remove Fields

Change Field Names

Change Field Data Type

Extract Embedded Document into Collection

Page 47: Webinar: Strongly Typed Languages and Flexible Schemas

Add / Remove Fields

For a flexible schema database, this is our bread and butter

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "newfield": "value" }

> db.users.update( {"_id": 174}, {"$set": { "newfield": "value" }, "$unset": {"gender":""} })

Page 48: Webinar: Strongly Typed Languages and Flexible Schemas

Change Field Names

Again, you can do this programmatically

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

{ "_id" : 174, "first": "Juan", "last": "Olivo" }

> db.users.update( {"_id": 174}, {"$rename": { "firstname": "first", "lastname":"last"} })

Page 49: Webinar: Strongly Typed Languages and Flexible Schemas

Change Field Data Type

Align to a new code change and move from Int to String

{..."bdate": 1435394461522} {..."bdate": "2015-06-27"}

1) Batch Process

2) Aggregation Framework

3) Change based on usage
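
The third approach, changing documents as they are used, can be sketched as an upgrade step that runs whenever a document is read. This is an illustration under assumptions: documents are plain Map<String, Object> values, numeric "bdate" values are epoch milliseconds interpreted in UTC (which fits the sample value 1435394461522), and the class name LazyBdateMigration is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Map;

// Hypothetical migrate-on-read step for the "bdate" field.
final class LazyBdateMigration {
    private LazyBdateMigration() {}

    /** Upgrades the document in place; returns true if it should be written back. */
    static boolean upgrade(Map<String, Object> doc) {
        Object bdate = doc.get("bdate");
        if (!(bdate instanceof Number)) {
            return false; // already a string (or absent): nothing to do
        }
        // Assumption: the number is epoch milliseconds, interpreted in UTC.
        long millis = ((Number) bdate).longValue();
        String iso = Instant.ofEpochMilli(millis)
                .atZone(ZoneOffset.UTC)
                .toLocalDate()
                .toString();
        doc.put("bdate", iso);
        return true;
    }
}
```

The caller would persist the document only when upgrade returns true, so documents that are never read keep their old type until some other migration reaches them.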

Page 50: Webinar: Strongly Typed Languages and Flexible Schemas

Change Field Data Type
1) Batch Process: bulk API

public void migrateBulk() {
    // "dd" is day of month; the original "DD" would format day of year
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    ...
    List<UpdateOneModel<Document>> toUpdate =
        new ArrayList<UpdateOneModel<Document>>();
    for (Document doc : coll.find()) {
        String dateAsString = df.format(new Date(doc.getInteger("bdate", 0)));
        Document filter = new Document("_id", doc.getInteger("_id"));
        Document value = new Document("bdate", dateAsString);
        Document update = new Document("$set", value);

        toUpdate.add(new UpdateOneModel<Document>(filter, update));
    }
    coll.bulkWrite(toUpdate);
}

Page 51: Webinar: Strongly Typed Languages and Flexible Schemas

Change Field Data Type
1) Batch Process: bulk API

public void migrateBulk() {
    ...
    for (Document doc : coll.find()) {
        ...
    }
    coll.bulkWrite(toUpdate);
}

Is there any problem with this?

Page 52: Webinar: Strongly Typed Languages and Flexible Schemas

Change Field Data Type
1) Batch Process: bulk API

public void migrateBulk() {
    ...
    // BSON type 16 represents the int32 data type; the code must be a number, not the string "16"
    Document query = new Document("bdate", new Document("$type", 16));
    for (Document doc : coll.find(query)) {
        ...
    }
    coll.bulkWrite(toUpdate);
}

More efficient filtering!

Page 53: Webinar: Strongly Typed Languages and Flexible Schemas

Extract Document into Collection
Normalize your schema

{"size": 10, "picture": BinData("0x133334299399299432")}

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

> db.users.aggregate( [ {$unwind: "$cat_pictures"}, {$project: { "_id":0, "uid":"$_id", "size": "$cat_pictures.size", "picture": "$cat_pictures.picture"}}, {$out:"cats"}])

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures": [{"size": 10, picture: BinData(0, "m/lhLlLmoNiUKQ==")}]}

{"size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}
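
The same unwind-and-project idea can be expressed over in-memory documents, which makes the resulting shape easy to inspect. This is a sketch under assumptions: documents are Map<String, Object> values with "cat_pictures" held as a List of Maps, and the class name CatPictureExtractor is hypothetical. As in the pipeline above, each output record carries a "uid" field pointing back to the user.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory imitation of the $unwind + $project stages.
final class CatPictureExtractor {
    private CatPictureExtractor() {}

    static List<Map<String, Object>> extract(List<Map<String, Object>> users) {
        List<Map<String, Object>> cats = new ArrayList<>();
        for (Map<String, Object> user : users) {
            Object pictures = user.get("cat_pictures");
            if (!(pictures instanceof List)) {
                continue; // users without the array contribute no records
            }
            for (Object element : (List<?>) pictures) {
                if (!(element instanceof Map)) {
                    continue; // simplification: skip anything that is not a sub-document
                }
                Map<?, ?> picture = (Map<?, ?>) element;
                Map<String, Object> cat = new LinkedHashMap<>();
                cat.put("uid", user.get("_id"));          // back-reference to the user
                cat.put("size", picture.get("size"));
                cat.put("picture", picture.get("picture"));
                cats.add(cat);
            }
        }
        return cats;
    }
}
```

Note that this only produces the new records; removing "cat_pictures" from the user documents would be a separate step.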

Page 54: Webinar: Strongly Typed Languages and Flexible Schemas

Tradeoffs

Page 55: Webinar: Strongly Typed Languages and Flexible Schemas

Tradeoffs

Decoupled Architecture
  Positives:
  - Should be your default approach
  - Clean solution
  - Scalable
  Penalties: N/A

Data Structures Variability
  Positives:
  - Reflects today's data structures
  - You can push decisions for later
  Penalties:
  - More complex code base

Data Structures Strictness
  Positives:
  - Simple to maintain
  - Always aligned with your code base
  Penalties:
  - Will eventually need migrations
  - Restricts your code iterations

Page 56: Webinar: Strongly Typed Languages and Flexible Schemas

Recap

Page 57: Webinar: Strongly Typed Languages and Flexible Schemas

Recap

• Flexible and dynamic schemas are a great tool
  – Use them wisely
  – Make sure you understand the tradeoffs
  – Make sure you understand the different strategies and options

• They work well with strongly typed languages

Page 58: Webinar: Strongly Typed Languages and Flexible Schemas

Free Education
https://university.mongodb.com/courses/M101J/about

Page 59: Webinar: Strongly Typed Languages and Flexible Schemas

Obrigado! (Thank you!)
• Norberto Leite
• Technical Evangelist
• http://www.mongodb.com/norberto
• [email protected]
• @nleite

Page 60: Webinar: Strongly Typed Languages and Flexible Schemas