taming big data with nosql

50
This presentation includes information that is condential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved. This presentation includes information that is condential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved. Matt Brender Developer Advocate Taming Big Data with NoSQL

Upload: basho-technologies

Post on 15-Jul-2015

167 views

Category:

Technology


2 download

TRANSCRIPT

Page 1: Taming Big Data with NoSQL

This presentation includes information that is confidential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved.

This presentation includes information that is confidential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved.

Matt Brender Developer Advocate

Taming

Big Data with NoSQL

Page 2: Taming Big Data with NoSQL

Relational databases are not bad

Data Scientist Big Data

Page 3: Taming Big Data with NoSQL

Basho Confidential 3

Data Scientist

Big Data

Page 4: Taming Big Data with NoSQL

4

Big Data is

Page 5: Taming Big Data with NoSQL

Basho Confidential 5

Page 6: Taming Big Data with NoSQL

6

And it’s a distributed systems problem

Page 7: Taming Big Data with NoSQL

Basho Confidential 7

Beyond the Scope of one tool

Beyond the Scope of one file system

Beyond the Scope of

one database

Page 8: Taming Big Data with NoSQL

8

Ergo, NoSQL

Page 9: Taming Big Data with NoSQL

9

NoSQL is

Page 10: Taming Big Data with NoSQL

10

Page 11: Taming Big Data with NoSQL

11

For Good Reason

Page 12: Taming Big Data with NoSQL

Basho Confidential 12

Page 13: Taming Big Data with NoSQL

13

Consistency LevelConflict ResolutionPartitioning Strategy

Page 14: Taming Big Data with NoSQL

14

Consistency Level

Page 15: Taming Big Data with NoSQL

Eventually Consistent

C = ConsistencyA = AvailabilityP = Partition Tolerance

Client Client

DBDBDB

Network Partition

Cap theorem states that a distributed system can at most support 2 out of these 3 properties

Page 16: Taming Big Data with NoSQL

16

Consistency Level

Page 17: Taming Big Data with NoSQL

17

Conflict Resolution

Last Write Winsvs.

Causal Context

Page 18: Taming Big Data with NoSQL

18

Conflict Resolution

Page 19: Taming Big Data with NoSQL

19

Partition Strategy

Master

Slave Slave Slave

OR

Node  1   Node  2   Node  3  

Page 20: Taming Big Data with NoSQL

20

Partition Strategy

Page 21: Taming Big Data with NoSQL

21

“data is powerful when stored and analyzed”

Page 22: Taming Big Data with NoSQL

Relational databases are not bad

Page 23: Taming Big Data with NoSQL

23

Storing

Page 24: Taming Big Data with NoSQL

24

Report on this

Page 25: Taming Big Data with NoSQL

Basho Confidential 25

Report on this

Page 26: Taming Big Data with NoSQL

26

vs

Page 27: Taming Big Data with NoSQL

27

A single scalable system

Page 28: Taming Big Data with NoSQL

28

A single scalable system

Page 29: Taming Big Data with NoSQL

29

Analyzing

Page 30: Taming Big Data with NoSQL

What kind of questions do you need to ask?

Page 31: Taming Big Data with NoSQL

Basho Confidential 31

Page 32: Taming Big Data with NoSQL

32

Error Analysis?As simple as NoSQL + Solr

Page 33: Taming Big Data with NoSQL

33

Patterns? Machine Learning?

Multi-client writes to NoSQL & HDFS

Page 34: Taming Big Data with NoSQL

OR

Page 35: Taming Big Data with NoSQL

35

NoSQL + ETL process => other datastore + Spark

and/or Hadoop M/R

Page 36: Taming Big Data with NoSQL

OR

Page 37: Taming Big Data with NoSQL

37

NoSQL & Apache Storm to Kafka to HDFS

Page 38: Taming Big Data with NoSQL

38

Page 39: Taming Big Data with NoSQL

LAMDA

39

Page 40: Taming Big Data with NoSQL

So we agree. NoSQL is helpful.

Page 41: Taming Big Data with NoSQL

Side Note: NoSQL is a terrible term

Page 42: Taming Big Data with NoSQL

42

In Review

Page 43: Taming Big Data with NoSQL

43

You can’t analyze what you don’t have.

And you don’t want an analysis system to be unreliable.

Page 44: Taming Big Data with NoSQL

THE COST OF DOWNTIME

44

Page 45: Taming Big Data with NoSQL

Basho Confidential 45

Page 46: Taming Big Data with NoSQL

46

everything works at small scale

Page 47: Taming Big Data with NoSQL

47

Nothing matters if..

Page 48: Taming Big Data with NoSQL

Basho Confidential 48

Page 49: Taming Big Data with NoSQL

•  Hadoop A framework that allows for the distributed processing of large data sets across clusters

•  SparkA fast, general engine for large-scale data processing

•  Storm A distributed real-time computation system

•  NoSQLA collection of highly scalable, highly available systems that fall within CAP theorem

•  SolrApache project for indexing text for search

•  KafkaDistributed scalable pub/sub messaging queue

Summary

Page 50: Taming Big Data with NoSQL

50

Thank You!

Matt Brender@mjbrender