taming big data with nosql

Post on 15-Jul-2015

167 Views

Category:

Technology

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

This presentation includes information that is confidential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved.

This presentation includes information that is confidential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved.

Matt Brender Developer Advocate

Taming

Big Data with NoSQL

Relational databases are not bad

Data Scientist Big Data

Basho Confidential 3

Data Scientist

Big Data

4

Big Data is

Basho Confidential 5

6

And it’s a distributed systems problem

Basho Confidential 7

Beyond the Scope of one tool

Beyond the Scope of one file system

Beyond the Scope of

one database

8

Ergo, NoSQL

9

NoSQL is

10

11

For Good Reason

Basho Confidential 12

13

Consistency LevelConflict ResolutionPartitioning Strategy

14

Consistency Level

Eventually Consistent

C = ConsistencyA = AvailabilityP = Partition Tolerance

Client Client

DBDBDB

Network Partition

Cap theorem states that a distributed system can at most support 2 out of these 3 properties

16

Consistency Level

17

Conflict Resolution

Last Write Winsvs.

Causal Context

18

Conflict Resolution

19

Partition Strategy

Master

Slave Slave Slave

OR

Node  1   Node  2   Node  3  

20

Partition Strategy

21

“data is powerful when stored and analyzed”

Relational databases are not bad

23

Storing

24

Report on this

Basho Confidential 25

Report on this

26

vs

27

A single scalable system

28

A single scalable system

29

Analyzing

What kind of questions do you need to ask?

Basho Confidential 31

32

Error Analysis?As simple as NoSQL + Solr

33

Patterns? Machine Learning?

Multi-client writes to NoSQL & HDFS

OR

35

NoSQL + ETL process => other datastore + Spark

and/or Hadoop M/R

OR

37

NoSQL & Apache Storm to Kafka to HDFS

38

LAMDA

39

So we agree. NoSQL is helpful.

Side Note: NoSQL is a terrible term

42

In Review

43

You can’t analyze what you don’t have.

And you don’t want an analysis system to be unreliable.

THE COST OF DOWNTIME

44

Basho Confidential 45

46

everything works at small scale

47

Nothing matters if..

Basho Confidential 48

•  Hadoop A framework that allows for the distributed processing of large data sets across clusters

•  SparkA fast, general engine for large-scale data processing

•  Storm A distributed real-time computation system

•  NoSQLA collection of highly scalable, highly available systems that fall within CAP theorem

•  SolrApache project for indexing text for search

•  KafkaDistributed scalable pub/sub messaging queue

Summary

50

Thank You!

Matt Brender@mjbrender

top related