Tatyana Matvienko, Senior Java Developer, Big Data Storages

Big Data Storages

Upload: alina-vilk

Post on 24-Jan-2017

TRANSCRIPT

Page 1: Tatyana Matvienko,Senior Java Developer, Big data storages

Big Data Storages

Page 2:

Agenda

[Big] Data Source: when does it become Big?

What is a cluster? Horizontal and vertical scaling

[Big] Data Storage challenges

Disadvantages

NoSQL = Not only SQL

Most popular and trendy

Tech Example: Apache Cassandra architecture

Demo

Page 3:

Big Data Storage Concepts

Only stores facts (events), doesn't analyze them

Immutable

Time series data (based on timestamps and, maybe, origin)

Store everything, delete nothing

Where: messages (email, Twitter), social networks, sensor data (IoT), log files, locations
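The concepts above (immutable facts, ordered by timestamp, never deleted) can be illustrated with a minimal in-memory sketch. This is not how any particular product stores data; the names `Event`, `EventLog`, `append` and `between` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a stored fact cannot be modified afterwards
class Event:
    timestamp: int   # e.g. Unix time in seconds
    origin: str      # hypothetical source id, such as a sensor name
    payload: str


class EventLog:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []  # kept sorted by timestamp

    def append(self, event):
        # Insert in timestamp order so time-range scans stay simple.
        keys = [e.timestamp for e in self._events]
        self._events.insert(bisect_right(keys, event.timestamp), event)

    def between(self, start, end):
        """Return events with start <= timestamp < end."""
        keys = [e.timestamp for e in self._events]
        return self._events[bisect_left(keys, start):bisect_left(keys, end)]
```

Note that the class deliberately has no update or delete method: the only way to change what the log says is to append another event.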

Page 4:

Cluster. Horizontal and vertical scaling

What is a cluster?

Load balancer

Communication: master/slave architecture

Fault tolerance and replication factor

Page 5:

Big Data Storage Challenges

Size (keep and search huge amount of data)

Speed (data acquisition, data search)

Availability (fault tolerance, partition tolerance)

Page 6:

Disadvantages of Big Data Storages

No transactions (ACID)

Less mature

Big variety of concepts, lack of standardization

No BI or analytics in queries

Administration

Page 7:

Distributed File storage

Amazon

Page 8:
Page 9:

Storages: Key-Value

Examples: Redis, DynamoDB, MemcacheDB, Riak KV, Aerospike, OrientDB

Page 10:

Storages: Document oriented

Examples: Apache CouchDB, Couchbase, MongoDB

Page 11:

Storages: Graphs

Examples: AllegroGraph, Neo4j, OrientDB, Titan

Page 12:

Storages: Column based

Examples: Cassandra, HBase, Accumulo, Vertica

Page 13:

Why Cassandra?

Page 14:

Apache Cassandra: basics

Masterless architecture with read/write anywhere design

All nodes are the same

No single point of failure

Zone support

Linear scalability

CQL - Cassandra Query Language

Availability and Partition Tolerance but Eventual Consistency
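A toy sketch of what "read/write anywhere" with eventual consistency means in practice: two replicas accept writes independently, may briefly disagree, and converge once they exchange state. Cassandra resolves conflicting writes by timestamp (last write wins); everything else here, including the `Replica` class and the `sync` function, is a hypothetical simplification and not Cassandra's actual repair mechanism.

```python
class Replica:
    """One node's local copy: key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Keep the incoming value only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange state both ways so each replica ends with the newest value per key."""
    for src, dst in ((a, b), (b, a)):
        for key, (ts, value) in list(src.data.items()):
            dst.write(key, value, ts)
```

If one replica receives `write("user:1", "alice", 1)` and the other receives `write("user:1", "alicia", 2)`, reads return different values until `sync` runs, after which both return the value with the later timestamp.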

Page 15:
Page 16:

Partitioning and Replication
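Since only the slide title survives here, the following is a rough sketch of the general idea behind token-ring partitioning with a replication factor: each node owns a position on a ring, a partition key is hashed to a position, and the key is stored on the next nodes clockwise. MD5 is used purely as a stand-in hash, virtual nodes are omitted, and the `Ring` class, `token` function and node names are hypothetical.

```python
import hashlib
from bisect import bisect_right


def token(key):
    """Map a string to a position on the ring (MD5 used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.rf = replication_factor
        # Each node owns one token; sort so we can walk the ring clockwise.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        """Return the rf nodes responsible for the key, primary first."""
        start = bisect_right(self.tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]
```

For example, `Ring(["node-a", "node-b", "node-c", "node-d"], 3).replicas("sensor-42")` returns three distinct node names, and the same key always maps to the same three nodes. With a replication factor of 3, the data for a key remains available on at least one node when any two of its replicas are down.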

Page 17:

Data modeling

Page 18:
Page 19:

Demo