Tatyana Matvienko, Senior Java Developer, Big Data Storages
Big Data Storages
Agenda
[Big] Data Source: when does it become Big?
What is a cluster? Horizontal and vertical scaling
[Big] Data Storage challenges
Disadvantages
NoSQL = Not only SQL
Most popular and trendy
Tech example: Apache Cassandra architecture
Demo
Big Data Storage Concepts
Only stores facts (events), doesn't analyze them
Immutable
Time series data (based on timestamps and, possibly, origin)
Store everything, delete nothing
Where: messages (email, Twitter), social networks, sensor data (IoT), log files, locations
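The concepts above (immutable facts, keyed by timestamp and origin, never deleted) can be sketched as a tiny in-memory append-only log. This is only an illustration: `Event`, `EventLog`, and the sample sensor readings are hypothetical names, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Event:
    """An immutable fact; frozen=True forbids changing fields after creation."""
    timestamp: int  # assumed to be epoch milliseconds
    origin: str     # e.g. a sensor id or user id
    payload: str


class EventLog:
    """Append-only store: events are added and read, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def time_range(self, start, end):
        """Return events with start <= timestamp < end, in time order."""
        return [e for e in sorted(self._events) if start <= e.timestamp < end]


log = EventLog()
log.append(Event(20, "sensor-b", "21.5C"))
log.append(Event(10, "sensor-a", "20.9C"))
log.append(Event(30, "sensor-a", "21.0C"))
first_two = log.time_range(10, 30)
```

Note that the log exposes no update or delete method at all; analysis would happen in a separate system that reads from it.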
Cluster. Horizontal and vertical scaling
What is a cluster?
Load balancer
Communication: master/slave architecture
Fault tolerance and replication factor
Big Data Storage Challenges
Size (keep and search huge amounts of data)
Speed (data acquisition, data search)
Availability (fault tolerance, partition tolerance)
Disadvantages of Big Data Storages
No transactions (ACID)
Less mature
Big variety of concepts, lack of standardization
No BI or analytics in queries
Administration
Distributed File storage
Amazon
Storages: Key-Value
Examples: Redis, DynamoDB, MemcacheDB, Riak KV, Aerospike, OrientDB
Storages: Document-oriented
Examples: Apache CouchDB, Couchbase, MongoDB
Storages: Graphs
Examples: AllegroGraph, Neo4j, OrientDB, Titan
Storages: Column-based
Examples: Cassandra, HBase, Accumulo, Vertica
Why Cassandra?
Apache Cassandra: basics
Masterless architecture with read/write anywhere design
All nodes are the same
No single point of failure
Zone support
Linear scalability
CQL: Cassandra Query Language
Availability and partition tolerance, at the cost of eventual consistency
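The trade-off in the last point can be made concrete with a little arithmetic. In Cassandra the consistency level is chosen per request: if the number of replicas that acknowledge a write plus the number consulted on a read exceeds the replication factor, the two sets must share at least one replica, so the read sees the latest write; otherwise the read is only eventually consistent. A minimal sketch, with hypothetical function names:

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when any read replica set must intersect any write replica set."""
    return write_acks + read_acks > replication_factor


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be down while quorum requests still succeed."""
    return replication_factor - quorum(replication_factor)


rf = 3
strong = read_overlaps_write(rf, quorum(rf), quorum(rf))  # quorum write + quorum read
eventual = read_overlaps_write(rf, 1, 1)                  # one ack on each side
```

With a replication factor of 3, quorum reads and writes overlap and survive one replica being down, while single-replica reads and writes are faster but may return stale data for a while.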
Partitioning and Replication
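As a rough illustration of this slide: Cassandra hashes each partition key to a token, places nodes on a token ring, and stores a row on the node that owns the token's range plus further replicas. The sketch below makes several simplifying assumptions that are not from the talk: one token per node (no virtual nodes), MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3), replicas placed on the next nodes clockwise as in a simple single-data-center strategy, and made-up node names.

```python
import hashlib
from bisect import bisect_left


def token(key: str) -> int:
    """Map a key to a 64-bit token. MD5 is only a stable stand-in hash here."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring with one token per node (no virtual nodes)."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key):
        """First node whose token is >= the key's token, then the next nodes clockwise."""
        start = bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("user:42")
```

Because placement depends only on the key's hash and the ring layout, any node can compute where a row lives, which is what allows a masterless design.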
Data modeling
Demo