Big Data, Cloud Computing and NoSQL

Posted on 10-May-2015


DESCRIPTION

Cloud, Big Data and NoSQL are popular buzzwords today. This presentation shows how they fit together, makes sense of each of them, and shows how these new technologies can help a business become more productive.

TRANSCRIPT

© Copyright SELA software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel | www.selagroup.com

SELA DEVELOPER PRACTICE, December 15-19, 2013

Manu Cohen-Yashar

The Cloud, Big Data and NoSQL

Agenda

What is the cloud
Data boom
NoSQL
Big Data
Cloud Distributions
What’s next

Make sense of: Cloud, Big Data and NoSQL

How they fit together

Make money!

What is the cloud

Cloud Computing is an Idea …

Infrastructure is provisioned by a cloud provider.
Automatic scaling.
Elasticity.
Pay as you use.
Availability.
Simple, automatic, economic.
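To make "elasticity" and "pay as you use" concrete, here is a minimal sketch of an autoscaling rule. The per-instance capacity, the min/max bounds, the hourly price and the load figures are all hypothetical numbers chosen for illustration; real cloud autoscalers are configured through each provider's own tooling, which this does not model.

```python
import math

# Hypothetical assumption: one instance can serve this many requests per second.
CAPACITY_PER_INSTANCE = 100


def desired_instances(load_rps: float, min_instances: int = 1, max_instances: int = 20) -> int:
    """Return how many instances an elastic service would run for a given load.

    Scales out as load grows and back in as it drops, clamped to [min, max].
    """
    needed = math.ceil(load_rps / CAPACITY_PER_INSTANCE)
    return max(min_instances, min(max_instances, needed))


def hourly_cost(instances: int, price_per_instance_hour: float) -> float:
    """Pay as you use: cost is proportional to the instances actually running."""
    return instances * price_per_instance_hour


if __name__ == "__main__":
    hourly_load = [50, 250, 1200, 3000, 400]  # hypothetical requests/second, one value per hour
    for load in hourly_load:
        n = desired_instances(load)
        print(f"load={load} rps -> {n} instances, cost {hourly_cost(n, 0.10):.2f}")
```

The point of the design is that capacity follows demand hour by hour, so the quiet hours cost one instance rather than the twenty needed at peak.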

Types of Clouds

IaaS (Infrastructure as a Service), PaaS (Platform as a Service), SaaS (Software as a Service) and more…

Identity As A Service
Connectivity As A Service

Storage As A Service

Lots of Data

Data doubles every 18 months.
Pictures, web sites, emails, sensors, geo information, financial information, science, art... (an endless list)
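Taking the slide's 18-month doubling figure at face value, volume grows as D(t) = D0 · 2^(t/18) with t in months. A minimal sketch of that formula; the starting volume is whatever you supply:

```python
def data_volume(initial: float, months: float, doubling_months: float = 18.0) -> float:
    """Projected data volume if it doubles every `doubling_months` months.

    D(t) = initial * 2 ** (t / doubling_months)
    """
    return initial * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Six years = 72 months = four doublings.
    print(data_volume(1.0, 72))
```

For example, six years (72 months) is four doublings, so the volume grows sixteenfold.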

No Limits

With the cloud it is now possible to provision a cluster of any size and run computations at any scale.
Whoever makes sense of all the available data will rule the world.

The conclusion: use the cloud to analyze data at large scale.

Let's Talk About Data

When we think of data we think of …

Data has many forms

Yet data comes in many forms and shapes

Graphs, Documents, Time Series, Blobs, Geo, Sensors, Unstructured, Structured, Web

Non-Relational

Not all types of data fit well into the relational world.
Not all data use cases fit well into the ACID (Atomicity, Consistency, Isolation, Durability) convention.
The relational model does not scale out well:
Difficult to distribute
Difficult to replicate

The CAP Theorem

RDBMS

Replicated NoSQL

Sharded NoSQL

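The CAP theorem relates three properties of a distributed data store: Consistency, Availability and Partition tolerance. During a network partition, a distributed system must choose between Consistency and Availability.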
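A minimal sketch of that choice, assuming a toy two-replica store with synchronous replication. The `Replica` class and its "CP"/"AP" modes are hypothetical names for illustration, not any database's API: a CP replica refuses writes it cannot replicate, while an AP replica accepts them and lets the replicas diverge until the partition heals.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised by a CP replica that refuses a request it cannot keep consistent."""


@dataclass
class Replica:
    mode: str                                   # "CP" or "AP" (hypothetical labels)
    data: dict = field(default_factory=dict)
    partitioned: bool = False                   # True when the peer is unreachable

    def write(self, key, value, peer: "Replica") -> None:
        if not self.partitioned:
            # Normal operation: apply locally and replicate to the peer.
            self.data[key] = value
            peer.data[key] = value
        elif self.mode == "CP":
            # Choose consistency: reject the write so replicas never disagree.
            raise Unavailable("peer unreachable; rejecting write")
        else:
            # Choose availability: accept locally; the peer is now stale.
            self.data[key] = value


if __name__ == "__main__":
    a, b = Replica("AP"), Replica("AP")
    a.write("x", 1, b)
    a.partitioned = True
    a.write("x", 2, b)
    print(a.data, b.data)  # the two replicas now disagree
```

Real systems add ways to repair divergence after the partition heals (for example versioning or last-writer-wins), which this sketch leaves out.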

NoSQL

A large family of databases
No schema
No enforced relations
Designed for high scale and distribution

Types of NoSQL databases:
Key-value
Wide column
Document
Graph
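To show how the four families differ, here is the same hypothetical user data shaped the way each model stores it, using plain Python structures. This is an illustration of the data models only, not the API of any particular product; the names and values are made up.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": b'{"name": "Dana"}'}

# Document: self-describing nested records. Documents in one collection
# need not share a schema (the second has no "email", the first no "tags").
documents = [
    {"_id": 42, "name": "Dana", "email": "dana@example.com"},
    {"_id": 43, "name": "Lior", "tags": ["admin"]},
]

# Wide column: row key -> column family -> column -> value.
# Rows may carry different columns.
wide_column = {
    "42": {"profile": {"name": "Dana"}, "visits": {"2013-12-15": 3}},
    "43": {"profile": {"name": "Lior"}},
}

# Graph: nodes plus explicit, named relationships (edges).
graph = {
    "nodes": {42: {"name": "Dana"}, 43: {"name": "Lior"}},
    "edges": [(42, "FOLLOWS", 43)],
}


def followers_of(node_id: int) -> list:
    """Traverse FOLLOWS edges pointing at node_id."""
    return [src for src, rel, dst in graph["edges"] if rel == "FOLLOWS" and dst == node_id]


if __name__ == "__main__":
    print(followers_of(43))
```

Each shape makes one access pattern cheap: lookup by key, fetching a whole nested record, scanning a sparse set of columns, or following relationships.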

Motivation for NoSQL

Large scale and distribution
Simplicity
Low cost
Good fit with the data model
Volume, velocity and variety

There is no single NoSQL solution for all use cases

Important

There are more than 150 possible offerings…

The Cloud and NoSQL

All major cloud providers have NoSQL solutions: Azure Tables, Google Bigtable, Amazon DynamoDB

NoSQL databases are deployed on a cluster.

There are a large number of cloud hosting offerings for NoSQL clusters:

MongoHQ (MongoDB), Cassandra on Google Compute Engine and many more

Example – Mongo in Azure

Big Data

What is big? “Big” means it cannot fit on a single machine.

Conclusion: big data has to be distributed.
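One common way to distribute data that does not fit on one machine is hash partitioning (sharding): each record's key is hashed to pick the node that stores it. A minimal sketch, assuming string keys and a fixed shard count; `shard_for` and `distribute` are hypothetical helpers. It uses `hashlib` because Python's built-in `hash()` for strings is randomized per process, so it would not give a stable placement.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards: int) -> dict:
    """Group keys by the shard that would store them."""
    shards = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return dict(shards)


if __name__ == "__main__":
    placement = distribute([f"user:{i}" for i in range(1000)], 4)
    for shard, keys in sorted(placement.items()):
        print(shard, len(keys))
```

A caveat on the design: with plain modulo hashing, changing the number of shards remaps most keys, which is why many distributed stores use consistent hashing instead.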

Types of Big Data Processing

Query
General analysis
Classification
Recommendation
Clustering
Auditing and monitoring
More…

Challenges

Develop a parallel algorithm
Reduce network traffic: bring compute to the data
Monitor and manage a large number of parallel tasks
Survive failures
Performance
Linear scale
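"Bring compute to the data" means scheduling each task on a node that already holds its input, so only the small task travels over the network instead of the large data block. A minimal greedy sketch, assuming a hypothetical map from block ids to the nodes holding a replica and a fixed number of task slots per node; real schedulers are considerably more sophisticated.

```python
def assign_tasks(block_locations: dict, nodes: list, slots_per_node: int = 2) -> dict:
    """Assign one task per input block, preferring a node that stores the block.

    block_locations: block id -> nodes holding a replica (hypothetical input).
    Returns block id -> (node, is_local).
    """
    free = {node: slots_per_node for node in nodes}
    assignment = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if free.get(n, 0) > 0]
        if local:
            # Data-local: pick the holder with the most free slots.
            node, is_local = max(local, key=lambda n: free[n]), True
        else:
            # No free slot next to the data: fall back to any free node,
            # which means shipping the block across the network.
            candidates = [n for n in nodes if free[n] > 0]
            if not candidates:
                raise RuntimeError("no free task slots")
            node, is_local = max(candidates, key=lambda n: free[n]), False
        free[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1"], "b2": ["n2"], "b3": ["n1"]}
    print(assign_tasks(blocks, ["n1", "n2"]))
```

Counting how many assignments come back with `is_local == False` gives a rough measure of how much data the job will move over the network.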

Batch Processing vs. Operational Intelligence

Batch processing: works on existing data and provides results within minutes.

Operational intelligence: works on a stream of data and provides real-time results.
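The difference can be seen in a single aggregate. A batch job computes over a complete, stored data set and produces one answer at the end; a streaming computation updates its answer as each event arrives, keeping only a small amount of state. A minimal sketch using an average of hypothetical sensor readings:

```python
from typing import Iterable, Iterator, Sequence


def batch_average(readings: Sequence[float]) -> float:
    """Batch: the whole data set exists before we compute; one result at the end."""
    return sum(readings) / len(readings)


def streaming_averages(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated result after every reading, in constant memory."""
    count = 0
    mean = 0.0
    for x in readings:
        count += 1
        mean += (x - mean) / count  # incremental mean update
        yield mean


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0]
    print(batch_average(data))
    print(list(streaming_averages(data)))
```

Both end at the same value, but the streaming version has a current answer at every step and never needs the full history in memory.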

Distributed File System

No single server can store big data files.
Distribute files across the cluster.
Failure is part of the game.
An API similar to traditional file systems.
Examples:

HDFS, GFS, Cassandra FS, MongoDB GridFS
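The core idea behind these file systems can be sketched in a few lines: split a file into fixed-size blocks and store each block on several distinct nodes, so losing a node does not lose data. This is a simplified model, assuming round-robin placement and a replication factor of 3 (HDFS, for example, defaults to three replicas); real systems use much larger blocks and rack-aware placement, and the helper names here are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file's contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Place each block's replicas on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement: dict, failed: set) -> bool:
    """The file survives if every block still has at least one live replica."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", 4)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
    print(placement)
    print(readable_after_failure(placement, {"n1", "n2"}))
```

With three replicas on four nodes, any two node failures still leave every block readable in this example.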

Hadoop

Big data analysis platform
Batch processing
Brings compute tasks to the data nodes
Parallel processing using MapReduce
Open source
Huge ecosystem
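MapReduce splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. The classic example is word count. The following is a single-process Python simulation of those three phases, not Hadoop's own API: on a real cluster the mappers and reducers run in parallel on many nodes and the framework performs the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs) -> dict:
    """Group intermediate values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(lines) -> dict:
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "big data the cloud"]))
```

Because each mapper looks at only its own line and each reducer at only its own key, both phases can be spread across machines without coordination beyond the shuffle.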

Hadoop Ecosystem

Writing a valuable MapReduce job for Hadoop is not simple.
Many open source projects provide abstractions and related tools:

Pig, Hive, HBase, Sqoop, Mahout, ZooKeeper and more

Hadoop on the Cloud

Hadoop runs on a cluster.
You can use a cluster as a service on major cloud offerings.

Storm

Real-time big data analytics
Processes streams of data
Can be used with any programming language
Wide integration with data sources
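Storm structures a computation as a topology: spouts that emit a stream of tuples and bolts that transform them. The sketch below mimics that shape with chained Python generators for a streaming word count. It is only an analogy under that assumption; it does not use Storm's actual API, and real topologies run each spout and bolt in parallel across a cluster.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(source: Iterable[str]) -> Iterator[str]:
    """Spout-like stage: emits one tuple per incoming sentence."""
    for sentence in source:
        yield sentence


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt-like stage: splits each sentence into words."""
    for sentence in sentences:
        yield from sentence.lower().split()


def count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt-like stage: keeps running counts and emits the updated count per word."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


if __name__ == "__main__":
    stream = ["big data", "big cloud"]
    for update in count_bolt(split_bolt(sentence_spout(stream))):
        print(update)
```

Unlike the batch word count, which answers once after all input is read, this pipeline emits an updated count after every word, which is what makes real-time results possible.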

Check your schema

Be open to using NoSQL data stores.
Identify your use case and find the right database for it.
Create a simple proof of concept (POC).

Look for Big Data

Ask yourself: What can I gain from big data?

How can the new data or analysis scope enhance your existing capabilities? What additional opportunities for intervention or process optimisation does it present?

Identify your use case and find the right product and data model.
Look for web distributions and create a simple POC.

Questions
