Building Big Data Applications on AWS by Ran Tessler
What to Expect from this Session
• Big data architectural principles
• Reference Lambda architecture
• Live demo
Architectural Principles
• Decoupled “data bus”: Data → Store → Process → Answers
• Use the right tool for the job: latency, throughput, access patterns
• Apply Lambda architecture ideas: immutable (append-only) log; batch, speed, and serving layers
• Leverage AWS managed services: no/low admin
• Be cost-conscious: big data ≠ big cost
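To make the “data bus” and immutable-log principles concrete, here is a minimal in-memory sketch. `AppendOnlyLog` and `Consumer` are hypothetical names, not an AWS API; in the architecture on these slides a durable stream such as Amazon Kinesis would fill the role of the log. What it illustrates: producers only append, and each consumer keeps its own read offset, so a batch reader and a streaming reader can process the same records independently.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a durable, append-only "data bus"."""
    _records: list = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset; existing records are never modified."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100) -> list:
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own offset, so consumers never block or affect each other."""

    def __init__(self, log: AppendOnlyLog, name: str) -> None:
        self.log = log
        self.name = name
        self.offset = 0

    def poll(self) -> list:
        """Read the next batch of records and advance this consumer's offset."""
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


# Usage: two independent consumers reading the same log.
log = AppendOnlyLog()
for event in ["GET /a", "GET /b", "POST /c"]:
    log.append(event)

batch_reader = Consumer(log, "batch")
speed_reader = Consumer(log, "speed")
print(batch_reader.poll())  # ['GET /a', 'GET /b', 'POST /c']
log.append("GET /d")
print(speed_reader.poll())  # ['GET /a', 'GET /b', 'POST /c', 'GET /d']
print(batch_reader.poll())  # ['GET /d']
```

Because nothing is overwritten, a consumer that falls behind (or a brand-new one) can replay from offset 0. A real system would also need retention limits and persistent checkpoints, which this sketch omits.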
Simplify Big Data Processing
[Diagram: data → ingest/collect → store → process/analyze → consume/visualize → answers; labels: time to answer (data freshness), throughput]
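The two labels on the pipeline diagram, time to answer (data freshness) and throughput, pull against each other wherever records are batched. The sketch below uses a hypothetical `BatchBuffer` class that flushes when either a record-count or an age threshold is reached; it illustrates the trade-off and is not how any AWS service is implemented. (Amazon Kinesis Firehose, used in the demo, exposes a comparable choice through its buffer size and buffer interval settings.)

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Hypothetical buffer that flushes when a size OR an age threshold is reached.

    Larger max_records / max_age_s -> fewer, bigger writes (throughput).
    Smaller values -> fresher data downstream (time to answer).
    """

    def __init__(self, flush: Callable[[List[str]], None], max_records: int = 500,
                 max_age_s: float = 60.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_fn = flush
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.clock = clock  # injectable so the behaviour can be tested with a fake clock
        self.records: List[str] = []
        self.first_at: Optional[float] = None

    def add(self, record: str) -> None:
        if not self.records:
            self.first_at = self.clock()  # age is measured from the oldest buffered record
        self.records.append(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        if not self.records:
            return
        too_big = len(self.records) >= self.max_records
        too_old = self.clock() - self.first_at >= self.max_age_s
        if too_big or too_old:
            self.flush_fn(self.records)
            self.records = []  # start a fresh list; the flushed one is handed off untouched
            self.first_at = None


# Usage with a fake clock: the first batch flushes on size, the second on age.
batches = []
now = [0.0]
buf = BatchBuffer(batches.append, max_records=3, max_age_s=10.0, clock=lambda: now[0])
for r in ["a", "b", "c", "d"]:
    buf.add(r)
now[0] = 15.0
buf.maybe_flush()
print(batches)  # [['a', 'b', 'c'], ['d']]
```

Age is only checked when `add` or `maybe_flush` is called; a production buffer would also flush from a timer so that a quiet stream does not hold data indefinitely.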
Access Log: Common Log Format (CLF)
75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] "GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236
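A minimal sketch of parsing the CLF line above with Python's standard `re` and `datetime` modules. The regular expression, `parse_clf`, and `AccessLogEntry` are illustrative assumptions, not part of the demo's published steps; lines whose request field is malformed (for example a bare `"-"`) simply return `None`.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


class AccessLogEntry(NamedTuple):
    host: str
    time: datetime
    method: str
    path: str
    protocol: str
    status: int
    size: int  # a "-" (no response body) is mapped to 0


def parse_clf(line: str) -> Optional[AccessLogEntry]:
    """Parse one CLF line; return None if it does not match the pattern."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return AccessLogEntry(
        host=m["host"],
        # %z keeps the UTC offset, e.g. -0700, so the result is timezone-aware
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=m["method"],
        path=m["path"],
        protocol=m["protocol"],
        status=int(m["status"]),
        size=0 if m["size"] == "-" else int(m["size"]),
    )


line = '75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] "GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236'
entry = parse_clf(line)
print(entry.host, entry.status, entry.size)  # 75.35.230.210 200 29236
```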
Your First Big Data Application on AWS
• COLLECT: Logs → Amazon Kinesis Firehose
• STORE: Amazon S3
• PROCESS: Amazon EMR with Spark & Hive
• ANALYZE & VISUALIZE: Amazon Redshift and Amazon QuickSight
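As a rough, single-machine stand-in for the PROCESS step, the sketch below groups access-log lines by hour and HTTP status and emits CSV. On the slide this work is done by Spark and Hive on Amazon EMR; the function names, the hour-and-status grouping, and the sample lines (other than the first, taken from the CLF example) are hypothetical. CSV is used because it is a format the Amazon Redshift `COPY` command can load, but the demo's actual schema may differ.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Only the fields this hypothetical aggregation needs: timestamp truncated to the hour, status, bytes.
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2})[^\]]*\] "[^"]*" (\d{3}) (\d+|-)')


def aggregate(lines: Iterable[str]) -> Dict[Tuple[str, int], Tuple[int, int]]:
    """Group log lines by (hour, status) and return (request_count, total_bytes) per group."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip malformed lines rather than failing the whole batch
        hour, status, size = m.group(1), int(m.group(2)), m.group(3)
        totals[(hour, status)][0] += 1
        totals[(hour, status)][1] += 0 if size == "-" else int(size)
    return {key: (v[0], v[1]) for key, v in totals.items()}


def to_csv(rows: Dict[Tuple[str, int], Tuple[int, int]]) -> str:
    """Serialize the aggregates as CSV with a header, sorted for deterministic output."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["hour", "status", "requests", "bytes"])
    for (hour, status), (count, size) in sorted(rows.items()):
        writer.writerow([hour, status, count, size])
    return buf.getvalue()


logs = [
    '75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] "GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236',
    '10.0.0.1 - - [20/Jul/2009:22:23:01 -0700] "GET /missing.html HTTP/1.1" 404 -',       # made-up
    '75.35.230.210 - - [20/Jul/2009:22:24:10 -0700] "GET /index.html HTTP/1.1" 200 1000',  # made-up
]
print(to_csv(aggregate(logs)), end="")
# hour,status,requests,bytes
# 20/Jul/2009:22,200,2,30236
# 20/Jul/2009:22,404,1,0
```

The hour key keeps each server's local time and ignores the UTC offset, which is only acceptable if all servers log in the same time zone.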
Lambda Architecture
• Data ingested via Amazon Kinesis
• Batch layer: Amazon Kinesis S3 Connector, Amazon S3, Amazon Redshift, Amazon EMR (Presto, Hive, Pig, Spark)
• Speed layer: KCL, AWS Lambda, Spark Streaming, Storm
• Serving layer: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES
• Consumers: apps
• Also shown: Amazon ML
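The batch/speed/serving split can be shown with a toy hit-counting example. Everything here is a hypothetical in-memory stand-in: `batch_view` plays the batch layer (on the slide, Amazon EMR or Amazon Redshift over data in Amazon S3), `SpeedView` plays the speed layer (for example KCL, AWS Lambda, or Spark Streaming), and `query` plays the serving layer by merging the two views. Counting page hits is an assumed example query, not one from the talk.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute the view from scratch over the immutable master dataset."""
    return Counter(master_dataset)


class SpeedView:
    """Speed layer: incrementally count events that the last batch run has not seen yet."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, event: str) -> None:
        self.counts[event] += 1

    def reset(self) -> None:
        # Once a new batch run has absorbed these events, the realtime view is discarded.
        self.counts.clear()


def query(batch: Counter, speed: SpeedView, key: str) -> int:
    """Serving layer: answer a query by merging the batch view with the realtime view."""
    return batch[key] + speed.counts[key]


master = ["/index.html", "/a.jpg", "/index.html"]  # events already covered by a batch run
batch = batch_view(master)
speed = SpeedView()
speed.update("/index.html")                         # arrived after the batch run
print(query(batch, speed, "/index.html"))           # 3

master.append("/index.html")                        # the next batch run includes that event
batch = batch_view(master)
speed.reset()
print(query(batch, speed, "/index.html"))           # 3
```

The batch view is always recomputed from the full, immutable master dataset, which is what lets the architecture recover from a bad job by simply rerunning it; the speed view only has to cover the gap since the last batch run.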
Back to our demo…
DIY: download all steps at http://bit.ly/29fhcwu
[Diagram: Amazon Kinesis Firehose, Amazon EMR, Amazon S3, Amazon Redshift, Amazon QuickSight]
http://aws.amazon.com/big-data/use-cases/