Amazon EMR: Facebook Presto Meetup
TRANSCRIPT
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
March 19, 2015 | Facebook Presto Meetup
Interactive SQL on Amazon S3 using Presto on Amazon EMR
Steve McPherson
AMAZON EMR
Amazon EMR makes cluster management easy
Amazon EMR
• Setup and configuration
• Node monitoring and replacement
• Log aggregation
• CloudWatch integration
• Expand and shrink on demand
• Integration with Spot Instances
• AWS Support
Data Warehousing on Amazon EMR
Extract, Transform & Load; Data Warehouse; Report Generation & Ad Hoc Analysis
Amazon S3; Amazon EMR; Amazon EMR
• MapReduce API, Sqoop
• Spark, Cascading, Pig, MR
• Hive, Spark, Cascading, Pig
• Presto, Hive, Spark SQL, Lingual
• Parquet, ORC, SEQ, Text
[Diagram: Extract, Transform & Load (write) → Data Warehouse → Report Generation and Ad Hoc Analysis (read)]
Different clusters for different workloads
• Hive, Pig, Cascading
• Presto
• Spark
• HBase
Amazon S3
Why do our customers like Presto?
• It works directly on S3
• It integrates with Hive
• It’s fast
• It’s Java
Demo: Launch a cluster
#> aws emr create-cluster \
    --name="PRESTO-0-95" \
    --ami-version=3.5.0 \
    --applications Name=hive \
    --ec2-attributes KeyName=[KEY_NAME] \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
      InstanceGroupType=CORE,InstanceCount=1,InstanceType=m3.xlarge \
    --bootstrap-actions Name="install presto",Path="s3://github-emr-bootstrap-actions/presto/0.95/install-presto",Args="[-p,8989,-m,1024,-n,128]"
# wait 5 minutes
#> emrscreen
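The demo launches the cluster with the AWS CLI and then simply waits about five minutes. A minimal Python sketch of the same launch as keyword arguments for the EMR RunJobFlow API, plus a poll loop in place of the fixed wait, might look like this. `build_presto_cluster_request` and `wait_for_cluster` are hypothetical helpers; the bootstrap-action arguments are passed through unchanged because the talk does not explain them. Current accounts generally also need `JobFlowRole` and `ServiceRole` (what the CLI's `--use-default-roles` fills in), and AMI 3.x may no longer be launchable; both are assumptions left out to stay close to the 2015 demo.

```python
# Hypothetical sketch: a boto3-style equivalent of the demo's
# `aws emr create-cluster` call, plus a poll loop replacing the
# fixed "wait 5 minutes". Nothing here contacts AWS by itself.
import time
from typing import Callable


def build_presto_cluster_request(key_name: str) -> dict:
    """Return RunJobFlow keyword arguments mirroring the demo's CLI flags."""
    return {
        "Name": "PRESTO-0-95",
        "AmiVersion": "3.5.0",
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "Ec2KeyName": key_name,
            # The CLI keeps a cluster alive after launch unless --auto-terminate is set.
            "KeepJobFlowAliveWhenNoSteps": True,
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceCount": 1, "InstanceType": "m3.xlarge"},
                {"InstanceRole": "CORE", "InstanceCount": 1, "InstanceType": "m3.xlarge"},
            ],
        },
        "BootstrapActions": [
            {
                "Name": "install presto",
                "ScriptBootstrapAction": {
                    "Path": "s3://github-emr-bootstrap-actions/presto/0.95/install-presto",
                    # Passed through as in the demo; their meaning is not covered in the talk.
                    "Args": ["-p", "8989", "-m", "1024", "-n", "128"],
                },
            }
        ],
    }


def wait_for_cluster(
    get_state: Callable[[], str],
    timeout_s: float = 900,
    interval_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the cluster is usable, has terminated, or time runs out."""
    # To our understanding, these match the states used by `aws emr wait cluster-running`.
    ready = {"RUNNING", "WAITING"}
    failed = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}
    waited = 0.0
    while True:
        state = get_state()
        if state in ready:
            return state
        if state in failed:
            raise RuntimeError(f"cluster ended in state {state}")
        if waited >= timeout_s:
            raise TimeoutError(f"cluster still {state} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

With boto3 installed and credentials configured, one would call `emr = boto3.client("emr")`, then `cluster_id = emr.run_job_flow(**build_presto_cluster_request("my-key"))["JobFlowId"]`, and finally `wait_for_cluster(lambda: emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"])`. Injecting `get_state` and `sleep` keeps the loop testable without touching AWS.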
Run a Query
#> hive
CREATE EXTERNAL TABLE test (id int, name string, surname string, emails string, country string, ip string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION "s3://support.elasticmapreduce/bootstrap-actions/presto/0.95/Query_Sample/";
#> presto-cli --catalog hive
SHOW TABLES;
SELECT name, COUNT(name) FROM test GROUP BY name;
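The Presto query groups the sample rows by `name` and counts the non-NULL names in each group. To make those semantics concrete, here is a local Python sketch that applies the same logic to a few in-memory lines. It is an illustration only, assuming the table's comma-delimited layout with no quoting and Hive's default `\N` NULL marker; `parse_row` and `count_by_name` are hypothetical helpers, not part of Presto or Hive.

```python
# Hypothetical local sketch of what the demo query computes:
#   SELECT name, COUNT(name) FROM test GROUP BY name;
# over rows laid out like the `test` table (ROW FORMAT DELIMITED
# FIELDS TERMINATED BY ','). Assumes Hive's default NULL marker "\N".
from collections import Counter
from typing import Iterable

COLUMNS = ["id", "name", "surname", "emails", "country", "ip"]
NULL_MARKER = r"\N"


def parse_row(line: str) -> dict:
    """Split one text line the way a comma-delimited Hive table would."""
    # No CSV quoting or escaping: every comma is a field separator.
    fields = line.rstrip("\n").split(",")
    # Missing trailing fields read as NULL; extra fields are dropped by zip below.
    fields += [None] * (len(COLUMNS) - len(fields))
    return {col: (None if value == NULL_MARKER else value) for col, value in zip(COLUMNS, fields)}


def count_by_name(lines: Iterable[str]) -> dict:
    """GROUP BY name with COUNT(name): a NULL name forms its own group, counted as 0."""
    counts: Counter = Counter()
    groups = set()
    for line in lines:
        name = parse_row(line)["name"]
        groups.add(name)
        if name is not None:  # COUNT(column) skips NULLs
            counts[name] += 1
    return {name: counts[name] for name in groups}
```

Splitting on every comma, rather than using Python's `csv` module, is deliberate: a plain delimited Hive table does not honor quoted fields, so a quoted value containing a comma would be split there as well.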
What’s next
• Formal packaging of Presto
• Graceful shrink
• CloudWatch integration
• Identity and Authorization integration with AWS services
Get started today
Amazon EMR
http://aws.amazon.com/elasticmapreduce/