AWS Black Belt Tech Series: Amazon Kinesis

Amazon Kinesis, AWS Black Belt Tech Webinar 2014 (formerly the Meister Series). Partner Solutions Architect 榎並 晃

Upload: amazon-web-services-japan

Post on 26-Jan-2015

Category:

Technology


DESCRIPTION

AWS Black Belt Tech Webinar 2014 (formerly the Meister Series): Amazon Kinesis

TRANSCRIPT

  • 1. Amazon Kinesis, AWS Black Belt Tech Webinar 2014 (formerly the Meister Series)

  • 2. Agenda: Kinesis, Kinesis, Kinesis
  • 3. Kinesis
  • 4. M2M, IoT, Web, Logs, POS Data
  • 5. POS
  • 6. Diagram labels: AWS, S3, Process Submissions, Store Batches, Process Hourly w/ Hadoop, Clients Submitting Data, Data Warehouse; 100, ETL Job, 100, Kinesis
  • 7. Kinesis
  • 8. Amazon Kinesis; Kinesis, 1, AZ
  • 9. Kinesis: AWS SDK, LOG4J, Flume, Fluentd; Get* APIs, Kinesis Client Library + Connector Library, Apache Storm, Amazon Elastic MapReduce
  • 10. Kinesis architecture. Diagram labels: Data Sources; AWS Endpoint; Availability Zones holding Shard 1, Shard 2, ..., Shard N; App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Real-time Dashboard], App.4 [Machine Learning]; S3, DynamoDB, Redshift.
    A Stream is made up of one or more Shards. Per Shard: writes up to 1 MB/sec and 1000 TPS, reads up to 2 MB/sec and 5 TPS. Data Records are kept for 24 hours, across multiple AZs.
  • 11. Management Console/API: Stream, Shard; Stream; CloudWatch
  • 12. Pricing. Shard: $0.015/shard/hour. PUT: $0.028/1,000,000 PUT. Example: 1 month (30 days), 10 Shards, 100,000,000 PUT: about $110/month.
  • 13.
  • 14. PutRecord API: http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html
    AWS SDK for Java, JavaScript, Python, Ruby, PHP, .NET
    boto: put_record http://docs.pythonboto.org/en/latest/ref/kinesis.html#module-boto.kinesis.layer1
  • 15. Diagram labels: Stream; Shard-0, Shard-1; Data Record, Data Record, Data Record; Kinesis, Shard
  • 16. Diagram labels: Stream; Shard-0, Shard-1; Data Record (x4); Shard. A Data Record consists of Data (Max 50KB) and a Partition Key (Max 256B).
  • 17. Choosing a Shard: the Partition Key is hashed with MD5, and the hash value decides the Shard. MD5 values range from 0 to 2^128. Shard-0 (0 - 2^128/2), Shard-1 (2^128/2 - 2^128).
  • 18. Tips: Shard and Partition Key; Shard, Partition Key
  • 19. Sequence numbers: Kinesis assigns a sequence number to each Data Record in a Stream. PutRecord API; the SequenceNumberForOrdering parameter of the PutRecord API. Diagram: Data Records with sequence numbers (14) (15) (17) (19) (20).
  • 20. SDK, Log4J Appender: Log4J to Kinesis Appender https://github.com/awslabs/kinesis-log4j-appender
    http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/kinesis-pig-publisher.html
  • 21.
  • 22.
    GetShardIterator API, Shard, GetRecords API
    http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html
    http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html
    AWS SDK for Java, JavaScript, Python, Ruby, PHP, .NET
    boto: get_shard_iterator, get_records http://docs.pythonboto.org/en/latest/ref/kinesis.html#module-boto.kinesis.layer1
  • 23. GetShardIterator: the ShardIteratorType parameter of the GetShardIterator API
    AT_SEQUENCE_NUMBER (start reading at the specified sequence number)
    AFTER_SEQUENCE_NUMBER (start reading right after the specified sequence number)
    TRIM_HORIZON (start at the oldest Data Record in the Shard)
    LATEST (start just after the most recent Data Record)
    Diagram labels: Seq: xxx; LATEST, AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, TRIM_HORIZON; GetShardIterator
  • 24. Kinesis Client Library: GetShardIterator API, GetRecords API, Shard; Kinesis Client Library.
    The Kinesis Client Library is a Java library published on GitHub: https://github.com/awslabs/amazon-kinesis-client
    The Kinesis Client Library uses DynamoDB; the DynamoDB table has Read Provisioned Throughput and Write Provisioned Throughput of 10.
  • 25. Kinesis Client Library: Kinesis Client Library, Kinesis, Worker, Kinesis, Kinesis, IRecordProcessor
    http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record-processor-app.html
  • 26. Kinesis Client Library [Sample RecordProcessor]

        // Excerpt: the fields (LOG, kinesisShardId, nextCheckpointTimeInMillis)
        // and the helper methods are not shown on the slide.
        public class SampleRecordProcessor implements IRecordProcessor {
            @Override
            public void initialize(String shardId) {
                LOG.info("Initializing record processor for shard: " + shardId);
                this.kinesisShardId = shardId;
            }

            @Override
            public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
                LOG.info("Processing " + records.size() + " records for kinesisShardId " + kinesisShardId);
                // Process records and perform all exception handling.
                processRecordsWithRetries(records);
                // Checkpoint once every checkpoint interval.
                if (System.currentTimeMillis() > nextCheckpointTimeInMillis) {
                    checkpoint(checkpointer);
                    nextCheckpointTimeInMillis = System.currentTimeMillis() + CHECKPOINT_INTERVAL_MILLIS;
                }
            }
        }

  • 27.
    Kinesis Client Library [Sample Worker]

        IRecordProcessorFactory recordProcessorFactory = new SampleRecordProcessorFactory();
        Worker worker = new Worker(recordProcessorFactory, kinesisClientLibConfiguration);
        int exitCode = 0;
        try {
            worker.run();
        } catch (Throwable t) {
            LOG.error("Caught throwable while processing data.", t);
            exitCode = 1;
        }

  • 28. Kinesis Client Library. Diagram labels: Stream with Shard-0 and Shard-1; Data Record (12345), Data Record (24680), Data Record (98765); Kinesis application (KCL) on Instance A and on Instance B; DynamoDB table (Key, Attribute): Instance A 12345, Instance B 98765.
    (1) Kinesis Client Library, Shard, Data Record. (2) ID, DynamoDB.
  • 29. Kinesis Client Library. Diagram labels: Stream with Shard-0; Data Record (12345), Data Record (24680); Kinesis application (KCL) on Instance A and on Instance B; DynamoDB table (Key, Attribute): Instance A, Instance B, 12345. Instance A, Instance B, DynamoDB.
  • 30. Kinesis Client Library. Diagram labels: Stream with Shard-0 and a new Shard-1; Data Record (12345), Data Record (24680), Data Record (98765); Kinesis application (KCL) on Instance A; DynamoDB table (Key, Attribute): Shard-0 Instance A 12345, Shard-1 Instance A 98765. Shard-1, Shard-1, DynamoDB, Shard-1.
  • 31. Kinesis. Diagram labels: Data Records (12345), (98765), (24680) read by two applications (KCL) on Instance A, each with its own DynamoDB table. Archive Table: Shard-0 Instance A 12345, Shard-1 Instance A 98765. Calc Table: Shard-0 Instance A 24680, Shard-1 Instance A 98765.
  • 32. Kinesis Connector Library: connects Kinesis to S3, DynamoDB, and Redshift. The Kinesis Connector Library is a Java library published on GitHub: https://github.com/awslabs/amazon-kinesis-connectors
    Diagram labels: Redshift, DynamoDB, S3, Kinesis
  • 33. Kinesis Connector Library: Data Record from Kinesis -> ITransformer -> IFilter -> IBuffer -> IEmitter -> S3, DynamoDB, Redshift (AWS)
  • 34.
    Kinesis Connector Library [Sample S3 pipeline]: JSON, ByteArray, S3

        public class S3Pipeline implements IKinesisConnectorPipeline<KinesisMessageModel, byte[]> {
            @Override
            public ITransformer<KinesisMessageModel, byte[]> getTransformer(KinesisConnectorConfiguration configuration) {
                return new JsonToByteArrayTransformer<KinesisMessageModel>(KinesisMessageModel.class);
            }

            @Override
            public IFilter<KinesisMessageModel> getFilter(KinesisConnectorConfiguration configuration) {
                return new AllPassFilter<KinesisMessageModel>();
            }

            @Override
            public IBuffer<KinesisMessageModel> getBuffer(KinesisConnectorConfiguration configuration) {
                return new BasicMemoryBuffer<KinesisMessageModel>(configuration);
            }

            @Override
            public IEmitter<byte[]> getEmitter(KinesisConnectorConfiguration configuration) {
                return new S3Emitter(configuration);
            }
        }

  • 35. Kinesis Storm Spout: a Spout for feeding Kinesis into Apache Storm. The Kinesis Storm Spout is a Java library published on GitHub: https://github.com/awslabs/kinesis-storm-spout
  • 36. EMR Connector: Hive, Pig, Cascading, Hadoop Streaming; Hadoop, Kinesis Stream, MapReduce; ETL; Kinesis Stream, S3, DynamoDB, HDFS, Hive Table, JOIN.
    Example: Clickstream (Kinesis) JOIN Ad campaign data (DynamoDB).
    Diagram labels: Kinesis Stream, EMR, Hive Table, Data Storage, Table Mapping (Hive)
    http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-kinesis.html
  • 37. EMR Connector: Hive. Hive, Kinesis Stream, HQL, HQL, Kinesis
  • 38. EMR Connector: Hive. DynamoDB; Data Pipeline / Crontab; Kinesis
  • 39. Kinesis
  • 40. CloudWatch: per-Shard CloudWatch metrics
    GetRecords.Bytes: GetRecords
    GetRecords.IteratorAge: GetShardIterator
    GetRecords.Latency: GetRecords
    GetRecords.Success: GetRecords API
    PutRecord.Bytes: PutRecord
    PutRecord.Latency: PutRecord
    PutRecord.Success: PutRecord API
  • 41. Shard: Shard, Shard; SplitShard API, MergeShards API
    (SplitShard) http://docs.aws.amazon.com/kinesis/latest/APIReference/API_SplitShard.html
    (MergeShards) http://docs.aws.amazon.com/kinesis/latest/APIReference/API_MergeShards.html
    Diagram labels: AWS Endpoint; Availability Zones holding Shard 1, Shard 2, ..., Shard N; Shard-1, Shard-2; Shard-1, Shard-2, Shard-3, Shard-4
  • 42. SplitShard and MergeShards: Shard; Shard, Split, Merge; Shard
  • 43.
    SplitShard API: SplitShard. Example with boto, for 1 Stream with 1 Shard:

        from boto.kinesis.layer1 import KinesisConnection

        stream_name = "my-stream"  # placeholder; the slide does not show the value
        conn = KinesisConnection()
        descStream = conn.describe_stream(stream_name)
        for shard in descStream['StreamDescription']['Shards']:
            targetShard = shard['ShardId']  # assumed; targetShard is not defined on the slide
            StartHashKey = int(shard['HashKeyRange']['StartingHashKey'])
            EndHashKey = int(shard['HashKeyRange']['EndingHashKey'])
            # The midpoint of the hash key range is the new starting hash key.
            NewHashKey = StartHashKey + (EndHashKey - StartHashKey) // 2
            print("StartHashKey :", StartHashKey)
            print("EndHashKey :", EndHashKey)
            print("NewHashKey :", NewHashKey)
            ret = conn.split_shard(stream_name, targetShard, str(NewHashKey))

    Output:

        StartHashKey : 0
        EndHashKey : 340282366920938463463374607431768211455
        NewHashKey : 170141183460469231731687303715884105727

  • 44. MergeShards API: the MergeShards API merges a Shard with an adjacent Shard. Example with boto:

        conn = KinesisConnection()
        # targetShard and mergedShard are the ShardIds of the two Shards to merge.
        conn.merge_shards(stream_name, targetShard, mergedShard)

  • 45.
  • 46. Digital Ad. Tech Metering with Kinesis: Incremental Ad. Statistics Computation; Metering Record Archive; Ad Analytics Dashboard; Continuous Ad Metrics Extraction
  • 47. Stream. Diagram labels: Data Sources (x3) feeding Kinesis Apps (x3); Data Sources (x3) feeding Kinesis Apps (x4); Data Source A, Data Source B; Data Source A ETL, Data Source B ETL
  • 48. AWS. Diagram labels: AWS Endpoint; Availability Zones holding Shard 1, Shard 2, ..., Shard N; Kinesis App.1, Kinesis App.2, Kinesis App.3; Redshift, DynamoDB, RDS, S3; BI as a Service; Kinesis
  • 49.
  • 50. Amazon Kinesis; Amazon Kinesis
  • 51. Gaming Analytics with Amazon Kinesis
  • 52. Kinesis
  • 53. SQS and Kinesis: Kinesis, Pub-Sub, Stream, Data Record. Diagram labels: SQS with Worker, Worker, Worker; Kinesis with Worker A, Worker B; Worker, Kinesis
  • 54. Web. Diagram labels: Kinesis App [] to Dashboard; Kinesis App [] to Redshift, DynamoDB
  • 55. ETL. Diagram labels: Kinesis App [ETL], S3, EMR
  • 56. Diagram labels: Data Sources (x3); Kinesis App [Worker], SQS, Worker; Storm, Kinesis App [Worker]
  • 57.
  • 58. Summary: Stream; S3, Redshift, & DynamoDB; S3, Redshift, S3; Kinesis Client Library; Low Cost; Amazon Kinesis
  • 59.
    Amazon Kinesis API Reference: http://docs.aws.amazon.com/kinesis/latest/APIReference/Welcome.html
    Amazon Kinesis Developer Guide: http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html
    Amazon Kinesis Forum: https://forums.aws.amazon.com/forum.jspa?forumID=169#
  • 60. Q&A
  • 61. Webinar, AWS: http://aws.amazon.com/jp/aws-jp-introduction/
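
The Shard selection described on slide 17 (MD5 of the Partition Key, mapped onto hash key ranges covering 0 to 2^128) can be sketched locally without calling AWS. This is an illustrative model only: the helper names `hash_key`, `even_shard_ranges`, and `shard_for` are hypothetical, and the even split of the hash space is an assumption that matches the two-Shard example on the slide.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Return the MD5 digest of the partition key as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def even_shard_ranges(shard_count):
    """Split 0 .. 2^128 - 1 into contiguous, inclusive, equal-width ranges."""
    width = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * width
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * width - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the range that contains the key's hash value."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = even_shard_ranges(2)
for pk in ["user-1", "user-2", "sensor-42"]:
    print(pk, "-> Shard-%d" % shard_for(pk, ranges))
```

Because the mapping depends only on the Partition Key, records with the same key always land in the same Shard, which is why a skewed key distribution can overload one Shard.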
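
The pricing example on slide 12 can be checked with a few lines of arithmetic. This sketch assumes the Shard price is billed per hour (the unit was lost from the slide text, but hourly billing reproduces the roughly $110 total) and that the 100,000,000 PUTs are a monthly figure; `monthly_cost_usd` is a hypothetical helper name.

```python
SHARD_HOUR_USD = 0.015        # per shard per hour (hourly unit is an assumption)
PUT_PER_MILLION_USD = 0.028   # per 1,000,000 PUT operations


def monthly_cost_usd(shards, puts_per_month, days=30):
    """Estimate one month of Kinesis cost from the slide's two price points."""
    shard_cost = shards * SHARD_HOUR_USD * 24 * days
    put_cost = puts_per_month / 1000000 * PUT_PER_MILLION_USD
    return shard_cost + put_cost


# Slide example: 10 Shards for 30 days with 100,000,000 PUTs.
print(round(monthly_cost_usd(10, 100000000), 2))
```

Almost all of the total comes from Shard hours (about $108), so under these assumptions the Shard count drives the bill far more than the PUT volume does.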
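
The four ShardIteratorType values on slide 23 differ only in where reading starts within a Shard. The toy model below treats a Shard as an ascending list of sequence numbers, reusing the values (14) (15) (17) (19) (20) from slide 19. It is a local illustration, not a call to the service: `start_position` is a hypothetical helper, and real Kinesis sequence numbers are long opaque strings rather than small integers.

```python
def start_position(sequence_numbers, iterator_type, sequence_number=None):
    """Return the index of the first record a reader would receive.

    sequence_numbers: ascending sequence numbers currently held in one shard.
    """
    if iterator_type == "TRIM_HORIZON":
        return 0                                   # oldest record in the shard
    if iterator_type == "LATEST":
        return len(sequence_numbers)               # only records added later
    if iterator_type == "AT_SEQUENCE_NUMBER":
        return sequence_numbers.index(sequence_number)
    if iterator_type == "AFTER_SEQUENCE_NUMBER":
        return sequence_numbers.index(sequence_number) + 1
    raise ValueError("unknown ShardIteratorType: %s" % iterator_type)


shard = [14, 15, 17, 19, 20]
for kind, seq in [("TRIM_HORIZON", None), ("AT_SEQUENCE_NUMBER", 17),
                  ("AFTER_SEQUENCE_NUMBER", 17), ("LATEST", None)]:
    first = start_position(shard, kind, seq)
    print(kind, shard[first:])
```

A consumer that stores the last processed sequence number, as the Kinesis Client Library does in DynamoDB, would resume with AFTER_SEQUENCE_NUMBER so the checkpointed record is not processed twice.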