PyData: Introduction to Amazon Kinesis

Introduction to Amazon Kinesis and Where to Use It. Amazon Data Services Japan K.K., Partner Solutions Architect, 榎並 晃

Upload: amazon-web-services-japan

Post on 04-Jul-2015


Category:

Technology



DESCRIPTION

Introduction to Amazon Kinesis

TRANSCRIPT

  • 1. Amazon Kinesis

  • 2. @ToshiakiEnami; AWS; Amazon Kinesis; Amazon DynamoDB

  • 3. AWS; S3; Process Submissions; Store Batches; Process Hourly w/ Hadoop; Clients Submitting Data; Data Warehouse; 100; ETL Job; 100; keep everything

  • 4. Ingest; Client/Sensor; Ingest; Processing; Storage; Analytics + Visualization + Reporting

  • 5. Ingest Layer; Processing; Or; Kinesis; Kafka; Processing; Kinesis

  • 6.

  • 7. Kinesis

  • 8. Amazon Kinesis; Kinesis; 1; AZ

  • 9. POS

  • 10. Kinesis; Kinesis Client Library + Connector Library; HTTPS Post; AWS SDK; Fluentd; Flume; LOG4J; Get* APIs; Apache Storm; Amazon Elastic MapReduce; Mobile SDK; Cognito

  • 11. Kinesis; Data Sources; App.1 [Aggregate De-Duplicate]; App.2 [Metric Extraction]; App.3 [Real-time Dashboard]; App.4 [Machine Learning]; S3; DynamoDB; Redshift; Stream; Availability Zone; Shard 1, Shard 2, Shard N; Kinesis; AWS Endpoint; Stream; 1; Shard; per Shard: 1 MB/sec, 1000 TPS (write); 2 MB/sec, 5 TPS (read); Data Record; 24; AZ; Shard

  • 12. Kinesis; $0.0195/shard/; Put: $0.043/100 Put; $14; Get; EC2

  • 13.

  • 14. PutRecord API: http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html ; AWS SDK for Java, Javascript, Python, Ruby, PHP, .Net; boto: put_record, http://docs.pythonboto.org/en/latest/ref/kinesis.html#module-boto.kinesis.layer1

  • 15. Data Record; Shard; MD5; Shard; 0 to 2^128; Shard-1; MD5(); Shard-0; 0; 2^127

  • 16. Kinesis; Stream; 24; shard; SeqNo(14), SeqNo(17), SeqNo(25), SeqNo(26), SeqNo(32)

  • 17. Web

  • 18. Fluentd Plugin; Web; Github; Plugin: https://github.com/awslabs/aws-fluent-plugin-kinesis ; Log4J; Java; Log4J: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/kinesis-pig-publisher.html ; Web; log4j.properties:

      # KINESIS appender
      log4j.logger.KinesisLogger=INFO, KINESIS
      log4j.additivity.KinesisLogger=false
      log4j.appender.KINESIS=com.amazonaws.services.kinesis.log4j.KinesisAppender
      log4j.appender.KINESIS.layout=org.apache.log4j.PatternLayout
      log4j.appender.KINESIS.layout.ConversionPattern=%m

  • 19. MQTT; MQTT Broker; MQTT-Kinesis Bridge; Kinesis; MQTT Broker; Kinesis-MQTT Bridge; Github; MQTT-Kinesis Bridge: https://github.com/awslabs/mqtt-kinesis-bridge ; MQTT Broker; Kinesis-MQTT Bridge; Auto scaling Group

  • 20. Cognito; Mobile SDK; Kinesis; Login; OAUTH/OpenID; Access Token; End Users; App w/ SDK; Access Token; Pool ID; Role ARNs; Cognito ID, Temp Credentials; Put Record; Amazon Cognito - ID; AWS; identities; Account; Identity pool; Identity Providers; Access; authenticated; identity pool; Policy; Unauthenticated Identities

  • 21.

  • 22. GetShardIterator API; Shard; GetRecords API; http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html ; http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html ; AWS SDK for Java, Javascript, Python, Ruby, PHP, .Net; boto: get_shard_iterator, get_records, http://docs.pythonboto.org/en/latest/ref/kinesis.html#module-boto.kinesis.layer1

  • 23. GetShardIterator; GetShardIterator API; ShardIteratorType:
      AT_SEQUENCE_NUMBER (start at the specified sequence number)
      AFTER_SEQUENCE_NUMBER (start right after the specified sequence number)
      TRIM_HORIZON (start at the oldest record in the Shard)
      LATEST (start just after the most recent record)
    Seq: xxx; LATEST; AT_SEQUENCE_NUMBER; AFTER_SEQUENCE_NUMBER; TRIM_HORIZON; GetShardIterator

  • 24. Kinesis Client Library (KCL); Client library for fault-tolerant, at least-once, Continuous Processing; Shard; Worker; Worker: Shard 1; Worker: Shard 2; Worker: Shard 3; AutoScaling; Shard 4; At least once; Shard n; EC2 Instance: KCL Worker 1, KCL Worker 2; EC2 Instance: KCL Worker 3, KCL Worker 4; EC2 Instance: KCL Worker n; Kinesis

  • 25. Kinesis Client Library; Stream; Shard-0, Shard-1; Kinesis (KCL); Instance A, 12345; Instance A, 98765; DataRecord(12345), DataRecord(24680), DataRecord(98765); DynamoDB; Instance A; Key, Attribute; (1) Kinesis Client Library, Shard, Data Record; (2) ID, DynamoDB; (3) Shard

  • 26. Kinesis Client Library; Stream; Shard-0, Shard-1; Kinesis (KCL); Instance A, 12345; Instance B, 98765; DataRecord(12345), DataRecord(24680), DataRecord(98765); DynamoDB; Instance A; Instance B; Kinesis (KCL); (1) Key, Attribute

  • 27. Kinesis Client Library; Stream; Shard-0, Shard-1; Kinesis (KCL); Instance A; Instance B, 12345; Instance B, 98765; DataRecord(12345), DataRecord(24680), DataRecord(98765); DynamoDB; Instance A; Instance B; Kinesis (KCL); Key, Attribute; Instance A; Instance B; DynamoDB

  • 28. Kinesis Client Library; Stream; Shard-0; Kinesis (KCL); Shard: Shard-0, Instance A, 12345; Shard-1, Instance A, 98765; DataRecord(12345), DataRecord(24680); DynamoDB; Instance A; Shard-1; DataRecord(98765); New; Key, Attribute; Shard-1; Shard-1; DynamoDB

  • 29. Kinesis; (12345), (24680), (98765); (KCL); DynamoDB; Instance A; Archive Table: Shard: Shard-0, Instance A, 12345; Shard-1, Instance A, 98765; Instance A; (KCL); Calc Table: Shard: Shard-0, Instance A, 24680; Shard-1, Instance A, 98765

  • 30. Kinesis Client Library (KCL) for Python; KCL for Python; KCL for Java; MultiLangDaemon; Python; MultiLangDaemon; STDIN/STDOUT

  • 31. Kinesis Client Library (KCL) for Python; KCL for Python; KCL for Java; MultiLangDaemon; Python; MultiLangDaemon; STDIN/STDOUT; KCL (Java); Worker Thread; Python Logic; Shard-0; Shard-1; Worker Thread; Process; Python Logic; Process

  • 32. KCL for Python

      #!/usr/bin/env python
      from amazon_kclpy import kcl
      import json, base64

      class RecordProcessor(kcl.RecordProcessorBase):
          def initialize(self, shard_id):
              pass

          def process_records(self, records, checkpointer):
              pass

          def shutdown(self, checkpointer, reason):
              pass

      if __name__ == "__main__":
          kclprocess = kcl.KCLProcess(RecordProcessor())
          kclprocess.run()

  • 33. KCL for Python; KCL for Python: https://github.com/awslabs/amazon-kinesis-client-python/blob/master/amazon_kclpy/kcl.py ; KCL for Java: https://github.com/awslabs/amazon-kinesis-client/tree/master/src/main/java/com/amazonaws/services/kinesis/multilang

  • 34. Multi Language Protocol (Action: Parameter)
      initialize: shardId : string
      processRecords: [{ data : base64encoded_string, partitionKey : partition key, sequenceNumber : sequence number }] // a list of records
      checkpoint: checkpoint : sequence number, error : NameOfException
      shutdown: reason : TERMINATE|ZOMBIE

  • 35. KCL for Python
      failoverTimeMillis: Worker; Worker; DynamoDB; PIOPS
      maxRecords: 1
      idleTimeBetweenReadsInMillis
      callProcessRecordsEvenForEmptyRecordList: True or False
      parentShardPollIntervalMillis: Shard; DynamoDB; PIOPS
      cleanupLeasesUponShardCompletion: shard
      taskBackoffTimeMillis: KCL
      metricsBufferTimeMillis: CloudWatch API
      metricsMaxQueueSize: CloudWatch API
      validateSequenceNumberBeforeCheckpointing: Checkpointing
      maxActiveThreads: MultiLangDaemon

  • 36. KCL for Python

      [ec2-user@ip-172-31-17-43 samples]$ amazon_kclpy_helper.py --print_command -j /usr/bin/java -p /home/ec2-user/amazon-kinesis-client-python/samples/sample.properties
      /usr/bin/java -cp /usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/amazon-kinesis-client-1.2.0.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/jackson-annotations-2.1.1.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/commons-codec-1.3.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/commons-logging-1.1.1.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/joda-time-2.4.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/jackson-databind-2.1.1.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/jackson-core-2.1.1.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/aws-java-sdk-1.7.13.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/httpclient-4.2.jar:/usr/lib/python2.6/site-packages/amazon_kclpy-1.0.0-py2.6.egg/amazon_kclpy/jars/httpcore-4.2.jar:/home/ec2-user/amazon-kinesis-client-python/samples com.amazonaws.services.kinesis.multilang.MultiLangDaemon sample.properties

    KCL

  • 37. Kinesis; A; B; SeqNo(14), SeqNo(17), SeqNo(25), SeqNo(26), SeqNo(32)

  • 38. Kinesis
      Simple ETL: Kinesis; Ingest; S3; DynamoDB; Redshift
      ETL/MapReduce: Kinesis; Ingest; Hadoop; Spark; Storm
      ETL; Filter: Kinesis; Filtering/MapReduce
      AWS Lambda: AWS Lambda

  • 39. KCL; Dashboard; DynamoDB; Redshift

  • 40. Simple ETL; DynamoDB; Redshift; S3; Kinesis Connector Library: https://github.com/awslabs/amazon-kinesis-connectors ; S3; Redshift; S3; Redshift; Kinesis Connector; Transformer; Filter; Buffer; Emitter

  • 41. ETL/MapReduce 1; Hadoop; Spark; Kinesis; Hive; Pig; Hadoop; ETL; Map Reduce; Kinesis Stream, S3, DynamoDB, HDFS; Hive Table; JOIN; Data pipeline / Crontab; Kinesis; Data Pipeline; EMR Cluster; S3; Data Pipeline; Hive; Kinesis; S3; Kinesis; EMR AMI 3.0.4; Kinesis

  • 42. ETL/MapReduce 2; Apache Storm; Bolt; Kinesis; Apache Storm; Spout: https://github.com/awslabs/kinesis-storm-spout ; Data Sources; Storm Spout; Storm Bolt; Storm Bolt; Storm Bolt

  • 43. Filter; Kinesis; Filter; MapReduce; Kinesis; Kinesis; Data Sources; Filter Layer (); Process Layer (); Kinesis App; Kinesis App; Kinesis App; Kinesis App

  • 44. Apache Spark; Apache Storm; Data Sources; Jubatus; Dashboard; Jubatus

  • 45. AWS Lambda; Lambda Function; AWS Lambda; Data Sources; S3; Redshift

  • 46. Kinesis; EC2; Jubatus; (iPhone); HTTP/WS; Put Record; HTTP/WS; Get Records

  • 47.

  • 48. IoT; AWS
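
Slide 15 outlines how a Data Record is routed: the MD5 of the partition key yields a 128-bit number, and each Shard owns a contiguous slice of the range 0 to 2^128. The following is a minimal local sketch of that idea, not the service's implementation. The helper names (`hash_key`, `even_ranges`, `shard_for`) are hypothetical, and the even split into ranges is an assumption made for illustration; a real stream reports its own hash key ranges per shard.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def hash_key(partition_key):
    """Return the MD5 of the partition key as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(num_shards):
    """Split [0, 2**128) into equal contiguous (start, end) ranges, ends inclusive.

    Assumption for illustration: shards divide the hash space evenly.
    """
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the range that contains the key's MD5 value."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value outside all ranges")


if __name__ == "__main__":
    two_shards = even_ranges(2)  # Shard-0: 0..2^127-1, Shard-1: 2^127..2^128-1
    for key in ("sensor-001", "sensor-002", "pos-tokyo"):
        print(key, "-> Shard-%d" % shard_for(key, two_shards))
```

Because the mapping depends only on the partition key, records that share a key always land on the same shard, which is what keeps their sequence numbers ordered relative to each other.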
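
Slides 14, 22 and 23 name the boto calls `put_record`, `get_shard_iterator` and `get_records`. A rough sketch of how they fit together is shown here. It assumes a connection object shaped like boto's Kinesis layer1 connection, where the responses are dictionaries with `ShardIterator`, `Records` and `NextShardIterator` keys. The wrapper names `put` and `read_shard`, the stream name, and the polling limits are hypothetical.

```python
import time


def put(conn, stream_name, data, partition_key):
    """Send one Data Record; the partition key decides the target shard."""
    return conn.put_record(stream_name, data, partition_key)


def read_shard(conn, stream_name, shard_id,
               iterator_type="TRIM_HORIZON", max_polls=3,
               limit=100, poll_interval=0.2):
    """Read records from one shard by following NextShardIterator.

    iterator_type is one of the ShardIteratorType values from slide 23.
    AT_SEQUENCE_NUMBER and AFTER_SEQUENCE_NUMBER also need a starting
    sequence number, which this sketch does not pass.
    """
    response = conn.get_shard_iterator(stream_name, shard_id, iterator_type)
    iterator = response["ShardIterator"]
    records = []
    for _ in range(max_polls):
        if iterator is None:  # the shard has been closed and fully read
            break
        result = conn.get_records(iterator, limit=limit)
        records.extend(result["Records"])
        iterator = result.get("NextShardIterator")
        time.sleep(poll_interval)  # stay under the per-shard read rate
    return records


# Usage sketch (assumes boto is installed and AWS credentials are configured):
#   import boto.kinesis
#   conn = boto.kinesis.connect_to_region("ap-northeast-1")
#   put(conn, "my-stream", "hello", "sensor-001")
#   print(read_shard(conn, "my-stream", "shardId-000000000000"))
```

In practice the Kinesis Client Library handles this loop, plus checkpointing and shard assignment, so a hand-written reader like this is mainly useful for experiments.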
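
Slide 34 lists the Multi Language Protocol messages that the MultiLangDaemon exchanges with the Python process over STDIN/STDOUT. As an illustration of the `processRecords` payload, the sketch below decodes one such message. The record fields (`data`, `partitionKey`, `sequenceNumber`) come from the slide; the JSON envelope keys `action` and `records` and the function name `decode_records` are assumptions made for this example.

```python
import base64
import json


def decode_records(message_line):
    """Decode a processRecords message into a list of plain dictionaries.

    Returns an empty list for any other action. The envelope keys
    "action" and "records" are assumed here, not taken from the slide.
    """
    message = json.loads(message_line)
    if message.get("action") != "processRecords":
        return []
    decoded = []
    for record in message.get("records", []):
        decoded.append({
            # data arrives base64-encoded; decode it back to raw bytes
            "data": base64.b64decode(record["data"]),
            "partitionKey": record["partitionKey"],
            "sequenceNumber": record["sequenceNumber"],
        })
    return decoded


if __name__ == "__main__":
    sample = json.dumps({
        "action": "processRecords",
        "records": [{
            "data": base64.b64encode(b"temperature=21.5").decode("ascii"),
            "partitionKey": "sensor-001",
            "sequenceNumber": "12345",
        }],
    })
    for rec in decode_records(sample):
        print(rec["partitionKey"], rec["sequenceNumber"], rec["data"])
```

When using `amazon_kclpy`, this parsing is already done by the library before `process_records` is called, so the sketch only shows what travels over the pipe.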