big data 101 v1

Uploaded by welly-tambunan on 15-Apr-2017

TRANSCRIPT

Page 1: Big data 101 v1

www.geekseat.com.au Agile Software Development

Welcome to “Big Data” Jungle

Welly Tambunan

([email protected])

Solution and Integration Architect Lead

Analytics & Data Warehouse Department

Page 2

Outline

Big Data Overview and History

Introduction to Hadoop

Hadoop Ecosystem

Hadoop Distribution: Cloudera

Big Data Architecture

ETL vs ELT

Talend for ETL Tools

Page 3

Big Data Overview and History

Google Search Engine

Search Engine Architecture:

Crawler

Indexer

Search Algorithm / PageRank
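PageRank, the search algorithm named on this slide, scores a page by the rank that flows into it from the pages linking to it. Below is a minimal power-iteration sketch, assuming a tiny hypothetical four-page link graph and the commonly cited damping factor of 0.85; it illustrates the original idea only, not how Google ranks pages today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked from A, B and D.
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The ranks always sum to 1, and the most linked-to page (C) ends up on top.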

Doug Cutting and Search Engines:

Apache Lucene

Apache Nutch
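Lucene, which Nutch builds on, is organized around an inverted index: a map from each term to the documents that contain it. A toy in-memory sketch of that idea follows, with simple lowercase tokenization, AND-only queries and no relevance scoring; the documents are made up and none of this is Lucene's actual API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical documents.
docs = {
    1: "Hadoop stores data in HDFS",
    2: "Lucene builds an inverted index",
    3: "Nutch crawls the web and uses Lucene to index pages",
}
index = build_index(docs)
print(search(index, "lucene index"))  # [2, 3]
```

A query only touches the posting lists of its own terms, which is why lookups stay fast however many documents are indexed.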

Google File System + MapReduce

Birth of Hadoop
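MapReduce splits a job into a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step that aggregates each group. The classic word-count example is simulated below in a single Python process with made-up input lines; in Hadoop the same three phases run in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


# Hypothetical input; in Hadoop each line would come from a file split in HDFS.
lines = ["the quick brown fox", "the lazy dog", "The end"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)
```

Because mappers only see their own lines and reducers only see their own keys, each phase can be spread over many machines without coordination beyond the shuffle.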

Page 4

Hadoop

HDFS (Hadoop Distributed File System)

MapReduce

Hadoop = HDFS + MapReduce

Hadoop = Storage + Processing

Features:

Schemaless: no predefined structure, i.e. no rigid schema with tables and columns (and column types and sizes)

Durable: once data is written, it should never be lost

Handles component failures (e.g. CPU, disk, memory, network, power supply, motherboard) without human intervention

Automatically rebalanced to even out disk space consumption throughout the cluster
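Much of HDFS's durability and failure handling comes from splitting each file into large blocks and storing several replicas of every block on different DataNodes. The sketch below assumes the usual defaults of a 128 MB block size and a replication factor of 3; the round-robin placement is a hypothetical simplification, not HDFS's real rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default block size (dfs.blocksize)
REPLICATION = 3                  # assumed default replication factor (dfs.replication)


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes, round-robin (hypothetical simplified policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_after_failure(placement, failed):
    """A block survives if at least one replica lives on a healthy node."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
placement = place_blocks(500 * 1024 * 1024, nodes)  # a 500 MB file -> 4 blocks
for block, replicas in placement.items():
    print(block, replicas)
print(readable_after_failure(placement, failed={"node2", "node3"}))  # True
```

With three replicas per block, any two nodes can fail and every block is still readable; re-replicating the lost copies is what the cluster then does without human intervention.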

Page 5

Hadoop Ecosystem

SQL on Hadoop:

Hive

Impala

HBase, Hue, Kafka, Oozie, Sqoop

Page 6

Hadoop Ecosystem

YARN

ZooKeeper

Spark: Batch, Streaming

Flink: Batch, Streaming
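Spark and Flink both offer batch and streaming processing. The difference between the two models is sketched below in plain Python with a hypothetical event list; this is not Spark or Flink API code, only an illustration of bounded versus unbounded processing.

```python
from collections import Counter

# Hypothetical events, e.g. user actions read from Kafka or from files.
events = ["click", "view", "click", "buy", "view", "click"]


def batch_count(dataset):
    """Batch: the whole (bounded) dataset is available, so compute once."""
    return Counter(dataset)


def stream_count(event_source):
    """Streaming: events arrive one at a time (unbounded), so keep running
    state and emit an updated result after every event."""
    state = Counter()
    for event in event_source:
        state[event] += 1
        yield dict(state)


print(batch_count(events))
for snapshot in stream_count(iter(events)):
    print(snapshot)
```

Once the stream has consumed the same events, its last snapshot matches the batch result; the streaming version simply makes intermediate results available as data arrives.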

Page 7

Hadoop Distribution

Cloudera (Danamon's choice)

Hortonworks

MapR

IBM, etc.

Page 8

Cloudera Demo

Cloudera Manager

Hue

File Format: CSV, Parquet, Avro
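CSV and Avro store data row by row, while Parquet is a columnar format. The sketch below illustrates the two layouts in plain Python with made-up records; it does not reproduce the real Avro or Parquet encodings, which add schemas, encodings and compression on top.

```python
import csv
import io

# Hypothetical sample records.
records = [
    {"id": 1, "name": "alice", "amount": 120.0},
    {"id": 2, "name": "bob", "amount": 75.5},
    {"id": 3, "name": "carol", "amount": 310.25},
]

# Row-oriented layout (like CSV or Avro): each record's fields are stored together.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "amount"])
writer.writeheader()
writer.writerows(records)
row_text = buf.getvalue()

# Column-oriented layout (like Parquet): each column's values are stored together,
# so a query touching one column only needs to read that column.
columns = {key: [r[key] for r in records] for key in records[0]}

print(row_text)
print(columns["amount"])  # [120.0, 75.5, 310.25]
print(sum(columns["amount"]))
```

Row layouts suit writing and reading whole records; columnar layouts suit analytic queries that aggregate a few columns over many rows, and similar values stored together also compress better.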

Compression: Gzip, Snappy, Deflate
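Gzip and Deflate use the same DEFLATE algorithm in different containers, which Python's standard library can show directly. The sketch below uses hypothetical log-like data; Snappy needs a third-party library and is left out here. Snappy is designed to favor speed over compression ratio.

```python
import gzip
import zlib

# Hypothetical, highly repetitive sample data (log-like lines compress well).
data = b"2017-04-15,INFO,job finished\n" * 1000

deflated = zlib.compress(data, 6)               # DEFLATE data in a zlib wrapper
gzipped = gzip.compress(data, compresslevel=6)  # DEFLATE data in a gzip container

print(len(data), len(deflated), len(gzipped))

# Both are lossless: decompressing gives back the original bytes.
assert zlib.decompress(deflated) == data
assert gzip.decompress(gzipped) == data
```

The two compressed sizes differ only by the header and trailer overhead of each container.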

Read as a Database from Hive / Impala
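Hive and Impala let you query files already stored in HDFS as tables by declaring a schema over them; the schema is applied when the data is read ("schema on read"), which matches the schemaless storage described earlier. The plain-Python sketch below uses a hypothetical `sales` table and columns; it shows the idea only, not Hive DDL or query execution.

```python
import csv
import io

# Raw file as it might sit in HDFS: no schema is stored with the data (hypothetical contents).
raw_file = io.StringIO("1,alice,120.0\n2,bob,75.5\n3,carol,310.25\n")

# Hypothetical table definition, applied only when the data is read.
schema = [("id", int), ("name", str), ("amount", float)]


def read_with_schema(fileobj, schema):
    """Parse each raw CSV row and cast its fields using the declared schema."""
    for row in csv.reader(fileobj):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


# Roughly: SELECT name FROM sales WHERE amount > 100
result = [r["name"] for r in read_with_schema(raw_file, schema) if r["amount"] > 100]
print(result)  # ['alice', 'carol']
```

Because the files themselves are untouched, the same data can be exposed through different table definitions without rewriting it.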

Page 9

ETL vs ELT

ETL: Extract, Transform, Load

ELT: Extract, Load, Transform
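ETL and ELT differ only in where the transform step runs: before loading, in the ETL tool, or after loading, inside the target system. The sketch below uses made-up sales rows and an in-memory SQLite database standing in for the target store (on Hadoop that would be e.g. Hive or Impala); table names are hypothetical.

```python
import sqlite3

# Hypothetical raw source rows: (customer, amount as text, country code)
source = [("alice", "120.0", "id"), ("bob", "75.5", "au"), ("carol", "310.25", "id")]


def etl(conn):
    """ETL: transform in the pipeline first, then load only the clean result."""
    cleaned = [(name, float(amount), country.upper()) for name, amount, country in source]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def elt(conn):
    """ELT: load the raw data as-is, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", source)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT customer, CAST(amount AS REAL) AS amount, UPPER(country) AS country "
        "FROM raw_sales"
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
q = "SELECT customer, amount, country FROM {} ORDER BY customer"
print(conn.execute(q.format("sales_etl")).fetchall())
print(conn.execute(q.format("sales_elt")).fetchall())
```

Both paths end with the same clean table; ELT additionally keeps the raw data in the target and pushes the transformation work onto the target's own engine.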

Page 10

Talend for ETL/ELT Tools

Demo for Standard Job with Database

Demo for Batch Job

Demo for Streaming Job

Page 11

Announcement: https://weltam.wordpress.com/ is back with a Big Data flavor

Page 12

Questions?

Page 13

Rock On!