What Is Big Data? What Is Hadoop? What Is MapReduce? Big Data

Big Data Zekeriya Beşiroğlu http://zekeriyabesiroglu.com http://bilginc.com http://twitter.com/zbesiroglu


Post on 07-Aug-2015


Page 1: What Is Big Data? What Is Hadoop? What Is MapReduce? Big Data

Big Data

Zekeriya Beşiroğlu
http://zekeriyabesiroglu.com
http://bilginc.com
http://twitter.com/zbesiroglu

Page 2

Zekeriya Beşiroğlu
• Bilginc IT Academy - Expert Consultant
• 16+ years in IT
• 14+ years with Oracle DB/DWH
• 7+ years with WebLogic
• 3+ years with Big Data
• TROUG
• Speaker

Page 3

Bilginc IT Academy

Page 4

Data Trends

- Facebook has a warehouse of around 60 PB, and it is constantly growing.
- Twitter messages are limited to 140 characters each, yet they generate 8 TB of data per day.
- Data is more than doubling every year.
- Almost 80% of data will be unstructured.
- Amazon: 35% of product sales come from product recommendations.

Page 5

New Types of Data

• Sentiment: understand how your customers feel about your products and company
• Sensor/Machine: discover patterns in data streaming automatically from sensors and machines
• Unstructured: text, video, pictures
• Server logs: search logs to find patterns
• Geographic: analyze location-based data
• Clickstream: capture and analyze website visitor data

Page 6

Big Data
https://www.youtube.com/watch?v=1GU4Imbo6R8

Page 7

Capacity vs Cost

Year    Capacity (GB)    Cost per GB (USD)
1990    0.10             $4000
1997    2                $150
2002    80               $3.75
2007    750              $0.35
2012    3,000            $0.05
2015    10,000           $0.02
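As a rough check on these figures, multiplying capacity by cost per GB gives the approximate price of one drive in each year. The Python sketch below does that arithmetic with the numbers from the table; the variable names are illustrative and not part of the original slides.

```python
# (year, capacity in GB, cost per GB in USD), copied from the table above
rows = [
    (1990, 0.10, 4000.0),
    (1997, 2, 150.0),
    (2002, 80, 3.75),
    (2007, 750, 0.35),
    (2012, 3000, 0.05),
    (2015, 10000, 0.02),
]

# Approximate price of a single drive: capacity times cost per GB
drive_price = {year: capacity_gb * cost_per_gb
               for year, capacity_gb, cost_per_gb in rows}

# How many times cheaper one GB became between the first and last row
price_drop = rows[0][2] / rows[-1][2]

for year, price in drive_price.items():
    print(f"{year}: about ${price:,.2f} per drive")
print(f"Cost per GB fell by a factor of about {price_drop:,.0f}")
```

With these figures, the price of a drive stays in the low hundreds of dollars throughout, while the cost per GB falls by a factor of roughly 200,000.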

Page 8

What Is Big Data

• Big Data is when the volume, velocity, and variety of data reach the point where the data is too difficult or too expensive for traditional systems to work with.

Page 9

3Vs of Big Data

Page 10

Problems of Traditional Large-Scale Computing Systems

• Computation has been processor-bound
• Relatively small amounts of data
• Complex processing
• Need for bigger computers
• More memory, more and faster processors

Page 11

Better Solution

• Distributed systems: multiple machines run a single job

Problem of Distributed Systems

• Data is stored in a central location and copied to the processors at runtime

Page 12

Today

• Total data size: petabytes
• Daily data: terabytes

We need a new solution: HADOOP

Page 13

• HADOOP: the data is distributed when it is stored
• SPARK: the data is distributed in memory

Page 14

RDBMS vs HADOOP

Page 15

Hadoop

• Hadoop consists of two components:
  - HDFS
  - MapReduce
• Hadoop ecosystem:
  - Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.

Page 16

Traditional ETL
[Diagram: Source Layer (Structured Data), DWH, Data Mart, connected by ETL/ELT steps]

Hadoop ETL
[Diagram: Source Layer (Structured Data, Unstructured Data), HADOOP, DWH, Data Mart]

Page 17

HDFS

• Hadoop Distributed File System: stores the data
• Data is split into blocks. 64 MB…
• Each block is replicated, e.g. 3 times. The replicas are stored on different nodes.
• Based on the Google File System
• Sits on top of a native file system: ext3, ext4, xfs
• No random writes allowed. Prefers large streaming reads.
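The block splitting and replication described above can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions, not how HDFS is implemented: the round-robin placement, the function names, the node names, and the 200 MB file are all hypothetical, and the real placement policy is more elaborate.

```python
BLOCK_SIZE_MB = 64   # block size named on the slide
REPLICATION = 3      # replication factor named on the slide


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block of a file."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign every block to `replication` distinct nodes (round-robin toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


blocks = split_into_blocks(200)  # hypothetical 200 MB file
layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])

for block_id, replica_nodes in layout.items():
    print(f"block {block_id} ({blocks[block_id]} MB) -> {replica_nodes}")
```

Because each block lives on three different nodes, losing a single node leaves at least two copies of every block available.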

Page 18

HDFS

Page 19

HDFS

• hadoop fs -ls (user home directory)
• hadoop fs -ls / (root directory)
• hadoop fs -cat /user/zekeriya/deneme.txt
• hadoop fs -mkdir
• hadoop fs -rm -r veri
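When these commands need to be driven from a script, one option is to build the argument lists in Python and hand them to `subprocess`. The sketch below only constructs and prints the commands, so it runs without a Hadoop installation; the helper name `hadoop_fs_command` is hypothetical.

```python
import shlex


def hadoop_fs_command(*args):
    """Build the argument list for one `hadoop fs` invocation."""
    return ["hadoop", "fs", *args]


commands = [
    hadoop_fs_command("-ls"),                                # user home directory
    hadoop_fs_command("-ls", "/"),                           # root directory
    hadoop_fs_command("-cat", "/user/zekeriya/deneme.txt"),  # print a file
    hadoop_fs_command("-rm", "-r", "veri"),                  # remove a directory recursively
]

# Shell-safe, printable form of each command
printable = [shlex.join(cmd) for cmd in commands]
for line in printable:
    print(line)

# On a machine where the `hadoop` client is installed, a command could be run with:
#   import subprocess
#   subprocess.run(commands[0], check=True)
```

Passing a list of arguments rather than a single string avoids shell quoting problems when paths contain spaces.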

Page 20

MapReduce

• Processes data in the Hadoop cluster
• Two stages: Map and Reduce

Page 21

MAPREDUCE

map(String input_key, String input_value)
    foreach word w in input_value:
        emit(w, 1)

reduce(String output_key, Iterator<int> intermediate_vals)
    set count = 0
    foreach v in intermediate_vals:
        count += v
    emit(output_key, count)

Input records:
(1000, 'Galatasaray sampiyon olur')
(2000, 'beşiktas sampiyon olur')
(2200, 'Galatasaray Türkiyedir')

Page 22

MAPREDUCE

Mapper output:
('Galatasaray', 1), ('sampiyon', 1), ('olur', 1), ('beşiktas', 1), ('sampiyon', 1), ('olur', 1), ('Galatasaray', 1), ('Türkiyedir', 1)

Intermediate data sent to the Reducer:
('Galatasaray', [1, 1]) ('sampiyon', [1, 1]) ('olur', [1, 1]) ('beşiktas', [1]) ('Türkiyedir', [1])

Final output of the Reducer:
('Galatasaray', 2) ('sampiyon', 2) ('olur', 2) ('beşiktas', 1) ('Türkiyedir', 1)
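The word-count flow can be reproduced on a single machine with plain Python. This is a minimal sketch that mimics the map, shuffle, and reduce steps in memory; it does not use the Hadoop API, and `run_job` is a hypothetical driver standing in for the framework.

```python
from collections import defaultdict


def map_fn(input_key, input_value):
    """Map stage: emit (word, 1) for every word in the record."""
    for word in input_value.split():
        yield (word, 1)


def reduce_fn(output_key, intermediate_vals):
    """Reduce stage: sum all the counts collected for one word."""
    count = 0
    for v in intermediate_vals:
        count += v
    return (output_key, count)


def run_job(records):
    """Hypothetical driver: map every record, group by key, then reduce."""
    grouped = defaultdict(list)  # plays the role of the shuffle step
    for key, value in records:
        for word, one in map_fn(key, value):
            grouped[word].append(one)
    return dict(reduce_fn(word, vals) for word, vals in grouped.items())


records = [
    (1000, "Galatasaray sampiyon olur"),
    (2000, "beşiktas sampiyon olur"),
    (2200, "Galatasaray Türkiyedir"),
]

word_counts = run_job(records)
for word, count in word_counts.items():
    print(word, count)
```

In a real cluster the map calls run on the nodes that hold the data blocks, and the grouping step moves the intermediate pairs across the network to the reducers.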

Page 23

Hadoop Ecosystem

• HIVE
  - SQL-like query language
  - Users can query data in the Hadoop cluster without knowing Java or MapReduce
• PIG
  - Uses a dataflow scripting language
• IMPALA
  - Open source project created by Cloudera
  - Query language is very similar to HiveQL, and results come back much faster

Page 24

Hadoop Ecosystem

• FLUME
  - Imports data into HDFS as it is generated
  - Example: log files from a web server
• SQOOP
  - Imports data from tables in an OLTP database into HDFS
  - Populates database tables from files in HDFS
• OOZIE
  - Developers create a workflow of MapReduce jobs

Page 25

Hadoop Ecosystem

• HBASE
  - The Hadoop database
  - NoSQL datastore
  - Huge data store: GB, TB, PB
  - Query language: get/put/scan
  - Read/write throughput: millions of queries per second, versus thousands of queries per second for an RDBMS
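To give a feel for the get/put/scan style of access, here is a toy in-memory table in Python. It is not the HBase client API: the class `ToyTable`, its methods, and the sample rows are hypothetical, and it only imitates the idea of rows kept sorted by row key so that a range scan is cheap.

```python
import bisect


class ToyTable:
    """Toy key-value table with rows kept sorted by row key."""

    def __init__(self):
        self._keys = []  # sorted list of row keys
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        """Insert or update one cell."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Return all cells of one row (empty dict if the row is missing)."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_row, stop_row):
        """Return rows with start_row <= key < stop_row, in key order."""
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        return [(key, dict(self._rows[key])) for key in self._keys[lo:hi]]


table = ToyTable()
table.put("user3", "name", "Can")
table.put("user1", "name", "Ayse")
table.put("user2", "name", "Mehmet")
table.put("user1", "city", "Istanbul")

print(table.get("user1"))
print(table.scan("user1", "user3"))
```

There are no joins or ad hoc SQL predicates here; every access goes through a row key or a row-key range, which is the trade-off that makes very high throughput possible.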

Page 26

Big Data

• Finance: fraud detection, customer risk analysis
• Retail: product recommendation, purchases and discounts
• Advertising: more effective web ads
• Defense
• Telco
• Healthcare

Page 27

Analyzing Twitter Data

• https://github.com/cloudera/cdh-twitter-example

Page 28

Career Path

• Develop with Hadoop

• Hadoop Administration

• Hadoop for Data Scientists & Analysts

Page 29

Zekeriya Beşiroğlu http://zekeriyabesiroglu.com http://twitter.com/zbesiroglu

http://bilginc.com http://troug.org

mail to:[email protected] [email protected]