
Big Data Technologies

The First NIDA Business Analytics and Data Sciences Contest/Conference

1-2 September 2016, Nawamintharathirat Building, National Institute of Development Administration (NIDA)

https://businessanalyticsnida.wordpress.com

https://www.facebook.com/BusinessAnalyticsNIDA/

What is Big Data?

What does an architecture for big data look like?

How can big data be managed?

How is big data processed?

Is unstructured data different from a Relational Database Management System?

What are the latest big data technologies?

Data Science Thailand team

Nawamintharathirat Building, Room 3001, 1 September 2016, 15:15-16:30

The First NIDA Business Analytics & Data Sciences Conference

Big Data Technology

1-2 September 2016

The First NIDA Business Analytics & Data Sciences Conference : 1-2 September 2016

People think this [big data] is a tech revolution. But it is really a business revolution enabled by technology.

– Steven Messer

Let's define “Big Data”


3Vs: Volume, Velocity, Variety

5Vs: the 3Vs plus Veracity and Value


Let's talk about Scalability


Scale-UP (vertical)


Scale-OUT (horizontal)


What are the challenges?


Challenges: Handling failure

Even with the best hardware, frequent failure is the norm.
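Because failures are routine, a cluster framework has to be able to re-run a failed piece of work somewhere else. The sketch below illustrates only that idea in plain Python: the names `run_with_retries`, `NodeFailure` and `count_records` are hypothetical, node failure is simulated by raising an exception, and this is not how Hadoop itself tracks or retries tasks.

```python
from typing import Callable, Sequence


class NodeFailure(Exception):
    """Raised when a (simulated) worker node fails while running a task."""


def run_with_retries(task: Callable[[str], int], nodes: Sequence[str]) -> tuple[str, int]:
    """Try the task on each candidate node in turn until one succeeds.

    Returns (node that succeeded, task result); raises RuntimeError
    if the task fails on every node.
    """
    errors = []
    for node in nodes:
        try:
            return node, task(node)
        except NodeFailure as exc:
            # Record the failure and fall through to the next node.
            errors.append(f"{node}: {exc}")
    raise RuntimeError("task failed on all nodes: " + "; ".join(errors))


# Hypothetical cluster state: node1 is down, the other nodes are healthy.
DOWN_NODES = {"node1"}


def count_records(node: str) -> int:
    """Hypothetical task that fails on any node listed in DOWN_NODES."""
    if node in DOWN_NODES:
        raise NodeFailure("node unreachable")
    return 42  # stand-in for "number of records processed"
```

For example, `run_with_retries(count_records, ["node1", "node2", "node3"])` skips the failed `node1` and returns `("node2", 42)`. Retrying is only safe because the task has no side effects; re-running it elsewhere gives the same answer.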


Challenges: Parallelization

Divide & Conquer strategies
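A divide-and-conquer computation splits the input into independent chunks, processes each chunk separately, and then combines the partial results. The minimal sketch below sums a list that way; the function names are illustrative, and a thread pool stands in for a cluster only to keep the example self-contained (in CPython, threads do not give true CPU parallelism for this kind of work, whereas a cluster would ship each chunk to a different machine).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: list[int], n_chunks: int) -> list[list[int]]:
    """Divide: cut the data into roughly equal, contiguous chunks."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data: list[int], n_workers: int = 4) -> int:
    chunks = split_into_chunks(data, n_workers)
    # Conquer: each worker sums its own chunk independently of the others.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Combine: merge the partial results into the final answer.
    return sum(partial_sums)
```

For instance, `parallel_sum(list(range(1, 101)))` returns `5050`, the same as a plain `sum`; the point is that no chunk needs to see any other chunk until the final combine step.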


Challenges: Barrier to entry

“The IBM Blue Gene/P supercomputer installation at the Argonne Leadership Computing Facility located in the Argonne National Laboratory.” (https://en.wikipedia.org/wiki/File:IBM_Blue_Gene_P_supercomputer.jpg)


“The Borg, a beowulf cluster used by the McGill University pulsar group to search for binary pulsars (among other things).” (https://en.wikipedia.org/wiki/File:Beowulf-cluster-the-borg.jpg)


Let's talk about the revolution


Hadoop: A brief history

Doug Cutting & Mike Cafarella

Hadoop: Abstraction Layers

Storage layer: HDFS (modeled on Google's GFS)

Compute layer: YARN + MapReduce

Hadoop

HDFS (Hadoop Distributed File System)

Hadoop: HDFS

[Diagram: FILE_001 is split into blocks A, B and C; three replicas of each block (A1-A3, B1-B3, C1-C3) are spread across NODE1, NODE2 and NODE3]
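The diagram's idea (cut a file into fixed-size blocks, then keep several copies of each block on different nodes) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the tiny block size and the round-robin placement are hypothetical choices for readability, and real HDFS uses much larger blocks and a rack-aware placement policy rather than round-robin. The replication factor of 3 matches the diagram.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(n_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration only; not HDFS's actual policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(n_blocks):
        # Start at a different node for each block so load is spread evenly.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement
```

With `split_into_blocks(b"AAAABBBBCC", 4)` the file becomes `[b"AAAA", b"BBBB", b"CC"]`, and `place_replicas(3, ["NODE1", "NODE2", "NODE3"])` puts every block on all three nodes in a rotated order. If any single node fails, each block still has two surviving copies, which is what makes the storage fault-tolerant.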

Hadoop: HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html)

Hadoop

YARN (Yet Another Resource Negotiator)

Hadoop: YARN

[Diagram: four nodes A, B, C and D; DATA1 and DATA2 are each stored on two of the nodes]

[Diagram: nodes A and B with DATA1 and DATA2] Network is expensive, so computation is sent to the nodes that already hold the data instead of moving the data across the network.

[Diagram: an App Master coordinates the tasks running on nodes A, B, C and D next to DATA1 and DATA2]
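The "network is expensive" point can be made concrete with a toy scheduler that prefers a node already holding a task's data and only falls back to a remote node when no local copy is available. Everything here is hypothetical: in YARN the ApplicationMaster requests containers from the ResourceManager and can name preferred nodes or racks, which this function does not model. The block layout (DATA1 on A and C, DATA2 on B and D) is an assumed reading of the diagram.

```python
def assign_tasks(block_locations: dict[str, list[str]],
                 nodes: list[str]) -> dict[str, tuple[str, bool]]:
    """Assign one task per data block, preferring nodes that store the block.

    Returns {block: (chosen node, is_local)}. Among the candidates, the
    least-loaded node wins so that work stays balanced.
    """
    load = {node: 0 for node in nodes}
    assignment = {}
    for block, holders in block_locations.items():
        # Local candidates: live cluster nodes that hold a copy of the block.
        local = [n for n in holders if n in load]
        candidates = local or nodes  # fall back to a remote read if needed
        node = min(candidates, key=lambda n: load[n])
        load[node] += 1
        assignment[block] = (node, bool(local))
    return assignment


# Assumed layout; DATA3's only copy sits on node E, which has left the cluster.
LOCATIONS = {"DATA1": ["A", "C"], "DATA2": ["B", "D"], "DATA3": ["E"]}
```

Running `assign_tasks(LOCATIONS, ["A", "B", "C", "D"])` gives `{"DATA1": ("A", True), "DATA2": ("B", True), "DATA3": ("C", False)}`: the first two tasks run where their data lives, and only DATA3 has to be read over the network.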

Hadoop

MapReduce

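Hadoop: MapReduce Paradigm

Consists of two primary functions: Map and Reduce

Both functions transform data: Map turns each input record into intermediate key-value pairs, and Reduce combines all values that share the same key.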
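The classic illustration of the paradigm is word count. The sketch below runs the three phases (map, shuffle/group by key, reduce) in a single Python process; the function names are illustrative and are not Hadoop's Java API, which distributes the same phases across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_fn(line))
    return dict(reduce_fn(key, values) for key, values in shuffle(mapped).items())
```

For example, `word_count(["big data", "Big deal"])` returns `{"big": 2, "data": 1, "deal": 1}`. Because each `map_fn` call looks at one line only and each `reduce_fn` call looks at one key only, both phases can be split across machines.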


Hadoop: MapReduce Paradigm

https://twitter.com/steveluscher/status/741089564329054208


Parallelism achieved!


Hadoop: The Big Data enabler

● Fault-tolerant storage
● Fault-tolerant computation
● Parallel processing paradigm for the rest of us?

Hadoop: Not everyone codes

Hive: A high-level tool for the rest of us

● Developed by Facebook
● HiveQL (looks just like SQL)
● Schema-on-Read
● Data warehousing at scale
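Schema-on-Read means the raw files are stored as-is and a table's column names and types are applied only when the data is queried. The sketch below imitates that in Python; the sample lines, the `SCHEMA` list and `read_with_schema` are all hypothetical. Mapping an unparseable value to `None` loosely mirrors the NULL that Hive typically returns for such fields in delimited text tables.

```python
# Hypothetical raw file contents, stored without any schema attached.
RAW_LINES = [
    "1001,Widget,Acme,25",
    "1002,Gadget,Globex,n/a",
]

# Hypothetical schema: (column name, type converter), applied only at read time.
SCHEMA = [("sku", int), ("product_name", str), ("manufacturer", str), ("num_instock", int)]


def read_with_schema(lines, schema):
    """Yield one dict per line, interpreting fields through the schema."""
    for line in lines:
        fields = line.split(",")
        row = {}
        for i, (name, cast) in enumerate(schema):
            raw = fields[i] if i < len(fields) else None
            if raw is None:
                row[name] = None  # missing column -> NULL-like None
                continue
            try:
                row[name] = cast(raw)
            except ValueError:
                row[name] = None  # unparseable value -> NULL-like None
        yield row
```

Nothing is validated when the lines are written; the bad `n/a` value only shows up, as `None`, when `read_with_schema(RAW_LINES, SCHEMA)` is evaluated. Reading the same lines with a different schema would require no reloading of the data.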


HiveQL (SQL) → MapReduce job(s)

DDL:

CREATE TABLE inventory (
    sku int,
    product_name string,
    manufacturer string,
    num_instock int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");

DML:

LOAD DATA INPATH '/home/user/products.csv' OVERWRITE INTO TABLE inventory;

SELECT SUM(num_instock) AS instock_count
FROM inventory;
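To show roughly what the `SELECT SUM(...)` query asks the cluster to compute, here is a Python sketch that skips the header line (as the `skip.header.line.count` property does), computes a partial sum per split, and adds the partial sums in a single reduce step. The CSV contents and function names are hypothetical, and the round-robin split of rows only stands in for HDFS blocks; Hive's actual query plan may differ in detail.

```python
import csv
import io

# Hypothetical contents of products.csv; the first line is a header.
PRODUCTS_CSV = """sku,product_name,manufacturer,num_instock
1001,Widget,Acme,25
1002,Gadget,Globex,10
1003,Gizmo,Acme,7
"""


def map_split(rows: list[list[str]]) -> int:
    """Map side: one split computes a partial sum of num_instock (4th column)."""
    return sum(int(row[3]) for row in rows)


def instock_count(csv_text: str, n_splits: int = 2) -> int:
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    rows = list(reader)
    # Distribute rows across splits round-robin, standing in for HDFS blocks.
    splits = [rows[i::n_splits] for i in range(n_splits)]
    partial_sums = [map_split(s) for s in splits]
    # Reduce side: a single reducer adds the partial sums together.
    return sum(partial_sums)
```

With the sample data, `instock_count(PRODUCTS_CSV)` returns `42` (partial sums 32 and 10), and the result does not depend on how many splits are used.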


What else?


Other selected tools

● HBase
● Kafka
● Spark (SQL, GraphX, MLlib, Streaming)
● Docker
● Kubernetes
● Mesos

Impacts of Hadoop (and other “Big Data” tools)

Democratization of high-volume, high-velocity data processing


We support Hadoop


We provide an alternative

Adopting “Big Data” technologies

Big Data is not a technology. It’s about answering business questions and delivering value.

– Teresa de Onis


It's not about building a Hadoop cluster, or any other “Big Data” solution for that matter.


http://mattturck.com/2016/02/01/big-data-landscape/


Big Data tools = building blocks. Build whatever you want.


https://github.com/fluxcapacitor/pipeline


It requires a team effort… and support from management.


If you want to succeed with Big Data, start small.

– Doug Cutting


Thank you for your time

Q&A


