Big Data Technology by Data Sciences Thailand at The First NIDA Business Analytics and...
TRANSCRIPT
Big Data Technologies
The First NIDA Business Analytics and Data Sciences Contest/Conference
1-2 September 2016, Navamindradhiraj Building, National Institute of Development Administration (NIDA)
https://businessanalyticsnida.wordpress.com
https://www.facebook.com/BusinessAnalyticsNIDA/
What is Big Data?
What does an architecture for big data look like?
How can big data be managed?
How is big data processed?
Is unstructured data different from a Relational Database Management System?
What are the latest big data technologies?
Data Science Thailand team
Navamindradhiraj Room 3001, 1 September 2016, 15:15-16:30
The First NIDA Business Analytics & Data Sciences Conference : 1-2 September 2016
People think this [big data] is a tech revolution. But it is really a business revolution enabled by technology.
– Steven Messer
Let's define “Big Data”
3Vs : Volume, Velocity, Variety
5Vs : the 3Vs plus Veracity, Value
Let's talk about Scalability
Scale-UP (vertical)
Scale-OUT (horizontal)
What are the challenges?
Even with the best hardware, frequent failure is the norm.
Challenges : Handling failure
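A rough way to see why failure becomes the norm at scale: assume, hypothetically, that each machine fails independently on a given day with a small probability p. The chance that at least one of n machines fails that day is then 1 - (1 - p)^n. The value of p below is an illustrative assumption, not a figure from the talk.

```python
def prob_any_failure(n_machines: int, p_single: float) -> float:
    """Probability that at least one of n independent machines fails."""
    return 1.0 - (1.0 - p_single) ** n_machines


# Hypothetical per-machine daily failure probability (assumed, not measured).
P_SINGLE = 0.001

if __name__ == "__main__":
    for n in (1, 100, 1000, 10000):
        print(f"{n:>6} machines: {prob_any_failure(n, P_SINGLE):.3f}")
```

Even a very reliable single machine turns into a near-certain daily failure somewhere in the cluster once n is large, which is why the software layer has to tolerate it.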
Divide & Conquer strategies
Challenges : Parallelization
Challenges : Barrier to entry
“The IBM Blue Gene/P supercomputer installation at the Argonne Leadership Angela Yang Computing Facility located in the Argonne National Laboratory.” https://en.wikipedia.org/wiki/File:IBM_Blue_Gene_P_supercomputer.jpg
“The Borg, a beowulf cluster used by the McGill University pulsar group to search for binary pulsars (among other things).” https://en.wikipedia.org/wiki/File:Beowulf-cluster-the-borg.jpg
Let's talk about the revolution
Hadoop : A brief history
Cutting & Cafarella
Hadoop : Abstraction Layers
Storage Layer : HDFS (GFS)
Compute Layer : YARN + MapReduce
Hadoop
HDFS (Hadoop Distributed File System)
Hadoop : HDFS
[Diagram: FILE_001 is split into blocks A, B, C; each block is stored as three replicas (A1-A3, B1-B3, C1-C3) spread across NODE1, NODE2, NODE3]
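The diagram above can be sketched in a few lines: a file is cut into fixed-size blocks, and each block is copied to several distinct nodes so that losing one node does not lose data. This is a minimal illustration, not HDFS's actual placement policy (real HDFS placement is rack-aware); the function names and the round-robin rule are assumptions made for the example.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (illustrative only)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]], failed_node: str) -> bool:
    """True if every block still has at least one replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"FILE_001 contents go here", block_size=10)
    placement = place_replicas(len(blocks), ["NODE1", "NODE2", "NODE3"], replication=3)
    print(placement)
    print(readable_after_failure(placement, "NODE2"))
```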
Hadoop : HDFS
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Hadoop
YARN (Yet Another Resource Negotiator)
Hadoop : YARN
[Diagram: four nodes A, B, C, D holding DATA1, DATA2, DATA1, DATA2 respectively]
Hadoop : YARN
[Diagram: nodes A and B holding DATA1 and DATA2]
Network is expensive
Hadoop : YARN
[Diagram: four nodes A, B, C, D holding DATA1, DATA2, DATA1, DATA2 respectively, coordinated by an App Master]
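Because the network is expensive, it is usually cheaper to run a task on a node that already holds its data than to ship the data elsewhere. The sketch below shows that idea only; it is not YARN's scheduler, and the function name, the block-to-node mapping, and the fallback rule are assumptions made for illustration.

```python
def assign_tasks(
    block_locations: dict[str, list[str]],
    free_nodes: set[str],
) -> dict[str, tuple[str, bool]]:
    """Pick one free node per data block, preferring a node that stores the block.

    Returns {block: (node, is_local)}; is_local is False when the data
    would have to travel over the network.
    """
    available = set(free_nodes)
    assignments: dict[str, tuple[str, bool]] = {}
    for block, holders in block_locations.items():
        local = [node for node in holders if node in available]
        if local:
            node, is_local = local[0], True
        elif available:
            # No free node has the data: fall back to any free node (remote read).
            node, is_local = sorted(available)[0], False
        else:
            raise RuntimeError("no free node left for block " + block)
        available.discard(node)
        assignments[block] = (node, is_local)
    return assignments


if __name__ == "__main__":
    # Assumed layout, read off the diagram: A and C hold DATA1, B and D hold DATA2.
    locations = {"DATA1": ["A", "C"], "DATA2": ["B", "D"]}
    print(assign_tasks(locations, {"A", "B", "C", "D"}))
    print(assign_tasks(locations, {"B", "D"}))
```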
Hadoop : MapReduce Paradigm
Consists of 2 primary functions : Map & Reduce
Both functions transform data
https://twitter.com/steveluscher/status/741089564329054208
Parallelism achieved!
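A minimal word-count sketch of the Map and Reduce idea, in plain Python rather than the Hadoop API. The helper names (`map_words`, `shuffle`, `reduce_counts`) are made up for this example. Each `map_words` call depends only on its own input line, and each `reduce_counts` call only on its own key, which is what lets a framework run them on many machines at once.

```python
from collections import defaultdict
from functools import reduce


def map_words(line: str) -> list[tuple[str, int]]:
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return word, reduce(lambda a, b: a + b, counts)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big tools"]))
```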
Hadoop : The Big Data enabler
● Fault-tolerant storage
● Fault-tolerant computation
● Parallel processing paradigm for the rest of us?
Hadoop : Not everyone codes
Hive : A High-level tool for the rest of us
● Developed by Facebook
● HiveQL (looks just like SQL)
● Schema-on-Read
● Data warehousing at scale
HiveQL (SQL) → MapReduce job(s)
DDL
CREATE TABLE inventory (
  sku int,
  product_name string,
  manufacturer string,
  num_instock int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");
DML
LOAD DATA INPATH '/home/user/products.csv' OVERWRITE INTO TABLE inventory;
SELECT SUM(num_instock) AS instock_count
FROM inventory;
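To give a feel for what the query above asks the cluster to do, here is a plain-Python sketch of schema-on-read plus a map/reduce-style sum. It does not reproduce Hive's execution plan. The column names and types mirror the `inventory` DDL and the header row is skipped as in `skip.header.line.count`, but the sample rows and helper names are hypothetical.

```python
import csv
import io
from functools import reduce

# Schema applied at read time; the stored file is just delimited text.
SCHEMA = [("sku", int), ("product_name", str), ("manufacturer", str), ("num_instock", int)]


def read_with_schema(raw_text: str, skip_header_lines: int = 1):
    """Yield one typed dict per data row, skipping header and blank lines."""
    reader = csv.reader(io.StringIO(raw_text))
    for index, fields in enumerate(reader):
        if index < skip_header_lines or not fields:
            continue
        yield {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


def sum_instock(raw_text: str) -> int:
    """Map each row to its num_instock value, then reduce by addition."""
    mapped = (row["num_instock"] for row in read_with_schema(raw_text))
    return reduce(lambda a, b: a + b, mapped, 0)


# Hypothetical sample data standing in for products.csv.
SAMPLE = (
    "sku,product_name,manufacturer,num_instock\n"
    "1,widget,acme,5\n"
    "2,gadget,acme,7\n"
)

if __name__ == "__main__":
    print(sum_instock(SAMPLE))
```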
Other selected tools
● HBase
● Kafka
● Spark (SQL, GraphX, MLlib, Streaming)
● Docker
● Kubernetes
● Mesos
Impacts of Hadoop (and other “Big Data” tools)
Democratization of high-volume, high-velocity data processing
We support Hadoop
We provide an alternative
Adopting “Big Data” technologies
Big Data is not a technology. It’s about answering business questions and delivering value.
– Teresa de Onis
It's not about building a Hadoop cluster or other “Big Data” solutions for that matter
http://mattturck.com/2016/02/01/big-data-landscape/
Big Data tools = building blocks
Build whatever you want
https://github.com/fluxcapacitor/pipeline
It requires a team effort… and support from the management
If You Want To Succeed With Big Data,
Start Small
– Doug Cutting
Thank you for your time
Q&A