Big data technology by Data Sciences Thailand at THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE, organized by the Graduate School of Applied Statistics and DATA SCIENCES THAILAND

Posted on 12-Apr-2017

TRANSCRIPT

  • Big Data Technologies

    The First NIDA Business Analytics and Data Sciences Contest/Conference

    1-2 September 2559 B.E. (2016)

    https://businessanalyticsnida.wordpress.com

    https://www.facebook.com/BusinessAnalyticsNIDA/

    Big Data

    Unstructured data Relational Database Management System

    Data Science Thailand

    3001, 1 September 2559 B.E., 15.15-16.30

  • The First NIDA Business Analytics & Data Sciences Conference

    Big Data Technology

    1-2 September 2016

  • The First NIDA Business Analytics & Data Sciences Conference : 1-2 September 2016

    People think this [big data] is a tech revolution. But it is really a business revolution enabled by technology.

    Steven Messer

    Let's define Big Data

  • 3Vs : Volume, Velocity, Variety

    5Vs : the 3Vs plus Veracity and Value

    Let's define Big Data

  • Let's talk about Scalability

  • Scale-UP (vertical)

  • Scale-OUT (horizontal)

  • What are the challenges?

  • Even with the best hardware, frequent failure is the norm.

    Challenges : Handling failure
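A back-of-the-envelope calculation shows why failure becomes the norm at scale. Assume, purely for illustration, that each node fails independently with the same probability p on a given day. The chance that at least one of n nodes fails that day is then 1 - (1 - p)^n, which approaches 1 as the cluster grows. The per-node rate below is a hypothetical value, not a measured figure.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent nodes fails,
    given a per-node failure probability p (hypothetical model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of the event "every single node survives".
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical 0.1% daily failure chance per node
    for n in (1, 10, 100, 1000, 10000):
        print(f"{n:>6} nodes: {prob_any_failure(p, n):.4f}")
```

Even with this optimistic rate, a 1,000-node cluster has about a 63% chance of losing at least one node on any given day. This is why the software, rather than the hardware, has to absorb failures.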

  • Divide & Conquer strategies

    Challenges : Parallelization
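The divide-and-conquer strategy has three steps. First, split the input into independent pieces. Then process each piece on its own worker. Finally, combine the partial results. The sketch below applies these steps to a sum of squares. It uses a thread pool to stay portable. For CPU-bound pure-Python work, true multi-core parallelism needs `ProcessPoolExecutor`, which has the same `map` interface. The function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(data: Sequence[int], num_chunks: int) -> List[Sequence[int]]:
    """Divide: cut the input into roughly equal, contiguous pieces."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be >= 1")
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk: Sequence[int]) -> int:
    """Conquer: each piece is processed without looking at the others."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: Sequence[int], workers: int = 4) -> int:
    chunks = split_into_chunks(data, workers)
    # Swap in ProcessPoolExecutor for real multi-core speed-ups on CPU-bound code.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # Combine: merge the partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    numbers = list(range(100_000))
    print(parallel_sum_of_squares(numbers))
```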

  • Challenges : Barrier to entry

    The IBM Blue Gene/P supercomputer installation at the Argonne Leadership Computing Facility located in the Argonne National Laboratory. https://en.wikipedia.org/wiki/File:IBM_Blue_Gene_P_supercomputer.jpg

  • Challenges : Barrier to entry

    The Borg, a Beowulf cluster used by the McGill University pulsar group to search for binary pulsars (among other things). https://en.wikipedia.org/wiki/File:Beowulf-cluster-the-borg.jpg

  • Let's talk about the revolution

  • Hadoop : A brief history

    Cutting & Cafarella

  • Hadoop : Abstraction Layers

    Storage Layer : HDFS (GFS)

    Compute Layer : YARN + MapReduce

  • Hadoop

    HDFS (Hadoop Distributed File System)

  • Hadoop : HDFS

    [Diagram: FILE_001 is split into blocks A, B and C; each block is stored as three replicas (A1-A3, B1-B3, C1-C3) spread across NODE1, NODE2 and NODE3.]

  • Hadoop : HDFS

    https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
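The HDFS diagram can be mimicked in a few lines. A file is cut into fixed-size blocks, and each block is copied to several distinct nodes, so losing one node loses no data. This is a simplified sketch under stated assumptions: the tiny block size, the node names and the round-robin placement are illustrative only. Real HDFS uses much larger blocks and a rack-aware replica placement policy managed by the NameNode.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's contents into fixed-size blocks (the last may be shorter)."""
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for HDFS placement; real HDFS is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block to spread the load.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)]
                               for r in range(replication)]
    return placement


def all_blocks_available(placement: Dict[int, List[str]], failed: str) -> bool:
    """True if every block still has a replica after node `failed` goes down."""
    return all(any(node != failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"ABCDEFGHI", block_size=3)  # hypothetical tiny file
    layout = place_replicas(len(blocks), ["NODE1", "NODE2", "NODE3"])
    for block_id, replicas in layout.items():
        print(block_id, blocks[block_id], replicas)
    print("survives NODE2 failure:", all_blocks_available(layout, "NODE2"))
```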

  • Hadoop

    YARN (Yet Another Resource Negotiator)

  • Hadoop : YARN

    [Diagram: nodes A, B, C and D holding DATA1, DATA2, DATA1 and DATA2 respectively.]

  • Hadoop : YARN

    [Diagram: nodes A and B holding DATA1 and DATA2.]

    Network is expensive
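Because moving data across the network is expensive, work should run where the data already lives. Below is a minimal sketch of locality-aware task assignment. It assumes a hypothetical mapping from each node to the data blocks it stores. The policy prefers an idle node that already holds a task's input. Otherwise, it falls back to any idle node, which then has to pull the data over the network. The node and block names mirror the diagram. The greedy one-task-per-node policy is an illustration, not YARN's actual scheduler.

```python
from typing import Dict, List, Set, Tuple


def assign_tasks(tasks: List[str],
                 storage: Dict[str, Set[str]]) -> List[Tuple[str, str, bool]]:
    """Greedily give each task its own node, preferring nodes that store its data.

    tasks   -- the data block each task must read, e.g. ["DATA1", "DATA2"]
    storage -- which blocks each node holds locally (hypothetical layout)
    Returns (task_data, node, is_local) triples.
    """
    idle = list(storage)  # nodes that have not been given a task yet
    assignments: List[Tuple[str, str, bool]] = []
    for data in tasks:
        if not idle:
            raise RuntimeError("more tasks than nodes in this simple sketch")
        local = [node for node in idle if data in storage[node]]
        node = local[0] if local else idle[0]  # fall back to a remote read
        idle.remove(node)
        assignments.append((data, node, bool(local)))
    return assignments


if __name__ == "__main__":
    cluster = {"A": {"DATA1"}, "B": {"DATA2"}, "C": {"DATA1"}, "D": {"DATA2"}}
    for data, node, is_local in assign_tasks(["DATA1", "DATA2"], cluster):
        where = "local read" if is_local else "network transfer"
        print(f"{data} -> node {node} ({where})")
```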

  • Hadoop : YARN

    [Diagram: nodes A, B, C and D holding DATA1, DATA2, DATA1 and DATA2, with an App Master added.]

  • Hadoop

    MapReduce

  • Hadoop : MapReduce Paradigm

    Consists of 2 primary functions: Map & Reduce

    Both functions transform data
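The classic word-count example shows the two functions at work. This is a single-process Python sketch, not Hadoop's Java API. `map_phase` emits a (word, 1) pair for each word in a line. A shuffle step groups the pairs by key, as Hadoop does between the two phases. Finally, `reduce_phase` sums each group. Every map call depends only on its own input line, and every reduce call only on its own key. That independence is what lets the framework spread both phases across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one input record into zero or more (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key (the framework does this between the phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data beats opinions"]  # hypothetical input
    print(word_count(docs))
```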

  • Hadoop : MapReduce Paradigm

    https://twitter.com/steveluscher/status/741089564329054208

  • Hadoop : MapReduce Paradigm

    https://twitter.com/steveluscher/status/741089564329054208

    Parallelism achieved!

  • Hadoop : The Big Data enabler

    Fault-tolerant storage

    Fault-tolerant computation

    Parallel processing paradigm for the rest of us?
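Fault-tolerant computation in MapReduce relies on the same independence between tasks. If a task fails, the framework runs it again, typically on another node, because its input is still available from replicated storage. Below is a minimal sketch of that retry idea. It assumes a hypothetical `run_on(node, task)` callable that raises `NodeFailure` when a node crashes. The node names, the failing node and the retry limit are illustrative, not Hadoop settings.

```python
from typing import Callable, List, Tuple


class NodeFailure(Exception):
    """Raised when a (simulated) node crashes while running a task."""


def run_with_retries(task: str, nodes: List[str],
                     run_on: Callable[[str, str], str],
                     max_attempts: int = 4) -> Tuple[str, List[str]]:
    """Try the task on successive nodes until one attempt succeeds.

    Returns the task's result plus the list of nodes that were tried.
    """
    if not nodes:
        raise ValueError("need at least one node")
    tried: List[str] = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]
        tried.append(node)
        try:
            return run_on(node, task), tried
        except NodeFailure:
            continue  # the input is replicated, so simply retry elsewhere
    raise RuntimeError(f"task {task!r} failed after {max_attempts} attempts")


if __name__ == "__main__":
    broken = {"NODE1"}  # hypothetical: NODE1 crashes on every task

    def run_on(node: str, task: str) -> str:
        if node in broken:
            raise NodeFailure(node)
        return f"{task} done on {node}"

    result, tried = run_with_retries("map-0007", ["NODE1", "NODE2", "NODE3"], run_on)
    print(result, "| attempts:", tried)
```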

  • Hadoop : Not everyone codes

  • Hive : A High-level tool for the rest of us

    Developed by Facebook

    HiveQL (looks just like SQL)

    Schema-on-Read

    Data warehousing at scale
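Schema-on-read means the raw file is stored untouched, and the table's column types are applied only when a query reads it. Below is a rough Python analogy that uses the column names and types from the inventory DDL. The rows stay plain CSV text, and a schema is imposed at read time. Values that do not fit the declared type become `None`, much as Hive returns NULL for such fields. The sample rows are made up for illustration.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Column names and types follow the inventory DDL; the data below is hypothetical.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("sku", int),
    ("product_name", str),
    ("manufacturer", str),
    ("num_instock", int),
]


def read_with_schema(raw_text: str, schema=SCHEMA,
                     skip_header: bool = True) -> Iterator[Dict[str, Optional[object]]]:
    """Apply the schema while reading; bad or missing fields become None."""
    reader = csv.reader(io.StringIO(raw_text))
    if skip_header:
        next(reader, None)  # like "skip.header.line.count"="1"
    for fields in reader:
        row: Dict[str, Optional[object]] = {}
        for i, (name, cast) in enumerate(schema):
            value = fields[i] if i < len(fields) else None
            try:
                row[name] = cast(value) if value is not None else None
            except ValueError:
                row[name] = None  # type mismatch: yield NULL instead of failing
        yield row


RAW = """sku,product_name,manufacturer,num_instock
1001,Widget,Acme,25
1002,Gadget,Globex,n/a
1003,Gizmo,Initech,7
"""

if __name__ == "__main__":
    for row in read_with_schema(RAW):
        print(row)
```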

  • Hive : A High-level tool for the rest of us

    HiveQL (SQL) is translated into MapReduce job(s)

  • Hive : A High-level tool for the rest of us

    DDL

    CREATE TABLE inventory (
      sku int,
      product_name string,
      manufacturer string,
      num_instock int
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    TBLPROPERTIES ("skip.header.line.count"="1");

  • Hive : A High-level tool for the rest of us

    DML

    LOAD DATA INPATH '/home/user/products.csv' OVERWRITE INTO TABLE inventory;

    SELECT SUM(num_instock) AS instock_count
    FROM inventory;
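Hive turns the `SELECT SUM(num_instock)` query into MapReduce job(s). Conceptually, an aggregate like this needs only a simple map and a reduce. Each mapper extracts `num_instock` from its share of the rows and pre-sums it, and a single reducer adds up the partial sums. The Python below is a hand-written illustration of that plan under stated assumptions: the rows and the two input splits are made up. It is not the code Hive actually generates.

```python
from functools import reduce
from typing import List, Tuple

# Hypothetical inventory rows: (sku, product_name, manufacturer, num_instock).
Row = Tuple[int, str, str, int]

SPLIT_1: List[Row] = [(1001, "Widget", "Acme", 25), (1002, "Gadget", "Globex", 10)]
SPLIT_2: List[Row] = [(1003, "Gizmo", "Initech", 7)]


def mapper(split: List[Row]) -> int:
    """Map side: project num_instock and pre-aggregate within one input split."""
    return sum(row[3] for row in split)


def reducer(partials: List[int]) -> int:
    """Reduce side: combine the per-split partial sums into instock_count."""
    return reduce(lambda acc, x: acc + x, partials, 0)


def instock_count(splits: List[List[Row]]) -> int:
    # Mirrors: SELECT SUM(num_instock) AS instock_count FROM inventory;
    # (unlike SQL, an empty table yields 0 here rather than NULL)
    return reducer([mapper(s) for s in splits])


if __name__ == "__main__":
    print(instock_count([SPLIT_1, SPLIT_2]))  # 42
```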

  • What else?

  • Other selected tools

    HBase, Kafka, Spark (SQL, GraphX, MLlib, Streaming), Docker, Kubernetes, Mesos

  • Impacts of Hadoop (and other Big Data tools)

  • Impacts of Hadoop (and other Big Data tools)

    Democratization of high-volume, high-velocity data processing

  • Impacts of Hadoop (and other Big Data tools)

    We support Hadoop

  • Impacts of Hadoop (and other Big Data tools)

    We provide an alternative

  • Adopting Big Data technologies

  • Adopting Big Data technologies

    Big Data is not a technology. It's about answering business questions and delivering value.

    Teresa de Onis

  • Adopting Big Data technologies

    It's not about building a Hadoop cluster, or other Big Data solutions for that matter

  • Adopting Big Data technologies

    http://mattturck.com/2016/02/01/big-data-landscape/

  • Adopting Big Data technologies

    Big Data tools = building blocks

    Build whatever you want

  • Adopting Big Data technologies

    https://github.com/fluxcapacitor/pipeline

  • Adopting Big Data technologies

    It requires a team effort and support from management

  • If You Want To Succeed With Big Data,

    Start Small

    Doug Cutting

  • Thank you for your time

    Q&A

    don@datascienceth.com
