The Secret Relationship Between Asakusa Framework and Scala

2016/10/8 Hishidama: The Secret Relationship Between Asakusa Framework and Scala, Scala Kansai Summit 2016

Upload: hishidama

Post on 12-Apr-2017


TRANSCRIPT

  • 2016/10/8

    The Secret Relationship Between Asakusa Framework and Scala

    Scala Kansai Summit 2016

  • 2

    1. Scala AsakusaFW
    2. Scala Spark AsakusaFW
    3. Scala Spark AsakusaFW
    4. AsakusaFW
    5.

  • 3

    Twitter ID: @hishidama

    http://hiroba.dqx.jp/sc/character/1091135261820/

    AsakusaFW DQ10

  • 4

    2004    Scala
    2006    Apache Hadoop Java6
    2010/2  Hadoop HBase
    2010    Spark OSS
    2010    Scala
    2011/3  Asakusa Framework
    2011/7  Spark
    2012/2  SIer
    2012/8  DQ10
    2014/2  Apache Spark
    2014/3  Java8

  • 5

    Hadoop

    1.
    2.
       - Hadoop

    http://techblog.yahoo.co.jp/architecture/hadoop/ 6635 53470

       - 8 33

    3. Hadoop
       - Twitter

  • 6

    HBase

    1. Hadoop NoSQL HBase

    RDB HBase

    HBase Hadoop

  • 7

    NoSQL

    NoSQL: Not Only SQL

    SQL RDB DB DB

    CAP
    C: consistency
    A: availability
    P: partition tolerance

    CA NoSQL CP NoSQL CA

  • 8

    Scala

    1. HBase Scala

    Better Java Scala

    import

    2. Scala Scala Seq

    orz

  • 9

    Asakusa Framework

    1. Hadoop

    Hadoop MapReduce Hive Pig Cascading Huahin Framework AsakusaFW AZAREA Cluster AsakusaFW

    AsakusaFW

  • 10

    Spark

    1. Hadoop

    Spark Scala

    Mesos

    Spark Hadoop

  • 11

    Scala Spark AsakusaFW

  • 12

    Scala

  • 13

    Apache Hadoop (1/3)

    - HDFS
    - MapReduce
    - YARN

    1.
    2. MapReduce jar

  • 14

    Apache Hadoop (2/3)

    [Diagram: DB, app, Hadoop]

  • 15

    Apache Hadoop (3/3)

    Hadoop
    - Hadoop

    Hadoop
    - MapReduce

  • 16

    Apache Spark

    - RDD Scala
    - HDFS

    AMPLab Databricks
    - https://databricks.com/spark/about
    - Apache Spark

  • 17

    Asakusa Framework

    - Java DSL
    - Hadoop Spark M3BP
    - http://www.asakusafw.com/

  • 18

    Scala Spark AsakusaFW

  • 19

    Scala

    val operator = new MyOperator
    val s0 : Stream[Data] =
    val s1 = s0.filter(operator.f)
    val s2 = s1.map(operator.m)
    val out1 = s2.toSeq

  • 20

    Scala MyOperator

    class MyOperator {
      def f(data: Data) : Boolean =
        data.getValue() % 2 == 0

      def m(data: Data) : Data =
        Data(data.getValue() + 1)
    }
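For comparison with the Java8 Stream API that the later slides line up against Scala, the same filter-then-map flow might be sketched in Java as follows. `Data` and `getValue` follow the slide; the `PipelineSketch` class, the integer input, and the sample values are assumptions made only so the sketch runs on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineSketch {
    // Minimal stand-in for the slide's Data type (assumption).
    static final class Data {
        private final int value;
        Data(int value) { this.value = value; }
        int getValue() { return value; }
    }

    // Counterpart of MyOperator.f on the slide: keep even values.
    static boolean f(Data data) { return data.getValue() % 2 == 0; }

    // Counterpart of MyOperator.m on the slide: add one.
    static Data m(Data data) { return new Data(data.getValue() + 1); }

    // s0 -> filter f -> map m -> out1
    static List<Integer> run(List<Integer> input) {
        return input.stream()
                .map(Data::new)
                .filter(PipelineSketch::f)
                .map(PipelineSketch::m)
                .map(Data::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4))); // [3, 5]
    }
}
```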

  • 21

    Scala

    val operator = new MyOperator
    val s0 : Stream[Data] =
    val s1 = s0.filter(operator.f)
    val s2 = s1.map(operator.m)
    val out1 = s2.toSeq

    DAG

  • 22

    DAG (1/2)

    - ER

  • 23

    DAG (2/2)

    Directed Acyclic Graph
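To make "directed acyclic graph" concrete, here is a small sketch that checks whether a graph is a DAG by trying to put its nodes in topological order (Kahn's algorithm). The node names `s0`, `f`, `m`, and `out1` are borrowed from the example flow in this deck; the `DagSketch` class and its methods are hypothetical and not part of any of the frameworks discussed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DagSketch {
    // Returns a topological order of the nodes, or an empty list if the graph has a cycle.
    static List<String> topologicalOrder(Map<String, List<String>> edges) {
        // Count incoming edges for every node that appears as a source or a target.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String to : e.getValue()) {
                inDegree.merge(to, 1, Integer::sum);
            }
        }
        // Start from nodes with no incoming edges.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            for (String to : edges.getOrDefault(node, Collections.<String>emptyList())) {
                if (inDegree.merge(to, -1, Integer::sum) == 0) {
                    ready.add(to);
                }
            }
        }
        // If some node was never released, it sits on a cycle.
        return order.size() == inDegree.size() ? order : Collections.<String>emptyList();
    }

    // The example flow from the slides: s0 -> f -> m -> out1.
    static Map<String, List<String>> pipeline() {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        edges.put("s0", Collections.singletonList("f"));
        edges.put("f", Collections.singletonList("m"));
        edges.put("m", Collections.singletonList("out1"));
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(topologicalOrder(pipeline())); // [s0, f, m, out1]
    }
}
```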

  • 24

    Scala

    val operator = new MyOperator
    val s0 : Stream[Data] =
    val s1 = s0.filter(operator.f)
    val s2 = s1.map(operator.m)
    val out1 = s2.toSeq

    [Diagram: s0 -> filter f -> map m -> out1]

  • 25

    Apache Spark

    val sc = new SparkContext()
    val operator = new MyOperator
    val s0 : RDD[Data] = sc.
    val s1 = s0.filter(operator.f)
    val s2 = s1.map(operator.m)
    s2.saveAsTextFile(out1)

    MyOperator Scala

    [Diagram: s0 -> filter f -> map m -> out1]

  • 26

    Asakusa Framework

    In s0 = ; //
    Out out1 = ; //
    MyOperatorFactory operator = new MyOperatorFactory();
    Source s1 = operator.f(s0).out;
    Source s2 = operator.m(s1).out;
    out1.add(s2);

    [Diagram: s0 -> @Branch f -> @Update m -> out1]

  • 27

    Asakusa Framework MyOperatorFactory

    public abstract class MyOperator {

        @Branch
        public Filter f(Data data) {
            return (data.getValue() % 2 == 0) ? Filter.OUT : Filter.MISSED;
        }

        @Update
        public void m(Data data) {
            data.setValue(data.getValue() + 1);
        }
    }

    MyOperatorFactory

  • 28

    DAG

    DAG

    [Diagram: s0 -> filter f -> map m -> out1]

    [Diagram: s0 -> @Branch f -> @Update m -> out1]

  • 29

    1 union join zip

    [Diagram: s0, s1 -> out1]

  • 30

    1 union

    Java8 Stream API
        Stream out = Stream.concat(Stream.concat(s0, s1), s2);

    Scala Spark
        val out = s0 ++ s1 ++ s2

    AsakusaFW
        Source out = core.confluent(s0, s1, s2);

    s0:  1,abc
         2,def
    s1:  1,foo
         3,bar
    out: 1,abc
         2,def
         1,foo
         3,bar
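The union example can be run end to end with the sample records shown on this slide. This is only a sketch: the records are treated as plain `key,value` strings, and the `UnionSketch` class is a hypothetical wrapper around the `Stream.concat` call from the slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnionSketch {
    // Concatenates two inputs, keeping every record from both (no de-duplication).
    static List<String> union(List<String> s0, List<String> s1) {
        return Stream.concat(s0.stream(), s1.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> s0 = Arrays.asList("1,abc", "2,def");
        List<String> s1 = Arrays.asList("1,foo", "3,bar");
        // Prints one record per line: 1,abc / 2,def / 1,foo / 3,bar
        union(s0, s1).forEach(System.out::println);
    }
}
```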

  • 31

    1 join

    Java8 Stream API

    Scala

    Spark
        val out = s0.join(s1)

    AsakusaFW
        Source out = operator.join(s0, s1).joined; // @MasterJoin

    s0:  1,abc
         2,def
    s1:  1,foo
         3,bar
    out: 1,abc,foo
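The Java8 Stream API has no dedicated join operation, so a join has to be assembled by hand. One possible sketch, using the sample records from this slide, indexes one side by key and looks up the other side against it. The `JoinSketch` class, the `key`/`value` helpers, and the assumption that keys are unique within `s1` are all illustrative; Spark's `join` and Asakusa's `@MasterJoin` have their own semantics that this does not try to reproduce.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class JoinSketch {
    // Records are "key,value" strings (assumption for this sketch).
    static String key(String record) {
        return record.substring(0, record.indexOf(','));
    }

    static String value(String record) {
        return record.substring(record.indexOf(',') + 1);
    }

    // Inner join on the key: only keys present on both sides survive.
    static List<String> join(List<String> s0, List<String> s1) {
        // Index s1 by key; toMap throws if a key repeats, so keys must be unique in s1.
        Map<String, String> right = s1.stream()
                .collect(Collectors.toMap(JoinSketch::key, JoinSketch::value));
        return s0.stream()
                .filter(r -> right.containsKey(key(r)))
                .map(r -> r + "," + right.get(key(r)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> s0 = Arrays.asList("1,abc", "2,def");
        List<String> s1 = Arrays.asList("1,foo", "3,bar");
        System.out.println(join(s0, s1)); // [1,abc,foo]
    }
}
```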

  • 32

    1 cogroup

    Java8 Stream API

    Scala

    Spark
        val out = s0.cogroup(s1)

    AsakusaFW
        Source out = operator.group(s0, s1).out; // @CoGroup

    s0:  1,abc
         2,def
    s1:  1,foo
         3,bar
    out: 1,abc,foo
         2,def,null
         3,null,bar
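A cogroup keeps every key from either side, which is what distinguishes it from the join on the previous slide. The sketch below reproduces the sample output shown here under a simplifying assumption: each key occurs at most once per input, so a missing side can be shown as `null`. Real cogroup operations hand each key a collection of records per side; the `CoGroupSketch` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CoGroupSketch {
    // Records are "key,value" strings (assumption for this sketch).
    static String key(String record) {
        return record.substring(0, record.indexOf(','));
    }

    static String value(String record) {
        return record.substring(record.indexOf(',') + 1);
    }

    // Groups both inputs by key; slot 0 holds the s0 value, slot 1 the s1 value.
    static List<String> cogroup(List<String> s0, List<String> s1) {
        Map<String, String[]> groups = new TreeMap<>();
        for (String r : s0) {
            groups.computeIfAbsent(key(r), k -> new String[2])[0] = value(r);
        }
        for (String r : s1) {
            groups.computeIfAbsent(key(r), k -> new String[2])[1] = value(r);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String[]> e : groups.entrySet()) {
            // A side with no record for this key is rendered as "null".
            out.add(e.getKey() + "," + e.getValue()[0] + "," + e.getValue()[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> s0 = Arrays.asList("1,abc", "2,def");
        List<String> s1 = Arrays.asList("1,foo", "3,bar");
        // Prints 1,abc,foo / 2,def,null / 3,null,bar, one per line.
        cogroup(s0, s1).forEach(System.out::println);
    }
}
```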

  • 33

    1 zip

    zip

    Java8 Stream API

    Scala Spark
        val out = s0.zip(s1)

    AsakusaFW

    s0:  1,abc
         2,def
    s1:  1,foo
         3,bar
    out: 1,abc,1,foo
         2,def,3,bar
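The Java8 Stream API also lacks a built-in zip. A positional pairing like the one shown in the sample output can be sketched over indexable lists. Stopping at the shorter input mirrors what `zip` on Scala collections does; the `ZipSketch` class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ZipSketch {
    // Pairs records by position, ignoring keys; stops at the shorter input.
    static List<String> zip(List<String> s0, List<String> s1) {
        int n = Math.min(s0.size(), s1.size());
        return IntStream.range(0, n)
                .mapToObj(i -> s0.get(i) + "," + s1.get(i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> s0 = Arrays.asList("1,abc", "2,def");
        List<String> s1 = Arrays.asList("1,foo", "3,bar");
        // Prints 1,abc,1,foo and 2,def,3,bar, one per line.
        zip(s0, s1).forEach(System.out::println);
    }
}
```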

  • 34

    2 duplicate

    [Diagram: s0 -> 1 -> out1; s0 -> 2 -> out2]

  • 35

    2 duplicate

    duplicate

    Java8 Stream API

    Scala TraversableOnce Spark

    Spark
        val out1 = s0.map(operator.m1)
        val out2 = s0.map(operator.m2)

    AsakusaFW
        Source out1 = operator.m1(s0).out;
        Source out2 = operator.m2(s0).out;
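On the Java side, feeding one input into two operations needs some care, because a `java.util.stream.Stream` can be consumed only once; reusing it throws `IllegalStateException`. A common workaround is to open a fresh stream from the source collection for each branch. The sketch below shows both behaviours; the `DuplicateSketch` class and the two lambdas standing in for `m1` and `m2` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DuplicateSketch {
    // Demonstrates that a single Stream instance cannot feed two pipelines.
    static boolean secondUseFails() {
        Stream<Integer> s0 = Stream.of(1, 2, 3);
        s0.map(x -> x + 1).collect(Collectors.toList());
        try {
            s0.map(x -> x * 2).collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true; // stream has already been operated upon or closed
        }
    }

    // Opens one stream per branch from the same source collection.
    static List<List<Integer>> duplicate(List<Integer> source) {
        List<Integer> out1 = source.stream().map(x -> x + 1).collect(Collectors.toList()); // stand-in for m1
        List<Integer> out2 = source.stream().map(x -> x * 2).collect(Collectors.toList()); // stand-in for m2
        return Arrays.asList(out1, out2);
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());                 // true
        System.out.println(duplicate(Arrays.asList(1, 2, 3))); // [[2, 3, 4], [2, 4, 6]]
    }
}
```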

  • 36

    3 branch

    [Diagram: s0 -> out1, out2]

  • 37

    3 branch

    Java8 Stream API

    Scala Spark
        filter

    AsakusaFW
        // @Branch
        Branch result = operator.branch(s0);
        Source out1 = result.out1;
        Source out2 = result.out2;
        Source out3 = result.out3;
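With the Java8 Stream API, a three-way branch similar to the `out1`/`out2`/`out3` split above can be approximated in one pass with `Collectors.groupingBy`. This is a sketch of the idea rather than an equivalent of `@Branch`: the `BranchSketch` class, the enum, and the sign-based classification rule are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BranchSketch {
    // Output ports, named after the fields on the slide (the enum itself is an assumption).
    enum Branch { OUT1, OUT2, OUT3 }

    // Hypothetical routing rule: negative, zero, positive.
    static Branch classify(int value) {
        if (value < 0) {
            return Branch.OUT1;
        }
        if (value == 0) {
            return Branch.OUT2;
        }
        return Branch.OUT3;
    }

    // Sends each record to exactly one output; empty outputs are absent from the map.
    static Map<Branch, List<Integer>> branch(List<Integer> s0) {
        return s0.stream().collect(Collectors.groupingBy(BranchSketch::classify));
    }

    public static void main(String[] args) {
        Map<Branch, List<Integer>> result = branch(Arrays.asList(-1, 0, 5, 7));
        System.out.println(result.get(Branch.OUT1)); // [-1]
        System.out.println(result.get(Branch.OUT2)); // [0]
        System.out.println(result.get(Branch.OUT3)); // [5, 7]
    }
}
```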

  • 38

    Asakusa DAG DAG

    [Diagram: DAG of operators including @Convert, @CoGroup, @Summarize, @CoGroup, @MJoinUpdate, @MJoinUpdate; 1 252]

  • 39

    Java8 Stream API

    List Stream Stream

    Scala

    .par

    Spark Scala Streaming

    AsakusaFW Hadoop Spark M3BP

    Hadoop, Spark
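As a small illustration of the single-machine end of this comparison, the sketch below runs the same filter-and-map logic sequentially and on a parallel stream, assuming the Java8 row above refers to parallel streams obtained from a `List` (the Java counterpart of Scala's `.par`). The `ParallelSketch` class and the sample values are hypothetical; distributing the work across machines, as Spark and AsakusaFW do, is outside what the Stream API offers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class ParallelSketch {
    // Keeps even values, adds one to each, and sums the results.
    static int sumOfIncrementedEvens(List<Integer> input, boolean parallel) {
        // Only the way the stream is obtained changes; the pipeline stays the same.
        Stream<Integer> s = parallel ? input.parallelStream() : input.stream();
        return s.filter(x -> x % 2 == 0)
                .mapToInt(x -> x + 1)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfIncrementedEvens(input, false)); // 8
        System.out.println(sumOfIncrementedEvens(input, true));  // 8
    }
}
```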

  • 40

    AsakusaFW

  • 41

    Asakusa Framework

    1. Hadoop

    Hadoop

    2.

    3. Spark M3BP Scala Java .NET Java

  • 42

    M3 for Batch Processing (M3BP)

    https://github.com/fixstars/m3bp OS

    Spark

    CPU GB

  • 43

    1

    1

    2010 Hadoop http://shiumachi.hatenablog.com/entry/20100703/1278133318
    CPU 816 1632GB 424TB

    http://www.atmarkit.co.jp/ait/articles/1608/22/news027.html CPU 20 256GB 36TB

    100GB

  • 44

    Asakusa Framework

    AsakusaFW jar Hadoop MapReduce

    Spark M3BP

  • 45

    Asakusa on MapReduce    Asakusa on Spark          Asakusa on M3BP
    javac                   javac                     javac CMake gcc/g++
    MapReduce java          Spark ASM class scalac    C++
                                                      SEGV

  • 46

    Asakusa

                              Asakusa on MapReduce    Asakusa on Spark    Asakusa on M3BP
    1.2GB 561, 60MB 69        110                     85                  8
    29kB 900, 280B 1          15                      60                  3
    11GB 21700, 940MB 783     380                     700                 260
    74MB 53, 81GB 1084        3400                    2030                400 256270GB
    76GB 2420, 153MB 89       670                     360                 92

  • 47

    CPU

    Hadoop MapR Spark    13    128    750GB
    M3BP                 1     88     512GB

    - M3BP Hadoop Spark M3BP
    - M3BP 122 881.11.2

  • 48

    Asakusa Framework

    M3BP

    AsakusaFW

  • 49

    (1/2)

    Hadoop 1 HDD HDD

  • 50

    2/2

    SSD 2020 100TB

    CPU 100 Asakusa on M3BP

    RSA CPU 110TB

    MRAM

  • 51

    Asakusa DSL

    1. AsakusaFW Asakusa DSL Java

    DSL

    2. DSL Scala
    3. Asakusa DSL Scala

    AsakusaFW SIer

    SIer Java Asakusa Scala DSL

    Asakusa Scala DSL https://atnd.org/events/13174

  • 52

  • 53

    Apache Spark

    Scala Streaming

    Hadoop MapReduce Asakusa Framework

    Scala SIer Asakusa Scala DSL

    DQ10 DQ10 ver3.4 2016/10/6