a reliable memory-centric distributed storage systemhaoyuan/talks/tachyon_2014-10-16... · a...

58
A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata & Hadoop World NYC Website: tachyon-project.org Meetup: www.meetup.com/Tachyon UC Berkeley

Upload: hatuong

Post on 17-Feb-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata & Hadoop World NYC

Website: tachyon-project.org Meetup: www.meetup.com/Tachyon

UC  Berkeley  

Page 2: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Outline

•  Overview  – Feature  1:  Memory  Centric  Storage  Architecture  – Feature  2:  Lineage  in  Storage  

•  Open  Source  

•  Roadmap

Page 3: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Outline

•  Overview  – Feature  1:  Memory  Centric  Storage  Architecture  – Feature  2:  Lineage  in  Storage  

•  Open  Source  

•  Roadmap

Page 4: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

                                       Projects  UC  Berkeley  

•  Design  next  generaDon  data  analyDcs  stack:  Berkeley  Data  AnalyDcs  Stack  (BDAS)  

                                 a  cluster  manager  making  it  easy  to  write  and  deploy  distributed  applicaDons.  

                                 a  parallel  compuDng  system  supporDng  general  and  efficient  in-­‐memory  execuDon.  

                                                     a  reliable  distributed  memory-­‐centric  storage  enabling  memory-­‐speed  data  sharing.  

Page 5: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

 Why  Tachyon?

5  

Page 6: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Memory  is  King  

•  RAM  throughput  increasing  exponenDally  •  Disk  throughput  increasing  slowly  

Memory-­‐locality  key  to  interacDve  response  Dme

Page 7: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Realized  by  many…  •  Frameworks  already  leverage  memory  

 

7  

Page 8: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Problem  solved?

8  

Page 9: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Missing  a  SoluDon  for  Storage  Layer

9  

Page 10: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

An  Example:                                -­‐          

•  Fast  in-­‐memory  data  processing  framework  – Keep  one  in-­‐memory  copy  inside  JVM  – Track  lineage  of  operaDons  used  to  derive  data  – Upon  failure,  use  lineage  to  recompute  data  

map  

filter   map  

join   reduce  

Lineage  Tracking  

Page 11: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  1  Data  Sharing  is  the  bo/leneck  in  

analy4cs  pipeline:  Slow  writes  to  disk  

Spark  Task  

                       Spark  mem                                  block  manager  

block  1  

block  3  

Spark  Task  

                       Spark  mem                                  block  manager  

block  3  

block  1  

                                 HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4  

storage  engine  &    execution  engine  same  process  (slow  writes)  

11  

Page 12: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  1  

Spark  Task  

                       Spark  mem                                  block  manager  

block  1  

block  3  

Hadoop  MR  

YARN  

                                 HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4  

storage  engine  &    execution  engine  same  process  (slow  writes)  

12  

Data  Sharing  is  the  bo/leneck  in  analy4cs  pipeline:  Slow  writes  to  disk  

Page 13: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  2  

Spark  Task  

Spark  memory  block  manager  

block  1  

block  3  

               HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4  

execution  engine  &    storage  engine  same  process  

13  

Cache  loss  when  process  crashes.  

Page 14: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  2  

crash  

Spark  memory  block  manager  

block  1  

block  3  

                           HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4  

execution  engine  &    storage  engine  same  process  

14  

Cache  loss  when  process  crashes.  

Page 15: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

                           HDFS  /  Amazon  S3  

Issue  2  

block  1  

block  3  

block  2  

block  4  

execution  engine  &    storage  engine  same  process  

crash  

15  

Cache  loss  when  process  crashes.  

Page 16: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

                           HDFS  /  Amazon  S3  

Issue  3  

In-­‐memory  Data  Duplica4on  &  Java  Garbage  Collec4on  

Spark  Task  

                       Spark  mem                                  block  manager  

block  1  

block  3  

Spark  Task  

                       Spark  mem                                  block  manager  

block  3  

block  1  

block  1  

block  3  

block  2  

block  4  

execution  engine  &    storage  engine  same  process  (duplication  &  GC)  

16  

Page 17: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Tachyon  

 Reliable  data  sharing  at  memory-­‐speed  

within  and  across  cluster  frameworks/jobs    

17  

Page 18: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

SoluDon  Overview  

Basic  idea  •  Feature  1:  memory-­‐centric  storage  architecture  •  Feature  2:  push  lineage  down  to  storage  layer    Facts  •  One  data  copy  in  memory  •  RecomputaDon  for  fault-­‐tolerance  

Page 19: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Stack

Computation  Frameworks  (Spark,  MapReduce,  Impala,  H2O,  …)  

Existing  Storage  Systems  (HDFS,  S3,  GlusterFS,  …)  

Tachyon  

Page 20: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Memory-­‐Centric  Storage  Architecture

20  

Page 21: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  1  revisited  

Memory-­‐speed  data  sharing  among  jobs  in  different  frameworks  

execution  engine  &    storage  engine  same  process  (fast  writes)  

Spark  Task  

                       Spark  mem  block  1  

Hadoop  MR  

YARN  

                             HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4  

HDFS  disk  

block  1  

block  3  

block  2  

block  4  Tachyon  in-­‐memory  

block  1  

block  3   block  4  

21  

Page 22: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  2  revisited  

Spark  Task  

Spark  memory  block  manager  

block  1  

                           HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4  

execution  engine  &    storage  engine  same  process  

Tachyon  in-­‐memory  

block  1  

block  3   block  4  

22  

Keep  in-­‐memory  data  safe,  even  when  a  job  crashes.  

Page 23: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  2  revisited  

Spark  memory  block  manager  

HDFS  disk  

block  1  

block  3  

block  2  

block  4  

execution  engine  &    storage  engine  same  process  

Tachyon  in-­‐memory    

block  1  

block  3   block  4  

crash  

                               HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4   23  

Keep  in-­‐memory  data  safe,  even  when  a  job  crashes.  

Page 24: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  2  revisited  

HDFS  disk  

block  1  

block  3  

block  2  

block  4  

execution  engine  &    storage  engine  same  process  

Tachyon  in-­‐memory    

block  1  

block  3   block  4  

crash  

                             HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4  

Keep  in-­‐memory  data  safe,  even  when  a  job  crashes.  

24  

Page 25: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Issue  3  revisited  

No  in-­‐memory  data  duplica4on,  much  less  GC  

Spark  Task  

Spark  mem  

Spark  Task  

Spark  mem  

                             HDFS  /  Amazon  S3  block  1  

block  3  

block  2  

block  4  

execution  engine  &    storage  engine  same  process  (no  duplication  &  GC)  

HDFS  disk  

block  1  

block  3  

block  2  

block  4  Tachyon  in-­‐memory  

block  1  

block  3   block  4  

25  

Page 26: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Lineage  in  Storage  (alpha)

26  

Page 27: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Comparison  with  in  Memory  HDFS    

Page 28: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Workflow  Improvement

Performance  comparison  for  realisDc  workflow.  The  workflow  ran  4x  faster  on  Tachyon  than  on  MemHDFS.  In  case  of  node  failure,  applicaDons  in  Tachyon  sDll  finishes  3.8x  faster.

28  

Page 29: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Further  Improve  Spark’s  Performance  

Grep  Program  

Page 30: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

How  easy  /  hard  to  use  Tachyon?

30  

Page 31: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Spark/MapReduce/Shark  without  Tachyon  

•  Spark  – val  file  =  sc.textFile(“hdfs://ip:port/path”)  

•  Hadoop  MapReduce  – hadoop  jar  hadoop-­‐examples-­‐1.0.4.jar  wordcount  hdfs://localhost:19998/input  hdfs://localhost:19998/output  

•  Shark  – CREATE  TABLE  orders_cached  AS  SELECT  *  FROM  orders;  

Page 32: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Spark/MapReduce/Shark  with  Tachyon  

•  Spark  – val  file  =  sc.textFile(“tachyon://ip:port/path”)  

•  Hadoop  MapReduce  – hadoop  jar  hadoop-­‐examples-­‐1.0.4.jar  wordcount  tachyon://localhost:19998/input  tachyon://localhost:19998/output  

•  Shark  – CREATE  TABLE  orders_tachyon  AS  SELECT  *  FROM  orders;

Page 33: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Spark  on  Tachyon  ./bin/spark-­‐shell  sc.hadoopConfiguraDon.set("fs.tachyon.impl",  "tachyon.hadoop.TFS")    //  Load  input  from  Tachyon  val  file  =  sc.textFile("tachyon://localhost:19998/LICENSE")  file.count()  ;  file.take(10);    //  Store  RDD  OFF_HEAP  in  Tachyon  import  org.apache.spark.storage.StorageLevel;  file.persist(StorageLevel.OFF_HEAP)  file.count();  file.take(10);    //  Save  output  to  Tachyon  file.flatMap(line  =>  line.split("  ")).map(s  =>  (s,  1)).reduceByKey((a,  b)  =>  a  +  b).saveAsTextFile("tachyon://localhost:19998/LICENSE_WC")  

Page 34: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Outline

•  Overview  – Feature  1:  Memory  Centric  Storage  Architecture  – Feature  2:  Lineage  in  Storage  

•  Open  Source  

•  Roadmap

Page 35: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

History Started  at  UC  Berkeley  AMPLab  from  the  summer  of  2012  

•  Reliable,  Memory  Speed  Storage  for  Cluster  CompuDng  Frameworks  (UC  Berkeley  EECS  Tech  Report)  

•  Haoyuan  Li,  Ali  Ghodsi,  Matei  Zaharia,  Ion  Stoica,  Scot  Shenker  

35  

Page 36: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

A                                                            Open  Source  Status  

•  Apache License 2.0, Version 0.5.0 (July 2014)

•  Deployed at tens of companies

•  20+ Companies Contributing

•  Spark and MapReduce applications can run without any code change

Page 37: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Release  Growth

37  

Tachyon 0.1: -1 contributor

Dec ‘12

Page 38: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Release  Growth

Tachyon 0.2: - 3 contributors

Apr ‘13 38  

Tachyon 0.1: -1 contributor

Dec ‘12

Page 39: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Release  Growth

Tachyon 0.2: - 3 contributors

Oct‘13 Apr ‘13

Tachyon 0.3: - 15 contributors

39  

Tachyon 0.1: -1 contributor

Dec ‘12

Page 40: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Release  Growth

Tachyon 0.2: - 3 contributors

Feb ‘14 Oct‘13 Apr ‘13

Tachyon 0.3: - 15 contributors

Tachyon 0.4: - 30 contributors

40  

Tachyon 0.1: -1 contributor

Dec ‘12

Page 41: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Release  Growth

Tachyon 0.2: - 3 contributors

Feb ‘14 Oct‘13 Apr ‘13

Tachyon 0.3: - 15 contributors

Tachyon 0.4: - 30 contributors

41  July ‘14

Tachyon 0.5: - 46 contributors

Tachyon 0.1: -1 contributor

Dec ‘12

Page 42: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Open  Community

42  

Berkeley  Contributors  

Non-­‐Berkeley  Contributors  

Page 43: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Thanks  to  our  Code  Contributors!  Aaron Davidson Achal Soni Ali Ghodsi Andrew Ash Anurag Khandelwal Aslan Bekirov Bill Zhao Brad Childs Calvin Jia Chao Chen Cheng Chang Cheng Hao Colin Patrick McCabe David Capwell

43  

David Zhu Du Li Fei Wang Gerald Zhang Grace Huang Haoyuan Li Henry Saputra Hobin Yoon Huamin Chen Jey Kottalam Joseph Tang Juan Zhou Jun Aoki Lin Xing

Lukasz Jastrzebski Manu Goyal Mark Hamstra Mingfei Shi Mubarak Seyed Nick Lanham Orcun Simsek Pengfei Xuan Qianhao Dong Qifan Pu Raymond Liu Reynold Xin Robert Metzger Rong Gu

Sean Zhong Seonghwan Moon Shivaram Venkataraman Srinivas Parayya Tao Wang Timothy St. Clair Thu Kyaw Vamsi Chitters Xi Liu Xiang Zhong Xiaomin Zhang Zhao Zhang

Page 44: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Thanks  to  Redhat!  

Page 45: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Thanks  to  Redhat!  

Commercially  supported    by                        x    

and  running  in  dozens  of  their  customers’  clusters

Page 46: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Thanks  to  Redhat!  

Tachyon  is  the  Default  Off-­‐Heap  Storage  

SoluLon  for              .  

Page 47: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

 Exchange  Data  Between  Spark  and  H20

47  

Page 48: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Believe  from  Industry

48  

Page 49: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Reaching  wider  communiDes:  e.g.  GlusterFS

49  

Page 50: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Under  Filesystem  Choices (Big  Data,  Cloud,  HPC,  Enterprise)

Page 51: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Under  Filesystem  Choices (Big  Data,  Cloud,  HPC,  Enterprise)

Page 52: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Outline

•  Overview  – Feature  1:  Memory  Centric  Storage  Architecture  – Feature  2:  Lineage  in  Storage  

•  Open  Source  

•  Roadmap

Page 53: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Features

•  Memory  Centric  Storage  Architecture  •  Lineage  in  Storage  (alpha)  •  Hierarchical  Local  Storage  •  Data  Serving  •  Different  hardware  •  More…  •  Your  Requirements?  

53  

Page 54: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Short  Term  Roadmap  (0.6  Release) •  Ceph  IntegraDon  (Ceph  Community,  Redhat)    •  Hierarchical  Local  Storage  (Intel)    •  Performance  Improvement  (Yahoo)  

•  MulD-­‐tenancy  (AMPLab)  

•  Mesos  IntegraDon  (Mesos  Community,  Mesosphere)  

•  Network  Sub-­‐system  Improvement  (Pivotal)    •  Many  more  from  AMPLab  and  Industry  Contributors  

54  

Page 55: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Goal?  

55  

Page 56: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Beter  Assist  Other  Components  

Tachyon  

Spark   MapReduce  

Spark  SQL  H2O   GraphX   Impala  

HDFS   S3   GlusterFS  

OrangeFS   NFS   Ceph   ……  

……  

Welcome  CollaboraLon!

Page 57: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Thanks!  Ques.ons?  

•  More  Informa.on:  – Website:  h;p://tachyon-­‐project.org  – Github:  h;ps://github.com/amplab/tachyon  – Meetup:  h;p://www.meetup.com/Tachyon  

•  Email:  [email protected]

Page 58: A Reliable Memory-Centric Distributed Storage Systemhaoyuan/talks/Tachyon_2014-10-16... · A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata &

Release  Growth

Tachyon 0.2: - 3 contributors

Feb ‘14 Oct‘13 Apr ‘13

Tachyon 0.3: - 15 contributors

Tachyon 0.4: - 30 contributors

58  July ‘14

Tachyon 0.5: - 46 contributors

Tachyon 0.1: -1 contributor

Dec ‘12