Transcript
Page 1: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

Towards  Enabling  Big  Data  and  Federated  Compu8ng  in  the  Cloud    Yousef  Kowsar1,  Enis  Afgan1,2  

 1VLSCI,  University  of  Melbourne  2CIR,  Ruđer  Bošković  Ins8tute  (RBI)    BOSC  2013,  Berlin  

Page 2: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

Lots of big data

Distributed data

‘Cloud’ resources

Page 3: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

Big  Data  and  Hadoop  

•  Distributed  compu8ng    solu8on  – Map-­‐Reduce  paradigm  – A  plaVorm  for  Big  Data  – Supports  running  of    applica8ons  on  large  clusters  

•  Hadoop  has  been  effec8vely  applied  to  the  Big  Data  problem  

Page 4: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

Federated  Compu8ng    and  HTCondor  

•  An  approach  toward  federated  compu8ng  •  HTCondor:  –  Since  1988  at  University  of  Wisconsin-­‐Madison  – High  Throughput  Compu8ng  on  large  collec8ons  distribu8ve  compu8ng  resources:  cycle  scavenging  

•  Gains  from  using  HTCondor  –  Exis8ng  solu8on  –  Scalability  –  Reliability  –  Cost  

Page 5: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

•  Cloud  Manager  for  orchestra8ng  cloud  resources  •  Cluster-­‐on-­‐the-­‐cloud,  any  cloud  •  Ease  the  process  of  establishing  a  cloud  environment  for  bioinforma8cs  analysis  –  “Galaxy  on  the  Cloud”  

•  Facilitate  management  of  IaaS  services  

Page 6: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

A  path  forward  

•  Have  a  central  manager  capturing  all  the  three  func8ons  at  once:      – CloudMan  

•  Easy  &  ready  to  use  cluster  environment  for  the  cloud  

– Hadoop  •  PlaVorm  for  Big  Data  analysis  

– HTCondor  •  Central  manager  able  to  handle  versa8le,  heterogeneous  compute  environments  

Page 7: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

Our Approach

•  Integrate HTCondor and Hadoop into CloudMan clusters

•  Single management interface

•  Multiple types of workloads and infrastructures

•  Make  it  easier  to  deploy  necessary  plaVorm  and  enable  1.  Tool  development    2.  Data  analysis  

CloudMan

SGE Hadoop Condor

Batch jobs Big-data jobsFederated jobs

Page 8: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

Hadoop-on-demand platform •  Hadoop-over-SGE: dynamically

setup at runtime •  Low and constant setup

overhead •  Increase infrastructure flexibility

•  Cost •  Workload type

Page 9: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

 Hadoop  example  •  Edit  sge-­‐integra8on  script  

•  Submit  your  job  into  SGE  

Page 10: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

HTCondor  integra8on  

•  Local  jobs  run  via  SGE  •  Nodes  pooled  together  via  –  Flocking  –  Gliding  –  Pool  sharing  

AWS$$$

NeCTAR(~private)

Campus

Cluster

Page 11: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

HTCondor  example  Cluster  1  -­‐  AWS   Cluster  2  -­‐  NeCTAR  

Common resource pool

Job submission script

Running jobs

Page 12: Towards Enabling Big Data and Federated Computing in the Cloud · Towards Enabling Big Data and Federated Computing in the Cloud Author: Enis Afgan Subject: BOSC2013, Berlin Created

Conclusions  •  Challenges  – Data  transfer  &  locality  

•  Future  work  –  Streamline  scaling  of  Condor  hosts  –  Integra8on  with  Galaxy  –  Condor  over  Hadoop  

•  An  architecture  paper  available  from  MIPRO  2013  –  “Support  for  data-­‐intensive  compu8ng  with  CloudMan”  

A  cloud  environment  for  distributed  compu8ng:  batch;  Hadoop;  HTCondor  hfp://usecloudman.org  


Top Related