
Towards Enabling Big Data and Federated Computing in the Cloud
Yousef Kowsar1, Enis Afgan1,2

1VLSCI, University of Melbourne
2CIR, Ruđer Bošković Institute (RBI)
BOSC 2013, Berlin

Lots of big data

Distributed data

‘Cloud’ resources

Big Data and Hadoop

• Distributed computing solution
– Map-Reduce paradigm
– A platform for Big Data
– Supports running of applications on large clusters

• Hadoop has been effectively applied to the Big Data problem
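The Map-Reduce paradigm named above can be illustrated with a minimal, single-process sketch. This is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only to show the three stages a framework such as Hadoop distributes across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (a framework does this between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data in the cloud", "Big Data and Hadoop"])
```

In a real cluster each `map_phase` call would run on the node holding its input split, which is why data locality matters for this workload type.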

Federated Computing and HTCondor

• An approach toward federated computing
• HTCondor:
– Since 1988 at University of Wisconsin-Madison
– High Throughput Computing on large collections of distributed computing resources: cycle scavenging

• Gains from using HTCondor
– Existing solution
– Scalability
– Reliability
– Cost

• Cloud Manager for orchestrating cloud resources
• Cluster-on-the-cloud, any cloud
• Ease the process of establishing a cloud environment for bioinformatics analysis
– “Galaxy on the Cloud”

• Facilitate management of IaaS services

A path forward

• Have a central manager capturing all three functions at once:
– CloudMan
  • Easy & ready-to-use cluster environment for the cloud
– Hadoop
  • Platform for Big Data analysis
– HTCondor
  • Central manager able to handle versatile, heterogeneous compute environments

Our Approach

•  Integrate HTCondor and Hadoop into CloudMan clusters

•  Single management interface

•  Multiple types of workloads and infrastructures

• Make it easier to deploy the necessary platform and enable
1. Tool development
2. Data analysis

[Diagram: CloudMan managing SGE (batch jobs), Hadoop (big-data jobs), and Condor (federated jobs)]
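The pairing of workload types with back ends can be sketched as a small lookup. This is an illustration only, not CloudMan's actual interface: `BACKENDS` and `pick_backend` are hypothetical names, and the workload labels are assumed from the diagram.

```python
# Hypothetical mapping, assumed from the diagram: one back end per workload type.
BACKENDS = {
    "batch": "SGE",
    "big-data": "Hadoop",
    "federated": "HTCondor",
}


def pick_backend(workload_type):
    """Return the name of the back end that would handle this workload type."""
    try:
        return BACKENDS[workload_type]
    except KeyError:
        raise ValueError(f"unknown workload type: {workload_type!r}") from None


backend = pick_backend("big-data")
```

A single lookup like this mirrors the idea of one management interface in front of several job systems.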

Hadoop-on-demand platform

• Hadoop-over-SGE: dynamically set up at runtime
• Low and constant setup overhead
• Increase infrastructure flexibility
– Cost
– Workload type

Hadoop example

• Edit sge-integration script
• Submit your job into SGE

HTCondor integration

• Local jobs run via SGE
• Nodes pooled together via
– Flocking
– Gliding
– Pool sharing

[Diagram: nodes pooled across AWS ($$$), NeCTAR (~private), and a campus cluster]

HTCondor example

Cluster 1 - AWS
Cluster 2 - NeCTAR

Common resource pool

Job submission script

Running jobs
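The job submission script shown on the slide is not reproduced in this transcript. As a stand-in, here is a minimal sketch that renders a generic HTCondor submit description file from Python. The helper name `render_submit_description`, the `/bin/hostname` executable, and the output file naming are assumptions made for illustration; the submit keywords themselves (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are standard HTCondor submit commands.

```python
def render_submit_description(executable, arguments="", count=1, name="job"):
    """Build the text of a simple vanilla-universe HTCondor submit file.

    All parameter names are hypothetical; only the emitted keywords are
    HTCondor's own. $(Cluster) and $(Process) are expanded by HTCondor.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"output = {name}.$(Cluster).$(Process).out",
        f"error = {name}.$(Cluster).$(Process).err",
        f"log = {name}.$(Cluster).log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# Assumed example: four copies of a job that reports which host ran it.
submit_text = render_submit_description("/bin/hostname", count=4)
```

Writing `submit_text` to a file and passing that file to `condor_submit` would queue the jobs. Printing the host name is a simple way to see which of the pooled clusters picked up each job.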

Conclusions

• Challenges
– Data transfer & locality

• Future work
– Streamline scaling of Condor hosts
– Integration with Galaxy
– Condor over Hadoop

• An architecture paper available from MIPRO 2013
– “Support for data-intensive computing with CloudMan”

A cloud environment for distributed computing: batch; Hadoop; HTCondor
http://usecloudman.org
