cloud based ngs framework

Cloud based NGS Analysis Framework, Insilicogen, Inc.

Upload: hyungyong-kim

Post on 24-Jan-2015


Page 1: Cloud based NGS framework

Cloud based NGS Analysis Framework

김형용 (Hyungyong Kim), Lead Developer

[email protected] KM Division, Insilicogen, Inc.

Page 2: Cloud based NGS framework

"The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, is going to be a hugely important skill in the next ten years."

Hal Varian, Chief Economist at Google

Page 3: Cloud based NGS framework

VIRTUALIZATION

Page 4: Cloud based NGS framework

Virtualization

Page 5: Cloud based NGS framework

Virtualization

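• A term for the abstraction of computer resources

• Creates virtual versions of physical resources

• Logically divides the hardware resources of one physical machine into several, or

• logically combines the hardware resources of several machines into one

• Recently in the spotlight as a way to solve problems such as hardware management and disaster recovery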

Page 6: Cloud based NGS framework

Virtualization

Copyright ⓒ Insilicogen, Inc. 2011. All rights reserved.

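Advantages of virtualization

• Cost reduction: a single server can be partitioned into several servers, which lowers server purchase, electricity, rack space, and server administration costs.

• Efficient use of resources: virtual machines are built from a server's otherwise idle resources.

• Stable operation: servers can be backed up as images and moved easily, so failures can be handled quickly.

• Continued operation of software: when server hardware reaches the end of its life cycle, the OS vendor stops supporting its device drivers, which causes migration problems. Because the existing system is moved onto a virtual machine, device driver problems do not arise.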

Page 7: Cloud based NGS framework

Benefits of virtualization

Typical server configuration

Type A: single server, CPU: 2, RAM: 96G, HDD: 1T

Type B: single server, CPU: 24, RAM: 96G, HDD: 500G

Type C: cluster server, CPU: 2, RAM: 8G, HDD: 500G, nodes: 12

• Additional hardware must be purchased

• Not all resources are utilized

Page 8: Cloud based NGS framework

Benefits of virtualization

Server configuration with virtualization

Type A: single server, CPU: 2, RAM: 96G, HDD: 1T

Type B: single server, CPU: 24, RAM: 96G, HDD: 500G

Type C: cluster server, CPU: 2, RAM: 8G, HDD: 500G, nodes: 12

Virtual machines: three are shown in the diagram

• Reduced hardware cost

• Efficient use of resources
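As a rough illustration of the consolidation idea on this slide, the sketch below checks whether a set of virtual machines with the Type A, B, and C specifications fits on one physical host. The host capacity (`HOST`) and the helper names are assumptions made for this example; they do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """CPU count, RAM in GB, and disk in GB."""
    cpu: int
    ram_gb: int
    hdd_gb: int


# The three configurations from the slide (Type C describes one of its 12 nodes).
TYPE_A = Spec(cpu=2, ram_gb=96, hdd_gb=1000)
TYPE_B = Spec(cpu=24, ram_gb=96, hdd_gb=500)
TYPE_C_NODE = Spec(cpu=2, ram_gb=8, hdd_gb=500)

# Hypothetical physical host; these numbers are an assumption for illustration.
HOST = Spec(cpu=48, ram_gb=256, hdd_gb=4000)


def fits(host: Spec, vms: list[Spec]) -> bool:
    """Return True if the summed VM requests stay within the host capacity."""
    return (
        sum(vm.cpu for vm in vms) <= host.cpu
        and sum(vm.ram_gb for vm in vms) <= host.ram_gb
        and sum(vm.hdd_gb for vm in vms) <= host.hdd_gb
    )


def leftover(host: Spec, vms: list[Spec]) -> Spec:
    """Capacity that remains free after placing the VMs."""
    return Spec(
        cpu=host.cpu - sum(vm.cpu for vm in vms),
        ram_gb=host.ram_gb - sum(vm.ram_gb for vm in vms),
        hdd_gb=host.hdd_gb - sum(vm.hdd_gb for vm in vms),
    )
```

With this assumed host, one Type A VM, one Type B VM, and four Type C nodes fit together, while the full set of twelve Type C nodes would exceed the CPU count. Real hypervisors also allow overcommitting CPU, which this simple check ignores.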

Page 9: Cloud based NGS framework

Used as a basic building block of cloud services

Page 10: Cloud based NGS framework

OpenNebula

• A virtual machine (VM) management tool

• Manages Xen, KVM, VMware, and other hypervisors

• OpenNebula features
  - User Management
  - VM Image Management
  - Virtual Network Management
  - Virtual Machine Management
  - User Interfaces
  - Service Management
  - Scheduling
  - Infrastructure Management
  - Storage Management

Page 11: Cloud based NGS framework

OpenNebula - Sunstone


Page 12: Cloud based NGS framework

OpenStack

IaaS cloud computing project started by Rackspace Cloud and NASA

Open source software for building private and public clouds

Delivers solutions for all types of clouds by being simple to implement and massively scalable

Page 13: Cloud based NGS framework

GRID COMPUTING

Page 14: Cloud based NGS framework

Grid vs Cluster

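In common: a computation over large data is split into many small computations, which are distributed across several small computers.

Differences (grid):

• Connects heterogeneous machines over a WAN

• Links diverse platforms together

• No limit on the number of connected machines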
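The shared idea of splitting one large computation into small pieces and merging the partial results can be sketched on a single machine. The example below is only an analogy: a thread pool stands in for the separate computers of a cluster or grid, and the GC-counting task and function names are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_gc(reads):
    """Small unit of work: count G and C bases in a batch of reads."""
    return sum(read.count("G") + read.count("C") for read in reads)


def distributed_gc_count(reads, chunk_size=2, workers=3):
    """Split the input, process each chunk on a worker, and merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_gc, chunked(reads, chunk_size))
        return sum(partial_counts)
```

The merged result equals what a single call to `count_gc` on the whole input would return; only the way the work is scheduled changes.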

Page 15: Cloud based NGS framework

Grid


Page 16: Cloud based NGS framework

Globus Toolkit

• The leading middleware for computational grids

• Open source toolkit for building computing grids, developed and provided by the Globus Alliance

Standards implementation

• Open Grid Services Architecture (OGSA)

• Open Grid Services Infrastructure (OGSI)

• Web Services Resource Framework (WSRF)

• Job Submission Description Language (JSDL)

• Distributed Resource Management Application API (DRMAA)

• SOAP

• WSDL

• Grid Security Infrastructure (GSI)

Page 17: Cloud based NGS framework

DRMAA (Distributed Resource Management Application API)

High-level Open Grid Forum API specification for submitting and controlling jobs on a Distributed Resource Management (DRM, job scheduler) system, such as a cluster or grid computing infrastructure
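A minimal sketch of what job submission and control through this API can look like, written against the method names of the drmaa-python binding (`createJobTemplate`, `runJob`, `wait`, `deleteJobTemplate`). The helper names `submit_and_wait` and `run_on_cluster` are hypothetical, and the second function only works where drmaa-python and a DRM system are installed.

```python
def submit_and_wait(session, command, args=()):
    """Submit one job through a DRMAA session and block until it finishes.

    `session` is expected to behave like drmaa.Session from drmaa-python.
    """
    template = session.createJobTemplate()
    try:
        template.remoteCommand = command
        template.args = list(args)
        job_id = session.runJob(template)
        info = session.wait(job_id, session.TIMEOUT_WAIT_FOREVER)
        return job_id, info.exitStatus
    finally:
        # Always release the template, even if submission fails.
        session.deleteJobTemplate(template)


def run_on_cluster(command, args=()):
    """Open a real DRMAA session; needs drmaa-python and a DRM such as TORQUE."""
    import drmaa  # imported lazily so the sketch loads without the library

    with drmaa.Session() as session:
        return submit_and_wait(session, command, args)
```

Passing the session in as a parameter keeps the submission logic independent of any particular scheduler, which is the point of a scheduler-neutral API.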

Page 18: Cloud based NGS framework

PBS (Portable Batch System)

Computer software that performs job scheduling in a Unix cluster environment

Supported as a job scheduler by GRAM, a component of the Globus Toolkit

Originally developed for NASA

Available versions

• OpenPBS

• TORQUE: a fork of OpenPBS

• PBS Professional (PBS Pro): commercial

Page 19: Cloud based NGS framework

TORQUE

Distributed resource manager providing control over batch jobs and distributed compute nodes

The name stands for Terascale Open-source Resource and QUEue Manager

It holds configuration information for each slave node, such as the number of CPUs, number of cores, RAM size, and temporary storage, and allocates cluster resources when the scheduler makes a request.

Diagram: a Master node connected to Slave 1, Slave 2, and Slave 3, with storage shared over NFS

> qsub a.sh

The a.sh job is handed to a slave node as decided by the scheduler.
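To make the `qsub a.sh` step concrete, the sketch below generates the text of a small PBS job script and the matching `qsub` argument list. The `#PBS` directives used (`-N`, `-l nodes=...:ppn=...`, `-l walltime=...`, `-j oe`) are standard TORQUE options; the function names, job name, and command are assumptions for illustration.

```python
def make_pbs_script(job_name, command, nodes=1, ppn=1, walltime="01:00:00"):
    """Return the text of a PBS job script that runs `command`."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",                # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",   # nodes and cores per node
        f"#PBS -l walltime={walltime}",       # maximum run time
        "#PBS -j oe",                         # merge stderr into stdout
        "cd $PBS_O_WORKDIR",                  # start where qsub was invoked
        command,
    ]
    return "\n".join(lines) + "\n"


def qsub_command(script_path):
    """Build the argument list for submitting the script with qsub."""
    return ["qsub", script_path]
```

On a real cluster the script text would be written to a file such as `a.sh` on the NFS share and the list from `qsub_command("a.sh")` passed to `subprocess.run` on the master node. The resource request is what the scheduler matches against the per-node configuration described above.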

Page 20: Cloud based NGS framework

Virtualized Galaxy (Test-bed)


Page 21: Cloud based NGS framework

CLOUD COMPUTING


Page 22: Cloud based NGS framework

Cloud computing

Delivery of computing and storage capacity as a service to a heterogeneous community of end-recipients.

Page 23: Cloud based NGS framework


Page 24: Cloud based NGS framework

VPS (Virtual Private Server)

A term used by Internet hosting services to refer to a virtual machine in a cloud

Page 25: Cloud based NGS framework

AMAZON WEB SERVICES


Page 26: Cloud based NGS framework


Amazon EC2 (Amazon Elastic Compute Cloud)

Virtualization + Grid(Cluster) computing in a Cloud

Page 27: Cloud based NGS framework


Amazon EC2 (Amazon Elastic Compute Cloud)

Page 28: Cloud based NGS framework


Amazon EC2 (Amazon Elastic Compute Cloud)

Page 29: Cloud based NGS framework


Amazon EC2 (Amazon Elastic Compute Cloud)

Page 30: Cloud based NGS framework


Amazon S3 (Amazon Simple Storage Service)

Page 31: Cloud based NGS framework

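Aspera Connect Server

• Compared with FTP, transfer speed improves 3x to 5x on domestic connections and 5x to 1000x on overseas connections

• Also offered by major overseas bioinformatics sites such as 1000 Genomes and EBI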
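To get a feel for what such speed-up factors mean in practice, the arithmetic can be sketched as below. The baseline throughput of 100 Mbit/s and the 150 GB data size are assumptions chosen for illustration, not measurements, and the function name is hypothetical.

```python
def transfer_hours(size_gb, throughput_mbps, speedup=1.0):
    """Hours needed to move `size_gb` gigabytes at `throughput_mbps` megabits/s.

    `speedup` multiplies the baseline throughput, e.g. 5.0 for a 5x gain.
    """
    size_megabits = size_gb * 1000 * 8  # decimal gigabytes -> megabits
    seconds = size_megabits / (throughput_mbps * speedup)
    return seconds / 3600
```

Under these assumptions, `transfer_hours(150, 100)` gives about 3.3 hours for the baseline and `transfer_hours(150, 100, speedup=5)` about 40 minutes. Actual gains depend on latency and packet loss on the route, which is why the quoted range is so wide for overseas links.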

Page 32: Cloud based NGS framework

GALAXY CLOUDMAN

Copyright ⓒ Insilicogen, Inc. 2010. All rights reserved.

Page 33: Cloud based NGS framework


Page 34: Cloud based NGS framework

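Galaxy components

Main components of Galaxy

• Datasources: specify the input data. Data from a separate local system or from external websites can be registered.

• Tool: the smallest unit of analysis. In a local installation you can build and add the tools you need.

• History: the list of intermediate results produced as input data passes through a combination of tools.

• Workflow: a History yields new results simply by changing the input data and parameters, so it can be registered separately as a reusable process.

• Visualization: links analysis results to visualization tools.

• Page: a reporting feature that brings the elements above together.

Example: an Eprimer3 tool built and registered separately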

Page 35: Cloud based NGS framework

What a Galaxy tool is

Input format -> Tool -> Output format

A tool takes input data (in the required format), processes it, and produces output data (in the required format).

Combining tools yields a Workflow.
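The idea that tools with declared input and output formats combine into a workflow can be sketched in a few lines. This is a conceptual model only and not Galaxy's actual tool API; the `Tool` class, `build_workflow`, and the two toy tools are assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    """A named processing step with declared input and output formats."""
    name: str
    input_format: str
    output_format: str
    run: Callable[[str], str]


def build_workflow(tools: List[Tool]) -> Callable[[str], str]:
    """Chain tools after checking that each output format feeds the next input."""
    for upstream, downstream in zip(tools, tools[1:]):
        if upstream.output_format != downstream.input_format:
            raise ValueError(
                f"{upstream.name} outputs {upstream.output_format}, "
                f"but {downstream.name} expects {downstream.input_format}"
            )

    def workflow(data: str) -> str:
        for tool in tools:
            data = tool.run(data)
        return data

    return workflow


# Two toy tools (hypothetical, not real Galaxy tools).
fasta_to_seq = Tool("strip_header", "fasta", "txt",
                    lambda text: "".join(text.splitlines()[1:]))
seq_to_length = Tool("length", "txt", "tabular",
                     lambda seq: f"length\t{len(seq)}")
```

Calling `build_workflow([fasta_to_seq, seq_to_length])` returns a function that can be rerun on new input, which mirrors how a workflow is reused with different data and parameters. Chaining the tools in the wrong order fails early because the formats do not line up.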

Page 36: Cloud based NGS framework

Creating your own Galaxy


Page 37: Cloud based NGS framework

Primer design tool


Page 38: Cloud based NGS framework

Galaxy on Cloud


Using Amazon EC2 + S3

Select AMIs in Community AMIs

Page 39: Cloud based NGS framework

Galaxy on Cloud


Page 40: Cloud based NGS framework

Galaxy on Cloud


Page 41: Cloud based NGS framework

Galaxy on Cloud


Page 42: Cloud based NGS framework

Galaxy on Cloud


Page 43: Cloud based NGS framework

Galaxy on Cloud


Page 44: Cloud based NGS framework

Galaxy on Insilicogen


Galaxy localization on cluster

Tool development

Workflow development

Page 45: Cloud based NGS framework

CLOUD BASED NGS ANALYSIS SERVICE


Page 46: Cloud based NGS framework

HPC services provided on top of AWS (e.g., PacBio's SMRT)

Page 47: Cloud based NGS framework


Page 48: Cloud based NGS framework

30x human genome, 1 sample (150 GB): 5 million KRW (1 year of storage)

Page 49: Cloud based NGS framework

Received investment from Google and integrated the NCBI SRA service

Analysis can start online right away, without running an experiment

Page 50: Cloud based NGS framework

BGI's free analysis service. Currently focused on human data analysis; support for other species is planned from June.

Page 51: Cloud based NGS framework


Page 52: Cloud based NGS framework


Page 53: Cloud based NGS framework


Page 54: Cloud based NGS framework

A separate computer called the Bina Box is attached to the analysis instrument

Basic analysis is performed there, and the reduced data is then sent to the cloud

Page 55: Cloud based NGS framework


Page 56: Cloud based NGS framework


Genome-in-a-Day

Page 57: Cloud based NGS framework


Page 58: Cloud based NGS framework


Page 59: Cloud based NGS framework

CONCLUSION


Page 60: Cloud based NGS framework

Cloud based NGS analysis

No need to purchase hardware

Data acquisition, analysis, and service in the same space

Elastic computing power and storage

But data transfer remains a problem (Aspera, NAS box)

My Book Thunderbolt 6TB

Page 61: Cloud based NGS framework

Opportunity

Domestic analysis market expansion (PGM21, Teragen, …)

For large NGS analyses, we need more servers and storage

AWS is easier and cheaper

Customers want easy analysis and high-quality products

An easy-to-use web application is needed

With KT?

Page 62: Cloud based NGS framework

What can we do?


Customized/Advanced Analysis Service Positioning

Galaxy + IncoBook on the cloud

Specialized analysis pipeline on the cloud

Page 63: Cloud based NGS framework

www.insilicogen.com  E-mail: [email protected]  Tel: 031-278-0061  Fax: 031-278-0062