Actionable data in life sciences

Upload: jorge-boucas

Post on 22-Jan-2018

Page 1: Actionable data in life sciences

Jorge Bouças, Bioinformatics Core Facility, MPI-AGE, Köln. Sunday, 11 December 2016

Actionable data in life sciences

Page 2: Actionable data in life sciences

Performance

request for data analysis

reply with results

time

•  background / scientific question

•  metadata collection

•  data transfer

•  data analysis

•  validation

•  data transfer

Page 3: Actionable data in life sciences

Performance

request for data analysis

reply with results

time

•  background / scientific question

•  metadata collection

•  data transfer

•  data analysis

•  validation

•  data transfer

No build test

No integration test

Tailor-cut validation

Page 4: Actionable data in life sciences

Performance

request for data analysis

reply with results

time

•  background / scientific question

•  metadata collection

•  data transfer

•  data analysis

•  validation

•  data transfer

structured

in place

actionable

24/7

Page 5: Actionable data in life sciences

Performance

☑ Network

☑ Storage

☑ CPUs

☑ Memory

☑ Software

☑ Algorithms

☐ Human

Page 6: Actionable data in life sciences

Performance

☑ Network

☑ Storage

☑ CPUs

☑ Memory

☑ Software

☑ Algorithms

☐ Human

“Only 8.3 percent of positions for computer scientists can be filled without difficulty.”

http://www.golem.de

Page 7: Actionable data in life sciences

Performance

☑ Network

☑ Storage

☑ CPUs

☑ Memory

☑ Software

☑ Algorithms

☐ Human

[Figure: Data Science Venn diagram. Labels: Computer Science; Math & Statistics; Subject Matter Expertise / biology; Machine Learning; Trad. Research; Trad. Software; Unicorn; Data Science.]

Copyright 2014 by Steven Geringer Raleigh, NC. Permission is granted to use, distribute, or modify this image, provided that this copyright remains intact

Page 8: Actionable data in life sciences

Performance

☑ Network

☑ Storage

☑ CPUs

☑ Memory

☑ Software

☑ Algorithms

☐ Human

“… It appears that the development of effective human cooperation and the development of man-computer symbiosis are "chicken-and-egg" problems. It will take unusual human teamwork to set up a truly workable man-computer partnership, and it will take man-computer partnerships to engender and facilitate the human cooperation. …if the required solutions are not ready, it would not be good to wait for them.”

Licklider JCR, Clark WE, On-line man-computer communication, Proceedings of the May 1-3, 1962, Spring Joint Computer Conference

Page 9: Actionable data in life sciences

“On-line man-computer communication”

HPC

git

datashare

Page 10: Actionable data in life sciences

“On-line man-computer communication”

HPC

git

datashare

Berlin

Garching

Köln

Page 11: Actionable data in life sciences

“On-line man-computer communication”

HPC

git

datashare

Berlin

Garching

Köln

TAPE

in-house

curl / wget

md5sum

bit -g

www

rsync
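
The slide pairs curl / wget with md5sum, which suggests that downloaded files are checked against a checksum. A minimal sketch of that check in Python follows; the function names are hypothetical and this is not taken from the `bit` tool itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large result files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """True if the file's MD5 matches the checksum supplied by the sender."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

The result of `md5_of_file` is the same hex string that `md5sum <file>` prints in its first column, so either side of a transfer can produce the reference value.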

Page 12: Actionable data in life sciences

“On-line man-computer communication”

HPC

git

datashare

Berlin

Garching

Köln

results 8 kB .. 8 GB

private link 21d public link

write upload log on wiki with permalinks

push code

https://to.data

bit -i <myfile.txt> -m <code and data message>

customer

Page 13: Actionable data in life sciences

“On-line man-computer communication”

HPC

git

datashare

Berlin

Garching

Köln

results 8 kB .. 8 GB

private link 21d public link

write upload log on wiki with permalinks

push code

https://to.data

bit -i <myfile.txt> -m <code and data message>

customer

Binding of Results & Code

Page 14: Actionable data in life sciences

“On-line man-computer communication”

HPC

git

datashare

Berlin

Garching

Köln

results 8 kB .. 8 GB

private link 21d public link

write upload log on wiki with permalinks

push code

https://to.data

bit -i <myfile.txt> -m <code and data message>

customer

Binding of Results & Code

> 30 projects / 3 analysts

1 project:

> 1000 GB data

> 1000 files

> 1000 lines of code (with dependencies)

> 10-40 change actions
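
One way to picture the "Binding of Results & Code" idea is a single log record that ties an uploaded file to the code version that produced it. The sketch below is an illustration only: the record fields, the function names, and the reading of "21d" as a 21-day link lifetime are all assumptions, not the format that `bit` writes to the wiki.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import date, timedelta


@dataclass
class UploadLogEntry:
    """Hypothetical record linking one uploaded result to one code commit."""
    file_name: str
    md5: str
    commit: str
    message: str
    link: str
    link_expires: str


def make_log_entry(file_name, content, commit, message, link,
                   uploaded_on, link_days=21):
    """Build a log entry; link_days=21 is an assumed link lifetime."""
    expires = uploaded_on + timedelta(days=link_days)
    return UploadLogEntry(
        file_name=file_name,
        md5=hashlib.md5(content).hexdigest(),  # fingerprint of the result
        commit=commit,                         # code version that made it
        message=message,
        link=link,
        link_expires=expires.isoformat(),
    )


def as_wiki_line(entry):
    """Format the entry as one table row for an upload log page."""
    return ("| {file_name} | {md5} | {commit} | {link} "
            "(until {link_expires}) | {message} |").format(**asdict(entry))
```

With a record like this, anyone holding a result file can recompute its checksum, find the matching row, and check out the listed commit to see the code behind it.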

Page 15: Actionable data in life sciences

“On-line man-computer communication”

HPC datashare git

Page 16: Actionable data in life sciences

“On-line man-computer communication”

HPC datashare git

bit --start <DP_project_name>

Page 17: Actionable data in life sciences

“On-line man-computer communication”

HPC datashare git

bit --start <DP_project_name>

Page 18: Actionable data in life sciences

“On-line man-computer communication”

HPC datashare git

bit -i <myfile.txt> -m <code and data message>

Page 19: Actionable data in life sciences

“On-line man-computer communication”

HPC datashare git

bit -i <myfile.txt> -m <code and data message>

Page 20: Actionable data in life sciences

“On-line man-computer communication”

HPC datashare git

bit -c <folder_to_create>

Page 21: Actionable data in life sciences

“On-line man-computer communication”

HPC datashare git

bit -g <folder_or_file_to_download>

Page 22: Actionable data in life sciences

“On-line man-computer communication”

HPC HPC2

bit --sync <folder_or_file_to_sync> --sync_to <Uname@HPC2>

Page 23: Actionable data in life sciences

“On-line man-computer communication”

HPC HPC2

bit --sync <folder_or_file_to_sync> --sync_from <Uname@HPC2>
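
The two sync slides describe pushing a folder to a second HPC and pulling it back. Since rsync appears on an earlier slide, the sketch below shows how such a transfer could be expressed as an rsync call. It is an assumption that `bit --sync` works this way; the function name is hypothetical, and the sketch further assumes the same relative folder layout on both machines.

```python
import posixpath


def rsync_command(path, remote, direction="to"):
    """Return an rsync argument list that pushes to or pulls from a remote."""
    if direction not in ("to", "from"):
        raise ValueError("direction must be 'to' or 'from'")
    # No trailing slash on the source: rsync then copies the item itself
    # (file or folder) into the destination's parent folder.
    source = path.rstrip("/")
    parent = posixpath.dirname(source) or "."
    base = ["rsync", "-a", "-z"]  # -a: archive mode, -z: compress in transit
    if direction == "to":
        return base + [source, f"{remote}:{parent}/"]
    return base + [f"{remote}:{source}", f"{parent}/"]
```

The returned list can be handed to `subprocess.run(..., check=True)`; building it separately makes the command easy to inspect or log before anything is transferred.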

Page 24: Actionable data in life sciences

“On-line man-computer communication”

HPC git

bit --adduser

Page 25: Actionable data in life sciences

Garching HPC

Köln HPC

“On-line man-computer communication”

git

datashare

Berlin

Garching

results 8 kB .. 8 GB

private link 21d public link

write upload log on wiki with permalinks

push code

https://to.data

customer

user1

user2

user3

pull code

Page 26: Actionable data in life sciences

Garching HPC

Köln HPC

github.com/owncloud/pyocclient

datashare

Garching

results 8 kB .. 8 GB

private link 21d public link

Page 27: Actionable data in life sciences

github.com/owncloud/pyocclient

REST API

Page 28: Actionable data in life sciences

github.com/owncloud/pyocclient

Page 29: Actionable data in life sciences

github.com/owncloud/pyocclient

Page 30: Actionable data in life sciences

Why?

ownCloud: temporary HTTP link >> download. Simplicity.

GitHub

“With statement-by-statement compiling and testing and with computer-aided book-keeping and program integration, a few very talented men may be able to handle in weeks programming tasks that ordinarily require many people and many months.”

Licklider JCR, Clark WE, On-line man-computer communication, Proceedings of the May 1-3, 1962, Spring Joint Computer Conference

ownCloud + GitHub data & metadata management

Page 31: Actionable data in life sciences

Front-end

Page 32: Actionable data in life sciences

“Back-end”

register

http://www.mpcdf.mpg.de/userspace/forms/onlineregistrationform

Sys. Admin. (MPI-AGE)

GitHub (MPI-MOLGEN)

user

Page 33: Actionable data in life sciences

Performance

request for data analysis

reply with results

time

•  background / scientific question

•  metadata collection

•  data transfer

•  data analysis

•  validation

•  data transfer

bit

Page 34: Actionable data in life sciences

[b]ermuda [i]nformation [t]riangle

github.com/mpg-age-bioinformatics/AGEpy