Seattle Trip Report

Seattle Trip Report: Data Integration – Company Engagement – BigData. Denis C. Bauer | Research Scientist | 19 November 2012 | CMIS

Upload: denis-bauer

Post on 19-Jan-2015


Category:

Technology



DESCRIPTION

The primary goal of my trip to Seattle was to establish a collaboration with a world-leading group on data integration. Because Seattle is a hub for technology companies, I also learned about synergies between business and research: Ilya Shmulevich from the Institute for Systems Biology uses Amazon's "Random Forest" implementation and Google's 600,000-CPU cluster for cancer genomic association discovery. I also met with experts from the University of Washington and Microsoft Research to learn about technological advances for tackling BigData and commoditizing parallelization. Finally, I observed a government-funded research agency investing in solutions geared towards its enterprise structure rather than adopting solutions designed for research institutes without an active computational community. In conclusion, CSIRO has unique properties and skill sets that many collaborators would be interested in benefiting from. In return, such collaborations would propel CSIRO instantly to the forefront of technology, which could be very rewarding, particularly for the analysis of big, unstructured datasets.

TRANSCRIPT

  • 1. Seattle Trip Report: Data Integration, Company Engagement, BigData. Denis C. Bauer | Research Scientist | 19 November 2012 | CMIS

2. About me. BSc (Germany): Bioinformatics. Hons (ITEE, UQ): In Silico Protein Design, Machine Learning. PhD (IMB, UQ): Quantitative models of transcriptional regulation, Optimization. PostDoc (IMB, UQ): Sorting the intranuclear proteome, Bayesian Networks. PostDoc (QBI, UQ): Bioinformatics for the Sequencing Facility, Operation. Research Scientist (CSIRO): Data integration of Omics data in CRC; develop protocols for data generation; develop pipelines for analysis; research ways for data integration; pHealth (Garry Hannan).

3. Seattle: Future hub for life sciences?

4. Primary Goal: Collaboration with William Noble. Bayesian Network for automatic grouping of genomic functional elements (TSS, gene) by learning simultaneously from measured genomic features (histone modifications). Bill Noble, Michael Hoffman.

5. Segway: predictions. [Figure: histone modification tracks (H2M3, H3M4, H3M4); Bayesian Network; Train; Segmentation & Classification; Annotation]

6. Institute for Systems Biology: case study for BigData. TCGA has 20 different cancer types with up to 900 samples each. Faster computers. Better approaches. Amazon: machine learning method for uncovering multivariate associations from large and diverse data sets. Google: Use 10.000 600.000 cores and benefit from Google expertise in compute and storage. Ilya Shmulevich.

7. ISB App Engine Presentation at Google IO 2012: http://popcorn.webmadecontent.org/4d3

8.
Focusing on large-scale and tactile interactive experiences that engross and envelop the visitor, Philip Worthington (1977-) created Shadow Monsters, a digital version of the traditional shadow puppet.

9. Can CSIRO use outline-detection to do cool stuff?

10. Road Trip to Pacific Northwest National Laboratory (PNNL)

11-15. Road Trip to PNNL

16. Enterprise-wide multidisciplinary collaborations. PNNL predicts from sensor data if and when radioactive material hits ground water. Mathematical and visual prediction methods of compute-intensive expert systems. Ian's team develops a framework that allows enterprise-wide collaboration: data sharing/annotation/provenance; computational expert pipelines -> graphical programming -> domain experts; developed for computer-grid infrastructure. Ian Gorton.

17. Commoditize parallelization. Computer Science & Engineering, University of Washington. Currently: Expert-system if !(embarrassingly parallel). Deciding how to most efficiently bundle for parallel execution and how to resolve. The appropriate method can change with the actual load at runtime. Parallelization needs to become something the compiler at run time works out for us (just like we don't write assembly code anymore). SciDB; SKEWTUNE (better load for Hadoop); HaLoop (Iterative parallel Data Processing). Magdalena Balazinska.

18.
Commoditize parallelization (and visualization). HDInsight: Hadoop on Windows Server and Azure; integration with Excel. PowerView: interactive graphics.

19. Collaboration options. GS (Bill): Bayesian Network. ISB (Ilya): Variant association. CS (Magda): Iterative parallelization. PNNL (Ian): Graphical programming framework. Thank you. CMIS | Denis C. Bauer | Research Scientist | t +61 2 9325 3174 | E [email protected] | www.csiro.au/cmis