e research overview gahegan bioinformatics workshop 2010

eResearch: the evolution of science. Mark Gahegan, Center for eResearch, The University of Auckland

Page 1: E research overview gahegan bioinformatics workshop 2010

eResearch: the evolution of science

Mark Gahegan

Center for eResearch, The University of Auckland

Page 2:

Vannevar Bush, As We May Think (1945)

There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers - conclusions which he cannot find time to grasp, much less to remember, as they appear.

Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose…

…A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted. (Bush, 1945)

Page 3:

The data explosion (from Wired, ‘Big Data’, July 2008)

Terabytes          What it stores

1                  2,600 songs; a large hard disk ($200)
20                 Photos uploaded to Facebook every month
120                All the data collected by the Hubble telescope
330                Weekly data produced by the Large Hadron Collider (est.)
440                All the international climate / weather data compiled by the National Climatic Data Center in the USA
530                All the videos on YouTube
1000 (1 petabyte)  Data processed by Google servers every 72 minutes
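To compare the figures above on a common footing, they can be converted to a daily rate. The following is a minimal sketch that uses only the numbers quoted on the slide; the function name `terabytes_per_day` is illustrative and not part of the original talk.

```python
MINUTES_PER_DAY = 24 * 60


def terabytes_per_day(terabytes: float, period_minutes: float) -> float:
    """Convert 'terabytes per period' into terabytes per day."""
    return terabytes * MINUTES_PER_DAY / period_minutes


# Figures as quoted on the slide (Wired, July 2008).
google_tb_per_day = terabytes_per_day(1000, 72)               # 1 PB every 72 minutes
lhc_tb_per_day = terabytes_per_day(330, 7 * MINUTES_PER_DAY)  # 330 TB per week (est.)

print(f"Google: {google_tb_per_day:,.0f} TB/day")
print(f"LHC:    {lhc_tb_per_day:,.1f} TB/day")
```

On these figures, Google processed about 20 petabytes a day, more than 400 times the estimated daily output of the Large Hadron Collider.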

Page 4:

Sarah E. Fratesi (2008). Scientific Journals as Fossil Traces of Sweeping Change in the Structure and Practice of Modern Geology. Journal of Research Practice, Volume 4, Issue 1, Article M1.

Page 5:

Problems with Science

• The three pillars of Science
  – Communicable
  – Repeatable
  – Refutable

• Science efficiency
  – Share expensive facilities / equipment
  – Find, use, and understand relevant resources
  – Question assumptions and reasoning effectively

Page 6:

eResearch

Connectivity resources

Theories, concepts

Knowledge representation

Data: Observations, measurements, experiments

Instrumentation

Information: real-time, archives, analyses

Informatics resources

Models, simulations

Supercomputing

People

Collaboration, visualization, education resources

Awareness / Outreach

Education

Support / Enabling

Societal context

Science drivers

Global issues

Page 7:

Reproducible Science means context, quality, trust, easy access to the sources

Page 8:

Methods / workflows are scientific commodities

• Scripts, workflows, simulations, experimental plans, statistical models, ...

• Repeatable, reproducible, comparable and reusable research.

• Sharing propagates expertise and builds reputation.

http://myexperiment.org

Page 9:

Methods

Lab Books

Preprints

Data

Video

Blogs

Podcasts

Codes

Algorithms

Models

Presentations

Ontologies

Intermediate Results

Related Articles

Comments & Reviews

Plans

Models

Reproducible, or rather “fully supported”, Transparent science, Composite research components

Carole Goble, UK eScience

Page 10:

Methods

Lab Books

Preprints

Data

Video

Blogs

Podcasts

Codes

Algorithms

Models

Presentations

Ontologies

Intermediate Results

Related Articles

Comments & Reviews

Connections run both ways…

Carole Goble, UK eScience

Page 11:

Virtual Research Environments

Support for knowledge communities

Social networks of collaboration, use cases

Emergent trends and patterns

Page 12:

Example: GEON—the Geosciences Network

www.geongrid.org

Page 13:

3D Earthquake Modeling

Page 14:

Earthquake scenarios

Page 15:

Some challenges and consequences

• Bigger infrastructures: some institutionally focussed, some nationally focussed, some community focussed

• Who ‘OWNS’ our research: where is it physically housed? How is access managed?

• eResearch may also change the nature of the ‘Library’, the ‘Institution’ and even the ‘Academy’. Consider: Publish, Peer Review, Contribution, Tenure

Page 16:

What next for NZ? Aligning the research institutions around eResearch

Planning with MoRST for a long-term integrated landscape of HPC and eResearch, a National eResearch Infrastructure

What are the research needs, and which tools, applications, environments and computing capabilities will we need over the next 10 years?

Please get in touch if you would like to include your ideas and needs:

[email protected]

[email protected]

Page 17:

We_are@the_end

Questions, comments

Page 18:

Graphic Correlation Database

PGAP

Example 1: Fossils and climate: Paleo-Integration

(Community and data integration)

PaleoIntegration Project

Allister Rees, University of Arizona

Page 19:

How? Architecture, simplified

3-tier architecture:

Front - user interface (computer terminal, user-friendly search terms and tools)

Back - databases (schema, ontology coding - age, geography, content)

Middleware
- translates user-selected parameters for database searches
- keeps track of user selections (workflow), so a modified search doesn’t mean “starting over”
- routes user requests to different software components (e.g. data query, spatial data conversion), bringing results from multiple databases and tools together on one screen
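The three middleware responsibilities listed on this slide (translating user selections, remembering them, and routing requests to several components) can be illustrated with a small sketch. This is not the PaleoIntegration Project's code: the class `Middleware`, the two stand-in components, and the parameter names `age` and `region` are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Query = Dict[str, str]
Component = Callable[[Query], List[str]]


# Hypothetical back-end components; real ones would query actual databases.
def query_fossil_db(q: Query) -> List[str]:
    return [f"fossil records for {q['age']} in {q['region']}"]


def convert_spatial(q: Query) -> List[str]:
    return [f"paleo-coordinates for {q['region']} at {q['age']}"]


@dataclass
class Middleware:
    components: Dict[str, Component]
    history: List[Query] = field(default_factory=list)

    def translate(self, selections: Dict[str, str]) -> Query:
        """Turn user-facing selections into normalized query parameters."""
        # Start from the previous query so a modified search
        # does not mean "starting over".
        query = dict(self.history[-1]) if self.history else {}
        query.update({k.lower(): v.strip() for k, v in selections.items()})
        return query

    def search(self, selections: Dict[str, str]) -> Dict[str, List[str]]:
        """Route one request to every component and gather the results."""
        query = self.translate(selections)
        self.history.append(query)  # keep track of user selections (workflow)
        return {name: run(query) for name, run in self.components.items()}


mw = Middleware({"data query": query_fossil_db,
                 "spatial conversion": convert_spatial})
first = mw.search({"Age": "Late Jurassic", "Region": "Tendaguru"})
second = mw.search({"Region": "Morrison"})  # the age is carried over
print(second["data query"])
```

The second search changes only the region; the middleware reuses the earlier age selection, and the results from both components come back together in one dictionary, standing in for "one screen".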

Page 20:

Integration of various data, datasets and databases

Download search results, analyze and interpret data

Fossil collection and publication

Publish new results and interpretations?

Early Jurassic Climates, Vegetation, and Dinosaur Distributions

Page 21:

Paleobiology Database (PBDB)

Paleomap Project

LATE JURASSIC PLANT DIVERSITY

Page 22:

Paleogeographic Atlas Project (PGAP)

Oil Source Rocks Dataset (OSR)

Paleomap Project

LATE JURASSIC COALS AND EVAPORITES

Page 23:

Dinosauria Dataset (DINO)

Paleomap Project

LATE JURASSIC DINOSAURS

Page 24:

TENDAGURU

MORRISON

Climate / biome reconstruction

Page 25:

Example 2: Earthquake simulation

(data integration & HPC)

GEON SYNSEIS Integration Platform

Dogan Seber, SDSC

Internal and External Datasets

Subsurface Model

Seismic

Gravity

Magnetic

GEON portal and HPC Environment

Simulation, Analyses and Integration

Scientific Discoveries