The iPlant Collaborative
iPToL Data Assembly Workshop
November 21st, 2009
Steve Goff, Sonya Lowry, Martha Narro, Dan Stanzione
University of Arizona, Texas Advanced Computing Center
What is the iPlant Collaborative?
• iPlant’s nature:
– an organization that enables new conceptual advances through integrative, computational thinking
– an organization that is by, for, and of the community
– a service-oriented project, not a research project (creates CI in support of plant science research, but does not perform research outside of prototyping & testing)
• iPlant’s mission:
– address an evolving array of plant science grand challenge questions
– to enable the research community to identify the major problems in plant sciences, then to develop & pursue needed CI solutions
What is the process for identifying GC questions?
• Encourage and assist the community in organizing grand challenge workshops, forming grand challenge teams and developing grand challenge ‘white papers’ (‘proposals’)
• Community-representative Board of Directors (drawn from community nominations) evaluates ‘proposals’ and teams and makes recommendations for priorities
• iPlant leadership team decides whether & how to implement the Board’s recommendations and assists GC team leads in assessing needs, specifying requirements and designing ‘Discovery Environments’ to serve the team’s and the broader community’s needs
Biological questions will drive cyberinfrastructure design
• Phylogenetic relationships among species
– Building large phylogenetic trees (species and gene)
– Understanding Green Plant species relationships
– Understanding gene family evolution
– Addressing taxonomic problems and concepts
– Facilitating understanding of evolution – form & function
– Facilitating understanding of evolutionary processes
• Phenotype-Genotype relationships
Biological questions will drive cyberinfrastructure design
• Phylogenetic relationships among species
• Phenotype-Genotype relationships
– Sharing, accessing, and integrating datasets
– Analysis and extraction of information and patterns (automated phenotyping, imaging, etc.)
– Identifying complex relationships; networks & systems
– Assigning functions to genes, networks & systems
– Integration of phenological & ecological data with networks and systems of genes, proteins, etc.
– Understanding responses to environmental changes & stresses (including climate) - natural and ‘ag’ ecosystems
The iPlant Collaborative
[Organization chart; box labels:]
• Internal Advisory Board
• Science Opportunities Teams
• Administrative Support Team
• Education, Outreach, and Training Team
• Cyberinfrastructure Development Team
• Executive Team
• National Science Foundation
• Board of Directors
• Community Grand Challenge Teams
Executive Team
• Co-Director: Steve Goff
• Co-Director: Dan Stanzione
Cyberinfrastructure Development Team
• Project Managers: Karla Gendler, Michael Gonzales
• Lead Developer: Sonya Lowry
• Semantic Web Architect: Damian Gessler
• CI Team
• Phylogenetics Engagement Team Lead: Sheldon Mckay
• Gen2Phen Engagement Team Lead: Matt Vaughn
• CI Advisory Team: Greg Andrews, Sudha Ram, Nirav Merchant, Lincoln Stein, Doreen Ware
• IT/Infrastructure: Edwin Skidmore
• Developers, Systems Staff, Research Scientists at ASU, CSHL, UA, UT
Scope: What iPlant won’t do
• iPlant is not a funding agency
– A large grant shouldn’t become a bunch of small grants
• iPlant will not fund data generation
• iPlant will (probably) not fund <favorite tool x>
– Whose funding is ending
• iPlant will not replace all online databases
• iPlant will not *impose* community standards
Scope: What iPlant *will* do
• Provide storage, computation, hosting, & programmer effort to support GC projects
• Work with community to support & develop standards
• Provide forums to discuss the role and design of CI in plant science
• Help organize the community to collect data
• Provide appropriate funding for time spent helping us design and test the CI
GC Projects to Date
• 2 Grand Challenges
• 11 Working Groups
• Participation from ~45 scientists from ~25 institutions beyond the original iPlant team
iPToL
• Final Deliverable
– A web (interface) environment allowing the scientific community to create, access, share, annotate, and visualize phylogenetic tree(s) of varying size and complexity. Included in this environment are the software tools, as well as the infrastructure to host, process, analyze, and store this information.
• 6 working groups
iPToL Working Groups
• Data Assembly
– Assembling the data to produce the 500k taxa tree
• Big Trees
– Providing the methods (and the tree) to produce a 500K taxa tree
• Trait Evolution
– Providing methods to relate phylogenetic trees to the evolution of specific traits
• Tree Reconciliation
– Developing tools for inferring gene family histories in the context of species trees
• Data Integration
– Combining data from different sources to initially meet the needs of the iPToL working groups
• Tree Visualization
– Developing tools to visualize large trees and annotations
iPToL WG Membership
iPG2P
• Final deliverable:
– Procedure allowing an investigator to begin with trait of interest in species possessing limited genetic resources and progress toward ability to predict trait scores for known genotypes in given, non-constant environments
• Identifying cross-cutting biological use cases to be addressed by working groups
– Ex: develop informatics tools to reveal regulatory networks underlying photosynthetic differentiation in C3 and C4 plants
• 5 working groups
iPG2P Working Groups
• NextGen Sequencing
– Establishing an informatics pipeline that will allow the plant community to process NextGen sequence data
• Statistical Inference
– Developing a platform using advanced computational approaches to statistically link genotype to phenotype
• Modeling Tools
– Developing a framework to support tools for the construction, simulation and analysis of computational models of plant function at various scales of resolution and fidelity
• Visual Analytics
– Generating, adapting, and integrating visualization tools capable of displaying diverse types of data from laboratory, field, in silico analyses and simulations
• Data Integration
– Investigating and applying methods for describing and unifying data sets into virtual systems that support iPG2P activities
iPG2P WG Membership
Additional Efforts (in progress or under consideration)
• Image Analysis Platform – Edgar Spalding, BS Manjunath, Kris Kvilekval, Justin Borovitz, Steve Welch, Ed Buckler
• Semantic Web – Damian Gessler
• APWeb2 –
• Taxonomic Intelligence –
Why Evolution of Plants is more Exciting than Evolution of Humans
www.iplantcollaborative.org
versus
Discussion