Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Upload: tristan-bowes

Post on 01-Apr-2015

Page 1: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Bioinformatics and Natural Computing

DISCo Departmental Workshop, 2010-06-03

Page 2: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

DISCo UNIMIB Departmental Workshop

Outline

• BIMIB: BIoinformatics MIlano Bicocca
• Research areas and new directions
• People
• Cooperations

http://bimib.disco.unimib.it

Page 3: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Research areas and directions

• Sequence Analysis: Motif Finding, SNP classification, Haplotyping, Alternative Splicing Prediction
• Statistical Analysis of Biological Experiments: Association Studies, Microarray Analysis, Clustering, Redescriptions
• Algorithmics: Approximation Algorithms for Combinatorial Problems in Computational Biology (MAST, LCS, Fingerprint clustering …)
• Biomedical Ontologies: Collaborative Association Studies, Phenotype Ontology Development
• Natural Computing: Theory and applications of Membrane Systems, Splicing Systems and Formal Languages, DNA Word Design, Evolutionary computing
• Systems Biology: Models of biological systems, Stochastic Simulation of Biochemical Processes, Data Mining

Page 4: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Natural Computing

Page 5: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Natural computing

• The work conducted in this area concerns the study of models of computation that are inspired by nature
• The most important research lines that the BIMIB group is pursuing are centered on
  – DNA computing
  – Membrane systems
  – Evolutionary and Genetic computing

Page 6: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Natural Computing: basic research

• Much of the research done in these areas can be characterized as theoretical computer science, where questions of decidability, computational complexity and expressive power are paramount
• In particular:
  – Relations with languages in the usual Chomsky hierarchy
  – Comparison with other computational models
  – Complexity aspects related to time and space resources
  – Application of the model to the solution of computationally hard problems
  – Fitness-driven Importance Sampling techniques for evolutionary algorithms
  – Operators-Driven Distance Measures

Page 7: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Natural Computing: applications

• Some applications include:
  – Description of cellular phenomena or cellular structures (e.g., Mechanosensitive channels, Sodium-Potassium pump, …)
  – Analysis of the behaviour of complex systems, by means of stochastic models
  – Design of software simulators to return meaningful information to biologists
  – Automatic assessment of systems biology parameters
  – Automatic mining of microarray datasets

Page 8: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Bioinformatics

Page 9: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Bioinformatics: sequence analysis applications

• One of the major applications of informatics to molecular biology is the use of string analysis algorithms to study nucleic acid and protein sequences

Page 10: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Bioinformatics: sequence analysis applications

• Alternative splicing prediction
  – Alternative splicing (AS) is considered one of the main mechanisms able to explain the huge gap between the number of predicted genes and the high complexity of the human proteome.
  – The main goal is the development of fast and reliable computational tools for analyzing and predicting AS from Expressed Sequence Tags (ESTs) and other genomic data
  – ASPIC (Alternative Splicing PredICtion) tool

Page 11: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Bioinformatics: sequence analysis applications

• Approximate Pattern Discovery
  – Given a set of nucleotide or protein sequences, find all the motifs or conserved patterns, i.e.:
    • All patterns that occur (with a maximum allowed number of mutations, insertions or deletions) in every sequence of the set
    • All patterns that occur (as above) in a “surprisingly” high number of sequences
    • The pattern “closest” to the sequences under some distance measure
  – Pattern discovery: The WeederWeb System
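
As a rough illustration of the first formulation on the slide above (patterns occurring in every sequence with a bounded number of mutations), here is a minimal Python sketch. It is a brute-force enumeration that allows substitutions only (no insertions or deletions), and it is not the algorithm used by WeederWeb; the function names `hamming`, `occurs` and `common_motifs` are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(pattern, sequence, max_mismatches):
    """True if some window of `sequence` is within `max_mismatches` of `pattern`."""
    k = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def common_motifs(sequences, k, max_mismatches, alphabet="ACGT"):
    """All length-k patterns that occur, up to `max_mismatches`
    substitutions, in every sequence of the set (exhaustive search,
    so only practical for small k)."""
    motifs = []
    for letters in product(alphabet, repeat=k):
        pattern = "".join(letters)
        if all(occurs(pattern, s, max_mismatches) for s in sequences):
            motifs.append(pattern)
    return motifs


if __name__ == "__main__":
    toy = ["ACGTACGT", "TTACGTTT", "GGACGAGG"]  # made-up sequences
    print(common_motifs(toy, k=4, max_mismatches=1))
```

The exhaustive loop over all |alphabet|^k candidates keeps the sketch short; practical tools prune this search space.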

Page 12: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Bioinformatics: sequence analysis applications

• Phylogenetic Reconstruction and Comparison
  – Computational complexity and algorithmic solution of optimization problems derived from specific instances of the more general problem of comparing phylogenies (or evolutionary networks) to combine them into a single representation (i.e. an evolutionary tree or network).
  – A basic problem we investigate in comparative phylogenetics is the reconciliation (or inference) of a species tree from gene trees

Page 13: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Bioinformatics: sequence analysis applications

• Haplotype Inference (HI) and Genetic Variation Analysis
  – Design and experimentation of algorithms for solving combinatorial problems related to haplotype inference and genetic variation analysis.
  – Specific computational problems of interest are:
    • inferring the complete information on haplotypes from (incomplete or partial) haplotypes or genotypes
    • efficient reconstruction of the perfect phylogeny describing the evolutionary history of Single Nucleotide Polymorphism (SNP) data in the presence of recurrent mutations
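
For binary SNP data without recurrent mutations, a standard baseline check is the four-gamete test: a set of haplotypes admits an (unrooted) perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10 and 11. The sketch below, with hypothetical function names, reports the violating site pairs, which is where recurrent mutation (or recombination) would have to be invoked. It illustrates only this baseline check, not the group's algorithms.

```python
from itertools import combinations


def four_gamete_violations(haplotypes):
    """Return the pairs of site indices (i, j) at which all four
    gametes 00, 01, 10, 11 are observed.

    `haplotypes` is a list of equal-length strings over {'0', '1'}
    (an assumed encoding: '0' ancestral or reference allele, '1' the other).
    """
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # binary alphabet, so 4 means all combinations
            violations.append((i, j))
    return violations


def admits_perfect_phylogeny(haplotypes):
    """True if the binary haplotypes pass the four-gamete test."""
    return not four_gamete_violations(haplotypes)


if __name__ == "__main__":
    print(admits_perfect_phylogeny(["000", "100", "110", "111"]))
    print(four_gamete_violations(["00", "01", "10", "11"]))
```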

Page 14: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Statistical Data Analysis of High Throughput Data

Page 15: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Statistical Data Analysis of Biological Experiments

• The amount of data generated by high-throughput (non-sequencing) biotechnology apparatuses is huge
  – Microarray
  – microRNA
  – Proteomic machinery (cf. mass spectrometry)

Page 16: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Statistical Data Analysis of Biological Experiments

• Statistical methods of various kinds are necessary to validate hypotheses and perform data mining operations
• The research pursued by the group in this area concentrated on
  – Time course data analysis with kernel methods and evaluation of ontological “enrichments”
  – Multiple data sources integration for mass-spectrometry data with mutual information scoring
  – Application of Evolutionary and Genetic computing for the assessment of features (biological markers and combinations of biological markers) in gene assays
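
To make "mutual information scoring" concrete, the following minimal sketch computes the empirical mutual information (in bits) between two discretized measurement vectors. The slides do not describe the group's actual scoring pipeline, so the function name and the assumption that the data are already discretized are illustrative only.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits between two
    equal-length sequences of discrete (hashable) values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # I(X;Y) = sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) p(y)))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


if __name__ == "__main__":
    # Two made-up binarized profiles
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

A score near zero suggests the two sources carry little shared information; identical profiles score the entropy of the profile itself.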

Page 17: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Biomedical Ontologies Engineering

Page 18: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Biomedical Ontologies

• The need for common vocabularies and “ontologies” used to label and/or model data has been recognized as a cornerstone of community research by biologists and physicians
• The BIMIB group worked on using ontologies for two applications
  – Enrichment studies (cf. statistical analysis)
  – Definition of new ontologies for clinical applications and genotype-phenotype associations
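
One common way to score the enrichment of an ontology term in a gene list is a one-sided hypergeometric test. The sketch below assumes that formulation; the slides do not say which statistic the group used, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background set
    annotated:  background genes annotated with the ontology term
    sample:     size of the gene list under study
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


if __name__ == "__main__":
    # Made-up numbers: 10 background genes, 5 annotated, list of 5, all 5 hit
    print(enrichment_p_value(10, 5, 5, 5))
```

In practice such p-values are computed for many terms at once, so a multiple-testing correction would normally follow.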

Page 19: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Biomedical Ontologies: NeuroWEB

• The NeuroWEB project was concluded in 2009
  – The aim of the NEUROWEB project was to support association studies in the field of neurovascular medicine, with a special commitment to genotype-phenotype relations
  – In particular, in the NEUROWEB project, the phenotype was formulated on the basis of the patients’ clinical data, eventually leading to the comprehensive assessment of the patients’ pathological state

Page 20: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Biomedical Ontologies: NeuroWEB

• Three main ontological layers (10 Top Phenotypes, ~200 Low Phenotypes, ~300 Core Data Set elements), organized in taxonomies
• A set of ontological relations (17 object properties) to:
  – Connect the leaves of the three layers
  – Enable complex phenotype construction
• Accessory layers (anatomical parts, quantitative/qualitative attributes, …)

Page 21: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Biomedical Ontologies: NeuroWEB

[Figure: diagram of the NeuroWEB layers CDS, LOW PHENOTYPE and TOP PHENOTYPE, connected by OntoRelations]

Page 22: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Systems Biology: Simulation and Analysis

Page 23: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Simulation of biological systems

• Systems biology is the study of the emergent properties of a biological system, once it is modeled (and simulated) as a set of interacting parts
• Different kinds of simulations are possible
  – Deterministic (differential equations)
  – Stochastic (Gillespie’s algorithm, a Monte Carlo method)
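
The stochastic option can be sketched with Gillespie's direct method: draw an exponentially distributed waiting time from the total propensity, then pick which reaction fires with probability proportional to its propensity. The example below applies it to a Michaelis-Menten scheme (E + S <-> ES -> E + P); the rate constants, initial counts and all names are assumptions chosen for illustration, not values from the group's work.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def gillespie(state, reactions, t_end, rng):
    """Gillespie's direct method.

    state:     dict mapping species name -> molecule count
    reactions: list of (propensity_function, change) pairs, where
               `change` maps species to the integer change of one firing
    Returns a list of (time, state) snapshots up to t_end.
    """
    state = dict(state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        cumulative = list(accumulate(prop(state) for prop, _ in reactions))
        total = cumulative[-1]
        if total <= 0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        # Choose reaction i with probability propensity_i / total
        index = bisect_right(cumulative, rng.random() * total)
        for species, delta in reactions[index][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical stochastic rate constants for E + S <-> ES -> E + P
C_BIND, C_UNBIND, C_CAT = 0.01, 0.1, 0.5

MICHAELIS_MENTEN = [
    (lambda s: C_BIND * s["E"] * s["S"], {"E": -1, "S": -1, "ES": +1}),
    (lambda s: C_UNBIND * s["ES"], {"E": +1, "S": +1, "ES": -1}),
    (lambda s: C_CAT * s["ES"], {"E": +1, "P": +1, "ES": -1}),
]

if __name__ == "__main__":
    start = {"E": 10, "S": 50, "ES": 0, "P": 0}
    run = gillespie(start, MICHAELIS_MENTEN, t_end=5.0, rng=random.Random(0))
    print(len(run), run[-1])
```

Each run is one random trajectory; distributions and noise levels are estimated by repeating the simulation with different seeds.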

Page 24: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Stochastic Simulation

• The modeling formalism:
  – Membrane (P) systems
• The simulator
  – C language
  – Desktop PC
  – DISCo and CINECA clusters, with an MPI implementation
  – Algorithm: modified Gillespie’s algorithm with τ-leaping
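
For orientation, plain τ-leaping advances time by a step τ and fires each reaction a Poisson-distributed number of times with mean propensity × τ, instead of simulating every event. The Python sketch below shows that basic idea only; it is not the group's modified algorithm or their C/MPI simulator. The fixed τ, the step-halving rule used to avoid negative counts, the toy reaction A <-> B and all names are assumptions.

```python
import math
import random


def poisson(lam, rng):
    """Poisson sample via Knuth's multiplication method (small lam only)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def tau_leap(state, reactions, t_end, tau, rng):
    """Fixed-step tau-leaping.

    reactions: list of (propensity_function, change) pairs, as in a
    standard stochastic simulation. A leap that would drive any count
    negative is rejected and retried with half the step.
    """
    state = dict(state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while t < t_end:
        step = min(tau, t_end - t)
        while True:
            proposal = dict(state)
            for propensity, change in reactions:
                # Propensities are evaluated at the start of the leap
                firings = poisson(propensity(state) * step, rng)
                for species, delta in change.items():
                    proposal[species] += delta * firings
            if all(count >= 0 for count in proposal.values()):
                break
            step /= 2  # assumed safeguard: retry with a smaller leap
        state = proposal
        t += step
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical reversible isomerization A <-> B
ISOMERIZATION = [
    (lambda s: 1.0 * s["A"], {"A": -1, "B": +1}),
    (lambda s: 0.5 * s["B"], {"A": +1, "B": -1}),
]

if __name__ == "__main__":
    run = tau_leap({"A": 100, "B": 0}, ISOMERIZATION, 2.0, 0.1, random.Random(1))
    print(run[-1])
```

The trade-off is speed against accuracy: larger τ means fewer steps but a cruder approximation of the exact event-by-event simulation.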

Page 25: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Studying stochasticity in biological systems

• 2 kinds of noise:
  – intrinsic noise - due to the inherent nature of the biochemical interactions
  – extrinsic noise - due to the external environmental conditions
• Complex systems such as biological ones are non-linear and often exhibit many steady states, bifurcations or chaotic behavior

Page 26: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Stochastic simulations: applications

• Molecular and cellular scale:
  – transport proteins
    • Na+/K+ pump, Ca2+ channels, mechanosensitive channels
  – chemical reactions
    • Belousov-Zhabotinsky, Michaelis-Menten
  – cellular signaling pathways
    • EGFR, Ras/cAMP/PKA
  – bacterial colonies
    • Vibrio fischeri, Pseudomonas aeruginosa

Page 27: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Biological systems simulations: Colorectal Crypts

Three-dimensional schematic of a crypt in the mouse small intestine. The positions of the individual cells show how things might look in a typical crypt. The Paneth cells tend toward the bottom, where they contribute to innate immunity by responding to bacterial infection (Ayabe et al. 2000). The numbers on the cells show the transit cell generation i, as in the Ti of Figure 12.6. The stem cells vary in actual cellular position in the range 3–7, but on average appear to be around cell position 4 when numbered from the bottom. The figure only shows the bottom 7 cell positions of the approximately 15 positions. CSC abbreviates "clonogenic stem cell" (see Figure 12.6). Redrawn from Marshman et al. (2002). Copied from NCBI Frank’s online book

Page 28: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

People BIMIB DISCo

• Marco Antoniotti
• Paola Bonizzoni
• Claudio Ferretti
• Alberto Leporati
• Giancarlo Mauri
• Raffaella Rizzi
• Leonardo Vanneschi
• Claudio Zandron
• Italo Zoppis

• Roslyn Sagaya Mary Antonath
• Stefano Beretta
• Mauro Castelli
• Paolo Cazzaniga
• Gianluca Colombo
• Antonella Farinaccio
• Luca Manzoni
• Dario Pescini
• Yuri Pirola
• Antonio Enrico Porreca
• Andrea Valsecchi

Page 29: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Other People

• Francesco Archetti, DISCo
• Enza Messina, DISCo
• Enzo Martegani, BtBs
• Marco Vanoni, BtBs
• Riccardo Dondi, Un. Bergamo
• Gianluca Della Vedova, Statistica, UNIMIB
• Daniela Besozzi, Un. Milano
• Giulio Pavesi, Un. Milano
• Graziano Pesole, Un. Bari
• Mario Giacobini, Un. Torino
• Paolo Provero, Un. Torino
• Manuela Gariboldi, IFOM-IEO
• James Reid, IFOM-IEO
• Luciano Milanesi, ITB CNR
• Marco Pierotti, Istituto Nazionale dei Tumori
• Giovanna Castoldi, Medicina, UNIMIB
• Fulvio Magni, Medicina, UNIMIB

Page 30: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Other People International

• Daniele Merico – Un. Toronto, Toronto, Canada
• Gary Bader – Un. Toronto, Toronto, Canada
• Bud Mishra – NYU, New York, USA
• Naren Ramakrishnan – Virginia Tech, Blacksburg, VA, USA
• Victor Moreno – ICOncologia, Barcelona, Spain
• Miguel-Angel Pujana – ICOncologia, Barcelona, Spain
• Laura Slaughter – Norwegian University of Science and Technology (NTNU), Norway
• Aristotelis Chatzioannou – EIE, Athens, Greece
• Viktor Malyshkyn – Center for Supercomputing, Russian Academy of Sciences, Novosibirsk, Russia

Page 31: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Conferences and Workshops

• Signs Symptoms and Findings Workshop 2009, September 2009, Milan, Italy

Page 32: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

International cooperation

• BIMIB DISCo is the institutional contact point for all initiatives concerning the EC Virtual Physiological Human Network of Excellence (www.vph-noe.eu)

Page 33: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Funding

• Ongoing
  – FAR
  – EnviGP - Improving Genetic Programming for the Environment and Other Applications, Programa Operacional Factores de Competitividade, Fundação para a Ciência e a Tecnologia (FCT), Portugal (PTDC/EIA-CCO/103363/2008)
  – ProteomeNet - Rete Nazionale per lo studio della proteomica umana, FIRB
• Pending
  – EU FP7 ICT Virtual Physiological Human
    • CRControl (coordinator)
    • BioBridge (partner)
  – Regione Lombardia, Programma ASTIL
  – Regione Lombardia, Programma Quadro/Università
  – PRIN 2009

Page 34: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

Publications

• All publications authored by BIMIB affiliates and collaborators are listed on the group web site and on the digidisco platform

http://bimib.disco.unimib.it/index.php/Special:Publications/en

Page 35: Bioinformatics and Natural Computing DISCo Departmental Workshop 2010-06-03

THANK YOU
