investigating questions in biology using computational ... filemosquito human mosquito liver rbc...

104
Investigating questions in biology using computational approaches Group Leader MRC Laboratory of Molecular Biology, Cambridge M. Madan Babu

Upload: others

Post on 23-Sep-2019

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Investigating questions in biology using computational approaches

Group LeaderMRC Laboratory of Molecular Biology, Cambridge

M. Madan Babu

Page 2: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

At what level is this talk pitched at?

“Biologically” inclined

“Com

puta

tiona

lly”i

nclin

ed

Development of methodsAlgorithms, programs, etc

Uncovering general principlesDiscovery using computational approaches

Prioritising experimentsInterpreting experimental results

Page 3: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Outline

• Introduction to resources and tools (10 minutes)

• A case study to highlight data integration (10 mins)

• Specific questions (25 mins)

Page 4: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

How can I know more about a gene?

Knowing more about a gene is like trying toobtain all possible information about a

suspect (as in a murder case) or a person (as in you are interested in someone ☺)

Treat it like investigating a case!

Page 5: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Treat it like investigating a case!(suspect)

What is the suspect’s name?What does the suspect look like?

Where does the suspect live?Where does the suspect work?

What does the suspect do?Who does the suspect interact with?What is the ancestry of the suspect?

Page 6: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Treat it like investigating a case!(gene)

What is the sequence of the protein?What is the structure of the protein?Where in the genome is it encoded?

Which tissues are the genes expressed?Which cellular compartment does it reside in?

What is the function of the protein? Who are the interacting partners of the protein?

What is its evolutionary history?

Page 7: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Information retrieval(think of google)

Database

Query

Search tool

Ranking, statistics

Output

How can one extract information?

Page 8: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Explosion of information about living systems

Major challenge – How to exploit this information?

Expression 20,000 different conditions150 organisms (SMD, GEO, ArrayExpress)

Interaction 100,000 interactions30 organisms (Bind, DIP, publications)

Structure 50,000 structures from 10,000 organisms (PDB, MSD)

Sequence 45,000,000 sequences from 160,000 organisms (Genbank, NCBI)

Page 9: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Sequence

Sequence database (NCBI, ENSEMBL)

Query sequence

BLAST, PSI-BLAST, etc

E-value, score, etc

SequenceAlignment

Structure

Structure database (PDB)

Query structure

DALI, SSM, VAST, etc

p-value, score, etc

StructureAlignments

Page 10: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Expresion

Expression database (GEO, BioGPS)

Query gene

Pearson correlation coefficient

Correlation coefficient

Other transcripts withSimilar expression profile

Network

Functional networks, Protein interaction network, pathway

database, etc

Gene list

Sub-graph extraction

P-value, enrichments

Network of functional associationFunctional enrichment

Page 11: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

What is the list of databases that is currently available?

Question #1

Page 12: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.oxfordjournals.org/nar/database/c/

1230 selected databases covering various aspects of molecular and cell biology

Page 13: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.expasy.ch/links.html

Page 14: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://database.oxfordjournals.org/

Page 15: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

What is the list of tools, web-servers and programs that are

currently available?

Question #2

Page 16: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

0ver 1500 selected tools covering various aspects of molecular and cell biology

http://bioinformatics.ca/links_directory/

Page 17: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.expasy.ch/tools/

Page 18: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://toolkit.tuebingen.mpg.de/sections/search

Page 19: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Outline

• Introduction to resources and tools (10 minutes)

• A case study to highlight data integration (10 mins)

• Specific questions (25 mins)

Page 20: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Mosquito Human

Mos

quito

LiverR

BC

www.cdc.gov

Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription factors in Plasmodium

5300 genes with over 700 metabolic enzymes

Extensive complement of chromosomal regulatory proteins

Extensive complementsignaling proteins (GTPases, kinases)

Large number of genes Complex life cycle

The Problem!How does this pathogen regulate gene expression?

Page 21: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Alternative regulatory mechanismsChromatin-level regulation

Post-translational modificationRNA based regulation

Undetected transcription factorsDistantly related or unrelated

to known DNA binding domains

Possible explanations for the paradoxical observation

Proteome of Plasmodium

Profiles & HMMs of known DBDs

bZIP

Homeo

MADs

AT-hook

Forkhead

ARID

PF14_0633

+ ?

AT-Hook

SEG

UncharacterizedGlobular domain

~60 aaThe suspect!

Page 22: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Characterization of the globular domain – sequence analysis I

Non-redundant database

+

...

..

..

Lineage specific expansion in Apicomplexa

Plasmodium falciparumPlasmodium vivax

Cryptosporidium parvum

Theileria annulata

Cryptosporidium hominis

Profiles + HMMof this region

Non-redundant database

+

Floral Homeotic protein Q(Triticum)

49L, an endonuclease (X. oryzae phage Xp10)

Globular region maps to AP2 DNA-binding domain

Page 23: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Non-redundant database

+AP2 DNA-binding

Domain fromD. Psychrophila

DP2593

MAL6P1.287(Plasmodium falciparum)

Cgd6_1140/Chro.60146(Cryptosporidium)

AP2 DNA-binding domain maps to the Globular region

Characterization of the globular domain – sequence analysis II

Multiple sequence alignment of all globular

domains

JPRED/PHD

Sequence of secondary structure is similar to the AP2 DNA-binding domain

Homologs of the conserved globular domain constitutes a novel family of the AP2 DNA-binding domain

S1 S2 S3 H1

Page 24: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Characterization of the globular domain – structural analysis I

A. thaliana ethylene response factor(ATERF1 - 1gcc – NMR structure)

Binds GC rich sequences

S1 S2 S3 H1

S1 S2 S3 H1

Predicted SS of ApiAP2

SS of ATERF1

S1 S2 S3 H1

12 residues show a strong pattern of conservation andthese are involved in key stabilizing hydrophobic

interactions that determine the path of the backbonein the three strands and helix of the AP2 domain

Core fold of the ApiAp2 domainwill be similar to the plantAP2 DNA-binding domain

Page 25: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Characterization of the globular domain – structural analysis II

Y186

R147

K156

T175

R170

W154

R152

R150

E160

W172

W162

G5

C6C7

G21

G20

G18

G17

R152 --- G5 (oxo group)D/N --- A (amino group)

R150 --- G20 (oxo group)S/T --- A (amino group)

Changes in base-contacting residues suggest binding to

AT-rich sequence

S2 S3

Charged residues in the insertmay contact multiple phosphate

groups to provide affinity

ApiAp2 domain binds DNA in a sequence specific manner

Page 26: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

RBC infection & merozoite burst

Characterization of the globular domain – expression analysis I

Mosquito Human

Mos

quito

www.cdc.gov

Complex life cycle

Liver

RB

C

Intra-erythrocyte developmental cycleDeRisi Lab

mRNA expression profilingUsing microarray

(sorbitol syncronization)

Page 27: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Characterization of the globular domain – expression analysis II

Co-expressed genes

Ave

rage

exp

ress

ion

prof

ile o

f all

gene

s

0 46Time points

Ring stage

Trophozoite stage

Early Schizont stage

Schizont stage

22 Transcription factors

0 46Time points

Striking expression pattern in specific developmental stages suggests that they could mediate transcriptional regulation of stage specific genes

Page 28: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Characterization of the globular domain – interaction analysis I

Protein interaction network of P. falciparum

Protein (1267)

Physical interaction (2846)LaCount et. al. Nature (2005)

Modified Y2H: Gal4 DBD + Protein + auxotrophic gene

RNA isolated from mixed stages ofIntra-erythrocyte developmental cycle

Guilt by association

Function of interacting neighbors provides cluesabout function of the protein

Page 29: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Characterization of the globular domain – interaction analysis II

Protein interaction network of P. falciparum

Guilt by association supports the role of ApiAp2 proteins to beinvolved in regulation of gene expression

ApiAp2 proteins (13)

Chromatin proteins (8)

50% hypotheticalNucleosome assemblyHMG proteinGlycolytic enzymesAntigenic proteinsHave a PPint domain

MAL8P1.153 (ES)

PFD0985w (S)

PF10_0075 (T)

PF07_0126 (R)

Network of ApiAp2 proteins (97 interactions, 93 proteins)

Gcn5

Page 30: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Sequence Structure Expression Interaction

Integration of different types of experimental data allowed us to discover potential transcription factors

in the Plasmodium genome

Integration of data can generate experimentally testable hypotheses

Page 31: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

The verdict!

Guilty!

Page 32: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Outline

• Introduction to resources and tools (10 minutes)

• A case study to highlight data integration (10 mins)

• Specific questions (25 mins)

Page 33: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

1.Formulate the big question2.Come up with several specific questions3.Prioritise questions and prepare a checklist4.Identify the database5.Identify the tools6.Be aware of the basic statistics7.Retrieve and integrate the information8.Formulate hypothesis and READ A LOT!9.Design experiments10.Publish work & be happy ever after ☺

General approach to investigate biological questions using computational approach

Page 34: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

How can I analyse the sequence of a protein?

Question #1

>gi|75018194MESTEDEFYTICLNLTAEDPSFGNCNYTTDFENGELLEKVVSRVVPIFFGFIGIVGLVGNALVVLVVAANPGMRSTTNLLIINLAVADLLFVIFCVPFTATDYVMPRWPFGDWWCKVVQYFIVVTAHASVYTLVLMSLDR FMAVVHPIASMSIRTEKNALLAIACIWVVILTTAIPVGICHGEREYSYFNRNHSSCVFLEERGYSKLGFQ MSFFLSSYVIPLALISVLYMCMLTRLWKSAPGGRVSAESRRGRKKVTRMVVVVVVVFAVCWCPIQIILLV KALNKYHITYFTVTAQIVSHVLAYMNSCVNPVLYAFLSENFRVAFRKVMYCPPPYNDGFSGRPQATKTTR TGNGNSCHDIV

Page 35: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Proteins are made of domains, which are independent evolutionary units

Unassigned regionNew domain or unstructured region

Single domain Two domain

HTH domain BAR domain

Sequence and structure based domain definition

Structure baseddomain definition

Page 36: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://pfam.janelia.org/family/bar

Database of protein domains

Page 37: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

How can I obtain all sequences containing a particular domain?

Click “Species” to obtain all proteins with a particular domain For Marwah Hassan

Page 38: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://supfam.org/SUPERFAMILY/cgi-bin/search.cgi

Structural domain assignments to completely sequenced genomes

Page 39: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Genome assignment, sequence alignment, domain combinations and taxonomic distribution For Marwah Hassan

Page 40: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Intrinsically Unstructured Proteins

IUPs: lack an unique 3-dimensional structure, either entirely or in parts. It is assumed that they sample a variety of conformations that

are in dynamic equilibrium under physiological conditions.

Structural continuum of intrinsically unstructured proteins

Page 41: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Exposed peptide may mediate protein-peptide interaction, PTM, etc

Page 42: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Identification of intrinsically unstructured proteins

Computational approaches to predict unstructured regions

Accuracies are similar to current secondary structure predictionmethods from sequence (~80%)

Computationally trained on existing structures in the PDB

DisoPred2: Support vector machinetrained on structure data

DisEMBL: Neural network trained on structure data

Sequence composition based on specific scales

FoldIndex: Identifies regions of lowhydrophobicity but high net chargeusing a sliding window

PONDR: Local amino-acid composition, flexibility and hydropathy

GlobPlot: amino-acid propensity to form globular domains using the russel-linding scale

HCA, IUPred, RONN, SEG, PreLink and several others

Page 43: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://en.wikipedia.org/wiki/Intrinsically_unstructured_proteins#Database_of_Protein_Disorder

Programs for identifying unstructured regions in a protein sequence

Page 44: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://elm.eu.org/

Identification of potential eukaryotic linear motif

Page 45: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.southampton.ac.uk/~re1u06/software/slimsuite/

Identification of potential eukaryotic linear motif

Page 46: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://smart.embl-heidelberg.de/

SMART combines domain assignment with prediction of linear motifs in unstructured regions

Page 47: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription
Page 48: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

How to characterise an unassigned region in a protein?

Page 49: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

1.Get a good text editor (textpad, bbedit, nedit)2.Run a domain analysis (pfam, superfamily, SMART)3.Identify regions that are of low complexity4.Run Linear Motif prediction5.Run a post-translation modification prediction6.Run a signal peptide prediction7.Run a PSI-BLAST search on individual domains8.Run a PSI-BLAST search on the whole sequence9.Ask if predicted regions are evolutionarily conserved

checklist

Page 50: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

How can I identify structurally similar proteins?

Which amino acids contact the ligand of interest in a structure?

What analyses can I do with a structure?

Question #2

Page 51: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.pdb.org/pdb/explore/explore.do?structureId=1YDV

Protein Data Bank: repository for protein structures

Page 52: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl

Comprehensive summary of deposited structures

Page 53: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://redpoll.pharmacy.ualberta.ca/vadar/

Interactions, surface area and volume calculations

Page 54: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.ebi.ac.uk/thornton-srv/software/LIGPLOT/

Ligand contacting residues

Page 55: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.cathdb.info/

http://scop.mrc-lmb.cam.ac.uk/scop/

Domain definition and evolutionary relationship of

domains

Page 56: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://consurf.tau.ac.il/overview.html

Evolutionary analysis of individual amino

acids on the structure

Page 57: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Programs to retrieve similar structures

http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver

http://ekhidna.biocenter.helsinki.fi/dali_server/

http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml

Page 58: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://supfam.mrc-lmb.cam.ac.uk/elevy/3dcomplex/About.cgi

Obtain related proteins with similar topology to investigate how interfaces evolve in homologous proteins

Page 59: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.ebi.ac.uk/pdbe-srv/emsearch/atlas/1740_visualization.html

EMDB: repository for electron microscopy structures

Page 60: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Databases and tools for structure analysis

http://www.ebi.ac.uk/Databases/structure.html

Page 61: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

How can I find the ortholog of my protein in other genomes?

Which residues are functionally important?

Question set #3

Page 62: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Ortholog detection is not trivial

Need to infer evolutionary history of the gene family

Obtain the tree

Investigate synteny

Orthologs are genes in different species that evolved from a common ancestral gene by speciation.

Paralogs are genes related by duplication within a genome.

Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions, even if these are related to the original one.

Page 63: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

ENSEMBL: a repository for genome sequence

Page 64: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription
Page 65: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Go to the entry page of a gene of interest

Page 66: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Tree of the evolutionary history of a gene

Click “Gene Tree” to get the tree for further detailed analysis

Page 67: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Orthology relationship table

Click “orthologs” to obtain pre-computed orthology relationship table

Page 68: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Synteny

Click “Genomic Alignments” to obtain details about synteny

Page 69: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Alignment

Click “Alignment” to obtain alignments

Page 70: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://weblogo.berkeley.edu/examples.html

Weblogo for position specific analysis of an alignment of orthologs

Evolutionary conservation reflects structural or a functional constrain on a position

Page 71: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.mrc-lmb.cam.ac.uk/genomes/spial/

SPIAL: Specificity conferring residues in paralogous families

Page 72: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Interface residues in glutamate receptor N-

terminal domain

Interface residues between stat4 and stat5

Residues that are different between paralogs but conserved amongorthologs are specificity conferring positions

Page 73: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://eggnog.embl.de/

Pre-computed ortholog databases

http://www.ncbi.nlm.nih.gov/COG/

http://inparanoid.sbc.su.se/cgi-bin/index.cgihttp://mbgd.genome.ad.jp/

Page 74: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

How can I know more about my favourite gene?

Splice forms, regulators, promoter structure, protein domains, SNPs, paralogs, genomic position, etc..

Question Set #4

Page 75: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

ENSEMBL entry for a gene has links to a lot of relevant information!

Page 76: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

What are the SNPs in my gene (or region) of interest?

Page 77: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

NCBIhttp://www.ncbi.nlm.nih.gov/sites/gquery

Page 78: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

How to use NCBI website effectivelyhttp://www.ncbi.nlm.nih.gov/guide/all/howto/

Page 79: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

In which tissues, cell lines, and conditions are my genes

expressed?

What are the transcript levels and protein levels for my gene?

Question Set #5

Page 80: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://biogps.gnf.org/#goto=welcome

A web-portal for data on expression levels of transcripts in 70 different human and mouse tissues and cancer cell lines

Page 81: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription
Page 82: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.ebi.ac.uk/microarray-as/ae/

Public repository for gene expression datasets at the EBI

Page 83: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.ncbi.nlm.nih.gov/geo/

Public repository for gene expression datasets at the NCBI

Page 84: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://www.proteinatlas.org/index.php

Human protein expression levels (Immunohistochemistry)

Page 85: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Immunohistochemistry in normal cells

Page 86: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Immunohistochemistry in cancer cells and cell lines

Page 87: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Where can I obtain functional networks for my gene(s)?

Question Set #6

Page 88: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

What is a functional interaction network?

Gene A Gene B

Edge and its weight is a function of:Co-regulationCo-expression across several conditionsPhysical interactionCo-localisationCo-evolution (present/absent together in several genomes)Gene fusion in another organismSimilar phenotype when knocked outToxic when over expressedGenetically interactEpistatic interactionSynthetic lethalityCo-citation

Page 89: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Functional interaction network

Guilty by association

Page 90: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Functional interaction network for eukaryotes

http://www.functionalnet.org/

Page 91: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://string-db.org/

Functional interaction network for prokaryotes and eukaryotes

Page 92: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://funcoup.sbc.su.se/

Page 93: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://sonorus.princeton.edu/hefalmp/

Page 94: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription
Page 95: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://avis.princeton.edu/mouseNET/viewgraph.php?graphID=42

Page 96: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

http://pixie.princeton.edu/pixie/

GFD1

For Jack Dean

Page 97: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

I have a set of genes from a Y2H experiment, what can I

make of it?

I have a list of differentially expressed genes, what can I do

with that list?

Question Set #7

For Alex Hamilton

Page 98: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Gene Ontology

The Gene Ontology is a controlled vocabulary, a set of standard terms—words and phrases—used for indexing and retrieving information. In addition to defining

terms, GO also defines the relationships between the terms, making it a structured vocabulary

Page 99: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

The Gene Ontology is a a set of standard terms

Every gene has a set of gene ontology terms associated with it

Given a set of genes, the objective is to ask if particular functions (or pathways) are enriched

Page 100: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Functional Enrichment

http://david.abcc.ncifcrf.gov/tools.jsp

Page 101: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription
Page 102: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

Approach

Are they evolutionarily conserved?Presence/absence inHumans, worm, fly

What are their molecular functions?Sequence &structure

What are their expression patterns?Expression pattern & co-expression analysis

What are their localization patterns?MembraneNucleus

What are the morphological changes?

No. NucleusVesicle size

What are their interaction partners?Protein complexesEvolutionary conservation of partners

Which TFs regulate their expression?Transcription factors involvedEvolutionary conservation of regulators

Which signaling proteins are involved? Signaling proteins involvedEvolutionary conservation

Which metabolic pathways are involved?

Metabolic pathways involvedEvolutionary conservation

Do orthologs behave similarly?

Page 103: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

1.Formulate the big question2.Come up with several specific questions3.Prioritise questions and prepare a checklist4.Identify the database5.Identify the tools6.Be aware of the basic statistics7.Retrieve and integrate the information8.Formulate hypothesis and READ A LOT!9.Design experiments10.Publish work & be happy ever after ☺

General approach to investigate biological questions using computational approach

Page 104: Investigating questions in biology using computational ... fileMosquito Human Mosquito Liver RBC Previous comparative genomic analysis of eukaryotes suggested lack of detectable transcription

THANK YOU!

questions welcome