Transcript
Page 1: Initial Proposal for the RNA Alignment Ontology Rob Knight Dept Chem & Biochem CU Boulder

Initial Proposal for the RNA Alignment Ontology

Rob Knight

Dept Chem & Biochem

CU Boulder

Page 2:

What do we want to do?

• Represent detailed structural info and other metadata on alignment

• Avoid horizontal and vertical expansion

• Explicitly annotate correspondences at the level where they occur

Page 3:

What do alignments look like now?

Page 4:

Why is this a problem?

Page 5:

…so real alignments look like this, to shoehorn everything into columns that are assumed to be homologous

Page 6:

Homology is problematic…

• Fundamental problem: systems that are homologous at one level are not necessarily homologous at other levels

• E.g. bat wings and bird wings: homologous as pentadactyl limbs, but not homologous as wings

• Homology is hierarchical and can partially overlap at any level (e.g. Griffiths 2006)

[Figure from Ridley, “Evolution”, 3rd ed. Labels: bat forelimbs, bird forelimbs, frog forelimbs, rodent forelimbs, mammal forelimbs, tetrapod forelimbs]

Page 7:

…and correspondence need not be homology at all!

• Example from SELEX: hammerhead ribozymes independently evolved at least three times: in nature, and in Jack Szostak and Ron Breaker’s labs

• However, we still want to be able to align the functionally equivalent sequences even though there is no evolutionary relationship

Page 8:

So what are we going to use the alignment ontology for?

Page 9:

Use case 1: aligning rRNA

Page 10:

Problem: have millions of fragments, want to align (incl. noncanonical pairs) + assign named regions

Page 11:

Solution

• Use existing alignment, try to fit new seqs in

• Would be improved if we could explicitly annotate helices, noncanonical pairs, etc. on the sequence overall

• For display, need to easily show/hide groups of sequences and/or regions of the sequence

Page 12:

Use case 2: SELEX

• From large number of unaligned sequences, want to identify motifs like this (Majerfeld & Yarus 2005)

Page 13:

How is this currently done?

• Find regions that are similar in more sequences than chance

• Group these sequences centered on the “motif”

• See if the parts of the motif can be related by helices

• See if anything else is reliably found by the motif

• Repeat for other families and see if there are relationships between them

• Group these families together, then iterate
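The first step above (finding regions that are similar in more sequences than chance) can be sketched roughly as follows. This is only an illustration, not the method used in the talk: it looks for exact k-mers, and the background model (uniform, independent bases), the function names, and the `fold` and `min_sequences` thresholds are all assumptions made for the example.

```python
from collections import Counter


def kmer_sequence_counts(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        # A set, so a k-mer repeated within one sequence is counted once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def expected_sequences_with_kmer(seqs, k):
    """Expected number of sequences containing one given k-mer.

    Assumes uniform, independent bases and ignores k-mer self-overlap,
    so this is only a rough background estimate.
    """
    p = 0.25 ** k
    return sum(1 - (1 - p) ** max(len(seq) - k + 1, 0) for seq in seqs)


def enriched_kmers(seqs, k, fold=3.0, min_sequences=2):
    """Return k-mers found in more sequences than the background predicts.

    `fold` and `min_sequences` are hypothetical thresholds for illustration.
    """
    expected = expected_sequences_with_kmer(seqs, k)
    counts = kmer_sequence_counts(seqs, k)
    return {
        kmer: n
        for kmer, n in counts.items()
        if n >= min_sequences and n > fold * expected
    }
```

The later steps (centering groups on the motif, relating parts by helices, iterating over families) would build on candidate regions like these.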

Page 14:

e.g. here we discovered that an unpaired G is important

Page 15:

So how do we handle all this? A proposal

Entities:

• sequence_region: a thing that defines a set of bases relative to some sequence (i.e. with indices for each base)

• paired_sequence_region: two regions linked by pairs

• helical_sequence_region: two regions completely paired

• base: region that consists of single nucleotide

• base_pair: region that consists of two, paired bases

• canonical_base_pair: base pair that is cis-WW

• loop: contiguous sequence_region stretching from i to j such that i-1 and j+1 are a base pair

• etc. (bulge, internal_loop, junction, etc.)
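A few of these entities can be sketched as plain data types. This is a minimal illustration under stated assumptions, not a proposed implementation: the class and field names, the 0-based indexing, and the `lw_class` string label are all hypothetical. Only the definitions themselves (a region as a set of indexed bases, a canonical pair as cis-WW, a loop as a contiguous region flanked by a base pair) come from the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SequenceRegion:
    """A set of base indices (0-based, by assumption) on one sequence."""
    sequence_id: str
    indices: FrozenSet[int]

    def is_contiguous(self):
        if not self.indices:
            return False
        return max(self.indices) - min(self.indices) + 1 == len(self.indices)


@dataclass(frozen=True)
class BasePair:
    """Two paired bases; `lw_class` is an assumed free-text pair-type label."""
    sequence_id: str
    i: int
    j: int
    lw_class: str = "cis-WW"

    def is_canonical(self):
        # The slide defines canonical_base_pair as a base pair that is cis-WW.
        return self.lw_class == "cis-WW"


def contiguous_region(sequence_id, start, end):
    """Build a region covering start..end inclusive."""
    return SequenceRegion(sequence_id, frozenset(range(start, end + 1)))


def is_loop(region, base_pairs):
    """True if the region runs from i to j and (i-1, j+1) is a base pair."""
    if not region.is_contiguous():
        return False
    i, j = min(region.indices), max(region.indices)
    return any(
        bp.sequence_id == region.sequence_id and {bp.i, bp.j} == {i - 1, j + 1}
        for bp in base_pairs
    )
```

For a hairpin with pairs (0,8), (1,7), (2,6), the region 3..5 passes `is_loop`, while 4..5 does not because positions 3 and 6 are not paired with each other.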

Page 16:

So how do we handle all this? A proposal

Relationships:

• correspondence: relation among set of sequence_regions implying all share a feature (with metadata about how determined)

• homology: correspondence implying continuous chain of descent preserving the relation

• sequence_similarity: correspondence implying regions are similar in primary sequence

• two_d_structure_similarity: correspondence implying regions are similar in 2D structure, i.e. nested canonical base pairs

• secondary_structure_similarity: correspondence implying regions are similar in secondary structure, i.e. incl. pseudoknots/noncanonicals

• tertiary_structure_similarity: correspondence implying regions are similar in 3D structure
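One way to picture this hierarchy is as a base class with subclasses, so that a query for "any correspondence" also returns homologies while a query for homology excludes purely functional correspondences. The sketch below is an assumption-laden illustration: the class names mirror the slide's terms, but the `Region` tuple, the `feature` and `determined_by` fields, and the `correspondences_for` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical region key: (sequence_id, start, end).
Region = Tuple[str, int, int]


@dataclass(frozen=True)
class Correspondence:
    """A set of regions that all share a feature, plus how that was determined."""
    regions: FrozenSet[Region]
    feature: str
    determined_by: str = "unspecified"


class Homology(Correspondence):
    """Correspondence implying a continuous chain of descent."""


class SequenceSimilarity(Correspondence):
    """Correspondence implying similarity in primary sequence."""


class TwoDStructureSimilarity(Correspondence):
    """Correspondence implying similar nested canonical base pairs."""


class SecondaryStructureSimilarity(Correspondence):
    """Correspondence implying similarity incl. pseudoknots/noncanonicals."""


class TertiaryStructureSimilarity(Correspondence):
    """Correspondence implying similarity in 3D structure."""


def correspondences_for(region, correspondences, kind=Correspondence):
    """Return the correspondences of the given kind that include `region`."""
    return [
        c for c in correspondences
        if isinstance(c, kind) and region in c.regions
    ]
```

With this layout, two functionally equivalent regions with no evolutionary relationship can be recorded as a plain `Correspondence` without asserting `Homology`.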

Page 17:

So how do we handle all this? A proposal

Relationships:

• pairing: relation that asserts that two sequence_regions each have parts of at least one base_pair that connects them

• helical_pairing: pairing that includes several base_pairs (not necessarily contiguous) between two sequence_regions

• unbroken_helical_pairing: helical_pairing that includes no bases in the sequence_regions that are not paired with the other sequence_region, in order

• base_pairing: pairing that connects exactly two bases, annotated with the Leontis-Westhof classification

More exotic uses for alignment:

• microrna_target: pairing relation in which one member is a miRNA and the other is an mRNA according to SO

• same_microrna_target: a relation among a set of sequences that have microrna_target relation to the same miRNA
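The three nested pairing relations can be illustrated with simple predicates over index sets. This is a sketch under assumptions rather than the working group's definition: regions are given as collections of integer positions on one sequence, base pairs as `(i, j)` tuples, "several" is read as at least two pairs (`min_pairs`), and "in order" is read as antiparallel pairing of the first base of one region with the last base of the other.

```python
def connecting_pairs(region_a, region_b, base_pairs):
    """Return pairs linking the two regions, oriented as (a_index, b_index)."""
    a, b = set(region_a), set(region_b)
    found = set()
    for i, j in base_pairs:
        if i in a and j in b:
            found.add((i, j))
        elif j in a and i in b:
            found.add((j, i))
    return found


def is_pairing(region_a, region_b, base_pairs):
    """pairing: at least one base pair connects the two regions."""
    return len(connecting_pairs(region_a, region_b, base_pairs)) >= 1


def is_helical_pairing(region_a, region_b, base_pairs, min_pairs=2):
    """helical_pairing: several connecting pairs (min_pairs is an assumption)."""
    return len(connecting_pairs(region_a, region_b, base_pairs)) >= min_pairs


def is_unbroken_helical_pairing(region_a, region_b, base_pairs):
    """unbroken_helical_pairing: every base in each region pairs with the
    other region, in antiparallel order (assumed reading of "in order")."""
    a = sorted(region_a)
    b = sorted(region_b, reverse=True)
    if not a or len(a) != len(b):
        return False
    return connecting_pairs(region_a, region_b, base_pairs) == set(zip(a, b))
```

For a hairpin with pairs (0,8), (1,7), (2,6), regions 0..2 and 6..8 form an unbroken helical pairing. Extending the first region to 0..3 keeps it a helical pairing but breaks the unbroken condition, because position 3 is unpaired.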

Page 18:

Implementation notes

• Must be able to name regions (e.g. P3 in RNaseP) and subclass them (e.g. P3 in firmicutes)

• Must be able to subclass homologies, e.g. homologous as wing vs. homologous as limb

• Correspondences are all symmetric and transitive, so can implement as set of regions that share the correspondence

• (probably) don’t want to reify names of parts of well-known RNAs in the overall RNAO?
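Because correspondences are symmetric and transitive, pairwise assertions of one kind can be collapsed into disjoint sets of regions. The sketch below does this with a small union-find; it is one possible approach, not a prescribed one. The region labels are hypothetical strings, and the function would be called separately for each kind of correspondence (e.g. once for "homologous as limb" and once for "homologous as wing"), since those are distinct relations.

```python
def correspondence_sets(pairwise):
    """Merge pairwise correspondence assertions into disjoint sets of regions.

    `pairwise` is an iterable of (region, region) tuples, all asserting the
    same kind of correspondence; regions can be any hashable labels.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairwise:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for region in list(parent):
        groups.setdefault(find(region), set()).add(region)
    return [frozenset(group) for group in groups.values()]
```

Asserting that A:P3 corresponds to B:P3 and B:P3 to C:P3 then yields a single set containing all three, without storing the A:P3 to C:P3 relation explicitly.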

Page 19:

Acknowledgements

RNA Alignment Ontology working group:

• James W. Brown

• Fabrice Jossinet

• Rym Kachouri

• B. Franz Lang

• Neocles Leontis

• Gerhard Steger

• Jesse Stombaugh

• Eric Westhof

Other coauthors:

• Amanda Birmingham

• Paul Griffiths

• Franz Lang

NSF RCN grant #0443508

Knight Lab members:

• Cathy Lozupone

• Micah Hamady

• Chris Lauber

• Jesse Zaneveld

• Jeremy Widmann

• Elizabeth Costello

• Jens Reeder

• Daniel McDonald

• Anh Vu

• Ryan Kennedy

• Julia Goodrich

• Meg Pirrung

• Reece Gesumaria

Trp project:

• Irene Majerfeld

• Jana Chochosolousova

• Vikas Malaiya

• Matthew Iyer

• Mike Yarus
