Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family Christopher Southan Department of Molecular Pharmacology AstraZeneca R&D, Mölndal, Sweden


Post on 26-Jun-2015


DESCRIPTION

Presented at the University of Nottingham, 2005

TRANSCRIPT

Page 1: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Christopher Southan

Department of Molecular Pharmacology

AstraZeneca R&D, Mölndal, Sweden

Page 2: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Outline

• Introduction
• Proteomic identification of novel secreted rat Ly6 proteins in EST data
• Discovery of unknown homologues
• Bioinformatic analysis of chimeric mRNAs
• Database errors propagated by the chimeras
• Delineating a large secreted Ly6 family on the rat genome
• Discovery of mouse homologues but no clear orthologues
• Equivocal biochemical results for homologues
• Summary of bioinformatic pitfalls

Page 3: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Introduction: Quirks that Lurk in Databases

• The sequence deluge into the primary databases necessitates automated pipelines to produce 'value added' secondary databases

• But, however sophisticated the data parsing or curation, anomalies will get through

• Most things that could have gone wrong, have

• Although the overall quirk frequency is low, they present pitfalls for the unwary

• Responsibility for primary annotation and sequence quality lies solely with submitting authors

• Few originating authors correct, update or withdraw their primary sequence entries

• It is difficult to discriminate between in vitro artefacts and rare in vivo events

Page 4: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Rat Urine → HPLC → Intact MALDI → N-Terminal Sequence

High-speed microbore column

Page 5: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Rat Urine → 2D-Gel → Trypsin → MS/MS → PepSea Search → EST hits

Spot 1 gave two different peptide matches

• CTSFDSTGFCHVGR contained within rat EST AA893514

• CESLDSTGLCR contained within rat EST AA800439
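The peptide-to-EST matching step can be illustrated with a minimal sketch that translates a nucleotide sequence in all six reading frames and looks for an exact peptide match. This is not the PepSea algorithm, which searches MS/MS data; it only shows the idea of locating a sequenced peptide inside an untranslated EST. The function names and the toy DNA string are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_hits(dna: str, peptide: str):
    """List (strand, frame, protein_offset) for each frame containing the peptide."""
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            pos = translate(seq[offset:]).find(peptide)
            if pos != -1:
                hits.append((strand, offset + 1, pos))
    return hits


# Toy sequence (made up for illustration, NOT the real EST): two padding
# bases followed by one possible coding sequence for CESLDSTGLCR.
toy_est = "GG" + "TGTGAATCTCTTGATTCTACTGGTCTTTGTCGT"
print(six_frame_hits(toy_est, "CESLDSTGLCR"))
```

Exact string matching is the simplest possible criterion; a real search would have to tolerate EST sequencing errors and the I/L ambiguity of mass spectrometry.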

Page 6: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

EST AA893514 vs. dbEST: 30 Rat Hits at 95% to 100% Identity

Page 7: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Assembly of Rat Urinary Proteins 1 and 2

• 9 EST sequences, the MS/MS sequences, and the N-terminal Edman data, were consistent with two paralogous proteins

• 90% identical at the AA level and 96% identical at the DNA level
• Highly represented in rat liver ESTs
• One N-glycosylation site with 1.6 to 2.0 kDa glycan
• Secreted forms abundant in male rat urine by HPLC
• RUP1 independently verified as liver regeneration-related protein by full mRNA

verified signal peptide

RUP1 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD
     ||| ||||||||||||||||||||||| |:||||:|:|||: ||||||||||||||||||
RUP2 MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD

RUP1 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
     ||||||||||||| :|||||||||:|||||||||||||||
RUP2 GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP 101
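The identity figure quoted for the two paralogues can be checked directly from the aligned sequences on this slide. The sketch below is a plain position-by-position count (gap columns are skipped); the function name is hypothetical, and the two strings are copied from the RUP1/RUP2 alignment as labelled here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Sequences as shown on this slide (101 residues each, no gaps).
RUP1 = ("MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR")
RUP2 = ("MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP")

print(f"{percent_identity(RUP1, RUP2):.1f}% identical over {len(RUP1)} positions")
```

The result should land close to the roughly 90% amino acid identity stated in the bullets; the exact value depends on whether the signal peptide is included in the comparison.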

Page 8: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

RUP3: Independent MS-based Identification by Wait et al. “Proteins of rat serum, urine and CSF:VI”

Electrophoresis 22, 3043-3052 (2001)

RUP1 MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD
RUP2 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD
RUP3 MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAWVVVTTRD
     *** *********************** * **** * ***. ******************

RUP1 GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP EST AA800439
RUP2 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR EST AA893514
RUP3 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR EST AA893518
     *************  *********.***************

Page 9: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

RUP Paralogues Define a New Family of Secreted Ly-6 Proteins

                   *        20         *        40         *
UP1_RAT : MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEIC :  51
UP2_RAT : MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEIC :  51
UP3_RAT : MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEIC :  51
SP1_RAT : MGKNILLLLLGLSFVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQEC :  51

                 60         *        80         *       100
UP1_RAT : AWVVVTTRDGKFVYGNQSCA-ECIGTTVEHGSLIISTNCCSATPFCNMVHP : 101
UP2_RAT : AWVVVTTRDGKFVYGNQSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR : 101
UP3_RAT : AWVVVTTRDGKFVYGNQSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR : 101
SP1_RAT : GILVVSQG-VDILFGMQDCSSHCLNKTFHHYNLTLDFTCCHDQSLCNEF-- :  99
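What ties the three urinary proteins to spleen protein 1, despite the low overall similarity, is the shared cysteine framework. A minimal sketch, assuming the four aligned rows are exactly as shown on this slide, lists the alignment columns in which every sequence has a cysteine. The function name is hypothetical.

```python
# Aligned rows copied from the slide (102 columns, '-' marks a gap).
ALIGNMENT = {
    "UP1_RAT": ("MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEIC"
                "AWVVVTTRDGKFVYGNQSCA-ECIGTTVEHGSLIISTNCCSATPFCNMVHP"),
    "UP2_RAT": ("MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEIC"
                "AWVVVTTRDGKFVYGNQSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR"),
    "UP3_RAT": ("MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEIC"
                "AWVVVTTRDGKFVYGNQSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR"),
    "SP1_RAT": ("MGKNILLLLLGLSFVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQEC"
                "GILVVSQG-VDILFGMQDCSSHCLNKTFHHYNLTLDFTCCHDQSLCNEF--"),
}


def conserved_columns(alignment: dict, residue: str = "C") -> list:
    """Return 1-based alignment columns where every row carries `residue`."""
    rows = list(alignment.values())
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same aligned length")
    return [i + 1 for i in range(width) if all(row[i] == residue for row in rows)]


cys_columns = conserved_columns(ALIGNMENT)
print(len(cys_columns), "fully conserved cysteine columns:", cys_columns)
```

For these four rows the count comes out at ten fully conserved cysteine columns. Column numbers refer to the alignment, not to positions in any single protein, because of the gap characters.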

Page 10: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

A Quirky Result: Solid Matches Between RUP2 and Four Unrelated mRNAs

• Rat mitochondrial IF1 protein mRNA, L07806, 883 bp
• Rat casein kinase II alpha subunit (CK2), L15618, 2180 bp
• Rat mitochondrial succinyl-CoA synthetase alpha subunit J03621, 1684 bp
• Rat 3' non-translated beta-F1-ATPase mRNA-binding protein mRNA AF368860, 1197 bp
• Matches of 92% to 100% identity over 300-500 bases
• Two in reverse-frame, two in forward frame

Page 11: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Three RUP-like Chimeras and a Pre-mRNA

L07806 F1-ATPase inhibitor

AF368860 UTR F1-ATPase inhib

L15618 casein kinase II alpha

J03621 mito succinyl-CoA synthase alpha

Page 12: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Translation Matches for the Chimeras Reveal a Cryptic Protein

RUP-2  28 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
          TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR
      417 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 196

L07806 Rattus rattus mitochondrial IF1 protein mRNA

RUP-2  59 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
          RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR
      708 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 580

L15618 Rat casein kinase II alpha subunit (CK2) mRNA

RUP-2  24 CFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMV 99
          CF C + +S G C+     C  +P E+CA  V+T +DGKFVYGNQSCAEC+  TVEHGSLIVSTNCCSAT FCN+V
       50 CFECGNLNSMGICNFRTAVCYAHPGEVCA-SVLTYKDGKFVYGNQSCAECSGRTVEHGSLIVSTNCCSATSFCNIV 274

J03621 Rat mitochondrial succinyl-CoA synthetase alpha subunit

Page 13: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

RUP1 Gene Structure

Page 14: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Matching the Chimeras Against the Rat Genome

SCORE  START   END  QSIZE  IDENTITY  CHRO  STRAND      START        END
-------------------------------------------------------------------------

L15618 Rat casein kinase II alpha subunit
 1451    709  2177   2180     99.9%     3       +  142470350  142514932
  799   1091  2161   2180     90.2%    10       -   39567792   39568918
  313    392   711   2180     99.1%     8       -   36902949   36905031

L07806 Rattus rattus mitochondrial IF1 protein
  405    420   826    833    100.0%     5       +  152628418  152632060
  398      8   415    833     99.1%     8       -   36902399   36905032

J03621 Rat mitochondrial succinyl-CoA synthetase subunit
 1203    472  1684   1684    100.0%     4       +  106816653  106845979
  469      1   472   1684    100.0%     8       -   36133698   36137263

AF368860 Rattus norvegicus 3' non-translated beta-F1-ATPase
 1118      1  1120   1120    100.0%     8       +   37247995   37251530
 1016      1  1120   1120     96.9%     8       +   36688890   36905034
 1006      1  1120   1120     95.6%     8       +   36901482   37055697
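The pattern that gives the chimeras away in this table is that different, essentially non-overlapping parts of one mRNA match different chromosomes at high identity. A minimal sketch of that test follows. The `Hit` record, the function names and both thresholds (`min_identity`, `max_overlap`) are assumptions chosen for illustration, and the rows are transcribed from the table above.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    q_start: int
    q_end: int
    identity: float
    chrom: str


# Query coordinates, identity and chromosome from the genome-match table.
HITS = [
    Hit("L15618", 709, 2177, 99.9, "3"),
    Hit("L15618", 1091, 2161, 90.2, "10"),
    Hit("L15618", 392, 711, 99.1, "8"),
    Hit("L07806", 420, 826, 100.0, "5"),
    Hit("L07806", 8, 415, 99.1, "8"),
    Hit("J03621", 472, 1684, 100.0, "4"),
    Hit("J03621", 1, 472, 100.0, "8"),
    Hit("AF368860", 1, 1120, 100.0, "8"),
    Hit("AF368860", 1, 1120, 96.9, "8"),
    Hit("AF368860", 1, 1120, 95.6, "8"),
]


def query_overlap(a: Hit, b: Hit) -> int:
    """Number of query bases covered by both hits."""
    return max(0, min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1)


def chimera_candidates(hits, min_identity=95.0, max_overlap=20):
    """Flag queries whose high-identity segments map to different chromosomes."""
    by_query = defaultdict(list)
    for hit in hits:
        if hit.identity >= min_identity:
            by_query[hit.query].append(hit)
    flagged = {}
    for query, query_hits in by_query.items():
        pairs = [(a, b) for a, b in combinations(query_hits, 2)
                 if a.chrom != b.chrom and query_overlap(a, b) <= max_overlap]
        if pairs:
            flagged[query] = pairs
    return flagged


for query, pairs in chimera_candidates(HITS).items():
    for a, b in pairs:
        print(f"{query}: {a.q_start}-{a.q_end} on chr{a.chrom}, "
              f"{b.q_start}-{b.q_end} on chr{b.chrom}")
```

With these thresholds the three host mRNAs are flagged, each pairing a chromosome 8 segment with a segment on another chromosome. AF368860 is not, because all of its hits lie on chromosome 8, which is consistent with its treatment here as an unspliced pre-mRNA rather than a chimera. A two-chromosome rule of this kind would miss chimeras joined from loci on the same chromosome.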

Page 15: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Multiple Loci on Rat Chromosome 8: Erroneous Mapping of the Chimeras

L15618 casein kinase II alpha
L07806 F1-ATPase inhibitor
AF198441 Rat RUP2
AF198442 Rat spleen protein 1

Page 16: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

What Caused the Chimeras?

• Each of the chimeric cDNAs was submitted by a different research group between 1988 and 1993
• All were prepared from rat cDNA libraries
• Two of these genes encode nuclear-encoded mitochondrial proteins
• L07806-IF1 has 2 non-chimeric counterparts
• Hits to rat genome data confirm the three 'host' transcripts are on different loci
• The 5' insertions are different sequences, lengths and orientations
• L15618 is a single-exon insert and maps to an unexpressed locus
• Are these insertions of RUP2-like genes in vitro artefacts or rare translocation events in vivo?

Page 17: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Protein Database Entries from the Chimera and Pre-mRNA

The L07806-derived chimeric protein was chosen as the reference sequence by NCBI

NP_037047 ATPase inhibitor, mitochondrial precursor, length = 107:

NP_037047 MTKSCRIEASTLGVWGMRVLQTRGFGSDS
          M  S     + LGVWGMRVLQTRGFGSDS
Q03344    MAGSALAVRARLGVWGMRVLQTRGFGSDS

but Swiss-Prot Q03344 highlights the discrepancy and correctly chooses “normal” rather than the chimera

CONFLICT MAGSALAVRAR -> MTKSCRIEAST (IN REF. 1).
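The CONFLICT line can be reproduced from the two N-terminal fragments shown on this slide by stripping their longest common suffix, which leaves only the differing N-terminal residues. The sketch below is a generic string comparison, not the procedure Swiss-Prot curators use; the function name is hypothetical and only the 29 residues displayed above are compared.

```python
def split_at_common_suffix(a: str, b: str):
    """Return (a_prefix, b_prefix, shared_suffix) for two sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[:len(a) - n], b[:len(b) - n], a[len(a) - n:]


# N-terminal fragments as displayed on the slide.
refseq_np_037047 = "MTKSCRIEASTLGVWGMRVLQTRGFGSDS"
swissprot_q03344 = "MAGSALAVRARLGVWGMRVLQTRGFGSDS"

chimera_part, normal_part, shared = split_at_common_suffix(
    refseq_np_037047, swissprot_q03344)
print(f"CONFLICT {normal_part} -> {chimera_part}; shared from: {shared[:10]}...")
```

The point of divergence marks where the inserted sequence joins the genuine IF1 transcript at the protein level.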

The L07806-derived chimeric protein, without the targeting sequence, was expressed as a maltose binding protein fusion in E. coli and was fully active!

tr Q91XP0 3' non-translated beta-F1-ATPase mRNA-binding protein: Length = 28

The artefactual sequence includes an exon

Q91XP0 and AAK61874 MGKHILLLPLVLSLLMSSLQDSCGHEPS
RUP1                MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQK...

Page 18: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

The L07806 Chimera Caused Errors in UniGene

Page 19: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

RUP Gene Family on Rat 8q21

Page 20: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Rat and Mouse RUP Homologues are Highly Diverged

                      *        20         *        40         *        60
ENSM34617  : MGKLLLLHFLLMQASFALVFIQVQATVCMVCKSFK-SGHCLVGKNNCTTRYKPGCRTRNY
ENSM34619  : ----MKNFLRLCLFLLCFETG--FPLQCVQCQSYK-NGECATKKETCTTKPGETCMIRRT
ENSM34610  : --MNSVTKISTLLIVILSFLCFVEGLICNSCEKSR-DSRCTMSQSRCVAKPGESCS---T
ENSM39445  : ---------------LAFSIS---ALKCFQCTLFNSKGKCLFQEPPCETQNNEVCV---L
ENSM23855  : --------ILLHLLGLSFLVGFLKALTCITCDRINSQGICESGEGCCQAKPGEKCA---S
ENSM48154  : ----MGKHILQLLLVLSLLVMSSQALTCITCDRINSQGICESGEGCCQAKPGEKCA---S
ENSR7555/1 : ----MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAW--V
ENSR7614/1 : ----MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAW--V
ENSR7667/1 : ----MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAW--V
ENSR7837/1 : ----MGKHILLLPLGLSLLMSSLLALQCFRCESFDSTGLCQFGRYKCQTYPGEVCAF--V
ENSR7903/1 : ----MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAW--V
ENSR11381/ : ----MGKNILLLLLGLSFVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQECG---I

                      *        80         *       100         *
ENSM34617  : FLFSHTGKWVHNHTELDCDKACMAENMYLGALKISTFCCKGEDFCNKYHGQVVNKNIY
ENSM34619  : WYANEIHNLQDAE--TKCTNSCKFEEKTSGYLTTHTYCCSHGDFCNDINLPIVMT---
ENSM34610  : VSHFVGTKHVYSK--QMCSPQCKEKQLNTGKKLIYIMCCEKN-LCNSF----------
ENSM39445  : WAKFEGGRFMYGF--QECSHTCVNQTLNLRNKRIEMKCCNDKSFCN------------
ENSM23855  : LITLKDGKIQFGN--QRCANICFTGTVQTGDQTVKMKCCKKRSFCNEL----------
ENSM48154  : LITLKDGKIQFGN--QRCANICFTGTVQTGDQTVKMKCCKKRSFCN------------
ENSR7555/1 : VVTTRDGKFVYGN--QSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR--------
ENSR7614/1 : VVTTRDGKFVYGN--QSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR--------
ENSR7667/1 : VVTTRDGKFVYGN--QSCA-ECIGTTVEHGSLIISTNCCSATPFCNMVHP--------
ENSR7837/1 : IITTRDGKFVYGN--QSCA-ECNATTVEHGSLIVSTNCFSATPFCNMVHR--------
ENSR7903/1 : VVTTRDGKFVYGN--QSCA-ECNATTVEHGSLIVSTNCCSATPFCNMVHR--------
ENSR11381/ : LVVSQGVDILFGM--QDCSSHCLNKTFHHYNLTLDFTCCHDQSLCNEF----------

Page 21: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Sequences Conserved in Rat but Divergent in Mouse

Page 22: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Homologues in Five Mammals but True Orthology Unclear

                    *        20         *        40         *        60         *        80         *       100
UP1_RAT  : MGKPILLLP--LGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRDGKFVYGNQSCA-ECIGTTVEHGSLIISTNCCSATPFCNMVHP : 101
SP1_RAT  : MGKNILLL--LLGLSFVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQECGILVVSQ-GVDILFGMQDCSSHCLNKTFHHYNLTLDFTCCHDQSLCNEF-- :  99
PIP1_PIG : MGKCLLLPLLLVVLSSLLGFPQALECFQCQRVSASGVCESGKSFCQTQGSQQCFLRKVYE-GDTVSYGHQGCSSLCVPMKFFRPNVTVDFRCCHDSPFCNKF-- : 101
BOP1_COW : MAKCLLL-LLLVVLSSLLGLPQALECFQCNRVNASGVCETGGSTCQTQGSQQCFLRRIFE-NGTLSYGHQGCSQLCIPMKLFNPSVIVEYKCCHDSPLCNKF-- : 100
EQP1_HOR : MGKHLLLP--LVILSSLLGFLQALQCFHCDRVNASGVCVSGERFCETTGSQQCFVKKVYE-DGIISYGYQGCSSLCVDMMFLNFNVNLDWKCCHHASLCNKF-- :  99
EQP2_HOR : MGKHLLLP--LIILSSLLGFLQALTCLKCDRVNTSGVCQSGASFCQTKGSQQCYVRKVYE-DDTISYGSQGCSSICTDILLFSPNVAVDLKCCDDSPLCNKF-- :  99
DOP1_DOG : MGRCLLLLHLLLILCSQLDLLQALQCFQCKQVNANGVCEDGKSTCQAEGNQQCFLRKVYK-DNILSYGYQGCSSVCSPMTIFSTDVNLEEKCCNDSSFCNKF-- : 101
XP42_MOU : MEKYLLLL--LLGIFLRVGFLQALTCVSCGRLNSSGICETAETSCEATNNRKCALRLLYK-DGKFQYGFQGCLGTCFNYTKTNNNMVKEHKCCDHQNLCNKP-- :  99

Page 23: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Remote Human Homologues but no Strict Orthologues

>tr|AF462605|Q8WXA2|9AD752F00D901FFE PATE.[Homo sapiens] (expressed in prostate and testis) Length = 126

Score = 31.2 bits (69), Expect = 3.3 Identities = 21/79 (26%), Positives = 32/79 (39%), Gaps = 6/79 (7%)

RUP1 :  23 QCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRDGK----FVYGNQSCAECIGTTV
           QC  C        C  GR IC    +E C    +  RDG     F+   ++CA+  G  +
PATE :  47 QCRMCHLQFPGEKCSRGRGICTATTEEACMVGRMFKRDGNPWLTFMGCLKNCADVKG--I

Query:  79 EHGSLIISTNCCSATPFCN 97
                +++  CC +   CN
Sbjct: 105 RWSVYLVNFRCCRSHDLCN 123

Page 24: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Threading Reveals Homology between RUP1, Lynx1 and Snake Toxin Structures

                      *        20         *        40         *        60
P81827|UP1 : MGKPILLLPLGLSLLMSSLL--ALQCFRC--ES--LDSTGLCRVGRRICQTYPDEICAWVVV
Q9WVC2|LYN : -MTH--LLTVFLVALMGLPVAQALECHVCAYNGDNCFKPMRCPAMATYCMTTRTYF------
P81782|BUC : -----------------------MECYRCGVSG--CHLKITCSAEETFCYKWLNKI------

                    *        80         *       100         *       120
P81827|UP1 : TTRDGKFVYGNQSCAECIGTTVEHGSLII---------STNCCSATPFCNM----------V
Q9WVC2|LYN : ---------TPYR-MKVRKSCVPSCFETVYDGYSKHASATSCCQ-YYLCNGAGFATPVTLAL
P81782|BUC : ---------SNERWLGCAKTCTEIDTWNVY---------NKCCT-TNLCNT-----------

Lynx1, an Endogenous Toxin-like modulator of AChRs in the CNS,

Page 25: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Why so Few Apparent Orthologues?

Page 26: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

P55000: Antineoplastic Urinary Protein/Secreted Mammalian Ly-6/uPAR Related Protein – Equivocal Annotation

Page 27: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Linking Sequence to Function: the Lost Keyword Problem (PubMed Queries in red)

• Adermann et al. "Structural and phylogenetic characterisation of human SLURP-1, the first secreted mammalian member of the Ly-6 /uPAR protein superfamily" Protein Sci. 1999 … from blood and urine peptide libraries. SLURP-1 is encoded by the ARS (component B)-81/s locus, and appears to be the first mammalian member of the Ly-6/uPAR family lacking a GPI-anchoring signal sequence ... SLURP-1 (+) Ly-6 (+) ANUP (-)

• Katz et al. "A partial catalogue of proteins secreted by epidermal keratinocytes in culture." J Invest Dermatol. 1999 … proteins secreted by adult human epidermal keratinocytes included anti-neoplastic urinary protein (+) ANUP (-) SLURP-1(-) Ly-6 (-)

• Fischer et al. "Mutations in the gene encoding SLURP-1 in Mal de Meleda". Hum Mol Genet. 2001 … Three different homozygous mutations (a deletion, a nonsense and a splice site mutation) were detected in 19 families of Algerian and Croatian origin … first instance of a secreted protein being involved in a palmoplantar keratoderma. SLURP-1 (+) Ly-6 (+) ANUP (-)
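The lost keyword problem can be illustrated with a minimal sketch: a single-name query misses an abstract that uses a different name for the same protein, while a query that ORs all known synonyms retrieves it. The synonym list is taken from names appearing on these slides and is illustrative rather than complete. The function names are hypothetical, and the text is matched locally here instead of querying PubMed.

```python
# Names used for the same protein on these slides (illustrative, not exhaustive).
SYNONYMS = [
    "SLURP-1",
    "ANUP",
    "anti-neoplastic urinary protein",
    "antineoplastic urinary protein",
    "ARS component B",
]


def build_or_query(terms) -> str:
    """Join quoted terms with OR, as one might paste into a literature search box."""
    return " OR ".join(f'"{term}"' for term in terms)


def matched_terms(text: str, terms) -> list:
    """Return the terms found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


# Excerpt quoted on the slide for Katz et al.
katz_excerpt = ("proteins secreted by adult human epidermal keratinocytes "
                "included anti-neoplastic urinary protein")

print("Single-name query finds:", matched_terms(katz_excerpt, ["SLURP-1"]))
print("Synonym query finds:   ", matched_terms(katz_excerpt, SYNONYMS))
print("Expanded query string: ", build_or_query(SYNONYMS))
```

Substring matching is deliberately naive; it only demonstrates why transitive, broad keyword searching is needed when gene and protein names drift between papers.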

Page 28: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Mouse Ly-6-like Caltrin: Sequence Errors, Unverified Reported Function, New Name and New Function?

Page 29: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Confusion Over Caltrin: 5 Different Sequences in SwissProt; 22 PubMed Citations

Caltrin = inhibition of Ca2+ uptake into spermatozoa

• CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR). - Mus musculus (a Ly-6 protein)

• CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR) (SEMINALPLASMIN) (SPLN). - Bos taurus (PYY-like)

• CALTRIN-LIKE PROTEIN I. - Cavia porcellus (weak protease inhibitor match)

• CALTRIN-LIKE PROTEIN II. - Cavia porcellus (elastase inhibitor like)

• PANCREATIC SECRETORY TRYPSIN INHIBITOR II PRECURSOR (PSTI-II) (CALTRIN) (CALCIUM TRANSPORT INHIBITOR). - Rattus norvegicus (trypsin inhibitor identity)

Page 30: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Limited Knowledge for the Short Ly-6 Proteins

• Single domain proteins ~85-100 residues mostly with signal peptide
• Probable ligands by inference from toxin structures?
• Recently duplicated rodent paralogous family of 6-10 gene loci but very different evolutionary trajectories between mouse and rat
• Liver and spleen expression in rat
• Significant amounts of multiple gene products, probably glycosylated, secreted in male rat urine
• Foetal expression for pig, bovine and horse orthologues
• Rapid evolution in mammals
• Mix of secreted and GPI anchored homologues in human
• Human Lynx-1 modulating AChRs
• SLURP linked to skin physiology
• Caltrin/SVS VII Phospholipid binding
• Homologues involved in myelopoiesis in Xenopus and liver acute phase in rainbow trout

Page 31: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Summary of the Bioinformatic Pitfalls

• The chimeric and pre-mRNAs lead to:
– Artefactual clustering of ESTs and non-homologous gene products in UniGene
– Protein database conflicts and artefacts
– Propagation of errors in RefSeq and rat genome

• Loose ends and sequence errors in old data
• Equivocal functional annotation transitively perpetuated
• Sequence-literature links broken by gene name ambiguities
• Incorrect signal peptide annotation
• Similarity scores for Ly-6 homologues fall below those in domain databases
• Rapid evolution made orthologue assignment difficult

Page 32: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Conclusions

• Bioinformatics can help a little bit of proteomics data go a long way
• Finding quirks in database entries is definitely part of the fun but …
• Sequence anomalies can seriously confound automated annotation
• They can only be exposed or unravelled by
– transitive and broad sequence/keyword searching
– detailed examination of sequence and literature links
– understanding database building procedures
– recognising chimeras by their EST and genome matches

• Conflicting data links should ideally be resolved by new data, but judgment may have to be used

• Difficult to discriminate between in vitro artefacts and rare in vivo events

• Inferring biological meaning from database searches requires an understanding of the experiments and the in-silico analyses

• Value of Swiss-Prot is significantly enhanced by community annotation

Page 33: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Acknowledgments, Reference and Database Entries

Southan C, Cutler P, Birrell H, Connell J, Fantom KG, Sims M, Shaikh N, Schneider K. “The characterisation of novel secreted Ly-6 proteins from rat urine by the combined use of two-dimensional gel electrophoresis, microbore high performance liquid chromatography and expressed sequence tag data” Proteomics 2002 Feb;2(2):187-96.

AF198441 Rat RUP2 mRNA
UP1_RAT (P81827) Urinary protein 1 (RUP1)
UP2_RAT (P81828) Urinary protein 2 (RUP2)
UP3_RAT (P83125) Urinary protein 3 (RUP3)
RSP1_RAT (Q9QXN2) Spleen protein 1
AF198442 Rat spleen protein 1 precursor, mRNA, complete cds
P83106 PIP1 protein (PIP1) - Sus scrofa
P83107 BOP1 protein (BOP1) - Bos taurus
Q9BZG9 Ly-6 neurotoxin-like protein Lynx1 - Homo sapiens
AF321824 Human Ly-6 neurotoxin-like protein Lynx1 mRNA, partial cds

Page 34: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Human Short Ly6 Proteins

Name    Size  Chrom    Ens  ESTs  Sigpep  GPI  InterPro  Patents
LYNX1   115   8q24.3   +    +     19      91   Ly6       Curagen, Hyseq, HGS (sec), Incyte (sec), Genset (partial)
SLURP2  97    8q24.3   +    +     22      -    Ly6       Genentech (sec/tm), ZymoGenetics
RGTR43  125   8q24.3   -    +     22      103  Ly6       Genentech, HGS, Incyte
SLURP1  103   8q24.3   +    +     22      -    Ly6       HGS, ARS, Biovision (partial)
PATE    126   11q24.2  +    +     21      -    CyCPA2    Genset (sec), USDOH
LVLF31  113   11q24.2  -    +     18      -    -         None

Page 35: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Vertebrate Short Ly6 Proteins

ENSR7555   : MGKHILL----LPLGLS--------------------LLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEIC :  51
ENSR7614   : MGKHILL----LPLGLS--------------------LLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEIC :  51
ENSR7903   : MGKHILL----LPLGLS--------------------LLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEIC :  51
ENSR7837   : MGKHILL----LPLGLS--------------------LLMSSLLALQCFRCESFDSTGLCQFGRYKCQTYPGEVC :  51
ENSR7667   : MGKPILL----LPLGLS--------------------LLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEIC :  51
ENSM23855  : ----ILL----HLLGLS--------------------FLVGFLKALTCITCDRINSQGICESGEGCCQAKPGEKC :  47
ENSM48154  : MGKHILQ----LLLVLS--------------------LLVMSSQALTCITCDRINSQGICESGEGCCQAKPGEKC :  51
PIP1_pig   : MGKCLLLP--LLLVVLS--------------------SLLGFPQALECFQCQRVSASGVCESGKSFCQTQGSQQC :  53
BOP1_cow   : MAKCLLL---LLLVVLS--------------------SLLGLPQALECFQCNRVNASGVCETGGSTCQTQGSQQC :  52
ENSR11381  : MGKNILL----LLLGLS--------------------FVIGFLQALRCLECDMLNSDGICEKGNSTCEAKEDQEC :  51
ENSM39445  : --------------------------------------LAFSISALKCFQCTLFNSKGKCLFQEPPCETQNNEVC :  37
ENSM34619  : -MKNFLR-----LCLFL--------------------LCFETGFPLQCVQCQSYK-NGECATKKETCTTKPGETC :  48
ENSM34617  : MGKLLLLHFLLMQASFA--------------------LVFIQVQATVCMVCKSFK-SGHCLVGKNNCTTRYKPGC :  54
ENSM34610  : MNSVTKIS--TLLIVIL--------------------SFLCFVEGLICNSCEKSR-DSRCTMSQSRCVAKPGESC :  52
LVLF3112_h : MLVLFLLGTVFLLCPYWGEL-----------------HDPIKATEIMCYECKKYH-LGLCYGVMTSCSLKHKQSC :  57
Hep21_Chic : MKLLFVG------LALV--------------------LCVGVVEALQCKVCKYKIPYVGCFHGANETTCERRERC :  49
SLURP2_hum : MQLGTGL---LLAAVLS--------------------LQLAAAEAIWCHQCTGFG---GCSHG-SRCLR-DSTHC :  47
RGTR430_hu : MRGTRLA---LLALVLA--------------------ACGELAPALRCYVCPEPTGVSDCVTIAT-CTT-NETMC :  50
SLURP1_hum : MASRWAVQ---LLLVAA--------------------WSMGCGEALKCYTCKEPMTSASCRTITR-CKP-EDTAC :  50
LYNX1_hum  : MTPCSPD----LVVLMG----------------------LPLAQALDCHVCAYNG--DNCFNPMR-CPA-MVAYC :  45
PATE_hum   : MDKSLLLELPILLCCFRALSGSLSMRNDAVNEIVAVKNNFPVIEIVQCRMCHLQFPGEKCSRGRGICTATTEEAC :  75
             m                                           a  C  C        C      c       C

ENSR7555   : AW--VVVTTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCCSATPFCNMVHR---------------- : 101
ENSR7614   : AW--VVVTTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCCSATPFCNMVHR---------------- : 101
ENSR7903   : AW--VVVTTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCCSATPFCNMVHR---------------- : 101
ENSR7837   : AF--VIITTRDGKFVYGN--QSCA---ECNATTVEHGS--LIVSTNCFSATPFCNMVHR---------------- : 101
ENSR7667   : AW--VVVTTRDGKFVYGN--QSCA---ECIGTTVEHGS--LIISTNCCSATPFCNMVHP---------------- : 101
ENSM23855  : A---SLITLKDGKIQFGN--QRCAN--ICFTGTVQTGD--QTVKMKCCKKRSFCNEL------------------ :  95
ENSM48154  : A---SLITLKDGKIQFGN--QRCAN--ICFTGTVQTGD--QTVKMKCCKKRSFCN-------------------- :  97
PIP1_pig   : F---LRKVYEGDTVSYGH--QGCSS--LCVPMKFFRPN--VTVDFRCCHDSPFCNKF------------------ : 101
BOP1_cow   : F---LRRIFENGTLSYGH--QGCSQ--LCIPMKLFNPS--VIVEYKCCHDSPLCNKF------------------ : 100
ENSR11381  : G---ILVVSQGVDILFGM--QDCSS--HCLNKTFHHYN--LTLDFTCCHDQSLCNEF------------------ :  99
ENSM39445  : V---LWAKFEGGRFMYGF--QECSH--TCVNQTLNLRN--KRIEMKCCNDKSFCN-------------------- :  83
ENSM34619  : MIRRTWYANEIHNLQDAE--TKCTN--SCKFEEKTSGY--LTTHTYCCSHGDFCNDINLPIVMT----------- : 106
ENSM34617  : RTRNYFLFSHTGKWVHNHTELDCDK--ACMAENMYLGA--LKISTFCCKGEDFCNKYHGQVVNKNIY-------- : 117
ENSM34610  : S---TVSHFVGTKHVYSK--QMCSP--QCKEKQLNTGK--KLIYIMCCEKN-LCNSF------------------ :  99
LVLF3112_h : AVENFYILTRKGQSMYHYSKLSCMT--SCEDINFLGFT--KRVELICCDHSNYCNLPEGV--------------- : 113
Hep21_Chic : A---IIKTSLGKVTLYYQ--QGCTSALNCGRERASDAESRLTSRYSCCETD-LCNEKWDDDPTD----------- : 107
SLURP2_hum : VTTATRVLSNTEDLPLVT--KMCHI--GCPDIPSLGLG--PYVSIACCQTS-LCNHD------------------ :  97
RGTR430_hu : KTTLYSREIVYPFQGDSTVTKSCAS--KCKPSDVDGIG--QTLPVSCCN-TELCNVDGAPALNSLHCGALTLLPL : 120
SLURP1_hum : MTTLVTVEAEYPFNQSPVVTRSCSS--SCVATDPDSIG--AAHLIFCCFRD-LCNSEL----------------- : 103
LYNX1_hum  : M---TTRTYYTPTRMKVS--KSCVP--RCFETVYDGYS-KHASTTSCCQYD-LCNGTGLATPATLALAPILLATL : 111
PATE_hum   : MVG--RMFKRDGNPWLTF--MGCLK--NCADVKGIRWS-VYLVNFRCCRSHDLCNEDL----------------- : 126
                                   C     C                 Cc     CN

Page 36: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Searches Against Rat ESTs Confirmed the Three mRNAs as Chimeras

J03621

L07806

L15618

Page 37: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

mRNA Anomaly No. 4: Unspliced?

LOCUS AF368860 1197 bp mRNA 13-JUN-2001 (CDS 10..96 "MGKHILLLPLVLSLLMSSLQDSCGHEPS")

Rattus norvegicus 3' non-translated beta-F1-ATPase mRNA-binding protein mRNA, complete cds. "Identification of a liver specific cDNA clone chaperoning the differential assembly of ribonucleoprotein complexes at the 3' UTR of the mRNAs of oxidative phosphorylation"

BLAST vs Rat ESTs

RUP-4? MGKHILLLPLVLSLLMSSLLALQCIQCARIDSRGICRHDIYICHADSDEVCSWVVATTRD
       MGKHILLLPL LSLLMSSLLALQC +C   DS G C      C    DE+C+WVV TTRD
RUP-2  MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD

RUP-4? GKFVYGNQSCAECNATTVEQGSLIVSTNCCSASHFCNMVYR (ESTs AA945232, AA945121)
       GKFVYGNQSCAECNATTVE GSLIVSTNCCSA+ FCNMV+R
RUP-2  GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101

Page 38: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

RUP Homologues Expand a New Sub-family of Secreted Ly-6 Proteins

Page 39: Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

3D PSSM Fold Recognition Server