
Post on 21-Jan-2016


Page 1: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context

Sequence Tracking

Deanna M. Church Staff Scientist, NCBI

@deannachurch Short Course in Medical Genetics 2013

Understanding your sequence context

Page 2:

What’s in a name?

Bob Bob

Bob Bob

Page 3:

Bob *

* http://howmanyofme.com

What’s in a name?

123-45-6789

Page 4:

Bob

Miranda

Lydia

Samantha

What’s in a name?

Need more than a unique identifier

Track updates/improvements

Page 5:

chr1

Chr1

1

Chrom1

Page 6:

Mouse chrX: 34,800,000-34,890,000

NC_000086.1, .2, .3, .4, .5, .6, .7

CM001013.1, .2

Page 7:

Mouse chrX: 35,000,000-36,000,000

X

MGSCv3 MGSCv36

Page 8:

GenBank

Data Archives

Data in a common format

Data in a single location (and mirrored)

Most quality checked prior to deposition

Robust data tracking mechanism (accession.version)

Data owned by submitter

Page 9:

Data tracking

ABC14-1065514J1

Gaps   Phase   Length   Date

FP565796.1   1   1   21-Oct-2009

FP565796.2   1   0   14-Oct-2010

FP565796.3   3   0   07-Nov-2010
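
As a minimal sketch of what accession.version tracking allows, the snippet below splits the three records on this slide into an accession and a version number and picks the newest one. The names `SeqId`, `parse_accession_version`, and `latest` are illustrative assumptions, not part of any NCBI tool.

```python
from typing import Iterable, NamedTuple


class SeqId(NamedTuple):
    accession: str
    version: int


def parse_accession_version(identifier: str) -> SeqId:
    """Split 'ACCESSION.VERSION' at the last dot; reject anything else."""
    accession, sep, version = identifier.rpartition(".")
    if not sep or not accession or not version.isdecimal():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return SeqId(accession, int(version))


def latest(identifiers: Iterable[str]) -> SeqId:
    """Return the highest version among identifiers that share one accession."""
    parsed = [parse_accession_version(i) for i in identifiers]
    if len({p.accession for p in parsed}) != 1:
        raise ValueError("identifiers refer to more than one accession")
    return max(parsed, key=lambda p: p.version)


# The three versions of the clone record listed on the slide.
records = ["FP565796.1", "FP565796.2", "FP565796.3"]
newest = latest(records)
print(f"{newest.accession}.{newest.version}")
```

Comparing versions numerically rather than as strings matters once a record passes version 9, since "10" sorts before "9" as text.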

Page 10:

Data Archives

Initial versions of human and mouse reference assemblies not in INSDC!!*

First human version in INSDC: GRCh37

First mouse version in INSDC: NCBI36

* But were tracked by RefSeq

Page 11:

Data Archives

INSDC archives track INDIVIDUAL sequences

An assembly is a COLLECTION of sequences
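
One way to picture the distinction is sketched below: each sequence carries its own accession.version, while an assembly is a named, versioned set of those sequences. Everything here is an assumption made for illustration. The `Assembly` class, the assembly identifiers `EXAMPLE_000000001.1` and `.2`, and the sequence identifiers `SEQ_A.1`, `SEQ_B.1`, and `SEQ_B.2` are invented placeholders, not real records.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Assembly:
    name: str
    accession_version: str       # placeholder identifier for the whole collection
    sequences: FrozenSet[str]    # accession.version of each member sequence

    def changed_sequences(self, other: "Assembly") -> FrozenSet[str]:
        """Sequence identifiers present in exactly one of the two assemblies."""
        return self.sequences ^ other.sequences


# Two hypothetical releases: one member sequence was updated between them.
old = Assembly("demo v1", "EXAMPLE_000000001.1", frozenset({"SEQ_A.1", "SEQ_B.1"}))
new = Assembly("demo v2", "EXAMPLE_000000001.2", frozenset({"SEQ_A.1", "SEQ_B.2"}))

print(sorted(old.changed_sequences(new)))
```

Tracking the collection separately from its members makes it possible to say which sequences changed between two releases of the same assembly.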

Page 12:

hg19   GRCh37

mm8   MGSCv37   NCBIM37

danRer5   Zv7

More naming issues
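
A small sketch of how such synonyms might be reconciled before comparing coordinates from two sources. The alias table holds only two of the name pairs from this slide, and `canonical_name` and `same_assembly` are hypothetical helpers; a real pipeline would more likely resolve names against the assembly database and compare accession.version values.

```python
# Browser-style names mapped to the assembly names paired with them on the slide.
ALIASES = {
    "hg19": "GRCh37",
    "danRer5": "Zv7",
}


def canonical_name(name: str) -> str:
    """Return the assembly name for a known alias, or the name unchanged."""
    return ALIASES.get(name, name)


def same_assembly(a: str, b: str) -> bool:
    """True when two labels resolve to the same assembly name."""
    return canonical_name(a) == canonical_name(b)


print(same_assembly("hg19", "GRCh37"))
print(same_assembly("hg19", "Zv7"))
```

A name that is missing from the table passes through unchanged, so an unknown label never silently matches a known assembly.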

Page 13:

chr21:8,913,216-9,246,964

Zv7

Page 14:

Zv7 chr21:8,913,216-9,246,964 vs MGSCv36 chrX

Page 15:

http://www.ncbi.nlm.nih.gov/genome/assembly

GRCh37   hg19

Page 16:

Page 17:

Genome Browser Agreement

Submitter deposits assembly to GenBank/EMBL/DDBJ

Assembly QA

Submitter updates assembly based on QA results

Browsers pick up assembly from GenBank/EMBL/DDBJ

Assemblies must be in GenBank/EMBL/DDBJ

Page 18:

GenBank vs RefSeq

GenBank: Submitter owned; redundancy; updated rarely; INSDC

RefSeq: RefSeq owned; non-redundant; curated; not INSDC

BRCA1 in GenBank: 83 genomic records, 31 mRNA records, 27 protein records

BRCA1 in RefSeq: 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records

Page 19:

Page 20:

RefSeq for Assemblies

Typical assembly edits

Addition of non-nuclear (e.g. MT) assembly units

Removal of contamination

Drop unlocalized/unplaced scaffolds

Mask contamination that is placed on a chromosome (while preserving coordinate space)

Page 21:

http://www.ncbi.nlm.nih.gov/assembly/organism/9606/

Human assemblies in assembly database

Page 22:

Take home messages

Assemblies can (and do) update!

Know what assembly you are working on

Track by accession.version, not just name

Data in INSDC databases are mirrored

RefSeq is NCBI specific