![Page 1: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/1.jpg)
Sequence Tracking
Deanna M. Church
Staff Scientist, NCBI
@deannachurch
Short Course in Medical Genetics 2013
Understanding your sequence context
![Page 2: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/2.jpg)
What’s in a name?
Bob Bob Bob Bob
![Page 3: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/3.jpg)
Bob
* http://howmanyofme.com
What’s in a name?
123-45-6789
![Page 4: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/4.jpg)
Bob
Miranda
Lydia
Samantha
What’s in a name?
Need more than unique identifier
Track updates/improvements
![Page 5: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/5.jpg)
chr1
Chr1
1
Chrom1
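Labels such as chr1, Chr1, 1 and Chrom1 can all refer to the same chromosome, so plain string comparison on names is fragile. The sketch below is one possible way to reduce such labels to a bare form. The helper name `normalize_chrom_name` and the list of prefixes it strips are assumptions made for illustration, not part of the talk.

```python
import re

# Assumption: only these prefixes need stripping; longer alternatives come
# first so that "Chrom1" is not left as "om1".
_PREFIX = re.compile(r"^(chromosome|chrom|chr)", re.IGNORECASE)


def normalize_chrom_name(name: str) -> str:
    """Hypothetical helper: return the bare chromosome label ('Chrom1' -> '1')."""
    return _PREFIX.sub("", name.strip())


names = ["chr1", "Chr1", "1", "Chrom1"]
print({n: normalize_chrom_name(n) for n in names})
```

Normalizing the label only removes spelling differences. It says nothing about which assembly, or which version of the sequence, the label refers to.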
![Page 6: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/6.jpg)
Mouse chrX: 34,800,000-34,890,000
NC_000086 versions: .1 .2 .3 .4 .5 .6 .7
CM001013 versions: .1 .2
![Page 7: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/7.jpg)
Mouse chrX: 35,000,000-36,000,000
chrX: MGSCv3 vs MGSCv36
![Page 8: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/8.jpg)
GenBank
Data Archives
Data in a common format
Data in a single location (and mirrored)
Most quality checked prior to deposition
Robust data tracking mechanism (accession.version)
Data owned by submitter
![Page 9: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/9.jpg)
Data tracking
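ABC14-1065514J1

| Accession.version | Phase | Gaps | Length | Date |
| --- | --- | --- | --- | --- |
| FP565796.1 | 1 | 1 | | 21-Oct-2009 |
| FP565796.2 | 1 | 0 | | 14-Oct-2010 |
| FP565796.3 | 3 | 0 | | 07-Nov-2010 |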
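The records above share one accession (FP565796) and differ only in the version suffix, which is what makes updates traceable. A minimal sketch of splitting an accession.version string and picking the latest version follows. The names `AccessionVersion` and `parse_accession_version` are hypothetical and introduced only for this example.

```python
from typing import NamedTuple


class AccessionVersion(NamedTuple):
    accession: str
    version: int


def parse_accession_version(text: str) -> AccessionVersion:
    """Hypothetical helper: split 'FP565796.3' into ('FP565796', 3)."""
    accession, sep, version = text.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"expected accession.version, got {text!r}")
    return AccessionVersion(accession, int(version))


records = ["FP565796.1", "FP565796.3", "FP565796.2"]
parsed = [parse_accession_version(r) for r in records]

# All three records are the same sequence entry at different points in time.
assert len({p.accession for p in parsed}) == 1
latest = max(parsed, key=lambda p: p.version)
print(latest)
```

Storing the version as an integer avoids the string-ordering trap where ".10" would sort before ".2".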
![Page 10: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/10.jpg)
Data Archives
Initial versions of human and mouse reference assemblies not in INSDC!!*
First human version in INSDC: GRCh37
First mouse version in INSDC: NCBI36
* But were tracked by RefSeq
![Page 11: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/11.jpg)
Data Archives
INSDC archives track INDIVIDUAL sequences
An assembly is a COLLECTION of sequences
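One way to picture the distinction is to model an assembly as a named collection that maps sequence labels to accession.version strings. The sketch below assumes this simplified model. The `Assembly` class, the `changed_sequences` helper, the assembly names and the `EXAMPLE0001.1` placeholder are all made up for illustration, and only the CM001013 accession is taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assembly:
    """Simplified, hypothetical model: a named set of versioned sequences."""
    name: str
    sequences: Dict[str, str] = field(default_factory=dict)  # label -> accession.version


def changed_sequences(old: Assembly, new: Assembly) -> List[str]:
    """Labels present in both assemblies whose accession.version differs."""
    shared = old.sequences.keys() & new.sequences.keys()
    return sorted(s for s in shared if old.sequences[s] != new.sequences[s])


# Illustrative data only: "EXAMPLE0001.1" is a placeholder, not a real record.
old = Assembly("example_v1", {"X": "CM001013.1", "MT": "EXAMPLE0001.1"})
new = Assembly("example_v2", {"X": "CM001013.2", "MT": "EXAMPLE0001.1"})

print(changed_sequences(old, new))
```

The archive tracks each sequence record on its own, so knowing that chrX changed between two assemblies requires a record of which versions belong to which collection.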
![Page 12: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/12.jpg)
More naming issues
hg19, GRCh37
mm8, MGSCv37, NCBIM37
danRer5, Zv7
![Page 13: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/13.jpg)
chr21:8,913,216-9,246,964
Zv7
![Page 14: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/14.jpg)
Zv7 chr21:8,913,216-9,246,964 vs MGSCv36 chrX
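A coordinate string only identifies a location once the assembly is attached to it. The sketch below parses a region string and keeps the assembly name as part of the value, so that the same coordinates on two assemblies never compare equal. `Region` and `parse_region` are hypothetical names, and "Zv8" is used only as a second assembly name for contrast.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    assembly: str
    chrom: str
    start: int
    end: int


# Accepts forms like "chr21:8,913,216-9,246,964" and "chrX: 34,800,000-34,890,000".
_REGION = re.compile(
    r"^\s*(?P<chrom>[^:\s]+)\s*:\s*(?P<start>[\d,]+)\s*-\s*(?P<end>[\d,]+)\s*$"
)


def parse_region(assembly: str, text: str) -> Region:
    """Hypothetical helper: parse 'chrom:start-end' for a named assembly."""
    m = _REGION.match(text)
    if m is None:
        raise ValueError(f"cannot parse region {text!r}")
    start = int(m["start"].replace(",", ""))
    end = int(m["end"].replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return Region(assembly, m["chrom"], start, end)


a = parse_region("Zv7", "chr21:8,913,216-9,246,964")
b = parse_region("Zv8", "chr21:8,913,216-9,246,964")  # same text, other assembly
print(a)
print(a == b)
```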
![Page 15: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/15.jpg)
http://www.ncbi.nlm.nih.gov/genome/assembly
GRCh37, hg19
![Page 16: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/16.jpg)
![Page 17: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/17.jpg)
Genome Browser Agreement
Submitter deposits assembly to GenBank/EMBL/DDBJ
Assembly QA
Submitter updates assembly based on QA results
Browsers pick up assembly from GenBank/EMBL/DDBJ
Assemblies must be in GenBank/EMBL/DDBJ
![Page 18: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/18.jpg)
GenBank vs RefSeq

| GenBank | RefSeq |
| --- | --- |
| Submitter owned | RefSeq owned |
| Redundancy | Non-redundant |
| Updated rarely | Curated |
| INSDC | Not INSDC |
| BRCA1: 83 genomic records, 31 mRNA records, 27 protein records | BRCA1: 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records |
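In practice the two kinds of record can usually be told apart by accession format: RefSeq accessions start with two letters and an underscore (for example NC_000086), while INSDC accessions such as CM001013 or FP565796 contain no underscore. That convention is not stated on the slide and is added here as background. The sketch below relies on it, and `is_refseq_accession` is a hypothetical helper that checks only the prefix rather than validating the whole identifier.

```python
import re

# Assumption: a two-capital-letter prefix followed by "_" marks a RefSeq record.
_REFSEQ_PREFIX = re.compile(r"^[A-Z]{2}_")


def is_refseq_accession(accession: str) -> bool:
    """Hypothetical helper: True if the accession looks like a RefSeq record."""
    return _REFSEQ_PREFIX.match(accession.strip()) is not None


for acc in ["NC_000086.7", "CM001013.2", "FP565796.3"]:
    source = "RefSeq" if is_refseq_accession(acc) else "INSDC (GenBank/EMBL/DDBJ)"
    print(acc, "->", source)
```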
![Page 19: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/19.jpg)
![Page 20: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/20.jpg)
RefSeq for Assemblies
Typical assembly edits
Addition of non-nuclear (e.g. MT) assembly units
Removal of contamination
Drop unlocalized/unplaced scaffolds
Mask contamination that is placed on chromosome (while preserving coordinate space)
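Masking differs from deletion because the sequence length, and therefore every downstream coordinate, stays the same. A minimal sketch of that idea follows. It assumes 0-based half-open intervals and `N` as the mask character, and `mask_region` is a hypothetical helper, not the procedure RefSeq actually uses.

```python
def mask_region(sequence: str, start: int, end: int, mask_char: str = "N") -> str:
    """Replace bases in the 0-based half-open interval [start, end) with mask_char.

    The returned sequence has the same length as the input, so positions
    after the masked interval keep their coordinates.
    """
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("interval is outside the sequence")
    return sequence[:start] + mask_char * (end - start) + sequence[end:]


seq = "ACGTACGTACGT"
masked = mask_region(seq, 4, 8)
deleted = seq[:4] + seq[8:]  # deletion, for contrast: later bases shift left

print(masked)                    # ACGTNNNNACGT
print(len(seq), len(masked), len(deleted))
```

Dropping a whole unplaced scaffold has no such side effect on chromosome coordinates, which is consistent with the slide listing it as a separate kind of edit.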
![Page 21: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/21.jpg)
http://www.ncbi.nlm.nih.gov/assembly/organism/9606/
Human assemblies in assembly database
![Page 22: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context](https://reader036.vdocuments.net/reader036/viewer/2022081603/5697bfd01a28abf838caa946/html5/thumbnails/22.jpg)
Take home messages
Assemblies can (and do) update!
Know what assembly you are working on
Track by accession.version, not just name
Data in INSDC databases are mirrored
RefSeq is NCBI specific