lecture-2 - pennsylvania state university · 2012-08-30 ·...
2012 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 1, Lecture 2
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center
Penn State
Get a good text editor
Desired features: syntax highlighting, line numbering, ability to view white space
• Komodo Edit
• Sublime Text
• TextMate
There are many other options.
Download the data for the lecture
The URL was sent out via email (it is also on the course webpage):
http://downloads.yeastgenome.org/curation/chromosomal_feature/saccharomyces_cerevisiae.gff
Biological file formats
Each file format represents:
1. Information – the types of knowledge stored in the file
2. Optimization – the types of operations that are easy/efficient to perform
This implies that some information may be missing from a given file format, or may not be easy to extract from it.
Tabular formats
• Many common bioinformatics data formats are column-based and tab-separated
• The first format we will deal with is GFF3 – Generic Feature Format
(search for GFF3 to see the specification for version 3)
http://www.sequenceontology.org/gff3.shtml
The GFF3 specification
GFF format: search for GFF3 → http://www.sequenceontology.org/gff3.shtml
Tab-separated with 9 columns. Undefined values are replaced with a dot (.).
1. Seqid (usually the chromosome)
2. Source (where the data is coming from)
3. Type (usually a term from the Sequence Ontology)
4. Start (interval start relative to the seqid; coordinates are 1-based)
5. End (interval end relative to the seqid, inclusive)
6. Score (the score of the feature, a floating point number)
7. Strand (+/-/.)
8. Phase (used to indicate the reading frame for coding sequences)
9. Attributes (semicolon-separated attributes → Name=ABC;ID=1)
Example attribute specification: Name=REB1;ID=YP33546
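A minimal Python sketch of reading one GFF3 row into the nine columns listed above. The helper name `parse_gff3_line` is hypothetical, and the example row is made up for illustration (only its attribute values come from the slide's example). It handles a single feature row; real files also contain `#` comment lines, which this function does not filter out.

```python
from urllib.parse import unquote

# The nine GFF3 columns, in file order.
GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]


def parse_gff3_line(line):
    """Split one tab-separated GFF3 feature row into a dict (hypothetical helper)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))

    # GFF3 coordinates are 1-based integers.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # A dot marks an undefined value.
    record["score"] = None if record["score"] == "." else float(record["score"])

    # Column 9: semicolon-separated tag=value pairs. GFF3 percent-encodes
    # reserved characters (such as ; and =) inside values, so decode them.
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if pair:  # tolerate a trailing semicolon
                tag, _, value = pair.partition("=")
                attributes[tag] = unquote(value)
    record["attributes"] = attributes
    return record


# Made-up example row (hypothetical coordinates).
row = "chrI\tSGD\tgene\t100\t250\t.\t+\t.\tName=REB1;ID=YP33546"
rec = parse_gff3_line(row)
print(rec["type"], rec["start"], rec["end"], rec["attributes"]["Name"])
# prints: gene 100 250 REB1
```

Converting `start`, `end` and `score` up front means later filtering (for example by interval length) does not have to repeat the string-to-number conversion.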
Variants of GFF – GTF 2
GTF 2 – Gene Transfer Format, same 9 columns as GFF
http://mblab.wustl.edu/GTF2.html
Differences:
1. Only a subset of types is allowed in column 3: CDS, start_codon, stop_codon and a few more
2. The attribute column format changes: a key and its value are separated by a space instead of =, and each attribute ends with a semicolon
3. Two mandatory attributes at the end of the record:
• gene_id value; – a globally unique identifier for the genomic source of the transcript
• transcript_id value; – a globally unique identifier for the predicted transcript
Example attribute specification: name "REB1"; id "YP33546";
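For comparison, here is a minimal Python sketch of reading a GTF 2 attribute column. It assumes that values never contain a semicolon, because it simply splits on `;`. The helper names are hypothetical. The example reuses the slide's attribute string, plus made-up `gene_id`/`transcript_id` values to show the mandatory-attribute check.

```python
def parse_gtf_attributes(text):
    """Parse a GTF 2 attribute column: key "value"; key "value"; ..."""
    attributes = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # the text after the final semicolon is empty
        key, _, value = item.partition(" ")  # key and value are separated by a space
        attributes[key] = value.strip().strip('"')  # drop surrounding quotes
    return attributes


def missing_required(attributes):
    """Return the mandatory GTF 2 attributes that are absent."""
    return [key for key in ("gene_id", "transcript_id") if key not in attributes]


slide_example = parse_gtf_attributes('name "REB1"; id "YP33546"')
print(slide_example)                    # prints: {'name': 'REB1', 'id': 'YP33546'}
print(missing_required(slide_example))  # prints: ['gene_id', 'transcript_id']

record = parse_gtf_attributes('gene_id "G1"; transcript_id "G1.1";')  # made-up IDs
print(missing_required(record))         # prints: []
```

The only parsing difference from the GFF3 sketch is the key/value separator: a space here, `=` in GFF3.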
What do the terms mean?
Sequence Ontology browser
Searching for X_element_combinatorial_repeat
Unix commands in this lecture
• wc, cat, head, tail, sort, cut, grep, more, clear
Handy tips
CTRL-C → interrupts any process that may be running
clear → clears the screen
up/down cursor keys → recall past commands
auto-complete → type part of the filename, then press TAB
Investigate your data
Check the head/tail of the file
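The same idea in Python, as a sketch. `head` and `tail` below are hypothetical helpers that mimic the commands' default of 10 lines. `tail` uses a bounded `deque`, so only the last `n` lines are held in memory, even for a large file.

```python
from collections import deque
from itertools import islice


def head(lines, n=10):
    """First n lines, like `head -n N` (default 10)."""
    return list(islice(lines, n))


def tail(lines, n=10):
    """Last n lines, like `tail -n N`; the deque discards older lines as it goes."""
    return list(deque(lines, maxlen=n))


demo = [f"line {i}" for i in range(1, 101)]
print(head(demo, 2))  # prints: ['line 1', 'line 2']
print(tail(demo, 2))  # prints: ['line 99', 'line 100']
```

Both helpers accept any iterable of lines, so an open file object works too. For example, `with open("saccharomyces_cerevisiae.gff") as f: print(head(f, 5))` assumes the lecture's download was saved under its original name.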
Paging data with: less (more)
• q or ESC → quits the pager
• SPACE or f → go forward to the next page
• b → go backward
• /word → search for a word
• / followed by ENTER → repeats the search for the last word
Find patterns in the file
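A rough Python analogue of `grep`, as a sketch. The hypothetical `grep` generator keeps lines that match a regular expression. Setting `invert=True` keeps the non-matching lines instead, which mirrors `grep -v`. The sample rows are made up.

```python
import re


def grep(pattern, lines, invert=False):
    """Yield lines matching a regular expression, like `grep`;
    invert=True keeps the non-matching lines, like `grep -v`."""
    regex = re.compile(pattern)
    for line in lines:
        # Keep the line when "matched" differs from "invert".
        if bool(regex.search(line)) != invert:
            yield line


# Made-up lines standing in for GFF rows.
rows = ["chrI\tSGD\tgene\t100\t250", "chrI\tSGD\tCDS\t100\t250", "# a comment"]
genes = list(grep(r"\tgene\t", rows))                # only the first row
no_comments = list(grep(r"^#", rows, invert=True))   # the two data rows
print(len(genes), len(no_comments))  # prints: 1 2
```

Anchoring the pattern between tabs (`\tgene\t`) matches the type column exactly. A bare `gene` would also match attribute text such as `Parent=gene1`.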
Connecting streams
• Input streams: entry from the keyboard or from files
• Output streams: printing to the screen or into files
Stream redirection uses the "arrow" symbols < and >
Input stream redirection from a file: < filename
Output stream redirection to a file: > filename
Redirecting to a file creates that file, or overwrites it if it already exists
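Python's `open` modes follow the same rule, which can be a handy mental model. The sketch below uses a throwaway temporary file; the name `out.txt` is arbitrary. Mode `"w"` creates or truncates the file, like `>`. Mode `"a"` appends instead, which corresponds to the shell's `>>`.

```python
import os
import tempfile

# Throwaway directory so the example never touches real data.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

with open(path, "w") as f:  # like `> out.txt`: creates the file
    f.write("first\n")

with open(path, "w") as f:  # a second `>` truncates: "first" is gone
    f.write("second\n")

with open(path, "a") as f:  # append mode, like `>> out.txt`
    f.write("third\n")

with open(path) as f:
    print(f.read(), end="")
# prints:
# second
# third
```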
Piping streams
• The pipe character | channels the output of one command into the input of the next
(on most US keyboards it is located above the ENTER key)
You can pipe multiple commands together
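Python generators can be chained the same way, with each stage consuming the previous stage's output. The sketch below roughly corresponds to `grep -v '^#' file | cut -f 3 | head -n 5`. The stage functions are hypothetical helpers, and the sample rows are made up.

```python
from itertools import islice


def drop_comments(lines):
    """Skip lines that start with '#'."""
    return (line for line in lines if not line.startswith("#"))


def cut_field(lines, field):
    """Keep one tab-separated column; field is 1-based, as in `cut -f`."""
    return (line.rstrip("\n").split("\t")[field - 1] for line in lines)


def first(lines, n):
    """Keep the first n items."""
    return islice(lines, n)


sample = [
    "##gff-version 3\n",
    "chrI\tSGD\tgene\t100\t250\t.\t+\t.\tID=a\n",
    "chrI\tSGD\tCDS\t100\t250\t.\t+\t0\tParent=a\n",
]
# Read inside-out: drop_comments feeds cut_field, which feeds first.
print(list(first(cut_field(drop_comments(sample), 3), 5)))  # prints: ['gene', 'CDS']
```

Because every stage is a generator, lines are processed one at a time as they are requested, so the whole file never has to be in memory. This is similar to how data streams through a shell pipeline.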
Piping commands
Isolating relevant parts of our file
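Here is one way to isolate the annotation rows in Python, as a sketch that assumes the goal is to keep feature lines only. The GFF3 specification allows a `##FASTA` directive, after which the rest of the file holds sequences. The hypothetical `feature_rows` helper therefore stops at that directive, and it also skips `#` comment and directive lines. The sample lines are made up.

```python
def feature_rows(lines):
    """Yield GFF3 feature rows only."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if line.startswith("#") or not line.strip():
            continue  # comments, directives and blank lines
        yield line


sample = [
    "##gff-version 3\n",
    "chrI\tSGD\tgene\t100\t250\t.\t+\t.\tID=a\n",
    "##FASTA\n",
    ">chrI\n",
    "ACGTACGT\n",
]
print(len(list(feature_rows(sample))))  # prints: 1
```

With the downloaded file this would be `with open("saccharomyces_cerevisiae.gff") as f: rows = list(feature_rows(f))`, assuming the file kept its original name.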
How many of each element?
Find out how many of each feature there are
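Here is a Python sketch of the counting step. It tallies column 3 (the feature type) with `collections.Counter` and lists the most frequent types first. It assumes the feature rows have already been separated from comments and from any `##FASTA` section. In the shell, a common idiom for the same tally is `cut -f 3 | sort | uniq -c`, which may differ from the commands used in the lecture. The sample rows are made up.

```python
from collections import Counter


def count_types(rows):
    """Count how often each feature type (column 3) occurs."""
    return Counter(row.split("\t")[2] for row in rows)


rows = [
    "chrI\tSGD\tgene\t100\t250\t.\t+\t.\tID=a",
    "chrI\tSGD\tCDS\t100\t250\t.\t+\t0\tParent=a",
    "chrI\tSGD\tgene\t400\t900\t.\t-\t.\tID=b",
]
# most_common() orders the (type, count) pairs from the highest count down.
for feature_type, count in count_types(rows).most_common():
    print(count, feature_type)
# prints:
# 2 gene
# 1 CDS
```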
Homework 2
• Create a file that lists all ontology terms present in the provided GFF file, together with a count of how many times each term occurs in the yeast genome. Sort this file by the count in reverse (descending) order (hint: man sort)
• Pick an ontology term that is unfamiliar to you, look it up in the Sequence Ontology, and paste the explanation into the homework