Using GC content to distinguish Phytophthora sequences from tomato sequences
Mission #1
Calculate the GC content of each sequence in the Phytophthora-tomato interactome
We will use a Perl script to accomplish this mission.
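GC content is the percentage of bases in a sequence that are G or C. Because gc.pl is not reproduced here, the Python function below is only a sketch of the calculation. It assumes GC content = (G + C) / sequence length × 100, ignores upper/lower case, and counts ambiguous bases such as N toward the length; gc.pl may handle those cases differently.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage.

    Assumptions: case is ignored, and ambiguous bases (e.g. N)
    count toward the sequence length.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    # 4 of the 10 bases are G or C
    print(gc_content("ATGCGCATTA"))  # 40.0
```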
Preparation
• Download the Perl script (gc.pl) from the class website and save it in the C:/BioDownload folder
Running the script
• Open Cygwin, Command Prompt (Vista users), or Terminal (Mac users)
• Change directory (cd) to the BioDownload folder
• Run the script, giving the input FASTA file first and the output file name second:
  perl gc.pl PhytophSeq1.txt phytoph_gc.out
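The command reads the FASTA file PhytophSeq1.txt and writes one line per sequence to phytoph_gc.out. A rough Python equivalent is sketched below under several assumptions: standard FASTA input (a ">" header line followed by one or more sequence lines), the sequence name taken as the first word of the header, and GC values written with one decimal place. The file name gc_sketch.py and all function names are hypothetical; gc.pl's own naming and rounding may differ.

```python
import sys


def gc_content(seq):
    """Percentage of G and C bases (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""  # assumption: first word is the name
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        yield name, "".join(chunks)


def write_gc_table(in_path, out_path):
    """Write 'name<TAB>GC%' for every sequence in in_path."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for name, seq in read_fasta(fin):
            fout.write(f"{name}\t{gc_content(seq):.1f}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python gc_sketch.py PhytophSeq1.txt phytoph_gc.out
    write_gc_table(sys.argv[1], sys.argv[2])
```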
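Results
In Cygwin (Windows users) or Terminal (Mac users), run:
grep --perl-regexp "\t" -c phytoph_gc.out
grep ">" -c PhytophSeq1.txt
The first command counts the lines in the output file that contain a tab (one line per sequence); the second counts the FASTA header lines (lines containing ">") in the input file, also one per sequence. Both commands should print the same number: 3921.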
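The same consistency check can be sketched in Python: count the lines containing ">" in the FASTA input and the lines containing a tab in the output table, then compare. The function names and the tiny example data are hypothetical.

```python
def count_lines_containing(lines, text):
    """Count lines that contain `text`, like `grep -c`."""
    return sum(1 for line in lines if text in line)


def check_counts(fasta_lines, table_lines):
    """Return (FASTA header count, output row count); they should be equal."""
    n_headers = count_lines_containing(fasta_lines, ">")
    n_rows = count_lines_containing(table_lines, "\t")
    return n_headers, n_rows


if __name__ == "__main__":
    fasta = [">seq1", "ATGC", ">seq2", "GGCC"]
    table = ["seq1\t50.0", "seq2\t100.0"]
    print(check_counts(fasta, table))  # (2, 2)
```

With the real files, passing `open("PhytophSeq1.txt")` and `open("phytoph_gc.out")` should give two equal counts of 3921.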
The output file has two tab-separated columns: the sequence name (column 1) and its GC content (column 2).
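Because each line holds a name and a GC value separated by a tab, the file is easy to parse outside R too. A minimal Python sketch, assuming the file has no header line and every non-blank line has exactly these two fields; the function name is hypothetical.

```python
import csv
import io


def read_gc_table(handle):
    """Parse 'name<TAB>gc' lines into a list of (name, float) tuples."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        name, gc = fields[0], float(fields[1])
        rows.append((name, gc))
    return rows


if __name__ == "__main__":
    sample = io.StringIO("seq1\t45.2\nseq2\t61.0\n")  # made-up example values
    print(read_gc_table(sample))  # [('seq1', 45.2), ('seq2', 61.0)]
```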
Mission #2
Build a histogram of the GC content values
We will use R to accomplish this mission.
R is available from the R Project website: http://www.r-project.org
[Screenshots: Mac users; all Windows users; XP users; Vista users]
getwd()
to see which folder (the working directory) R is currently using
setwd("C:/BioDownload")
to change the working directory to C:/BioDownload (Windows users)
setwd("/path/to/biodownload")
for Mac users (use the actual path to your BioDownload folder)
data <- read.table("phytoph_gc.out", sep="\t", header=FALSE)
to read the tab-separated file phytoph_gc.out, which has no header line, into a data frame named data (your file name may differ)
data[1:10,]
to show the first 10 rows of the data frame "data"
gc <- data[,2]
to copy the second column of "data" (the GC content values) into a new vector named "gc"
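R indexes rows and columns from 1, and the range 1:10 includes both ends. For anyone repeating these steps in Python, the sketch below shows the equivalent of data[1:10,] and data[,2] on a list of (name, gc) tuples; the function names and example values are made up for illustration.

```python
def first_rows(rows, n=10):
    """Like R's data[1:n, ]: R's 1:n is 1-based and inclusive, while
    Python's rows[:n] starts at 0 and stops before n, so both give n rows."""
    return rows[:n]


def column(rows, k):
    """Like R's data[, k]: take the k-th column, counting from 1 as in R."""
    return [row[k - 1] for row in rows]


if __name__ == "__main__":
    data = [("seq1", 45.2), ("seq2", 61.0), ("seq3", 38.9)]  # made-up values
    print(first_rows(data, 2))  # [('seq1', 45.2), ('seq2', 61.0)]
    print(column(data, 2))      # [45.2, 61.0, 38.9]
```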
summary(gc)
to print summary statistics of "gc": minimum, first quartile, median, mean, third quartile, and maximum
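For a numeric vector, R computes the quartiles in summary() with its default quantile method (type 7: linear interpolation between sorted values). The Python sketch below reproduces that calculation so the numbers can be cross-checked; the function names are hypothetical, and R may round the values it prints.

```python
import math


def quantile_type7(sorted_vals, p):
    """R's default (type 7) quantile of an already-sorted, non-empty list."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def r_style_summary(values):
    """Min., 1st Qu., Median, Mean, 3rd Qu., Max., as in R's summary()."""
    if not values:
        raise ValueError("no values")
    s = sorted(values)
    return {
        "Min.": s[0],
        "1st Qu.": quantile_type7(s, 0.25),
        "Median": quantile_type7(s, 0.5),
        "Mean": sum(s) / len(s),
        "3rd Qu.": quantile_type7(s, 0.75),
        "Max.": s[-1],
    }


if __name__ == "__main__":
    print(r_style_summary([1, 2, 3, 4]))
    # {'Min.': 1, '1st Qu.': 1.75, 'Median': 2.5, 'Mean': 2.5, '3rd Qu.': 3.25, 'Max.': 4}
```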
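hist(gc, breaks=58)
to draw a histogram of the values in "gc"
breaks suggests how many cells (bins) the histogram should have. The value 58 is the range of the data, 78.7 (max) - 21.2 (min) = 57.5, rounded up, so each bin covers about 1 percentage point of GC content. R treats a single number for breaks as a suggestion only and chooses "pretty" breakpoints, so the actual number of bins may differ slightly.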
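The arithmetic behind breaks=58 is the data range rounded up to a whole number of 1-unit bins. The Python sketch below computes that count and then tallies values into equal-width cells. It is not identical to R's hist(): R adjusts the breakpoints to "pretty" values, and its cells are right-closed by default (right = TRUE), whereas this sketch uses left-closed cells and puts the maximum in the last cell. The function names are hypothetical.

```python
import math


def suggested_breaks(values, bin_width=1.0):
    """Number of bins of about `bin_width` needed to span the data."""
    return math.ceil((max(values) - min(values)) / bin_width)


def equal_width_counts(values, n_bins):
    """Count values in n_bins equal-width, left-closed cells from min to max.

    The maximum value is placed in the last cell.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, n_bins - 1)] += 1
    return counts, width


if __name__ == "__main__":
    print(suggested_breaks([21.2, 78.7]))  # 58, since 78.7 - 21.2 = 57.5
    print(equal_width_counts([0, 1, 2, 3, 4], 4))  # ([1, 1, 1, 2], 1.0)
```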
hist(gc, breaks=58, xlab="GC content", ylim=c(0,400), main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
to label the x-axis, fix the y-axis range at 0 to 400, and add a two-line title (\n starts the second line)
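pdf("gc_histogram.pdf")
hist(gc, breaks=58, xlab="GC content", ylim=c(0,400), main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
dev.off()
to save the histogram to the PDF file gc_histogram.pdf in the working directory instead of drawing it on screen; dev.off() closes the PDF device so that the file is written completely.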
[Screenshot: the saved file and its location]