smrt-portal exercises

Post on 16-Dec-2016

SMRT-Portal Exercises
J Fass

UCD Genome Center Bioinformatics Core
Thursday April 16, 2015

Running SMRT-Portal in AWS

See the PacBio documentation.

We’ll be running a virtual machine (VM) in the Amazon Web Services (AWS) “Cloud” (a server farm somewhere in the region you’ve selected). This VM runs a web server, which serves you pages created by the SMRT-Portal application.

UC Davis Genome Center | Bioinformatics Core | J Fass SMRT-Portal Exercises 2015-04-16

Launch an m3.2xlarge instance using ami-953fddd1.

Generate or re-use a key pair - you will need it!
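
If you prefer the command line to the EC2 console, the launch can be sketched with the AWS CLI. This is only a sketch: it assumes the AWS CLI is installed and configured for the region that hosts the AMI, and that the key pair name you pass in already exists. The function name launch_cmd is hypothetical, and it only prints the command so you can review it before running it yourself.

```shell
# Print (not run) an AWS CLI command that would launch the workshop instance.
# Assumption: the key pair name passed in already exists in your AWS account.
launch_cmd() {
  key_name=$1
  printf 'aws ec2 run-instances --image-id %s --instance-type %s --key-name %s --count 1\n' \
    "ami-953fddd1" "m3.2xlarge" "$key_name"
}
```

For example, launch_cmd myWorkshopKey prints the command for a key pair named myWorkshopKey (a placeholder). The instance's security group also has to allow inbound traffic on the ports used in these exercises (22 for SSH, 8080 for the portal).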

Once the instance is running, find its public IP address (#.#.#.#), and open a browser tab with the URL: #.#.#.#:8080/smrtanalysis … or … #.#.#.#:8080/smrtportal
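
The URL can also be built and probed from a terminal. A minimal sketch, assuming the portal is served over plain HTTP on port 8080 and that curl is installed; the function names portal_url and portal_ready are hypothetical.

```shell
# Build the SMRT Portal URL for a given public IP address.
# Assumption: the portal is served over plain HTTP on port 8080.
portal_url() {
  printf 'http://%s:8080/smrtportal\n' "$1"
}

# Return success if the portal answers within 10 seconds.
# A freshly booted instance may need some time before this succeeds.
portal_ready() {
  curl --silent --fail --output /dev/null --max-time 10 "$(portal_url "$1")"
}
```

Call portal_url with your instance's public IP address to get a link you can paste into the browser; portal_ready with the same address reports whether the page is reachable yet.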

On a “vanilla” PacBio SMRT-Portal instance (U.S. East / N. Virginia), you would need to create one administrator account. This AMI already has one, but feel free to change the password, add non-admin accounts, etc.

user: administrator
pwd: 5MRT-P0rtal

Note: a password needs at least one symbol, at least one number, and more than 8 characters.
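
The password rule can be checked before you type a new password into the portal. A minimal sketch in shell, assuming “symbol” means any character that is not a letter or a digit; the function name password_ok is hypothetical, and the portal itself may apply further rules.

```shell
# Return success if a password satisfies the rule on this slide:
# at least one symbol, at least one number, and more than 8 characters.
# Assumption: a "symbol" is any character that is not a letter or digit.
password_ok() {
  pw=$1
  [ "${#pw}" -gt 8 ] || return 1
  case $pw in *[0-9]*) ;; *) return 1 ;; esac
  case $pw in *[!A-Za-z0-9]*) ;; *) return 1 ;; esac
  return 0
}
```

The administrator password shown above passes this check; a password such as password1 would not, because it has no symbol.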

Log in as administrator (special user), then create separate accounts if desired.

How I imported 8 SMRT Cells (E coli)

SSH to AWS instance

ssh -i ~/.ssh/yourKey.pem ubuntu@54.193.130.19

ssh: command
-i ~/.ssh/yourKey.pem: option block (supplies private key in this case)
ubuntu@54.193.130.19: destination (username@computername)

… (or use PuTTY) …
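
One common stumbling block: ssh refuses a private key file that other users on your computer can read. A minimal sketch of restricting the key first; the function name protect_key is hypothetical, and the key path and address are the ones shown above.

```shell
# Restrict a private key to owner read-only, since ssh refuses keys
# that other users can read.
protect_key() {
  chmod 400 "$1"
}

# Usage (key path and address as shown on the slide):
#   protect_key ~/.ssh/yourKey.pem
#   ssh -i ~/.ssh/yourKey.pem ubuntu@54.193.130.19
```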

PacBio Public Datasets

https://github.com/PacificBiosciences/DevNet/wiki/Datasets

Look for “Data supporting publications” … then look for the first MG1655 xml and bas.h5 files …

Enter “dropbox” directory

cd /opt/smrtanalysis/userdata/inputs_dropbox

cd: command
/opt/smrtanalysis/userdata/inputs_dropbox: destination directory

Pull in data

mkdir MG1655

cd MG1655

wget [xml file link]

mkdir Analysis_Results

cd Analysis_Results

wget [bas.h5 file link, + bax.h5’s if present]

mkdir: command, then the directory to create
cd: command, then the destination
wget: command, then the source
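
The steps above can be wrapped in two small helpers so they are easy to repeat for each cell. This is a sketch, not the author's script: the function names make_cell_dirs and fetch_into are hypothetical, and the download URLs stay whatever links you copied from the datasets page.

```shell
# Create the per-cell layout used above under a dropbox directory:
#   <dropbox>/<cell_name>/                  (holds the xml file)
#   <dropbox>/<cell_name>/Analysis_Results/ (holds the bas.h5 / bax.h5 files)
make_cell_dirs() {
  dropbox=$1
  cell_name=$2
  mkdir -p "$dropbox/$cell_name/Analysis_Results"
}

# Download one file into a directory; the URL is whatever link you copied.
fetch_into() {
  dest_dir=$1
  url=$2
  wget --directory-prefix="$dest_dir" "$url"
}
```

For example, make_cell_dirs /opt/smrtanalysis/userdata/inputs_dropbox MG1655 creates both directories in one step; fetch_into is then called once per link, with the MG1655 directory for the xml file and its Analysis_Results subdirectory for the h5 files.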

Import SMRT Cell data

Back in SMRT Portal, click through “Home” (upper left), then “Import and Manage” (third image), then “Import SMRT Cells.”

Import another SMRT Cell (exercise)

SSH to AWS instance

ssh -i ~/.ssh/yourKey.pem ubuntu@54.193.130.19

ssh: command
-i ~/.ssh/yourKey.pem: option block (supplies private key in this case)
ubuntu@54.193.130.19: destination (username@computername)

… (or use PuTTY) …

PacBio Public Datasets

https://github.com/PacificBiosciences/DevNet/wiki/Datasets

look for “E. coli size selected 20kb library” …

Find the SMRT Cell data files “tarball,” and copy the link (don’t download; you’ll break our wireless!).

Feeding Data to the SMRT-Portal

Back in a shell (terminal) on your instance, navigate to SMRT-Portal’s input dropbox.

cd /opt/smrtanalysis/userdata/inputs_dropbox

wget [link]

mkdir Ecoli20kb

cd Ecoli20kb

mkdir Analysis_Results

tar -xzvf ecoliK12.tar.gz
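
Before scanning in SMRT Portal, it may help to confirm that the unpacked files ended up in the same layout as the MG1655 cells: an xml file in the cell directory and h5 files under Analysis_Results. A minimal sketch, assuming that layout is what the import expects; the function name cell_layout_ok is hypothetical.

```shell
# Return success if a cell directory has an xml file at its top level
# and at least one .h5 file under Analysis_Results.
cell_layout_ok() {
  cell_dir=$1
  ls "$cell_dir"/*.xml >/dev/null 2>&1 || return 1
  ls "$cell_dir"/Analysis_Results/*.h5 >/dev/null 2>&1 || return 1
}
```

Running cell_layout_ok Ecoli20kb from inside the dropbox directory returns a non-zero status if files need to be moved into place first.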

Back in SMRT Portal, click through “Home” (upper left), then “Import and Manage” (third image), then “Import SMRT Cells.”

From Home, go to Import and Manage, then [Import] SMRT Cells, to reach the import page. Select the directory, and click Scan.

HGAP Assembly

Running HGAP

Click Design Job, then Create New, and work through the design wizard (I usually select “display all protocols”). You should see 9 SMRT Cells available (we just imported the 9th).

Select the “RS_HGAP_Assembly.3” protocol from the drop-down menu, and enter a name and (if desired) comments. Select the 20kb cell and click the right arrowhead to add it to the job you’re designing, then click Save and Start!

Early results just assess reads, subreads ...

Final results include pre-assembly, realigned reads, etc.

HGAP output

Find the Polished Assembly Fasta link, then right-click and choose Save link as … (this lets you pick a file name and avoid a troublesome default).
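
Once the file is saved, a quick sanity check is to count the contigs and total bases. A minimal sketch with awk; the function name fasta_summary is hypothetical. For this E. coli data, a small number of contigs totalling roughly 4.6 Mb would be in line with the size of the MG1655 reference genome.

```shell
# Print the number of sequences and the total number of bases in a FASTA file.
fasta_summary() {
  awk '
    /^>/ { n_seqs++; next }                        # header line: new sequence
    { gsub(/[ \t\r]/, ""); total += length($0) }   # sequence line: add bases
    END { printf "%d sequences, %d bases\n", n_seqs, total }
  ' "$1"
}
```

Call it with the name you gave the downloaded file, for example fasta_summary polished_assembly.fasta (the file name here is a placeholder).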

Notice the BAM and BAI links; these allow you to view the original reads aligned back to the assembly (e.g. in IGV).

Check assembly via homology

Using Mauve, we’ll align our assembled genome to the trusted E. coli K-12 MG1655 reference assembly, from GenBank (link).

Launch Mauve, then select File → Align with progressiveMauve. Click Add Sequence to add the GenBank reference, then again to add our assembly. Then click Align and choose a place to save the output.

(see Mauve site for details on viewer, etc. … we’ll explore during Workshop)

Check for circularity (if appropriate)

Launch Gepard, select the polished genome assembly file twice (once for the horizontal axis, once for the vertical), then create the dotplot.

Looks fine, right? But the overlaps will be on the size scale of the reads … not visible at this scale.

Use the Advanced mode, Plot tab, to specify the first ~20kb on the horizontal, and the last ~20kb on the vertical. Then Update dotplot.
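
An alternative to entering coordinates in Gepard is to cut the two ends out of the assembly beforehand and load them as the two sequences. A minimal sketch with awk, assuming the first FASTA record in the file is the contig of interest; the function name fasta_end and the file names in the usage note are hypothetical.

```shell
# Print the first or last N bases of the first record in a FASTA file.
# Arguments: <fasta file> <N> <head|tail>
fasta_end() {
  fasta=$1; n=$2; which_end=$3
  awk -v n="$n" -v which_end="$which_end" '
    /^>/ { if (seen) exit; seen = 1; next }   # stop at the second record
    { sub(/\r$/, ""); seq = seq $0 }          # join the sequence lines
    END {
      start = (which_end == "tail") ? length(seq) - n + 1 : 1
      if (start < 1) start = 1                # contig shorter than N
      print ">" which_end "_" n
      print substr(seq, start, n)
    }' "$fasta"
}
```

For example, fasta_end polished_assembly.fasta 20000 head > first20kb.fasta and fasta_end polished_assembly.fasta 20000 tail > last20kb.fasta give two small files to load on the horizontal and vertical axes.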

Alignment / Resequencing Protocols

Align to your own reference

In SMRT-Portal, go Home, then Import and Manage, then Reference Sequences. Select New to upload our downloaded reference (note there’s also a Scan option; to use it, first upload the file to /opt/smrtanalysis/userdata/references_dropbox/).
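
For the Scan route, the reference has to reach the instance first, for instance with scp. A minimal sketch that only prints the command for review; it assumes the same key and ubuntu login used for SSH earlier, and that this user is allowed to write to the references dropbox (not confirmed here). The function name upload_reference_cmd is hypothetical.

```shell
# Print (not run) an scp command that would copy a reference FASTA
# into the portal's references dropbox on the instance.
# Arguments: <private key> <local fasta> <instance public IP>
upload_reference_cmd() {
  key=$1; fasta=$2; host=$3
  printf 'scp -i %s %s ubuntu@%s:/opt/smrtanalysis/userdata/references_dropbox/\n' \
    "$key" "$fasta" "$host"
}
```

Review the printed line, then paste it into your terminal to run the copy.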

Design a job using the same reads and the RS_Resequencing.1 protocol. Specify your uploaded reference sequence, save, and start the job. (I’m using E. albertii in this case, RefSeq ID NZ_CP007025.1.)

Viewing Read Alignments with IGV
