First of all: “Darnit Jim, I’m a doctor not a bioinformatician!”

Upload: lora-atkins

Post on 18-Jan-2018

DESCRIPTION

First time experience of Galaxy

TRANSCRIPT

First of all: "Darnit Jim, I'm a doctor, not a bioinformatician!"

Researcher interested in gene expression
I have obtained raw RNAseq files (FASTQ) for a set of cell lines. How can I process this data and examine my gene(s) of interest?
Option 1: do it yourself using TraIT tools, by running an available NGS workflow in Galaxy.
Option 2: ask a bioinformatician.

First time experience of Galaxy
This looks like RNA expression analysis. But I have something called a FASTQ file. I don't know about this format; where do I get such a reference?

First time experience of Galaxy
This also looks like RNA expression analysis. How do I know that the settings here are correct for my type of data? And there are many more options. Instead of a BAM file I have a FASTQ file. How do I process this?

Solution: readily available workflow
Other pipelines are in progress.

Gene expression: input parameters
Ideally, metadata on these parameters is provided by the original data owners and/or can be traced back (known for your own data; traced back for data from another person).

Trial run
For 4 colorectal cancer cell lines the FASTQ files were provided. The data owner could provide the platform, the adapter sequences and the library type. I wanted to compare these to the processed RNAseq data of prostate cell lines (the same experimental platform was used). I ran the workflow and obtained read counts (a measure of expression) for the new cell lines.

Comparison: colon and prostate
It is possible for a non-informed or little-informed user to run a Galaxy workflow and obtain results in a format that can be used in downstream analysis.

Further analysis
Usually, the comparison is tumour sample vs normal sample. edgeR is available to perform this comparison. Comparison of expression between groups is possible (e.g. colorectal cell lines vs prostate cell lines). However, when I have only cell lines, how do I answer the question: does my gene of interest show altered expression in a particular sample compared to a reference sample?

Issues
When there is no normal/reference sample in the dataset (tumour (T) only, cell lines), how do I determine altered expression of a gene of interest?
Use a general normal reference that needs to be provided for comparison? (A standard cut-off for increased or decreased expression: xxx reads = increased expression?)
Calculate a median expression for all genes of the platform and then compare the expression of one gene to the median expression of all genes (significant outliers?).
Distinguish expression of a gene in diploid vs aneuploid cells? Trouble: in most cases no ploidy status is known.

Issues
When investigating data in the data-integration platform, a query for the gene AURKA will give certain results. If one study had tumour/normal (T/N) samples and the other only T, and different methods for determining altered expression were applied, can these data be compared?
Pro: it's processed and called data you're comparing in this platform; trust the called data.
Con: I don't think it's fair to compare differently called data. If comparing such datasets, start from the beginning and treat them in the same manner: convert the T/N-analysed data to T-only or cell-line-only analysed data.
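
The median-comparison idea raised on the first Issues slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method proposed in the slides: the expression values are invented, the function names (log2_expr, robust_z) are hypothetical, and the choice of a modified z-score based on the median absolute deviation (MAD), with the commonly quoted cut-off of 3.5, is one possible way to define a "significant outlier".

```python
import math
from statistics import median


def log2_expr(values, pseudocount=1.0):
    """Log2-transform expression values; the pseudocount avoids log2(0)."""
    return {gene: math.log2(v + pseudocount) for gene, v in values.items()}


def robust_z(gene, expr):
    """Modified z-score of one gene against all genes in the same sample.

    Uses the median and the median absolute deviation (MAD) on the
    log2 scale, so a few extreme genes do not dominate the reference.
    """
    logs = log2_expr(expr)
    vals = list(logs.values())
    med = median(vals)
    mad = median(abs(v - med) for v in vals)
    if mad == 0:
        raise ValueError("MAD is zero; too many identical values to score")
    return 0.6745 * (logs[gene] - med) / mad


# Invented, already-normalised expression values for one cell line.
# Real data would contain thousands of genes; the gene names other
# than AURKA are placeholders.
sample = {
    "AURKA": 950.0,
    "GENE_A": 40.0,
    "GENE_B": 55.0,
    "GENE_C": 35.0,
    "GENE_D": 60.0,
    "GENE_E": 48.0,
    "GENE_F": 52.0,
}

CUTOFF = 3.5  # assumed threshold for calling an outlier

score = robust_z("AURKA", sample)
print("AURKA modified z-score:", round(score, 2))
print("flagged as outlier:", abs(score) > CUTOFF)
```

A score like this only says that a gene is unusually high or low relative to the other genes in the same sample. It does not say that the gene is altered relative to normal tissue, which is exactly the open question on the slide. In real data, many genes with zero counts can also drive the MAD to zero, so some filtering of unexpressed genes would probably be needed first.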