The Refgene Database

Upload: callum-malone

Post on 30-Dec-2015

TRANSCRIPT

Page 1: The Refgene Database

The Refgene Database

Page 2: The Refgene Database

Currently we use Google Spreadsheets to Track Reference Genes

Page 3: The Refgene Database

Currently we use Google Spreadsheets to Track Reference Genes

Page 4: The Refgene Database

Problems with Google Spreadsheets

• Occasional errors picking a gene previously selected

• Every group has their own spreadsheet
  - Makes cross-access difficult
  - They are not integrated in any real way

• They are hard to maintain
  - Every month they are updated by hand
  - Hand editing leads to mistakes

Page 5: The Refgene Database

It would be Easier to Have a Database to Keep Track of Reference Genes

• Data from all reference genomes would be integrated and searchable

• They could be automatically updated
  - New genes and their homologs in every species could populate automatically from Kara’s homology compute.
  - MODs could provide database reports to update stats on annotation progress

• Stats for pubs and grant updates could be retrieved easily

Page 6: The Refgene Database

Proposed Interfaces

• Refgene interface: Used to set new target genes each month.
  - As genes are targeted, homologs are automatically populated from P-POD clusters

• MOD interface: Used by MOD curators
  - Allows MOD curators to hand-modify individual records

Page 7: The Refgene Database

Fields Required

• homolog gene identifiers for a MOD

• Date that the gene was deemed comprehensively annotated

• Date the gene was chosen as a reference gene
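The fields listed above could be held in a record along the following lines. This is only a minimal Python sketch, not a schema taken from the slides: the class name `HomologRecord`, the field names, and the use of `None` for a date that has not been set yet are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HomologRecord:
    """One homolog of a reference (target) gene in one MOD (hypothetical layout)."""
    mod: str                 # MOD that owns the identifier, e.g. "SGD"
    gene_id: str             # homolog gene identifier for that MOD
    target_gene: str         # reference gene this homolog belongs to
    date_chosen: date        # date the gene was chosen as a reference gene
    date_comprehensive: Optional[date] = None  # None until deemed comprehensively annotated

    def is_comprehensive(self) -> bool:
        """True once a comprehensive-annotation date has been recorded."""
        return self.date_comprehensive is not None


# Placeholder values in the style of the slides.
rec = HomologRecord("SGD", "SGD02541231", "RNR1", date(2008, 6, 1))
print(rec.is_comprehensive())
```

Keeping the comprehensive-annotation date optional lets one record represent both "chosen but not finished" and "finished" genes without a separate status flag.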

Page 8: The Refgene Database

Functionality that should be on the Ref. Gen. Curation Home Page

1. Log in

2. Enter new curation target genes

3. Homologs:
   - Upload orthologs for those genes
   - Manually add orthologs
   - Enter 'curation status'

4. Generate reports

Page 9: The Refgene Database

The Database Should also Accept Automated Loads from MODs

Curation status

Curation target genes

MOD ID

SGD02541231

SGD02541231

SGD02541231

SGD02541231

SGD12334234

SGD21314434

Page 10: The Refgene Database

1. Curator log in - 1

Login:

Password:

Curator Login
- Once logged in, the curator name and MOD can be assumed anywhere that data is needed.
- Also prevents random folks from editing, and should probably also restrict editing capability to only homologs from the species you are a curator of?
- We all still need to be able to add new target genes regardless of our MOD, however.
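The access rules floated in the login notes could look roughly like this in code. The slide poses the species restriction as an open question, so treat this as one possible reading rather than a decided policy; the `Curator` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Curator:
    name: str  # login name, e.g. "dhowe"
    mod: str   # the curator's MOD, e.g. "ZFIN"


def can_edit_homolog(curator: Curator, homolog_mod: str) -> bool:
    """Assumed rule: a curator may edit only homologs belonging to their own MOD."""
    return curator.mod == homolog_mod


def can_add_target(curator: Curator) -> bool:
    """Any logged-in curator may add new target genes, regardless of MOD."""
    return True


user = Curator("dhowe", "ZFIN")
print(can_edit_homolog(user, "ZFIN"))  # True
print(can_edit_homolog(user, "SGD"))   # False
print(can_add_target(user))            # True
```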

Login

Reference Genomes Snapshot
Target genes: 275

Species           Homologs   No homolog   Comprehensive
A.thaliana        220        10           190
C.elegans         200        30           189
D.rerio           175        35           170
D.discoideum      250        7            200
D.melanogaster    185        19           160
E.coli            100        25           98
G.gallus          250        19           120
H.sapiens         274        2            100
Etc…              …          …

86%

To Curation Central
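A snapshot with per-species counts like the one on this slide could be tallied from homolog records roughly as follows. This is a sketch under assumptions: the tuple layout, the use of `None` to mark a "no homolog" call, and counting a "no homolog" call as comprehensive when it is flagged so are all choices made here, and the rows are made up.

```python
from collections import defaultdict

# (species, gene symbol or None, comprehensively curated?) ... made-up rows
records = [
    ("D.rerio", "pola", False),
    ("D.rerio", "rnra", True),
    ("M.musculus", None, True),   # None marks a "no homolog" call
    ("M.musculus", "RNR1", True),
]


def snapshot(rows):
    """Tally homologs, 'no homolog' calls, and comprehensive records per species."""
    stats = defaultdict(lambda: {"homologs": 0, "no_homolog": 0, "comprehensive": 0})
    for species, gene, comprehensive in rows:
        entry = stats[species]
        if gene is None:
            entry["no_homolog"] += 1
        else:
            entry["homologs"] += 1
        if comprehensive:
            entry["comprehensive"] += 1
    return dict(stats)


for species, counts in sorted(snapshot(records).items()):
    print(species, counts["homologs"], counts["no_homolog"], counts["comprehensive"])
```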

Page 11: The Refgene Database

1. Curator log in - 2

Logged in: doughowe | ZFIN

Search Targets
Gene Symbol or Entrez Gene ID: [Search]

Upload New Targets
Target Completion Date (MM-YYYY): 06-2008
Targets TAB file: [Browse]
[Upload]

Upload orthologs
Select month: pull-down list

View reports
by organism: drop down list [GO]
by gene: enter gene symbol or MOD ID

Reports:
• TO DO List (not comprehensive)
• ISS could be added
• Potential outliers

Access to reports could be done a couple of ways.

Curation Targets [view all]

Gene Name     Completion target date    homologs               Report
[can edit]    [can edit]                View/enter homologs    View Curation Report
[can edit]    [can edit]                View/enter homologs    View Curation Report
[can edit]    [can edit]                View/enter homologs    View Curation Report
[can edit]    [can edit]                View/enter homologs    View Curation Report

Search result takes you to the homolog Add/Update page shown on the bottom of slide 13 for the specific target gene located in the search. If the search locates more than one gene, it shows a list of these genes, each linking to the homolog View/Add/Update page for that gene.

Page 12: The Refgene Database

2. Upload New Targets - 1

• Check that all required data is included

• Check that no genes on the new targets list have been a target or called as homologs previously. If so, reject the load and alert the curator to any/all duplications so they can select new targets and try again.
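The two checks above could be sketched as a single validation pass over the uploaded rows. This is a minimal illustration, not the project's loader: the column names in `REQUIRED`, the function name, and the idea of passing previously used IDs as a set are assumptions, and the IDs in the example are made up.

```python
REQUIRED = ("gene_name", "mod_id", "target_date")  # assumed column names


def validate_upload(rows, already_used):
    """Return a list of error strings; an empty list means the load can proceed.

    rows: list of dicts, one per new target gene.
    already_used: set of MOD IDs that were previously targets or homologs.
    """
    errors = []
    seen = set()  # IDs seen earlier in this same upload
    for i, row in enumerate(rows, start=1):
        missing = [field for field in REQUIRED if not row.get(field)]
        if missing:
            errors.append(f"row {i}: missing {', '.join(missing)}")
            continue
        mod_id = row["mod_id"]
        if mod_id in already_used or mod_id in seen:
            errors.append(f"{row['gene_name']} ({mod_id}) has already been selected")
        seen.add(mod_id)
    return errors


rows = [
    {"gene_name": "RNR1", "mod_id": "ID0001", "target_date": "06-2008"},
    {"gene_name": "CDC2", "mod_id": "ID0002", "target_date": "06-2008"},
    {"gene_name": "ADE4", "mod_id": "", "target_date": "06-2008"},
]
errors = validate_upload(rows, already_used={"ID0002"})
for message in errors:
    print(message)
```

Collecting every problem before rejecting the load matches the slide's wish to alert the curator to any/all duplications at once, so they can fix the whole list in one pass.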

Page 13: The Refgene Database

2. Upload New Targets - 2

Upload New Targets
Target Completion Date (MM-YYYY): 06-2008
Upload Targets TAB file: [Browse]
[Upload]

Curator enters target date here, and that info is applied to the new table.

option 1

Gene Name              MOD ID             Completion target date
[edit/autofill] (A)    [edit/autofill]    06-2008
[edit/autofill]        [edit/autofill]    06-2008
[edit/autofill]        [edit/autofill]    06-2008
[edit/autofill]        [edit/autofill]    06-2008

option 2

load file:

MOD ID
SGD02541231
SGD02541231
SGD02541231
SGD02541231
SGD12334234
SGD21314434

[Upload]

Gene Name    MOD ID         Completion target date
RNR1         SGD02541231    06-2008
CDC2         SGD02541231    06-2008
ADE4         SGD02541231    06-2008
POL6         SGD02541231    06-2008

(A) We need to be able to enter either an ID or a gene name (in the same or in a different column).
(B) Need a check for genes already selected.
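Requirement (A), accepting either an ID or a gene name in the same field, could be handled by trying the entry as an ID first and falling back to a symbol lookup. The sketch below assumes a simple in-memory mapping from MOD ID to symbol as a stand-in for whatever lookup the database would provide; the function name and the sample IDs are made up. Because the slides note elsewhere that some gene symbols are not unique, an ambiguous symbol is rejected rather than guessed.

```python
def resolve_target(entry, genes):
    """Resolve a curator's entry to (mod_id, symbol).

    genes: dict mapping MOD ID -> gene symbol (hypothetical stand-in lookup).
    Raises ValueError if the entry is unknown or the symbol is ambiguous.
    """
    entry = entry.strip()
    if entry in genes:  # the entry is an ID
        return entry, genes[entry]
    matches = [(gene_id, symbol) for gene_id, symbol in genes.items() if symbol == entry]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown gene: {entry!r}")
    ids = ", ".join(gene_id for gene_id, _ in matches)
    raise ValueError(f"ambiguous symbol {entry!r}, matches: {ids}")


genes = {"ID0001": "RNR1", "ID0002": "CDC2", "ID0003": "CDC2"}
print(resolve_target("ID0001", genes))
print(resolve_target("RNR1", genes))
try:
    resolve_target("CDC2", genes)
except ValueError as exc:
    print(exc)
```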

Page 14: The Refgene Database

2. Upload New Targets - 3

[Upload]

Gene Name    MOD ID         Completion target date
RNR1         SGD02541231    06-2008
CDC2         SGD02541231    06-2008
ADE4         SGD02541231    06-2008
POL6         SGD02541231    06-2008
etc…         etc…           etc…

Possible results:

Your upload was successful!

Error: CDC2 (SGD02541231) has already been selected. Please go back and replace this entry.

Page 15: The Refgene Database

3. Upload orthologs - 1

Logged in: doughowe | ZFIN

Search Targets
Gene Symbol or Entrez Gene ID: [Search]

Upload New Targets
Target Completion Date (MM-YYYY): 06-2008
Targets TAB file: [Browse]
[Upload]

Upload orthologs
Select month: pull-down list [GO]

View reports
by organism: drop down list [GO]
by gene: enter gene symbol or MOD ID

Reports:
• TO DO List (not comprehensive)
• ISS could be added
• Potential outliers

[view all]

Page 16: The Refgene Database

3. Orthologs

Upload orthologs

Select month: 06-2008 GO

S. cerevisiae E. coli H. sapiens D. discoideum D. rerio D. melanogaster M. musculus etc

RNR1|SGD02541231 nrdA|P00832 RNR1|P34234 rnrA|DDB098908 rnra|DR23423 rnrA|DM8787 RNR1|MGI:2332

CDC2|SGD02541231 no ortholog found

PK12 cdc22

ADE4|SGD02541231 adeS|P32434

POL6|SGD02541231 polR|P3434

etc… etc…

loads calculated data from P-POD and other available methods
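The ortholog table on this slide shows each entry as `symbol|ID`, with "no ortholog found" where nothing was called. A loader would need to split such cells; here is a small sketch of that step. The function name and the decision to return `None` for "no ortholog found" are assumptions, not something the slides specify.

```python
def parse_cell(cell):
    """Parse one 'symbol|ID' cell; return None for 'no ortholog found'."""
    text = cell.strip()
    if text.lower() == "no ortholog found":
        return None
    # partition splits on the first "|" only, so IDs such as "MGI:2332" stay intact
    symbol, sep, gene_id = text.partition("|")
    if not sep or not symbol or not gene_id:
        raise ValueError(f"expected 'symbol|ID', got {cell!r}")
    return symbol, gene_id


print(parse_cell("nrdA|P00832"))
print(parse_cell("RNR1|MGI:2332"))
print(parse_cell("no ortholog found"))
```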

Page 17: The Refgene Database

View/Enter/Edit Homolog page: Target Gene: POLA (H.Sap)

Target Gene Symbol    Completion target date    Homologues             Report
POLA P09884           Aug 2006                  View/enter homologs    View Curation Report
ADH1A P07327          Nov 2007                  View/enter homologs    View Curation Report
GOT1 P17174           Jan 2008                  View/enter homologs    View Curation Report
GOT2 P00505           Jan 2008                  View/enter homologs    View Curation Report

Clicking View/Enter homologs goes to a new target-gene-specific page to edit/add homologs.

Organism    Gene Symbol    Comprehensive curation (date)    Curator           Note
D.mel       pol1a          1/20/2008                        [Curator name]    [note]
D.rer       pola           -                                [Curator name]    [note]
M.mus       No homolog     4/10/2007                        [Curator name]    [note]
C.ele       pola           -                                [Curator name]    [note]

Homology Determination Methods (column headers): P-POD, InParanoid, TreeFam, HomoloGene, OrthoMCL, Manual Analysis, Published homolog

Add row: Species PD | curator enters symbol | curation date | curator name PD | [note] | [Add]

This is an interface to support adding individual homologs to specific target genes as well as editing previously added homologs. More detail on the next slide.

Page 18: The Refgene Database

View/Enter/Edit Homolog page: Target Gene: POLA (H.Sap)

Columns: Organism | Gene | Comprehensive curation (date) | Curator | Homology Determination Methods | [note]

Homology Determination Methods (column headers): P-POD, InParanoid, TreeFam, HomoloGene, OrthoMCL, Manual Curation, Published homolog

Organism pull-down:
H. sapiens
C. elegans
D. discoideum
D. melanogaster
E. coli
M. musculus
R. norvegicus
S. pombe
Etc…

Entry row: curator enters symbol/ID | curation date | curator name | [note] | [Submit]

Notes on the fields:

- Organism: pull down menu OR organism already chosen based upon curator log in name.
- Gene: Curators enter gene symbols OR MOD object IDs. TAIR gene symbols are not unique.
- Curator: pull down menu listing curators. If user login is supported, this column could probably be dropped and would be assumed behind the scenes based on the user login. Currently logged in user displayed at top of page like this: Logged In: dhowe|ZFIN.
- Homology Determination Methods: A checkbox set to assert which homology determination methods were used to support the homology call. Full set may be more than shown here.
- Submit: Clicking submit adds a new homolog to the DB after checking that the gene has not already been added as a target gene or homolog.
- Note: Notes are used to describe anything specific about the orthology call.
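The checkbox set of homology determination methods maps naturally onto a set of flags, where any combination can be ticked for a single homology call. The sketch below uses Python's standard `enum.Flag`; the class name and member names are assumptions based on the method names read from the slide's column headers, and the slide itself says the full set may be larger.

```python
from enum import Flag, auto


class HomologyMethod(Flag):
    """Methods that can support a homology call (names assumed from the slide)."""
    PPOD = auto()
    INPARANOID = auto()
    TREEFAM = auto()
    HOMOLOGENE = auto()
    ORTHOMCL = auto()
    MANUAL_CURATION = auto()
    PUBLISHED_HOMOLOG = auto()


# A curator ticks two boxes for one homolog.
checked = HomologyMethod.PPOD | HomologyMethod.TREEFAM
print(HomologyMethod.PPOD in checked)      # True
print(HomologyMethod.ORTHOMCL in checked)  # False
```

Storing the combination as one value keeps the record compact, and adding another method later only needs one more member.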

Page 19: The Refgene Database

4. Reports

View curation report: POLA

Organism                     Homologue        Date comprehensively curated
Arabidopsis thaliana         AT541            2008-01-02
Caenorhabditis elegans       CE1121           2008-01-02
Danio rerio                  CD2121           2008-01-02
                             CD2122           2008-01-03
Dictyostelium discoideum     no homolog       2008-01-02
Drosophila melanogaster      DM0022           2008-01-02
Escherichia coli             no homolog       2008-01-02
Gallus gallus                GGAA11           2008-01-02
Homo sapiens                 P09884           2008-01-02
Mus musculus                 MGI:212121212    2008-01-02
Rattus norvegicus            RGD:1212121      2008-01-02
Saccharomyces cerevisiae     no homolog       2008-01-02
Schizosaccharomyces pombe    no homolog       2008-01-02

Options:
- select all/unselect all
- View annotations: * all * non-IEA * experimental
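The "View annotations" options (all, non-IEA, experimental) amount to filtering annotations by GO evidence code. A rough sketch follows, assuming "experimental" means the six classic experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP); the slides do not spell this out, and the function name, tuple layout, and sample rows are made up for illustration.

```python
# Assumed meaning of "experimental": the classic GO experimental evidence codes.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def filter_annotations(annotations, view="all"):
    """Filter (term, evidence_code) pairs by the selected report view."""
    if view == "all":
        return list(annotations)
    if view == "non-IEA":
        return [a for a in annotations if a[1] != "IEA"]
    if view == "experimental":
        return [a for a in annotations if a[1] in EXPERIMENTAL]
    raise ValueError(f"unknown view: {view!r}")


# Made-up annotations: (term label, evidence code)
annotations = [("term-A", "IDA"), ("term-B", "IEA"), ("term-C", "ISS")]
for view in ("all", "non-IEA", "experimental"):
    print(view, filter_annotations(annotations, view))
```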