The Refgene Database
Currently we use Google Spreadsheets to Track Reference Genes
Problems with Google Spreadsheets
• Occasional errors picking a gene previously selected
• Every group has their own spreadsheet
  – Makes cross-access difficult
  – They are not integrated in any real way
• They are hard to maintain
  – Every month they are updated by hand
  – Hand editing leads to mistakes
It would be Easier to Have a Database to Keep Track of Reference Genes
• Data from all reference genomes would be integrated and searchable
• They could be automatically updated
  – New genes and their homologs in every species could populate automatically from Kara’s homology compute.
  – MODs could provide database reports to update stats on annotation progress
• Stats for pubs and grant updates could be retrieved easily
Proposed Interfaces
• Refgene interface: Used to set new target genes each month.
  – As genes are targeted, homologs are automatically populated from P-POD clusters
• MOD interface: Used by MOD curators
  – Allows MOD curators to hand-modify individual records
Fields Required
• homolog gene identifiers for a MOD
• Date that the gene was deemed comprehensively annotated
• Date the gene was chosen as a reference gene
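The three required fields could be held in one record per homolog. The following is only a sketch: the class name, field names, and the example date are illustrative assumptions, not an agreed schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HomologRecord:
    """One homolog of a reference (target) gene in a single MOD.

    All names here are hypothetical; the slides only list the fields.
    """
    target_gene: str                  # the reference gene this homolog belongs to
    mod: str                          # e.g. "SGD", "ZFIN"
    homolog_id: Optional[str]         # MOD gene identifier; None means "no homolog"
    date_chosen: date                 # date the gene was chosen as a reference gene
    date_comprehensive: Optional[date] = None  # None until comprehensively annotated

    def is_comprehensive(self) -> bool:
        return self.date_comprehensive is not None


# Example values are made up for illustration.
record = HomologRecord("POLA", "ZFIN", "pola", date(2008, 6, 1))
print(record.is_comprehensive())  # False until a curation date is entered
```

Keeping the comprehensive-annotation date optional lets the same record represent both pending and finished curation.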
Functionality that should be on the Ref. Gen. Curation Home Page
1. Log in
2. Enter new curation target genes
3. Homologs:
   - Upload orthologs for those genes
   - Manually add orthologs
   - Enter 'curation status'
4. Generate reports
The Database Should also Accept Automated Loads from MODs
Curation status
Curation target genes
MOD ID
SGD02541231
SGD02541231
SGD02541231
SGD02541231
SGD12334234
SGD21314434
1. Curator log in - 1
Login:
Password:
Curator Login
- Once logged in, the curator name and MOD can be assumed anywhere that data is needed.
- Also prevents random folks from editing, and should probably also restrict editing capability to only homologs from the species you are a curator of?
- We all still need to be able to add new target genes regardless of our MOD, however.
Login
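The editing rules proposed for logged-in curators might be sketched as two small checks. This assumes that a curator's MOD stands in for "the species you are a curator of"; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Curator:
    login: str
    mod: str  # the MOD the curator belongs to, e.g. "ZFIN"


def can_edit_homolog(curator: Curator, homolog_mod: str) -> bool:
    """Homolog edits are limited to the curator's own MOD (assumed rule)."""
    return curator.mod == homolog_mod


def can_add_target(curator: Curator) -> bool:
    """Any logged-in curator may add target genes, whatever their MOD."""
    return True


dhowe = Curator(login="dhowe", mod="ZFIN")
print(can_edit_homolog(dhowe, "ZFIN"))  # True
print(can_edit_homolog(dhowe, "SGD"))   # False
print(can_add_target(dhowe))            # True
```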
Reference Genomes Snapshot
Target genes: 275
Species          Homologs   No homolog   Comprehensive
A.thaliana       220        10           190
C.elegans        200        30           189
D.rerio          175        35           170
D.discoideum     250        7            200
D.melanogaster   185        19           160
E.coli           100        25           98
G.gallus         250        19           120
H.sapiens        274        2            100
Etc…             …          …
86%
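A per-species progress figure can be derived from the snapshot counts. The sketch below assumes the percentage is comprehensively annotated homologs divided by homologs; the slide does not say how its 86% was computed, although that value is consistent with the A.thaliana row (190 of 220). The function name is hypothetical.

```python
# (homologs, no homolog, comprehensive) per species, copied from the snapshot table
snapshot = {
    "A.thaliana": (220, 10, 190),
    "C.elegans": (200, 30, 189),
    "D.rerio": (175, 35, 170),
    "D.discoideum": (250, 7, 200),
    "D.melanogaster": (185, 19, 160),
    "E.coli": (100, 25, 98),
    "G.gallus": (250, 19, 120),
    "H.sapiens": (274, 2, 100),
}


def percent_comprehensive(homologs: int, comprehensive: int) -> int:
    """Share of a species' homologs that are comprehensively annotated (assumed formula)."""
    return round(100 * comprehensive / homologs)


for species, (homologs, _no_homolog, comprehensive) in snapshot.items():
    print(f"{species}: {percent_comprehensive(homologs, comprehensive)}%")
```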
To Curation Central
Search Targets
Gene Symbol
Or Entrez Gene ID:
Search
Logged in: doughowe | ZFIN
Upload New Targets
Target Completion Date (MM-YYYY): 06-2008
Upload
Gene Name    Completion target date   Homologs              Report
[can edit]   [can edit]               View/enter homologs   View Curation Report
[can edit]   [can edit]               View/enter homologs   View Curation Report
[can edit]   [can edit]               View/enter homologs   View Curation Report
[can edit]   [can edit]               View/enter homologs   View Curation Report
A search result takes you to the homolog Add/Update page, shown on the bottom of slide 13, for the specific target gene found by the search. If the search finds more than one gene, it shows a list of these genes, each linking to the homolog View/Add/Update page for that gene.
View reports
by organism: [drop down list] GO
by gene: [enter gene symbol or MOD ID]
Reports:
• TO DO List (not comprehensive)
• ISS could be added
• Potential outliers
Targets TAB file: [Browse] 06-2008
Access to reports could be done a couple of ways
Curation Targets
[view all]
1. Curator log in - 2
Upload orthologs
Select month: pull-down list
2. Upload New Targets - 1
• Check that all required data is included
• Check that no genes on the new targets list have been a target or called as homologs previously. If so, reject the load and alert the curator to any/all duplications so they can select new targets and try again.
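The two upload checks might look roughly like this. It is a sketch only: the column names, the function name, and the MOD IDs in the example are hypothetical, and the set of previously used genes is assumed to come from the database.

```python
from typing import Iterable

REQUIRED_FIELDS = ("gene_name", "mod_id", "target_date")  # assumed column names


def validate_upload(rows: Iterable[dict], already_used: set) -> list:
    """Return error messages; an empty list means the load may proceed."""
    errors = []
    seen_in_file = set()
    for line_no, row in enumerate(rows, start=1):
        # Check 1: all required data is included.
        missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
        if missing:
            errors.append(f"row {line_no}: missing {', '.join(missing)}")
            continue
        # Check 2: gene was not a target or homolog before (or repeated in this file).
        mod_id = row["mod_id"]
        if mod_id in already_used or mod_id in seen_in_file:
            errors.append(f"{row['gene_name']} ({mod_id}) has already been selected")
        seen_in_file.add(mod_id)
    return errors


existing = {"SGD0000002"}  # hypothetical IDs of earlier targets and homologs
rows = [
    {"gene_name": "RNR1", "mod_id": "SGD0000001", "target_date": "06-2008"},
    {"gene_name": "CDC2", "mod_id": "SGD0000002", "target_date": "06-2008"},
    {"gene_name": "ADE4", "mod_id": "", "target_date": "06-2008"},
]
errors = validate_upload(rows, existing)
if errors:
    print("Load rejected:", errors)
```

Collecting every problem before rejecting matches the requirement to alert the curator to any/all duplications in one pass.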
Upload New Targets
Target Completion Date (MM-YYYY): 06-2008
Upload Targets TAB file: [Browse]
Upload
Gene Name             MOD ID            Completion target date
[edit/autofill] (A)   [edit/autofill]   06-2008
[edit/autofill]       [edit/autofill]   06-2008
[edit/autofill]       [edit/autofill]   06-2008
[edit/autofill]       [edit/autofill]   06-2008
Curator enters target date here, and that info is applied to the new table.
2. Upload New Targets - 2
option 1
option 2
MOD ID
SGD02541231
SGD02541231
SGD02541231
SGD02541231
SGD12334234
SGD21314434
load file
Upload
Gene Name MOD ID Completion target date
RNR1 SGD02541231 06-2008
CDC2 SGD02541231 06-2008
ADE4 SGD02541231 06-2008
POL6 SGD02541231 06-2008
(A) We need to be able to enter either an ID or a gene name (in the same or in a different column).
(B) Need a check for genes already selected.
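Requirement (A) could be met by resolving each entry against a lookup table, trying it first as an ID and then as a gene name. This is a minimal sketch: the table contents and IDs are hypothetical placeholders for whatever the MOD provides.

```python
from typing import Optional, Tuple

# Hypothetical lookup table: MOD ID -> gene name
GENES = {"SGD0000001": "RNR1", "SGD0000002": "CDC2"}
NAME_TO_ID = {name.upper(): mod_id for mod_id, name in GENES.items()}


def resolve_gene(entry: str) -> Optional[Tuple[str, str]]:
    """Accept either a MOD ID or a gene name and return (mod_id, gene_name)."""
    entry = entry.strip()
    if entry in GENES:                       # entry is a MOD ID
        return entry, GENES[entry]
    mod_id = NAME_TO_ID.get(entry.upper())   # entry is a gene name
    if mod_id is not None:
        return mod_id, GENES[mod_id]
    return None                              # unknown: leave for the curator to fix


print(resolve_gene("rnr1"))
print(resolve_gene("SGD0000002"))
```

A name-based lookup like this only works where symbols are unique within a MOD, so IDs remain the safer key.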
Upload
Gene Name MOD ID Completion target date
RNR1 SGD02541231 06-2008
CDC2 SGD02541231 06-2008
ADE4 SGD02541231 06-2008
POL6 SGD02541231 06-2008
etc… etc… etc…
2. Upload New Targets - 3
Your upload was successful!
Error: CDC2 (SGD02541231) has already been selected. Please go back and replace this entry.
3. Upload orthologs - 1
Upload orthologs
Select month: pull-down list GO
3. Orthologs
Upload orthologs
Select month: 06-2008 GO
S. cerevisiae E. coli H. sapiens D. discoideum D. rerio D. melanogaster M. musculus etc
RNR1|SGD02541231 nrdA|P00832 RNR1|P34234 rnrA|DDB098908 rnra|DR23423 rnrA|DM8787 RNR1|MGI:2332
CDC2|SGD02541231 no ortholog found
PK12 cdc22
ADE4|SGD02541231 adeS|P32434
POL6|SGD02541231 polR|P3434
etc… etc…
loads calculated data from P-POD and other available methods
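An automated load of this table would need to split each `symbol|ID` cell. The sketch below assumes a tab-separated file with one column per species and treats an empty cell or "no ortholog found" as missing; the function names are hypothetical.

```python
from typing import Optional, Tuple

NO_ORTHOLOG = "no ortholog found"


def parse_cell(cell: str) -> Optional[Tuple[str, str]]:
    """Split a 'symbol|ID' cell; return None for an empty or 'no ortholog found' cell."""
    cell = cell.strip()
    if not cell or cell.lower() == NO_ORTHOLOG:
        return None
    symbol, _, gene_id = cell.partition("|")
    return symbol, gene_id


def parse_row(line: str, species: list) -> dict:
    """Map each species column of one tab-separated row to its parsed cell."""
    cells = line.rstrip("\n").split("\t")
    cells += [""] * (len(species) - len(cells))  # pad short rows
    return {sp: parse_cell(c) for sp, c in zip(species, cells)}


species = ["S. cerevisiae", "E. coli", "H. sapiens"]
row = parse_row("RNR1|SGD02541231\tnrdA|P00832\tRNR1|P34234", species)
print(row["E. coli"])  # ('nrdA', 'P00832')
```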
Target Gene Symbol   Completion target date   Homologues            Report
POLA P09884          Aug 2006                 View/enter homologs   View Curation Report
ADH1A P07327         Nov 2007                 View/enter homologs   View Curation Report
GOT1 P17174          Jan 2008                 View/enter homologs   View Curation Report
GOT2 P00505          Jan 2008                 View/enter homologs   View Curation Report
View/Enter/Edit Homolog page
Target Gene: POLA (H.Sap)
Organism     Gene Symbol             Comprehensive curation (date)   Curator
D.mel        pol1a                   1/20/2008                       [Curator name]    [note]
D.rer        pola                    -                               [Curator name]    [note]
M.mus        No homolog              4/10/2007                       [Curator name]    [note]
C.ele        pola                    -                               [Curator name]    [note]
Species PD   curator enters symbol   curation date                   curator name PD   [note]   Add
Homology Determination Methods (checkbox columns): P-POD, InParanoid, TreeFam, HomoloGene, OrthoMCL, Manual Analysis, Published homolog
This is an interface to support adding individual homologs to specific target genes, as well as editing previously added homologs. More detail on the next slide.
Clicking View/Enter homologs goes to a new target gene-specific page to edit/add homologs.
Organism   Gene   Comprehensive curation (date)   Curator
H. sapiens
C. elegans
D. discoideum
D. melanogaster
E. coli
M. musculus
R. norvegicus
S. pombe
Etc…
pull down menu OR organism already chosen based upon curator log in name
curator enters symbol/ID curation date curator name
Curators enter gene symbols OR MOD object IDs. TAIR gene symbols are not unique.
pull down menu listing curators. If user login is supported, this column could probably be dropped and would be assumed behind the scenes based on the user login.
Currently logged in user displayed at top of page like this:
Logged In: dhowe|ZFIN.
[note]
Homology Determination Methods (checkbox columns): P-POD, InParanoid, TreeFam, HomoloGene, OrthoMCL, Manual Curation, Published homolog
Submit
A checkbox set to assert which homology determination methods were used to support the homology call. Full set may be more than shown here.
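One way to store such a checkbox set is as a flag enumeration, so that any combination of methods fits in a single value. This is a sketch under the assumption that the method list is the one shown in the column headers; the enum and member names are hypothetical, and the full set may be larger.

```python
from enum import Flag, auto


class Method(Flag):
    """Homology determination methods; one member per checkbox (assumed list)."""
    P_POD = auto()
    INPARANOID = auto()
    TREEFAM = auto()
    HOMOLOGENE = auto()
    ORTHOMCL = auto()
    MANUAL = auto()
    PUBLISHED = auto()


# A homology call supported by two methods
support = Method.P_POD | Method.TREEFAM
print(Method.P_POD in support)     # True
print(Method.ORTHOMCL in support)  # False
```

New methods can be appended as further members without changing how existing calls are stored.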
View/Enter/Edit Homolog page
Target Gene: POLA (H.Sap)
Clicking submit adds a new homolog to the DB after checking that the gene has not already been added as a target gene or homolog.
Notes are used to describe anything specific about the orthology call.
View curation report: POLA
Organism                    Homologue       Date comprehensively curated
Arabidopsis thaliana        AT541           2008-01-02
Caenorhabditis elegans      CE1121          2008-01-02
Danio rerio                 CD2121          2008-01-02
                            CD2122          2008-01-03
Dictyostelium discoideum    no homolog      2008-01-02
Drosophila melanogaster     DM0022          2008-01-02
Escherichia coli            no homolog      2008-01-02
Gallus gallus               GGAA11          2008-01-02
Homo sapiens                P09884          2008-01-02
Mus musculus                MGI:212121212   2008-01-02
Rattus norvegicus           RGD:1212121     2008-01-02
Saccharomyces cerevisiae    no homolog      2008-01-02
Schizosaccharomyces pombe   no homolog      2008-01-02
Options:
- select all/unselect all
- View annotations: * all * non-IEA * experimental
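The three annotation views could be implemented as a filter on GO evidence codes. The sketch assumes each annotation is a dict with an `evidence` key and uses the classic experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP); the function name and the example records are hypothetical.

```python
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def filter_annotations(annotations, view="all"):
    """Return the annotations visible under the chosen view (assumed record shape)."""
    if view == "all":
        return list(annotations)
    if view == "non-IEA":
        return [a for a in annotations if a["evidence"] != "IEA"]
    if view == "experimental":
        return [a for a in annotations if a["evidence"] in EXPERIMENTAL_CODES]
    raise ValueError(f"unknown view: {view!r}")


# Made-up records for illustration only.
annotations = [
    {"term": "term-1", "evidence": "IDA"},
    {"term": "term-2", "evidence": "ISS"},
    {"term": "term-3", "evidence": "IEA"},
]
print(len(filter_annotations(annotations, "non-IEA")))
print(len(filter_annotations(annotations, "experimental")))
```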
4. Reports