Integration of biomedical data and electronic publications
DESCRIPTION
10th International Symposium on Electronic Theses and Dissertations, Uppsala University, Uppsala, Sweden, June 13-16, 2007
TRANSCRIPT
Integration of biomedical data and electronic publications
Lars Juhl Jensen, EMBL Heidelberg
printed publications
dead wood
electronic publications
virtual dead wood
de Lichtenberg et al., Science, 2005
small font sizes
“no”
Jensen et al., Nature Reviews Genetics, 2006
small font sizes
hyperlinks
“no”
“hell no”
why?
archival
reanalysis
data mining
reader interaction
what?
raw data
processed data
final data
“facts”
where?
part of the document
too much data
too coarse grained
escalates the problem
institutional repositories
too many types of data
lack of standardization
difficult to download all data
public databases
specialization
standardization
mandatory deposition
easy to download all data
cross references
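The advantages listed above — standardized formats and easy bulk download — can be made concrete with a small sketch. The snippet below is a minimal, illustrative parser for FASTA, the standardized sequence format used by databases such as GenBank and UniProt; the example record itself is invented for illustration.

```python
# Minimal sketch: parsing records from FASTA-formatted text, the
# standardized sequence format distributed by databases such as
# GenBank and UniProt. The example record below is hypothetical.

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

example = """>sp|P00000|EXAMPLE Hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))  # header plus sequence length in residues
```

Because every major sequence database exports the same format, one parser like this suffices to read downloads from all of them — the practical payoff of standardization.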
examples from biomedicine
GenBank
17.9 million sequences
80 billion nucleotides
UniProt
4.7 million sequences
Ensembl
35 complete genomes
PDB
44,000 protein structures
GEO
5,800 data sets
152,000 samples
ArrayExpress
1,800 data sets
BioGRID
186,000 interactions
129,000 proteins
MINT
103,000 interactions
28,000 proteins
PubChem
7.5 million compounds
PubMed Central
330 open access journals
12,000 open access papers
downloadable
standardized formats
cross-referenced
archival
reanalysis
data mining
reader interaction
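The summary points above — downloadable, standardized, cross-referenced data enabling reanalysis and data mining — can be sketched as a toy join between two database dumps. All identifiers below are hypothetical; the shape of the data loosely mimics structure records (PDB-style) and interaction pairs (BioGRID/MINT-style).

```python
# Toy sketch (all identifiers hypothetical): joining records from two
# downloaded database dumps via shared cross-references -- the kind of
# integration that standardized, cross-referenced data makes possible.

structures = {  # e.g. PDB-style structure lists keyed by protein accession
    "P00001": ["1AAA"],
    "P00002": ["2BBB", "2CCC"],
}

interactions = [  # e.g. BioGRID/MINT-style protein interaction pairs
    ("P00001", "P00002"),
    ("P00002", "P00003"),
]

# Data mining step: keep only interactions where both partners
# have a known 3D structure.
structured_pairs = [
    (a, b) for a, b in interactions
    if a in structures and b in structures
]
print(structured_pairs)  # -> [('P00001', 'P00002')]
```

The same pattern scales from this toy example to real reanalysis: because each database uses stable accessions that the others cross-reference, records downloaded from independent sources can be joined programmatically.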
thank you!