Databases for Knowledge Discovery Jan H. van Bemmel Erasmus University Rotterdam Databases for Knowledge Discovery


Post on 29-Dec-2015


Page 1: Databases for Knowledge Discovery Jan H. van Bemmel Erasmus University Rotterdam Databases for Knowledge Discovery

Databases for Knowledge Discovery

Jan H. van Bemmel
Erasmus University Rotterdam

Page 2

● Natural sciences: physics, chemistry, engineering; models, experiments, theories ► 'hard' data

● Humanities: arts, social sciences, economics; behavioural studies, text analysis ► 'soft' data

● Biomedical and health sciences: biomedicine, health sciences; models, experiments, studies ► hard & soft data

Page 3

● Biomedicine & health sciences

Biomedical research: related to the 'hard' scientific approach, as in physics and engineering

Clinical research: using rather 'hard' data, and sometimes 'soft' subjective observations

Population-based research: data collected from populations of healthy and ill persons. This research can be subdivided into

• retrospective research
• prospective research

Page 5

[Diagram: Biomedicine and Health Sciences. Basic Research (experiments), Clinical Research (patients) and Health Research (populations); three Regional Databases; one Research Database]

Discovery of new scientific knowledge from large databases of measurements, observations and interpretations

Page 6

● Biomedicine & health sciences

Until recently, basic research in biomedicine was done on organs and organisms.

Nowadays the fundamental challenges lie at a lower level of scale: that of molecules and cells.

Research on organs and organisms is still of interest: breakthroughs from biomolecular research have to be translated to higher levels.

Page 7

● Biomedical research

Knowledge contained in multiple databases of refereed articles and databases on genes and proteins

MedLine: 11 million abstracts; 500,000/year

• searching for articles in sphere of interest
• how to find new knowledge?
• how to cope with serendipity?

Page 8

● Biomedical research

Different methods to retrieve knowledge:

• simple Boolean expressions
  too specific: few references
  too broad: avalanche of references

• use of a more complex 'fingerprint'

• combination of different databases
  complex retrieval using an ontology database

[Diagram labels: forward, inverse]
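
The contrast between Boolean retrieval and fingerprint matching can be sketched in a few lines. This is an illustrative sketch only: the mini-corpus, the function names, and the reading of a 'fingerprint' as a weighted set of concepts compared by cosine similarity are assumptions, not details given in the slides.

```python
import math

# Hypothetical mini-corpus: document id -> set of index terms (not from the talk).
DOCS = {
    "d1": {"gene", "protein", "diabetes"},
    "d2": {"gene", "alzheimer"},
    "d3": {"protein", "diabetes", "insulin"},
}

def boolean_and(docs, terms):
    """Return ids of documents that contain every query term."""
    wanted = set(terms)
    return sorted(d for d, words in docs.items() if wanted <= words)

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_by_fingerprint(docs, fingerprint):
    """Rank documents by similarity to a weighted-concept fingerprint."""
    scored = [(cosine(fingerprint, {t: 1.0 for t in words}), d)
              for d, words in docs.items()]
    return [d for score, d in sorted(scored, key=lambda x: (-x[0], x[1])) if score > 0]

# A strict Boolean query can be too specific and return nothing ...
narrow = boolean_and(DOCS, ["gene", "protein", "insulin"])
# ... while a fingerprint still ranks the partially matching documents.
ranked = rank_by_fingerprint(DOCS, {"diabetes": 1.0, "insulin": 0.5, "protein": 0.5})
print(narrow, ranked)
```

The Boolean query is all-or-nothing, whereas the fingerprint gives a graded ranking, which is one way to avoid both the "few references" and the "avalanche" outcomes.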

Page 11

● Biomedical research

[Diagram: content fingerprints (jobs, CVs, skills; articles, books; emails, Word, RFPs); people fingerprints (average); organisation fingerprints (average)]
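
The "average" labels in this diagram suggest that a person's fingerprint is an average of that person's content fingerprints, and an organisation's fingerprint an average of its people's fingerprints. A minimal sketch of that reading follows; the dict-of-weights representation, the function name and the example concepts are assumptions made for illustration.

```python
from collections import defaultdict

def average_fingerprint(fingerprints):
    """Average sparse concept-weight dicts; a missing concept counts as weight 0."""
    if not fingerprints:
        return {}
    totals = defaultdict(float)
    for fp in fingerprints:
        for concept, weight in fp.items():
            totals[concept] += weight
    n = len(fingerprints)
    return {concept: total / n for concept, total in totals.items()}

# Hypothetical content fingerprints for one person (a CV and an article).
cv = {"genetics": 1.0, "statistics": 0.5}
article = {"genetics": 0.5, "cardiology": 1.0}
person = average_fingerprint([cv, article])

# The same function applied one level up: people -> organisation.
colleague = {"cardiology": 1.0}
organisation = average_fingerprint([person, colleague])
print(person, organisation)
```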

Page 12

● Biomedical research

[Diagram: Genetics Database and Literature Database; Matching methods; Find new associations]

Datamining: A – B, B – C, A – C
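
One common reading of the A – B, B – C, A – C scheme is that a new association A – C is proposed when A and C are each known to be linked to a shared B but not yet to each other. The sketch below implements that reading; the function name and the pair-list input format are assumptions, and the slides do not specify the matching method actually used.

```python
from collections import defaultdict

def hidden_associations(known_pairs):
    """Suggest (A, C) pairs that share a neighbour B but are not directly linked."""
    links = defaultdict(set)
    for a, b in known_pairs:
        links[a].add(b)
        links[b].add(a)
    links = dict(links)  # freeze: no new keys are created while iterating
    suggestions = {}
    for b, linked in links.items():
        for a in linked:
            for c in linked:
                # a < c avoids reporting the same pair twice
                if a < c and c not in links[a]:
                    suggestions.setdefault((a, c), set()).add(b)
    return suggestions

# A - B and B - C are known; A - C is suggested via B.
print(hidden_associations([("A", "B"), ("B", "C")]))
```

Each suggestion keeps the set of intermediate B terms, so a researcher can inspect why the pair was proposed.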

Page 13

● Biomedical research

Composition of a thesaurus from separate databases

GDB: AAA; BBB
LocusLink: AAA; CCC
Hugo NC: AAA
OMIM: BBB; CCC
SwissProt: BBB

concept: AAA
synonyms: BBB; CCC

[Diagram: Genetics Database and Literature Database; Matching methods; Find new associations]
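
The thesaurus composition shown on this slide can be sketched as a merge of the names each database lists for the same gene. Two assumptions are made here that the slide does not state: the records are already known to describe one concept, and the preferred concept name is taken from the Hugo NC entry. The function and variable names are hypothetical.

```python
# Names listed per source database, copied from the slide.
SOURCES = {
    "GDB": ["AAA", "BBB"],
    "LocusLink": ["AAA", "CCC"],
    "Hugo NC": ["AAA"],
    "OMIM": ["BBB", "CCC"],
    "SwissProt": ["BBB"],
}

def build_entry(sources, authority="Hugo NC"):
    """Merge all names into one thesaurus entry with a preferred concept name."""
    names = set()
    for terms in sources.values():
        names.update(terms)
    preferred = sources[authority][0]  # assumption: the authority names the concept
    return {"concept": preferred, "synonyms": sorted(names - {preferred})}

entry = build_entry(SOURCES)
print(entry)
```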

Page 15

● Biomedical research

[Diagram: Collexion Ontology database; ACS constructor; ACS model; ACS viewer; ACS validation]

ACS: Associative Concept Space

Page 17

● Clinical research

[Figure: Growth of information systems in primary care (computer-based patient records). Percentage of primary care practices (0 to 100) by year ('78 to '98), for the UK and NL]

Page 18

● Clinical research

BloodLink: The impact of guidelines-based decision support on lab test ordering in primary care.

Page 19

● Clinical research

BloodLink: controlled clinical trial

                       Control group   Guideline group
No. of practices             21              23
No. of physicians            29              31
No. of patients          97,177          98,432
Sickfunds                   52%             52%
No. of order forms       12,786          12,700

Page 20

Test                 BloodLink Guideline   BloodLink control   Difference
ESR                         5612                 7932             -29%
Hemoglobin                  6061                 7332             -17%
WBC count                   3719                 5039             -26%
Hematocrit                  3611                 4830             -25%
Creatinine                  3314                 5024             -34%
Erythrocytes                3360                 4690             -28%
MCV                         3159                 4642             -32%
Differential count          3060                 4151             -26%
Cholesterol                 3413                 4354              -1%
TSH                         3213                 2954              +9%
Gamma-GT                    2004                 3466             -42%
Glucose in serum            2964                 2501             +19%
ALAT (SGPT)                 1892                 2850             -34%
Potassium                   1096                 2320             -53%
ASAT (SGOT)                  959                 2269             -58%
Glucose fasting             1286                 1611             -20%
Triglycerides               1398                 1380              +1%
HDL cholesterol             1350                 1382              -2%
Sodium                       745                 1070             -30%
Free T4                      618                 1163             -47%
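
The Difference column appears to be the change in the number of ordered tests in the BloodLink Guideline group relative to the BloodLink control group. A small sketch, assuming that definition and using three rows of the table (ESR, TSH, Free T4); the function name is illustrative.

```python
def percent_difference(guideline, control):
    """Relative change of the guideline count versus the control count, in whole percent."""
    return round(100 * (guideline - control) / control)

# (BloodLink Guideline, BloodLink control) counts taken from the table.
ORDERS = {
    "ESR": (5612, 7932),
    "TSH": (3213, 2954),
    "Free T4": (618, 1163),
}

diffs = {test: percent_difference(g, c) for test, (g, c) in ORDERS.items()}
print(diffs)
```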

Page 21

[Table as on page 20, with the following rows highlighted]

Test                 BloodLink Guideline   BloodLink control   Difference
TSH                         3213                 2954              +9%
Free T4                      618                 1163             -47%

In case of thyroid disease, physicians were used to ordering the T4 test (free thyroxine); the protocol prescribed the TSH test (thyroid-stimulating hormone) instead.

Page 22

[Table as on page 20, with the following rows highlighted]

Test                 BloodLink Guideline   BloodLink control   Difference
Gamma-GT                    2004                 3466             -42%
ALAT (SGPT)                 1892                 2850             -34%
Potassium                   1096                 2320             -53%
ASAT (SGOT)                  959                 2269             -58%

Tests such as SGOT (serum glutamic oxaloacetic transaminase), Gamma-GT and SGPT had been ordered almost automatically; the protocols, however, did not support such tests. The same applies to potassium (K+).

Page 23

● Clinical research

BloodLink: controlled clinical trial

                                     Control group   Guideline group
No. of practices                           21              23
No. of GPs                                 29              31
No. of patients                        97,177          98,432
Sickfunds                                 52%             52%
No. of order forms                     12,786          12,700
% of forms generated by BloodLink         89%             73%
No. of requested tests                 87,634          70,479
Average No. of tests per order¹           6.9             5.5

¹ Student's t-test, N=44, p<0.001

Page 24

● Clinical research

Cardiology

Page 25

● Clinical research

Critiquing system for hypertension

#     sens   spec
1     0.94   0.36
2     0.86   0.70
3     0.72   0.82
4     0.65   0.75
5     0.73   0.69
6     0.70   0.78
7     0.88   0.52
8     0.74   0.77
CS    0.74   0.88

[Figure: sens (%) plotted against spec (%), both axes 0 to 100]

Page 27

● Clinical research

Reference
class       N     NL    LVH    RVH    BVH    AMI    IMI    MIX    OTH   VH+MI
NL        382   95.5    0.9    0.4    0.0    1.4    1.6    0.0    0.1
LVH       183   19.0   69.0    0.5    0.0    4.3    6.9    0.2    0.0
RVH        55   40.6    6.7   45.8    2.7    1.2    2.1    0.0    0.9
BVH        53   22.0   54.7   14.5    1.6    5.3    1.9    0.0    0.0
AMI       170   14.3    2.6    0.6    0.0   80.0    1.8    0.7    0.0
IMI       273   19.8    2.6    0.2    0.0    0.7   76.7    0.1    0.0
MIX        73    2.5    4.1    1.6    0.0   51.6   37.4    2.7    0.0
VH+MI      31   22.6    0.0    0.0    0.0    0.0    0.0    0.0   16.1    61.3
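
If each row of this table is read as a reference class and each column as the class assigned by the interpretation program, with cell values as row percentages, then the diagonal gives the per-class agreement. The sketch below combines those diagonal values into one case-weighted figure. That reading of the table is an assumption, and the overall figure is derived here for illustration; it is not reported on the slide.

```python
# (reference class, N, % on the diagonal), copied from the table.
ROWS = [
    ("NL", 382, 95.5), ("LVH", 183, 69.0), ("RVH", 55, 45.8), ("BVH", 53, 1.6),
    ("AMI", 170, 80.0), ("IMI", 273, 76.7), ("MIX", 73, 2.7), ("VH+MI", 31, 61.3),
]

def weighted_agreement(rows):
    """Return (total cases, overall % agreement weighted by class size)."""
    total = sum(n for _, n, _ in rows)
    correct = sum(n * pct / 100 for _, n, pct in rows)
    return total, 100 * correct / total

total_cases, agreement = weighted_agreement(ROWS)
print(total_cases, round(agreement, 1))
```

Weighting by N matters here: the small BVH and MIX classes agree poorly with the reference, but they contribute few cases to the overall figure.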

Page 28

● Clinical research

Computer-assisted ECG interpretation

Assessment of different interpretation programs

[Figure: % agreement with referees (60 to 90) plotted against % agreement with clinical data (60 to 90), for cardiologists and systems]

Page 30

● Population-based research: retrospective

Post-marketing surveillance of drugs

Combinations of drugs: interactions

Longitudinal databases of about 500,000 patients

Patient privacy and data security

[Figure: Growth of information systems in primary care (computer-based patient records), as on page 17]

[Diagram: CPRs in health care practices and a Central Database]

Page 32

● Population-based research: retrospective

[Diagram: several research data sets; Research database; population-based research]

coupling of clinical data to a genealogical database

municipal records of > 20,000 individuals

each disorder could be coupled to a common ancestor:

genes involved in diabetes, Alzheimer's disease, etc.

[Figure labels: Pedigree tree; recessive]
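
Coupling a disorder to a common ancestor amounts to intersecting the ancestor sets of the affected patients in the genealogical database. A minimal sketch under stated assumptions: the pedigree is a child-to-parents mapping, and all identifiers and function names below are hypothetical, not taken from the study.

```python
def ancestors(person, parents):
    """Return all ancestors of `person`, given a child -> parents mapping."""
    seen = set()
    stack = list(parents.get(person, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def common_ancestors(patients, parents):
    """Return the ancestors shared by every patient in the list."""
    sets = [ancestors(p, parents) for p in patients]
    return set.intersection(*sets) if sets else set()

# Hypothetical pedigree: two patients who share one grandparent (g1).
PARENTS = {
    "p1": ("m1", "f1"),
    "p2": ("m2", "f2"),
    "m1": ("g1", "g2"),
    "f2": ("g1", "g3"),
}

shared = common_ancestors(["p1", "p2"], PARENTS)
print(shared)
```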

Page 34

● Population-based research: prospective

Rotterdam Study

[Diagram: several research data sets; Research database; population-based research]

Prospective longitudinal database

10,000 persons > 55 years of age

relationships between risks and diseases

cardiovascular and vessel-wall diseases, glaucoma

neurologic diseases (Alzheimer), osteoporosis

Page 36

● Population-based research: prospective

Generation R

[Diagram: several research data sets; Research database; population-based research]

Prospective longitudinal database

10,000 children from pregnancy onwards

relations between risks and genetic/environmental data

perinatal circumstances, diseases at young age

cultural backgrounds, impact of education, etc.

Page 37

A formal ('forward') method of analysing large research databases may hamper the flexible attitude of a researcher who does not know in advance what to expect (serendipity).

'Hard' and 'soft' examples from biomedicine and the health sciences show that computers can be very helpful in finding new and unforeseen ('inverse') associations between the data stored in research databases.

Well-documented databases are an enormous treasure for the advancement of scientific research.