Biomedical text mining

Lars Juhl Jensen


Posted on 04-Jan-2016



Page 1: Biomedical text mining

Lars Juhl Jensen

Biomedical text mining

Page 2: Biomedical text mining

exponential growth

Page 3: Biomedical text mining
Page 4: Biomedical text mining
Page 5: Biomedical text mining

~45 seconds per paper

Page 6: Biomedical text mining

information retrieval

Page 7: Biomedical text mining

named entity recognition

Page 8: Biomedical text mining

augmented browsing

Page 9: Biomedical text mining

text corpora

Page 10: Biomedical text mining

information extraction

Page 11: Biomedical text mining

information retrieval

Page 12: Biomedical text mining

find the relevant papers

Page 13: Biomedical text mining

ad hoc retrieval

Page 14: Biomedical text mining

user-specified query

Page 15: Biomedical text mining

“yeast AND cell cycle”

Page 16: Biomedical text mining

PubMed

Page 17: Biomedical text mining
Page 18: Biomedical text mining

indexing

Page 19: Biomedical text mining

fast lookup
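Indexing makes lookup fast because the index is built once, and each query then only has to look up a few words. A minimal sketch, assuming a toy in-memory inverted index over whitespace-separated words (the document IDs and texts are made up, and this is not how PubMed is implemented): each word maps to the set of documents that contain it, and an AND query such as "yeast AND cell cycle" intersects those sets.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:()")].add(doc_id)
    return index


def and_query(index, terms):
    """Boolean AND: IDs of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical three-document collection.
docs = {
    "d1": "Yeast cell cycle control by Cdc28.",
    "d2": "Cell cycle arrest in human cells.",
    "d3": "Yeast metabolism under stress.",
}
index = build_index(docs)
print(and_query(index, ["yeast", "cell", "cycle"]))  # {'d1'}
```

Splitting "cell cycle" into two words ignores word order; a true phrase search would also need to store word positions in the index.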

Page 20: Biomedical text mining

stemming

Page 21: Biomedical text mining

word endings
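Stemming removes word endings so that inflected forms of a word are indexed and searched as one. A sketch, assuming a deliberately crude suffix stripper with a three-suffix list chosen for illustration; it is not the Porter stemmer or any other published algorithm.

```python
# Deliberately tiny suffix list (an assumption for illustration);
# real stemmers such as the Porter stemmer use far more rules.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word, min_stem=3):
    """Lowercase the word and strip the first matching suffix,
    keeping at least min_stem characters of the stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


for w in ["binds", "binding", "bind", "Cells", "cycles", "is"]:
    print(w, "->", crude_stem(w))
```

Indexing stems instead of raw words lets a query for "binding" also match "binds". Rules this crude also make mistakes: "phosphorylated" and "phosphorylates" end up with different stems.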

Page 22: Biomedical text mining

dynamic query expansion

Page 23: Biomedical text mining

MeSH terms
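Query expansion adds synonyms so that papers using a different name for the same concept are still found. A minimal sketch, assuming a small hand-made synonym table as a stand-in for a controlled vocabulary such as MeSH (the entries are illustrative and are not copied from MeSH): each query term becomes an OR-group of itself and its synonyms, and the groups are combined with AND.

```python
# Hypothetical synonym table standing in for a controlled vocabulary such as
# MeSH; the entries are illustrative and not copied from MeSH.
SYNONYMS = {
    "yeast": ["saccharomyces cerevisiae", "budding yeast"],
    "cell cycle": ["cell division cycle"],
}


def expand_query(terms):
    """Turn each term into an OR-group of itself plus its synonyms,
    then join the groups with AND."""
    groups = []
    for term in terms:
        alternatives = [term] + SYNONYMS.get(term.lower(), [])
        groups.append("(" + " OR ".join(f'"{a}"' for a in alternatives) + ")")
    return " AND ".join(groups)


print(expand_query(["yeast", "cell cycle"]))
```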

Page 24: Biomedical text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 25: Biomedical text mining

no tool will find that

Page 26: Biomedical text mining

named entity recognition

Page 27: Biomedical text mining

computer

Page 28: Biomedical text mining

as smart as a dog

Page 29: Biomedical text mining

teach it specific tricks

Page 30: Biomedical text mining
Page 31: Biomedical text mining
Page 32: Biomedical text mining

identify the concepts

Page 33: Biomedical text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 34: Biomedical text mining

comprehensive lexicon

Page 35: Biomedical text mining

proteins

Page 36: Biomedical text mining

chemicals

Page 37: Biomedical text mining

compartments

Page 38: Biomedical text mining

tissues

Page 39: Biomedical text mining

diseases

Page 40: Biomedical text mining

organisms

Page 41: Biomedical text mining

CDC2

Page 42: Biomedical text mining

cyclin dependent kinase 1
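CDC2 and cyclin dependent kinase 1 are two names for the same human protein, so the lexicon has to map both to one identifier. A sketch, assuming a plain dictionary from name to entity; the identifier ENTITY:CDK1 is a made-up placeholder rather than a real database accession.

```python
# Hypothetical mini-lexicon: each name (synonym) points to one entity identifier.
LEXICON = {
    "CDC2": "ENTITY:CDK1",
    "CDK1": "ENTITY:CDK1",
    "cyclin dependent kinase 1": "ENTITY:CDK1",
}


def entity_for(name):
    """Look up the entity identifier for an exact name, or None if unknown."""
    return LEXICON.get(name)


def synonyms_of(entity_id):
    """All names in the lexicon that refer to entity_id, sorted."""
    return sorted(name for name, eid in LEXICON.items() if eid == entity_id)


print(entity_for("CDC2") == entity_for("cyclin dependent kinase 1"))  # True
print(synonyms_of("ENTITY:CDK1"))
```

Exact lookup still misses variants such as Cdc2 or cyclin-dependent kinase 1, which is the orthographic variation problem.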

Page 43: Biomedical text mining

orthographic variation

Page 44: Biomedical text mining

upper- and lower-case

Page 45: Biomedical text mining

CDC2

Page 46: Biomedical text mining

Cdc2

Page 47: Biomedical text mining

spaces and hyphens

Page 48: Biomedical text mining

cyclin dependent kinase 1

Page 49: Biomedical text mining

cyclin-dependent kinase 1
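One way to absorb case and space/hyphen variants is to normalize both the lexicon names and the text the same way before comparing them. A sketch, assuming full case-folding is acceptable; that is a design choice, since case sometimes carries meaning (gene-naming conventions differ between organisms), and the slides do not say how the author's tagger handles it.

```python
import re


def normalize_name(name):
    """Lowercase, and treat hyphens and runs of whitespace as a single space."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()


# The same normalization is applied to lexicon names and to text before matching.
print(normalize_name("CDC2"), normalize_name("Cdc2"))
print(normalize_name("cyclin-dependent kinase 1"))
```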

Page 50: Biomedical text mining

prefixes and postfixes

Page 51: Biomedical text mining

CDC2

Page 52: Biomedical text mining

hCDC2
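A sketch of prefix and postfix handling, assuming a small illustrative list of affixes: h (human, as in hCDC2) and m (mouse) as prefixes, and the p that yeast nomenclature appends to protein names (as in Swe1p) as a postfix. A token that is not in the lexicon is retried with one affix removed; the function name resolve is hypothetical.

```python
# Illustrative affix lists (an assumption, not an exhaustive set).
PREFIXES = ("h", "m")
POSTFIXES = ("p",)


def resolve(token, lexicon):
    """Return the lexicon name matched by token, trying it as-is first and
    then with one known prefix or postfix removed; None if nothing matches."""
    if token in lexicon:
        return token
    for prefix in PREFIXES:
        if token.startswith(prefix) and token[len(prefix):] in lexicon:
            return token[len(prefix):]
    for postfix in POSTFIXES:
        if token.endswith(postfix) and token[: -len(postfix)] in lexicon:
            return token[: -len(postfix)]
    return None


print(resolve("hCDC2", {"CDC2"}))  # CDC2
```

Stripping affixes blindly can also create false positives, for example when a word that merely starts with h or m reduces to an unrelated lexicon entry.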

Page 53: Biomedical text mining

“black list”

Page 54: Biomedical text mining

SDS
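A black list holds names that are in the lexicon but should not be tagged, because in text they usually mean something else. A minimal sketch, assuming matching has already produced a list of names; the list contents are illustrative.

```python
# Hypothetical black list: names that are in the lexicon but usually mean
# something else in text. "SDS" often refers to the detergent sodium dodecyl
# sulfate (as in SDS-PAGE) rather than to a gene.
BLACK_LIST = {"SDS"}


def filter_matches(matches):
    """Drop matched names that are on the black list, preserving order."""
    return [m for m in matches if m not in BLACK_LIST]


print(filter_matches(["CDC2", "SDS", "Swe1"]))  # ['CDC2', 'Swe1']
```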

Page 55: Biomedical text mining

scalable implementation
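A sketch of one way to keep dictionary matching fast, assuming a hash-map lexicon and a greedy longest-match scan over word tokens (an illustration, not the author's tagger). At each position only phrases of up to max_len tokens are looked up, so the work grows linearly with the length of the text rather than with the size of the lexicon. The identifiers are made-up placeholders.

```python
import re


def tokens_of(text):
    """Split text into alphanumeric tokens; hyphens and brackets act as separators."""
    return re.findall(r"[A-Za-z0-9]+", text)


def tag(text, lexicon, max_len=5):
    """Greedy longest-match dictionary tagger.

    lexicon maps names to entity identifiers; names are normalized the same
    way as the text (tokenized and lowercased) so lookups are dictionary hits.
    Returns (matched text, identifier) pairs in order of appearance.
    """
    index = {" ".join(tokens_of(name)).lower(): eid for name, eid in lexicon.items()}
    tokens = tokens_of(text)
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase.lower() in index:
                hits.append((phrase, index[phrase.lower()]))
                i += n
                break
        else:
            i += 1  # no match starting here; move on one token
    return hits


# Hypothetical lexicon with placeholder identifiers.
lexicon = {
    "Clb2": "ID:CLB2",
    "Cdc28": "ID:CDC28",
    "Cdk1": "ID:CDK1",
    "Swe1": "ID:SWE1",
    "Cdc5": "ID:CDC5",
    "cyclin dependent kinase 1": "ID:CDK1",
}
sentence = ("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
            "Swe1 and this modification served as a priming step to promote "
            "subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation")
print(tag(sentence, lexicon))
```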

Page 56: Biomedical text mining

text corpora

Page 57: Biomedical text mining

>10 km

<10 hours

Page 58: Biomedical text mining

most use Medline

Page 59: Biomedical text mining

~22 million abstracts

Page 60: Biomedical text mining

few use full-text articles

Page 61: Biomedical text mining

no access

Page 62: Biomedical text mining

PDF files

Page 63: Biomedical text mining
Page 64: Biomedical text mining

layout-aware extraction

Page 65: Biomedical text mining

millions of full-text articles

Page 66: Biomedical text mining

information extraction

Page 67: Biomedical text mining

formalize the facts

Page 68: Biomedical text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 69: Biomedical text mining

two approaches

Page 70: Biomedical text mining

co-mentioning

Page 71: Biomedical text mining

counting

Page 72: Biomedical text mining

within documents

Page 73: Biomedical text mining

within paragraphs

Page 74: Biomedical text mining

within sentences
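A sketch of weighted co-mention counting, assuming each document has already been tagged and split into paragraphs and sentences, with each sentence given as a set of entity names. The weights are hypothetical; they only encode the idea that co-mention within a sentence is stronger evidence than within a paragraph, which in turn is stronger than within a document. Each pair is counted at most once per level per document.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count more.
W_DOC, W_PAR, W_SENT = 1.0, 2.0, 3.0


def pairs(entities):
    """All unordered entity pairs, each as an alphabetically sorted tuple."""
    return set(combinations(sorted(entities), 2))


def comention_counts(documents):
    """documents: list of documents; a document is a list of paragraphs,
    a paragraph is a list of sentences, a sentence is a set of entity names.
    Returns a Counter of weighted co-mention counts per entity pair."""
    counts = Counter()
    for doc in documents:
        doc_entities, par_pairs, sent_pairs = set(), set(), set()
        for paragraph in doc:
            par_entities = set()
            for sentence in paragraph:
                sent_pairs |= pairs(sentence)
                par_entities |= sentence
            par_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        # Every pair in the document gets W_DOC, plus bonuses if it also
        # occurs together within a paragraph and within a sentence.
        for pair in pairs(doc_entities):
            counts[pair] += (W_DOC
                             + W_PAR * (pair in par_pairs)
                             + W_SENT * (pair in sent_pairs))
    return counts


# One hypothetical tagged document with two paragraphs.
doc = [
    [{"Cdc28", "Swe1"}, {"Cdc5"}],
    [{"Clb2"}],
]
counts = comention_counts([doc])
print(counts[("Cdc28", "Swe1")])  # 6.0
print(counts[("Cdc28", "Cdc5")])  # 3.0
print(counts[("Clb2", "Swe1")])   # 1.0
```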

Page 75: Biomedical text mining

co-mentioning score
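Raw counts favor entities that are mentioned everywhere. A sketch of one way to turn weighted counts into a score, assuming the approach of mixing the count with an observed-over-expected ratio, score(a, b) = c_ab^alpha × (c_ab · c_total / (c_a · c_b))^(1 − alpha), where c_a is the summed count of all pairs involving a. The formula and alpha = 0.6 are assumptions, not taken from the slides.

```python
from collections import defaultdict


def comention_scores(counts, alpha=0.6):
    """counts: {(a, b): positive weighted co-mention count}.

    Returns {(a, b): score}, where the score mixes the raw count with the
    ratio between the observed count and the count expected from how often
    a and b co-occur with anything at all."""
    per_entity = defaultdict(float)
    total = 0.0
    for (a, b), c in counts.items():
        per_entity[a] += c
        per_entity[b] += c
        total += c
    scores = {}
    for (a, b), c in counts.items():
        ratio = c * total / (per_entity[a] * per_entity[b])
        scores[(a, b)] = c ** alpha * ratio ** (1 - alpha)
    return scores


# Hypothetical weighted counts for three entities.
counts = {("A", "B"): 6.0, ("A", "C"): 3.0, ("B", "C"): 1.0}
scores = comention_scores(counts)
for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(s, 2))
```

With alpha = 1 the score reduces to the raw count; lowering alpha gives more weight to pairs that co-occur more often than their overall frequencies predict.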

Page 76: Biomedical text mining

NLP

Natural Language Processing

Page 77: Biomedical text mining

grammatical analysis

Page 78: Biomedical text mining

part-of-speech tagging

Page 79: Biomedical text mining

multiword detection

Page 80: Biomedical text mining

semantic tagging

Page 81: Biomedical text mining

sentence parsing

Page 82: Biomedical text mining

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
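A sketch of rule-based relation extraction over this annotation, assuming that [nxpg ...] marks a gene or protein name chunk and using one hand-written verb pattern, "is controlled by". The rule, the function name extract_controls and the triple format are illustrative only.

```python
import re

# Assumption: "[nxpg ...]" marks a gene/protein name chunk that contains
# no further brackets.
GENE_CHUNK = re.compile(r"\[nxpg ([^\[\]]+)\]")


def extract_controls(parsed):
    """Apply one hand-written rule to a bracketed parse:
    '<gene chunk> ... is controlled by <gene chunk>' yields
    (regulator, "controls", target) for each target in the first chunk."""
    chunks = list(GENE_CHUNK.finditer(parsed))
    relations = []
    for left, right in zip(chunks, chunks[1:]):
        between = parsed[left.end():right.start()]
        if "is controlled by" in between:
            regulator = right.group(1).strip()
            for target in left.group(1).split(" and "):
                relations.append((regulator, "controls", target.strip()))
    return relations


parsed = ("[nxexpr The expression of [nxgene the cytochrome genes "
          "[nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]")
print(extract_controls(parsed))
# [('HAP1', 'controls', 'CYC1'), ('HAP1', 'controls', 'CYC7')]
```

A rule like this is reliable when it fires, but it misses any sentence that expresses the same relation with different wording.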

Page 83: Biomedical text mining

extract stated facts

Page 84: Biomedical text mining

high precision

Page 85: Biomedical text mining

poor recall
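Precision is the fraction of extracted facts that are correct; recall is the fraction of true facts that were extracted. A minimal sketch, treating the facts stated in the two example sentences on these slides as a hypothetical gold standard: an extractor that reports only the two HAP1 facts, both correct, has perfect precision but misses one of the three facts.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for a set of predicted facts vs. a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Facts stated in the example sentences, as (subject, relation, object) triples.
gold = {
    ("HAP1", "controls", "CYC1"),
    ("HAP1", "controls", "CYC7"),
    ("Cdc28", "phosphorylates", "Swe1"),
}
predicted = {("HAP1", "controls", "CYC1"), ("HAP1", "controls", "CYC7")}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.67
```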

Page 86: Biomedical text mining

Exercise

Go to http://diseases.jensenlab.org

Find TYMS disease associations

Inspect the text-mining evidence

Look for examples of synonym usage

Find genes linked to colorectal cancer

Page 87: Biomedical text mining

thank you!