Biomedical text mining: Automatic processing of unstructured text

Lars Juhl Jensen

Post on 28-Jan-2018

Page 1: Biomedical text mining: Automatic processing of unstructured text

Lars Juhl Jensen

>10 km

1 paper / 40 seconds

patent literature

grant proposals

FDA product labels

electronic medical records

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive dictionary

genes/proteins

cyclin dependent kinase 1

CDC2

chemical compounds

diseases

adverse drug reactions

cellular components

tissues

organisms

environments

orthographic variation

flexible matching

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1
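
The space/hyphen matching above can be sketched as a regular expression built from a dictionary name. This is a minimal illustration, not the tagger used in the talk: the function name `flexible_pattern` is hypothetical, and both the case-insensitive matching and the choice to also accept no separator at all are assumptions.

```python
import re

def flexible_pattern(name):
    """Compile a regex matching `name` with spaces and hyphens interchangeable."""
    tokens = re.split(r"[\s-]+", name.strip())
    # Between tokens accept one space, one hyphen, or nothing (assumption).
    body = r"[\s-]?".join(re.escape(token) for token in tokens)
    # Require non-word characters on both sides so "kinase 10" is not matched.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)

pattern = flexible_pattern("cyclin dependent kinase 1")
text = "Cyclin-dependent kinase 1 phosphorylates many substrates."
match = pattern.search(text)
print(match.group(0) if match else None)  # Cyclin-dependent kinase 1
```

Ignoring case is convenient for long names but may be too permissive for short gene symbols, so a real tagger would likely treat the two differently.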

expansion rules

prefixes and suffixes

CDC2

hCdc2
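
One way to picture an expansion rule is a function that generates extra dictionary entries from a base symbol. The sketch below is only an assumption about how such rules might look: `expand_name` and the prefix and suffix lists are hypothetical and chosen just so that CDC2 yields hCdc2.

```python
def expand_name(name, prefixes=("h", "m"), suffixes=("p",)):
    """Return `name` plus prefixed and suffixed variants (illustrative rules)."""
    capitalized = name.capitalize()  # "CDC2" -> "Cdc2"
    variants = {name}
    for prefix in prefixes:
        # e.g. a species prefix: hCDC2, hCdc2
        variants.update({prefix + name, prefix + capitalized})
    for suffix in suffixes:
        # e.g. a protein suffix: CDC2p, Cdc2p
        variants.update({name + suffix, capitalized + suffix})
    return variants

variants = expand_name("CDC2")
print("hCdc2" in variants)  # True
```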

plural/adjective forms

mitochondrion

mitochondria

mitochondrial
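
Plural and adjective forms can be generated with suffix rewriting rules. The sketch below covers only the mitochondrion example from the slides; the rule table and the name `inflect` are hypothetical, and a rule this crude would overgenerate on other words ending in "on".

```python
# Ending -> replacement endings (illustrative, covers the slide example only).
SUFFIX_RULES = {
    "on": ("a", "al"),  # mitochondrion -> mitochondria, mitochondrial
}

def inflect(term):
    """Return `term` together with generated plural/adjective forms."""
    forms = {term}
    for ending, replacements in SUFFIX_RULES.items():
        if term.endswith(ending):
            stem = term[: -len(ending)]
            forms.update(stem + replacement for replacement in replacements)
            break
    else:
        # No special rule applied: fall back to a regular English plural.
        forms.add(term + "s")
    return forms

print(sorted(inflect("mitochondrion")))
# ['mitochondria', 'mitochondrial', 'mitochondrion']
```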

abbreviated forms

Saccharomyces cerevisiae

S. cerevisiae
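
For organism names, the abbreviated form follows mechanically from the binomial name. A minimal sketch, assuming names consist of a genus and a species separated by a space (`abbreviate_binomial` is a hypothetical helper):

```python
def abbreviate_binomial(name):
    """Abbreviate the genus of a binomial name to its initial."""
    genus, _, species = name.partition(" ")
    if not species:
        return name  # not a two-part name, leave unchanged
    return f"{genus[0]}. {species}"

print(abbreviate_binomial("Saccharomyces cerevisiae"))  # S. cerevisiae
```

Different genera can share an initial, so the abbreviated forms may be ambiguous and need context to resolve.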

“black list”

SDS
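
A black list is simply a set of dictionary names that are never accepted as matches. The slides give SDS as the example, presumably because in papers it usually refers to the detergent sodium dodecyl sulfate rather than to a gene; that reading is an assumption. The names `BLACK_LIST` and `filter_matches` below are hypothetical.

```python
BLACK_LIST = {"SDS"}  # names that are too ambiguous to tag (example from the slides)

def filter_matches(matches, black_list=BLACK_LIST):
    """Drop dictionary matches whose name is on the black list."""
    return [name for name in matches if name not in black_list]

print(filter_matches(["CDC2", "SDS", "HAP1"]))  # ['CDC2', 'HAP1']
```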

use cases

assess studiedness

TIN-X

Cannon et al., Bioinformatics, 2017
newdrugtargets.org

interactive annotation

EXTRACT

Pafilis et al., Database, 2016
extract.hcmr.gr

implicit relations

Encyclopedia of Life

habitats

Pafilis et al., Bioinformatics, 2016
environments.hcmr.gr / eol.org

SIDER

adverse drug reactions

Kuhn et al., Nucleic Acids Research, 2016
sideeffects.embl.de

relation extraction

two approaches

natural language processing

part-of-speech tagging

what you learned in school
pronoun pronoun verb preposition noun

sentence parsing

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

Saric et al., Proceedings of ACL, 2004

manually crafted rules
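
To give a feel for what a manually crafted rule does, here is a much simpler stand-in: a single regular expression for the pattern "X and Y is controlled by Z". It is not the grammar from Saric et al., which works on tagged and parsed sentences; the rule, the assumption that gene names are written in upper case, and the function name are all illustrative.

```python
import re

# One hand-written rule: "<targets> is controlled by <regulator>".
RULE = re.compile(
    r"(?P<targets>[A-Z][A-Z0-9]+(?: and [A-Z][A-Z0-9]+)*)"
    r"\s+is controlled by\s+"
    r"(?P<regulator>[A-Z][A-Z0-9]+)"
)

def extract_regulation(sentence):
    """Return (regulator, target) pairs found by the rule, or an empty list."""
    match = RULE.search(sentence)
    if match is None:
        return []
    regulator = match.group("regulator")
    return [(regulator, target) for target in match.group("targets").split(" and ")]

sentence = "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
print(extract_regulation(sentence))  # [('HAP1', 'CYC1'), ('HAP1', 'CYC7')]
```

The output carries both the type of the association (regulation) and its direction, but any sentence phrased differently is missed.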

machine learning

manually annotated corpus

association type

direction

high precision

poor recall

manual work
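
Precision is the fraction of extracted relations that are correct; recall is the fraction of true relations that were extracted. The toy sketch below shows the "high precision, poor recall" situation. The gold-standard pairs are invented placeholders, not data from the talk.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of predicted pairs against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Placeholder relations (hypothetical): four true pairs, two of them extracted.
gold = {("TF_A", "GENE_1"), ("TF_A", "GENE_2"), ("TF_B", "GENE_3"), ("TF_C", "GENE_4")}
predicted = {("TF_A", "GENE_1"), ("TF_A", "GENE_2")}
print(precision_recall(predicted, gold))  # (1.0, 0.5)
```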

co-mentioning

counting

within documents

within paragraphs

within sentences

scoring scheme

weighted counts

normalization
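
The slides do not show the scoring formula, so the following is only a sketch of the idea under stated assumptions. Each document is a list of paragraphs, each paragraph a list of sentences, each sentence the set of entities already tagged in it. The weights are placeholder values, and the normalization (observed weighted count divided by what the two entities' totals would predict) is one common choice rather than the one used in the talk.

```python
from collections import Counter
from itertools import combinations

# Placeholder weights (assumptions): closer co-mentions contribute more.
W_DOC, W_PAR, W_SENT = 1.0, 2.0, 3.0

def pairs(entities):
    """All unordered entity pairs, as sorted tuples."""
    return set(combinations(sorted(entities), 2))

def weighted_counts(documents):
    """Sum document-, paragraph- and sentence-level co-mentions per pair."""
    counts = Counter()
    for paragraphs in documents:
        sent_pairs, par_pairs, doc_entities = set(), set(), set()
        for sentences in paragraphs:
            par_entities = set()
            for entities in sentences:
                sent_pairs |= pairs(entities)
                par_entities |= set(entities)
            par_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        for pair in pairs(doc_entities):
            counts[pair] += (W_DOC
                             + W_PAR * (pair in par_pairs)
                             + W_SENT * (pair in sent_pairs))
    return counts

def normalized_scores(counts):
    """Divide each count by the value expected from the entities' totals."""
    marginals = Counter()
    for (a, b), count in counts.items():
        marginals[a] += count
        marginals[b] += count
    total = sum(marginals.values())
    return {(a, b): count * total / (marginals[a] * marginals[b])
            for (a, b), count in counts.items()}

# One toy document: paragraph 1 has two sentences, paragraph 2 has one.
documents = [[[{"A", "B"}, {"C"}], [{"D"}]]]
counts = weighted_counts(documents)
scores = normalized_scores(counts)
print(counts[("A", "B")], counts[("A", "C")], counts[("A", "D")])  # 6.0 3.0 1.0
```

In this toy document A and B share a sentence, A and C share only a paragraph, and A and D share only the document, so the normalized scores rank the pairs in that order.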

easy

high recall

high precision

undirected associations

unknown type

use cases

natural language processing

transcription factor targets

kinase substrates

protein–protein interactions

co-mentioning

drug targets

protein function

subcellular localization

Binder et al., Database, 2014
compartments.jensenlab.org

tissue expression

Santos et al., PeerJ, 2015
tissues.jensenlab.org

disease genes

Frankild et al., Methods, 2015
diseases.jensenlab.org

disease mutations