Topic Mapping Tools for Biomedical Corpora
Gully APC Burns, USC/ISI
Dave Newman, UC Irvine
Bruce Herr, IU

Post on 18-Feb-2016

Page 1: Topic Mapping Tools for Biomedical Corpora

Topic Mapping Tools for Biomedical Corpora

Gully APC Burns, USC/ISI
Dave Newman, UC Irvine
Bruce Herr, IU

Page 2: Topic Mapping Tools for Biomedical Corpora

‘Snapshots of Neuroscience’

Society for Neuroscience Annual meeting (2000 New Orleans)
~30,000 attendees, ~12,000 posters per year

Page 3: Topic Mapping Tools for Biomedical Corpora

Basic Idea: Topic Modeling

Erythropoietin (Epo), a hematopoietic cytokine, has recently been demonstrated to provide neuroprotection on nigral dopaminergic neurons. However, there is no information available about whether Epo can protect dopaminergic neurons from the neurotoxicity of 6-hydroxydopamine (6-OHDA) that is most commonly used to create a rat model of Parkinson’s disease (PD). In the present study, we tested the hypothesis that recombinant human Epo (rhEpo) would protect dopaminergic neurons and improve neurobehavioral outcomes in a rat model of progressive PD. rhEpo (20 units in 2μl of vehicle) was stereotaxically injected into one side of the striatum. The 6-OHDA lesion was made into the same side one day after rhEpo treatment. Methamphetamine-induced rotation was measured 3 and 10 weeks after the lesion, and paw reaching was also tested at 10 weeks. After the last time of behavioral test, rats were then sacrificed, and the brains were perfusion-fixed for histology and immunocytochemistry. We observed that intrastriatal administration of rhEpo significantly reduced the degree of rotational asymmetries. The rhEpo-treated animals also showed a better improvement in skilled forelimb use when compared with the control rats. In accompanying with the recovery of neurobehavioral outcomes, tyrosine hydroxylase (TH)-immunoreactive neurons of the substantia nigra were protected from progressive degeneration in the rhEpo-treated rats. TH-immunoreactivity in the 6-OHDA lesioned striatum also significantly increased in the rhEpo-treated rats. To examine if systemic administration of rhEpo could exert the similar biological effects …

Page 4: Topic Mapping Tools for Biomedical Corpora

Basic Idea: Topic Modeling

Page 5: Topic Mapping Tools for Biomedical Corpora

Basic Idea: Topic Modeling

... plus all remaining ‘topic mass’ – provides a signature from which we can calculate document-document similarities (~12,000 x ~12,000 matrix)
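The similarity step can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes each document's signature is its full vector of topic proportions and that similarity is measured with the cosine, which the slide does not specify. The three toy vectors and the function names are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_matrix(doc_topics):
    """Build the symmetric document-document similarity matrix."""
    n = len(doc_topics)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # a document is identical to itself
        for j in range(i + 1, n):
            s = cosine(doc_topics[i], doc_topics[j])
            sim[i][j] = s
            sim[j][i] = s
    return sim

# Toy signatures (assumed values): rows are documents, columns are topics.
doc_topics = [
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.05, 0.15, 0.80],
]
sim = similarity_matrix(doc_topics)
print(sim[0][1] > sim[0][2])  # documents 0 and 1 share their dominant topic
```

For ~12,000 documents the full matrix has ~144 million entries, so a real pipeline would likely keep only the strongest similarities per document before graph layout.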

Page 6: Topic Mapping Tools for Biomedical Corpora

‘Topic Mapping’ Workflow

[Example topic words: ischemia cerebral ischemic stroke brain occlusion injury infarct mcao hour reperfusion artery volume model middle transient]

Literature Corpus
Topic Modeling using Gibbs Sampling
Topic Model
Document-Document Similarity Map
Graph Layout Processing with VxOrd / DrL
Multi-level image rendering, Cluster analysis for label placement
Google Maps Application
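The "Topic Modeling using Gibbs Sampling" stage can be illustrated with a small collapsed Gibbs sampler for latent Dirichlet allocation. This is a sketch under assumptions: the slides do not give the model details, so the choice of LDA, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all illustrative rather than the settings used for the SfN maps.

```python
import random

def gibbs_lda(docs, num_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                         # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions (the document "signature").
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word

# Toy corpus (made up): word ids index into this small vocabulary.
vocab = ["ischemia", "stroke", "infarct", "dopamine", "neuron", "striatum"]
docs = [[0, 1, 2, 1, 0], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta, topic_word = gibbs_lda(docs, num_topics=2, vocab_size=len(vocab))
for k, counts in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda w: counts[w], reverse=True)[:3]
    print(k, [vocab[w] for w in top])
```

Each row of `theta` sums to 1 and can serve as the input to the document-document similarity step of the workflow.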

Page 7: Topic Mapping Tools for Biomedical Corpora

Implementation 1: SfN 2006 Maps @ SfN 2007

Analysis: Dave Newman, UCI
Visualization: Bruce Herr, IU

Page 8: Topic Mapping Tools for Biomedical Corpora

Lessons Learned

This demonstration had a high impact at SfN 2007
[Shown to Neuroinformatics Committee (NIC), PubMed Plus Panel, Program Committee, General Council]

Why?
1. System emphasizes elegant visualization
2. Application has natural, familiar, intuitive design
3. Criticisms centered on concerns about analysis validity (‘what do clusters actually mean?’) ...but, system focused on utility, not interpretations...

Page 9: Topic Mapping Tools for Biomedical Corpora

Next Steps

Gary Westbrook [NIC, ex-editor of J Neurosci, external committee of National Institute of Neurological Disorders and Stroke, NINDS]
Edmund Talley [Program Director NINDS, Channels Synapses and Circuits]

Requested a system to examine NINDS grants accessed from CRISP

Page 10: Topic Mapping Tools for Biomedical Corpora

CRISP: Computer Retrieval of Information on Scientific Projects

Lists all funded DHHS projects from 1972 [including data from NIH, CDC, FDA, HRSA and AHRQ]

Build topic map of NINDS 2006 grants in relation to 13 other NIH institutes involved with funding Neuroscience research.
[Largest Institute: NCI ~ 9373 grants (2006)]
[Smallest Institute: NIAAA ~ 1198 grants (2006)]

Downloaded 10 years of abstracts from NINDS (to weight distribution in favor of NINDS topics) and 1 year of all other 13 institutes.

NINDS staff hand-annotated ~2500 grants with SfN categories (theme, sub-theme, topic) to compare with categories generated by the topic model.
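One simple way to make such a comparison is cluster purity: assign each grant to its dominant topic, then count how often grants sharing a topic also share a hand-assigned category. This is only a sketch. The slides do not say which agreement measure was used, and the category names and topic vectors below are placeholders.

```python
from collections import Counter, defaultdict

def dominant_topic(theta_row):
    """Index of the topic with the largest proportion in one document."""
    return max(range(len(theta_row)), key=lambda k: theta_row[k])

def purity(model_labels, hand_labels):
    """Fraction of documents whose hand label is the majority label of their topic."""
    groups = defaultdict(Counter)
    for m, h in zip(model_labels, hand_labels):
        groups[m][h] += 1
    matched = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return matched / len(hand_labels)

# Placeholder data: four grants, two topics, hypothetical hand categories.
doc_topics = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
hand_labels = ["Theme A", "Theme A", "Theme B", "Theme A"]

model_labels = [dominant_topic(row) for row in doc_topics]
score = purity(model_labels, hand_labels)
print(score)  # 0.75
```

Purity rewards fine-grained topics, so it is best read alongside the number of topics rather than on its own.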

Page 11: Topic Mapping Tools for Biomedical Corpora

Additional Features for this implementation

Improved navigability
Multiple maps
Multiple labeling / coloring schemes
Search
- Google Map-based flags, etc.
- full-text search within the HTML application

Page 12: Topic Mapping Tools for Biomedical Corpora

Implementation 2: NINDS + NIH Maps for 2006

Page 13: Topic Mapping Tools for Biomedical Corpora

What’s Next?

All 2007 abstracts from NIH (all institutes)

Diagnostic functions within browser
- ‘Heat maps’ of each individual topic
- ‘Cluster Expansion’

Trend analysis
Which topics are emergent? Which are in decline?

Can we perform analysis across corpora?
- SfN abstracts from 2001-2008
- Medline (>8 million abstracts)
- CRISP (funded federal project abstracts)
- PubMed Central (~1 million full text papers)
- Other full-text resources

Page 14: Topic Mapping Tools for Biomedical Corpora

‘Cluster Expansion’

Page 16: Topic Mapping Tools for Biomedical Corpora

Data across many years allows trend analysis

[Figure: Medline data, topic trends for PD, HIV and p53]
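A trend of this kind can be approximated by averaging a topic's weight over all documents in each year and fitting a straight line to the yearly averages. The sketch below assumes per-document topic proportions and publication years are available; the function names and toy numbers are illustrative, and a least-squares slope is just one possible definition of "emergent" or "in decline".

```python
def yearly_topic_share(doc_years, doc_topics, topic):
    """Mean proportion of one topic across the documents of each year."""
    totals, counts = {}, {}
    for year, theta in zip(doc_years, doc_topics):
        totals[year] = totals.get(year, 0.0) + theta[topic]
        counts[year] = counts.get(year, 0) + 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}

def linear_slope(series):
    """Least-squares slope of yearly values; positive means a growing topic."""
    years = list(series)
    values = [series[y] for y in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        return 0.0  # a single year carries no trend information
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx

# Toy data (assumed): two documents per year, two topics.
doc_years = [2001, 2001, 2002, 2002, 2003, 2003]
doc_topics = [
    [0.1, 0.9], [0.2, 0.8],
    [0.3, 0.7], [0.4, 0.6],
    [0.5, 0.5], [0.6, 0.4],
]
slope_rising = linear_slope(yearly_topic_share(doc_years, doc_topics, topic=0))
slope_falling = linear_slope(yearly_topic_share(doc_years, doc_topics, topic=1))
print(slope_rising > 0, slope_falling < 0)
```

Ranking topics by slope gives a first pass at which are emergent and which are in decline. Corpora that grow over time, such as Medline, would also need the per-year normalization shown here so that raw volume does not dominate.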

Page 18: Topic Mapping Tools for Biomedical Corpora

Full-text Biomedical Articles

Source                                 Size (# articles, millions)   Type
Medline                                15.8                          Citations
Elsevier’s ScienceDirect               6.75                          Articles
PubMed Central                         0.97                          Articles
Cambridge Journals                     0.18                          Articles
JSTOR                                  1.62                          Articles
SpringerLink (Biomedical / Medical)    1.32 (0.72 / 0.60)            Articles
Wiley Interscience                     1.50                          Articles

Page 19: Topic Mapping Tools for Biomedical Corpora

Acknowledgements

Funding
Information Sciences Institute, seed funding
NSF: IIS-0513650
NINDS contracts (Ned Talley)

Collaborators
Dave Newman (UCI)
Bruce Herr (IU)

Developers
Tommy Ingulfsen

Contributing Computer Scientists
Padhraic Smyth (UCI)
Katy Borner (IU)
Patrick Pantel (ISI/Yahoo!)