Search PubMed with R, Part 3


Upload: cpmarqui

Post on 20-Apr-2017


Page 1: Search Pubmed With R Part3

Search PubMed with R, Part 3

Page 2: Search Pubmed With R Part3

Query PubMed titles for systemic lupus erythematosus with the R package RISmed [1]

# Type the following in the R console:
library(RISmed)
lupus <- EUtilsSummary('lupus[Ti] erythematosus[ti] systemic[Ti]', retmax=200)
# retmax refers to the maximum number of records to retrieve; the default is 1000.
fetch.lupus <- EUtilsGet(lupus)
fetch.lupus
# Results: PubMed query: lupus[Ti] AND erythematosus[ti] AND systemic[Ti] Records: 200
lupus.tit <- ArticleTitle(fetch.lupus)
lupus.tit[1:10] # to view the first 10 results of titles
# export results to text file
write(lupus.tit, file="lupusRISmedTi.txt")

References
1- RISmed package: Stephanie Kovalchik (2013). RISmed: Download content from NCBI databases. R package version 2.1.0. http://CRAN.R-project.org/package=RISmed
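The query string above tags each search term with the title field, and the result line shows that PubMed joins the tagged terms with AND. As a minimal sketch, the same string can be assembled from a vector of terms. The helper name build_title_query is hypothetical and is not part of RISmed; it only uses base R.

```r
# Hypothetical helper (not a RISmed function): append a PubMed field tag
# to every term and join the tagged terms with spaces.
build_title_query <- function(terms, field = "Ti") {
  paste(paste0(terms, "[", field, "]"), collapse = " ")
}

query <- build_title_query(c("lupus", "erythematosus", "systemic"))
print(query)
```

The resulting string could then be passed as the first argument of EUtilsSummary() in place of the hand-typed query.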

Page 3: Search Pubmed With R Part3

Query PubMed titles for systemic lupus erythematosus using RISmed

Page 4: Search Pubmed With R Part3

View results of the exported text file

Export the results to a text file from the R command line:
write(lupus.tit, file="lupusRISmedTi.txt")
# exports the titles as a text file; open the file in Excel or any other text editor
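For a character vector, write() puts one element on each line, so the exported file holds one title per line. The sketch below checks this with two made-up titles and a temporary file; the titles and the file name titles_example.txt are placeholders, not results from the PubMed query.

```r
# Made-up titles standing in for lupus.tit
titles <- c("First example title.", "Second example title.")

# Write to a temporary file so nothing in the working folder is overwritten
out_file <- file.path(tempdir(), "titles_example.txt")
write(titles, file = out_file)

# Read the file back: each line should be one title
read_back <- readLines(out_file)
print(length(read_back))
print(identical(read_back, titles))
```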

Page 5: Search Pubmed With R Part3

Find the Title Verb Relation with ReVerb

ReVerb [1] is an open information extraction program, distributed as an executable jar, developed by the University of Washington's Turing Center.

• It is important to note that ReVerb depends on Java; it is therefore not an R program.

• ReVerb is powerful and provides useful information about the relational structure of a text. It is relatively easy to use and runs very fast.

• In our case we will apply ReVerb to our text file of title results.

Reference:
1- ReVerb: Anthony Fader, Stephen Soderland and Oren Etzioni (2011). Identifying Relations for Open Information Extraction. Proceedings of the Conference of Empirical Methods in Natural Language Processing (EMNLP '11), July 27-31, Edinburgh, Scotland, UK.

Page 6: Search Pubmed With R Part3

Install ReVerb

You can download the latest ReVerb jar from http://reverb.cs.washington.edu/reverb-latest.jar

This executable jar file is easy to run from the MS-DOS command line.

At https://github.com/knowitall/reverb/ you can find out how to use ReVerb. It provides the following example, which illustrates what it does:

"ReVerb takes raw text as input, and outputs (argument1, relation phrase, argument2) triples. For example, given the sentence "Bananas are an excellent source of potassium," ReVerb will extract the triple (bananas, be source of, potassium)."

To run ReVerb you need to have Java installed on your computer. You can install Java from https://www.java.com/en/download/


Page 7: Search Pubmed With R Part3

Use of ReVerb

Place the reverb-latest.jar file and the result file "lupusRISmedTi.txt" in the same folder.

Figure: example of the two files in the same folder (which we named Reverb-Java).

Page 8: Search Pubmed With R Part3

Use of ReVerb

1- Open the MS-DOS command prompt (cmd) and change to the folder (Reverb-Java in our example) that contains both files: reverb-latest.jar and lupusRISmedTi.txt

Page 9: Search Pubmed With R Part3

Use of ReVerb

2- Type the following cmd line to view the results on the console:

java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt

Results are displayed in the MS-DOS window.

Page 10: Search Pubmed With R Part3

Use of ReVerb: export the results to an xls file

3- Type the following cmd line to export the results to a file:

java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt > ReverbLupusRISmedTi.txt

(The file was named ReverbLupusRISmedTi.txt. You can use another name, or give the file an xls extension by typing ReverbLupusRISmedTi.xls instead; the content is tab-separated text either way, which Excel can open.)
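The same command can also be launched from an R session instead of the MS-DOS window. The following is only a sketch under assumptions: it uses base R's system2(), assumes java is on the PATH, and assumes that both files sit in the current working directory. It does nothing if any of these is missing.

```r
# File names taken from the steps above
jar    <- "reverb-latest.jar"
input  <- "lupusRISmedTi.txt"
output <- "ReverbLupusRISmedTi.txt"

# Same arguments as the cmd line: java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt
reverb_args <- c("-Xmx512m", "-jar", jar, input)

if (file.exists(jar) && file.exists(input) && nzchar(Sys.which("java"))) {
  # stdout = output plays the role of the ">" redirection
  system2("java", args = reverb_args, stdout = output)
} else {
  message("ReVerb jar, input file, or Java not found; nothing was run.")
}
```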

Page 11: Search Pubmed With R Part3

Open the ReVerb result file ReverbLupusRISmedTi.txt with MS Excel

Page 12: Search Pubmed With R Part3

ReVerb output

The ReVerb output has 18 columns (see results in the Excel file).

The most interesting are:

Col 3 (Col C): Argument1
Col 4 (Col D): Verb relation phrase
Col 5 (Col E): Argument2

(Col 12 refers to the confidence that the extraction is correct, and col 2 refers to the sentence number where the extraction came from.)
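The column positions listed above can be tried out in R on a small stand-in for the output file. In the sketch below the two input lines are made up, only columns 2 to 5 and 12 hold illustrative values, and "x" fills the columns this slide does not describe. The helper make_line and the names given to the selected columns are hypothetical.

```r
# Build one made-up line in the 18-column, tab-separated layout.
# "x" is a placeholder for the columns not described on this slide.
make_line <- function(sentence, arg1, rel, arg2, conf) {
  paste(c("x", sentence, arg1, rel, arg2, rep("x", 6), conf, rep("x", 6)),
        collapse = "\t")
}
lines <- c(
  make_line(1, "bananas", "are an excellent source of", "potassium", 0.85),
  make_line(2, "example argument", "is related to", "another argument", 0.40)
)

# Read the lines as if they came from the ReVerb output file
d <- read.table(text = lines, sep = "\t", quote = "", comment.char = "",
                header = FALSE, stringsAsFactors = FALSE)

# Keep the sentence number, the triple, and the confidence under readable names
triples <- data.frame(sentence = d$V2, arg1 = d$V3, relation = d$V4,
                      arg2 = d$V5, confidence = d$V12,
                      stringsAsFactors = FALSE)
print(ncol(d))
print(triples)
```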

Page 13: Search Pubmed With R Part3

ReVerb results

Results of the first 5 rows (Excel) from columns 3-5, shown as argument 1 | verb relation | argument 2:

1- childhood-onset systemic lupus erythematosus | is associated with | ethnicity
2- renal involvement | are lower in | ACE inhibitor-treated patients
3- Prednisone | induced | two-way myocardial development
4- Acetylated histones | contribute to | the immunostimulatory potential of Neutrophil Extracellular Traps
5- clinical practice | monitor the impact of | systemic lupus erythematosus

Note: on the slide, blue refers to argument 1, white to the verb relation, and orange to argument 2.

Page 14: Search Pubmed With R Part3

Prepare ReVerb results data for R wordcloud

# use the read.table script (from reference 1) as follows:
d <- read.table('ReverbLupusRISmedTi.txt', quote='', comment.char='', allowEscapes=FALSE, sep='\t', header=FALSE, as.is=TRUE, stringsAsFactors=FALSE)
# make sure the data is a data frame (read.table already returns one)
e <- as.data.frame(d)
# merge columns (3-5) into a single text sentence
f = paste(e$V3, e$V4, e$V5)
f[1:3] # view the first 3 lines
[1] "childhood-onset systemic lupus erythematosus is associated with ethnicity"
[2] "renal involvement are lower in ACE inhibitor-treated patients"
[3] "Prednisone induced two-way myocardial development"

Reference:
1- John Mount (December 7th, 2012). Please stop using Excel-like formats to exchange data.
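Because column 12 holds ReVerb's confidence for each extraction, one option is to drop low-confidence rows before merging columns 3-5. This is a sketch only: the data frame below is made up, and the 0.5 cut-off (min_conf) is an arbitrary assumption, not a ReVerb default.

```r
# Made-up stand-in for the data frame read from the ReVerb output
e <- data.frame(
  V3  = c("bananas", "example argument"),
  V4  = c("are an excellent source of", "is related to"),
  V5  = c("potassium", "another argument"),
  V12 = c(0.85, 0.40),
  stringsAsFactors = FALSE
)

min_conf <- 0.5                    # assumed cut-off, chosen for illustration
keep <- e[e$V12 >= min_conf, ]     # rows at or above the cut-off

# Merge the remaining triples into single text sentences
f <- paste(keep$V3, keep$V4, keep$V5)
print(f)
```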

Page 15: Search Pubmed With R Part3

Represent ReVerb results in R wordcloud

library(tm)
my.corpus <- Corpus(VectorSource(f))
summary(my.corpus)
inspect(my.corpus[1:3])
my.corpus <- tm_map(my.corpus, removeWords, stopwords("english"))
#my.corpus <- tm_map(my.corpus, stemDocument)
myTdm <- TermDocumentMatrix(my.corpus, control = list(wordLengths=c(1,Inf)))
myTdm
# A term-document matrix (140 terms, 26 documents)
# Non-/sparse entries: 163/3477
# Sparsity : 96%
# Maximal term length: 22
# Weighting : term frequency (tf)
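The figures in this summary fit together arithmetically: the matrix has 140 x 26 cells, and the sparsity is the share of cells that are zero. The short check below uses only the numbers printed above; the variable names are chosen for illustration.

```r
# Figures copied from the term-document matrix summary
n_terms   <- 140
n_docs    <- 26
nonsparse <- 163

cells  <- n_terms * n_docs          # total number of cells in the matrix
sparse <- cells - nonsparse         # cells with a zero count
sparsity_pct <- round(100 * sparse / cells)

print(cells)
print(sparse)
print(sparsity_pct)
```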

Page 16: Search Pubmed With R Part3

Represent ReVerb results in R wordcloud

findFreqTerms(myTdm, lowfreq=2)
# [1] "associated" "damage" "distinct" "erythematosus"
# [5] "increased" "independently" "lupus" "systemic"
termFrequency <- rowSums(as.matrix(myTdm))
termFrequency <- subset(termFrequency, termFrequency>=10)
m <- as.matrix(myTdm)
wordFreq <- sort(rowSums(m), decreasing=TRUE) # This yields Word Frequency
library(wordcloud)
#library(RColorBrewer)
set.seed(375)
pal1 <- brewer.pal(6,"Dark2")
wordcloud(words=names(wordFreq), freq=wordFreq, scale=c(2,.9), min.freq=1, random.order=F, colors=pal1)
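The word frequencies that drive the wordcloud can be cross-checked without tm. The sketch below counts words in base R on two made-up sentences; the tiny stop list is an assumption for illustration and is not the list returned by stopwords("english"), so counts on real data may differ from the tm result.

```r
# Made-up sentences standing in for f
f <- c("lupus is associated with ethnicity", "lupus induced damage")

# Tiny illustrative stop list (assumption), not tm's stopwords("english")
stop_words <- c("is", "with")

# Lower-case, split on anything that is not a letter, drop empties and stop words
words <- unlist(strsplit(tolower(f), "[^a-z]+"))
words <- words[nzchar(words) & !(words %in% stop_words)]

# Count each word and sort from most to least frequent
counts   <- table(words)
wordFreq <- sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
print(wordFreq)
```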

Page 17: Search Pubmed With R Part3

R wordcloud of ReVerb results