Introduction
Visualization Methods
ITMS
Medieval Corpus
Conclusion
Interactive Text Mining Suite: Data Visualization for Literary Studies
Olga Scrivner and Jefferson Davis
Indiana University
CDH 2017
Outline
1 Visual Analytics in Digital Humanities
2 Shiny Web Application - Interactive Text Mining Suite
3 Case Study: Visualization of the Medieval Romance of Flamenca
Digital Humanities - Transformation
The “epic transformation of archives” - shifting from print to digital archival form (Folsom, 2007)
Digital Humanities
“As our collective knowledge continues to be digitized and stored (...) it becomes more difficult to find and discover what we are looking for.” (Blei, 2012)
Digital Humanities Manifesto 2.0 (2009) and Berry (2011)
1st Wave: “The first wave of digital humanities work was quantitative, mobilizing the search and retrieval powers of the database, automating corpus linguistics, stacking hypercards into critical arrays”
2nd Wave: “The second wave is qualitative, interpretive”, concentrating on new tools for creating and curating digital repositories (Berry, 2011)
3rd Wave: Concentration on the computationality, search, retrieval and analysis originated in humanities-based work
Visual Analytics in Literature
“The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas and Cook, 2005)
Close Reading
Concept: Micro-analysis (Jockers, 2013)
Close textual analysis of individual texts to “unveil words, verbal images, elements of style, sentences, argument patterns” (Jasinski, 2001)
Methods: Color coding, marginal comments, underlining
Tools: Poem Viewer, PRISM, Juxta, eMargin
Close Reading Visualization: eMargin and JUXTA
Distant Reading
Concept: Macro-analysis (Jockers, 2013)
“the construction of abstract models” (Jasinski, 2001)
Methods: Tag clouds, heat maps, clusters, topics, network graphs
Tools (GUI): Voyant, PaperMachine
Tools (TUI): Mallet, Meta, R and Python packages
Visualization Methods in Literature
Graphs, maps and trees for literature analysis (Moretti, 2005)
Visualization Methods in Literature
Word clouds to analyze a novel (Vuillemot et al., 2009)
Visualization Methods in Literature
Social network graphs of characters in Greek tragedies (Rydberg-Cox, 2011)
Visualization Methods in Literature
Literary fingerprint and summaries (Oelke et al., 2012)
Visualization Methods in Literature
Tracking emotion and sentiment in fairy tales (Mohammad, 2012)
Topic Modeling
Discovering the underlying themes of a collection of Science magazine articles, 1990-2000 (Blei, 2012)
Technological and Methodological Obstacles
Many tools require some programming skills (Mallet, Meta, R and Python libraries)
GUI tools are limited to certain formats and functions (Voyant, PaperMachine)
Lack of active control by users
Our Goals - Interactive Text Mining Suite
A user-friendly interactive tool for quantitative analysis and visualization
Designed for linguistic and literary analysis
Incorporation of annotated corpora in macro-analysis
Background
1 R - a free programming language for statistical computing and graphics
2 RStudio - Integrated Development Environment: a source code editor, an executor and a debugger
3 Shiny App - a web application framework for R
ITMS
Platform-independent, user-friendly and interactive
State-of-the-art statistical and graphical tools (R libraries)
http://www.interactivetextminingsuite.com
Multi-Functional
1 Import from txt, pdf and rdf files and from the Google Books API
2 Metadata extraction
3 Interactive data pre-processing
4 Dynamic visualization
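A minimal sketch of what step 3, interactive pre-processing, could look like if each cleaning step were a switch the user can turn on or off. This is written in Python for illustration only; ITMS itself is built on R and Shiny, and the function name and parameters below are hypothetical, not taken from ITMS.

```python
import re

def preprocess(text, lowercase=True, strip_punctuation=True,
               strip_numbers=False, stopwords=()):
    """Return a list of tokens after the selected cleaning steps.

    Every parameter is a hypothetical user-facing toggle.
    """
    if lowercase:
        text = text.lower()
    if strip_punctuation:
        # Replace anything that is neither a word character nor whitespace.
        text = re.sub(r"[^\w\s]", " ", text)
    if strip_numbers:
        text = re.sub(r"\d+", " ", text)
    stop = set(stopwords)
    return [tok for tok in text.split() if tok not in stop]

raw = "In 1234, the Knight rode; the Lady waited."  # invented sample sentence
print(preprocess(raw))
print(preprocess(raw, strip_numbers=True, stopwords={"the", "in"}))
```

Stopwords are compared after lowercasing, so the stopword list should be lowercase when `lowercase=True`.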
Case Study - Medieval Occitan
Occitan (Provençal) constitutes an important element of the literary, linguistic, and cultural heritage in the history of Romance languages
Interactive online database and linguistically annotated corpus (Scrivner et al., 2014)
http://www.oldoccitancorpus.org
Comparative Analysis of POS: Original and Translation
Occitan corpus English translation
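One way such a comparison could be computed, sketched in Python under stated assumptions: each text is already available as a list of (token, POS tag) pairs, and the tag names and placeholder tokens below are invented for illustration. They are not data from the Occitan corpus or its translation.

```python
from collections import Counter

def pos_proportions(tagged_tokens):
    """Map each POS tag to its share of all tokens (between 0 and 1)."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Placeholder tokens and tags, invented for this sketch.
original = [("w1", "NOUN"), ("w2", "VERB"), ("w3", "NOUN"), ("w4", "ADV")]
translation = [("t1", "NOUN"), ("t2", "VERB"), ("t3", "PRON"),
               ("t4", "VERB"), ("t5", "NOUN")]

orig_share = pos_proportions(original)
trans_share = pos_proportions(translation)
for tag in sorted(set(orig_share) | set(trans_share)):
    print(tag, orig_share.get(tag, 0.0), trans_share.get(tag, 0.0))
```

Proportions rather than raw counts keep the two texts comparable when they differ in length.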
Key-Word-in-Context Analysis of POS
Existential - there Negation
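A key-word-in-context listing can be sketched as follows. This Python version matches on word forms only; the slide's analysis works on POS annotation, which this sketch does not model. The function name, the window size and the sample sentence are assumptions made for illustration.

```python
def kwic(tokens, keyword, window=3):
    """Yield (left context, match, right context) for each occurrence."""
    key = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == key:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield left, tok, right

text = "There was a tower and there was no door in the tower"  # invented
for left, hit, right in kwic(text.split(), "there", window=2):
    print(f"{left:>15} [{hit}] {right}")
```

Right-aligning the left context puts every match in the same column, which is what makes recurring patterns around the keyword easy to scan.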
Stylistic Similarities - Sentence Length
Occitan Corpus English Translation
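Sentence length as a stylistic measure can be approximated with a few lines of Python. The sketch below assumes that sentences end in a period, question mark or exclamation mark followed by whitespace, which is a simplification; a medieval text with editorial punctuation may need a different segmentation rule. The sample text is invented.

```python
import re
from statistics import mean

def sentence_lengths(text):
    """Return the number of whitespace-separated words in each sentence."""
    # Split after sentence-final punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s]

sample = "She waited. The knight came at dawn! Did he speak?"
lengths = sentence_lengths(sample)
print(lengths, mean(lengths))
```

The resulting list can be plotted per text to compare the distribution of sentence lengths in the original and the translation.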
Stylistic Comparison - Punctuation
Occitan Corpus English Translation
Question marks and exclamation marks - red; quotation marks, hyphens and parentheses - green; semicolons, colons, commas, periods - blue
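The three color groups in the legend can be counted with a short Python sketch. The exact characters assigned to each group are an assumption, for example which quotation mark characters count; the sample sentence is invented.

```python
from collections import Counter

# Grouping follows the slide's legend; the character sets are assumed.
PUNCTUATION_GROUPS = {
    "red": "?!",
    "green": "\"“”-()",
    "blue": ";:,.",
}

def punctuation_profile(text):
    """Count punctuation marks in `text` per color group."""
    lookup = {ch: group
              for group, chars in PUNCTUATION_GROUPS.items()
              for ch in chars}
    counts = Counter({group: 0 for group in PUNCTUATION_GROUPS})
    counts.update(lookup[ch] for ch in text if ch in lookup)
    return dict(counts)

sample = 'Who is there? "A friend," he said; then silence.'
print(punctuation_profile(sample))
```

Groups with no occurrences are kept with a count of zero so that profiles of different texts have the same keys.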
Document Level Cluster Analysis
Cluster analysis - groups documents into subgroups. These subgroups “are coherent internally, but clearly different from each other” (Manning, 2009)
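A minimal sketch of document-level clustering in Python, assuming raw term-frequency vectors, cosine distance and single-linkage agglomerative merging. These choices are illustrative and are not stated on the slide; ITMS relies on R libraries and may use different settings. The four documents are invented.

```python
import math
from collections import Counter
from itertools import combinations

def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0

def cluster(docs, k):
    """Single-linkage agglomerative clustering of `docs` into k groups.

    Returns a list of clusters, each a sorted list of document indices.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        # Pick the two clusters whose closest members are nearest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(cosine_distance(vectors[i], vectors[j])
                              for i in clusters[p[0]]
                              for j in clusters[p[1]]),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

docs = [
    "the knight rode to the tower",
    "the knight left the tower",
    "the lady sang a song",
    "the lady sang another song",
]
print(cluster(docs, 2))
```

Documents that share vocabulary end up in the same subgroup, which is the internal coherence the quotation describes.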
Document Level Topic Analysis
Text collections - “represented as random mixtures over latent topics, where each topic is characterized by a distribution over words” (Blei, 2003)
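The quoted description corresponds to the generative model of latent Dirichlet allocation. A compact statement is given below; the symbols (alpha, beta, theta, z, w) follow common usage in the topic modeling literature and do not appear on the slide.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic mixture of document } d \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d)
  && \text{topic assigned to the } n\text{-th word} \\
w_{d,n} \mid z_{d,n} &\sim \mathrm{Multinomial}(\beta_{z_{d,n}})
  && \text{word drawn from that topic's distribution}
\end{align*}
```

Here each row of beta is one topic's distribution over the vocabulary, and theta_d is the "random mixture" over topics for document d.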
Conclusion
1 There is a need for text mining tools designed for linguists and literary scholars
2 Interactive user-friendly applications bridge the gap between data mining and digital humanities
3 The Shiny framework can be incorporated into any digital corpus to exhibit, search or visualize written collections
ITMS
Browser and Smart Phone
Questions, comments
https://languagevariationsuite.wordpress.com/
References
Mohammad, Saif. 2013. From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales. In Proceedings of the ACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 2011, Portland, OR.
Moretti, Franco. 2005. Graphs, Maps, Trees: Abstract Models for a Literary History. R.R. Donnelley & Sons.
Oelke, Daniela, Dimitrios Kokkinakis and Mats Malm. 2012. Advanced Visual Analytics Methods for Literature Analysis. In Proceedings of the 6th EACL Workshop, 35-44.
Rydberg-Cox, Jeff. 2011. Social Networks and the Language of Greek Tragedy. Journal of the Chicago Colloquium on Digital Humanities and Computer Science 1(3): 1-11.
Thomas, James and Kristin Cook. 2005. Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Center.
Vuillemot, Romain, Tanya Clement, Catherine Plaisant and Amit Kumar. 2009. What’s Being Said Near “Martha”? Exploring Name Entities in Literary Text Collections. In Proceedings of the IEEE Symposium. Atlantic City, New Jersey. 107-114.
http://www.clipartbest.com/clipart-9i4A55xiE