tagpies: comparative visualization of textual data

12
TagPies: Comparative Visualization of Textual Data Stefan Jänicke 1 , Judith Blumenstein 2 , Michaela Rücker 2 , Dirk Zeckzer 1 and Gerik Scheuermann 1 1 Image and Signal Processing Group, Leipzig University, Leipzig, Germany 2 Faculty of History, Arts and Oriental Studies, Leipzig University, Leipzig, Germany {stjaenicke, zeckzer, scheuermann}@informatik.uni-leipzig.de, [email protected], [email protected] Keywords: Tag Clouds, Pie Charts, TagPies, Text Visualization, Text Comparison, Digital Humanities Abstract: A TagPie is a novel tag cloud layout that arranges the tags belonging to multiple data categories in a pie chart manner. Motivated from research in classical philology, TagPies were designed to support the comparative analysis of classical terminology. In this scenario, the data categories represent the co-occurrences of different searched keywords, so that the comparison of the contexts in which these keywords were used becomes pos- sible using TagPies. This paper illustrates the iterative development of TagPies, which aid as a distant reading view on a text corpus for humanities scholars. We outline various steps of our collaborative digital humanities project, and we emphasize the utility of the proposed design by outlining various usage scenarios representing current research questions in classical philology. 1 MOTIVATION Traditionally, humanities scholars read texts on pa- per in order to generate and verify hypotheses about precisely formulated research questions. As a re- sult of mass digitization, nowadays, the scholars have access to large digital libraries containing numerous texts. This on demand availability of texts changes the traditional workflows of the scholars in different ways. First, the retrieval of text passages gets eas- ier, usually, by querying a text corpus using a typi- cal keyword-based search. The drawback of this ap- proach is that the quality of results is usually not sat- isfying. Often, the humanities scholars receive too many results, which they cannot process individually. Consequently, it is impossible to generate useful hy- potheses. On the other hand, the precision can be low so that many found text passages are irrelevant to the given research question. Especially in that case, picking text passages related to the observed topic is a laborious task. Second, the access to vast tex- tual data brought forth new research methodologies in the humanities, introduced by Franco Moretti as dis- tant reading (Moretti, 2005). Before the digital age it was inconceivable to generate hypotheses about texts without explicitly reading them; Moretti presented re- search questions that were impossible to investigate with the traditional close reading technique. In our digital humanities projecteXChange, 1 the 1 http://exchange-projekt.de/ collaborating humanities scholars—six historians and classical philologists—wanted to explore medical concepts in classical texts, which required workflows that include distant as well as close readings. Work- ing with the project’s large text corpus, the humanities scholars are interested in the co-occurrences of med- ical terms. For instance, they look for terms describ- ing medical conditions, associated terms for symp- toms, body parts, etc., in order to explore what ancient writers knew about the medical concept. The mis- sion of the corresponding digital humanities project was investigating novel research questions in classi- cal philology—the comparison of medical concepts. For instance, a humanities scholar hypothesized that the terms morbus comitialis and morbus sacer like- wise were used to denote epilepsy (a discussion of this example can be found in Section 5.2). In order to support the comparative analysis of medical concepts in classical texts, we developed Tag- Pies in close collaboration to the humanities scholars of our project. This paper outlines the steps of the iter- ative development and the final TagPies layout algo- rithm that includes a tailored tag sorting mechanism and design features applied to visually separate shared and individual contexts of terms. To meet the needs of the humanities scholars, we embedded TagPies as a distant reading visualization into a visual interface that is linked to a close reading view, which enable the inspection of individual text passages in order to as- sess their relevancy to the observed medical concept.

Upload: others

Post on 30-Jan-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

TagPies: Comparative Visualization of Textual Data

Stefan Jänicke1, Judith Blumenstein2, Michaela Rücker2, Dirk Zeckzer1 and Gerik Scheuermann1

1Image and Signal Processing Group, Leipzig University, Leipzig, Germany2Faculty of History, Arts and Oriental Studies, Leipzig University, Leipzig, Germany

{stjaenicke, zeckzer, scheuermann}@informatik.uni-leipzig.de, [email protected], [email protected]

Keywords: Tag Clouds, Pie Charts, TagPies, Text Visualization, Text Comparison, Digital Humanities

Abstract: A TagPie is a novel tag cloud layout that arranges the tags belonging to multiple data categories in a pie chartmanner. Motivated from research in classical philology, TagPies were designed to support the comparativeanalysis of classical terminology. In this scenario, the data categories represent the co-occurrences of differentsearched keywords, so that the comparison of the contexts in which these keywords were used becomes pos-sible using TagPies. This paper illustrates the iterative development of TagPies, which aid as a distant readingview on a text corpus for humanities scholars. We outline various steps of our collaborative digital humanitiesproject, and we emphasize the utility of the proposed design by outlining various usage scenarios representingcurrent research questions in classical philology.

1 MOTIVATION

Traditionally, humanities scholars read texts on pa-per in order to generate and verify hypotheses aboutprecisely formulated research questions. As a re-sult of mass digitization, nowadays, the scholars haveaccess to large digital libraries containing numeroustexts. This on demand availability of texts changesthe traditional workflows of the scholars in differentways. First, the retrieval of text passages gets eas-ier, usually, by querying a text corpus using a typi-cal keyword-based search. The drawback of this ap-proach is that the quality of results is usually not sat-isfying. Often, the humanities scholars receive toomany results, which they cannot process individually.Consequently, it is impossible to generate useful hy-potheses. On the other hand, the precision can below so that many found text passages are irrelevant tothe given research question. Especially in that case,picking text passages related to the observed topicis a laborious task. Second, the access to vast tex-tual data brought forth new research methodologies inthe humanities, introduced by Franco Moretti as dis-tant reading (Moretti, 2005). Before the digital age itwas inconceivable to generate hypotheses about textswithout explicitly reading them; Moretti presented re-search questions that were impossible to investigatewith the traditional close reading technique.

In our digital humanities projecteXChange,1 the1http://exchange-projekt.de/

collaborating humanities scholars—six historians andclassical philologists—wanted to explore medicalconcepts in classical texts, which required workflowsthat include distant as well as close readings. Work-ing with the project’s large text corpus, the humanitiesscholars are interested in the co-occurrences of med-ical terms. For instance, they look for terms describ-ing medical conditions, associated terms for symp-toms, body parts, etc., in order to explore what ancientwriters knew about the medical concept. The mis-sion of the corresponding digital humanities projectwas investigating novel research questions in classi-cal philology—the comparison of medical concepts.For instance, a humanities scholar hypothesized thatthe terms morbus comitialis and morbus sacer like-wise were used to denote epilepsy (a discussion of thisexample can be found in Section 5.2).

In order to support the comparative analysis ofmedical concepts in classical texts, we developed Tag-Pies in close collaboration to the humanities scholarsof our project. This paper outlines the steps of the iter-ative development and the final TagPies layout algo-rithm that includes a tailored tag sorting mechanismand design features applied to visually separate sharedand individual contexts of terms. To meet the needsof the humanities scholars, we embedded TagPies asa distant reading visualization into a visual interfacethat is linked to a close reading view, which enable theinspection of individual text passages in order to as-sess their relevancy to the observed medical concept.

We emphasize the utility of this visual interface forthe humanities scholars by providing various usagescenarios. In a storytelling style, each scenario ex-emplifies how TagPies support verifying and generat-ing hypotheses concerning philological matters. Ad-ditionally, we report collaboration experiences gainedduring our digital humanities project. This includesthe iterative evaluation of TagPies with the humani-ties scholars, and limitations of our approach.

2 RELATED WORK

Widely used and perceived as being fun, tag cloudsare important components in the social web to visu-alize summaries of textual data. Many works presentlayout methods developed to consolidate the use andvalidity of tag clouds for specific purposes. Below,we outline general information about tag clouds, theiruse in digital humanities applications, and we take alook at various tag cloud layout approaches that sup-port the visualization of multiple data categories.

2.1 Tag Cloud Visualizations

The primary purpose of tag clouds is to present a vi-sual summary of textual data (Sinclair and Cardew-Hall, 2008). First introduced by Stanley Milgram’smental map of Paris (Milgram and Jodelet, 1976)in 1976, tag clouds later became popular in the so-cial web community. Although originally used fornon-specific information discovery, tag clouds canalso be used to support analytical tasks such as theexamination of text collections (Viegas and Watten-berg, 2008). Furthermore, tag clouds obtained wideacceptance as interfaces for navigation purposes ondatabases (Hearst and Rosner, 2008). Traditionally, atag cloud is a simple list of words placed on multi-ple lines, either ordered alphabetically or by the im-portance of a tag, which is encoded by variable fontsize (Murugesan, 2007). Portals such as ManyEyescan be used to create such kind of tag cloud visualiza-tions on demand (Viegas et al., 2007). A user studyon the utility of tag clouds revealed that the usual al-phabetic order is not obvious for the observer, but tagclouds are generally seen as a popular social com-ponent (Hearst and Rosner, 2008). Potentially, thiswas one of the reasons that later more sophisticatedtag cloud layout approaches were developed, whichrather emphasized aesthetics than meaningful order-ings. A representative technique is Wordle (Viegaset al., 2009), a popular web-based tool for visualiz-ing tag clouds used for a wide range of applications.Wordle produces compact aesthetic layouts with tags

in different colors and orientations, but both featuresdo not transfer any additional information. The tagcloud design presented in this paper is based on theWordle algorithm. It places the tags similarly usingan Archimedean spiral, but, additionally, we use thefeatures color and position to visually express the be-longing of individual tags to and their significance forvarious data categories.

2.2 Tag Clouds in Digital Humanities

Visualizations in general are widely applied in dig-ital humanities projects to explore cultural heritagedatasets (Jänicke et al., 2015). Tag clouds in par-ticular are frequently used to encode the number ofword occurrences within a selected section of a text,a whole document or an entire text corpus (Vuille-mot et al., 2009; Fankhauser et al., 2014). For ex-ample, the VarifocalReader (Koch et al., 2014) usestag clouds “to give a visually appealing overview ofa section of text,” which points out the importanceof aesthetics when designing visualizations for dig-ital humanities scholars. By applying significancemeasures, the visualization can be limited to display-ing only characteristic tags, e.g., the most signifi-cant tags of a selected time period (Eisenstein et al.,2014) or the most frequently mentioned commodotiesin a text corpus after filtering (Hinrichs et al., 2015).Topic modeling approaches gain more and more ac-ceptance in digital humanities applications. Here, tagclouds help to illustrate the most descriptive tags oftopics (Binder and Jennings, 2014; Montague et al.,2015). When analyzing the evolution of topics overtime (Cui et al., 2011; Cui et al., 2014), tag cloudsserve to explore the temporal change of a topic’s ter-minology. In contrast, some tag cloud approaches il-lustrate trends in a text corpus. Parallel Tag Cloudsgenerate alphabetically ordered tag lists as columnsfor a number of time slices and highlight the temporalevolution of a tag placed in various columns on mouseinteraction (Collins et al., 2009). SparkClouds attacha graph showing the tag’s evolution over time (Leeet al., 2010). Hinrichs links tag clouds to a classifi-cation schema in the form of a tree structure to helphumanities scholars getting access to texts of a specu-lative fiction anthology corpus (Hinrichs et al., 2016).The tag cloud approach presented in this paper wasdeveloped in order to support humanities scholar incomparing the co-occurrences of different classicalterms to each other. That tag clouds are appropriatevisualization to illustrate a word’s co-occurrences isalready shown by Beaven (Beavan, 2008). A basicvisualization that contrasts the co-occurrences of twowords is outlined by Beaven (Beavan, 2011).

2.3 Visualizing Categories in TagClouds

Gleicher gives an overview of comparative visual-ization techniques for different scenarios (Gleicheret al., 2011). A radial comparative overview of top-ics whose words are represented by dots is illus-trated by Havre (Havre et al., 2001). For the com-parative visualization of tags, various approaches en-deavor to place related tags close to each other in vi-sual groups, in the following called data categories.Thematically clustered tag clouds or semantic tagclouds support the detection of tags belonging to acertain topic (Lohmann et al., 2009). As shown bySchrammel et al. (Schrammel et al., 2009), these tagcloud designs were often preferred by users for spe-cific search tasks as they raise the attention towardssmall tags compared to other designs. For traditionaltag lists, semantically related tags of a data categorycan be placed subsequently (Schrammel and Tsche-ligi, 2014), for more sophisticated layouts the usageof force directed approaches is quite popular. Here,semantically close terms attract each other (Cui et al.,2010; Wu et al., 2011; Liu et al., 2014). GMap is aforce directed approach that delivers a segmentationof a graph into color-coded neighborhoods (Gansneret al., 2010). Other methods try to preserve seman-tic relationships in tag clouds by placing the relatedtags of each data category in non-overlapping areasindividually. Afterwards, multiple tag clouds are vi-sually combined to a single one. The Star Forestmethod (Barth et al., 2014) initially calculates the lay-out for the tags of each data category independently.Then, it uses a force directed method to pack the vari-ous clouds to gain a unified tag cloud. In ProjClouds,a tag cloud layout for each cluster of a documentcollection is computed within its assigned polygo-nal space in the plane (Paulovich et al., 2012). Allabove mentioned methods pack multiple tag cloudstogether, thus, they can be seen as sophisticated smallmultiples approaches since the tag clouds for all datacategories are computed independent of each other.As a consequence, large in-between whitespaces oc-cur when composing these clouds to a visual entity.Words Storms is a rather traditional small multiplesapproach computing a tag cloud for each documentof a corpus to support the visual comparison of doc-uments (Castellà and Sutton, 2014). Here, a signif-icant tag for multiple documents appearing in mul-tiple clouds is placed at similar locations with samecolor and orientation. RadCloud visualizes tags be-longing to various data categories in a shared ellip-tical area (Burch et al., 2014), but it also suffersfrom whitespaces. In Compare Clouds, tags of me-

dia frames are comparatively visualized in a singletag cloud (Diakopoulos et al., 2015), but the design islimited to visualizing two data categories. TagSpheresarrange tags hierarchically on several circular discs totransmit the notion of distance in tag clouds using adifferent color for every hierarchy level (Jänicke andScheuermann, 2016). Furthermore, the TagSphereslayout can be used to visualize tree structures (Jänickeand Scheuermann, 2017).

3 DIGITAL HUMANITIESBACKGROUND

This research bears on research in classical philology,a field of the humanities that is concerned with an-alyzing Latin and ancient Greek texts written in theclassical period. In the following, we outline projectgoals and collaboration aspects at project start that ledto designing the proposed TagPies layout.

Project Idea. The purpose of the digital human-ities project eXChange was the development of newworkflows in order to analyze and to compare medi-cal concepts in classic texts. Due to the digitizationera, humanities scholars are now able to browse digi-tal libraries using a simple keyword search as a stan-dard technique to discover related text passages. Thecorpus of our project combines a multitude of existingsources such as the Perseus Digital Library2 and theBibliotheca Teubneriana Latina.3 Working with thatcorpus, the humanities scholars faced the problem ofretrieving too many results, e.g., a search for morbus(disease) returned 1,558 text passages. Reading alltext passages and combining the gained insights, es-pecially, comparing different result sets was not pos-sible. In close collaboration to humanities scholarsusing the project’s text corpus, our mission in theproject was designing an interface, consistent of adistant reading visualization—TagPies—and a closereading view, that supports the dynamic explorationof the results of various keyword-based search queriesin order to facilitate the comparison of various medi-cal concepts.

Project Start. To ensure designing a valuable,powerful tool that supports investigating the posedresearch questions, we adopted several suggestionsmade by Munzner (Munzner, 2009) for the imple-mentation of our project. Also, we considered collab-oration experiences (Jänicke et al., 2016b) reported

2Perseus Digital Library, Ed. Gregory R. Crane. TuftsUniversity. http://www.perseus.tufts.edu

3Bibliotheca Teubneriana Latina. Walter de Gruyter.http://www.degruyter.com/db/btl

by visualization scholars involved in digital human-ities projects to avoid typical pitfalls when workingtogether with humanities scholars. We furthermoreworked through related works in the digital humani-ties providing valuable suggestions and guidelines fordesigning interfaces for humanities scholars (Gibbsand Owens, 2012; Jänicke, 2016). To avoid makingassumptions for the design of a visual interface thatis hard to comprehend and does not support the con-cerned philological research questions, we initiallydiscussed the needs of the humanities scholars andtheir faced challenges in the targeted domain in sev-eral meetings. The humanities scholars explainedtheir usual workflows, for example, how they use on-line digital libraries for research purposes. On theother hand, we presented and discussed related textvisualization techniques in order to convey an impres-sion of the capabilities and challenges within our re-search field. This get together turned out to be impor-tant to understand each others mindsets, and to definea set of workflows to compare medical concepts thatthe visualization shall support.

Requirement Analysis. Having a large text cor-pus and the project idea at hand, we began without aclear visualization idea. We needed several interdis-ciplinary meetings at the beginning of the project tospecify the research goals of the humanities scholarsand their requirements concerning the visual interfaceto be implemented. Initially, the humanities schol-ars wanted to comparatively analyze classic medicalterminology. They explained, how they would ap-proach this research task using common workflows:by reading the text passages that contain specific key-words. Thus, for a comparative analysis of classicmedical terminology the contexts in which the key-word terms occur need to be compared to each other.In a first workshop, we presented an overview of textvisualization techniques, and we discussed their po-tential to support the given research task. It turned outthat some scholars were familiar with the idea of tagclouds, and basic bar charts were also seen as an ap-propriate method for comparing word frequencies. Assome of the humanities scholars never worked withvisualizations before, and most of them were not usedto work with complex tools, it was necessary to de-velop a system, which is easy to understand. De-spite known theoretical problems (Viegas and Wat-tenberg, 2008), designing a tag cloud visualizationwas the means of choice as they are intuitive, widelyused metaphors to display summaries of textual data.Moreover, tag clouds have been successfully appliedin digital humanities applications before to analyzethe context of words (Beavan, 2008; Beavan, 2011).In the following meeting, we discussed a list of re-

quirements of a tag cloud layout to be valuable forthe collaborating humanities scholars. The tag cloudshould (1) support the analysis of the context of a sin-gle keyword and the comparison of the contexts ofvarious keywords, (2) communicate the relevancy ofa tag to the keyword it co-occurs with and its rele-vancy concerning all keywords, and (3) to reflect theproportion of tags from different categories. In a sec-ond workshop, the humanities scholars worked withseveral existing tag cloud visualizations we adaptedto the project corpus. For each queried keyword, wesummarized the frequencies of the co-occurrences.First, we provided small multiples of Wordle tagclouds (Viegas et al., 2009), which were seen as aes-thetic and a good solution to analyze the context ofa single keyword, but a comparative analysis was noteasy as tags co-occurring with various keywords werehard to find. Working with RadCloud (Burch et al.,2014) yielded exact opposite results: the discoveryof tags co-occurring with various keywords was easy,but the analysis of a single keyword’s context wascrucial. Here, the approach of visualizing word fre-quencies in two concurring manners (bar, tag’s fontsize) was confusing for the humanities scholars. Par-allel Tag Clouds (Collins et al., 2009) were unexpect-edly not favored, although their basic design is simi-lar to word lists, with which humanities scholars areused to work. The major issues were the heights ofthe tag clouds, which forced the humanities scholarsto vertically scroll many times during the explorationprocess. Also, the required interaction to gain addi-tional information was seen problematic. The human-ities scholars stated they want to see several informa-tion “at the first glance.” When developing TagPies,we took the feedback during the second workshopas well as the importance of aesthetics—often men-tioned by the humanities scholars—into account. Aspostulated by Oelke and Gurevych, we designed Tag-Pies based upon the above listed requirements derivedfrom the needs of the targeted user group (Oelke andGurevych, 2014).

4 TAGPIES LAYOUT

Given are n data categories d1, . . . ,dn (n search re-sult sets), each containing the co-occurrences for thequeried search terms T 1, . . . ,T n in the form of tags.The general TagPies idea is to place the tags belong-ing to a data category in a specific circular sector,forcing vocabulary shared by several data categoriesto be placed in the center, and tags unique to a sin-gle category to be placed in the outer regions of thetag cloud. With the resultant tag cloud subdivision,

the final TagPie layout is visually comparable to apie chart, which helps the observer to compare thetag sets of various data categories and to assess theirrelative proportions. According to the actual propor-tions (the number of occurrences of the main terms inthe database) and a maximum number of tags to bedisplayed (for the examples in this paper we chose amaximum of 500 tags), we select the top co-occurringterms (tags) for each data category. If the relative pro-portion of a data category is too small, we leave aminimum of five tags to be displayed.

For each data category di, we need to positionthe category’s main tag T i (the search term) and thetags t i

1, . . . , ti|di| (di = {t i

k|1≤ k ≤ |di|}), which are co-

occurrences of T i. F(T i) encodes the number of oc-currences of T i in the database, F(t i

k) denotes howoften t i

k co-occurs with T i. The relevancy R(t ik) of a

tag t ik for data category i is defined by

R(t ik) =

F(t ik)

F(T i).

In the following, we distinguish between sharedtags and unique tags. A shared tag t i

s has multipleinstances in the TagPie, which are placed in differentsectors. These instances are defined as

I(t is) = {t i

s}∪{t js |1≤ j ≤ n, i 6= j, t i

s = t js },

and |I(t is)| denotes the number of instances. A unique

tag t iu occurs only once in the TagPie as a tag of the

i-th data category, so that I(t iu) = {t i

u} and |I(t iu)|= 1.

4.1 Layout Algorithm

In preparation, we order the data categories d1, . . . ,dn

according to their similarity aiming to place as manysimilar tags as possible close to each other. The simi-larity s(di,d j) is defined by the number of shared tagsin proportion to the number of unique tags betweentwo data categories di and d j as the Jaccard index

s(di,d j) =|di∩d j||di∪d j|

.

Initially, we put the two most similar data categoriesnext to each other in a double-ended queue. Then,we iteratively determine the data category di with thehighest similarity to either the first (then, we insert di

at the start of the queue) or the last element in thequeue (then, we insert di at the end of the queue).With the resultant ordering at hand, we estimate theamount of space required to place all tags of a datacategory. This is achieved by mapping the tags inthe corresponding font sizes dependent on their fre-quencies, and by adding up the bounding boxes for

Figure 1: Defining circular tag cloud sectors.

all tags. So, we obtain an approximate space require-ment for each data category. Based on that proportion,we define the angles ϕ1, . . . ,ϕn of circular sectors ford1, . . . ,dn that subdivide a Cartesian coordinate sys-tem at it’s center (0,0) as shown in Figure 1.

Main Tag Placement. At first, we position themain tags T 1, . . . ,T n in the centers of their corre-sponding TagPie sectors. To define these centers, weneed to estimate the radius r of the TagPie before ac-tually computing it’s layout. Therefore, we compute aWordle tag cloud without sectors containing all tags.Using the obtained radius of this tag cloud as expectedradius r for the TagPie, we can place the main tag T i

of a data category di in the TagPie’s correspondingsector at position p(T i) = (xi,yi) as illustrated in Fig-ure 1. Starting with the orientation s, p(T i) is definedby

xi = γ · r · cos(π+i−1

∑k=0

ϕk +ϕi

2)

and

yi = γ · r · sin(π+i−1

∑k=0

ϕk +ϕi

2).

With γ = 0.5, we position T i at the center of the sec-tor. Especially when several small sectors are adja-cent, the corresponding main tags can occlude. Toavoid these occlusions, we automatically decrease orincrease γ in such cases.

Tag Sorting. The idea of the sorting method is toplace tags with a high relevancy to all data categoriesin the center of the TagPie. The farther away from thecenter a tag is placed, the more relevant it is to thecorresponding data category. Thus, unique tags shallbe placed in the outer regions of TagPie sectors. Toobtain this ordering, one of the following conditionsneed to be fulfilled for arbitrary adjacent tags X and Yin a correctly sorted tag list {. . . ,X ,Y, . . .}:

C1: |I(X)|> |I(Y )|,C2: |I(X)|= |I(Y )| and U(X)<U(Y ), or

C3: |I(X)| = |I(Y )|, U(X) = U(Y ) and F(X) ≥F(Y ).

With C1, shared tags belonging to all data categoriesmove to the beginning of the tag list, and unique tagsmove to the end. In case of even numbers of instances,tags with low uniqueness values are treated beforetags with high uniqueness values (C2). The unique-ness of a tag X is defined by the quotient of the twomost frequent occurrence totals of X among all datacategories as

U(X) =

maxX1∈I(X)

R(X1)

maxX2∈I(X)\X1

R(X2).

The more characteristic a tag X is for a certain datacategory compared to the other data categories shar-ing X , the higher gets U(X). In case of even numbersof instances and even uniqueness values, C3 ensuresthat more frequent tags are processed earlier than lessfrequent ones. A final step slightly reorders the tagsaccording to the proportions of the data categories inthe TagPie. As (unique) tags of small data categoriesare usually less frequent than (unique) tags belongingto larger data categories, they are placed at the end ofthe tag list after sorting according to the above men-tioned conditions. In order to ensure that all TagPiesectors are uniformly filled with tags, this slight re-ordering guarantees that tags belonging to small datacategories are treated earlier during the layout algo-rithm.

With the final tag ordering, we iteratively positionall tags following an Archimedean spiral originatingfrom the tag cloud center at position (0,0). A tag isplaced if the determined position on the spiral lies inthe sector that is assigned to the corresponding datacategory, and if the tag does not occlude other tags.Otherwise, the tag will be placed in following turnsof the spiral farther away from (0,0).

4.2 Design

To avoid whitespaces, a problem addressed in (Seifertet al., 2008), the above outlined layout algorithm isbased on the Wordle algorithm, which permits over-lapping tag bounding boxes if the letters do not oc-clude. Thus, uniform, aesthetic tag clouds capableof visualizing much information compactly inside asmall region are obtained.

Tag design. We use several well-established de-sign features for the TagPie layout. Evaluated as be-ing the most powerful property in (Bateman et al.,2008), we use font size to encode the number ofoccurrences of each tag. The visualization of maintags, which are placed in the center of their assignedsectors, supersedes an additional legend, and further-more it serves the purpose of accentuating the belong-ing of related tags to their category. Main tags are

salient due to bold font style and underlinings. Statedin (Waldner et al., 2013), users perceive rotated tagsas “unstructured, unattractive, and hardly readable.”Therefore, we do not rotate tags to keep the layouteasily readable to provide an interface that is benefi-cial for the collaborating humanities scholars. Alsosuggested in (Waldner et al., 2013), color is the bestchoice for distinguishing categories. Hence, we usequalitative color maps to assign distinctive colors tod1, . . . ,dn. For this purpose, we use those qualita-tive color maps provided by ColorBrewer (Harrowerand Brewer, 2003) that contain solely saturated col-ors. Here, we consider not to assign red and greenhues as well as colors with similar hues to data cate-gories of adjacent TagPie sectors.

To visually separate two concurring tag groupingsin TagPies, we considered the Gestalt theory (Ware,2013). On the one hand, the differentiation betweenshared and unique tags was necessary for explorationpurposes, on the other hand, all tags that belong toa certain data category shall be visual unity. To fa-cilitate perceiving the former grouping, shared tagsreceive a black color while unique tags retain the cor-responding data category’s color, thus, implementingthe Gestalt principle of similarity. In order to achievethat the now differently colored tags of a data categorydi form a visual unity, we applied the Gestalt principleof enclosure by adding a background shape—coloredin a less saturated version of the color assigned to di—that encloses all tags of di.

Computing Background Shapes. In order to de-termine background shapes for the data categories, wecompute a Delaunay triangulation like illustrated inFigure 2. Iteratively, we insert the centroids of tagbounding boxes, and finally, we receive a triangula-tion of the tag distribution that contains three differenttriangle types. Either a triangle connects three tagsbelonging to the same data category, tags of two ortags of three data categories. Triangles of the formertype are not required for computing background shape

Figure 2: Delaunay triangulation to determine back-grounds.

borders as they lie completely inside the correspond-ing TagPie sector. When two tags of a triangle belongto the same data category, we interpolate a line seg-ment between the bounding boxes as shown in Fig-ure 2a. For triangles containing dummy nodes of thesuper triangle, we do not interpolate line segments,instead, we link the exterior vertices of the boundingboxes (see Figure 2b). When all three tags of a tri-angle belong to different data categories, we generatethree line segments each originating from the trianglecentroid as can be seen in Figure 2c. Finally, the bor-der of a data category di—drawn as a Bezier spline—is composed of the line segments of all triangles thateither contain one or two tags of di.

5 USAGE SCENARIOS

TagPies were designed during the digital humanitiesproject eXChange to support the comparative analy-sis of medical concepts in classical texts. Figure 3shows a screenshot of the web-based user interfacethe humanities scholars work with. TagPies are em-bedded as a distant reading visualization that con-trasts the co-occurrences of various keywords. Thescholar can configure TagPies by choosing the num-ber of tags to be shown, and by defining the maxi-mum distance between a searched keyword term andconsidered co-occurrences. After retrieving the re-sults, stopwords are removed according to stopwordlists provided by the humanities scholars. The re-maining co-occurrences are visualized in the TagPie.To facilitate navigation and exploration abilities, weenhanced TagPies by basic means of interaction ac-cording to the humanities scholars’ wishes. Of par-ticular interest was highlighting spelling variants ofwords, which are provided by the backend of the re-search platform. With mouse interaction, we enable

Figure 3: Screenshot of the TagPies user interface. Theco-occurrence medicus (doctor) of aeger (sick) is selected.

the scholar to detect related tags more quickly. Hov-ering a tag highlights the remaining shared tags andspelling variants. Additionally, all related tags arelisted in a tooltip (shown on mouse click) that illus-trates the distribution using a bar chart. By clickinga tag, a close reading view lists previews of text pas-sages containing the selected co-occurrence and thecorresponding keyword. The humanities scholars de-sired this connection to the underlying texts in orderto quickly inspect interesting word relationships.

In the following, we emphasize the benefit of Tag-Pies for investigating novel research questions in thehumanities that demand distant reading arguments.We illustrate three usage scenarios provided by col-laborating humanities scholars, for whom TagPiesturned out to be heuristically valuable for philologi-cal matters.

5.1 Comparing gibbus and gibbosus

Looking for the term “humpy” in Latin dictionar-ies, the synonyms gibbus and gibbosus are found.The first example illustrates how a humanities scholarused TagPies to verify this synonymity. To do so,she constructed two keyword-based search queries in-cluding all declensions of the two terms:

gibbus:gibbus|gibbum|gibba|gibbi|gibbo|gibbe|gibbae|gibbam|gibbas|gibbis|gibbos|gibbarum|gibborum

gibbosus:gibbosus|gibbosum|gibbosam|gibbosas|gibbosae|gibbosis|gibbosos|gibbosa|gibbosi|gibboso|gibbose|gibbosarum|gibbosorum

The resulting TagPie (Figure 4) provides an overviewof the co-occurrences of both terms. It contains198 text passages for gibbus and 88 text passagesfor gibbosus. The group of black colored, sharedtags in the center of the TagPie illustrates a syn-onymous usage of the terms concerning the (human)body, e.g., as they both co-occur with body parts likepede (foot) or manu (hand), or they are used in thecontext of diseases like eye diseases (albuginem, lip-pus) or broken bones (fracto). More physical terms,e.g., dorso (back), caput (head) and cerebri (brain),co-occur only with gibbus. In contrast, numerousterms related to the field of Christian morality co-occur with gibbosus, e.g., cupiditatis (lust), avarum(stingy), modestia (moderation) and glorietur (boast).In combination with close reading several text pas-sages, the humanities scholar hypothesized that gib-bus was rather used to describe physical features, and

Figure 4: Comparing gibbus and gibbosus.

gibbosus was mostly used in moral contexts. The hu-manities scholar stated that dictionaries like the The-saurus Linguae Latinae communicate that both termswere used in the actual physical meaning of “humpy”as well as in a figurative sense. But, as opposed toTagPies, dictionaries do not report about frequenciesand a tendency in which contexts either of the termswas used. In that scenario, using TagPies was ben-eficial as it surpasses the reconstructive character ofdictionaries by referring back to the actual usage ofterms in text corpora.

5.2 Comparing morbus comitialis andmorbus sacer

The second example narrates an unexpected insightfor a humanities scholar who wanted to verify her hy-pothesis that the terms morbus comitialis and morbussacer were both similarly used to describe the diseaseepilepsy. Two keyword-based search queries includedall possible cases:

morbus comitialis:morbus&comitialis|morbi&comitialis|morbo&comitiali|morbum&comitialem

morbus sacer:morbus&sacer|morbi&sacri|morbo&sacro|morbum&sacrum

The TagPie for the posed queries (Figure 5) supportedthe scholar in examining three research questions:

• What is the semantic relationship betweenboth terms? In the center of the cloud the sharedtags graeci – appellant – ιεραν – νοσον, form-ing the phrase “the Greeks – call it – holy – dis-

Figure 5: Comparing morbus comitialis and morbus sacer.

ease,” can be seen. This relationship indicates thatboth terms were actually used as synonyms forepilepsy in the classical period.

• Were the terms used to describe the dis-ease in it’s medical or metaphorical meaning?Whereas morbus sacer, literally translated as“holy disease,” was rather used as an euphemisticpseudonym for epilepsy, the co-occurrences ofmorbus comitialis instead hypothesize a medi-cal disease, e.g., shown by remedia (medicine),medici (doctor), and medentes (curing).

• How was the overall knowledge about the dis-ease at that time? The co-occurrences for bothterms, e.g., caput (head), insania (insanity), animi(mind), corporis (body), nervorum (nerves), mor-tis (death), and abit (died), indicate that epilepsywas seen as a potentially lethal insanity with phys-ical symptoms.

In order to generate these hypotheses, the scholaroften used the close reading functionality. When ex-amining the last research question, the scholar dis-covered that Maurus Servius Honoratus, a populargrammarian in the 5th century, mistakenly conceivedepilepsy as a feverish disease, shown by co-occurringterms of morbus sacer like ignis (fire) or carbunculi(burning ulcer). As, in the first century, Pliny the El-der already ascertained that fever is not a symptom ofepilepsy, the humanities scholar denoted this discov-ery as not intutive, so that it would have never beenfound using traditional methods.

5.3 Comparing τεχνη, υγιεια and νοσος

The third example investigates the meaning of artin antiquity, a concept hard to describe nowadays.The idea at that time was that art can be taught asit includes knowledge. Therefore, art is related tomany fields in ancient Greek texts (Allen, 1999). Ex-pectedly, the number of text passages for the ancientGreek term for art, τεχνη, is enormous (6,216). Thefields of art are visible in the corresponding TagPieshown in Figure 6 as co-occurrences: φυσικη (natu-ral science), μαντικη (art of prophecy), γραμματικη(grammar), ανθρωπινη (human art), διαλεκτικη (di-alectic), ρητορικη (rhetoric), ποιητικη (poetics), ια-τρικη (medicine), μαγικη (magic) etc.

The analysis of the general term τεχνη comparedto the more specific ancient Greek terms for health(υγιεια) and disease (νοσος) composing the art ofphysicians—medicine—was of particular interest forone of the humanities scholars. So, she added twofurther sections to the TagPie representing 1,013 textpassages for υγιεια and 2,092 text passages for νοσος.In contrast to the diverse terms surrounding τεχνη, theco-occurrences here are closely related to their mainterms. Both terms co-occur with parts of the body,e.g., σωμα (body) and ψυχη (breath, life). Further-more, υγιεια is related to positive terms like καλλος(beauty), ισχυς (strength) or ηδονη (enjoyment),whereas νοσος occurs together with rather negativeterms like λοιμικη (plague), ασθενεια (weakness),γηρας (senility) or θανατος (death). Also, one of theknown reasons of diseases, poverty (πενια), co-occurs71 times.

This scenario illustrates the capability of TagPiesfor non-specific information discovery in a distantreading manner; around 9,000 text passages are sum-marized and dynamically accessible by clicking co-occurring terms. This way, new hypotheses can begenerated and verified—impossible using only tradi-tional close reading means. For example, the human-ities scholar discovered a frequent usage of πλουτος(wealth) in connection with υγιεια (52 times). Look-ing at the references, five text passages in the biog-raphy of Zeno of Elea (Vitae philosophorum, writtenby Diogenes Laertius) are listed among others. In thistext, various things are denoted as good or bad. Zenoof Elea categorizes neither health nor wealth as good,since both terms can be used also in a negative con-text. This example reveals another aspect of υγιεια ina philosophical rather than a medical context. The hu-manities scholar expected a correlation between med-ical art and wealth as a consequence of the medicalprofession after the 5th century BC in the context ofτεχνη, but πλουτος does not co-occur significantly.

Figure 6: Comparing τεχνη, υγιεια and νοσος.

6 DISCUSSION

The proposed tag cloud layout TagPies was designedto support answering a novel type of research questionin classical philology. Some aspects of the collabora-tive work are outlined below.

Evaluation. When developing TagPies and theuser interface for the eXChange project, we closelycollaborated with six humanities scholars—threepostdoctoral reseachers and three PhD students—whoiteratively evaluated current prototypes. At first, weprovided an interface consistent of a tag cloud show-ing the co-occurrences of a single keyword search anda basic close reading view in order to assess if themethodology can work for the targeted comparativeapproach. After positive feedback, we designed a firstcircular layout according to the approach outlined inSection 4. In that version, coloring the tags in depen-dency on their category was one of the few designcriteria. Tough the humanities scholars were keenworking with the proposed visualization, they hadproblems separating unique from shared terminology.In the following, we prepared and discussed variousdesign variants, e.g., using different font styles oradding bar glyphs to shared tags, without finding afavorite solution. This situation pushed us to inves-tigate further on the perception of objects as visualgroups. Finally, following the principles of Gestalttheory yielded the agreed-upon TagPies design pre-sented in this paper. In addition to design features,which also included the selection of an appropriatefont family, the sorting method changed gradually,so that tags with a particular relevance to an individ-ual data category move to the outer regions of a Tag-

Pie, and shared tags with a similar relevance to mul-tiple categories are positioned in the center. All inall, the collaborating humanities scholars of the eX-Change project evaluated TagPies as comprehensibleand aesthetic, especially the pie chart style was per-ceived as a suitable metaphor. In addition, we re-ceived positive feedback from scholars using TagPiesto comparatively visualize textual data for investigat-ing research questions in other domains. In that con-text, one scholar rated TagPies as a visualization witha “broad relevance to the entire field of digital human-ities.”

Limitations. Our main objective was generatingaesthetic, uniformly looking tag clouds that supportinvestigating the given research tasks. To gain unifor-mity, we start the Archimedean spiral to determine alltag positions at the tag cloud origin. Especially, if aTagPie consists of many data categories, an adjacentplacement of shared tags cannot be guaranteed. Then,highlighting shared tags placed far apart requires us-ing the provided interaction functionality. We ex-perimented with moving the spiral origin to alreadyplaced instances of shared tags or to borders betweenTagPie sectors that share tags, but these approachesdestroyed the intended unity. Sometimes, humanitiesscholars are interested in rare cases. TagPies aim tovisualize the most significant co-occurrences of thegiven search terms. The more occurrences of a searchterm exist, the more co-occurrences need to be dis-played. Then, rare but potentially interesting casesmay be not shown due to the limited number of tagspositioned in a TagPie. Humanities scholars usuallycompare a limited number of data categories (up tofive). In order to assess the scalability of our ap-proach, we tried examples with more data categories.Then, pie sectors become very small, so that tags arehard to position. As a consequence, a sector mightdecompose into several components, which destroysthe intended uniformity. Furthermore, effective, qual-itative color maps are hard to define for a large num-ber of data categories. Even though the gained re-sults were satisfactory, TagPies produce best layoutsfor few data categories.

7 CONCLUSION

For the humanities, the digital age brought changesto the scholars’ research workflows. The ability toquery digital libraries in order to receive text passagescontaining specific keywords on demand quickens hy-potheses generation, but often, vast numbers of re-sults are hard to process as text passages need to bechecked individually. Tag clouds can be used to facili-

tate the access to search results by aggregating the co-occurrences of a searched keyword, so that frequentcollacates get salient and the context that defines themeaning of a keyword gets visible. TagPies extendthis idea by arranging the co-occurrences of differentkeyword searches in a pie chart manner, so that thecontexts in which different keywords occur can be an-alyzed and compared to each other. This comparativeanalysis was the desired capability of TagPies in thecorresponding digital humanities project.

During the development, we closely collaboratedwith humanities scholars, who state that the resul-tant visual interface, consistent of TagPies as a dis-tant reading view for the results of several keywordsearches and a close reading view for the text pas-sages, is a valuable analysis instrument that serves anovel type of research interest that requires distantreading arguments—the comparison of concepts inclassic texts—and provokes new research questions.Furthermore, the humanities scholars mentioned thatthey have a much more intuitive and dynamic accessto search results when using TagPies in comparison toworking with traditional result lists.

Besides their application to compare the co-occurrences of words, TagPies are also used to sup-port investigating other types of research questionsin the eXChange project, e.g., for exploring con-cept search results of classical terminology (Cheemaet al., 2016a), or to facilitate the close reading oftexts (Cheema et al., 2016b). Furthermore, TagPiesare embedded in the Corpus Explorer 4 to support cor-pus linguists in analyzing political texts. In order toenable a wide applicability, we designed TagPies theway that it can easily be adapted to textual data ofany domain. Representative examples are outlined byJänicke (Jänicke et al., 2016a), and some can be foundon the TagPies homepage.5

ACKNOWLEDGEMENTS

The authors thank all humanities scholars involved inthe TagPies design process. We are furthermore in-debted to Thomas Efer for maintaining the eXChangeproject backend, and for providing interfaces requiredto generate TagPies. This research was funded by theGerman Federal Ministry of Education and Research.

4Corpus Explorer: https://goo.gl/gKWJSX5http://tagpies.vizcovery.org/

REFERENCES

Allen, J. (1999). s.v. Kunst. Der Neue Pauly (DNP), Bd.6:Sp. 915–919.

Barth, L., Kobourov, S., and Pupyrev, S. (2014). Experi-mental Comparison of Semantic Word Clouds. In Ex-perimental Algorithms, volume 8504 of Lecture Notesin Computer Science, pages 247–258. Springer Inter-national Publishing.

Bateman, S., Gutwin, C., and Nacenta, M. (2008). SeeingThings in the Clouds: The Effect of Visual Featureson Tag Cloud Selections. In Proceedings of the Nine-teenth ACM Conference on Hypertext and Hyperme-dia, HT ’08, pages 193–202. ACM.

Beavan, D. (2008). Glimpses though the clouds: collocatesin a new light. In Proceedings of the Digital Humani-ties 2008.

Beavan, D. (2011). ComPair: Compare and Visualise theUsage of Language. In Proceedings of the Digital Hu-manities 2011.

Binder, J. M. and Jennings, C. (2014). Visibility and mean-ing in topic models and 18th-century subject indexes.Literary and Linguistic Computing, 29(3):405–411.

Burch, M., Lohmann, S., Beck, F., Rodriguez, N., Di Sil-vestro, L., and Weiskopf, D. (2014). RadCloud: Vi-sualizing Multiple Texts with Merged Word Clouds.In Information Visualisation (IV), 2014 18th Interna-tional Conference on, pages 108–113.

Castellà, Q. and Sutton, C. (2014). Word Storms: Multi-ples of Word Clouds for Visual Comparison of Docu-ments. In Proceedings of the 23rd International Con-ference on World Wide Web, WWW ’14, pages 665–676. ACM.

Cheema, M. F., Jänicke, S., Blumenstein, J., and Scheuer-mann, G. (2016a). A Directed Concept Search En-vironment to Visually Explore Texts Related to User-defined Concept Models. In Proceedings of the 11thJoint Conference on Computer Vision, Imaging andComputer Graphics Theory and Applications, pages72–83.

Cheema, M. F., Jänicke, S., and Scheuermann, G. (2016b).AnnotateVis: Combining Traditional Close Readingwith Visual Text Analysis. In Workshop on Visual-ization for the Digital Humanities, IEEE VIS 2016,Baltimore, Maryland, USA.

Collins, C., Viegas, F., and Wattenberg, M. (2009). Par-allel Tag Clouds to explore and analyze faceted textcorpora. In Visual Analytics Science and Technology,2009. VAST 2009. IEEE Symposium on, pages 91–98.

Cui, W., Liu, S., Tan, L., Shi, C., Song, Y., Gao, Z., Qu,H., and Tong, X. (2011). TextFlow: Towards BetterUnderstanding of Evolving Topics in Text. Visualiza-tion and Computer Graphics, IEEE Transactions on,17(12):2412–2421.

Cui, W., Liu, S., Wu, Z., and Wei, H. (2014). How Hier-archical Topics Evolve in Large Text Corpora. Visu-alization and Computer Graphics, IEEE Transactionson, 20(12):2281–2290.

Cui, W., Wu, Y., Liu, S., Wei, F., Zhou, M., and Qu, H.(2010). Context preserving dynamic word cloud vi-

sualization. In Pacific Visualization Symposium (Paci-ficVis), 2010 IEEE, pages 121–128.

Diakopoulos, N., Elgesem, D., Salway, A., Zhang, A., andHofland, K. (2015). Compare Clouds: VisualizingText Corpora to Compare Media Frames. In Proc. ofIUI Workshop on Visual Text Analytics.

Eisenstein, J., Sun, I., and Klein, L. F. (2014). Ex-ploratory Thematic Analysis for Historical NewspaperArchives. In Proceedings of the Digital Humanities2014.

Fankhauser, P., Kermes, H., and Teich, E. (2014). Combin-ing Macro- and Microanalysis for Exploring the Con-strual of Scientific Disciplinarity. In Proceedings ofthe Digital Humanities 2014.

Gansner, E. R., Hu, Y., and Kobourov, S. (2010). Gmap: Vi-sualizing graphs and clusters as maps. In 2010 IEEEPacific Visualization Symposium (PacificVis), pages201–208.

Gibbs, F. and Owens, T. (2012). Building Better Digital Hu-manities Tools: Toward broader audiences and user-centered designs. Digital Humanities Quarterly, 6(2).

Gleicher, M., Albers, D., Walker, R., Jusufi, I., Hansen,C. D., and Roberts, J. C. (2011). Visual Comparisonfor Information Visualization. Information Visualiza-tion, 10(4):289–309.

Harrower, M. and Brewer, C. A. (2003). ColorBrewer.org:An Online Tool for Selecting Colour Schemes forMaps. The Cartographic Journal, 40(1):27–37.

Havre, S., Hetzler, E., Perrine, K., Jurrus, E., and Miller, N.(2001). Interactive Visualization of Multiple QueryResults. In Proceedings of the IEEE Symposium onInformation Visualization 2001 (INFOVIS’01), INFO-VIS ’01, pages 105–, Washington, DC, USA. IEEEComputer Society.

Hearst, M. and Rosner, D. (2008). Tag Clouds: Data Anal-ysis Tool or Social Signaller? In Hawaii InternationalConference on System Sciences, Proceedings of the41st Annual, pages 160–160.

Hinrichs, U., Alex, B., Clifford, J., Watson, A., Quigley, A.,Klein, E., and Coates, C. M. (2015). Trading Con-sequences: A Case Study of Combining Text Miningand Visualization to Facilitate Document Exploration.Digital Scholarship in the Humanities.

Hinrichs, U., Forlini, S., and Moynihan, B. (2016). Spec-ulative Practices: Utilizing InfoVis to Explore Un-tapped Literary Collections. Visualization and Com-puter Graphics, IEEE Transactions on, 22(1):429–438.

Jänicke, S. (2016). Valuable Research for Visualization andDigital Humanities: A Balancing Act. In Workshopon Visualization for the Digital Humanities, IEEE VIS2016, Baltimore, Maryland, USA.

Jänicke, S., Efer, T., Blumenstein, J., Wöckener-Gade, E.,Schubert, C., and Scheuermann, G. (2016a). Über dieNutzung von TagPies zur vergleichenden Analyse vonTextdaten. In Konferenzabstracts der Digital Human-ities im deutschsprachigen Raum 2016.

Jänicke, S., Franzini, G., Cheema, M. F., and Scheuermann,G. (2015). On Close and Distant Reading in Digi-tal Humanities: A Survey and Future Challenges. In

Borgo, R., Ganovelli, F., and Viola, I., editors, Eu-rographics Conference on Visualization (EuroVis) -STARs. The Eurographics Association.

Jänicke, S., Franzini, G., Cheema, M. F., and Scheuermann,G. (2016b). Visual Text Analysis in Digital Humani-ties. Computer Graphics Forum.

Jänicke, S. and Scheuermann, G. (2016). TagSpheres: Visu-alizing Hierarchical Relations in Tag Clouds. In Pro-ceedings of the 11th Joint Conference on ComputerVision, Imaging and Computer Graphics Theory andApplications, pages 15–26.

Jänicke, S. and Scheuermann, G. (2017). On the Visual-ization of Hierarchical Relations and Tree Structureswith TagSpheres. In Braz, J., Magnenat-Thalmann,N., Richard, P., Linsen, L., Telea, A., Battiato, S., andImai, F., editors, Computer Vision, Imaging and Com-puter Graphics Theory and Applications: 11th Inter-national Joint Conference, VISIGRAPP 2016, Rome,Italy, February 27 – 29, 2016, Revised Selected Pa-pers, pages 199–219. Springer International Publish-ing, Cham.

Koch, S., John, M., Worner, M., Muller, A., and Ertl, T.(2014). VarifocalReader – In-Depth Visual Analysisof Large Text Documents. Visualization and Com-puter Graphics, IEEE Transactions on, 20(12):1723–1732.

Lee, B., Riche, N., Karlson, A., and Carpendale, S. (2010).SparkClouds: Visualizing Trends in Tag Clouds. Visu-alization and Computer Graphics, IEEE Transactionson, 16(6):1182–1189.

Liu, X., Shen, H.-W., and Hu, Y. (2014). Supporting mul-tifaceted viewing of word clouds with focus+contextdisplay. Information Visualization.

Lohmann, S., Ziegler, J., and Tetzlaff, L. (2009). Compari-son of Tag Cloud Layouts: Task-Related Performanceand Visual Exploration. In Human-Computer Interac-tion - INTERACT 2009, volume 5726 of Lecture Notesin Computer Science, pages 392–404. Springer BerlinHeidelberg.

Milgram, S. and Jodelet, D. (1976). Psychological Maps ofParis. Environmental Psychology, pages 104–124.

Montague, J., Simpson, J., Rockwell, G., Ruecker, S., andBrown, S. (2015). Exploring Large Datasets withTopic Model Visualizations. In Proceedings of theDigital Humanities 2015.

Moretti, F. (2005). Graphs, Maps, Trees: Abstract Modelsfor a Literary History. Verso.

Munzner, T. (2009). A Nested Model for VisualizationDesign and Validation. Visualization and ComputerGraphics, IEEE Transactions on, 15(6):921–928.

Murugesan, S. (2007). Understanding Web 2.0. IT Profes-sional, 9(4):34–41.

Oelke, D. and Gurevych, I. (2014). A Study on Human-Generated Tag Structures to Inform Tag Cloud Lay-out. In Proceedings of the 2014 International Work-ing Conference on Advanced Visual Interfaces (AVI2014), pages 297–304. ACM.

Paulovich, F. V., Toledo, F., Telles, G. P., Minghim, R.,and Nonato, L. G. (2012). Semantic Wordification of

Document Collections. In Computer Graphics Forum,volume 31, pages 1145–1153. Wiley Online Library.

Schrammel, J., Leitner, M., and Tscheligi, M. (2009).Semantically Structured Tag Clouds: An EmpiricalEvaluation of Clustered Presentation Approaches. InProceedings of the SIGCHI Conference on HumanFactors in Computing Systems, CHI ’09, pages 2037–2040. ACM.

Schrammel, J. and Tscheligi, M. (2014). Patterns in theClouds - The Effects of Clustered Presentation on TagCloud Interaction. In Building Bridges: HCI, Visu-alization, and Non-formal Modeling, Lecture Notesin Computer Science, pages 124–132. Springer BerlinHeidelberg.

Seifert, C., Kump, B., Kienreich, W., Granitzer, G., andGranitzer, M. (2008). On the Beauty and Usabilityof Tag Clouds. In Information Visualisation, 2008. IV’08. 12th International Conference, pages 17–25.

Sinclair, J. and Cardew-Hall, M. (2008). The folksonomytag cloud: when is it useful? Journal of InformationScience, 34(1):15–29.

Viegas, F. and Wattenberg, M. (2008). TIMELINES: TagClouds and the Case for Vernacular Visualization. in-teractions, 15(4):49–52.

Viegas, F., Wattenberg, M., and Feinberg, J. (2009).Participatory Visualization with Wordle. Visualiza-tion and Computer Graphics, IEEE Transactions on,15(6):1137–1144.

Viegas, F., Wattenberg, M., van Ham, F., Kriss, J., andMcKeon, M. (2007). ManyEyes: a Site for Visual-ization at Internet Scale. Visualization and ComputerGraphics, IEEE Transactions on, 13(6):1121–1128.

Vuillemot, R., Clement, T., Plaisant, C., and Kumar, A.(2009). What’s being said near "Martha"? Explor-ing name entities in literary text collections. In VisualAnalytics Science and Technology, 2009. VAST 2009.IEEE Symposium on, pages 107–114.

Waldner, M., Schrammel, J., Klein, M., Kristjánsdóttir, K.,Unger, D., and Tscheligi, M. (2013). FacetClouds:Exploring Tag Clouds for Multi-dimensional Data.In Proceedings of Graphics Interface 2013, GI ’13,pages 17–24. Canadian Information Processing Soci-ety.

Ware, C. (2013). Information Visualization: Perception forDesign. Elsevier.

Wu, Y., Provan, T., Wei, F., Liu, S., and Ma, K.-L. (2011).Semantic-Preserving Word Clouds by Seam Carving.In Computer Graphics Forum, volume 30, pages 741–750. Wiley Online Library.