MLA 7th 1

Team Research Proposal

Team POLITIC

Political Opinions in Literature: Identifying Themes in International Compositions

Robert Cai, Matthew Carr, Adam Elrafei, Alexander Goniprow,

Adrian Hamins-Puertolas, Manpreet Khural, Andrew Li, Alexandra Winter,

Soumya Yanamandra, Dan Yang, and Kay Zhang

University of Maryland Gemstone Program

Mentor: Dr. Peter Mallios

Librarian: Timothy Hackman

and

The Maryland Institute for Technology in the Humanities

We pledge on our honor that we have not given or received any unauthorized assistance on this

assignment.

Introduction

The United States was involved in numerous international conflicts throughout the 20th

century. A prevalent theory suggests deeper public understanding of foreign cultures might have

allowed the United States to avoid several of these conflicts, including the Iran Hostage Crisis

and the Vietnam War (Li). Since the United States is a democracy, citizen perception of foreign

countries has a direct relationship with foreign policies enacted. A thorough understanding of

how the American public gathers its perceptions of foreign cultures is crucial to fully

comprehend American foreign policy and international relations. Foreign literature is one

important medium that exposes the United States to the political and cultural ideologies of other

countries (Griswold 1077). The American public reads novels by foreign authors to gain an

intimate perspective of foreign societies—views unavailable through domestic media. Readers

can also connect to other cultures because novels create emotional ties by appealing to universal

human themes (Aubry 27). At the same time, international and domestic political concerns guide

the United States’ public interest in foreign literature. For instance, it is not a coincidence that the

peaceful writings of Gandhi became important in the United States during the Civil Rights

Movement (Mallios 10-19).

        However, different foreign authors often provide opposing viewpoints of their societies.

The most popular works form a selective base of foreign literature that potentially accommodates

elites’ self-serving political biases. Using experimental methods, Gilens asserts that the United

States’ ignorance and misinformation “leads many [citizens] to hold political views different

from those they would hold otherwise” (379). Therefore, understanding public intent and attitude

requires knowing why certain novels and authors seem representative of a cultural canon. To

become a better-informed political citizen of the United States, one must think critically about

the uses of foreign literature.

        Our study will investigate how publicly available United States media received foreign

novels and authors and how these portrayals work toward social and political ends of

government support and criticism (Mallios 10-19). Specifically, we will conduct a low-constraint

case study of Russian literature to address the following question: Did the reception of Russian

novels and authors in the United States and United States foreign policy toward Russia reflect

each other from 1900-1923? We hypothesize that the reception of Russian literature in the

United States significantly correlates with United States policies toward Russia, due to inherent

ties between literary evaluation and political understanding. Scholars, politicians, and other

government officials will likely take interest in our study.

        We will use the portrayals of selected Russian novels and authors in nationally available

print media to define the reception of Russian literature in the United States during this time

period. We recognize scholars could investigate how alternative forms of media, such as pictures

or political cartoons, influence public understanding. However, we chose print media because it

is the easiest to quantitatively analyze. We will define United States foreign policy toward

Russia through quantifiable measures such as foreign aid, military investment, and trade deals

from 1900-1923. This will take the form of overarching topics that describe the types of policies

enacted, such as interventionism and humanitarianism. Our analysis will include keyword

searches relative to both literary reception and foreign policy. We will track how these themes

have evolved over time using techniques of topic modeling.1

        Our study does not seek to determine a relationship between political climates and

messages found in novels, opinions held by authors, or motivations behind translators. Instead,

we will determine the extent to which there is a relationship between media reception of Russian

1 See Appendix H for an example of topic modeling output.

literature in the United States and the political climate. Our research is distinguished from

previous studies in two ways: it analyzes reception in United States media and not the intent of

authors or translators, and we will accomplish our analysis through quantitative, not just

qualitative, methods.

Throughout the rest of our proposal, we will summarize our literature review, outline our

methodology, explain the limitations of our research, list confounding variables, and conclude

with descriptions of our anticipated results, our budget, our timeline, and the statistical tools we

will use throughout the project.

Literature Review

Introduction of Russian Literature in the Western World

Eugène-Melchior de Vogüé's Le Roman Russe (The Russian Novel) in 1886 represented

the increasing interest in Russian literature in Western Europe and America. Many writers,

including Isabel Hapgood and Constance Garnett, published English translations of Russian

novels, short stories, and poems to critical acclaim in subsequent decades (Moser 431). In other

words, the late nineteenth and early twentieth centuries marked the availability of Russian

literature to the US public and intellectuals.

Many studies have sought to understand literary themes found in major Russian works.

For example, Emerson analyzes Leo Tolstoy’s views on war through a close reading of his many

texts (1855). However, only a few studies address Russian literary reception in the United States

during the early twentieth century. One of these rarities is Goldfarb’s account of how a

prominent literary critic, William Dean Howells, supported Tolstoy’s works in the United States

during the twentieth century (318). However, this study is limited in that it only contemplates

Russian literary reception through Howells’ and his critics’ views. We intend to expand on such

studies by using comprehensive statistical tools to analyze a wider base of reception material.

Canon Formation and Politics

Political motivations shape a nation’s literary canon, which in turn projects that nation’s

identity. The idea of a national literature emerged in the late eighteenth century as a way of

proving cultural independence on an international level (Corse, Nationalism and Literature 7-

14). Original research studies suggest canonical or high-culture literature does not reveal how

citizens perceive themselves, but rather how elites want to envision their nation (ibid 74). These

previous studies turn to college syllabi and literary prizes to define the most frequently appearing

works as canonical or high-culture (Brown, 1; Corse, Nations and Novels 1279-82). Unlike

bestsellers or popular culture novels, canonical texts differ greatly between countries, as they are

symbolic in value and not simply “economic commodities.” Theories of canon formation state

novels have to experience a conjunction of large sales and certain types of recognition to reach

canonical status (Ohmann 206). This recognition refers to the critical reception of works found in

publications that “carried special weight in forming cultural judgments,” such as the New York

Times Book Review and the New Republic (204). However, scholars have never specified the

ways in which elites have translated cross-cultural differences into literature.

Topic Modeling

Researchers use topic modeling to analyze large corpora of data. Topic modeling assumes that

“documents are mixtures of topics, where a topic is a probability distribution over words”

(Steyvers 2). Furthermore, Latent Dirichlet Allocation (LDA), a more specific type of topic

modeling, asserts each document from a larger corpus consists of a plurality of topics (Chaney

and Blei 2). In past studies, researchers have used topic modeling in general and LDA

specifically to analyze large corpora of data. For example, a 100-topic LDA model generated

word probabilities under each topic for all articles in the journal Science between 1880 and 2002

(ibid 4).
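To make the idea of documents as mixtures of topics concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is an illustration only: the function name `lda_gibbs`, the hyperparameter values `alpha` and `beta`, and the toy documents are assumptions of ours and do not come from the cited studies, which rely on established topic modeling software.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] is the proportion of topic k in document d and
    phi[k][w] is the probability of vocabulary word w under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables updated during sampling.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Convert smoothed counts to proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + V * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab
```

Each row of `theta` is one document's mixture over topics, and each row of `phi` is one topic's probability distribution over words, which mirrors the definition quoted from Steyvers.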

More complex versions of topic modeling, however, can gather more information from

our Russian author database. For example, Topics over Time (TOT) models can account for the

chronology of documents in a corpus (ibid 9). Since the discourse in our documents changes

over time, standard LDA, which ignores document dates, would confound topics from different periods and obscure temporal patterns.

Xuerui Wang and Andrew McCallum explain the topic analysis of US Presidential State-of-the-

Union addresses, where LDA “confounds Mexican-American War (1846-1848) with some

aspects of World War I (1914-1918)” since it is “unaware of the 70-year separation between the

two events” (1). Modeling topics over time serves to address this issue.

In Wang and McCallum’s study, they incorporated timestamps to help track “changes in

the occurrence of the topics themselves” as a function of time (2). They tested their model on

three data sets: “more than two centuries of U.S. Presidential State-of-the-Union addresses,” “17-

year history of the NIPS [Neural Information Processing Systems] conference,” and “nine

months of email archive” (ibid). The results of their study show the TOT model is able to predict

the timestamps of documents and generates topics that are “more distinct from each other than

LDA topics” (ibid 5). In our research, we will also use a TOT model on the databases we

anticipate constructing to account for time.
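As a rough illustration of tracking topics as a function of time, the sketch below averages per-document topic proportions within each publication year. This is not the TOT model itself, which incorporates timestamps into the model rather than aggregating afterward; it is a simpler post-hoc summary, and the function name and input layout are our own assumptions.

```python
from collections import defaultdict


def topic_prevalence_by_year(doc_years, doc_topic_proportions):
    """Average each topic's proportion over the documents published in each year.

    doc_years[i] is the publication year of document i and
    doc_topic_proportions[i] is that document's list of topic proportions.
    Returns a dict mapping year -> list of mean proportions, ordered by year.
    """
    sums = {}
    counts = defaultdict(int)
    for year, proportions in zip(doc_years, doc_topic_proportions):
        if year not in sums:
            sums[year] = [0.0] * len(proportions)
        for k, p in enumerate(proportions):
            sums[year][k] += p
        counts[year] += 1
    return {
        year: [s / counts[year] for s in sums[year]]
        for year in sorted(sums)
    }
```

Plotting the resulting yearly averages for a single topic would show when that topic rises or falls in the reception material.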

Furthermore, modified versions of LDA can relate metadata to topics. Metadata is

information about the documents we collect such as “author, title, geographic location, [and]

links” (Blei 10). Therefore, we can also correlate influences such as the gender and ethnicity of

the authors of the reception material to word probabilities found in topics in our corpus.

Sentiment Analysis

Sentiment analysis is also useful for sorting through large corpora of data. While topic

modeling focuses on the subject of the data in question, sentiment analysis focuses on the

opinion expressed about the subject matter of the data (Lee and Pang 1). Multiple methods can

determine the sentiment of a piece of data. Lee and Pang compared three different algorithms

used for sentiment analysis: the Naive Bayes, maximum entropy classification, and support

vector machines (ibid 3). The Naive Bayes algorithm is a simple algorithm. It may not achieve

high accuracy rates with complicated sets of data, but it “tends to perform surprisingly well” and

is even the ideal algorithm for use with “problem classes with highly dependent features” (ibid).

Maximum entropy classification and support vector machines are both much more sophisticated

methods. Maximum entropy classification algorithms “make no assumptions about the

relationships between features,” which can make them better than Naive Bayes when features

are not conditionally independent (ibid 4). Support vector machines differ from both of

the previous methods in that they are not probabilistic classifiers. They are well established in

traditional topic-based text categorization and can be adapted to sentiment analysis

(ibid 4).

For our project, sentiment analysis methods will allow us to quickly categorize articles by

gauging how American periodicals perceive and discuss Russian authors and novels during the

time period of interest. In addition, incorporating a sentiment categorization into our database

will allow future researchers to quickly add to and examine our data.
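The simplest of the three algorithms discussed above, Naive Bayes, can be sketched in a few lines of Python. The version below is a multinomial Naive Bayes classifier with add-one smoothing; the class name, the labels, and the toy training phrases are hypothetical and are not drawn from our corpus or from Lee and Pang's experiments.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over word tokens with additive smoothing."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labeled_docs):
        """labeled_docs is an iterable of (token_list, label) pairs."""
        for tokens, label in labeled_docs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_scores(self, tokens):
        """Return the unnormalized log posterior of each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                if token not in self.vocab:
                    continue  # words never seen in training are ignored
                numerator = self.word_counts[label][token] + self.smoothing
                denominator = total_words + self.smoothing * vocab_size
                score += math.log(numerator / denominator)
            scores[label] = score
        return scores

    def classify(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

In practice the training pairs would come from manually annotated articles, and the classifier would then label the remaining articles in the database.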

Foreign Policy Analysis

Political scientists have devised several models and theories to explain how foreign

policy develops (Boyer 185). One such theory is the rational actor model, which states stimuli

and immediate responses lead to the creation of foreign policy (Boyer 189). However, the

political aspect of our study does not seek to determine how political leaders create foreign

policy, but rather attempts to measure and quantify it. Many previous studies have determined

United States foreign policy towards various nations by analyzing its components. For example,

Rick Travis analyzes foreign policy towards Africa by focusing on foreign aid to the continent

(798). Haslam focuses on direct foreign investment and the corresponding treaties to determine

United States foreign policy toward other nations (1182). For our study, we will gather data on

“exports, imports, investments, arms sales, and categories of foreign aid (bilateral, aggregate, and

per capita)” between the United States and the Russian Empire (and later the Soviet Union) to

define United States foreign policy (Watson 253).

Methodology

Our first tasks were to determine a time range and country to investigate, as outlined in

the literature review. We selected an upper time bound of 1923, since all preceding publications

are in the public domain and we can publicly release all collected data. We chose 1900 as our

lower time bound to guarantee a significant number of periodicals will be available.2 Time

allowing, we may be able to expand the time period of interest, guaranteeing more articles for

analysis. We decided to investigate Russian literature for several reasons. First, Russia was a

focal point of the United States during the twentieth century. World War I, the Bolshevik

Revolution, and the threat of communism led to increased public and governmental interest in

Russia during our selected time period. Second, only a relatively small number of significant

Russian authors had works available in English at the time. A narrow range of Russian literary

figures suggests American periodicals interested in examining Russian literature had to invoke

2 We anticipate finding a significant number of periodicals referencing Russian literary figures during the selected time period, as shown in Appendix E. By the beginning of our time period of interest, many national periodicals had already been well established (Baldasty).

certain Russian literary figures and works frequently, leading to larger sample sizes for the

selected authors. Subsequently, we will be able to construct a more exhaustive corpus3 of

Russian literature than of the more readily available literature from other countries, such as

Britain or France.

        To decide which literary figures to study, we compiled a list of Russian literary figures

whose works had English translations during our time period of interest. Using that list, we

cataloged the number of search results found in the Readers’ Guide Retrospective4 for each

literary figure of interest.5 From this preliminary summary of the availability of periodicals in the

United States specifically discussing Russian literary figures, we chose to investigate Dostoevsky

and Tolstoy to maintain the feasibility of our study. We bring some bias in our selection of

literary figures, as we have chosen two of the most renowned Russian literary figures in the

United States. Therefore, our data regarding the reception of selected Russian literary figures in

the United States will not be representative of the entirety of Russian literary figures. We could

add one or two minor Russian authors to our research to increase the external validity of our

project if time permits.

        We resolved to capture a large, representative sample of the body of articles that

explicitly mention our selected Russian literary figures in periodicals popular in the United

States between 1900 and 1923. We will construct a database containing these articles using the

Readers’ Guide Retrospective index. The Retrospective’s emphasis on more popular periodicals

fits well with our intent to gain an understanding of how the general American public perceived

significant Russian literary figures in the early twentieth century. We will use a subject search of

3 See our Glossary of Terms in Appendix H.

4 The Readers’ Guide Retrospective is a comprehensive index of 608 popular periodicals published in the United States spanning from 1890 to 1982. 224 periodicals in the Readers’ Guide Retrospective, almost 37% of the database, are available prior to 1923. See Appendix G.

5 An abridged version of this list can be found in Appendix E.

selected literary authors to explore the Readers’ Guide Retrospective and find articles

appropriate for the constructed database.

Scanning

Since most articles in the Readers’ Guide are not digitized, we have to digitize the

physical or microfilm versions of articles that fall within search parameters. We are currently

scanning articles by using publicly available resources at the University of Maryland McKeldin

Library. Therefore, our initial database construction will contain only articles available within

the University of Maryland archive system. Should time permit, it may be feasible to explore

other academic archives for articles from the Readers’ Guide Retrospective.

We have standardized scanning techniques to reduce preventable variations in image

quality and size.6 Systematic errors, including the presence of dust particles, stains, and other

debris on the scanning glass, also contribute to poor image quality and complicate analysis of the

database. We will therefore wipe down the scanning glass with glass cleaner solution and a

microfiber cloth before and after each scan to reduce this source of error.

Preservation of the scanned material is essential to data accuracy and reliability. During

microfilm scanning, an auto-adjust function adjusts the brightness and scanned size of each page

to produce an optimally clear image. Furthermore, we must adjust the resolution of the scanner

up from the default 300 dots per inch (DPI) to the maximum setting of 600 DPI. Similar settings

are also present on the non-print source scanners. Once saved, the file is left unmodified with the

exception of cropping. We will not manipulate images after scanning to retain the original image

data, quality, and integrity.

6 Examples of standardization in scanning articles include uniform scanner type, scanner settings, and the format in which material is saved. Images will be saved in the Tag Image File Format (TIFF), a standard “for distributing high quality scanned images or finished photographic files” (“TIFF Files”).

We will convert these files to readable documents through Optical Character Recognition

software. We are using ABBYY FineReader 11 to save the files as plain text documents, DjVu

files, and FineReader documents. Topic modeling and sentiment analysis software can analyze

plain text files; the DjVu format compresses documents and maintains the layout of text on each

page; and we save FineReader files to document the transition from scanned image to readable

text. At this stage, we remove pictures from the pages.

Foreign Policy Analysis

The second portion of the project focuses on United States foreign policy toward Russia.

Our goal is to quantify the United States’ changing attitude and foreign policy toward Russia

over the established time period for the study of the authors. As mentioned earlier, one

method of defining this relationship is to examine statistical data that relates to foreign policy

including foreign aid to Russia, trade relations, and America’s military presence in Russia. We

will also examine Presidential speeches delivered during the time period of interest; we will

simply run searches for references to Russia and transfer Presidential speeches that produce hits

into a database for future analysis.
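The speech search described above could be sketched as a simple pattern match. The list of search terms, the record layout (`title`, `year`, `text`), and the function name are assumptions made for illustration; the actual term list would be fixed during data collection.

```python
import re

# Hypothetical search terms; the final list is still to be decided.
RUSSIA_PATTERN = re.compile(
    r"\b(?:Russians?|Russia|Czar|Tsar|Bolsheviks?|Soviets?)\b",
    re.IGNORECASE,
)


def find_speeches_mentioning(speeches, pattern=RUSSIA_PATTERN):
    """Return a summary record for every speech whose text matches the pattern.

    speeches is a list of dicts with 'title', 'year', and 'text' keys.
    """
    hits = []
    for speech in speeches:
        matches = pattern.findall(speech["text"])
        if matches:
            hits.append({
                "title": speech["title"],
                "year": speech["year"],
                "hit_count": len(matches),
            })
    return hits
```

Speeches returned by this filter would then be copied into the foreign policy database along with their hit counts.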

With sufficient time, we will also collect and analyze newspaper editorials in a similar

manner. A theory discovered in preliminary research indicates that editorials of major

newspapers of the late nineteenth and early twentieth centuries, specifically The New York

Times, reflected political motivations of the United States government (“Deductions” 42;

Lippmann and Merz 3). If pursued, a newspaper editorial database would give our project a

wider scope by adding another level of comparison with other foreign policy data.

Annotation

As we assemble a corpus of articles regarding literary authors of interest, one priority is

to ensure we effectively organize the constructed database. An organized corpus is easier to

analyze, which makes generating metadata7 essential. Beyond ease of analysis,

metadata will give us the ability to categorize and analyze articles that deal with a specific topic

or exhibit similar traits, an approach that will yield more significant and interesting results than a

simple keyword search. The assembled corpus’s metadata will include, at a minimum, historical

and archival data concerning each article. We will also attempt to capture metadata regarding the

characteristics of each article, such as whether articles include explicit references to radical

politics, by annotating8 each article.

Annotation questions may reflect biases and stereotypes that we bring individually to the

project and it is difficult to ensure our uniformity in annotation. We determined what kind of

metadata to capture and refined annotation questions by annotating a sample of articles from the

assembled database.9 The goal of refining annotation questions is to confirm we will arrive at

similar answers if annotating independently.
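One way to check whether two annotators arrive at similar answers is an agreement statistic. The sketch below computes Cohen's kappa for a single annotation question; choosing kappa is our assumption for illustration, since the proposal does not commit to a particular agreement measure, and the example answers are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' answers to the same question.

    labels_a[i] and labels_b[i] are the two annotators' answers for article i.
    Returns 1.0 for perfect agreement and about 0.0 for chance-level agreement.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected if both annotators answered independently at random
    # with their own observed answer frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators gave one identical answer throughout
    return (observed - expected) / (1 - expected)
```

A low kappa on a question would suggest that the question needs further refinement before independent annotation begins.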

In conjunction with the Maryland Institute for Technology in the Humanities (MITH), we

will attempt to automate the process by which we construct metadata, reducing time spent on this

portion of our methodology. It is feasible to automate metadata collection through computer

scripts, including collection of spelling variations in literary author names across the constructed

corpus,10 or, more abstractly, performing sentiment analysis on articles in the corpus. The end

7 According to the National Information Standards Organization, metadata is “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource…metadata is often called data about data” (National Information Standards Organization).

8 Annotation is a way to produce variables that will allow us to understand the political significance of Russian literature in the United States and catalog the constructed corpus.

9 See Appendix D for revisions of the annotation questions.

10 The names of Russian authors often have a number of accepted spellings and are subject to frequent mistranslation (Pasterczyk). We will catalog alternative spellings of selected literary figures. The use of Boolean operators to

goal of our research project is to form conclusions about the relationship between the reception

of Russian literature in the United States and United States foreign policies. To reach these

conclusions, we will need to analyze both an annotated database of articles that pertain to

literature and an annotated database of articles that pertain to foreign policies.
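The cataloging of spelling variations mentioned above lends itself to a short script. The sketch below builds a Boolean OR query from a list of variants and rewrites variants in OCR text to one canonical spelling. The variant lists are illustrative examples rather than the catalog in Appendix F, and the quoted OR syntax is an assumption about the search interface, not documented behavior of the Readers' Guide Retrospective.

```python
import re

# Illustrative variants only; the project's actual catalog may differ.
NAME_VARIANTS = {
    "Tolstoy": ["Tolstoy", "Tolstoi", "Tolstoï"],
    "Dostoevsky": ["Dostoevsky", "Dostoyevsky", "Dostoievsky", "Dostoevski"],
}


def boolean_query(canonical, variants=NAME_VARIANTS):
    """Join all known spellings of one author into a single OR query string."""
    return " OR ".join(f'"{spelling}"' for spelling in variants[canonical])


def normalize_names(text, variants=NAME_VARIANTS):
    """Replace every known variant spelling with its canonical form."""
    for canonical, spellings in variants.items():
        # Try longer spellings first so a short variant never shadows a long one.
        ordered = sorted(spellings, key=len, reverse=True)
        alternatives = "|".join(re.escape(s) for s in ordered)
        text = re.sub(rf"\b(?:{alternatives})\b", canonical, text)
    return text
```

Normalizing names before analysis keeps one author's references from being split across several spellings in keyword counts and topic models.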

In the data analysis section of the methodology, we expect to discover trends in the

databases that provide answers to certain questions. For the Russian literature database, the

questions will focus on the discourse throughout the United States surrounding the predominant

Russian authors.11

To conduct this style of data analysis, we will use a collection of data mining strategies.

Data mining refers to the process of discovering previously unknown properties of a database. Two basic

strategies are keyword frequencies12 and semantic parsing.13
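As a minimal sketch of the keyword frequency strategy, the Python below counts how often given keywords occur and how often two keywords appear near each other. It is a stand-in written for illustration, not the TAPoR tool itself; the function names and the default window size are our assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def keyword_frequencies(text, keywords):
    """Count occurrences of each keyword, ignoring case."""
    counts = Counter(tokenize(text))
    return {keyword: counts[keyword.lower()] for keyword in keywords}


def near_cooccurrences(text, first, second, window=10):
    """Count pairs of the two keywords that lie within `window` tokens of each other."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return sum(
        1
        for i in first_positions
        for j in second_positions
        if abs(i - j) <= window
    )
```

Applied to each article, these counts would show which authors are mentioned most and which are discussed together.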

The most important data mining analysis we plan to conduct is probabilistic topic

modeling, “a suite of algorithms that aim to discover and annotate large archives of documents

with thematic information” (Blei 2). A topic is a collection of words that all have a high

probability of being associated with one another. The basic probabilistic topic model is Latent

Dirichlet Allocation (LDA), as described in our literature review. The end result is that all the

articles in the database will have labels with proportions of various topics, which can then be

search for common name variations in a keyword search of the Readers’ Guide Retrospective will increase the number of articles found that relate to Russian literary authors of interest. An example of common name variations can be found in Appendix F.

11 See Appendix C for current annotation guidelines.

12 Keyword frequencies, achieved by using the publicly available Text Analysis Portal for Research (TAPoR) tool, will allow for organization of data on a more general level (Berson). Examples of the information that TAPoR can provide are the frequencies of author references and how often author names are found near each other.

13 We will achieve semantic parsing by using the software programs Shalmaneser and FrameNet, developed by the International Computer Science Institute at the University of California, Berkeley. These programs will allow us to analyze databases using ‘frames,’ which, according to FrameNet, are semantic representations of situations. These tools highlight the types of sentences used in specific articles. For example, if an article contains many sentences framed under the semantic categories of ‘Judgment’ and ‘Assessment,’ we can safely conclude that article contains a number of opinionated statements. See Appendix H for more information.

categorized based on topic frequency. By comparison, we will implement a supervised version of

LDA (sLDA) in the automation of metadata creation.14 Finally, the last form of topic modeling

that we will use is the Topics Over Time model (TOT), described in the literature review, which

will introduce a time variable into our analysis (Wang and McCallum 5).

At the conclusion of this step in the process, we will have fully annotated and labeled the

databases by all the various data mining strategies. From this data, we can determine certain

trends in the topics in the articles. It is these trends that will allow us to make certain inferences

about the relationship between the reception of Russian literature in the United States and United

States foreign policy.
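To illustrate how such a relationship might be quantified, the sketch below aligns two yearly series and computes their Pearson correlation. The choice of Pearson's coefficient, the function names, and the idea of a yearly reception score paired with a yearly policy indicator are assumptions for illustration, not a settled part of our analysis plan.

```python
import math


def align_by_year(series_a, series_b):
    """Keep only the years present in both series.

    Each series is a dict mapping year -> value.
    Returns (years, values_a, values_b) in chronological order.
    """
    years = sorted(set(series_a) & set(series_b))
    return years, [series_a[y] for y in years], [series_b[y] for y in years]


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)
```

A coefficient near 1 or -1 would indicate that the two series move together or in opposition, although it would not by itself show that one influences the other.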

Conclusion

Our research aims to provide new insight into how the United States receives

foreign authors and novels and how this reception relates to US foreign policy. Our anticipated

results are vital to a recent development in the humanities known as the globalization of

American literary studies, given that “the mechanisms by which [differences between countries]

are translated into literature have never been fully specified” (Corse, Nations and Novels 1279).

Foreign novels are an inherent part of United States culture and if one were to ignore the

presence of foreign literature in United States politics, then one would be ignoring a major factor

that shaped both the citizens and government of the United States. “A sound public opinion

cannot exist without access to the news” and “evidence is needed” to reveal inherent biases in

publicly available portrayals of political events (Lippmann and Merz 1). Experts in fields of

literary studies claim scholars reach “little agreement about what constitutes literary value in this

field” and there exists “unnecessary confusion as to clear standards and goals” in evaluating

14 In sLDA, “each document is paired with a response. The goal is to infer latent topics predictive of the response” (Blei and McAuliffe 1). Instead of letting the software construct its own distribution over topics, we will provide a fitted model, specifically the annotation form previously mentioned in the methodology (ibid). Then the software can predict a response for the previously designated topics, such as sentiment, nationality, racism, politics, etc.

these types of literature (Brown 1-8). We are also pioneering relatively new software and

technology in the realm of literary analysis.

By May 9th, 2012, we plan to have compiled a sample database of several hundred

articles scanned and processed through the OCR software in preparation for a technical seminar

with MITH. Our annotation team hopes to annotate 150 of these articles. The goal of this

seminar is to experiment with some of the available database analysis software to determine how

effectively the computer programs can learn to annotate articles independently and whether any

trends in the metadata begin to surface. We anticipate finding a distinct correlation between the

reception of foreign literature and public attitudes toward foreign policy. We will compile our

completed findings into an additive online database, to which other scholars can contribute

similar research. Over time, our foundation will pave the way to understanding overall patterns

in foreign literature reception.

Works Cited

Aubry, Timothy. "Afghanistan Meets the Amazon: Reading the Kite Runner in America."

PMLA: Publications of the Modern Language Association of America 124.1 (2009): 25-

43. EBSCO. Web. 10 Sept. 2011.

Baldasty, Gerald J. E.W. Scripps and the Business of Newspapers. Urbana-Champaign: U of

Illinois P, 1999. Print.

Berson, Alex, Stephen Smith, and Kurt Thearling. Building Data Mining Applications for CRM.

New York: McGraw-Hill, 1999. Print.

Blei, David M. “Introduction to Probabilistic Topic Models.” Communications of the ACM.

Princeton U, n.d. Web. 17 Mar. 2012.

Blei, David M., and Jon D. McAuliffe. “Supervised Topic Models.” Princeton U and U of

California, Berkeley, 2010. Web. 17 Mar. 2012.

Boyer, Mark A. "Issue Definition and Two-Level Negotiations: An Application to the American

Foreign Policy Process." Diplomacy & Statecraft 11.2 (2000): 185-212. America: History

and Life with Full Text. Web. 27 Nov. 2011.

Brown, Joan L., and Crista Johnson. "Required Reading: The Canon in Spanish and Spanish

American Literature." Hispania 81.1 (1998): 1-19. JSTOR. Web. 12 Sept. 2011.

Chaney, Allison J.B., and David M. Blei. “Visualizing Topic Models.” International AAAI

Conference on Social Media and Weblogs. Princeton U Dept. of Computer Science,

2012. Web. 15 Mar. 2012.

Corse, Sarah M. Nationalism and Literature: The Politics of Culture in Canada and the United

States. Cambridge: Cambridge UP, 1997. Print.

---. "Nations and Novels: Cultural Politics and Literary Use." Social Forces 73.4 (1995): 1279-

308. JSTOR. Web. 8 Sept. 2011.

“Deductions.” New Republic 4 Aug. 1920: 42-3. EBSCOhost. Web. 20 Mar. 2012.

Emerson, Caryl. "Leo Tolstoy On Peace And War." PMLA: Publications Of The Modern

Language Association Of America 124.5 (2009): 1855-58. Academic Search Premier.

Web. 15 Mar. 2012.

Gilens, Martin. “Political Ignorance and Collective Policy Preferences.” American Political

Science Review 95.2 (2001): 379-96. Web. 29 Nov. 2011.

Goldfarb, Charles. “William Dean Howells: An American Reaction to Tolstoy.” Comparative

Literature Studies 8.4 (1971): 317-37. JSTOR. Web. 12 Mar. 2012.

Griswold, Wendy. "The Fabrication of Meaning: Literary Interpretation in the United States,

Great Britain, and the West Indies." American Journal of Sociology 92.5 (1987): 1077-

115. JSTOR. Web. 13 Sept. 2011.

Haslam, Paul Alexander. "The Evolution of the Foreign Direct Investment Regime in the

Americas." Third World Quarterly 31.7 (2010): 1181-203. Academic Search Premier.

Web. 27 Nov. 2011.

Lee, Lillian, and Bo Pang. “Sentiment of Two Women: Sentiment Analysis and Social Media.”

1900 University Avenue, Cornell University, New York. 22 Mar. 2011. Lecture.

Li, V. “Misgivings of a Tongue-Tied Nation.” Editorial Research Reports 2 (1990): n. pag. CQ

Researcher. Web. 13 Sept. 2011.

Lippmann, Walter, and Charles Merz. “A Test of the News: Introduction.” New Republic 4 Aug.

1920: 1-4. EBSCOhost. Web. 17 Mar. 2012.

Mallios, Peter Lancelot. Our Conrad: Constituting American Modernity. Stanford: Stanford UP,

2010. Google Books. Web. 15 Sept. 2011.

Moser, Charles A. "The Achievement Of Constance Garnett." American Scholar 57.3 (1988):

431. Academic Search Premier. Web. 20 Mar. 2012.

National Information Standards Organization. Understanding Metadata. Bethesda: NISO P,

2004. Web. 17 Mar. 2012.

Ohmann, Richard. "The Shaping Of A Canon: U.S. Fiction, 1960-1975." Critical Inquiry 10.1

(1983): 199-223. MLA International Bibliography. Web. 13 Nov. 2011.

Pasterczyk, Catherine E. “Russian Transliteration Variations for Searchers.” Education

Resources Information Center 8.1 (1985): n. pag. Web. 20 Mar. 2012.

Steyvers, Mark. "Probabilistic Topic Models." Handbook of Latent Semantic Analysis. Mahwah,

NJ: Lawrence Erlbaum Associates, 2007.

“TIFF Files.” John Salim Photographic Glossary of Terms. 2012. Web. 20 Mar. 2012.

Travis, Rick. "Problems, Politics, and Policy Streams: A Reconsideration US Foreign Aid

Behavior toward Africa." International Studies Quarterly 54.3 (2010): 797-821.

Academic Search Premier. Web. 27 Nov. 2011.

Wang, Xuerui, and Andrew McCallum. “Topics over Time: A Non-Markov Continuous-Time

Model of Topical Trends.” U of Massachusetts Dept. of Computer Science, 2006. Web.

15 Mar. 2012.

Watson, Robert P., and Sean McCluskie. "Human Rights Considerations and U.S. Foreign

Policy: The Latin American Experience." Social Science Journal 34.2 (1997): 249-57.

Academic Search Premier. Web. 27 Nov. 2011.

Appendices

Appendix A: Team Budget

Item | Cost

Immediate Expenses:
MLA Guide Book (already purchased from MLA.org) | $22.00
Large External Hard Drive (1+ Terabyte) | $300.00
Subtotal: | $322.00

Foreseeable Expenses:
Hiring Technical Consultant for Enhancement of Existing Tools | $1,500.00
Travel Expenses (Conferences) | $3,000.00
Subtotal: | $4,500.00

TOTAL: | $4,822.00

Appendix B: Team Timeline

Spring 2012

o Complete team website

o Continue literature review

o Begin scanning periodicals into constructed Russian literature database

o Begin annotating Russian literature database and select metadata to capture

o Begin coordination with MITH and start to familiarize team with methods of

constructing and analyzing databases

Attempt to automate metadata collection

Summer 2012

o Continue scanning and annotation of Russian literature database

Fall 2012

o Prepare for and present at Junior Colloquium

o Determine methods by which to quantify American foreign policies

Begin construction of Foreign attitude / policy database

Spring 2013

o Present at Undergraduate Research Day

o Begin drafting team thesis

Summer 2013

o Continue to draft team thesis

Fall 2013

o Obtain feedback for our thesis paper from Dr. Mallios

o Gather data regarding American foreign policy toward Russia

o Draw conclusions regarding the relationship between American foreign policies

and reception of Russian literature

Winter 2013-14

o Prepare presentation for Thesis Conference

o Revise and edit team thesis

Spring 2014

o Present at Senior Thesis Conference

Appendix C: Current Annotation Guidelines

1. Author (or authors) of principal concern in article. What literary author or authors, if any, is this article primarily about?

• Spelling:

--Be sure to spell any names given in answer to this question as accurately as possible, exactly reproducing how the name is spelled in this article. (Spellings will differ between articles: we want to capture the differences.)

--Include the fullest version of the author’s name included in the article: i.e., include an author’s first and/or middle names and/or initials if these names are included at any point in the article.

• Individuals: Only literary authors named by personal name (i.e., not anonymous figures or those referenced only by job title) and who are persons (i.e., not publications) count as “authors” for purposes of this question.

• “Literary author” means an author of fiction, poetry, plays, or related forms of creative writing. This applies whether the author is being invoked in his or her capacity as a literary writer or not. Academic professors, literary critics, and journalistic and other commentators on literature do not fall into this category, unless they have significant literary accomplishments of their own.

• An author is of “principal” or “primary” concern in an article when an author is a major, continual, or focal concern that runs and receives explicit mention throughout an article as part of its general field of concerns, not just in discrete or severable paragraphs of it.

• Some more rules of thumb on identifying whether an author is a “primary” or “principal” concern in an article:

• if a literary author’s name is included in the article’s title, it is likely that s/he should be included in the answer to this question

• if there is a large disproportion between the number of times different authors are mentioned or referred to, this is a good indicator that those mentioned less should likely not be included in the answer to this question

• if the excising of relatively few paragraphs from this article would result in the elimination of reference to an author, that author should generally not be included in the answer to this question

• as a general matter, construe answers to this question narrowly: only an author (or authors) comprising the main and consistent focus of an article should be included—although articles whose explicit focus is evenly to compare two (or more) authors throughout may be described as having multiple “principal” authors

2. Sentiment Analysis 1: the Opinion of the Article Writer. Which of the following ratings comes closest to the article writer’s expressed opinion of the literary author(s) this article principally concerns? [Note: this question concerns the opinion ultimately taken by the article writer him/herself on the literary author(s) in question. This is so even though the article writer may quote or reference opposing opinions along the way.] This question should be answered separately for each author named in question 1.

2 – A Positive Opinion: a generally or ultimately positive opinion as an overall matter.
0 – A Negative Opinion: a generally or ultimately negative opinion as an overall matter.
U – A Mixed or Unclear Opinion, or No Opinion Offered: it is not possible to say whether the writer’s overall opinion of an author is either positive or negative because the writer’s opinions are mixed, unclear, or not offered at all.

3. Sentiment Analysis 2: Uncertainty of Article Writer’s Opinion. If the answer to Question 2 is “U,” answer the following question; if not, skip it. Which of the following ratings comes closest to describing why the article writer’s opinion of a principal literary author is unclear? This question should be answered separately for each author named in question 1.

1 – A Mixed or Unclear Opinion: the article writer either expresses mixed opinions about the literary author, or does not make clear how the opinions, judgments, or values s/he holds relate to the literary author.
X – Straight Factual Account: this is not an article in which the article writer’s personality, opinions, or judgments are in evidence; the article writer assumes the position of the “straight,” factual, objective newspaper reporter; the article writer’s stance is neutral with respect to his/her own opinions and values, not evaluative.

4. Sentiment Analysis 3: Principal Author as Subject of Debate. (Y/N) Does this article contain any explicit reference to the literary author(s) it principally concerns as a subject of debate, either because interpretations of that literary author’s meaning are explicitly disputed, or because opposing positive and negative opinions of an author are explicitly referenced?

5. Books mentioned? (Y/N). Does this article explicitly mention by title any specific books, poems, or texts written by any literary author it is principally about? Note: this question should be answered separately for each author named in question 1.

6. National identification. (Y/N) Does this article specifically identify the nationality of any literary author it is principally about? Note: this question should be answered separately for each author named in question 1.

7. Style or literary artistry as issue. (Y/N) With respect to any literary author this article is principally about, is the author explicitly described in terms of “art” or as an “artist” or in terms of his or her “artistic” vision, or is at least one paragraph of the article devoted to the style (not the content) of his or her writing? (A “yes” answer to any part of this question means a YES answer to the question as a whole.) Note: this question should be answered separately for each author named in question 1.

8. Foreign Place Names. (Y/N) Are there any non-U.S. place names mentioned in this article?

9. Gender of Article Writer. Use the following scale to identify the apparent gender of the writer of this article (i.e., not the gender of the literary figure(s) in question, but the gender of the article writer who is writing about the literary figure(s)):

M – Male
F – Female
U – Unclear (i.e., because name is ambiguous or initials are used; the article is unsigned; or for another reason)

10. Gender as Issue. (Y/N) Is gender ever explicitly discussed as an issue in this article?

• Note: The fact that a character or author discussed in the article is a man or woman is not sufficient to constitute a Yes answer to this question; there needs to be some explicit attention drawn to gender as a matter of significance (if only in a single phrase), or reflection on or significance attributed to the categories of “man” or “woman,” “masculine” or “feminine,” or other gender ideas.

11. Race as Issue. (Y/N) Is race ever explicitly raised as an issue in this article?

• Note: this question should be answered “Yes” only if: (i) the article explicitly uses the term “race” (or some direct variant on it: “racial,” “racism,” etc.); (ii) there is explicit discussion about general ideas of race; or (iii) one of the following racialized categories is explicitly invoked: black or African; white or Aryan or Caucasian; Slavic; Jewish or Hebrew.

12. Socioeconomic class as issue. (Y/N) Does socioeconomic class receive explicit discussion in this article?

• Note: Any explicit mention of social class (for example, “aristocratic,” “peasant,” “the poor,” “Count,” “prince”) will qualify as a YES answer to this question. (Czar, however, as a state figure, does not alone qualify.)

13. Religion as Issue. (Y/N) Does religion receive explicit discussion in this article?

14. Radical Politics as issue. (Y/N) Do any radical political movements including anarchism, nihilism, bolshevism, socialism, or communism receive explicit mention in this article?

15. America/West invoked as a point of similarity with Russia. (Y/N) Does this article make any specific and explicit claims that Russia shares any quality in common with the U.S., “the West,” or any of the countries, cultures, and/or literatures of Western Europe?

16. America/West invoked as point of contrast with Russia. (Y/N) Does this article draw any specific and explicit contrasts between Russia or anything Russian and any qualities or aspects of the U.S., “the West,” or any of the countries, cultures, and/or literatures of Western Europe?
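
The guidelines above amount to a small record schema with a few dependency rules (for example, Question 3 is answered only when Question 2 is “U”). The following is a minimal Python sketch of how one annotated article might be stored and checked before being added to a database. The class names, field names, and the `validate` helper are illustrative assumptions, not part of the team's actual annotation form, and only a subset of the sixteen questions is modeled. The sample record is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Allowed codes, taken from the rating scales in Questions 2, 3, and 9.
OPINION_CODES = {"2", "0", "U"}        # Question 2
UNCERTAINTY_CODES = {"1", "X"}         # Question 3 (only when Question 2 is "U")
GENDER_CODES = {"M", "F", "U"}         # Question 9


@dataclass
class AuthorAnnotation:
    """Per-author answers (Questions 2, 3, 5, 6, and 7 are answered per author)."""
    name: str                              # Question 1: spelled exactly as in the article
    opinion: str                           # Question 2
    uncertainty: Optional[str] = None      # Question 3
    books_mentioned: bool = False          # Question 5
    nationality_identified: bool = False   # Question 6
    style_discussed: bool = False          # Question 7


@dataclass
class ArticleAnnotation:
    """Article-level answers plus one AuthorAnnotation per principal author."""
    authors: List[AuthorAnnotation]
    writer_gender: str                     # Question 9
    flags: Dict[str, bool] = field(default_factory=dict)  # other Y/N questions (hypothetical keys)


def validate(article: ArticleAnnotation) -> List[str]:
    """Return a list of problems; an empty list means the record is consistent."""
    problems = []
    if article.writer_gender not in GENDER_CODES:
        problems.append(f"writer_gender: unknown code {article.writer_gender!r}")
    for a in article.authors:
        if a.opinion not in OPINION_CODES:
            problems.append(f"{a.name}: unknown opinion code {a.opinion!r}")
        if a.opinion == "U" and a.uncertainty not in UNCERTAINTY_CODES:
            problems.append(f"{a.name}: Question 3 must be '1' or 'X' when Question 2 is 'U'")
        if a.opinion != "U" and a.uncertainty is not None:
            problems.append(f"{a.name}: Question 3 should be skipped unless Question 2 is 'U'")
    return problems


# Invented example: a strictly factual article about one principal author.
record = ArticleAnnotation(
    authors=[AuthorAnnotation(name="Count Leo Tolstoy", opinion="U", uncertainty="X")],
    writer_gender="U",
    flags={"religion": True, "radical_politics": False},
)
print(validate(record))  # prints []
```

Keeping the per-author answers in their own record mirrors the instruction that several questions be answered separately for each author named in question 1.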

Appendix D: Sample Annotation Question Evolution

Current Sample Annotation Question

4. Sentiment Analysis: Principal Author as Subject of Debate. (Y/N) Does this article contain any explicit reference to the literary author(s) it principally concerns as a subject of debate, either because interpretations of that literary author’s meaning are explicitly disputed, or because opposing positive and negative opinions of an author are explicitly referenced?

Original Sample Annotation Question

4. Sentiment Analysis: All Opinions Expressed in the Article. [This question concerns all opinions expressed in the article concerning the literary writers in question—whether they express the article’s own point of view or other perspectives quoted and referenced in the article.] Which of the following ratings comes closest to the entire field of opinions quoted or mentioned in this article concerning each of the literary authors the article principally concerns? Note: this question should be answered separately for each author named in question 1.

2 – A Positive Opinion: a generally or ultimately positive opinion as an overall matter
1 – A Mixed or Unclear Opinion: such that it is not possible to say whether the article’s overall opinion of an author is positive or negative
0 – A Negative Opinion: a generally or ultimately negative opinion as an overall matter
X – Neutral: This article is not evaluative: it does not express opinions about the author(s) in question, but is rather strictly and neutrally factual

Appendix E: Search Results Using the Readers’ Guide Retrospective

Author / Subject | Total # Search Results
Tolstoy, L.N. | 432
Chekhov, A.P. | 266
“russian literature” | 193
Dostoevsky, F.M. | 128
Gorky, M. | 123
Turgenev, I.S. | 96
Breshko-Breshovskaya, E.K. | 53

Appendix F: Alternative spellings of “Dostoevsky”

“dostoevsky” OR “dostoyevsky” OR “dostoevskii” OR “dostoyevskii” OR “dostojevsky” OR

“dostojevskii” OR “dostoeffsky” OR “dostoyeffsky” OR “dostoeffskii” OR “dostoyeffskii” OR

“dostoieffsky” OR “dostoievsky” OR “dostoieffskii” OR “dostoievskii” OR “dosteovsky” OR

“dostoyefsky” OR “dostoievski” OR “dosteoffsky” OR “dosteovskii” OR “dostoefsky” OR

“dostoefskii” OR “dostojefsky” OR “dostojefskii” OR “dostojefski” OR “dostoevski” OR

“dosteovski” OR “dostoyevski” OR “dostojevski” OR “dostojeffski” OR “dostoyeffski” OR

“dostoeffski” OR “dostoieffski” OR “dostoievski” OR “dostojefski” OR “dostoyefski” OR

“dostoefski” OR “dostoiefski”

Alternative spellings research conducted by Nick Slaughter of the Foreign Literatures in America project.
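
A search string of this kind is easier to maintain when it is generated from a list of spellings than when it is typed by hand. The following is a minimal Python sketch, assuming the target database accepts double-quoted terms joined by OR; the function names are hypothetical, and only a short excerpt of the spellings is used. It removes repeated spellings before joining the terms.

```python
from typing import Iterable, List


def unique_in_order(terms: Iterable[str]) -> List[str]:
    """Drop repeated spellings (case-insensitively) while keeping first-seen order."""
    seen = set()
    result = []
    for term in terms:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            result.append(key)
    return result


def build_or_query(terms: Iterable[str]) -> str:
    """Join spellings into a quoted boolean OR query string."""
    return " OR ".join(f'"{t}"' for t in unique_in_order(terms))


# A short excerpt of the spellings in this appendix, with one deliberate repeat.
spellings = ["dostoevsky", "dostoyevsky", "dostoevskii", "dostoievski", "dostoievski"]
query = build_or_query(spellings)
print(query)
# prints: "dostoevsky" OR "dostoyevsky" OR "dostoevskii" OR "dostoievski"
```

The same two functions could be reused for the other authors in Appendix E once their transliteration variants are collected.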

Appendix G: Sample Chart of Periodicals within Readers’ Guide Retrospective: 1890-1982

Source Type | ISSN / ISBN | Publication Name | Publisher | Indexing Start | Indexing Stop
Magazine | 0163-2027 | 50 Plus | Reader's Digest Association, Inc. | 1/1/83 | 11/1/88
Magazine | 1548-2014 | AARP the Magazine. | AARP | 5/1/03 |
Magazine | 1041-102X | Ad Astra | National Space Society | 1/1/89 |
Magazine | 0955-2308 | Adults Learning | National Institute of Adult Continuing Education | 1/1/95 |
Magazine | 0001-8996 | Advocate | Regent Media | 1/16/01 |
Magazine | 0002-0966 | Aging | Superintendent of Documents | 11/1/82 | 1/1/96
Academic Journal | 1205-7398 | Alternatives Journal | University of Waterloo | 1/1/05 |
Magazine | | Amazing Wellness | Active Interest Media, Inc. | 1/1/10 |
Magazine | 1545-8741 | Amber Waves: The Economics of Food, Farming, Natural Resources, & Rural America | U.S. Dept. of Agriculture Economic Research Service | 2/2/04 |
Magazine | 0002-7049 | America | America Press | 1/1/83 |
Magazine | 0002-7375 | American Artist | Interweave Press, LLC | 1/1/83 |
Magazine | 1540-966X | American Conservative | American Conservative | 1/16/06 |
Magazine | 1079-3690 | American Cowboy | Active Interest Media, Inc. | 2/1/11 |
Magazine | 0194-8008 | American Craft | American Craft Council | 2/1/83 |
Academic Journal | 0002-8304 | American Education | US Department of Education | 12/1/82 | 1/3/85
Magazine | 0361-4751 | American Film | BPI Communications | 1/1/88 | 1/1/92
Magazine | 0002-8541 | American Forests | American Forests | 9/1/92 |
Academic Journal | 1549-4934 | American Geographical Society's Focus on Geography | Wiley-Blackwell | 10/15/05 |
Magazine | 1523-3359 | American Health | RD Publications Inc. | 1/1/99 | 10/1/99
Magazine | 0730-7004 | American Health (0730-7004) | RD Publications Inc. | 1/1/88 | 1/1/97
Magazine | 1092-1656 | American Health for Women | RD Publications Inc. | 12/1/96 | 1/1/99
Magazine | 0002-8738 | American Heritage | AHMC Inc. | 2/1/83 |
Magazine | 1076-8866 | American History | Weider History Group | 6/1/94 |
Magazine | 0002-8770 | American History Illustrated | Weider History Group | 1/1/83 | 3/1/94
Academic Journal | 0095-182X | American Indian Quarterly | University of Nebraska Press | 1/1/90 |
Academic Journal | 1067-8654 | American Journalism Review | University of Maryland | 3/1/93 |
Academic Journal | 0003-0937 | American Scholar | Phi Beta Kappa Society | 1/15/83 |

Appendix H: Glossary of Terms

Classification: Supervised (requires human input) method of analyzing text in which the user

first defines the labels by which a collection of words, sentences, etc., is to be classified.

Next, the user creates a training corpus of words, sentences, etc., that is already classified

according to the specified labels to train the software. The user can then input the collection

of words, sentences, etc., they want to “classify” by the labels.

Corpus: a large body of texts, often the entirety of works by an author, articles by a newspaper,

or writings about a certain subject

Keyword frequencies: How often a given word appears in a text or corpus

Latent Dirichlet Allocation: Abbreviated LDA, attributes each word in a written document to a

select number of topics determined to compose the document

Optical Character Recognition Software: Abbreviated OCR, translates PDFs and scans of either

handwritten or typed texts into machine-readable electronic text

Semantic Parsing/Analysis: Also known as opinion mining, using text analysis to determine

subjective information in written works

Shalmaneser: A supervised tool (requires human input) for semantic and syntactic parsing, which

automatically assigns text to semantic and syntactic classes. Generates output such as the

following figure:

Where the original sentence is: “Creeping in its shadow I reached a point whence I could

look straight through the uncurtained window.”

The green text is the generated analysis of the semantics of the sentence and the gray text

is the generated analysis of the syntax of the sentence.

Text Analysis Portal for Research: Abbreviated TAPoR, a collaborative project that permits

researchers to use text analysis tools for the Humanities

Topic modeling: The use of a type of statistical model that generates abstract “topics” in a

database of documents
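
Two of the glossary entries above, keyword frequencies and classification, can be illustrated together. The following is a minimal Python sketch under stated assumptions: it uses a small naive Bayes classifier with add-one smoothing as a stand-in for whatever classification software the team adopts, the labels “positive” and “negative” are hypothetical, and the training sentences are invented for illustration. It follows the three steps in the Classification entry: define labels, supply an already labeled training corpus, then classify new text.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples: List[Tuple[str, str]]):
    """Count keyword frequencies per label from (text, label) training pairs."""
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    label_counts: Counter = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text: str, model) -> str:
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        label_total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen in training
                count = word_counts[label][word]
                # Add-one smoothing keeps unseen (label, word) pairs from zeroing the score.
                score += math.log((count + 1) / (label_total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training corpus; each sentence is already classified by label.
training = [
    ("a masterful and moving novel", "positive"),
    ("brilliant prose and a great artist", "positive"),
    ("a dull and tedious novel", "negative"),
    ("clumsy prose and a tedious plot", "negative"),
]
model = train(training)
print(classify("a brilliant and moving artist", model))  # prints positive
print(classify("tedious and dull", model))               # prints negative
```

A corpus of real annotated articles would be far larger and noisier than this toy example, so the sketch shows the mechanics only and says nothing about expected accuracy.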
