1
Textual and Quantitative Analysis: Towards a new, e-mediated Social Science
Khurshid Ahmad, Lee Gillam and David Cheng, Department of Computing, University of Surrey
2
Outline
Think Tank
Rationality, Bounded Rationality and Sentiment
News Analysis and Sentiment Analysis
A method for identifying and extracting sentiment
Experiments and Evaluation
Conclusions and Future Work
3
What is the connection between these pairs of terms?
HAPPY & SAD; MORE & LESS
NORTH & SOUTH; AHEAD & BEHIND; HIGHER & LOWER
LOUDER & QUIETER; IN PROFIT & IN LOSS
OPERATIONAL & BROKEN; MORE EXPENSIVE & LESS EXPENSIVE
AT UNIVERSITY & AWAY FROM UNIVERSITY
METRO, Thursday, June 28, 2005, p. 5.
THINK TANK
4
We rely on reviews and opinion polls of various kinds:
Film & TV reviews; Book reviews; Resort reviews
Bank reviews; Automobile reviews; White-goods reviews;
Consumer surveys; ‘write your own’ reviews;
Newspaper editorials; Editors’ choice.
METRO, Thursday, June 28, 2005, p. 5.
THINK TANK
5
We rely on the sentiment of the reviewers, editors, investment experts, and ……
We do know the cost of durables, shares and holidays.
A reasonable price is rejected if the reviews are poor; an exorbitant price is acceptable if the reviews are good;
Bad reviews stick in the mind for longer than good reviews.
METRO
THINK TANK
6
Sometimes we rely on the sentiment of the more vociferous in society.
The vociferous may call black white, and white black;
The vociferous may repudiate facts and purvey fiction.
METRO
THINK TANK
7
An internal war may be due to bounded rationality: given certain structural conditions – emergent anarchy, economic scarcity, weakening state structures due to globalization – elites and groups make rational decisions to pursue their aims by violent means. Within the bounded context of their decision-making parameters, going to war may be entirely rational.
THINK TANK
Jackson, Richard (2004). ‘The Social Construction of Internal War’. In Richard Jackson (Ed.), (Re)Constructing Cultures of Violence and Peace. Amsterdam/New York: Rodopi.
8
We rely on the sentiment of safety expressed by our near and dear, and by the media.
The dears may have been mugged or burgled: the falling crime rate does not alleviate the fear of crime, the so-called reassurance gap.
METRO
THINK TANK
9
THINK TANK
Turney, Peter D. (2002). “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, July 2002, pp. 417-424. (Available at http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf).
Left column | Right column
online service | unethical practices
online experience | low funds
direct deposit | other problems
local branch | old man
low fees | lesser evil
well other | virtual monopoly
small part | probably wondering
printable version | little difference
true service | other bank
other bank | possible moment
inconveniently located | extra day
A new bank has just been launched: Punter Smith has passed his judgement on the bank. Which of the two columns tells us that he likes the new outfit?
10
THINK TANK
Turney, Peter D. (2002). “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, July 2002, pp. 417-424. (Available at http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf).
How can a machine detect the positive/negative sentiment from texts? We look at the collocation of words like excellent & poor in a text corpus.
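The pointwise mutual information is computed between word1 & word2:
$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{p(\mathit{word}_1 \,\&\, \mathit{word}_2)}{p(\mathit{word}_1)\, p(\mathit{word}_2)}$$
The semantic orientation of a phrase is given as:
$$\mathrm{SemOr}(\mathit{phrase}) = \mathrm{PMI}(\mathit{phrase}, \text{"excellent"}) - \mathrm{PMI}(\mathit{phrase}, \text{"poor"})$$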
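The two definitions on this slide can be sketched in a few lines. Turney estimated the probabilities from search-engine hit counts; this sketch instead assumes document-level co-occurrence in a tiny, made-up corpus, and adds a small constant (0.01, an assumption of the sketch) to co-occurrence counts so that the logarithm stays finite when a phrase never appears with a pivot word. The function names are hypothetical.

```python
import math

def contains(doc, phrase):
    """True if the token sequence `phrase` occurs contiguously in `doc`."""
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))

def pmi(docs, phrase, word, smoothing=0.01):
    """Document-level PMI(phrase, word) = log2(p(phrase & word) / (p(phrase) * p(word)))."""
    n_docs = len(docs)
    n_phrase = sum(contains(d, phrase) for d in docs)
    n_word = sum(word in d for d in docs)
    if n_phrase == 0 or n_word == 0:
        raise ValueError("phrase or pivot word never occurs in the corpus")
    n_both = sum(contains(d, phrase) and word in d for d in docs)
    # The small constant keeps log2 finite when the pair never co-occurs.
    p_both = (n_both + smoothing) / n_docs
    return math.log2(p_both / ((n_phrase / n_docs) * (n_word / n_docs)))

def semantic_orientation(docs, phrase):
    """SemOr(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return pmi(docs, phrase, "excellent") - pmi(docs, phrase, "poor")

# A tiny, made-up corpus: one token list per document.
DOCS = [s.split() for s in [
    "the online service was excellent",
    "excellent online service and low fees",
    "poor support and unethical practices",
    "unethical practices made it poor",
    "online service poor today",
]]
print(round(semantic_orientation(DOCS, ["online", "service"]), 3))
print(round(semantic_orientation(DOCS, ["unethical", "practices"]), 3))
```

In this toy corpus "online service" co-occurs more with excellent than with poor and gets a positive orientation, while "unethical practices" gets a strongly negative one.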
11
THINK TANK
Phrase | Semantic Orientation | Phrase | Semantic Orientation
online service | 2.780 | unethical practices | -8.484
online experience | 2.253 | low funds | -6.843
direct deposit | 1.288 | other problems | -2.748
local branch | 0.421 | old man | -2.566
low fees | 0.333 | lesser evil | -2.288
well other | 0.237 | virtual monopoly | -2.050
small part | 0.053 | probably wondering | -1.830
printable version | -0.705 | little difference | -1.615
true service | -0.732 | other bank | -0.850
other bank | -0.850 | possible moment | -0.668
inconveniently located | -1.541 | extra day | -0.286
How can a machine detect the positive/negative sentiment from texts? We look at the collocation of words like excellent & poor in a number of texts.
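Turney classified a whole review by averaging the semantic orientation of its extracted phrases: a positive average means thumbs up. Applied to the two columns of the table above (scores copied as tabulated; the function name is hypothetical), the sketch answers the question on the earlier slide.

```python
from statistics import mean

# Semantic-orientation scores for the two columns of the example (values as tabulated).
LEFT = {
    "online service": 2.780, "online experience": 2.253, "direct deposit": 1.288,
    "local branch": 0.421, "low fees": 0.333, "well other": 0.237,
    "small part": 0.053, "printable version": -0.705, "true service": -0.732,
    "other bank": -0.850, "inconveniently located": -1.541,
}
RIGHT = {
    "unethical practices": -8.484, "low funds": -6.843, "other problems": -2.748,
    "old man": -2.566, "lesser evil": -2.288, "virtual monopoly": -2.050,
    "probably wondering": -1.830, "little difference": -1.615, "other bank": -0.850,
    "possible moment": -0.668, "extra day": -0.286,
}

def classify_review(phrase_scores):
    """Recommend ("thumbs up") if the mean semantic orientation is positive."""
    average = mean(phrase_scores.values())
    return ("thumbs up" if average > 0 else "thumbs down"), average

for name, column in [("left", LEFT), ("right", RIGHT)]:
    verdict, average = classify_review(column)
    print(f"{name} column: {verdict} (average {average:.3f})")
```

The left column averages about 0.32 and the right column about -2.75, so it is the left column that tells us Punter Smith likes the new bank.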
12
THINK TANK
How can a machine detect the positive/negative sentiment from texts? We look at the collocation of words like excellent & poor in a number of texts.
Note the subjectivity: the analyst has chosen the pivotal words poor & excellent.
How well can the method be adapted to other domains?
Adaptive information extraction, for choosing the pivots automatically?
13
Japanese yen/US dollar exchange rate (decreasing solid line); US consumer price index (increasing solid line); Japanese consumer price index (increasing dashed line),
1970:1 − 2003:5, monthly observations
THINK TANK
Why does the Japanese consumer price index follow the same trend as the US CPI?
14
The return series: the first differences of the US $/Japanese yen exchange rate (P_t − P_{t−1}), 1970-2003, monthly data
THINK TANK
15
The volatility series: the four-week moving average of the squared changes in the US $/Japanese yen exchange rate (P_t − P_{t−1}), 1970-2003.
THINK TANK
High Volatility Clusters
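The return and volatility series described on the last two slides can be sketched as follows, assuming a plain list of prices (the values in the usage lines are made up). The moving-average window is counted in observations, so its meaning in calendar time depends on the sampling frequency of the data.

```python
def returns(prices):
    """First differences r_t = P_t - P_{t-1}."""
    return [p1 - p0 for p0, p1 in zip(prices, prices[1:])]

def volatility(rets, window=4):
    """Moving average of squared changes over the last `window` observations."""
    sq = [r * r for r in rets]
    return [sum(sq[i - window + 1:i + 1]) / window for i in range(window - 1, len(sq))]

prices = [100, 101, 99, 102, 102, 98]  # hypothetical exchange-rate values
r = returns(prices)
print(r)
print(volatility(r))
```

Large values of the volatility series bunch together when large changes follow large changes, which is what the "high volatility clusters" in the plot show.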
16
THINK TANK
Robert Engle’s contribution: volatility may vary considerably over time; large (small) changes in returns are followed by large (small) changes.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica Vol. 50, pp. 987-1007.
17
THINK TANK
Engle and Ng developed the concept of the news impact curve.
The idea is to condition at time t on the information available at t − 2, and thus to consider the effect of the shock ε_{t−1} on the conditional variance h_t in isolation.
The conditional variance is affected by the latest information, “the news” ε_{t−1}:
The symmetric case: positive and negative news have the same effect.
The asymmetric case: a positive and an equally large negative piece of “news” do not have the same effect on the conditional variance.
Engle, R. F. and Ng, V. K. (1993). Measuring and testing the impact of news on volatility. Journal of Finance Vol. 48, pp. 1749-1777.
$$h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2$$
$$h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}$$
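A news impact curve traces h_t as a function of the shock ε_{t−1} with the older information (here h_{t−1}) held fixed. The sketch below uses a GARCH(1,1) variance, h_t = α0 + α1 ε²_{t−1} + β1 h_{t−1}, for the symmetric case, and, purely as one illustrative asymmetric form, a GJR-type variant in which negative shocks carry an extra γ ε²_{t−1}. All parameter values are hypothetical.

```python
def garch_nic(eps, alpha0, alpha1, beta1, h_prev):
    """Symmetric news impact: h_t = alpha0 + alpha1*eps^2 + beta1*h_{t-1}, h_{t-1} fixed."""
    return alpha0 + alpha1 * eps ** 2 + beta1 * h_prev

def gjr_nic(eps, alpha0, alpha1, gamma, beta1, h_prev):
    """Asymmetric (GJR-type) news impact: negative shocks get an extra gamma*eps^2."""
    indicator = 1.0 if eps < 0 else 0.0
    return alpha0 + (alpha1 + gamma * indicator) * eps ** 2 + beta1 * h_prev

params = dict(alpha0=0.05, alpha1=0.10, beta1=0.85, h_prev=1.0)  # hypothetical values
for eps in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eps, garch_nic(eps, **params), gjr_nic(eps, gamma=0.10, **params))
```

The symmetric curve is a parabola centred on zero; with γ > 0 the asymmetric curve rises more steeply for bad news than for equally large good news.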
18
THINK TANK
Engle, R. F. and Ng, V. K. (1993). Measuring and testing the impact of news on volatility. Journal of Finance Vol. 48, pp. 1749-1777.
[Figure panels: Symmetric case; Asymmetric case]
19
Rationality, Bounded Rationality and Sentiment
News effects:
I: News announcements matter, and quickly;
II: Announcement timing matters;
III: Volatility adjusts to news gradually;
IV: Pure announcement effects are present in volatility;
V: Announcement effects are asymmetric: responses vary with the sign of the news;
VI: The effect on traded volume persists longer than on prices.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Vega, C. (2002). Micro effects of macro announcements: Real time price discovery in foreign exchange. National Bureau of Economic Research Working Paper 8959, http://www.nber.org/papers/w8959
20
Rationality, Bounded Rationality and Sentiment
The following statements are based entirely on statistical analysis of quantitative data:
Bad news in “good times” should have an unusually large impact.
In a purely ‘good times’ sample, “bad news should have unusually large effects.”
21
Rationality, Bounded Rationality and Sentiment
On average, the effect of macroeconomic news often varies with its sign. In particular, negative surprises often have greater impact than positive surprises.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Vega, C. (2002). Micro effects of macro announcements: Real time price discovery in foreign exchange. National Bureau of Economic Research Working Paper 8959, http://www.nber.org/papers/w8959
22
Rationality, Bounded Rationality and Sentiment
So, where is the news? It is not the news itself but the timing of the announcement that is analysed: the timings are used as an information proxy.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Vega, C. (2002). Micro effects of macro announcements: Real time price discovery in foreign exchange. National Bureau of Economic Research Working Paper 8959, http://www.nber.org/papers/w8959
23
Rationality, Bounded Rationality and Sentiment
Firm-level information proxies:
Closed-end fund discount (CEFD); turnover ratio, on the NYSE for example (TURN); number of initial public offerings (NIPO); average first-day returns on IPOs (RIPO); equity share in new issues (S); dividend premium (P); age of the firm, external finance, ‘size’ (log(equity)) …….
Each sentiment proxy is likely to include a sentiment component as well as idiosyncratic, non-sentiment-related components. Principal components analysis is typically used to isolate the common component.
A novel composite index built using factor analysis:
$$\mathit{Sentiment}_t = -0.358\,\mathit{CEFD}_t + 0.402\,\mathit{TURN}_{t-1} + 0.414\,\mathit{NIPO}_t + 0.464\,\mathit{RIPO}_t + 0.371\,S_t - 0.431\,P_{t-1}$$
Baker, M., and Wurgler, J. (2004). "Investor Sentiment and the Cross-Section of Stock Returns," NBER Working Paper 10449. Cambridge, Mass.: National Bureau of Economic Research, Inc.
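Once the loadings are known, computing the composite index for one period is a weighted sum. The sketch uses the loadings on this slide and assumes, as is usual for a component-based index, that each proxy has already been standardised; the dictionary keys and the example values are hypothetical.

```python
# Loadings from the slide; keys name the proxies and their time subscripts.
LOADINGS = {
    "CEFD_t": -0.358,
    "TURN_t-1": 0.402,
    "NIPO_t": 0.414,
    "RIPO_t": 0.464,
    "S_t": 0.371,
    "P_t-1": -0.431,
}

def sentiment_index(proxies):
    """Weighted sum of (assumed standardised) proxy values for one period."""
    missing = LOADINGS.keys() - proxies.keys()
    if missing:
        raise KeyError(f"missing proxies: {sorted(missing)}")
    return sum(weight * proxies[name] for name, weight in LOADINGS.items())

# Hypothetical standardised proxy values for one month.
example = {"CEFD_t": -0.5, "TURN_t-1": 1.2, "NIPO_t": 0.8,
           "RIPO_t": 1.5, "S_t": 0.3, "P_t-1": -1.0}
print(round(sentiment_index(example), 3))
```

The signs do the interpretive work: a wide closed-end fund discount or a high dividend premium pulls the index down, while heavy turnover and many well-received IPOs push it up.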
24
Rationality, Bounded Rationality and Sentiment
So, where are the news and financial data? There is plenty of both, but in a noisy state. Today’s news and figures may contradict yesterday’s or, worse still, reinforce false hopes and prejudices. The financial news and data are truly organic data, not manufactured in a laboratory.
Numerical data: time series of the price/value movements of financial instruments; c. 5 MB/day per instrument.
Textual data: text streams of different genres (news items; financial reports; company brochures; government documents; market sentiment surveys; interviews); c. 20 MB/day.
25
The Surrey Society Grids Project
A 24-node data and compute cluster (64 CPUs) interfaced to a ‘real world’ data stream (Reuters News and Financial Time Series Feed) for capturing, analysing and fusing quantitative and ‘qualitative’ data.
Reuters feed: 2 dedicated data lines; a PC and a Sun machine for feed management and associated networking.
A small but well-formed grid – for creating a data nursery
26
Surrey Society Grids Architecture
[Figure: Surrey Society Grids architecture. Components: client machine; main cluster running the Text and Time Series Service; grid cluster of 24 slaves; streaming textual data; streaming numeric data. Steps: send service request; distribute tasks; receive results; notify user about results.]
Surrey Grid
• Given an allocated task, the slave machines retrieve the corresponding data from the data providers.
• The main cluster monitors the slave machines until they have completed their tasks, and subsequently combines the interim results.
• The final result is sent back to the client machine.
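The distribute/monitor/combine cycle above is a scatter-gather pattern. The sketch below imitates it on one machine, with threads standing in for the slave machines and word counting as the task; in the real grid the tasks run on separate nodes and fetch their own data, and the function names here are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(texts):
    """Worker task: word frequencies for one chunk of texts (an interim result)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def run_job(texts, n_workers=4):
    """Split the texts into chunks, farm them out, then combine the interim results."""
    chunks = [texts[i::n_workers] for i in range(n_workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map waits for each worker, playing the role of the monitoring main cluster.
        for interim in pool.map(count_words, chunks):
            total.update(interim)
    return total

news = ["Shares rose", "shares fell", "Bank shares rose"]
print(run_job(news).most_common(3))
```

Because counts simply add up, the combined result is independent of how the texts were split across workers, which is what makes this kind of corpus processing easy to parallelise.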
27
Surrey Society Grids: Streaming Data
STREAMING ECONOMIC/POLITICAL NEWS: Reuters; Yahoo; Bloomberg; BBC; Al Jazeera
28
Surrey Society Grid: Performance
Increasing the throughput: we have created a 24-node grid infrastructure, which can provide access to up to 64 processors simultaneously.
Processing the (complete) RCV1 corpus: 181 million words in 806,791 texts
No. of processors | Time (seconds)
1 (Dell PowerEdge 2650) | 53000
16 | 3572
64 | 1683
29
Surrey Society Grid: Performance
Automatic extraction and annotation of sentiment-bearing words in a 1,000,000-word text corpus (four days’ output from the Reuters news feed), using automatically extracted key words and an automatically extracted local grammar for pattern identification.
[Figure: Number of words against hours from midnight, Nov. 15th, 2004; series: Filtered Positive, Filtered Negative.]
30
Surrey Society Grid: Algorithms and Methods
We have developed a method for visualising and correlating the sentiment and instrument time series, both as text (and numbers) and graphically.
31
Surrey Society Grid: Algorithms and Methods
Interface the grid to local news media (e.g. Bradford Argus & Burnley Express) and local data repositories – crime statistics (crime surveys and police data), ethnicity compliance data, housing queues, field data
32
Surrey Society Grid: Social Science Data?
The real world | Genre
News Reports; Regulatory Body Reports | Informative
Commentaries; Letters to the Editor; Rumour-laden e-mails | Appellative
Semi-structured interviews; Confidence Surveys | Expressive
Language and text are constitutive (and not merely representational): but ‘society is not reducible to language and linguistic analysis (Hodgson 2000:62). Discourses are broader than language, being constituted not just in texts, but also in definite institutional and organizational practices’ (Jackson 2004). But text is all we have after the event, the interview, the survey.
33
Surrey Society Grid: Social Science Data?
Data sources | Financial Economics | Sociology of Crime; Crime Science | Social Anthropology
Quantitative (shared) | Macro-micro Economic Indicators; Census Statistics; Survey of Social Attitudes; Life-style and Well-being Statistics
Quantitative | Market Movement | Crime Statistics | Ethnicity-related data
Qualitative (shared) | Political News (Reports, Editorials, Letters to the Editor); Political and Social Opinion Polls; Consumer Confidence Survey
Qualitative | Investor/Trader Confidence Surveys; Regulatory Body Output; Financial News | Citizen Confidence Surveys; Police Forces/Home Office Reports; Crime Reports | Ethnic Minority Surveys; Police Forces/Home Office Reports; Crime Reports
34
Surrey Society Grid: Social Science Data?
•There is no visible technique in social science research methodology that can improve researchers’ productivity in collecting and analysing large volumes of speech and text.
•Social scientists survey, and occasionally interview, interesting individuals in various social groups, then analyse the survey forms and quantify.
•And what about the data collected in the field? It is buried in tombs, never to be taken out again.
•Most text, if it is analysed at all, is hand-coded by the social science researcher, and the proxy of the interpretation of the codes is then presented as objective analysis.
35
Surrey Society Grid: A Case Study
•We present a method for systematically identifying sentiment bearing phrases in large volumes of streaming texts – a local grammar comprising templates to extract the phrases with a minimal number of false positives.
•The sentiments are aligned with quantitative (time-varying) information, and the results are tested for co-integration and for Granger causality.
•The grammar itself is constructed automatically from a corpus of domain specific texts
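A Granger-causality test asks whether past values of one series (say, a sentiment series) improve the prediction of another beyond what the latter's own past already gives. Below is a minimal sketch of the standard bivariate F-test with ordinary least squares; the lag order of 2 and the simulated data are assumptions, co-integration testing is not shown, and statsmodels offers a ready-made `grangercausalitytests` in `statsmodels.tsa.stattools`.

```python
import numpy as np

def lagged(series, p):
    """Columns k = 1..p hold series lagged by k, aligned to t = p..T-1."""
    T = len(series)
    return np.column_stack([series[p - k:T - k] for k in range(1, p + 1)])

def granger_f(y, x, p=2):
    """F statistic for H0: lags of x add nothing to an AR(p) model of y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])
    unrestricted = np.hstack([restricted, lagged(x, p)])
    rss = []
    for X in (restricted, unrestricted):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        rss.append(resid @ resid)
    df_den = len(target) - unrestricted.shape[1]
    return ((rss[0] - rss[1]) / p) / (rss[1] / df_den)

rng = np.random.default_rng(0)
x = rng.normal(size=300)                           # e.g. a sentiment series
y = np.zeros(300)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=299)  # y follows x with a one-step lag
print(granger_f(y, x), granger_f(x, y))
```

A large F for "x helps predict y" alongside a small F in the reverse direction is the pattern one would look for before claiming that sentiment leads the market rather than the other way round.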
36
Surrey Society Grid: A Case Study
Of all the contested boundaries that define the discipline of sociology, none is more crucial than the divide between sociology and economics […] Talcott Parsons, for all [his] synthesizing ambitions, solidified the divide. “Basically,” […] “Parsons made a pact ... you, economists, study value; we, the sociologists, will study values.” If the financial markets are the core of many high-modern economies, so at their core is arbitrage: the exploitation of discrepancies in the prices of identical or similar assets. Arbitrage is pivotal to the economic theory of financial markets. It allows markets to be posited as efficient without all individual investors having to be assumed to be economically rational.
MacKenzie, Donald (2000b). “Long-Term Capital Management: a Sociological Essay.” In Herbert Kalthoff, Richard Rottenburg and Hans-Jürgen Wagener (Eds), Ökonomie und Gesellschaft. Marburg: Metropolis, pp. 277-287.
37
Rationality, Bounded Rationality and Sentiment
•A financial economist can analyse quantitative data using a large body of methods and techniques in statistical time series analysis, both on “fundamental data” (related, for example, to the fixed assets of an enterprise) and on “technical data” (for example, share price movements);
•The economist can study the behaviour of a financial instrument, for example individual shares or currencies, or aggregated indices associated with stock exchanges, by looking at the changes in the value of the instrument at different time scales, ranging from minutes to decades;
•Financial investors/traders try to discover the market sentiment, looking for consensus in expectations, rising prices on falling volumes, and information/assistance from back-office analysts;
•The efficient market hypothesis suggests that quirks caused by sentiments can be rectified by the supposed inherent rationality of the majority of the players in the market.
38
Rationality, Bounded Rationality and Sentiment
Recent developments in financial economics, signified by the emergence of derivatives and arbitrage, show the triumph of rational reasoning: such instruments/strategies were created on the basis of mathematical models (Black and Scholes 1972), and the trading can be monitored using the self same models (Miller 1990);
The assumption of overarching rational behaviour has been reviewed by Herbert Simon (1978/1992) and Daniel Kahneman (2003), and arguments have been presented in favour of a model of bounded rationality, where the actors in a given social situation prefer to ignore facts and trust their own version of reality, and the efficient market mechanisms fail to operate;
39
News Analysis and Sentiment Analysis
Qualitative research methods are being used in financial economics, and in sociological studies of financial markets, for systematically studying the hopes and fears of traders, investors and regulators in the analysis of the behaviour of the markets.
Since 2000, the analysis of news wires has become selective and targeted. Some researchers choose news related to economic and financial topics, for example news about employment, and distinguish between scheduled and non-scheduled news announcements;
40
News Analysis and Sentiment Analysis
Some pre-select keywords that indicate a change in the value of a financial instrument, including metaphorical terms like above, below, up and down, and use them to ‘represent’ positive/negative news stories.
Some use the frequency of collocational patterns for assigning a ‘feel-good/bad’ score to the story:
‘Good’ news stories appear to comprise collocates like revenues rose, share rose;
‘Bad’ news stories contain profit warning, poor expectation;
‘Neutral’ stories contain collocates such as announces product, alliance made;
The ‘sentiment’ of the story is then correlated with that of a financial instrument cited in the stories and inferences made.
DeGennaro, R., and Shrieves, R. (1997). ‘Public information releases, private information arrival and volatility in the foreign exchange market’. Journal of Empirical Finance Vol. 4, pp. 295-315.
Koppel, M. and Shtrimberg, I. (2004). ‘Good News or Bad News? Let the Market Decide’. In AAAI Spring Symposium on Exploring Attitude and Affect in Text. Palo Alto: AAAI Press, pp. 86-88.
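A collocation-based feel-good/bad score can be sketched as follows. The phrase lists are seeded from the examples on this slide; the scoring rule (good minus bad, divided by their sum) and the function names are assumptions for illustration, not the method of the cited studies. Stories with no good or bad collocates, such as those containing only "announces product", come out neutral.

```python
import re

# Collocates taken from the examples on this slide; a real system would use many more.
GOOD = ["revenues rose", "share rose"]
BAD = ["profit warning", "poor expectation"]

def count_phrases(text, phrases):
    """Count whole-word occurrences of each phrase in a normalised copy of the text."""
    normalised = " ".join(re.findall(r"[a-z]+", text.lower()))
    return sum(len(re.findall(r"\b" + re.escape(p) + r"\b", normalised)) for p in phrases)

def classify_story(text):
    """Return (label, score) with score = (good - bad) / (good + bad), or 0 if neither occurs."""
    good, bad = count_phrases(text, GOOD), count_phrases(text, BAD)
    score = (good - bad) / (good + bad) if good + bad else 0.0
    label = "good" if score > 0 else "bad" if score < 0 else "neutral"
    return label, score

print(classify_story("Revenues rose again and the share rose 5%."))
print(classify_story("Company issues profit warning"))
print(classify_story("Company announces product launch"))
```

The resulting per-story scores can then be aggregated over time and set against the price series of the instrument named in the stories.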
41
A method for identifying and extracting sentiment
No proxies, but the real data. We adopt a text-driven and bottom-up method: starting from a collection of texts in a specialist domain, together with a representative general language corpus, we use a five-step algorithm for identifying discourse patterns with more or less unique meanings, without any overt access to an external knowledge base.
42
An algorithm for identifying and extracting sentiment
I. Select training corpora: a randomly sampled special language corpus and a general language corpus;
II. Extract key words;
III. Extract key collocates;
IV. Extract local grammar using collocation analysis and relevance feedback;
V. Assert the grammar as a finite state automaton.
43
Experiments and Evaluation of sentiment analysis method
I. Select training corpora
Training corpora: the British National Corpus, comprising 100 million tokens distributed over 4,124 texts (Aston and Burnard 1998); Reuters Corpus Volume 1 (RCV1), comprising news texts produced in 1996-1997 and containing 181 million words distributed over 806,791 texts.
44
Experiments and Evaluation of sentiment analysis method
II. Extract key words
The frequencies of individual words in the RCV1 were computed using System Quirk.
To describe how our method works we will use a randomly selected component of the corpus, the output of February 1997, henceforth referred to as the RCV1-Feb97 corpus.
The RCV1-Feb97 corpus contains 14 million words distributed over 63,364 texts.
45
Experiments and Evaluation of sentiment analysis method
Ranks | RCV1-Feb97 (N = 14 million) | Cumulative number of tokens (%) | British National Corpus (N = 100 million) | Cumulative number of tokens (%)
1-10 | the, to, of, in, a, and, said, on, s, for | 0.87 M (21.3%) | the, of, and, a, in, to, for, is, as, that | 22.3 M (22.3%)
11-20 | at, that, was, is, it, by, with, from, percent, be | 0.28 M (6.8%) | was, I, on, with, as, be, he, you, at, by | 6.51 M (6.5%)
21-30 | as, he, million, year, its, will, but, has, would, were | 0.17 M (4.2%) | are, this, have, but, not, from, had, his, they, or | 4.23 M (4.2%)
31-40 | an, not, are, have, which, had, up, n, new, market | 0.13 M (3.3%) | which, an, she, where, here, we, one, there, all, been | 3.05 M (3.1%)
41-50 | this, we, after, one, last, company, u, they, bank, government | 0.10 M (2.6%) | their, if, has, will, so, would, no, what, can, when | 2.35 M (2.4%)
46
Experiments and Evaluation of sentiment analysis method
N(RCV1-Feb97) = 14,244,349; N(BNC) = 100,000,000
Token | RCV1-Feb97 rank | f(RCV1-Feb97) | (a) f/N(RCV1-Feb97) | BNC rank | f(BNC) | (b) f/N(BNC) | Weirdness (a/b)
percent | 19 | 65763 | 0.462% | 3394 | 2928 | 0.003% | 157.84
market | 40 | 36349 | 0.255% | 301 | 30078 | 0.030% | 8.49
company | 46 | 29058 | 0.204% | 219 | 40118 | 0.040% | 5.09
bank | 49 | 28041 | 0.197% | 562 | 17932 | 0.018% | 10.99
shares | 56 | 23352 | 0.164% | 1285 | 8412 | 0.008% | 19.51
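The weirdness ratio in the last column is the relative frequency of a token in the specialist corpus divided by its relative frequency in the general corpus. A minimal sketch, recomputing two rows from the counts and corpus sizes in the table:

```python
def weirdness(f_special, n_special, f_general, n_general):
    """Relative frequency in the specialist corpus over relative frequency in the general corpus."""
    return (f_special / n_special) / (f_general / n_general)

N_RCV1_FEB97 = 14_244_349
N_BNC = 100_000_000

for token, f_s, f_g in [("percent", 65763, 2928), ("market", 36349, 30078)]:
    print(token, round(weirdness(f_s, N_RCV1_FEB97, f_g, N_BNC), 2))
```

The recomputed ratios (about 157.68 for percent and 8.48 for market) are close to the tabulated 157.84 and 8.49. Tokens with a high ratio are candidate key words of the specialist domain, while function words such as "the" score close to 1.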
47
Experiments and Evaluation of sentiment analysis method
III. Extract key collocates
Collocates of percent (f = 65763):
Collocate | f | Left | Right | Total | z-score
up | 5315 | 4360 | 955 | 5315 | 15.91
rose | 4361 | 3988 | 373 | 4361 | 13.04
rise | 2391 | 980 | 1411 | 2391 | 7.12
down | 2291 | 1636 | 655 | 2291 | 6.82
fell | 2074 | 1844 | 230 | 2074 | 6.17
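Collocates can be found by counting the words in a window to the left and right of each occurrence of the key word, then standardising the totals. The sketch assumes a window of five tokens on each side and a z-score computed over all collocates of the node word, (f − mean) / standard deviation; the tabulated z-scores are consistent with that definition, though the exact settings used in the study are not stated here.

```python
from collections import Counter
from statistics import mean, pstdev

def collocates(tokens, node, span=5):
    """Count words within `span` tokens to the left and right of each occurrence of `node`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - span):i])
        right.update(tokens[i + 1:i + 1 + span])
    return left, right

def z_scores(left, right):
    """z = (total - mean of totals) / population std of totals, over all collocates."""
    totals = left + right
    values = list(totals.values())
    sigma = pstdev(values) if values else 0.0
    if sigma == 0:
        return {}
    mu = mean(values)
    return {w: (f - mu) / sigma for w, f in totals.items()}

tokens = "shares rose 10 percent to 14 billion and bonds fell 2 percent".split()
left, right = collocates(tokens, "percent")
z = z_scores(left, right)
print(sorted(z.items(), key=lambda kv: -kv[1])[:3])
```

Keeping the left and right counts separate matters: in the table, "rose" and "fell" occur overwhelmingly to the left of percent, which is what later lets them anchor local grammar patterns such as "rose 10 percent to".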
48
Experiments and Evaluation of sentiment analysis method
IV. Extract local grammar using collocation and relevance feedback
Pattern | f | Collocate | Left | Right | z-score
10 percent to | 108 | rose | 24 | 0 | 5.45
by 10 percent to | 18 | rose | 5 | 0 | 2.27
rose 10 percent to | 14 | billion | 0 | 7 | 4.24
rose 20 percent to | 11 | billion | 1 | 7 | 6.02
49
Experiments and Evaluation of sentiment analysis method
V. Assert the grammar as a finite state automaton
The (re-)collocation patterns can then be asserted as finite state automata for each of the movement verbs and spatial preposition metaphors.
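Patterns such as "rose 10 percent to 14 billion" can be asserted as a small deterministic automaton over word classes. The sketch below hand-codes one such automaton; the word lists (seeded from the collocates up, rose, rise, down, fell), the state numbering and the longest-match scanning strategy are assumptions for illustration, not the grammar learned in the study.

```python
import re

# Hypothetical word lists seeded from the collocates of "percent".
UP = {"up", "rose", "rise"}
DOWN = {"down", "fell"}
FUNCTION_WORDS = {"by", "percent", "to", "billion", "million"}

def token_class(tok):
    """Map a token to the input alphabet of the automaton."""
    if tok in UP:
        return "UP"
    if tok in DOWN:
        return "DOWN"
    if re.fullmatch(r"\d+(\.\d+)?", tok):
        return "NUM"
    return tok if tok in FUNCTION_WORDS else "OTHER"

# state -> {token class -> next state}
# Accepts: VERB [by] NUM percent [to NUM [billion|million]]
TRANSITIONS = {
    0: {"UP": 1, "DOWN": 1},
    1: {"NUM": 2, "by": 5},
    5: {"NUM": 2},
    2: {"percent": 3},
    3: {"to": 4},
    4: {"NUM": 6},
    6: {"billion": 7, "million": 7},
}
ACCEPTING = {3, 6, 7}

def find_patterns(tokens):
    """Return (phrase, polarity) for each longest match, scanning left to right."""
    matches, i = [], 0
    while i < len(tokens):
        state, j, last_accept = 0, i, None
        while j < len(tokens):
            state = TRANSITIONS.get(state, {}).get(token_class(tokens[j]))
            if state is None:
                break
            j += 1
            if state in ACCEPTING:
                last_accept = j
        if last_accept is None:
            i += 1
            continue
        polarity = "positive" if token_class(tokens[i]) == "UP" else "negative"
        matches.append((" ".join(tokens[i:last_accept]), polarity))
        i = last_accept
    return matches

print(find_patterns("profits rose 10 percent to 14 billion yen".split()))
print(find_patterns("shares fell by 3 percent".split()))
```

Because the automaton only fires on complete patterns, isolated occurrences of "rose" or "fell" are ignored, which is how the local grammar keeps false positives down.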
50
Experiments and Evaluation of sentiment analysis method
51
Experiments and Evaluation of sentiment analysis method
52
Experiments and Evaluation of sentiment analysis method
53
Experiments and Evaluation of sentiment analysis method
•The local grammar is used to identify sentences that contain sentiment-bearing phrases, and can annotate the phrases automatically.
•The graph shows the filtering power of the local grammar patterns: in a corpus of between 10,000 and 100,000 tokens per hour, between 1,000 and 10,000 sentiment words are identified hourly, and the patterns narrow these down to between 10 and 100 ‘true’ sentiment-bearing sentences.
[Figure: Number of words (log scale) against hours from midnight, Nov. 15th, 2004; series: Raw Sentiment, Filtered Sentiment, Total number of Tokens.]
54
Experiments and Evaluation of sentiment analysis method
Changes in the total number of positive/negative words together with those that are used in the local grammars (filtered positive / negative words) and total number of words.
[Figure: Number of words (log scale) against hours from midnight, Nov. 15th, 2004; series: Raw Positive Words, Raw Negative Words, Filtered Positive Words, Filtered Negative Words, Total Number of Words.]
55
Experiments and Evaluation of sentiment analysis method
Changes in the total number of positive/negative words together with those that are used in the local grammars (filtered positive / negative words) and total number of words.
[Figure: Number of words against hours from midnight, Nov. 15th, 2004; series: Filtered Positive, Filtered Negative.]
56
Experiments and Evaluation of sentiment analysis method
Increasing the throughput: we have created a 24-node grid infrastructure, which can provide access to up to 64 processors simultaneously.
Processing the (complete) RCV1 corpus (181 million words in 806,791 texts) on a single machine (a Dell PowerEdge 2650) takes 53300 seconds;
Using 16 processors we gain a throughput increase by a factor of about 15 (3572 seconds);
Using 64 processors, the time is roughly halved again (1683 seconds).
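The throughput figures can be summarised as speed-up, S(p) = T(1)/T(p), and parallel efficiency, E(p) = S(p)/p. A minimal sketch using the timings quoted on this slide:

```python
def speedup_table(t_serial, timings):
    """Map processor count p to (speed-up T(1)/T(p), efficiency speed-up/p)."""
    return {p: (t_serial / t, t_serial / t / p) for p, t in timings.items()}

rows = speedup_table(53300, {16: 3572, 64: 1683})
for p, (s, e) in rows.items():
    print(f"{p:>3} processors: speed-up {s:5.1f}, efficiency {e:.0%}")
```

With these numbers, 16 processors give a speed-up of about 14.9 (efficiency around 93%), while 64 processors give about 31.7 (efficiency around 49%): the fourfold increase in processors from 16 to 64 buys roughly a twofold gain.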
57
Conclusions and Future Work
Though we have devised programs that can learn unambiguous patterns of use of positive or negative sentiment, a sentence is always used in the context of other sentences, and its meaning may change when the inference is made on the basis of one sentence only;
One can argue that a new text is a response to some or all of the existing texts, and in that sense each text is contextualised within a network of other texts: even if all the existing texts unambiguously expressed a positive sentiment, a new text with strong negative sentiment may invalidate all of the positive sentiment.
58
Conclusions and Future Work
The range of quantitative analysis techniques includes wavelet analysis (Ahmad et al. 2004), fuzzy-logic knowledge bases (Popoola et al. 2004), and case-based reasoning;
These techniques may be used to create a confidence index, or sentiment index;
These techniques can be extended to new areas, such as the reassurance gap in policing and the totalising war discourse that leads to ethnic/racial conflicts.
59
Conclusions and Future Work
Quantitative analysis methods developed in the Surrey Society Grids project can be used in the analysis of on-line or accessible data such as crime statistics, for sociology of crime, and labour force surveys, based on race/ethnicity for anthropology;
The fusion of the results of the textual and quantitative analysis can, in turn, be used to automatically produce a crime confidence index, for measuring the fear of crime, and a conflict index, for measuring ethnic/racial tension in a community;
60
Conclusions and Future Work
Data Sources | Financial Economics | Sociology of Crime; Crime Science | Social Anthropology
Quantitative (shared) | Macro-micro Economic Indicators; Census Statistics; Survey of Social Attitudes; Life-style and Well-being Statistics
Quantitative | Market Movement | Crime Statistics | Ethnicity-related data
Qualitative (shared) | Political News (Reports, Editorials, Letters to the Editor); Political and Social Opinion Polls; Consumer Confidence Survey
Qualitative | Investor/Trader Confidence Surveys; Regulatory Body Output; Financial News | Citizen Confidence Surveys; Police Forces/Home Office Reports; Crime Reports | Ethnic Minority Surveys; Police Forces/Home Office Reports; Crime Reports
61
Data | Investor Psychology | Sociology of Crime | Anthropology of Ethnicity | Methods/Techniques
Qualitative data: Informative | Financial News and Reports; State-of-the-Economy Reports; Company Reports | National News Reportage & Editorials; Police Authority & Other Reports; Policy Documents | National and International News Reportage & Editorials; Local Govt. Reports; Policy Documents | Corpus Ling. & IE: Terminological, Grammatical and Ontological Analysis for Identifying and Disambiguating sentiment and named entities
Qualitative data: Appellative | News Commentaries on financial instruments | ‘Letters to the Editor’; Web Sites | ‘Letters to the Editor’; Web Sites | Ditto
Qualitative data: Expressive | Focus Group Encounters | Semi-structured interviews | Semi-structured interviews | Discourse Analysis
 | Executive movements; corporate entity identification | Anonymisation of field data | | IE: Named Entity extractors
Quantitative data: High Frequency (Numerical) | Technical Data (e.g. Stock Price Movement; Price/Earning Ratio) | Crime Statistics | Labour Force Surveys; Educational Achievement Surveys | Wavelet analysis; Monte-Carlo type bootstrapping
Quantitative data: Low Frequency (Numerical) | Company demographics: fixed assets | UK census data | UK census data | Data Analysis; Aggregation; Visualisation; Case-Based Reasoning (CBR)
Quantitative data: Indeterminate | Questionnaires | Questionnaires | Questionnaires | Ditto
Fusion | Confidence Index | Crime Index | Conflict Index | Data Mining; Visualisation techniques
 | Investment decision (buy/sell) | Policy formation / evaluation | Policy formation / evaluation | Ontology learning for Rule-based / Case-Based Reasoning
62
[Figure: framework diagram. Columns: Investor Psychology, Sociology of Crime, Anthropology of Ethnicity, Methods/Techniques. Rows: Qualitative data (Informative, Appellative, Expressive); Quantitative data (High frequency (numerical), Low frequency (numerical), Indeterminate); Fusion.]