Page 1: AI Seminar

August 29, 2001 Melanie Martin - AI Seminar 1

AI Seminar

Our web page is at:

www.cs.nmsu.edu/~gradrep

Under “Events” in left frame

Page 2: AI Seminar

Identifying Ideological Point of View

Melanie Martin

August 29, 2001

Page 3: AI Seminar

Outline of this presentation
– What is AI???
– Introduction and Motivation
– The proposed system
– Ideology
– Discourse
– Statistical NLP and Machine Learning
– Internet
– Conclusion

Page 4: AI Seminar

What is AI???

“The practice of designing systems that possess and acquire knowledge and reason with knowledge.” (Tanimoto 1987)

“The design and study of computer programs that behave intelligently.” (Dean, Allen, Aloimonos 1995)

“The branch of computer science concerned with making computers behave like humans.” (Webopedia)

Page 5: AI Seminar

What is AI???

But then, what is intelligence???
– “the capacity for learning, reasoning, understanding, and similar forms of mental activity; aptitude in grasping truths, relationships, facts, meanings, etc.” (Webster’s Encyclopedic Unabridged Dictionary of the English Language 1996)

Page 6: AI Seminar

What is AI???

Categories under AI on Cora (http://cora.whizbang.com/):
– Agents
– Data Mining
– Expert Systems
– Games and Search
– Knowledge Representation
– Machine Learning (Theory, Case-Based, Rule Learning, ...)
– Natural Language Processing
– Planning
– Robotics
– Speech
– Theorem Proving
– Vision & Pattern Recognition

Page 7: AI Seminar

What is AI???

Goals in AI
– Engineering: Solve real-world problems. Build systems that exhibit intelligent behavior.
– Scientific: Understand what kind of computational mechanisms and knowledge are needed for modeling intelligent behavior.

Page 8: AI Seminar

What is AI???

Do we really want to model humans?
– Seem like our best example, but….
– Should we build airplanes with wings that flap like birds?

How do we know we did it?
– Turing test?
• Focus on behavior instead of internal algorithm
• Defines success in terms of human intelligence
• Not well founded

Page 9: AI Seminar

What is AI???

A couple of recurring issues:
– How important is cognitive modeling in our systems?
– How do we balance scientific and engineering goals?
– How do we evaluate our system?

Page 10: AI Seminar

What is AI???

So let’s get to the system we want to talk about today…..

This system will be in the area of Natural Language Processing aka Computational Linguistics

Page 12: AI Seminar

Introduction and Motivation

Your back hurts, so you go to the web to find out what you can do, but there is too much information!

You are still bothered by the Florida election results and want to read a few sample articles with differing points of view. How can you find them?

Page 13: AI Seminar

Introduction and Motivation

Suppose we could take information from web pages and Usenet newsgroups on a given topic and segment, classify or cluster it by ideological point of view…..

This talk is about what it might take to develop such a system.

Page 14: AI Seminar

Introduction and Motivation

Sounds like a cool toy, but would it make any research contribution?

Areas where it could contribute:
– natural language understanding
– information retrieval
– information extraction
– internet structure

Page 15: AI Seminar

Introduction and Motivation

But will it save the world?

Maybe not, but there is social value in analyzing ideological point of view
– find implicit ideological content
– better informed, more rational discussion of important issues

Page 17: AI Seminar

The Proposed System

Let’s recall what we want to do:

Build a system that could take information from web pages and Usenet newsgroups on a given topic and segment, classify or cluster it by ideological point of view…..

Page 18: AI Seminar

The Proposed System

[Diagram of the proposed system. Box and arrow labels: User inputs topic; Search Engine; Internet: Web pages, Usenet; Set of documents on topic; Topic Classifier, Filter; Ideological Classifier; Docs on topic classified by IPV (ideological point of view)]

Page 19: AI Seminar

The Proposed System

Immediately some issues arise:
– Can we come up with a definition of ideological point of view that is computationally feasible?
– To what extent do we need to understand the text?
– Would modeling human text understanding help?

Page 20: AI Seminar

The Proposed System

More issues:
– Can the structure of the internet help us?
– What kind of knowledge is needed and can it be learned?
– How are we going to evaluate our system?

Page 22: AI Seminar

Ideology

Working definition from van Dijk: “Ideologies are the fundamental beliefs of a group and its members.”
– No negative evaluation
– Subjective, since beliefs are subjective
– Discourse plays a key role in development and promulgation of ideologies

Page 23: AI Seminar

Ideology

What do we mean by groups?
– More than one person
– Fewer than the entire society or culture
– Some level of permanency or common goals
– Some membership criteria
– Member identification with the group
– Basis for self-definition and commonality
– Structure, possibly informal

Page 24: AI Seminar

Ideology

General strategy of most ideological discourse (van Dijk’s Ideological Square):

– Emphasize positive things about Us
– Emphasize negative things about Them
– De-emphasize negative things about Us
– De-emphasize positive things about Them

Polarization; Us versus Them

Page 25: AI Seminar

Ideology

How are these strategies instantiated in discourse?
– What is there:
• argument structure
• syntactic patterns
• style and non-literal language
• actor descriptions
• thematic structure
• topoi

Page 26: AI Seminar

Ideology

– What is not there:
• implication
• presupposition
• inference
• goals and plans

Page 27: AI Seminar

Ideology

Disclaimers, selected examples:
– Apparent Negation: I have nothing against X, but...
– Apparent Concession: They may be very smart, but...
– Apparent Empathy: They may have had problems, but...
– Apparent Effort: We do everything we can, but...

Positive self-representation and face keeping
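
As a rough illustration of how disclaimers like these might be spotted automatically, the sketch below flags them with regular expressions. The pattern table and the function name `find_disclaimers` are assumptions made up for this example; they cover only the four sample phrasings above and are not taken from van Dijk's work. Real disclaimers vary far more than this.

```python
import re

# Hypothetical surface patterns, one per disclaimer type on the slide.
# They only match wordings close to the slide's own examples.
DISCLAIMER_PATTERNS = {
    "apparent_negation": re.compile(r"\b(?:have|has) nothing against\b.*\bbut\b", re.I),
    "apparent_concession": re.compile(r"\bmay be (?:very )?\w+,? but\b", re.I),
    "apparent_empathy": re.compile(r"\bmay have had \w+,? but\b", re.I),
    "apparent_effort": re.compile(r"\bdo everything (?:we|I) can,? but\b", re.I),
}


def find_disclaimers(sentence):
    """Return the sorted names of all disclaimer patterns found in the sentence."""
    return sorted(
        name for name, pattern in DISCLAIMER_PATTERNS.items() if pattern.search(sentence)
    )


examples = [
    "I have nothing against them, but they should leave.",
    "They may be very smart, but they are lazy.",
    "The weather is nice.",
]
for sentence in examples:
    print(find_disclaimers(sentence), "<-", sentence)
```

Matches like these would only be candidate clues; they could serve as one kind of feature among many rather than as a decision on their own.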

Page 28: AI Seminar

Ideology

Linguistics
– van Dijk (1998)
– Blommaert & Verschueren (1998)
– Wang (1993)
– Wortham & Locher (1996)

Page 29: AI Seminar

Ideology

The Systems
– Ideology Machine - 1965 to 1973 - Abelson et al.
– Tale-Spin - 1976 - Meehan
– Politics - 1979 - Carbonell
– Pauline - 1987 - Hovy
– Viewgen - 1991 - Ballim & Wilks
– Tracking Point of View in Narrative - 1994 - Wiebe
– Spin Doctor - 1994 - Sack
– Terminal Time - 2000 - Mateas et al.

Page 30: AI Seminar

Ideology

Some issues
– Evaluation!!!
– Hard-coded knowledge
– Domain dependence
– Cognitive plausibility
– More precise definitions

Page 31: AI Seminar

Ideology

What do we want to take with us?
– van Dijk’s definitions augmented by Sack and Wiebe
– mine everything for clues to ideological point of view

Page 33: AI Seminar

Discourse

Now that we have a working definition of ideology and some ideas about things that might be clues, the question becomes: how do we find them?

First we are going to look at theories of discourse structure that might be useful.

Page 34: AI Seminar

Discourse

Computational Linguistics
– Hobbs (1979)
– Mann & Thompson (RST) (1988)
– Grosz & Sidner (G&S) (1986)
– Morris & Hirst (Lexical chains) (1991)

Psycholinguistics
– Kintsch (1994)

Page 35: AI Seminar

Discourse

Issues
– do we need it at all?
– implementation
• Hobbs, G&S, RST
– finite number of fixed primitives
• Hobbs, RST
– world knowledge
• Hobbs
– domain specific

Page 36: AI Seminar

Discourse

A reasonable first approach: Lexical Chains (Morris & Hirst)

Sequences of related words spanning a topical unit in the text
– based on lexical cohesion
– encapsulates context
– helps identify key phrases
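
To make the idea concrete, here is a minimal sketch of greedy chain building. It is a simplification for illustration, not Morris & Hirst's procedure: the hand-made `TOY_THESAURUS`, the single-category notion of relatedness, and the `max_gap` parameter are all assumptions of this example.

```python
# Toy thesaurus (hypothetical): each known word maps to one category label.
TOY_THESAURUS = {
    "election": "politics", "vote": "politics", "ballot": "politics", "recount": "politics",
    "court": "law", "judge": "law", "ruling": "law",
}


def build_lexical_chains(tokens, thesaurus, max_gap=10):
    """Greedily chain tokens that share a category and lie within max_gap positions."""
    chains = []  # each chain: {"category": str, "words": [(position, word), ...]}
    for pos, word in enumerate(tokens):
        category = thesaurus.get(word.lower())
        if category is None:
            continue  # word is not in the thesaurus, so it cannot join a chain
        for chain in chains:
            last_pos = chain["words"][-1][0]
            if chain["category"] == category and pos - last_pos <= max_gap:
                chain["words"].append((pos, word))
                break
        else:
            # No open chain is close enough: start a new one.
            chains.append({"category": category, "words": [(pos, word)]})
    return chains


tokens = "the vote went to court after the ballot recount and the judge issued a ruling".split()
chains = build_lexical_chains(tokens, TOY_THESAURUS)
for chain in chains:
    print(chain["category"], [word for _, word in chain["words"]])
```

The gap limit is what ties a chain to a topical unit: once related words stop appearing for a while, a later occurrence starts a new chain.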

Page 37: AI Seminar

Discourse

Lexical chains could help us in:
– topic segmentation
– intentional structure
– lexical features for a classifier

Page 38: AI Seminar

Discourse

Lexical chains are easy to implement, but are unlikely to be sufficient…

For the next approximation: RST
– Marcu’s implementation incorporating G&S
– Mostly used for summarization and generation
– Would help get at the argument structure of the text

Page 39: AI Seminar

Discourse

Would most likely use RST to generate features for a classifier or as input to a pattern recognizer

Nuclei spans help pick out the more important segments of text

Produces a tree that gives the rhetorical structure of the text
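
A small sketch of what such a tree might look like as a data structure, and of how nuclear spans could be read off it. The `RSTNode` class, its field names, and the convention that the root counts as a nucleus are assumptions of this example, not the output format of Marcu's parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    role: str                       # "nucleus" or "satellite"
    relation: Optional[str] = None  # relation a satellite holds to its nucleus
    text: Optional[str] = None      # set only on leaf nodes
    children: List["RSTNode"] = field(default_factory=list)


def nuclear_spans(node):
    """Collect leaf texts reachable from the root through nucleus nodes only."""
    if node.role != "nucleus":
        return []
    if not node.children:
        return [node.text]
    spans = []
    for child in node.children:
        spans.extend(nuclear_spans(child))
    return spans


# Hand-built tree for one sentence; the root is treated as a nucleus by assumption.
tree = RSTNode(
    role="nucleus",
    children=[
        RSTNode(role="satellite", relation="concession", text="They may be very smart,"),
        RSTNode(role="nucleus", text="but they cannot be trusted."),
    ],
)
print(nuclear_spans(tree))
```

In this toy case the concession is dropped and the nucleus that remains is the part of the sentence carrying the evaluation.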

Page 40: AI Seminar

Discourse

None of the discourse theories look like they are going to stand alone
– may be able to give us structural, lexical and other features
– need to consider classification or clustering based on these features
– so we turn to….

Page 42: AI Seminar

Statistical NLP and ML

Two techniques we will consider
– Latent Semantic Analysis
– Probabilistic Classification

Page 43: AI Seminar

Statistical NLP and ML

Issues
– clustering versus classification
• categories may not be predefined
• may want to take a variety of features into account
– favor learning over hard-coding knowledge
– supervised versus unsupervised
• cost of annotated training data

Page 44: AI Seminar

Statistical NLP and ML

Latent Semantic Analysis
– text represented as a matrix
• entries are weighted frequency of word in context
– semantic space obtained through SVD
• words appearing in similar context have similar feature vectors
– characterizes semantic content of words in context
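
The steps above can be sketched in a few lines with NumPy. This is a toy under stated assumptions: the four documents are made up, whole documents stand in for "contexts", the matrix holds raw counts rather than weighted frequencies, and `k = 2` is chosen by hand.

```python
import numpy as np

# Made-up mini corpus: two documents per topic, no shared words across topics.
docs = [
    "recount ballots florida election",
    "election ballots court ruling",
    "back pain exercise stretch",
    "stretch exercise relieves back pain",
]
vocab = sorted({word for doc in docs for word in doc.split()})

# Term-by-document matrix of raw counts (an assumption; LSA normally weights these).
A = np.array([[doc.split().count(word) for doc in docs] for word in vocab], dtype=float)

# SVD, then keep only the k largest singular values to get the reduced space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


same_topic = cosine(doc_vectors[0], doc_vectors[1])
cross_topic = cosine(doc_vectors[0], doc_vectors[2])
print(same_topic, cross_topic)
```

Documents about the same topic end up close together in the reduced space even though they share only some of their words, which is the property that makes clustering in that space plausible.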

Page 45: AI Seminar

Statistical NLP and ML

Why LSA is a good choice here
– semantics is key component of ideological discourse
– clustering without need for predefined categories
– already shown useful for:
• summarization (Ando 2000)
• text segmentation (Choi 2001)
• measuring text coherence (Foltz 1998)

Page 46: AI Seminar

Statistical NLP and ML

But LSA doesn’t use all of the stuff we just spent all this time talking about…

What if it doesn’t work very well?

Another option is a probabilistic classifier
– assigns most probable class to an object based on a probability model

Page 47: AI Seminar

Statistical NLP and ML

Probability model
– defines joint distribution of variables
• set of feature variables and a class variable

Wiebe and Bruce (1995) got around the issue of not knowing the classes in advance by breaking up the problem and using a series of classifiers
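
One very simple probability model of this kind is naive Bayes, which assumes the feature variables are independent given the class. The sketch below uses it only as a stand-in: it is not the model Wiebe and Bruce used, and the training sentences and the labels `viewpoint_1` and `viewpoint_2` are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count classes and per-class words from (word_list, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(words, model):
    """Return the most probable class under naive Bayes with add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)  # log prior of the class
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in words:
            if word in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Invented training data with hypothetical class labels.
training = [
    ("tax cuts help growth".split(), "viewpoint_1"),
    ("cuts help business".split(), "viewpoint_1"),
    ("tax cuts hurt workers".split(), "viewpoint_2"),
    ("cuts hurt schools".split(), "viewpoint_2"),
]
model = train_naive_bayes(training)
print(classify("cuts help".split(), model))
print(classify("hurt workers".split(), model))
```

Here the features are just words; discourse features such as chain categories or nuclear spans could be fed in the same way as extra tokens.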

Page 48: AI Seminar

Statistical NLP and ML

Maybe this will work after all and we can use some of the features we have been talking about

Which features to use can be determined statistically, using the goodness of fit of graphical models

Page 49: AI Seminar

Statistical NLP and ML

Both methods seem to have a lot of potential

LSA would be easier to implement
– possibly a baseline for evaluation of probabilistic classifiers

Less linguistic knowledge gain likely with LSA

Page 51: AI Seminar

Internet

We would like to mine the structure of the internet
– see if there is a correspondence with groups
– improved IR by topic
– figure out what search engine to use as a base for our system

Page 52: AI Seminar

Internet

Structure papers
– Kleinberg (1997)
– Kleinberg et al. (1999)
– Terveen et al. (1999)
– Whittaker et al. (1998)

Page 53: AI Seminar

Internet

Issues
– topic or query disambiguation
– what is a minimal unit
– how to use the structure of the web
• finding authorities
• communities and subgraphs
– Evaluation!!!

Page 54: AI Seminar

Internet

Kleinberg (1997)
– link based model
– hub - links to many related authorities
– authority
– iterative weighting algorithm that converges (rapidly in practice)
– can disambiguate authorities by sense
– can be used to trawl for cyber communities
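
The iterative weighting can be sketched as follows. This is a simplified version under stated assumptions: it runs on a tiny hand-made link graph instead of a query-focused subgraph of the web, uses a fixed number of iterations instead of a convergence test, and the page names are invented.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores; links maps each page to the pages it links to."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking to the page.
        auth = {page: 0.0 for page in pages}
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        # Hub update: sum the authority scores of all pages the page links to.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}
        # Normalize both score vectors to unit length so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(value * value for value in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth


# Invented graph: three hub-like pages pointing at two candidate authorities.
links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Pages with high authority scores would be the ones worth passing on to the later classification stages.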

Page 56: AI Seminar

Conclusion

It seems that such a system can be built
– find a good search engine
– use Kleinberg’s algorithm to improve collection of documents retrieved
– use LSA and/or a probabilistic classifier to handle the ideological point of view
– with a probabilistic classifier use features discussed in the ideology and discourse sections

Page 57: AI Seminar

The End

Thanks for listening!

If you want to know more, my Comprehensive Exam paper is at:

www.CS.NMSU.Edu/~mmartin/courses/comps_all.html