Page 1: Towards Identifying Unresolved Discussions in Student Online Forums

Towards Identifying Unresolved Discussions in Student Online Forums

Jihie Kim, Jia Li, and Taehwan Kim

Information Sciences Institute/

University of Southern California

http://ai.isi.edu/discourse

[email protected]

Page 2: Towards Identifying Unresolved Discussions in Student Online Forums

Pedagogical Discourse Jihie Kim/USC-ISI

“Talk to as many other people as possible. CS is learned by talking to others, not by reading, or so it seems to me now.”

-- Advice from an undergraduate computer science student (http://www-scf.usc.edu/~csci402/)

Page 3: Towards Identifying Unresolved Discussions in Student Online Forums

Discussion Board and Corpora

15 semesters running…

• CS and Engineering courses
• Undergrad/Graduate
• USC/Non-USC
• Almost 800 students
• Over 8000 messages

Extensible open-source discussion board (phpBB) serves as a platform for bridging ISI research and USC teaching practice

Page 4: Towards Identifying Unresolved Discussions in Student Online Forums

Student Messages in an Undergraduate Operating Systems Course

Text is incoherent and ungrammatical.

Problem description: Non-factoid questions are difficult to identify, dependent on context, and may include multiple sentences or paragraphs.

Answers require explanations.

Page 5: Towards Identifying Unresolved Discussions in Student Online Forums

Thread Length Distribution

[Figure: thread length distribution, data from an undergraduate CS course. x-axis: # of posts; y-axis: # of threads]

[Figure: statistics of thread length, data from a graduate CS course. x-axis: # of messages; y-axis: # of threads]

• Threads are often very short, many consisting of only 1-2 messages
• Students jump into programming details without understanding the larger picture or related concepts
• TAs and instructors are not always available to fully guide interactions

Need for Discussion Assessment and Scaffolding
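The thread-length statistics on this slide can be reproduced from a flat list of posts. The following is a minimal sketch, assuming each post is represented only by the identifier of its thread (a hypothetical input format; the slides do not describe how the phpBB data was exported).

```python
from collections import Counter


def thread_length_histogram(thread_ids):
    """Map thread length (number of posts) to the number of threads of that length.

    thread_ids: one thread identifier per post (assumed input format).
    """
    posts_per_thread = Counter(thread_ids)      # thread id -> number of posts
    return Counter(posts_per_thread.values())   # length -> number of threads


def share_of_short_threads(histogram, max_len=2):
    """Fraction of threads that contain at most max_len posts."""
    total = sum(histogram.values())
    short = sum(n for length, n in histogram.items() if length <= max_len)
    return short / total if total else 0.0


# Toy data, not taken from the corpus: six posts spread over three threads.
posts = ["t1", "t1", "t1", "t2", "t3", "t3"]
hist = thread_length_histogram(posts)
short_share = share_of_short_threads(hist)
```

A high value of `short_share` corresponds to the observation that many threads consist of only 1-2 messages.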

Page 6: Towards Identifying Unresolved Discussions in Student Online Forums

PedDiscourse Research

Discussion Assessment
• Which discussions need instructor attention?
• Who is asking and answering questions?
• What topics are discussed when?

Discussion Scaffolding
• Promote reflection
• Promote collaboration among students

Page 7: Towards Identifying Unresolved Discussions in Student Online Forums

Modeling discussion threads

Individual messages
• Topic, quantity

Relations among messages
• Response/Replies
• Roles that a message plays

Discussion threads
• Thread lengths and quantity
• Discussion Topic
• Discussion Focus

Related course data
• Notes, web pages, readings
• Assignments and projects

. . .

Page 8: Towards Identifying Unresolved Discussions in Student Online Forums

Discussion Assessment

Which discussions need instructor attention?
• Identify roles that individual messages play (ques, ans, ack, etc.)
• Analyze patterns of message roles
• Find discussion threads without an answer for the initial question

Page 9: Towards Identifying Unresolved Discussions in Student Online Forums

Roles of individual messages

Use Searle’s theory of Speech Acts (Searle, 1969) to model threaded discussions

Speech Acts (SAs)
• Choose SAs to use: Question (QUES), Answer or Suggestion (ANS-SUG), Correction or Objection (Neg-Ack), …..
• Provide relationship between a pair of messages
• Multiple SAs per pair of messages in thread
• A single message can be related (via SAs) with multiple messages
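One way to picture these properties is as a small graph in which messages are nodes and each speech act is a labeled edge between a pair of messages. The sketch below is an illustration only: the class names, the convention of using `None` as the target of a thread-initial message, and the toy messages are assumptions, and the label set is the one listed in the category table later in the deck.

```python
from dataclasses import dataclass, field
from typing import Optional

# SA labels as listed in the deck's category table.
SA_LABELS = {"QUES", "ANS-SUG", "ISSUE", "Pos-ACK", "Neg-ACK"}


@dataclass
class Message:
    msg_id: str
    poster: str
    text: str


@dataclass
class Thread:
    messages: dict = field(default_factory=dict)  # msg_id -> Message
    links: list = field(default_factory=list)     # (from_id, to_id, sa_label)

    def add_message(self, msg: Message) -> None:
        self.messages[msg.msg_id] = msg

    def add_link(self, from_id: str, to_id: Optional[str], sa: str) -> None:
        """Record that message from_id performs speech act sa toward to_id."""
        if sa not in SA_LABELS:
            raise ValueError(f"unknown speech act: {sa}")
        for mid in (from_id, to_id):
            if mid is not None and mid not in self.messages:
                raise KeyError(mid)
        self.links.append((from_id, to_id, sa))

    def labels_of(self, msg_id: str) -> set:
        """All SA labels attached to a message, over every link it starts."""
        return {sa for src, _, sa in self.links if src == msg_id}

    def targets_of(self, msg_id: str) -> set:
        """All messages that msg_id is related to via some SA."""
        return {dst for src, dst, _ in self.links
                if src == msg_id and dst is not None}


# Toy thread (invented text, not from the corpus).
thread = Thread()
thread.add_message(Message("m1", "S1", "How do I allocate the stack?"))
thread.add_message(Message("m2", "S2", "Read the documentation."))
thread.add_message(Message("m3", "S1", "Still confused. Where is it stored?"))
thread.add_link("m1", None, "QUES")      # thread-initial question
thread.add_link("m2", "m1", "ANS-SUG")
thread.add_link("m3", "m2", "ISSUE")     # two SAs on the same pair of messages
thread.add_link("m3", "m2", "QUES")
thread.add_link("m3", "m1", "QUES")      # one message related to two messages
```

Storing links as triples rather than one label per message keeps both cases from the slide representable: several SAs on one pair, and one message linked to several others.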

Page 10: Towards Identifying Unresolved Discussions in Student Online Forums

Speech Acts (SAs) in a discussion thread

S1 (QUES): The Professor gave us 2 methods for forking threads from the main program. One was ....... The other was to ......... When you fork a thread where does it get created and take its 8 pages from? Do you have to calculate ......? If so how? Where does it store its PCReg .......? Any suggestions would be helpfule.

S2 (ANS-SUG): read the student documentation for the Fork syscall

S1 (ISSUE, QUES): I am still confused. I understand it is in the same address space as the parent process, where do we allocate the 8 pages of mem for it? And how do we keep track of .....? … I am sure it is a simple concept that I am just missing.

S3 (ANS-SUG): If you use the first implementation...., then you'll have a hard limit on the number of threads....If you use the second implementation, you need to.... Either way, you'll need to implement the AddrSpace::NewStack() function and make sure that there is memory available.

Page 11: Towards Identifying Unresolved Discussions in Student Online Forums

Speech Act categories explored

Code 1
  Code   Name
  QUES   Question
  ANNO   Announcement
  CANS   Complex Answer
  SANS   Simple Answer
  SUG    Suggest
  ELAB   Elaborate
  CORR   Correct
  OBJ    Object
  CRT    Criticize
  SUP    Support
  ACK    Acknowledge
  COMP   Complement

Code 2: POS, NEUT, NEG

Code 3: QUES, ANNO, ANS-SUG, ELAB, POS-ACK, NEG-ACK

Kappa: 0.70, 0.54, 0.58

Page 12: Towards Identifying Unresolved Discussions in Student Online Forums

Current Speech Act Categories

SA Category   kappa   Distribution (% in corpus)   Description
QUES          0.94    50.6                         A question about a problem, including question about a previous message
ANS-SUG       0.72    41.2                         A simple or complex answer to a previous question. Suggestion or advice
ISSUE         0.88    15.4                         Report misunderstanding, unclear concepts or issues in solving problems
Pos-ACK       0.87    9.1                          An acknowledgement, compliment or support in response to a prev. message
Neg-ACK       0.85    2.6                          A correction or objection (or complaint) to/on a previous message
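The slides report a kappa value per SA category but do not state which variant was used. As a reading aid, here is a minimal sketch assuming Cohen's kappa for two annotators, where each annotator makes a binary decision per message (category present or absent). The function name and toy labels are illustrative, not from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy annotations (1 = message marked QUES, 0 = not marked); invented data.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Because kappa discounts agreement expected by chance, a rare category can score lower than a frequent one even when raw agreement looks similar.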

Page 13: Towards Identifying Unresolved Discussions in Student Online Forums

Data cleaning and pre-processing

Discussion data
• Noisy, incoherent
• High variation: messages may contain answers or suggestions in the form of questions
• Informal dialect used by students

Data pre-processing
• Tokenization, stemming, other filtering steps applied (e.g. removing programming code existing within messages, pluralized words, etc.)

Data categorization
• Transform/replace commonly occurring words/word-sequences with categories
  - Apostrophe words (’re, ’ve, ’m…)
  - Technical terms existing within messages replaced by TECH_TERM (from commonly used technical terms in course)
  - Don’t replace pronouns (“you can” in ANS vs. “I can”)
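The steps above can be sketched as a small pipeline. This is a rough illustration under stated assumptions, not the authors' implementation: the term list, the `APOSTROPHE_WORD` category name, and the code-line heuristic are all hypothetical, and plural stripping stands in for a real stemmer.

```python
import re

# Hypothetical course terms; the slides do not give the actual list.
TECH_TERMS = {"fork", "syscall", "addrspace", "thread"}

# Crude heuristic (assumption): a line ending in ';', '{' or '}' is program code.
CODE_LINE = re.compile(r"[;{}]\s*$")
# Tokens: apostrophe suffixes ('re, 've, 'm), words, and ?/! punctuation.
TOKEN = re.compile(r"'\w+|\w+|[?!]")


def preprocess(message):
    """Return a list of normalized tokens for one forum message."""
    # Filtering: drop lines that look like program code.
    lines = [ln for ln in message.splitlines() if not CODE_LINE.search(ln)]
    tokens = TOKEN.findall(" ".join(lines).lower())
    out = []
    for tok in tokens:
        if tok.startswith("'"):
            out.append("APOSTROPHE_WORD")  # hypothetical category name
        elif tok in TECH_TERMS or (tok.endswith("s") and tok[:-1] in TECH_TERMS):
            out.append("TECH_TERM")        # also catches simple plurals
        else:
            out.append(tok)                # pronouns such as "you" / "i" are kept
    return out


sample = "You're right, I can Fork threads?\nint x = 0;"
tokens = preprocess(sample)
```

Keeping pronouns intact preserves the contrast the slide points out: "you can" tends to appear in answers, while "I can" does not.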

Page 14: Towards Identifying Unresolved Discussions in Student Online Forums

Features for SA Classification

F1: Cue phrases and their positions (e.g. “Thank” position)
F2: Message Position
F3: Previous Message Information
F4: Poster Class
F5: Poster Change
F6: Message Length

Example TBL rules

IF cue-phrase = {What} & {“?”} => QUES
IF cue-phrase = {“yes you can”} & poster-info = Instructor & post-length = Medium => ANS
IF cue-phrase = {“yes”} & cue-position = CP_BEGIN & prev-SA = QUES => ANS
IF cue-phrase = {“not know”} & poster-info = student & poster-change = YES => ISSUE
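To make the rule format concrete, the four example rules can be written as predicates over a per-message feature record. This sketch only shows how such rules fire; it does not implement the TBL learning procedure, which derives and orders transformations from training data. The dictionary keys and the `CP_MIDDLE` position value are hypothetical (only `CP_BEGIN` appears on the slide).

```python
# Each rule is (SA label, condition over a message's features).
# "cues" maps a cue phrase found in the message to its position tag.
RULES = [
    ("QUES", lambda m: "what" in m["cues"] and "?" in m["cues"]),
    ("ANS", lambda m: "yes you can" in m["cues"]
        and m["poster"] == "instructor" and m["length"] == "medium"),
    ("ANS", lambda m: m["cues"].get("yes") == "CP_BEGIN"
        and m["prev_sa"] == "QUES"),
    ("ISSUE", lambda m: "not know" in m["cues"]
        and m["poster"] == "student" and m["poster_change"]),
]


def apply_rules(message):
    """Return the set of SA labels whose rule conditions hold for a message."""
    return {label for label, condition in RULES if condition(message)}


# Toy feature records (invented, not from the corpus).
question = {
    "cues": {"what": "CP_BEGIN", "?": "CP_MIDDLE"},
    "poster": "student", "length": "short",
    "prev_sa": None, "poster_change": False,
}
reply = {
    "cues": {"yes": "CP_BEGIN"},
    "poster": "student", "length": "short",
    "prev_sa": "QUES", "poster_change": True,
}
confused = {
    "cues": {"not know": "CP_MIDDLE"},
    "poster": "student", "length": "medium",
    "prev_sa": "ANS", "poster_change": True,
}
```

Returning a set rather than a single label matches the annotation scheme, in which one message may carry more than one speech act.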

Page 15: Towards Identifying Unresolved Discussions in Student Online Forums

SA Classification Results

              Support Vector Machine (SVM)    Transformation-Based Learning (TBL)
SA Category   Precision  Recall  F score      Precision  Recall  F score
QUES          0.95       0.90    0.94         0.96       0.91    0.95
ANS           0.87       0.80    0.85         0.83       0.64    0.78
ISSUE         0.65       0.54    0.62         0.46       0.76    0.50
Pos-ACK       0.57       0.44    0.54         0.58       0.56    0.57
Neg-ACK       0          0       0            0.5        0.38    0.47
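The slides do not say how the F score in this table is computed. As a reading aid, the general F-beta measure is sketched below. Plugging in the reported precision and recall, the SVM QUES and ISSUE rows are reproduced with beta = 0.5 (precision weighted more heavily) rather than with the balanced F1. This is an observation about the numbers, not a statement from the authors.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-measure: beta < 1 favors precision, beta > 1 favors recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# SVM row for QUES from the table: precision 0.95, recall 0.90.
f1_ques = round(f_beta(0.95, 0.90), 2)             # balanced F1
f05_ques = round(f_beta(0.95, 0.90, beta=0.5), 2)  # precision-weighted
```

With these inputs `f05_ques` matches the 0.94 shown in the table, while `f1_ques` does not.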

Page 16: Towards Identifying Unresolved Discussions in Student Online Forums

Profiling discussion threads with SAs

(Q1) Were all questions answered? (Y/N)
(Q2) Were there any issues or confusion? (Y/N)
(Q3) Were those issues or confusions resolved? (Y/N)

Page 17: Towards Identifying Unresolved Discussions in Student Online Forums

Thread classification with SA classifiers

Feature Set1: Whether there was an [SA] in the thread
Feature Set2: Whether the last message in the thread included [SA]

      Precision  Recall  F score
Q1    0.93       0.93    0.93
Q2    0.93       0.93    0.93
Q3    0.89       0.89    0.89

(a) SVM Classification results with human annotated SAs

      Precision  Recall  F score
Q1    0.83       0.84    0.83
Q2    0.77       0.74    0.76
Q3    0.68       0.69    0.68

(b) SVM Classification results with system generated SAs

(Q1) Were all questions answered? (Y/N)
(Q2) Were there any issues or confusion? (Y/N)
(Q3) Were those issues or confusions resolved? (Y/N)
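The two feature sets on this slide can be read as binary indicator vectors over the SA categories. The following is a minimal sketch, assuming each thread is given as a list of per-message SA label sets in posting order (an assumed representation). The resulting vector would then be passed to a classifier such as an SVM, which is not shown here.

```python
# SA labels as named in the classification results table.
SA_LABELS = ["QUES", "ANS", "ISSUE", "Pos-ACK", "Neg-ACK"]


def thread_features(thread_sas):
    """Build Feature Set1 + Feature Set2 for one thread.

    thread_sas: list of sets of SA labels, one set per message, in order.
    """
    anywhere = set().union(*thread_sas)             # SAs seen anywhere in thread
    last = thread_sas[-1] if thread_sas else set()  # SAs of the last message
    set1 = [int(sa in anywhere) for sa in SA_LABELS]
    set2 = [int(sa in last) for sa in SA_LABELS]
    return set1 + set2


# Toy thread: a question, an answer, then a follow-up that is still confused.
toy_thread = [{"QUES"}, {"ANS"}, {"ISSUE", "QUES"}]
vector = thread_features(toy_thread)
```

In this toy case the last message still carries QUES and ISSUE, which is the kind of pattern a classifier could associate with an unresolved thread.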

Page 18: Towards Identifying Unresolved Discussions in Student Online Forums

Direct thread classification without SA classifiers

F1’: cue phrases and their positions (last message or not) in the thread

      Precision  Recall  F score
Q1    0.83       0.84    0.83
Q2    0.77       0.74    0.76
Q3    0.68       0.69    0.68

(a) With SAs

      Precision  Recall  F score
Q1    0.86       0.86    0.86
Q2    0.81       0.62    0.70
Q3    0.75       0.33    0.46

(b) Direct classification

(Q1) Were all questions answered? (Y/N)
(Q2) Were there any issues or confusion? (Y/N)
(Q3) Were those issues or confusions resolved? (Y/N)
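For the direct approach, the F1’ feature pairs each cue phrase with whether it occurs in the last message of the thread. The sketch below illustrates one possible encoding. The cue phrase list and the "last"/"earlier" tags are hypothetical, and simple substring matching stands in for whatever matching the authors used.

```python
# Hypothetical cue phrases; the slides do not list the ones actually used.
CUE_PHRASES = ["thank", "not know", "still confused", "?"]


def direct_features(messages):
    """Map (cue phrase, position) -> 1 for every cue found in the thread.

    messages: list of message texts in posting order.
    Position is "last" for the final message and "earlier" otherwise.
    """
    features = {}
    last_index = len(messages) - 1
    for i, text in enumerate(messages):
        lowered = text.lower()
        where = "last" if i == last_index else "earlier"
        for cue in CUE_PHRASES:
            if cue in lowered:
                features[(cue, where)] = 1
    return features


# Toy thread (invented text).
toy = ["How do I fork a thread?", "Read the documentation.", "Thanks, that worked."]
feats = direct_features(toy)
```

A thread that ends with a thanking phrase and one that ends with a question yield different features even if the same phrases appear somewhere in both.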

Page 19: Towards Identifying Unresolved Discussions in Student Online Forums

Summary and Discussion

Identifying unresolved discussions
• Discerning speech acts (SAs) in student online discussions
• Classify discussion threads with SAs as features
• Compare SA-based classification and direct thread classification with phrase features

SA-based features may help some difficult cases
• E.g. longer threads with more than one question raised

Page 20: Towards Identifying Unresolved Discussions in Student Online Forums

Related Work

• Pedagogical/tutorial dialogue, instructional discourse modeling (Yuan et al., 2008; Graesser et al., 2005; McLaren et al., 2007; Boyer et al., 2008; Fossati, 2008; Litman et al., 2003)
• Dialogue modeling in email messages or blog (e.g. AAAI 2008 workshop on Enhanced Messaging)
  - Email speech acts
  - Requests and commitments
• Handling noisy data and high variance in text (Knoblock et al., 2007)
• Course topic and task modeling using information extraction techniques (Roy et al., 2008; Jovanovic et al., 2006)
• Trace student e-learning activities (Israel and Aiken, 2007; Dringus and Ellis, 2005)

Page 21: Towards Identifying Unresolved Discussions in Student Online Forums

Ongoing Work: Discussion Assessment

• Discussion thread pattern and phase analysis
  - Question, understanding, solving and closing
• Discussion topic analysis
  - Coherency of discussion topics
• Student profiling
  - Information providers (peer mentors) vs. information seekers
  - Information flow and influence network among participants
• Use of workflows (distributed systems) for large-scale assessment
  - E.g. participation changes over several semesters

Page 22: Towards Identifying Unresolved Discussions in Student Online Forums

Supported by National Science Foundation (NSF)

More details available at

http://ai.isi.edu/discourse

Email: [email protected]