SOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis


Upload: sergio-jimenez

Post on 31-Jul-2015


SOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis

Sergio Jimenez and Claudia Becerra

A participating system in the Student Response Analysis task (Task 7) of SemEval 2013

Alexander Gelbukh

Instituto Politécnico Nacional, México

Soft Cardinality: Hierarchical Text Overlap for SRA Sergio Jimenez

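Soft Cardinality

Cardinality: the number of different elements in a collection, i.e. the set definition.

Example collections (the elements are shown as images in the slides):

• A (three elements): classical (integer) |A| = 3; soft (real) |A|' ≈ 2.9
• B (three elements): classical (integer) |B| = 3; soft (real) |B|' ≈ 1.3
• C (two equal elements): classical |C| = 1; soft |C|' = 1.0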


Soft Cardinality

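$$|A|' = \sum_{i=1}^{|A|} w_i \left( \sum_{j=1}^{|A|} \mathrm{sim}(a_i, a_j)^{p} \right)^{-1}$$

• sim(a_i, a_j): inter-element similarity
• w_i: element weights
• p: “softness” control

When the collection is a text, sim is a word-to-word similarity and w_i is an idf term weighting.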
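The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `soft_cardinality` and `exact_match` are hypothetical, and uniform weights are assumed when none are given.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def soft_cardinality(
    elements: Sequence[T],
    sim: Callable[[T, T], float],
    weights: Optional[Sequence[float]] = None,
    p: float = 1.0,
) -> float:
    """Compute |A|' = sum_i w_i / sum_j sim(a_i, a_j)**p."""
    if weights is None:
        weights = [1.0] * len(elements)  # assumption: unweighted elements
    total = 0.0
    for a_i, w_i in zip(elements, weights):
        # The inner sum is at least 1 whenever sim(a, a) == 1.
        total += w_i / sum(sim(a_i, a_j) ** p for a_j in elements)
    return total


def exact_match(a: str, b: str) -> float:
    """Crisp similarity: with it, soft cardinality equals the classical one."""
    return 1.0 if a == b else 0.0


distinct = soft_cardinality(["red", "green", "blue"], exact_match)
repeated = soft_cardinality(["red", "red"], exact_match)
print(distinct, repeated)
```

With a crisp similarity the result matches the classical count (three distinct elements, or one for a repeated element, as in collection C). A graded similarity function yields real-valued cardinalities between those extremes.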


Hierarchical Similarity Model

• Word-to-word similarity: words
• “Sentence”-to-“sentence” similarity: Questions (Q) vs. Answers (A); Reference Answer (RA) vs. Reference Answer (RA)
• “Document” soft cardinality: Q vs. set(SA); A vs. set(SA) (features for ML)

Word-to-word Similarity


q-grams character overlap
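As an illustration of q-gram character overlap, the sketch below compares two words by the character q-grams they share. The slide does not state the value of q, the padding scheme, or the overlap coefficient, so the choices here (q = 2, `#` padding, Dice coefficient) and the function names are assumptions.

```python
def qgrams(word: str, q: int = 2) -> set:
    """Return the set of character q-grams of a word, padded with '#'."""
    padded = "#" * (q - 1) + word.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Dice coefficient over q-gram sets (assumed; other coefficients work too)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2.0 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(qgram_similarity("cat", "cats"))  # shares 3 of the 4 and 5 bigrams
print(qgram_similarity("cat", "dog"))   # no bigram in common
```

A measure of this kind needs no dictionary or corpus, which is consistent with the claim that the system uses no external resources.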

“Sentence”-to-“Sentence” Similarity


Word overlap

Questions (Q) vs. Answers (A)

Reference Answer (RA) vs. Reference Answer (RA)
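One way to turn soft cardinality into a word-overlap score between two sentences is to estimate the soft intersection as |A|' + |B|' − |A ∪ B|'. The sketch below does that under several assumptions not stated in the slides: whitespace tokenization, unweighted words, a character-bigram Dice similarity between words, and a Dice-style ratio as the final score. All function names are hypothetical.

```python
def bigrams(word: str) -> set:
    """Character bigrams of a word; a one-letter word is its own gram."""
    grams = {word[i:i + 2] for i in range(len(word) - 1)}
    return grams or {word}


def word_sim(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (assumed word similarity)."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    return 2.0 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def soft_card(words: list, p: float = 1.0) -> float:
    """Unweighted soft cardinality of a list of words."""
    return sum(1.0 / sum(word_sim(a, b) ** p for b in words) for a in words)


def sentence_similarity(a: str, b: str, p: float = 1.0) -> float:
    """Dice-style overlap built from soft cardinalities."""
    words_a, words_b = a.lower().split(), b.lower().split()
    card_a, card_b = soft_card(words_a, p), soft_card(words_b, p)
    card_union = soft_card(words_a + words_b, p)  # soft |A ∪ B|
    card_inter = card_a + card_b - card_union     # soft |A ∩ B|
    denom = card_a + card_b
    return 2.0 * card_inter / denom if denom else 0.0


same = sentence_similarity("the bulb is lit", "the bulb is lit")
disjoint = sentence_similarity("abc", "xyz")
print(same, disjoint)
```

Because the union is computed on the concatenated word lists, near-duplicate words such as "bulb" and "bulbs" contribute to the intersection even though they are not identical strings.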

“Document” Soft Cardinality


Text overlap

a question (Q) vs. sets of reference answers (RA)

an answer (A) vs. sets of reference answers (RA)

Weights from sentence soft cardinality
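The document level applies the same formula one step up: the elements are sentences, the similarity is a sentence-to-sentence measure, and each sentence is weighted by its own cardinality. The sketch below is a simplified illustration only. To stay short it uses stand-ins that are assumptions, not the system's components: Jaccard word overlap as the sentence similarity and the distinct-word count as the sentence weight. The function names are hypothetical.

```python
def word_set(sentence: str) -> set:
    return set(sentence.lower().split())


def sentence_sim(a: str, b: str) -> float:
    """Stand-in sentence similarity: Jaccard overlap of word sets."""
    words_a, words_b = word_set(a), word_set(b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def sentence_card(sentence: str) -> float:
    """Stand-in for the sentence soft cardinality: number of distinct words."""
    return float(len(word_set(sentence)))


def document_soft_cardinality(sentences: list, p: float = 1.0) -> float:
    """Soft cardinality of a set of sentences, weighted by sentence cardinality."""
    total = 0.0
    for s_i in sentences:
        if not word_set(s_i):
            continue  # an empty sentence contributes nothing
        inner = sum(sentence_sim(s_i, s_j) ** p for s_j in sentences)
        total += sentence_card(s_i) / inner
    return total


def document_overlap(text: str, references: list, p: float = 1.0) -> float:
    """Soft intersection between one text (Q or A) and a set of reference answers."""
    card_text = document_soft_cardinality([text], p)
    card_refs = document_soft_cardinality(references, p)
    card_union = document_soft_cardinality([text] + list(references), p)
    return card_text + card_refs - card_union


refs = ["the bulb is in a closed path", "the bulb is in a closed path"]
print(document_soft_cardinality(refs))               # duplicates count once
print(document_overlap("the bulb is in a closed path", refs))
print(document_overlap("voltage differs", refs))
```

Quantities such as these cardinalities and overlaps are the kind of values that could serve as features for a classifier.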

Feature Set


Total number of features:

Submitted System


A single run obtained from supervised classification models (J48 Tree + bagging) trained separately on Beetle and SciEntsBank data sets.

The same feature set was used for the 5-way, 3-way and 2-way classification tasks.

No parameters were necessary! No external resources were used!

5-Way Official Results


3-Way Official Results


2-Way Official Results


2-Way Official Results


overall accuracy

Soft Cardinality + Lexical Overlap Baseline


Conclusions

• The text overlap method based on soft cardinality is a very challenging baseline for the SRA task.

• The soft cardinality method combined with the lexical overlap baseline produces an even stronger baseline.

Soft Cardinality at *SEM and SemEval

• STS-2012, official 3rd out of 89 systems
• STS-2013-CORE task, 18th out of 90 systems (4th unofficial)
• STS-2013-TYPED task, top system (UNITOR team)
• CLTE-2012, 3rd out of 29 systems (1st unofficial)
• CLTE-2013, among the top 2 systems
• SRA-2013, among the top 2 systems
