TOWARDS ADVANCED DATA RETRIEVAL FROM LEARNING OBJECTS REPOSITORIES

www.metropolitan.ac.rs


Valentina Paunovic, Belgrade Metropolitan University

Slobodan Jovanovic, Belgrade Metropolitan University

This work was supported by the Ministry of Education, Science and Technology (Project III44006).

What problem do we solve?

• Popularity of personalized distance-based learning
– Demands: effective creation of learning materials
– Enabled by: reusability and search

Textual search

[Diagram: learning object types (text, image, video, ...) with metadata; search type: textual]

• Effective textual search in a large LOR is important

Our system - contributions

• Search engine
– Steiner-trees approach
– Algorithm for graph representation of LOR

• Query language
– Extension based on formal logic
– Algorithm for parsing the extended language

Steiner trees search

• Traditional search (for example, in text processing applications)
• Alternative: Steiner trees

Steiner trees approach

• Query
– word1, word2, word3
• Possible interpretation
– Find all objects such that each object contains all words from the query
– Issue: what if there is no such object?
• Alternative interpretation
– Find all groups of related objects such that each group contains all words from the query

Example – possible alternatives

Ranking

• Smaller number of LOs:
– Stronger relationships among terms from the query
– Conclusion: advantage in ranking
– Example: the best solutions consist of only one LO
• Group that contains more similar LOs (from the same area or subject):
– Stronger relationships among terms from the query
– Conclusion: advantage in ranking
– Example: the best solutions are groups of LOs from the same area

Main advantages

• Situation: no object satisfies all terms from the query
– Traditional search: no results
– Steiner trees search: returns results

• Possible to detect implicit relationships among learning objects
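The difference between the two interpretations can be illustrated with a small brute-force sketch. It is not the search algorithm used by the system; the repository contents, the `RELATED` pairs and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy repository: object id -> words it contains.
WORDS = {
    "lo1": {"exponential", "function"},
    "lo2": {"logarithmic", "function"},
    "lo3": {"graph", "plot"},
}
# Hypothetical "related" pairs (undirected edges between objects).
RELATED = {frozenset({"lo1", "lo2"}), frozenset({"lo2", "lo3"})}


def traditional_search(query):
    """Return objects that contain every query word on their own."""
    return [lo for lo, words in WORDS.items() if query <= words]


def is_connected(group):
    """Check that the objects in `group` form one connected piece."""
    group = set(group)
    seen, stack = set(), [next(iter(group))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in group if frozenset({node, n}) in RELATED)
    return seen == group


def group_search(query, max_size=3):
    """Return minimal connected groups that jointly cover the query."""
    results = []
    for size in range(1, max_size + 1):
        for group in combinations(sorted(WORDS), size):
            # Skip groups that only extend an already found, smaller group.
            if any(set(r) <= set(group) for r in results):
                continue
            covered = set().union(*(WORDS[lo] for lo in group))
            if query <= covered and is_connected(group):
                results.append(group)
    return results
```

For the query {"exponential", "logarithmic"}, `traditional_search` finds nothing because no single object holds both words, while `group_search` returns the related pair lo1 and lo2. Smaller groups are produced first, which mirrors the ranking preference for solutions with fewer LOs.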

Vector space model from text mining

• How to determine which LO are related?

• An LO d is represented as an m-dimensional TF-IDF vector:

$$r(d) = (\mathrm{tfidf}_1, \mathrm{tfidf}_2, \ldots, \mathrm{tfidf}_m)$$

• Each component is calculated as

$$\mathrm{tfidf}_i = \mathrm{tf}_i \cdot \mathrm{idf}_i$$

• Term frequency:

$$\mathrm{tf}_i = \sum_j h_j \, n(i,j)$$

– n(i,j): number of occurrences of the i-th term in the j-th slot of LO d
– h_j: weight associated with the j-th slot

Vector space model II

• Weights:
– Highest impact (weight): terms from the metadata title, keywords and description.
– Medium impact: terms from the content (if there is textual content).
– Low impact: terms from the rest of the searchable metadata.

• Inverse document frequency reduces the impact of common words:

$$\mathrm{idf}_i = \log \frac{|LOR|}{|\{d \in LOR : w_i \in d\}|}$$
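A minimal sketch of the slot-weighted TF-IDF computation follows. The slot names, the numeric weights in `SLOT_WEIGHTS`, the whitespace tokenization and the sparse dictionary representation are all assumptions made for illustration; the slides only state that title, keywords and description weigh most, content less, and other metadata least.

```python
import math
from collections import Counter

# Assumed slot weights h_j (illustrative values, not from the slides).
SLOT_WEIGHTS = {"title": 3.0, "keywords": 3.0, "description": 3.0,
                "content": 2.0, "other": 1.0}


def term_frequency(lo):
    """tf_i = sum over slots j of h_j * n(i, j)."""
    tf = Counter()
    for slot, text in lo.items():
        h = SLOT_WEIGHTS.get(slot, SLOT_WEIGHTS["other"])
        for term in text.lower().split():
            tf[term] += h
    return tf


def inverse_document_frequency(repository):
    """idf_i = log(|LOR| / number of LOs that contain term i)."""
    doc_count = Counter()
    for lo in repository:
        terms = {t for text in lo.values() for t in text.lower().split()}
        doc_count.update(terms)
    return {t: math.log(len(repository) / n) for t, n in doc_count.items()}


def tfidf_vector(lo, idf):
    """Sparse TF-IDF vector: term -> tf_i * idf_i."""
    tf = term_frequency(lo)
    return {t: tf[t] * idf[t] for t in tf}


# Hypothetical two-object repository; each LO maps slot name -> text.
repository = [
    {"title": "exponential function", "content": "function"},
    {"title": "logarithmic function"},
]
idf = inverse_document_frequency(repository)
vector = tfidf_vector(repository[0], idf)
```

In this toy repository the word "function" occurs in every object, so its idf is log(2/2) = 0 and it contributes nothing to the vector, which is the intended damping of common words.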

LO similarity measure

• Now we can introduce similarity measure

• One possibility: cosine similarity

$$\mathrm{sim}(d_1, d_2) = \frac{r(d_1) \cdot r(d_2)}{\|r(d_1)\| \, \|r(d_2)\|}$$
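Cosine similarity between two LO vectors can be sketched as below. The sparse dictionary representation (term to weight) and the convention of returning 0.0 for an all-zero vector are assumptions, not part of the slides.

```python
import math


def cosine_similarity(r1, r2):
    """sim(d1, d2) = r(d1) . r(d2) / (||r(d1)|| * ||r(d2)||)."""
    dot = sum(w * r2.get(t, 0.0) for t, w in r1.items())
    norm1 = math.sqrt(sum(w * w for w in r1.values()))
    norm2 = math.sqrt(sum(w * w for w in r2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm1 * norm2)
```

Two vectors that point in the same direction score 1 regardless of their length, and vectors that share no terms score 0.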

Search algorithm

• Issue: finding the top-k minimum cost Steiner trees (MCST-k) is NP-complete

• DBPF-k, developed for keyword search on DB:
– Has a polynomial solution
– The first returned result is optimal
– The remaining (k-1) solutions are approximate

• Efficiency of DBPF-k algorithm depends on graph sparseness.

Graph representation of LOR

• Steiner-trees search requires a sparse graph
• Graph representation of LOR:
– Nodes: LOs
– Weighted edges: defined by the similarity measure between any two nodes
• Issue: dense graph; the number of edges is $O((\text{number of LO})^2)$
• Result: slow search
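To make the quadratic growth concrete: if every pair of LOs is connected by a similarity edge, a repository with n objects yields n(n-1)/2 edges. The repository size below is an assumed figure used only for the arithmetic.

$$|E| = \frac{n(n-1)}{2}, \qquad n = 10^4 \;\Rightarrow\; |E| = \frac{10^4 \cdot 9999}{2} = 49\,995\,000 \approx 5 \times 10^7$$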

Graph sparsification - rules

• No node should be removed from the graph.
• Low-similarity edges should be removed from the graph.
• Edge removal should not violate graph connectivity.
• The targeted number of edges is specified by the parameter T. The graph obtained by the sparsification process should have fewer than T edges, unless that violates the connectivity constraint.
• No priority among edges of equal weight.
• If two learning objects are in a relationship specified by the metadata relation, it should be preserved in the graph regardless of the degree of similarity between these two learning objects.

Sparsification

• Complexity of the algorithm is $O(|E| \log |E|)$, where $|E| = O((\text{number of LO})^2)$
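One way to satisfy the sparsification rules within an O(|E| log |E|) bound is sketched below. This is an assumed approach, not necessarily the authors' algorithm: it keeps all metadata-relation edges, builds a maximum-similarity spanning forest with union-find to protect connectivity, and then adds the strongest remaining edges while the total stays below T. The function name `sparsify` and its parameters are hypothetical.

```python
def sparsify(nodes, edges, target, mandatory=frozenset()):
    """Return the set of edges kept after sparsification.

    edges: dict mapping frozenset({u, v}) -> similarity weight.
    target: parameter T; aim for fewer than T edges.
    mandatory: edges backed by a metadata relation (always kept).
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
        return True

    kept = set(mandatory)
    for edge in kept:
        u, v = tuple(edge)
        union(u, v)

    # Sorting dominates the cost: O(|E| log |E|).
    ranked = sorted(edges, key=lambda e: edges[e], reverse=True)

    # Pass 1: a maximum-similarity spanning forest preserves connectivity.
    leftovers = []
    for edge in ranked:
        if edge in kept:
            continue
        u, v = tuple(edge)
        if union(u, v):
            kept.add(edge)
        else:
            leftovers.append(edge)

    # Pass 2: add the strongest remaining edges while staying below T.
    for edge in leftovers:
        if len(kept) >= target - 1:
            break
        kept.add(edge)
    return kept
```

If T is smaller than the number of edges needed to keep the graph connected, the sketch returns the spanning forest plus the mandatory edges, so connectivity takes precedence over the target. Ties between equal weights are broken by input order, which gives no edge a deliberate priority.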

Query language

• Example query: exponential function
• Issue 1: What if there is a term exp instead of exponential?
– Possible solution: dictionary of synonyms + dictionary of acronyms and abbreviations
– Problem: can be complicated to implement
• Issue 2: Find all exponential or logarithmic functions
– Possible solution: submit two different queries
– Problem: can be inconvenient for a user

Query language - extension

1. Operator and, marked by the reserved word %AND.
2. Operator or, marked by the reserved word %OR.

• Both operators have the same precedence.
• Expressions are evaluated from left to right.
• If there is no operator between two terms, the %AND operation is implicitly assumed. For example, “math function” is evaluated as “math %AND function”.
• The associativity rule is preserved from formal logic.

Query language

• How do we evaluate a complex expression like (a %OR b) %AND ((c %OR d) %AND e)?

• We cannot submit such a query directly to the search algorithm

• We need a query parsing algorithm

Query language - terminology

• Term (t) – word used in a query
• Simple query (Q) – set of terms:

$$Q = \{t_1, t_2, \ldots, t_{|Q|}\}$$

• Expression (E) – set of simple queries:

$$E = \{Q_1, Q_2, \ldots, Q_{|E|}\}$$

• Operation $\wedge$ corresponds to operator %AND:

$$E_1 \wedge E_2 = \{Q_i \cup Q_j \mid Q_i \in E_1, Q_j \in E_2\}$$

• Operation $\vee$ corresponds to operator %OR:

$$E_1 \vee E_2 = E_1 \cup E_2$$
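The two operations can be sketched with Python sets, assuming that %AND merges every pair of simple queries taken from the two expressions and that %OR is the plain union of the two expressions. The helper names `q`, `expr_and` and `expr_or` are hypothetical.

```python
def q(*terms):
    """Build a simple query Q: a (hashable) set of terms."""
    return frozenset(terms)


def expr_and(e1, e2):
    """E1 AND E2: the union Qi | Qj for every Qi in E1 and Qj in E2."""
    return {qi | qj for qi in e1 for qj in e2}


def expr_or(e1, e2):
    """E1 OR E2: all simple queries from either expression."""
    return set(e1) | set(e2)


# (a %OR b) %AND c expands into two simple queries: {a, c} and {b, c}.
expanded = expr_and(expr_or({q("a")}, {q("b")}), {q("c")})
```

Each simple query in the resulting expression contains only implicit conjunctions, so it can be passed to a keyword search algorithm on its own.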

Parsing algorithm

initialize S as empty stack of expressions;
initialize empty set of search results R;
foreach token w of query
    switch (w):
        case “(”, “%AND”, “%OR”:
            push w to S;
        case “)”:
            E <- evaluateTopExpression(S);
            push E to S;
        default:
            if (previous token is term)
                push “%AND” to S;
            Q = {w};
            E = {Q};
            push E to S;
    end switch;
E <- evaluateTopExpression(S);
foreach simple query Q from E
    result = DBPF-k(Q);
    add result to R;

evaluateTopExpression(S) {
    initialize SH as empty stack;
    while (S not empty)
        wh <- pop from S;
        if (wh = “(”)
            break;
        push wh to SH;
    while (true)
        first <- pop from SH;
        if (SH is empty) return first;
        operator <- pop from SH;
        second <- pop from SH;
        switch (operator)
            case “%AND”:
                result = first ^ second;
            case “%OR”:
                result = first v second;
        end switch;
        push result to SH;
}
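A runnable sketch of the stack-based parsing step is given below. It follows the pseudocode's structure but makes several assumptions: tokens are split on whitespace and parentheses, simple queries are frozensets, an implicit %AND is also inserted next to parentheses (the pseudocode only covers two adjacent terms), and malformed queries are not handled. The search call for each simple query is left out; `tokenize` and `parse` are hypothetical names.

```python
import re

AND, OR = "%AND", "%OR"


def tokenize(query):
    """Split a query into parentheses, operators and terms."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def evaluate_top_expression(stack):
    """Pop back to the nearest "(" and fold the items left to right."""
    items = []
    while stack:
        top = stack.pop()
        if top == "(":
            break
        items.append(top)
    # `items` is reversed, so pop() yields the leftmost operand first.
    result = items.pop()
    while items:
        operator = items.pop()
        second = items.pop()
        if operator == AND:
            result = {qi | qj for qi in result for qj in second}
        else:  # OR
            result = result | second
    return result


def parse(query):
    """Return the expression E: a set of simple queries (frozensets)."""
    stack = []
    previous_was_operand = False
    for token in tokenize(query):
        if token in (AND, OR):
            stack.append(token)
            previous_was_operand = False
        elif token == "(":
            if previous_was_operand:
                stack.append(AND)  # assumed: implicit AND before "("
            stack.append(token)
            previous_was_operand = False
        elif token == ")":
            stack.append(evaluate_top_expression(stack))
            previous_was_operand = True
        else:
            if previous_was_operand:
                stack.append(AND)  # implicit AND between operands
            stack.append({frozenset([token])})
            previous_was_operand = True
    return evaluate_top_expression(stack)
```

For the query (a %OR b) %AND ((c %OR d) %AND e), `parse` yields four simple queries: {a, c, e}, {a, d, e}, {b, c, e} and {b, d, e}. Each would then be submitted separately to the keyword search algorithm and the results collected.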

Architecture of search system

Conclusion

• Proposed architectural solution for advanced search through repositories of learning objects

• Search based on finding top-k min-cost Steiner trees

• Proposed algorithm for sparse weighted graph representation of a LO repository

• Proposed extension of query language based on formal logic and designed an algorithm for parsing it