towards advanced data retrieval from learning objects repositories
www.metropolitan.ac.rs
TOWARDS ADVANCED DATA RETRIEVAL FROM LEARNING OBJECTS REPOSITORIES
Valentina Paunovic, Belgrade Metropolitan University
Slobodan Jovanovic, Belgrade Metropolitan University
This work was supported by the Ministry of Education, Science and Technology (Project III44006).
What problem do we solve?
[Diagram: the popularity of personalized distance-based learning demands effective creation of learning materials; "Enables" arrows link this to REUSABILITY and SEARCH]
Textual search
[Diagram: learning object types (text, image, video, ...) with metadata; search type: TEXTUAL]
Effective textual search in a large LOR is important.
Our system - contributions
• Search engine
  – Steiner-trees approach
  – Algorithm for graph representation of LOR
• Query language
  – Extension based on formal logic
  – Algorithm for parsing the extended language
Steiner trees search
Traditional search (for example, in text processing applications)
Alternative: Steiner trees
Steiner trees approach
• Query
  – word1, word2, word3
• Possible interpretation
  – Find all objects such that each object contains all words from the query
  – Issue: what if there is no such object?
• Alternative interpretation
  – Find all groups of related objects such that each group contains all words from the query
Example – possible alternatives
Ranking
• Smaller number of LO:
  – Stronger relationships among terms from the query
  – Conclusion: advantage in rankings
  – Example: the best solutions consist of only one LO
• Group that contains more similar LO (from the same area or subject):
  – Stronger relationships among terms from the query
  – Conclusion: advantage in rankings
  – Example: the best solutions are groups of LO from the same area
Main advantages
• Situation: no object satisfies all terms from the query
  – Traditional search: no results
  – Steiner trees search: returns results
• Implicit relationships among learning objects can be detected
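The difference between the two interpretations can be illustrated with a small brute-force sketch. It is not the search algorithm used by the system: the repository contents, the "related" links, and all function names below are hypothetical, and the enumeration is exponential, so it only works for toy inputs.

```python
from itertools import combinations

# Hypothetical toy repository: learning object id -> words it contains.
WORDS = {
    "lo1": {"exponential", "function"},
    "lo2": {"logarithmic", "function"},
    "lo3": {"derivative"},
}
# Hypothetical undirected "related" links between learning objects.
LINKS = {("lo1", "lo2"), ("lo2", "lo3")}


def related(a, b):
    return (a, b) in LINKS or (b, a) in LINKS


def is_connected(group):
    """True if the objects in `group` form one connected piece via LINKS."""
    group = list(group)
    seen, frontier = {group[0]}, [group[0]]
    while frontier:
        current = frontier.pop()
        for other in group:
            if other not in seen and related(current, other):
                seen.add(other)
                frontier.append(other)
    return len(seen) == len(group)


def traditional_search(query):
    """Objects that contain every query word on their own."""
    return [lo for lo, words in WORDS.items() if query <= words]


def group_search(query):
    """Connected groups that jointly contain every query word, smallest first."""
    hits = []
    for size in range(1, len(WORDS) + 1):
        for group in combinations(sorted(WORDS), size):
            covered = set().union(*(WORDS[lo] for lo in group))
            if query <= covered and is_connected(group):
                hits.append(group)
    return hits
```

For the query {"exponential", "derivative"} the traditional search finds nothing, while the group search returns the chain lo1, lo2, lo3: lo2 contains neither word but connects the two objects that do, which is the role intermediate nodes play in a Steiner tree.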
Vector space model from text mining
• How to determine which LO are related?
• LO is represented as an m-dimensional TF-IDF vector:
  $r(d) = (\mathrm{tfidf}_1, \mathrm{tfidf}_2, \ldots, \mathrm{tfidf}_m)$
• Each component is calculated as
  $\mathrm{tfidf}_i = \mathrm{tf}_i \cdot \mathrm{idf}_i$
• Term frequency:
  $\mathrm{tf}_i = \sum_j h_j \, n(i,j)$
  – $n(i,j)$: number of occurrences of the i-th term in the j-th slot of LO d
  – $h_j$: weight associated with the j-th slot
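As an illustration of the slot-weighted term frequency, here is a minimal sketch. The slot names and the numeric weights are assumptions made for the example; the slides do not give concrete values for the weights.

```python
# Hypothetical slot weights h_j; the numbers are illustrative only.
SLOT_WEIGHTS = {
    "title": 3.0,
    "keywords": 3.0,
    "description": 3.0,
    "content": 2.0,
    "other": 1.0,
}


def term_frequency(term, slots, weights=SLOT_WEIGHTS):
    """tf_i = sum over slots j of h_j * n(i, j).

    `slots` maps a slot name to its text; n(i, j) is counted with a naive
    whitespace split, which is a simplification.
    """
    return sum(
        weights[slot] * text.lower().split().count(term)
        for slot, text in slots.items()
    )


# A made-up learning object with three slots.
example_lo = {
    "title": "Exponential function",
    "content": "the exponential function grows fast",
    "other": "math",
}
```

With these assumed weights, `term_frequency("exponential", example_lo)` is 3.0 * 1 + 2.0 * 1 + 1.0 * 0 = 5.0: one occurrence in the title and one in the content.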
Vector space model II
• Weights:
  – Terms from the metadata title, keywords and description have the highest impact (weight).
  – Terms from the content have medium impact (if there is textual content).
  – Terms from the rest of the searchable metadata have low impact.
• Inverse document frequency reduces the impact of common words:
  $\mathrm{idf}_i = \log \dfrac{|LOR|}{|\{d \in LOR : w_i \in d\}|}$
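The effect of the inverse document frequency can be seen in a short sketch. The repository below is made up, and returning zero for a term that appears nowhere is an assumption added to avoid division by zero.

```python
import math


def inverse_document_frequency(term, repository):
    """idf_i = log(|LOR| / |{d in LOR : w_i in d}|).

    `repository` is a list of word sets, one per learning object.
    """
    containing = sum(1 for words in repository if term in words)
    if containing == 0:
        return 0.0  # assumption: a term absent from the LOR gets zero weight
    return math.log(len(repository) / containing)


# Made-up repository of four learning objects.
example_lor = [
    {"the", "exponential", "function"},
    {"the", "logarithmic", "function"},
    {"the", "derivative"},
    {"the", "integral"},
]
```

Here "the" occurs in every object, so its idf is log(4/4) = 0 and it contributes nothing to the TF-IDF vector, while "derivative" occurs in one object and gets log(4).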
LO similarity measure
• Now we can introduce a similarity measure
• One possibility: cosine similarity
  $sim(d_1, d_2) = \dfrac{r(d_1) \cdot r(d_2)}{\|r(d_1)\| \cdot \|r(d_2)\|}$
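A minimal sketch of cosine similarity over two TF-IDF vectors, written with plain Python lists. Returning zero when one vector has zero length is an assumption; the slides do not discuss that case.

```python
import math


def cosine_similarity(r1, r2):
    """sim(d1, d2) = r(d1) . r(d2) / (||r(d1)|| * ||r(d2)||)."""
    dot = sum(a * b for a, b in zip(r1, r2))
    norm1 = math.sqrt(sum(a * a for a in r1))
    norm2 = math.sqrt(sum(b * b for b in r2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm1 * norm2)
```

Vectors that point in the same direction score 1 regardless of length, and vectors with no terms in common score 0, so the measure compares which terms two learning objects emphasize rather than how long they are.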
Search algorithm
• Issue: finding the top k minimum cost Steiner trees (MCST-k) is NP-complete
• DBPF-k, developed for keyword search on DB:
  – Has polynomial solution
  – First returned result is optimal
  – The remaining (k-1) solutions are approximate
• Efficiency of the DBPF-k algorithm depends on graph sparseness.
Graph representation of LOR
• Steiner-trees search requires a sparse graph
• Graph representation of LOR:
  – Nodes: LO
  – Weighted edges: defined by the similarity measure between any two nodes
• Issue: dense graph, number of edges: $O((\text{number of LO})^2)$
• Result: slow search
Graph sparsification - rules
• No node should be removed from the graph.
• Low similarity edges should be removed from the graph.
• Edge removal should not violate graph connectivity.
• The targeted number of edges is specified by parameter T. The graph obtained by the sparsification process should have fewer than T edges, unless that violates the connectivity constraint.
• No priority among edges of equal weight.
• If two learning objects are in a relationship specified by the metadata relation, that edge should be preserved in the graph regardless of the degree of similarity between the two learning objects.
Sparsification
• Complexity of the algorithm is $O(|E| \log |E|)$, where $|E| = O((\text{number of LO})^2)$
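One way to satisfy the sparsification rules within the stated complexity is sketched below. This is an assumption about how such a procedure could look, not the algorithm from the presentation: it keeps the mandatory metadata edges, then builds a maximum-similarity spanning structure with union-find to protect connectivity, then adds the strongest leftover edges while the total stays below T. The function and parameter names are hypothetical.

```python
def sparsify(nodes, edges, target, mandatory=frozenset()):
    """Return the set of edges to keep.

    edges: dict mapping (u, v) tuples to similarity.
    mandatory: edges given by the metadata relation (same tuple keys).
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    def union(u, v):
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v
        return True

    # Metadata relations are always preserved.
    kept = set(mandatory)
    for u, v in mandatory:
        union(u, v)

    # Sorting dominates the cost: O(|E| log |E|).
    ranked = sorted(
        (e for e in edges if e not in kept), key=edges.get, reverse=True
    )

    # Pass 1: keep the strongest edges that join separate components.
    spare = []
    for edge in ranked:
        if union(*edge):
            kept.add(edge)
        else:
            spare.append(edge)

    # Pass 2: add strong leftover edges while staying below `target`.
    for edge in spare:
        if len(kept) + 1 >= target:
            break
        kept.add(edge)
    return kept
```

If the input graph is connected, pass 1 always keeps enough edges to connect every node, so a very small T is overridden by the connectivity rule, as the rules require.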
Query language
• Example query: exponential function
• Issue 1: What if there is a term exp instead of exponential?
  – Possible solution: dictionary of synonyms + dictionary of acronyms and abbreviations
  – Problem: can be complicated to implement
• Issue 2: Find all exponential or logarithmic functions
  – Possible solution: submit two different queries
  – Problem: can be inconvenient for a user
Query language - extension
1. Operator and, marked by reserved word %AND.
2. Operator or, marked by reserved word %OR.
• Both operators have the same precedence.
• Expressions are evaluated from left to right.
• If there is no operator between two terms, %AND is assumed implicitly. For example, “math function” is evaluated as “math %AND function”.
• The associativity rule is preserved from formal logic.
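The implicit %AND rule can be sketched as a small tokenizer. The regular expression and the function name are assumptions for illustration. The sketch also inserts %AND between a closing parenthesis and a following term, and between a term and an opening parenthesis, which goes slightly beyond the rule stated above.

```python
import re

OPERATORS = {"%AND", "%OR"}


def tokenize(query):
    """Split a query into tokens and make every implicit %AND explicit."""
    raw = re.findall(r"%AND|%OR|[()]|[^\s()]+", query)
    tokens = []
    for token in raw:
        previous = tokens[-1] if tokens else None
        # A term or "(" starts an operand; a term or ")" ends one.
        starts_operand = token not in OPERATORS and token != ")"
        ends_operand = (
            previous is not None
            and previous not in OPERATORS
            and previous != "("
        )
        if starts_operand and ends_operand:
            tokens.append("%AND")
        tokens.append(token)
    return tokens
```

For example, `tokenize("math function")` gives `["math", "%AND", "function"]`, matching the example on the slide.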
Query language
• How do we evaluate a complex expression like (a %OR b) %AND ((c %OR d) %AND e)?
• We cannot submit such a query directly to the search algorithm
• We need a query parsing algorithm
Query language - terminology
• Term (t) – word used in a query
• Simple Query (Q) – set of terms: $Q = \{t_1, t_2, \ldots, t_{|Q|}\}$
• Expression (E) – set of simple queries: $E = \{Q_1, Q_2, \ldots, Q_{|E|}\}$
• Operation $\wedge$ corresponds to operator %AND: $E_1 \wedge E_2 = \{Q_i \cup Q_j \mid Q_i \in E_1, Q_j \in E_2\}$
• Operation $\vee$ corresponds to operator %OR: $E_1 \vee E_2 = E_1 \cup E_2$
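These definitions map directly onto Python sets. In the sketch below a simple query is a `frozenset` of terms and an expression is a `set` of simple queries; the function names are chosen for the example.

```python
def term(word):
    """Expression holding one simple query with one term: {{word}}."""
    return {frozenset({word})}


def and_op(e1, e2):
    """E1 AND E2: union of every pair of simple queries, one from each side."""
    return {q1 | q2 for q1 in e1 for q2 in e2}


def or_op(e1, e2):
    """E1 OR E2: all simple queries from both expressions."""
    return e1 | e2


# (a %OR b) %AND c
example = and_op(or_op(term("a"), term("b")), term("c"))
```

The value of `example` is the expression {{a, c}, {b, c}}: the %OR is distributed over the %AND, so every simple query in the result is a plain conjunction of terms that the search algorithm can handle.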
Parsing algorithm
initialize S as empty stack of expressions;
initialize empty set of search results R;
foreach token w of query
    switch(w):
        case “(”, “%AND”, “%OR”:
            push w to S;
        case “)”:
            E <- evaluateTopExpression(S);
            push E to S;
        default:
            if (previous token is term)
                push “%AND” to S;
            Q = {w};
            E = {Q};
            push E to S;
    end switch;
E <- evaluateTopExpression(S);
foreach simple query Q from E
    result = DBPF-k(Q);
    add result to R;

evaluateTopExpression(S) {
    initialize SH as empty stack;
    while (S not empty)
        wh <- pop from S;
        if (wh = “(”)
            break;
        push wh to SH;
    while (true)
        first <- pop from SH;
        if (SH is empty) return first;
        operator <- pop from SH;
        second <- pop from SH;
        switch(operator)
            case “%AND”:
                result = first ^ second;
            case “%OR”:
                result = first v second;
        end switch;
        push result to SH;
}
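The pseudocode can be turned into a short runnable sketch. The tokenizing regular expression and the function names are assumptions, and the call to DBPF-k is left out: the function returns the set of simple queries that would each be submitted to the search algorithm. An empty query is not handled.

```python
import re


def evaluate_top_expression(stack):
    """Pop items back to the nearest "(" and fold them left to right."""
    pending = []  # plays the role of SH; popping it restores query order
    while stack:
        item = stack.pop()
        if item == "(":
            break
        pending.append(item)
    while True:
        first = pending.pop()
        if not pending:
            return first
        operator = pending.pop()
        second = pending.pop()
        if operator == "%AND":
            result = {q1 | q2 for q1 in first for q2 in second}
        else:  # "%OR"
            result = first | second
        pending.append(result)


def parse(query):
    """Return the expression (set of simple queries) for an extended query."""
    stack = []
    previous_is_term = False
    for token in re.findall(r"%AND|%OR|[()]|[^\s()]+", query):
        if token in ("(", "%AND", "%OR"):
            stack.append(token)
            previous_is_term = False
        elif token == ")":
            stack.append(evaluate_top_expression(stack))
            previous_is_term = False
        else:
            if previous_is_term:
                stack.append("%AND")  # implicit operator between two terms
            stack.append({frozenset({token})})
            previous_is_term = True
    return evaluate_top_expression(stack)
```

For the query (a %OR b) %AND ((c %OR d) %AND e), `parse` returns four simple queries: {a, c, e}, {a, d, e}, {b, c, e} and {b, d, e}. Each of them is a plain keyword set of the kind the search algorithm accepts.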
Architecture of search system
Conclusion
• Proposed architectural solution for advanced search through repositories of learning objects
• Search based on finding top-k min-cost Steiner trees
• Proposed algorithm for sparse weighted graph representation of a LO repository
• Proposed extension of query language based on formal logic and designed an algorithm for parsing it