INFO624 - Week 4
Query Languages and Query Operations
Dr. Xia Lin, Associate Professor
College of Information Science and Technology
Drexel University

Query
• Query is a representation of the user’s information needs
• It may not represent the information needs exactly because
  • Information needs are difficult to describe -- semantic difficulty
  • Query must be in a format acceptable to the retrieval system -- syntactic difficulty

Content-based queries
• Words
• Phrases
• Proximity
• Pattern matching
  • Word matching
  • Prefix/suffix
  • Wildcard search
  • Error handling
  • Extended patterns
• Boolean
• Vector
• Natural language
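
The pattern-matching query types listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary and the function names are invented for the example, and the standard-library `difflib` similarity ratio stands in for whatever error-handling method a real retrieval system would use.

```python
import difflib
import fnmatch

# Hypothetical index vocabulary, invented for this example.
VOCABULARY = ["retrieval", "retrieve", "retrieved", "relevance",
              "query", "queries", "indexing"]


def prefix_match(prefix, vocabulary=VOCABULARY):
    """Terms that start with the given prefix."""
    return [t for t in vocabulary if t.startswith(prefix)]


def suffix_match(suffix, vocabulary=VOCABULARY):
    """Terms that end with the given suffix."""
    return [t for t in vocabulary if t.endswith(suffix)]


def wildcard_match(pattern, vocabulary=VOCABULARY):
    """Shell-style wildcards: '*' matches any run of characters, '?' exactly one."""
    return [t for t in vocabulary if fnmatch.fnmatchcase(t, pattern)]


def tolerant_match(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Error handling: return terms close to a possibly misspelled word."""
    return difflib.get_close_matches(word, vocabulary, n=3, cutoff=cutoff)
```

For example, `wildcard_match("quer*")` picks up both "query" and "queries", and `tolerant_match("retreival")` recovers "retrieval" despite the transposed letters.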

Boolean Queries
• Request: What are the likely problems when someone gets hurt on his knees when playing basketball?
• Write your best Boolean query for this request:
• If the query returns zero hits, how do you modify the query?
• If the query returns too many hits, how do you modify the query?
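
One way to explore the two questions above is to run Boolean queries against a small inverted index. The sketch below is an assumption-laden illustration, not a prescribed answer: the four documents, the helper names (`term`, `all_of`, `any_of`), and the chosen query terms are all invented for the example.

```python
from collections import defaultdict

# Toy collection, invented for illustration only.
DOCS = {
    1: "knee injury common in basketball players",
    2: "basketball training and conditioning",
    3: "knee ligament injuries in sports",
    4: "treating ankle sprain in basketball",
}


def build_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(t):
    """Posting set for one term (empty if the term is not indexed)."""
    return INDEX.get(t, set())


def all_of(*postings):
    """Boolean AND: documents present in every posting set."""
    return set.intersection(*postings)


def any_of(*postings):
    """Boolean OR: documents present in at least one posting set."""
    return set.union(*postings)


# Zero hits: "hurt" never occurs, so the AND fails.
strict = all_of(term("knee"), term("hurt"), term("basketball"))

# Broaden: OR in alternative terms and drop one AND constraint.
broadened = all_of(term("knee"),
                   any_of(term("hurt"), term("injury"), term("injuries")))

# Too many hits: a single broad term; narrow it by AND-ing more terms.
too_many = term("basketball")
narrowed = all_of(term("basketball"), term("knee"), term("injury"))
```

In this toy collection, OR-ing in alternative terms turns an empty result into two hits, while AND-ing extra terms cuts three hits down to one.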

How does AskJeeves translate the request?
• What are the likely problems when someone gets hurt on his knees when playing basketball?

Construct your best Boolean query for this request:
• I am doing research on personal space boundaries. I want to know if there are any sex or race differences in personal space boundaries.

Interaction with Queries
• Starts with a SEED query
• The system responds with a list of related terms
• Adds selected terms from the list to the query
• The system updates the list of related terms
• Repeat as needed
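
The interaction loop above could be simulated as follows. This is a minimal sketch that assumes "related terms" means terms co-occurring with the query in the same documents; the slides do not say how the system derives its list, and the documents and the function name `related_terms` are invented for the example.

```python
from collections import Counter

# Each document is represented by its set of index terms (invented data).
DOCS = [
    {"knee", "injury", "basketball", "ligament"},
    {"knee", "surgery", "ligament", "recovery"},
    {"basketball", "training", "injury"},
    {"knee", "injury", "ligament", "sports"},
]


def related_terms(query_terms, docs=DOCS, top_n=3):
    """Rank terms co-occurring with all query terms, most frequent first."""
    counts = Counter()
    for doc in docs:
        if query_terms <= doc:              # document matches the whole query
            counts.update(doc - query_terms)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_n]]


query = {"knee"}                        # 1. start with a seed query
suggestions = related_terms(query)      # 2. system responds with related terms
query = query | {suggestions[0]}        # 3. user adds a selected term
updated = related_terms(query)          # 4. system updates the list
```

Each pass narrows the set of matching documents, so the suggestion list changes as the query grows.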
Example: MedLine Search Assistant

Association-based Queries
• Find documents similar to this document.
• Find documents that link to this document
  • Explicitly
  • Implicitly
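
A "find documents similar to this document" query can be sketched with term-count vectors and cosine similarity. This is one common choice rather than the method the slides prescribe; the three documents and the function names are assumptions made for the example.

```python
import math
from collections import Counter

# Toy collection, invented for illustration.
DOCS = {
    "d1": "query languages for information retrieval",
    "d2": "information retrieval query operations",
    "d3": "cooking pasta at home",
}


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similar_to(doc_id, docs=DOCS):
    """Ids of the other documents, ranked from most to least similar."""
    vectors = {d: Counter(text.split()) for d, text in docs.items()}
    target = vectors[doc_id]
    scores = [(cosine(target, v), d) for d, v in vectors.items() if d != doc_id]
    return [d for _, d in sorted(scores, reverse=True)]
```

Here the document itself plays the role of the query, so the user never has to formulate terms explicitly.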

Field-based Queries
• Field-based queries will likely improve search precision.
• Field-based queries require that the data source has a fixed structure and is indexed by that structure.
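
To see why a field restriction can improve precision, consider a small sketch in which each field gets its own index. The records, field names, and function names below are hypothetical and chosen only to make the contrast visible.

```python
from collections import defaultdict

# Hypothetical structured records with fixed fields.
RECORDS = [
    {"id": 1, "author": "Washington", "title": "Query languages"},
    {"id": 2, "author": "Lee", "title": "A history of Washington state"},
    {"id": 3, "author": "Lee", "title": "Query operations"},
]


def build_field_index(records, fields=("author", "title")):
    """One inverted index per field: field -> term -> set of record ids."""
    index = {f: defaultdict(set) for f in fields}
    for rec in records:
        for f in fields:
            for word in rec[f].lower().split():
                index[f][word].add(rec["id"])
    return index


FIELD_INDEX = build_field_index(RECORDS)


def search_field(field, word):
    """Records whose given field contains the word."""
    return FIELD_INDEX[field].get(word.lower(), set())


def search_any(word):
    """Records containing the word in any indexed field."""
    return set().union(*(search_field(f, word) for f in FIELD_INDEX))
```

Searching for "washington" in any field returns records 1 and 2, while restricting the search to the author field returns only record 1, which is the more precise answer for a user looking for works by that author.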

Citation-based Queries
• Retrieve all documents that document A cites.
• Find all documents that cite document A.
• Find all documents that cite this author.
• Find all documents that cite both document A and document B.
• Find documents that cite both author A and author B.
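
The citation-based queries above map naturally onto a citation graph. The sketch below assumes the graph is stored as a dictionary from each citing document to the set of documents it cites; the document ids, the author table, and the function names are all invented for illustration.

```python
# Citation graph: citing document -> set of cited documents (invented data).
CITES = {
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A", "B"},
    "A": {"X"},
}

# Hypothetical author of each cited document.
AUTHOR_OF = {"A": "author1", "B": "author2", "X": "author1"}


def references_of(doc):
    """All documents that `doc` cites."""
    return CITES.get(doc, set())


def cited_by(doc):
    """All documents that cite `doc`."""
    return {d for d, refs in CITES.items() if doc in refs}


def citing_both(doc_a, doc_b):
    """All documents that cite both `doc_a` and `doc_b`."""
    return cited_by(doc_a) & cited_by(doc_b)


def citing_author(author):
    """All documents that cite at least one work by `author`."""
    return {d for d, refs in CITES.items()
            if any(AUTHOR_OF.get(r) == author for r in refs)}


def citing_both_authors(author_a, author_b):
    """All documents that cite both authors."""
    return citing_author(author_a) & citing_author(author_b)
```

The "cite both" queries are simply intersections of two "cited by" result sets, which is what leads into co-citation analysis.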

Co-Citation
• The college has a tradition of more than 20 years of co-citation research.
• Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later third document.
[Diagram: a later Document 3 cites both Document 1 and Document 2; the relationship between Document 1 and Document 2 is marked "?".]

Co-Citation Analysis
• The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related.
• Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.
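
Co-citation counts for document pairs can be computed directly from the reference lists of later documents. The following is a minimal sketch with invented reference lists; the function name `cocitation_counts` is an assumption for the example.

```python
from collections import Counter
from itertools import combinations

# Reference lists of three later documents (invented data).
REFERENCE_LISTS = {
    "later1": ["doc1", "doc2"],
    "later2": ["doc1", "doc2", "doc3"],
    "later3": ["doc2", "doc3"],
}


def cocitation_counts(reference_lists):
    """Count, for each pair of earlier documents, how many later documents cite both."""
    counts = Counter()
    for refs in reference_lists.values():
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


counts = cocitation_counts(REFERENCE_LISTS)
```

Re-running the count after new citing documents are added shows how the perceived relatedness of a pair can change over time.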

Co-Citation Mapping
• Detects patterns in the frequency with which any works by any two authors are jointly cited in later works.
• Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
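
At the author level, the same counting is done after replacing each cited work by its author, and pairs that are co-cited only once are dropped. The sketch below assumes a simple `min_count` threshold as the test for "recurrent"; the works, authors, and threshold are invented, and a mapping tool may well use a different criterion.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author of each cited work.
AUTHOR_OF = {"w1": "Author A", "w2": "Author A", "w3": "Author B", "w4": "Author C"}

# Reference lists of four later works (invented data).
LATER_WORKS = [
    ["w1", "w3"],
    ["w2", "w3"],
    ["w1", "w4"],
    ["w2", "w3", "w4"],
]


def author_cocitations(later_works, author_of, min_count=2):
    """Author pairs jointly cited in at least `min_count` later works."""
    counts = Counter()
    for refs in later_works:
        # Any works by the two authors count, so collapse works to authors first.
        authors = sorted({author_of[w] for w in refs})
        counts.update(combinations(authors, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


strong_pairs = author_cocitations(LATER_WORKS, AUTHOR_OF)
```

The surviving pair counts are the kind of input from which an author map can be drawn, with frequently co-cited authors placed close together.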
A Map of Information Scientists
AuthorLinks

Link-Based Queries
• Hypertext structure
• Is a link a query?
  • http://www.google.com/search?hl=en&q=information+retrieval
  • This is called a query-mediated link. It is also called a “soft link.”
• Is a query a link?
  • Many pages are dynamically generated from a database or a search engine.
    • Your review pages
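
The two directions, a link that carries a query and a query recovered from a link, can be illustrated with Python's standard `urllib.parse` module. The function names `query_link` and `query_of` are invented for this sketch; the base URL and the `hl` and `q` parameters are taken from the example link above.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def query_link(terms, base="http://www.google.com/search", lang="en"):
    """Build a query-mediated ("soft") link that encodes the query in the URL."""
    return base + "?" + urlencode({"hl": lang, "q": terms})


def query_of(link):
    """Recover the query string that a soft link carries."""
    return parse_qs(urlparse(link).query)["q"][0]


link = query_link("information retrieval")
```

Following such a link runs the search again each time, so the target page is generated on demand rather than stored.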

Queries, Links, Is there a difference – SIGCHI’97
• An experiment was conducted to compare browsing behavior in query- and link-based interfaces. Results suggest that query-mediated links are as effective as explicit queries, and that strategies adopted by users affect performance. This work has implications for the design of information exploration interfaces.

Query Structure
• Hierarchical structure
• What does the user want when searching for “substance abuse”?
• We may not know, but adding narrower terms of “substance abuse” will likely get better results:
  • Alcohol Abuse
  • Drug Abuse
  • Alcohol-Related Disorders
  • Amphetamine-Related Disorders
  • Cocaine-Related Disorders
  • Marijuana Abuse

Automatic Expansion
• If there is a defined hierarchy, several search strategies may be defined to expand the query:
  • Search with the query term only
  • Search with the query term and all the terms in its upper hierarchy
  • Search with the query term and all the terms in its lower hierarchy
  • Search with the query term and all its sibling terms
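
The four strategies above can be sketched over a small child-to-parent table. The hierarchy below is a toy arrangement made up for the example (it is not an excerpt of any real thesaurus), and the strategy names "term", "upper", "lower", and "siblings" are labels invented here.

```python
# Toy hierarchy (child -> parent), invented for illustration only.
PARENT = {
    "Substance Abuse": "Disorders",
    "Eating Disorders": "Disorders",
    "Alcohol Abuse": "Substance Abuse",
    "Drug Abuse": "Substance Abuse",
    "Marijuana Abuse": "Substance Abuse",
}


def ancestors(term):
    """Terms above `term`, nearest first."""
    out = []
    while term in PARENT:
        term = PARENT[term]
        out.append(term)
    return out


def descendants(term):
    """All terms below `term`, depth first."""
    out = []
    for child, parent in PARENT.items():
        if parent == term:
            out.append(child)
            out.extend(descendants(child))
    return out


def siblings(term):
    """Other terms sharing the same parent."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return [t for t, p in PARENT.items() if p == parent and t != term]


def expand(term, strategy="term"):
    """Return the list of terms to search with under the chosen strategy."""
    extra = {
        "term": [],
        "upper": ancestors(term),
        "lower": descendants(term),
        "siblings": siblings(term),
    }[strategy]
    return [term] + extra
```

Expanding downward tends to raise recall for a broad term such as "Substance Abuse", while expanding upward or sideways widens the topic and may cost precision.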

Query Operations
• Query execution
• Query expansion
• Query translation

Query Expansion
• Improve the initial query automatically by
  • restructuring the query, or
  • adding other new terms, or
  • adjusting the weight of each term.

Restructuring the query:
• Identify key concepts through natural language processing
• Identify any field information that may be contained in the query
  • Is this an author?
  • Is this a journal?
• Reverse term orders in the query

Adding new terms:
• Synonyms
• Hierarchical terms
• Scope terms
  • Does the query “Football” retrieve information on football or on soccer?
• Relevant terms
  • Selected terms from relevant documents
  • Terms that co-occur most often with the query terms
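
Two of the sources of new terms above, synonyms and terms selected from relevant documents, can be combined in a short sketch. The thesaurus entry, the relevant documents, and the function names are all assumptions made for the example; selecting the most frequent terms is just one simple selection rule.

```python
from collections import Counter

# Hypothetical thesaurus entry, invented for this example.
SYNONYMS = {"football": ["soccer"]}

# Index terms of documents already judged relevant (invented data).
RELEVANT_DOCS = [
    ["football", "league", "goal", "referee"],
    ["football", "goal", "stadium"],
    ["goal", "league", "football"],
]


def terms_from_relevant(query_terms, relevant_docs, k=2):
    """Pick the k most frequent terms in relevant documents that are not in the query."""
    counts = Counter(t for doc in relevant_docs for t in doc
                     if t not in query_terms)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def expand_query(query_terms, relevant_docs=RELEVANT_DOCS, k=2):
    """Original terms, then their synonyms, then terms from relevant documents."""
    expanded = list(query_terms)
    for t in query_terms:
        for s in SYNONYMS.get(t, []):
            if s not in expanded:
                expanded.append(s)
    for t in terms_from_relevant(query_terms, relevant_docs, k):
        if t not in expanded:
            expanded.append(t)
    return expanded
```

With this data, `expand_query(["football"])` adds "soccer" from the thesaurus and "goal" and "league" from the relevant documents.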

Adjusting term weighting
• If relevant documents are known, increase the weights for terms assigned to the relevant documents and decrease the weights for terms assigned to non-relevant documents.
• Adjust term weights in a topic tree:
  • Fruit
    • Fruit, 0.9; apple, 0.7; orange, 0.7; banana, 0.6; …; Macintosh, 0.1; Computer, -0.4.
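
One well-known way to implement the first rule above is Rocchio relevance feedback, which is named here as an illustration; the slides do not state which method is intended. The sketch assumes queries and documents are term-to-weight dictionaries, and the default values of `alpha`, `beta`, and `gamma` are conventional choices, not values from the source.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: alpha*query + beta*mean(relevant) - gamma*mean(non_relevant)."""
    new = defaultdict(float)
    for t, w in query.items():
        new[t] += alpha * w
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for t, w in doc.items():
                # Dividing by len(docs) averages over the document group.
                new[t] += coeff * w / len(docs)
    return dict(new)


# Invented example: the user means the fruit, not the computer.
query = {"apple": 1.0}
relevant = [{"apple": 1.0, "fruit": 1.0}]
non_relevant = [{"apple": 1.0, "computer": 1.0}]
updated = rocchio(query, relevant, non_relevant)
```

In the updated query "fruit" gains a positive weight and "computer" a negative one, which mirrors the topic-tree example in which Computer carries a negative weight for the topic Fruit.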

Query Translation
• From natural language to queries
  • AskJeeves
• From queries in one system to queries in another system
• From one natural language to another natural language
  • Altavista

Other types of representation for user’s needs?
• Mind-reading?
• Non-text queries?
• Gesture/motion?

IBM – Visualization Space
• This information system understands the user.
• It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.

Multimedia Queries
• Content-based
  • Text indexing
• Attribute-based
  • Color, size, type, time period, …
• Structure-based
  • Location, shape, layout, etc.
• Cluster-based
  • Semantic groups, physical groups, structure groups
• Example: find a photo that has the White House in the center.

Project Discussion
• Idea 1: Install and implement an IR system
  • Focus on system and technology
  • Need to have a collection
  • Need to have hands-on experience with systems
• Idea 2: Conduct an evaluation experiment on one or two selected IR systems
  • Focus on interfaces and users
• Idea 3: Customize an IR system
  • Focus on functionality and customization

Project Evaluation
• Topics
  • Relevance
  • Problems identified
  • Technical difficulties
  • Solutions/ideas
• The process
  • Design
  • Implementation
• The report
  • Background
  • Written
  • Oral

Midterm
• Concepts
  • What is information retrieval?
  • Data, information, text, and documents
  • Two abstraction principles
  • User’s information needs
  • Queries and query formats
  • Precision and Recall
  • Relevance

Midterm
• Procedures & problem solving
  • How to translate a request into a query?
  • How to expand queries for better recall or better precision?
  • How to create an inverted index?
  • How to create a vector space?
  • How to calculate similarities of documents?
  • How to match a query to documents in a vector space?

• Discussions
  • Challenges of IR
  • Advantages and disadvantages of Boolean search (vector space, automatic indexing, association-based queries, etc.)
  • Evaluation of IR systems
    • With or without using precision/recall.
  • Difference between data retrieval and information retrieval