Johnson Graduate School of Management Library Project
JGSM Library Project - CS 501
Clients: Ken Bolton, Lynn Brown, Angela K. Horne, Don Schneder, Doris Smith, JGSM Library Reference Team
Project Team: Jonathan Gong, Benson Lee, Man Fai Matthew Lee, Greg Leedberg, Liz Xu
Since the last presentation…
Tasks accomplished:
  Decided on using PHPDig as the backend
  Implemented many functional requirements
  Adjusted PHPDig code to improve ranking based on client requirements
  Discussed with the client additional functionality to be added to the system
New Requirements / Why PHPDig?
Implemented Functionality
  Abstract Display
  Advanced Search
  Administrative Features
  Ranking Adjustments
Task List for Final Milestone (Things to Do)
Demo of Current System
Boosting
Display Statistics Page
Batch Adding
Search Results Display
Add/Remove Categories
Why PHPDig? Non-technical
Client prefers using PHP/MySQL since both technologies are on their web server
JGSM Library site has fewer than 300 HTML pages
A requirement: database
Client involved in decision of continuing with PHPDig
Focus on maintainability and usability
Why PHPDig? Technical
PHPDig code is relatively short
PHPDig = Open Source = Free to modify
Florida State University, Dept. of Biology
The Kiwi Search Engine
  http://www.linknz.co.nz/
  123,000+ web sites indexed
Ranking is similar to Lucene since they both use the same ranking algorithm (tf-idf)
PHPDig version 1.8.7
  www.phpdig.net
Implemented Functionality: Abstract Display
Purpose
  Users can get a description written by a
Modified PHPDig code to look for an abstract
Added a table to the database: auxiliary
  spider_id : int
  full_url : string
  abstract : string
  category : string
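A minimal sketch of how the auxiliary table could be used to show an abstract in place of the automatically generated description. Assumptions: the project uses MySQL, while this sketch uses Python's built-in sqlite3 so that it is self-contained; the primary key on spider_id, the helper names, and the example URL are hypothetical; only the four column names come from the slide.

```python
import sqlite3


def create_auxiliary(conn):
    # Column names follow the slide; types and the primary key are assumptions.
    conn.execute(
        """
        CREATE TABLE auxiliary (
            spider_id INTEGER PRIMARY KEY,
            full_url  TEXT NOT NULL,
            abstract  TEXT,
            category  TEXT
        )
        """
    )


def set_abstract(conn, spider_id, full_url, abstract, category):
    # Insert a row, or overwrite the existing row for this page.
    conn.execute(
        "INSERT OR REPLACE INTO auxiliary (spider_id, full_url, abstract, category) "
        "VALUES (?, ?, ?, ?)",
        (spider_id, full_url, abstract, category),
    )


def description_for(conn, spider_id, fallback):
    # Prefer the stored abstract; otherwise fall back to the generated excerpt.
    row = conn.execute(
        "SELECT abstract FROM auxiliary WHERE spider_id = ?", (spider_id,)
    ).fetchone()
    return row[0] if row and row[0] else fallback


conn = sqlite3.connect(":memory:")
create_auxiliary(conn)
set_abstract(conn, 1, "http://example.edu/databases.html",
             "Guide to the library's business databases.", "Guides")
print(description_for(conn, 1, "(generated excerpt)"))
print(description_for(conn, 2, "(generated excerpt)"))
```

Pages without a stored abstract keep the generated excerpt, so the table only needs rows for pages that someone has described by hand.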
Example of Abstract Display
Example of Abstract Display (Cont’d)
Our Current Working Interface
We now have a functional interface that can perform searches and display results.
The interface has evolved from the prototype previously presented, based on feedback from our clients.
Started with the prototype presented for progress report 1 as target design.
Once we started working with PhpDig’s template system, we made some slight changes to the original target interface to fit what PhpDig can handle.
After presenting this design to our clients and discussing possible alternatives, we jointly came up with the current working design:
Our Current Working Interface: Advanced Search
Our Current Working Interface: Search Results
How We Implemented the Interface
PhpDig uses a template system
  Allows us to write HTML code for the search page and use special PhpDig tags to generate form controls, results, etc., within that page
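To illustrate the idea of a tag-based template, here is a minimal sketch in which hand-written HTML is mixed with placeholder tags that are expanded into generated HTML. Assumptions: the `<demo:.../>` tag syntax, the handler names, and the category names are invented for this sketch and are not PhpDig's actual tags or code.

```python
import re
from html import escape

# Hand-written page HTML with two placeholder tags (hypothetical syntax).
TEMPLATE = (
    '<form class="jgsm-search">'
    "<demo:query_field/> <demo:category_select/>"
    "</form>"
)


def category_select(categories):
    # Generated (dynamic) HTML: one <option> per category.
    options = "".join(f"<option>{escape(c)}</option>" for c in categories)
    return f'<select name="category">{options}</select>'


def render(template, handlers):
    # Replace each known tag with its handler's output; leave unknown tags alone.
    def replace(match):
        handler = handlers.get(match.group(1))
        return handler() if handler else match.group(0)

    return re.sub(r"<demo:([a-z_]+)\s*/>", replace, template)


handlers = {
    "query_field": lambda: '<input type="text" name="query">',
    "category_select": lambda: category_select(["Databases", "Guides"]),
}
page = render(TEMPLATE, handlers)
print(page)
```

In this picture, static pieces can simply be hand-coded in the template, while the output of a dynamic tag can only be restyled by changing the handler that generates it.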
How We Implemented the Interface
Some problems came up during this process:
  Problem: Some of the static HTML generated automatically by PhpDig tags to produce the search form does not match our desired style.
  Solution: We do not depend on PhpDig to generate all of the form HTML; some is hand-coded by us to match our style.
How We Implemented The Interface
Some problems arose during this process:
  Problem: Some of the dynamic HTML generated by PhpDig tags also does not match our style.
  Solution: We cannot hand-code this HTML (category drop-down, etc.), so we modified the PhpDig source code that is called in response to these tags so that the generated HTML matches our desired style.
Where To Go From Here
Based on future discussions with our client, we will continue to refine the interface towards an ideal goal.
More source-level changes to PhpDig to get the details right
  Example: Context currently cuts off words in the
Implemented:
  Add a page
    Options: abstract & category
  Remove a page from database
  Update a page in database
    Options: update abstract & category
    Content is re-indexed
To be Implemented:
  Manual ranking abilities
    Give a page more weight overall
    Give a page more weight for certain words
  Feedback
  Kerberos authentication
To be Implemented: (continued)
  Display statistics
    Statistics useful to the administrators, such as most frequent searches, searches with no results, etc.
  Batch adding of pages
  Category Administration
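Improved from before, mostly complete
Formula similar to Lucene default now:
Lucene default:
\[ \mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \sum_{t \in Q} \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t) \cdot \mathrm{getBoost}(t.\mathrm{field} \text{ in } d) \cdot \mathrm{lengthNorm}(t.\mathrm{field} \text{ in } d) \]
PHPDig, as adjusted:
\[ \mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \sum_{t \in Q} \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t) \cdot \mathrm{getBoost}(t \text{ in } d) \]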
q is the number of query terms matched in the document
Q is the number of terms in the query
Only relevant in searches for “any of the terms”
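A small worked example may help show how the coordination factor interacts with the tf-idf sum. This is a sketch under stated assumptions, not the project's PHP code: coord is taken as q / Q, and tf and idf use the square-root and logarithmic forms of Lucene's default similarity; the function names, term counts, and collection size are made up for illustration.

```python
import math


def tf(freq):
    # Term-frequency factor (assumed form: square root of the raw count).
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Inverse document frequency (assumed form: 1 + ln(N / (df + 1))).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(query_terms, doc_term_freqs, doc_freqs, num_docs, boosts=None):
    # score = coord * sum over matched terms of tf * idf * boost
    boosts = boosts or {}
    matched = [t for t in query_terms if doc_term_freqs.get(t, 0) > 0]
    if not matched:
        return 0.0
    coord = len(matched) / len(query_terms)  # q / Q
    total = sum(
        tf(doc_term_freqs[t]) * idf(doc_freqs[t], num_docs) * boosts.get(t, 1.0)
        for t in matched
    )
    return coord * total


query = ["library", "hours"]
doc_freqs = {"library": 9, "hours": 4}  # documents containing each term
page_a = {"library": 4, "hours": 1}     # matches both query terms
page_b = {"library": 4}                 # matches only one query term

print(score(query, page_a, doc_freqs, num_docs=10))
print(score(query, page_b, doc_freqs, num_docs=10))
```

When every query term must match, q equals Q and coord is always 1, which is why the factor only matters for "any of the terms" searches.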
Completed:
  Ranking implementation complete
Left to do:
  Admin Panel to modify boosted pages/words
  Uses boost, but need to finalize how to modify boosting parameter
1. Admin modifies score of page relative to current score.
2. Admin specifies the position at which a page should appear for a one-term query.
Pros and Cons
Method 1: Modify relative to current score
+ More careful manipulation of score possible
+ Faster to code, more time to test
- More difficult to use
Method 2: Modify rank
+ Easier to use
- Adjustments only possible on one-word queries
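The two candidate methods can be compared with a short sketch. Assumptions: a page's boost is a single multiplier applied to its score, and the function names, the margin parameter, and the sample scores are hypothetical; the slides do not specify how the boosting parameter will actually be stored or edited.

```python
def boost_relative(current_boost, factor):
    # Method 1: scale the page's boost relative to its current value.
    if factor <= 0:
        raise ValueError("factor must be positive")
    return current_boost * factor


def boost_for_position(scores, page, position, margin=0.05):
    # Method 2: find a multiplier that moves `page` to a 1-based position,
    # given each page's unboosted score for a single query term.
    own = scores[page]
    if own <= 0:
        raise ValueError("page must already match the term")
    others = sorted((s for p, s in scores.items() if p != page), reverse=True)
    if not 1 <= position <= len(others) + 1:
        raise ValueError("position out of range")
    above = others[position - 2] if position >= 2 else None
    below = others[position - 1] if position <= len(others) else None
    if above is not None and below is not None:
        target = (above + below) / 2   # land between the two neighbours
    elif below is not None:
        target = below * (1 + margin)  # first place: just above the leader
    elif above is not None:
        target = above * (1 - margin)  # last place: just below the lowest
    else:
        target = own                   # only page in the results
    return target / own


scores = {"a": 4.0, "b": 2.0, "c": 1.0}  # made-up scores for one term
print(boost_relative(1.0, 1.5))
print(boost_for_position(scores, "c", 2))
```

The sketch also shows why Method 2 is tied to one-word queries: the multiplier is derived from the scores for one specific term, so a different query produces different neighbours and a different required boost.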
Task List for Final Milestone
Feedback
Confirmation and error messages will be displayed on the administrative page to improve usability.
Display stats page
Links for the relevant log pages will be added to the main administration page.
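As a sketch of what a statistics page could compute, the following summarizes a search log into the most frequent searches and the searches that returned no results. Assumptions: the log is a list of (query, result count) pairs, and the function name and sample entries are hypothetical; the format of the actual log pages is not described on the slides.

```python
from collections import Counter


def search_stats(log, top_n=3):
    # Normalize queries so "Bloomberg" and "bloomberg " count as the same search.
    normalized = [(query.strip().lower(), hits) for query, hits in log]
    most_frequent = Counter(q for q, _ in normalized).most_common(top_n)
    no_results = sorted({q for q, hits in normalized if hits == 0})
    return most_frequent, no_results


log = [
    ("Bloomberg", 5),
    ("bloomberg ", 4),
    ("factiva", 2),
    ("zzz", 0),
]
most_frequent, no_results = search_stats(log)
print(most_frequent)
print(no_results)
```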
To facilitate the indexing process, we will add a batch-adding feature to the main administration page.
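One possible shape for batch adding is a pasted or uploaded list with one page per line. This is only a sketch: the tab-separated layout (URL, abstract, category), the function name, and the example URLs are assumptions, since the input format for the feature has not been decided on these slides.

```python
import csv
import io


def parse_batch(text):
    # Each non-comment line: URL <tab> abstract <tab> category (last two optional).
    pages, errors = [], []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        url = row[0].strip()
        if not url.startswith(("http://", "https://")):
            errors.append((line_no, "not an http(s) URL"))
            continue
        abstract = row[1].strip() if len(row) > 1 else ""
        category = row[2].strip() if len(row) > 2 else ""
        pages.append({"full_url": url, "abstract": abstract, "category": category})
    return pages, errors


batch = (
    "# url\tabstract\tcategory\n"
    "http://example.edu/a.html\tGuide to A\tGuides\n"
    "\n"
    "ftp://example.edu/b\n"
    "http://example.edu/c.html\n"
)
pages, errors = parse_batch(batch)
print(pages)
print(errors)
```

Collecting errors per line, rather than stopping at the first bad entry, would let the administrative page report every problem in a batch at once.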
Adjust search results display
We will ensure that page descriptions contain no cut-off words and that the client is satisfied with the search results interface.
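One way to avoid cut-off words is to snap both ends of the excerpt to whitespace. The sketch below is an illustration in Python, not the planned change to the PhpDig source; the function name and sample text are hypothetical.

```python
def excerpt(text, start, length):
    # Take up to `length` characters from `start`, then shrink the window
    # so that it neither begins nor ends in the middle of a word.
    end = min(len(text), start + length)
    if start > 0 and not text[start - 1].isspace():
        # Window starts inside a word: skip forward to the next whitespace.
        while start < end and not text[start].isspace():
            start += 1
    if end < len(text) and not text[end].isspace():
        # Window ends inside a word: back up to the previous whitespace.
        while end > start and not text[end - 1].isspace():
            end -= 1
    # A single word longer than the window yields an empty string.
    return text[start:end].strip()


text = "Business databases available to library patrons"
print(excerpt(text, 3, 20))
print(excerpt(text, 0, 8))
```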
Limit by category
Search by category will be implemented.
Administrative function to add and remove categories
Adding and removing categories will be implemented and linked to the administrative page.
Administrative function to weight ranking
Manual ranking adjustments will be added so that the client is fully satisfied with the search results.
Access to the administration page will use Cornell University’s Web Authentication (CUWebAuth) for authentication.
Unit Testing and Integration Testing
Every unit that is implemented will be fully unit tested on our own computers, and also integrated into the rest of the code for integration testing.
Installation and Refinement
The final system will be installed well before the next milestone to avoid any delay.
The remaining time is reserved for last-minute minor changes to the system to ensure the client’s satisfaction.
Documentation and Training Slides
Our final milestone includes detailed documentation of the project, training slides, and an informal training session to help administrators learn to operate the system.
After careful testing and feedback, the search system will go live.