
JGSM Library Project - CS 501 1
Johnson Graduate School of Management Library Project
Clients:
Ken Bolton
Lynn Brown
Angela K. Horne
Don Schneder
Doris Smith
JGSM Library Reference Team
Project Team:
Jonathan Gong
Benson Lee
Man Fai Matthew Lee
Greg Leedberg
Liz Xu

Since the last presentation…
Tasks accomplished:
  Decided on using PHPDig as the backend
  Implemented many functional requirements
  Adjusted PHPDig code to improve ranking based on client requirements
  Discussed with the client additional functionality to be added to the system

Presentation Outline
New Requirements / Why PHPDig?
Implemented Functionality
  Abstract Display
  Advanced Search
  Administrative Features
  Ranking Adjustments
Task List for Final Milestone (Things to Do)
Demo of Current System

New Requirements
Boosting
Display Statistics Page
Batch Adding
Search Results Display
Add/Remove Categories

Why PHPDig? Non-technical
Client prefers using PHP/MySQL since both technologies are on their web server
JGSM Library site has fewer than 300 HTML pages
A requirement: database
Client involved in decision of continuing with PHPDig
Focus on maintainability and usability

Why PHPDig? Technical
PHPDig code is relatively short
PHPDig = Open Source = Free to modify
  Florida State University, Dept. of Biology: www.bio.fsu.edu/phpdig
  The Kiwi Search Engine: http://www.linknz.co.nz/ (123,000+ web sites indexed)
Ranking is similar to Lucene since they both use the same ranking algorithm (tf-idf)
PHPDig version 1.8.7: www.phpdig.net

Implemented Functionality: Abstract Display
Purpose
  Users can get a description written by a librarian/administrator
Implementation
  Modified PHPDig code to look for an abstract
  Added a table to the database: auxiliary
    spider_id : int
    full_url : string
    abstract : string
    category : string
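
The auxiliary table can be sketched as follows. This is a minimal illustration, not the project's code: SQLite stands in for MySQL, the column types and the primary key are assumptions, and abstract_for is a hypothetical helper name.

```python
import sqlite3

# Illustrative only: SQLite stands in for the project's MySQL database.
# Column names follow the slide; the types and primary key are assumptions.
SCHEMA = """
CREATE TABLE auxiliary (
    spider_id INTEGER PRIMARY KEY,  -- id of the indexed page
    full_url  TEXT NOT NULL,
    abstract  TEXT,                 -- description written by a librarian
    category  TEXT
)
"""

def open_db():
    """Create an in-memory database holding the auxiliary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def abstract_for(conn, spider_id):
    """Return the stored abstract for a page, or None if there is none."""
    row = conn.execute(
        "SELECT abstract FROM auxiliary WHERE spider_id = ?", (spider_id,)
    ).fetchone()
    return row[0] if row else None

conn = open_db()
conn.execute(
    "INSERT INTO auxiliary VALUES (?, ?, ?, ?)",
    (1, "http://example.org/guide.html", "Guide to business databases.", "Guides"),
)
print(abstract_for(conn, 1))
print(abstract_for(conn, 2))
```

When no abstract is stored for a page, the lookup returns None, so the results page could fall back to whatever description the search engine already produces.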

Example of Abstract Display

Example of Abstract Display (Cont’d)

Our Current Working Interface
We now have a functional interface that can perform searches and display results.
The interface has evolved from the prototype previously presented, based on feedback from our clients.

Evolved Interface
Started with the prototype presented for progress report 1 as target design.
Once we started working with PhpDig’s template system, we made some slight changes to the original target interface because of what PhpDig can handle.

Evolved Interface

Evolved Interface
After presenting this design to our clients and discussing possible alternatives, we jointly came up with the current working design:

Our Current Working Interface: Advanced Search

Our Current Working Interface: Search Results

How We Implemented the Interface
PhpDig uses a template system
  Allows us to write HTML code for the search page, and use special PhpDig tags to generate form controls, results, etc., within that page
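
The idea of a tag-based template can be sketched generically. The <tpl:name/> tag syntax and the render function below are hypothetical and do not reproduce PhpDig's actual tags; the sketch only shows hand-written HTML with placeholders that are filled in at request time.

```python
import re

# Hypothetical tag syntax; PhpDig's real template tags are not reproduced here.
TAG = re.compile(r"<tpl:(\w+)/>")

def render(template, values):
    """Replace each <tpl:name/> tag with values[name]; leave unknown tags as-is."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return TAG.sub(substitute, template)

page = '<h1>Search</h1>\n<tpl:form/>\n<div class="results"><tpl:results/></div>'
html = render(page, {"form": "<form>...</form>", "results": "<ol></ol>"})
print(html)
```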

How We Implemented the Interface
Some problems came up during this process:
Problem: Some of the static HTML generated automatically by PhpDig tags to produce the search form does not match our desired style.
Solution: We do not depend on PhpDig to generate all of the form HTML; some is hand-coded by us to match our style.

How We Implemented The Interface
Some problems arose during this process:
Problem: Some of the dynamic HTML generated by PhpDig tags also does not match our style.
Solution: We cannot hand-code this HTML (category drop-down, etc.), so we modified the PhpDig source code which is called in response to these tags so that the generated HTML matches our desired style.

Where To Go From Here
Based on future discussions with our client, we will continue to refine the interface towards an ideal goal.
More source-level changes to PhpDig to get the details right
  Example: Context currently cuts off words in the middle
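
One possible fix for the cut-off context is to widen the excerpt to the nearest word boundaries. The sketch below is an assumption about how that could work, not the change made to PhpDig; trim_to_words is a hypothetical name.

```python
def trim_to_words(text, start, end):
    """Widen [start, end) outward so the excerpt never splits a word."""
    start = max(0, min(start, len(text)))
    end = max(start, min(end, len(text)))
    # Move start left until the character before it is whitespace.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Move end right until it reaches whitespace or the end of the text.
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end].strip()

text = "The library provides access to business databases"
print(trim_to_words(text, 6, 20))  # prints "library provides"
```

Widening rather than shrinking keeps the matched search term inside the excerpt, at the cost of a slightly longer context line.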

Administrative Features
Implemented:
  Add a page
    Options: abstract & category
  Remove a page from database
  Update a page in database
    Options: update abstract & category
    Content is re-indexed

Administrative Features
To be Implemented:
  Manual ranking abilities
    Give a page more weight overall
    Give a page more weight for certain words
  Feedback
  Kerberos authentication

Administrative Features
To be Implemented: (continued)
  Display statistics
    Statistics useful to the administrators, such as most frequent searches, searches with no results, etc.
  Batch adding of pages
  Category Administration
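
The two statistics named above can be computed from a search log. The sketch below assumes a hypothetical log of (query, result_count) pairs; the actual log format is not given in the slides.

```python
from collections import Counter

def search_stats(log, top_n=3):
    """Summarize a search log given as (query, result_count) pairs."""
    # Normalize case and whitespace so "Annual Reports" and "annual reports" merge.
    normalized = [(q.strip().lower(), n) for q, n in log]
    frequent = Counter(q for q, _ in normalized).most_common(top_n)
    no_results = sorted({q for q, n in normalized if n == 0})
    return frequent, no_results

log = [
    ("Annual Reports", 12),
    ("annual reports", 12),
    ("hoovers", 0),
    ("SEC filings", 4),
]
frequent, no_results = search_stats(log, top_n=1)
print(frequent)    # most frequent searches with their counts
print(no_results)  # searches that returned nothing
```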

Ranking
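Improved from before, mostly complete
Formula similar to Lucene default now
Lucene default:
$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \sum_{t \in Q} \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t) \cdot \mathrm{getBoost}(t.\mathrm{field} \text{ in } d) \cdot \mathrm{lengthNorm}(t.\mathrm{field} \text{ in } d)$$
Our formula:
$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \sum_{t \in Q} \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t) \cdot \mathrm{getBoost}(t \text{ in } d)$$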

coord function
$$\mathrm{coord}() = \frac{q}{Q}$$
q is the # of query terms matched in document
Q is # terms in query
only relevant in search for “any of the terms”
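
The scoring described on the ranking slides (coord times the sum of tf, idf, and boost over the query terms) can be sketched as follows. The slides do not define tf and idf, so the square-root tf and the 1 + ln(N / (df + 1)) idf used here are assumptions borrowed from Lucene's classic defaults, and the boost dictionary is a hypothetical stand-in for getBoost.

```python
import math

def tf(term, doc_terms):
    """Term frequency factor: square root of the raw count (assumed)."""
    return math.sqrt(doc_terms.count(term))

def idf(term, docs):
    """Inverse document frequency (assumed form): 1 + ln(N / (df + 1))."""
    df = sum(1 for d in docs if term in d)
    return 1.0 + math.log(len(docs) / (df + 1))

def coord(query_terms, doc_terms):
    """q / Q: query terms matched in the document over terms in the query."""
    if not query_terms:
        return 0.0
    matched = sum(1 for t in query_terms if t in doc_terms)
    return matched / len(query_terms)

def score(query_terms, doc_terms, docs, boost=None):
    """coord * sum over query terms of tf * idf * boost (boost defaults to 1)."""
    boost = boost or {}
    total = sum(
        tf(t, doc_terms) * idf(t, docs) * boost.get(t, 1.0)
        for t in query_terms
    )
    return coord(query_terms, doc_terms) * total

docs = [
    ["annual", "report", "report"],
    ["annual", "guide"],
    ["library", "hours"],
]
query = ["annual", "report"]
for d in docs:
    print(d, round(score(query, d, docs), 3))
```

A document matching only one of the two query terms has its score halved by coord, which is why the factor matters only for "any of the terms" searches: when every term must match, coord is always 1.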

Current Progress
Completed:
  Ranking implementation complete
Left to do:
  Admin Panel to modify boosted pages/words
  Uses boost, but need to finalize how to modify boosting parameter

Boosting Methods
Two possibilities:
1. Admin modifies score of page relative to current score.
2. Specify position a page should appear given a one-term query.
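
The two possibilities can be compared in a short sketch. Both functions are hypothetical: the slides leave the boosting mechanism undecided, and the midpoint rule, the margin parameter, and the assumption of positive, untied scores are choices made only for this illustration.

```python
def boost_relative(current_boost, factor):
    """Method 1: scale a page's boost relative to its current value."""
    return current_boost * factor

def boost_for_position(scores, page, position, margin=1.01):
    """Method 2: multiplier that moves `page` to the 1-based `position`.

    `scores` maps page -> positive score for a single one-term query.
    Ties between competing pages are not handled.
    """
    others = sorted((s for p, s in scores.items() if p != page), reverse=True)
    if not 1 <= position <= len(others) + 1:
        raise ValueError("position out of range")
    if not others:
        return 1.0
    above = others[position - 2] if position > 1 else None
    below = others[position - 1] if position <= len(others) else None
    if above is None:
        target = below * margin        # first place: just beat the current top
    elif below is None:
        target = above / margin        # last place: stay just under the lowest
    else:
        target = (above + below) / 2   # otherwise land between the neighbours
    return target / scores[page]

scores = {"a": 4.0, "b": 2.0, "c": 1.0}
multiplier = boost_for_position(scores, "c", 2)
print(multiplier)  # factor to apply to page "c" for this one-term query
```

Method 2 has to be computed per query term, because the multiplier that yields a given position depends on the competing scores for that term.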

Pros and Cons
Method 1: Modify relative to current score
+ More careful manipulation of score possible
+ Faster to code, more time to test
- More difficult to use
Method 2: Modify rank
+ Easier to use
- Adjustments only possible on one-word queries

Task List for Final Milestone
Feedback
Confirmation and error messages will be displayed on the administrative page to improve usability.

Display stats page
Links for the relevant log pages will be added to the main administration page.

Batch adding
To facilitate the indexing process, we will add a batch-adding feature to the main administration page.
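
A batch-adding step might begin by cleaning the pasted list of pages before anything is indexed. The sketch below is an assumption about the input (one URL per line); parse_batch is a hypothetical helper, and the indexing itself is left out.

```python
from urllib.parse import urlsplit

def parse_batch(text):
    """Split pasted text into unique http(s) URLs and a list of rejected lines."""
    accepted, rejected, seen = [], [], set()
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlsplit(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            if url not in seen:  # drop exact duplicates, keep first occurrence
                seen.add(url)
                accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected

pasted = "http://example.org/a.html\n\nhttp://example.org/a.html\nnot a url\n"
accepted, rejected = parse_batch(pasted)
print(accepted)
print(rejected)
```

Reporting the rejected lines back to the administrator fits the feedback goal listed for the final milestone.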

Adjust search results display
The page description will no longer cut off words, and we will make sure the client is satisfied with the search results interface.

Limit by category
Search by category will be implemented.

Administrative function to add and remove categories
Adding and removing categories will be implemented and linked to the administrative page.

Administrative function to weight ranking
Manual ranking adjustments will be added so that the client is fully satisfied with the search results.

Authentication
Access to the administration page will be authenticated through Cornell University’s Web Authentication (CUWebAuth).

Unit Testing and Integration Testing
Every unit that is implemented will be fully unit tested on our own computers, and also integrated into the rest of the code for integration testing.

Installation and Refinement
The final system will be installed well before the next milestone to avoid any delay.
The remaining time is reserved for last-minute minor changes to the system to ensure the client’s satisfaction.

Documentation and Training Slides
Our final milestone includes detailed project documentation, training slides, and an informal training session to help administrators learn to operate the system.

Deployment
After careful testing and feedback, the search system will go live.

Timeline

Demo…

The End.
Questions? Comments?