
Post on 04-Jan-2016


Page 1:

Final Presentation

Industrial project 234313

Automatic tagging tool for Hebrew Wiki pages

Supervisors: Dr. Miri Rabinovitz, Dr. Haim Mizrahi

Academic coordinator: Prof. Michael Elad

Students: Eyal Sharabi Horwitz, Shiran Cohen

Page 2:

Project Objectives

This project is part of the overall development of an organizational Wiki meant for sharing information within the organization.

Our project’s objective is to serve as an automatic tagging tool for key phrases, based on an organizational taxonomy. The project is composed of two separate modules: a service module and a GUI module.

The Objectives of the Service Module:

Identify key phrases that relate to an organizational taxonomy in an unstructured text.

Develop and implement algorithms to identify and extract new key phrases from a given document.

Page 3:

Project Objectives – cont.

The Objectives of the Service Module – cont.

Present the findings in an Excel file to allow future analysis of the key phrases found by the automatic tagging tool.

The Objectives of the GUI Module:

Design an interface that enables the user to analyze the key phrases found by the automatic tagging tool:
• Insert a new key phrase into the taxonomy.
• Delete a key phrase suggested by the automatic tagging tool.
• Edit the text of a key phrase suggested by the automatic tagging tool before adding it to the taxonomy.

Present the rationale that led to the finding of a key phrase by the service module.

Allow the user to add new key phrases to the taxonomy.

Page 4:

Methodology

In-depth understanding of the morphologically analyzed documents and the taxonomy, and use of this information in the different tagging algorithms.

A literature survey was used to develop algorithms that present new key phrases to the user from a given document:
• Frequency based tagging algorithm: checks how frequently a key phrase appears in a given document and in the whole corpus.
• Location based tagging algorithm: gives a score to a key phrase based on its distance from the beginning and end of the document and its life span in the document.
• Noun tagging algorithm: gives a higher score to key phrases with multiple nouns.

Microsoft’s .NET WinForms API was used to create the GUI.

An Access DB was used to save the information about the key phrases used by the different algorithms, and to save the updated taxonomy.
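The three scoring ideas listed on this slide (frequency, location, noun count) can be illustrated with a minimal Python sketch. The slides do not give the actual formulas, so everything here is an assumption made for illustration: the TF-IDF-style weighting, the normalization of positions to the range 0..1, the cap of three nouns, the "NOUN" tag label, and all function names are hypothetical. The original tool was built on .NET and a Hebrew morphology analyzer, neither of which is reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Phrase = Tuple[str, ...]


def find_positions(phrase: Phrase, tokens: Sequence[str]) -> List[int]:
    """Return the token indices at which the phrase starts."""
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == phrase]


def frequency_score(phrase: Phrase, doc: Sequence[str],
                    corpus: Sequence[Sequence[str]]) -> float:
    """Assumed TF-IDF-style score: frequent in the document, rare in the corpus."""
    tf = len(find_positions(phrase, doc)) / max(len(doc), 1)
    docs_with_phrase = sum(1 for d in corpus if find_positions(phrase, d))
    idf = math.log((1 + len(corpus)) / (1 + docs_with_phrase)) + 1.0
    return tf * idf


def location_score(phrase: Phrase, doc: Sequence[str]) -> float:
    """Assumed score combining closeness to either edge with the life span."""
    positions = find_positions(phrase, doc)
    if not positions:
        return 0.0
    last_index = max(len(doc) - 1, 1)
    first = positions[0] / last_index   # relative position of first occurrence
    last = positions[-1] / last_index   # relative position of last occurrence
    edge_closeness = max(1.0 - first, last)
    life_span = last - first            # distance between first and last occurrence
    return (edge_closeness + life_span) / 2.0


def noun_score(pos_tags: Sequence[str], cap: int = 3) -> float:
    """Assumed score that grows with the number of nouns, up to a cap."""
    nouns = sum(1 for tag in pos_tags if tag == "NOUN")
    return min(nouns, cap) / cap
```

In this sketch a phrase is a tuple of tokens, and `pos_tags` stands in for the part-of-speech output that a morphology analyzer would supply for each token of the phrase.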

Page 5:

Achievements

The Service Module

Implemented an algorithm for identifying key phrases from the taxonomy in a given text, using an advanced screening process for similar key phrases.

Implemented several tagging algorithms used to suggest new key phrases to the user:
• Frequency, location and noun based tagging (presented in the methodology section).
• Foreign language tagging: tagging the foreign language phrases in the text.

Flexibility:
• GUI-process separation to allow portability and usage with various systems.
• Expansion of the taxonomy to effectively unlimited size.
• New tagging algorithms can be added easily to the process.
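As an illustration of the foreign language tagging mentioned on this slide, here is a minimal Python sketch. It assumes that "foreign" means runs of Latin-script words inside Hebrew text; the slides do not say how the tool actually detects foreign phrases, and the function name and pattern below are hypothetical.

```python
import re
from typing import List, Tuple

# One or more Latin-script words separated by single spaces (assumed definition
# of a "foreign language phrase" for this sketch).
_LATIN_RUN = re.compile(r"[A-Za-z][A-Za-z0-9\-]*(?: [A-Za-z][A-Za-z0-9\-]*)*")


def tag_foreign_phrases(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, phrase) for every run of Latin-script words."""
    return [(m.start(), m.end(), m.group()) for m in _LATIN_RUN.finditer(text)]


if __name__ == "__main__":
    sample = "המערכת נכתבה ב Visual Studio ומשתמשת ב Access"
    for start, end, phrase in tag_foreign_phrases(sample):
        print(start, end, phrase)
```

Returning character offsets alongside the phrase lets a caller highlight each match in the original text.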

Page 6:

Achievements – cont.

The GUI

An interface was created to enable the user to analyze the key phrases found by the automatic tagging tool:
• Insert a new key phrase into the taxonomy: the new key phrase is added under an existing main subject and secondary subject in the taxonomy hierarchy, or under new ones.
• Delete a key phrase.
• Edit the text of a key phrase suggested by the automatic tagging tool before adding it to the taxonomy.

Present the rationale that led to the finding of a key phrase by the service module.

Allow the user to add new key phrases to the taxonomy by marking the desired text in the document.

Page 7:

Achievements – cont.

The GUI – cont.

Algorithm selection window:
• Used to select the algorithms that will be run to find new key phrases in a given text.
• Allows the user to manage the parameters of the different algorithms, to give different weights to different algorithms, and to give different weights to phrases of different sizes so that the tagging process prefers phrases of a certain size.

Saving the findings for future use and analysis:
• Enables the user to save the current taxonomy into the DB for future use in other documents.
• Enables the user to save the current taxonomy and the new key phrases found by the automatic tagging tool to an Excel file for future analysis.

Page 8:

Achievements – cont.

Documentation provided:
• User’s manual
• Developers’ guide
• Inline documentation of the code

Page 9:

Example of the tagging process

A new document was loaded to the automatic tagging tool.

Pressing File -> Open loads a new file to the automatic tagging tool.

Page 10:

By pressing the “Initiate Tagging” button, the tagging process begins. Presented here are the tagging results of the taxonomy based tagging algorithms.

This key phrase was found by the automatic tagging tool as a taxonomy phrase.

The different taxonomy key phrases that were found in the text are presented in the hierarchy in which they appear in the taxonomy.

Initiate the tagging process.
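The taxonomy based step shown on this slide, finding known key phrases in the text and reporting them with their place in the hierarchy, can be sketched as follows. This is only an illustration under stated assumptions: the taxonomy is modeled as a nested dictionary (main subject, then secondary subject, then key phrases), matching is an exact, case-insensitive, longest-match-first comparison of tokens, and all names are hypothetical. The actual tool relies on a Hebrew morphology analyzer and a screening process for similar phrases, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple

# Assumed shape: main subject -> secondary subject -> list of key phrases.
Taxonomy = Dict[str, Dict[str, List[str]]]


class Match(NamedTuple):
    start: int              # index of the first matched token
    end: int                # index one past the last matched token
    phrase: str
    main_subject: str
    secondary_subject: str


def build_index(taxonomy: Taxonomy) -> Dict[Tuple[str, ...], Tuple[str, str]]:
    """Map each key phrase (as a token tuple) to its place in the hierarchy."""
    index = {}
    for main, secondaries in taxonomy.items():
        for secondary, phrases in secondaries.items():
            for phrase in phrases:
                index[tuple(phrase.lower().split())] = (main, secondary)
    return index


def find_taxonomy_phrases(tokens: Sequence[str], taxonomy: Taxonomy) -> List[Match]:
    """Scan the tokens left to right, preferring the longest taxonomy phrase."""
    index = build_index(taxonomy)
    if not index:
        return []
    max_len = max(len(key) for key in index)
    lowered = [t.lower() for t in tokens]
    matches: List[Match] = []
    i = 0
    while i < len(lowered):
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in index:
                main, secondary = index[key]
                matches.append(Match(i, i + n, " ".join(tokens[i:i + n]), main, secondary))
                i += n
                break
        else:
            i += 1  # no taxonomy phrase starts at this token
    return matches
```

Keeping the token offsets in each `Match` is what would let a GUI highlight the phrase in the document while also showing its main and secondary subject.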

Page 11:

The user can click the phrases that were found and see their location in the document and their location in the hierarchy of the organizational taxonomy.

When a key phrase is pressed, it is presented in the text in red.

Page 12:

The user can choose which of the implemented tagging algorithms to run and their weight in determining whether a phrase found in the document will be presented to the user as a new suggested key phrase.

Pressing Algorithm -> Algorithm selection allows the user to choose the advanced tagging algorithms to run.

The user can choose the algorithms to run and their weight in the total score. The user can also give a higher weight to phrases of a certain length.
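The weighting described on this slide can be sketched as a weighted average of per-algorithm scores, multiplied by a factor that depends on the phrase length. The slides do not state how the total score is actually computed, so the averaging, the length multiplier, the threshold, and all names below are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def combined_score(scores: Dict[str, float],
                   algorithm_weights: Dict[str, float],
                   length_weights: Dict[int, float],
                   phrase_length: int) -> float:
    """Weighted average over the selected algorithms, scaled by a length weight."""
    # An algorithm with weight 0 (or missing) is treated as not selected.
    active = {name: w for name, w in algorithm_weights.items()
              if w > 0 and name in scores}
    total_weight = sum(active.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(scores[name] * w for name, w in active.items()) / total_weight
    return weighted * length_weights.get(phrase_length, 1.0)


def suggest(candidates: Dict[str, Dict[str, float]],
            algorithm_weights: Dict[str, float],
            length_weights: Dict[int, float],
            threshold: float) -> List[Tuple[str, float]]:
    """Return candidate phrases whose total score reaches the threshold, best first."""
    ranked = []
    for phrase, scores in candidates.items():
        total = combined_score(scores, algorithm_weights, length_weights,
                               len(phrase.split()))
        if total >= threshold:
            ranked.append((phrase, total))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

For example, giving the frequency algorithm twice the weight of the location algorithm and a multiplier of 1.5 to two-word phrases pushes frequent two-word candidates to the top of the suggestion list.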

Page 13:

The new key phrases found by the automatic tagging tool are presented to the user, who can choose whether to approve or delete each of the suggested key phrases.

The new key phrases found by the automatic tagging tool are presented to the user.

The user can approve or delete a new key phrase by right-clicking the phrase.

Page 14:

If the user chooses to approve a certain key phrase, an editing window opens where the user decides where the new key phrase should be placed in the taxonomy hierarchy.

If the user selects approve, the editing window opens and the user is requested to choose a main and a secondary subject for the new key phrase in the taxonomy hierarchy.

Page 15:

The user can save all the new findings and the new taxonomy into an Excel file or into the DB for future use and analysis.

The current taxonomy and the new key phrases approved by the user can be saved for future analysis.

Page 16:

Conclusions

When developing a system, large or small, one must take the time to plan and create a high level design and not rush to implement the system.

A considerable amount of time should be dedicated to fully understanding the morphology analyzer’s output.

To optimize the system’s output, it should be tested on a large document corpus.

This course taught us a great deal about working with different software tools, developing a large system and working in a team.

Page 17:

Points for improvement

Choose a particular occurrence of a key phrase in the text based on a high number of key phrases surrounding it.

Integrate algorithms using advanced natural language processing tools for better understanding of the text.

Add machine learning abilities that enable the system to adjust the parameters of the different algorithms as the system analyzes more documents.