Supervisor: Mr. Phan Trường Lâm

Students: Vũ Nhật Linh, Lê Quang Hoàn, Nguyễn Duy Quyền, Hoàng Nam, Nguyễn Thế Anh

Capstone Project Documents Management


Post on 02-Apr-2015


Page 1:

Supervisor:

Mr. Phan Trường Lâm

Students: Vũ Nhật Linh, Lê Quang Hoàn, Nguyễn Duy Quyền, Hoàng Nam, Nguyễn Thế Anh

Capstone Project Documents Management

Page 2:

Team information

Page 3:

Agenda

Introduction

Project plan

System Requirement Specifications

System Analysis and Design

Testing

Deployment and User Guide

Summary

Demo and Q&A

Page 4:

Introduction

Initial Idea

Literature Review of Existing System

Proposal & Product

Page 5:

Initial Idea

Page 6:

Initial Idea

We decided to develop a new system that integrates the following functions:

Collecting documents

Organizing these documents

Extracting keywords

Ranking

Searching

Page 7:

Literature Review of Existing System

Methods that these websites use to build their systems:

Big database

Search

Ranking and highlighting of returned results

Comparing documents to detect plagiarism

Page 8:

Literature Review

Achievements of the existing systems

• Attractive

• Easy to use

• Speed & Reliability

• Quality Results

• Ensuring Security

• Awareness

Limitations of the existing systems

• Costs

• Privacy

Page 9:

Proposal

• Collect and manage Capstone projects

• Support looking up Capstone projects

• Avoid repeating and copying ideas

• Rank results

• Refer to other materials

• Friendly interface like Google

• Cheaper to build

• Free to use

• Public for everyone

• Inside and outside the University

Page 10:

Product

Web application

Mobile application (in future)

Page 11:

Project Plan

Development environment

Process

Project organization

Project schedule

Risk management

Page 12:

Development Environment

Hardware

• 1 GB of RAM, 100 GB of hard disk, Core 2 Duo 2.0 GHz

• 2 GB of RAM, 100 GB of hard disk, Core 2 Duo 2.0 GHz

Software

Page 13:

Process

The project follows the Waterfall model.

Page 14:

Project organization

Page 15:

Project organization

Controlling and Monitoring

• Meeting

• Assign task

• Tracking task

• Issue resolve

• Review task

• Report

Page 16:

Project organization

Communication control

Online activity

• Email

• Chat

• Phone

Offline activity

• Kick-off project

• Team building

Page 17:

Project Schedule

Overall plan

Page 18:

Risk Management

People risk

Estimation risk

Technology risk

Requirement risk

Schedule risk

Page 19:

System Requirement Specifications

User Requirements

System Requirements

Non-functional requirements

Page 20:

User Requirements

Lecturers and Students:

• Search project documents.

• Download documents.

Librarians:

• Edit profile.

• Search documents.

• Add/Edit/Delete document.

• Add/Edit/Delete category.

Administrator:

• Edit profile.

• Add/Edit/Delete account.

Page 21:

User Requirements

Other requirements

• Search results will be ranked.

• Each document has the following information:

Name

Author

Supervisor

Category

Description

Page 22:

User Requirements

• Input files:

Keyword file

Abstract file

Full document file

Other materials

Page 23:

System Requirements

The server communicates with client computers over HTTP and other standard protocols to complete service-based interactions.

Configuration

Server:

• Windows Server 2008 operating system

• .NET Framework 3.5

• SQL Server 2008

• IIS 7

Client:

• Web browser

Page 24:

Non-functional Requirements

Usability

Availability

Security

Reliability

Performance

Maintainability

Page 25:

System Analysis and Design

Architectural design

Detail design

Database design

Coding convention

Extract Keyword algorithm

Ranking

Page 26:

Architectural design

Overall architecture

MVC architecture design pattern

Page 27:

Detail design

CProDMS Component Diagram

Page 28:

Database design

Entity diagram

Page 29:

Coding convention

Follow:

• Microsoft .NET Library Standards

• FxCop rules and Code Analysis for Managed Code Warnings

Page 30:

Extract Keyword Algorithm

Introduction

Study Algorithm

Evaluation

Keyword Extraction from a Single Document using Word Co-occurrence Statistical Information (Yutaka Matsuo and Mitsuru Ishizuka, Dec. 10, 2003)

Page 31:

Algorithm – What is a keyword?

Position

Meaning

Frequency

Keyword

Page 32:

Algorithm – Step by step

Preprocessing

• Discard stop words

• Stem

• Extract frequency

Processing

• Select frequent terms

• Expected probability

• Calculate χ'² value

Output

Page 33:

Algorithm – Studying

Example:

Original Text

Information is the most powerful weapon in the modern society. Every day we are overflowed with a huge amount of data in form of electronic newspaper articles, emails, web pages and search results. Often, information we receive is incomplete, such that further search activities are required to enable correct interpretation and usage of this information.

Step 1: Discarded Stop Words

Information powerful weapon modern society day overflowed huge amount data electronic newspaper articles emails web pages search results Often information receive incomplete such further search activities required enable correct interpretation usage information

Step 2: Stemmed Words (using Porter Stemming Algorithm)

Informat power weapon modern societi day overflow huge amoun data electronic newspaper articl email web page search result Often informat receive incomplet such further search activ requir enable correct interpret usag informat
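The two preprocessing steps above can be sketched in Python as follows. This is a minimal illustration only: the `STOP_WORDS` set is a small hypothetical list chosen for the example, and `toy_stem` is a simplified suffix stripper standing in for the Porter Stemming Algorithm, which the project actually names. It will not reproduce the stemmed output shown on the slide.

```python
import re

# Hypothetical, deliberately small stop list (an assumption for this example).
STOP_WORDS = {
    "is", "the", "most", "in", "we", "are", "with", "a",
    "of", "and", "that", "to", "this", "every", "form",
}


def discard_stop_words(text):
    """Split text into alphabetic tokens and drop the ones in STOP_WORDS."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def toy_stem(word):
    """Strip one common suffix. A toy stand-in, NOT the Porter algorithm."""
    for suffix in ("ing", "ful", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


sentence = "Information is the most powerful weapon in the modern society."
kept = discard_stop_words(sentence)
stems = [toy_stem(w.lower()) for w in kept]
```

In a real pipeline the toy stemmer would be replaced by a full Porter implementation, and the stop list would be loaded from a curated resource.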

Page 34:

Algorithm – Studying

Select frequent terms

The top ten frequent terms (denoted as G) and their probabilities of occurrence, normalized so that the sum is 1.

According to the study, the number of keywords is about 10% of the number of terms in the document, and no more than 30 terms.
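Selecting the frequent terms G and normalizing their probabilities might look like the sketch below. The function name `frequent_terms` and the `top_n` parameter are assumptions made for illustration; the input is taken to be the list of stemmed terms left after preprocessing.

```python
from collections import Counter


def frequent_terms(terms, top_n=10):
    """Return {term: probability} for the top_n most frequent terms.

    Probabilities are normalized over the selected terms only,
    so they sum to 1 across G.
    """
    top = Counter(terms).most_common(top_n)
    total = sum(count for _, count in top)
    return {term: count / total for term, count in top}


g = frequent_terms(["search", "search", "rank", "data"], top_n=2)
```

Ties are broken by first appearance in the input, which is how `Counter.most_common` orders equal counts.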

Page 35:

Algorithm – Studying

Co-occurrence and Importance

Two terms that appear in the same sentence are considered to co-occur once.

Example:

The imitation game could then be played with the machine in question and the mimicking digital computer and the interrogator would be unable to distinguish them.

“imitation” and “digital computer” have one co-occurrence.
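A sentence-level co-occurrence count along these lines can be sketched in Python. The representation is an assumption: each sentence is given as a list of already extracted terms, and each unordered pair is stored under an alphabetically sorted tuple key.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for every pair of distinct terms, the sentences containing both.

    sentences: list of lists of terms. A pair is counted once per sentence,
    no matter how often each term repeats inside that sentence.
    """
    counts = Counter()
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    return counts


counts = cooccurrence_counts([
    ["imitation", "game", "machine", "digital computer", "interrogator"],
    ["digital computer", "machine"],
])
```

Here `counts[("digital computer", "imitation")]` is 1, matching the example, while `counts[("digital computer", "machine")]` is 2 because the pair shares two sentences.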

Page 36:

Algorithm – Studying

Co-occurrence and Importance

Page 37:

Algorithm – Studying

Co-occurrence and Importance

The degree of bias of co-occurrence can be used as an indicator of term importance.

Page 38:

Algorithm – Studying

The statistical value of χ² is defined in terms of the following quantities:

p_g: Unconditional probability of a frequent term g ∈ G (the expected probability)

n_w: The total number of co-occurrences of term w and the frequent terms G

freq(w, g): Frequency of co-occurrence of term w and term g
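For reference, the χ² statistic from the cited Matsuo and Ishizuka paper can be written with these three quantities as follows. The observed co-occurrence freq(w, g) is compared with the expected value n_w p_g for each frequent term g:

```latex
\chi^2(w) = \sum_{g \in G} \frac{\bigl(\mathrm{freq}(w, g) - n_w \, p_g\bigr)^2}{n_w \, p_g}
```

A large value means that the co-occurrence of w with the frequent terms is strongly biased away from what the unconditional probabilities would predict.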

Page 39:

Algorithm – Studying

We consider the length of each sentence and revise our definitions:

p_g: (the sum of the total number of terms in sentences where g appears) divided by (the total number of terms in the document)

n_w: The total number of terms in the sentences where w appears, including w

Page 40:

Algorithm – Studying

Page 41:

Algorithm – Studying

To measure the robustness of bias values, we use a function that subtracts the maximal term from the χ² value.
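A minimal Python sketch of the χ² score and its robust variant follows. It assumes χ²(w) is the sum over frequent terms g of (freq(w, g) − n_w·p_g)² / (n_w·p_g), as in the cited Matsuo and Ishizuka paper. The function and parameter names are hypothetical, and the inputs are plain dictionaries keyed by frequent term.

```python
def chi_square_terms(freq_w, n_w, p):
    """Per-frequent-term contributions to chi-square for one term w.

    freq_w: {g: freq(w, g)}, co-occurrence counts of w with each frequent term
    n_w:    total co-occurrence count of w with the frequent terms (must be > 0)
    p:      {g: p_g}, expected probability of each frequent term (each > 0)
    """
    return {
        g: (freq_w.get(g, 0) - n_w * p_g) ** 2 / (n_w * p_g)
        for g, p_g in p.items()
    }


def chi_square(freq_w, n_w, p):
    """Plain chi-square: the sum of all per-term contributions."""
    return sum(chi_square_terms(freq_w, n_w, p).values())


def robust_chi_square(freq_w, n_w, p):
    """Chi-square minus its largest single contribution."""
    contributions = chi_square_terms(freq_w, n_w, p).values()
    return sum(contributions) - max(contributions)


p = {"search": 0.5, "rank": 0.5}
freq_w = {"search": 9, "rank": 1}
plain = chi_square(freq_w, 10, p)
robust = robust_chi_square(freq_w, 10, p)
```

Dropping the largest contribution keeps a term from scoring highly only because it co-occurs with one single frequent term.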

Page 42:

Algorithm – Studying

Page 43:

Algorithm – Studying

To improve the extracted keywords, we cluster terms.

Two major approaches (Hofmann & Puzicha 1998) are:

Similarity-based clustering: If terms w1 and w2 have a similar distribution of co-occurrence with other terms, w1 and w2 are considered to be in the same cluster.

Pairwise clustering: If terms w1 and w2 co-occur frequently, w1 and w2 are considered to be in the same cluster.

E.g.: Monday is a day of the week. Tuesday is a day of the week. Wednesday is a day of the week.

Page 44:

Algorithm – Studying

Similarity-based clustering centers on the red circles.

Pairwise clustering focuses on the green circles.

Page 45:

Algorithm – Studying

Similarity-based clustering

Cluster a pair of terms according to the Jensen-Shannon divergence of their co-occurrence distributions.
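The Jensen-Shannon divergence between two co-occurrence distributions can be sketched in Python as below. This uses the standard textbook definition with natural logarithms. The dictionary representation and the function name are assumptions, and the clustering threshold is not reproduced here, so it is left to the caller.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions.

    p, q: {term: probability}, each summing to 1. The result lies between
    0 (identical distributions) and log 2 (no overlap at all).
    """
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence KL(a || m); zero-probability terms add nothing.
        return sum(v * math.log(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


monday = {"day": 0.5, "week": 0.5}
tuesday = {"day": 0.5, "week": 0.5}
same = jensen_shannon_divergence(monday, tuesday)
disjoint = jensen_shannon_divergence({"day": 1.0}, {"rank": 1.0})
```

Terms such as "Monday" and "Tuesday" that co-occur with the same words in the same proportions get a divergence of 0.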

Page 46:

Algorithm – Studying

Pairwise clustering

Cluster a pair of terms according to their mutual information.
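As an illustration, the mutual information of a term pair can be estimated from raw counts in its usual pointwise form, log(P(w1, w2) / (P(w1) P(w2))). The parameter names below are hypothetical, all counts are assumed to be taken over the same unit (for example, sentences), and the clustering threshold is again left to the caller.

```python
import math


def mutual_information(pair_count, count_w1, count_w2, total):
    """Pointwise mutual information of two terms, estimated from counts.

    pair_count: number of units in which w1 and w2 appear together
    count_w1:   number of units containing w1 (must be > 0)
    count_w2:   number of units containing w2 (must be > 0)
    total:      total number of units
    """
    if pair_count == 0:
        return float("-inf")  # the terms never co-occur
    # P(w1, w2) / (P(w1) * P(w2)) simplifies to the ratio below.
    return math.log(total * pair_count / (count_w1 * count_w2))


independent = mutual_information(5, 20, 25, 100)
associated = mutual_information(10, 20, 25, 100)
```

A value of 0 means the pair co-occurs exactly as often as independence predicts; positive values indicate terms that appear together more often than chance.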

Page 47:

Algorithm – Evaluation

Precision: Ratio of the number of correct keywords to the number of extracted keywords

Coverage: Ratio of the indispensable keywords in the list to all the indispensable terms

Frequency index: Average frequency of the keywords in the list
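The three evaluation measures can be computed as in the sketch below. The function names, the sample keyword lists, and the counts are illustrative assumptions, not results from the project. The extracted list is assumed to be non-empty and free of duplicates.

```python
def precision(extracted, correct):
    """Share of extracted keywords that are judged correct."""
    return len(set(extracted) & set(correct)) / len(extracted)


def coverage(extracted, indispensable):
    """Share of the indispensable terms that appear in the extracted list."""
    return len(set(extracted) & set(indispensable)) / len(indispensable)


def frequency_index(extracted, term_counts):
    """Average document frequency of the extracted keywords."""
    return sum(term_counts.get(k, 0) for k in extracted) / len(extracted)


# Illustrative data only.
extracted = ["search", "rank", "data", "web"]
correct = {"search", "rank", "web"}
indispensable = {"search", "rank", "keyword"}
term_counts = {"search": 3, "rank": 2, "data": 1, "web": 2}

p_value = precision(extracted, correct)
c_value = coverage(extracted, indispensable)
f_value = frequency_index(extracted, term_counts)
```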

Page 48:

Ranking – Why?

Ranking Result

Page 49:

Ranking

Page 50:

Ranking

We use the rank formula for a term in a collection of documents, from "Automatic Keyword Extraction for Database Search" (First examiner: Prof. Dr. techn. Dipl.-Ing. Wolfgang Nejdl; Second examiner: Prof. Dr. Heribert Vollmer; Supervisor: MSc. Dipl.-Inf. Elena Demidova):

R(t) = Fd(t) * log(1 + N/N(t)) (1)

Where:

R(t): Rank of term t in the whole collection

Fd(t): Frequency of term t in the given document

N: Total number of documents in the collection

N(t): Total number of documents that contain term t

Ranking formula:

Rank = d * Rd(t) / R(t) (2)

=> Rank = d * Rd(t) / (Fd(t) * log(1 + N/N(t))) (3)

Where:

d: Reliability coefficient

Rd(t): Rank of term t in the document, as extracted by the Extract Service
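Formulas (1) to (3) translate directly into code. The sketch below is one possible reading: the slide does not state the logarithm base, so the natural logarithm is assumed, and all function and parameter names are hypothetical.

```python
import math


def collection_rank(fd_t, n_docs, n_docs_with_t):
    """Formula (1): R(t) = Fd(t) * log(1 + N / N(t)). Natural log assumed."""
    return fd_t * math.log(1 + n_docs / n_docs_with_t)


def rank(d, rd_t, fd_t, n_docs, n_docs_with_t):
    """Formulas (2) and (3): Rank = d * Rd(t) / R(t).

    d:    reliability coefficient
    rd_t: rank of term t in the document, as produced by the extraction step
    """
    return d * rd_t / collection_rank(fd_t, n_docs, n_docs_with_t)


# Illustrative values only: term appears 3 times, in 10 of 100 documents.
r_t = collection_rank(fd_t=3, n_docs=100, n_docs_with_t=10)
score = rank(d=1.0, rd_t=2.0, fd_t=3, n_docs=100, n_docs_with_t=10)
```

Both functions require Fd(t) and N(t) to be greater than zero, which holds for any term that actually occurs in the given document.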

Page 51:

Searching

Page 52:

Testing

V-model

Page 53:

Testing

Page 54:

Testing

Test result

| No | Tester | Module code | Pass | Fail | Untested | N/A | Number of test cases |
|----|--------|-------------|------|------|----------|-----|----------------------|
| 1 | AnhNT | Master Page | 18 | 0 | 0 | 0 | 18 |
| 2 | AnhNT | Home Page | 12 | 0 | 0 | 0 | 12 |
| 3 | AnhNT | Search Result | 5 | 0 | 0 | 0 | 5 |
| 4 | AnhNT | User Account | 69 | 0 | 0 | 0 | 69 |
| 5 | AnhNT | Error Page | 8 | 0 | 0 | 0 | 8 |
| 6 | NamH | Category | 36 | 0 | 0 | 0 | 36 |
| 7 | NamH | Document | 47 | 0 | 0 | 0 | 47 |
| 8 | NamH | Authenticated | 81 | 0 | 0 | 0 | 81 |
| 9 | NamH | User Document Detail | 9 | 0 | 0 | 0 | 9 |
| | | Sub total | 285 | 0 | 0 | 0 | 285 |

Test coverage: 100.00 %

Test successful coverage: 100.00 %

Page 55:

Deployment

Package Source Code

Client side

Server side

Page 56:

User guide

Page 57:

Summary

Strong points

• Enthusiasm

• Creativity

• Coping with change

Weak points

• Lack of technical skills

• Lack of management skills

Lessons learned

• Improve technical & management skills

• Release the product on time despite limited time and resources

• Improve communication & problem-solving skills

Page 58:

Demo & Q&A

Page 59: