Knowledge Modeling from Software Documentation By Madhuri Gopal, G.S. Mahalakshmi, V. Vani Vijayan

Upload: esmond-sims

Post on 28-Dec-2015

Agenda:

• Objective
• Project overview
• Design Principles
• Technology Stack
• Approach and Methodology
• Execution Framework
• Modules Covered
• Results

Objective

The objective of this presentation is to understand the nuances of converting existing software documentation to an intelligent knowledge representation

Project Overview: Background

• Traditional development, deployment and maintenance of conventional software applications require higher quality with shorter time-to-market cycles to achieve customer delight.

• This involves a formal, explicit and conventional representation of the knowledge base shared across stakeholders.

• Existing SDLC documents do not support intelligent extraction and interpretation, either for downstream applications or for enhancements.

• There is a growing need for effective and efficient use of software artifacts to deliver enhanced traceability as future needs change.

Challenges in the existing systems

• More than 90% of existing software documentation is in the form of text.
• Knowledge engineers create knowledge representations from scratch, which makes existing representations difficult to reuse and enhance.
• Existing knowledge representation techniques require domain knowledge and have a steep learning curve.
• Differences in the conceptualization of the domain model lead to inconsistencies in its representation.

Design Principles

Open/Closed Principle
Software entities such as classes, modules and functions should be open for extension but closed for modification.

Dependency Inversion Principle
• High-level modules should not depend on low-level modules. Both should depend on abstractions.
• Abstractions should not depend on details. Details should depend on abstractions.

Design Principles contd…

Single Responsibility Principle
A class should have only one reason to change.

Liskov Substitution Principle
Derived types must be completely substitutable for their base types.
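
As an illustration only, the four principles can be read against a processing pipeline like the one this deck goes on to describe. The sketch below is in Python rather than the project's Java, and every class name in it (Stage, Lowercase, CollapseWhitespace, Pipeline) is hypothetical. It shows the general shape of the principles, not the authors' actual design.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class Stage(ABC):
    """Abstraction that both the pipeline and the concrete stages depend on."""

    @abstractmethod
    def run(self, text: str) -> str:
        ...


class Lowercase(Stage):
    # Hypothetical stage with a single responsibility: case folding.
    def run(self, text: str) -> str:
        return text.lower()


class CollapseWhitespace(Stage):
    # Hypothetical stage added as an extension; Pipeline is not modified.
    def run(self, text: str) -> str:
        return " ".join(text.split())


class Pipeline:
    """High-level module that knows only the Stage abstraction."""

    def __init__(self, stages: Sequence[Stage]) -> None:
        self.stages = list(stages)

    def run(self, text: str) -> str:
        # Any Stage subclass can be substituted here without special cases.
        for stage in self.stages:
            text = stage.run(text)
        return text


pipeline = Pipeline([Lowercase(), CollapseWhitespace()])
result = pipeline.run("The  System SHALL   log errors")
```

Adding a new processing step means writing one more Stage subclass and listing it; Pipeline itself stays closed for modification.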

Technology Stack

The system follows a two-tier architecture.

Front end: Java
Back end: Files

Development Hardware

Processor: Intel(R) Core™ 2 Duo CPU T6400 @ 2.00 GHz
Memory (RAM): 4 GB
System type: 32-bit operating system

Tools used

CoreNLP: Stanford package for natural language processing (NLP)
ConExp: open-source tool for creating formal concept lattices

Approach and Methodology

• Software prototyping (incremental prototyping) is the development methodology used.

• The final product is built as separate prototypes.
• At the end, the separate prototypes are merged into an overall design.

• The steps are:
a) Identification of basic requirements
b) Development of the initial prototype
c) Review of the prototype
d) Revision and enhancement of the prototype

Overall Architecture

Modules covered

1. Part-of-speech (POS) tagging using a maximum-entropy tagger.

2. Lemmatization to reduce the relevant terms extracted by POS tagging to their lemma forms.

3. Named Entity Recognition (NER) using Conditional Random Fields (CRF) with Gibbs sampling for entity identification and extraction.

4. Parsing to determine the grammatical structure with respect to a formal grammar, using a factored model.

Modules covered contd…

5. Coreference resolution using tiers of deterministic models to determine the relative importance of different terms.

6. Querying and manipulation of natural language text.

7. Formal Concept Analysis to derive the relationships between attributes and objects, and among the attributes themselves.

8. Conversion of the formal concept lattice to XML for extraction of the knowledge representation.

Input Sources

Software engineering documents that are part of the MIL-STD-498 software development standard are used as input, consisting of:

• Computer Operation Manual (COM)
• Computer Programming Manual (CPM)
• Database Design Description (DBDD)
• Firmware Support Manual (FSM)
• Interface Design Description (IDD)
• Interface Requirements Specifications (IRS)
• Operational Concept Description (OCD)
• Software Centre Operator Manual (SCOM)
• Software Design Description (SDD)
• Software User Manual (SUM)
• Software Version Description (SVD)

Input Sources contd…

• Software Development Plan (SDP)
• Software Input/Output Manual (SIOM)
• Software Installation Plan (SIP)
• Software Product Specification (SPS)
• Software Requirements Specification (SRS)
• System/Subsystem Design Description
• System/Subsystem Specification
• Software Test Description (STD)
• Software Test Plan
• Software Test Report (STR)
• Software Transition Plan (STrP)

Algorithm

Step 1:
Tagger1 = POS_Tagging_Function(SRS)
Tagger2 = POS_Tagging_Function(SDD)
Tagger3 = POS_Tagging_Function(STD)

Step 2:
Lemma_Form1 = Lemma_construction(Tagger1)
Lemma_Form2 = Lemma_construction(Tagger2)
Lemma_Form3 = Lemma_construction(Tagger3)

Step 3:
NER1 = CRF_Gibbs_Function(Lemma_Form1)
NER2 = CRF_Gibbs_Function(Lemma_Form2)
NER3 = CRF_Gibbs_Function(Lemma_Form3)

Step 4:
Parse1 = Parser(NER1)
Parse2 = Parser(NER2)
Parse3 = Parser(NER3)

Algorithm contd…

Step 5:
CoRef1 = Coreference_Resolution(Parse1)
CoRef2 = Coreference_Resolution(Parse2)
CoRef3 = Coreference_Resolution(Parse3)

Step 6:
TREE_NODE = Query_Manipulation_function(CoRef1, CoRef2, CoRef3)

Step 7:
Concept_Lattice = FCA(context, concept, TREE_NODE)

Step 8:
XML_DOC = XML_Convert(Concept_Lattice)
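
The data flow of Steps 1 to 5 can be sketched as a chain in which each document's output feeds only its own next stage. This is a minimal Python sketch under stated assumptions: the stage functions are placeholders that merely record their name, standing in for the real NLP steps, and the names run_stages and make_tracer are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List

Stage = Callable[[List[str]], List[str]]


def run_stages(document: List[str], stages: List[Stage]) -> List[str]:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda value, stage: stage(value), stages, document)


def make_tracer(name: str) -> Stage:
    """Placeholder stage: appends its name instead of doing real NLP."""
    def stage(value: List[str]) -> List[str]:
        return value + [name]  # new list, so the input is left untouched
    return stage


# Order follows Steps 1 to 5: POS tagging, lemmas, NER, parsing, coreference.
STAGE_NAMES = ["pos", "lemma", "ner", "parse", "coref"]
stages = [make_tracer(name) for name in STAGE_NAMES]

# One independent chain per input document (SRS, SDD, STD).
documents: Dict[str, List[str]] = {"SRS": [], "SDD": [], "STD": []}
annotated = {name: run_stages(doc, stages) for name, doc in documents.items()}
```

Only after all three chains finish would their results be combined, as Step 6 does with CoRef1, CoRef2 and CoRef3.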

Implementation Steps

The algorithm is mapped to the following series of steps:

• Collection of existing software documents

a) Software Requirements Specification (SRS)
This document contains a set of use cases that describe system-user interaction, along with non-functional requirements such as design constraints and quality standards.

b) Software Design Document (SDD)
The SDD shows how the software system will be structured to represent the software components, interfaces and data necessary for the implementation phase.

c) Software Testing Document (STD)
It specifies the form of a set of documents for use in different stages of software testing.

Implementation Steps contd…

• Extraction of relevant knowledge from the SRS, SDD and STD by using the following sequence of natural language processing steps:

• POS tagging
• Lemmatization
• Named Entity Recognition
• Syntactic Parsing
• Coreference Resolution

Input: SRS, SDD, STD
Output: Annotated Text Corpora
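
With Stanford CoreNLP, a sequence like this is typically requested through an "annotators" property. The sketch below only renders that property line; it assumes a CoreNLP 3.x-style configuration in which tokenize and ssplit (tokenization and sentence splitting, not listed on the slide) are prerequisites, and in which dcoref is the deterministic coreference annotator. The helper to_properties is hypothetical, and the deck does not state which configuration the authors actually used.

```python
# Annotator names as documented by Stanford CoreNLP (assumed 3.x era).
# tokenize and ssplit are prerequisites added here; the rest mirror the slide.
ANNOTATORS = ["tokenize", "ssplit", "pos", "lemma", "ner", "parse", "dcoref"]


def to_properties(annotators):
    """Render the annotator list as a single .properties line."""
    return "annotators = " + ", ".join(annotators) + "\n"


PROPERTIES_TEXT = to_properties(ANNOTATORS)
```

The order matters: each annotator relies on the output of the ones listed before it.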

Annotated SRS

Annotated SDD

Annotated STD

Implementation Steps contd…

Querying and manipulation of annotated text corpora and conversion to tree data structures

• This step uses query manipulation tools to extract the relevant knowledge from the annotated text corpora.

• Verb-subject, verb-object and verb-PP-complement pairs are extracted, and the syntactic dependencies among verb subject, verb, verb object and PP complement are used to derive a meaningful hierarchical relationship.

Input: Annotated SRS, SDD, STD
Output: Tree Data Structure Representation
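
A minimal sketch of this extraction, assuming the parser output is available as (relation, head, dependent) triples labelled in the style of Stanford typed dependencies (nsubj, dobj, collapsed prep_*). The edges below are hand-written for the sentence "The user submits the form to the server", and build_verb_tree is a hypothetical helper, not the query tool used in the project.

```python
from collections import defaultdict

# Hand-written dependency edges (relation, head, dependent), lemmatized.
EDGES = [
    ("nsubj", "submit", "user"),
    ("dobj", "submit", "form"),
    ("prep_to", "submit", "server"),
    ("det", "form", "the"),  # ignored: not one of the three pair types
]


def build_verb_tree(edges):
    """Group subjects, objects and PP complements under their verb."""
    tree = defaultdict(
        lambda: {"subject": [], "object": [], "pp_complement": []}
    )
    for relation, head, dependent in edges:
        if relation == "nsubj":
            tree[head]["subject"].append(dependent)
        elif relation == "dobj":
            tree[head]["object"].append(dependent)
        elif relation.startswith("prep_"):
            tree[head]["pp_complement"].append(dependent)
    return dict(tree)


VERB_TREE = build_verb_tree(EDGES)
```

Each verb becomes a parent node with three child lists, which is one simple way to hold the hierarchical relationship before it is turned into a formal context.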

Implementation Steps contd…

Formation of Concept Lattice using Formal Concept Analysis

• The hierarchical information and syntactic dependencies obtained through NLP relate the set of verbs, which act as the objects, to the verb subjects, verb objects and verb PP complements, which act as the set of attributes.

• This relationship is written in the form of a matrix and given as input to ConExp, which transforms the matrix into a concept lattice.

Input: Tree Data Structure Representation
Output: Formal Concept Lattice
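
To make the lattice construction concrete, the sketch below enumerates the formal concepts of a small context by closing the object attribute sets under intersection. It is a naive textbook procedure, not ConExp's implementation, and the three verbs with their "subj:" and "obj:" attributes are invented sample data.

```python
# Invented sample context: verb (object) -> set of attributes.
CONTEXT = {
    "submit": {"subj:user", "obj:form"},
    "validate": {"subj:system", "obj:form"},
    "store": {"subj:system", "obj:record"},
}


def formal_concepts(context):
    """Return every (extent, intent) pair of the formal context."""
    all_attributes = frozenset().union(*context.values())
    # Intents are exactly the intersections of object attribute sets,
    # together with the full attribute set (intent of the empty extent).
    intents = {all_attributes}
    for attrs in context.values():
        intents |= {intent & attrs for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(
            obj for obj, attrs in context.items() if intent <= attrs
        )
        concepts.append((extent, intent))
    return concepts


CONCEPTS = formal_concepts(CONTEXT)
```

For this sample the result has seven concepts, for example the pair ({submit, validate}, {obj:form}), which groups the two verbs that act on a form.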

Formal Concept Lattice

• The topmost element indicates the object that has no attributes.

• The bottommost element indicates the object that has all attributes.

• The nodes in blue indicate the objects.

• The nodes in orange depict the attributes.
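
In standard Formal Concept Analysis terms, the topmost concept contains every object together with only the attributes they all share, and the bottommost concept contains every attribute together with only the objects that have all of them. The sketch below computes both for an invented sample context; the helper names are hypothetical.

```python
# Invented sample context: verb (object) -> set of attributes.
CONTEXT = {
    "submit": {"subj:user", "obj:form"},
    "validate": {"subj:system", "obj:form"},
    "store": {"subj:system", "obj:record"},
}


def common_attributes(context, objects):
    """Attributes shared by every object in the given set."""
    every_attribute = set().union(*context.values())
    return {
        a for a in every_attribute if all(a in context[o] for o in objects)
    }


def objects_having(context, attributes):
    """Objects that carry every attribute in the given set."""
    return {o for o, attrs in context.items() if set(attributes) <= attrs}


all_objects = set(CONTEXT)
all_attributes = set().union(*CONTEXT.values())

# Top: all objects, plus whatever attributes they have in common.
TOP = (all_objects, common_attributes(CONTEXT, all_objects))
# Bottom: all attributes, plus whatever objects have every one of them.
BOTTOM = (objects_having(CONTEXT, all_attributes), all_attributes)
```

In this sample no attribute is shared by all three verbs and no verb has all four attributes, so the top concept has an empty attribute set and the bottom concept has an empty object set.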

Implementation Steps contd…

Conversion of formal concept lattice to XML

• The set of all attributes and their values is extracted for each object.

• This provides an intermediate representation of the concept hierarchy before it is transformed into a knowledge representation.

Input: Formal Concept Lattice
Output: XML Format

Implementation Steps contd…

Pseudocode for conversion of formal concept lattice to XML

Let n be the total number of objects and m be the total number of attributes.

For j = 1 to n
    Form the XML element with head = Ij
    For k = 1 to m
        If Ak is an attribute of Ij,
            add Ak to the list of attributes of the element for Ij
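
The pseudocode can be sketched in Python with the standard library's xml.etree.ElementTree. The element and attribute names ("lattice", "object", "attribute", "name") and the function context_to_xml are assumptions made for illustration; the deck does not show the actual XML schema.

```python
import xml.etree.ElementTree as ET


def context_to_xml(objects, attributes, incidence):
    """Build one XML element per object, listing the attributes it has.

    incidence[j][k] is truthy when attributes[k] belongs to objects[j].
    """
    root = ET.Element("lattice")  # hypothetical root element name
    for j, obj in enumerate(objects):
        element = ET.SubElement(root, "object", name=obj)
        for k, attribute in enumerate(attributes):
            if incidence[j][k]:
                ET.SubElement(element, "attribute", name=attribute)
    return ET.tostring(root, encoding="unicode")


# Invented sample data: two verbs and three attributes.
OBJECTS = ["submit", "store"]
ATTRIBUTES = ["subj:user", "subj:system", "obj:form"]
INCIDENCE = [
    [True, False, True],   # submit
    [False, True, False],  # store
]

XML_DOC = context_to_xml(OBJECTS, ATTRIBUTES, INCIDENCE)
```

The two nested loops correspond directly to the j and k loops of the pseudocode, with the incidence matrix playing the role of the "Ak is an attribute of Ij" test.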

Conclusion

• Software documentation practices vary among different organizations.
• 53% of organizations deliver consistent software to the maintenance phase.
• 16% update their documentation at all levels.
• 53% of organizations have user manuals consistent with the system state.
• 42% revise and modify regression test case repositories.
• 11% achieve full traceability among system documents, and only 5% have achieved traceability of change.

On average, a software cost saving of 10-15% is expected, depending on the size and complexity of the software documentation.

Thank You