Knowledge Management Systems: Development and Applications
Part III: Case Studies and Future
Hsinchun Chen, Ph.D.
McClelland Professor,
Director, Artificial Intelligence Lab and Hoffman E-Commerce Lab
The University of Arizona
Founder, Knowledge Computing Corporation
Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, NCSA, HP, SAP
Dr. Hsinchun Chen, University of Arizona, USA
Knowledge Management Systems: Case Studies
Multi-lingual Knowledge Portal (1M):
Meta searching, post-retrieval analysis, summarization,
categorization, AI Lab toolkits
• Knowledge portals are online search systems that provide a large amount of information resources and services within a specific domain.
– Providing frequently updated and highly domain-specific information.
– Providing efficient and precise search services.
– Providing advanced analysis functions that help users find the information they need among huge amounts of data.
– Providing additional tools, such as personalization and alerting systems, to facilitate searching tasks.
NanoPort: Knowledge Portal for Nanotechnology Researchers
• Goal:
– Providing information services to nanotechnology researchers.
– The design of the content and functions is based on feedback from Nanoscale Science and Engineering (NSSE) experts.
• Content:
– 1,000,000 high-quality nanotechnology-related webpages in the database.
– Meta-search of 4 search engines, 5 online databases and 3 online journals.
• Key Features:
– Dynamic summarization
– Folder display
– Visualization using self-organizing map (SOM)
– Patent analysis
• Funding:
– US National Science Foundation (NSF) Nano Initiative
• Demo:
– http://nanoport.org/
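Meta-searching several sources means merging their ranked result lists and collapsing duplicates. The following is a minimal sketch of one way to do that, not NanoPort's actual implementation: the round-robin interleaving, the `normalize` rule, and the source names and URLs are all assumptions made for illustration.

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparison key so the same page is counted once."""
    parts = urlsplit(url.strip())
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"


def meta_search(result_lists: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Interleave ranked result lists round-robin and merge duplicates.

    Returns (url, sources) pairs in merged rank order.
    """
    merged: dict[str, tuple[str, list[str]]] = {}
    sources = list(result_lists)
    for rank_row in zip_longest(*result_lists.values()):
        for source, url in zip(sources, rank_row):
            if url is None:  # this source has no result at this rank
                continue
            key = normalize(url)
            if key in merged:
                merged[key][1].append(source)  # same page, another source
            else:
                merged[key] = (url, [source])
    return list(merged.values())


# Hypothetical ranked results from two sources.
results = {
    "engine_a": ["http://example.org/nano/", "http://example.org/tubes"],
    "database_b": ["http://EXAMPLE.org/nano", "http://example.org/quantum-dots"],
}
merged = meta_search(results)
```

Keeping the list of contributing sources per page allows a result page to be organized by meta-searcher, as the portals do.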
Input keywords
Select search engines
Select online databases
Select online journals
Folder display
Visualization using SOM
Summary
The original page
Highlight the summary in the original page with the corresponding color
Click on a summary sentence to jump to its position in the original page
Summarize result dynamically
MedTextus: English Medical Intelligence
• Goal:
– Providing information services to researchers in the medical domain.
• Content:
– Meta-search of 5 large medicine-related online databases and journals.
• Key Features:
– Keyword suggester
– Folder display
– Visualization using SOM
• Funding:
– US National Library of Medicine (NLM)
• Demo:
– http://ai23.bpa.aizona.edu/medtextus/
Keyword suggested by the system
Result page
Visualization with SOM
Folder display
Select databases
Input keywords
Keyword suggester
Advanced search options
eBizPort: English Business Intelligence
• Goal:
– Providing business, trading and financial information services to commercial users.
• Content:
– 500,000 high-quality webpages in the database.
– Meta-search of 10 authoritative online business magazines.
• Key Features:
– Search by date
– Keyword suggester
– Dynamic summarization
– Folder display
– Visualization using SOM
• Demo:
– http://ai18.bpa.arizona.edu:8080/ebizport/
Date of the result page
Result page
Folder display and SOM
Keyword suggested by the system
Keyword suggester
Limit the date of the result pages
Chinese Medical Intelligence (CMI)
• Goal:
– Providing medical and health information services to both researchers and the public.
• Content:
– 350,000 high-quality medical-related webpages collected from mainland China, Hong Kong and Taiwan.
– Meta-search of 3 large general Chinese search engines.
• Key Features:
– Built-in Simplified/Traditional Chinese encoding conversion
– Dynamic summarization for both Simplified and Traditional Chinese
– Automatic categorization
– Visualization using SOM
• Demo:
– http://128.196.40.169:8000/gbmed/
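Simplified/Traditional conversion involves two steps: changing the byte encoding and mapping characters between the two scripts. The sketch below assumes Big5 for Traditional pages and GB2312 for Simplified pages (the slides do not name the encodings) and uses a four-entry character table purely for illustration; a usable converter needs a table with thousands of entries and must handle one-to-many mappings.

```python
# Tiny illustrative Traditional -> Simplified table (an assumption, not the
# portal's data); characters not listed are passed through unchanged.
TRAD_TO_SIMP = {"醫": "医", "學": "学", "藥": "药", "國": "国"}


def big5_to_gb2312(raw: bytes) -> bytes:
    """Decode Big5 bytes, map characters to Simplified, re-encode as GB2312."""
    text = raw.decode("big5")
    simplified = "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)
    return simplified.encode("gb2312")


page_bytes = "中國醫藥".encode("big5")   # stand-in for a fetched Traditional page
converted = big5_to_gb2312(page_bytes)
```

Converting every result to one script before indexing is what lets a single query return pages from mainland China, Hong Kong and Taiwan together.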
Simplified Chinese summary
Traditional Chinese summary
Chinese folder display
Chinese visualization with SOM
Simplified/Traditional Chinese summarization
Select websites from mainland China, Hong Kong and Taiwan
Select search engines from mainland China, Hong Kong and Taiwan
Results are from both Simplified and Traditional Chinese
Original encoding of the result
Traditional Chinese results have been converted into Simplified Chinese
Chinese Business Intelligence (CBI)
• Goal:
– Providing business, trading and financial information services to Chinese commercial users.
• Content:
– 300,000 high-quality webpages collected from mainland China, Hong Kong and Taiwan.
• Key Features:
– Built-in Simplified/Traditional Chinese encoding conversion
– Dynamic summarization for both Simplified and Traditional Chinese
– Folder display
– Visualization using SOM
• Demo:
– http://ai14.bpa.arizona.edu:8081/nanoport/
The largest business, trading and financial websites in mainland China, Hong Kong and Taiwan
Both Simplified and Traditional results are returned
Chinese summarizer
Simplified Chinese summary
Traditional Chinese summary
Chinese folder display
Chinese visualization with SOM
Spanish Business Intelligence Portal
Meta searches 7 major sources and provides searching of its own collection (PIN)
Supports Boolean searching and allows the display of 10, 20, 30, 50, or 100 results per meta-searcher
Keyword suggestion from Scirus and Concept Space
Detailed directory of Spanish business resources on the Web
Keyword:
comercio electronico
Search, Organize, or Visualize results
Results organized by meta-searcher
Summarize in 3 or 5 sentences
Automatic keyword suggestion
Search Page
Result Page
Summarizer
A three-sentence summary on the left
Original page shown on the right
Categorizer
Web pages grouped by key phrases extracted by mutual information algorithm (non-exclusive categorization)
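As a rough illustration of phrase-based, non-exclusive categorization, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI) and files each page under every key phrase it contains. PMI over adjacent pairs is a stand-in chosen for brevity; the slides do not specify the exact mutual information formulation, and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def key_phrases(docs: list[str], top_n: int = 2) -> list[str]:
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    words, pairs = Counter(), Counter()
    for doc in docs:
        tokens = doc.lower().split()
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    n_words, n_pairs = sum(words.values()), sum(pairs.values())

    def pmi(pair):
        a, b = pair
        p_ab = pairs[pair] / n_pairs
        return math.log(p_ab / ((words[a] / n_words) * (words[b] / n_words)))

    # Ignore pairs seen only once: PMI overrates rare coincidences.
    frequent = [p for p in pairs if pairs[p] >= 2]
    ranked = sorted(frequent, key=pmi, reverse=True)
    return [" ".join(p) for p in ranked[:top_n]]


def categorize(docs: list[str], phrases: list[str]) -> dict[str, list[int]]:
    """File each document under every phrase it contains (non-exclusive)."""
    folders = defaultdict(list)
    for index, doc in enumerate(docs):
        for phrase in phrases:
            if phrase in doc.lower():
                folders[phrase].append(index)
    return dict(folders)


# Hypothetical page titles.
docs = [
    "comercio electronico en mexico",
    "comercio electronico y banca digital",
    "banca digital para empresas",
]
phrases = key_phrases(docs)
folders = categorize(docs, phrases)
```

In this example the second page lands in both folders, which is what "non-exclusive" means here.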
Visualizer
Web pages visualized by self-organizing map (SOM) algorithm
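The core SOM training loop can be sketched in a few lines. This is a simplified, one-dimensional map with linear initialization and linearly decaying learning rate and neighborhood width; the portals display a two-dimensional map, and the unit count, schedules, and two-term document vectors below are assumptions for illustration only.

```python
import math


def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(samples, n_units=4, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a one-dimensional SOM and return the unit weight vectors."""
    first, last = samples[0], samples[-1]
    # Linear initialization: spread the units evenly between two samples.
    weights = [
        [a + (b - a) * i / (n_units - 1) for a, b in zip(first, last)]
        for i in range(n_units)
    ]
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for x in samples:
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # learning rate decays
            sigma = sigma0 * (1.0 - frac) + 0.1     # neighborhood shrinks
            winner = best_unit(weights, x)
            for i, w in enumerate(weights):
                # Units near the winner on the map move more than distant ones.
                h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


# Hypothetical term-weight vectors for four pages on two topics.
samples = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
weights = train_som(samples)
units_a = {best_unit(weights, x) for x in samples[:2]}
units_b = {best_unit(weights, x) for x in samples[2:]}
```

Pages on the same topic end up on the same or neighboring units, which is what makes the map usable as a visual folder of results.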
Search Page Spanish Business Taxonomy
Web sites about the topic “Electronic Commerce” in Spanish speaking countries
Arabic Medical Intelligence Portal
Provides a virtual Arabic keyboard to facilitate input
Search Page Result Page
Categorizer
Visualizer
Lessons Learned
• The content selection and functionality design of a knowledge portal should meet the needs of real users.
• Using meta-search together with other traditional data collection methods can improve recall without sacrificing the precision of the knowledge portal.
• The structure of the webpage may introduce noise into the dynamic summary.
• The AI Lab toolkits support scalable multi-lingual spidering, indexing, searching, summarization, and categorization.
• New Spanish and Arabic portals completed
• New cross-lingual web retrieval engine completed
Biomedical Informatics (10M):
Biomedical content, biomedical ontologies,
linguistic phrasing, categorization, text mining
HelpfulMED Search of Medical Websites
HelpfulMED search of Evidence-based Databases
What does database cover?
Search which databases?
How many documents?
Enter search term
Consulting HelpfulMED Cancer Space (Thesaurus)
Enter search term
Select relevant search terms
New terms are posted
Search again...
Or find relevant webpages
Browsing HelpfulMED Cancer Map
1. Visual Site Browser
2. Top level map
3. Diagnosis, Differential
4. Brain Neoplasms
5. Brain Tumors
Genescene Overview
• Text Mining: process Medline abstracts and extract gene relations automatically from the text
• Data Mining: process gene expression data (and existing knowledge) and use different algorithms to extract regulatory networks
• Interface & Visualization: allow searching for keywords; display a map of the relations extracted from the text and/or from the microarray
• Knowledge Base: integrate gene relations from literature and outside databases and provide knowledge for learning and evaluation in data mining
[Figure: GeneScene architecture diagram. Labeled components: Medline titles and abstracts, publications and meta information, XML parser; text mining (AZ Noun Phraser, POS tagging, lexical lookup, adjuster and tagger, full parser, relation grammar, FSA, relation parsers, feature structures, relations in flat files); data mining on microarray data (spring algorithm, Bayesian networks, association rule mining, JIF); ontologies (UMLS, GO, HUGO); external databases; knowledge base; GeneScene Text Mart and GeneScene Data Mart; Concept Space co-occurrence relations; information retrieval and visualization.]
Problem: Gene Pathway
• Title: Key roles for E2F1 in signaling p53-dependent apoptosis and in cell division within developing tumors.
• Abstract: Apoptosis induced by the p53 tumor suppressor can attenuate cancer growth in preclinical animal models. Inactivation of the pRb proteins in mouse brain epithelium by the T121 oncogene induces aberrant proliferation and p53-dependent apoptosis. p53 inactivation causes aggressive tumor growth due to an 85% reduction in apoptosis. Here, we show that E2F1 signals p53-dependent apoptosis since E2F1 deficiency causes an 80% apoptosis reduction. E2F1 acts upstream of p53 since transcriptional activation of p53 target genes is also impaired. Yet, E2F1 deficiency does not accelerate tumor growth. Unlike normal cells, tumor cell proliferation is impaired without E2F1, counterbalancing the effect of apoptosis reduction. These studies may explain the apparent paradox that E2F1 can act as both an oncogene and a tumor suppressor in experimental systems.
Action Protocols and Graphic Representation (expert errs and corrects)
– Reads "E2F1 signals p53-dependent apoptosis"; graph: p53, E2F1, apoptosis
– Infers "So, I'm assuming... a straight-line pathway..."
– Reads "E2F1 acts upstream of p53"; graph: p53, E2F1, apoptosis
– Reads "E2F1 deficiency does not accelerate tumor growth"; graph: E2F1, p53, apoptosis, tumor growth
– Final graph: E2F1, p53, apoptosis
Prepositions: OF/BY/IN
[Figure: finite state automaton with states q0 to q18 for parsing relations built around the prepositions OF, BY and IN. Transitions are labeled NP, OF, BY, IN, negation, adjective/noun/verb (-ed), nominalization (-ion), aux, verb, and mod; two transitions carry the annotations "NP, 5: str1" and "Aux, 1: tr13". The diagram is shown twice, the second time with states q3, q10, q12 and q17 highlighted.]
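To make the idea of an FSA-based relation parser concrete, here is a toy automaton for one pattern only: nominalization, OF, noun phrase, optionally followed by BY and a second noun phrase. It is far smaller than the 19-state machine (q0 to q18) on the slide; the state names, the crude suffix-based tagger, and the slot names are all assumptions for illustration.

```python
# Toy transition table: (state, token label) -> next state.
TRANSITIONS = {
    ("q0", "NOM"): "q1",   # nominalization, e.g. "inhibition"
    ("q1", "OF"): "q2",
    ("q2", "NP"): "q3",    # the entity acted upon
    ("q3", "BY"): "q4",
    ("q4", "NP"): "q5",    # the acting entity
}
ACCEPTING = {"q3", "q5"}


def tag(token: str) -> str:
    """Crude stand-in for a real tagger: prepositions, -ion nouns, else NP."""
    lower = token.lower()
    if lower in ("of", "by"):
        return lower.upper()
    if lower.endswith("ion"):
        return "NOM"
    return "NP"


def parse_relation(phrase: str):
    """Run the automaton over the phrase; return filled slots or None."""
    state, slots = "q0", {}
    for token in phrase.split():
        label = tag(token)
        next_state = TRANSITIONS.get((state, label))
        if next_state is None:
            return None  # no transition: the phrase does not match
        if label == "NOM":
            slots["action"] = token
        elif label == "NP":
            slots["target" if state == "q2" else "agent"] = token
        state = next_state
    return slots if state in ACCEPTING else None


relation = parse_relation("inhibition of p53 by MDM2")
```

The passive "by" phrase supplies the agent, so the extracted relation reads as agent, action, target even though the text order is reversed.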
Example Map (one abstract)
Select interesting relations to visualize
Double click to expand
Overview
Expanded node
Finding the truth: p38 acts as a negative feedback for Ras signaling
Lessons Learned:
• Biomedical information is precise, but terminologies are fluid
• SOM performance for medical documents = 80%
• Biomedical professionals need search and analysis help
• Biomedical linguistic parsing and ontologies are promising for biomedical text mining
• The need for integrated biomedical data (gene microarray) and text mining (literature)
• New testbeds completed: p53, AP1, and yeast
COPLINK Crime Data Mining (10M):
Intelligence and security informatics, crime association,
crime network analysis and visualization
COPLINK Connect
Consolidating & sharing information promotes problem solving and collaboration
Records Management Systems (RMS)
Mugshots Database
Gang Database
COPLINK Connect Functionality
• Generic, common XML-based representation of criminal elements
• Data migration (batch and incremental) and mapping for all major databases and legacy systems
• Database independent: ODBC-compliant data warehouse
• Multi-layered Web-based architecture: database server, Web server, browser
• Powerful and flexible search tools for various reports, e.g., incidents, warrants, pawns, etc.
• Graphical browser-based interface for ease of use, training and maintenance
H. Chen, J. Schroeder, R. V. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K. Rasmussen, and A. W. Clements, “COPLINK Connect: Information and Knowledge Management for Law Enforcement,” Decision Support Systems, Special Issue on Digital Government, 2003.
COPLINK Detect
Consolidated information enables targeted problem solving via powerful investigative criminal association analysis
COPLINK Detect Functionality
• Simple association rule mining applied to relationships among criminal elements
• Generic, common XML-based representation for criminal relationships
• Incremental data migration and association analysis on databases
• Support for powerful, multi-attribute queries using partial crime information
• Graphical browser-based interface for simple crime relationship analysis and case retrieval
H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.
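A minimal sketch of association mining over incident records follows. It counts how often two entities appear in the same record (support) and how often one implies the other (confidence). The record layout, entity labels, and threshold are hypothetical and are not taken from COPLINK Detect.

```python
from collections import Counter
from itertools import combinations


def associations(incidents, min_support=2):
    """Return {(a, b): (support, confidence of a -> b)} for co-occurring entities."""
    singles, pairs = Counter(), Counter()
    for entities in incidents:
        unique = sorted(set(entities))       # count each entity once per record
        singles.update(unique)
        pairs.update(combinations(unique, 2))
    rules = {}
    for (a, b), support in pairs.items():
        if support >= min_support:
            rules[(a, b)] = (support, support / singles[a])
            rules[(b, a)] = (support, support / singles[b])
    return rules


# Hypothetical incident records, each listing the entities it mentions.
incidents = [
    ["person:A", "vehicle:V1", "address:X"],
    ["person:A", "vehicle:V1"],
    ["person:B", "vehicle:V1"],
    ["person:A", "address:X", "person:B"],
]
rules = associations(incidents)
```

Here every record that mentions address X also mentions person A, so that rule has confidence 1.0, while pairs seen only once are dropped by the support threshold.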
COPLINK Detect 2.0/2.5
COPLINK Connect/Detect Status
• Systems are stable and have been shown to be useful. Commercialized and supported by KCC
• Systems deployed at: TPD, UAPD, PPD, Phoenix, Huntsville (TX), Des Moines (Iowa), Ann Arbor (Michigan), Boston (Massachusetts), Montgomery County (sniper investigation)
• Systems under deployment: Salt River (AZ), Cambridge (Massachusetts), Redmond (Washington), many others
• COPLINK acclaimed in the LA Times, New York Times, and Newsweek (sniper investigation)
COPLINK Visual Data Mining Research
COPLINK Criminal Network Analysis: Association Tree, Association Network Analysis, Temporal-Spatial Visualization
• P1000: A picture is worth 1000 words.
• Use visual representations and effective HCI to assist in more efficient and effective crime analysis
• Leverage different representations and algorithms: hyperbolic trees, network placement algorithms, structural analysis, geo-spatial mapping, time visualization
H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.
A 9/11 Terrorist Network
Figure 1a: Relations among multiple criminal elements are shown on both a hyperbolic tree (right) and a hierarchical list (left).
Figure 1b: A hyperbolic tree with multiple levels of investigative leads.
Figure 2a: The initial layout of a criminal network before analysis.
Figure 2b: The network is analyzed and automatically adjusted to reflect subgroups and central criminal figures.
Figure 2c: A user may choose only the type that is of interest (e.g., person) and view crime associations (e.g., person name, address).
COPLINK Association Tree and Network (2nd generation)
COPLINK Criminal Structural Analysis (3rd generation)
• Criminal association identification
– Using shortest-path algorithms to find the strongest associations between two or more criminals in a network
• SNA (Social Network Analysis)
– Using blockmodel analysis to detect subgroups and patterns of interactions between groups
– Identifying leaders, gatekeepers, and outliers from a criminal network
J. Xu & H. Chen, “Criminal Network Analysis: A Data Mining Perspective,” Decision Support Systems, 2004, forthcoming.
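One common way to turn "strongest association" into a shortest-path problem is to treat each link strength as a value in (0, 1], score a path by the product of its link strengths, and run Dijkstra's algorithm on the negative logarithms. The sketch below follows that approach; the weighting scheme and the sample network are assumptions, not the published COPLINK formulation.

```python
import heapq
import math


def strongest_path(links, source, target):
    """Return (path, strength) maximizing the product of link strengths."""
    graph = {}
    for (a, b), strength in links.items():
        cost = -math.log(strength)           # strong link -> small cost
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue                          # stale queue entry
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], math.exp(-best[target])


# Hypothetical association strengths between four individuals.
links = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5, ("C", "D"): 0.9}
path, strength = strongest_path(links, "A", "C")
```

In this example the indirect route through B (0.9 × 0.8 = 0.72) beats the direct but weaker link (0.5), which is the kind of non-obvious lead the analysis is meant to surface.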
The proposed framework
COPLINK SNA Experiment
• Data Sets
– TPD incident summaries
• Time period: Narcotics, 2000-present; Gangs, 1995-present
• Size: see the table below
– Two testing networks
• Narcotics (60 individuals)
• Gang (24 individuals)

Data set    Total # individuals   # sub-networks   Size of sub-networks (size: count)
Narcotics   12,842                2,628            1-10: 2,587; 11-20: 31; 21-100: 9; 502: 1
Gangs       4,376                 289              1-10: 264; 11-20: 20; 21-100: 4; 2,595: 1
A narcotic network example
Switch between narcotic network and gang network
Show network and reset network
Adjust level of details
A point represents an individual labeled by his name
A line represents a link between two persons
A bubble represents a subgroup labeled by its leader's name
A line implies that some individuals in one group interact with some individuals in the other group. The thicker the link, the more individual interactions between the two groups
The size of a bubble is proportional to the number of individuals in the group
The rankings of the members of a selected group (green).
A gang network example
The leader
A clique
A gatekeeper
The reduced network structure
Patterns Found
• The chain structure of the narcotic network
• Implications: disrupt the network by breaking the chain
• The star structure of the gang network
• Implications: disrupt the network by removing the leader
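The two implications can be checked on toy graphs by counting how many disconnected pieces remain after one node is removed. This is a simplified illustration with made-up networks; it uses degree as a stand-in for identifying the leader, whereas the analysis in the slides also relies on blockmodeling and other centrality measures.

```python
def components(edges, removed=None):
    """Count connected components after optionally removing one node."""
    nodes = {n for edge in edges for n in edge} - {removed}
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1                      # found a new component
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbors[node] - seen)
    return count


def most_connected(edges):
    """Node with the highest degree (a simple proxy for the leader)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return max(degree, key=degree.get)


# Hypothetical star-shaped and chain-shaped networks.
star = [("leader", m) for m in ("m1", "m2", "m3", "m4")]
chain = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c5")]

star_pieces = components(star, removed=most_connected(star))
chain_pieces = components(chain, removed="c3")
```

Removing the hub of the star leaves every member isolated, while removing a middle link of the chain splits it into two halves.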
White gangs involved in murders and shootings
White gangs who sold crack cocaine
A group of black gangs
Expert Validation
“Yes, these two groups are together very often”
“(211) and (173) are best friends”
“He is very important. He has a lot of money and sells drugs. His girl friend brings a lot of dancers in the city and buy drugs.”
Lessons Learned:
• Data warehousing and gateway approaches are needed for information consolidation
• XML and data normalization are critical
• Co-occurrence analysis and link analysis are extremely useful for crime investigation
• Visual data mining is essential for criminal network analysis
• Wireless (laptop, PDA, cell phone) application is essential
• KM techniques may create unintended cultural and practice side effects
GetSmart Concept Maps:
Knowledge creation, transfer and mapping
Meaningful Learning
• Substantive synthesis
• Relate to experiences
• Intentionally connect to prior knowledge
Rote Learning
• Memorization
• Unrelated to experience
• No effort to link to existing knowledge
• Practice, rehearsal and thoughtful replication contribute to meaningful learning.
[Figure: a continuum between Rote Learning and Meaningful Learning, with the labels Most School Learning and Creative Production.]
* Adapted from Novak’s model of meaningful learning
Six Steps of Information Search: A Constructivist Approach
Learners are actively involved in building on what they already know to come to a new understanding of the subject under study.
Exploration
Initiation
Presentation
Formulation
Collection
Selection
Introduce a problem.
Identify a general area for investigation.
Explore information to form a focus.
Summarize the topic and prepare to present to the intended audience.
Gather information that defines/supports the focus.
GetSmart Learning Tools
Digital Library
Curriculum
Knowledge Representation
Keyword Suggestion
Filtered Material
A Place to Store Work
Assignments
Announcements
Linked Resources
Concept Map
Customized Resources
A Concept Map about Concept Maps
Navigation bar
Concept map management tools
Search tools
Meta search options
GetSmart Interface
1. By right-clicking on a node you can delete the node, change the properties of the node, or add a resource to the node. Resources can be URLs, maps, or notes.
2. You can either type a URL or click the “Add From URL Clipboard” button.
3. This is the clipboard. Simply highlight the URL you would like to add to a node and click OK.
4. Your URL will appear in the window; click the Done button to add it to your map.
Printing
Choosing the Print option will cause a new window to open. This window will show your map, the title of the map, and any URLs, notes, or maps you have linked to your map.
Usage: Overall at UA and VT
• 114 student users – all UA students (54) turned in all assignments (VT assignments still pending)
• 4,000+ user sessions
• 1,000+ maps created for homework and presentations
• 600+ searches performed
• 50+ maps created as a group
• 40,000+ relationships represented in the maps
Results (1)
• 120 cue phrases were used to extract 37,674 links, which accounted for 93% of the pool.
• These cue phrases were categorized into the proposed link types:
– About 50 cue phrases map to the five previously determined link types: hierarchical, componential, comparative, influential, and procedural.
– Over 50% of the links expressed hierarchical or componential relationships.
– Descriptive relationships accounted for a large portion (30%) and were analyzed further.
[Figure: Link Type Distribution. Bar chart of the share of links per link type (vertical axis 0% to 35%): Hierarchical 21.30%, Componential 32.67%, Comparative 3.86%, Influential 9.65%, Procedural 2.91%, Descriptive 29.60%. Over 50% of the links expressed hierarchical or componential relationships; descriptive relationships accounted for a large portion at 30%, so this link type was analyzed further.]

Link types     Number*   Percentage   Representative cue phrases
Hierarchical   8,026     21.30%       example, such as, case, type, member, is a
Componential   12,307    32.67%       consist, contain, include, compose, part, made of
Comparative    1,455     3.86%        like, compare, similar, differ, alternative
Influential    3,635     9.65%        lead to, cause, result, influence, determine
Procedural     1,097     2.91%        next, go to, procedure
Descriptive    11,153    29.60%       use, implement, present, advantages, feature
Sum            37,673    100.00%
* The number of links that contained the identified cue phrases
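Assigning a link type from cue phrases can be sketched as a simple lookup. The cue lists below are the representative phrases from the table; the substring matching, the first-match-wins order, and the sample link labels are assumptions for illustration, and a real classifier would need word-boundary matching and the full set of 120 cue phrases.

```python
from collections import Counter

# Representative cue phrases per link type, taken from the table.
CUE_PHRASES = {
    "hierarchical": ["example", "such as", "case", "type", "member", "is a"],
    "componential": ["consist", "contain", "include", "compose", "part", "made of"],
    "comparative": ["like", "compare", "similar", "differ", "alternative"],
    "influential": ["lead to", "cause", "result", "influence", "determine"],
    "procedural": ["next", "go to", "procedure"],
    "descriptive": ["use", "implement", "present", "advantages", "feature"],
}


def link_type(label: str):
    """Return the first link type whose cue phrase occurs in the label."""
    text = label.lower()
    for kind, cues in CUE_PHRASES.items():
        if any(cue in text for cue in cues):
            return kind
    return None  # no cue phrase found: the link stays uncategorized


# Hypothetical link labels drawn on concept maps.
labels = ["is an example of", "is similar to", "can cause", "related to"]
distribution = Counter(link_type(label) for label in labels)
```

Labels with no recognized cue phrase come back as `None`, which corresponds to the share of the pool that the cue phrases did not cover.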
Lessons Learned:
• Digital libraries and concept maps support meaningful learning
• Digital library systems provide support for community knowledge creation.
• Semi-open link systems are useful for capturing knowledge and the learning process
• NSDL is not a “library.” It should be a learning or knowledge creation environment.
Knowledge Management Systems: Future
Other Emerging Categorization Challenges/Opportunities:
• Multilingual terminology and semantic issues
• Web analysis and categorization issues
• E-Commerce information (transactions) classification issues
• Multimedia content and wireless delivery issues
• Future: semantic web, multilingual web, multimedia web, wireless web!
The Road Ahead
• The Semantic Web: XML, RDF, Ontologies
• The Wireless Web: WML, WIFI, display
• The Multimedia Web: content indexing and analysis
• The Multilingual Web: cross-lingual MT and IR
Requirements For Successful KMS Implementation (General)
• Sponsor for the application
• Business case for the application clearly understood and measurable
• High likelihood of having a significant impact on the business
• Good quality, relevant data in sufficient quantities
• The right people: business domain, data management, and data mining experts
Requirements For Successful KMS Implementation (KM Specific)
• Information overload is more than anyone can handle
• Productivity gained and decision improvements evident among knowledge workers
• Organization’s IT infrastructure ready
• Need to integrate with consulting, process, content, and policy considerations