an ontological approach to the document access problem of insider threat
DESCRIPTION
An Ontological Approach to the Document Access Problem of Insider Threat. ISI 2005 , (May 20). Boanerges Aleman-Meza 1 Phillip Burns 2 Matthew Eavenson 1 Devanand Palaniswami 1 Amit P. Sheth 1. (1) LSDIS Lab, Computer Science Dept., University of Georgia, USA - PowerPoint PPT PresentationTRANSCRIPT
An Ontological Approach to the Document Access Problem ofInsider Threat
ISI 2005, (May 20)
Boanerges Aleman-Meza1
Phillip Burns2
Matthew Eavenson1
Devanand Palaniswami1
Amit P. Sheth1
(1) LSDIS Lab, Computer Science Dept., University of Georgia, USA
(2) CTA – Computer Technology AssociatesUSA
6/21/20042
Objective & Approach
Determine if (classified) documents reviewed an IC analyst satisfy his/her “need to know”
Characterization of “need to know” w.r.t. ontology
Characterizing document content in terms of ontology
Discovering weighted semantic relationships between document content and “need to know”
6/21/20043
Characterizing “Need to Know” using an a Semantic Approach (using Ontology) Requires domain ontology
models important concepts & relationships of domain (schema), captures factual knowledge (instances)
Relate analyst’s need to know to concepts & relationships in ontology e.g. terrorist organization, funding sources,
facilitators, members, methods
6/21/20044
“Need to know” = context of investigation 26,489 entities
34,513 (explicit) relationships
Add relationship to context
6/21/20045
Characterizing document content in terms of ontology “Semantic Annotation” Correlate words/phrases from document with
entities/relationships in ontology Entity identification
Meta-data added to document (from associated ontological knowledge)
Active area of research but practically useful technology now available
Constrained to content of ontology
6/21/20046
6/21/20047
Semantic Relationships between Document & “Need to Know” Semantic associations: relationships between
document concepts & “need to know” concepts are discovered and ranked
Ranking based on multiple factors no. of links, types of links, location in ontology, …
Ranking indicates degree of semantic “closeness” and therefore, how related document is to “need
to know”
6/21/20048
Highly relevant
Closely related
Ambiguous
Not relevant
Undeterminable
DocumentsRanking
6/21/20049
Research Content
Discovery & Ranking of semantic semantic associations
Characterizing “need to know” in terms of ontological concepts & relationships
Meta-data annotation of data and (semi-structured & unstructured) documents correlation of document content & concepts in
ontology
6/21/200410
In this project we are addressing: Discovery of Semantic Associations per entity
per document Input/Visualization/Management of Context of
Investigation Scalability on number of documents & ontology
size Performs well with thousand documents
Ranking of documents
Research Challenges
6/21/200411
“Closely related entities are more relevant than distant entities”
E = {e | e Document }
Ek = {f | distance(f, eE) = k }
nk
k
k
k
k
Eelevanceentities_R
ERelevancerelations_
Elevanceclasses_Re
0
)(
)(
)(
RelevanceDocument
Ranking of Documents Relevance
6/21/200412
Components of Document Relevance
(specific entities)• Abu Abdallah • Turkmenistan • Konduz Province • …
Context of Investigation
e7:Terror Organization
e4:Watch-List
worksat
friends withcit
izen
of
citizen of
liste
d in
e3:Person
claims
responsibility fo
r
e8:Event
friends
withe1:Person
livesin
e5:Person
e6:State
e9:Person
e2:Country
e6:Company
e7:Terror Organization
e4:Watch-List
worksat
friends withcit
izen
of
citizen of
liste
d in
e3:Person
claims
responsibility fo
r
e8:Event
friends
withe1:Person
livesin
e5:Person
e6:State
e9:Person
e2:Country
e6:Company
e7:Terror Organization
e4:Watch-List
worksat
friends withcit
izen
of
citizen of
liste
d in
e3:Person
claims
responsibility fo
r
e8:Event
friends
withe1:Person
livesin
e5:Person
e6:State
e9:Person
e2:Country
e6:Company
Entities belong to classes in the
Context type(entity) Context
1.
Relationships constrains
Relationship [Class]
2.
Entities match a list of entities of interest (in the Context)
entity Entities-List
3.
6/21/200413
Schematic of Ontological Approach to the Legitimate
Access Problem Semagix Freedom
Semagix Freedom
6/21/200414
Conclusions
New Semantic Approach to the challenging problem
Viability demonstrated on a small scale Significant new research that builds upon the
latest Semantic Platform Many applications of this approach: vendor
vetting, knowledge discovery, ….
6/21/200415
Acknowledgements
Semagix provided technology to populate ontology using knowledge extraction, and (semi-)automatic metadata extraction from documents (Freedom toolkit).
NSF-funded projects provided core research: "Semantic Association Identification and Knowledge Discovery for National Security Applications" (Grant No.
IIS-0219649) and "Semantic Discovery: Discovering Complex Relationships in Semantic Web" (Grant No. IIS-0325464)
6/21/200416
References 1. B. Aleman-Meza, C. Halaschek, I.B. Arpinar, A. Sheth, Context-Aware Semantic Association Ranking. Proceedings of Semantic Web and Databases Workshop, Berlin, September 7- 8 2003, pp. 33-50 2. B. Aleman-Meza, C. Halaschek, A. Sheth, I.B. Arpinar, and G. Sannapareddy. SWETO: Large-Scale Semantic Web Test-bed. Proceedings of the 16th International Conference on Software Engineering and Knowledge Engineering (SEKE2004): Workshop on Ontology in Action, Banff, Canada, June 21-24, 2004, pp. 490-493 3. R. Anderson and R. Brackney. Understanding the Insider Threat. Proceedings of a March 2004 Workshop. Prepared for the Advanced Research and Development Activity (ARDA). http://www.rand.org/publications/CF/CF196/ 4. K. Anyanwu and A. Sheth ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web The Twelfth International World Wide Web Conference, Budapest, Hungary, 2003, pp. 690-699 5. K. Anyanwu, A. Maduko, A. Sheth, SemRank: Ranking Complex Relationship Search Results on the Semantic Web, In Proceedings of the 14th International World Wide Web Conference, Japan 2005 (accepted, to appear) 6. K. Anyanwu, A. Maduko, A. Sheth, J. Miller. Top-k Path Query Evaluation in Semantic Web Databases. (submitted for publication), 2005 7. C. Halaschek, B. Aleman-Meza, I.B. Arpinar, A. Sheth Discovering and Ranking Semantic Associations over a Large RDF Metabase Demonstration Paper, VLDB 2004, 30th International Conference on Very Large Data Bases, Toronto, Canada, 30 August - 3 September, 2004 8. B. Hammond, A. Sheth, and K. Kochut, Semantic Enhancement Engine: A Modular Document Enhancement Platform for Semantic Applications over Heterogeneous Content, in Real World Semantic Web Applications, V. Kashyap and L. Shklar, Eds., IOS Press, December 2002, pp. 29-49
6/21/200417
References (cont)
9. M. Rectenwald, K. Lee, Y. Seo, J.A. Giampapa, and K. Sycara. Proof of Concept System for Automatically Determining Need-to-Know Access Privileges: Installation Notes and User Guide. Technical Report CMU-RI-TR-04-56, Robotics Institute, Carnegie Mellon University, October, 2004. http://www.ri.cmu.edu/pub_files/pub4/rectenwald_michael_2004_3/rectenwald_michael_20 04_3.pdf 10. C. Rocha, D. Schwabe, M.P. Aragao. A Hybrid Approach for Searching in the Semantic Web, In Proceedings of the 13th International World Wide Web, Conference, New York, May 2004, pp. 374-383. 11. M.A. Rodriguez, M.J. Egenhofer, Determining Semantic Similarity Among Entity Classes from Different Ontologies, IEEE Transactions on Knowledge and Data Engineering 2003 15(2):442-456 12. A. Sheth, C. Bertram, D. Avant, B. Hammond, K. Kochut, and Y. Warke. Managing Semantic Content for the Web. IEEE Internet Computing, 2002. 6(4):80-87 13. A. Sheth, B. Aleman-Meza, I.B. Arpinar, C. Halaschek, C. Ramakrishnan, C. Bertram, Y. Warke, D. Avant, F.S. Arpinar, K. Anyanwu, and K. Kochut. Semantic Association Identification and Knowledge Discovery for National Security Applications. Journal of Database Management, Jan-Mar 2005, 16 (1):33-53 14. Boanerges Aleman-Meza, Phillip Burns, Matthew Eavenson,Devanand Palaniswami, Amit Sheth. An
Ontological Approach to the Document Access Problem of Insider Threat
6/21/200418
Security and Terrorism Part of SWETO Ontology
6/21/200419
Semantic Annotation
Document searched for entity names (or synonyms) contained in ontology
Then document entities are annotated with additional information from corresponding entities in ontology including named relationships to other entities
Following chart is example Highlighted text are entities found corresponding to
concepts in ontology XML is corresponding meta-data annotation
6/21/200420
Relevance Measures for Documents(relating document content to IA “need to know” Relevance engine input
the set of semantically annotated documents the context of investigation for the assignment the ontology schema represented in RDFS, and
the ontology instances represented in RDF Relevance measure function used to verify
whether the entity annotations in the annotated document can be fit into the entity classes, entity instances, and/or keywords specified in the context of investigation.
6/21/200421
The Big Picture
Ontology /knowledge base
MassiveMetadata
Store
API
Ope
n/pr
oprie
tary
Het
erog
eneo
us D
ata
Sou
rces
documents
databases
Html pages
emails
XMLfeeds
TrustedSources
populates
Semi-structured
data
populates
Knowledge Discovery Algorithms
SWETO WebService Browsing
SWETO – Ontology Schema Visualization
See SemDis project of LSDIS Lab, University of Georgia
6/21/200423
Relevance Measures for Documents(relating document content to IA “need to know” (cont) Documents classified as:
Highly relevant Document entities directly related
Closely related Document entities related through strong semantic
associations Ambiguous
Document entities related through weak semantic associations
Not relevant Document entities not related to “need to know”
Undeterminable Document entities not found in ontology
6/21/200424
IA Context of Investigation(characterization of “Need to Know”)
We define the context of investigation as a combination of the following:
A set of entity classes and relationships, and/or a negation of a set of entity classes and relationships
A set of entity instance names, and/or a negation of a set of entity instance names
A set of keyword values that might appear at any attribute of the populated instance data, and/or a negation of a set of keyword values
6/21/200425
Context of Investigation (cont) Goal is to capture, at a high level, the types of
entities, (or relationships), that are considered important.
Relationships can be constrained to be associated with specified class types E.G. It can be specified that a relation ‘affiliated with’ is part
of the context only when it is connected with an entity that belongs to a specific class, say, ‘Terror Organization’
6/21/200426
Ranking of Documents RelevanceFour groups of document-ranking:- Not Related Documents
- unable to determine relation to context- Ambiguously Related Documents
- some relationship exists to the context- Somehow Related Documents
- Entities are closely related to the context- Highly Related Documents
- Entities are a direct match to the context
Cut-off values determine grouping of documents w.r.t. relevance- These are customizable cut-off values (more control and more
meaningful parameters compared to say automatic classification or statistical approaches)
“Inspection” of a document is possible via (a) original document or (b) original document with highlighted entities