From Documents to Knowledge Models
Max Völkel
Forschungszentrum Informatik an der Universität Karlsruhe (TH)
© 2007 Max Völkel, FZI, 29.03.07, ProKW @ WM2007, Potsdam, Germany
http://xam.de/2007/doc2km
Personal Knowledge Management
Definition: knowledge cues [Haller]
Any kind of symbol, pattern or artefact which evokes some knowledge in a person’s mind when viewed or used.
Knowledge cues can be stored and retrieved on a computer, whereas knowledge itself may or may not be.
OK, strictly speaking, you store bits (signals).
What is a Document?
A team of 50 French researchers discussed …
Definition: Document
A team of 50 French researchers could agree on:
Document as form: the document as a container, which assembles and structures the content to make it easier for the reader to understand.
Document as sign: emphasises the argumentative structure of the content. A document can be referenced and acts as a sign for its meaning.
Document as medium: the “reading contract”, i.e. the author’s intention or assumption of what will happen with the document.
Document (my definition) I/II
A document consists of information atoms. An information atom is the smallest unit of content that can be interpreted without a document’s context (though of course it requires background knowledge). For text, these atoms are single words.
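For text, the notion of information atoms can be made concrete by splitting a sentence into words. The Python sketch below is illustrative only: the helper name `information_atoms` is hypothetical, and approximating "word" with a simple regular expression is our own simplification rather than part of the definition.

```python
import re

def information_atoms(text: str) -> list[str]:
    """Split text into word-level information atoms (hypothetical helper).

    A word is approximated as a run of Unicode letters, digits or underscores;
    punctuation is dropped and apostrophes split words, which real tokenisers
    handle more carefully.
    """
    return re.findall(r"\w+", text)

print(information_atoms("A team of 50 French researchers discussed the term."))
```

Each returned word can be interpreted on its own with background knowledge, while the sentence as a whole needs the surrounding context.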
A document has:
Author, audience, goal
Packaging: establishes a context
Reference-ability: a reference to a published document can act as a placeholder for the content expressed within it
Process metadata: metadata such as authors, audience and goal should be sent along with the document
Document (my definition) II/II
A document is a knowledge artefact consisting of several layers:
Linearity: a defined order for navigating through all information items.
Visual structure: guides the reader informally via type-setting (e.g. bold, italics, different font styles and sizes), placement of figures and pages; it carries additional information.
Logical structure: makes it possible to reference smaller parts within a document, e.g. paragraphs, headlines, footnotes, citations and the title.
Argumentative structure: used to convey the document’s content to the reader. Argumentative structures appear on all scales. A typical structure is the “Introduction – Related work – Contribution – Conclusion” pattern of scientific articles. On smaller scales, patterns like “claim–proof” and “question–answer” are used.
Content semantics: the content means something. Building upon the logical and argumentative structure, the author encodes statements about a domain within the content.
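The layer model can be pictured as one shared sequence of atoms that every other layer annotates. This is a minimal Python sketch under our own assumptions: the class name `LayeredDocument`, its field names, and the way each layer is represented (style labels, index ranges, role labels, subject-predicate-object triples) are illustrative choices, not part of the definition above.

```python
from dataclasses import dataclass, field

@dataclass
class LayeredDocument:
    # Linearity: the list order of the atoms is the defined reading order.
    atoms: list[str]
    # Visual structure: atom index -> style label such as "bold".
    visual: dict[int, str] = field(default_factory=dict)
    # Logical structure: named part -> range of atom indices it covers.
    logical: dict[str, range] = field(default_factory=dict)
    # Argumentative structure: named part -> role such as "question" or "answer".
    argumentative: dict[str, str] = field(default_factory=dict)
    # Content semantics: statements about the domain as (subject, predicate, object).
    semantics: list[tuple[str, str, str]] = field(default_factory=list)

    def part_text(self, part: str) -> str:
        """Read one logical part back in linear order."""
        return " ".join(self.atoms[i] for i in self.logical[part])


doc = LayeredDocument(
    atoms="What do people want ? More links .".split(),
    visual={i: "bold" for i in range(5)},
    logical={"headline": range(0, 5), "paragraph": range(5, 8)},
    argumentative={"headline": "question", "paragraph": "answer"},
    semantics=[("people", "want", "links")],
)
print(doc.part_text("headline"))  # prints: What do people want ?
```

Keeping the layers separate from the atoms means, for example, that the visual layer can be dropped without losing the logical or argumentative structure.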
Ted Nelson
I propose a different document agenda:
I believe we need new electronic documents which are transparent, public, principled, and freed from the traditions of hierarchy and paper.
What do people want?
Why?
What is a Wiki? What's new compared to a CMS?
Easy contribution: shorter time-to-publication; wiki pages can be created and edited by any user quickly and easily
Easy writing: simple text formatting with wiki syntax, without the need to learn HTML
Easy linking: automatic linking converts written names of pages, images and websites into links
Recent changes: see what has happened (awareness)
Diff function: shows the latest changes, so you can easily check whether they are OK
Full-text search: for page titles and text
Backlink function: shows which pages link to the current page, to find the context of this page
Deep links: link directly into a wiki using readable names
Wikis were the first deployed collaborative hypertext authoring environments
People want more links
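Automatic linking and the backlink function fit together naturally. Page names written in the text become links, and inverting those links yields the backlinks. The Python sketch below assumes CamelCase "WikiWords" as the page-name convention. This is one common wiki convention, and many wikis use other syntax such as `[[Page Name]]`. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed page-name convention: CamelCase words with at least two capitalised parts.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")

def outgoing_links(pages: dict[str, str]) -> dict[str, set[str]]:
    """Automatic linking: every WikiWord naming an existing page becomes a link."""
    return {
        name: {w for w in WIKI_WORD.findall(text) if w in pages and w != name}
        for name, text in pages.items()
    }

def backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """Backlink function: invert the link graph to see which pages link to a page."""
    incoming: dict[str, set[str]] = defaultdict(set)
    for source, targets in outgoing_links(pages).items():
        for target in targets:
            incoming[target].add(source)
    return dict(incoming)

pages = {
    "FrontPage": "Start at KnowledgeModels or read about WikiSyntax.",
    "KnowledgeModels": "A superset of documents; see FrontPage.",
    "WikiSyntax": "Simple formatting, used by KnowledgeModels too.",
}
print(backlinks(pages)["KnowledgeModels"])
```

Backlinks are derived rather than stored, so they can never drift out of sync with the links written on the pages.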
What is a Model? Typed entities and typed relations
My definition, based on the OMG metamodel MOF
(Figure: the real world from the viewpoint of the individual (EntityX, EntityY), its modelling (ArtifactX, ArtifactY; TypeA1, TypeB1, TypeC1) and (meta-)modelling (TypeA2, TypeB2, TypeC2).)
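The sketch below shows typed entities and typed relations, where types are themselves typed one level up. It is not the OMG MOF API. It is a minimal Python illustration under our own assumptions, and the names `Element`, `Relation`, `type_chain` and the example entities are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Element:
    """An entity whose type is itself an Element one level up (None at the top)."""
    name: str
    type: Optional["Element"] = None

@dataclass(frozen=True)
class Relation:
    """A typed relation: the relation type is an Element too, so it can be modelled."""
    source: Element
    type: Element
    target: Element

def type_chain(e: Optional[Element]) -> list[str]:
    """Follow the type links upwards: instance -> type -> meta-type -> ..."""
    chain = []
    while e is not None:
        chain.append(e.name)
        e = e.type
    return chain

# Meta-modelling level: the kinds of types that exist.
concept = Element("Concept")
relation_type = Element("RelationType")
# Modelling level: domain types, each typed by a meta-type.
person = Element("Person", type=concept)
knows = Element("knows", type=relation_type)
# The individual's world: typed entities connected by a typed relation.
alice = Element("Alice", type=person)
bob = Element("Bob", type=person)
fact = Relation(alice, knows, bob)

print(type_chain(fact.source))  # ['Alice', 'Person', 'Concept']
```

Because every level uses the same `Element` class, nothing but convention separates modelling from meta-modelling. This is the property the slide's figure illustrates.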
What is a Knowledge Model?
Feature | Document | Ontology | Knowledge model
Information atoms | Text (paragraphs, images, multimedia resources) | Concepts | Items (text, images, other binary resources)
– Text | Short (headlines) and longer (paragraphs) | Short labels | Anything from short labels to structured documents
Order | Strict linear order | – | Yes, may be partial and have cycles
Hierarchy | Yes (chapters, sections, paragraphs, sentences) | Yes | Yes, may be partial and have cycles
Annotations | Yes (footnotes) | Yes | Yes
– Tagging (annotation with keywords) | – | – | Yes
– Typing (incl. inferencing) | – | Yes | Yes
Hyperlinks | Yes (internal references and external citations) | – | Yes, need not occur inside text
Visual layout | Yes | – | –
From Documents to Knowledge Models
From analogue to digital documents
smaller content granularity
more interconnected content
more explicit structures.
Knowledge models
very small information atoms, such as single words
Richly connected items
explicit semantics for the links.
Definition
A knowledge model is a superset of documents and formal ontologies.
Annotated documents, stored together with their annotations, can be seen as a knowledge model.
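An annotated document, stored together with its annotations, can be read as a knowledge model. The idea becomes concrete when a small outline is converted into items and relations. The Python below is a sketch under assumptions. The input format and the function name `document_to_model` are hypothetical. The relation labels borrow the CDS names `detail`, `before` and `annotation` from this talk, but their orientation in the triples is our own choice.

```python
Triple = tuple[str, str, str]

def document_to_model(sections: dict[str, list[str]],
                      annotations: dict[str, list[str]]) -> set[Triple]:
    """Turn an outline plus annotations into (subject, relation, object) triples.

    sections:    section title -> its paragraphs, in reading order
    annotations: paragraph -> tags or comments attached to it
    """
    model: set[Triple] = set()
    titles = list(sections)  # dicts keep insertion order, i.e. reading order
    for earlier, later in zip(titles, titles[1:]):
        model.add((earlier, "before", later))      # linear order of sections
    for title, paragraphs in sections.items():
        for p in paragraphs:
            model.add((title, "detail", p))        # hierarchy: p is a detail of the section
        for earlier, later in zip(paragraphs, paragraphs[1:]):
            model.add((earlier, "before", later))  # linear order within a section
    for p, notes in annotations.items():
        for note in notes:
            model.add((note, "annotation", p))     # annotations stay with the content
    return model

model = document_to_model(
    {"Introduction": ["Documents are costly.", "Models are cheaper."],
     "Conclusion": ["Use both."]},
    {"Documents are costly.": ["claim"]},
)
print(len(model))
```

No information about order or hierarchy is lost in the conversion. Once the content is in triple form, further links can be added that the document form has no place for.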
What is a CDS? Conceptual Data Structures
(Figure: an Item with the relation pairs context/detail, before/after, source/target and annotation/annotationMember.)
M. Völkel and H. Haller: Conceptual Data Structures (CDS) – Towards an Ontology for Semi-Formal Articulation of Personal Knowledge. In Proc. of the 14th International Conference on Conceptual Structures 2006, Aalborg University, Denmark, July 2006.
What is a CDS-based Knowledge Model?
A set of addressable items (text, images, maybe even multimedia elements)
Relations between items, classified into four types:
Source/target: the generic, directed hyperlink
Before/after: ordering relations, linear navigation
Context/detail: hierarchical relations, document and concept hierarchies
Annotation/annotationMember: annotations, which make it possible to type items and relations; items themselves are used as types (meta-modelling)
Knowledge models must be able to capture work in progress: CDS is not strict, so you can have cycles, untyped items, paradoxical ordering, …
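The four relation pairs can be sketched as a small in-memory store. Asserting one direction of a pair also records its inverse. In the spirit of capturing work in progress, the store does not reject cycles or untyped items. This is a minimal Python sketch under our own assumptions. It is not the CDS implementation, and the class and method names are hypothetical.

```python
from collections import defaultdict

# The four CDS relation pairs, each mapped to its inverse.
INVERSE = {
    "source": "target", "target": "source",
    "before": "after", "after": "before",
    "context": "detail", "detail": "context",
    "annotation": "annotationMember", "annotationMember": "annotation",
}

class CdsStore:
    """Addressable items plus typed links; relate(a, r, b) reads "b is an r of a"."""

    def __init__(self) -> None:
        self.items: dict[str, str] = {}  # item id -> content (text, image path, ...)
        self.links: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add_item(self, item_id: str, content: str) -> None:
        self.items[item_id] = content

    def relate(self, a: str, relation: str, b: str) -> None:
        # Deliberately no consistency checks: cycles, paradoxical orderings
        # and untyped items are accepted, so work in progress can be captured.
        self.links[(a, relation)].add(b)
        self.links[(b, INVERSE[relation])].add(a)

    def related(self, a: str, relation: str) -> set[str]:
        return set(self.links.get((a, relation), set()))


store = CdsStore()
for item_id, content in [("thesis", "My thesis"), ("draft", "Paper draft"),
                         ("note", "Loose note"), ("todo", "To do")]:
    store.add_item(item_id, content)
store.relate("thesis", "detail", "draft")    # draft is a detail of thesis
store.relate("draft", "annotation", "todo")  # the item "todo" annotates draft
store.relate("draft", "after", "note")       # note comes after draft ...
store.relate("note", "after", "draft")       # ... and draft after note: tolerated
print(store.related("draft", "context"))     # {'thesis'}
```

Storing both directions lets the context of an item be found with a single lookup, much like a wiki backlink.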
CDS: A Hierarchy of Relations
Undirected relation: related/related
Directed linking: source/target
Relation type: relation/inverse
Labelled links: …/…-inverse
Order: before/after
Hierarchy: detail/context
Instantiation: type/instance
Tagging: tag/tagMember
Subclassing: is-a/superclass-of
Equivalency: equivalent
Legend: annotation (annotation/annotationMember)
(Figure: the relations are arranged on a scale from informal to formal; further labels: task priority, document order, motivation.)
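A hierarchy of relations means that asserting a more specific relation also implies the more general ones. The Python sketch below shows that inference. The parent map is an assumption, a partial reading of the figure, not the published CDS hierarchy. For example, it treats tagging and instantiation as kinds of annotation, and order, hierarchy and annotation as kinds of directed linking.

```python
# Assumed sub-relation -> super-relation map (partial; our reading of the figure).
SUPER = {
    "source": "related",
    "before": "source",
    "detail": "source",
    "annotation": "source",
    "tag": "annotation",
    "type": "annotation",
}

def generalisations(relation: str) -> list[str]:
    """All relations implied by `relation`, from most to least specific."""
    chain = [relation]
    while chain[-1] in SUPER:
        chain.append(SUPER[chain[-1]])
    return chain

def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add every (s, more_general_relation, o) implied by the hierarchy."""
    return {(s, g, o) for s, r, o in triples for g in generalisations(r)}

facts = {("paper", "tag", "todo")}
print(sorted(infer(facts)))
```

This is how informal and formal relations can coexist. A user may start with a plain `related` link and refine it later, and queries for the general relation still find the refined one.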
Examples of Knowledge Models
Fiction writing
Simulation
Requirements engineering
Engineering
Thinking
How do writing and reading work?
Writing / sending:
Write down ideas
Group them
Structure them
Add argumentation structures
Add references to literature
Link pieces in a first draft
Add introduction and conclusion
Repeat until the flow is coherent
Publish the document
Reading / receiving:
Visualise the structure graphically
Connect new structures with your own existing structures
(Figure: tools supporting these steps: mind maps, text processing, reference manager, ???, ???, mind maps.)
„Von der Idee zum Text“ (“From Idea to Text”) [Esselborn 2004]
The tool chains break
Create a new slide show out of three old presentations plus one from your colleague: why not have the content in smaller, more logical chunks?
Reuse the motivation part of an old paper for a new one: if you find a misspelling, why should you have to fix it twice?
Search a stack of paper notes for good ideas: why are those not on your computer?
Search email archives to find out what the high-level architecture of the new authentication system is: why not browse your PKM and see the relations?
Technological Developments
Accelerated distribution by many orders of magnitude
Lower costs
(Figure: communication speed and cost over time, from written language via the printing press to the Internet, spanning an analog and a digital era.)
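Cost of Communication: data transmission is cheap now
Total cost of communication to send content to n people:
$$C(n) = c_{\text{select}} + c_{\text{encode}} + c_{\text{order}} + n \cdot \left( c_{\text{transmit}} + c_{\text{read}} + c_{\text{decode}} + c_{\text{network}} + c_{\text{integrate}} \right)$$
where c_select is the cost of choosing relevant parts of the personal model, c_encode of encoding model parts in document parts, c_order of ordering document parts strictly linearly/hierarchically, c_transmit of data transmission, c_read of linearly reading the document, c_decode of decoding model parts from document parts, c_network of creating a networked model out of model parts, and c_integrate of integrating the new model into the existing model.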
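The formula separates a one-off cost paid by the sender from a cost paid once per recipient. The Python sketch below simply evaluates it. The step costs are made-up illustrative numbers in arbitrary units, not measurements, and the field names are our own shorthand for the terms above.

```python
from dataclasses import dataclass

@dataclass
class StepCosts:
    # Sender side, paid once (arbitrary units; the example values are made up).
    select: float     # choosing relevant parts of the personal model
    encode: float     # encoding model parts in document parts
    order: float      # ordering document parts strictly linearly/hierarchically
    # Recipient side, paid once per recipient.
    transmit: float   # data transmission
    read: float       # linear reading of the document
    decode: float     # decoding model parts from document parts
    network: float    # creating a networked model out of model parts
    integrate: float  # integrating the new model into the existing model

def total_cost(c: StepCosts, n: int) -> float:
    """C(n) = fixed sender cost + n * per-recipient cost."""
    fixed = c.select + c.encode + c.order
    per_recipient = c.transmit + c.read + c.decode + c.network + c.integrate
    return fixed + n * per_recipient

# Hypothetical numbers; transmission set to 0 since it is cheap now.
doc = StepCosts(select=2, encode=5, order=3,
                transmit=0, read=2, decode=1, network=2, integrate=1)
for n in (1, 100):
    print(n, total_cost(doc, n))
```

With these made-up numbers, the one-off sender part is 10 of the 16 units for n = 1 but only 10 of 610 units for n = 100.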
Cost of Communication: where can we save if n is small?
Current process – culture is document-centric
(Figure: cost for sender and recipient(s).)
Ideal process: what if knowledge models, rather than documents, were exchanged between people?
(Figure: cost for sender and recipient(s).)
Realistic (improved) process – use both
(Figure: cost for sender and recipient(s).)
Information Management Problems and Their Solution: Knowledge Models
Under-utilisation of the interlinked nature of information [Oren] → the fine-granular nature of knowledge models allows for precise and effective linking and browsing
People have problems using strict hierarchies [Oren] → classification methods like tagging and non-strict taxonomies
Keeping the context [Oren] → the networked nature of a knowledge model is better suited to representing contextual links than a set of documents
Granularity → represent more than the content of just one document
When to use Knowledge Models?
Use domain-specific tools and languages: standardised representation formalisms, established data exchange processes
Use personal knowledge models: unstructured, semi-structured, semi-formal and formal parts; ad-hoc formalisation; cheaper to create, easier to integrate
Use documents: costly to create, cheap to read, hard to integrate; sometimes the best solution
(Figure: the three options placed along two axes: the domain (fixed domain; open domain or multiple domains) and the audience (myself, my team, my community, broad audience).)
Related Work in Semantic Authoring
Initial ideas, although the term was not used, can already be found in V. Bush and D. Engelbart
ABCDE format by Anita de Waard
Semantically annotated LaTeX (SALT) by Tudor Groza
Systems allowing end users to construct ontologies out of their linked information objects: L. Ludwig sees redundancy within and among documents as a hurdle to efficient information usage. The traditional notion of a document is replaced by virtual documents, which render parts of the knowledge base as an interactive tree.
Bernstein describes Tinderbox, a "personal content management assistant", which offers sophisticated HTML generation via templates.
The Gnowsis system by Sauermann allows linking desktop objects and integrates with a wiki
iMapping – semantic concept maps by Haller
Work in the same direction in the fields of the semantic desktop and semantic wikis
Semantic Web Content Repository (swecr)
Conclusion
Documents: a document-centered culture is a costly legacy artefact and a bottleneck for our society
Personal knowledge models: a superset of documents and ontologies; they integrate with the semantic desktop and make knowledge workers happier and more productive
Authoring is the bottleneck: we should bring the power of modeling to the end user. Don't break the tool chain. Focus on work in progress.
Thank you very much for your attention.
Contact: Max Völkel