Exam notes: Technology for Integrated Operations and the Semantic Web


Upload: pencilsharpener42

Post on 27-Nov-2014


Page 1: Eksamensnotater Teknologi for Integrerte Operasjoner Og Semantisk Web


Part 1: Microsoft .NET and C#
- .NET is multiplatform and multilanguage
- Microsoft .NET/.NET Framework is a platform/Windows component for the development and execution of different types of applications using different programming languages
- Assembly in .NET: classes (their source code) and resources are compiled into an assembly, which is a main building block in .NET applications
- Microsoft Intermediate Language (MSIL) is the language emitted by, for instance, the C# compiler, and is what .NET assemblies contain
- The Common Language Specification (CLS) specifies required language features for .NET languages and assures compliance between languages. It is a set of features guaranteed to be available in every CLS-compliant .NET language to allow interoperability, and it is standardized and openly documented
- The Just-In-Time compiler (JIT) is responsible for compiling the MSIL code in assemblies to machine code when the code is executed
- Common Language Runtime (CLR)
  o is the core runtime engine in the .NET Framework for executing applications
  o does not itself compile source code into an assembly; the language compiler does that, and the CLR loads the assembly and JIT-compiles its MSIL
  o core services provided by the CLR are: memory management, thread management, exception handling, garbage collection, localization and security
- C# is the first language to be tailor-made for .NET development, developed by Microsoft and standardized by ISO
- Elements in C#: Namespaces, Enums, Arrays, Structs, Properties, Indexers, Delegates and Events, Attributes, Collections, Statements, Operators, Operator Overloading, Exception Handling, XML Comments
  o Namespaces – every definition must be contained in a namespace to avoid name collisions and to make the APIs easier to comprehend; namespaces can and should be nested
  o Structs – groups of data and code; they are similar to classes, but no inheritance is allowed. A struct is a lightweight data container (value type), while a class is a rich object with references (reference type)
  o Inheritance – C# supports multiple inheritance only from interfaces, not from classes
  o Reference types: Class, String, Interface. Value types: Struct, int (Int32), bool (Boolean)

Modifier             Meaning
Public               Everyone may call or access
Protected            Only members of the class and its subclasses may access
Private              Only members of exactly this class may access
Sealed               Can’t be used as a base class
Internal             Public access only within the assembly
Protected internal   Accessible from anywhere within the assembly, and from subclasses outside it


Part 2: Windows Communication Foundation
- Web services: software systems designed to support interoperable machine-to-machine interaction over a network. Web services are hosted by web servers and use standards for communication
- A web service is a programmable application, accessible through standard internet protocols, designed for machine-to-machine interaction
- The building blocks for web services are:
  o XML (Extensible Markup Language) – data representation
  o SOAP (XML), Simple Object Access Protocol – message exchange; an envelope for messages that states what the message is and keeps track of the conversation. A protocol for exchanging information over any medium
    - SOAP is a platform-independent protocol for exchanging XML messages over computer networks, usually over HTTP/HTTPS
  o WSDL (XML), Web Services Description Language – contract definitions; XML documents that define web services: types, messages, transport binding, endpoint
  o DISCO – service discovery
  o UDDI, Universal Description, Discovery and Integration – service provider discovery, a phone book for web service providers
- WCF:
  o Implemented as a set of classes on top of the .NET Framework’s Common Language Runtime
  o Clients that access services interact with them via SOAP
  o Unifying platform for distributed systems on the Windows platform
  o A set of extensions to the .NET Framework
  o Lets you build WCF services in Visual Studio using any .NET language
  o A programming framework for .NET used to build applications that inter-communicate
  o A set of, amongst others, classes, assemblies and tools that enables design, implementation and runtime for communication on the .NET Framework
  o Microsoft’s platform for, amongst others, building SOA applications
  o Goals: to unify the developer experience of building distributed apps
  o Fundamentals: a runtime for message-based communication and a development API (unified API) for applications that communicate with other applications
  o The pros: Unification (of existing .NET Framework communication technologies), Interoperability (with services built on other technologies), Service Oriented Development
  o When creating a WCF service, you have three primary components: a service class (has contracts), a host process and one or more endpoints
- Endpoints: a service can have multiple endpoints, but a WCF service must expose at least one endpoint to be accessible. ABC:
  o Address – indicates where the endpoint can be found (URL)
  o Binding – defines the transport, encoding, security, reliability implementation and session support; how to access the endpoint
  o Contract – defines what operations and data types the endpoint exposes; indicates which service contract is exposed
- Bindings, predefined:
  o BasicHttpBinding: SOAP over HTTP
  o WSHttpBinding: SOAP over HTTP, plus reliable messaging, security and atomic transactions
  o NetTcpBinding: binary-encoded SOAP over TCP, only WCF-to-WCF, secure, fast
  o WebHttpBinding: sends information directly over HTTP or HTTPS, no SOAP envelope, for RESTful communication
- Transactions – "required" means that the code will always run as part of a transaction
- Service Orientation and WCF:
  o Services – described using WSDL and accessed via SOAP (WCF)
  o Service Oriented Applications – expose business logic through services (WCF)
  o Service Oriented Architecture – SOA, defines guidelines for creating and using service oriented applications
- Three terms of Service Orientation: boundaries are explicit; services are autonomous; share schema and contract, not class
- Benefits of SOA: applications can be exposed more easily to diverse clients, existing services can more easily be reused, new applications can be created more efficiently, business people understand services, integration becomes more efficient, and it is easier to change or replace business processes
- Sessions enable us to connect calls to different operations from the same client in the same session to each other, and to keep state between them


Part 3: Windows Workflow Foundation
- Workflow – a collection of activities that coordinates humans and/or software
- Windows Workflow Foundation (WF):
  o A framework for implementing workflow in .NET applications
  o A set of assemblies, classes and other code that enables design, implementation and runtime for workflows on the .NET platform
  o It is not an application, and is hosted in a “hosting process”
  o Components: Base Activity Library, Runtime Engine, Runtime Services, Visual Designer
  o Extendible:
    - Activities – fundamental building blocks of WF, are classes
    - Runtime services – PersistenceService (saves a running workflow to a database), TransactionService (used to do transactional services), Tracking (logging and monitoring), Custom (roll your own)
    - new types of workflow
  o Pros: WF provides common functionality that many workflows need, workflow definitions can be changed on the fly to meet changing business demands/rules, and WF provides a framework for implementing workflows in a consistent manner
  o Workflows are executed by the WF runtime engine
  o The WF runtime engine needs a host process to run
  o Workflows can be hosted in both console applications and Windows Forms applications
  o WF supports workflows defined in XAML files out of the box
  o Workflows can be defined using XML
  o There is no time limit on how long a workflow can run/stay active
- Sequential Workflow or State Machine Workflow
  o Sequential workflows are capable of executing activities in a predefined pattern
  o State machine workflows are capable of responding to external events as they occur
- The EventDriven activity, when used in a state machine state, defines a transition containing one or more activities that should be executed when a specific event is received while in a particular state
- In state machine workflows, states can be nested
- The TransactionScope activity allows combining the work done by one or more other activities into an atomic transaction
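The difference between a predefined sequence and an event-driven state machine can be sketched in a few lines. The following is a minimal illustration in Python, not WF code: the class `StateMachineWorkflow`, the state names and the event names are all hypothetical, and each registered transition plays the role that an EventDriven activity plays inside a state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Activity = Callable[[], None]


@dataclass
class StateMachineWorkflow:
    """Toy state machine: (state, event) -> (activities to run, next state)."""
    state: str
    transitions: Dict[Tuple[str, str], Tuple[List[Activity], str]] = field(default_factory=dict)

    def on(self, state: str, event: str, activities: List[Activity], next_state: str) -> None:
        # Register what should happen when `event` arrives while in `state`.
        self.transitions[(state, event)] = (activities, next_state)

    def raise_event(self, event: str) -> bool:
        key = (self.state, event)
        if key not in self.transitions:
            return False  # the event is not handled in the current state
        activities, next_state = self.transitions[key]
        for activity in activities:
            activity()
        self.state = next_state
        return True


log: List[str] = []
order = StateMachineWorkflow(state="Created")
order.on("Created", "Approved", [lambda: log.append("reserve stock")], "Processing")
order.on("Processing", "Shipped", [lambda: log.append("send invoice")], "Completed")

order.raise_event("Shipped")   # ignored: no such transition from "Created"
order.raise_event("Approved")  # Created -> Processing
order.raise_event("Shipped")   # Processing -> Completed
print(order.state, log)
```

The workflow does nothing until an event arrives, which is why a state machine workflow can stay active for an arbitrary length of time.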


Part X: Silverlight
- Microsoft Silverlight is a cross-browser, cross-platform implementation of the .NET Framework for building and delivering the next generation of media experiences and rich interactive applications (RIA) for the Web.
- Pros:
  o The application looks and performs the same everywhere
  o Integration of data and services from multiple networked locations into one application using familiar .NET Framework classes and functionality
  o A media-rich, compelling, and accessible user interface
  o Easy to use for developers
  o Streams video and audio
  o Reads data and updates the display without refreshing the whole page
- In practice: requires a small client plugin, is cross-browser and cross-platform, applications can be deployed to the local computer, especially focused on media
- Creating Silverlight apps: XAML and a subset of WPF (Windows Presentation Foundation) for GUI creation, the .NET programming model, support for WCF, SOAP, AJAX.
- Architecture:
  o Core presentation framework – components and services oriented toward the UI and user interaction, XAML for layout
  o .NET Framework for Silverlight – a subset of the .NET Framework, including the CLR
  o Installer and updater – installation and update control to simplify things for users
- Supports two programming models: the JavaScript API for Silverlight and the managed API for Silverlight (a subset of .NET)
- Silverlight features:
  o WPF and XAML – to create immersive graphics, animation, media and other rich client features; XAML provides a declarative markup syntax for creating elements
  o Extensions to JavaScript
  o Cross-browser, cross-platform support
  o Integration with existing applications
  o Access to the .NET Framework programming model
  o Tools support
  o Networking support
  o LINQ, Language-Integrated Query


Part 4: Real Time Communication
- Real Time: same time (synchronous), different place
  o Types: Presence, Instant Messaging (text), Voice over Internet Protocol (VoIP), Video Conferencing
  o Sending a message: user input + RTP header + TCP or UDP header + IP header + Ethernet header
- RTC Call Processing
  o First: signaling and call control to establish, modify and terminate a call: SIP (or H.323)
  o Sample and convert the audio or video input
  o The sample is encapsulated into Real-time Transport Protocol (RTP) packets
  o The RTP packet is encapsulated into a network transport protocol: User Datagram Protocol (UDP) or Transmission Control Protocol (TCP). TCP is often not preferred because it is a guaranteed-delivery transport protocol, and its retransmissions cause latency for audio and video
  o The RTP Control Protocol (RTCP) monitors the quality of an RTP session
  o Then the transport protocol unit (UDP or TCP) is encapsulated into an IP packet
  o Then it is encapsulated into the link layer protocol – Ethernet, for example
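The encapsulation steps above can be sketched by building the headers by hand. This is a simplified illustration in Python, assuming the fixed 12-byte RTP header (version 2, no padding, no extension, no CSRC entries) and an 8-byte UDP header with the checksum left at zero. The function names, port numbers and SSRC value are made up for the example, and the IP and Ethernet layers (normally added by the operating system and the network card) are left out.

```python
import struct


def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int, payload_type: int = 0) -> bytes:
    """Prefix the audio sample with a fixed 12-byte RTP header."""
    byte0 = 2 << 6               # version=2, padding=0, extension=0, CSRC count=0
    byte1 = payload_type & 0x7F  # marker bit=0, 7-bit payload type (0 = PCMU audio)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def udp_datagram(data: bytes, src_port: int, dst_port: int) -> bytes:
    """Prefix the RTP packet with an 8-byte UDP header."""
    length = 8 + len(data)
    # Checksum 0 means "not computed" here; a real stack would fill it in.
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    return header + data


sample = bytes(160)  # placeholder for 20 ms of 8 kHz, 8-bit audio (silence)
rtp = rtp_packet(sample, seq=1, timestamp=160, ssrc=0x1234ABCD)
udp = udp_datagram(rtp, src_port=5004, dst_port=5004)
print(len(sample), len(rtp), len(udp))  # 160 172 180
```

Each layer only adds its own header in front of what it receives, which is why the sizes grow by exactly 12 and 8 bytes.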

- Session Initiation Protocol (SIP)
  o Signaling protocol used to control a session
  o A text-based application-layer signaling and call control protocol
  o Sets up, manages and tears down sessions between parties
  o What it does: finds the target of the session, negotiates the capabilities of the participants, establishes a session between participants, manages changes during a session
  o SIP components:
    - SIP servers – three types: proxy (intermediary between user agent client and user agent server), registrar (receives REGISTER requests), redirect (accepts initiation)
    - SIP user agents – two types: user agent client (initiates SIP requests), user agent server (receives SIP requests)
  o Functions/services provided: user location, user capabilities, session setup, session management.
  o Functions/services not provided: user interface, data transfer, voice and video compression, command line interface.
  o A SIP message has three parts:
    1 Start line – contains the SIP version; request: method type and SIP address/URL; response: numeric status code and reason phrase
    2 Headers – four categories: general, request, response, entity
    3 Message body – defined by the Session Description Protocol (SDP). A session description consists of three parts: a single session description, zero or more time descriptions, and zero or more media descriptions
  o SIP methods: INVITE, ACK, OPTIONS, BYE, CANCEL, REGISTER, SUBSCRIBE, NOTIFY, MESSAGE
  o SIP features: lightweight, transport protocol independent, text based
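Because SIP is text based, the three parts of a message (start line, headers, body) can be separated with plain string handling. The sketch below is an illustration in Python, not a complete SIP parser: the sample INVITE is trimmed (a real request carries more headers, such as Max-Forwards and Content-Length), all addresses and tags in it are invented, and `parse_sip` is a hypothetical helper name.

```python
SAMPLE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.org;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"                                                 # session description
    "o=alice 2890844526 2890844526 IN IP4 pc33.example.org\r\n"
    "s=Call\r\n"
    "c=IN IP4 pc33.example.org\r\n"
    "t=0 0\r\n"                                               # time description
    "m=audio 49170 RTP/AVP 0\r\n"                             # media description
)


def parse_sip(message: str):
    """Split a SIP message into (start line fields, headers, body)."""
    head, _, body = message.partition("\r\n\r\n")  # blank line ends the headers
    lines = head.split("\r\n")
    first, second, third = lines[0].split(" ", 2)
    if first.startswith("SIP/"):
        # Response: version, numeric status code, reason phrase
        start = {"kind": "response", "version": first, "status": int(second), "reason": third}
    else:
        # Request: method, SIP address, version
        start = {"kind": "request", "method": first, "uri": second, "version": third}
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start, headers, body


start, headers, body = parse_sip(SAMPLE)
print(start["method"], start["uri"], headers["CSeq"])
```

The body is left unparsed here; its `v=`, `t=` and `m=` lines correspond to the session, time and media descriptions mentioned above.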


- TCP retransmits lost packets, a feature that is often not desirable in RTC applications; UDP does not.
- The most common responsibility of audio and video codecs is to compress and decompress the audio or video data to make it smaller for storage or network transfer.
- Challenges with transferring real-time audio and video communication over packet switched networks:
  o Infrastructure
  o Firewalls, NAT (Network Address Translation)
  o Jitter
  o Packet loss
  o Packet sequence
  o Echo
  o Security
- RTP and RTCP
  o RTP is a real-time transport protocol, and RTCP is a control protocol used for monitoring RTP sessions
  o RTP is used alongside SIP to transfer digitized audio and video data between the various parties participating in a call
  o RTP and RTCP are usually used with UDP as the underlying transport layer and IP as the underlying network layer, but are independent of the underlying transport and network layers
  o RTP provides end-to-end network transport for real-time applications. RTP contains information about the real-time session
  o RTCP packets contain information regarding the quality of the RTP session and the individuals participating in the session
- Voice quality technologies
  o Jitter Control, Acoustic Echo Canceller, Quality of Service, Measuring Voice Quality
- SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE) allows users to send and receive instant real-time messages (generally text messages) and to know the current availability or status of other users.


Part 5: Introduction to the Semantic Web
- The semantic web is an extension of the current web
- Vision: introduce a semantic standard that covers “everything” and that anyone can use; use the semantic standard to define the meaning of data/services and provide uniform access to all data/services
- Current issues: interoperability, semantics, adaptability/flexibility, autonomy
- HTML/XML for interoperability: XML provides syntactic interoperability, but we need semantic interoperability
- The role of XML in the semantic web stack of languages is serialization of ontology models
- Future of the semantic web:
  o Near: semantic web services, information integration and/or interoperability, intelligent search
  o Long-term: model-driven applications, adaptive and autonomic computing, intelligent reasoning
- Terminology:
- Ontology is an explicit representation of a conceptualization; the conceptualization includes a set of concepts, their definitions and interrelationships.
- Ontology: engineering artifact, constituted by a vocabulary and assumptions about its intended meaning
- Formalization: logical theory accounting for the intended meaning of a formal vocabulary, committed to a particular conceptualization of the world
- Ontology vs Conceptualization: a conceptualization is language-independent, an ontology is language-dependent
- Ontologies: formal, explicit specification of a shared conceptualization
- An ontology is an abstract description of commonly accepted phenomena of a domain
- Ontologies define the meaning of terms and service items on the web
- Some properties of ontologies: formal representation of meaning, conceptualization of a domain, express shared understanding of a domain, explicit representation of meaning
- An interpretation of an ontology based on Description Logic: a domain and a mapping from concepts, roles and individuals to parts of this domain.
- The three categories used by Uschold et al. (“A Framework for Understanding and Classifying Ontology Applications”) to classify ontology applications are: common access to information, indexing and neutral authoring.
- The four ontology scenarios in the “common access to information” category are: human communication, data access via shared ontology, data access via mapped ontology and shared services.
- A Framework for Understanding and Classifying Ontology Applications
  o Overview of the framework:
    - Purpose and benefits – communication, inter-operability, system engineering benefits
    - Role of the ontology – three levels of information (operational data, an ontology, ontology representation language (Ontolingua))
    - Actors – ontology author, data author, application developer, application user, knowledge worker
    - Supporting technologies – ontology representation languages, knowledge interchange languages, translation tools, distributed objects
    - Maturity level
  o Ontology application scenarios – these scenarios are abstractions of specific ontology applications in industry or research:
    - Neutral authoring – the main idea is to author an artifact in a single language, and to have that artifact translated into a different format for use in multiple target applications. The authored artifact to translate can be an ontology or operational data; two types: authoring ontologies, authoring operational data
    - Ontology as specification – author an ontology which models the application domain and provides a vocabulary for specifying requirements for one or more target applications, ex: Protégé, Maturity.
    - Common access to information – use ontologies to enable multiple target applications to have access to heterogeneous sources of information, four categories:
      o human communication – promote common understanding among knowledge workers; supporting technologies include ontology editors and browsers. Maturity: library classification skills have a long history
      o data access via shared ontology – an ontology can be used as an interchange format to enable common access to operational data. Maturity: commercial success exists in some contexts, while in others the technology is a long way from being mature
      o data access via mapped ontology – no explicit shared ontology; instead, mapping rules are used to define what a term in one ontology means in another ontology
      o shared services – similar to data access via shared ontology, but different in the focus of what is being shared; the ontology defines interfaces in multiple target languages. Maturity: relatively mature
    - Ontology-based search – use an ontology for searching an information repository for desired resources. Maturity: many commercial internet portals are beginning to explore the use of concepts for ontology-based search
  o Conclusions: the paper presents a framework for understanding ontology applications; we studied:
    - The framework
    - Various ontology application scenarios (neutral authoring, ontology as specification, common access to information, ontology-based search)
  o We often distinguish between top-level ontologies, application ontologies, domain ontologies and task ontologies; the type most likely to change is application ontologies.


Part 6: Ontology languages
- The range of a property P in RDF is the class of those resources that may appear as values in a triple with predicate P
- Disjointness of classes cannot be expressed in RDF Schema
- Classes of classes cannot be expressed in OWL DL, but cardinalities, multiple inheritance and class equivalence can
- The building blocks of RDF statements are Subject – Predicate (property) – Object
- A literal data value is not a resource in RDF, while a URI, a file reference and URI fragments are
- RDFS is an extension of RDF; predefined classes in RDFS are: rdfs:Resource, rdf:Property and rdfs:Class
- OWL Full provides maximum expressiveness. Other main characteristics are: the flexibility of RDF, and a class can be treated as an individual. OWL Full does not have full reasoning support.
- If class A subsumes class B, all members of B are also members of A
- Semantic web building blocks:
  o Namespaces – associate namespaces with URIrefs to disambiguate duplicate names in XML documents, simplify writing URIrefs, qualified names
  o Uniform Resource Identifiers (URIs) – we need unique references to information representation constructs; a URI is a string of characters identifying a particular resource. Terminology: URIref – fragment included in the reference, absolute URI – full specification of the URI, relative URIref.
  o XML describes document structure – HTML: language for describing how to display document content. XML: language for describing the structure of document content, a uniform method for describing and exchanging data using HTTP, provides a syntactic schema, allows authors to create their own markup, provides no means of specifying the intended meaning of tags
  o RDF, Resource Description Framework – a simple representation language for describing Web resources; all sentences are triples of the form “(Property(binary) Subject(URI reference) Object(URI reference or literal))”, model-theoretic semantics, includes a resource “Class” and properties “type”, “subClassOf” etc
  o RDF statements – XML is used to serialize RDF(S) statements
  o RDF Schema, RDFS –
    - Classes: Class, ContainerMembershipProperty
    - Properties: subClassOf, subPropertyOf, seeAlso, isDefinedBy, comment, label, range, domain, member
  o Comments on RDF and RDFS – severely lacking in expressive power, not useful for checking consistency, basically a typing system; more powerful ontology representation languages are needed
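A small sketch can make the triple model and the remark that RDFS is "basically a typing system" concrete. The Python below stores statements as (subject, predicate, object) tuples and applies the RDFS domain and range rules to infer types. The `ex:` names are invented for the illustration, prefixed names stand in for full URIs, literals are not treated specially, and no RDF library is used.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

graph: Set[Triple] = {
    ("ex:hasAuthor", "rdfs:domain", "ex:Book"),
    ("ex:hasAuthor", "rdfs:range", "ex:Person"),
    ("ex:notes", "ex:hasAuthor", "ex:student1"),
}


def match(g: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Set[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return {t for t in g
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def infer_types(g: Set[Triple]) -> Set[Triple]:
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    inferred: Set[Triple] = set()
    for prop, _, cls in match(g, p="rdfs:domain"):
        for subj, _, _ in match(g, p=prop):
            inferred.add((subj, "rdf:type", cls))  # subjects belong to the domain
    for prop, _, cls in match(g, p="rdfs:range"):
        for _, _, obj in match(g, p=prop):
            inferred.add((obj, "rdf:type", cls))   # values belong to the range
    return inferred


for triple in sorted(infer_types(graph)):
    print(triple)
```

Note that domain and range produce new type statements rather than rejecting data, which is one reason RDFS alone is of little use for consistency checking.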

- Ontology languages - degree of formality varies widely


- Formal ontology and information systems:
  o Ontology may affect three main components of an information system: information resources, user interfaces and application programs
  o In AI, an ontology is an engineering artifact
  o Ontologies at different levels of generality: top-level ontologies, domain ontologies, task ontologies, application ontologies
  o Differences in describing reality: subtypes of concepts, task dependence of ontologies
  o Ontology-driven information systems:
    - An IS consists of components of three different types: application programs, information resources, and user interfaces
    - Two dimensions for analysis:
      - temporal dimension – using ontologies at:
        o development time – if we have an ontology library (a set of reusable ontologies), the result is an application ontology, a specialization of both a domain ontology and a task ontology; the supply of off-the-shelf ontologies today is limited. If we have only a generic ontology (the more realistic case), it is a tool rather than a building block, and it can increase the quality of the analysis process
        o run time – ontology-aware IS: an IS component is aware of the existence of an ontology and can use it. Ontology-driven IS: enabling communication between software agents
      - structural dimension – impact of ontologies on different IS components
        o using an ontology as a user interface component allows the user to query and browse the ontology, helps the user formulate specific queries, vocabulary detaching
        o using an ontology for the application program component: application programs encode knowledge in the form of type or class declarations and procedures; the ontological commitment of the program can be made explicit using ontologies; programs are turned into knowledge-based systems with expandable, growing knowledge bases
  o Conclusion: ontology-driven information systems, different types of ontologies, the role of ontology in IS (time dimension, structural dimension)


Part 7: OWL Modeling
- In OWL, an individual is an instance of one or more classes
- Object properties do not link individuals to primitive values like integers and strings. Object properties may have inverse properties, they may be subproperties of other object properties, and they are binary relations
- OWL: Introduction to the Web Ontology Language
  o Evolution: RDF + (OIL + DAML → DAML+OIL → OWL)
  o OWL extends RDF; OWL uses RDF to define its constructs
    [Figure: hierarchy of rdfs:Resource, rdfs:Class and rdf:Property, with owl:Class, owl:DataType, owl:ObjectProperty and owl:DatatypeProperty below them]
  o A family of languages:
    - OWL Lite supports users that primarily need a classification hierarchy and simple constraints. It has lower formal complexity than OWL DL
    - OWL DL supports users that need maximum expressiveness while retaining computational completeness and decidability
    - OWL Full is intended for users who need maximum expressiveness and the syntactic freedom of RDF with no computational guarantees
    - Each of the sublanguages is an extension of its simpler predecessor (Full extends DL, DL extends Lite, Lite extends RDF)
  o Language constructs in OWL
    - Classes – interpreted as sets of individuals, described using formal descriptions that state the requirements for class membership; subclasses inherit the properties of superclasses. 6 main ways of describing classes: named class; intersection; union; complement; restrictions; enumerated classes. Constructs: owl:Class, rdfs:subClassOf
    - Properties – used to state relationships between individuals or from individuals to data values
      o owl:DatatypeProperty – relations between instances of classes and RDF literals and XML Schema datatypes
      o owl:ObjectProperty – relations between instances of two classes
      o rdfs:subPropertyOf – hierarchical decomposition of properties
      o rdfs:domain – limits the individuals to which the property can be applied
      o rdfs:range – limits the individuals that the property may have as its value
    - Property characteristics
      o inverseOf
      o TransitiveProperty – if A relates to B and B relates to C, A relates to C
      o SymmetricProperty
      o FunctionalProperty – properties may be stated to have a unique value
      o InverseFunctionalProperty – the inverse of the property has at most one value for each individual
    - Cardinality
      o minCardinality, maxCardinality, cardinality
    - Individuals
      o Represent objects in the domain, are instances of classes and properties
    - Others
      o hasValue, unionOf, complementOf, someValuesFrom etc.
  o Classes and properties (not individuals) can be hierarchically decomposed in OWL.
  o owl:Class is a proper subclass of rdfs:Class
  o Tool support: Protégé – an ontology editor and a knowledge-base editor; a free, open-source Java tool that provides an extensible architecture for the creation of customized knowledge-based applications
  o Subsumption – superclass/subclass relationship; ALL members of a subclass can be inferred to be members of its superclasses. owl:Thing: superclass of all OWL classes
  o Disjointness – OWL assumes that classes may overlap; you must explicitly state that classes are disjoint
  o Open World Assumption – the assumption that the truth value of a statement is independent of whether or not it is known by any single observer or agent to be true. If we don’t know whether a statement is true, we cannot assume it is false; it is simply unknown.
  o owl:Thing is a superclass of all OWL classes and a class for all individuals
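Subsumption, disjointness and the open world assumption interact, and a toy example can show how. The following Python sketch is not a Description Logic reasoner: the class names, the individual `kari` and the helper functions are all hypothetical, and only asserted subclass and disjointness statements are taken into account.

```python
from typing import Dict, Optional, Set

# Asserted axioms (illustrative data only).
subclass_of: Dict[str, Set[str]] = {
    "Student": {"Person"},
    "Person": {"owl:Thing"},
    "Course": {"owl:Thing"},
    "Employee": {"Person"},
}
disjoint = {frozenset({"Person", "Course"})}  # must be stated explicitly
asserted_types: Dict[str, Set[str]] = {"kari": {"Student"}}


def superclasses(cls: str) -> Set[str]:
    """All classes that subsume cls (transitive closure of subclass_of)."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(individual: str) -> Set[str]:
    """Asserted classes plus every superclass of them."""
    types = set(asserted_types.get(individual, set()))
    for cls in list(types):
        types |= superclasses(cls)
    return types


def is_instance(individual: str, cls: str) -> Optional[bool]:
    """True or False only when it follows from the axioms; None means unknown."""
    types = inferred_types(individual)
    if cls in types:
        return True
    if any(frozenset({cls, t}) in disjoint for t in types):
        return False
    return None  # open world: absence of a statement is not a denial


print(sorted(inferred_types("kari")))
print(is_instance("kari", "Person"), is_instance("kari", "Course"), is_instance("kari", "Employee"))
```

Only the disjointness axiom allows the answer "no"; without it, whether `kari` is a `Course` would also be unknown.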

- Editing OWL Ontologies with Protégé
  o Ontologies in the semantic web: provide shared data structures to exchange information between agents, can be explicitly used as annotations in web sites, can be used for knowledge-based services using other web resources, can help to structure knowledge to build domain models
  o A web language: based on RDF(S)
  o An ontology language: based on logic
  o OWL ontologies: what’s inside – classes, properties, relations between classes, restrictions on properties, characteristics of properties, annotations, individuals
  o OWL use cases:
    - at least two different user groups: OWL used as a data exchange language, OWL used for terminologies or knowledge models
    - OWL DL is the subset of OWL (Full) that is optimized for reasoning and knowledge modeling
  o Reasoning with classes – tool support exists for three types of reasoning: consistency checking, classification, instance classification
  o Restrictions
    - Cardinality
    - allValuesFrom – all values of the property must be of a certain type
    - someValuesFrom – at least one value of the property must be of a certain type
    - hasValue – at least one of the values of the property is a certain value
  o Class conditions
    - Necessary conditions
    - Necessary and sufficient conditions
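The three value restrictions can be read as simple quantifiers over the values a property has for one individual. The sketch below evaluates them in Python over a closed list of values, which is a deliberate simplification: a real OWL reasoner works under the open world assumption and would not conclude allValuesFrom merely because the listed values happen to fit. The topping data and function names are invented for the example.

```python
from typing import Dict, Iterable, Set

# Illustrative class membership of some individuals.
types_of: Dict[str, Set[str]] = {
    "mozzarella": {"CheeseTopping"},
    "tomato": {"VegetableTopping"},
    "ham": {"MeatTopping"},
}


def some_values_from(values: Iterable[str], cls: str) -> bool:
    """At least one value of the property is of type cls."""
    return any(cls in types_of.get(v, set()) for v in values)


def all_values_from(values: Iterable[str], cls: str) -> bool:
    """Every value of the property is of type cls (also true when there are no values)."""
    return all(cls in types_of.get(v, set()) for v in values)


def has_value(values: Iterable[str], value: str) -> bool:
    """At least one of the values is exactly the given individual."""
    return value in list(values)


margherita_toppings = ["mozzarella", "tomato"]  # values of a hypothetical hasTopping property
print(some_values_from(margherita_toppings, "CheeseTopping"))
print(all_values_from(margherita_toppings, "CheeseTopping"))
print(has_value(margherita_toppings, "tomato"))
```

The empty case is worth noticing: allValuesFrom is satisfied by an individual with no values at all, so it is usually combined with someValuesFrom when at least one value is intended.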

- The Manchester OWL Syntax
  o Logical symbols translated to English
  o Infix notation
- Semantic Interoperability
  o Challenges for the future petroleum business: resource decline, cost increase, multitude of companies involved
  o Threatening the profitability of the business: lack of integration, bad decisions
  o Interoperability
    - Semantic interoperability – systems share and understand information with respect to mutually accepted domain concepts
    - Structural interoperability – systems share semantic schemas for common structuring of information
    - Syntactic interoperability – systems can exchange information by marking up data in a similar fashion
  o Vision: introduce a semantic standard that covers all disciplines in subsea petroleum activities and is used by all companies. Use the semantic standard to define the meaning of data and provide uniform access to all data
  o Experiences with semantic standardization: domain vs ontology experts, scope, ontology granularity, immaturity of the semantic web, commitment and costs
  o Conclusions: IIP is a collaboration project for the Norwegian petroleum industry, an ontology of 50 000-60 000 classes was developed, challenges in industrial ontology engineering projects

- Semantic Search – Squirrel
o Vision: from syntactic search via morpho-syntactic search to real semantic search
o Search is difficult and important: rapidly changing information sources, unstructured information, fragmented information, partially relevant information, information heterogeneity, vague information needs
o Traditional search principles
    Bag-of-words principle – machine understands document as a set of word frequencies
    Word matching principle – syntactic search: relevant documents are documents that contain exactly those words that appear in the query. Morpho-syntactic search: relevant documents are documents that contain inflectional variants of exactly those words that appear in the query.
    One shot principle: query and result set are ignored when a new query is posted
    Evaluation of search quality – recall is the fraction of the relevant documents that has been retrieved: Recall = Ra/R. Precision is the fraction of the retrieved documents that is relevant: Precision = Ra/A, where R = relevant, A = retrieved, Ra = relevant and retrieved
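
The recall and precision definitions above can be worked through on a toy result set. This is a minimal sketch. The document identifiers are made up, and returning 0.0 for an empty set is an assumption of the sketch, not part of the definition.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) following Precision = Ra/A and Recall = Ra/R."""
    retrieved, relevant = set(retrieved), set(relevant)
    ra = len(retrieved & relevant)  # Ra: relevant and retrieved
    precision = ra / len(retrieved) if retrieved else 0.0  # Ra / A
    recall = ra / len(relevant) if relevant else 0.0       # Ra / R
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant documents exist, 2 overlap.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here Ra = 2, A = 4 and R = 3, so precision is 2/4 and recall is 2/3.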

o Ontologies should help us see the semantic relationships between words
o Semantic approaches to search:
    Search principles
    Applications – help user formulate queries, reformulate/reinterpret queries, browse domain, formulate related queries, interoperability between search applications, semantic indexing of documents
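
The step from syntactic via morpho-syntactic to semantic search can be sketched on three toy documents. Everything here is an assumption made for illustration: the documents, the very crude suffix stripper standing in for real lemmatization, and the `related` mapping standing in for an ontology that links query terms to related terms.

```python
def words(text):
    """Lowercased tokens of a text (bag-of-words view, frequencies ignored)."""
    return text.lower().split()


def crude_stem(word):
    """Very rough inflection stripping; a stand-in for real lemmatization."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stems(text):
    return {crude_stem(w) for w in words(text)}


def syntactic_match(query, doc):
    """Document contains exactly the words of the query."""
    return set(words(query)) <= set(words(doc))


def morpho_syntactic_match(query, doc):
    """Document contains inflectional variants of the query words."""
    return stems(query) <= stems(doc)


def semantic_match(query, doc, related):
    """Each query word matches directly or through a related ontology term."""
    doc_stems = stems(doc)
    for w in words(query):
        candidates = {crude_stem(w)} | {crude_stem(r) for r in related.get(w, ())}
        if not candidates & doc_stems:
            return False
    return True


docs = {
    "d1": "subsea pump failure",
    "d2": "pumps installed on seabed",
    "d3": "booster maintenance report",
}
related = {"pump": {"booster"}}  # hypothetical ontology link


def search(match, query, **kwargs):
    return sorted(name for name, text in docs.items() if match(query, text, **kwargs))


print(search(syntactic_match, "pump"))
print(search(morpho_syntactic_match, "pump"))
print(search(semantic_match, "pump", related=related))
```

Each step widens the result set: exact matching finds only d1, stemming adds d2, and the ontology link adds d3.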

o SEKT Squirrel Prototype
    Highlights: search functionality (keyword-based, semantics), browser experience, integration of documents and metadata
    Challenges addressed by Squirrel: failure to handle polysemy and synonymy, limited disambiguation of query terms, page rank issues
    Squirrel components
        PROTON (lightweight general purpose ontology)
        Lucene indexing component (instances and classes)
        KAON2 (ontology management, inference engine, SPARQL queries)
        NL generation (NL summaries of identified entities)
        DIWAF (web application framework)
        Ontology generation (ontology generated at index time, topic hierarchies)
        User profile management (context, automatic/manual, short/long-term)
        Massive Semantic Annotation (entity recognition, entity instances)
        Segmentation (classifying by topics)
    Search refinement: result page shows result set and topics for subset selection
    Document view – presents documents, metadata, semantic markup in text
    Entity view – information about recognized entities and their related entities
o Conclusions
    Search approaches – syntactic search, morpho-syntactic search, semantic search
    Squirrel – components, user experience

Part 8: Abstract OWL Syntax
- Ontology Reasoning

o Description Logics (DL) – a formal logic-based knowledge representation language, widely used in databases and the semantic web
    DL Basics: concepts, roles, individual names, subsumption, operators
    Interpretations: a DL ontology is a set of terms and their relations; an interpretation of a DL ontology is a possible world (“model”) that materializes the ontology
    DL Semantics Formally – DL semantics are defined by interpretations. The interpretation function ·^I tells us how to interpret atomic concepts, properties and individuals
o OWL in practical use

    Declaring a class: Class(pp:plant partial)
    Naming a special plant: Class(pp:grass partial pp:plant)
    Alternative: Class(pp:grass) SubClassOf(pp:grass pp:plant)
    Declaring properties: ObjectProperty(pp:eaten_by)
    Inverse: ObjectProperty(pp:eats inverseOf(pp:eaten_by))
    Domain and ranges: ObjectProperty(pp:has_pet domain(pp:person) range(pp:animal))
    Datatype properties: DatatypeProperty(pp:service_number range(xsd:integer))
    Property hierarchy: SubPropertyOf(pp:has_pet pp:likes)
    Algebraic properties: ObjectProperty(pp:married_to Symmetric)
    Individuals: Individual(pp:Tom type(owl:Person))
    Individuals: Individual(pp:Rex type(pp:dog) value(pp:is_pet_of pp:Tom))
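
What the inverse, symmetric and sub-property declarations above entail can be sketched as simple forward chaining over (subject, property, object) facts. This is only an illustrative sketch and not how an OWL reasoner is implemented. The function name, the fact encoding and the individuals Ann and Bone1 are assumptions. The property names follow the declarations above.

```python
def infer(facts, inverse_of=(), symmetric=(), sub_property_of=()):
    """Apply inverse, symmetric and sub-property rules until nothing new is derived.

    facts: iterable of (subject, property, object) triples.
    inverse_of: pairs (p, q) meaning q is the inverse of p.
    symmetric: set of symmetric property names.
    sub_property_of: pairs (sub, super).
    """
    inverse = dict(inverse_of)
    inverse.update({q: p for p, q in inverse_of})  # inverses work in both directions
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in symmetric:
                new.add((o, p, s))
            for sub, sup in sub_property_of:
                if p == sub:
                    new.add((s, sup, o))
        if new <= facts:  # fixpoint reached
            return facts
        facts |= new


stated = {
    ("Tom", "has_pet", "Rex"),
    ("Rex", "eats", "Bone1"),       # Bone1 is a hypothetical individual
    ("Tom", "married_to", "Ann"),   # Ann is a hypothetical individual
}
entailed = infer(
    stated,
    inverse_of=[("eats", "eaten_by")],
    symmetric={"married_to"},
    sub_property_of=[("has_pet", "likes")],
)
for fact in sorted(entailed - stated):
    print(fact)
```

The printed facts are the ones that were never stated but follow from the property declarations.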

o OWL view of life: is not like a database system, there is no requirement that the only properties of an individual are those mentioned in a class it belongs to, no assumption that everything is known, classes and properties can have multiple definitions, statements about individuals need not be together in a document

o OWL is based on a description logic, which lets us check: consistency, subsumption, equivalence, instantiation, retrieval. All of these problems are reducible to consistency (satisfiability)

o Basic Tableau Algorithm
    Reasoning is machine understanding: finding facts that are implicit in the ontology given explicitly stated facts
    Reasoning tasks: knowledge is correct, is minimally redundant, is meaningful, querying knowledge, knowledge base consistency
    Tableau Algorithm is the de facto standard reasoning algorithm used in DL. Basic intuitions: reduces a reasoning problem to a concept satisfiability problem, finds an interpretation that satisfies the concepts in question, the interpretation is incrementally constructed as a “tableau”
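
The tableau intuition can be sketched for a small description logic with conjunction, disjunction, negation, and existential and universal restrictions, with no background ontology (TBox). This is a minimal teaching sketch under those assumptions. The tuple encoding of concepts and all function names are made up here, and production reasoners such as Pellet are far more elaborate. Subsumption is reduced to satisfiability: C is subsumed by D exactly when "C and not D" is unsatisfiable.

```python
def nnf(c):
    """Negation normal form: push 'not' inward until it only wraps atomic concepts."""
    if isinstance(c, str):
        return c
    op = c[0]
    if op == "not":
        d = c[1]
        if isinstance(d, str):
            return c
        if d[0] == "not":
            return nnf(d[1])
        if d[0] == "and":
            return ("or", nnf(("not", d[1])), nnf(("not", d[2])))
        if d[0] == "or":
            return ("and", nnf(("not", d[1])), nnf(("not", d[2])))
        if d[0] == "some":
            return ("all", d[1], nnf(("not", d[2])))
        if d[0] == "all":
            return ("some", d[1], nnf(("not", d[2])))
        raise ValueError(f"unknown concept: {d!r}")
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    return (op, c[1], nnf(c[2]))  # ("some", role, C) or ("all", role, C)


def satisfiable(label):
    """Try to build a model for one individual whose label is a set of NNF concepts."""
    label = set(label)
    # Clash: an atomic concept together with its negation.
    if any(isinstance(c, str) and ("not", c) in label for c in label):
        return False
    compound = [c for c in label if not isinstance(c, str)]
    for c in compound:  # and-rule: add both conjuncts
        if c[0] == "and" and not {c[1], c[2]} <= label:
            return satisfiable(label | {c[1], c[2]})
    for c in compound:  # or-rule: branch on the disjuncts
        if c[0] == "or" and c[1] not in label and c[2] not in label:
            return satisfiable(label | {c[1]}) or satisfiable(label | {c[2]})
    for c in compound:  # some-rule: fresh successor, constrained by matching all-restrictions
        if c[0] == "some":
            successor = {c[2]} | {d[2] for d in compound if d[0] == "all" and d[1] == c[1]}
            if not satisfiable(successor):
                return False
    return True


def subsumed_by(c, d):
    """C is subsumed by D iff (C and not D) has no model."""
    return not satisfiable({nnf(("and", c, ("not", d)))})


print(satisfiable({nnf(("and", ("some", "eats", "Plant"), ("all", "eats", ("not", "Plant"))))}))
print(subsumed_by(("and", "Dog", "Pet"), "Dog"))
```

The first call asks whether something can eat a plant while eating only non-plants; the second asks whether every "Dog and Pet" is a "Dog".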

o Using Pellet for Reasoning in Protégé – Pellet is one of the most common reasoning engines used for reasoning with Protégé OWL models

    Standard reasoning: Pellet supports reasoning with the full expressiveness of OWL DL and OWL 1.1. This includes: cardinality restrictions, subproperty axioms, reflexivity restrictions, reflexive, irreflexive, symmetric, and anti-symmetric properties, disjoint properties, user-defined datatypes.
    Consistency checking
    Concept satisfiability
    Classification
    Realization

Part 9: Ontology Engineering
- The steps in Grüninger and Fox’s ontology engineering methodology used in the TOVE project are: capture of motivating scenarios, formulation of informal competency questions, specification of terminology

- With the tf.idf score we extract classes and individuals
- Ontology Concept Learning

o Ontology engineering: how to develop/maintain/assess complex ontologies
o Ontology Modeling vs Learning:
    Traditional ontology engineering approach – Project (form team), Ontology & domain experts (model domain, concepts, relationships), Domain experts (test against domain knowledge), Ontology experts (verify internal and application quality) – expensive and time-consuming approach
    Ontology learning approach: Domain experts (find representative domain text), Tool (extract candidate concepts and relationships automatically), Ontology and domain experts (select candidates and relationships and complete model) – very cost-effective, can also be used to verify domain quality of existing ontology

o Ontology Learning Assumptions
    People communicate and document using domain-specific concepts
    Ontology learning makes use of written documentation rather than human involvement
    Requirements: representative documents, covering documents, well-defined and consistent use of terminology in domain
o Ontologies in search – Applications: help user formulate queries, reformulate/reinterpret queries, browse domain, formulate related queries, interoperability between search applications, semantic indexing of documents

o Ontology value quadrant:
    Concept familiarity – should match user’s way of subcategorizing phenomena, ontology concepts vs user’s preferred concepts
    Document discrimination – combination of concepts determines which groups of documents can be singled out; ideally, ontology concepts and user’s concepts can single out the same document sets
    Query formulation – user’s query usually short, specialized or more generalized terms added to refine search
    Domain stability – search domain constantly changing, documents added, deleted or changed, domain may be fragmented, ontology needs regular and frequent maintenance

o Keyphrase Extraction as Ontology Learning
    Unsupervised: extract optimum set of keyphrases to describe a document’s content
    Supervised: collection of documents with pre-assigned keyphrases
    Experimental domain – project in Statoil
    Extraction process: Document -> Data pre-processing -> Candidate phrase generation -> Phrase weighting and selection -> Keyphrases
    Data preprocessing – POS tagging, removes stopwords, lemmatization/stemming
    Candidate phrase generation – select consecutive nouns as candidate phrases
    Phrase weighting and selection – calculate tf.idf score
        o Tf.idf score – general idea: terms characteristic to a document are given high tf.idf scores, terms frequent in all documents are given low tf.idf scores. Concept extraction idea: concepts characteristic to the domain are given high tf.idf scores, terms with low tf.idf scores are not domain concepts
        o tf_ij = (frequency of term i in document j) / (maximum frequency of any term in document j)
        o idf_i = log(number of documents / number of documents in which term i occurs)
        o tf.idf score = tf * idf
    Overall R-Precision
        o R-precision – precision at the R-th position of a ranked result list, where R is the number of relevant phrases
        o Precision – number of relevant suggested phrases divided by the total number of suggested phrases; at cutoff R the number of suggested phrases equals the number of relevant phrases in the collection
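
The candidate generation and weighting steps can be sketched as follows. The sketch assumes the text has already been POS-tagged (tags are supplied by hand here), reads "consecutive nouns" as maximal runs of nouns, and uses the natural logarithm, since the notes do not fix a base. Function names and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def candidate_phrases(tagged_tokens, stopwords=frozenset()):
    """Join maximal runs of consecutive nouns into candidate phrases.

    tagged_tokens: list of (word, tag) pairs, where tag "N" marks a noun.
    """
    phrases, run = [], []
    for word, tag in tagged_tokens:
        if tag == "N" and word.lower() not in stopwords:
            run.append(word.lower())
        else:
            if run:
                phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


def tf_idf(documents):
    """documents: dict name -> list of phrases. Returns dict name -> {phrase: score}."""
    n_docs = len(documents)
    doc_freq = Counter()
    for phrases in documents.values():
        doc_freq.update(set(phrases))  # count each phrase once per document
    scores = {}
    for name, phrases in documents.items():
        counts = Counter(phrases)
        max_count = max(counts.values()) if counts else 1
        scores[name] = {
            p: (c / max_count) * math.log(n_docs / doc_freq[p])
            for p, c in counts.items()
        }
    return scores


tagged = [("The", "D"), ("gas", "N"), ("pump", "N"), ("failed", "V"), ("yesterday", "N")]
print(candidate_phrases(tagged))

# Hypothetical phrase lists for three documents.
documents = {
    "d1": ["gas pump", "gas pump", "valve", "report"],
    "d2": ["valve", "report"],
    "d3": ["pipeline", "report"],
}
scores = tf_idf(documents)
print(sorted(scores["d1"].items(), key=lambda item: -item[1]))
```

In document d1 the phrase "gas pump" scores highest because it is frequent there and absent elsewhere, while "report" scores zero because it occurs in every document.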

    Optimization: documents should not be too short, score differences may matter, multi-word phrases make good concepts
    Suitability in Search Ontologies – concept familiarity, document discrimination, query formulation, domain stability
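
The R-precision measure listed under Overall R-Precision can be computed as in this short sketch. The ranked phrases and the set of relevant phrases are invented for illustration, and returning 0.0 when there are no relevant phrases is an assumption of the sketch.

```python
def r_precision(ranked, relevant):
    """Precision within the top R ranked phrases, where R = number of relevant phrases."""
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = sum(1 for phrase in ranked[:r] if phrase in relevant)
    return hits / r


# Hypothetical ranking by tf.idf score and a hand-made gold standard.
ranked = ["gas pump", "valve", "report", "pipeline"]
relevant = {"gas pump", "pipeline", "riser"}
print(r_precision(ranked, relevant))
```

With three relevant phrases, only the top three ranked phrases are inspected, and one of them ("gas pump") is relevant.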

o Conclusions: Ontology learning is the discipline of automatically or semi-automatically constructing ontologies. Challenge to construct and maintain search ontologies. Unsupervised Keyphrase extraction for ontology learning(IR and linguistic methods, inexpensive and not very precise). Quality of unsupervised Keyphrase extraction(domain familiarity, query formulation, document discrimination, domain stability)

MEMORIZE!!!:
The purpose of the Common Language Specification is to ensure compliance between different .NET languages.
Public – everyone may call or access
Protected – only members of the class and its subclasses may access
Private – only members of exactly this class may access
Sealed – can’t be used as a base class
Internal – public access only within the assembly
Protected internal – accessible within the assembly and from derived classes
SOAP is a protocol for exchanging information over any medium
Definition of Workflow: a collection of activities that coordinates humans and/or software
SIP functions/services provided: user location, user capabilities, session setup, session management.