semantic web 0 (0) 1 1 ios press metaphactory: a platform...

17
Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for Knowledge Graph Management Peter Haase a , Daniel M. Herzig a , Artem Kozlov a , Andriy Nikolov a,* and Johannes Trame a a metaphacts GmbH, Walldorf, Germany E-mails: [email protected], [email protected], [email protected], [email protected], [email protected] Abstract. In this system paper we describe metaphactory, a platform for building knowledge graph management applications. The metaphactory platform aims at supporting different categories of knowledge graph users within the organization by realizing relevant services for knowledge graph data management tasks, providing a rich and customizable user interface, and enabling rapid building of use case-specific applications. The paper discusses how the platform architecture design built on open standards enables its reusability in various application domains and use cases as well as facilitates integration of the knowledge graph with other parts of the organizational data and software infrastructure. We highlight the capabilities of the platform by describing its usage in four different knowledge graph application domains and share the lessons learnt from the practical experience of building knowledge graph applications in the enterprise context. Keywords: Knowledge graph, Data management, End-user platform, Industrial application 1. Introduction In the recent years, knowledge graph technologies established a solid position in the enterprise world, serving as a central element in the organizational data management infrastructure. Knowledge graphs are becoming both the repository for organization-wide master data (ontological schema and static reference knowledge) as well as the integration hub for vari- ous legacy data sources: e.g., relational databases or data streams. However, the tool support for managing knowledge graphs and building custom applications on top of them is still limited. To support organizations in managing and making use of knowledge graphs, we created the metaphactory platform which covers the whole lifecycle of knowledge graph applications: from data extraction & integration, storage, and querying to visualization and data authoring. In order to exploit the capabilities of knowledge graph technologies to the maximal extent, an organiza- * Corresponding author. E-mail: [email protected]. The au- thors are listed in the alphabetical order. tion requires many supporting functionalities beyond merely being able to store data as a graph and query it. Within a large organization, these functionalities have to provide support for different user groups: from lay users who merely want to explore the data in a conve- nient way to expert users who manage and modify the knowledge graphs and internal developers who create targeted applications customized for specific end user groups. Based on these diverse needs, such functional- ities include: Data management: Common management tasks can be time-consuming and expensive without supporting tools. Even assistance in such ba- sic functionalities like SPARQL querying with graphical UI, auto-suggestions, and query cata- log can already go a long way in uncovering the added value of knowledge graphs. Moreover, sup- port for knowledge graph authoring, including both schema ontologies and the instance data, as well as integration of data from other sources of different types are the features needed in almost any enterprise use case. 1570-0844/0-1900/$35.00 c 0 – IOS Press and the authors. All rights reserved

Upload: others

Post on 19-Aug-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

Semantic Web 0 (0) 1 1IOS Press

metaphactory: A Platform for KnowledgeGraph ManagementPeter Haase a, Daniel M. Herzig a, Artem Kozlov a, Andriy Nikolov a,* and Johannes Trame a

a metaphacts GmbH, Walldorf, GermanyE-mails: [email protected], [email protected], [email protected], [email protected],[email protected]

Abstract. In this system paper we describe metaphactory, a platform for building knowledge graph management applications.The metaphactory platform aims at supporting different categories of knowledge graph users within the organization by realizingrelevant services for knowledge graph data management tasks, providing a rich and customizable user interface, and enablingrapid building of use case-specific applications. The paper discusses how the platform architecture design built on open standardsenables its reusability in various application domains and use cases as well as facilitates integration of the knowledge graph withother parts of the organizational data and software infrastructure. We highlight the capabilities of the platform by describingits usage in four different knowledge graph application domains and share the lessons learnt from the practical experience ofbuilding knowledge graph applications in the enterprise context.

Keywords: Knowledge graph, Data management, End-user platform, Industrial application

1. Introduction

In the recent years, knowledge graph technologiesestablished a solid position in the enterprise world,serving as a central element in the organizational datamanagement infrastructure. Knowledge graphs arebecoming both the repository for organization-widemaster data (ontological schema and static referenceknowledge) as well as the integration hub for vari-ous legacy data sources: e.g., relational databases ordata streams. However, the tool support for managingknowledge graphs and building custom applications ontop of them is still limited. To support organizationsin managing and making use of knowledge graphs, wecreated the metaphactory platform which covers thewhole lifecycle of knowledge graph applications: fromdata extraction & integration, storage, and querying tovisualization and data authoring.

In order to exploit the capabilities of knowledgegraph technologies to the maximal extent, an organiza-

*Corresponding author. E-mail: [email protected]. The au-thors are listed in the alphabetical order.

tion requires many supporting functionalities beyondmerely being able to store data as a graph and query it.Within a large organization, these functionalities haveto provide support for different user groups: from layusers who merely want to explore the data in a conve-nient way to expert users who manage and modify theknowledge graphs and internal developers who createtargeted applications customized for specific end usergroups. Based on these diverse needs, such functional-ities include:

– Data management: Common management taskscan be time-consuming and expensive withoutsupporting tools. Even assistance in such ba-sic functionalities like SPARQL querying withgraphical UI, auto-suggestions, and query cata-log can already go a long way in uncovering theadded value of knowledge graphs. Moreover, sup-port for knowledge graph authoring, includingboth schema ontologies and the instance data, aswell as integration of data from other sources ofdifferent types are the features needed in almostany enterprise use case.

1570-0844/0-1900/$35.00 c© 0 – IOS Press and the authors. All rights reserved

Page 2: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

2 P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management

– End-user oriented interaction: The user has tobe able to interact with the knowledge graph inan intuitive and user-friendly way. This includes,for example, the need for navigation, explorationand visualization of knowledge graphs, rich se-mantic search with visual query construction andfaceting, and a simple yet powerful interface fordata authoring and editing.

– Rapid application development: Generic support-ing toolkit must enable creating use case-specificuser applications with minimal effort.

We developed the metaphactory platform to realizethese functionalities while primarily aiming at largeenterprises and organizations. The challenges of theuse case scenarios in such organizations necessitatecertain design principles which a generic knowledgegraph management platform has to follow:

– Reusability: The platform can be used in a greatvariety of contexts, domains, and use cases toserve multiple categories of users. All aspects andfunctionalities of a generic platform must be cus-tomizable to enable it to be reused in any context.Whenever possible, the design must avoid any in-herent assumptions about the data and user needs.The use of generic open standards such as RDF1,SPARQL2, SHACL3, and others where availablehelps to achieve this.

– Compatibility: In most cases, the knowledgegraph constitutes only a part of the organiza-tional data infrastructure and must be used incombination with other elements: non-RDF datasources, data processing APIs, external data an-alytics tools. A knowledge graph managementplatform must be compatible and envisage inte-gration with other elements of the infrastructureboth at the input level (combining different kindsof data for transparent access) as well as at theoutput level (enabling knowledge graph data tobe consumed by external tools). Open interfacesshould be used whenever possible to simplifysuch integration.

– Extensibility and customization: A platform whichis intended to be reused in different domains anduse cases must provide an expert user with capa-bilities to build custom targeted applications for

1https://www.w3.org/RDF/2https://www.w3.org/TR/sparql11-query/3https://www.w3.org/TR/shacl/

her needs with only a minimal support from theplatform vendor.

In this paper we present the key concepts of themetaphactory platform showing how these designprinciples help to address the knowledge graph man-agement challenges and share our experiences andlessons learnt in applying the platform in diverse real-world use cases. The rest of the article is structuredas follows. Section 2 provides a high-level overviewof the architecture of the platform and its main build-ing blocks. Further sections explain the functionali-ties of each architecture layer in detail. Section 3 de-scribes how the platform manages heterogeneous datasources and provides a uniform querying mechanism.Section 4 outlines the set of services realized by theplatform and exposed for consuming. Section 5 out-lines the design decisions behind the customizable UIof metaphactory and discusses different supported in-teraction paradigms. In section 6 we describe four ap-plication scenarios of the platform and discuss the ex-periences and lessons learnt. Finally, section 7 con-cludes the paper and provides the directions for futurework.

2. Architecture

Figure 1 shows a high-level overview of the archi-tecture of the metaphactory platform developed to pro-vide a variety of functionalities aiming at different tar-get user groups. One such group includes the end userswithin the organization. These users are not interestedin the internals of the knowledge graph data structureor technical complexities of data access and integra-tion. The expert users and data architects that have toauthor and modify the knowledge graphs and incorpo-rate legacy data are primarily interested in convenientand efficient tools to make data management opera-tions easier. Finally, application developers within theorganization require that it is easy to incorporate theknowledge graph infrastructure into the overall soft-ware infrastructure of the organization and build newtargeted end-user applications exploiting the knowl-edge graph with a minimal effort. The architecture wasdesigned to reconcile the diverse requirements of thesegroups while maintaining the design constraints out-lined above: enabling reusability in different domains,compatibility with other data sources and applications,and supporting extensibility and customization.

The platform operates on top of a graph databasestoring the knowledge graph. The communication is

Page 3: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management 3

Fig. 1. metaphactory system architecture.

performed by the data access infrastructure of theplatform using standard SPARQL 1.1 queries, whichmakes the system independent from a specific databasevendor. Depending on the customer needs, this al-lows the platform to connect to various different triplestores or even non-RDF sources virtually integratedand exposed via a SPARQL interface: e.g., a rela-tional database integrated using R2RML4 mappings ora custom keyword search index. To be able to interactwith multiple data sources using virtual data integra-tion, the platform contains a SPARQL federation en-gine Ephedra [1], which realizes SPARQL 1.1 queryfederation over both SPARQL endpoints and customcompute services.

On top of the data access infrastructure, the platformservices layer implemented at the platform backendside realizes a range of generic functionalities for inter-action with knowledge graphs. The services extend thecapabilities of the standard SPARQL access normallyprovided by triple stores and offer the specific capa-

4https://www.w3.org/TR/r2rml/

bilities to serve the needs of each target user groups:simplify the communication between web-based end-user UI and the knowledge graph, perform knowl-edge graph management operations over the ontolog-ical schema and data required by expert users andknowledge graph maintainers, and enable easy interac-tion with other tools, which is needed by in-house de-velopment teams. These services are exposed as RESTAPIs to be consumed by the client-side user interfaceas well as by third-party tools: e.g., Tableau5 or KN-IME6 for data analytics. One critical functionality im-plemented at this level is the user access control: theplatform allows configuring fine-grained access at thelevel of specific APIs available to each user role.

The web-component based user interface is built us-ing a customizable templating mechanism. Each URIresource in the knowledge graph is visualized using anHTML page. While one can design a separate HTMLpage for each resource, in the majority of cases this is

5http://www.tableau.com6http://www.knime.com

Page 4: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

4 P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management

not required: it is possible to specify template pagesthat can be applied to any resource of the same type.When visualizing a data resource, the platform triesto select an appropriate template among the availableones. This template then gets rendered using the actualURI of the resource. The metaphactory client-side UIprovides a plethora of customizable UI components forinteracting with semantic data: from table-based andgraph-based visualizations to structured search envi-ronments and data authoring controls based on knowl-edge patterns. Each one represents an HTML5 compo-nent that can be directly inserted and configured withinHTML code.

In this way the user interface can be easily cus-tomized for the specific application use case at hand.The use of the standard HTML5 format for storingclient-side UI views enables an expert user to createand edit her own interface. Use case-specific configu-ration parameters and UI templates can be packaged asa separate app and added to the default platform instal-lation to realize a custom domain-specific knowledgegraph management application.

At the core of the architecture is the use of openstandards at all levels to enable smooth integration ofthe platform with existing tools and data providers.This also allowed using open-source libraries devel-oped in the Semantic Web community to implementthe platform functionalities. Table 1 summarizes theopen standards and tools used for implementation ofthe metaphactory platform at different architecturelayers.

3. Data Access Infrastructure

The metaphactory data access infrastructure is builton top of the RDF repository that serves as the storagefor the managed knowledge graph data. The SPARQL1.1 query language supported by most available RDFdatabases serves as a common communication proto-col. The platform reuses a popular RDF4J framework7

to realize access to data repositories. In this way, (a)the platform can be installed in any environment thatalready includes a triple store chosen by the customerorganization as well as (b) enables selecting any datastorage solution based only on the use case require-ments without introducing additional constraints on itsown. In particular, this helps addressing the scalability

7http://rdf4j.org/

issue: an appropriate backend triple store and a suit-able hardware infrastructure for it can be selected de-pending on the size of the dataset. The platform can beinstalled separately from the triple store and interactvia HTTP.

The metaphactory platform officially supports mostof the well-known triple store solutions available onthe market, e.g. Blazegraph8, Stardog9, Amazon Nep-tune10, GraphDB11, Virtuoso12, and others.

While the platform requires the existence of onemain default RDF repository containing domain data,it allows working with multiple repositories as well:the platform’s repository manager enables configuringand managing connections to many data repositoriesin a declarative way. In particular, this enables manag-ing domain data separately from the system data suchas saved SPARQL queries or SHACL data quality re-ports. Different data sources maintained by the repos-itory manager are described in RDF using the RDF4Jrepository configuration ontology. These repositoriescan include not only native RDF triple stores but otherdata sources accessible via SPARQL: most impor-tantly, relational databases virtually integrated usingR2RML mappings and exposed via SPARQL with anontology-based data access engine, either a separateone like Ontop [2] or one integrated with a triple storelike Stardog.

In many use case scenarios there arises a need tohandle hybrid information needs that require com-bining information from multiple data sources. Theseneeds are characterized by such dimensions as:

– Variety of data sources: The data to be integratedis often stored in several physical repositories.Such repositories can include both RDF triplestores and datasets that are only virtually pre-sented as RDF: e.g., relational databases exposedusing R2RML mappings.

– Variety of data modalities: RDF graph data of-ten needs to be combined with other data modali-ties: e.g., textual, temporal, or geospatial data. Tobe able to integrate those, SPARQL queries needto support special extensions for full-text, spatial,and other corresponding types of search.

8https://www.blazegraph.com/9https://www.stardog.com/10https://aws.amazon.com/neptune/11https://ontotext.com/products/graphdb/12https://virtuoso.openlinksw.com/

Page 5: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management 5

Table 1Open standards and open-source semantic web tools utilized in the metaphactory platform.

Architecture layer Standards used Open-source semantic web toolsData access infrastructure RDF, SPARQL 1.1, GeoSPARQL, R2RML RDF4J, OntopPlatform services LDP, SHACL, RDFS/OWL, REST RDFUnitUser interface HTML5, Web Components SPARQL.js

– Variety of data processing techniques: Relevantdata is often not stored directly in some reposi-tory, but has to be computed by some dedicateddomain-specific services: e.g., graph analytics(finding the shortest path or interconnected graphcliques), statistical analysis and machine learning(applying a machine learning classifier, findingsimilar entities using a vector space model), etc.

An application scenario can require dealing withseveral of these aspects simultaneously. While fed-erated query processing appears a natural way foron-the-fly integration of diverse data sources, the re-search effort, however, mainly concentrated on achiev-ing the transparent query federation over native RDFdatasets as opposed to the hybrid query challenges.Existing approaches either utilize meta-level informa-tion about federation members in order to build anoptimal query plan (e.g., DARQ [3], SPLENDID [4],ANAPSID [5], or HiBISCuS [6]) or use special run-time execution techniques targeting remote servicequeries (e.g., FedX [7]). Among the approaches target-ing hybrid query processing, SCRY [8] and Quetzal-RDF [9] deal with calling data processing servicesusing SPARQL queries. Quetzal-RDF defines customfunctions and table functions (generalized aggregationoperations) and invokes them from a SPARQL query,but does not follow the SPARQL 1.1 syntax. SCRYconforms to SPARQL 1.1 using special GRAPH tar-gets to wrap service invocations, although it cannotdistinguish between multiple input/output parameters.Thus, both these solutions are not generic enoughto handle the whole range of available hybrid datasources that include arbitrary structure of input and ar-bitrary size of the solution sets.

Given the limitations of existing solutions, to ad-dress these challenges we implemented Ephedra: aSPARQL federation engine for hybrid queries [1]. Weadopt the SPARQL 1.1 federation mechanism usingthe SERVICE keyword, but broaden its usage to enablecustom services to be integrated as data sources andoptimize such hybrid SPARQL queries to be executedefficiently.

Ephedra defines a common implementation inter-face, in which interactions with external services are

encapsulated in an RDF4J SAIL module13. In this way,a custom compute service can be registered in therepository manager as yet another SPARQL repositoryand referenced inside SERVICE clauses in SPARQLqueries. SPARQL graph patterns specified inside suchSERVICE clauses are parsed to extract input parame-ters for a service call as well as the variables to bind theresults returned by the service. The Ephedra SPARQLquery execution strategy sends the sub-clauses of aquery to corresponding data sources, gathers partial re-sults, combines them using union and join operations,and produces result sets. In this way, processing hybridqueries is transparent and performed in the same wayas ordinary SPARQL queries.

In order to express custom service requests as part ofSPARQL queries, hybrid services integrated throughEphedra are declaratively described using the Ephedraservice descriptor ontology. This ontology extends thewell-known SPIN14 ontology to define the acceptedgraph patterns, input arguments, and output results.For example, a wrapper for a custom service that re-trieves similar entities based on the proximity in theword2vec vector embedding space looks like the fol-lowing:

:Word2VecSimilarityService a eph:Service ;rdfs:label "A wrapper for the word2vecsimilarity service." ;eph:hasSPARQLPattern ([sp:subject :_entity ;sp:predicate mph:hasSimilar ;sp:object :_similar

]) ;spin:constraint[a spl:Argument ;rdfs:comment "URI of the search entity" ;spl:predicate :_entity ;spl:valueType xsd:anyURI

] ;spin:column[a spin:Column ;rdfs:comment "URI of a similar entity" ;spl:predicate :_similar ;spl:valueType xsd:anyURI

] .

13http://docs.rdf4j.org/sail/14http://spinrdf.org/

Page 6: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

6 P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management

Based on this descriptor, Ephedra will be able to in-terpret and process the following query answering thequestion “Which painters are similar to Rembrandt?”and returning service outputs among the query result:

SELECT ?artist ?label WHERE {SERVICE mpfed:wikidataWord2Vec {

wd:Q5598 mph:hasSimilar ?artist .}?artist wdt:P106 wd:Q1028181 . # painter?artist rdfs:label ?label .

}

With this approach, services with standardized in-terfaces (e.g., REST APIs) can be included into theEphedra federation in a fully declarative way, withoutthe need to implement specific service wrappers.

Processing hybrid queries including custom servicecalls requires modifying the query processing proce-dures, because a hybrid query does not follow thestandard SPARQL semantics. Processing a SERVICEclause realizing a custom service call requires all in-put parameters to be bound, which means that the joinoperands cannot be arbitrarily re-ordered. For this rea-son, Ephedra implements dedicated static query opti-mization strategies, which produce an optimal and ex-ecutable query plan. The SPARQL query algebra is ex-tended to include service call patterns ΣS as first-classelements. Building a query plan containing hybrid ser-vice call patterns involves join order optimization toensure that such patterns can be only joined after theirinput dependencies are bound. Moreover, presence ofa service call pattern in the query plan imposes addi-tional restrictions on the selection of suitable join op-erators: for example, it necessitates the use of nestedbound join as opposed to hash join.

After constructing an executable hybrid query plan,Ephedra uses dynamic query optimization techniquesto reduce the processing time: in particular, synchro-nizing loop join requests and adaptive processing of n-ary joins. The evaluation experiments reported in [1]show that these techniques result in runtime improve-ments for all test queries, sometimes by an order ofmagnitude.

4. Platform Services

The platform backend realizes a range of servicesthat implement additional functionalities on top ofthe standard interfaces of triple stores. These servicesstreamline the specific types of interactions with theknowledge graph that are required by different tar-get user groups. They include (a) convenience servicesthat realize commonly required tasks that are usually

not provided by triple stores directly and (b) connec-tor services that make data from the knowledge graphconsumable by applications.

Convenience services implement commonly re-quired routines that are either too cumbersome to berealized by sending SPARQL queries directly or re-quire additional tools (e.g., a separate keyword searchindex). These services are implemented in a genericway avoiding dependencies on a particular triple storeand/or ontology. A simple example of a convenienceservice is a generic label service retrieving a human-readable label for a given resource. The service canbe configured to deal with various modelling pat-terns (e.g., taking into account not rdfs:label, butskos:prefLabel or even property paths) and languagepreferences. Such services provide the functionalitiesutilized by the end-user interface components.

Another set of generic convenience services realizesthe knowledge graph management tasks required byexpert users and their tools:

– Linked Data Platform (LDP) service for resource-based access, creation, update, and deletion oflinked data according to the W3C LDP specifica-tion15

– Data quality service for performing data valida-tion using SHACL constraints.

– Query catalog service for saving and managingreusable SPARQL query templates (expressed us-ing the SPIN ontology).

– Keyword search service based on the Graph-Scope16 data search engine integrated into theplatform.

Manual modifications of a knowledge graph as wellas bringing in transformed legacy data sources causethe need to verify and maintain the knowledge graphdata quality: its adherence to the schema ontologiesand domain constraints. To this end, the metaphactoryplatform utilizes the SHACL standard for defining andverifying the data constraints. SHACL provides an on-tology for expressing data constraints in the form ofshapes: RDF structures describing sets of restrictionsthat the data in the main knowledge graph must con-form to. SHACL allows defining complex combina-tions of constraints more expressive than those sup-ported by OWL. The metaphactory platform supportsboth SHACL constraints defined manually by the useras well as automatically generated from the OWL on-

15https://www.w3.org/TR/ldp/16http://metaphacts.com/graphscope

Page 7: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management 7

tology. Auto-generator bootstrapping rules process theRDFS and OWL restrictions and generate correspond-ing SHACL shapes. The platform includes a SHACLexecution engine based on the open-source RDFUnit17

implementation that transforms SHACL shapes intoSPARQL queries, executes them against the knowl-edge graph and generates a data quality report.

Connector services aim at pre-processing and ex-posing knowledge graph data in a way that makes iteasily consumable by external applications. These ser-vices are primarily available for the needs of applica-tion developers utilizing the knowledge graph infor-mation in combination with other elements of the in-frastructure within an organization. Examples of suchservices are the Tableau connector service that ex-poses the data for analysis by a popular Tableau ap-plication and the Alexa skill service that generatesan Alexa18 skill description to enable voice interac-tion with knowledge graph. To ease the integrationof the platform into an organizational infrastructure,the Query-as-a-Service (QaaS) functionality is imple-mented. This allows exposing pre-saved parameterizedSPIN query templates as custom REST APIs that areeasy to call by internally developed tools within a cus-tomer organization. Ability to work with a REST APIis usually preferred by internal developer teams to theneed to compose and send SPARQL queries directly.

To reconcile the needs to support diverse usergroups and to keep the platform adaptable to variousdifferent use cases and ease integration with other toolsand applications in the organization, the service layerwas implemented based on open standards wheneverpossible. While SPARQL 1.1 is used as a commonquery language, other standards are used for realiz-ing more specific functionalities: for example, SHACLto specify and process the constraints over the data,LDP to enable management operations over RDF us-ing HTTP requests, REST as a communication inter-face for external applications.

5. User Interface

The user interface design of the metaphactory plat-form aims at satisfying the requirements outlined insection 1: enabling reusability, compatibility, and cus-tomization. For this reason, the core platform interfaceis implemented in a generic way, providing tools for

17https://github.com/AKSW/RDFUnit18https://developer.amazon.com/alexa

custom-building of all specific views for the use caseat hand. In addition to that, the platform envisages ex-ternal applications connecting to it, which gives possi-bility to use alternative user interfaces according to theuser needs. One example of such an alternative UI isAmazon Alexa.

5.1. Customizable UI: templates and customcomponents

The web-based UI of the metaphactory platform isresource-centric: the platform renders an HTML viewfor a provided URI of a resource from the knowl-edge graph (Fig. 2). Resource views can be specifiedat different levels of abstraction: it is possible to de-fine a dedicated HTML view for one specific resourceas well as provide templates that will be applied to allresources of a specific kind. The criterion for choos-ing a template for a resource is configurable: by de-fault, the template is selected according to the rdf:typeproperty value, but this can be changed by providing aspecial template selection SPARQL query. This allowsadapting the platform setup depending on the high-level structure of the knowledge graph.

Resource views and templates are defined as HTMLpages. These can be created and edited directly in theplatform using the provided HTML editor. Besidesstandard HTML tags, a view can contain special UIcomponents: configurable custom client-side UI ele-ments which can be referenced in the template sourcecode as special HTML5 tags. A custom UI compo-nent receives its configuration parameters as tag at-tributes. A generic component that needs to interactwith the knowledge graph (e.g., to visualize data) re-ceives the data selection SPARQL query as part of itsconfiguration and interacts with the platform backendservice layer APIs to obtain the required data. In par-ticular, the built-in components reuse the open-sourceSPARQL.js19 library to operate with SPARQL queries.

Moreover, the web components provided by metaphac-tory are not only customizable individually, but arecomposable. Some component can define an environ-ment that can include other nested components ableto communicate via pre-defined interfaces. Such en-vironments can define common configuration param-eters that are applied to all included components aswell as define a layout for these components. An ex-ample of such environment is a structured search com-

19https://github.com/RubenVerborgh/SPARQL.js

Page 8: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

8 P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management

Fig. 2. Example resource view for a protein in the Wikidata knowledge graph.

ponent that is able to combine different input controlsfor generating a search request (e.g., keyword searchor form-based search) with various options for visual-izing and exploring search results. With this approach,complex user interaction functionalities can be spec-ified in a fully declarative way by composing pre-existing atomic elements without the need to build newcomponents for each new use case application.

In this way, the user interface addresses the designrequirements supporting reusability and custom appli-cation building. View templates and configurable cus-tom components provided by the platform enable partsof the UI to be reused for different resources, datasets,and applications. Building a custom end-user applica-tion for a specific use case involves defining and pack-aging a set of required UI templates and configurationsand does not require modifications of the platform it-self. An application can be delivered as a simple plu-gin that can be installed automatically on top of thestandard platform release.

The platform provides a wide range of custom com-ponents for various user needs primarily focusing onthe functionalities most often required by end users:search, visualization, and authoring. The main proto-col of interaction with RDF stores is the SPARQLquery language: an expressive formalism, which how-

ever is not convenient for non-expert users. For thisreason, search and visual exploration capabilities thatwould capture information needs in a user-friendlyway and generate information retrieval queries basedon them are crucial for an end-user tool working withknowledge graphs.

Semantic search and visual exploration capabilitiescomplement each other in enabling the user interac-tion with the knowledge graph. Semantic search allowsdefining exact information needs in a user-friendlyway and retrieving relevant data from the knowledgegraph. On the other hand, “exploration is about ef-ficiently extracting knowledge from data even if wedo not know exactly what we are looking for” [10].Structure of semantic datasets was exploited to gener-ate facets for search refinement [11] and linked databrowsing [12], where facets are selected based on datadistribution within the dataset. The need to express in-formation needs using complex SPARQL queries ledto development of query builder tools providing vi-sual assistance to the user. For example, the ExCon-Quer [13] framework provides an interface for con-structing SPARQL queries where the user expressesher information need by selecting the concepts andproperties from the ontology and building an abstractquery structure which is afterwards translated into

Page 9: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management 9

a SPARQL query. OptiqueVQS [14] provides visualquery formulation allowing the user to select rele-vant concepts and properties, express value restric-tions, and visualize the resulting query pattern as agraph structure. QueryVOWL [15] combines elementsof search and exploration as it provides a visual con-struction of queries over the linked data and convertsthe user-generated diagrams into SPARQL. However,its use of exploration is rather limited as its only goalis construction of the SPARQL query, rather than en-abling the user working with its results. Others, likeLodlive [16] or Aemoo [17] primarily focus on brows-ing and graph-based exploration of data. In metaphac-tory we aimed at combining components for structuredsemantic search to capture the user needs and trans-late them into SPARQL queries with powerful explo-ration tools that would pick up the semantic search re-sults and allow the user to further explore them (e.g,by means of the Ontodia tool [18]).

In its structured search functionality, the metaphac-tory platform puts emphasis on the capability to con-figure search and adapt it to different use cases witha minimal effort. In particular, search over the knowl-edge graph is realized as an environment, which allowsspecifying different ways for generating the searchquery, refining it, and visualizing the search results. Asearch request can be generated in several ways: e.g.,by using a simple keyword search text field, by en-tering parameters in a pre-defined form, or by con-structing the search criteria iteratively, selecting rele-vant properties and constraints. All these multiple ap-proaches for expressing user information needs gener-ate the search request in the same form: as a SPARQLquery. The query can be further refined using facetsthat can be configured declaratively as part of thesearch environment. Finally, the search results returnedby the produced query can be visualized by any ap-propriate UI component depending on the form of dataand customer preferences: e.g., as a table or as a chart.

Built-in visualization components in the metaphac-tory platform provide the capabilities to show subsetsof a knowledge graph in an informative way, facilitateexploration of the graph, and summarize the informa-tion. Depending on the structure of the data and thetarget view, the developer can select alternative visu-alization strategies, e.g., a table that condenses graphdata into a more traditional relational format or a graphthat emphasizes inter-relations between entities. Forthe latter, the metaphactory platform is integrated witha powerful Ontodia [18] tool for building custom RDFgraph diagrams (Fig. 3). At the moment, Ontodia rep-

resents one of the most powerful ontology visualiza-tion tool taking into account the range of supported in-teraction techniques [19]. Ontodia allows not only vi-sualizing parts of an RDF graph, including concepts,entities, relations, and properties, but also enables theuser to further explore the dataset and extend the viewin an interactive way. User experiments performed tovalidate Ontodia in combination with semantic searchwithin the diagrammatic question answering workflowon the Wikidata dataset have shown that the systemallows the users to interact with the knowledge grapheffectively without having upfront knowledge of thedataset structure [20].

To summarize aggregated data and show the user theanalysis results, various charts are available. Commonspecial types of data, such as geospatial and temporal,can be visualized using appropriate views: e.g., map ortimeline.

To support creation and editing of knowledge graphs,the platform provides end-user friendly, customizable,and extensible authoring UI based on the compos-ite component environment and knowledge graph pat-terns. In essence, a knowledge graph pattern is a struc-ture including a SPARQL query pattern with some ad-ditional metadata that is used for creation and vali-dation of the user input. This concept helps to hidethe complexity of the underlying data model from theend user, but at the same time allows expert users toprecisely capture information needs and adjust author-ing UI for these needs by using various componentsfor data input, ranging from simple text inputs to hi-erarchical suggestion components and nested forms(Fig. 4). The application of knowledge graph patternsis not limited only to data authoring. The same patterncan be re-used in structured search for query genera-tion and in visualization components for data retrieval.This re-usability helps to make sure consistency of theUI across the whole application.

5.2. Expressive keyword search querying withGraphScope

GraphScope is a data search engine for knowledgegraphs that allows the user to access RDF data in a sim-ple way by entering keyword queries. These keywordqueries are interpreted by GraphScope and the resultsmatching the information need are shown to the user.GraphScope tries to answer the user’s query by findingthe most suitable interpretation of each keyword withrespect to concepts, properties, and instances of theknowledge graph. To this end, GraphScope relies on

Page 10: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

10 P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management

Fig. 3. Graph visualization with Ontodia.

Fig. 4. Authoring forms using knowledge graph patterns.

a set of specialized indices that contain various meta-information about the distribution of entities and asso-ciated keywords in the data repository. Based on theseindices, GraphScope identifies the most appropriatematches for the keywords in the user query and gener-ates a corresponding SPARQL query to retrieve the re-sults. After the system returned an initial result set forthe user’s keyword query, GraphScope provides possi-bilities to further refine the query and explore the initialresult set. The user can select an appropriate interpreta-tion for keywords to refine the meaning of the query aswell as further expand the results by exploring the rela-

tions of the entities in the result set. These refinements,which the user performs from the UI, are automaticallytransformed into modifications of the SPARQL query.

GraphScope, originally developed as a standalonetool, was integrated with the metaphactory platform toserve as an additional query generation approach in thegeneric architecture. The integration is realized bothat the backend and frontend levels. GraphScope andmetaphactory reuse the same connection to the back-end triple store to access data via SPARQL. At thefrontend level, the GraphScope user interface for defin-ing a keyword search query and refining the search re-

Page 11: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management 11

sults is wrapped as a metaphactory UI HTML5 com-ponent that can be added on a template page.

5.3. Voice interface using Amazon Alexa

Separating the service layer backend and the client-side UI makes it easy to integrate the platform withexternal tools at the UI level: an alternative UI canbe deployed on top of the platform, while it can in-teract with the knowledge graph by invoking rele-vant platform services. An example of such alternativeuser interface is voice interaction using Alexa that canbe used alongside the more traditional paradigms de-scribed above [21]. The Amazon Alexa framework al-lows the developer to define a specific service (calledSkill) to process user requests and generate verbalizedanswers. The Alexa framework includes two parts: anAlexa skill which provides an abstraction over com-plex voice processing and generation services and anAmazon Lambda service implementing the applica-tion logic. An Alexa skill processes the voice mes-sages from the user and transforms them into servicerequests with provided parameters. A skill can definea number of request types called intents, which canin turn be mapped to several utterances (natural lan-guage phrases). An utterance can be parameterizedusing slots (request parameters): for example, whenAlexa receives a user’s question “Alexa, ask Wikidatawhat is the capital of Poland”, it extracts the skill name(Wikidata), identifies the intent corresponding to thequestion “what is the capital of. . .?” and the corre-sponding Lambda service. Then, the Lambda serviceis invoked with the extracted parameters: the ID of theintent (e.g., “capitalcity”) and the request parameter(“Poland”). The Lambda service is then responsible forfinding the right answer and returning it in a verbalizedform. The Alexa service is then merely pronounces thereturned verbalized answer.

Our Lambda function finds an answer to the queryin the knowledge graph by mapping an intent to aSPARQL query pattern. For example, to answer ourexample question over the Wikidata knowledge graph,we need to find the value of the property P36 “capital”for the entity Q36 “Poland”.

To handle such a direct factual question, our Lambdafunction uses a SPARQL query pattern of the follow-ing form (here using the Blazegraph full-text search):SELECT ?answer WHERE {

?uri rdfs:label ?label.?label bssearch:search "${entity}".?label bssearch:minRelevance "0.5".?label bssearch:matchAllTerms "true".?uri ${property} ?answer .

} LIMIT 1

We use an automated procedure to bootstrap theAlexa skill definition and generate descriptions of in-tents as well as example entities and verbalization tem-plates. Our Alexa skill is available in the Amazon Skillstore under the name “metaphacts” in German and En-glish20.

6. Experiences and Lessons Learnt

The metaphactory platform is used in production ina variety of use cases involving knowledge graph man-agement in different application domains. In the fol-lowing we consider four diverse practical use casesfrom the open knowledge graphs, cultural heritage, en-ergy industry, and life sciences domains highlightingdifferent aspects of knowledge graph management.

6.1. Open Knowledge Graphs (Wikidata)

Wikidata21 is a free and open knowledge graphcontaining general-purpose data across domains. Itis used as a reference data storage for other Wiki-media projects (in particular, Wikipedia). Due to itscomprehensive nature and popularity, there exists alarge volume of mappings between Wikidata entitiesand instances of other repositories, including domain-specific ones (e.g., UniProt22 for proteins, GeoN-ames23 for geographical locations, etc.).

To exploit this data, we have set up a public show-case system for the community24, easing access tothe information provided by the Wikidata query ser-vice. The system provides different search interfacesas entry points into Wikidata’s knowledge base andvisualizes search results based on a comprehensiveHTML5 based templating approach. Internally, thepublic metaphactory Wikidata system is used bothas an integration hub to facilitate integration of datafrom multiple sources for the needs of industrial usecases as well as a demo system to highlight variousplatform features. Among others, the public Wikidatametaphactory system includes such functionalities as

– Structured search over Wikidata (configured forgeneral-purpose person-organisation data as wellas for the life sciences domain)

20https://www.amazon.com/metaphacts/dp/B0745KLCFX/ forUS English.

21https://www.wikidata.org22https://www.uniprot.org/23http://www.geonames.org24https://wikidata.metaphacts.com/

Page 12: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

12 P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management

– Integration with the Wikidata free-text search ser-vice

– Integration with life science-specific repositoriesto show additional views

– Semantic similarity search based on a word2vecvector space embedding model [22]

– Geospatial search using a combination ofGeoSPARQL25 queries, external services likeOpenStreetMap26, and map-based visualizations.

Using the links to external repositories, the Wikidatametaphactory system is used to integrate external datain other use cases: e.g., to bring in geospatial data inthe cultural heritage use case or additional protein tis-sue data from neXtProt27 in the life science scenario.

6.2. Cultural Heritage

Knowledge graph technologies have become promi-nent in the context of the cultural heritage domain,where the CIDOC-CRM ontology28 became a popu-lar standard for exposing cultural heritage informationas linked data. The metaphactory platform is utilizedin the context of the ResearchSpace project29 to man-age the British Museum knowledge graph and help theresearchers (a) explore meta-data about museum arti-facts: historical context, associations with geograph-ical locations, creators, discoverers and past owners,etc, and (b) use this meta-data in collaborative workby creating annotations, narratives involving semanticreferences, and argumentations exploiting knowledgegraph data as evidence [23].

A crucial piece of functionality is the structuredsearch, where the user can define complex multi-criteria information needs iteratively, by selecting ap-propriate clauses in the user interface (Fig. 5). A queryrequest, which can look like “give me all bronze arti-facts created in Egypt between 2500BC and 2000BC”,then gets translated into SPARQL and answered us-ing the knowledge graph data. Given the complexityand very high expressivity of the CIDOC-CRM on-tology, using it directly at the level of user interfacewould make the system complicated for the end-user.For this reason, we defined a set of special fundamen-tal concepts (FCs) and relations (FRs) [24], which ab-stract over physical classes and properties of the ontol-

25http://www.opengeospatial.org/standards/geosparql26https://www.openstreetmap.org27https://www.nextprot.org/28http://www.cidoc-crm.org/29http://www.researchspace.org/

ogy, while being intuitively understandable to the user(e.g., Thing, Actor, Event and relations between them).With the structured search interface, the user can spec-ify the criteria for data selection, interactively refinethe initial selection results, explore the returned resultset with faceted search and different results visualiza-tion views, and finally save the defined search for fu-ture reuse.

Knowledge graph data is used to support researchcollaborations by enabling the argumentation processand making assertions about graph entities followingthe direction outlined in [25]. User assertions have ex-plicitly stored provenance information and can com-pose complex exchanges of arguments allowing totrace the whole reasoning process back to its originsand base evidence. Finally, the data selected from theknowledge graph can be used as references in semanticnarratives that combine user-authored text, referencesto knowledge graph entities and different data visual-izations supported by the platform: from images asso-ciated to entities to charts summarizing selected datasubsets.

A demo installation of the ResearchSpace platformover the British Museum knowledge graph collectionis available on the Web30.

While the British Museum data collection was thefirst target use case in the cultural heritage domain,the resulting ResearchSpace platform extension wasre-applied for other use cases in this area. Sphaera Cor-pusTracer [26], a practical application developed incollaboration with the Max-Planck Institute manages aknowledge graph addressing science history informa-tion: e.g., tracing the survived printed publications ofmedieval astronomy textbooks in the early modern pe-riod31.

6.3. Engineering & Manufacturing Industry

Large-scale organizations involved in the engineer-ing and manufacturing industry use knowledge graphrepresentation to model master data: e.g., informationabout products and their typology, separate assemblyparts, projects and their topics, etc. An inherent traitof these use cases is the condition that the knowledgegraph constitutes only a part of a complex infrastruc-ture involving heterogeneous data sources and large-scale networks of atomic devices able to communicatedata. This raises a number of challenges for data man-

30https://demo.researchspace.org31http://db.sphaera.mpiwg-berlin.mpg.de

Page 13: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management 13

Fig. 5. Structured semantic search over British Museum artifacts.

agement and utilization: in particular, the need to in-tegrate static knowledge with runtime data and applypotentially complex analysis procedures.

These challenges are present in full in the usecase scenarios in which metaphactory is applied atSiemens [27]. For example, a typical Siemens gas tur-bine has about 2,000 sensors and a diagnostic task todetect whether a single particular purging operationis over requires applying multiple hundreds of signalprocessing rules. To be able to process relevant infor-mation in a timely way and achieve intelligent diag-nostics, knowledge graph data must be integrated withruntime information provided by relevant APIs on de-mand. In this context, knowledge graph informationacts as the integration backbone for the heterogeneousdata infrastructure. To provide the integrated view ofthis hybrid infrastructure, the integration capabilitiesof the metaphactory platform play a crucial role.

The Ephedra federation architecture enables com-bining the “native” knowledge graph data with the dataproduced by the API services available in the Siemensinfrastructure, such as MindSphere32, as well as withanalysis results produced by machine learning algo-

32https://siemens.mindsphere.io/

rithms in a transparent way, at the level of SPARQLqueries.

For example, one of the use cases involves usingthe knowledge graph to integrate and represent knowl-edge about a building in order to create a full virtualcopy (Digital Twin) of the real building, including thestatic information describing the building as well asdynamic sensor data reflecting its current state. In an-other case, a trained machine learning model is ex-posed as a REST API that takes as input a list of fea-tures and returns as response the classification resultsof a text categorization problem. The use of Ephedraallows wrapping these diverse API as SPARQL feder-ation members and processing federated queries thatjoin the raw data with the predictions generated by ma-chine learning. In this way, the hybrid data integra-tion capabilities enabled by the platform enable severalpractical end-user scenarios within the Siemens infras-tructure using a common approach to build several dis-tinct knowledge graph applications targeting differentuser groups.

Page 14: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

14 P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management

6.4. Life Sciences

The life sciences and pharmacy industry has beenamong the early adopters of the semantic web/knowledgegraph technologies. Currently there exists a wide rangeof reusable life-science reference datasets already pub-lished in RDF and available in public linked datarepositories (e.g., neXtProt [28], ChEMBL33 [29] andothers). In order to be exploited in an industrial en-vironment, these datasets must be integrated with theinternal data sources collected within the organization.

Such a scenario involved utilizing the metaphac-tory platform within the Sanofi S.A.34 pharmaceuti-cal company. The project focused on building a tar-get dashboard to support internal decision making pro-cesses using integrated data from over 20 different datarepositories including both company-internal and pub-lic data. The metaphactory platform served as a one-stop portal for consolidated access to target-related in-formation such as function, expression, genetics, inter-actions/pathways, etc.

To achieve transparent querying of integrated data,relevant data sources were brought together using acombination of ETL processes and virtual integra-tion. Data contained in relational databases were trans-formed using R2RML mappings. A high-level ontol-ogy was designed to serve as a global schema to ab-stract from the set of local ontologies of each datasource. To support the end-user in finding and inter-preting the data, the target dashboard user interfacewas built using the metaphactory UI templating func-tionality and the generic platform services. The dash-board includes complementary search mechanisms,which can be used depending on the user task at hand:structured search that realizes ontology-based graph-ical query building and keyword search. Search re-sults are visualized using template pages according tothe relevant types of entities. In addition, the data ex-posed via the platform API services is further exploitedby third-party tools like KNIME to perform advanceddata analysis.

Given the diversity of data to be integrated and thelikelihood of error, ensuring the quality control be-comes a matter of critical importance. To this end, theplatform setup included the use of SHACL to verifythe quality of the data. Using a combination of SHACLconstraints automatically generated from the ontologyas well as manually curated domain rules help to en-

33https://www.ebi.ac.uk/chembl/34http://www.sanofi.com

sure that the added value of data integration is not com-promised by reduced data quality.

6.5. Discussion

The metaphactory platform has been deployed invarious knowledge graph management use cases tar-geted at different industries, organization types, anduser groups. This experience provided us with valu-able hints regarding the management and exploitationof knowledge graphs within enterprise organizations aswell as the challenges which have to be addressed bya general-purpose knowledge graph management plat-form. The use cases demonstrate the importance fora knowledge graph application to adapt to the needsof existing data infrastructure aiming to complementit with missing functionalities rather than replace it.The knowledge graph platform must support rapid de-velopment and deployment of a use case applicationthat would demonstrate the added value of knowledgegraph technologies. Two examples of such function-alities are transparent data integration and semanticsearch.

Data integration needs of real-world customers of-ten go beyond RDF data and relational/tabular datasources. While static data can be integrated by meansof a dedicated ETL process and physical materializa-tion of data as RDF, using dynamic data in combina-tion with the knowledge graph requires extension ofcommon data access mechanisms. The metaphactoryapproach to the problem includes support of hybrid in-formation needs by means of extended SPARQL 1.1federation capabilities. In this case, dynamic data ex-posed by means of a service can be accessed by requestand combined with the knowledge graph informationat the level of query results.

The use of open standards is crucially important inrealizing interactions with third-party tools at all lev-els. For example, using SPARQL 1.1 as a generic inter-face for communicating with data repositories enablesthe platform to interact with any triple store selected bythe customer without imposing any restrictions on theinfrastructure. For the same reason, it is also beneficialif a knowledge graph management platform can pro-vide its output using interfaces preferred by the organi-zation’s developer teams: for instance, if some data canbe retrieved by sending a SPARQL query to the end-point, it is still often preferable if the same data can beprovided via a REST API call with a set of key-valueparameters.

Page 15: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management 15

One generic observation regarding the requirementstowards end-user supporting tools in the industrial set-ting concerns the way reusability is achieved. In aca-demic research the focus is often on providing a maxi-mally generic approach that is able to adapt to the spe-cific task at hand by using intelligent heuristics: for ex-ample, by generating automatically the most suitableset of facets for semantic search results explorationor by automatic selection of relevant data sources toachieve transparent query federation. In the industrialsetting, the possibility to adjust the configuration ofsome module manually in a very fine-grained way isusually more important, while automating heuristicscan play only an auxiliary role. This is mainly dueto the need to deal with “long tail“ edge cases in theknowledge graph structure, for which a generic tech-nique would fail: improving the coverage of a fullygeneric approach for such cases is usually more expen-sive than manual configuration. This often creates anobstacle for applying open source tools developed inthe research community to industry use cases.

A survey in [30] introduces an assessment frame-work for knowledge graph tools, listing such dimen-sions as human interaction, machine interaction, andstrategic development. Table 2 summarizes the fea-tures implemented in the metaphactory platform tosupport each of the assessment parameters of these di-mension. To realize its goal of supporting the wholeknowledge graph lifecycle, metaphactory realizes alldimensions and parameters listed in the assessmentbenchmark. No other tool evaluated in [30] includesall the features. Moreover, the variety of different usecases described in sections 6.1 to 6.4 demonstrate theplatform ability to support rapid development of cus-tom applications with minimal effort by providing cus-tom configurations of core features rather than customimplementations.

However, based on the experience of using the plat-form in industrial use cases, it is apparent that sup-port for some of these functionality dimensions stillhas limitations and requires further improvement. Themost important of these are data curation, linking,and analytics. Manual knowledge graph populationand editing is often a costly process where automa-tion can provide added value in many ways, both to as-sist the user in manual curation workflows and to sup-port automated knowledge graph population from un-structured sources. Given the variety of the data inte-gration use cases, data interlinking challenges arise indifferent contexts and require the ability to deal withdifferent data modalities and formats (e.g, free text,

XML&JSON documents) as well as apply semi- orfully automated data interlinking algorithms. Finally,supporting data scientists in performing advanced dataanalytics tasks requires better integration of the plat-form with a variety of common machine learning andanalysis tools as well as generic out-of-the box supportfor integration of analysis results back into the knowl-edge graph.

7. Conclusions and Outlook

The metaphactory platform has been developedwith the aim to support interaction with knowledgegraphs and utilize the added value of knowledge graphdata within organizational environments. The trial ver-sion of the platform can be accessed for free after reg-istration35.

Given the feedback gathered in multiple use casesand the limitations listed in Section 6.5, we considerseveral directions particularly important to improve thecurrent version of the platform.

– Intelligent data authoring. Whenever knowledgegraph data has to be inserted manually by the userrather than imported from pre-existing sources, itoften constitutes a cumbersome and time-costlyprocess. It is important to assist the user in knowl-edge graph population tasks whenever possibleproviding intelligent auto-suggestions as well asvalidating and refining user input. To this end,the use of novel machine learning techniquesfor knowledge graph population, particularly,embedding-based methods appears particularlypromising.

– Integration with machine learning technologies.Knowledge graphs provide added value servingas input for machine learning techniques produc-ing new information, as well as can themselvesbe extended using machine learning. metaphac-tory platform already includes interfaces that canbe utilized for interaction with machine learningtools. Providing configurable general-purpose ex-tensions enabling application of machine learn-ing tools with minimal effort would further bene-fit data scientists.

35http://www.metaphacts.com/trial, at the time of writing the ver-sion 2.3.2 is available.

Page 16: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

16 P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management

Table 2Assessing metaphactory platform features along benchmark dimensions [30]

Dimension Parameter FeaturesD1 Human Interaction P1 Modeling Taxonomies, Ontologies, Rules, Mappings

P2 Curation Collaborative, Form-based, GUIP3 Linking Mappings, Virtual integration (Federation)P4 Exploration/Visualization Charts, Maps, Graphs, Faceted browsing, Web GUI TemplatesP5 Search Full-text, Structured, Faceted, Federated, NL question answering

D2 Machine Interaction P6 Data Model RDF, RDFS, OWL, RDBMS + R2RMLP7 APIs SPARQL, LDP, custom REST APIs

D3 Strategic Development P8 Governance Data workflows, Best practicesP9 Security Role-based AC (Data-level, API-level)P10 Quality & Maturity SHACLP11 Provenance Data-level (Context)P12 Analytics Graph-based (Ontodia), ML, Statistical workflows (KNIME, Tableau)

– Advanced data integration tasks. This involves,in particular, platform-level support for manag-ing mappings for integration of non-RDF struc-tured (e.g., JSON, XML) and unstructured (e.g.,text) sources as well as common integration pro-cedures, such as instance matching.

– Development of “blueprints” for common usecases. Given commonalities between differentuse cases, it is desirable to develop common“blueprint” templates for applications. Such ap-plication templates will greatly reduce costs forbuilding applications targeted at new use cases.

References

[1] A. Nikolov, P. Haase, J. Trame and A. Kozlov, Ephedra: Ef-ficiently Combining RDF Data and Services Using SPARQLFederation, in: 8th International Conference on KnowledgeEngineering and Semantic Web (KESW 2017), Szczecin,Poland, 2017, pp. 246–262.

[2] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov,D. Lanti, M. Rezk, M. Rodriguez-Muro and G. Xiao, Ontop:Answering SPARQL queries over relational databases, Seman-tic Web 8(3) (2017), 471–487.

[3] B. Quilitz and U. Leser, Querying Distributed RDF DataSources with SPARQL, in: 7th International Semantic WebConference (ISWC 2008), Vol. 5021, Springer, Karlsruhe, Ger-many, 2008, pp. 524–538.

[4] O. Görlitz and S. Staab, SPLENDID: SPARQL Endpoint Fed-eration Exploiting VOID Descriptions, in: COLD2011 Work-shop, 10th International Semantic Web Conference (ISWC2011), Bonn, Germany, 2011.

[5] M. Acosta, M.-E. Vidal, T. Lampo, J. Castillo and E. Ruck-haus, ANAPSID: An Adaptive Query Processing Engine forSPARQL Endpoints, in: 10th International Semantic Web Con-ference (ISWC 2011), Bonn, Germany.

[6] M. Saleem and A.-C. Ngonga Ngomo, HiBISCuS:Hypergraph-Based Source Selection for SPARQL EndpointFederation, in: 11th Extended Semantic Web Conference(ESWC 2014), Heraklion, Greece, 2014.

[7] A. Schwarte, P. Haase, K. Hose, R. Schenkel and M. Schmidt,FedX: Optimization Techniques for Federated Query Process-ing on Linked Data, in: 10th International Semantic Web Con-ference (ISWC 2011), Bonn, Germany, pp. 601–616.

[8] B. Stringer, A. Meroño-Peñuela, S. Abeln, F. van Harmelenand J. Heringa, SCRY: Extending SPARQL with Custom DataProcessing Methods for the Life Sciences, in: 9th InternationalConference on Semantic Web Applications and Tools for LifeSciences (SWAT4LS), Amsterdam, The Netherlands, 2016.

[9] J. Dolby, A. Fokoue, M. Rodriguez-Muro, K. Srinivas andW. Sun, Extending SPARQL for Data Analytic Tasks, in: 15thInternational Semantic Web Conference (ISWC 2016), Kobe,Japan, 2016, pp. 437–452.

[10] S. Idreos, O. Papaemmanouil and S. Chaudhuri, Overviewof Data Exploration Techniques, in: ACM SIGMOD Interna-tional Conference on Management of Data, Melbourne, Vic-toria, Australia, May 31 - June 4, 2015, Melbourne, Victoria,Australia, 2015, pp. 277–281.

[11] M. Tvarozek and M. Bieliková, Generating Exploratory SearchInterfaces for the Semantic Web, in: Human-Computer Inter-action - Second IFIP TC 13 Symposium, (HCIS 2010), Bris-bane, Australia, 2010, pp. 175–186.

[12] T. Franz, J. Koch, R.Q. Dividino and S. Staab, LENA-TR :Browsing Linked Open Data Along Knowledge-Aspects, in:Linked Data Meets Artificial Intelligence, 2010 AAAI SpringSymposium, Stanford, CA, USA, 2010.

[13] J. Attard, F. Orlandi and S. Auer, ExConQuer: Lowering barri-ers to RDF and Linked Data re-use, Semantic Web 9(2) (2018),241–255.

[14] A. Soylu, M. Giese, E. Jiménez-Ruiz, G. Vega-Gorgojo andI. Horrocks, Experiencing OptiqueVQS: a multi-paradigm andontology-based visual query system for end users, UniversalAccess in the Information Society 15(1) (2016), 129–152.

[15] F. Haag, S. Lohmann, S. Siek and T. Ertl, Visual Querying ofLinked Data with QueryVOWL, in: SumPre 2015, HSWI 2015Workshops, 12th Extended Semantic Web Conference (ESWC2015), Portoroz, Slovenia, 2015.

[16] D.V. Camarda, S. Mazzini and A. Antonuccio, LodLive, ex-ploring the web of data, in: 8th International Conference onSemantic Systems, I-SEMANTICS ’12, Graz, Austria, 2012,pp. 197–200.

[17] A.G. Nuzzolese, V. Presutti, A. Gangemi, S. Peroni andP. Ciancarini, Aemoo: Linked Data exploration based onKnowledge Patterns, Semantic Web 8(1) (2017), 87–112.

Page 17: Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform ...pdfs.semanticscholar.org/b1cd/0eeb0d288bf0210e6adb... · Semantic Web 0 (0) 1 1 IOS Press metaphactory: A Platform for

P. Haase et al. / metaphactory: A Platform for Knowledge Graph Management 17

[18] D. Mouromtsev, D. Pavlov, Y. Emelyanov, A. Morozov,D. Razdyakonov and M. Galkin, The Simple Web-based Toolfor Visualization and Sharing of Semantic Data and Ontolo-gies, in: 14th International Semantic Web Conference (ISWC2015), Bethlehem, PA, USA, 2015.

[19] M. Dudás, S. Lohmann, V. Svátek and D. Pavlov, Ontologyvisualization methods and tools: a survey of the state of the art,Knowledge Engineering Review 33 (2018).

[20] D. Mouromtsev, G. Wohlgenannt, P. Haase, D. Pavlov,Y. Emelyanov and A. Morozov, A Diagrammatic Approachfor Visual Question Answering over Knowledge Graphs, in:ESWC 2018 Satellite Events, Heraklion, Crete, Greece, 2018,pp. 34–39.

[21] P. Haase, A. Nikolov, J. Trame, A. Kozlov and D.M. Herzig,Alexa, Ask Wikidata! Voice Interaction with KnowledgeGraphs using Amazon Alexa, in: 16th International SemanticWeb Conference (ISWC 2017), Vienna, Austria, 2017.

[22] A. Nikolov, P. Haase, D.M. Herzig, J. Trame and A. Kozlov,Combining RDF Graph Data and Embedding Models for anAugmented Knowledge Graph, in: BigNet@WWW’18 Work-shop, The Web Conference 2018, (TheWebConf 2018), Lyon,France, 2018, pp. 977–980.

[23] D. Oldman and D. Tanase, Reshaping the Knowledge Graph byconnecting researchers, data and practices in ResearchSpace,in: 17th International Semantic Web Conference (ISWC 2018),Monterey, CA, USA, 2018.

[24] K. Tzompanaki and M. Doerr, Fundamental Categories andRelationships for querying CIDOC-CRM based repositories,Technical Report, TR-429, ICS-FORTH, 2012.

[25] M. Doerr, A. Kritsotaki and K. Boutsika, Factual argumenta-tion - a core model for assertions making, JOCCH 3(3) (2011),8–1834.

[26] F. Kräutli and M. Valleriani, CorpusTracer: A CIDOC databasefor tracing knowledge networks, DSH 33(2) (2018), 336–346.

[27] T. Hubauer, P. Haase and D.M. Herzig, Use Cases of the In-dustrial Knowledge Graph at Siemens, in: 17th InternationalSemantic Web Conference (ISWC 2018), Monterey, CA, USA,2018.

[28] L. Lane et al., neXtProt: a knowledge platform for human pro-teins, Nucleic Acids Research 40(Database–Issue) (2012), 76–83.

[29] E.L. Willighagen et al., The ChEMBL database as linked opendata, J. Cheminformatics 5 (2013), 23.

[30] M. Galkin, S. Auer, M. Vidal and S. Scerri, Enterprise Knowl-edge Graphs: A Semantic Approach for Knowledge Manage-ment in the Next Generation of Enterprise Information Sys-tems, in: 19th International Conference on Enterprise Infor-mation Systems (ICEIS 2017), Porto, Portugal, 2017, pp. 88–98.