Text Mining in PoolParty Semantic Suite


Upload: martin-kaltenboeck

Post on 24-Jan-2018


TRANSCRIPT

Page 1: Text Mining in PoolParty Semantic Suite

Martin Kaltenböck, CFO, Semantic Web Company

Timea Turdean, Technical Consultant, SWC

POOLPARTY SEMANTIC SUITE

AIMS Webinar 21st Sept 2017

1

Page 2: Text Mining in PoolParty Semantic Suite

PoolParty Drupal Integration

2

Agenda

▸ Introduction Semantic Web Company (SWC)

▸ Introduction PoolParty Semantic Suite

▸ Using PoolParty for Text & Data Mining

▹ Text Mining for continuous knowledge graph modelling

▹ Entity linking and data integration

▹ Classification and semantic annotation / tagging

▸ DEMO(s) of text mining capability of PoolParty

▸ Customer Success Stories

▹ REEEP ClimateTagger

▹ healthdirect Australia

▹ CTCN Semantic Search

▹ EIP Water Matchmaking

▸ Q&A Session

Page 3: Text Mining in PoolParty Semantic Suite

INTRODUCTION

Semantic Web Company &

PoolParty Semantic Suite

3

Page 4: Text Mining in PoolParty Semantic Suite

INTRODUCING SEMANTIC WEB COMPANY

Semantic Web Company (SWC)

▸ Founded in 2004

▸ Based in Vienna

▸ Privately held

▸ 40+ employees, experts in text

mining & linked data

▸ ~15-20% revenue growth / year

▸ 2.5 Mio Euro funding for R&D

▸ SWC named to KMWorld’s 2017

‘100 Companies That Matter in

Knowledge Management’

▸ Organising SEMANTiCS

conference series for 13 years

▸ https://www.semantic-web.com

4

Page 5: Text Mining in PoolParty Semantic Suite

INTRODUCING POOLPARTY

PoolParty Semantic Suite

▸ First release in 2009

▸ Current version 6.0

▸ W3C standards compliant

▸ Over 200 installations

worldwide

▸ 50% of revenue is reinvested

into PoolParty development

PoolParty on-premises or

used as a cloud service

▸ KMWorld listed PoolParty as Trend-Setting Product 2015, 2016 and 2017

▸ https://www.poolparty.biz/

5

Page 6: Text Mining in PoolParty Semantic Suite

SELECTED CUSTOMER REFERENCES AND PARTNERS

SWC head-quarters

6

Customer References

● Credit Suisse
● Boehringer Ingelheim
● Roche
● adidas
● The Pokémon Company
● Canadian Broadcasting Corporation
● Harvard Business School
● Wolters Kluwer
● Talend
● HealthStream
● TC Media
● Techtarget
● Seek
● Alliander N.V.
● Pearson - Always Learning
● Education Services Australia
● American Physical Society
● Healthdirect Australia
● World Bank Group
● Inter-American Development Bank
● Renewable Energy Partnership
● Wood MacKenzie
● Oxford University Press
● International Atomic Energy Agency
● Norwegian Directorate of Immigration
● Ministry of Finance (AT)
● Council of the E.U.
● Australian National Data Service

Partners

● Accenture
● EPAM Systems
● Enterprise Knowledge
● Mekon Intelligent Content Solutions
● B-S-S Business Software Solutions
● MarkLogic
● Wolters Kluwer
● Digirati
● Quark

US East

US West

AUS/NZL

UK

Page 7: Text Mining in PoolParty Semantic Suite

MAKE USE OF POOLPARTY SEMANTIC SUITE

OVERVIEW

7

Page 8: Text Mining in PoolParty Semantic Suite

TECHNICAL CORE COMPONENTS

8

Bain Capital is a venture capital

company based in Boston, MA.

Since inception it has invested in

hundreds of companies including AMC

Entertainment, Brookstone, and Burger

King. The company was co-founded by

Mitt Romney.

Taxonomy & Ontology Server

Entity Extractor & Text Mining

Data Integration & Data Linking

Unstructured

Data

Semi-

structured

Data

Structured

Data

Unified

Views

PoolParty

GraphSearch

Identify new candidate concepts to be included in a controlled vocabulary

Controlled vocabularies as a basis for highly precise entity extraction

Entity Extractor informs all incoming data streams about its semantics and links them

Schema mapping based on ontologies

RDF

Graph Database

Page 9: Text Mining in PoolParty Semantic Suite

PoolParty Semantic Suite

System Architecture Overview

9

Page 10: Text Mining in PoolParty Semantic Suite

360-degree views over various content repositories

10

Page 11: Text Mining in PoolParty Semantic Suite

‘Elevator Pitch’

▸ Built as a ‘Semantic Middleware’

▸ Outstanding user-friendliness

▸ Fully standards-compliant

▸ Highly precise entity extraction

▸ Comprehensive API

▸ Excellent maintainability of extraction models

▸ Integrated with leading search engines & graph databases

▸ Integrated with leading content management platforms

▸ Product configuration options for growing requirements

▸ Highly experienced partners / service team

11

Page 12: Text Mining in PoolParty Semantic Suite

Product Overview

All products are available as cloud services or for on-premise installation

> PoolParty Feature & Price Matrix

12

PoolParty Basic Server

PoolParty Advanced Server

PoolParty Enterprise Server

PoolParty Semantic Integrator

SKOS Taxonomy Management
Multiple Projects
Taxonomy REST API
Import/Export (incl. Excel)
Rollback and History

Ontologies and Custom Schemes
Quality Management & Reports
Advanced Corpus Management
Vocabulary Mapping, Linked Data Mapping
Linked Data Enrichment, Frontend, and SPARQL endpoint

Entity Extractor
Extractor API
Auto Populate project from DBpedia
Export to Remote Repository
Workflow Management
SKOS-XL (optional)

Integration with Graph databases
Integration with Search engines
Data linking & mapping
Data transformation pipelines with UnifiedViews

Graph Search Server

Page 13: Text Mining in PoolParty Semantic Suite

HOW DOES THIS WORK

Taking a look under the hood

13

Page 14: Text Mining in PoolParty Semantic Suite

BASIC PRINCIPLES

Benefiting from the Semantic Web

in a Nutshell

14

Page 15: Text Mining in PoolParty Semantic Suite

Four-layered Content Architecture

15

Page 16: Text Mining in PoolParty Semantic Suite

Metadata and semantic data

16

The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round.

Page 17: Text Mining in PoolParty Semantic Suite

Metadata and semantic data

17

The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round.

Peggy Guggenheim

Peggy Guggenheim Collection

Venice

Canale Grande

http://my.com/resource/328832

skos:prefLabel

http://my.com/docs/45367

skos:prefLabel

http://my.com/docs/52345

skos:prefLabel

http://my.com/resource/328832

skos:prefLabel

Page 18: Text Mining in PoolParty Semantic Suite

Metadata and semantic data

18

The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round.

Peggy Guggenheim

Peggy Guggenheim Collection

Venice

museum

Canale Grande

skos:prefLabel

http://my.com/docs/45367

skos:prefLabel

http://my.com/docs/52345

skos:prefLabel

skos:prefLabel

http://my.com/resource/62545

skos:prefLabel

http://www.mycom.com/images/90546089

image

has landmark

named after

http://my.com/resource/328832

http://my.com/resource/328832

hosted in

hosted in

has

Page 19: Text Mining in PoolParty Semantic Suite

Metadata and semantic data

19

The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round.

Peggy Guggenheim Collection

dct:title

Mike Miller

Michael Miller

skos:prefLabel

skos:altLabel

dct:creator

http://my.com/docs/328832

http://my.com/people/32

schema:Article

rdf:type

http://my.com/img/99.jpg

schema:image

skos:subject

Peggy Guggenheim Collection Venice

museum

skos:prefLabel

skos:subject

skos:altLabel

skos:broader

skos:prefLabel

schema:image

Canale Grande

skos:prefLabel
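The statements on this slide can be read as subject-predicate-object triples. The following is a minimal Python sketch of that idea, not PoolParty code. The document, person, and image URIs and the predicates are taken from the slide; the two `ex:concept/...` identifiers are hypothetical, and the assignment of each label to a node is an assumption, because the diagram's edges did not survive extraction.

```python
# URIs and predicates as they appear on the slide.
DOC = "http://my.com/docs/328832"
AUTHOR = "http://my.com/people/32"
# Hypothetical concept identifiers, introduced only for this sketch.
COLLECTION = "ex:concept/peggy-guggenheim-collection"
MUSEUM = "ex:concept/museum"

# Which label belongs to which node is assumed.
TRIPLES = [
    (DOC, "rdf:type", "schema:Article"),
    (DOC, "dct:title", "Peggy Guggenheim Collection"),
    (DOC, "dct:creator", AUTHOR),
    (DOC, "schema:image", "http://my.com/img/99.jpg"),
    (DOC, "skos:subject", COLLECTION),
    (AUTHOR, "skos:prefLabel", "Michael Miller"),
    (AUTHOR, "skos:altLabel", "Mike Miller"),
    (COLLECTION, "skos:prefLabel", "Peggy Guggenheim Collection"),
    (COLLECTION, "skos:broader", MUSEUM),
    (MUSEUM, "skos:prefLabel", "museum"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subject_labels(doc):
    """Collect preferred labels of a document's subjects and their broader concepts."""
    labels = []
    for concept in objects(doc, "skos:subject"):
        labels.extend(objects(concept, "skos:prefLabel"))
        for broader in objects(concept, "skos:broader"):
            labels.extend(objects(broader, "skos:prefLabel"))
    return labels


print(subject_labels(DOC))
```

Following `skos:broader` from the document's subject is what lets a search for "museum" find this article even though the tag itself is the more specific collection concept.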

Page 20: Text Mining in PoolParty Semantic Suite

Resolving Language Problems

“While most people can deal with linguistic features as synonyms, homographs, polyhierarchies, and even with far more peculiar characteristics of natural languages, machines often struggle with automatic sense-making because of the lack of a semantic knowledge model that can be used programmatically.”

Page 21: Text Mining in PoolParty Semantic Suite

Knowledge Graph Text Mining for

knowledge graph development

21

Page 22: Text Mining in PoolParty Semantic Suite

PoolParty Extractor

Uses several components of a knowledge model:

▸ Taxonomies based on the SKOS standard

▸ Ontologies based on RDF Schema or OWL

▸ Word form dictionaries

▸ Blacklists and stop word lists

▸ Disambiguation settings

▸ Domain-specific reference document corpus

▸ Statistical language model

22
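To make the role of some of these components concrete, here is a minimal Python sketch of dictionary-based concept matching. It is an illustration under stated assumptions, not the PoolParty Extractor or its API: the labels, `ex:` identifiers, word form dictionary, and blacklist are all made up, and disambiguation, the reference corpus, and the statistical language model are left out.

```python
import re

# Hypothetical mini knowledge model; none of these names come from PoolParty.
LABELS = {
    "audi q3": "ex:AudiQ3",
    "compact crossover suv": "ex:CompactCrossoverSUV",
    "volkswagen": "ex:Volkswagen",
    "platform": "ex:Platform",
}
WORD_FORMS = {"suvs": "suv", "platforms": "platform"}  # word form dictionary
BLACKLIST = {"platform"}  # labels that must never produce an annotation
MAX_LABEL_WORDS = max(len(label.split()) for label in LABELS)


def normalise(text):
    """Lowercase, tokenise, and map inflected forms to their base form."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [WORD_FORMS.get(token, token) for token in tokens]


def extract(text):
    """Return concept identifiers found in the text, longest label first."""
    tokens = normalise(text)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LABEL_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LABELS and phrase not in BLACKLIST:
                found.append(LABELS[phrase])
                i += n
                break
        else:
            # No label starts at this token; move on.
            i += 1
    return found


text = ("The Audi Q3 is a compact crossover SUV made by Audi. "
        "It is based on the PQ35 platform of Volkswagen.")
print(extract(text))
```

Trying the longest phrase first keeps "compact crossover SUV" from being split into shorter matches, and the blacklist suppresses a label that is in the vocabulary but too generic to be a useful tag.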

Page 23: Text Mining in PoolParty Semantic Suite

PoolParty’s SKOS editor

23

The Audi Q3 is a compact crossover SUV made by Audi.

It is based on the PQ35 platform of Volkswagen.

A5 platform

A series

Page 24: Text Mining in PoolParty Semantic Suite

PoolParty’s ontology and custom schema management

24

Taxonomy

Ontology

Ontology 1from library

Ontology 2(imported)

Ontology 3(custom-made)

Custom Schema

Page 25: Text Mining in PoolParty Semantic Suite

‘Setting the rules’ for text mining & entity extraction via thesaurus

25

Proper use of a funduscope requires a bit of practice and familiarity with the functions of your device.

Diagnostic Equipment

Ophthalmoscope
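The example on this slide can be sketched as a label lookup: a synonym in the text resolves to the preferred concept, and the thesaurus hierarchy supplies a broader category. The Python below is an assumption-laden illustration, not PoolParty's implementation. It assumes "funduscope" is an alternative label of "Ophthalmoscope" and that "Diagnostic Equipment" is its broader concept; the `Concept` class and `ex:` identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)
    broader: Optional["Concept"] = None


# Assumed thesaurus fragment based on the slide.
equipment = Concept("ex:DiagnosticEquipment", "Diagnostic Equipment")
ophthalmoscope = Concept(
    "ex:Ophthalmoscope", "Ophthalmoscope", ["funduscope"], equipment
)


def build_index(concepts):
    """Map every preferred and alternative label (lowercased) to its concept."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.lower()] = concept
    return index


def annotate(text, index):
    """Return (matched label, preferred label, broader label) for each hit."""
    lowered = text.lower()
    hits = []
    for label, concept in index.items():
        if label in lowered:  # naive substring match, enough for the sketch
            broader = concept.broader.pref_label if concept.broader else None
            hits.append((label, concept.pref_label, broader))
    return hits


index = build_index([equipment, ophthalmoscope])
sentence = ("Proper use of a funduscope requires a bit of practice and "
            "familiarity with the functions of your device.")
print(annotate(sentence, index))
```

The text never mentions "Ophthalmoscope" or "Diagnostic Equipment", yet both become available for tagging and search because the thesaurus carries that knowledge.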

Page 26: Text Mining in PoolParty Semantic Suite

Disambiguation settings

26

Page 27: Text Mining in PoolParty Semantic Suite

Disambiguation settings

27

Page 28: Text Mining in PoolParty Semantic Suite

Corpus analysis results in a network of concepts and terms

28

I need support to continuously extend our taxonomy / controlled vocabulary!

skos:Concept

Reference Corpus

- Websites
- PDF, Word, …
- Abstracts from DBpedia
- RSS Feeds

skos:Concept

skos:Concept

Term 1

Term 3

Term 7

Term 8

Term 6

Term 4

Term 2

Term 5

- Relevant terms and phrases
- Relevancy of concepts
- Co-occurrence between concepts and terms
- Co-occurrence between terms and terms
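Co-occurrence can be illustrated with a simple count of how often two terms appear in the same document. This is a minimal Python sketch with a made-up three-document corpus and term list; it does not reflect how PoolParty's corpus analysis is actually computed.

```python
from collections import Counter
from itertools import combinations

# Made-up reference corpus and candidate terms, for illustration only.
DOCS = [
    "solar energy reduces emissions",
    "wind energy and solar energy",
    "emissions trading scheme",
]
TERMS = ["solar", "wind", "energy", "emissions"]


def cooccurrence(docs, terms):
    """Count, per unordered term pair, the documents containing both terms."""
    counts = Counter()
    for doc in docs:
        words = set(doc.split())
        present = sorted(term for term in terms if term in words)
        counts.update(combinations(present, 2))
    return counts


counts = cooccurrence(DOCS, TERMS)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts, such as "energy" and "solar" here, are the kind of signal a taxonomist could use to decide which candidate terms belong near an existing concept.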

Page 29: Text Mining in PoolParty Semantic Suite

Semantic Annotation

Classification and Semantic

Annotation / Tagging

29

Page 30: Text Mining in PoolParty Semantic Suite

Entity Extraction based on Knowledge Graphs

30

Page 31: Text Mining in PoolParty Semantic Suite

PoolParty as a supervised learning system

31

Content Manager

Integrator

Taxonomist/Ontologist

Thesaurus Server

Extractor

PowerTagging

uses API

is user of

is user of

is basis of

is basis of

Index

annotates

enriches

Reference Corpus

CMS

extends

is basis of

analyzes

uses API

Page 32: Text Mining in PoolParty Semantic Suite

Data Integration Mapping and Linking of Data

32

Page 33: Text Mining in PoolParty Semantic Suite

PoolParty Semantic Integrator - at a glance

https://youtu.be/l_LppfS3wxk

33

Deep Data Analytics

Semantic Search

Semantic Integrator

Unstructured Data

Structured Data

ETL / Monitoring / Scheduling

Page 34: Text Mining in PoolParty Semantic Suite

PoolParty Semantic Integrator

High-level architecture

34

Page 35: Text Mining in PoolParty Semantic Suite

DEMO(s)

… let's see how it works in action

35

Page 36: Text Mining in PoolParty Semantic Suite

PoolParty Thesaurus Manager
● SKOS editor
● Ontology and custom scheme manager

PoolParty PowerTagging for Drupal (backend)
● Automated Tagging
● Manual Tagging
● Configuration of modules

PoolParty GraphSearch for Drupal (frontend)
● Semantic Search
● Explore Trends & Sentiments
● Facets and Similarity

36

DEMOS

Page 37: Text Mining in PoolParty Semantic Suite

Drupal and PoolParty at a Glance

37

PoolParty Drupal Integration Demo: http://drupal.poolparty.biz/

Page 38: Text Mining in PoolParty Semantic Suite

USE CASES

Success Stories about Text Mining and Linked Data

using PoolParty Semantic Suite

38

Page 39: Text Mining in PoolParty Semantic Suite

Use Cases: Text Mining & Linked Data

▸ Climate Tagger (PDF)
Streamline and catalogue data and information resources

▸ healthdirect Australia (PDF)
Semantic Search based on the Australian Health Thesaurus

▸ CTCN Semantic Search
Integrating thousands of documents from several sources on climate technology

▸ European Innovation Partnership (EIP) on Water
Online Marketplace including semantic matchmaking

39

Page 40: Text Mining in PoolParty Semantic Suite


40

Climate Tagger

Helps organizations in the climate and development arenas catalogue, categorize, contextualize, and connect data and information resources.

Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus.

http://www.climatetagger.net

Page 41: Text Mining in PoolParty Semantic Suite

How does it work

41

Page 42: Text Mining in PoolParty Semantic Suite


42

EIP Water Matchmaking

Controlled vocabularies enable accurate matchmaking between Supply and Demand for Water Innovation in Europe.

Matchmaking is based upon the EIP Water Innovation Thesaurus (GEMET based).

http://www.eip-water.eu
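One way to picture vocabulary-based matchmaking is to tag both offers and requests with concepts from the same thesaurus and rank offers by tag overlap. The Python sketch below uses Jaccard similarity for the ranking. That choice, the offer names, and the `ex:` concept identifiers are assumptions made for illustration; the slides do not say how the EIP Water marketplace scores matches.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical supply offers and one demand, tagged with made-up concepts.
SUPPLY = {
    "Offer A": {"ex:WaterReuse", "ex:Desalination"},
    "Offer B": {"ex:LeakDetection"},
}
DEMAND = {"ex:WaterReuse", "ex:LeakDetection", "ex:Desalination"}


def rank(demand, offers):
    """Return (offer name, score) pairs, best match first."""
    scored = [(name, jaccard(demand, tags)) for name, tags in offers.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for name, score in rank(DEMAND, SUPPLY):
    print(f"{name}: {score:.2f}")
```

Because both sides are tagged from one controlled vocabulary, the comparison works on concept identifiers rather than on free text, so differing wording in the original descriptions does not affect the score.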

Page 43: Text Mining in PoolParty Semantic Suite


43

CTCN Semantic Search

Helps organisations in the climate technology field to explore and find relevant content from thousands of Drupal nodes and several sources using PoolParty, PowerTagging and sOnr webmining.

CTCN is backed by the CTCN Climate Technology Thesaurus.

https://www.ctc-n.org/semantic-search

Page 44: Text Mining in PoolParty Semantic Suite


44

healthdirect Australia

Integrated views and semantic search over more than 100 trusted sources.

Harmonization of various metadata systems through the use of a central vocabulary hub: Australian Health Thesaurus.

http://www.healthdirect.gov.au

Page 45: Text Mining in PoolParty Semantic Suite

SUMMARY

WHY TAXONOMISTS AND INFORMATION ARCHITECTS LIKE POOLPARTY

Read more

Different project stakeholders expect specific qualities from a semantic technology platform:

45

I am a taxonomist. I need a tool that provides convenient functionalities and intuitive user interfaces for my daily work.

I am an information architect. Enterprise metadata management deserves scalable technologies, which provide semantic services on top of rich APIs based on standards.

Page 46: Text Mining in PoolParty Semantic Suite

PoolParty Academy

Get certified!

46

https://www.poolparty.biz/academy/

Page 47: Text Mining in PoolParty Semantic Suite

GET STARTED

47

Get your test account at www.poolparty.biz

Page 48: Text Mining in PoolParty Semantic Suite

CONNECT

Timea Turdean, Technical Consultant, SWC

▸ [email protected]

▸ https://www.linkedin.com/in/timeaturdean/

▸ https://twitter.com/poolparty_team

48

© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/

Martin Kaltenböck, CFO, Semantic Web Company

[email protected]

▸ https://www.linkedin.com/in/martinkaltenboeck

▸ https://twitter.com/semwebcompany

▸ https://blog.semantic-web.at/