WebLab, open source media mining platform, OW2con'12, Paris

OW2Con'12, November 28-29, 2012 Orange Labs, Paris. www.ow2.org. Open source media mining platform Gérard Dupont Research engineer – COEDS2 – Advanced studies

Uploaded by: ow2-consortium

Posted on 25-Jun-2015

DESCRIPTION

The Web is large, and information comes in many forms. Complex techniques are needed to uncover the hidden structure of content, and no single software provider can be an expert in all of them. An integration platform is therefore a natural solution, since it allows the best tool to be used for each function. In this presentation we introduce the challenges of OSINT (open-source intelligence) and its growing importance. We then detail the WebLab approach to building flexible and scalable OSINT applications that keep up with the fast-paced nature of OSINT. From the semantic data model, through the selected technologies, to the upper-level architecture, the presentation gives a complete tour of the WebLab project.

TRANSCRIPT

Page 1

Open source media mining platform

Gérard Dupont

Research engineer – COEDS2 – Advanced studies

Page 2

Media mining platform

From unstructured data from any source...

… to structured and actionable knowledge

Page 3

2005 - http://www.opte.org

Page 4

OSINT challenges

Some activities need to be automated:

- search/source assessment;
- data acquisition;
- classification, screening, indexing;
- information retrieval;
- knowledge capitalisation;
- visualization;
- summary;
- alert.

Some activities cannot be automated:

- expert analysis of content;
- linking and mapping heterogeneous information;
- evaluating reliability and assessing information;
- reporting and synthesis of information.

→ Tools can provide support but must keep the human in the loop.

Page 5

A processing workflow

Diagram: Collect → video segmentation / audio extraction → audio cleaning → audio transcription (Sphinx) → translation → information extraction → alert. The data passes from video to audio, speech audio, text, translated text and finally enriched (annotated) text.

Transcribed text (Japanese, given here in English translation): "The international Greenpeace alpine team delivers messages of support and hope for the victims of the nuclear disaster at Fukushima Daiichi to the summit of Mt Fuji. Collected from thousands of people in Japan and around the world, these messages, Greenpeace hopes, will help unite the people of Japan who oppose nuclear power and encourage the Japanese authorities to listen to them."

Translated and enriched text: "An international Greenpeace alpine team delivers messages of support and hope for the victims of the nuclear disaster at Fukushima Daiichi to the summit of Mt Fuji. Collected from thousands of people in Japan and all over the world, Greenpeace hopes that these messages will help unite the people of Japan in opposition to nuclear power."
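The workflow above chains independent tools, each one consuming what the previous step produced. The following is a minimal Python sketch of that chaining idea, not WebLab's implementation: the document is a plain dict standing in for WebLab's own data model, every step is a stub (no real crawler, Sphinx or translation call is made), video segmentation and audio cleaning are left out for brevity, and all names (`run_workflow`, `TOY_LEXICON`, `WATCH_LIST`, the example URL) are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical document record: a plain dict that each step reads and enriches.
# Stand-in only; this is not WebLab's actual data model.
Document = Dict[str, object]
Step = Callable[[Document], Document]

# Toy resources used by the stubs below (hypothetical, for illustration only).
TOY_LEXICON = {"富士山": "Mt Fuji", "福島第一": "Fukushima Daiichi"}
GAZETTEER = ["Mt Fuji", "Fukushima Daiichi", "Greenpeace"]
WATCH_LIST = {"Fukushima Daiichi"}


def collect(doc: Document) -> Document:
    # Stub collector: a crawler would fetch the video found at doc["source"].
    doc["video"] = f"video<{doc['source']}>"
    return doc


def extract_audio(doc: Document) -> Document:
    # Stub audio extraction: a tool such as FFmpeg would demux the audio track.
    doc["audio"] = f"audio<{doc['video']}>"
    return doc


def transcribe(doc: Document) -> Document:
    # Stub speech-to-text (Sphinx on the slide): returns a canned transcript.
    doc["text"] = doc["fake_transcript"]
    return doc


def translate(doc: Document) -> Document:
    # Stub machine translation: substitutes known terms from the toy lexicon.
    text = str(doc["text"])
    for source_term, target_term in TOY_LEXICON.items():
        text = text.replace(source_term, target_term)
    doc["translated_text"] = text
    return doc


def extract_information(doc: Document) -> Document:
    # Stub information extraction: gazetteer lookup on the translated text.
    text = str(doc["translated_text"])
    doc["entities"] = [name for name in GAZETTEER if name in text]
    return doc


def alert(doc: Document) -> Document:
    # Raise an alert when any extracted entity is on the watch list.
    entities = doc["entities"]
    doc["alert"] = any(name in WATCH_LIST for name in entities)
    return doc


WORKFLOW: List[Step] = [collect, extract_audio, transcribe, translate,
                        extract_information, alert]


def run_workflow(doc: Document, steps: List[Step]) -> Document:
    # Apply each step in order; each step sees every field added before it.
    trace: List[str] = []
    for step in steps:
        doc = step(doc)
        trace.append(step.__name__)
    doc["trace"] = trace
    return doc


result = run_workflow(
    {"source": "http://example.org/clip", "fake_transcript": "富士山 福島第一"},
    WORKFLOW,
)
print(result["translated_text"])  # Mt Fuji Fukushima Daiichi
print(result["entities"], result["alert"])  # ['Mt Fuji', 'Fukushima Daiichi'] True
```

Because each step only reads fields and adds new ones, a step can be inserted or removed by editing the `WORKFLOW` list alone, as long as the fields it needs are produced by an earlier step.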

Page 6

Integration approach

A platform providing "plug & play" functionalities for the integration of tools for collection, processing, analysis and communication...
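One way to obtain the "plug & play" property described here is to have every tool implement the same contract and exchange the same document type, so that a workflow refers to roles rather than to specific products. The Python sketch below illustrates that idea under stated assumptions: `Document`, `Service`, the toy cleaner and extractors, and `run_chain` are all hypothetical names, and WebLab itself exposes its components as web services (the technology slide lists WSDL and SOAP) rather than as Python classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class Document:
    """Hypothetical shared exchange format: text content plus annotations."""

    def __init__(self, content: str) -> None:
        self.content = content
        self.annotations: List[str] = []


class Service(ABC):
    """Hypothetical common contract: every tool takes and returns a Document."""

    @abstractmethod
    def process(self, doc: Document) -> Document:
        raise NotImplementedError


class PunctuationCleaner(Service):
    # Toy cleaning tool: keeps only letters, digits and whitespace.
    def process(self, doc: Document) -> Document:
        doc.content = "".join(c for c in doc.content if c.isalnum() or c.isspace())
        return doc


class LongWordExtractor(Service):
    # Toy extractor from one "provider": annotates words longer than 6 characters.
    def process(self, doc: Document) -> Document:
        doc.annotations = [w for w in doc.content.split() if len(w) > 6]
        return doc


class KeywordListExtractor(Service):
    # Toy extractor from another "provider": annotates words from a fixed list.
    def __init__(self, keywords: Iterable[str]) -> None:
        self.keywords = set(keywords)

    def process(self, doc: Document) -> Document:
        doc.annotations = [w for w in doc.content.split() if w in self.keywords]
        return doc


def run_chain(registry: Dict[str, Service], chain: List[str], content: str) -> Document:
    # The chain names roles, not implementations, so providers can be swapped.
    doc = Document(content)
    for role in chain:
        doc = registry[role].process(doc)
    return doc


CHAIN = ["cleaner", "extractor"]
TEXT = "Greenpeace delivers messages to Mt Fuji."

registry: Dict[str, Service] = {
    "cleaner": PunctuationCleaner(),
    "extractor": LongWordExtractor(),
}
first = run_chain(registry, CHAIN, TEXT).annotations
print(first)  # ['Greenpeace', 'delivers', 'messages']

# Plug in a different extractor: CHAIN and run_chain are unchanged.
registry["extractor"] = KeywordListExtractor(["Greenpeace", "Fuji"])
second = run_chain(registry, CHAIN, TEXT).annotations
print(second)  # ['Greenpeace', 'Fuji']
```

Swapping the extractor changes the annotations produced while `CHAIN` and `run_chain` stay untouched; that separation between role and provider is what makes a tool "pluggable" in this sketch.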

Page 7

Technology pile

URI, UTF-8

XML, Namespaces

XSD, XPath, XQuery, RDF

RDFS, WSDL

SOAP

OWL, BPEL

SOA, ESB, JBI, Portal/Portlets

JSR 168

SPARQL

Apache Tomcat (Java application server)

Enterprise Service Bus

Content store

Database

Portal

Maps server

Page 8

Standard model & interfaces

Page 9

Standard model & interfaces

Page 10

Application per domain

Page 11

Application per domain

Page 12

Application per domain

Page 13

WebLab: a mature project

24 components, including:
– 8 technical services;
– 9 services;
– 7 portlets.

Core platform, including:
– data model;
– ESB;
– portal;
– orchestrator.

Sample application:
– local data collection;
– simple information extraction;
– text indexing/search.

Page 14

WebLab: a mature project

Technical infrastructure:
– code (SVN);
– bug tracking (JIRA);
– daily build (BAMBOO);
– code quality (SONAR);
– mailing list (8 people).

Available tools:
– Maven plugin;
– Eclipse wizard;
– soapUI test library;
– CLI test tools;
– complete bundle;
– ...

Page 15

Thanks for your attention

Take away [weblab.ow2.org]

HERITRIX - http://crawler.archive.org
FFMPEG - http://ffmpeg.org/
SPHINX - http://cmusphinx.sourceforge.net/sphinx4/
GOOGLE TRANSLATE - http://translate.google.com/
GATE - http://gate.ac.uk/
JENA - http://jena.apache.org/

Logos and names of the tools presented are the property of their respective providers and are shown only to illustrate technologies already integrated in WebLab. Neither CASSIDIAN nor EADS claims authorship of these external tools.