InSciTe TM system based on Bigdata Analysis November 2013 Dr. Sa-kwang Song and Dr. Jangwon Gim Dept. of Computer Intelligence Research Korea Institute of Science and Technology Information


Page 1: InSciTe system based on Bigdata Analysis

InSciTe™ system based on Bigdata Analysis

November 2013

Dr. Sa-kwang Song and Dr. Jangwon Gim

Dept. of Computer Intelligence Research
Korea Institute of Science and Technology Information

Page 2: InSciTe system based on Bigdata Analysis

Introduction – KISTI

[Organization chart. President: Youngseo Park. Units shown: National Super Computing Institute; Advanced Information Institute; Information Analysis Institute; Information Service Center; SW Research Center; NTIS Center; Dept. of Scientific Bigdata Research; Dept. of Computer Intelligence Research. Other names shown: Sunhwa Han, Wonkyung Sung, Hanmin Jung, SangHwan Lee.]

Hildesheim University

Page 3: InSciTe system based on Bigdata Analysis

Introduction – Dept. of Computer Intelligence Research


Page 4: InSciTe system based on Bigdata Analysis

Outline

• Introduction
• InSciTe Advanced System (2012)
• InSciTe Advisory System (2013)
• Demonstration

Page 5: InSciTe system based on Bigdata Analysis


Page 6: InSciTe system based on Bigdata Analysis

Introduction – Dept. of Computer Intelligence Research

[Diagram labels: Computer Intelligence; Web news/SNS; Paper/Patents Intelligence; Technology Intelligence]

Page 7: InSciTe system based on Bigdata Analysis

Introduction - Technology Intelligence(TI)

– An activity that enables companies to identify technological opportunities and developments that could affect the future growth and survival of their businesses.

– Aims to capture and disseminate the technological information needed for strategic planning and decision making.

– Effective TI capabilities are becoming increasingly important as technology life cycles shorten and business becomes more globalized.

– So, we have been developing a TI system, InSciTe, since 2010.

– InSciTe: Intelligence in Science and Technology

Page 8: InSciTe system based on Bigdata Analysis

TI System - InSciTe™

• We have been developing the InSciTe system since 2010.
– InSciTe (2010)
  • The first TI system using semantic text mining techniques
– InSciTe Advanced (2011)
  • Focused on predictive analysis based on the TLCD (Technology Life-Cycle Decision) model
– InSciTe Adaptive (2012)
  • Focused on adaptivity as well as insight on each service
  • Mobile application (Android platform)
– InSciTe Advisory (2013)
  • Focused on prescriptive analysis
  • Mobile and Web-based application

2013-11-06 Hildesheim University

http://inscite.kisti.re.kr

Page 9: InSciTe system based on Bigdata Analysis

Similar Projects

• CUBIST (Combining and Uniting Business Intelligence with Semantic Technologies) – EU
– Led by Sheffield Hallam Univ., Ontotext, and SAP
– 1st CUBIST workshop in July 2011
– For better Semantic Web search
– Aims to develop new ways not only to interrogate the massive volume of data on the Internet, but also to analyze the different formats it exists in, such as blogs, wikis, and videos.

Page 10: InSciTe system based on Bigdata Analysis

Similar Projects

• FUSE (Foresight and Understanding from Scientific Exposition) – USA
– Broad Agency Announcement (BAA) released by IARPA on Sep. 14, 2010
– Kick-off meeting in summer 2011
– Seeks to develop automated methods that aid in the systematic, continuous, and comprehensive assessment of technical emergence using information found in the published scientific, technical, and patent literature

Page 11: InSciTe system based on Bigdata Analysis


Page 12: InSciTe system based on Bigdata Analysis

InSciTe Adaptive

• InSciTe Adaptive
– Supports R&D strategy planning based on analysis of technologies and organizations from textual documents.
– Is adaptive to users and gives insight whenever users act.
– Consists of 8 services, including an automatic report generating service.
– Can be downloaded from Google Play.

Page 13: InSciTe system based on Bigdata Analysis

InSciTe System

[Diagram labels: Text Analysis System; Semantic Analysis System; Technology Intelligence Services; Hadoop Ecosystem]

Page 14: InSciTe system based on Bigdata Analysis

Internal Structure

• Text Analytics + Semantic Analytics


Page 15: InSciTe system based on Bigdata Analysis

Text Analysis System


Page 16: InSciTe system based on Bigdata Analysis

Text Analysis System

[Example: a source passage and the entities and semantic relations extracted from it]

Source passage:

The iPhone is a line of smartphones designed and marketed by Apple Inc. It runs Apple's iOS mobile operating system, known as the "iPhone OS" until mid-2010, shortly after the release of the iPad. The first generation iPhone was released on June 29, 2007; the most recent iPhone, the sixth-generation iPhone 5, on September 21, 2012. The user interface is built around the device's multi-touch screen, including a virtual keyboard. The iPhone has Wi-Fi and cellular connectivity (2G, 3G, 4G, and LTE).

Entity Extraction & Semantic Relation

• Extracted entities: smartphone, iPhone, iPad, iPhone 5, Virtual keyboard, Apple Inc, iOS, iPhone OS, Mobile operating system, User interface, Multi-touch screen, Wi-Fi, Cellular connectivity, 2G, 3G, 4G, LTE
• Entity types shown: organization, technology
• Relation labels shown: isADomain, elementary, develop
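The extraction result on this slide can be read as a set of typed entities plus (subject, relation, object) triples. The following is a minimal Python sketch of that idea, not the InSciTe implementation. The relation names `develop`, `isADomain`, and `elementary` and the entity types come from the slide; which entities each relation connects is not recoverable from the transcript, so the specific triples and the helper names `index_by_subject` and `objects_of` are assumptions for illustration.

```python
from collections import defaultdict

# Entity types as labeled on the slide ("organization", "technology").
entity_types = {
    "Apple Inc": "organization",
    "iPhone": "technology",
    "iOS": "technology",
    "smartphone": "technology",
    "multi-touch screen": "technology",
}

# (subject, relation, object) triples. The pairings are illustrative
# assumptions; only the relation names appear in the source.
triples = [
    ("Apple Inc", "develop", "iPhone"),
    ("Apple Inc", "develop", "iOS"),
    ("iPhone", "isADomain", "smartphone"),
    ("multi-touch screen", "elementary", "iPhone"),
]


def index_by_subject(triples):
    """Group (relation, object) pairs under each subject for fast lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def objects_of(index, subject, relation):
    """Return all objects linked to `subject` by `relation`."""
    return [obj for rel, obj in index[subject] if rel == relation]


index = index_by_subject(triples)
developed = objects_of(index, "Apple Inc", "develop")
```

Storing the extraction output as triples is what makes it loadable into an ontology store, which is the step the following slides describe.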

Page 17: InSciTe system based on Bigdata Analysis

Ontology Scheme

[Ontology diagram]

• Classes: Location, Person, Organization, Product, Technology
• Relations shown: hasCustomer, sue, found, invest, takeover, competeOrg, isSimilarOrg, supplement, collaborate, competeByLaw, ownProduct, useProduct, sell, announce, produce, own, useTech, develop, launch, patentTech, isADomain, succeedingTech, substitutedForTech, elementary, competeTech, converge, similarTech, isATech, consistTech, consistProduct, part of, succeedingProduct, substitutedForProduct, competeProduct, similarProduct

Page 18: InSciTe system based on Bigdata Analysis

Statistics of Source Documents

• Text Data to Ontology

Source documents:
– Web documents: 5,268,696 (as of Nov. 13, 2012)
– US/EU/PCT patents: 7,615,819 (2001 ~ 2011)
– Scientific papers: 9,765,199 (2001 ~ 2011)

Resulting ontology: 498,361,449 triples
– technical terms: 43,201,941
– products: 62,327,156
– persons: 25,110,360
– organizations: 40,993,708
– locations: 50,350,884
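For a sense of scale, the figures on this slide can be combined with simple arithmetic. The totals below are derived here and are not stated on the slide; the variable names are chosen for illustration.

```python
# Figures copied from the slide.
source_documents = {
    "web_documents": 5_268_696,
    "patents": 7_615_819,
    "scientific_papers": 9_765_199,
}
total_triples = 498_361_449

# Derived values (not stated on the slide).
total_documents = sum(source_documents.values())
triples_per_document = total_triples / total_documents
```

Under these numbers the corpus holds about 22.6 million documents, which works out to roughly 22 triples per document on average.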

Page 19: InSciTe system based on Bigdata Analysis

Semantic Analysis System

[Diagram: insight analysis services built on background data. Groupings below follow the order of the labels in the figure.]

• Insight analysis services (labeled A to H in the figure): Technology Navigation, Technology Trend, Core Element Technology, Agent Level, Convergence Technology, Agent Partner, Report, Integrated Roadmap
• Annual Trend Analysis: Scholar Trend, Commercial Trend, Social Trend, Overall Trend
• Maturity Analysis: Maturity Prediction, Development Speed
• Element Analysis: Major Element, Development Possibility
• Convergence Analysis: Convergence Technology, Development Possibility
• Partner Analysis: Collaboration, Competition
• Agent Level Analysis: Organization Level, Nation Level, Benchmarking Agent
• Background Data: Web Database, Paper Database, Patent Database, Ontology

Page 20: InSciTe system based on Bigdata Analysis

InSciTe Advanced services

[Screenshot groups as labeled: Overall Analysis; Technology Centric Analysis; Agent Trend and Report; Agent Centric Analysis]

Page 21: InSciTe system based on Bigdata Analysis

Developing a project using InSciTe

• Exploring and finding interesting technologies
• Deciding on the most interesting technology
• Analyzing competitors
• Building a strategy and plan

Page 22: InSciTe system based on Bigdata Analysis

InSciTe service – Technology Navigation

• Objective
– General, overall analysis of a technology the user is interested in
• Insight
– Provides users with trends in commercial, social, and scholarly interest.
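One way to picture the commercial, social, and scholarly trends is as yearly mention counts per source type. The sketch below is an assumption-laden illustration, not the service's actual method: the mapping of patents to commercial interest, Web documents to social interest, and papers to scholarly interest is a guess based on the three source databases named elsewhere in the deck, and the sample records are made up.

```python
from collections import Counter

# Hypothetical records: (year, source) for documents mentioning one technology.
mentions = [
    (2009, "paper"), (2010, "paper"), (2010, "patent"),
    (2011, "paper"), (2011, "patent"), (2011, "web"), (2011, "web"),
]

# Assumed mapping from source type to the trend it feeds.
TREND_OF_SOURCE = {"paper": "scholar", "patent": "commercial", "web": "social"}


def annual_trends(mentions):
    """Count mentions per (trend, year)."""
    counts = Counter()
    for year, source in mentions:
        counts[(TREND_OF_SOURCE[source], year)] += 1
    return counts


def overall_trend(counts):
    """Sum all trend types per year."""
    overall = Counter()
    for (_trend, year), n in counts.items():
        overall[year] += n
    return overall


counts = annual_trends(mentions)
overall = overall_trend(counts)
```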

Page 23: InSciTe system based on Bigdata Analysis

InSciTe service – Technology Trends

• Objective
– Analysis of the current maturity level of technologies
• Insight
– Guides the user to technologies appropriate for investment and participation

Page 24: InSciTe system based on Bigdata Analysis

InSciTe service – Agent Level

• Objective
– Analysis of major agents from academic and industrial points of view
• Insight
– Discovers benchmarking organizations for academic and industrial aspects

Page 25: InSciTe system based on Bigdata Analysis

InSciTe service – Report

• Automatically generated report

Page 26: InSciTe system based on Bigdata Analysis

Value Pyramid

[Pyramid diagram. Labels shown: Search, Clustering, Extracting, Support, Decision Support, Forecasting, Scenario Planning, Advising. InSciTe Advanced (2012) is marked on the pyramid.]

Modified from D. Bousfield & P. Fooladi, “STM Information: 2009 Final Market Size and Share Report”, 2010.

Page 27: InSciTe system based on Bigdata Analysis


Page 28: InSciTe system based on Bigdata Analysis

InSciTe Advisory

• InSciTe Advisory
– Advises a selected researcher on how to improve his/her research competitiveness.
– Recommends role model researchers who are ahead of the researcher and whom the researcher could follow.
– Recommends future research plans for reaching the research level of the role model researchers, based on 5W1H questions.
– Consists of two main services: descriptive analytics and prescriptive analytics.

Page 29: InSciTe system based on Bigdata Analysis

Value Pyramid

[Pyramid diagram. Labels shown: Search, Clustering, Extracting, Support, Decision Support, Forecasting, Scenario Planning, Advising. InSciTe Advisory (2013) is marked on the pyramid.]

Modified from D. Bousfield & P. Fooladi, “STM Information: 2009 Final Market Size and Share Report”, 2010.

Page 30: InSciTe system based on Bigdata Analysis

Value Pyramid

[Diagram: Data Analysis Steps for Business Intelligence, with InSciTe Advanced and InSciTe Advisory marked on it.]

Page 31: InSciTe system based on Bigdata Analysis

Technical Trends in 2013

• 2013 Hype Cycle for Emerging Technologies


Page 32: InSciTe system based on Bigdata Analysis

3 Phases of Business Analytics

• Descriptive Analytics: A set of technologies and processes that use data to understand and analyze business performance.

• Predictive Analytics: The extensive use of data and mathematical techniques to uncover explanatory and predictive models of business performance representing the inherent relationship between data inputs and outputs/outcomes.

• Prescriptive Analytics: A set of mathematical techniques that computationally determine a set of high-value alternative actions or decisions given a complex set of objectives, requirements, and constraints, with the goal of improving business performance.

Source: a cross-brand team from IBM (Irv Lustig, Brenda Dietrich, Christer Johnson, and Christopher Dziekan)
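The difference between the three phases can be shown on a toy data set. This is a minimal sketch under made-up assumptions: the (spend, sales) history, the linear model, the profit objective, and the budget constraint are all hypothetical and only illustrate how each phase builds on the previous one.

```python
from statistics import mean

# Hypothetical history: (ad spend, sales) pairs.
history = [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0), (4.0, 18.0)]

# Descriptive: summarize what happened.
average_sales = mean(sales for _, sales in history)


# Predictive: fit sales = slope * spend + intercept by ordinary least squares.
def fit_line(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(history)


def predict(spend):
    """Predicted sales for a given spend under the fitted line."""
    return slope * spend + intercept


# Prescriptive: among candidate actions that satisfy the budget constraint,
# choose the one with the highest predicted profit (sales minus spend).
def prescribe(candidate_spends, budget):
    feasible = [s for s in candidate_spends if s <= budget]
    return max(feasible, key=lambda s: predict(s) - s)


best_spend = prescribe([1.0, 3.0, 5.0, 7.0], budget=6.0)
```

The descriptive step only reports the past, the predictive step estimates outcomes for unseen inputs, and the prescriptive step searches over actions using those estimates and the constraints.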

Page 33: InSciTe system based on Bigdata Analysis

Temporal Perspective

• Descriptive Analytics

• Predictive Analytics

• Prescriptive Analytics


Gartner

Page 34: InSciTe system based on Bigdata Analysis

Related Work

• IBM: Tax Revenue Collections in NY State (2010)
– Problem: the situation of taxpayers of interest changes dynamically (they move or disappear, or file for bankruptcy).
– Prescriptive analytics: creation of a set of optimal strategies that respond to the constantly changing information and uncertainty about delinquent taxpayers.

• AYATA: Oil & Gas Companies (2013)
– Problem: billion-dollar decisions about when and where to drill, how many sites to develop, and how to produce from multiple wells.
  • Every decision has a big impact, given the high cost of drilling and completing wells.
– Prescriptive analytics: showing the impact of each decision so operations managers can ensure future production output.

Page 35: InSciTe system based on Bigdata Analysis

Screenshots: Researcher-centric Analysis

Search Researcher


Candidates

Selected Researcher

Page 36: InSciTe system based on Bigdata Analysis

Screenshots: Descriptive Analytics

[Screenshot callouts]
• Technology selection
• Activities from scholar, career, and industrial perspectives
• Researcher Power Index of 9 measures

Page 37: InSciTe system based on Bigdata Analysis

Screenshots: Prescriptive Analytics

[Screenshot callouts]
• Role model group and its characteristics w.r.t. a selected researcher
• Prescriptive advice from the 5W1H point of view

Page 38: InSciTe system based on Bigdata Analysis

Screenshots: Role Model Group

• A group of researchers whom the selected researcher should follow.

• The technological characteristics of the Role Model Group, its Researcher Power Index, and its representative researchers are described.

• The Role Model Group selection bar can be moved so that users can change their Role Model Group.

[Screenshot callouts: Role Model Group selection bar; Role Model Group Summary]
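The selection bar can be pictured as a sliding window over researchers ranked above the selected one. The sketch below is purely illustrative: the slides do not say how the Researcher Power Index is computed or how the bar maps to a group, so the single-number index, the `bar_position` parameter, the `group_size` default, and the sample scores are all assumptions.

```python
# Hypothetical Researcher Power Index values (higher is stronger).
power_index = {
    "selected": 40.0,
    "r1": 45.0, "r2": 52.0, "r3": 60.0, "r4": 71.0, "r5": 38.0,
}


def role_model_group(power_index, selected, bar_position, group_size=2):
    """Return `group_size` researchers ranked above `selected`.

    `bar_position` stands in for the selection bar: 0 picks the
    researchers just above the selected one, larger values pick
    stronger (more distant) role models.
    """
    base = power_index[selected]
    stronger = sorted(
        (name for name, score in power_index.items() if score > base),
        key=power_index.get,
    )
    return stronger[bar_position:bar_position + group_size]


near_group = role_model_group(power_index, "selected", bar_position=0)
far_group = role_model_group(power_index, "selected", bar_position=2)
```

Moving the bar changes how ambitious the target group is, which in turn would change the prescriptions derived from it.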

Page 39: InSciTe system based on Bigdata Analysis

Prescription (5W1H)

• What to do
– Describes the goal the user has to achieve to reach the level of the Role Model Group researchers.

• With whom
– Recommends researchers or organizations to work with.

• How & When
– Explains in detail how to do something and when to do it in order to achieve the goal.

• Where
– Describes which organization is appropriate for the user in order to achieve the goal.

• Why
– Included in other sections.
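The 5W1H structure above maps naturally onto a small record type. The following Python sketch is one possible shape for such a record, not the system's actual data model; the class name `Prescription`, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Prescription:
    """One 5W1H prescription for a selected researcher (illustrative)."""

    what: str  # goal for reaching the role model level
    with_whom: list = field(default_factory=list)  # recommended partners
    how_and_when: list = field(default_factory=list)  # (when, action) steps
    where: str = ""  # suitable organization
    # "Why" has no field of its own: the slide says it is covered
    # by the other sections.

    def summary(self):
        """Render the prescription as plain text, one line per item."""
        lines = [f"What: {self.what}"]
        if self.with_whom:
            lines.append("With whom: " + ", ".join(self.with_whom))
        for when, action in self.how_and_when:
            lines.append(f"How & When: {action} ({when})")
        if self.where:
            lines.append(f"Where: {self.where}")
        return "\n".join(lines)


plan = Prescription(
    what="Publish in a hypothetical target technology area",
    with_whom=["Researcher A", "Organization B"],
    how_and_when=[("year 1", "co-author a survey paper")],
    where="Organization B",
)
report = plan.summary()
```

A text rendering like `summary()` is also the kind of output an automatically generated report could be assembled from.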

Page 40: InSciTe system based on Bigdata Analysis

Automatically Generated Report


Page 41: InSciTe system based on Bigdata Analysis


Page 42: InSciTe system based on Bigdata Analysis

Hadoop-based Big Data Platform

[Diagram labels: Assign; Parallel Analysis; Result]

= HDFS + MR Framework

Page 43: InSciTe system based on Bigdata Analysis

Information Extraction

[Diagram: information extraction as chained MapReduce jobs. Labels are listed in the order they appear in the figure.]

• Flow: Input → Map Phase → Intermediate Results → Reduce Phase → Output
• MapReduce map(docId, documents): Split Sentence → intermediate file
• MapReduce reduce(docId, value): Parsing/Analyzing Phrase → intermediate file
• MapReduce reduce(docId, value): Extracting candidates; Extracting Technology terms
• Other labels: Document1; Cassandra; Web search
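The map and reduce signatures on this slide can be illustrated with a small in-memory simulation. This is a sketch in plain Python rather than Hadoop code, and the extraction rule (keep tokens that contain an uppercase letter and are not in a tiny stopword list) is a deliberately crude stand-in for the parsing and phrase analysis named on the slide. All function names and the sample document are hypothetical.

```python
import re
from collections import defaultdict

STOPWORDS = {"The", "It"}  # tiny illustrative list


def map_split_sentences(doc_id, document):
    """Map step: emit one (doc_id, sentence) pair per sentence."""
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        if sentence:
            yield doc_id, sentence


def shuffle(pairs):
    """Group mapped values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_extract_candidates(doc_id, sentences):
    """Reduce step: collect candidate terms for one document.

    A token is kept if it contains an uppercase letter and is not a
    stopword. This rule is an assumption made for the example.
    """
    candidates = []
    for sentence in sentences:
        for token in re.findall(r"[\w-]+", sentence):
            if token not in STOPWORDS and any(c.isupper() for c in token):
                candidates.append(token)
    return doc_id, candidates


documents = {"doc1": "The iPhone runs Apple iOS. It has Wi-Fi."}
mapped = [
    pair
    for doc_id, text in documents.items()
    for pair in map_split_sentences(doc_id, text)
]
results = dict(
    reduce_extract_candidates(doc_id, sentences)
    for doc_id, sentences in shuffle(mapped).items()
)
```

Because every pair is keyed by `docId`, documents can be processed independently, which is what lets the real pipeline spread the work across a cluster.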

Page 44: InSciTe system based on Bigdata Analysis

Internal Structure

- Text Analysis + Semantic Analysis


Page 45: InSciTe system based on Bigdata Analysis

Text Analysis System

[Architecture diagram. Component labels as shown:]

• Engines: Distributed Ontology Repository Engine; Parallel Information Extraction; Entity Recognition Engine (Hybrid Methods); Relation Extraction Engine (Self-Learning); Lexicostatistic Analysis Engine; Text Slot-Filling Engine; Machine Learning Engines; Entity Sense Disambiguation
• APIs: Real-time Ontology Access APIs; Statistical Text Analysis APIs; Question & Answering APIs; IE APIs
• Storage: Distributed Ontology DB (several Ontology DB instances); Source Document DB (IR/DBMS); Resource DB (Collection/Dictionaries)
• Infrastructure: Distributed Computing Environment (Hadoop/HDFS); Resource Aggregation & Construction; Distributed Web Crawler/Wrapper
• Other labels: Real-time Updates; Resource Provision; u-LAMP System

Page 46: InSciTe system based on Bigdata Analysis

Semantic Analysis System


Page 47: InSciTe system based on Bigdata Analysis

Data Processing with Hadoop Ecosystem

[Diagram labels: Service Source Data; HDFS File; HBase Table; RDBMS Table; MR Processing; ETL Processing; InSciTe Service Modules]

Page 48: InSciTe system based on Bigdata Analysis

Layered Architecture of InSciTe System

[Architecture diagram. Labels as shown, grouped by layer where the grouping is clear:]

• Application Layer: Application 1, Service 1, Service 2, …
• Application Authoring Layer: Authoring Workbench; Application Model; Model Verifying Engine; Model Executing Engine; Module Recommending Engine; Debugging Engine
• Visualization Layer: Visualization Modules (Vz 1, Vz 2, … Vz N)
• Analytics Layer: Relation Modules (Rel 1, Rel 2, … Rel N); Stat Modules (Stat 1, Stat 2, … Stat N); SWOT Modules (SWOT 1, SWOT 2, … SWOT N); Match Making Processing Modules (MM 1, MM 2, … MM N); Trend Analysis Modules (TA 1, TA 2, … TA N)
• Internal Storage Manager: Inverted Index Manager; Feature Vector Manager; Co-occurrence Info. Manager; Matrix Manager; Triple Manager; Trie Manager; …
• Processing managers: Batch Processing Manager; (Near) Real-time Processing Manager
• Hadoop-ecosystem based Bigdata Platform Layer: NoSQL; HDFS; MapReduce; RDB; Graph DB (Giraph); Chukwa, Flume, Scribe; Sqoop, hiho; Pig, Hive; HCatalog, Avro; Zookeeper; structured data; unstructured data
• Other tools named: R, Mahout; Prefuse, FLARE; JGraph, Ubigraph; Esper, Storm, HStreaming; Memcached, MemBase

Page 49: InSciTe system based on Bigdata Analysis

Layered Architecture

• Big Data platform layer: various Hadoop-related tools are included in addition to an RDBMS in order to support the upper layers.

• Analytics layer: grouped and modularized analytics modules, including statistical, relation, SWOT, trend, and match-making analytics.

• Visualization layer: analytics modules specialized in Big Data visualization.

• Application authoring layer: an authoring workbench for building a user's application composed of one or more analytics modules from the analytics layer.

• Application layer: applications or services based on the application models built in the application authoring layer.
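The relationship between the analytics, authoring, and application layers can be sketched as a registry of modules plus an application model that names which modules to run. This is a minimal illustration of the layering idea only; the module functions, the `ANALYTICS_MODULES` registry, and the `verify_model` and `execute_model` helpers are hypothetical stand-ins, not InSciTe APIs.

```python
# Analytics layer: small, named modules (all hypothetical).
def stat_module(data):
    """Add a record count to the data."""
    return {**data, "count": len(data["records"])}


def trend_module(data):
    """Add the first and last year seen in the records."""
    years = sorted(r["year"] for r in data["records"])
    return {**data, "first_year": years[0], "last_year": years[-1]}


ANALYTICS_MODULES = {"stat": stat_module, "trend": trend_module}


def verify_model(model):
    """Stand-in for a model verifying engine: reject unknown modules."""
    unknown = [name for name in model if name not in ANALYTICS_MODULES]
    if unknown:
        raise ValueError(f"unknown modules: {unknown}")


def execute_model(model, data):
    """Stand-in for a model executing engine: run modules in order."""
    verify_model(model)
    for name in model:
        data = ANALYTICS_MODULES[name](data)
    return data


# Application authoring layer: an application model is a list of module names.
application_model = ["stat", "trend"]

# Application layer: a service is the model applied to input data.
result = execute_model(
    application_model,
    {"records": [{"year": 2011}, {"year": 2009}, {"year": 2013}]},
)
```

Keeping modules behind a registry means a new application is just a new list of names, which matches the slide's description of applications composed from one or more analytics modules.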

Page 50: InSciTe system based on Bigdata Analysis


Page 51: InSciTe system based on Bigdata Analysis

Q&A


Sa-kwang [email protected]

http://inscite.kisti.re.kr

Dept. of Computer Intelligence Research

Korea Institute of Science and Technology Information