big data - where from where to
Copyright © 2013 Hanmin Jung
Hanmin Jung
Head of the Dept. of Computer Intelligence Research
KISTI
Big Data: Where from? Where to?
Let Me Introduce Myself :-)
- Very Recent Activities on Big Data
  - (National Science and Technology Commission) Member of Big Data Technical Impact Assessment Committee
  - (Korea Communications Commission) Sub-committee Chair of Big Data Forum
  - (Ministry of Knowledge Economy) Technical Secretary of Big Data Program Planning Committee
  - (Ministry of Education, Science and Technology) Member of Big Data Information Strategic Program Expert Committee
  - (National IT Industry Promotion Agency) Lecturer of Big Data Expertise Reinforcement Program
Questions
Where does Big Data come from?
Who gathers and consumes the data?
What is the data used for?
Smart Work
http://files.thinkpool.com/files/bbs/2010/07/21/%EC%8A%A4%EB%A7%88%ED%8A%B8%EC%9B%8C%ED%81%AC1.jpg
Cloud Computing
- Service Platform Accelerated by Mobile Devices
http://simpleroot.com/wp-content/uploads/2012/10/Remote-Cloud-Computing.jpg
Cloud Computing – Tatemae (Public Stance) & Honne (True Intent)
- Introducing iCloud
Cloud Computing
- Google Data Center
http://www.youtube.com/watch?v=avP5d16wEp0
Data Sources
Web -> Social -> Thing
“The next Google or Facebook may well be an Internet of Things company.” by R. MacManus (ReadWriteWeb)
Social Data
http://bynoy.files.wordpress.com/2011/08/united-noy-weblife-60-seconds.jpg
Machine Data
T. Baer, “What is Big Data? The Reality for Analytics”, OVUM, 2011.
Call data records
Sensory data
Web log files
Financial Instrument Trade
Internet of Things
K. Escherich, “Internet of Things”, 2011.
Big Data in the World
http://www.ektron.com/billcavablog/Big-Data-Big-Content-Big-Challenges/
Infographics for Big Data
http://thumbnails.visually.netdna-cdn.com/big-data_50291c3b16257.jpg
Google.com Traffic
http://siteanalytics.compete.com/naver.com/
Naver.com Traffic
http://siteanalytics.compete.com/naver.com/
Foreseeable Future
- Google Project Glass
Hype Cycle
Hype Cycle – 2010
Emerging Technologies Hype Cycle 2010
Hype Cycle – 2011
Emerging Technologies Hype Cycle 2011
Hype Cycle – 2012
Emerging Technologies Hype Cycle 2012
Google Insights
http://www.google.com/insights/search/
Bottleneck in Data Ecosystem
http://quizzicaleyebrow.files.wordpress.com/2011/03/pict0044.jpg
Big Data Ecosystem
http://imexresearch.com/Newsletter_HTML/bd2.png
Big Data Ecosystem
- New Approaches Required for
  - Persistence
  - Indexing
  - Caching and query optimization
  - Processing
  - Structure
  - Query language
  - Compression
T. Baer, “What is Big Data? The Reality for Analytics”, OVUM, 2011.
Insights for Search
http://www.google.com/insights/search/
Mobile Phone
- Worldwide Market Share
  - Worldwide mobile device sales to end users in 2008 ~ 2012
Gartner, IDC Worldwide Mobile Phone Tracker
Company        4Q2012        3Q2011        3Q2010        3Q2009        3Q2008
(each cell: %, M. Units)
Nokia          17.9, 86.3    27.1, 106.6   31.6, 110.4   37.8, 108.5   38.6, 117.9
Samsung        23.0, 111.2   22.3, 87.8    20.5, 71.4    21.0, 60.2    17.0, 52.0
Apple          9.9, 47.8     4.3, 17.1     4.0, 14.1     -             -
LG             -             5.4, 21.1     8.1, 28.4     11.0, 31.6    7.5, 23.0
ZTE            3.6, 17.6     4.9, 19.1     3.5, 12.1     -             -
Huawei         3.3, 15.8     -             -             -             -
Sony Ericsson  -             -             -             4.9, 14.1     8.4, 25.7
Motorola       -             -             -             4.7, 13.6     8.3, 25.4
Others         42.3, 203.8   36.1, 142     32.2, 112.5   20.6, 59.1    20.1, 61.5
Total          482.5         393.7         348.9         287.1         305.4
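Each percentage in the table is the vendor's unit sales divided by the quarter's total. The short Python sketch below recomputes the 4Q2012 shares from the unit figures above and compares them with the listed percentages. The names (`UNITS_4Q2012`, `market_share`, `mismatched_shares`) and the 0.1-point tolerance are illustrative assumptions, not part of the original slides.

```python
# Illustrative check, not part of the original slides: the names below are
# hypothetical, and the numbers are the 4Q2012 figures from the table above.

# Units sold to end users in 4Q2012, in millions.
UNITS_4Q2012 = {
    "Nokia": 86.3,
    "Samsung": 111.2,
    "Apple": 47.8,
    "ZTE": 17.6,
    "Huawei": 15.8,
    "Others": 203.8,
}

# Market shares listed for 4Q2012, in percent.
LISTED_SHARE_4Q2012 = {
    "Nokia": 17.9,
    "Samsung": 23.0,
    "Apple": 9.9,
    "ZTE": 3.6,
    "Huawei": 3.3,
    "Others": 42.3,
}

TOTAL_4Q2012 = 482.5  # total units, in millions


def market_share(units: float, total: float) -> float:
    """Return the share of `total` represented by `units`, in percent."""
    return 100.0 * units / total


def mismatched_shares(units, listed, total, tolerance=0.1):
    """Return the companies whose listed share differs from units / total
    by more than `tolerance` percentage points (tolerance is an assumption)."""
    return [
        company
        for company, sold in units.items()
        if abs(market_share(sold, total) - listed[company]) > tolerance
    ]


if __name__ == "__main__":
    for company, sold in UNITS_4Q2012.items():
        print(f"{company:8s} {market_share(sold, TOTAL_4Q2012):5.1f}%")
    print("Mismatches:",
          mismatched_shares(UNITS_4Q2012, LISTED_SHARE_4Q2012, TOTAL_4Q2012))
```

The same check can be repeated for the other quarters by swapping in their unit and total figures.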
CDC Influenza Summary
http://www.cdc.gov/flu/weekly/usmap.htm
Google Flu Trends
J. Ginsberg, “Detecting influenza epidemics using search engine query data”
Voice Search Evaluation
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/40491.pdf
Causes of Death
http://image.guardian.co.uk/sys-files/Guardian/documents/2011/10/28/Factfile_deaths_2_2011.pdf
IBM Watson
http://powet.tv/powetblog/wp-content/uploads/2011/02/watson_the_computer_beats_ken_jennings_and_brad_rutter_at_jeopardy_full.jpg
Value Pyramid
Search
Clustering
Extracting
Decision Support
Forecasting
Scenario Planning
Advising
Modified from D. Bousfield & P. Fooladi, “STM Information: 2009 Final Market Size and Share Report”, 2010.
InSciTe Advanced (2011)
InSciTe Adaptive (2012)
OntoFrame (2005~2009)
InSciTe Advanced (2010)
Big Data & Decision Making
http://lithosphere.lithium.com/t5/Lithium-s-View/Big-Data-Analytics-Reducing-Zettabytes-of-Data-Down-to-a-Few/ba-p/36378
- Reducing Zettabytes of Data Down to a Few Bits
Data help us make better decisions.
The primary function of analytics is to support decision making.
The challenge of big data analytics is to reduce a lot of data down to a few bits.
Strategic Foresight
R. Rohrbeck, H. Arnold, and J. Heuer, “Strategic Foresight in Multimedia Enterprises”, 2007.
Quantitative Analytics
TI Projects
- FUSE
  - Funded by IARPA (early 2011 ~ early 2016)
    - Kick-off meeting in summer 2011
  - Foresight and Understanding from Scientific Exposition Program
    - Seeks to develop automated methods that aid in the systematic, continuous, and comprehensive assessment of technical emergence using information found in the published scientific, technical, and patent literature
  - Partners
    - BAE Systems, Brandeis Univ., New York Univ., 1790 Analytics, …
TI Projects
- FUSE
TI Projects
- CUBIST
  - Funded by the European Commission (late 2010 ~ late 2013)
    - 1st CUBIST workshop in July 2011
  - Combining and Uniting Business Intelligence with Semantic Technologies Program
    - Aims to develop new ways not only to interrogate the massive volume of data on the Internet, but also to analyze the different formats it exists in, such as blogs, wikis, and video
  - Partners
    - SAP, Ontotext, Sheffield Hallam Univ., …
TI Projects
- CUBIST
TI Projects
- Common Technologies
  - Semantic technologies
    - Ontology, reasoning, URI scheme
  - Analytics model
    - BYOM (e.g. technology opportunity discovery model, technology evolution model, formal concept analysis model)
  - Information extraction (InSciTe, FUSE)
    - Named entities and events/relations in textual documents
Our Vision & Architecture
InSciTe Advanced (2011)
InSciTe Adaptive (2012)
Data Fact Sheet
- InSciTe Adaptive (2012)
  - Articles: 22.6 million (9.8 million papers, 7.6 million patents, 5.3 million Web data items)
  - All technical areas (2001~2011)
  - Named entities: 1.9 million
  - Authority dictionary: 1.5 million entries
  - LOD data: 290 GB (currently being connected)
Supporting Decision Making
http://4.bp.blogspot.com/-Pf1hkccZZh4/TWDJahBpL2I/AAAAAAAAASU/JHLpXi8d9AQ/s640/meetings.jpg
Data Scientist
http://philanthropy.com/blogs/innovation/matching-data-scientists-and-nonprofits/778
Evidence-based Decision Making
- Advantages
  - Ensures that policies respond to the real needs of the community
  - Highlights the urgency of an issue or problem that requires immediate attention
  - Enables information sharing amongst other members of the public sector
  - Reduces government expenditure that might otherwise be directed into ineffective policies or programs
  - Produces an acceptable return on the financial investment allocated to public programs
  - Ensures that decisions are made in a way that is consistent with our democratic and political processes, which are characterized by transparency and accountability
http://www.abs.gov.au/ausstats/[email protected]/lookup/1500.0chapter32010
InSciTe Project
http://semantics.kisti.re.kr
Thank you
“A lot of times, people don’t know what they want until you show it to them.”
by Steve Jobs
“Many people won’t be convinced until they’ve seen it for themselves.”
by Jakob Nielsen