Data Analytics at NICTA - University of Tasmania…
TRANSCRIPT
NICTA Copyright 2013 From imagination to impact
Data Analytics at NICTA
Stephen Hardy National ICT Australia (NICTA)
Outline
• Big data = science!
• Data analytics at NICTA
  – Discrete
  – Finite
  – Infinite
• Machine Learning for the natural sciences
Data, Data, Everywhere…
Evolution vs. Revolution
[Diagram labels: Machine Learning; Statistics; Computer Science; Scientific Challenges; Societal Challenges; Personal; Enterprise; Government; arrows labelled “techniques” and “problems”]
Analysis of data to prove or disprove hypotheses = science!!
Not just the data…
Data Scale: Volume, Velocity, Variety
Algorithmic complexity: Graphical models, Deep learning, Non-parametric statistics, Random forests, Graph learning
Infrastructure: File systems, Distributed computation, SQL / NoSQL, Analytics Engines, Machine learning toolkits
[Diagram labels: Big Data; Big Analytics; Analytics; Data]
What is NICTA?
• Australia’s National Centre of Excellence in Information and Communication Technology
  – 700 Staff, 5 labs, $100m/y revenue
• NICTA objectives
  – Research Excellence in ICT
  – Wealth Creation for Australia
• Transforming Industry
  – $3bn/y direct impact on GDP from projects
• New Industries
  – Eleven spin-outs, working with ICT SMEs
• Skills and Capacity
  – 17 University partners, 280 PhD Students
Data Analytics: A summary
Discrete, ℵ, P(n_i): Events, People
Finite, ℜ^n, P(x_i): Signals, Location
Infinite, ℑ, P(f_i): Spatial Fields, Temporal Fields
NICTA Data Analytics (1)
Discrete, ℵ, P(n_i): Events, People, Text, Gene Sequences
Risk Estimation
Sentiment analysis
Behaviour prediction
Bioinformatics
Biomedical informatics
Xenome GWIS
Opinion Watch
Biomedical texts
Event Watch
Patent analysis
Offer targeting
“Scoobi” data mining / Active learning
Machine learning for Natural Language Processing
Efficient compressed storage and search for sequence data
Energy constrained machine learning
Edge-distributed learning
Event watch
• Demo
  – http://pmo-eventwatch.research.nicta.com.au/demo/
Sentiment Analysis
40,000 word lexicon
Part of Speech
Sentiment
Named Entity Recognition
Key phrase extractor
LDA: Latent Dirichlet Allocation
Differential topic modeling
Supervised LDA
Key technology - Topic modeling
• Documents consist of words (Document 1 to Document 5).
• Documents are modeled as a mixture of topics: each of documents 1 to 5 has a probability distribution over topics A, B, C and D.
• Words are associated with topics: each of Topic A to Topic D has a probability distribution over the vocabulary.
“Latent Dirichlet Allocation” learns the distributions and allocates every word in each document to a topic
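The allocation step can be illustrated with a minimal sketch. The slide does not say which inference method NICTA uses; this example assumes collapsed Gibbs sampling, one common way to fit LDA. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all hypothetical.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenised documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: topics per document, words per topic, words per topic total.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    # Start by allocating every word occurrence to a random topic.
    assign = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove this word's current allocation from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # P(topic | all other allocations) is proportional to
                # (document-topic count) * (topic-word probability).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic mixture for each document.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, assign

# Toy corpus (hypothetical): four short documents, two topics.
docs = [
    "pipe break soil pipe pressure".split(),
    "solar cloud power solar grid".split(),
    "pipe soil break pressure pipe".split(),
    "cloud solar grid power cloud".split(),
]
theta, topic_word, assign = lda_gibbs(docs, n_topics=2)
```

Here `theta[d]` is the topic mixture of document `d`, `topic_word[k]` holds the word counts allocated to topic `k`, and `assign[d][i]` is the topic given to the i-th word of document `d`.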
NICTA Data Analytics (2)
Finite, ℜ^n, P(x_i): Signals, Location, Genetics
Fault Prediction
Preventative Maintenance
Disease expression SparSNP
Efficient distributed sparse regression method
Non-parametric Bayesian methods
Distributed, autonomous, real-time data with classification / clustering
Critical Water Mains
Smart Grid
Structural Health Monitoring
Service optimisation
Existing data
• Age, Type, Material, Size, Length, Failures, Soil, Pressure, Location, Weather, … and many more
NICTA’s analysis
• Hierarchical Beta Process
• Complex data mix
Condition Assessment
• Accurate
• Improved prediction
Risk / age, Risk / type, Risk / size, Age profile
• Data-driven prediction from multiple existing data sources
• Dynamic model update and aggregation
Machine Learning Process
Improvement on failure prediction
• Use 1998-2008 break records for model building
• Use 2009-2011 break records for testing
• Multiple factors
  – Laid year, material, size, coating, and soil
[Figure: failures detected vs. length of condition assessment, with series labelled NICTA and Weibull; Wollongong, with a zoomed-in panel (2.5%)]
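A comparison of this kind (failures detected against length of pipe assessed) can be computed from any model's risk scores. The sketch below is one plausible way to build such a curve, not NICTA's evaluation code; the function name `detection_curve`, the record layout, and the sample values are hypothetical, and it assumes at least one failure in the test period.

```python
def detection_curve(pipes):
    """Cumulative detection curve for a risk ranking.

    pipes: list of (risk_score, length_m, failures_in_test_period).
    Returns (fraction of total length assessed, fraction of failures detected)
    after each pipe, assessing pipes in descending order of risk score.
    """
    ranked = sorted(pipes, key=lambda p: p[0], reverse=True)
    total_len = sum(p[1] for p in ranked)
    total_fail = sum(p[2] for p in ranked)
    points, length, fails = [], 0.0, 0
    for _, pipe_len, pipe_fail in ranked:
        length += pipe_len
        fails += pipe_fail
        points.append((length / total_len, fails / total_fail))
    return points

# Hypothetical pipes: (risk score, length in metres, breaks in the test years).
pipes = [
    (0.9, 100.0, 2),
    (0.1, 300.0, 0),
    (0.6, 200.0, 1),
    (0.3, 400.0, 1),
]
curve = detection_curve(pipes)
```

A better model pushes the curve up and to the left: more of the failures are found after assessing a smaller share of the network.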
Risk Map
Risk ranking of pipes based on likelihood of failure
Top 10% pipes
10% ~ 40% pipes
40% ~ 60% pipes
Last 40% pipes
Actual breaks in the following year
Red = highest
Blue = lowest
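The banding in the legend can be reproduced from a list of per-pipe failure likelihoods. This is a minimal sketch assuming the bands are cut by rank (share of pipes) rather than by pipe length, which the slide does not specify; `risk_bands` and the pipe identifiers are hypothetical.

```python
def risk_bands(scores):
    """Map each pipe id to a legend band, ranking by descending likelihood.

    scores: dict of pipe id -> estimated likelihood of failure.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    bands = {}
    for rank, pipe_id in enumerate(ranked):
        # Integer comparisons avoid floating-point edge cases at the cut-offs.
        if 10 * rank < 1 * n:
            bands[pipe_id] = "Top 10% pipes"
        elif 10 * rank < 4 * n:
            bands[pipe_id] = "10% ~ 40% pipes"
        elif 10 * rank < 6 * n:
            bands[pipe_id] = "40% ~ 60% pipes"
        else:
            bands[pipe_id] = "Last 40% pipes"
    return bands

# Hypothetical scores for ten pipes, pipe-0 being the most likely to fail.
scores = {f"pipe-{i}": 1.0 - i / 10 for i in range(10)}
bands = risk_bands(scores)
```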
NICTA Data Analytics (3)
Infinite, ℑ, P(f_i): Spatial Fields, Temporal Fields
Geothermal, Groundwater
Data Fusion with uncertainty estimation Resource exploration
Soils
Non-parametric Bayesian methods
Resource management
Air quality, Solar
Solar Energy Forecast Software
The Solar Energy Forecast Software project is part of NICTA’s Security and Environment Business Team, providing security for people, resources and critical systems.
Did you know failure to predict solar energy production will mean we won’t fully capture available solar resources?

The Problem
Electricity grids around the world were not designed to manage large fluctuations of supply in power generation. Traditional forms of power supply such as coal-fired stations provide a stable, non-fluctuating form of power supply. However, the energy we receive from the sun is much more unpredictable and grids are not designed to cope with the dynamic nature of renewable energy production.
Current prediction methods are not accurate enough at the suburb level and not fine-grained enough (i.e. currently a matter of days, not minutes). Current methods also require expensive (up to $75,000) and obtrusive equipment in a large area to collect the required data.

Impact
NICTA aims to lower the costs of solar monitoring systems to allow for fast, affordable forecast systems to be installed all over Australia. Specifically, we aim to:
• Develop low-cost devices ($500) that measure current levels of rooftop solar power production by monitoring 150 households across the ACT.
• Utilise low-cost sky cameras ($250) to detect cloud cover. From these images, NICTA’s researchers will project the motion of the clouds and estimate the 'darkness' of their shadows, thereby predicting their inhibitive effect on power output.
• Develop software that will predict solar energy production by suburb within minutes and hours rather than days.
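The sky-camera idea (project cloud motion, estimate shadow darkness, predict the effect on output) can be illustrated with a deliberately simple sketch. It assumes a single shadow moving toward a site in a straight line at constant speed and a fixed darkness fraction. These simplifications, the function names, and all numbers are hypothetical and do not describe NICTA's software.

```python
def is_shaded(t_s, edge_m, length_m, speed_ms):
    """True if the site is under the shadow t_s seconds from now.

    edge_m: current distance from the shadow's leading edge to the site
    length_m: extent of the shadow along its direction of travel
    speed_ms: speed of the shadow toward the site, assumed constant
    """
    front = edge_m - speed_ms * t_s  # leading edge, relative to the site
    back = front + length_m          # trailing edge, relative to the site
    return front <= 0.0 <= back

def forecast_kw(clear_sky_kw, edge_m, length_m, speed_ms, darkness, times_s):
    """Forecast output at each time, scaling clear-sky output while shaded.

    darkness: fraction of output blocked by the shadow, between 0 and 1.
    """
    return [
        clear_sky_kw * (1.0 - darkness)
        if is_shaded(t, edge_m, length_m, speed_ms)
        else clear_sky_kw
        for t in times_s
    ]

# Hypothetical case: shadow 600 m away, 300 m long, moving at 10 m/s.
times_s = [0, 30, 60, 75, 90, 120]
forecast = forecast_kw(4.0, 600.0, 300.0, 10.0, 0.7, times_s)
```

In this example the shadow arrives after 60 seconds and clears shortly after 90 seconds, so the forecast dips only for the middle three time steps.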
Renewable Energy
Resource discovery, Plant system diversity, Non-linear laser physics
Big Data Knowledge Discovery
Transparent Machine Learning
Engineered Geothermal Systems
Geophysical Data
Gravity, Magnetics, Core Samples, Temperature, Reflection Seismic, Magnetotellurics, Gravity Gradiometry, Down-hole Geophysics, Stress, Porosity, Passive Seismic, Micro Seismic, ...
Distributions of geologies
Magnetotellurics, Seismic, Magnetism, Gravity
…
Probability Distribution
Results – fusing gravity & boreholes
Predicted mean density and uncertainty
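Why fusing two data sources yields both a mean and an uncertainty can be seen in the simplest possible case: two independent Gaussian estimates of the same quantity, combined by precision weighting. This is only an illustration of the principle, not the non-parametric Bayesian method behind the slide; `fuse_gaussian` and the density values are hypothetical.

```python
import math

def fuse_gaussian(estimates):
    """Fuse independent Gaussian estimates [(mean, std), ...] of one quantity.

    Returns the combined (mean, std) under a flat prior. The mean is a
    precision-weighted average, and the combined std is smaller than the
    std of every individual input.
    """
    precisions = [1.0 / (std ** 2) for _, std in estimates]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, estimates)) / total
    return mean, math.sqrt(1.0 / total)

# Hypothetical rock-density estimates in g/cm^3 (illustrative values only).
gravity = (2.70, 0.20)   # broad coverage, high uncertainty
borehole = (2.60, 0.05)  # local sample, low uncertainty
mean, std = fuse_gaussian([gravity, borehole])
```

The fused mean sits close to the more certain borehole value, and the fused uncertainty is lower than either input alone.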
Reuse
[Diagram labels: Machine Learning; Statistics; Computer Science; Scientific Challenges; Societal Challenges; Personal; Enterprise; Government; arrows labelled “techniques” and “problems”]
How can we apply new techniques of machine learning / analytics to science?
Machine Learning in the Natural Sciences
• Big Data Knowledge Discovery
• Science and Industry Endowment Fund (www.sief.org) project
• Collaboration between
  – NICTA (machine learning)
  – SIRCA (big data)
  – Sydney Uni (plate tectonics)
  – Macquarie Uni (forest ecosystems, non-linear laser physics)
• How do we make machine learning easier to use in the natural sciences?
The End