Data Mining and Machine Learning for Big Data
Chengqi Zhang
Director of QCIS, University of Technology, Sydney
Outline
• ARC CoE bid in 2013
• What we have learnt
• What we plan to do
18 September 2014
Big Data Research CoE
Scale
High quality postgraduates
Transformational
Critical issue
Global competitiveness
Community engagement
Capacity building
Overview
Today
1 Vision and Mission
2 Big Data Paradigm
3 Big Data Research
4 Big Data Research Ecosystem
5 Objectives and Milestones
6 Team and Governance
7 The Critical Imperative
8 Distinctive Value and Impact
1 Vision and Mission
Vision
The centre will generate wealth and increased productivity for Australia by creating a vibrant Big Data Research Ecosystem that will put Australia in a global leadership position.
Mission
• Transform foundational data science
• Create a Big Data high performance utility to unlock Big Data for smarter decision-making
• Build human capacity and train the next generation of Big Data researchers
2 Big Data Paradigm - The Need
Big Data is a Game Changer
2 Big Data Paradigm
Big Data is pushing the frontiers of the current paradigm
Volume: Data in storage
• Data production is big and doubling each year!
• 2010: we crossed the barrier of one zettabyte (ZB)
• 1 ZB = 10^12 GB
• 2013: more than 4 ZB of data
Variety: Data in many forms
• Network data
• Spatial data
• Sensor data
Velocity: Data on the move
Veracity: Data in doubt
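The volume figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal (SI) units, and it treats "one zettabyte in 2010" and "4 ZB in 2013" as exact values, although the slide gives them as bounds ("crossed", "more than"). The function names are hypothetical.

```python
# Illustrative arithmetic only; constants assume decimal (SI) prefixes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def zb_to_gb(zettabytes: int) -> int:
    """Convert zettabytes to gigabytes (1 ZB = 10^12 GB)."""
    return zettabytes * (BYTES_PER_ZB // BYTES_PER_GB)


def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Constant yearly growth factor that takes start_zb to end_zb in `years` years."""
    return (end_zb / start_zb) ** (1 / years)


def project(start_zb: float, yearly_factor: float, years: int) -> float:
    """Volume after `years` years of growth at a constant yearly factor."""
    return start_zb * yearly_factor ** years


if __name__ == "__main__":
    print(zb_to_gb(1))                     # 1000000000000, i.e. 10^12 GB
    print(implied_annual_growth(1, 4, 3))  # growth factor implied by 1 ZB -> 4 ZB over 3 years
    print(project(1, 2, 3))                # what strict yearly doubling from 1 ZB would give
```

Because the slide's figures are lower bounds, the implied growth factor is only a rough indication and should not be read as a measured rate.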
3 Big Data Research
Five target challenges
1. Data Acquisition and Quality: Just-in-time data linking and integration; data quality management; provenance.
2. Big Data Processing: Storage and retrieval of big data; scalability; efficient indexing and searching.
3. Real-Time Analytics: Real-time machine learning at Big Data scale; real-time stream analytics with high volume; real-time knowledge discovery from deep analytics.
4. Decision-Making: Gathering the “best” evidence; making sense of Big Data; developing and exploiting insight and foresight with uncertain, inconsistent, incomplete information; risk.
5. Big Data Computing Paradigm: Fast real-time iterative processing with big distributed data.
3 Big Data Research
Five research programs
1. Data Acquisition and Quality
2. Big Data Processing
3. Real-Time Analytics
4. Decision-Making
5. Big Data Computing Paradigm
3 Big Data Research
Major scientific problems
1. Data Acquisition and Quality: Data inconsistency.
2. Big Data Processing: Sublinear time (approximate) algorithms against complexity.
3. Real-Time Analytics: Trade-off between scalability and analytics depth.
4. Decision-Making: Reasoning with quantitative and qualitative uncertain real-time information.
5. Big Data Computing Paradigm: Concurrency and mobility in computing Big Data.
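To make the "sublinear time (approximate) algorithms" item concrete, here is a minimal sketch of one textbook instance of the idea: estimating a mean from a uniform random sample instead of scanning every record. It is an illustration chosen for this transcript, not a method from the bid; the function name, sample size and seed are assumptions.

```python
import random


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample (with replacement).

    The running time depends on `sample_size`, not on len(values),
    so the cost stays fixed as the data set grows. The price is an
    approximate answer whose error shrinks roughly as 1/sqrt(sample_size).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    n = len(values)
    total = 0.0
    for _ in range(sample_size):
        total += values[rng.randrange(n)]
    return total / sample_size


if __name__ == "__main__":
    data = range(1_000_000)  # stand-in for a large indexed data set
    print(approximate_mean(data, sample_size=10_000))
    print(sum(data) / len(data))  # exact mean, requires a full scan
```

The sketch assumes random access to the data by index; for data that arrives as a stream, a different sampling scheme would be needed.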
4 Big Data Research Ecosystem
Big Data Research Centre of Excellence
Industry Engagement
• Business Value: Westfield, Woolworths, IBM, Google
• Customer Behaviour: CBA, IBM, Woolworths
• Geolocation: Westfield, Woolworths, IBM, Google
Infrastructure and Networks
• Technology: SAP, HP, CA, Oracle, Schneider Electric
• National: e.g. NCI
• Universities: UTS, UoM, UQ, UNSW
• National: CSIRO, CoEs, Industry
• Global Collaborators: Academic, Industry, Govt
Training and Outreach
Doctoral Training
4 Big Data Research Ecosystem
Big Data Research Centre of Excellence
Training and Outreach
Doctoral Training
Scale of PhD program
Industry embedded
Industry Doctoral Training Centre
Co-supervision of students
Schools outreach
5 Objectives and Milestones
Overarching centre objectives:
• Establish the computational foundation of Big Data Science
• Develop a new framework for high performance Big Data technologies
• Train a new generation of researchers for Big Data research and applications
Key mid-term milestones (Year 4)
• Develop 10 benchmark problems that are adopted by industry
• New computational models and programming abstractions to support improvements of one order of magnitude in computational performance on all benchmark problems
• Algorithms to enable real-time querying, analytics and processing of multimodal data with competitive performance on international benchmark problems
• Methodologies for asserting data quality and enabling integration required for the benchmark problems
• Real-time interactive visualisations for decision-making with benchmark problems
• First cohort of PhDs will graduate, over 30 of whom will be embedded in industry
5 Objectives and Milestones
Final deliverables (Year 7)
• Industry-ready techniques to facilitate real-time retrieval of actionable information that enables predictive analytics and decision-making with world-leading performance.
• Big Data is a utility that can support three orders of magnitude improvement in computational performance on ten benchmark problems.
6 Team and Governance
The Dream Team
Chief Investigators:
• Top research groups
• Awarded over 60 ARC grants since 2008
• Program leaders (70% FTE time) and CIs (50% FTE time)
Partner Investigators (Academic):
• International research leaders from USA, Europe and Asia
Partner Investigators (Engineers):
• Five PIs are from leading companies, such as IBM, SAP, Oracle
6 Team and Governance
Structure and governance
Advisory Board (Chair: Ron Sandland)
Executive Management Team (Chair: Centre Director)
• Research (Chair: Research Director)
• Mentoring (Chair: Education Director)
• Industry Engagement & Outreach (Chair: IE & O Director)
Key Sub-Committees
• Industry Engagement & Outreach
• Commercialisation
6 Team and Governance
Advisory Board
• Chair – Dr Ron Sandland
• National Agencies – NICTA / CSIRO
• Industry – 3 representatives (rotating)
• University – DVCRs (or nominees)
• Leading Scholars – 3 international scholars
7 The Critical Imperative
Limited window of opportunity
A. International effort gathering speed
B. Australia can benefit from first mover advantage
C. Industry and community focus and demand
D. International and national “Dream Team”
Centre of Excellence
• Critical mass
• Increases effectiveness of other efforts
• Leverages momentum
• Creates intensity!
8 Distinctive Value and Impact
Our distinctive value
• Right area: Transforming fundamentals of data science to make Big Data a high performance utility
• Right people: Elite world-leading team of academic and industry leaders with decent time commitments
• Right support: Strong industry commitment; Funding, People, Infrastructure, Data
• Right time: Australia should take this golden opportunity to lead the world in this very important area
8 Distinctive Value and Impact
Delivering global impact
• Establish Australia at the global forefront of information and Big Data Science
• Increasing Australia’s global competitiveness and productivity
• Paradigm shift - significant advances and new frameworks for decision-making with Big Data
• Significant growth in national critical mass capability and knowledge
What we have learnt
• Track record for collaborations
• Specific application areas that impact Australia
25 November 2014
What we plan to do
• Establishing a network of “Data Science Australia”
• Identify one or two application areas
• Some pilot projects by involving all members from DSA network
Establishing a network of “Data Science Australia”
• DSA was established on 13 November 2014
• DSA includes UTS, UQ, U. of Melbourne, UNSW, Monash U. and CSIRO
• Its objective is to prepare for the next ARC CoE bid
Identify one or two application areas
• It could be the resource industry;
• It could be the finance industry;
• To be further investigated.
Some pilot projects
• This is the job to be done in the next year or two.
Thank You!
Questions?