A roadmap for big-data research and education at LTU

olov.schelen@ltu.se

Outline

•  What is big data? What’s behind the hype?
•  Industry and academia outlooks
•  Basic tools & frameworks
•  National/international research and innovation agendas
•  Roadmap opportunities:
   •  Mobility & cloud
   •  Internet of Things, cyber-physical systems
   •  Data analytics
   •  Datacenter automation and management, etc.
•  Strengths, weaknesses, opportunities, threats

What is “big data”?

•  Data with properties meeting the 3-4 Vs
   •  volume: from machines, networks, social media, etc.
   •  variety: often unstructured
   •  velocity: continuous flow, often real-time
   •  veracity: full of bias, noise, abnormality, irrelevance

How do we process it?

•  Similar objectives as with any data
   •  creation
   •  retrieval
   •  storage
   •  analysis
   •  presentation
   •  visualization, etc.
•  However, new scalable methods are needed to process the data effectively and efficiently

Origin 1: business analytics and corporate decision making in enterprises

A survey by BARC shows where data comes from

More on enterprise and business analytics

A survey by Jaspersoft shows how data is stored

Origin 2: The big four in cloud

Amazon, Google, Facebook, Yahoo (but now there are hundreds of followers)

•  It is worth studying how their systems are built under the hood.
•  Based on fundamentals in distributed systems research
•  New solutions that are adapted to specific requirements, which allow for trade-offs in order to increase speed
•  Addressing all 4 Vs

Research fields

•  Distributed and pervasive systems, grid systems
•  Computer architecture, virtualization
•  Networking
•  Data mining and big data analytics
•  Automation
•  Control theory
•  In combination with research in application areas (or deep understanding of user needs)!

Toolsets

•  The traditional tools used in the mentioned fields

•  Some relatively new ones specifically for big data processing
   •  Two example stacks are shown on the next slides

•  The potential set is huge and new inventions are added quickly

•  Having some common ground knowledge and a lab that supports those tools is a success factor!

BDAS

Stratosphere

Notes

•  BDAS and Stratosphere will be presented by their originators at the Cloudberry workshop in June!

•  Whatever toolsets we prefer, they should as far as possible be used in lab assignments at undergraduate and master's level

Arenas and agendas

•  Process IT Innovations
•  Cloudberry Datacenters
•  Centek and county municipality efforts in the region
•  The information driven society (Vinnova SIO)
•  EU arenas, Horizon 2020
•  Partnerships and cooperations

Potential roadmap items follow

•  Initial set, more can be added
•  Mostly focused on systems with experimental research and evaluation
•  Theoretical evaluations where applicable

Mobility and cloud computing

•  Personalized (group) clouds
   •  credentials, security
•  Light-weight distributed cloud architectures
•  Monitoring and profiling
•  Make mobility and cloud even smoother
   •  locality, caching

Distributed algorithms and data structures

•  Based on application-class-specific requirements and trade-offs
•  Many fundamentals were researched decades ago, but with new deltas in requirements, there are opportunities
•  Looking into dynamic scenarios and mobility
•  Not only fast lookups, but also fast re-build of data structures, locality challenges and opportunities, etc.
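
One well-known structure that trades a little lookup cost for cheap re-building under churn is a consistent-hash ring. The sketch below is only an illustration of that trade-off, not something taken from the roadmap: the class name `HashRing`, the `vnodes` parameter, and the node names are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit ring position (stable across runs)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node is placed at several positions to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        # Drop only this node's positions; all other positions stay put.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        # A key belongs to the first ring position clockwise from its hash.
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"key-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("node-d")
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after adding one node")
```

Lookups stay logarithmic in the number of ring positions, and when a node joins or leaves, only the keys adjacent to its positions change owner, which is the "fast re-build" property in a dynamic or mobile setting.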

Machine learning

Covered in depth in Fredrik Sandin's report

Content distribution and named data networking

•  A major challenge of growth in data intensive applications (e.g., video)

•  Interesting in combination with sensor data and similar models where content is produced by billions of devices
•  Addressing models
•  Data aggregation
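
As a minimal sketch of what data aggregation can mean when content comes from very many devices, the example below uses mergeable summaries: each device, or each intermediate node, keeps a small fixed-size record that can be combined with others in any order. The `Summary` type, its fields, and the sample readings are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Fixed-size, mergeable summary of a stream of readings (hypothetical)."""

    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, x: float) -> "Summary":
        # Fold one reading into the summary without storing the reading.
        return Summary(self.count + 1, self.total + x,
                       min(self.lo, x), max(self.hi, x))

    def merge(self, other: "Summary") -> "Summary":
        # Combine two partial summaries; the operation is associative and
        # commutative, so it can run at any level of an aggregation tree.
        return Summary(self.count + other.count, self.total + other.total,
                       min(self.lo, other.lo), max(self.hi, other.hi))

    @property
    def mean(self) -> float:
        if self.count == 0:
            raise ValueError("empty summary has no mean")
        return self.total / self.count


def summarize(readings) -> Summary:
    """Build a summary from an iterable of numeric readings."""
    summary = Summary()
    for x in readings:
        summary = summary.add(x)
    return summary


if __name__ == "__main__":
    # Three made-up devices; the third has reported nothing yet.
    per_device = [summarize([20.0, 22.0]), summarize([18.0]), summarize([])]
    combined = Summary()
    for part in per_device:
        combined = combined.merge(part)
    print(combined.count, combined.lo, combined.hi, combined.mean)
```

Because only the summaries travel upstream, bandwidth and storage per hop stay constant regardless of how many raw readings each device produced.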

Internet of Things (IoT)

•  By definition connected to the Internet
•  Large number of devices
•  Crowd sensing
•  Aggregation and indexing architectures
•  Open data, or restricted data
•  Resource efficiency (power, bandwidth, storage, space, etc.)

Cyber physical systems (CPS)

•  Can encompass IoT technologies
•  But also embedded/closed systems
•  Process industry
•  Real-time systems
•  Availability, fail-over, redundancy

Data analytics

•  Novel analytics methods related to the data presented on previous slides

•  Application-specific data to analyse
•  Where are the gaps?

SWOT

Strengths: good systems knowledge, experimental research, strong industry cooperation.

Opportunities: the growth in the datacenter industry, strong arenas, great industry interest, cross-functional projects (applications, software, infrastructure, IoT/M2M).

Weaknesses: late starter in big data, few researchers directly engaged in the topic, too few graduate students in the topic.

Threats: speed, ramp-up of research, lack of international cooperation, insufficient contribution/hype ratio.

So, let's kick off!

Discussions :)
