SC4 Workshop 1: Simon Scerri: Existing Tools and Technologies

BIG DATA EUROPE Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges Tools and Technologies

Upload: bigdataeurope

Post on 15-Apr-2017


TRANSCRIPT

BIG DATA EUROPE

Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges

Tools and Technologies

Open-Source Technologies for Big Data Apps (small selection :-)

2 May 2023 www.big-data-europe.eu


Big Data – Technologies vs 3Vs


Volume

Velocity

Variety

Storm

The three Vs of Big Data: Variety is often neglected

Source: Gesellschaft für Informatik

Big Data Technology - Groups


Big Data Technologies

Data Storage Technologies

Data Processing

Workflow Coordination

Querying/Processing

Search

Data Export/Import

Data Analysis

Statistics

Text Mining

Big Data Requirements


Analysis of historical data: millions of entries, varying analysis questions, years of input data => Big Data Batch Processing

Interactive analysis by online queries: thousands of users online, extremely fast response time, super high availability => Big Data Databases

Analysis of current data with low latency in "real-time": react to newest trends, low-latency change detection, real-time online monitoring => Big Data Stream Processing

But how to put it together?

A Big Data Management System


ZooKeeper

Azkaban

Kafka

Cassandra, Voldemort

MongoDB, CouchDB

Elasticsearch, Solr, Lucene

Conventional Hadoop Ecosystem + NoSQL components


Batch Function

Speed Function

Data Storage

pages with postings

Batch View

Realtime View

message passing

Application

Horizontal Scalability in the Lambda Architecture

> volume

> users

> users, volume

> velocity

> volume, velocity
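The interplay of the Batch Function, Speed Function, Batch View and Realtime View above can be illustrated with a small sketch. It is a single-process toy under stated assumptions, not the architecture used by any particular tool: the class name `LambdaSketch`, its methods, and the choice of "postings per page" as the computed view are all hypothetical and only loosely inspired by the "pages with postings" label on the slide.

```python
from collections import Counter


class LambdaSketch:
    """Toy Lambda Architecture that counts postings per page (hypothetical example)."""

    def __init__(self):
        self.master = []                 # append-only data storage
        self.batch_view = Counter()      # rebuilt from scratch by the batch function
        self.realtime_view = Counter()   # updated incrementally by the speed function

    def ingest(self, page):
        """Store a new posting and let the speed function update the realtime view."""
        self.master.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        """Batch function: recompute the view over all stored data.

        Once the batch view covers everything in storage, the realtime
        view is reset because its counts are now contained in the batch view.
        """
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, page):
        """The application merges both views to answer a query."""
        return self.batch_view[page] + self.realtime_view[page]


sketch = LambdaSketch()
for page in ["a", "b", "a"]:
    sketch.ingest(page)
sketch.run_batch()       # batch view now holds a=2, b=1
sketch.ingest("a")       # only the realtime view sees this posting so far
print(sketch.query("a"))
```

In a real deployment the batch and speed functions would run on separate, horizontally scaled systems; the sketch only shows why a query has to combine both views to see old and new data together.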

Blueprint of the Data Aggregator Platform

Follows typical Lambda Architecture

Integrated on top of existing Big Data distribution + Semantic Layer (retaining semantics using the Linked Data approach)

Batch Layer

Speed Layer

Data Storage

Real-time data & Transactions …

Batch View

Real-time View

message passing

Applications & Showcases

Real-time dashboards

Domain-specific BDE apps

Big Data Analytics

In-stream Mining

BDE Platform & Intelligence

Input data: Stream, Spatial, Social, Statistical, Temporal, Transactional, Imagery

BDE Platform based on BigTop

Packaging: Package RPMs and DEBs, so that you can manage and maintain your own cluster.

Smoke testing: Integrated smoke testing framework.

Virtualization: Vagrant recipes, raw images, and Docker recipes for deploying Big Data infrastructures from zero.

+ Semantic Layer - Retaining Semantics using Linked Data

Data Aggregator Platform Challenges

Ingest semantic (RDF) and non-semantic (CSV, JSON, XML, …) data
  o Integrate various mapping techniques (R2RML, CSV on the Web, JSON-LD)

Preserve semantics, provenance and metadata in Big Data processing chains
  o Preserve URIs/IRIs
  o Preserve triples

Exploit semantics for aggregations
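To make the ingestion challenge concrete, the following sketch maps CSV rows to N-Triples while keeping a stable IRI per row and one provenance triple pointing back to the source file. It is a hand-rolled illustration, not R2RML or CSV on the Web, and not the BDE platform's implementation. The function name `csv_to_ntriples`, the `http://example.org/` namespace and the sample data are assumptions made for the example; only the `prov:wasDerivedFrom` property comes from the W3C PROV-O vocabulary.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, chosen for illustration
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"  # W3C PROV-O property


def csv_to_ntriples(text, id_column, source_iri):
    """Map each CSV row to N-Triples lines with one subject IRI per row."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        # Percent-encode the identifier so the subject is a valid IRI.
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            # Escape backslashes and quotes as required for N-Triples literals.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
        # Provenance: record which source the row was derived from.
        triples.append(f"{subject} <{PROV_DERIVED}> <{source_iri}> .")
    return triples


sample = "id,speed\ns1,50\n"
for line in csv_to_ntriples(sample, "id", "http://example.org/data/traffic.csv"):
    print(line)
```

Because every value ends up as a triple with a persistent subject IRI, later aggregation steps can group by subject or predicate without losing track of where the data came from. A production pipeline would more likely rely on a standard mapping such as the ones named in the list above.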

Thank You!
