SC4 Workshop 1: Simon Scerri: Existing Tools and Technologies
TRANSCRIPT
BIG DATA EUROPE
Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
Tools and Technologies
Open-Source Technologies for Big Data Apps (small selection :-)
2 May 2023 | www.big-data-europe.eu
Big Data Technology - Groups
Big Data Technologies
Data Storage Technologies
Data Processing
Workflow Coordination
Querying/Processing
Search
Data Export/Import
Data Analysis/Statistics
Text Mining
Big Data Requirements
Analysis of historical data: millions of entries, varying analysis questions, years of input data => Big Data Batch Processing
Interactive analysis by online queries: thousands of users online, extremely fast response time, super-high availability => Big Data Databases
Analysis of current data with low latency in "real time": react to the newest trends, low-latency change detection, real-time online monitoring => Big Data Stream Processing
But how to put it all together?
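To make the batch/stream distinction concrete, here is a minimal Python sketch; it is not part of the BDE platform. It assumes a hypothetical event format of `(term, count)` pairs and invented names `batch_counts` and `StreamCounter`. The batch function recomputes totals from the complete history, while the stream counter updates the total as each event arrives, so the newest value is available with low latency.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event format for this sketch: (term, count), e.g. topic mentions.
Event = Tuple[str, int]


def batch_counts(history: Iterable[Event]) -> Counter:
    """Batch processing: recompute totals from the complete historical data.

    Exact, but the result is only as fresh as the last batch run.
    """
    totals: Counter = Counter()
    for term, count in history:
        totals[term] += count
    return totals


class StreamCounter:
    """Stream processing: update totals incrementally as each event arrives."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def on_event(self, event: Event) -> int:
        term, count = event
        self.totals[term] += count
        # The updated total is available immediately, without rereading history.
        return self.totals[term]


if __name__ == "__main__":
    history = [("flood", 3), ("drought", 1), ("flood", 2)]
    print(batch_counts(history)["flood"])  # 5

    stream = StreamCounter()
    for event in history:
        stream.on_event(event)
    print(stream.totals == batch_counts(history))  # True
```

Both approaches end with the same totals; they differ in when the answer becomes available and how much data each update has to touch.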
A Big Data Management System
ZooKeeper, Azkaban, Kafka, Cassandra, Voldemort, MongoDB, CouchDB, Elasticsearch, Solr, Lucene
Conventional Hadoop Ecosystem + NoSQL components
Horizontal Scalability in the Lambda Architecture
[Figure: Lambda Architecture. Components: Data Storage (pages with postings), Batch Function, Speed Function, Batch View, Real-time View and Application, connected by message passing. Scaling annotations: > volume; > users; > users, volume; > velocity; > volume, velocity.]
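The Lambda Architecture's division of labour can be sketched in Python as follows. This is a single-process illustration under simplifying assumptions, not a distributed implementation, and the names `LambdaSketch`, `ingest`, `run_batch` and `query` are hypothetical. New data is appended to a master data set and applied to a real-time view; a batch run recomputes the batch view from the whole master data set; queries merge both views. The sketch does not model horizontal scaling itself, which in practice comes from partitioning each layer across many nodes.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical (key, count) event format for this sketch.
Event = Tuple[str, int]


class LambdaSketch:
    """Toy Lambda Architecture: master data set + batch view + real-time view."""

    def __init__(self) -> None:
        self.master: List[Event] = []            # append-only master data set
        self.batch_view: Counter = Counter()     # precomputed by the batch layer
        self.realtime_view: Counter = Counter()  # maintained by the speed layer

    def ingest(self, event: Event) -> None:
        # New data goes to both layers: it is stored for the next batch run
        # and applied incrementally to the real-time view right away.
        self.master.append(event)
        key, count = event
        self.realtime_view[key] += count

    def run_batch(self) -> None:
        # Batch layer: recompute the batch view from the full master data set.
        # The real-time view is then reset because the batch view now covers
        # that data (a simplification: real systems track which data each
        # batch run covered, since new events keep arriving during the run).
        view: Counter = Counter()
        for key, count in self.master:
            view[key] += count
        self.batch_view = view
        self.realtime_view = Counter()

    def query(self, key: str) -> int:
        # Serving: merge the complete-but-stale batch view with the
        # fresh-but-partial real-time view.
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    lam = LambdaSketch()
    lam.ingest(("a", 2))
    lam.run_batch()
    lam.ingest(("a", 3))
    print(lam.query("a"))  # 5
```

In the usage example, the query returns 5: 2 comes from the batch view and 3 from the real-time view, which holds only the event that arrived after the last batch run.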
Blueprint of the Data Aggregator Platform
Follows a typical Lambda Architecture
Integrated on top of an existing Big Data distribution, plus a Semantic Layer (retaining semantics using a Linked Data (LD) approach)
[Figure: Data Aggregator Platform blueprint. Components: Data Storage, Batch Layer (Batch View), Speed Layer (Real-time View), connected by message passing. Inputs: real-time data & transactions …; input data (stream, spatial, social, statistical, temporal, transactional, imagery). Upper tiers: BDE Platform & Intelligence (Big Data Analytics, In-stream Mining); Applications & Showcases (real-time dashboards, domain-specific BDE apps).]
BDE Platform based on BigTop
Packaging: package RPMs and DEBs, so that you can manage and maintain your own cluster.
Smoke testing: integrated smoke-testing framework.
Virtualization: Vagrant recipes, raw images, and Docker recipes for deploying Big Data infrastructures from scratch.
+ Semantic Layer: Retaining Semantics using Linked Data
Data Aggregator Platform Challenges
Ingest semantic (RDF) and non-semantic (CSV, JSON, XML, …) data
  o Integrate various mapping techniques (R2RML, CSV on the Web, JSON-LD)
Preserve semantics, provenance and metadata in Big Data processing chains
  o Preserve URIs/IRIs
  o Preserve triples
Exploit semantics for aggregations
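As an illustration of ingesting non-semantic data while preserving IRIs and provenance, the Python sketch below turns CSV rows into N-Triples. It is a hand-rolled mapping in the spirit of R2RML and CSV on the Web, not an implementation of either. The base namespace `http://example.org/`, the `resource/` and `property/` IRI patterns, and the function names are hypothetical. Cells that already hold an `http(s)` IRI are kept as IRIs rather than flattened into strings, and every row gets a `prov:wasDerivedFrom` triple (W3C PROV-O) pointing at its source.

```python
import csv
import io
from typing import Iterator
from urllib.parse import quote

# Hypothetical namespace for this sketch; a real R2RML or CSVW mapping defines its own.
BASE = "http://example.org/"
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"


def term(value: str) -> str:
    """Render a cell as an N-Triples term, keeping existing IRIs as IRIs."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"  # preserve the URI/IRI instead of making it a string
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(text: str, key_column: str, source_iri: str) -> Iterator[str]:
    """Map each CSV row to triples: one subject per row, one predicate per column."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{quote(row[key_column])}>"
        for column, value in row.items():
            # Skip the key column, empty cells and overflow fields (key None).
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column)}>"
            yield f"{subject} {predicate} {term(value)} ."
        # Provenance: record which source document the row was derived from.
        yield f"{subject} <{PROV_DERIVED}> <{source_iri}> ."


if __name__ == "__main__":
    data = "id,name,sameAs\nBE,Belgium,http://dbpedia.org/resource/Belgium\n"
    for line in csv_to_ntriples(data, "id", "http://example.org/source/countries.csv"):
        print(line)
```

For the sample row, the `name` cell becomes the literal `"Belgium"`, while the `sameAs` cell stays the IRI `<http://dbpedia.org/resource/Belgium>`, so it can still be joined with other Linked Data later in the processing chain.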
Thank You!