apache big_data europe event: "integrators at work! real-life applications of apache big data...
TRANSCRIPT
![Page 1: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/1.jpg)
Big Data Europe : Empowering Communities with Data Technologies
Dr Hajira Jabeen, University of Bonn
Apache Big Data Europe - Seville
![Page 2: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/2.jpg)
Structure◎Evolution of BDE Architecture◎BDE Stack◎Beyond the State of the Art
o Workflows (Support Layer)o User Interfaceo Data Lake (Ontario)o Semantics Analytics Stack
2
![Page 3: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/3.jpg)
Architectural design 1
![Page 4: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/4.jpg)
Architectural design 24
![Page 5: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/5.jpg)
Architectural design 3 (released)
5
![Page 6: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/6.jpg)
Architecture for SC 7 6
Stack
![Page 7: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/7.jpg)
Supported ComponentsSearch/indexing Data processing
Apache Solr Apache Spark
Data acquisition Apache Flink
Apache Flume Semantic Components
Message passing Strabon
Apache Kafka Sextant
Data storage GeoTriples
Hue Silk
Apache Cassandra SEMAGROW
ScyllaDB LIMES
Apache Hive 4Store
Postgis OpenLink Virtuoso
7
![Page 8: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/8.jpg)
User profiles8
![Page 9: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/9.jpg)
Platform installation◎Manual installation guide◎Using Docker Machine
o On local machine (VirtualBox)o In cloud (AWS, DigitalOcean, Azure)o Bare metal
◎Screencasts
9
![Page 10: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/10.jpg)
Developing a component◎Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable custom algorithm/data◎Published components
o Responsibilities divided b/w partnerso Image repositories on GitHubo Automated builds on DockerHubo Documentation on BDE Wiki
10
![Page 11: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/11.jpg)
Deploying a Big Data Stack
◎Stack collection of communicating components to solve a specific problem◎Described in Docker Compose
o Component configurationo Application topology
◎Orchestrator required for initialization processo Components may depend on each othero Components may require manual
intervention
11
![Page 12: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/12.jpg)
User Interfaces◎Target: Facilitate use of the platform
o User Interface Adaption ◎Available interfaces
o Workflow UIs❖Workflow Builder❖Workflow Monitor
o Swarm UIo Integrator UI
12
![Page 13: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/13.jpg)
BDE Workflow Builder13
![Page 14: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/14.jpg)
BDE Workflow Monitor14
![Page 15: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/15.jpg)
Swarm UI15
![Page 16: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/16.jpg)
Integrator UI16
![Page 17: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/17.jpg)
Beyond the State of the Art
17
![Page 18: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/18.jpg)
Smart Big Data
Increase Big Data value by adding meaning to it!
18
![Page 19: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/19.jpg)
Orchestration ◎ Goal: flexible composition of Big Data
pipelines◎ Microservice cooperation using shared
vocabulary ◎ Uses HTTP requests from a web based
frontend◎ Meta information stored in a triple store◎ Microservice uses triple store as service
backend
19
![Page 20: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/20.jpg)
Semantic Data Lake (Ontario)
◎Repository of data in its raw formato Structured, semi-structured, unstructured
◎Schema-lesso No schema is defined on write, it is
defined only on read◎Open to any kind of processing
20
![Page 21: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/21.jpg)
Data Lake 21
Source: Big Data @ Microsoft - Raghu Ramakrishnan
![Page 22: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/22.jpg)
Semantic Data Lake (Ontario)
◎Add a Semantic layer on top of the source datasetso Semantic data is handled as-iso Non-Semantic data is semantically lifted
using existing ontology terms
22
![Page 23: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/23.jpg)
Ontario: Architecture23
Translate and execute Query via Source-specific Access Method
Decompose to Source-specific Entities
Decompose SPARQL Query
![Page 24: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/24.jpg)
Semantic Analytics Stack (SANSA)
24
![Page 25: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/25.jpg)
SANSA: Motivation◎ Abundant machine readable structured
information is available (e.g. in RDF)o Across SCs, e.g. Life Science Data
(OpenPhacts)o General: DBpedia, Google knowledge
grapho Social graphs: Facebook, Twitter
◎ Need for scalable querying, inference and machine learningo Link predictiono Knowledge base completiono Predictive analytics
25
![Page 26: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/26.jpg)
![Page 27: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/27.jpg)
SANSA Stack27
![Page 28: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/28.jpg)
SANSA: Read Write Layer◎ Ingest RDF and OWL data in different formats using Jena / OWL API style interfaces
◎Represent data in multiple formats (e.g. RDD, Data Frames, GraphX, Tensors)
◎Allow transformation among these formats
◎Compute dataset statistics and apply functions to URIs, literals, subjects, objects → Distributed LODStats
28
![Page 29: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/29.jpg)
SANSA: Query Layer◎To make generic queries efficient and fast using:o Intelligent indexing o Splitting strategieso Distributed Storage o Distributed/ Federated Querying
◎Early work in progress: query evaluation (SPARQL-to-SQL approaches, Virtual Views)
◎Provision of W3C SPARQL compliant endpoint
29
![Page 30: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/30.jpg)
SANSA: Inference Layer◎W3C Standards for Modelling: RDFS and OWL
◎Parallel in-memory inference via rule-based forward chaining
◎Beyond state of the art: dynamically build a rule dependency graph for a rule set
◎→ Adjustable performance levels
30
![Page 31: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/31.jpg)
SANSA: ML Layer◎Distributed Machine Learning (ML) algorithms that work on RDF data and make use of its structure / semantics
◎Work in Progress:o Tensor Factorization for e.g. KB completion
(testing stage)o Simple spatiotemporal analytics (idea stage)o Graph Clustering (testing stage)o Association rule mining (evaluation stage)o Semantic Decision trees (idea stage)
31
![Page 33: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/33.jpg)
33
![Page 34: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/34.jpg)
BDE vs Hadoop distributions
Hortonworks Cloudera MapR Bigtop BDE
File System HDFS HDFS NFS HDFS HDFSInstallation Native Native Native Native lightweight
virtualizationPlug & play components (no rigid schema)
no no no no yes
High Availability Single failure recovery (yarn)
Single failure recovery (yarn)
Self healing, mult. failure rec.
Single failure recovery (yarn)
Multiple Failure recovery
Cost Commercial Commercial Commercial Free Free
Scaling Freemium Freemium Freemium Free FreeAddition of custom components
Not easy No No No Yes
Integration testing yes yes yes yes --Operating systems Linux Linux Linux Linux AllManagement tool Ambari Cloudera
managerMapR Control system
- Docker swarm UI+ Custom
34
![Page 35: Apache Big_Data Europe event: "Integrators at work! Real-life applications of Apache Big Data components : BDI technical overview](https://reader031.vdocuments.net/reader031/viewer/2022022202/5883eccf1a28ab34428b54ad/html5/thumbnails/35.jpg)
BDE vs Hadoop distributions
◎BDE is not built on top of existing distributions
◎Targets o Communitieso Research institutions
◎Bridges scientists and open data◎Multi Tier research efforts towards Smart Data
35