Budapest University of Technology and Economics
Department of Measurement and Information Systems
Distributed Incremental Model Queries over the Cloud
Dániel Varró
Budapest University of Technology and EconomicsFault Tolerant Systems Research Group
Outline of the Talk
Motivation & Background:
• Change detection in CPS
• Design Space Exploration
Incremental Model Queries: The EMF-IncQuery framework
• Language - Execution
Distributed Incremental Model Queries (IncQuery-D)
• Architecture -
Performance Benchmarks
• Distributed model load
• Incremental query evaluation
� Main Contributors
o István Ráth (lead)
o Ákos Horváth
o Gábor Bergmann
o Ábel Hegedüs
o Zoltán Ujhelyi
o Benedek Izsó
o Gábor Szárnyas
o Csaba Debreceni
o Dénes Harmath
o József Makai
o Dániel Stein
Challenges for IoT / CPS
Cyber
world
Physical
world
Problem
Solution
scheme
Deployment
Service
Solution
pattern
Component
service
offering
Challenge:Detect changes• in system state• in environmentAbstractionsDesign spaceexploration
CHANGE DETECTION BY
INCREMENTAL QUERIES
Big Data Analytics for CPS
Data Store
Data Store
EventsCloud basedComputation
Sensors / Services Cloud based appsData and Event sources
Polling
Events
Challenge: Change Detection in CPS
Data Store
Data Store
EventsCloud basedComputation
Sensors / Services Cloud based appsData and Event sources
Polling
Events
?
Change Detection in CPS by Incremental Queries
Data Store
Data Store
Sensors / Services Cloud based appsData and Event sources
Polling
EventsU
nified
Ch
ange
Detectio
nb
yU
nified
Ch
ange
Detectio
nb
yD
istribu
tedIn
cremen
talQu
eries
Cloud basedComputation
MOTIVATION FOR
INCREMENTAL MODEL QUERIES
Motivation: Early validation of design rules
SystemSignalGroup design rule (from AUTOSAR)
o A SystemSignal and its group must be in the same IPdu
o Challenge: find violations quickly in large models
o New difficulties
• reversenavigation
• complexmanualsolution
AUTOSAR: • standardized SW architectureof the automotive industry
• now supported by modern modeling toolsDesign Rule/Well-formedness constraint: • each valid car architecture needs to respect• designers are immediately notified if violatedChallenge: • >500 design rules in AUTOSAR tools• >1 million elements in AUTOSAR models• models constantly evolve by designers
Domain-Specific Modeling Languages
Abstract
Meta-model
Model
«type»
Validation of Well-formedness Constraints
Meta-model
Model
pattern switchWOSignal(sw) {
Switch(sw);
neg find switchHasSignal(sw);
}
pattern switchHasSignal(sw) {
Switch(sw);
Signal(sig);
Signal.mountedTo(sig, sw);}
Query
Modify
User
Result
Model sizes in practice
� Models with 10M+ elements are common:
o Car industry
o Avionics
o Source code analysis
� Models evolve and change continuously
Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
Application Model size
System models 108
Sensor data 109
Geospatial models 1012
Validation can take hours
MODEL QUERIES AND
GRAPH PATTERN MATCHING
What is a model query?
� For a programmer:
o A piece of code that searches for parts of the model
� For the scientist:
o Query = set of constraints that have to be satisfied by (parts of) the (graph) model
o Result = set of model element tuples that satisfy the constraints of the query
o Match = bind constraint variables to model elements
� A query engine: Supports
o the definition&executionof model queries
Query(A,B) � ∧condi(Ai,Bi)
• all tuples of model elements a,b• satisfying the query condition• along the match A=a and B=b• parameters A,B can be input/ output
Graph Pattern Matching for Queries
� Match:
o m: L � G (graph morphism)
o CSP:
• Variables: Nodes of L
• Constraints: Edges of L
• Domain values: G
o Complexity: |G|^|L|
L
Gstraight
left
route: Route sp: SwitchPosition
switch: Switchsensor: Sensor
switchPosition
switch
sensor
routeDefinition
All sensors with a switch that belongs to a route must directly be linked to the same route.
route: Route sp: SwitchPosition
switch: Switchsensor: Sensor
switchPosition
switch
sensor
routeDefinition
Graph Pattern Matching (Local Search)
� Search Plan:
o Select the first nodeto be matched
o Define an ordering ongraph pattern edges
� Search is restarted fromscratch each time
12
0
3
4
straight
left
route: Route sp: SwitchPosition
switch: Switchsensor: Sensor
switchPosition
switch
sensor
routeDefinition
Incremental Graph Pattern Matching
� Main idea: More space to less timeo Cache matches of patterns
o Instantly retrieve match (if valid)
o Update caches upon model changes
o Notify about relevant changes
� Approaches: o TREAT, LEAPS, RETE, …
o Tools: VIATRA, GROOVE, MoTE, TCore
straight
left
route sp switch sensor
r1 sp1 sw1
INCREMENTAL MODEL QUERIES:
THE EMF-INCQUERY PROJECT
• Declarative graph querylanguage
• Transitive closure, Negative cond., etc.
• Compositional, reusable
• Declarative graph querylanguage
• Transitive closure, Negative cond., etc.
• Compositional, reusable
Definition
• Incremental evaluation
• Cache result set
• Maintain incrementallyupon model change
• Incremental evaluation
• Cache result set
• Maintain incrementallyupon model change
Execution
• Derived features,
• On-the-fly validation
• View generation, Notifications, Soft links, Databinding,
• Derived features,
• On-the-fly validation
• View generation, Notifications, Soft links, Databinding,
Features
EMF-IncQuery: An Open Source Eclipse Project
http://eclipse.org/incquery
The IncQuery (IQ) Graph Query Language
� IQ: declarative query languageo Attribute constraints
o Local + global queries
o Compositionality+Reusabilility
o Recursion, Negation,
o Transitive Closure
o Syntax: DATALOG style
pattern routeSensor(sensor: Sensor) = {TrackElement.sensor(switch,sensor);Switch(switch);SwitchPosition. switch(sp, switch);SwitchPosition(sp);Route.switchPosition(route, sp);Route(route);neg find head(route, sensor);
}pattern head(R, Sen) = {
Route.routeDefinition(R, Sen);}
ModelQuery(A,B): • tuples of model elements A, B• satisfying the query condition• enumerate 1 / all instances• A,B can be input or output
route: Route sp: SwitchPosition
Switch: Switchsensor: Sensor
switchPosition
switch
sensor
routeDefinition
Incremental Query Evaluation by RETE
� AUTOSAR well-formedness validation rule
Communicationchannel
Logical signal Mapping Physical signal
Invalid model fragment
� Instance model
Valid model fragment
Fill the input nodesFill the worker nodesRead the result setModify the modelPropagate the changesRead the changes in the
result set (deltas)
Incremental Query Evaluation by RETE
join
join
antijoin
Result set
Communicationchannel
Logical signal Mapping Physical signal
Performance of EMF-INCQUERY
� Incremental graph queries based on Rete
� Built for the Eclipse Modeling Framework
model size
runtimebatch queries
incremental queries
Runtime is proportional to the size of the modification.
Performance of EMF-INCQUERY
model size
incremental queries
batch queries
memory limit
Storing partial resultsmemory
consumption
Selected Applications (EMF-IncQuery)• Complex traceability
• Query driven views
• Abstract models byderived objects
Toolchain forIMA configs
• Connect to MatlabSimulink model
• Export: Matlab2EMF
• Change model in EMF
• Re-import: EMF2Matlab
MATLAB-EMF Bridge
• Live models(refreshed 25 frame/s)
• Complex eventprocessing
Gesture recognition
• Experiments on opensource Java projects
• Local search vs. Incremental vs. Native Java code
Detection of bad code smells
• Rules for operations
• Complex structuralconstraints (as GP)
• Hints and guidance
• Potentially infinitestate space
Design SpaceExploration
• Itemis (developer)
• Embraer
• Thales
• ThyssenKrupp
• CERN
Known Users
INCQUERY-D: DISTRIBUTED
INCREMENTAL MODEL QUERIES
Goals of INCQUERY-D
� Objectives
o Distributed incremental pattern matching
o Adaptation of EMF-INCQUERY’s tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)
� Achieve scalability by avoiding memory bottleneck
o Sharding separately
• Data
• Indexers
• Query network
o In memory:
• Index + Query
Assumptions
• All Rete nodes fit on a server node• Indexers can be filled efficiently• Modification size ≪ model size• The application requires the complete result
set of the query (opposed to just one match)
Dimensions of Scalability
� Infrastructure
o Number of machines
o Available memory / CPU
o Network performance
o Number of concurrent users
� Model
o Model size
o Model characteristics
� Queries
o Number of queries
o Query complexity
Metrics
INCQUERY-D Architecture
Server 1
Databaseshard 1
Server 2
Databaseshard 2
Server 3
Databaseshard 3
Transaction
In-memory EMF modelDatabaseshard 0
Server 0
Rete net
Indexer layer
EMF-INCQUERY INCQUERY-D
Distributed query evaluation network
Distributed indexer Model access adapterIndexing
Indexer Indexer Indexer Indexer
Join
Join
Antijoin
In-memory storage
Distributed indexing, notification
Production network• Stores intermediate query results• Propagates changes
Distributed persistent storage
Distributed production network• Each intermediate node can be allocated
to a different host• Remote internode communicationAkka
Triple store (4store),Document DB (Mongo),RDF over Column family
(Cumulus)
Databaseshard 0
Termination Protocol in INCQUERY-D
Server 1
Databaseshard 1
Server 2
Databaseshard 2
Server 3
Databaseshard 3
Transaction
Server 0
INCQUERY-D
Indexer Indexer Indexer Indexer
Join
Join
Antijoin
When a production node reachedan ACK message is sent back Stack added to each update msg
• Registers the Rete nodes themessage passes through
User retrievesquery result
IncQuery-D Architectural Layers• Gremlin, Cypher
• SPARQL
• IQPL (IncQuery)
• Gremlin, Cypher
• SPARQL
• IQPL (IncQuery)
High-LevelQuery Lang
• Distributed Indexers(MONDIX)
• SPARQL
• Distributed Indexers(MONDIX)
• SPARQL
Low-LevelQuery Lang
• Cayley
• Titan
• 4store
• Cayley
• Titan
• 4store
DistributedGraph DB
• MongoDB
• Cassandra
• 4store
• MongoDB
• Cassandra
• 4store
NativeStorage
• RDF
• XMI / Ecore
• Property Graphs
• RDF
• XMI / Ecore
• Property Graphs
Storage Format
• Efficient element access by indices• Local queries
• Global queries• Complex navigations
• Can be transparent (via indexers) • Integrates popular graph storages
• Efficient NoSQL storages• Triple stores
• Standardized data formats• Popular interchange formats
Summary: Key Components of IncQuery-D
DistributedModel Storage
• Adaptable todifferent back-end storages
• Agnostic tograph repres.
• TripleStores(RDF), EMF,Propertygraph
Model Access Adapter
• Surrogate keyto identifydistibutedelements
• Graph manip. API
• Changenotifications
DistributedIndexer
• Type-instanceindices, etc.
• Stored onmultipleservers
• Protectsexceedingmemory limits
DistributedQuery Evaluator
• DistributedRETE network
• Distributedterminationprotocol
• Constructedand deployedby coordinatornode
Decouple and separately distribute Storage, Indexer and Query layers
USAGE PHASES
LoadModel
Update Model
RequestResult
(1) Loading a Query
DeployRETE
RETE Network
AllocateRETE
CloudInfra-
structure
ConstructRETE
LoadQuery
Construct RETE• From EMF-IncQuery specs• Should incorporate
infrastructure constraints
Deploy RETE• Managed by a
coordinator node• Intelligent sharding of
RETE nodes
LoadModel
Update Model
RequestResult
(2) Loading a Model
Modelshards
DeployRETE
RETE Network
AllocateRETE
MaintainResult Set
CloudInfra-
structure
ConstructRETE
ModelAccess Adapter
LoadQuery
Load model• Model traversal• Init indexers• Network
communication
LoadModel
Update Model
RequestResult
(3) Updating a Model
Modelshards
DeployRETE
RETE Network
AllocateRETE
MaintainResult Set
CloudInfra-
structure
ConstructRETE
ModelAccess Adapter
LoadQuery
Model manipulation• Update messages• Create / Delete
LoadModel
Update Model
RequestResult
(4) Requesting Query Result
Modelshards
DeployRETE
RETE Network
AllocateRETE
EvaluateQuery
MaintainResult Set
CloudInfra-
structure
ConstructRETE
ModelAccess Adapter
LoadQuery
Evaluate query• Process incoming
messages• Propagate along
RETE network
Retrieve results• instantly
LoadModel
Update Model
RequestResult
(5) Monitoring and Reconfiguration
Modelshards
DeployRETE
RETE Network
AllocateRETE
EvaluateQuery
MaintainResult Set
Monitor & Manage
CloudInfra-
structure
ConstructRETE
ModelAccess Adapter
LoadQuery
Visualized on a web-based dashboard
OS metrics JVM metrics Rete metricsAkka metrics
DEPLOYMENT PROCESS FOR
DISTRIBUTED RETE
RETE Deployment Process
QueryLanguage
QueryPredicates
RETE Structure
Platform Description
Allocation / Mapping
DeploymentDescriptor
pattern routeSensor(sensor: Sensor) = {TrackElement.sensor(switch,sensor);Switch(switch);SwitchPosition. switch(sp, switch);SwitchPosition(sp);Route.switchPosition(route, sp);Route(route);neg find head(route, sensor);
}pattern head(R, Sen) = {
Route.routeDefinition(R, Sen);}
route: Route sp: SwitchPosition
Switch: Switchsensor: Sensor
switchPosition
switch
sensor
routeDefinition
Tooling: RDF Pattern Language
Vocabulary <railway.rdf>
base <http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl#>
pattern posLength(Segment, SegmentLength) {
Segment(Segment);
Segment_length(Segment, SegmentLength);
check("SegmentLength <= 0");
}
segment: Segment
segment.length ≤ 0
import "http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl"
pattern posLength(Segment, SegmentLength) {
Segment(Segment);
Segment.Segment_length(Segment, SegmentLength);
check(SegmentLength <= 0);
} EMF-IncQuery syntax
RDF-IncQuery syntax
Xbase (compiles to Java)
Javascript
RETE Deployment Process
� Construct language-independent constraints
� Resolution of o syntactic sugar
o type information
QueryLanguage
QueryPredicates
RETE Structure
Platform Description
Allocation / Mapping
DeploymentDescriptor
Variables routeroute spsp switchswitch
ParameterParameter sensor
Constraints
Edge: SwitchPosition.switch
Edge: TrackElement.sensor
Edge: Route.switchPosition
Negation: head
RETE Deployment Process
� Construct RETE structure(platform independently)
� Optimizations:
o Model statistics
o Expected usage profile
QueryLanguage
QueryPredicates
RETE Structure
Platform Description
Allocation / Mapping
DeploymentDescriptor
join
join
join
RETE Deployment Process
� Architecture model(Cloud infrastructure)
o Virtual Machines• Memory limits
• CPU speed
• Storage capacity
o Communication Channels• Bandwidth
� Specified by a textual DSL (Xtext)
QueryLanguage
QueryPredicates
RETE Structure
Platform Description
Allocation / Mapping
DeploymentDescriptor
1 2
3 4
RETE Deployment ProcessMachine Allocated Nodes
1 In1, In2, Join2
2 In3
3 In4
4 Join1, Join3
QueryLanguage
QueryPredicates
RETE Structure
Platform Description
Allocation / Mapping
DeploymentDescriptor
1 2
3 4
Join1
Join3
Join2
In1 In2 In3 In4
RETE Deployment Process
� Configuration scripts for
o Deployment
o Communicationmiddleware
� Derived by automatedcode generation
o Using Eclipse technology: EMF-IncQuery + Xtend
QueryLanguage
QueryPredicates
RETE Structure
Platform Description
Allocation / Mapping
DeploymentDescriptor
DISTRIBUTED
PERFORMANCE BENCHMARKS
The Train Benchmark� Model validation workload:
o User edits the modelo Instant validation of
well-formedness constraintso Model is repaired accordingly
� Scenario:o Loado Checko Edito Re-Check
� Models:o Randomly generatedo Close to real world instanceso Following different metricso Customized distributionso Low number of violations
� Queries:o Two simple queries
(<2 objects, attributes)o Two complex queries
(4-7 joins, negation, etc.)o Validated match sets
Incremental validationBatch validation
Instancemodel
Read Check Edit ReCheck����! �
100x
Evaluation of distributed scalability
� Extensions to previous work (single workstation)
o Generation of large instance models
o Distributed, parallel loading of models
o Distributed transformation and validation
Benchmark Distributed benchmark
Model size 1K – 13M 1K – 88M
Load method Batch Distributed, parallel
Transformation and validation Single workstation Multiple servers
� Load and first validation: load the graph to the databases and execute the query
� Transformation: query the graph and delete some elements
� Revalidation: execute the query
Batch graph scenarioIncremental scenario – IncQuery-D
� Load and first validation: load the graph to the databases and initialize the Rete net and retrieve the results
Transformation RevalidationGraphML
DB shards Result set
Rete net
Load and first validation
DB shards Result set
Rete net
� Revalidation: retrieve the results from the Rete net
� Transformation: incrementally query the graph and delete some elements, propagate the changes
Benchmark environment
� Private cloud
� Different DBMSs
� Query
o The DBMS’s own query language
o IncQuery-D
SPARQL Gremlin
1
4
16
64
256
1024
4096R
un
tim
e [s
]
Model size [million elements]
4store IncQuery-D Titan IncQuery-D 4store
Load and first validation
55M model: approx. 15 minutes
Rete network’s initialization
overhead pays off
1
2
4
8
16
32
64
128
256R
un
tim
e [s
]
Model size [million elements]
4store IncQuery-D Titan IncQuery-D 4store
Model modification
1. Elementary model query2. Model modification
– Query from the Rete network’s indexer– Propagation of modifications is fast
2 orders of magnitude
0,01
0,03
0,13
0,50
2,00
8,00
32,00
128,00
Ru
nti
me
[s]
Model size [million elements]
4store IncQuery-D Titan IncQuery-D 4store
Revalidation
memory limit
Sub-second response time for models with
88M elements
Different characteristics
Benchmarking Conclusions
� Memory consumption
o Single workstation: 13M model, 4 GB
o Cloud of four servers: 55M model, <4×8 GB
� Runtime
o Same order of magnitude and similar characteristics to the single workstation tool
INCQUERY-D is scalable and significantly more efficient for queryevaluation than the native query engines in 4store, Titan and Neo4j
Applications of Distributed Incremental Queries
• Query based optimistic locking
• Queries for Attribute Based Access Control
Collaborative Modeling (MONDO)
• Events vs. Changes
• Handle compound changes as events
Complex Event Processing w/
compound changes
• System evolves along operations
• Cost / Objectives associated to
• States + Environment + Trajectory
Rule-based Design Space Exploration
CONFIGURATION SYNTHESIS BY
DESIGN SPACE EXPLORATION
TRANS-IMA Project (Avionics)Goal: Allocate SW components toARINC653 compliant IMA platform
58
FunctionalArchitecture
Platform description
Componentdatabase
Allocation
IntegratedSystem Model
Inputs: • Platform Independent Model (PIM)(functional + nonfunc. reqs; Simulink)
• Platform Description Model (PDM) for ARINC 653 (DSL)
Output: • Integrated system model• Ready for simulation• End-to-end traceability
Supply fresh air
Supply hot air
Monitor temperature
Settemperature
Designing ARINC653 configurations
PackController
ZoneController
SW functionality(critical + non-critical)
3
System Display
AirCondPanel
3
Redundancyrequirement
Job instances, Partitions, Modules
PackController
ZoneController
SW functionality(critical + non-critical)
3
System Display
AirCondPanel
3
1
2 3
7
Job instances
4
5 6
8
Partitions
Modules
Constraints
2
5
3
4
8
8
8
8
Memory needs+ constraints
Do not mix criticaland non-crit. jobs
Do not mix instances of thesame critical job
Additional constraints• WCET,• scheduling, etc.• interfaces• datatypes
Allocating communication channels
PackController
ZoneController
SW functionality
3
System Display
AirCondPanel
3
1
2
3
7
4
5
6
8
Communicationchannels
Temperature
Pressure
Humidity
Design Space Exploration
62
Design Space Exploration
Design Alternative 1
Design Alternative 2
Design Alternative 3
Design Alternative 4
Objectives
Global Constraints
Initial Design
Solvers
• CLP solvers (Choco)• model finders (Alloy)• meta-heuristics + multi-objective optimization
Design Space Exploration (Example)
63
Consistency Analysis
Design Alternative 1
Design Alternative 2
Design Alternative 3
Design Alternative 4
Objectives
Global Constraints
Initial Design
A
Bx=2
C
AA A
Bx=?
C
I1 I2
Design Space Exploration (Example)
64
Consistency Analysis
Design Alternative 1
Design Alternative 2
Design Alternative 3
Design Alternative 4
Objectives
Global Constraints
Initial Design
A
Bx=2
C
A
A A
Bx=?
C
A A
Bx=5
Bx=2
C C
C
C
A
C
C1
C2
O
�
�
�
I1 I2
+ Objects
+ Relations
+ FilledAttributes
+ FilledAttributes
Rule Based Design Space Exploration
65
Design Space Exploration
Design Alternative 1
Design Alternative 2
Design Alternative 3
Design Alternative 4
Objectives
Global Constraints
Operations
Initial Design
Special state space exploration
• potentially infinite state space• „dense” solution space
Rule-Based Guided Design Space Exploration
66
Design Space Exploration
Seq of Transf. Rules 1
Seq of Transf. Rules 2
Seq of Transf.Rules 3
Seq of Transf.Rules 4
Model queriesas Objectives
Model queriesas Constraints
Transf. Rulesas Operations
InitialModel
Guidance for exploration: Hints
• designer / end user• formal analysis
Conclusions