-
NextGeneration Archiver
Jakub Guzik, Rafał Kułaga
-
Agenda
• Short recap on project goals and architecture
• Status of the Frontend and the Backends
• Pilot deployments
• Large-scale performance and scalability tests
• Migration path from RDB Archiving to NGA
• Changes to the RDB Archiver schema
2
-
Recap
• New archiver for WinCC OA, supporting backends for different database technologies
• Developed by CERN and ETM in the scope of openlab
• Version 1.0 of the archiver is planned to be made available for selected customers by ETM in Q2 2019
• Requires WinCC OA 3.16 (or higher)
3
-
NextGen Archiver Frontend
• Uses WinCC OA C++ API and hides it from backend developer
• Can handle multiple backends (deployed as standalone processes or in-proc)
• Multiple archive classes per DPE
• Cannot be run in the same project as ValArch or the RDB Archiver, because it stores the whole archive configuration
• Supports: reading and writing values and alarms, buffering, split answers, groups, smoothing and more
• Lacks: redundancy, direct queries (work ongoing to provide them for v1)
4
-
InfluxDB backend
• Developed by ETM
• Most advanced development: dpQueries as well as other retrieval functions, full writing and metadata support, dynvalues, groups, alarms and more
• Also developed as a replacement for ValArch
• Tested and proven to work with CERN’s central installations
• We see InfluxDB backend as a potential secondary solution for some of the systems (protoDUNE experiment)
5
-
Oracle backend
• The backend is developed from scratch by CERN and ETM
• When ready, the backend will be taken over and maintained by ETM
• Compatibility with the RDB Archiver schema or a simple and quick migration path is the top priority
• Changes to the schema made at CMS will be supported
• Oracle backend development is a little behind InfluxDB (due to compatibility issues); it still lacks: dyn value handling, full dpQuery support, full metadata and multi-group support
6
-
Kudu backend
• Kudu is a distributed storage for structured data (tables) striking a good balance between real-time and batch processing
• First prototype ready, but development has been put on hold in order to focus on the Oracle backend
• Openlab Summer Student project – Rishi Shah
• Goal: evaluate write & read performance of Kudu on real data coming from RDB Archiver schemas
• Two tools were developed to transfer data from RDB Archiver schemas to Kudu and execute query tests on Oracle and Kudu schemas
• We hope to be able to run the tests this Winter to obtain representative performance figures
7
-
ALICE O2 streaming backend
• Custom backend developed at ALICE, streaming value changes to an external process which includes them in the physics data stream
• ALICE reports positive experience with writing a custom backend
• NGA meets performance requirements: tested with 5000 DPs changing at 1 Hz
• We are now starting a sprint to deliver an initial but tested version of the Oracle backend and schema upgrade scripts, in order to start testing it in selected systems at ALICE
• NGA will have to be ready for deployment in ALICE production systems by September 2019
8
-
protoDUNE: motivation
9
• Possibility to deliver modern database technologies for a new experiment
• Desire to use widely recognized platforms for analytics and monitoring
• Requirement to keep Oracle as primary database solution
-
protoDUNE: follow up
10
• June: Evaluation of NGA potential by dedicated person from protoDUNE
• August: Most problems regarding stability and performance are resolved
• September: The protoDUNE team agreed to use NGA as a secondary archiving solution
• October: NGA is deployed together with InfluxDB and Grafana as a shadow project
-
protoDUNE: summary
11
• ProtoDUNE shadow archiving project is running with InfluxDB
• Next step is to install Oracle backend
• NGA selected as a shadow archiving solution due to the prototype (alpha, beta) character of the software at that time
• ProtoDUNE already profits from the new technology (using Grafana to analyze data)
• Thanks to protoDUNE’s involvement we gained a real testing ground for the new Archiver
-
Radiation Monitoring (REMUS) – streaming data to Kafka
• REMUS team is also looking for a solution for streaming data from WinCC OA to external applications
• Apache Kafka is a distributed streaming platform that seems to be well suited for this task and is supported at CERN
• REMUS team will evaluate the NGA to see if writing the functionality as a backend is the way to go
• Alternative: writing a C++ API manager or a driver
12
Source: https://kafka.apache.org/
-
Large-scale query performance tests
• Goal: evaluate performance and scalability of the architecture without direct queries at the scale of CERN distributed systems (up to 200 nodes)
• Testing method:
• Large distributed system created using scripts
• Readers from all systems querying one selected master system with the NGA running (Frontend + TestBackend)
• TestBackend – simulates very fast data storage and returns answers generated based on queried fake DPs:
dist_1:;;;;
13
-
Large-scale query performance tests – architecture
14
Event
Manager
Data
Manager
NGA
Frontend
DIST Manager
NGA
Backend
CTRL/UI
Manager 1
CTRL/UI
Manager R
...
dist_1
Event
Manager
Data
Manager
DIST Manager
CTRL/UI
Manager 1
CTRL/UI
Manager R
...
dist_2
... dist_N
Distributed system with N nodes, each with R query runners
Query runners from all systems query dist_1 system with the NGA running (Frontend + TestBackend)
-
Large-scale query performance tests – results
Test settings:
• Each query runner requests 100,000 value changes in one dpQuerySplit(), no delay between requests, batch (split) size is 1000 value changes
Conclusions:
• NGA Frontend is the bottleneck
• Unacceptably high load on the core managers: Data / Event + DIST
Solution:
Direct query functionality will be added to the NGA, with backends loaded as plugins into UI/CTRL managers
15
Systems | Query runners per system | Throughput (changes/s) | Event (%CPU) | Data (%CPU) | DIST (%CPU) | Frontend (%CPU) | Test Backend (%CPU)
1       | 4                        | 396,000                | 0.1          | 48.3        | 0           | 134.5           | 53.9
1       | 16                       | 441,200                | 0.3          | 55.7        | 0           | 145.7           | 61.6
8       | 4                        | 535,100                | 52.4         | 14.9        | 48.3        | 149.9           | 66.4
8       | 16                       | 487,200                | 55.7         | 11          | 51.2        | 145.9           | 64.6
-
Large-scale tests with real backends
• Preparing the tests with real backends to ensure:
• Appropriate performance of the backends (writing and reading)
• Long-term stability
• Handling of distributed systems and redundancy
• Recovery from disconnections and crashes
• Correctness of archived and retrieved data (to complement testing performed at ETM)
• The tests will be performed on both Linux and Windows
16
-
Planned NGA migration and rollback paths
17
WinCC OA 3.15+
RDB Archiver
Schema version 8.9 CERN 1.x
WinCC OA 3.16+
NGA Oracle backend
Schema version 8.17 CERN 1.7
WinCC OA 3.16+
RDB Archiver
Schema version 8.9 CERN 1.x
/8.17 CERN 1.7
WinCC OA version upgrade
Schema upgrade
Project adjustments (done by a fw component)
Use of new backends
Project-side (configuration) rollback only
NGA schema is compatible with the 3.16 RDB Archiver
Data archived with NGA is still available after the rollback
Access to archived data is preserved
-
NGA-compatible version of the Oracle schema
• Priorities:
1. Compatibility with data written so far
2. Rollback to RDB Archiver possible, preserving data archived with NGA
3. Integration of the functionality of fwRDBAPI in the schema
• So far, the differences between schema versions 8.9 and 8.17 are relatively minor:
• Some changes in metadata management functions
• Optional column SYS_TIME added to ALERT history tables
• Version 8.17 of the schema is planned to be released by the end of Q1 2019
• Our goal is to make this schema version compatible with the RDB Archiver from WinCC OA 3.16/3.17
• Integration of the functionality of fwRDBAPI in the schema
• Support partitioning of ALERT history tables on timestamp (performance improvement)
• For the CMS schema, a reference version with all the necessary changes will be provided – review of the changes and testing should be a joint effort of BE-ICS and CMS
18
-
Inventory of WinCC OA Oracle archiving schemas at CERN
• Experience shows that schema upgrades are often problematic, mainly due to discrepancies in schema configuration, manual changes performed over the years and different migration paths
• In order to reduce the risk of failed schema upgrades, a campaign of schema inventorying and verification is needed
• We intend to develop automatic tools for that, but it is a non-trivial task
• The goal is to enter Run 3 with aligned schema versions and no discrepancies, fulfilling a long-standing requirement from JCOP
20
List of discrepancies to check
• Versions (schema/fwRDBAPI)
• Account privileges
• Schema configuration (ARC_CONFIG)
• Archive group configuration
• Existing archives
• Configuration of partitioning
• Indexes and constraints on tables
• Triggers
• Jobs
• PL/SQL packages and functions
• Inconsistencies in metadata history
• ...
-
Conclusions
• NextGen Archiver requires upgrade to WinCC OA 3.16 (or higher) and cannot be used in parallel with the RDB Archiving Manager
• RDB Archiving Manager and Value Archives are eventually going to be deprecated by ETM, sometime after the release of v2 of the NGA
• Migration and rollback paths are well understood, including the testing effort required to validate them
• We are aiming at CERN-wide alignment of schemas at the NGA-compatible version by the end of LS2, with all discrepancies and artifacts of previous upgrades fixed
• Many thanks to ALICE and protoDUNE for their involvement in the project as early adopters – others are encouraged to join the effort!
A lot of work remains to be done, but the development is on the right track
21
-
Thank you for your attention!
Any questions?
22
-
Backup slides
Query and response routing in local and distributed systems (without direct queries)
23
-
Message routing without direct queries – local system
24
Event
Manager
Data
Manager
NGA
Frontend
DIST Manager
NGA
Backend
CTRL/UI
Manager 1
CTRL/UI
Manager R
...
query
response
query
query
response
response
query
response
Local queries:
dist_1
-
Message routing without direct queries – DIST system
25
Event
Manager
Data
Manager
NGA
Frontend
DIST Manager
NGA
Backend
CTRL/UI
Manager 1
CTRL/UI
Manager R
...
query
response
DIST queries:
dist_1
Event
Manager
Data
Manager
DIST Manager
CTRL/UI
Manager 1
CTRL/UI
Manager R
...
dist_2
... dist_N
query
query
query
query
query
query response
response
response
response
response
response