TRANSCRIPT
Updates from Database Services at CERN
Andrei Dumitru, CERN IT Department / Database Services
Credit: Mariusz Piorkowski
Databases at CERN
• ~100 Oracle databases, most of them RAC
• Mostly NAS storage plus some SAN with ASM
• More than 500 TB of data files for production DBs in total

Example of critical production DBs:
• LHC logging database: ~170 TB, expected growth of up to ~70 TB/year

But also DBaaS, as single instances:
• MySQL open community databases
• PostgreSQL databases
• Oracle 11g
Accelerator logging
Millions of records/min; ~95% data reduction (see the filtering sketch below)
• Throughput: ~250,000 signals, ~15 data loading processes, ~5.5 billion records/day, ~275 GB/day (100 TB/year)
• Stored: ~1 million signals, ~300 data loading processes, ~4 billion records/day, ~160 GB/day (52 TB/year)
Credit: C. Roderick
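To make the ~95% reduction concrete: dropping samples that carry no new information is a standard way to reach reductions of this magnitude. The talk does not describe the actual loading processes, so the deadband technique and all names below are illustrative assumptions, not CERN's implementation:

```python
import random

def deadband_filter(samples, threshold):
    """Keep a sample only if it differs from the last *stored* value
    by more than `threshold`; unchanged readings are dropped."""
    stored = []
    last = None
    for ts, value in samples:
        if last is None or abs(value - last) > threshold:
            stored.append((ts, value))
            last = value
    return stored

# A slowly drifting, noisy signal sampled 10,000 times.
signal = [(t, 20.0 + 0.001 * t + random.gauss(0, 0.01))
          for t in range(10_000)]
kept = deadband_filter(signal, threshold=0.05)
print(f"stored {len(kept)} of {len(signal)} samples "
      f"({100 * (1 - len(kept) / len(signal)):.0f}% reduction)")
```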
Administrative systems
• All applications are based on Oracle as the database
• Oracle WebLogic Server manages numerous HR and administrative Java-based web applications used by CERN
• Java EE and APEX
• Oracle HR is E-Business Suite R12
Engineering & equipment data
• Managing a million components over a lifecycle of 30 years
• Integrated PLM platform built by linking together commercial systems

[Diagram: PLM chain linking 3D CAD (CATIA), design data management, manufacturing follow-up, installation follow-up, maintenance management, data publishing and workflow actions]
Credit: D. Widegren
Experiment systems

Online
• Data-taking operations
• Rely on a SCADA system to store and monitor detector parameters (temperatures, voltages, …)
• Up to 150,000 changes/second stored in Oracle databases

Offline
• Post data-taking analysis, reconstruction and reprocessing, logging of critical operations, …

Database replication:
• Oracle Streams, Oracle GoldenGate, active standby databases
New DB Services

QPSR
• Quench Protection System
• Will store ~150K rows/second (64 GB per redo log)
• 1,000,000 rows/second achieved during catch-up tests (see the sketch below)
• Need to keep data for a few days (~50 TB)
• Doubtful that the previous HW could handle that

SCADAR
• Consolidated WinCC/PVSS archive repository
• Will store ~50-60K rows/second (may increase in the future)
• Data retention varies depending on the application (from a few days to 5 years)
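Row rates in the 100K-1M/second range are generally only reachable with array (bulk) binds rather than row-by-row statements. A minimal sketch using the cx_Oracle driver; the connection string, table and column names are hypothetical, not the actual QPSR loader:

```python
import cx_Oracle
import datetime

# Hypothetical DSN, schema and table -- for illustration only.
conn = cx_Oracle.connect("loader/secret@qpsr-db.example.cern.ch/QPSR")
cur = conn.cursor()

def load_batch(rows):
    """Insert a batch of (signal_id, ts, value) tuples in a single
    round trip. Array binding via executemany() is what makes very
    high row rates feasible; row-by-row inserts would be orders of
    magnitude slower."""
    cur.executemany(
        "INSERT INTO quench_data (signal_id, ts, value) "
        "VALUES (:1, :2, :3)",
        rows)
    conn.commit()

# Example usage with a synthetic batch of 100,000 rows.
now = datetime.datetime.now()
batch = [(i % 1000, now, float(i)) for i in range(100_000)]
load_batch(batch)
```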
Service Lifecycle

[Diagram: service lifecycle repeating every ~3-4 years — installation of new systems, stable services, HW migration / SW upgrade, decommission]
Preparation for LHC Run 2

Requirement: changes had to fit the LHC schedule

New HW installation on critical power
• Decommissioning of some old HW
• Critical power move from the current location to a new location

Keep up with Oracle SW evolution

Applications' evolution - more resources needed

Integration with the Agile Infrastructure @CERN

LS1: no stop of DB services
Service Evolution during LS1

Hardware evolution
• New DB servers

Storage evolution
• New generation of storage servers

Refresh cycle of OS and OS-related software
• Puppet
• RHEL 6

Database software evolution
• Upgrade to a newer Oracle version
Oracle Real Application Clusters - Overview

[Diagram: example of a 2-node DB cluster. Each node runs an operating system, Oracle Clusterware and an Oracle RAC instance. The nodes are attached to the public network, connected to each other over a private network (a.k.a. the cluster interconnect), and reach the Oracle database on shared storage over a storage network.]
Our Deployment Model

Database clusters with RAC servers
• Running Red Hat Enterprise Linux

Storage
• NAS (Network-Attached Storage) from NetApp
• High-capacity SATA + SSD cache

Network
• 10 Gigabit Ethernet - for storage, interconnect and users

Number of nodes: 2-5
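For context, clients of such a cluster typically connect through a single cluster-wide address and service name, letting the listener balance sessions across the 2-5 nodes. A minimal sketch with cx_Oracle; the host, service name and credentials are hypothetical:

```python
import cx_Oracle

# Hypothetical cluster address and service name -- for illustration.
# A single client-facing address (e.g. a SCAN name) lets the listener
# route each session to one of the RAC instances, so node failures and
# rolling maintenance stay largely transparent to applications.
dsn = cx_Oracle.makedsn("dbcluster.example.cern.ch", 1521,
                        service_name="myapp_svc")
conn = cx_Oracle.connect("app_user", "app_password", dsn)
print(conn.version)  # server version reported by whichever node served us
```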
New Hardware

Consolidation of HW
• 100 production servers
• Dual 8-core Xeon E5-2650
• 128 GB / 256 GB RAM
• 3 x 10 Gb interfaces

Specific network requirements:
1. IP1 (cs network)
2. ATLAS pit
3. Technical network
4. Routed network accessible from outside of CERN
5. Non-routed network internal to CERN only

Credit: Paul Smith
Storage Evolution

                        NetApp FAS3240                NetApp FAS8060
NVRAM                   1.0 GB                        8.0 GB
System memory           8 GB                          64 GB
CPU                     1 x 64-bit 4-core 2.33 GHz    2 x 64-bit 8-core 2.10 GHz
SSD layer (maximum)     512 GB                        8 TB
Aggregate size          180 TB                        400 TB
Controller OS           Data ONTAP® 7-mode            Data ONTAP® C-mode*
                        (scaling up)                  (scaling out)

* Cluster made of 8 controllers, shared with other services.
Credit: Ruben Gaspar
Storage Evolution

Centrally managed storage
• Monitored: NetApp + home-made tools

Enables consolidation

Thin provisioning on file systems

Transparent volume move

More capacity for growth

More SSD => performance gains for the DB service
• ~2-3 times more overall performance

Credit: Ruben Gaspar
Advantage of the New Hardware

More memory & more CPU
• MEM: RAM 48 GB -> 128 GB / 256 GB
• DB cache: 20 GB -> 86 GB / 197 GB

Faster storage
• Storage cache
Preparation for LHC Run 2 - software

HW migration
• Production databases on Oracle 11.2.0.3 before LHC LS1
• All databases were upgraded and migrated to new hardware

Available software releases:

Oracle 11g - version 11.2.0.4
• Terminal patch set of Oracle 11g
• Extended support ends January 2018

Oracle 12c - versions 12.1.0.1 and 12.1.0.2
• First release of 12c and the subsequent patch set
• Users of 12.1.0.1 will have to upgrade to 12.1.0.2 or higher by 2016

No current Oracle version fits the entire LHC Run 2 well.
Consolidation

Schema-based consolidation
• Many applications share the same RAC cluster
• Consolidation per customer and/or functionality

Host-based consolidation
• Run different DB services on the same machine
• Support for different Oracle homes (versions) on the same host

RAC clusters
• Load balancing and the possibility to grow
• High availability: the cluster survives node failures
• Maintenance: scheduled rolling interventions
Replication

Disaster recovery: Data Guard databases in Wigner

Active Data Guard available to users for read-only operations

Streams to GoldenGate migration completed!
• Improved scalability - performance better than Streams
• ATLAS ONLINE - OFFLINE (condDB)
• ATLAS OFFLINE - Tier-1s: RAL, IN2P3, TRIUMF
• LHCb ONLINE - OFFLINE
Scalable Databases

Goal: an open and scalable analytics platform for the data currently stored in traditional databases
• LHC logging / control systems archives / other monitoring and auditing systems

Solution: a Hadoop cluster
• Shared nothing - scalable
• Open systems - many approaches to storing and processing the data

Conclusions
• Data processing with Hadoop scales out, no matter which engine you use
• Choosing the right data format for storing certain data is key to delivering high performance (see the sketch below)
All the details in “Evaluation of distributed open source solutions in CERN database use cases” by Kacper Surdy (Tuesday)
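As an illustration of the data-format point: columnar formats such as Parquet usually beat row-oriented text files for analytic scans, because queries read only the columns they touch. A minimal PySpark sketch; the paths and column names are hypothetical, not the setup evaluated in the referenced talk:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-comparison").getOrCreate()

# Row-oriented text input (e.g. a CSV export of logging records).
df = spark.read.csv("hdfs:///demo/logging.csv",
                    header=True, inferSchema=True)

# Rewrite as Parquet: columnar, compressed, with min/max statistics
# that let engines skip irrelevant row groups during scans.
df.write.mode("overwrite").parquet("hdfs:///demo/logging.parquet")

# An analytic query on the Parquet copy reads only the columns it
# touches (here: signal_id and value), not entire rows.
spark.read.parquet("hdfs:///demo/logging.parquet") \
     .groupBy("signal_id").avg("value").show()
```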
Database on Demand
https://cern.ch/DBOnDemand
Database on Demand

Covers a demand from the CERN community not addressed by the Oracle service
• Different RDBMS: MySQL, PostgreSQL and Oracle

Follows a DBaaS paradigm

Makes users database owners - full DBA privileges

No access to the underlying hardware

No DBA support or application support

No vendor support (except for Oracle)

Foreseen as a single-instance service

Provides tools to manage DBA actions: configuration, start/stop, upgrades, backups & recoveries, instance monitoring
Database on Demand

[Chart: evolution of the number of MySQL, Oracle and PostgreSQL instances in the DBoD service]
[Chart: Database on Demand instances per database management system]
Enhanced Monitoring now available
Acknowledgements
Work presented here on behalf of the CERN Database Services group.

In particular, key contributions to this presentation came from: Marcin Blaszczyk, Ruben Gaspar, Zbigniew Baranowski, Lorena Lobato Pardavila, Luca Canali, Eric Grancher.