TRANSCRIPT
Updates from Database Services at CERN
Andrei Dumitru, CERN IT Department / Database Services
Credit: Mariusz Piorkowski
Databases at CERN
• ~100 Oracle databases, most of them RAC
• Mostly NAS storage plus some SAN with ASM
• More than 500 TB of data files for production DBs in total

Example of critical production DBs:
• LHC logging database: ~170 TB, expected growth of up to ~70 TB/year

But also DBaaS, as single instances:
• MySQL open community databases
• PostgreSQL databases
• Oracle 11g
Accelerator logging
Millions of records/min; ~95% data reduction (see the filtering sketch below)
• Throughput: ~250,000 signals, ~15 data loading processes, ~5.5 billion records/day, ~275 GB/day (100 TB/year)
• Stored: ~1 million signals, ~300 data loading processes, ~4 billion records/day, ~160 GB/day (52 TB/year)
Credit: C. Roderick
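To make the ~95% reduction concrete: dropping samples that carry no new information is a standard way to reach reductions of this magnitude. The talk does not describe the actual loading processes, so the deadband technique and all names below are illustrative assumptions, not CERN's implementation:

```python
import random

def deadband_filter(samples, threshold):
    """Keep a sample only if it differs from the last *stored* value
    by more than `threshold`; unchanged readings are dropped."""
    stored = []
    last = None
    for ts, value in samples:
        if last is None or abs(value - last) > threshold:
            stored.append((ts, value))
            last = value
    return stored

# A slowly drifting, noisy signal sampled 10,000 times.
signal = [(t, 20.0 + 0.001 * t + random.gauss(0, 0.01))
          for t in range(10_000)]
kept = deadband_filter(signal, threshold=0.05)
print(f"stored {len(kept)} of {len(signal)} samples "
      f"({100 * (1 - len(kept) / len(signal)):.0f}% reduction)")
```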
Administrative systems
• All applications are based on Oracle as the database
• Oracle WebLogic Server manages numerous HR and administrative Java-based web applications used by CERN
• Java EE and APEX
• Oracle HR is E-Business Suite R12
Engineering & equipment data
• Managing a million components over a lifecycle of 30 years
• Integrated PLM platform built by linking together commercial systems

[Diagram: PLM chain linking 3D CAD (CATIA), design data management, manufacturing follow-up, installation follow-up, maintenance management, data publishing and workflow actions]
Credit: D. Widegren
Experiment systems

Online
• Data-taking operations
• Rely on a SCADA system to store and monitor detector parameters (temperatures, voltages, …)
• Up to 150,000 changes/second stored in Oracle databases

Offline
• Post data-taking analysis, reconstruction and reprocessing, logging of critical operations, …

Database replication:
• Oracle Streams, Oracle GoldenGate, active standby databases
New DB Services

QPSR
• Quench Protection System
• Will store ~150K rows/second (64 GB per redo log)
• 1,000,000 rows/second achieved during catch-up tests (see the sketch below)
• Need to keep data for a few days (~50 TB)
• Doubtful that the previous HW could handle that

SCADAR
• Consolidated WinCC/PVSS archive repository
• Will store ~50-60K rows/second (may increase in the future)
• Data retention varies depending on the application (from a few days to 5 years)
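Row rates in the 100K-1M/second range are generally only reachable with array (bulk) binds rather than row-by-row statements. A minimal sketch using the cx_Oracle driver; the connection string, table and column names are hypothetical, not the actual QPSR loader:

```python
import cx_Oracle
import datetime

# Hypothetical DSN, schema and table -- for illustration only.
conn = cx_Oracle.connect("loader/secret@qpsr-db.example.cern.ch/QPSR")
cur = conn.cursor()

def load_batch(rows):
    """Insert a batch of (signal_id, ts, value) tuples in a single
    round trip. Array binding via executemany() is what makes very
    high row rates feasible; row-by-row inserts would be orders of
    magnitude slower."""
    cur.executemany(
        "INSERT INTO quench_data (signal_id, ts, value) "
        "VALUES (:1, :2, :3)",
        rows)
    conn.commit()

# Example usage with a synthetic batch of 100,000 rows.
now = datetime.datetime.now()
batch = [(i % 1000, now, float(i)) for i in range(100_000)]
load_batch(batch)
```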
Service Lifecycle

[Diagram: service lifecycle repeating every ~3-4 years — installation of new systems, stable services, HW migration / SW upgrade, decommission]
Preparation for LHC Run 2

Requirement: changes had to fit the LHC schedule

New HW installation on critical power
• Decommissioning of some old HW
• Critical power move from the current location to a new location

Keep up with Oracle SW evolution

Applications' evolution - more resources needed

Integration with the Agile Infrastructure @CERN

LS1: no stop of DB services
Service Evolution during LS1

Hardware evolution
• New DB servers

Storage evolution
• New generation of storage servers

Refresh cycle of OS and OS-related software
• Puppet
• RHEL 6

Database software evolution
• Upgrade to a newer Oracle version
Oracle Real Application Clusters - Overview

[Diagram: example of a 2-node DB cluster. Each node runs an operating system, Oracle Clusterware and an Oracle RAC instance. The nodes are attached to the public network, connected to each other over a private network (a.k.a. the cluster interconnect), and reach the Oracle database on shared storage over a storage network.]
Our Deployment Model

Database clusters with RAC servers
• Running Red Hat Enterprise Linux

Storage
• NAS (Network-Attached Storage) from NetApp
• High-capacity SATA + SSD cache

Network
• 10 Gigabit Ethernet - for storage, interconnect and users

Number of nodes: 2-5
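For context, clients of such a cluster typically connect through a single cluster-wide address and service name, letting the listener balance sessions across the 2-5 nodes. A minimal sketch with cx_Oracle; the host, service name and credentials are hypothetical:

```python
import cx_Oracle

# Hypothetical cluster address and service name -- for illustration.
# A single client-facing address (e.g. a SCAN name) lets the listener
# route each session to one of the RAC instances, so node failures and
# rolling maintenance stay largely transparent to applications.
dsn = cx_Oracle.makedsn("dbcluster.example.cern.ch", 1521,
                        service_name="myapp_svc")
conn = cx_Oracle.connect("app_user", "app_password", dsn)
print(conn.version)  # server version reported by whichever node served us
```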
New Hardware

Consolidation of HW
• 100 production servers
• Dual 8-core Xeon E5-2650
• 128 GB / 256 GB RAM
• 3 x 10 Gb interfaces

Specific network requirements:
1. IP1 (cs network)
2. ATLAS pit
3. Technical network
4. Routed network accessible from outside of CERN
5. Non-routed network internal to CERN only

Credit: Paul Smith
Storage Evolution

                        NetApp FAS3240                NetApp FAS8060
NVRAM                   1.0 GB                        8.0 GB
System memory           8 GB                          64 GB
CPU                     1 x 64-bit 4-core 2.33 GHz    2 x 64-bit 8-core 2.10 GHz
SSD layer (maximum)     512 GB                        8 TB
Aggregate size          180 TB                        400 TB
Controller OS           Data ONTAP® 7-mode            Data ONTAP® C-mode*
                        (scaling up)                  (scaling out)

* Cluster made of 8 controllers, shared with other services.
Credit: Ruben Gaspar
Storage Evolution

Centrally managed storage
• Monitored: NetApp + home-made tools

Enables consolidation

Thin provisioning on file systems

Transparent volume move

More capacity for growth

More SSD => performance gains for the DB service
• ~2-3 times more overall performance

Credit: Ruben Gaspar
Advantage of the New Hardware

More memory & more CPU
• MEM: RAM 48 GB -> 128 GB / 256 GB
• DB cache: 20 GB -> 86 GB / 197 GB

Faster storage
• Storage cache
Preparation for LHC Run 2 - software

HW migration
• Production databases on Oracle 11.2.0.3 before LHC LS1
• All databases were upgraded and migrated to new hardware

Available software releases:

Oracle 11g - version 11.2.0.4
• Terminal patch set of Oracle 11g
• Extended support ends January 2018

Oracle 12c - versions 12.1.0.1 and 12.1.0.2
• First release of 12c and the subsequent patch set
• Users of 12.1.0.1 will have to upgrade to 12.1.0.2 or higher by 2016

No current Oracle version fits the entire LHC Run 2 well.
Consolidation

Schema-based consolidation
• Many applications share the same RAC cluster
• Consolidation per customer and/or functionality

Host-based consolidation
• Run different DB services on the same machine
• Support for different Oracle homes (versions) on the same host

RAC clusters
• Load balancing and the possibility to grow
• High availability: the cluster survives node failures
• Maintenance: scheduled rolling interventions
Replication

Disaster recovery: Data Guard databases in Wigner

Active Data Guard available to users for read-only operations

Streams to GoldenGate migration completed!
• Improved scalability - performance better than Streams
• ATLAS ONLINE - OFFLINE (condDB)
• ATLAS OFFLINE - Tier-1s: RAL, IN2P3, TRIUMF
• LHCb ONLINE - OFFLINE
Scalable Databases

Goal: an open and scalable analytics platform for the data currently stored in traditional databases
• LHC logging / control systems archives / other monitoring and auditing systems

Solution: a Hadoop cluster
• Shared nothing - scalable
• Open systems - many approaches to storing and processing the data

Conclusions
• Data processing with Hadoop scales out, no matter which engine you use
• Choosing the right data format for storing certain data is key to delivering high performance (see the sketch below)
All the details in “Evaluation of distributed open source solutions in CERN database use cases” by Kacper Surdy (Tuesday)
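As an illustration of the data-format point: columnar formats such as Parquet usually beat row-oriented text files for analytic scans, because queries read only the columns they touch. A minimal PySpark sketch; the paths and column names are hypothetical, not the setup evaluated in the referenced talk:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-comparison").getOrCreate()

# Row-oriented text input (e.g. a CSV export of logging records).
df = spark.read.csv("hdfs:///demo/logging.csv",
                    header=True, inferSchema=True)

# Rewrite as Parquet: columnar, compressed, with min/max statistics
# that let engines skip irrelevant row groups during scans.
df.write.mode("overwrite").parquet("hdfs:///demo/logging.parquet")

# An analytic query on the Parquet copy reads only the columns it
# touches (here: signal_id and value), not entire rows.
spark.read.parquet("hdfs:///demo/logging.parquet") \
     .groupBy("signal_id").avg("value").show()
```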
Database on Demand
https://cern.ch/DBOnDemand
Database on Demand

Covers a demand from the CERN community not addressed by the Oracle service
• Different RDBMS: MySQL, PostgreSQL and Oracle

Follows a DBaaS paradigm

Makes users database owners - full DBA privileges

No access to the underlying hardware

No DBA support or application support

No vendor support (except for Oracle)

Foreseen as a single-instance service

Provides tools to manage DBA actions: configuration, start/stop, upgrades, backups & recoveries, instance monitoring
Database on Demand

[Chart: evolution of the number of MySQL, Oracle and PostgreSQL instances in the DBoD service]
[Chart: Database on Demand instances per database management system]
Enhanced Monitoring now available
Acknowledgements
Work presented here on behalf of the CERN Database Services group.

In particular, key contributions to this presentation came from: Marcin Blaszczyk, Ruben Gaspar, Zbigniew Baranowski, Lorena Lobato Pardavila, Luca Canali, Eric Grancher.