A Data Lake and a Data Lab to Optimize Operations and Safety Within a Nuclear Fleet
Hadoop Summit 2016, San José, June 30th
Marie-Luce PICARD, EDF R&D
Jean-Marc RANGOD, EDF-DPNT
Christophe SALPERWYCK, EDF R&D
Special thanks to Raphaël QUERCIA EDF-DTG, Carole MAI and Amandine PIERROT EDF R&D
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
EDF: A GLOBAL LEADER IN ELECTRICITY
Electricity generation: 623.5 TWh
All electricity-related activities:
• Generation
• Transmission & Distribution
• Trading and Sales & Marketing
• Energy services
Key figures*:
• €72.9 billion in sales
• 38.5 million customers
• 158,161 employees worldwide
• 84.7% of generation does not emit CO2
2014 investments: €4.5 billion
*as of 2015
EDF: AN EFFICIENT, RESPONSIBLE ELECTRICITY COMPANY AND THE CHAMPION OF LOW-CARBON GROWTH
NUCLEAR
WORLD’S LEADING OPERATOR, EXCELLENT PERFORMANCE IN FRANCE
• 72.9 GW installed capacity, 54% of the Group’s net generation capacity
• 477.7 TWh generated, 77% of the Group’s output
• 58 reactors operated in France, 15 in the UK
• 3 EPR under construction: 1 in Flamanville (France), 2 in Taishan (China)
• 2 EPR in project phase
• OSART safety audit: 17 best practices identified by IAEA
• France: best generation performance for six years
• UK: world record for safety in the workplace
• China: strengthened cooperation agreement with CNNC
R&D KEY FIGURES: EDF LAB PARIS-SACLAY
• Scientific partnerships with actors of Paris-Saclay
• 8 research departments
• 4 exceptional buildings
• 1 outstanding test hall
• Unique equipment, innovative communication tools
• Diverse areas of expertise
• 1500 workstations
• Plenty of collaborative spaces
Main Big Data related challenges for EDF
Power generation
• Process monitoring and condition-based maintenance from sensors
• Power generation forecasting for renewables
Energy management
• Load forecasting
• Balancing and optimizing generation and consumption (using smart metering information, including renewables)
Electrical networks
• Smart Grid operations (local)
• Condition-based maintenance
Customers and sales
• New services to customers using smart-metering data
• Smart Homes, Smart Building, Smart Cities management related to energy
Operations and maintenance of the nuclear fleet
The maintenance policy of the EDF generation fleet is optimized to ensure the reliability and safety of equipment and systems while strengthening our competitiveness:
• Better diagnosis, improved performance and availability
• Better use of data and documents, so far stored in data silos
More globally, the IT teams and projects aim to:
• Strengthen the performance of operations and maintenance through a global fleet approach
• Simplify the architecture of the Industrial Information System
• Improve and develop the way we use our data
• Accumulate and archive data over time
… while reducing costs
Voluminous and heterogeneous data … stored in data silos
Source: Wikipedia
One database per nuclear site, gathering data from sensors. Use of data historians.
Focus on data:
High volume:
• Data is stored for up to 40 to 60 years (the lifetime of the plant)
• SCADA data can be sampled every 20 to 40 ms (but mostly every few seconds)
• Around 10,000 sensors per plant
Variety:
• Data is heterogeneous: time series, images, documents
• Various data sources
The current systems (historians) do not allow many concurrent accesses, and their SLAs are poor
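To give a sense of scale, the figures on this slide can be turned into a rough sample count. The sketch below assumes every sensor is sampled at a single fixed period, which the slide does not claim; the 5-second period is an illustrative stand-in for "a few seconds", and the function name is hypothetical.

```python
# Back-of-the-envelope sample counts from the slide's figures.
# Assumption: all sensors share one fixed sampling period (not stated in the talk).

MS_PER_YEAR = 365 * 24 * 3600 * 1000  # non-leap year, in milliseconds


def points_per_plant_year(sensors: int, period_ms: int) -> int:
    """Number of samples one plant would produce in a year at a fixed period."""
    return sensors * (MS_PER_YEAR // period_ms)


# 10,000 sensors per plant, at the sampling periods quoted on the slide
# plus an assumed 5 s period.
for label, period_ms in [("20 ms", 20), ("40 ms", 40), ("5 s (assumed)", 5000)]:
    n = points_per_plant_year(10_000, period_ms)
    print(f"{label:>14}: {n:.3e} points per plant per year")
```

Even at the assumed 5-second period this is tens of billions of points per plant per year, which is consistent with the slide's concern about volume.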
A Data Lake for the nuclear fleet
ESPADON: the Data Lake for the nuclear fleet
© M. Caraveo, Hadoop cluster NOE data center
A data lake for the nuclear fleet: big picture
[Figure: architecture overview. Data sources: Historian - SCADA, files (chemical information), files (dosimetry), … Storage: Hadoop cluster (ESPADON Data Lake). Access: ODBC, web services. Consumers: e-monitoring application, visualization, interactive queries and reporting, reports.]
Zoom on data
• 4 generations of plants, but a high level of normalization of data and sensors (for example, use of trigrams for identification of elementary systems)
• Two main types of sensors: ANA (analog) and TOR (state events)
Volume
• For the POC, 10 plants, 2 years: about 20 billion points
• Target (59 plants): 15 TB of data (all plants, whole lifecycle)
Time series
Metric, global   Date                       Value   Quality
BU2ABP177MT-     2015-04-30T22:05:00.000Z   156.6   Good/M
BU2ABP177MT-     2015-04-30T22:06:00.000Z   156.4   Good/M
BU2ABP177MT-     2015-04-30T22:07:00.000Z   156.2   Good/M
BU2ABP177MT-     2015-04-30T22:08:00.000Z   156.0   Good
BU2ABP177MT-     2015-04-30T22:09:00.000Z   156.2   Good/M
BU2ABP177MT-     2015-04-30T22:10:00.000Z   156.4   Good/M
BU2ABP177MT-     2015-04-30T22:12:00.000Z   156.7   Good/M
BU2ABP177MT-     2015-04-30T22:14:00.000Z   157.1   Good
BU2ABP177MT-     2015-04-30T22:15:00.000Z   157.3   Good
BU2ABP177MT-     2015-04-30T22:16:00.000Z   157.5   Good
BU2ABP177MT-     2015-04-30T22:19:00.000Z   157.3   Good/M
BU2ABP177MT-     2015-04-30T22:20:00.000Z   157.1   Good/M
BU2ABP177MT-     2015-04-30T22:21:00.000Z   157.3   Good/M
BU2ABP177MT-     2015-04-30T22:22:00.000Z   157.1   Good/M
BU2ABP177MT-     2015-04-30T22:24:00.000Z   156.9   Good/M
BU2ABP177MT-     2015-04-30T22:27:00.000Z   157.1   Good/M
BU2ABP177MT-     2015-04-30T22:28:00.000Z   157.3   Good/M
BU2ABP177MT-     2015-04-30T22:29:00.000Z   157.5   Good/M
BU2ABP177MT-     2015-04-30T22:30:00.000Z   157.7   Good/M
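The sample above is not evenly spaced (for instance, there is no point at 22:11). A minimal sketch of how such rows could be parsed and checked for gaps follows; the one-minute nominal spacing is an assumption inferred from the sample, and `parse_row` and `find_gaps` are hypothetical helper names, not part of the ESPADON code.

```python
from datetime import datetime, timedelta

# First rows of the sample shown on the slide: metric, timestamp, value, quality.
ROWS = """\
BU2ABP177MT- 2015-04-30T22:05:00.000Z 156.6 Good/M
BU2ABP177MT- 2015-04-30T22:06:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:07:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:08:00.000Z 156.0 Good
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
"""


def parse_row(line):
    """Split one whitespace-separated row into typed fields."""
    metric, ts, value, quality = line.split()
    when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return metric, when, float(value), quality


def find_gaps(records, expected=timedelta(minutes=1)):
    """Return (previous, current) timestamp pairs further apart than `expected`."""
    times = [r[1] for r in records]
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > expected]


records = [parse_row(line) for line in ROWS.splitlines()]
gaps = find_gaps(records)  # assumed nominal spacing: one minute
for before, after in gaps:
    print(f"gap between {before:%H:%M} and {after:%H:%M}")
```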
Data model
Use of HBASE and PHOENIX
• Distributed key/value store
• Allows model updates (evolving normalization requirements, new indicators… new plants)
• Phoenix for SQL compliance + BI tools
Tables
• 3 tables: DDT, ANA, TOR
• Rowkey: <sensorid, timestamp> (queries mainly consider one or several sensors for a period of time)
• Sequential storage; split into HFiles and HRegions according to the plant unit
ANA table
Key                                 ColumnFamily   Column   Value          Phoenix type
m (concat(metriqueid, timestamp))   0              v        H_ValeurANA    Float
                                                   q        H_QualitéANA   Char(10)
                                                   n        H_NiveauxANA   Varchar(10)
TOR table
Key                                 ColumnFamily   Column   Value          Phoenix type
m (concat(metriqueid, timestamp))   0              v        H_ValeurTOR    Varchar(10)
                                                   q        H_QualiteTOR   Char(10)
                                                   n        H_NiveauxTOR   Varchar(10)
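HBase keeps rows sorted by the bytes of their key, so a key built as sensor id followed by timestamp keeps all points of one sensor contiguous and in time order. The sketch below illustrates that idea only; the fixed 16-byte id width and the 8-byte big-endian timestamp are assumptions made for the example, not the encoding Phoenix or the ESPADON tables actually use.

```python
import struct

SENSOR_WIDTH = 16  # assumed fixed width for the sensor identifier (hypothetical)


def make_rowkey(sensor_id: str, epoch_ms: int) -> bytes:
    """Concatenate a zero-padded sensor id and a big-endian 64-bit timestamp."""
    sid = sensor_id.encode("ascii")
    if len(sid) > SENSOR_WIDTH:
        raise ValueError("sensor id too long")
    return sid.ljust(SENSOR_WIDTH, b"\x00") + struct.pack(">Q", epoch_ms)


def scan_range(sensor_id: str, start_ms: int, stop_ms: int):
    """Start key (inclusive) and stop key (exclusive) for one sensor and period."""
    return make_rowkey(sensor_id, start_ms), make_rowkey(sensor_id, stop_ms)


# Byte-wise sorting groups keys by sensor first, then by time.
keys = sorted(make_rowkey(s, t) for s, t in [("B", 3), ("A", 20), ("A", 5)])
lo, hi = scan_range("A", 0, 10)
selected = [k for k in keys if lo <= k < hi]  # only ("A", 5) falls in the range
```

With this layout, a query for one sensor over a period of time becomes a single contiguous range scan, which matches the access pattern described on the slide.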
Validation and performance evaluation
POC validation
• Upload of historical data; queries / analyses
• Existing functions: viz, reports, services
• Data injection: SCADA for the whole fleet, integration of other sources of data
Results
• 6 weeks (estimated) needed to upload historical data from 59 plants
• Queries for validating the model:
  • Use of JMeter for simulating load
  • With or without insertion workload
  • Roughly under 1 second to draw a curve for a selected month
• Integration of an existing GUI for viz (done within a few days)
• Validation of specific calculations within reports
• ODBC link for a specific e-monitoring application
• Integration of various sources of (structured) data into the data lake
• ‘Real-time’ insertion of data (micro-batch):
  • Up to 2M points / s
  • Very low latency between insertion and availability (< 10 s)
Phoenix query (ANA)
    SELECT MIN(v), MAX(v),
           FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
           LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
           TO_CHAR(ts, 'dd') as day, TO_CHAR(ts, 'HH') as hour,
           TO_CHAR(ts, 'mm') as minute, count(*) as cnt
    FROM ORLI_ANA
    WHERE m = ? AND
          ts > current_time()-1 AND // last 24h
          ts < current_time()
    GROUP BY day, hour, minute
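The Phoenix query groups the points of one metric by day, hour and minute and keeps the minimum, maximum, first value, last value and count of each group. A plain-Python sketch of the same aggregation is given here for readers who want to check the logic outside Phoenix; `aggregate_per_minute` is a hypothetical name and the input points are made up.

```python
from datetime import datetime


def aggregate_per_minute(points):
    """Group (datetime, value) points by (day, hour, minute).

    Returns a dict mapping each group to its min, max, first, last and count,
    where first/last follow ascending timestamp order as in the query.
    """
    buckets = {}
    for ts, v in sorted(points):
        key = (ts.day, ts.hour, ts.minute)
        b = buckets.get(key)
        if b is None:
            buckets[key] = {"min": v, "max": v, "first": v, "last": v, "cnt": 1}
        else:
            b["min"] = min(b["min"], v)
            b["max"] = max(b["max"], v)
            b["last"] = v
            b["cnt"] += 1
    return buckets


# Made-up points, deliberately out of order.
points = [
    (datetime(2015, 4, 30, 22, 5, 40), 3.0),
    (datetime(2015, 4, 30, 22, 5, 10), 1.0),
    (datetime(2015, 4, 30, 22, 5, 20), 5.0),
    (datetime(2015, 4, 30, 22, 6, 0), 2.0),
]
result = aggregate_per_minute(points)
```

Reducing each minute to five numbers is what keeps a curve cheap to draw: the client receives one summary row per minute instead of every raw point.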
Added value of data science algorithms on heterogeneous data: operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
Active and reactive power are indicators of the stresses on alternators, which affect their wear
• ~ 50 plants
• 20 years of data
• 10 min interval data
• Phoenix queries make it possible to select plants and periods of time
• Compute and show reactive power per day or per hour of the day
• More detailed analysis
• Fleet level analysis
• Interactive queries
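As an illustration of the "per hour of the day" view mentioned above, the following sketch averages 10-minute reactive power samples by hour. The series is synthetic (the values are made up) and `mean_by_hour_of_day` is a hypothetical helper, not the analysis code used in the study.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def mean_by_hour_of_day(samples):
    """samples: iterable of (datetime, reactive_power). Returns {hour: mean}."""
    by_hour = defaultdict(list)
    for ts, q in samples:
        by_hour[ts.hour].append(q)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Synthetic 10-minute series over two days (144 samples per day).
start = datetime(2015, 1, 1)
samples = [(start + timedelta(minutes=10 * i), float(i % 6)) for i in range(2 * 144)]
profile = mean_by_hour_of_day(samples)
```

The same grouping with `ts.date()` as the key would give the per-day view.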
Added value of data science algorithms on heterogeneous data: Operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
Monitoring and control of contractual agreements when network frequency varies (plants have to contribute to the global balance)
• Pattern matching
• Response time for different plants
• Different levels of analysis : by plant, by generation, global
• Generic approach implemented for any kind of patterns
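One simple way to picture the "pattern matching" and "response time" items is to look for a frequency drop and measure how long a plant takes to raise its output. The sketch below is only that picture: the 49.9 Hz threshold, the 10 MW minimum increase and the function name are all assumptions for the example, and the talk does not describe the actual patterns or thresholds used.

```python
def response_time(freq, power, f_threshold=49.9, min_increase=10.0):
    """Seconds between a frequency drop and the plant's power response.

    freq, power: lists of (t_seconds, value) sampled at the same instants.
    The event is the first sample with frequency below `f_threshold`; the
    response is the first later sample where power exceeds its level at the
    event by at least `min_increase`. Returns None if either is missing.
    """
    event_idx = next((i for i, (_, f) in enumerate(freq) if f < f_threshold), None)
    if event_idx is None:
        return None
    t_event = freq[event_idx][0]
    baseline = power[event_idx][1]
    for t, p in power[event_idx:]:
        if p - baseline >= min_increase:
            return t - t_event
    return None


# Made-up signals: one frequency dip, one plant that responds, one that does not.
freq = [(0, 50.0), (1, 50.0), (2, 49.85), (3, 49.85), (4, 49.9), (5, 49.95)]
plants = {
    "plant_a": [(0, 900.0), (1, 900.0), (2, 900.0), (3, 905.0), (4, 912.0), (5, 915.0)],
    "plant_b": [(t, 900.0) for t in range(6)],
}
times = {name: response_time(freq, power) for name, power in plants.items()}
```

Collecting the result per plant, as in `times`, is what allows the comparison by plant, by generation or fleet-wide mentioned on the slide.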
Added value of data science algorithms on heterogeneous data
Prediction of plant cooling as a function of the quality of the water entering the plants
• Correlations?
• Depending on the plant
• Use of generalized additive models (GAM)
• Integration of two internal sources + external data
• Better understanding
• // Work in progress //
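A generalized additive model explains a response as a sum of smooth functions of each input, y ≈ α + f1(x1) + f2(x2). The sketch below fits such an additive structure by backfitting, using a crude piecewise-constant (binned mean) smoother in place of the spline smoothers a real GAM library would use. It is not the authors' model: the inputs x1 and x2 stand for hypothetical water-quality variables and the data are synthetic.

```python
from statistics import mean


def bin_smoother(x, r, n_bins):
    """Piecewise-constant smoother: mean of r within equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when x is constant
    idx = [min(int((xi - lo) / width), n_bins - 1) for xi in x]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i, ri in zip(idx, r):
        sums[i] += ri
        counts[i] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [means[i] for i in idx]


def fit_additive(x1, x2, y, n_bins=4, n_iter=20):
    """Backfitting for y ~ alpha + f1(x1) + f2(x2), evaluated at the data points."""
    alpha = mean(y)
    f1, f2 = [0.0] * len(y), [0.0] * len(y)
    for _ in range(n_iter):
        # Fit f1 to what alpha and f2 leave unexplained, then center it.
        f1 = bin_smoother(x1, [yi - alpha - b for yi, b in zip(y, f2)], n_bins)
        m1 = mean(f1)
        f1 = [v - m1 for v in f1]
        # Same for f2 against alpha and the updated f1.
        f2 = bin_smoother(x2, [yi - alpha - a for yi, a in zip(y, f1)], n_bins)
        m2 = mean(f2)
        f2 = [v - m2 for v in f2]
    return alpha, f1, f2


# Synthetic data with a known additive structure on a 4 x 4 grid.
g1, g2 = [-3.0, -1.0, 1.0, 3.0], [2.0, -2.0, 1.0, -1.0]
x1 = [float(i) for i in range(4) for _ in range(4)]
x2 = [float(j) for _ in range(4) for j in range(4)]
y = [10.0 + g1[int(a)] + g2[int(b)] for a, b in zip(x1, x2)]
alpha, f1, f2 = fit_additive(x1, x2, y)
fitted = [alpha + a + b for a, b in zip(f1, f2)]
```

The fitted components `f1` and `f2` can be inspected separately, which is the main appeal of additive models when the goal is a better understanding of each factor's effect.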
Integration of data science and visualization: architecture
[Figure: Hadoop cluster, REST web service (VM), browser]
Integration of data science: a global approach
Pre-processing
• Data quality
• Sampling
• Synchronization
• …
Selection and queries
• Threshold
• Pattern matching
• Period of time
• …
Analysis and data science
• Reporting
• Exploratory analysis (distribution …)
• Modelling
• …
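The three stages above can be sketched as a small chain of functions: resample and synchronize two series (pre-processing), keep the rows above a threshold (selection), then summarize the distribution (analysis). All function names, the 60-second step and the data are hypothetical; this only illustrates how the stages compose.

```python
from statistics import mean, median


def resample(series, step):
    """Pre-processing: average (t, value) points into fixed buckets of width `step`."""
    buckets = {}
    for t, v in series:
        buckets.setdefault(t - t % step, []).append(v)
    return sorted((t, mean(vs)) for t, vs in buckets.items())


def synchronize(a, b):
    """Pre-processing: keep only the timestamps present in both resampled series."""
    db = dict(b)
    return [(t, va, db[t]) for t, va in a if t in db]


def select_above(rows, threshold):
    """Selection: keep rows whose first signal exceeds a threshold."""
    return [row for row in rows if row[1] > threshold]


def summarize(values):
    """Analysis: a few distribution statistics for exploratory reporting."""
    return {"n": len(values), "min": min(values),
            "median": median(values), "max": max(values)}


# Two made-up series with timestamps in seconds, sampled at different instants.
a = [(0, 1.0), (30, 3.0), (60, 5.0), (120, 9.0)]
b = [(0, 10.0), (60, 20.0), (90, 40.0), (180, 1.0)]
rows = synchronize(resample(a, 60), resample(b, 60))
selected = select_above(rows, 3.0)
stats = summarize([row[2] for row in rows])
```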
A Data Lab in progress: a team, an approach … and some questions
Objectives:
• Derive value from data analytics
Issues:
• Skills and organization (between entities)
• Architecture:
  • Operational Hadoop cluster and loads (use of a multitenant enterprise cluster)
  • Other loads (data science)
  • Data prep within Hadoop + edge machine for data science (Spark, R, Python)
• How to quantify value
• Development and maintenance costs
• How to industrialize
Source: Xebia
Takeaways
A Data Lake for our nuclear fleet
• In progress: industrialization and decommissioning of Historian applications
• Great reduction of licensing costs
A Data Lab under construction
• POCs showing the added value of data science algorithms, predictive maintenance
• In the context of fleet renovation for plant life extension (major overhaul program): operations & maintenance, generation cost optimization
• Remaining issues: skills, organization, technical architecture, quantifying value
Perspectives and technical issues:
• Data lakes and labs for other fleets (thermal plants, hydro, renewables)
• Scalable time-series analytics (synchronization, missing data …)
• Handling heterogeneous data (textual, images, graphs …)
• IoT platform