NXCALS development and its Python API Nikolay Tsvetkov BE-CO-DS

Post on 04-Jun-2020

Page 1: NXCALS development and its Python API · Data Ingestion Java API ready We store data from CMW devices Data Access API with Spark first version ready Any Spark analysis on top of Spark

NXCALS development and its Python API

Nikolay Tsvetkov

BE-CO-DS

Page 2

NXCALS Architecture

[Architecture diagram. Stages: Data Ingestion, Storage, Data Extraction. Labelled components: CCDB, Log. Proc. (two instances), Kafka, Gobblin, HDFS, HBase, Compactor, Schema PartitionProvider, Data Access API, other tools, APIs, etc.]

21/04/2017 BE-CO 2

Page 3

NXCALS Data Partitioning

[Diagram: Data Ingestion, Storage (HDFS, Compactor), Data Extraction]

HDFS: system/classifier/schema/date

• Data stored in Parquet file format of records {f1, f2, …, fn}
• Accepts dynamic records of ANY content
• Represents a change of “state” in time for some “entities”
• Schema per entity CAN change over time
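The system/classifier/schema/date layout and the note that a schema per entity can change over time can be illustrated with a small sketch. This is not NXCALS code: the `SchemaRegistry` class, the `partition_path` helper, the classifier value `BLM1_Acquisition`, and the idea of numbering schemas in order of first appearance are all assumptions made for illustration.

```python
from datetime import datetime, timezone


class SchemaRegistry:
    """Hypothetical registry: gives each distinct record layout a small integer id."""

    def __init__(self):
        self._ids = {}

    def id_for(self, record):
        # A record's "schema" here is just its field names and Python value types.
        signature = tuple(sorted((name, type(value).__name__)
                                 for name, value in record.items()))
        # First time a layout is seen it gets the next free id.
        return self._ids.setdefault(signature, len(self._ids) + 1)


def partition_path(system, classifier, schema_id, timestamp):
    """Build a system/classifier/schema/date style path (illustrative only)."""
    date = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return "/".join([system, classifier, str(schema_id), date])


registry = SchemaRegistry()
day = datetime(2017, 2, 22, tzinfo=timezone.utc)
records = [
    {"acqStamp": 1, "field1": 0.5},
    {"acqStamp": 2, "field1": 0.7},
    {"acqStamp": 3, "field1": 0.9, "field2": 123},  # the entity gained a field
]
paths = [partition_path("CMW", "BLM1_Acquisition", registry.id_for(r), day)
         for r in records]
```

In this sketch the first two records share one directory, while the third, whose layout differs, lands in a separate schema directory for the same day.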

Page 4

NXCALS Data Access API

[Diagram: the Data Extraction side of the architecture. Labelled components: Storage (HDFS, HBase, Compactor), Schema PartitionProvider, Data Access API, other tools, APIs, etc.]

Page 5

Spark High-level Architecture

[Diagram: a Driver (holding the SparkContext) talks to a Cluster Manager and to three Worker Nodes; each Worker Node runs one Executor, and each Executor runs three Tasks.]
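The driver, executor and task roles listed above can be mimicked in plain Python to show the division of work. This is only an analogy and does not use Spark: a thread pool stands in for the executors, and the names `run_task` and `driver_count` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(partition):
    """One task: count the records of a single partition."""
    return sum(1 for _ in partition)


def driver_count(records, num_partitions=3, num_executors=3):
    """The 'driver': split the data, hand one task per partition to the pool,
    then combine the partial results."""
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_counts = list(pool.map(run_task, partitions))
    return sum(partial_counts)


total = driver_count(list(range(21455)))
```

A real Spark job follows the same shape at a larger scale: the driver plans the work, the cluster manager allocates executors on worker nodes, and each executor runs tasks over its share of the partitions.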

Page 6

PySpark example

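# Assumes an existing SparkSession `spark` and that DevicePropertyQueryBuilder
# has been imported from the NXCALS Python package.
>>> query = DevicePropertyQueryBuilder().system("CMW") \
...     .device("BLM1").property("Acquisition") \
...     .start_time("2017-02-22 00:00:00.000") \
...     .end_time("2017-02-23 13:30:00.000").build_query()

# Load the matching data as a Spark DataFrame:
>>> df = spark.read.options(**query.get_map()) \
...     .format("cern.accsoft.nxcals.data.access.api").load()

>>> df.count()  # number of rows; the slide reports 21455

# Filter on field2 first, then keep only the columns of interest:
>>> df2 = df.where("field2 = 123").select("acqStamp", "field1")

# Collect to Pandas (this brings the selected rows to the driver):
>>> pdf = df2.toPandas()
>>> pdf["field1"].plot()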

NXCALS User Guide: https://wikis.cern.ch/display/CALS/NXCALS+-+Data+Access+User+Guide
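The chained builder calls and the `**query.get_map()` unpacking in the example above follow a common pattern: each builder method records one option and returns the builder, and the finished query exposes its options as a dictionary that is unpacked into keyword arguments. The sketch below is a toy stand-in, not the NXCALS implementation; the classes `ToyQueryBuilder` and `ToyQuery`, the function `read_options`, and the option key names are all hypothetical.

```python
class ToyQuery:
    """Hypothetical finished query: just wraps a dictionary of options."""

    def __init__(self, options):
        self._options = options

    def get_map(self):
        # Return a copy so callers cannot modify the query's own state.
        return dict(self._options)


class ToyQueryBuilder:
    """Hypothetical fluent builder; NOT the real DevicePropertyQueryBuilder."""

    def __init__(self):
        self._options = {}

    def _set(self, key, value):
        self._options[key] = value
        return self  # returning self is what makes the calls chainable

    def system(self, name):
        return self._set("system", name)

    def device(self, name):
        return self._set("device", name)

    def property(self, name):
        return self._set("property", name)

    def start_time(self, stamp):
        return self._set("start_time", stamp)

    def end_time(self, stamp):
        return self._set("end_time", stamp)

    def build_query(self):
        return ToyQuery(dict(self._options))


def read_options(**options):
    """Stand-in for a reader's options(**kwargs): shows which keywords arrive."""
    return options


query = ToyQueryBuilder().system("CMW") \
    .device("BLM1").property("Acquisition") \
    .start_time("2017-02-22 00:00:00.000") \
    .end_time("2017-02-23 13:30:00.000").build_query()
options = read_options(**query.get_map())
```

Passing the options as a plain string-keyed map is what lets a generic reader interface carry source-specific query parameters without knowing about them in advance.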

Page 7

Open in SWAN

Page 8

NXCALS State?

Official dates:
• Ready by LS2 (Q1 2019)
• System development: solid beta for Q1 2018

Data Ingestion Java API ready
• We store data from CMW devices

Data Access API with Spark: first version ready
• Any Spark analysis on top of Spark DataSets

Python binding & SWAN integration

What comes next (for analytics):
• Clients integration (business use-cases)

Page 9

Questions?

E-mail: [email protected]
