Data Handling System (DHS)
October 3, 2011
Bruce Cowan
DHS Responsibilities
● Receive science data from ATST virtual cameras
● Route data appropriately
● Provide means to process science data to reduce its volume
● Provide means of producing and displaying quality assurance data
● Receive header info from all ATST systems & associate with corresponding
science data
● Retain calibration data for subsequent use
● Distribute science data and header info for external use
4.3-2100/2110
Basic DHS Structure
[Diagram: Virtual Camera 1, Virtual Camera 2, ... Virtual Camera X, each with its own Camera Line Storage (1, 2, ... X). The Header DB, Calibration DB, and Data Distribution are also shown.]
4.3-22XX
Basic DHS Structure
[Diagram: the same structure as on the previous slide, with additional virtual cameras (VC 2, VC 3, VC 4) and the labels Virtual Camera Line 1 through Virtual Camera Line 4. Camera Line 1, 2, ... X Storage, the Header DB, Calibration DB, and Data Distribution are shown as before.]
4.3-22XX
Configurable Data Routing
[Diagram, shown over two slides: Virtual Cameras connected to Storage.]
4.3-2230/2370
Data Routing
● Flexibility achieved using RTI DDS publish/subscribe over multicast
● Applications (components) “discover” each other using multicast, so no data
path has to be predefined for each sender. A sender simply declares that it will
“publish” data of a certain topic/size, and all receivers “subscribing” to that
topic/size automatically receive any messages that are published.
● Multicast reduces network traffic compared to unicast (point-to-point). A
single message can be received by multiple clients.
4.3-2230/2370
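The publish/subscribe idea above can be pictured with a minimal in-memory sketch. This is not the RTI DDS API: `TopicBus`, `subscribe`, and `publish` are hypothetical names, discovery and multicast delivery are replaced by a plain map from topic name to callbacks, and the "size" part of a subscription is ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for topic-based publish/subscribe.
// It is NOT the RTI DDS API; it only shows that a sender addresses a topic,
// never an individual receiver.
final class TopicBus {
    private final Map<String, List<Consumer<byte[]>>> subscribers = new HashMap<>();

    // A receiver registers interest in a topic; it never learns who the sender is.
    void subscribe(String topic, Consumer<byte[]> receiver) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(receiver);
    }

    // A sender publishes once; every current subscriber of the topic gets the message.
    // Returns the number of receivers reached.
    int publish(String topic, byte[] message) {
        List<Consumer<byte[]>> receivers = subscribers.getOrDefault(topic, List.of());
        for (Consumer<byte[]> receiver : receivers) {
            receiver.accept(message);
        }
        return receivers.size();
    }

    public static void main(String[] args) {
        TopicBus bus = new TopicBus();
        List<String> log = new ArrayList<>();
        // Topic name "cameraLine1" is an invented example.
        bus.subscribe("cameraLine1", m -> log.add("store:" + m.length));
        bus.subscribe("cameraLine1", m -> log.add("probe:" + m.length));
        int reached = bus.publish("cameraLine1", new byte[4]);
        System.out.println(reached + " receivers: " + log);
    }
}
```

Adding a receiver is a change on the receiving side only; the publishing call stays the same, which is the flexibility the routing requirement asks for.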
Data Routing - Unicast
[Diagram: a Sender connected through a Switch to Node 1, Node 2, and Node 3.]
Unicast: A separate message must be sent to each receiver, so network traffic increases in proportion to the number of recipients. The sender must know the addresses of all receivers.
Data Routing - Multicast
[Diagram: a Sender connected through a Switch to Node 1, Node 2, and Node 3.]
Multicast: A single message is sent to a “group” address, and the switch copies it to every machine that has joined the group. Interested receivers simply register with the switch to be included.
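The difference in sender-side load can be worked through with a short sketch. The class and method names are hypothetical, and the model is deliberately simple: it counts only payload leaving the sender and ignores protocol overhead.

```java
// Sender-side link load for unicast versus multicast delivery.
// Simplified model: payload only, no protocol overhead.
final class FanOutLoad {
    // Unicast: the sender transmits one copy per receiver.
    static double unicastSenderMBps(double streamMBps, int receivers) {
        return streamMBps * receivers;
    }

    // Multicast: the sender transmits one copy; the switch replicates it
    // to every machine that has joined the group.
    static double multicastSenderMBps(double streamMBps, int receivers) {
        return streamMBps;
    }

    public static void main(String[] args) {
        double stream = 960.0; // MB/s, the per-camera-line science data rate quoted in this talk
        int receivers = 3;     // three nodes, as in the diagrams
        System.out.println("unicast:   " + unicastSenderMBps(stream, receivers) + " MB/s");
        System.out.println("multicast: " + multicastSenderMBps(stream, receivers) + " MB/s");
    }
}
```

With three receivers of a 960 MB/s stream, unicast would require 2880 MB/s from the sender, more than the 1250 MB/s raw line rate of a single 10 GbE link, while multicast stays at 960 MB/s.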
DHS Physical Overview
DHS Software Layers
[Layer diagram; labels in order of appearance:]
● DHS Applications
● Data Processing Pipeline (DPP), Quality Assurance System (QAS), Data Storage + Distribution (DSD)
● Databases / Data Stores
● Bulk Data Transport (BDT)
Bulk Data Transport (BDT)
● Manages large-volume, high-rate data flows within the DHS
● Provides a way of creating a Camera Line, by defining a data route through a
sequence of computer nodes. This route may be different from one experiment to
the next.
● Two classes of data
  ● Science data (up to 960 MB/s for each camera line)
  ● Quality assurance data (much slower, < 100 MB/s)
● Data routing requirements are met using the publish/subscribe capabilities of Data
Distribution Service (DDS) middleware. Science data will flow over 10 GbE, and
QA data over standard gigabit Ethernet.
4.3-22XX
BDT 10GbE Sustained Speed Test
Simulated camera line - Various data sizes
The VBI's requirement of 32 MB at 30 Hz drove BDT development, as evidenced by the 32 MB “sweet spot” in the table.
4.3-2240/2440
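The 32 MB at 30 Hz figure lines up with the 960 MB/s per-camera-line rate quoted for the BDT. A small sketch of the arithmetic follows, assuming decimal units (1 MB = 10^6 bytes) and comparing against the raw 10 GbE line rate with no allowance for protocol overhead; the class and method names are hypothetical.

```java
// Sustained data rate for a fixed-size buffer delivered at a fixed frequency,
// compared with the raw line rate of 10 Gigabit Ethernet.
final class CameraLineRate {
    // 10 Gbit/s divided by 8 bits per byte = 1250 MB/s (decimal units, no overhead).
    static final double TEN_GBE_MBPS = 10_000.0 / 8.0;

    static double sustainedMBps(double bufferMB, double rateHz) {
        return bufferMB * rateHz;
    }

    static double linkUtilization(double mbps) {
        return mbps / TEN_GBE_MBPS;
    }

    public static void main(String[] args) {
        double rate = sustainedMBps(32.0, 30.0);
        System.out.printf("sustained rate: %.0f MB/s%n", rate);
        System.out.printf("share of raw 10 GbE: %.1f%%%n", 100.0 * linkUtilization(rate));
    }
}
```

Under these assumptions the VBI stream is 960 MB/s, about 76.8% of the raw link rate, which leaves limited headroom on a single 10 GbE connection.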
Quality Assurance Support (QAS)
● Allow system users to view/analyze data as it is collected to enable them to adjust
the ATST system to improve data quality
● Two aspects to quality assurance
  ● Quick Look Display – Near real-time display of raw data with simple image
  manipulation capability.
  ● Detailed Display – More thorough checking of data with data-specific
  processing that may introduce some delay. This processing is accomplished
  via plugins supplied by the instrument manufacturers, which are inserted into
  the data stream leading to the display.
● The BDT data stream is accessed by insertion of a QAS Probe, which will send
sampled data to an associated QAS Sink for display. The current baseline choice
for display functionality is DS9.
4.3-2505/2510
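One way to picture a QAS Probe is as a pass-through stage that copies some buffers to a sink. The sketch below is illustrative only: `SamplingProbe` is a hypothetical class, the "every Nth buffer" policy is an assumption (the talk does not say how sampling is chosen), and byte arrays stand in for BDT buffers.

```java
import java.util.function.Consumer;

// Hypothetical probe: every buffer continues downstream untouched, and a copy
// of every Nth buffer is forwarded to a sink for display.
// The every-Nth sampling policy is an assumption made for this sketch.
final class SamplingProbe {
    private final int every;
    private final Consumer<byte[]> sink;
    private long seen = 0;

    SamplingProbe(int every, Consumer<byte[]> sink) {
        if (every < 1) {
            throw new IllegalArgumentException("every must be >= 1");
        }
        this.every = every;
        this.sink = sink;
    }

    // Returns the same buffer object, so the primary stream is unaffected.
    byte[] onBuffer(byte[] buffer) {
        if (seen++ % every == 0) {
            sink.accept(buffer.clone()); // the sink gets its own copy
        }
        return buffer;
    }
}
```

Copying the sampled buffer keeps any later display-side manipulation from touching the data that continues down the camera line.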
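Quality Assurance System
[Diagram: on a Transfer Node, a QAS Probe sends data over gigabit to a QAS Sink feeding the QL Display. On a Processing Node, a QAS Probe with an Instrument Developer Plugin sends data over gigabit to a QAS Sink feeding the Detailed Display.]
4.3-2505/2510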
Data Storage and Delivery (DSD)
● Acceptance of science information from source application nodes and the
persistence of that information to storage media
● Long-term storage of calibration data needed for summit processing
● Long-term storage of engineering data and system logs
● Repackaging and distribution of science data and information to external sites
4.3-26XX
Data Processing Pipeline (DPP)
● Allows for the insertion of extra steps into the BDT data stream. This can be done
to either:
  ● Sample data from the primary stream without affecting the data within it. This
  is the mechanism used by the Quality Assurance System to access the
  stream for display purposes.
  ● Modify data available to all downstream components. Some instruments,
  such as the VBI, need significant processing resources to reduce the raw
  data into a product that can be transferred off-site in a timely fashion.
● Each Camera Line is capable of having its own independent DPP.
● The DPP is not responsible for the algorithms employed in the processing, but
manages a plugin's access to the data stream.
4.3-23XX
DPP Configuration
DataProcessingComponent/QasDDProbeComponent configuration attributes:
● dhs.cameraLine – camera line number
● dhs.topicName – subscribed topic name
● dhs.maxData – subscribed maximum data size (in bytes)
● dhs.qas.ddHandlerClass – the full class name of the IDataHandler plugin
● dhs.repubTopicName – the topic name under which to publish the plugin’s data product
● dhs.maxPluginData – the maximum data size (in bytes) of the plugin’s published data
The IDataHandler plugin may require configuration. In addition to any custom, implementation-specific
attributes, the following are available for plugins that need to subscribe to more than the one primary data
stream:
● dhs.subtopic.cameraLine.# - the subtopic’s camera line
● dhs.subtopic.maxData.# - the maximum data size (in bytes) for the subtopic
● dhs.subtopic.name.# - the subtopic’s topic name
● dhs.subtopic.key.# - if > 1 subtopic, the subtopic “key”
To allow for multiple subtopics, replace “#” with consecutive integers in an unbroken sequence, starting with
“1” for the first subtopic.
4.3-2330
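The numbered subtopic attributes can be read with a simple loop that stops at the first missing index. The sketch below assumes the attributes arrive as a plain string map; the real component receives them through its own attribute mechanism, and `SubtopicConfig`, `parse`, and all example values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for one dhs.subtopic.*.# group; not part of the DHS code base.
final class SubtopicConfig {
    final int cameraLine;
    final long maxData;
    final String name;
    final String key; // may be null when only one subtopic is configured

    SubtopicConfig(int cameraLine, long maxData, String name, String key) {
        this.cameraLine = cameraLine;
        this.maxData = maxData;
        this.name = name;
        this.key = key;
    }

    // Reads dhs.subtopic.*.1, .2, ... and stops at the first index with no name,
    // which is why the numbering has to be an unbroken sequence.
    static List<SubtopicConfig> parse(Map<String, String> attrs) {
        List<SubtopicConfig> out = new ArrayList<>();
        for (int i = 1; attrs.containsKey("dhs.subtopic.name." + i); i++) {
            out.add(new SubtopicConfig(
                    Integer.parseInt(attrs.get("dhs.subtopic.cameraLine." + i)),
                    Long.parseLong(attrs.get("dhs.subtopic.maxData." + i)),
                    attrs.get("dhs.subtopic.name." + i),
                    attrs.get("dhs.subtopic.key." + i)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        // All values below are invented for illustration.
        attrs.put("dhs.subtopic.cameraLine.1", "2");
        attrs.put("dhs.subtopic.maxData.1", "33554432");
        attrs.put("dhs.subtopic.name.1", "exampleDarkFrames");
        attrs.put("dhs.subtopic.key.1", "dark");
        attrs.put("dhs.subtopic.cameraLine.2", "3");
        attrs.put("dhs.subtopic.maxData.2", "33554432");
        attrs.put("dhs.subtopic.name.2", "exampleGainTable");
        attrs.put("dhs.subtopic.key.2", "gain");
        for (SubtopicConfig s : parse(attrs)) {
            System.out.println(s.key + " -> " + s.name + " on camera line " + s.cameraLine);
        }
    }
}
```

A gap in the numbering (for example 1, 2, 4) would leave subtopic 4 unread in this sketch.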
IDataHandler Interface (for Plugins)
public interface IDataHandler {
    // Gives the plugin an opportunity to allocate resources
    void onDoInit();
    // Allows the plugin to receive the component's doSet() attributes
    void set(IAttributeTable table);
    // Returns the array of event names this plugin wishes to subscribe to
    String[] getEventNames();
    // Notifies the plugin that a subscribed event has been received
    void eventNotify(String eventName, IAttributeTable eventValue);
    // Receives the data buffer, plus the publisher on which to optionally publish the new data product
    void process(IBdtBuffer buffer, IBdtPublisher pub);
    // If subscriptions to other BDT streams have been configured, this method delivers the incoming content.
    // The "key" is a string specified in the configuration to help distinguish multiple subtopics.
    // It may be null if there is only one.
    void subTopicReceive(String key, IBdtBuffer buffer);
    // Gives the plugin an opportunity to de-allocate resources
    void onDoUninit();
}
4.3-2330
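A plugin that reduces data volume might look roughly like the sketch below. The `IDataHandler` method list follows the slide, but the three supporting interfaces are hypothetical stubs so the example compiles on its own: the methods `getString`, `getData`, and `publish` are invented here and are not taken from the real DHS interfaces. `PairAveragingHandler` and its averaging step are likewise only an illustration.

```java
// HYPOTHETICAL stubs: the real interfaces are not shown in this talk, and the
// methods getString, getData, and publish are invented for this sketch.
interface IAttributeTable { String getString(String name); }
interface IBdtBuffer { byte[] getData(); }
interface IBdtPublisher { void publish(byte[] data); }

// Method list as given on the slide.
interface IDataHandler {
    void onDoInit();
    void set(IAttributeTable table);
    String[] getEventNames();
    void eventNotify(String eventName, IAttributeTable eventValue);
    void process(IBdtBuffer buffer, IBdtPublisher pub);
    void subTopicReceive(String key, IBdtBuffer buffer);
    void onDoUninit();
}

// Illustrative plugin: halves the data volume by averaging adjacent byte pairs
// (treated as unsigned values) and republishes the smaller product.
final class PairAveragingHandler implements IDataHandler {
    public void onDoInit() { }                       // nothing to allocate in this sketch
    public void set(IAttributeTable table) { }       // no custom attributes
    public String[] getEventNames() { return new String[0]; } // no event subscriptions
    public void eventNotify(String eventName, IAttributeTable eventValue) { }

    public void process(IBdtBuffer buffer, IBdtPublisher pub) {
        byte[] in = buffer.getData();
        byte[] out = new byte[in.length / 2];        // a trailing odd byte is dropped
        for (int i = 0; i < out.length; i++) {
            int a = in[2 * i] & 0xFF;
            int b = in[2 * i + 1] & 0xFF;
            out[i] = (byte) ((a + b) / 2);
        }
        pub.publish(out);
    }

    public void subTopicReceive(String key, IBdtBuffer buffer) { } // no subtopics configured
    public void onDoUninit() { }
}
```

The plugin owns only the algorithm; where the input comes from and where the published product goes is left to the component that hosts it, in line with the DPP's stated role.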
Camera Line Detail 1
[Diagram: Virtual Camera, Transfer Store, Transfer Node, Processing Node, Quick Look Display, Detailed Display, and Camera Store, with a “Zero or more” annotation.]
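Camera Line Detail 2
[Diagram: the camera line annotated with publish (Pub) and subscribe (Sub) endpoints. Components shown: Virtual Camera (Pub), Transfer Store, Transfer Node (Pub/Sub), Processing Node (Sub/Pub, with a “Zero or more” annotation), and Camera Store (Sub). Two QAS Probes (Sub/Pub) each send data to a QAS Sink (Sub), one feeding the Quick Look Display and one feeding the Detailed Display.]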
DSD Data Flow
4.3-26XX
Data Transfer to ATST Base Facility
TeraPac3
Portable Rugged Hard Drive Array Briefcase
● 8 x 3.5” SATA drives provide up to 24 TB with currently available drives.
● Pelican watertight roller case for transport
DHS Hardware
● Storage technologies are expected to evolve before final deployment of the DHS, but its
design can be extrapolated from currently available and proposed next-generation hardware.
● Drive Controller – SAS 2.0
  ● 8 ports operating at 600 MB/s per port; SAS 3.0 will provide 1.2 GB/s per port
  ● LSI SAS 9269-8i benchmarked at 2875 MB/s read, 1800 MB/s write
● Host Bus – PCI-Express v2.0
  ● 500 MB/s per lane vs 250 MB/s for v1.0. (The SAS 2.0 controller can't run at full speed)
  ● Computers with PCIe v3.0 (1 GB/s per lane) are now being sold.
● Network Connectivity – 10 GbE
● Storage – Solid State Drives (SSD)
  ● Too costly now, but SSD enterprise arrays already reach 12.8 GB/s read, 9.7 GB/s write
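The remark that SAS 2.0 cannot run at full speed on PCI-Express v2.0 can be checked with the per-port and per-lane figures above. The sketch assumes an 8-lane slot, which is not stated on the slide, and uses nominal rates only; the class and method names are hypothetical.

```java
// Compares the aggregate bandwidth of a controller's SAS ports with the
// bandwidth of the host-bus slot it sits in. Nominal figures only.
final class HostBusBudget {
    // Aggregate bandwidth of the drive ports, in MB/s.
    static int sasAggregateMBps(int ports, int perPortMBps) {
        return ports * perPortMBps;
    }

    // Host-bus bandwidth for a slot of the given width, in MB/s.
    static int pcieMBps(int lanes, int perLaneMBps) {
        return lanes * perLaneMBps;
    }

    public static void main(String[] args) {
        int lanes = 8; // ASSUMPTION: x8 slot; the lane count is not given in the talk
        System.out.println("SAS 2.0, 8 ports: " + sasAggregateMBps(8, 600) + " MB/s");
        System.out.println("PCIe 2.0 x" + lanes + ": " + pcieMBps(lanes, 500) + " MB/s");
        System.out.println("PCIe 3.0 x" + lanes + ": " + pcieMBps(lanes, 1000) + " MB/s");
    }
}
```

Under the x8 assumption, eight SAS 2.0 ports total 4800 MB/s against 4000 MB/s for PCIe 2.0, so the bus is the limit; PCIe 3.0 at 8000 MB/s would remove it.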
Mini DHS - Functionality
● Test platform for instrument/camera developers
● Receive and store data from a single Virtual Camera via 10 GbE
  ● May not provide the full-speed capture (960 MB/s) of a full-blown DHS
  Camera Line, but should be close. Capture duration will be limited due to
  drive array capacity.
● Fixed QAS Probe monitors camera data stream, and publishes QAS Display data
to gigabit port.
● Receive and store camera header data
● Access Portal accepts HTTP requests to build FITS file exports from camera and
header data
● Supports routing data through an external Processing Node for DPPS testing.
Mini DHS - Proposed Hardware
● Intel Xeon X5660 processor
  ● 2.8 GHz, 12 MB cache, 6 cores + hyper-threading
  ● Motherboard upgradeable to second CPU
● 24 GB DDR3 1333 MHz RAM
  ● Motherboard upgradeable to 48 GB
● 9 x 300/600 GB Seagate Cheetah 15K.7 SAS 2.0 drives
  ● 1 drive for O/S+CSF/ICE/etc, 8 in RAID 0 array for fast storage
● LSI SAS9260-8i SAS 2.0 PCI-Express 2.0 RAID controller
● Intel X520-DA2 10-GbE dual-port PCI-Express 2.0 network card
● Dual-port gigabit ethernet on motherboard
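The capture-duration limit of the Mini DHS follows from the array size. The sketch below estimates it for the proposed 8-drive RAID 0 array, assuming decimal units, that all raw capacity is usable, and that the array actually sustains the full 960 MB/s; the class and method names are hypothetical.

```java
// Rough time-to-full for a striped array receiving data at a constant rate.
// Assumes every byte of raw capacity is usable and the array keeps up with the rate.
final class CaptureDuration {
    static double seconds(int drives, double driveGB, double rateMBps) {
        double capacityMB = drives * driveGB * 1000.0; // decimal units: 1 GB = 1000 MB
        return capacityMB / rateMBps;
    }

    public static void main(String[] args) {
        for (double driveGB : new double[] {300.0, 600.0}) {
            double s = seconds(8, driveGB, 960.0);
            System.out.printf("8 x %.0f GB at 960 MB/s: %.0f s (%.1f min)%n", driveGB, s, s / 60.0);
        }
    }
}
```

Under these assumptions the array fills in roughly 42 minutes with 300 GB drives and roughly 83 minutes with 600 GB drives.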
Compliance Matrix
● Separate file...