Data Placement: HPC and Portals
Eli Dart, Science Engagement, Energy Sciences Network (ESnet), Lawrence Berkeley National Laboratory
Panel: HPC in Cloud or Cloud in HPC
SC18, Dallas, TX
November 13, 2018
The Importance of Data Placement
• Data placement is a key capability
• Whether running on HPC, a local cluster, or the cloud, data must be accessible to running code
• Older methods of data placement don't scale
• Modern capabilities are composed of several components:
  – Architecture
  – Tools/Platforms
  – Interconnected facilities
Science DMZ in HPC Facility
[Diagram: Science DMZ in an HPC facility – the border router connects the WAN to the core switch/router; a firewall fronts the office networks, while the Data Transfer Nodes and parallel filesystem attach to the supercomputer through front-end switches; perfSONAR hosts watch the border, the DMZ, and the interior. The high-latency WAN path runs from the border to the DTNs; the low-latency LAN path serves the local systems.]
http://fasterdata.es.net/science-dmz/
Building On The Science DMZ
• Enhanced cyberinfrastructure substrate now exists
  – Wide area networks (ESnet, GEANT, Internet2, regionals)
  – Science DMZs connected to those networks
  – DTNs in the Science DMZs
• What does the scientist see?
  – The scientist sees a science application: data transfer, data portal, data analysis
  – The user interface is critical to productivity
  – Science applications are the user interface to networks and DMZs
• Large-scale data-intensive science requires that we build larger structures on top of those components
  – Platforms scale better than tools
  – What does the scientist spend time on: tool integration or science?
DTN Cluster Performance – HPC Facilities (2017)
[Figure: Petascale DTN project results, November 2017, L380 data set. Globus transfers between four DTN clusters – NERSC (endpoint nersc#dtn, filesystem /project), ALCF (alcf#dtn_mira, /projects), OLCF (olcf#dtn_atlas, atlas2), and NCSA (ncsa#BlueWaters, /scratch) – each direction labeled min/avg/max Gbps over three transfers, ranging from 21.2/22.6/24.5 Gbps to 55.4/56.7/57.4 Gbps.]
Data set: L380. Files: 19260; directories: 211; other files: 0. Total bytes: 4442781786482 (4.4 TB). Smallest file: 0 bytes; largest file: 11313896248 bytes (11 GB).
Size distribution: 1-10 bytes: 7 files; 10-100 bytes: 1 file; 100 bytes-1 KB: 59 files; 1-10 KB: 3170 files; 10-100 KB: 1560 files; 100 KB-1 MB: 2817 files; 1-10 MB: 3901 files; 10-100 MB: 3800 files; 100 MB-1 GB: 2295 files; 1-10 GB: 1647 files; 10-100 GB: 3 files.
Science Data Portals
• Large repositories of scientific data
– Climate data
– Sky surveys (astronomy, cosmology)
– Many others
– Data search, browsing, access
• Many scientific data portals were designed 15+ years ago
– Single-web-server design
– Data browse/search, data access, user awareness all in a single system
– All the data goes through the portal server
• In many cases by design
• E.g. embargo before publication (enforce access control)
Legacy Portal Design
[Diagram: legacy portal design – a single portal server (web server, search, database, authentication, data service) on a 10GE link behind the enterprise firewall; the browsing path, query path, and data path all traverse the firewall to the portal server and its filesystem (data store).]
• Very difficult to improve performance without architectural change
  – Software components all tangled together
  – Difficult to put the whole portal in a Science DMZ because of security
  – Even if you could put it in a DMZ, many components aren't scalable
• What does architectural change mean?
Next-Generation Portal Leverages Science DMZ
[Diagram: next-generation portal design – the portal server (web server, search, database, authentication) stays behind the enterprise firewall and handles only the query/browse path; the data transfer path runs through a pool of API DTNs in the Science DMZ (data access governed by the portal), which serve the filesystem (data store) directly.]
https://peerj.com/articles/cs-144/
JGI Data Portal
[Screenshots: the JGI Data Portal.]
Links
– ESnet fasterdata knowledge base: http://fasterdata.es.net/
– Science DMZ paper: http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
– Science DMZ email list: https://gab.es.net/mailman/listinfo/sciencedmz
– perfSONAR: http://fasterdata.es.net/performance-testing/perfsonar/ and http://www.perfsonar.net
Thanks!
Eli Dart, Energy Sciences Network (ESnet), Lawrence Berkeley National Laboratory
http://my.es.net/
http://www.es.net/
http://fasterdata.es.net/
ESnet – the basic facts: a high-speed international networking facility, optimized for data-intensive science:
• connecting 50 labs, plants, and facilities with >150 networks, universities, and research partners globally
• supporting every science office, and serving as an integral extension of many instruments
• 400 Gbps transatlantic extension in production since December 2014
• >1.3 Tbps of external connectivity, including high-speed access to commercial partners such as Amazon
• a growing number of university connections to better serve LHC science (and eventually Belle II)
• older than the commercial Internet, and growing about twice as fast
Areas of strategic focus: software and science engagement.
• Engagement effort is now 12% of staff
• Software capability is critical to the next-generation network
Extra Slides – Science DMZ Talk
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Performance Monitoring
• Data Transfer Nodes & Applications
• Science DMZ Security
• Wrap Up
Motivation
• Networks are an essential part of data-intensive science
  – Connect data sources to data analysis
  – Connect collaborators to each other
  – Enable machine-consumable interfaces to data and analysis resources (e.g. portals), automation, scale
• Performance is critical
  – Exponential data growth
  – Constant human factors
  – Data movement and data analysis must keep up
• Effective use of wide area (long-haul) networks by scientists has historically been difficult
The Central Role of the Network
• The very structure of modern science assumes science networks exist: high performance, feature rich, global in scope
• What is "The Network" anyway?
  – "The Network" is the set of devices and applications involved in the use of a remote resource
    • This is not about supercomputer interconnects
    • This is about data flow from experiment to analysis, between facilities, etc.
  – User interfaces for "The Network": portal, data transfer tool, workflow engine
  – Therefore, servers and applications must also be considered
• What is important? An ordered list:
  1. Correctness
  2. Consistency
  3. Performance
TCP – Ubiquitous and Fragile
• Networks provide connectivity between hosts – how do hosts see the network?
  – From an application's perspective, the interface to "the other end" is a socket
  – Communication is between applications – mostly over TCP
• TCP – the fragile workhorse
  – TCP is (for very good reasons) timid – packet loss is interpreted as congestion
  – Packet loss in conjunction with latency is a performance killer
  – Like it or not, TCP is used for the vast majority of data transfer applications (more than 95% of ESnet traffic is TCP)
[Chart: a small amount of packet loss makes a huge difference in TCP performance. Throughput vs. distance – local (LAN), metro area, regional, continental, international – for Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), and Measured (no loss). With loss, high performance beyond metro distances is essentially impossible.]
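The shape of this chart follows from the well-known Mathis et al. steady-state model for TCP Reno throughput; the slide does not state the formula, but it is worth recording here (the numeric example is my own, not from the talk):

```latex
% Mathis et al. (1997) bound for TCP Reno throughput:
%   MSS = maximum segment size, RTT = round-trip time, p = packet loss rate
\[
  \text{Throughput} \;\le\; \frac{MSS}{RTT} \cdot \frac{C}{\sqrt{p}},
  \qquad C \approx 1.22
\]
% Example: MSS = 1460 bytes, RTT = 50 ms (continental), p = 10^{-4}
% gives roughly (1460 x 8 / 0.05) x (1.22 / 0.01) ~ 28.5 Mbps,
% far below what a 10 Gbps path could otherwise carry.
```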
Working With TCP In Practice
• Far easier to support TCP than to fix TCP
  – People have been trying to fix TCP for years – only some success
    • RFC 1323
    • Buffer autotuning (see the buffer-sizing sketch below)
  – Like it or not, we're stuck with TCP in the general case
• Pragmatically speaking, we must accommodate TCP
  – Sufficient bandwidth to avoid congestion
  – Zero packet loss
  – Verifiable infrastructure
    • Networks are complex
    • Must be able to locate problems quickly
    • A small footprint is a huge win – a small number of devices keeps problem isolation tractable
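Why buffer tuning matters on long paths is simple arithmetic: a flow cannot run at full rate unless the TCP window covers the bandwidth-delay product. A minimal sketch of that sizing calculation (the values are illustrative, not from the slides):

```python
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return int(bandwidth_gbps * 1e9 / 8 * rtt_ms / 1e3)

# A 10 Gbps path at 53 ms RTT needs ~66 MB of window -- far above common
# OS defaults, hence the importance of RFC 1323 window scaling and
# buffer autotuning.
print(bdp_bytes(10, 53))  # 66250000
```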
Putting A Solution Together
• Effective support for TCP-based data transfer
  – Design for correct, consistent, high-performance operation
  – Design for ease of troubleshooting
• Easy adoption is critical
  – Large laboratories and universities have extensive IT deployments
  – Drastic change is prohibitively difficult
• Cybersecurity – defensible without compromising performance
• Borrow ideas from traditional network security
  – Traditional DMZ
    • Separate enclave at the network perimeter ("demilitarized zone")
    • Specific location for external-facing services
    • Clean separation from the internal network
  – Do the same thing for science – the Science DMZ
The Science DMZ Design Pattern
[Diagram: the three pillars of the design pattern.]
• Dedicated systems for data transfer – the Data Transfer Node: high performance, configured specifically for data transfer, with the proper tools
• Network architecture – the Science DMZ: a dedicated network location for high-speed data resources, with appropriate security; easy to deploy, with no need to redesign the whole network
• Performance testing and measurement – perfSONAR: enables fault isolation, verifies correct operation, and is widely deployed in ESnet and other networks, as well as at sites and facilities
Abstract or Prototype Deployment
• Add-on to existing network infrastructure
  – All that is required is a port on the border router
  – Small footprint, pre-production commitment
• Easy to experiment with components and technologies
  – DTN prototyping
  – perfSONAR testing
• Limited scope makes security policy exceptions easy
  – Only allow traffic from partners
  – Add-on to production infrastructure – lower risk
Science DMZ Design Pattern (Abstract)
[Diagram: abstract Science DMZ design pattern – the border router feeds a Science DMZ switch/router over a clean, high-bandwidth WAN path; a high-performance Data Transfer Node with high-speed storage sits in the DMZ behind per-service security policy control points; the site/campus LAN, behind the enterprise border router/firewall, retains access to Science DMZ resources; perfSONAR hosts at the border, in the DMZ, and on campus.]
Local And Wide Area Data Flows
[Diagram: the same topology, highlighting the two flow classes – the high-latency WAN path terminates on the DTN in the Science DMZ, while the low-latency LAN path serves the site/campus LAN behind the enterprise firewall.]
Supercomputer Center Deployment
• High-performance networking is assumed in this environment
  – Data flows between systems, between systems and storage, over the wide area, etc.
  – A global filesystem often ties resources together
    • Portions of this may not run over Ethernet (e.g. InfiniBand)
    • Implications for Data Transfer Nodes
• The "Science DMZ" may not look like a discrete entity here
  – By the time you get through interconnecting all the resources, you end up with most of the network in the Science DMZ
  – This is as it should be – the point is appropriate deployment of tools, configuration, policy control, etc.
• Office networks can look like an afterthought, but they aren't
  – Deployed with appropriate security controls
  – Office infrastructure need not be sized for science traffic
Supercomputer Center
[Diagram: supercomputer center deployment – the same topology as the "Science DMZ in HPC Facility" figure: border router to core switch/router, firewall in front of the offices, Data Transfer Nodes and parallel filesystem attached to the supercomputer via front-end switches, with perfSONAR at the border, core, and DMZ.]
Supercomputer Center Data Path
[Diagram: the same supercomputer center topology, with the high-latency WAN path running from the border to the Data Transfer Nodes and the low-latency LAN path serving the offices behind the firewall.]
Common Threads
• Two common threads exist in these examples
• Accommodation of TCP
  – The wide area portion of data transfers traverses a purpose-built path
  – High-performance devices that don't drop packets
• Ability to test and verify
  – When problems arise (and they always will), they can be solved if the infrastructure is built correctly
  – A small device count makes it easier to find issues
  – Multiple test and measurement hosts provide multiple views of the data path
    • perfSONAR nodes at the site and in the WAN
    • perfSONAR nodes at the remote site
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Performance Monitoring
• Data Transfer Nodes & Applications
• Science DMZ Security
• Wrap Up
Performance Monitoring
• Everything may function perfectly when it is deployed
• Eventually something is going to break
  – Networks and systems are complex
  – Bugs, mistakes, …
  – Sometimes things just break (this is why we buy support contracts!)
• Must be able to find and fix problems when they occur
• Must be able to find problems in other networks (your network may be fine, but someone else's problem can impact your users)
• TCP was intentionally designed to hide all transmission errors from the user:
  – "As long as the TCPs continue to function properly and the internet system does not become completely partitioned, no transmission errors will affect the users." (RFC 793, 1981)
Soft Network Failures – Hidden Problems
• Hard failures are well understood
  – Link down, system crash, software crash
  – Traditional network/system monitoring tools are designed to quickly find hard failures
  – Routing protocols reconverge automatically
• Soft failures result in degraded capability
  – Connectivity exists
  – Performance is impacted
  – Typically something in the path is functioning, but not well
• Soft failures are hard to detect with traditional methods
  – No obvious single event
  – Sometimes no indication at all of any errors
• Independent testing is the only way to reliably find soft failures
Sample Soft Failures
[Charts: throughput (Gb/s) over one month – a router rebooted with a full route table, and the gradual failure of an optical line card: normal performance, then degrading performance, then repair.]
Testing Infrastructure – perfSONAR
• perfSONAR is:
  – A widely deployed test and measurement infrastructure
    • ESnet, Internet2, US regional networks, international networks
    • Laboratories, supercomputer centers, universities
  – A suite of test and measurement tools
  – A collaboration that builds and maintains the software tools
• By installing perfSONAR, a site can leverage over 2000 test servers deployed around the world
• perfSONAR is ideal for finding soft failures
  – Alerting to the existence of problems
  – Fault isolation
  – Verification of correct operation
Lookup Service Directory Search: http://stats.es.net/ServicesDirectory/
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Performance Monitoring
• Data Transfer Nodes & Applications
• Science DMZ Security
• Wrap Up
Dedicated Systems – Data Transfer Node
• The DTN is dedicated to data transfer
• Set up specifically for high-performance data movement
  – System internals (BIOS, firmware, interrupts, etc.)
  – Network stack
  – Storage (global filesystem, Fibre Channel, local RAID, etc.)
  – High-performance tools
  – No extraneous software
• Limitation of scope and function is powerful
  – No conflicts with configuration for other tasks
  – A small application set makes cybersecurity easier
Data Transfer Tools For DTNs
• Parallelism is important
  – It is often easier to achieve a given performance level with four parallel connections than with one
  – Several tools offer parallel transfers, including Globus/GridFTP
• Latency interaction is critical
  – Wide area data transfers have much higher latency than LAN transfers
  – Many tools and protocols assume a LAN
• Workflow integration is important
• Key tools: Globus Online, HPN-SSH
Data Transfer Tool Comparison
• In addition to the network, using the right data transfer tool is critical
• Data transfer test from Berkeley, CA to Argonne, IL (near Chicago): RTT = 53 ms, network capacity = 10 Gbps (a transfer-time sketch follows the table)

Tool                  Throughput
scp                   140 Mbps
HPN-patched scp       1.2 Gbps
ftp                   1.4 Gbps
GridFTP, 4 streams    5.4 Gbps
GridFTP, 8 streams    6.6 Gbps

Note that getting more than 1 Gbps (125 MB/s) disk to disk requires properly engineered storage (RAID, parallel filesystem, etc.)
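To put these rates in human terms, here is a quick back-of-the-envelope calculation (my own arithmetic, not from the slide) of the time to move 1 TB at each measured rate:

```python
# Hours to move 1 TB at each tool's measured throughput (table above).
rates_gbps = {
    "scp": 0.14,
    "HPN-patched scp": 1.2,
    "ftp": 1.4,
    "GridFTP, 4 streams": 5.4,
    "GridFTP, 8 streams": 6.6,
}
TB_BITS = 1e12 * 8  # one terabyte in bits

for tool, gbps in rates_gbps.items():
    hours = TB_BITS / (gbps * 1e9) / 3600
    print(f"{tool:>20}: {hours:5.1f} h")
# scp: ~15.9 h; GridFTP with 8 streams: ~0.3 h for the same terabyte.
```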
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Performance Monitoring
• Data Transfer Nodes & Applications
• Science DMZ Security
• Wrap Up
Science DMZ Security
• Goal: disentangle security policy and enforcement for science flows from security for business systems
• Rationale
  – Science data traffic is simple from a security perspective
  – Narrow application set on the Science DMZ
    • Data transfer, data streaming packages
    • No printers, document readers, web browsers, building control systems, financial databases, staff desktops, etc.
  – Security controls that are typically implemented to protect business resources often cause performance problems
  – Separation allows each to be optimized
Science DMZ as Security Architecture
• Allows for better segmentation of risks and more granular application of controls to those segmented risks
  – Limit the risk profile for high-performance data transfer applications
  – Apply specific controls to data transfer hosts
  – Avoid including unnecessary risks and unnecessary controls
• Remove degrees of freedom – focus only on what is necessary
  – Easier to secure
  – Easier to achieve performance
  – Easier to troubleshoot
Performance is a Core Requirement
• Core information security principles
  – Confidentiality, Integrity, Availability (CIA)
• In data-intensive science, performance is an additional core mission requirement: CIA → PICA
  – The CIA principles are important, but if the performance isn't there, the science mission fails
  – This isn't about "how much" security you have, but how the security is implemented
  – Need to appropriately secure systems without performance compromises
Science DMZ Placement Outside the Enterprise Firewall
• Why? For performance reasons
  – Specifically: Science DMZ traffic does not traverse the firewall data plane
  – This has nothing to do with whether packet filtering is part of the security enforcement toolkit
• Lots of heartburn over this, especially from the perspective of a conventional firewall manager
  – Organizational policy directives can mandate firewalls
  – Firewalls are designed to protect converged enterprise networks
  – Why would you put critical assets outside the firewall?
• The answer: firewalls are typically a poor fit for high-performance science applications
Firewall Capabilities and Science Traffic
• Firewalls have a lot of sophistication in an enterprise setting
  – Application-layer protocol analysis (HTTP, POP, MSRPC, etc.)
  – Built-in VPN servers
  – User awareness
• Data-intensive science flows don't match this profile
  – Common case: data on filesystem A needs to be on filesystem Z
    • The data transfer tool verifies credentials over an encrypted channel
    • Then it opens a socket or set of sockets and sends data until done (1 TB, 10 TB, 100 TB, …)
  – One workflow can use 10% to 50% or more of a 10G network link
• Do we have to use a firewall?
Firewalls as Access Lists
• What does a firewall admin ask for when asked to allow data transfers?
  – The IP address of your host
  – The IP address of the remote host
  – The port range
  – That looks like an ACL to me – I can do that on the router (a toy sketch follows below)
• No special configuration for advanced protocol analysis – just address and port
• Router ACLs do not drop traffic permitted by policy, while enterprise firewalls can (and often do)
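To make the "that's just an ACL" point concrete, here is a toy sketch of the match-rule logic such a policy boils down to (the addresses are documentation values and the ports are hypothetical – the control port matches GridFTP's conventional 2811, the data range is a locally chosen example; none of this is from the slides):

```python
# A router ACL is an ordered list of (source, destination, ports) match
# rules with an implicit deny at the end -- exactly the information the
# firewall admin asked for above.
from ipaddress import ip_address, ip_network

ACL = [
    # (collaborator network, local DTN, permitted destination ports)
    (ip_network("192.0.2.0/24"), ip_address("198.51.100.10"), range(2811, 2812)),
    (ip_network("192.0.2.0/24"), ip_address("198.51.100.10"), range(50000, 51001)),
]

def check(src: str, dst: str, dport: int) -> str:
    for net, host, ports in ACL:
        if ip_address(src) in net and ip_address(dst) == host and dport in ports:
            return "permit"
    return "deny"  # implicit deny, as on a router

print(check("192.0.2.7", "198.51.100.10", 50123))   # permit
print(check("203.0.113.9", "198.51.100.10", 50123)) # deny
```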
Firewall Performance Example
[Charts: observed perfSONAR throughput through a firewall and bypassing the firewall – almost 20 times slower through the firewall, and a huge improvement without it.]
Security Without Enterprise Firewalls
• Data-intensive science traffic interacts poorly with enterprise firewalls
• Does this mean we ignore security? NO!
  – We must protect our systems
  – We need to find a way to do security that does not prevent us from getting the science done
• Key point: security policies and mechanisms that protect the Science DMZ should be implemented so that they do not compromise performance
• Traffic permitted by policy should not experience performance impact as a result of the application of policy
• This gets back to network segmentation, and the value that separation provides
Other Security Mechanisms: ACLs And Applications
• Aggressive access lists
  – More useful with project-specific DTNs
  – Exchanging data with a small set of remote collaborators: the ACL is fairly easy to manage
  – Large-scale data distribution servers: difficult and time-consuming to handle (but then, the firewall ruleset for such a service would be, too)
• Limitation of the application set
  – Makes it easier to protect
  – Keep unnecessary applications off the DTN (and watch for them anyway using a host IDS – take violations seriously)
Other Security Mechanisms: Network IDS
• Intrusion Detection Systems (IDS)
  – One example is Bro – http://bro-ids.org/
  – Bro is high-performance and battle-tested
    • Bro protects several high-performance national assets
    • Bro can be scaled with clustering: http://www.bro-ids.org/documentation/cluster.html
  – Other IDS solutions are also available
Other Security Mechanisms: Host IDS
• Using a Host IDS is recommended for hosts in a Science DMZ
• Several open source solutions exist:
  – OSSEC: http://www.ossec.net/
  – Rkhunter: http://rkhunter.sourceforge.net (rootkit detection + FIM)
  – chkrootkit: http://chkrootkit.org/
  – Logcheck: http://logcheck.org (log monitoring)
  – Fail2ban: http://www.fail2ban.org/wiki/index.php/Main_Page
  – DenyHosts: http://denyhosts.sourceforge.net/
Collaboration Within The Organization
• All stakeholders should collaborate on Science DMZ design, policy, and enforcement
• The security people have to be on board
  – Political cover for security officers
  – If the deployment of a Science DMZ is going to jeopardize the job of the security officer, expect pushback
• The Science DMZ is a strategic asset, and should be understood by the strategic thinkers in the organization
  – Changes in security models
  – Changes in operational models
  – Enhanced ability to compete for funding
  – Increased institutional capability – greater science output
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Performance Monitoring
• Data Transfer Nodes & Applications
• Science DMZ Security
• Wrap Up
Wrap Up
• The Science DMZ design pattern provides a flexible model for supporting high-performance data transfers and workflows
• Key elements:
  – Accommodation of TCP
    • Sufficient bandwidth to avoid congestion
    • Loss-free IP service
  – Location – near the site perimeter if possible
  – Test and measurement
  – Dedicated systems
  – Appropriate security
• Support for advanced capabilities (e.g. SDN) is much easier with a Science DMZ
Science DMZ Applications
Context: Science DMZ Adoption
• DOE National Laboratories
  – Supercomputer centers, LHC sites, experimental facilities
  – Both large and small sites
• NSF CC* programs have funded many Science DMZs
  – Large investments across the US university complex: over $100M
  – Significant strategic importance
• Other US agencies
  – National Institutes of Health
  – US Department of Agriculture
• Outside the USA
  – Australia: https://www.rdsi.edu.au/dashnet
  – Brazil
  – Netherlands
  – UK
Strategic Impacts
• What does this mean?
  – We are in the midst of a significant cyberinfrastructure upgrade
  – Enterprise networks need not be unduly perturbed ☺
• Significantly enhanced capabilities compared to 5 years ago
  – Terabyte-scale data movement is much easier
  – Petabyte-scale data movement is possible outside the LHC experiments
    • ~3.1 Gbps = 1 PB/month
    • ~14 Gbps = 1 PB/week (both rates are sanity-checked below)
  – Widely deployed tools are much better (e.g. Globus)
• Metcalfe's Law of network utility
  – The value of a Science DMZ is proportional to the number of DMZs
    • n² or n·log(n) doesn't matter – the effect is real
  – Cyberinfrastructure value increases as we all upgrade
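The rate equivalences are easy to sanity-check; a quick sketch (my own arithmetic, not from the slide):

```python
# Rate required to move one petabyte within a given window.
PB_BITS = 1e15 * 8

def gbps_needed(seconds: float) -> float:
    return PB_BITS / seconds / 1e9

print(f"1 PB/month: {gbps_needed(30 * 86400):.1f} Gbps")  # ~3.1 Gbps
print(f"1 PB/week:  {gbps_needed(7 * 86400):.1f} Gbps")   # ~13.2 Gbps (the slide rounds to ~14)
```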
Next Steps – Building On The Science DMZ
• Enhanced cyberinfrastructure substrate now exists
  – Wide area networks (ESnet, GEANT, NRENs, Internet2, regionals)
  – Science DMZs connected to those networks
  – DTNs in the Science DMZs
• What does the scientist see?
  – The scientist sees a science application
    • Data transfer
    • Data portal
    • Data analysis
  – Science applications are the user interface to networks and DMZs
• Large-scale data-intensive science requires that we build larger structures on top of those components
HPC Centers Matter
• Computing centers are special
  – Centers of excellence and expertise
  – Data repositories
  – Computing for simulation and data analysis
• Really, though – the combination of people and cyberinfrastructure is special
  – People who know how computers, networking, and storage work
  – Enough resources to make things happen
• Computing facilities are anchors for many collaborations
  – Common pattern: a multi-institution team with access to one HPC center
  – Shared data, analysis, and simulation platform
Data And HPC: The Petascale DTN Project
• Built on top of the Science DMZ
• Effort to improve data transfer performance between the DOE ASCR HPC facilities at ANL, LBNL, and ORNL, and also NCSA
  – Multiple current and future science projects need to transfer data between HPC facilities
  – Performance goal of 15 gigabits per second (equivalent to 1 PB/week)
  – Realize the performance goal for routine Globus transfers without special tuning
• Reference data set is 4.4 TB of cosmology simulation data
• Use performant, easy-to-use tools with production options on
  – Globus Transfer service (previously Globus Online)
  – Use the GUI just like a user would, with default options (a scripted equivalent is sketched below)
    • E.g. integrity checksums enabled, as they should be
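The same kind of transfer can be scripted instead of clicked; a minimal sketch using the Globus Python SDK (the endpoint UUIDs, paths, and token handling are placeholders – the slides only describe using the web GUI):

```python
import globus_sdk

# Placeholders: a real script obtains a transfer token via a Globus OAuth2
# flow and looks up endpoint UUIDs (e.g. for nersc#dtn and alcf#dtn_mira).
TRANSFER_TOKEN = "..."
SRC_ENDPOINT = "source-endpoint-uuid"
DST_ENDPOINT = "destination-endpoint-uuid"

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN))

# Default options, as in the Petascale DTN tests: integrity checksums on.
tdata = globus_sdk.TransferData(
    tc, SRC_ENDPOINT, DST_ENDPOINT,
    label="L380 reference data set", verify_checksum=True)
tdata.add_item("/project/L380/", "/projects/L380/", recursive=True)

task = tc.submit_transfer(tdata)
print("submitted transfer, task id:", task["task_id"])
```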
DTN Cluster Performance – HPC Facilities
[Figure: the same Petascale DTN results shown earlier – November 2017, L380 data set, min/avg/max Gbps over three transfers between the NERSC, ALCF, OLCF, and NCSA DTN clusters.]
Petascale DTN Lifts All Boats
• The Petascale DTN project benefits all projects which use the HPC facility DTNs
• Modern science data portal architecture
  – Data portals which use the modern architecture benefit from DTN improvements
  – DTN scaling and improvements benefit all data portals which use the same pool
• The Globus API supports this – see the Globus World Tour
  – https://www.globusworld.org/tour/
Science Data Portals
• Large repositories of scientific data
– Climate data
– Sky surveys (astronomy, cosmology)
– Many others
– Data search, browsing, access
• Many scientific data portals were designed 15+ years ago
– Single-web-server design
– Data browse/search, data access, user awareness all in a single system
– All the data goes through the portal server
• In many cases by design
• E.g. embargo before publication (enforce access control)
– Better than old command-line FTP, but outdated by today’s standards
Legacy Portal Design
[Diagram: the legacy portal design shown earlier – one portal server (web server, search, database, authentication, data service) behind the enterprise firewall, with the browsing, query, and data paths all passing through it.]
• Very difficult to improve performance without architectural change
  – Software components all tangled together
  – Difficult to put the whole portal in a Science DMZ because of security
  – Even if you could put it in a DMZ, many components aren't scalable
• What does architectural change mean?
Next-Generation Portal Leverages Science DMZ
[Diagram: the next-generation portal design shown earlier – the portal server handles the query/browse path behind the firewall, while the data transfer path runs through a pool of API DTNs in the Science DMZ (data access governed by the portal) directly to the filesystem (data store).]
https://peerj.com/articles/cs-144/
Put The Data On Dedicated Infrastructure
• We have separated the data handling from the portal logic
• The portal is still its normal self, but enhanced
  – The portal GUI, database, search, etc. all function as they did before
  – A query returns pointers to data objects in the Science DMZ (see the sketch below)
  – The portal is now freed from ties to the data servers (run it on Amazon if you want!)
• Data handling is separate, and scalable
  – High-performance DTNs in the Science DMZ
  – Scale as much as you need to without modifying the portal software
• Outsource data handling to computing centers
  – Computing centers are set up for large-scale data
  – Let them handle the large-scale data, and let the portal do the orchestration of data placement
• https://peerj.com/articles/cs-144/ – the Modern Research Data Portal paper
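In this architecture the portal's query API returns pointers, not bytes; a minimal sketch of the idea (hypothetical names and paths, not code from the paper):

```python
# Modern-portal pattern: a query resolves a dataset to an (endpoint, path)
# pointer in the Science DMZ; the client hands that pointer to the transfer
# service (e.g. Globus), so data never flows through the portal server.
DMZ_ENDPOINT = "science-dmz-dtn-endpoint-uuid"  # hypothetical

def handle_query(dataset_id: str) -> dict:
    """Resolve a dataset to a pointer instead of streaming its contents."""
    # A real portal would authenticate the user and consult its catalog here.
    return {
        "dataset": dataset_id,
        "endpoint": DMZ_ENDPOINT,
        "path": f"/published/{dataset_id}/",
    }

pointer = handle_query("ds199.1")
print(pointer)  # the client submits a transfer from this location
```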
NCAR RDA Data Portal
• Let’s say I have a nice compute allocation at NERSC – climate science
• Let’s say I need some data from NCAR for my project
• https://rda.ucar.edu/
• Data sets (there are many more, but these are two):
• https://rda.ucar.edu/datasets/ds199.1/
• https://rda.ucar.edu/datasets/ds313.0/
• Download to NERSC (could also do ALCF or NCSA or OLCF)
[Screenshots: browsing the RDA portal and selecting the data sets.]
Portal creates a Globus transfer job for us. [Screenshot]
Submit the transfer job, then go about our business. [Screenshot]
Data Transfer from RDA Portal – Results
NCAR RDA Performance to DOE HPC Facilities
[Figure: transfers from the NCAR RDA endpoint (rda#datashare) to the NERSC (nersc#dtn), OLCF (olcf#dtn_atlas), and ALCF (alcf#dtn_mira) DTN clusters, labeled 13.9 Gbps, 16.6 Gbps, and 11.9 Gbps respectively. Data set: 1.5 TB in 1121 files.]
Advanced Light Source
ALS Science DMZ implementation
[Diagram: ALS Science DMZ implementation – the beamline camera and data acquisition system feed a short-term cache via the beamline switch; the ALS Science DMZ router connects through the two LBNL border routers (100 Gbps links) to the WAN and on to a supercomputing facility for long-term storage; beamline workstations at 10 Gbps sit on the beamline LAN, and analysis workstations at 1 Gbps sit on the campus LAN behind the ALS and campus-zone routers. Links are 1 Gbps, 10 Gbps, and 100 Gbps as labeled.]
Superfacility: Computing, experiments, networks in a single ecosystem
HipGISAXS & RMC
[Slide: GISAXS analysis of slot-die printing of organic photovoltaics. Liu et al., "Fast printing and in situ morphology …", Adv. Mater. 2015.]
Getting The Data To HPC
• The data source is one end – the HPC facility is the other
• Connecting facilities together is done by means of a network (e.g. ESnet)
• Several ways to do this from a data perspective
  – Move data to persistent storage before the job starts
    • Move data into the filesystem
    • Move data into a burst buffer
  – Remote data access during the job
    • Remote I/O
    • RDMA
Abstract HPC Facility
[Diagram: abstract HPC facility – compute nodes, burst buffer nodes, and gateway nodes on the interconnect; a DTN cluster and the filesystem/object store attach to the HPC facility network, which connects to external networks (e.g. ESnet).]
• No facility looks exactly like this
  – Abstract cartoon only
  – Contains the essential elements for reasoning about data ingest
• Gateway nodes and DTNs have external connectivity
  – DTNs are the external interface for the filesystem / object store
  – Gateway nodes provide external connectivity for interior components of the machine
• Different workflows touch different parts, with performance and software stack implications
Data Transfer Via DTNs and Filesystem
[Diagram: the same abstract facility, with the data ingest path running through the DTN cluster into the filesystem/object store and the compute access path from the compute nodes to the filesystem.]
• Common data ingest method for HPC facilities
  – Data is transferred to the filesystem via DTNs
  – Compute nodes access data from the filesystem at run time
• Persistent storage allows decoupling data ingest from the compute job
• Well-understood workflow with well-developed stacks (Globus, others)
• Filesystem performance may be an issue going forward
Data Transfer Via Burst Buffer
[Diagram: the same facility, with the data ingest path terminating on the burst buffer nodes and the compute access path from the compute nodes to the burst buffer.]
• Data is transferred to the burst buffer instead of to persistent storage
  – Higher performance than the filesystem
  – No copy on persistent storage unless made after ingest
• High-performance access from compute
• The burst buffer allows decoupling data ingest from the compute job (assuming the facility permits staging to the burst buffer)
• Tools and workflows are less developed
  – Burst buffer Globus endpoint?
  – Automation of the path through gateway nodes? (under current development)
Streaming Access From Compute Job
[Diagram: the same facility, with no staged ingest – the compute access path runs from the compute nodes through the gateway nodes directly to external networks.]
• The compute job reads data into memory (a toy sketch follows below)
  – Access from the remote end via the network
  – No data staging at all before the job runs
  – Increased flexibility: fetch what you need, when you need it, from wherever it is
• No file transfer semantics at all
• The network is in the critical path of job execution
• LHC experiments do this today using a pull model from the job (XRootD)
• Other communities are expressing interest and running trials (e.g. via RDMA)
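A toy sketch of the streaming idea in plain Python sockets (the host, port, and protocol are hypothetical – real deployments use stacks like XRootD rather than raw sockets):

```python
import socket

# Streaming access: the job pulls remote data straight into memory,
# with no staging step -- the network sits in the critical path.
def stream_chunks(host: str, port: int, chunk: int = 1 << 20):
    with socket.create_connection((host, port)) as s:
        while True:
            buf = s.recv(chunk)
            if not buf:
                return
            yield buf  # hand each chunk directly to the analysis code

# for block in stream_chunks("data.example.org", 7000):  # hypothetical
#     process(block)
```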
Long-Term Vision
[Diagram: long-term vision – a high-performance, feature-rich science network ecosystem joining ESnet (big science facilities, DOE labs), Internet2 and the regionals (US universities and affiliated institutions), international networks (universities and labs in Europe, Asia, the Americas, Australia, etc.), commercial clouds (Amazon, Google, Microsoft, etc.), agency networks (NASA, NOAA, etc.), and campus HPC + data.]
It’s All A Bunch Of Science DMZs
[Diagram: the ecosystem drawn as a collection of Science DMZs – sites with data stores, parallel filesystems, and experiment data archives each expose DTNs in their DMZs to the shared high-performance, feature-rich science network.]
It’s All A Bunch Of Science DMZs
[Diagram: the same figure, annotated by role – HPC facilities, single-lab experiments, a data portal, LHC experiments, and university computing, each participating as a Science DMZ with DTNs on the shared ecosystem.]
In conclusion – ESnet's vision:
Scientific progress will be completely unconstrained by the physical location of instruments, people, computational resources, or data.
Links
– ESnet fasterdata knowledge base: http://fasterdata.es.net/
– Science DMZ paper: http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
– Science DMZ email list: https://gab.es.net/mailman/listinfo/sciencedmz
– perfSONAR: http://fasterdata.es.net/performance-testing/perfsonar/ and http://www.perfsonar.net
Extra Slides
Real-World Example – Using perfSONAR
• Methodology is important
• Segment-to-segment testing is unlikely to be helpful
  – TCP dynamics will be different, and in this case all the pieces do not equal the whole
    • E.g. high throughput on a 1 ms path with high packet loss vs. the same segment in a longer 20 ms path
  – Problem links can test clean over short distances
  – An exception to this is hops that go through a firewall
• Run long-distance tests
  – Run the longest clean test you can, then look for the shortest dirty test that includes the path of the clean test
• In order for this to work, the testers need to be already deployed when you start troubleshooting
  – ESnet has at least one perfSONAR host at each hub location
  – Many (most?) R&E providers in the world have deployed at least one
  – Deployment of test and measurement infrastructure dramatically reduces time to resolution
    • Otherwise problem resolution is burdened by a deployment effort
[Chart: the packet-loss figure shown earlier – measured (TCP Reno), measured (HTCP), theoretical (TCP Reno), and measured (no loss) throughput across local (LAN), metro, regional, continental, and international distances; with loss, high performance beyond metro distances is essentially impossible. Source: Brian Tierney, ESnet.]
Wide Area Testing – User Problem Statement
[Diagram: a national laboratory and a university campus connected across the WAN, each with border and Science DMZ perfSONAR hosts; the reported symptom is poor performance between the two.]
Wide Area Testing – Full Context
[Diagram: the full end-to-end context – lab (~1 ms) across ESnet (~30 ms path) and Internet2 (~15 ms path) to a regional network (~2 ms) and the campus (~1 ms), with perfSONAR hosts at each boundary and 10GE/100GE links throughout; the user-visible symptom is poor performance end to end.]
Wide Area Testing – Long Clean Test
[Diagram: the same path, with a long 48 ms test between perfSONAR hosts near each end running clean and fast.]
Wide Area Testing – Dirty Tests
[Diagram: slightly longer 49 ms tests that include additional segments run dirty and slow, while the 48 ms test and the other segment tests remain clean and fast.]
Wide Area Testing – Problem Localization
[Diagram: the same results overlaid – the slow tests indicate the likely problem area.]
Lessons From The Example
• This testing can be done quickly if perfSONAR is already deployed
• Huge productivity win
  – A reasonable hypothesis is developed quickly
  – The administrative domain with the probable problem is identified
  – Testing time can be short – an hour or so at most
• Without perfSONAR, cases like this are very challenging
  – Time to resolution is measured in months (I've worked those cases too)
• In order to be useful for data-intensive science, the network must be fixable quickly, because problems happen in all networks
• The Science DMZ model allows high-performance use of the network, but perfSONAR is necessary to ensure the whole kit functions well
The Science DMZ in 1 Slide
Consists of three key components, all required:
• A "friction-free" network path
  – Highly capable network devices (wire-speed, deep queues)
  – Virtual circuit connectivity option
  – Security policy and enforcement specific to science workflows
  – Located at or near the site perimeter if possible
• Dedicated, high-performance Data Transfer Nodes (DTNs)
  – Hardware, operating system, and libraries all optimized for transfer
  – Includes optimized data transfer tools such as Globus Online and GridFTP
• Performance measurement/test node
  – perfSONAR
• To foster adoption, engagement with end users is critical
Details at http://fasterdata.es.net/science-dmz/