grid computing and the globus toolkit jennifer m. schopf argonne national lab national escience...
TRANSCRIPT
Grid Computing and the Grid Computing and the Globus ToolkitGlobus Toolkit
Jennifer M. SchopfJennifer M. SchopfArgonne National LabArgonne National Lab
National eScience CentreNational eScience Centrehttp://www.mcs.anl.gov/~jms/Talks/http://www.mcs.anl.gov/~jms/Talks/
2
What is a Grid?What is a Grid?
Resource sharing– Computers, storage, sensors, networks, …
– Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving– Beyond client-server: distributed data analysis,
computation, collaboration, … Dynamic, multi-institutional virtual orgs
– Community overlays on classic org structures
– Large or small, static or dynamic
3
Why Is this Hard or Different?Why Is this Hard or Different?
Lack of central control– Where things run
– When they run Shared resources
– Contention, variability Communication
– Different sites implies different sys admins, users, institutional goals, and often “strong personalities”
4
So Why Do It?So Why Do It?
Computations that need to be done with a time limit
Data that can’t fit on one site Data owned by multiple sites
Applications that need to be run bigger, faster, more
5
What Kinds of Applications?What Kinds of Applications? Computation intensive
– Interactive simulation (climate modeling)
– Large-scale simulation and analysis (galaxy formation, gravity waves, event simulation)
– Engineering (parameter studies, linked models) Data intensive
– Experimental data analysis (e.g., physics)
– Image & sensor analysis (astronomy, climate) Distributed collaboration
– Online instrumentation (microscopes, x-ray) Remote visualization (climate studies, biology)
– Engineering (large-scale structural testing)
6
Key Common FeatureKey Common Feature
The size and/or complexity of the problem requires that people in several organizations collaborate and share computing resources, data, instruments
8
The Role of the Globus ToolkitThe Role of the Globus Toolkit
A collection of solutions to problems that come up frequently when building collaborative distributed applications
Heterogeneity– A focus, in particular, on overcoming
heterogeneity for application developers Standards
– We capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF)
– GT also includes reference implementations of new/proposed standards in these organizations
9
Globus is an Hour GlassGlobus is an Hour Glass
Local sites have an their own policies, installs – heterogeneity!– Queuing systems, monitors,
network protocols, etc Globus unifies
– Build on Web services
– Use WS-RF, WS-Notification to represent/access state
– Common management abstractions & interfaces Local heterogeneity
Higher-Level Servicesand Users
Standard GT4Interfaces
10
On April 29, 2005 the On April 29, 2005 the Globus Alliance releasedGlobus Alliance releasedthe finest version of the the finest version of the Globus Toolkit to date!Globus Toolkit to date!
Don’t take our word for it!Read the UK eScience Evaluation of GT4
www.nesc.ac.uk/technical_papers/UKeS-2005-03.pdf(Reachable from www.globus.org, under “News”)
15
Globus is Grid InfrastructureGlobus is Grid Infrastructure
Software for Grid infrastructure– Service enable new & existing resources
– E.g., GRAM on computer, GridFTP on storage system, custom application service
– Uniform abstractions & mechanisms Tools to build applications that exploit Grid
infrastructure– Registries, security, data management, …
Open source & open standards– Each empowers the other
Enabler of a rich tool & service ecosystem
17
Globus is a Building BlockGlobus is a Building Block
Basic components for grid functionality Highest-level services are often application
specific, we let applications concentrate there
Easier to reuse than to reinvent– Compatibility with other Grid systems
comes for free We provide basic infrastructure to get you
one step closer
24
Globus is a ToolGlobus is a Tool
A Grid development environment– Develop new OGSA-compliant Web Services
– Develop applications using Java or C/C++ Grid APIs
– Secure applications using basic security mechanisms A set of basic Grid functionality
– Services and clients
– Libraries
– Development tools and examples The prerequisites for many Grid community tools
25
GT Domain AreasGT Domain Areas
Core runtime– Infrastructure for building new services
Security– Apply uniform policy across distinct systems
Execution management– Provision, deploy, & manage services
Data management– Discover, transfer, & access large data
Monitoring– Discover & monitor dynamic services
26
Data Mgmt
SecurityCommonRuntime
Execution Mgmt
Info Services
GridFTPAuthenticationAuthorization
ReliableFile
Transfer
Data Access& Integration
Grid ResourceAllocation &
ManagementIndex
CommunityAuthorization
DataReplication
CommunitySchedulingFramework
Delegation
ReplicaLocation
Trigger
Java Runtime
C Runtime
Python Runtime
WebMDS
WorkspaceManagement
Grid Telecontrol
Protocol
Globus Toolkit v4www.globus.org
CredentialMgmt
Globus Toolkit:Globus Toolkit: Open Source Grid Infrastructure Open Source Grid Infrastructure
27
Our Goals for GT4Our Goals for GT4
Usability, reliability, scalability, …– Web service components have quality equal or
superior to pre-WS components
– Documentation at acceptable quality level Consistency with latest standards (WS-*,
WSRF, WS-N, etc.) and Apache platform– WS-I Basic Profile compliant
– WS-I Basic Security Profile compliant New components, platforms, languages
– And links to larger Globus ecosystem
28
WSRF vs XML/SOAPWSRF vs XML/SOAP
The definition of WSRF means that the Grid and Web services communities can move forward on a common base
Why Not Just Use XML/SOAP?– WSRF and WS-N are just XML and SOAP– WSRF and WS-N are just Web services
Benefits of following the specs:– These patterns represent best practices that have
been learned in many Grid applications– There is a community behind them– Why reinvent the wheel?– Standards facilitate interoperability
30
GT2 vs GT4GT2 vs GT4
Pre-WS Globus is in GT4 release– Both WS and pre-WS components (ala 2.4.3) are
shipped
– These do NOT interact, but both can run on the same resource independently
Basic functionality is the same– Run a job
– Transfer a file
– Monitoring
– Security Code base is completely different
31
Why Use GT4?Why Use GT4?
Performance and reliability– Literally millions of tests and queries run against GT4
services Scalability
– Many lessons learned from GT2 have been addressed in GT4
Support– This is our active code base, much more attention
Additional functionality– New features are here– Additional GRAM interfaces to schedulers, MDS Trigger
service, GridFTP protocol interfaces, etc Easier to contribute to
32
4.0 is not a typical “.0” release,4.0 is not a typical “.0” release,but the culmination of months of testing but the culmination of months of testing
3.0.0 3.2.0
3.9.5
4.0.03.9.4
3.9.3
3.9.2
3.9.1
3.9.0
3.3.0
3.2.13.0.1
3.0.2
CVS trunk
4.0.1
Stable release branch
Development release
Stable release
4.0.2
33
Versioning and SupportVersioning and Support
Versioning– Evens are production (4.0.x, 4.2.x),
– Odds are development (4.1.x) We support this version and the one
previous– Currently we’re at 4.0.2 so we support
3.2 and 4.0
– There is also a 4.1.0 development release
34
Several Possible Next VersionsSeveral Possible Next Versions
4.0.3 – stable release– 100% same interfaces, bug fixes only– Perhaps in the fall?
4.1.1 – development release– New functionality– Likely 6-10 weeks?
4.2 - stable release– When 4.1 has “enough” new functionality,
and is stable 5.0 – substantial code base change
– With any luck, not for years :)
35
Testing OverviewTesting Overview Nightly builds and tests TestGrid at USC/ISI
– Stand up services for several weeks
– Perform stress tests TestGrid at LBNL
– Focus on WS Core performance and interoperability tests
Performance and reliability testing is a major focus– Component-specific approaches mostly
Calls for Community Testing near release time - we welcome new testing help!
36
Tested PlatformsTested Platforms
Debian Fedora Core FreeBSD HP/UX IBM AIX Red Hat Sun Solaris
SGI Altix (IA64 running Red Hat)
SuSE Linux Tru64 Unix Apple MacOS X (no
binaries) Windows – Java
components only
List of binaries and known platform-specific install bugs at
http://www.globus.org/toolkit/docs/4.0/admin/ docbook/ ch03.html
37
Documentation OverviewDocumentation Overview
Current document significantly more detailed than earlier versions– http://www.globus.org/toolkit/docs/4.0/
Tutorials available for those of you building a new service– http://www-unix.globus.org/toolkit/tutorials/BAS/
Globus® Toolkit 4: Programming Java Services (The Morgan Kaufmann Series in Networking), by Borja Sotomayor, Lisa Childers (Available through Amazon, £19.99 or $20)
38
Grid Packaging Technology (GPT)Grid Packaging Technology (GPT) Collection of XML-based packaging tools
– Straight forward definition of complex dependency and compatibility relationships between packages
– Way for developers to define the packaging data and include it as part of their source code distribution
– Automatic generation of binary packages Developer tools
– Convert a source distribution into a GPT package– Patch-n-build capability similar to RPM spec files so you
can retain their own build system if needed User Tools
– Enable collections of packages to be built and/or installed– Package manager for those systems that don't have one
Developed at NCSA
40
Installation in a nutshellInstallation in a nutshell
Quickstart guide is very useful
http://www.globus.org/toolkit/docs/4.0/ admin/docbook/quickstart.html
Verify your prereqs! Security – check spellings and permissions Globus is system software – plan
accordingly
Now that you’veNow that you’vedone your installation… done your installation… Lets talk about what you Lets talk about what you
get!get!
42
Data Mgmt
SecurityCommonRuntime
Execution Mgmt
Info Services
GridFTPAuthenticationAuthorization
ReliableFile
Transfer
Data Access& Integration
Grid ResourceAllocation &
ManagementIndex
CommunityAuthorization
DataReplication
CommunitySchedulingFramework
Delegation
ReplicaLocation
Trigger
Java Runtime
C Runtime
Python Runtime
WebMDS
WorkspaceManagement
Grid Telecontrol
Protocol
Globus Toolkit v4www.globus.org
CredentialMgmt
Globus Toolkit:Globus Toolkit: Open Source Grid Infrastructure Open Source Grid Infrastructure
43
GT4 Web Services RuntimeGT4 Web Services Runtime
Supports both GT (GRAM, RFT, Delegation, etc.) & user-developed services
Redesign to enhance scalability, modularity, performance, usability
Leverages existing WS standards– WS-I Basic Profile: WSDL, SOAP, etc.
– WS-Security, WS-Addressing Adds support for emerging WS standards
– WS-Resource Framework, WS-Notification Java, Python, & C hosting environments
– Java is standard Apache
44
What does Core give you?What does Core give you?
Reference implementation of WSRF and WS-N functions Naming and bindings (basis for virtualization)
– Every resource can be uniquely referenced and has one or more associated services for interacting
Lifecycle (basis for resilient state management)– Resources created by svcs following a factory pattern– Resource destroyed immediately or scheduled
Information model (basis for monitoring & discovery)– Resource properties associated with resources– Operations for querying and setting this info– Asynchronous notification of changes to properties
Service groups (basis for registries & collective svcs)– Group membership rules and membership management
Base fault type
45
Apache Axis Apache Axis Web Services ContainerWeb Services Container
Good news for Java WS developers: GT4.0 works with standard Axis* and Tomcat*– GT provides Axis-loadable libraries, handlers– Includes useful behaviors such as inspection,
notification, lifetime mgmt (WSRF)– Others implement GRAM, etc.
Major Globus contributions to Apache– ~50% of WS-Addressing code– ~15% of WS-Security code– Many bug fixes– WSRF code a possible next contribution
* Modulo Axis and Tomcat release cycle issues
Axis
SecurityAddressing
GTbits
Appbits
46
CustomWeb
ServicesWS-Addressing, WSRF,
WS-Notification
CustomWSRF Web
Services
GT4WSRF Web
Services
WSDL, SOAP, WS-Security
User Applications
Reg
istr
yA
dmin
istr
atio
n
GT
4 C
onta
iner
GT4 Web Services RuntimeGT4 Web Services Runtime
49
GetRP TestGetRP Test
Distributed client and service on same LAN(times in milliseconds)
GT4 - Java
GT4 - C
pyGridWare
WSRF::Lite
WSRF.NET
No Security
GT4 - Java
GT4 - C
pyGridWare
WSRF::Lite
WSRF.NET
GT4 - Java
GT4 - C
pyGridWare
WSRF::Lite
WSRF.NET
X509 Signing HTTPS
10.05
2.34
25.57
17.1
8.23
181.96
14.8
140.5
81.39
N/A11.46
2.8512.91
55.6
149.67
51
Data Mgmt
SecurityCommonRuntime
Execution Mgmt
Info Services
GridFTPAuthenticationAuthorization
ReliableFile
Transfer
Data Access& Integration
Grid ResourceAllocation &
ManagementIndex
CommunityAuthorization
DataReplication
CommunitySchedulingFramework
Delegation
ReplicaLocation
Trigger
Java Runtime
C Runtime
Python Runtime
WebMDS
WorkspaceManagement
Grid Telecontrol
Protocol
Globus Toolkit v4www.globus.org
CredentialMgmt
Globus Toolkit:Globus Toolkit: Open Source Grid Infrastructure Open Source Grid Infrastructure
52
Globus SecurityGlobus Security
Control access to shared services– Address autonomous management, e.g.,
different policy in different work-groups Support multi-user collaborations
– Federate through mutually trusted services
– Local policy authorities rule Allow users and application communities to
set up dynamic trust domains– Personal/VO collection of resources working
together based on trust of user/VO
53
Organization A Organization B
Compute Server C1Compute Server C2
Compute Server C3
File server F1 (disks A and B)
Person C(Student)
Person A(Faculty)
Person B(Staff) Person D
(Staff)Person F(Faculty)
Person E(Faculty)
Virtual Community C
Person A(Principal Investigator)
Compute Server C1'
Person B(Administrator)
File server F1 (disk A)
Person E(Researcher)
Person D(Researcher)
Virtual Organization (VO) ConceptVirtual Organization (VO) Concept
VO for each application or workload Carve out and configure resources for a particular
use and set of users
54
GT4 SecurityGT4 Security
VO
RightsUsers
Rights’
ComputeCenter
Access
Services (runningon user’s behalf)
Rights
Local policyon VO identityor attributeauthority
CAS or VOMSissuing SAMLor X.509 ACs
SSL/WS-Securitywith ProxyCertificates
Authz Callout:SAML, XACML
KCA
MyProxy
55
GT Authorization FrameworkGT Authorization Framework
56
GT4 SecurityGT4 Security Public-key-based authentication Transport- and message-level authentication Extensible authorization framework based on Web
services standards– SAML-based authorization callout
– Integrated policy decision engine> XACML policy language, per-operation policies, pluggable
Credential management service– MyProxy (One time password support)
Community Authorization Service Standalone delegation service Ability to map between Grid and local identity
57
Security ToolsSecurity Tools
Basic Grid Security Mechanisms Certificate Generation Tools Certificate Management Tools
– Getting users “registered” to use a Grid
– Getting Grid credentials to wherever they’re needed in the system
Authorization/Access Control Tools– Storing and providing access to system-
wide authorization information
58
Other Security Services Include …Other Security Services Include …
MyProxy– Simplified credential management
– Web portal integration
– Single-sign-on support KCA & kx.509
– Bridging into/out-of Kerberos domains SimpleCA
– Online credential generation PERMIS
– Authorization service callout
59
A Cautionary NoteA Cautionary Note
Grid security mechanisms are tedious to set up– If exposed to users, hand-holding is usually required
– These mechanisms can be hidden entirely from end users, but still used behind the scenes
These mechanisms exist for good reasons.– Many useful things can be done without Grid
security
– It is unlikely that an ambitious project could go into production operation without security like this
– Most successful projects end up using Grid security, but using it in ways that end users don’t see much
67
Data Mgmt
SecurityCommonRuntime
Execution Mgmt
Info Services
GridFTPAuthenticationAuthorization
ReliableFile
Transfer
Data Access& Integration
Grid ResourceAllocation &
ManagementIndex
CommunityAuthorization
DataReplication
CommunitySchedulingFramework
Delegation
ReplicaLocation
Trigger
Java Runtime
C Runtime
Python Runtime
WebMDS
WorkspaceManagement
Grid Telecontrol
Protocol
Globus Toolkit v4www.globus.org
CredentialMgmt
Globus Toolkit:Globus Toolkit: Open Source Grid Infrastructure Open Source Grid Infrastructure
68
Execution Management (GRAM)Execution Management (GRAM)
Common WS interface to schedulers– Unix, Condor, LSF, PBS, SGE, …
More generally: interface for process execution management– Lay down execution environment
– Stage data
– Monitor & manage lifecycle
– Kill it, clean up A basis for application-driven provisioning
69
GRAM - Basic Job GRAM - Basic Job Submission and Control ServiceSubmission and Control Service
A uniform service interface for remote job submission and control– Includes file staging and I/O
management– Includes reliability features– Supports basic Grid security
mechanisms– Available in Pre-WS and WS
GRAM is not a scheduler.– No scheduling– No metascheduling/brokering– Often used as a front-end to
schedulers, and often used to simplify metaschedulers/brokers
70
GT4 WS GRAMGT4 WS GRAM
2nd-generation WS implementation optimized for performance, flexibility, stability, scalability
Streamlined critical path– Use only what you need
Flexible credential management– Credential cache & delegation service
GridFTP & RFT used for data operations– Data staging & streaming output
– Eliminates redundant GASS code
71
GRAMGRAM
Intended for jobs where arbitrary programs, stateful monitoring, credential management, and file staging are important
If the application is lightweight, with modest input/output, may be a better candidate for hosting directly as a WSRF service
72
GRAMservices
GT4 Java Container
GRAMservices
Delegation
RFT FileTransfer
Transferrequest
GridFTPRemote storage element(s)
Localscheduler
Userjob
Compute element
GridFTP
sudo
GRAMadapter
FTPcontrol
Local job control
Delegate
FTP data
Cli
ent Job
functions
Delegate
Service host(s) and compute element(s)
GT4 WS GRAM ArchitectureGT4 WS GRAM Architecture
SEGJob events
73
GRAMservices
GT4 Java Container
GRAMservices
Delegation
RFT FileTransfer
Transferrequest
GridFTPRemote storage element(s)
Localscheduler
Userjob
Compute element
GridFTP
sudo
GRAMadapter
FTPcontrol
Local job control
Delegate
FTP data
Cli
ent Job
functions
Delegate
Service host(s) and compute element(s)
GT4 WS GRAM ArchitectureGT4 WS GRAM Architecture
SEGJob events
Delegated credential can be:Made available to the application
74
GRAMservices
GT4 Java Container
GRAMservices
Delegation
RFT FileTransfer
Transferrequest
GridFTPRemote storage element(s)
Localscheduler
Userjob
Compute element
GridFTP
sudo
GRAMadapter
FTPcontrol
Local job control
Delegate
FTP data
Cli
ent Job
functions
Delegate
Service host(s) and compute element(s)
GT4 WS GRAM ArchitectureGT4 WS GRAM Architecture
SEGJob events
Delegated credential can be:Used to authenticate with RFT
75
GRAMservices
GT4 Java Container
GRAMservices
Delegation
RFT FileTransfer
Transferrequest
GridFTPRemote storage element(s)
Localscheduler
Userjob
Compute element
GridFTP
sudo
GRAMadapter
FTPcontrol
Local job control
Delegate
FTP data
Cli
ent Job
functions
Delegate
Service host(s) and compute element(s)
GT4 WS GRAM ArchitectureGT4 WS GRAM Architecture
SEGJob events
Delegated credential can be:Used to authenticate with GridFTP
76
Submitting a Sample JobSubmitting a Sample Job
Specify a remote host with –F
globusrun-ws –submit –F host2 –c /bin/true
The return code will be the job’s exit code if supported by the scheduler
77
Data Staging and StreamingData Staging and Streaming
Simplest stage-in/stage-out example is stdout/stderr
globusrun-ws –S –s –c /bin/date
-S is short for “-submit” -s is short for –streaming The output will be sent back to the terminal,
control will not return until the job is done
78
Resource Specification LanguageResource Specification Language
For more complicated jobs, we’ll use RSL to specify the job
<job>
<executable>/bin/echo</executable>
<argument>this is an example_string </argument>
<argument>Globus was here</argument>
<stdout>${GLOBUS_USER_HOME}/stdout</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>
79
Resource Specification LanguageResource Specification Language
<job>
<executable>/bin/echo</executable> <directory>/tmp</directory> <argument>12</argument>
<environment><name>PI</name> <value>3.141</value></environment>
<stdin>/dev/null</stdin>
<stdout>stdout</stdout>
<stderr>stderr</stderr>
</job>
82
At Most Once SubmissionAt Most Once Submission
You may specify a UUID with your job submission
If you’re not sure the submission worked, you may submit the job again with the same UUID
If the job has already been submitted, the new submission will have no effect
If you do not specify a UUID, one will be generated for you
89
Batch SubmissionBatch Submission
Your client does not have to stay attached to the execution of the job
-batch will disconnect from the job and output an EPR– You may redirect the EPR to a file with –o
Use the EPR file with –monitor or -status You may also kill the job using -kill
90
Specifying Scheduler OptionsSpecifying Scheduler Options
RSL lets you specify various scheduler options– what queue to submit to
– which project to select for accounting
– max CPU and wallclock time to spend
– min/max memory required All defined online under the schema
document for GRAM
93
WS GRAM PerformanceWS GRAM Performance
Time to submit a basic GRAM job– Pre-WS GRAM: < 1 second– WS GRAM: 2 seconds
Concurrent jobs– Pre-WS GRAM: 300 jobs– WS GRAM: 32,000 jobs
98
Data Mgmt
SecurityCommonRuntime
Execution Mgmt
Info Services
GridFTPAuthenticationAuthorization
ReliableFile
Transfer
Data Access& Integration
Grid ResourceAllocation &
ManagementIndex
CommunityAuthorization
DataReplication
CommunitySchedulingFramework
Delegation
ReplicaLocation
Trigger
Java Runtime
C Runtime
Python Runtime
WebMDS
WorkspaceManagement
Grid Telecontrol
Protocol
Globus Toolkit v4www.globus.org
CredentialMgmt
Globus Toolkit:Globus Toolkit: Open Source Grid Infrastructure Open Source Grid Infrastructure
99
GT4 Data ManagementGT4 Data Management Stage/move large data to/from nodes
– GridFTP, Reliable File Transfer (RFT)
– Alone, and integrated with GRAM Locate data of interest
– Replica Location Service (RLS) Replicate data for performance/reliability
– Distributed Replication Service (DRS) Provide access to diverse data sources
– File systems, parallel file systems, hierarchical storage: GridFTP
– Databases: OGSA DAI
100
GridFTPGridFTP
A high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks– FTP with well-defined extensions– Uses basic Grid security (control and data channels)– Multiple data channels for parallel transfers– Partial file transfers– Third-party (direct server-to-server) transfers– Reusable data channels– Command pipelining
GGF recommendation GFD.20
101
GridFTP in GT4GridFTP in GT4
100% Globus code– No licensing issues
– Stable, extensible IPv6 Support XIO for different transports Striping multi-Gb/sec wide area transport Pluggable
– Front-end: e.g., future WS control channel
– Back-end: e.g., HPSS, cluster file systems
– Transfer: e.g., UDP, NetBLT transport
Bandwidth Vs Striping
0
2000
4000
6000
8000
10000
12000
14000
16000
18000
20000
0 10 20 30 40 50 60 70
Degree of Striping
Ba
nd
wid
th (
Mb
ps
)
# Stream = 1 # Stream = 2 # Stream = 4
# Stream = 8 # Stream = 16 # Stream = 32
Disk-to-disk onTeraGrid
102
Striped ServerStriped Server Multiple nodes work together and act as a single
GridFTP server An underlying parallel file system allows all nodes to see
the same file system and must deliver good performance (usually the limiting factor in transfer speed)
– I.e., NFS does not cut it Each node then moves (reads or writes) only the pieces
of the file that it is responsible for. This allows multiple levels of parallelism, CPU, bus, NIC,
disk, etc.
– Critical if you want to achieve better than 1 Gbs without breaking the bank
103
Striped GridFTP ServiceStriped GridFTP Service
A distributed GridFTP service that runs on a storage cluster– Every node of the cluster
is used to transfer data into/out of the cluster
– Head node coordinates transfers
Multiple NICs/internal busses lead to very high performance– Maximizes use of Gbit+
WANs
Parallel TransferFully utilizes bandwidth of
network interface on single nodes.
Striped TransferFully utilizes bandwidth of
Gb+ WAN using multiple nodes.
Par
alle
l F
iles
yste
m
Par
alle
l F
iles
yste
m
104
MODE ESPAS (Listen) - returns list of host:port pairsSTOR <FileName>
MODE ESPOR (Connect) - connect to the host-port pairsRETR <FileName>
18-Nov-03
GridFTP Striped Transfer
Host Z
Host Y
Host A
Block 1
Block 5
Block 13
Block 9
Host B
Block 2
Block 6
Block 14
Block 10
Host C
Block 3
Block 7
Block 15
Block 11
Host D
Block 4
Block 8 - > Host D
Block 16
Block 12 -> Host D
Host X
Block1 -> Host A
Block 13 -> Host A
Block 9 -> Host A
Block 2 -> Host B
Block 14 -> Host B
Block 10 -> Host B
Block 3 -> Host C
Block 7 -> Host C
Block 15 -> Host C
Block 11 -> Host C
Block 16 -> Host D
Block 4 -> Host D
Block 5 -> Host A
Block 6 -> Host B
Block 8
Block 12
105
Network ProtocolNetwork
Protocol
Typical Approach (without XIO)Typical Approach (without XIO)
Application
Disk
Network Protocol
Special Device
Protocol API
POSIX IO
Proprietary API
106
Network ProtocolNetwork
Protocol
Globus XIO ApproachGlobus XIO Approach
ApplicationDisk
Network Protocol
Special Device
Glo
bus
XIO
Driver
Driver
Driver
107
DriversDrivers Make 1 API do many types of IO Specific drivers for specific protocols/devices Transform
– Manipulate or examine data
– Do not move data outside of process space
– Compression, Security, Logging Transport
– Moves data across a wire
– TCP, UDP, File IO, Device IO
– Typically move data outside of process space
108
StackStack
Transport– Exactly one per stack
– Must be on the bottom Transform
– Zero or many per stack Control flows from user to the top
of the stack, to the transport driver.
Example Driver Stack
Compression
Logging
TCP
109
Copying Files (in a nutshell)Copying Files (in a nutshell) globus-url-copy [options] srcURL dstURL guc gsiftp://localhost/foo file:///bar
– Client/server, using FTP stream mode guc –vb –dbg –tcp-bs 1048576 –p 8
gsiftp://localhost/foo gsiftp://localhost/bar– 3rd party transfer, MODE E
guc https://host.domain.edu/foo ftp://host.domain.gov/bar– from secure http to ftp server
110
The Options: Improving PerformanceThe Options: Improving Performance
-p (parallelism or number of streams)– rule of thumb 4-8, start with 4
-tcp-bs (TCP buffer size)– use either ping or traceroute to determine the
RTT between hosts
– buffer size = BW (Mbs) * RTT (ms) *1000/8/<(parallelism value – 1)>
– If that is still too complicated use 2MB -vb if you want performance feedback -dbg if you have trouble
111
Tuning GridFTPTuning GridFTP
Many ways you can tune the performance
Two sources of data arehttp://www.globus.org/toolkit/docs/4.0/data/gridftp/rn01re01.htmlhttp://www.nsf-middleware.org/OnTheGrid
/ 2004-09-MaxGridFTP.pdf
116
RFT - File Transfer QueuingRFT - File Transfer Queuing
A WSRF service for queuing file transfer requests– Server-to-server transfers
– Checkpointing for restarts
– Database back-end for failovers Allows clients to requests transfers and
then “disappear”– No need to manage the transfer
– Status monitoring available if desired
117
Reliable File Transfer:Reliable File Transfer:Third Party TransferThird Party Transfer
RFT Service
RFT Client
SOAP Messages
Notifications(Optional)
DataChannel
Protocol Interpreter
MasterDSI
DataChannel
SlaveDSI
IPCReceiver
IPC Link
MasterDSI
Protocol Interpreter
Data Channel
IPCReceiver
SlaveDSI
Data Channel
IPC Link
GridFTP Server GridFTP Server
Fire-and-forget transfer Web services interface Many files & directories Integrated failure recovery Has transferred 900K files
118
Replica Location ServiceReplica Location Service
Identify location of files via logical to physical name map
Distributed indexing of names, fault tolerant update protocols
GT4 version scalable & stable
Managing ~40 million files across ~10 sites
IndexIndex
Local DB
Update send (secs)
Bloom filter
(secs)
Bloom filter (bits)
10K <1 2 1 M
1 M 2 24 10 M
5 M 7 175 50 M
119
Cardiff
AEI/Golm
Birmingham•
Reliable Wide Area Data Reliable Wide Area Data ReplicationReplication
Replicating >1 Terabyte/day to 8 sites>30 million replicas so farMTBF = 1 month
LIGO Gravitational Wave Observatory
www.globus.org/solutions
121
OGSA-DAIOGSA-DAI
Provide service-based access to structured data resources as part of Globus
Specify a selection of interfaces tailored to various styles of data access—starting with relational and XML
122MySQL
OGSA-DAI service
Engine
SQLQuery
JDBCData
Resources
Activities
DB2
The OGSA-DAI FrameworkThe OGSA-DAI Framework
GZip GridFTPXPath
XMLDB
XIndice
readFile
File
SWISSPROT
XSLT
SQLServer
Data-bases
ApplicationApplicationClient ToolkitClient Toolkit
124
OGSA-DAI: A Framework OGSA-DAI: A Framework for Building Applicationsfor Building Applications
Supports data access, insert and update– Relational: MySQL, Oracle, DB2, SQL Server, Postgres– XML: Xindice, eXist– Files – CSV, BinX, EMBL, OMIM, SWISSPROT,…
Supports data delivery– SOAP over HTTP– FTP; GridFTP– E-mail– Inter-service
Supports data transformation– XSLT– ZIP; GZIP
Supports security– X.509 certificate based security
125
OGSA-DAI: Other FeaturesOGSA-DAI: Other Features
A framework for building data clients– Client toolkit library for application developers
A framework for developing functionality– Extend existing activities, or implement your own
– Mix and match activities to provide functionality you need
Highly extensible– Customise our out-of-the-box product
– Provide your own services, client-side support, and data-related functionality
128
Data Mgmt
SecurityCommonRuntime
Execution Mgmt
Info Services
GridFTPAuthenticationAuthorization
ReliableFile
Transfer
Data Access& Integration
Grid ResourceAllocation &
ManagementIndex
CommunityAuthorization
DataReplication
CommunitySchedulingFramework
Delegation
ReplicaLocation
Trigger
Java Runtime
C Runtime
Python Runtime
WebMDS
WorkspaceManagement
Grid Telecontrol
Protocol
Globus Toolkit v4www.globus.org
CredentialMgmt
Globus Toolkit:Globus Toolkit: Open Source Grid Infrastructure Open Source Grid Infrastructure
129
Monitoring and Discovery SystemMonitoring and Discovery System(MDS4)(MDS4)
Grid-level monitoring system – Aid user/agent to identify host(s) on which to run an
application
– Warn on errors Uses standard interfaces to provide publishing of
data, discovery, and data access, including subscription/notification– WS-ResourceProperties, WS-BaseNotification, WS-
ServiceGroup Functions as an hourglass to provide a common
interface to lower-level monitoring tools
130
Standard Schemas(GLUE schema, eg)
Information Users :Schedulers, Portals, Warning Systems, etc.
Cluster monitors(Ganglia, Hawkeye,Clumon, and Nagios) Services
(GRAM, RFT, RLS)
Queuing systems(PBS, LSF, Torque)
WS standard interfaces for subscription, registration, notification
131
MDS4 ComponentsMDS4 Components
Information providers– Monitoring is a part of every WSRF service– Non-WS services are also be used
Higher level services– Index Service – a way to aggregate data– Trigger Service – a way to be notified of changes– Both built on common aggregator framework
Clients– WebMDS
All of the tool are schema-agnostic, but interoperability needs a well-understood common language
132
Information ProvidersInformation Providers
Data sources for the higher-level services Some are built into services
– Any WSRF-compliant service publishes some data automatically
– WS-RF gives us standard Query/Subscribe/Notify interfaces
– GT4 services: ServiceMetaDataInfo element includes start time, version, and service type name
– Most of them also publish additional useful information as resource properties
133
Information Providers (2)Information Providers (2)
Other sources of data – Any executables– Other (non-WS) services– Interface to another archive or data
store– File scraping
Just need to produce a valid XML document
134
Information Providers:Information Providers:GT4 ServicesGT4 Services
Reliable File Transfer Service (RFT)– Service status data, number of active transfers,
transfer status, information about the resource running the service
Community Authorization Service (CAS)– Identifies the VO served by the service instance
Replica Location Service (RLS)– Note: not a WS– Location of replicas on physical storage systems
(based on user registrations) for later queries
135
Information Providers:Information Providers:Cluster and Queue DataCluster and Queue Data
Interfaces to Hawkeye, Ganglia, CluMon, Nagios– Basic host data (name, ID), processor information,
memory size, OS name and version, file system data, processor load data
– Some condor/cluster specific data
– This can also be done for sub-clusters, not just at the host level
Interfaces to PBS, Torque, LSF– Queue information, number of CPUs available and
free, job count information, some memory statistics and host info for head node of cluster
136
Higher-Level ServicesHigher-Level Services
Index Service– Caching registry
Trigger Service– Warn on error conditions
Archive Service– Database store for history (in development)
All of these have common needs, and are built on a common framework
137
Common Aggregator FrameworkCommon Aggregator Framework
Basic framework for higher-level functions– Subscribe to Information Provider(s)
– Do some action
– Present standard interfaces
138
Aggregator Framework FeaturesAggregator Framework Features
1) Common configuration mechanism– Specify what data to get, and from where
2) Self cleaning – Services have lifetimes that must be refreshed
3) Soft consistency model– Published information is recent, but not
guaranteed to be the absolute latest
4) Schema Neutral– Valid XML document needed only
139
MDS4 Index ServiceMDS4 Index Service
Index Service is both registry and cache– Datatype and data provider info, like a registry
(UDDI)– Last value of data, like a cache
In memory default approach– DB backing store currently being developed to
allow for very large indexes Can be set up for a site or set of sites, a
specific set of project data, or for user-specific data only
Can be a multi-rooted hierarchy– No *global* index
140
MDS4 Trigger ServiceMDS4 Trigger Service
Subscribe to a set of resource properties Evaluate that data against a set of pre-
configured conditions (triggers) When a condition matches, action occurs
– Email is sent to pre-defined address
– Website updated
Similar functionality in Hawkeye
141
WebMDS User InterfaceWebMDS User Interface
Web-based interface to WSRF resource property information
User-friendly front-end to Index Service Uses standard resource property requests to
query resource property data XSLT transforms to format and display them Customized pages are simply done by using
HTML form options and creating your own XSLT transforms
Sample page:– http://mds.globus.org:8080/webmds/webmds?
info=indexinfo&xsl=servicegroupxsl
142
143
Working with TeraGridWorking with TeraGrid
Large US project across 9 different sites– Different hardware, queuing systems and
lower level monitoring packages Starting to explore MetaScheduling
approaches– GRMS (Poznan)– W. Smith (TACC)– K. Yashimoto (SDSC)– User Portal
Need a common source of data with a standard interface for basic scheduling info
144
Data CollectedData Collected
Provide data at the subcluster level– Sys admin defines a subcluster, we query one
node of it to dynamically retrieve relevant data Can also list per-host details Interfaces to Ganglia, Hawkeye, CluMon, and
Nagios available now– Other cluster monitoring systems can write into
a .html file that we then scrape Also collect basic queuing data, some TeraGrid
specific attributes
145
146
Scalability ExperimentsScalability Experiments
MDS index– Dual 2.4GHz Xeon processors, 3.5 GB RAM– Sizes: 1, 10, 25, 50, 100
Clients– 20 nodes also dual 2.6 GHz Xeon, 3.5 GB RAM– 1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256, 384, 512, 640,
768, 800 Nodes connected via 1Gb/s network Each data point is average of 8 minutes
– Ran for 10 mins but first 2 spent getting clients up and running
– Error bars are SD over 8 mins Experiments by Ioan Raicu, U of Chicago, using DiPerf
147
149
Data Mgmt
SecurityCommonRuntime
Execution Mgmt
Info Services
GridFTPAuthenticationAuthorization
ReliableFile
Transfer
Data Access& Integration
Grid ResourceAllocation &
ManagementIndex
CommunityAuthorization
DataReplication
CommunitySchedulingFramework
Delegation
ReplicaLocation
Trigger
Java Runtime
C Runtime
Python Runtime
WebMDS
WorkspaceManagement
Grid Telecontrol
Protocol
Globus Toolkit v4www.globus.org
CredentialMgmt
Globus Toolkit:Globus Toolkit: Open Source Grid Infrastructure Open Source Grid Infrastructure
150
The Globus EcosystemThe Globus Ecosystem
Globus components address core issues relating to resource access, monitoring, discovery, security, data movement, etc.– GT4 being the latest version
A larger Globus ecosystem of open source and proprietary components provide complementary components– A growing list of components
These components can be combined to produce solutions to Grid problems– We’re building a list of such solutions
151
Many Tools Build on, or Can Many Tools Build on, or Can Contribute to, GT4-Based Grids Contribute to, GT4-Based Grids
Condor-G, DAGman MPICH-G2 GRMS Nimrod-G Ninf-G Open Grid Computing Env. Commodity Grid Toolkit GriPhyN Virtual Data System Virtual Data Toolkit GridXpert Synergy Platform Globus Toolkit
VOMS PERMIS GT4IDE Sun Grid Engine PBS scheduler LSF scheduler GridBus TeraGrid CTSS NEES IBM Grid Toolbox …
152
GlobalGlobalCommunityCommunity
155
Example SolutionsExample Solutions
Portal-based User Reg. System (PURSE) VO Management Registration Service Service Monitoring Service TeraGrid TGCP Tool Lightweight Data Replicator GriPhyN Virtual Data System
156
Condor-GCondor-G
The Condor Project @ U Wisconsin Madison develops software for high-throughput computing on collections of distributed compute resources
Condor-G is an interface to GRAM created by the Condor team that allows users to submit jobs to GRAM servers
159
MPICH-G2MPICH-G2
MPICH-G2, developed at Northern Illinois University and Argonne National Lab, is a grid-enabled implementation of the MPI v1.1 standard
MPICH-G2 is implemented using the pre-WS GRAM component in GT4; integration with GT4 WS GRAM is expected in the near future
163
SRBSRB
SRB is a package from SDSC providing a uniform interface for connecting to network-based heterogeneous data resources
GT4’s GridFTP includes an interface to SRB data sources, and vice versa
165
Tells Us About YourTells Us About YourGrid Tools & Solutions Grid Tools & Solutions
We list links to related projects on the “Related Software” of the Globus Toolkit web www.globus.org/toolkit/tools/
“Solutions” are documented on the Globus web www.globus.org/solutions/
If we’ve got details wrong or you have a GT4-related tool to list on our website, please send mail to [email protected]
166
The Globus Commitment The Globus Commitment to Open Sourceto Open Source
Globus was first established as an open source project in 1996
The Globus Toolkit is open source to:– allow for inspection
> for consideration in standardization processes
– encourage adoption> in pursuit of ubiquity and interoperability
– encourage contributions> harness the expertise of the community
The Globus Toolkit is distributed under the (BSD-style) Apache License version 2
167
The Future:The Future:StructureStructure
NSF Community Driven Improvement of Globus Software (CDIGS) project– 5 years of funding for GT enhancement
GlobDev http://dev.globus.org– Globus Development Envionrment
168
Why is Globus Software Why is Globus Software Open Source?Open Source?
To allow for inspection– For consideration in standardization
processes
To encourage adoption– In pursuit of ubiquity and interoperability
To encourage contributions– Harness the expertise of the community
169
Open ContributionOpen Contribution
But distributing code under an open source license does not guarantee open development!
Open development requires open processes
So we have created dev.globus to facilitate contributions– http://dev.globus.org/
170
Governance ModelGovernance Model
Based on Apache Jakarta– Individual development efforts organized as
projects
– Consensus-based decision making
Control over each project in the hands of its most active and respected contributors (committers)
Globus Management Committee (GMC) providing overall guidance and conflict resolution
171
Common InfrastructureCommon Infrastructure
Code repositories (CVS, SVN) Mailing lists
– *-dev, *-user, *-announce, *-commit for every project
Issue tracking (bugzilla)– Including roadmap info for future development
Wikis Known interactions for people accessing your
project
172
SampleSample
http://dev.globus.org/wiki/GRAM
173
174
175
176
177
Technology ProjectsTechnology Projects
Common runtime projects– C Core Utilities, C WS Core, CoG jglobus, Core WS Schema,
Java WS Core, XIO Data projects
– GridFTP, Reliable File Transfer, Replica Location, Data Replication
Execution projects– GRAM, Dynamic accounts, Virtual workspaces
Information services projects– MDS4
Security Projects – C Security, CAS/SAML Utilities, Delegation Service, MyProxy
178
Non-Technolgy ProjectsNon-Technolgy Projects
Distribution Projects – Globus Toolkit Distribution
– Process was used for April 4.0.2 release Documentation Projects
– GT Release Manuals Incubation Projects
– Incubation management project
– And any new projects wanting to join
179
Incubator Process in dev.globusIncubator Process in dev.globus
Entry point for new Globus projects Incubator Management Project (IMP)
– Oversees incubator process form first contact to becoming a Globus project
– Quarterly reviews of current projects
– Process being debugged by “Incubator Pioneers”
http://dev.globus.org/wiki/IncubatorDraft
180
Incubator Process (1 of 3)Incubator Process (1 of 3)
Project proposes itself as a Candidate– A proposed name for the project; – A proposed project chair, with contact info;– A list of the proposed committers for the
project; – An overview of the aims of the project; – An overview of any current user base or user
community, if applicable;– An overview of how the project relates to
other parts of Globus; – A summary of why the project would enhance
and benefit Globus.
181
Incubator Process (2 of 3)Incubator Process (2 of 3)
IMP meet, discuss, and accept project as a ProtoProject– ProtoProject now part of the Incubator
framework– Get assigned a Mentor to help
>Member of IMP>Bridge between Globus and new
ProtoProject– Opportunity to get up to speed on Globus
Development process
182
Incubator Process (3 of 3)Incubator Process (3 of 3)
Quarterly reviews by IMP determine– Stay a ProtoProject
– Retire
– Escalate to a full Globus project
Escalation when ProtoProject passes checklist– Legal
– Meritocracy
– Alignment/Synergy
– Infrastructure
184
You Can Begin Participating You Can Begin Participating in Globus Development Today!in Globus Development Today!
Monitor and comment on Globus development discussions; recent threads include:– GT Backward Compatibility ([email protected])
– 4.2 Wish List for GRAM ([email protected])
Submit bug fixes and other contributions Start your own Globus project!
186
The Future:The Future:ContentContent
We now have a solid and extremely powerful Web services base
Next, we will build an expanded open source Grid infrastructure– Virtualization
– New services for provisioning, data management, security, VO management
– End-user tools for application development
– Etc., etc. And of course responding to user requests for
other short-term needs
189
Opportunities for CollaborationOpportunities for Collaboration
Use of Globus software– Feedback & involvement in design
Development of new Globus components– Help an existing project
> New information providers to enable use of GT to manage an entire Grid
> GRAM interfaces, etc
– Become an incubator project New applications and tools
– E.g., Grid operations, emergency response, ecogrid, bioinformatics, …
191
CreditsCredits
Globus Toolkit v4 is the work of many talented Globus Alliance members, at– Argonne Natl. Lab & U.Chicago
– USC Information Sciences Corporation
– National Center for Supercomputing Applns
– U. Edinburgh
– Swedish PDC
– Univa Corporation
– Other contributors at other institutions Supported by DOE, NSF, UK EPSRC, and other
sources
192
Some Useful PointersSome Useful Pointers
UK ETF GT4 reporthttp://www.nesc.ac.uk/technical_papers/ UKeS-2005-
03.pdf GRAM command line man page
http://www.globus.org/toolkit/docs/4.0/ execution/wsgram/rn01re01.html
GridFTP command line man page
http://www.globus.org/toolkit/docs/4.0/data/gridftp/rn01re01.html
MDS4 TG site
http://snipurl.com/j24r
193
For More InformationFor More Information
Jennifer Schopf– [email protected]
– www.mcs.anl.gov/~jms Globus Alliance
– www.globus.org GlobusWORLD 2006
– September 10-14
– Washington, D.C. 2nd Edition
www.mkp.com/grid2