Experiences with NMI at Michigan
Shawn McKee
October 1, 2004
NMI/SURA Testbed Workshop
Outline
A little history: NMI at Michigan
About our environment and motivations
Comments on some middleware components
Issues for Middleware at Michigan
Outlook and Summary
History: Michigan as an “Honorary” SURA Member!
Michigan proposed to join the NMI/SURA testbed as soon as we heard about the opportunity
Michigan has a long history of work in this area: LDAP, NSFNet, AFS/IFS, KX509, CoSign, CHEF/OGCE/Sakai, NFS V4, …
We were starting up a campus-wide initiative called “MGRID” (Michigan Grid Research and Infrastructure Development)…NMI fit perfectly into our plans and interests
We were accepted into the testbed as its northern-most member…
Campus Research and Grid Motivation
Michigan is a major research institution with a large, varied mix of researchers.
Many of our departments make extensive use of computing/storage/network resources and are always requiring more, for the same (or less) cost…
Many of our researchers are part of larger national or international collaborations.
Grid computing and NMI middleware help us to optimize our existing resources and plug us in to developing national and international efforts.
This is likely the case for most Universities around the country…
Research Funding at Michigan…
Some More Context
Michigan, through our MGRID initiative, has been adapting and adopting middleware to enable our distributed resources
NMI has been a key component of our work
Portals seem to be the key to enabling transparent access to various resources
We are building out for our future needs: tools like KX.509 and our XML Grid Accounting are being augmented with additional components like Walden and new applications like NTAP…
MGRID – www.mgrid.umich.edu
A center to develop, deploy, and sustain an institutional grid at Michigan
Many groups across the University participate in compute/data/network-intensive research grants – increasingly Grid is the solution
• ATLAS, NPACI, NEESGrid, Visible Human, NFSv4, NMI
MGRID allows work on common infrastructure instead of custom solutions
Middleware, like NMI, makes it possible
NMI Components
The NMI package consists of many components
Michigan used many of the components in our work on MGRID and with various application domains
KX.509 was central to much of our work bridging our Kerberos users to X509 (PKI) space
Grid components (Globus, Condor, etc.) were the primary means to make resources accessible
Many NMI components were included in VDT, Grid3 and NPACI Rocks distributions
One Application Domain Perspective
I would also like to comment a bit from my (admittedly biased) application perspective
As a high-energy physicist I need to worry about accessing and processing LOTS of data, globally
In less than 3 years the LHC collider will begin to run and our ATLAS experiment will need to make ~10 Petabytes/year of data available to ~2000 physicists worldwide.
To handle this we need all the resources we can get…middleware is the basis for making these resources accessible and usable.
ATLAS (www.usatlas.bnl.gov)
A Toroidal LHC ApparatuS
Collaboration
• 150 institutes
• 1850 physicists
Detector
• Inner tracker
• Calorimeter
• Magnet
• Muon
United States ATLAS
• 29 universities, 3 national labs
• 20% of ATLAS
Data Grids for High Energy Physics
[Diagram: the tiered LHC computing model. The online system takes ~PByte/sec off the detector and feeds the offline farm at the CERN Computer Center (Tier 0+1, ~25 TIPS) at ~100-400 MBytes/sec. Tier 1 centers (BNL, plus France, Italy, and UK) connect over ~10-40 Gbits/sec links; Tier 2 centers connect at ~10 Gbps; institutes (~0.25 TIPS) and Tier 3/4 workstations connect at 100-10000 Mbits/sec, each with a physics data cache. CERN/Outside resource ratio ~1:4; Tier0/(ΣTier1)/(ΣTier2) ~1:2:2. Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels.]
The Problem
Abort, retry, fail?
The Solution
Building Upon NMI
Middleware is the glue that gives applications easy access to resources, data, and instruments
Portals organize the middleware while hiding complexity
Grid Portal Work
[Architecture diagram: a browser on the user workstation (kinit, kx509, libpkcs11) connects over SSL with a client certificate required to Apache (mod_ssl, mod_kx509, mod_kct, mod_jk) fronting Tomcat, which runs the CHEF-based MGRID Portal. Kerberos V5 components (KDC, KCA, KCT) supply credentials; the portal reaches the GateKeeper/Resource Manager, Service, and Grid Service via GSI and Kerberos, and consults the Walden LDAP service over SASL.]
We would like to propose these for NMI R5+!
MGRID Accounting
Step 1: Grid scheduling software (e.g. PBSPro, Condor) generates usage log files in various formats
Step 2: MGRID Accounting translates usage log files into common XML format (http://www.psc.edu/~lfm/Grid/UR-WG/)
Step 3: MGRID Accounting ingests the data into a MySQL database for report generation and review (the full pipeline is sketched below)
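To make the three steps concrete, here is a minimal Python sketch of the pipeline. The log line and its field names are made up, the XML element names only approximate the UR-WG schema, and sqlite3 stands in for the MySQL instance MGRID actually uses:

```python
# Illustrative sketch of the three MGRID accounting steps (hypothetical
# field names; the real UR-WG schema and scheduler log formats differ).
import sqlite3
import xml.etree.ElementTree as ET

# Step 1 (input): one record as a PBS-style accounting line -- a
# simplified stand-in, not the exact PBSPro syntax.
log_line = "job=1234.head user=smckee walltime=3600 start=1096588800"

# Step 2: translate the log record into a UR-WG-like XML usage record.
fields = dict(kv.split("=") for kv in log_line.split())
rec = ET.Element("UsageRecord")          # element names approximate UR-WG
ET.SubElement(rec, "JobIdentity").text = fields["job"]
ET.SubElement(rec, "UserIdentity").text = fields["user"]
ET.SubElement(rec, "WallDuration").text = fields["walltime"]
xml_text = ET.tostring(rec, encoding="unicode")

# Step 3: ingest into a database for reporting (sqlite3 stands in here
# for the MySQL database used on MGRID).
db = sqlite3.connect("usage.db")
db.execute("CREATE TABLE IF NOT EXISTS usage "
           "(job TEXT, user TEXT, wall INTEGER, record TEXT)")
db.execute("INSERT INTO usage VALUES (?, ?, ?, ?)",
           (fields["job"], fields["user"], int(fields["walltime"]), xml_text))
db.commit()
```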
Accounting Example on MGRID
Through our portal we can easily select and display accounting information for MGRID resources
MGRID Walden Authorization
Fine-Grained authorization module based on XACML standard (XACML-based policy engine)
Cluster owners have complete administrative control over who uses their resources
Policy files define rules based on group membership, time of day, resource load, etc. (see the sketch below)
Local account management is unnecessary
Group membership can be assigned from one or several secure LDAP servers
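As a rough illustration of the kinds of rules listed above, here is a minimal Python sketch of a group-plus-time-of-day decision. Walden itself evaluates XACML policy files; this is not its implementation, and the group names and hours here are hypothetical:

```python
# Minimal sketch of a Walden-style authorization decision: the real
# engine evaluates XACML policies; this only mirrors the rule types
# mentioned above (group membership, time of day).
from datetime import datetime

policy = {
    "allowed_groups": {"atlas", "mgrid-users"},   # e.g. drawn from secure LDAP
    "allowed_hours": range(8, 18),                # cluster open 08:00-18:00
}

def authorize(user_groups, now=None):
    """Permit if the user is in an allowed group during allowed hours."""
    now = now or datetime.now()
    in_group = bool(policy["allowed_groups"] & set(user_groups))
    in_hours = now.hour in policy["allowed_hours"]
    return "Permit" if (in_group and in_hours) else "Deny"

# Group memberships would come from one or several secure LDAP servers.
print(authorize(["atlas", "physics"]))
```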
Flowchart for Walden
NTAP: Network Testing and Performance
Purpose: provide a secure and extensible network testing and performance tool invocation service at U-M
Service based on Globus
Has modular, fine-grained authorization
• Added signed group membership(s) to reservation data
• Now provides two authorization methods:
- Keynote policy engine / AFS PTS group service
- PERMIS policy engine / LDAP group service
Runs on dedicated nodes attached to routers in a VLAN environment
MGRID NTAP Project: http://www.citi.umich.edu/projects/ntap/
NTAP Architecture
[Architecture diagram: the user workstation (browser, kinit, kx509, libpkcs11) connects to the portal host, where Apache runs mod_ssl, mod_kx509, mod_kct, mod_jp, and mod_php; Kerberos V5 components (KDC, KCA, KCT) and LDAP handle authentication and authorization; each PMP host runs a GateKeeper/Resource Manager with test tools such as iperf; a pilot process produces the network topology output.]
1. The user authenticates to the portal host via kx.509 and submits a network test request.
2. The portal host constructs a path between the specified endpoints, issues test reservations, and updates the output database.
3. PMPs (Performance Monitoring Platforms) on the test path run performance tests between pairs of routers (sketched below).
4. The portal host displays the results.
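A minimal sketch of what step 3 amounts to, assuming hypothetical PMP hostnames and ignoring the gatekeeper/reservation machinery NTAP actually uses to launch the tests:

```python
# Sketch of step 3: run pairwise tests along the constructed path. In
# NTAP the client runs on each PMP via a Globus gatekeeper; locally we
# can only illustrate the iperf client invocation itself.
import subprocess

# Hypothetical PMP hosts along the constructed path.
path = ["pmp-router1.umich.edu", "pmp-router2.umich.edu", "pmp-router3.umich.edu"]

results = {}
for src, dst in zip(path, path[1:]):
    # Test each consecutive router pair on the path.
    proc = subprocess.run(["iperf", "-c", dst], capture_output=True, text=True)
    results[(src, dst)] = proc.stdout

for pair, output in results.items():
    print(pair, output.splitlines()[-1] if output else "no output")
```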
GridNFS (NMI Development): http://www.citi.umich.edu/projects/
Michigan has been funded to develop GridNFS, a middleware solution that extends distributed file system technology and flexible identity management techniques to meet the needs of grid-based virtual organizations.
The foundation for data sharing in GridNFS is NFS version 4
The challenges of authentication and authorization in GridNFS are met with X.509 credentials
In tying these middleware technologies together in the way we propose, we fill the gap for two vital, missing capabilities:
• Transparent and secure data management integrated with existing grid authentication and authorization tools
• Scalable and agile name space management for establishing and controlling identity in virtual organizations and for specifying their data resources
GridNFS is a new approach that extends “best of breed” Internet technologies with established Grid architectures and protocols to meet these immediate needs
Some Comments about Select NMI Components
In the next few slides I want to discuss our experiences with a few specific components
Overall the NMI components have been indispensable for our activities at Michigan
There are numerous NMI-EDIT components for information management and organization that I won’t cover in detail, though these are required to make progress on inter-institutional collaboration and resource sharing
Globus Experiences
We had already been using Globus since V1.1.3 for our work on the US ATLAS testbed
The NMI release was nice because of the GPT packaging, which made installation trivial.
There were some issues with configuration and coexistence:
• Had to create a separate NMI gatekeeper so as not to impact our production grid users
• No major issues found beyond that…Globus just worked
Our primary Globus installation was via the Grid3 package for ATLAS
Condor-G
Condor was already in use at our site and in our testbed.
Condor-G installed over existing Condor installations produced some problems:
• Part of the difficulty was not understanding the details of the difference between Condor and Condor-G
• A file ($LOG/.schedd_address) was owned by root rather than the condor user and this “broke” Condor-G. Resolved via the testbed support list
Condor-G has evolved over the life of the testbed and is an integral part of our ATLAS Data Challenge infrastructure
Network Weather Service (NWS)
Installation was trivial via GPT (server/client bundles)
Interesting product for us. We have done significant work with monitoring.
NWS advantages:
• Easy to automate network testing, once you understand the config details
• Prediction of future values of resources is fairly unique and potentially useful for grid scheduling (illustrated below)
NWS disadvantages:
• Difficult user interface (relatively obscure syntax to access measured/predicted data)
Our REU student may take up an NWS-related project
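To illustrate the prediction idea (not the actual NWS algorithm), here is a small Python sketch of an adaptive forecaster that keeps several simple predictors and uses whichever has accumulated the least error on the measurements so far:

```python
# Sketch of the kind of adaptive prediction NWS performs: maintain
# several simple forecasters and report the one with the lowest
# one-step-ahead error over the measurement history.

def last_value(history):
    return history[-1]

def mean(history):
    return sum(history) / len(history)

def sliding_mean(history, window=5):
    return sum(history[-window:]) / len(history[-window:])

forecasters = [last_value, mean, sliding_mean]

def predict(history):
    """Score each forecaster against past measurements, then let the
    best one predict the next value."""
    errors = {f: 0.0 for f in forecasters}
    for i in range(1, len(history)):
        for f in forecasters:
            errors[f] += abs(f(history[:i]) - history[i])
    best = min(forecasters, key=lambda f: errors[f])
    return best(history)

bandwidth = [92.1, 90.4, 88.7, 91.2, 89.9, 90.8]   # Mbits/sec samples
print(predict(bandwidth))
```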
KX509 for Enabling Access
The University of Michigan has around 200,000 active “uniqnames” in its Kerberos authentication system. It is not feasible to replicate this into other systems and so we have developed KX509 for translation to PKI space.
Our MGRID portal and gatekeepers are all configured to utilize KX509-generated credentials derived from our users’ normal Kerberos identities.
This makes authentication trivial for our installed user base.
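From the user's point of view the flow is just two commands; a minimal Python sketch, assuming a default site configuration (real invocations may need site-specific options):

```python
# Sketch of the KX509 flow: obtain a Kerberos TGT, then convert it into
# a short-lived X.509 credential. kinit and kx509 are the real tools;
# options are omitted because site configurations vary.
import subprocess

def kerberos_to_x509(principal):
    # 1. Authenticate to the campus KDC (prompts for the password).
    subprocess.run(["kinit", principal], check=True)
    # 2. Ask the KCA for a short-lived X.509 certificate tied to the
    #    Kerberos identity; kx509 stores it where grid tools can find it.
    subprocess.run(["kx509"], check=True)

kerberos_to_x509("smckee")
```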
GSI OpenSSH
Useful program to extend functionality of PKI to OpenSSH.
Allows “automatic” interactive login to proxy holders based upon Globus mapfile entries
Simple to install. In principle a superset of OpenSSH on the server end
We had a problem with a conflict in the dynamic libraries it installs on a non-NMI host
Very convenient in conjunction with KX509
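To show what the mapfile-based login decision looks like, here is a small Python sketch that resolves a certificate subject to a local account. The DN entries are made up, though the quoted-DN-plus-local-user line format follows the Globus grid-mapfile convention:

```python
# Sketch of how a gatekeeper-style service maps a certificate subject
# (DN) to a local account using grid-mapfile entries. The DNs below are
# hypothetical.
SAMPLE_MAPFILE = '''
"/O=Grid/OU=umich.edu/CN=Shawn McKee" smckee
"/O=Grid/OU=umich.edu/CN=Example User" exuser
'''

def map_subject(dn, mapfile_text=SAMPLE_MAPFILE):
    for line in mapfile_text.strip().splitlines():
        quoted_dn, _, local_user = line.rpartition(" ")
        if quoted_dn.strip('"') == dn:
            return local_user
    return None   # no entry: access denied

print(map_subject("/O=Grid/OU=umich.edu/CN=Shawn McKee"))   # -> smckee
```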
Campus Grid Implementation
Our MGRID challenge has been how to develop and enable a usable, deployable grid infrastructure across different academic/administrative divisions within the University
A key aspect of the challenge is the NMI components, which are intended to “standardize” much of the needed functionality around information flow, authentication, authorization, monitoring and resource delivery
Delivering something which is as easy to use and deploy as possible is very important…
Distribution and Installation
As we started to integrate NMI components and extend and develop our own concepts, we ran into a major issue: others want to use/take advantage of what we are delivering.
Many of you likely realize the complexity which can surround the installation/configuration of even a single grid component, let alone a complete system involving many components.
Our plans are to provide PacMan distributions of our software as well as CDs for “bare metal” installs. This is a critically important (and just beginning) effort for us, especially as more users on campus start asking “How can I participate/take advantage of MGRID?”
Ease of Use and Adoption
One thing we realized early was a requirement that any grid solution we developed be easy to adopt and use.
MGRID choices have been strongly influenced by this overriding concept:
• Using a portal to provide client capability
• Leveraging existing authentication and information services as much as possible
• Providing tools and an environment for our “virtual grid computer” similar to what a single workstation provides for its users
Thus “Ease of Use and Adoption” is not just for Users but for Administrators and Managers as well!
Authorization
Some of the hardest issues MGRID is facing are related to authorization.
We are tracking packages like PERMIS and Shibboleth to help provide solutions
Secure LDAP (Walden) can help provide a campus-wide resource building upon existing attributes to help “feed” authorization policy engines which are being developed
This is an area of intense interest for us, especially because of our work at Michigan on NFS V4, GridNFS and NTAP
Ongoing and Future Efforts
GridNFS has been funded for 3 years by the NSF NMI Development program
Development of MARS, a “Meta-scheduling” package, is now funded by NSF.
Planning how to merge NTAP into GNMI/Internet2
Easy-to-use installation and upgrade packages are under development and are critical to our success on campus.
Continue to emphasize standards and ease of use and adoption as our guidelines for delivering functionality
Continue efforts in Authorization and Accounting to produce grid systems which deliver a range of capabilities similar to what individual systems currently provide.
Points to Conclude With…
NMI has been a key component of our efforts
Use of a portal can make access to various distributed resources safe and easy
Making it easy to distribute, deploy and configure middleware has to be a priority if we are to make a real impact.
Working with others is very important:
• Learning from their experiences
• Input on our directions
• Collaborating for common solutions
Michigan plans to continue working with NMI and developing needed infrastructure for successful, effective grids and networks.
LUNCH!