Grid2003 Report
John Hicks, TransPAC HPCC Engineer, Indiana University
HENP Meeting, Hawaii, 25 January 2004

Overview
- Introduction to Grid2003 (Grid3)
- Experiments
- Grid3 Software
- Grid3 Monitoring efforts
- Supercomputing 2003
- Questions

Introduction to Grid3
- Grid3 is a coordinated project between US LHC experiments (US ATLAS, US CMS), grid projects (iVDGL, GriPhyN, PPDG), and computing projects (LIGO, SDSS, BTeV)
- The purpose of Grid3 is to build a multi-experiment, multi-VO grid environment
  - Test the infrastructure and services for production and analysis of scientific experiments
  - Provide a platform for technology demonstrators
- Grid3 is supported by the National Science Foundation and the Department of Energy

The Grid3 Project
- Grid3 is running at 28 sites
- The peak processor count is ~2800 CPUs
- There are 6 virtual organizations (VOs): SDSS, ATLAS, iVDGL, USCMS, LIGO (now LIGO Scientific Collaboration, LSC), BTeV
- There are currently 11 applications
- Resources are dynamically rolled in and out
- Applications are dynamically installed
- Grid3 provides a base for a persistent grid

Science Applications
- Each VO provides and maintains its applications
- Applications do not require privileged access to be installed or to operate
- Reserved areas are available for applications, data stage-in/out, and temporary files
- Installation location information is published in MDS
- Multiple versions of an application may exist
- Applications cover HEP, CS demonstrators, astrophysics, and biology

Grid3 Experiments: USATLAS
The US ATLAS group consists of 31 universities and 3 national laboratories.
It is participating in the building and operation of the ATLAS (A Toroidal LHC Apparatus) experiment, to be installed in one of the interaction regions at the Large Hadron Collider (LHC) at CERN, Geneva, Switzerland.

Grid3 Experiments: USCMS
USCMS is a collaboration of US scientists participating in the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland.

Grid3 Experiments: LIGO
The Laser Interferometer Gravitational-Wave Observatory (LIGO) is a facility dedicated to the detection of cosmic gravitational waves and the harnessing of these waves for scientific research. It consists of two widely separated installations within the United States, one in Hanford, Washington, and the other in Livingston, Louisiana, operated in unison as a single observatory.

Grid3 Experiments: SDSS
The Sloan Digital Sky Survey (SDSS) is a collaboration of scientists and engineers to map one-quarter of the entire sky, determining the positions and absolute brightnesses of more than 100 million celestial objects.

Grid3 Experiments: BTeV
The BTeV experiment is designed to challenge the Standard Model explanation of CP violation, mixing, and rare decays of beauty and charm quark states.

Grid3 Software
- Pacman
  - Packaging and installation software
  - Main deployment tool for Grid3
  - All software is "pacmanized"
- VDT
  - The Virtual Data Toolkit (VDT) is a set of grid software that can be easily installed and configured
  - The goal of the VDT is to make it as easy as possible for users to install grid software
  - It includes fundamental grid software, Virtual Data software, and utilities

Job submission and data transfer
Globus Toolkit: The Globus Toolkit is an open source toolkit used for building grids.
The toolkit components can be used independently or together to develop applications. These components help support and manage elements such as security, fault detection, information infrastructure, portability, resource management, data management, and communication.
MDS: a directory service used to publish configuration information.
RLS: The Replica Location Service (RLS) maintains and provides access to mapping information from logical names for data items to target names. These target names may represent physical locations of data items, or an entry in the RLS may map to another level of logical naming for the data item.
Condor: Condor is an open source workload management system for compute-intensive jobs. It provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management.

Job submission and data transfer (cont.)
VDS: The Virtual Data System (VDS: Chimera/Pegasus/Sphinx/DAGMan) is open-source software that provides a method for storing the representation of the computational procedures used to generate data, the procedures themselves, and the datasets they produce. This allows the lineage of derived data to be recorded and audited, and the data to be re-derived automatically on demand. This is important in large collaborations, where it may be more difficult to determine how particular data was generated.

User Management
VOMS (Virtual Organization Membership Service) is open-source software that provides information on a user's membership within a virtual organization (VO). A virtual organization is an abstract entity grouping users, institutions, and resources into the same administrative domain. A user's membership in a VO indicates that the user may have permission to use resources at individual institutions.
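The VO membership idea can be illustrated with a minimal Python sketch. It is not the VOMS API: the `VirtualOrganization` and `Site` classes, the `vos_of` and `may_use` helpers, and the example subject names are all hypothetical, chosen only to show that membership in a VO is a precondition for access while the individual institution still decides which VOs it admits.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganization:
    """Hypothetical in-memory stand-in for a VO membership record."""
    name: str
    members: set = field(default_factory=set)  # user certificate subject names


@dataclass
class Site:
    """Hypothetical site policy: the VOs this institution admits."""
    name: str
    admitted_vos: set = field(default_factory=set)


def vos_of(user_dn, vos):
    """Return the names of every VO that lists the user as a member."""
    return {vo.name for vo in vos if user_dn in vo.members}


def may_use(user_dn, site, vos):
    """Membership is necessary but not sufficient: the site must admit the VO."""
    return bool(vos_of(user_dn, vos) & site.admitted_vos)


# Illustrative data only; the subject names and site name are made up.
alice = "/DC=org/DC=example/CN=Alice"
bob = "/DC=org/DC=example/CN=Bob"
vos = [
    VirtualOrganization("usatlas", {alice}),
    VirtualOrganization("sdss", {bob}),
]
site = Site("example-site", admitted_vos={"usatlas", "uscms"})

print(may_use(alice, site, vos))  # Alice's VO is admitted by the site
print(may_use(bob, site, vos))    # Bob's VO is not admitted by the site
```

Keeping membership (held per VO) separate from admission (held per site) mirrors the split described above between VO-level information and permissions at individual institutions.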
Grid User Management System:
- Develop a model for distributed user registration
- Work with existing VO management tools, including the EDG VOMS servers used in Grid2003
- Help define requirements for new and improved VO tools
- Focus on site tools for user management

Information Services
- MDS based
- Schemas/information needed:
  - MDS core, GLUE (Grid Laboratory Uniform Environment)
  - Site-specific information on Grid3 ($GRID3, $APP, $DATA, $TMP, $TMP_WIN)
  - VO-specific information on Grid3 (runtime environments needed to run VO-specific applications)
  - VO and application specific

iVDGL Grid Operations Center (iGOC)
- The iGOC is currently located at Indiana University
- The iGOC provides 24x7x365 operational support backed by Service Level Agreements (SLAs)
- Support includes:
  - Problem alert, tracking, and trouble ticket support
  - Support for systems which host the Grid Index Information Service (GIIS), VOMS Database Service, Replica Location Service (RLS), and monitoring tools
- Grid3 monitoring is coordinated through the iVDGL operations group and the iGOC

Monitoring/Interactive Analysis services
- Ganglia: open source tool to collect cluster monitoring information such as CPU and network load, memory and disk usage
- MonALISA: monitoring tool to support resource discovery, access to information, and a gateway to other information gathering systems
- ACDC (Advanced Computational Data Center) Job Monitoring System: application that uses grid-submitted jobs to query the job managers and collect information about jobs. This information is stored in a DB and is available for aggregated queries and browsing.
- Metrics Data Viewer (MDViewer): analyzes and plots information collected by the different monitoring tools, such as the DBs at iGOC
- Distributed Interactive Analysis of Large datasets (DIAL): provides a connection between interactive analysis tools (like JAS, ROOT) and data processing applications (like ATHENA)

Monitoring services
[Diagram: producers, intermediaries, and consumers of monitoring information. Labels: VO GIIS; MonALISA; GIIS; Site Catalog; Ganglia; ACDC JobDB; ML repository; OS (syscall, /proc); GRIS; Job manager; Log files; System config.; MonALISA client; MDViewer; IS Clients; WWW; Reports; User clients; Client tools]

Monitoring services (2)
[Diagram: information providers, server, and information consumers. Labels: Web; Ganglia; MDS; GRIS; Job sched agents; MonALISA; ML repository; DB; Report; Web Outputs; SNMP; ACDC Job DB; VO GIIS; GIIS; MDViewer]

[Figure: Ganglia snapshots]

[Figure: MonALISA framework]

Interactive analysis services
- Metrics Data Viewer (MDViewer) analyzes and plots information collected by the different monitoring tools, such as the DBs at iGOC
- Distributed Interactive Analysis of Large datasets (DIAL) provides a connection between interactive analysis tools (like JAS, ROOT) and data processing applications (like ATHENA)
- Differentiate the possible information sources for MDViewer (other DBs, log files, ...) and provide different GUIs (e.g. servlet)
- Make DIAL grid-enabled and add a dataset catalog to it

Grid3 status tool
- Choose the sites from the catalog
- Site list, available resources
- Availability test
- Site specific information

Site Status tool
[Flowchart. Labels: CRON JOB; update_db.pl; template.igoc (pl); Get overall results; Create map; grid3 db; gits_output.xml, individual test results for each host; get hostnames; run gits script; Look at each test result for each site and determine final result (pass/fail); Update each test result in database; index.php presents detailed view of test results; catalog.php; map.php; user; igoc.config; Update results table with final result]

Monitor Job execution
- Check the submitted/running/held jobs
- Verify the increased load
- Control the traffic
- Look at the expected completion time

Grid3 at SC2003
Users' point of view:
- Ease of becoming a site (well-defined instructions, responsive mailing list for support)
- Ease of packaging an application for the grid (well-defined example to follow, will provide automatic installation and submission; the biology group at ANL prepared for grid execution in less than 1 week using Chimera-Pegasus)
- ATLAS validated the full chain: event generation, simulation, reconstruction, analysis (Higgs event observed during SC03)
- CMS is currently using Grid3 for effective production
- More than [...] CPU-days used by VOs in the last 2 months (real jobs, no tests)

Submissions during SC2003 week
- Total number of jobs submitted during SC2003 week: ~3400
- Successful (data produced, transferred to SE, registered in RLS): ~2300
- These are raw statistics; they can be improved by resubmitting the jobs that failed for various reasons

Job type and sample                  Submitted   OK
1. Simulation jobs
   "Higgs" sample (200 evts)         ~1500       ~1020
   "Top" sample (200 evts)           ~1200       ~[...]
2. Reconstruction jobs
   "Higgs" sample (200 evts)         ~710        ~675

These data were analyzed by David Adams using DIAL.
The production chain was validated by the reconstruction of a Higgs trace.
Errors varied: some had unknown causes, others were due to changes in resource availability, failed transfer or registration, competition for shared resources (RAM), or certificate issues (DOEgrid/DOEsciencegrid).

Statistics per VO
Met targets:
- Data transferred per day: > 1 TB
- Number of concurrent jobs: > 1100 (11/20/03)
- Number of users: > 100
- Number of different applications: > 11
- Number of sites running multiple applications: > 10
- Rate of faults/crashes: < 1/hour
- Operational support load of full demonstrator: < 2 FTEs
- More than [...] CPU-days used

For more info
- GGF: http://www.ggf.org/
- Globus: http://www.globus.org/
- Grid2003: http://www.ivdgl.org/grid2003/
- Monitoring: http://grid.uchicago.edu/metrics/

Questions and discussion
John Hicks, Indiana University
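
As a worked check on the SC2003 submission counts quoted in the slides, the short Python sketch below turns the approximate submitted/successful figures into success rates. The counts are the rounded values from the transcript; the dictionary layout and the `success_rate` helper are illustrative assumptions, and the successful count for the "Top" sample is left as `None` because it is missing from the transcript.

```python
# Approximate counts from the SC2003 slide: (submitted, successful).
# The successful count for the "Top" sample is missing in the transcript,
# so it is recorded as None rather than guessed.
counts = {
    "all jobs": (3400, 2300),
    "simulation, Higgs sample": (1500, 1020),
    "simulation, Top sample": (1200, None),
    "reconstruction, Higgs sample": (710, 675),
}


def success_rate(submitted, successful):
    """Return successful/submitted, or None when the successful count is unknown."""
    if successful is None:
        return None
    return successful / submitted


for label, (submitted, successful) in counts.items():
    rate = success_rate(submitted, successful)
    shown = "unknown" if rate is None else f"{rate:.1%}"
    print(f"{label}: {shown}")
```

Because the inputs are themselves approximate ("~"), the resulting percentages are rough indications rather than precise measurements.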