Berkeley Research Computing Town Hall Meeting: Savio Overview
Post on 16-May-2020
SAVIO - The Need Has Been Stated
Inception and design were based on a specific need articulated by Eliot Quataert and nine other faculty:
Dear Graham,
We are writing to propose that UC Berkeley adopt a condominium computing model, i.e., a more centralized model for supporting research computing on campus...
SAVIO - Condo Service Offering
● Purchase into Savio by contributing standardized compute hardware
● An alternative to running a cluster in a closet with grad students and postdocs
● The condo trade-off:
  ○ Idle resources are made available to others
  ○ There are no (ZERO) operational costs for administration, colocation, base storage, optimized networking and access methods, and user services
● Scheduler gives priority access to resources equivalent to the hardware contribution
SAVIO - Faculty Computing Allowance
● Provides allocations to run on Savio as well as support to researchers who have not purchased Condo nodes
● 200k Service Units (core hours) annually
● More than just compute:
  ○ File systems
  ○ Training/support
  ○ User services
● PIs request their allocation via survey
● Early user access (based on readiness) now
● General availability planned for fall semester
SAVIO - System Overview
● Similar in design to a typical research cluster
  ○ Master Node role has been broken out (management, scheduling, logins, file system, etc.)
● Home storage: Enterprise level, backed up, quota enforced
● Scratch space: Large and fast (Lustre)
● Multiple login/interactive nodes
● DTN: Data Transfer Node
● Compute nodes are delineated based on role
SAVIO - Specification
● Hardware
  ○ Compute Nodes: 20-core, 64GB, InfiniBand
  ○ BigMem Nodes: 20-core, 512GB, InfiniBand
● Software Stack
  ○ Scientific Linux 6 (equivalent to Red Hat Enterprise Linux 6)
  ○ Parallelization: OpenMPI, OpenMP, POSIX threads
  ○ Intel Compiler
  ○ SLURM job scheduler
  ○ Software Environment Modules
SAVIO - OTP
● The biggest security threat that we encounter ...
STOLEN CREDENTIALS
● Credentials are stolen via keyboard sniffers installed on researchers' laptops or workstations that were incorrectly assumed to be secure
● OTP (One Time Passwords) offers mitigation
● Easy to learn, simple to use, and works on both computers and smartphones!
SAVIO - Future Services
● Serial/HTC Jobs
  ○ Expanding the initial architecture beyond just HPC
  ○ Specialized node hardware (12-core, 128GB, PCI flash storage)
  ○ Designed for jobs that use <= 1 node
  ○ Nodes are shared between jobs
● GPU nodes
  ○ GPUs are optimal for massively parallel algorithms
  ○ Specialized node hardware (8-core, 64GB, 2x Nvidia K80)
SAVIO - Faculty Computing Allowance
● Eligibility requirements
  ○ Ladder-rank faculty or PI on the UCB campus.
  ○ In need of compute power to solve a research problem.
● Allowance Request Procedure
  ○ First fill out the Online Requirements Survey
  ○ Allowance can be used either by the faculty member or by immediate group members.
  ○ For additional cluster accounts, fill out the Additional User Account Request Form
● Allowances
  ○ New allowances start on June 1st of every year.
  ○ Mid-year requests are granted a prorated allocation
  ○ A cluster-specific project (fc_projectname) with all user accounts is set up
  ○ Scheduler account (fc_projectname) with 200K core hours is set up
  ○ Annual allocation expires on May 31st of the following year
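The slide does not say how mid-year allocations are prorated. As a rough illustration only, the sketch below assumes linear proration by whole months remaining in the June 1 to May 31 allowance year, counting the month of the request. The function name and the rounding rule are hypothetical; the actual policy may differ.

```python
from datetime import date

ANNUAL_ALLOWANCE = 200_000  # core hours per allowance year (from the slide)


def prorated_allowance(request_date: date) -> int:
    """Core hours granted for a request made on request_date.

    ASSUMPTION: linear proration by whole months remaining in the
    June 1 - May 31 allowance year, counting the request month.
    This is not a documented Savio rule.
    """
    # Months elapsed since the most recent June (June -> 0, May -> 11).
    months_elapsed = (request_date.month - 6) % 12
    months_remaining = 12 - months_elapsed
    # Integer division rounds down to a whole number of core hours.
    return ANNUAL_ALLOWANCE * months_remaining // 12


# A request in December falls halfway through the allowance year.
print(prorated_allowance(date(2015, 12, 15)))
```

Under this assumption a June request receives the full 200K core hours and a May request receives one twelfth of it.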
SAVIO - Access
● Cluster access
  ○ Connect using SSH (server name - hpc.brc.berkeley.edu)
  ○ Uses OTP - One Time Passwords (multifactor authentication)
  ○ Multiple login nodes (users are distributed randomly)
● Coming in future
  ○ NERSC’s NEWT REST API for web portal development
  ○ IPython notebooks & JupyterHub integration
SAVIO - Data Storage Options
● Storage
  ○ No local storage on compute nodes
  ○ All storage accessed over network
  ○ Either NFS or Lustre protocol
● Multiple file systems
  ○ HOME - NFS, 10GB quota, Backed up, No purge.
  ○ SCRATCH - Lustre, No quota, No Backups, can be purged
  ○ Project (GROUP) space - NFS, 200GB quota, No Backups, No Purge.
  ○ No long term archive.
SAVIO - Data Transfers
● Use only the dedicated Data Transfer Node (DTN)
● Server name - dtn.brc.berkeley.edu
● Highly recommend using Globus (Web interface) for management
● Many other traditional tools are also supported on the DTN
  ○ SCP/SFTP
  ○ Rsync
  ○ BBCP
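For the traditional tools, a transfer through the DTN might look like the sketch below, which builds an rsync command line in Python. The host name comes from the slide. The user name, both paths, and the helper name `rsync_to_dtn` are placeholders, and the choice of flags (`-a` archive, `-v` verbose, `-P` keep partial files and show progress) is only one reasonable option.

```python
import shlex

DTN_HOST = "dtn.brc.berkeley.edu"  # server name from the slide


def rsync_to_dtn(local_path: str, remote_path: str, user: str) -> list[str]:
    """Build an rsync argv that copies local_path to the DTN over SSH."""
    return [
        "rsync",
        "-avP",  # archive mode, verbose, keep partial files + show progress
        local_path,
        f"{user}@{DTN_HOST}:{remote_path}",
    ]


# Placeholder user and paths; substitute real values before running.
cmd = rsync_to_dtn("results/", "/path/on/savio/results/", user="myuser")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to start the transfer; authentication would then prompt for the one-time password.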
SAVIO - Software Support
● Software module farm
  ○ Many of the most commonly used packages are already available.
  ○ In most cases packages are compiled from source
  ○ Easy command line tools to browse and access packages ($ module cmd)
● Supported package list
  ○ Open Source
    ■ Tools - octave, gnuplot, imagemagick, visit, qt, ncl, paraview, lz4, git, valgrind, etc.
    ■ Languages - GNU C/C++/Fortran compilers, Java (JRE), Python, R, etc.
  ○ Commercial
    ■ Intel C/C++/Fortran compiler suite, Matlab with 80 core license for MDCS
● User applications
  ○ Individual user/group specific packages can be built from source by users
  ○ Recommend using GROUP storage space for sharing with others in group.
  ○ SAVIO consultants available to answer your questions.
SAVIO - Job Scheduler
● SLURM
● Multiple Node Options (partitions)
● Interaction with Scheduler
  ○ Only with command line tools and utilities.
  ○ Online web interfaces for job management can be supported in future via NERSC’s NEWT REST API or iPython/Jupyter or both.
Quality of Service   Max allowed running time/job   Max number of nodes/job
savio_debug          30 minutes                     4
savio_normal         72 hours (i.e. 3 days)         24

Partition      # of nodes   # of cores/node   Memory/node   Local Storage
savio          160          20                64 GB         No local storage
savio_bigmem   4            20                512 GB        No local storage
savio_htc      12           12                128 GB        Local PCI Flash
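The two tables can be read as constraints on a job request. The sketch below encodes them as Python dictionaries and checks a request against both. The helper name `check_request` is hypothetical, and treating every QoS as valid on every partition is an assumption the slides do not confirm.

```python
# Limits copied from the Quality of Service table above.
QOS_LIMITS = {
    "savio_debug": {"max_hours": 0.5, "max_nodes": 4},
    "savio_normal": {"max_hours": 72.0, "max_nodes": 24},
}

# Node counts copied from the partition table above.
PARTITION_NODES = {"savio": 160, "savio_bigmem": 4, "savio_htc": 12}


def check_request(partition, qos, nodes, hours):
    """Return a list of problems with a job request; empty means it fits.

    ASSUMPTION: any QoS may be combined with any partition.
    """
    problems = []
    if partition not in PARTITION_NODES:
        problems.append(f"unknown partition: {partition}")
    elif nodes > PARTITION_NODES[partition]:
        problems.append(f"{partition} has only {PARTITION_NODES[partition]} nodes")
    if qos not in QOS_LIMITS:
        problems.append(f"unknown QoS: {qos}")
    else:
        limits = QOS_LIMITS[qos]
        if nodes > limits["max_nodes"]:
            problems.append(f"{qos} allows at most {limits['max_nodes']} nodes")
        if hours > limits["max_hours"]:
            problems.append(f"{qos} allows at most {limits['max_hours']} hours")
    return problems


# A 2-node, 15-minute debug job fits within both tables.
print(check_request("savio", "savio_debug", nodes=2, hours=0.25))
```

A check like this can catch an oversized request before submission; the scheduler remains the authority on what is actually accepted.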
SAVIO - Job Accounting
● Jobs gain exclusive access to assigned compute nodes.
● Jobs are expected to be highly parallel and capable of using all the resources on assigned nodes.
For example:
● Running on one standard node for 5 hours uses 1 (nodes) * 20 (cores) * 5 (hours) = 100 core-hours (or Service Units).
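The accounting rule above can be sketched in a few lines of Python. Because jobs hold their nodes exclusively, every core on each assigned node is charged for the full wall-clock time, whether or not the job uses it. The function name and the `cores_per_node` default are illustrative; 20 cores matches the standard node in the example.

```python
def service_units(nodes: int, hours: float, cores_per_node: int = 20) -> float:
    """Core-hours charged for a job that holds whole nodes exclusively."""
    if nodes < 1 or cores_per_node < 1 or hours < 0:
        raise ValueError("nodes and cores_per_node must be >= 1, hours >= 0")
    return nodes * cores_per_node * hours


# One standard node for 5 hours, as in the example above.
print(service_units(nodes=1, hours=5))  # 100
```

At this rate, the 200K core-hour allowance corresponds to 10,000 node-hours on standard 20-core nodes.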
SAVIO - How to Get Help
● Online User Documentation
  ○ User Guide - http://research-it.berkeley.edu/services/high-performance-computing/user-guide
  ○ New User Information - http://research-it.berkeley.edu/services/high-performance-computing/new-user-information
● Helpdesk
  ○ Email: brc-hpc-help@lists.berkeley.edu
  ○ Monday - Friday, 9:00 am to 5:00 pm
  ○ Best effort outside working hours