Provisioning Complex Software Environments for Scientific Applications
Prof. Douglas Thain, University of Notre Dame
http://www.nd.edu/~dthain
@ProfThain


Page 1: Provisioning Complex Software Environments for Scientific ...dthain/talks/ccl-cvmfs-2018.pdfNative RHEL7 Machines RunOS "rhel6" Singularity Container VC3 Builder Parrot + CVMFS Factory


The Cooperative Computing Lab

•  We collaborate with people who have large scale computing problems in science, engineering, and other fields.
•  We operate computer systems on the order of 10,000 cores: clusters, clouds, grids.
•  We conduct computer science research in the context of real people and problems.
•  We develop open source software for large scale distributed computing.

http://ccl.cse.nd.edu

Parrot Virtual File System

[Diagram: a Unix application runs on top of the Parrot Virtual File System, which captures its system calls via ptrace and directs them to one of several back ends: Local, iRODS, Chirp, HTTP, CVMFS.]

Custom Namespace:
/home = /chirp/server/myhome
/software = /cvmfs/cms.cern.ch/cmssoft

Also shown: File Access Tracing, Sandboxing, User ID Mapping, . . .

Douglas Thain, Christopher Moretti, and Igor Sfiligoi, Transparently Distributing CDF Software with Parrot, Computing in High Energy Physics, pages 1-4, February, 2006.
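The custom namespace idea can be illustrated with a small path-rewriting table. This is only a sketch of the concept in plain Python, not Parrot's implementation: the `MOUNTS` table and the `resolve` function are hypothetical names, and the two entries are the ones shown on the slide.

```python
# Hypothetical mount table mirroring the two mappings on the slide.
MOUNTS = {
    "/home": "/chirp/server/myhome",
    "/software": "/cvmfs/cms.cern.ch/cmssoft",
}

def resolve(path, mounts=MOUNTS):
    """Rewrite a logical path using the longest matching mount prefix."""
    best = None
    for prefix in mounts:
        # Match whole path components only, so "/homework" does not match "/home".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return path  # no mapping applies: leave the path untouched
    return mounts[best] + path[len(best):]

print(resolve("/software/bin/cmsRun"))
```

Paths outside the table pass through unchanged, which is how an application can see a mix of local and remote files under one namespace.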

Parrot + CVMFS

[Diagram: a CMS task runs under Parrot with the CVMFS library, which keeps a CAS cache of metadata and data. The library fetches metadata and data by HTTP GET through squid proxies from a www server. The server holds the CMS software (967 GB, 31M files), which is built into Content Addressable Storage (CAS).]

Jakob Blomer, Predrag Buncic, Rene Meusel, Gerardo Ganis, Igor Sfiligoi and Douglas Thain, The Evolution of Global Scale Filesystems for Scientific Software Distribution, IEEE/AIP Computing in Science and Engineering, 17(6), pages 61-71, December, 2015. DOI: 10.1109/MCSE.2015.111
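Content addressable storage can be sketched in a few lines: each object is stored under a hash of its contents, and a catalog maps file paths to those hashes. The class and function names below are hypothetical, and the choice of SHA-256 is an assumption made for the sketch; the details of the real CVMFS format (hash algorithm, compression, catalog layout) are not reproduced here.

```python
import hashlib

class ContentStore:
    """Toy in-memory content addressable store (illustrative only)."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so identical data is stored once.
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]

def build_catalog(files, store):
    """Return metadata mapping each path to the key of its content."""
    return {path: store.put(data) for path, data in files.items()}

store = ContentStore()
catalog = build_catalog(
    {"/lib/a.so": b"same bytes", "/lib/b.so": b"same bytes", "/bin/run": b"other"},
    store,
)
print(len(catalog), "paths,", len(store.objects), "stored objects")
```

Because keys depend only on content, an object never changes once published, which is what makes it safe to hold in ordinary HTTP proxy caches.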

How do we run complex workflows on diverse computing resources?

Makeflow = Make + Workflow

[Diagram: Makeflow dispatches jobs to Local, HTCondor, Torque, Work Queue, and Amazon.]

•  Provides portability across batch systems.
•  Enables parallelism (but not too much!)
•  Fault tolerance at multiple scales.
•  Data and resource management.
•  Transactional semantics for job execution.

http://ccl.cse.nd.edu/software/makeflow

[Diagram, extended: Makeflow also dispatches jobs to Mesos and Kubernetes.]

Charles Zheng

Example: Species Distribution Modeling

Full Workflow: 12,500 species x 15 climate scenarios x 6 experiments x 500 MB per projection = 1.1M jobs, 72 TB of output
Small Example: 10 species x 10 expts

www.lifemapper.org
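The job count for the full workflow is the product of its three dimensions. A quick check in Python, with variable names chosen for illustration only:

```python
# Dimensions of the full workflow, taken from the slide.
species = 12_500
climate_scenarios = 15
experiments = 6

# One job per (species, scenario, experiment) combination.
jobs = species * climate_scenarios * experiments
print(f"{jobs:,} jobs")  # 1,125,000, which the slide rounds to 1.1M
```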

More Examples

http://github.com/cooperative-computing-lab/makeflow-examples

Workflow Language Evolution

Classic "Make" Representation

output.5.txt : input.txt mysim.exe
	mysim.exe -p 10 input.txt > output.5.txt

JSON Representation of One Job

{
  "command" : "mysim.exe -p 10 input.txt > output.5.txt",
  "outputs" : [ "output.5.txt" ],
  "inputs" : [ "input.txt", "mysim.exe" ]
}

JX (JSON + Expressions) for Multiple Jobs

{
  "command" : "mysim.exe -p " + x*2 + " input.txt > output." + x + ".txt",
  "outputs" : [ "output." + x + ".txt" ],
  "inputs" : [ "input.txt", "mysim.exe" ]
} for x in [ 1, 2, 3, 4, 5 ]

Tim Shaffer
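To see what the JX form expands to, the same comprehension can be written in plain Python. This is not the JX evaluator, only an equivalent expansion; `make_job` is a hypothetical helper name.

```python
import json

def make_job(x):
    """Build one job record, parameterized by x as in the JX example."""
    out = f"output.{x}.txt"
    return {
        "command": f"mysim.exe -p {x * 2} input.txt > {out}",
        "outputs": [out],
        "inputs": ["input.txt", "mysim.exe"],
    }

# One job per value of x, mirroring "for x in [ 1, 2, 3, 4, 5 ]".
jobs = [make_job(x) for x in [1, 2, 3, 4, 5]]
print(json.dumps(jobs[4], indent=2))
```

The last element, for x = 5, is the single job shown in the JSON representation.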

Elaborating Jobs with Wrappers

[Diagram: three stages of the same job.
Original Job: a Task with two inputs and one output.
Add Debug Tool: strace wraps the Task, and a logfile joins the output.
Add Container Environment: Singularity with rhel6.img wraps strace and the Task; the inputs, output, and logfile are unchanged.]

Nick Hazekamp
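Wrappers of this kind compose naturally as functions from one job description to another. The sketch below assumes a job is a dict with "command", "inputs", and "outputs" keys; the function names, the task command, and its file names are hypothetical, and this is not how Makeflow itself implements wrappers.

```python
import shlex

def with_strace(job, logfile="logfile"):
    """Wrap the command in strace; the trace log becomes an extra output."""
    return {
        "command": f"strace -o {shlex.quote(logfile)} {job['command']}",
        "inputs": list(job["inputs"]),
        "outputs": job["outputs"] + [logfile],
    }

def with_singularity(job, image="rhel6.img"):
    """Run the command inside a container; the image becomes an extra input."""
    return {
        "command": f"singularity exec {shlex.quote(image)} {job['command']}",
        "inputs": job["inputs"] + [image],
        "outputs": list(job["outputs"]),
    }

# Hypothetical original job with two inputs and one output.
original = {
    "command": "./task input.a input.b",
    "inputs": ["input.a", "input.b"],
    "outputs": ["output"],
}
wrapped = with_singularity(with_strace(original))
print(wrapped["command"])
```

Each wrapper adds the files it needs to the job's inputs or outputs, so the batch system still sees a complete description of what to transfer.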

Work Queue Architecture

[Diagram: a Work Queue Master holds local files and programs (A, B, C) and accepts Submit and Complete calls. It sends files and sends tasks to a Worker Process on a 4-core machine. The worker keeps A, B, and C in a cache dir and runs two 2-core tasks, each in its own sandbox: Task.1 Sandbox holds A, B, T and Task.2 Sandbox holds C, A, T.]
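The cache-and-sandbox arrangement can be mimicked on a local disk: files are stored once in a cache directory, and each task gets a private directory containing only the files it needs. The sketch uses symbolic links for brevity; how the real worker populates a sandbox is not stated on the slide, and `make_sandbox` and `demo` are hypothetical names.

```python
import os
import tempfile

def make_sandbox(cache_dir, sandbox_dir, needed):
    """Create a task sandbox that links only the cached files the task needs."""
    os.makedirs(sandbox_dir)
    for name in needed:
        os.symlink(os.path.join(cache_dir, name), os.path.join(sandbox_dir, name))
    return sandbox_dir

def demo():
    """Build a cache with files A, B, C and two sandboxes like those in the diagram."""
    root = tempfile.mkdtemp(prefix="wq-sketch-")
    cache = os.path.join(root, "cache")
    os.makedirs(cache)
    for name in ("A", "B", "C"):
        with open(os.path.join(cache, name), "w") as f:
            f.write(f"contents of {name}\n")
    task1 = make_sandbox(cache, os.path.join(root, "task.1"), ["A", "B"])
    task2 = make_sandbox(cache, os.path.join(root, "task.2"), ["C", "A"])
    return cache, task1, task2

cache, task1, task2 = demo()
print(sorted(os.listdir(task1)), sorted(os.listdir(task2)))
```

File A is transferred and stored once but appears in both sandboxes, which is the point of keeping a shared cache on the worker.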

Makeflow + Work Queue

[Diagram: a Makeflow master with local files and programs submits tasks to thousands of workers (W) in a personal cloud. The workers run on a national computing resource, a campus HTCondor pool, a public cloud provider, and a private cluster, and are started via ssh, torque_submit_workers, and condor_submit_workers.]

Problem: Software Deployment

•  Getting software installed on a new site is a big pain! The user (probably) knows the top level package, but doesn't know:
– How they set up the package (sometime last year)
– Dependencies of the top-level package.
– Which packages are system default vs optional
– How to import the package into their environment via PATH, LD_LIBRARY_PATH, etc.
•  Many scientific codes are not distributed via rpm, yum, pkg, etc. (and user isn't root)

Even Bigger Differences:

•  Hardware Architecture
–  X86-64, KNL, Blue Gene, GPUs, FPGAs, ...
•  Operating System
–  Green Avocado Linux, Blue Dolphin Linux, Red Rash Linux, ...
•  Batch System or Resource Manager
–  HTCondor, PBS, Torque, Cobalt, Mesos, ...
•  Container Technology
–  None, Docker, Singularity, CharlieCloud, Shifter, ...
•  Running Services
–  FUSE, CVMFS, HTTP Proxy, Frontier, ...
•  Network Configuration
–  Public/Private, Incoming/Outgoing, Firewalls

Our Approach:

•  Provide tools that make a flexible mapping between the user's intent and the local site:
– "I need OS RHEL6"
    •  Check if already present, otherwise run in container.
– "I need container X.img"
    •  Try Docker, try Singularity, try CharlieCloud.
– "I need /cvmfs/repo.cern.ch"
    •  Look for /cvmfs; activate FUSE; build/run parrot.
– "I need software package X"
    •  Look for X installed locally, else build from recipe.
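The "try Docker, try Singularity, try CharlieCloud" step amounts to probing the site for whichever container runtime is present. A minimal sketch, assuming each runtime can be detected by finding its executable on PATH (`docker`, `singularity`, and CharlieCloud's `ch-run`); `pick_runtime` is a hypothetical name, and the real tools may probe differently.

```python
import shutil

# Runtimes in order of preference, paired with the executable that signals each one.
RUNTIMES = [
    ("docker", "docker"),
    ("singularity", "singularity"),
    ("charliecloud", "ch-run"),
]

def pick_runtime(which=shutil.which):
    """Return the name of the first runtime whose executable is found, else None."""
    for name, executable in RUNTIMES:
        if which(executable):
            return name
    return None

print(pick_runtime())
```

The `which` parameter exists so the lookup can be replaced in tests; by default it searches the real PATH.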

Delivering Platforms with RunOS

"runos slc6 -- mysim.exe"

[Diagram: the same command on three sites.
Site A: slc6, docker. mysim.exe runs directly.
Site B: rhel7, singularity. mysim.exe runs inside an slc6 container.
Site C: debian45, charliecloud. mysim.exe runs inside an slc6 container.]

Kyle Sweeney
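The decision RunOS makes can be sketched as a function from a site description to a command line: run directly when the site already has the requested OS, otherwise wrap the command in whatever container runtime the site offers. This is an illustration, not the RunOS source; `plan_command` is a hypothetical name, and using the OS name as the image name is an assumption.

```python
# How each runtime is asked to run a command inside an image.
WRAPPERS = {
    "docker": lambda image, cmd: ["docker", "run", "--rm", image] + cmd,
    "singularity": lambda image, cmd: ["singularity", "exec", image] + cmd,
    "charliecloud": lambda image, cmd: ["ch-run", image, "--"] + cmd,
}

def plan_command(required_os, site_os, runtime, command):
    """Return the command to execute so that it sees required_os."""
    if site_os == required_os:
        return list(command)  # already the right platform: run natively
    if runtime not in WRAPPERS:
        raise RuntimeError(f"no way to provide {required_os} on {site_os}")
    image = required_os  # assumption: an image named after the OS exists
    return WRAPPERS[runtime](image, list(command))

for site_os, runtime in [("slc6", "docker"), ("rhel7", "singularity"), ("debian45", "charliecloud")]:
    print(site_os, "->", " ".join(plan_command("slc6", site_os, runtime, ["mysim.exe"])))
```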

Delivering Software with VC3-Builder

Typical User Dialog Installing BLAST

"I just need BLAST."
"Oh wait, I need Python!"
"Sorry, Python 2.7.12"
"Python requires SSL?"
"What on earth is pcre?"
"I give up!"

MAKER Bioinformatics Pipeline

VC3-Builder Architecture

[Diagram: the Builder reads Software Recipes and Upstream Sources, keeping local copies as Cached Recipes and Cached Sources. From a Recipe it produces an Install Tree of packages (A, B, C, D) and runs the Task in a Task Sandbox with PATH, PYTHONPATH, and LD_LIBRARY_PATH set to use them. A Sealed Package supports archival or disconnected operation.]
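Exposing an install tree to a task comes down to prepending each package's directories to the relevant environment variables. A minimal sketch, assuming every package directory has `bin`, `lib`, and `lib/python` subdirectories; that layout and the `build_env` name are assumptions, since a real recipe would state what each package exports.

```python
import posixpath

# Assumed layout: which subdirectory of a package feeds which variable.
LAYOUT = {
    "PATH": "bin",
    "LD_LIBRARY_PATH": "lib",
    "PYTHONPATH": "lib/python",
}

def build_env(install_dirs, base=None):
    """Return a copy of base with each package's directories prepended."""
    env = dict(base or {})
    for var, subdir in LAYOUT.items():
        parts = [posixpath.join(d, subdir) for d in install_dirs]
        if env.get(var):
            parts.append(env[var])  # keep whatever the site already provided
        env[var] = ":".join(parts)
    return env

env = build_env(["/tmp/root/perl", "/tmp/root/blast"], {"PATH": "/usr/bin"})
print(env["PATH"])
```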

"vc3-builder --require ncbi-blast"

..Plan: ncbi-blast => [, ]
..Try: ncbi-blast => v2.2.28
....Plan: perl => [v5.008, ]
....Try: perl => v5.10.0
....could not add any source for: perl v5.010 => [v5.8.0, ]
....Try: perl => v5.16.0
....could not add any source for: perl v5.016 => [v5.8.0, ]
....Try: perl => v5.24.0
......Plan: perl-vc3-modules => [v0.001.000, ]
......Try: perl-vc3-modules => v0.1.0
......Success: perl-vc3-modules v0.1.0 => [v0.1.0, ]
....Success: perl v5.24.0 => [v5.8.0, ]
....Plan: python => [v2.006, ]
....Try: python => v2.6.0
....could not add any source for: python v2.006 => [v2.6.0, ]
....Try: python => v2.7.12
......Plan: openssl => [v1.000, ]
………………..
Downloading 'Python-2.7.12.tgz' from http://download.virtualclusters.org/builder-files
details: /tmp/test/vc3-root/x86_64/redhat6/python/v2.7.12/python-build-log
processing for ncbi-blast-v2.2.28
preparing 'ncbi-blast' for x86_64/redhat6
Downloading 'ncbi-blast-2.2.28+-x64-linux.tar.gz' from http://download.virtualclusters.org…
details: /tmp/test/vc3-root/x86_64/redhat6/ncbi-blast/v2.2.28/ncbi-blast-build-log

(New Shell with Desired Environment)
bash$ which blastx
/tmp/test/vc3-root/x86_64/redhat6/ncbi-blast/v2.2.28/bin/blastx
bash$ blastx -help
USAGE
  blastx [-h] [-help] [-import_search_strategy filename] . . .
bash$ exit
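The Plan/Try/Success lines in the log suggest a backtracking search: for each package, try candidate versions that meet the minimum, and accept the first whose own dependencies can also be satisfied. The sketch below reproduces that pattern on a hypothetical recipe table; it is not the vc3-builder algorithm, and it simplifies the log (for example, python's openssl dependency is omitted).

```python
# Hypothetical recipes: package -> list of (version, source available?, {dependency: minimum}).
RECIPES = {
    "ncbi-blast": [((2, 2, 28), True, {"perl": (5, 8, 0), "python": (2, 6, 0)})],
    "perl": [((5, 10, 0), False, {}), ((5, 16, 0), False, {}), ((5, 24, 0), True, {})],
    "python": [((2, 6, 0), False, {}), ((2, 7, 12), True, {})],
}

def resolve(package, minimum, recipes, chosen=None):
    """Return {package: version} covering package and its dependencies, or None."""
    chosen = {} if chosen is None else chosen
    for version, available, deps in recipes.get(package, []):
        if version < minimum or not available:
            continue  # corresponds to "could not add any source" in the log
        trial = dict(chosen)
        trial[package] = version
        for dep, dep_minimum in deps.items():
            trial = resolve(dep, dep_minimum, recipes, trial)
            if trial is None:
                break  # a dependency failed: fall through to the next candidate
        if trial is not None:
            return trial
    return None

plan = resolve("ncbi-blast", (0, 0, 0), RECIPES)
print(plan)
```

Versions are compared as tuples, so (5, 24, 0) correctly ranks above (5, 8, 0), which plain string comparison would get wrong.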

Problem: Long Build on Head Node

•  Many computing sites limit the amount of work that can be done on the head node, so as to maintain quality of service for everyone.
•  Solution: Move the build jobs out to the cluster nodes. (Which may not have network connections.)
•  Idea: Reduce the problem to something we already know how to do: Workflow!
•  But how do we bootstrap the workflow software? With the builder!

vc3-builder --require makeflow --require ncbi-blast -- makeflow -T condor blast.mf

Bootstrapping a Workflow

[Diagram: three stages spanning the head node and the worker nodes.
Build Makeflow: on the head node, the Builder uses Upstream Sources and Software Recipes to build Makeflow.
Build BLAST: Makeflow dispatches Build Tasks to the worker nodes.
BLAST: Makeflow dispatches BLAST Tasks to the worker nodes.]

Example Applications

[Figures: MAKER and Octave.]

Benjamin Tovar, Nicholas Hazekamp, Nathaniel Kremer-Herman, and Douglas Thain, Automatic Dependency Management for Scientific Applications on Clusters, IEEE International Conference on Cloud Engineering (IC2E), April, 2018.

Delivering Services with VC3-Builder

"vc3-builder --require cvmfs"

..Plan: cvmfs => [, ]
..Try: cvmfs => v2.0.0
....Plan: parrot => [v6.0.16, ]
....Try: parrot => v6.1.1
......Plan: cctools => [v6.1.1, ]
......Try: cctools => v6.1.1
........Plan: zlib => [v1.002, ]
........Try: zlib => v1.2.8
........Success: zlib v1.2.8 => [v1.2.0, ]
......Fail-prereq: cctools-v6.1.1
........Plan: perl => [v5.010.000, v5.010001]
........Try: perl => v5.10.0
..........Plan: perl-vc3-modules => [v0.001.000, ]
..........Try: perl-vc3-modules => v0.1.0
..........Success: perl-vc3-modules v0.1.0 => [v0.1.0, ]
........could not add any source for: perl v5.010 => [v5.10.0, v5.10001.0]
........Try: perl => v5.16.0
..........Plan: perl-vc3-modules => [v0.001.000, ]
..........Try: perl-vc3-modules => v0.1.0
..........Success: perl-vc3-modules v0.1.0 => [v0.1.0, ]
........could not add any source for: perl v5.016 => [v5.10.0, v5.10001.0]
........Try: perl => v5.24.0
..........Plan: perl-vc3-modules => [v0.001.000, ]
..........Try: perl-vc3-modules => v0.1.0
..........Success: perl-vc3-modules v0.1.0 => [v0.1.0, ]
........Success: perl v5.24.0 => [v5.10.0, v5.10001.0]

(New Shell with Desired Environment)
bash$ ls /cvmfs/oasis.opensciencegrid.org
atlas    csiu      geant4  ilc   nanohub  osg-software
auger    enmr      glow    ligo  nova     sbgrid
cmssoft  fermilab  gluex   mis   osg      snoplussnolabca
. . .
bash$ exit

Putting it All Together

Request 128 nodes of 16 cores, 4G RAM, 16G disk with RHEL6 operating system, CVMFS and Maker software installed:

[Diagram: the Factory submits batch jobs to the Batch System, which starts 128 workers on native RHEL7 machines. Each worker is layered as RunOS "rhel6" -> Singularity Container -> VC3 Builder -> Parrot + CVMFS -> Worker. Makeflow sends each Task to a Worker, where it runs in a Sandbox.]

VC3: Virtual Clusters for Community Computation

Douglas Thain, University of Notre Dame
Rob Gardner, University of Chicago
John Hover, Brookhaven National Lab

http://virtualclusters.org

You have developed a large scale workload which runs successfully at a University cluster.

Now, you want to migrate and expand that application to national-scale infrastructure. (And allow others to easily access and run similar workloads.)

[Diagram: Traditional HPC Facility, Distributed HTC Facility, Commercial Cloud.]

Concept: Virtual Cluster

•  200 nodes of 24 cores and 64 GB RAM/node
•  150 GB local disk per node
•  100 TB shared storage space
•  10 Gb outgoing public internet access for data
•  CMS software 8.1.3 and python 2.7

[Diagram: the Virtual Cluster Service deploys services through Virtual Cluster Factories onto a Traditional HPC Facility, a Distributed HTC Facility, and a Commercial Cloud. The resources obtained from all three form the Virtual Cluster.]
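A request like the one above can be captured as a small, explicit data structure, which also makes the aggregate size easy to compute. The class and field names below are hypothetical; the format VC3 actually uses for requests is not shown in these slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualClusterSpec:
    """Illustrative description of a requested virtual cluster."""
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int
    disk_gb_per_node: int
    shared_storage_tb: int
    software: tuple

    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node

# Values taken from the bullet list on the slide.
spec = VirtualClusterSpec(
    nodes=200,
    cores_per_node=24,
    ram_gb_per_node=64,
    disk_gb_per_node=150,
    shared_storage_tb=100,
    software=("CMS software 8.1.3", "python 2.7"),
)
print(spec.total_cores(), "cores,", spec.total_ram_gb(), "GB RAM")
```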

Some Thoughts:

•  Make software dependencies more explicit.
– Proposed: Nothing should be available by default, all software should require an "import" step.
•  Layer tools with common abstractions:
–  Factory -> HTCondor -> Singularity -> Builder -> Worker
–  Provision -> Schedule -> Contain -> Build -> Execute
•  Need better, portable, ways of expressing:
– What software environment the user wants.
– What environment the site provides.
•  The ability to nest environments is critical!
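Nesting works when every tool follows the same convention of taking the next command after a separator, as in `vc3-builder --require ... -- command`. The sketch below composes such layers into one command line. It assumes RunOS accepts the same `--` convention, and the package name `maker`, the host name, and the port are placeholders, not values from the talk.

```python
def nest(layers, command):
    """Wrap command in each layer; the first layer in the list ends up outermost."""
    for layer in reversed(layers):
        command = layer + ["--"] + command
    return command

# Assumed invocations; only the vc3-builder form appears verbatim in the slides.
layers = [
    ["runos", "rhel6"],
    ["vc3-builder", "--require", "cvmfs", "--require", "maker"],
]
# Placeholder host and port for the innermost worker process.
worker = ["work_queue_worker", "master.example.org", "9123"]

print(" ".join(nest(layers, worker)))
```

Because each layer only prepends to the command, layers can be added or removed per site without changing the others.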

Acknowledgements

DE-SC0015711 VC3: Virtual Clusters for Community Computation

ACI-1642409 SI2-SSE: Scaling up Science on Cyberinfrastructure with the Cooperative Computing Tools

Notre Dame CMS: Kevin Lannon, Mike Hildreth, Kenyi Hurtado
Univ. Chicago: Rob Gardner, Lincoln Bryant, Suchandra Thapa, Benedikt Riedel
Brookhaven Lab: John Hover, Jose Caballero


http://ccl.cse.nd.edu

@ProfThain