
The Art of CPU-Pinning: Evaluating and Improving the Performance of Virtualization and Containerization Platforms

Davood GhatrehSamani, Chavit Denninart, Joseph Bacik†, Mohsen Amini Salehi‡
High Performance Cloud Computing Lab (HPCC), School of Computing and Informatics, University of Louisiana Lafayette


Introduction

• Execution platforms
  1. Bare-Metal (BM)
  2. Hardware Virtualization (VM)
  3. OS Virtualization (containers, CN)

• Choosing a proper execution platform, based on the imposed overhead
  ▪ Container on top of VM (VMCN) is not studied and compared to other platforms in depth


Introduction

• Overhead behavior and trend
  ▪ Different execution platforms (BM, VM, CN, VMCN)
  ▪ Different workload patterns (CPU intensive, IO intensive, etc.)
  ▪ Increasing compute resources
  ▪ Compute tuning applied (CPU pinning)

• Cloud solution architect challenge:
  ▪ Which execution platform suits what kind of workload


Hardware Virtualization (VM)

• Operates based on a hypervisor
• VM: a process running in the hypervisor
• Hypervisor has no visibility into the VM's processes

• KVM: a popular open-source hypervisor (e.g., used by AWS Nitro for the C5 VM type)
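
A minimal sketch of the "VM is just a process" point above, assuming a Linux host running KVM/QEMU guests: scanning /proc from the host shows each guest as a single QEMU process, with no view of the processes running inside it.

    # List KVM guests as seen from the host: each VM is one QEMU process.
    # Assumes guest binaries have "qemu" in their name (e.g. qemu-system-x86_64).
    import os

    def list_vm_processes():
        vms = []
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                with open(f"/proc/{pid}/comm") as f:
                    name = f.read().strip()
            except OSError:
                continue  # process exited between listdir() and open()
            if "qemu" in name:
                vms.append((int(pid), name))
        return vms

    for pid, name in list_vm_processes():
        print(f"VM process: pid={pid} comm={name}")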


OS Virtualization (Container)

• Lightweight OS-layer virtualization
• No resource abstraction (CPU, memory, etc.)

• Host OS has complete visibility into the container's processes

• Container = namespaces + cgroups
• Docker: the most widely adopted container technology
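
A minimal sketch of "container = namespaces + cgroups" using Docker, with illustrative limits: the --cpus/--cpuset-cpus/--memory flags are enforced through cgroups, and docker top shows that the host retains full visibility of the container's processes. The container name and image are assumptions.

    # Launch a container with explicit cgroup limits, then inspect it from the host.
    import subprocess

    NAME = "cn-demo"  # hypothetical container name for this illustration

    subprocess.run(
        ["docker", "run", "-d", "--rm", "--name", NAME,
         "--cpus", "2", "--cpuset-cpus", "0-1", "--memory", "1g",
         "ubuntu", "sleep", "60"],
        check=True,
    )

    # The host OS sees the container's processes directly (no abstraction layer).
    subprocess.run(["docker", "top", NAME], check=True)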


VM vs Container


CPU Provisioning in Virtualized Platform

• Default: time sharing
  ▪ Linux: Completely Fair Scheduler (CFS)
  ▪ All CPU cores are utilized even if there is only one VM on the host or the workload is not heavy
  ▪ Each CPU quantum may run on a different set of CPU cores
  ▪ Called Vanilla mode in this study

• Pinned: a fixed set of CPU cores for all quanta
  ▪ Overrides the default host/hypervisor OS scheduler
  ▪ The process is distributed only among the designated CPU cores
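
A minimal sketch of the pinned mode described above, for each layer of the stack on a Linux host; the core lists, container name, and libvirt domain name are illustrative, not taken from the paper's setup.

    import os
    import subprocess

    # 1) Pin the current bare-metal process to a fixed set of cores (0-15),
    #    overriding CFS's free placement across all host cores.
    os.sched_setaffinity(0, set(range(16)))
    print("pinned to cores:", sorted(os.sched_getaffinity(0)))

    # 2) Pin a container to the same cores at launch time via its cpuset cgroup.
    subprocess.run(["docker", "run", "-d", "--name", "pinned-cn",
                    "--cpuset-cpus", "0-15", "ubuntu", "sleep", "infinity"],
                   check=True)

    # 3) Pin a KVM guest's vCPUs one-to-one onto host cores (libvirt);
    #    "guest1" is a hypothetical domain name.
    for vcpu in range(16):
        subprocess.run(["virsh", "vcpupin", "guest1", str(vcpu), str(vcpu)],
                       check=True)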


Execution Platforms


Application types and measurements

• Measured performance metric
  ▪ Total execution time

• Overhead ratio (see the sketch after this list)
  ▪ Overhead ratio = (average execution time offered by the platform) / (average execution time of bare-metal)

• Performance monitoring and profiling tools
  ▪ BCC (BPF Compiler Collection: cpudist, offcputime), iostat, perf, htop, top
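
A minimal sketch of the overhead ratio above: the mean execution time measured on the platform under test divided by the mean execution time on bare-metal. The sample timings are made-up values for illustration only.

    from statistics import mean

    def overhead_ratio(platform_times, bare_metal_times):
        # > 1.0 means the platform adds overhead relative to bare-metal
        return mean(platform_times) / mean(bare_metal_times)

    vm_times = [104.2, 103.8, 105.1]   # hypothetical seconds from repeated runs
    bm_times = [100.0, 99.6, 100.4]
    print(f"overhead ratio: {overhead_ratio(vm_times, bm_times):.3f}")  # ~1.04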


Configuration of instance types

• Host server: Dell PowerEdge R830
  ▪ 4 × Intel Xeon E5-4628L v4
    o Each processor is 1.80 GHz, with 35 MB cache and 14 processing cores (28 threads)
    o 112 homogeneous cores
  ▪ 384 GB memory
    o 24 × 16 GB DDR4 DRAM
  ▪ RAID1 (2 × 900 GB 10k HDD) storage


Motivation

• In-depth study of container on top of VM (VMCN)

• Comparing different execution platforms (BM, VM, CN, VMCN) altogether

• Real-life applications with different workload patterns

• Finding an overhead trend by increasing resource configurations

• Involving CPU pinning in the evaluation


Contributions of this work

• Unveiling
  ▪ PSO (Platform-Size Overhead)
  ▪ CHR (Container-to-Host Core Ratio)

• Leveraging PSO and CHR to define the overhead behavior pattern for
  ▪ Different resource configurations
  ▪ Different workload types

• A set of best practices for cloud solution architects
  ▪ Which execution platform suits what kind of workload

Experiment and analysis: Video Processing Workload Using FFmpeg

• FFmpeg: widely used video transcoder
  ▪ Very high processing demand
  ▪ Multi-threaded (up to 16 cores)
  ▪ Small memory footprint
  ▪ Representative of a CPU-intensive workload

• Workload:
  ▪ Function: codec change from AVC (H.264) to HEVC (H.265)
  ▪ Source video file: 30 MB HD video
  ▪ Mean and confidence interval across 20 executions collected for each platform
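
A minimal sketch of one run of this workload, assuming FFmpeg with libx265 is installed: transcode the H.264 source to HEVC and time it. The file names and the pinned core range (0-15, matching FFmpeg's ~16-core scalability noted above) are illustrative.

    import subprocess, time

    cmd = ["taskset", "-c", "0-15",                 # pinned mode; drop for vanilla
           "ffmpeg", "-y", "-i", "source_hd.mp4",   # hypothetical 30 MB HD input
           "-c:v", "libx265", "output_hevc.mp4"]    # AVC (H.264) -> HEVC (H.265)

    start = time.time()
    subprocess.run(cmd, check=True)
    print(f"execution time: {time.time() - start:.2f} s")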

Experiment and analysis: Video Processing Workload Using FFmpeg

Experiment and analysis: Parallel Processing Workload Using MPI

• MPI: widely used HPC platform
  ▪ Multi-threaded
  ▪ Resource usage footprint highly depends on the MPI program

• Workload
  ▪ Applications: MPI_Search, Prime_MPI
  ▪ Compute intensive; however, communication between CPU cores dominates the computation
  ▪ Mean and confidence interval across 20 executions collected for each platform
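
A minimal sketch of one run of the MPI workload, assuming Open MPI and that Prime_MPI has been compiled to ./prime_mpi; the rank count and the --bind-to core option (Open MPI's per-core pinning) are illustrative choices.

    import subprocess, time

    cmd = ["mpirun", "-np", "16", "--bind-to", "core", "./prime_mpi"]

    start = time.time()
    subprocess.run(cmd, check=True)
    print(f"execution time: {time.time() - start:.2f} s")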

Experiment and analysis: Parallel Processing Workload Using MPI

Experiment and analysis: Web-based Workload Using WordPress

• WordPress
  ▪ PHP-based CMS: Apache Web Server + MySQL
  ▪ IO intensive (network and disk interrupts)

• Workload
  ▪ A simple website is set up on WordPress
  ▪ The browsing behavior of a web user is recorded
  ▪ 1,000 simultaneous web users are simulated
    o Apache JMeter
  ▪ Each experiment is performed 6 times
  ▪ Mean execution time (response time) of these web processes is recorded
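
A minimal sketch of driving the recorded browsing scenario with Apache JMeter in non-GUI mode; the test-plan and result file names are assumptions, and the 1,000 simulated users would be configured inside the plan itself.

    import subprocess

    subprocess.run(["jmeter", "-n",                    # non-GUI run
                    "-t", "wordpress_browse.jmx",      # hypothetical recorded plan
                    "-l", "results.jtl"],              # per-request response times
                   check=True)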

Experiment and analysis: Web-based Workload Using WordPress


Experiment and analysis: Web-based Workload Using WordPress

NoSQL Workload using Apache Cassandra

• Apache Cassandra:
  ▪ Distributed NoSQL, Big Data platform
  ▪ Demands compute, memory, and disk IO

• Workload
  ▪ 1,000 operations within one second
    o cassandra-stress
    o 25% write, 85% read
    o 100 threads, each one simulating one user
  ▪ Each experiment is repeated 20 times
  ▪ Average execution time (response time) of all the synthesized operations
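
A minimal sketch of the load driver, assuming the cassandra-stress tool shipped with Cassandra; the 1:3 write-to-read ratio is an illustrative stand-in for the mixed workload above, while the operation and thread counts mirror the description.

    import subprocess

    subprocess.run(["cassandra-stress", "mixed", "ratio(write=1,read=3)",
                    "n=1000",                  # 1,000 operations
                    "-rate", "threads=100"],   # 100 threads, one per simulated user
                   check=True)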

NoSQL Workload using Apache Cassandra


NoSQL Workload using Apache Cassandra


Cross-Application Overhead Analysis

• Platform-Type Overhead (PTO)
  ▪ Caused by resource abstraction (VM)
  ▪ Constant trend
  ▪ Pinning is not helpful

• Platform-Size Overhead (PSO)
  ▪ Diminished by increasing the number of CPU cores
  ▪ Specific to containers
  ▪ Previously reported only by IBM for Docker (WebSphere tuning)
  ▪ Pinning helps a lot

Parameters affecting PSO

1. Container resource usage tracking
   • cgroups

2. Container-to-Host Core Ratio (CHR) (see the sketch after this list)
   • CHR = (cores assigned to the container) / (total number of host cores)

3. IO operations

4. Multitasking
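
A minimal sketch of the CHR definition above, using the 112-core host of this study; the example cpuset sizes (8, 16, 32, 64 cores) are the ones implied by the CHR ranges discussed in the following slides.

    def chr_ratio(assigned_cores, host_cores=112):
        # CHR = cores assigned to the container / total number of host cores
        return assigned_cores / host_cores

    for cores in (8, 16, 32, 64):
        print(f"{cores:>2} cores -> CHR = {chr_ratio(cores):.2f}")
    # prints CHR of about 0.07, 0.14, 0.29, 0.57 respectively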

Impact of CHR on PSO

• Lower value of CHR imposes a larger overhead (PSO)

• Application characteristics define the value of CHR

• CPU intensive
  ▪ 0.14 < 𝐶𝐻𝑅 < 0.28

• IO intensive: a higher CHR is needed
  ▪ 0.28 < 𝐶𝐻𝑅 < 0.57

Container Resource Usage Tracking

• The OS scheduler allocates all available CPU cores to the CN process

• cgroups collects usage cumulatively
• Each scheduling event has a different CPU allocation for that CN

• cgroups is an atomic (kernel-space) process
• The container has to be suspended while aggregating resource usage

• OS scheduling enforces process migration, cgroups enforces resource usage tracking -- synergistic
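
A minimal sketch of the cumulative tracking described above: from the host, a container's CPU time can be read straight out of its cgroup. The path assumes cgroup v2 with Docker under systemd, and the container id is a placeholder.

    # Read cumulative CPU usage (microseconds) for one container's cgroup.
    CGROUP = "/sys/fs/cgroup/system.slice/docker-<container-id>.scope/cpu.stat"

    def container_cpu_usage_usec(path=CGROUP):
        with open(path) as f:
            for line in f:
                key, value = line.split()
                if key == "usage_usec":   # cumulative CPU time tracked by cgroups
                    return int(value)
        return None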

The Impact of Multitasking on PSO


The Impact of IO operations on PSO

Experimental Setup: MPI task

• CPU pinning can mitigate this kind of overhead

Summary


1. Application characteristics are decisive in the imposed overhead

2. CPU pinning reduces the overhead for IO-bound applications running on containers

3. CHR plays a significant role in the overhead of containers

4. Containers may induce a higher overhead in comparison to VMs

5. Containers on top of VMs (called VMCN) impose a lower overhead for IO-intensive applications

Best Practices


1. Avoid small vanilla containers
2. Use pinning for CPU-bound containers
3. Not worthwhile to use pinning for CPU-bound VMs
4. Use pinning for IO-intensive workloads
5. CPU-intensive applications: 0.07 < 𝐶𝐻𝑅 < 0.14
6. IO-intensive applications: 0.14 < 𝐶𝐻𝑅 < 0.28
7. Ultra IO-intensive applications: 0.28 < 𝐶𝐻𝑅 < 0.57
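
A minimal sketch applying best practices 5-7: choose a container cpuset size whose CHR falls inside the recommended band for the application class. Targeting the middle of each band and the rounding choice are illustrative, not prescribed by the study.

    import math

    BANDS = {                                 # recommended CHR ranges from the list above
        "cpu_intensive":      (0.07, 0.14),
        "io_intensive":       (0.14, 0.28),
        "ultra_io_intensive": (0.28, 0.57),
    }

    def suggested_cores(app_class, host_cores=112):
        low, high = BANDS[app_class]
        target_chr = (low + high) / 2         # aim for the middle of the band
        return max(1, math.ceil(target_chr * host_cores))

    for app in BANDS:
        print(app, "->", suggested_cores(app), "cores")
    # cpu_intensive -> 12, io_intensive -> 24, ultra_io_intensive -> 48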