University of Maryland

Profile-Driven Selective Program Loading

Tugrul Ince
tugrul@cs.umd.edu

Jeff Hollingsworth

Department of Computer Science
University of Maryland, College Park, MD 20742


Motivation

Programs are getting larger!
  – Many frameworks and libraries

Many supercomputers lack demand-paging
  – Example: Cray XT and BlueGene series
  – Available memory is scarce

Observation: Most programs do not use every available function!
  – Frameworks and libraries are too general
  – Code that handles errors or special cases

Why not remove functions that are not used in the common case?


Aim

Reduce memory footprint by selectively loading parts of shared libraries


Target Platforms and Applications

Unix/Linux systems that support ELF
  – Modifies ELF program headers

Applications with many libraries
  – Most current reasonable applications

Parallel programs running on multiple nodes
  – MPI etc.

Platforms without demand-paging
  – Cray XT and BlueGene series


Architecture Overview

Application is profiled.

It is rewritten with
  – Modified Shared Libraries
  – A Signal Handler

Application is executed as usual.


Profiler

Need a list of never-called functions in each shared library
  – Profile the application several times
  – May not be perfect

DynInst-based profiler
  – Write small program (~ 70 LOC)
  – Rewrite shared libraries
  – Profile as many times as necessary
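One way to picture how several profiling runs combine into a never-called list is the sketch below. It is not the DynInst-based profiler itself: it assumes each run has already been reduced to a set of called function names, and the function names and run data are hypothetical.

```python
def never_called(all_functions, profile_runs):
    """Return the functions that no profiling run observed being called."""
    called = set()
    for run in profile_runs:
        # A function counts as used if any single run called it.
        called.update(run)
    return sorted(set(all_functions) - called)


# Hypothetical library contents and profile data, for illustration only.
library_functions = ["init", "solve", "print_matrix", "handle_error"]
runs = [
    {"init", "solve"},
    {"init", "solve", "print_matrix"},
]
unused = never_called(library_functions, runs)
print(unused)
```

Taking the union over runs is what makes extra profiling runs safe: a new run can only shrink the never-called list. The list may still be imperfect, since an input not covered by any run can reach a function marked unused.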


Rewriting

Do not load unused functions
  – Modify ELF program headers
  – Example: libpetsc.so

Program Headers (original):
  Type       Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  LOAD       0x000000 0x00000000 0x00000000 0x124584 0x124584 R E 0x1000
  LOAD       0x124584 0x00125584 0x00125584 0x013f8  0x0a434  RW  0x1000
  DYNAMIC    0x12459c 0x0012559c 0x0012559c 0x00130  0x00130  RW  0x4
  GNU_STACK  0x000000 0x00000000 0x00000000 0x00000  0x00000  RW  0x4

First Loadable Section: .text, .init, .fini, .plt

Second Loadable Section: .dynamic, .got, .got.plt, .data, .bss


Rewriting

Do not load unused functions
  – Modify ELF program headers
  – Example: libpetsc.so

Program Headers (modified):
  Type       Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  LOAD       0x000000 0x00000000 0x00000000 0x090000 0x090000 R E 0x1000
  LOAD       0x112000 0x00112000 0x00112000 0x012584 0x012584 R E 0x1000
  LOAD       0x124584 0x00125584 0x00125584 0x013f8  0x0a434  RW  0x1000
  DYNAMIC    0x12459c 0x0012559c 0x0012559c 0x00130  0x00130  RW  0x4
  GNU_STACK  0x000000 0x00000000 0x00000000 0x00000  0x00000  RW  0x4

First Loadable Section: .text, .init, .fini, .plt

Second Loadable Section: .dynamic, .got, .got.plt, .data, .bss
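The libpetsc.so headers are consistent with cutting one page-aligned hole out of the first LOAD segment. The sketch below reproduces that arithmetic. The function name and the hole boundaries (0x090000 to 0x112000) are inferred from the header values rather than stated on the slides, and the sketch assumes Offset equals VirtAddr, as it does for this segment.

```python
PAGE_SIZE = 0x1000


def split_load_segment(offset, size, hole_start, hole_end):
    """Split the LOAD segment [offset, offset + size) around an unloaded hole.

    Returns (offset, size) pairs for the pieces that stay loaded. The hole
    must be page-aligned so each remaining piece can be mapped on its own.
    """
    if hole_start % PAGE_SIZE or hole_end % PAGE_SIZE:
        raise ValueError("hole boundaries must be page-aligned")
    end = offset + size
    if not (offset <= hole_start < hole_end <= end):
        raise ValueError("hole must lie inside the segment")
    pieces = []
    if hole_start > offset:
        pieces.append((offset, hole_start - offset))
    if end > hole_end:
        pieces.append((hole_end, end - hole_end))
    return pieces


# Original first LOAD segment of the example: offset 0, size 0x124584.
pieces = split_load_segment(0x000000, 0x124584, 0x090000, 0x112000)
for off, sz in pieces:
    print(f"LOAD 0x{off:06x} 0x{off:08x} 0x{off:08x} "
          f"0x{sz:06x} 0x{sz:06x} R E 0x{PAGE_SIZE:x}")
```

The two resulting pieces have sizes 0x090000 and 0x012584, matching the FileSiz values of the two R E entries in the modified headers.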


Rewriting

Rewriter based on DynInst

Profile data is used to create lists of Used and Unused functions

Access / Modify symbols

Defragment functions to maximize space savings
  – Requires moving functions inside shared libraries


Function Defragmentation

[Figure; legend: Used, Unused]
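The effect of defragmentation on page count can be illustrated with a toy layout. This is a sketch under simplifying assumptions, not the rewriter's algorithm: functions are packed back to back with no alignment padding, and the function names and sizes are hypothetical.

```python
PAGE_SIZE = 0x1000


def pages_touched(ranges):
    """Count distinct pages covered by a list of (start, size) byte ranges."""
    pages = set()
    for start, size in ranges:
        if size > 0:
            first = start // PAGE_SIZE
            last = (start + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
    return len(pages)


def defragment(functions, used):
    """Place used functions first, then unused ones, keeping relative order.

    functions: list of (name, size) in original layout order.
    Returns {name: new_start_address}.
    """
    ordered = ([f for f in functions if f[0] in used]
               + [f for f in functions if f[0] not in used])
    layout, cursor = {}, 0
    for name, size in ordered:
        layout[name] = cursor
        cursor += size
    return layout


# Hypothetical functions: used and unused code interleaved.
functions = [("a", 0x800), ("b", 0x1000), ("c", 0x800), ("d", 0x1000), ("e", 0x800)]
used = {"a", "c", "e"}
sizes = dict(functions)

original = defragment(functions, set(sizes))  # treating all as used keeps the order
packed = defragment(functions, used)

before = pages_touched([(original[n], sizes[n]) for n in used])
after = pages_touched([(packed[n], sizes[n]) for n in used])
print(before, after)
```

In this toy case the used functions go from touching three pages to two, because grouping them leaves no page that holds used code only in part.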


Challenges: Relative Calls

Common way of calling functions in PIC.

If either callee or caller is moved, their relative positioning changes.

Offsets in such relative call instructions need to be updated.

[Figure: a call to foo with displacement d before the move and displacement d' after the move]
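The displacement update can be sketched for the x86 near call (`call rel32`, opcode 0xE8), whose target is the address of the next instruction plus a signed 32-bit displacement. The helper names and addresses below are hypothetical; the rewriter itself relies on DynInst and is not shown on the slides.

```python
import struct

CALL_LEN = 5  # x86 `call rel32`: opcode 0xE8 plus a 4-byte displacement


def encode_call(call_addr, target_addr):
    """Encode an x86 near relative call placed at call_addr."""
    disp = target_addr - (call_addr + CALL_LEN)
    return b"\xe8" + struct.pack("<i", disp)


def decode_target(call_addr, insn):
    """Return the absolute target of a call instruction located at call_addr."""
    assert insn[0] == 0xE8
    (disp,) = struct.unpack("<i", insn[1:5])
    return call_addr + CALL_LEN + disp


def relocate_call(insn, old_call_addr, new_call_addr, moved):
    """Re-encode a call after the caller and/or the callee has moved.

    moved maps old function entry addresses to new ones; callees that did
    not move are simply absent from the map.
    """
    old_target = decode_target(old_call_addr, insn)
    new_target = moved.get(old_target, old_target)
    return encode_call(new_call_addr, new_target)


# Hypothetical addresses: the call sits at 0x1000, foo moves from 0x5000 to 0x2000.
old_insn = encode_call(0x1000, 0x5000)
new_insn = relocate_call(old_insn, 0x1000, 0x1000, {0x5000: 0x2000})
print(old_insn.hex(), new_insn.hex())
```

Decoding to an absolute target first and re-encoding afterwards handles the three cases uniformly: only the callee moved, only the caller moved, or both.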


Challenges: Symbols

Runtime linker uses symbols to resolve cross-library calls.
  – Uses procedure linkage tables (plt)

If a function is moved, its associated symbol has to be updated.

[Figure: call foo@plt resolves through the symbol foo, whose value is 0xdeadbeef before the move and 0xbeefdead after the move]
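A minimal sketch of the symbol update follows, assuming a 32-bit little-endian ELF symbol table (16-byte `Elf32_Sym` entries) held in memory as raw bytes. The function name and the sample entry are hypothetical; the two addresses are the ones used in the slide's figure.

```python
import struct

# Elf32_Sym fields: st_name, st_value, st_size, st_info, st_other, st_shndx
ELF32_SYM = struct.Struct("<IIIBBH")
STT_FUNC = 2  # symbol type stored in the low 4 bits of st_info


def update_symbols(symtab, moved):
    """Rewrite st_value of every function symbol whose address is in `moved`."""
    out = bytearray(symtab)
    for off in range(0, len(out), ELF32_SYM.size):
        name, value, size, info, other, shndx = ELF32_SYM.unpack_from(out, off)
        if (info & 0xF) == STT_FUNC and value in moved:
            ELF32_SYM.pack_into(out, off, name, moved[value], size, info, other, shndx)
    return bytes(out)


# One hypothetical global function symbol (st_info 0x12 = GLOBAL | FUNC).
symtab = ELF32_SYM.pack(1, 0xDEADBEEF, 0x40, 0x12, 0, 7)
patched = update_symbols(symtab, {0xDEADBEEF: 0xBEEFDEAD})
print(hex(ELF32_SYM.unpack(patched)[1]))
```

Only `st_value` changes; name, size, and section index are copied through, so the runtime linker still finds the symbol under the same name but resolves it to the new location.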


Challenges: Jump Tables

Used to represent n-way branches at machine level

Targets are read from the jump table
  – Entries are offsets of targets from the GOT address

A jump table becomes invalid if a function it references is moved

DynInst reads jump tables to generate CFGs

We update the entries so that they point to the new locations of the targets
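Because each entry is an offset from the GOT address, retargeting an entry amounts to recovering the absolute target, shifting it by the distance the function moved, and converting back. The sketch below assumes the GOT address itself stays fixed; the function name and all addresses are hypothetical.

```python
def update_jump_table(entries, got_addr, old_start, size, new_start):
    """Retarget jump-table entries that point into a moved function.

    entries: signed offsets of branch targets relative to got_addr.
    The function that occupied [old_start, old_start + size) now starts
    at new_start. Entries pointing elsewhere are left unchanged.
    """
    delta = new_start - old_start
    updated = []
    for entry in entries:
        target = got_addr + entry
        if old_start <= target < old_start + size:
            target += delta
        updated.append(target - got_addr)
    return updated


# Hypothetical layout: GOT at 0x125000, function moved from 0x50000 to 0x1000.
GOT = 0x125000
old_entries = [t - GOT for t in (0x50010, 0x50040, 0x50080)]
new_entries = update_jump_table(old_entries, GOT, 0x50000, 0x200, 0x1000)
print([hex(GOT + e) for e in new_entries])
```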


Unexpectedly Called Function

Execution is not always predictable
  – Unexpected function calls

Rewrite original executable with a Signal Handler

Load the function upon an unexpected call
  – Signal Handler picks up page faults (SIGSEGV)
  – Loads requested page on-demand
  – Execution resumes

User-level: No OS modifications
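The handler itself would be native code and is not shown on the slides. The sketch below covers only the address arithmetic such a handler might need: rounding the faulting address down to a page boundary and locating that page in the library file. It assumes Offset equals VirtAddr for the text segment, as in the libpetsc.so headers; the load base, the hole range, and the function names are hypothetical.

```python
PAGE_SIZE = 0x1000


def page_to_load(fault_addr, lib_base):
    """Map a faulting address to (page address in memory, offset in the file).

    Assumes the text segment has Offset == VirtAddr, so a page's file offset
    equals its address relative to the library's load base.
    """
    page_addr = fault_addr & ~(PAGE_SIZE - 1)
    return page_addr, page_addr - lib_base


def in_hole(file_offset, holes):
    """True if the offset falls in a range that was deliberately left unloaded."""
    return any(start <= file_offset < end for start, end in holes)


# Hypothetical load base; the hole is the range inferred from the example headers.
LIB_BASE = 0x40000000
HOLES = [(0x090000, 0x112000)]

page_addr, file_offset = page_to_load(0x400A1234, LIB_BASE)
print(hex(page_addr), hex(file_offset), in_hole(file_offset, HOLES))
```

Checking the offset against the known holes is a design choice in this sketch: a fault outside every hole is a genuine segmentation fault and should not be treated as a request to load code.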


Experiments

Tested on
  – PETSc ex5 in snes package
  – PETSc ex2 in ksp package
  – GS2

Compiled with debug flag and no optimization

Used Open MPI

Tested on 64-node cluster at UMD
  – Dual-core x86 processors
  – Unmodified Linux kernel

Space savings of about 82% on average


PETSc – snes (ex5)

Library Name   Text Pages (Original)   Text Pages (Modified)   Reduction %
petsc          260                     68                      73.85
petscdm        161                     19                      88.2
petscksp       335                     39                      88.36
petscmat       772                     40                      94.82
petscvec       204                     52                      74.51
petscsnes      20                      20                      0
mpi_cxx        10                      5                       50
mpi            142                     37                      73.94
open-pal       62                      34                      45.16
open-rte       55                      34                      38.18
m              28                      3                       89.29
X11            146                     7                       95.21
lapack         866                     2                       99.77
blas           80                      3                       96.25
stdc++         133                     12                      90.98
gcc_s          12                      2                       83.33
Xau            2                       2                       0
Xdcm           3                       3                       0
gfortran       123                     4                       96.75
dl             2                       2                       0
nsl            14                      2                       85.71
util           2                       2                       0
OVERALL        2021                    348                     82.78


PETSc – snes (ex5)

[Figure]


PETSc – ksp (ex2)

Library Name   Text Pages (Original)   Text Pages (Modified)   Reduction %
petsc          260                     72                      72.31
petscdm        161                     3                       98.14
petscksp       335                     49                      85.37
petscmat       772                     49                      93.65
petscvec       204                     54                      73.53
mpi_cxx        10                      5                       50
mpi            142                     47                      66.9
open-pal       62                      37                      40.32
open-rte       55                      36                      34.55
OVERALL        2001                    352                     82.41
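The Reduction % column and the OVERALL row of the ksp (ex2) table can be recomputed from the page counts. The sketch below assumes reduction is defined as (original - modified) / original, which is consistent with every row of the table; the variable and function names are illustrative.

```python
def reduction(original, modified):
    """Percentage of text pages removed, rounded to two decimals."""
    return round(100.0 * (original - modified) / original, 2)


# (original pages, modified pages) from the PETSc ksp (ex2) table.
ksp_ex2 = {
    "petsc": (260, 72),
    "petscdm": (161, 3),
    "petscksp": (335, 49),
    "petscmat": (772, 49),
    "petscvec": (204, 54),
    "mpi_cxx": (10, 5),
    "mpi": (142, 47),
    "open-pal": (62, 37),
    "open-rte": (55, 36),
}

total_original = sum(orig for orig, _ in ksp_ex2.values())
total_modified = sum(mod for _, mod in ksp_ex2.values())

for name, (orig, mod) in ksp_ex2.items():
    print(f"{name:10s} {reduction(orig, mod):6.2f}")
print(f"{'OVERALL':10s} {reduction(total_original, total_modified):6.2f}")
```

Note that OVERALL is computed from the summed page counts, not as an average of the per-library percentages, so large libraries such as petscmat dominate the result.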


GS2

Library Name   Text Pages (Original)   Text Pages (Modified)   Reduction %
MdsLib         21                      0                       100
MdsShr         21                      0                       100
TdiShr         220                     3                       98.64
TreeShr        38                      0                       100
fftw           70                      25                      64.29
rfftw          58                      8                       86.21
mpi_f77        13                      2                       84.62
mpi            142                     40                      71.83
open-pal       62                      36                      41.94
open-rte       55                      36                      34.55
OVERALL        700                     150                     78.57


Running Times

GS2 takes 5 seconds less on average
  – (36m 38s vs. 36m 33s)

Overhead on PETSc examples
  – ex2 runs for 2.7 seconds, ex5 runs for 1.05 seconds.


Running Times

Results suggest no overhead for reasonably long-running programs
  – Initial cost for signal handler registration
  – Better instruction cache and TLB performance


Summary

Our tool reduces memory footprint of shared libraries

Rewrite shared libraries with holes
  – Defragment functions to maximize space savings

On-demand page loading if a not-yet-loaded function is called

About 82% memory space savings for shared libraries

Might improve instruction cache and TLB performance