OpenSource kan sneller ("Open Source can be faster"): The StartLog Study

(C) Albert Mietus, PTS B&C, 12 April 2006


Page 1: Open Source Kan Sneller, the StartLog Study


OpenSource kan sneller! ("Open Source can be faster!")

Historically, Linux has been used on systems where boot time is not relevant.
With EQSLinux it has become easy to start using embedded Linux.
Now we want to make Linux boot faster!

But:
  Linux is complex ...
  It always has a thousand solutions ...
  And we are not even sure what the problem is ...

So, how do we solve it?

Time to investigate, to study, ...

Page 2: Open Source Kan Sneller, the StartLog Study


Today

1. Is there a problem?
   Linux is a slow starter ...
2. How does Linux boot?
   Some theory, and a few quick wins ...
3. How to measure?
   We need to KNOW, not hope or guess
   Measure, change, re-measure and compare!
4. Results and solutions
   Not only for PTS' EQSLinux, but for all (embedded) Linux systems!

Page 3: Open Source Kan Sneller, the StartLog Study


Typical Linux ...

Linux was never designed for embedded systems
  Unix/Linux used to be a 'server'
    Boot time is not important, flexibility is
  Linux@theDesktop is 'hot'
    People have got used to waiting for computers
    There is a lot of 'user-space' waiting
    'Building the desktop' takes a lot longer than the system boot

Example: booting one of our development systems takes 143 seconds!
  ☞ SuSE 9.1, with lots of HW

On most embedded systems, that is not acceptable
  We are used to milliseconds, not thousands of them!

Page 4: Open Source Kan Sneller, the StartLog Study


Booting Linux is complex

Flexibility
  Unix brands, Linux distributions, system use
    ☞ E.g. banking versus SW development
  Hardware, systems
  Legacy run levels, compatibility

Lack of design vision
  Background & know-how of developers
    Some developers have other ideas, or "enrich" (don't understand) the concept
  Focus on 'C', 'desktop', 'new'
    No time left for: Makefiles, engineering, booting, embedded

Page 5: Open Source Kan Sneller, the StartLog Study


Booting Linux Systems

[Diagram: boot timeline with the axes "time" and "UP-ness (%)", running through POST (bios), BOOT (grub), Kernel (vmlinuz), modules (*.ko) and Start-up (rc-scripts) until READY (login/application)]

There are 5 phases
  With several sub-phases; some of them are time-outs!
  And thousands of steps

Example development system: 143 seconds!


Page 7: Open Source Kan Sneller, the StartLog Study


How to measure?

Reliably measuring the boot steps is difficult
  Everything is SW-only, so no external (scope) measurement
  Multiple domains: HW, SW & versions
    • E.g. bios; grub, uboot, redboot; Linux-kernel-2.*.*; scripts
    • Instrumenting is difficult
      – Can't change the BIOS, can hardly change 'BOOT' (there is no space!)
      – Lots & lots of code in a Linux system; fork() is architecture-dependent
  Time resolution: seconds or microseconds?
    • Some steps take less than 1 millisecond!
    • Every statement takes time! a_time() takes too much!
    • We need to see the 'big picture', not only details

Note: monitoring 143 s at ms resolution is (about) 500 meters of print-out!!

Page 8: Open Source Kan Sneller, the StartLog Study


StartLog Concept

Make it fast
  • Capture data in real time, transfer later
  • Process data off-line
Make it simple
  • Minimize the changes (simple to port to every Linux)
  • For the kernel, modules and start-up scripts alike
Make it reliable
  • Measure the measurement
  • Measure, change Linux, re-measure AND compare!

Result: StartLog (patch & post-processing)

Page 9: Open Source Kan Sneller, the StartLog Study


Details: patch

Log the system call 'exec'
  It starts a new program
    ☞ It will miss 'source *.sh'
  The filename is available
  Architecture-independent (Linux)
Use the 'dmesg' storage
  It's available
  Easy to read
  Increase the size! (Watch for a build bug)
Use 'jiffies' for time
  Counter (long unsigned integer)
    Can wrap; random start
  About 1 ms (most systems)

    do_execve(... filename ...) {
        static int PTS_startlog = 0;   /* sequence number of the log entry */
        // int PTS_i;                  (only needed for the optional part)

        /* No "\n" here: the optional part below closes the line.
           If it stays disabled, add "\n" to this format string. */
        printk("@PTS@startlog=%07d,jiffies=%012lu, do_execve(%s)",
               PTS_startlog++, jiffies, filename);
        /* // Optional: also log 9 argument and 9 environment strings
           // (assumes at least 9 entries each)
        for (PTS_i = 0; PTS_i < 9; PTS_i++) printk("%s,", argv[PTS_i]);
        printk(" ; ");
        for (PTS_i = 0; PTS_i < 9; PTS_i++) printk("%s,", envp[PTS_i]);
        printk(")\n");
        */
        ...
    }
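The log lines written by this patch can be picked out of the dmesg output with a small parser. The following is a minimal Python sketch, assuming the line format of the printk() call above. The names `LOG_LINE` and `parse_startlog` and the sample lines are made up for illustration; they are not part of the StartLog scripts, which according to the sheets use awk for this step.

```python
import re

# Matches lines in the format of the printk() call on this sheet:
#   @PTS@startlog=%07d,jiffies=%012lu, do_execve(%s)
# Anything dmesg prints before the marker (e.g. a log-level prefix) is ignored.
LOG_LINE = re.compile(
    r"@PTS@startlog=(?P<seq>\d+),jiffies=(?P<jiffies>\d+), "
    r"do_execve\((?P<filename>[^,)]*)"
)


def parse_startlog(lines):
    """Return (sequence number, jiffies, filename) for every StartLog line."""
    records = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            records.append(
                (int(match["seq"]), int(match["jiffies"]), match["filename"])
            )
    return records


# Hypothetical sample lines, made up for illustration (not measured data).
sample = [
    "<4>@PTS@startlog=0000000,jiffies=004294937396, do_execve(/sbin/init)",
    "<6>eth0: link up",
    "<4>@PTS@startlog=0000001,jiffies=004294937521, do_execve(/etc/init.d/rcS)",
]
records = parse_startlog(sample)
```

Lines that do not carry the `@PTS@` marker are skipped, so the complete dmesg dump can be fed in unfiltered.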

Page 10: Open Source Kan Sneller, the StartLog Study


Details: processing & quality

Store the log
  dmesg -s 512000 > aFile
  ftp/scp to host
Off-line
  Some awk script
    • Filter, recount jiffies, CSV format
  Excel macros & graphing
    • X-Y (scatter) for the timeline
    • Histogram (bar) for the number of calls
    • Pie for the bottleneck
Customisable
  It is easy to adapt to find each problem.

Influence (real-time)
  Each log: 200-300 µsec
    ☞ Depending on CPU!
  Near-linear slowdown
    Especially for interesting parts
  Accurate for slow steps!
Repeatability
  Variation ∆jiffies: <5 (95%)
  Double, triple log for check
  Use series of three or more
    Example: when flashing memory, the first boot always takes a few seconds somewhere!
Environment
  Watch it! It affects the boot speed
    E.g. dhcp timeout!
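The "recount jiffies" and CSV steps of the off-line processing could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the author's awk script: records are taken to be (sequence number, jiffies, filename) tuples, the counter is assumed to be a 32-bit unsigned long (hence the default `wrap`), and the tick rate `hz` has to be supplied for the system under test. All function names are hypothetical.

```python
import csv
import io


def recount_jiffies(jiffies, wrap=2**32):
    """Rebase raw jiffies so the first entry is 0, undoing counter wraps.

    `wrap` is the counter modulus; 2**32 assumes a 32-bit unsigned long.
    """
    rebased = []
    offset = 0
    previous = None
    for value in jiffies:
        if previous is not None and value < previous:
            offset += wrap  # the counter wrapped between two log entries
        previous = value
        rebased.append(value + offset - jiffies[0])
    return rebased


def to_seconds(delta_jiffies, hz):
    """Convert a jiffies difference to seconds; `hz` is the kernel tick rate."""
    return delta_jiffies / hz


def to_csv(records):
    """Return CSV text with one row per record and rebased jiffies."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["seq", "jiffies", "filename"])
    times = recount_jiffies([record[1] for record in records])
    for (seq, _, filename), time in zip(records, times):
        writer.writerow([seq, time, filename])
    return out.getvalue()
```

Rebasing to zero removes the random start value of the counter, which makes two boots directly comparable in a spreadsheet.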

Page 11: Open Source Kan Sneller, the StartLog Study


Some Results

The following sheets show some results
  They show what CAN be (& is) measured
  Some general examples are selected
    • Measurements are very project-specific
    • And are often very boring for others
  Often, the POST and BOOT phases are excluded
    • Not generic, not changeable
    • Not measured with StartLog (but can be measured and added to the graphs!)
  For that reason, details are not explained
    Please contact me directly/offline for your specific questions

All times (numbers) in jiffies

All systems are NON-optimized!

Page 12: Open Source Kan Sneller, the StartLog Study


A first impression

Boot and clock time do give some info, but
  Only limited information
    • No real phases (but close, for a 1st look)
    • What to improve?
    • Hard to explain
  (Where) is the system
    busy ('overloaded'), or
    waiting?

We need more detail! And concentrate on
  • Linux/OpenSource parts
  • Delays (system is doing nothing)

[Chart "Global": seconds (axis 0-80) per phase: System "on", Boot choice, First log, Ready (login). Series: Trident, Trident (without network), Gumstix, Gumstix (without network), Start (HDD), Start (HDD) (without network), Start, Start (without network)]

Page 13: Open Source Kan Sneller, the StartLog Study


Some timelines

[Chart: starting-process-no (0-300) against time (jiffies, 0-4000). Series: Gumstix (typical), Gumstix (network timeout)]

Gumstix: 2 timelines
  Exactly the same, except for
    • the horizontal line, which is a network timeout (see next sheet)
  A lot fewer processes are started: only look at the % of the total number!
  It uses fewer modules than above

[Chart: starting-process-no (0-2000) against time (jiffies, 0-35000). Series: EmbeddedPC-1 (start-1), EmbeddedPC-2 (start-3), EmbeddedPC-3 (HD instead of CF), EmbeddedPC-4 (TR-1)]

Embedded PC: 4 timelines
  Similar but different
  Notice the speed:
    Fastest: yellow
    Slowest: purple
  Horizontal lines: no progress!
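A horizontal line in such a timeline is a stretch of time in which no new process is started. These stretches can also be found without a graph, as the hedged Python sketch below shows. It assumes a list of exec timestamps in jiffies, sorted and already rebased; the function name, the threshold and the sample timeline are hypothetical.

```python
def find_stalls(times, min_gap):
    """Return (start time, gap, index) for every gap of at least `min_gap`
    jiffies between two consecutive execs."""
    stalls = []
    for index in range(1, len(times)):
        gap = times[index] - times[index - 1]
        if gap >= min_gap:
            stalls.append((times[index - 1], gap, index - 1))
    return stalls


# Hypothetical timeline (jiffies), made up for illustration.
timeline = [0, 12, 30, 41, 1645, 1660, 1672]
stalls = find_stalls(timeline, min_gap=100)
```

The returned index points at the last program started before the stall, which is usually the first place to look for a time-out.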

Page 14: Open Source Kan Sneller, the StartLog Study


Bottleneck && Network timeout

Normal (top)
  Kernel 23%
  Modules 35%
  Networking 28%
  *-mount 10% (S01 + S10 + S11)

Network timeout (bottom)
  Times are equal, except for S20networking
  When the dhcp-server is gone (no network, cable or server),
  it takes an extra 16 seconds to boot

[Pie charts "Gumstix" and "Gumstix-NetworkTimeout". Legend: Kernel, /etc/rcS.d/S01mountvirtfs, /etc/rcS.d/S05module-init-tools, /etc/rcS.d/S10mountall, /etc/rcS.d/S11RAMdisk, /etc/rcS.d/S15hostname, /etc/rcS.d/S17sysklogd, /etc/rcS.d/S20networking, /etc/rcS.d/S25cron, /etc/rcS.d/S30thttpd]
Slice values in extraction order (the mapping to the legend is not recoverable):
  Gumstix-NetworkTimeout: 613, 34, 16, 26, 2091, 22, 22, 147, 27, 396, 109
  Gumstix: 109, 613, 34, 16, 26, 487, 22, 25, 396, 150, 27
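Comparing two boots step by step, as on this sheet, can be automated. The Python sketch below is only an illustration: the step durations are hypothetical round numbers (not the measured values of the pie charts), the function names are made up, and the default tolerance of 5 jiffies is borrowed from the repeatability figure given on the "processing & quality" sheet.

```python
def shares(run):
    """Return the share of each step in the total boot time, in whole percent."""
    total = sum(run.values())
    return {step: round(100 * duration / total) for step, duration in run.items()}


def changed_steps(run_a, run_b, tolerance=5):
    """Return {step: (a, b)} for steps whose durations differ by more than
    `tolerance` jiffies between the two runs."""
    changed = {}
    for step in run_a:
        if step in run_b and abs(run_a[step] - run_b[step]) > tolerance:
            changed[step] = (run_a[step], run_b[step])
    return changed


# Hypothetical durations in jiffies, made up for illustration.
normal = {"kernel": 400, "modules": 600, "S20networking": 500, "other": 500}
timeout = {"kernel": 400, "modules": 602, "S20networking": 2100, "other": 497}

normal_shares = shares(normal)
differences = changed_steps(normal, timeout)
```

Differences within the tolerance are treated as measurement noise, so only the step that really changed is reported.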

Page 15: Open Source Kan Sneller, the StartLog Study


Program Count

[Chart: count per exec'ed program (axis 0-140) with the log of the count on a second axis (0-5). Programs: /sbin/thttpd, /bin/egrep, /etc/rcS.d/S15hostname, /etc/rcS.d/S01mountvirtfs, /usr/sbin/modprobe, /bin/grep, /usr/bin/expr, /bin/mount, /etc/init.d/rcS, /sbin/modprobe, /bin/uname, /bin/login, /etc/rcS.d/S11RAMdisk, /sbin/dhcpcd, /sbin/hotplug, /bin/bash, /bin/sh, /etc/rcS.d/S25cron, /etc/rcS.d/S05module-init-tools]

[Chart: count per exec'ed program (axis 0-1400) with the log of the count on a second axis (0-5). Programs: /Config/run/dhcp/dhcpcd.exe, /bin/dmesg, /bin/egrep, /bin/hostname, /bin/mount, /bin/uname, /etc/rcS.d/S01mountvirtfs, /etc/rcS.d/S10mountall, /etc/rcS.d/S15hostname, /etc/rcS.d/S20networking, /etc/rcS.d/S30thttpd, /sbin/getty, /sbin/ifconfig, /sbin/modprobe, /sbin/syslogd, /usr/bin/[, /usr/bin/expr, /usr/bin/utelnetd, /usr/sbin/crond, /usr/sbin/modprobe]

Often-exec'ed programs are good candidates to optimize
Found some surprises
  • 'hotplug' is called directly by the kernel
  • Even when it does not exist!
  • Called 123 (first chart) to 1315 (second chart) times!
Most: called a few times
Some: an awful lot
  ☞ Use a logarithmic 2nd axis
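The program count and its logarithm are easy to derive from the logged filenames. The Python sketch below shows one way to do it; the function name and the sample list are hypothetical, and the counts in the sample are made up rather than measured.

```python
import math
from collections import Counter


def program_counts(filenames):
    """Return [(filename, count, log10(count))], most often exec'ed first."""
    counts = Counter(filenames)
    return [(name, n, math.log10(n)) for name, n in counts.most_common()]


# Hypothetical list of exec'ed programs, made up for illustration.
execs = ["/sbin/hotplug"] * 100 + ["/bin/sh"] * 10 + ["/bin/login"]
table = program_counts(execs)
```

The log10 column corresponds to the logarithmic second axis: it keeps programs that run only once visible next to programs that run hundreds of times.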

Page 16: Open Source Kan Sneller, the StartLog Study


Summary (1/4): Problems & Solutions

Linux is a slow starter
  It needs more attention than a traditional RTOS
There are thousands of 'improvements' on the Net
  • Google for 'make embedded Linux boot faster': 1.4 million hits, of which
    198 PowerPoint presentations in the past 3 months (excluding this one)
  • Usually they are (more or less 'good') ideas
    But what do they improve? Or change?
  • Do they apply to your HW, system, version, ... too?
    Do you know your bottleneck?
  • How to measure that improvement?

Page 17: Open Source Kan Sneller, the StartLog Study


Summary (2/4): Booting Linux

Flexibility & power do come at a cost
  • Embedded Linux boots a lot faster than 'normal Linux'
There is much more than 'the Linux kernel'
  • At least 5 phases
  • Thousands of steps
The 'environment' has influence
  • dhcp example: 16 extra seconds without networking!
Linux is OpenSource ...
  You can change it!
There is more OpenSource than 'Linux' only
  • Non-kernel stuff; other OSes (both Unix-like and others)

Page 18: Open Source Kan Sneller, the StartLog Study


Summary (3/4): StartLog

Capture, measure and visualize the Linux start-up
  • Simple, reliable, repeatable
Cheap
  • It is a concept, with little, free code
  • Easy and fast to operate
    – For interpretation, Linux know-how is needed
Usable on all Linux versions
  • So you can improve your system!

Improve: 'Measure, pin-point, change, re-measure'

Page 19: Open Source Kan Sneller, the StartLog Study


Summary (4/4): Generic Quick Wins

➢ Disable timeouts
➢ Disable/remove unneeded kernel modules
➢ Trade-off time/space
    Uncompressed images are (usually) faster!

Advice
  Make specific (non-generic) boot scripts
  Use delayed/background processing
    E.g. start the network (dhcp) late, background fsck (BSD only)

Measure and compare what is going on!

Page 20: Open Source Kan Sneller, the StartLog Study


Questions and More info

Although the sheets are overloaded with info, it's only a fraction of what's available.

More info:
  Most patches & scripts are available
  This presentation is available
    – See the 'note-pages'! (Print hidden sheets!)
  See the website(s) for the latest versions.

Questions:
  http://www.PTS.nl
  http://www.EQSL.PTS.nl
  ☎ 035 6926969
  [email protected]
  http://albert.mietus.nl
  [email protected]
  http://www.PassieVoorTechniek.nl