linux internals

Download Linux Internals

If you can't read please download the document

Upload: tuxologynet

Post on 12-Nov-2014

40 views

Category:

Documents


5 download

DESCRIPTION

The slides for a course on the Internals of the Linux kernel, based on original material by Free Electrons, modified by Codefidence Ltd.

TRANSCRIPT

LinuxInternalsUnabletohandlekernelpagingrequestatvirtualaddress4d1b65e8 Unabletohandlekernelpagingrequestatvirtualaddress4d1b65e8 Covers pgd=c0280000 versions pgd=c0280000 / 2.6.17.7 2.4.32 [4d1b65e8]*pgd=00000000[4d1b65e8]*pgd=00000000 Internalerror:Oops:f5[#1] Internalerror:Oops:f5[#1] Moduleslinkedin:Moduleslinkedin:hx4700_udchx4700_udcasic3_baseasic3_base CPU:0 CPU:0 PCisatset_pxa_fb_info+0x2c/0x44 PCisatset_pxa_fb_info+0x2c/0x44 LRisathx4700_udc_init+0x1c/0x38[hx4700_udc] LRisathx4700_udc_init+0x1c/0x38[hx4700_udc] pc:[]lr:[]Nottainted sp:c076df78ip:60000093fp:c076df84 pc:[]lr:[]Nottainted

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

1

RightstocopyThiskitcontainsoriginalworkbythe followingauthors:AttributionShareAlike2.0 Youarefree tocopy,distribute,display,andperformthework tomakederivativeworks tomakecommercialuseofthework Underthefollowingconditions Attribution.Youmustgivetheoriginalauthorcredit. ShareAlike.Ifyoualter,transform,orbuilduponthiswork, youmaydistributetheresultingworkonlyunderalicense identicaltothisone. Foranyreuseordistribution,youmustmakecleartoothersthe licensetermsofthiswork. Anyoftheseconditionscanbewaivedifyougetpermissionfrom thecopyrightholder. Yourfairuseandotherrightsareinnowayaffectedbytheabove. Licensetext:http://creativecommons.org/licenses/bysa/2.0/legalcodeCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Copyright20042006 MichaelOpdenacker FreeElectrons [email protected] http://www.freeelectrons.com Copyright20032006 OronPeled [email protected] http://www.actcom.co.il/~oron Copyright20042006 GiladBenYossef Codefidenceltd. [email protected] http:/www.codefidence.comBased on material by:

2

WhatisLinux?Linux is a kernel that implements the POSIX and Single Unix Specification standards which is developed as an Open Source project. Usually when one talks of installing Linux, one is referring to a Linux Distribution. A distribution is a combination of Linux and other programs and library that form an operating system. There exists many such distribution for various purposes, from high end servers to embedded systems. They all share the same interface, thanks to the LSB standard Linux runs on 15 main platforms and supports applications ranging from ccNUMA super clusters to cellular phones and micro controllers. Linux is 11 years old, but is based on the 30 years old Unix design philosophy

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

3

WhatisOpenSource?Open Source is a way to develop software application in a distributed fashion that allows cooperation of multiple bodies to create the end product. They don't have to be from the same company or indeed, any company. With Open Source software the source code is published and any one can use, learn, distribute, adapt and sell the program. An Open Source program is protected under copyright law and is licensed to it's users under a software license agreement. It is NOT software in the public domain. When making use of Open Source software it is imperative to understand what license governs the use of the work and what is and what is not allowed by the terms of the license. The same thing is true for ANY external code used in a product.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

4

OpenSourceLicenses

BSD MIT X11 ArtisticAs long as you acknowledge by copyright and promise not to sue me you can do what ever you want with the code.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

LGPL

GPL CPL APLCustomer receives the same rights I have in the WHOLE work, including source code.

Customer receives the same rights I have in the SPECIFIC PART USED, including source code.

Based on material by:

5

LayersinaLinuxsystemUserprograms

Kernel Clibrary Systemlibraries Applicationlibraries Userprograms

Kernel Clibrary

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

6

LinuxInternals

KerneloverviewLinuxfeatures

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

7

Studiedkernelversion:2.6Linux2.4Mature Butdevelopmentsstopped;very fewdeveloperswillingtohelp. Nowobsoleteandlacksrecent features. Stillfineifyougetyour sources,toolsandsupportfrom commercialLinuxvendors.

Linux2.62yearsoldstableLinuxrelease! SupportfromtheLinux developmentcommunityandall commercialvendors. Nowmatureandmoreexhaustive. Mostdriversupgraded. Cuttingedgefeaturesand increasedperformance.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

8

LinuxkernelkeyfeaturesPortabilityandhardwaresupport Runsonmostarchitectures. Scalability Canrunonsupercomputersas wellasontinydevices (4MBofRAMisenough). Compliancetostandardsand interoperability. Exhaustivenetworkingsupport. Security Itcan'thideitsflaws.Itscodeis reviewedbymanyexperts. Stabilityandreliability. Modularity Canincludeonlywhatasystem needsevenatruntime. Easytoprogram Youcanlearnfromexistingcode. Manyusefulresourcesonthenet.Based on material by:

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

9

SupportedhardwarearchitecturesSeethearch/directoryinthekernelsources Minimum:32bitprocessors,withorwithoutMMU 32bitarchitectures(arch/subdirectories) alpha,arm,cris,frv,h8300,i386,m32r,m68k,m68knommu, mips,parisc,ppc,s390,sh,sparc,um,v850,xtensa 64bitarchitectures: ia64,mips64,ppc64,sh64,sparc64,x86_64 Seearch//Kconfig,arch//README,or Documentation//fordetails

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

10

LinuxInternals

KerneloverviewKernelcode

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

11

ImplementedinCImplementedinClikeallUnixsystems. (CwascreatedtoimplementthefirstUnixsystems) AlittleAssemblyisusedtoo: CPUandmachineinitialization,criticallibraryroutines. Seehttp://www.tux.org/lkml/#s153 forreasonsfornotusingC++ (mainreason:thekernelrequiresefficientcode).

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

12

CompiledwithGNUCNeedGNUCextensionstocompilethekernel. So,youcannotuseanyANSICcompiler! SomeGNUCextensionsusedinthekernel:InlineCfunctions Inlineassembly Structurememberinitialization inanyorder(alsoinANSIC99) Branchannotation(seenextpage)

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

13

Helpgcctooptimizeyourcode!Usethelikelyandunlikelystatements (include/linux/compiler.h) Example: if(unlikely(err)){ ... } TheGNUCcompilerwillmakeyourcodefaster forthemostlikelycase. Usedinmanyplacesinkernelcode! Don'tforgettousethesestatements!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

14

NoClibraryThekernelhastobestandaloneandcan'tuseuserspacecode. Userspaceisimplementedontopofkernelservices,nottheopposite. Kernelcodehastosupplyitsownlibraryimplementations (stringutilities,cryptography,uncompression...) So,youcan'tusestandardClibraryfunctionsinkernelcode. (printf(),memset(),malloc()...). YoucanalsousekernelCheaders. Fortunately,thekernelprovidessimilarCfunctionsforyour convenience,likeprintk(),memset(),kmalloc()...

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

15

KernelStackVerysmallandfixedstack.2pagestack(8k),pertask. Or1pagestack,pertaskandoneforinterrupts. 2.6 Choseninbuildtimeviamenu. Notforallarchitectures

Forsomearchitectures,thekernelprovidesdebugfacilityto detectstackoverruns.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

16

ManagingendianismLinuxsupportsbothlittleandbigendianarchitectures Eacharchitecturedefines__BIG_ENDIANor__LITTLE_ENDIAN in Canbeconfiguredinsomeplatformssupportingboth. Tomakeyourcodeportable,thekerneloffersconversionmacros (thatdonothingwhennoconversionisneeded).Mostusefulones: u32cpu_to_be32(u32); //CPUbyteordertobigendian u32cpu_to_le32(u32); //CPUbyteordertolittleendian u32be32_to_cpu(u32); //LittleendiantoCPUbyteorder u32le32_to_cpu(u32); //BigendiantoCPUbyteorderCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

17

KernelcodingguidelinesNeverusefloatingpointnumbersinkernelcode.Yourcode mayberunonaprocessorwithoutafloatingpointunit(likeon arm).Floatingpointcanbeemulatedbythekernel,butthisis veryslow. Defineallsymbolsasstatic,exceptexportedones(avoidname spacepollution) Allsystemcallsreturnnegativenumebrs(errorcodes)for errors:#include

SeeDocumentation/CodingStyleformoreguidelinesCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

18

KernellogPrintingtothekernellogisdoneviatheprintkfunction. Thekernelkeepsthemessagesinacircularbuffer (sothatdoesn'tconsumemorememorywithmanymessages) Kernellogmessagescanbeaccessedfromuserspacethroughsystem calls,orthrough/proc/kmsg Kernellogmessagesarealsodisplayedinthesystemconsole.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

19

printkTheprintkfunction: Similartostdlib'sprintf(3) Nofloatingpointformat. Logmessageareprefixedwitha,wherethenumber denotesseverity,from1(mostsevere)to8. Macrosaredefinedtobeusedforseveritylevels: KERN_EMERG,KERN_ALERT,KERT_CRIT, KERN_ERR,KERN_WARNING,KERN_NOTICE, KERN_INFO,KERN_DEBUG. Usageexample:Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

printk(KERN_DEBUGHelloWorldnumber%d\n,num);Based on material by:

20

AccessingthekernellogManywaysareavailable!Watchthesystemconsole syslogd Daemongatheringkernelmessages in/var/log/messages Followchangesbyrunning: tailf/var/log/messages Caution:thisfilegrows! Uselogrotatetocontrolthis dmesg Foundinallsystems DisplaysthekernellogbufferCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

logread Same.Oftenfoundinsmall embeddedsystemswithno /var/log/messagesorno dmesg.ImplementedbyBusybox. cat/proc/kmsg Waitsforkernelmessagesand displaysthem. Usefulwhennoneoftheabove userspaceprogramsareavailable (tinysystem)Based on material by:

21

LinkedListsManyconstructsusedoublylinkedlists. Listdefinitionandinitialization:structlist_headmylist=LIST_HEAD_INIT(mylist);

orLIST_HEAD(mylist);

orINIT_LIST_HEAD(&mylist);

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

22

ListManipulationListdefinitionandinitialization:voidlist_add(structlist_head*new,structlist_head *head); voidlist_add_tail(structlist_head*new,struct list_head*head); voidlist_del(structlist_head*entry); voidlist_del_init(structlist_head*entry); voidlist_move(structlist_head*list,structlist_head *head); voidlist_add_tail(structlist_head*list,struct list_head*head);

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

23

ListManipulation(cont.)Listsplicingandquery:voidlist_splice(structlist_head*list,struct list_head*head); voidlist_add_splice_init(structlist_head*list, structlist_head*head); voidlist_empty(structlist_head*head);

In2.6,therearevariantsoftheseAPI'sforRCUprotected lists(seesectionaboutLocksahead).

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

24

ListIterationListsalsohaveiteratormacrosdefined:list_for_each(pos,head); list_for_each_prev(pos,head); list_for_each_safe(pos,n,head); list_for_each_entry(pos,head,member);

Example:structmydata*pos; list_for_each_entry(pos,head,dev_list){ pos>some_data=0777; }

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

25

LinuxInternals

KerneloverviewKernelsubsystems

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

26

KernelarchitectureApp1 App2 Clibrary Systemcallinterface Process management Memory management Filesystem support Filesystem types CPUsupport code CPU/MMU supportcode Storage drivers Character devicedrivers Network devicedrivers Hardware CPU RAM StorageBased on material by:

...

User space

Device control

Networking

Kernel space

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

27

KernelModevs.UserModeAllmodernCPUssupportadualmodeofoperation:Usermode,forregulartasks. Supervisor(orprivileged)mode,forthekernel.

ThemodetheCPUisindetermineswhichinstructionsthe CPUiswillingtoexecute:SensitiveinstructionswillnotbeexecutedwhentheCPUis inusermode.

TheCPUmodeisdeterminedbyoneoftheCPUregisters, whichstoresthecurrentRingLevelCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

0forsupervisormode,3forusermode,12unusedbyLinux.Based on material by:

28

TheSystemCallInterfaceWhenauserspacetasksneedstouseakernelservice,itwill makeaSystemCall. TheClibraryplacesparametersandnumberofsystemcallin registersandthenissuesaspecialtrapinstruction. Thetrapatomicallychangestheringleveltosupervisor modeandthesetstheinstructionpointertothekernel. Thekernelwillfindtherequiredsystemcalledviathesystem calltableandexecuteit. Returningfromthesystemcalldoesnotrequireaspecial instruction,sinceinsupervisormodetheringlevelcanbe Copyright20062004,MichaelOpdenacker Based on material by: changeddirectly. Copyright20032006,OronPeledCopyright20042006CodefidenceLtd.

29

LinuxSystemCallPathdo_name() sys_name() entry.S Function call Trap

Kernel

Task

Glibc Task

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

30

KernelmemoryconstraintsWhocanlookafterthekernel? Nomemoryprotection Accessingillegalmemory locationsresultin(oftenfatal) kerneloopses. Fixedsizestack(8or4KB) Unlikeinuserspace, nowaytomakeitgrow. Kernelmemorycan'tbeswapped out(forthesamereasons).User process Attempt toaccess Illegal memory location Exception (MMU) Userspacememorymanagement Usedtoimplement: memoryprotection stackgrowth memoryswappingtodisk

SIGSEGV,kill

Kernel

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

31

LinuxInternals

KerneloverviewLinuxversioningschemeanddevelopmentprocess

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

32

LinuxKernelDevelopmentTimeline

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

33

LinuxstablereleasesMajorversions 1majorversionevery2or3years Examples:1.0,2.0,2.4,2.6 Stablereleases 1stablereleaseevery1or2months Examples:2.0.40,2.2.26,2.4.27,2.6.7... Stablereleaseupdates(sinceMarch2005) Updatestostablereleasesuptoseveraltimesaweek Addressonlycriticalissuesinthelateststablerelease Examples:2.6.11.1to2.6.11.7Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Evennumber

Based on material by:

34

LinuxdevelopmentandtestingreleasesTestingreleases Severaltestingreleasespermonth,beforethenextstableone. Youcancontributetomakingkernelreleasesmorestableby testingthem! Example:2.6.12rc1 Developmentversions Unstableversionsusedbykerneldevelopers beforemakinganewstablemajorrelease Examples:2.3.42,2.5.74

Oddnumber

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

35

ContinueddevelopmentinLinux2.6Since2.6.0,kerneldevelopershavebeenabletointroducelots ofnewfeaturesonebyoneonasteadypace,withouthavingto makemajorchangesinexistingsubsystems. OpeninganewLinux2.7(or2.9)developmentbranchwillbe requiredonlywhenLinux2.6isnolongerabletoaccommodate keyfeatureswithoutundergoingtraumaticchanges. Thankstothis,morefeaturesarereleasedtousersatafasterpace. However,theinternalkernelAPIcanundergochangesbetween two2.6.xreleases.Amodulecompiledforagivenversionmay nolongercompileorworkonamorerecentone.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

36

LinuxKernelDevelopmentProcess

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

37

What'snewineachLinuxrelease?(1)commit 3c92c2ba33cd7d666c5f83cc32aa590e794e91b0 Author: Andi Kleen Date: Tue Oct 11 01:28:33 2005 +0200 [PATCH] i386: Don't discard upper 32bits of HWCR on K8 Need to use long long, not long when RMWing a MSR. I think it's harmless right now, but still should be better fixed if AMD adds any bits in the upper 32bit of HWCR. Bug was introduced with the TLB flush filter fix for i386 Signed-off-by: Andi Kleen Signed-off-by: Linus Torvalds

? ?!

...

TheofficiallistofchangesforeachLinuxreleaseisjusta hugelistofindividualpatches! Verydifficulttofindoutthekeychangesandtogetthe globalpictureoutofindividualchanges.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

38

What'snewineachLinuxrelease?(2)Fortunately,asummaryofkeychanges withenoughdetailsisavailableon http://wiki.kernelnewbies.org/LinuxChanges

? ?!

Foreachnewkernelrelease,youcanalsogetthe changesinthekernelinternalAPI: http://lwn.net/Articles/2.6kernelapi/ What'snext? Documentation/featureremovalschedule.txt liststhefeatures,subsystemsandAPIsthatare plannedforremoval(announced1yearinadvance).Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

39

LinuxInternals

KerneloverviewKerneluserinterface

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

40

MountingvirtualfilesystemsLinuxmakessystemandkernelinformationavailableinuser spacethroughvirtualfilesystems(virtualfilesnotexistingonany realstorage).Noneedtoknowkernelprogrammingtoaccessthis!

Mounting/proc: mounttprocnone/proc Mounting/sys: mounttsysfsnone/sysFilesystemtype Mountpoint Rawdevice orfilesystemimage Inthecaseofvirtual filesystems,anystringisfineBased on material by:

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

41

KerneluserspaceinterfaceAfewexamples: /proc/cpuinfo:processorinformation /proc/meminfo:memorystatus /proc/version:versionandbuildinformation /proc/cmdline:kernelcommandline /proc//environ:callingenvironment /proc//cmdline:processcommandline ...andmanymore!Seebyyourself!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

42

UserspaceinterfacedocumentationLotsofdetailsaboutthe/procinterfaceareavailablein Documentation/filesystems/proc.txt (almost2000lines)inthekernelsources. Youcanalsofindotherdetailsintheprocmanualpage: manproc SeetheNewDeviceModelsectionfordetailsabout/sys

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

43

LinuxInternals

CompilingandbootingLinuxGettingthesources

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

44

LinuxkernelsizeLinux2.6.16sources: Rawsize:260MB(20400files,approx7millionlinesofcode) bzip2compressedtararchive:39MB(bestchoice) gzipcompressedtararchive:49MB MinimumcompiledLinuxkernelsize(withLinuxTinypatches) approx300KB(compressed),800KB(raw) Whyarethesesourcessobig? Becausetheyincludethousandsofdevicedrivers,manynetwork protocols,supportmanyarchitecturesandfilesystems... TheLinuxcore(scheduler,memorymanagement...)isprettysmall!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

45

kernel.org

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

46

GettingLinuxsources:2possibilitiesFullsources Theeasiestway,butlongertodownload. Example: http://kernel.org/pub/linux/kernel/v2.6/linux2.6.14.1.tar.bz2 Orpatchagainstthepreviousversion Assumingyoualreadyhavethefullsourcesofthepreviousversion Example: http://kernel.org/pub/linux/kernel/v2.6/patch2.6.14.bz2(2.6.13to2.6.14) http://kernel.org/pub/linux/kernel/v2.6/patch2.6.14.7.bz2(2.6.14to2.6.14.7)

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

47

DownloadingfullkernelsourcesDownloadingfromthecommandline Withawebbrowser,identifytheversionyouneedonhttp://kernel.org Intherightdirectory,downloadthesourcearchiveanditssignature (copyingthedownloadaddressfromthebrowser):wgethttp://kernel.org/pub/linux/kernel/v2.6/linux2.6.11.12.tar.bz2 wgethttp://kernel.org/pub/linux/kernel/v2.6/linux2.6.11.12.tar.bz2.sign

Checktheelectronicsignatureofthearchive:gpgverifylinux2.6.11.12.tar.bz2.sign

~/.wgetrcconfigfileforproxies:http_proxy=: ftp_proxy=: proxy_user=(ifany) proxy_password=(ifany)

Extractthecontentsofthesourcearchive:tarjxvflinux2.6.11.12.tar.bz2Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

48

Downloadingkernelsourcepatches(1)Assumingyoualreadyhavethelinuxx.y.version Identifythepatchesyouneedonhttp://kernel.orgwithawebbrowser Downloadthepatchfilesandtheirsignature:Patchfrom2.6.10to2.6.11wgetftp://ftp.kernel.org/pub/linux/kernel/v2.6/patch2.6.11.bz2 wgetftp://ftp.kernel.org/pub/linux/kernel/v2.6/patch2.6.11.bz2.sign

Patchfrom2.6.11to2.6.11.12(lateststablefixes)wgethttp://www.kernel.org/pub/linux/kernel/v2.6/patch2.6.11.12.bz2 wgethttp://www.kernel.org/pub/linux/kernel/v2.6/patch2.6.11.12.bz2.sign

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

49

Downloadingkernelsourcepatches(2)Checkthesignatureofpatchfiles:gpgverifypatch2.6.11.bz2.sign gpgverifypatch2.6.11.12.bz2.sign

Applythepatchesintherightorder:cdlinux2.6.10/ bzcat../patch2.6.11.bz2|patchp1 bzcat../patch2.6.11.12.bz2|patchp1 cd.. mvlinux2.6.10linux2.6.11.12

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

50

CheckingtheintegrityofsourcesKernelsourceintegritycanbecheckedthroughOpenPGPdigitalsignatures. Fulldetailsonhttp://www.kernel.org/signature.html Ifneeded,readhttp://www.gnupg.org/gph/en/manual.htmlandcreateanew privateandpublickeypairforyourself. ImportthepublicGnuPGkeyofkerneldevelopers:gpgkeyserverpgp.mit.edurecvkeys0x517D0F0E

Ifblockedbyyourfirewall,lookfor0x517D0F0Eonhttp://pgp.mit.edu/ ,copyandpastethekeytoalinuxkey.txtfile: gpgimportlinuxkey.txt Checkthesignatureoffiles: gpgverifylinux2.6.11.12.tar.bz2.signCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

51

AnatomyofapatchfileApatchfileistheoutputofthediffcommanddiffcommandline diffNrua/Makefileb/Makefile a/Makefile2005030409:27:1508:00 Filedateinfo +++b/Makefile2005030409:27:1508:00 @@1,7+1,7@@ Linenumbersinfiles VERSION=2 Contextinfo:3linesbeforethechange PATCHLEVEL=6 Usefultoapplyapatchwhenlinenumberschanged SUBLEVEL=11 EXTRAVERSION= Removedline(s)ifany +EXTRAVERSION=.1 Addedline(s)ifany NAME=WoozyNumbat

#*DOCUMENTATION*Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Contextinfo:3linesafterthechange

Based on material by:

52

UsingthepatchcommandThepatchcommandapplieschangestofilesinthecurrentdirectory: Makingchangestoexistingfiles Creatingordeletingfilesanddirectories patchusageexamples: patchpops=&acme_fops; acme_cdev>owner=THIS_MODULE;

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

202

Characterdeviceregistration(2)Then,nowthatyourstructureisready,addittothesystem: intcdev_add( structcdev*p, /*Characterdevicestructure*/ dev_tdev, /*Startingdevicemajor/minornumber*/ unsignedcount); /*Numberofdevices*/

Example(continued):if(cdev_add(acme_cdev,acme_dev,acme_count)){ printk(KERN_ERRChardriverregistrationfailed\n);

...

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

203

CharacterdeviceunregistrationFirstdeleteyourcharacterdevice: voidcdev_del(structcdev*p); Then,andonlythen,freethedevicenumber: voidunregister_chrdev_region(dev_tfrom, unsignedcount); Example(continued): cdev_del(acme_cdev); unregister_chrdev_region(acme_dev,acme_count);

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

204

LinuxerrorcodesTrytoreporterrorswitherrornumbersasaccurateaspossible! Fortunately,macronamesareexplicitandyoucanremember themquickly. Genericerrorcodes: include/asmgeneric/errnobase.h Platformspecificerrorcodes: include/asm/errno.h

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

205

Chardriverexamplesummary(1)staticvoid*acme_buf; staticacme_bufsize=8192; staticintacme_count=1; staticdev_tacme_dev; staticstructcdev*acme_cdev; staticssize_tacme_write(...){...} staticssize_tacme_read(...){...} staticstructfile_operationsacme_fops= { .owner=THIS_MODULE, .read=acme_read, .write=acme_write, };

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

206

Chardriverexamplesummary(2)staticint__initacme_init(void) { acme_buf=kmalloc(acme_bufsize, GFP_KERNEL); if(!acme_buf){ err=ENOMEM; gotoerr_exit; } if(alloc_chrdev_region(&acme_dev,0, acme_count,acme)){ err=ENODEV; gotoerr_free_buf; } acme_cdev=cdev_alloc(); if(!acme_cdev){ err=ENOMEM; gotoerr_dev_unregister; } acme_cdev>ops=&acme_fops; acme_cdev>owner=THIS_MODULE; if(cdev_add(acme_cdev,acme_dev, acme_dev_count)){ err=ENODEV; gotoerr_free_cdev; } return0; err_free_cdev: kfree(acme_cdev); err_dev_unregister: unregister_chrdev_region( acme_dev,acme_count); err_free_buf: kfree(acme_buf); err_exit: returnerr; } staticvoid__exitacme_exit(void) { cdev_del(acme_cdev); unregister_chrdev_region(acme_dev, acme_count); kfree(acme_buf); }

Showhowtohandleerrorsanddeallocateresourcesintherightorder!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

207

CharacterdriversummaryCharacterdriverwriterDefinethefileoperationscallbacksforthedevicefile:read,write,ioctl... Inthemoduleinitfunction,getmajorandminornumberswithalloc_chrdev_region(), initacdevstructurewithyourfileoperationsandaddittothesystemwithcdev_add(). Inthemoduleexitfunction,callcdev_del()andunregister_chrdev_region()

Systemadministration

Systemuser Kernel

Openthedevicefile,read,write,orsendioctl'stoit. Executesthecorrespondingfileoperations

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

Kernel

Userspace

Loadthecharacterdrivermodule In/proc/devices,findthemajornumberituses. Createthedevicefilewiththismajornumber Thedevicefileisreadytouse!

Kernel

208

LinuxInternals

DriverdevelopmentDebugging

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

209

UsefulnessofaserialportMostprocessorsfeatureaserialportinterface(usuallyvery wellsupportedbyLinux).Justneedthisinterfacetobe connectedtotheoutside. Easywayofgettingthefirstmessagesofanearlykernel version,evenbeforeitboots.Aminimumkernelwithonly serialportsupportisenough. Oncethekernelisfixedandhascompletedbooting,possible toaccessaserialconsoleandissuecommands. Theserialportcanalsobeusedtotransferfilestothetarget.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

210

Whenyoudon'thaveaserialportOnthehost Notanissue.YoucangetaUSBtoserialconverter.Usuallyvery wellsupportedonLinuxandroughlycosts$20.Thedeviceappears as/dev/ttyUSB0onthehost. Onthetarget CheckwhetheryouhaveanIrDAport.It'susuallyaserialporttoo. IfyouhaveanEthernetadapter,trywithit Youmayalsotrytomanuallyhookuptheprocessorserialinterface (checktheelectricalspecificationsfirst!)Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

211

DebuggingwithprintkUniversaldebuggingtechniqueusedsincethebeginningof programming(firstfoundincavemendrawings) Printedornotintheconsoleor/var/log/messages accordingtothepriority.Thisiscontrolledbytheloglevel kernelparameter,orthrough/proc/sys/kernel/printk (seeDocumentation/sysctl/kernel.txt) Availablepriorities(include/linux/kernel.h):#defineKERN_EMERG""/*systemisunusable*/ #defineKERN_ALERT""/*actionmustbetakenimmediately*/ #defineKERN_CRIT""/*criticalconditions*/ #defineKERN_ERR""/*errorconditions*/ #defineKERN_WARNING""/*warningconditions*/ #defineKERN_NOTICE""/*normalbutsignificantcondition*/ #defineKERN_INFO""/*informational*/ #defineKERN_DEBUG""/*debuglevelmessages*/Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

212

DebuggingwiththeMagicKeyYoucanhaveamagickeytocontrolthekernel. Toactiviatethisfeature,makesurethat:KernelconfigurationCONFIG_MAGIC_SYSRQenabled. Enableitatruntime:echo1>/proc/sys/kernel/sysrq

Thekeyis:PCConsole: SerialConsole: Fromshell(2.6only):Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

SysRq SendaBREAKechot>/proc/sysrqtriggerBased on material by:

213

DebuggingwiththeMagicKeyCont.Togetherwiththemagickey,youusethefollowing:b:hardboot(nosync,nounmount) s:sync u:Remountallreadonly. t:tasklist(proccesstable). 18:Setconsoleloglevel. e:ShowInstructionPointer. Andmore...presshforhelp.

Programmerscanaddtheirownhandlersaswell.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

SeeDocumentation/sysrq.txtformoredetails.

Based on material by:

214

Debuggingwith/procor/sys(1)Insteadofdumpingmessagesinthekernellog,youcanhaveyour driversmakeinformationavailabletouserspace Throughafilein/procor/sys,whichcontentsarehandledby callbacksdefinedandregisteredbyyourdriver. Canbeusedtoshowanypieceofinformation aboutyourdeviceordriver. Canalsobeusedtosenddatatothedriverortocontrolit. Caution:anybodycanusethesefiles. Youshouldremoveyourdebugginginterfaceinproduction!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

215

Debuggingwith/procor/sys(2)Examples cat/proc/acme/stats(dummyexample) Displaysstatisticsaboutyouracmedriver. cat/proc/acme/globals(dummyexample) Displaysvaluesofglobalvariablesusedbyyourdriver.echo600000>/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

AdjuststhespeedoftheCPU(controlledbythecpufreqdriver).

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

216

DebuggingwithioctlCanusetheioctl()systemcalltoqueryinformation aboutyourdriver(ordevice)orsendcommandstoit. Thiscallstheioctlfileoperationthatyoucanregisterin yourdriver. Advantage:yourdebugginginterfaceisnotpublic. Youcouldevenleaveitwhenyoursystem(oritsdriver)isin thehandsofitsusers.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

217

DebuggingwithgdbSchrdingerpenguinprinciple. Ifyouexecutethekernelfromadebuggeronthesamemachine, thiswillinterferewiththekernelbehavior. However,youcanaccessthecurrentkernelstatewithgdb: gdb/usr/src/linux/vmlinux/proc/kcore uncompressedkernelkerneladdressspace Youcanaccesskernelstructures,followpointers...(readonly!) RequiresthekerneltobecompiledwithCONFIG_DEBUG_INFO (Kernelhackingsection)

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

218

kgdbkernelpatchhttp://kgdb.linsyssoft.com/ Theexecutionofthepatchedkernelisfullycontrolledby gdbfromanothermachine,connectedthroughaserialline. Candoalmosteverything,includinginsertingbreakpointsin interrupthandlers. Supportedarchitectures:i386,x86_64,ppcands390.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

219

Kernelcrashanalysiswithkexeckexecsystemcall:makesitpossibleto callanewkernel,withoutrebootingand goingthroughtheBIOS/firmware. Idea:afterakernelpanic,makethe kernelautomaticallyexecuteanew, cleankernelfromareservedlocationin RAM,toperformpostmortemanalysis ofthememoryofthecrashedkernel. SeeDocumentation/kdump/kdump.txt inthekernelsourcesfordetails.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

1.Copydebug kernelto reserved RAM 3.Analyze crashed kernelRAM

Standardkernel 2.kernel panic,kexec debugkernel Debugkernel

RegularRAM

Based on material by:

220

DecryptingoopsmessagesYouoftengetkerneloopsmessageswhen youdevelopdrivers(dereferencingnull pointers,illegalaccessestomemory...). Theygiverawinformationaboutthe functioncallstackandCPUregisters. Youcanmakethesemessagesmore explicitinyourdevelopmentkernel,for examplebyreplacingrawaddressesby symbolnames,bysetting: #GeneralSetup CONFIG_KALLSYMS=y Replacestheksymoopstoolwhich shouldn'tbeusedanymorewithLinux2.6Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.Unable to handle kernel paging req uest at virtual address 4d 1b65e8 Unable to handle kernel pag ing req uest at virtual address 4d1b6 5e8 pgd = c0280000 pgd = c0280000 [4d1b65e8] *pgd=00000000[4d1b65e8] *pgd=00000000 Internal error: Oops: f5 [#1] Internal error: Oops: f5 [#1] Modules linked in:Modules linked in: hx4700_udc hx4700_udc asic3_base asic3_base CPU: 0 CPU: 0 PC is at set_pxa_fb_info+0x2c/0x44 PC is at set_pxa_fb_info+0x2c/0x44 LR is at hx4700_udc_init+0x1c/0x38 [hx4700_udc] LR is at hx4700_udc_init+0x1c/0x38 [hx4700_udc] pc : [] lr : [] Not tainted sp : c076df78 ip : 60000093 fp : c076df84 pc : [] lr : [] Not tainted sp : c076df78 ip : 60000093 fp : c076df84 r10: 00000002 r9 : c076c000 r8 : c001c7e4 r10: 00000002 r9 : c076c000 r8 : c001c7e4 r7 : 00000000 r6 : c0176d40 r5 : bf007500 r4 : c0176d58 r7 : 00000000 r6 : c0176d40 r5 : bf007500 r4 : c0176d58 r3 : c0176828 r2 : 00000000 r1 : 00000f76 r0 : 80004440 r3 : c0176828 r2 : 00000000 r1 : 00000f76 r0 : 80004440 Flags: nZCvFlags: nZCv IRQs on FIQs on Mode SVC_32 Segme nt user

Based on material by:

221

DebuggingwithKprobeshttp://sourceware.org/systemtap/kprobes/ Fairlysimplewayofinsertingbreakpointsinkernelroutines Unlikeprintkdebugging,youneitherhavetorecompilenorrebootyour kernel.Youonlyneedtocompileandloadadedicatedmoduletodeclarethe addressoftheroutineyouwanttoprobe. Nondisruptive,basedonthekernelinterrupthandler Kprobesevenletsyoumodifyregistersandglobalkernelinternals. Supportedarchitectures:i386,x86_64,ppc64andsparc64 Niceoverviews:http://lwn.net/Articles/132196/ andhttp://www106.ibm.com/developerworks/library/lkprobes.htmlCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

222

KerneldebuggingtipsIfyourkerneldoesn'tbootyetorhangswithoutanymessage,it canhelptoactivateLowLeveldebugging (KernelHackingsection,onlyavailableonarm): CONFIG_DEBUG_LL=y Moreaboutkerneldebugginginthefree LinuxDeviceDriversbook(Referencessection)!

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

223

LinuxInternals

DriverdevelopmentConcurrentaccesstoresources

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

224

SourcesofconcurrencyissuesThesameresourcescanbeaccessedbyseveralkernelprocessesin parallel,causingpotentialconcurrencyissues Severaluserspaceprogramsaccessingthesamedevicedataor hardware.Severalkernelprocessescouldexecutethesamecodeon behalfofuserprocessesrunninginparallel. Multiprocessing:thesamedrivercodecanberunningonanother processor.ThiscanalsohappenwithsingleCPUswithhyperthreading. Kernelpreemption,interrupts:kernelcodecanbeinterruptedatany time(justafewexceptions),andthesamedatamaybeaccessbyanother processbeforetheexecutioncontinues.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

225

AvoidingconcurrencyissuesAvoidusingglobalvariablesandshareddatawheneverpossible (cannotbedonewithhardwareresources) Don'tmakeresourcesavailabletootherkernelprocessesuntil theyarereadytobeused. Usetechniquestomanageconcurrentaccesstoresources. SeeRustyRussell'sUnreliableGuideToLocking Documentation/DocBook/kernellocking/ inthekernelsources.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

226

ConcurrencyprotectionwithsemaphoresProcess1Failed Acquirelock Success Criticalcodesection Tryagain SuccessWaitlockrelease

Process2

Sharedresource

ReleaselockCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

227

KernelsemaphoresAlsocalledmutexes(MutualExclusion)1 (free) P (down)P:Probeer Try(todecrement)inDutch V:Verhoog IncrementinDutch

V (up) 0 (locked)

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

228

InitializingasemaphoreStatically DECLARE_MUTEX(name); DECLARE_MUTEX_LOCKED(name); Dynamically voidinit_MUTEX(structsemaphore*sem); voidinit_MUTEX_LOCKED(structsemaphore*sem);

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

229

lockingandunlockingsemaphoresvoiddown(structsemaphore*sem); Decrementsthesemaphoreifsetto1,waitsotherwise. Caution:can'tbeinterrupted,causingprocessesyoucannotkill! intdown_interruptible(structsemaphore*sem); Same,butcanbeinterrupted.Ifinterrupted,returnsanonzerovalue anddoesn'tholdthesemaphore.Testthereturnvalue!!! intdown_trylock(structsemaphore*sem); Neverwaits.Returnsanonzerovalueifthesemaphoreisnot available. voidup(structsemaphore*sem); Releasesthesemaphore.Makesureyoudoitassoonaspossible!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

230

Reader/writersemaphoresAllowsharedaccessbyunlimitedreaders,orbyonly1writer.Writersgetpriority. voidinit_rwsem(structrw_semaphore*sem); voiddown_read(structrw_semaphore*sem); intdown_read_trylock(structrw_semaphore*sem); intup_read(structrw_semaphore*sem); voiddown_write(structrw_semaphore*sem); intdown_write_trylock(structrw_semaphore*sem); intup_write(structrw_semaphore*sem); Wellsuitedforrarewrites,holdingthesemaphorebriefly.Otherwise,readersget starved,waitingtoolongforthesemaphoretobereleased.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

231

WhentousesemaphoresBeforeandafteraccessingsharedresources Beforeandaftermakingotherresourcesavailabletoother partsofthekernelortouserspace(typicallyandmodule initialization). Insituationswhensleepingisallowed.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

232

SpinlocksLockstobeusedforcodethatcan'tsleep(criticalsections, interrupthandlers...Beverycarefulnottocallfunctionswhich cansleep! Intendedformultiprocessorsystems Spinlocksarenotinterruptible, don'tsleepandkeepspinninginaloop untilthelockisavailable.Spinlock

Stilllocked?

SpinlockscausekernelpreemptiontobedisabledontheCPU executingthem. Mayrequireinterruptstobedisabledtoo.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

233

InitializingspinlocksStatic spinlock_tmy_lock=SPIN_LOCK_UNLOCKED; Dynamic voidspin_lock_init(spinlock_t*lock);

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

234

Usingspinlocksvoidspin_[un]lock(spin_lock_t*lock); voidspin_[un]lock_irqsave(spin_lock_t*lock, unsignedlongflags); DisablesIRQsonthelocalCPU voidspin_[un]lock_irq(spin_lock_t*lock); DisablesIRQswithoutsavingflags.Whenyou'resurethatnobody alreadydisabledinterrupts. voidspin_[un]lock_bh(spin_lock_t*lock); Disablessoftwareinterrupts,butnothardwareones Notethatreader/writerspinlocksalsoexist.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

235

DeadlocksituationsTheycanlockupyoursystem.Makesuretheyneverhappen!Don'tcallafunctionthatcantry togetaccesstothesamelock Holdingmultiplelocksisrisky!

Getlock1 Getlock1 call Waitforlock1Dead Lock!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Getlock2

Getlock2

Dead Lock!

Getlock1

Based on material by:

236

AlternativestolockingAswehavejustseen,lockingcanhaveastrongnegativeimpacton systemperformance.Insomesituations,youcoulddowithoutit. ByusinglockfreealgorithmslikeReadCopyUpdate(RCU). RCUAPIavailableinthekernel (Seehttp://en.wikipedia.org/wiki/RCU). Whenavailable,useatomicoperations.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

237

AtomicvariablesUsefulwhenthesharedresourceisan integervalue Evenaninstructionliken++isnot guaranteedtobeatomiconallprocessors! Header #include Type atomic_t containsasignedinteger(atleast24bits)

Operationswithoutreturnvalue:voidatomic_inc(atomic_t*v); voidatomic_dec(atomic_*v); voidatomic_add(inti,atomic_t*v); voidatomic_sub(inti,atomic_t*v);

Simularfunctionstestingtheresult:intatomic_inc_and_test(...); intatomic_dec_and_test(...); intatomic_sub_and_test(...);

Functionsreturningthenewvalue:intatomic_inc_and_return(...); intatomic_dec_and_return(...); intatomic_add_and_return(...); intatomic_sub_and_return(...);

Atomicoperations(mainones) Setorreadthecounter:atomic_set(atomic_t*v,inti); intatomic_read(atomic_t*v);Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

238

AtomicbitoperationsSupplyveryfast,atomicoperations Onmostplatforms,applytoanunsignedlongtype. Applytoavoidtypeonafewothers. Set,clear,toggleagivenbit: voidset_bit(intnr,unsignedlong*addr); voidclear_bit(intnr,unsignedlong*addr); voidchange_bit(intnr,unsignedlong*addr); Testbitvalue: inttest_bit(intnr,unsignedlong*addr); Testandmodify(returnthepreviousvalue): inttest_and_set_bit(...); inttest_and_clear_bit(...); inttest_and_change_bit(...);Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

239

LinuxInternals

DriverdevelopmentProcessesandscheduling

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

240

ProcessesandThreadsAprocessisaninstanceofarunningprogramMultipleinstancesofthesameprogramcanberunning. Programcode(textsection)memoryisshared. Eachprocesshasitsowndatasection,addressspace,open filesandsignalhandlers.

AthreadisasingletaskinaprogramItbelongstoaprocessandsharesthecommondatasection, addressspace,openfilesandpendingsignals. Ithasit'sownstack,pendingsignalsandstate.

It'scommontorefertosinglethreadedprogramsas Copyright20062004,MichaelOpdenacker Based on material by: processes. Copyright20032006,OronPeledCopyright20042006CodefidenceLtd.

241

TheKernelandThreadsThe2.4kerneldidnothaveanotionofthreads.Allthreadswereimplementedasprocessesthathappento sharethesameaddressspace,filesystemresources,file descriptorsandsignalhandlersastheirparentprocess.

In2.6anexplicitnotionofprocessesandthreadswas introducedtothekernel. Schedulingisstilldoneonathreadbythreadbasis.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

242

Threadvs.Processvs.TaskTask 123 Task 124 Task 125 Task 126 Task 127 Task 128 Linux Kernel

Memory/Files Process 123 T 1 T 2 T 3

M/F 126 T 1

M/F 127 T 1

M/F 128 T 1 POSIX API

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

243

task_structEachtaskisrepresentedbyatask_struct. Thetaskislinkedinthetasktreevia:parent Pointertoit'sparent children sibling Alinkedlist Alinkedlist

task_structcontainsapidfieldpidismappedtotask_structpointerviaahashtable

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

244

TaskIdentifiersEachtask_structhasthefollowingidentities:PID Globallyunique.Differentoneforeachthread. TGID ThreadGroupId.Returnedtouserspaceas getpid()Sharedbyallthreadsofaprocess. Forsinglethreadedprocess==PID.

PGID ProccessGroupId.(Posix.1). SID SessionId(Posix.1).

currentpointstothecurrentprocesstask_structCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Whenapplicablenotvalidininterruptcontext.

Based on material by:

245

AprocesslifeParentprocessCallsfork() andcreates anewprocess Theprocessiselected bythescheduler Taskterminatedbutits resourcesarenotfreedyet. Waitingforitsparent toacknowledgeitsdeath.

TASK_ZOMBIE

TASK_RUNNINGReadybut notrunning Theprocessispreempted bytoschedulertorun ahigherprioritytask

TASK_RUNNINGActuallyrunning

Theeventoccurs ortheprocessreceives asignal.Processbecomes runnableagain

TASK_INTERRUPTIBLE orTASK_UNINTERRUPTIBLEWaiting

Decidestosleep onawaitqueue foraspecificevent

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

246

ProcesscontextUserspaceprogramsandsystemcallsarescheduledtogether

Processexecutinginuserspace... (canbepreempted) Systemcall orexception

Processcontinuinginuserspace... (orreplacedbyahigherpriorityprocess) (canbepreempted)

Kernelcodeexecuted onbehalfofuserspace (canbepreemptedtoo!)

Stillhasaccesstoprocess data(openfiles...)

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

247

KernelthreadsThekerneldoesnotonlyreactfromuserspace(systemcalls,exceptions) orhardwareevents(interrupts).Italsorunsitsownprocesses. Kernelspacearestandardprocessesscheduledandpreemptedinthesame way(youcanviewthemwithtoporps!)Theyjusthavenospecial addressspaceandusuallyrunforever. Kernelthreadexamples: pdflush:regularlyflushesdirtymemorypagestodisk(file changesnotcommittedtodiskyet). ksoftirqd:managessoftirqs.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

248

ProcessprioritiesRegularprocesses Prioritiesfrom20(maximum)to19(minimum) Onlyrootcansetnegativepriorities (rootcangiveanegativeprioritytoaregularuserprocess) Usethenicecommandtorunajobwithagivenpriority: nicen Usetherenicecommandtochangeaprocesspriority: renicep

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

249

RealtimeprocessesRealtimeprocessescanbestartedbyrootusingthePOSIXAPI Availablethrough(seemansched.hfordetails) 100realtimeprioritiesavailable SCHED_FIFOschedulingclass: TheprocessrunsuntilcompletionunlessitisblockedbyanI/O,voluntarily relinquishestheCPU,orispreemptedbyahigherpriorityprocess. SCHED_RRschedulingclass: Difference:theprocessesarescheduledinaRoundRobinway. Eachprocessisrununtilitexhaustsamaxtimequantum.Thenother processeswiththesamepriorityarerun,andsoandso...

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

250

TimerfrequencyTimerinterruptsareraisedeveryHZthofsecond(=1jiffy) HZisnowconfigurable(inProcessortypeandfeatures): 100,250(i386default)or1000. Supportedoni386,ia64,ppc,ppc64,sparc64,x86_64 Seekernel/Kconfig.hz. Compromisebetweensystemresponsivenessandglobalthroughput. Caution:notanyvaluecanbeused.Constraintsapply! AnotherideaistocompletelyturnoffCPUtimerinterruptswhenthe systemisidle(dynamictick):seehttp://muru.com/linux/dyntick. Thissavespower.Supportsarmandi386sofar.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

251

O(1)schedulerThekernelmaintains2priorityarrays: theactiveandtheexpiredarray. Eacharraycontains140entries(100realtimepriorities+40 regularones),1foreachpriority,eachcontainingalistof processeswiththesamepriority. Thearraysareimplementedinawaythatmakesitpossible topickaprocesswiththehighestpriorityinconstanttime (whateverthenumberofrunningprocesses).

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

252

ChoosingandexpiringprocessesTheschedulerfindsthehighestprocesspriority Itexecutesthefirstprocessinthepriorityqueueforthis priority. Oncetheprocesshasexhausteditstimeslice,itismovedto theexpiredarray. Theschedulergetsbacktoselectinganotherprocesswiththe highestpriorityavailable,andsoon... Oncetheactivearrayisempty,the2arraysareswapped! Again,everythingisdoneinconstanttime!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

253

Whenisschedulingrun?Eachprocesshasaneed_reschedflagwhichisset: Afteraprocessexhausteditstimeslice. Afteraprocesswithahigherpriorityisawakened. Thisflagischecked(possiblycausingtheexecutionofthescheduler) Whenreturningtouserspacefromasystemcall Whenreturningfromaninterrupthandler(includingthecputimer) Schedulingalsohappenswhenkernelcodeexplicitelyruns schedule()orexecutesanactionthatsleeps.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

254

TimeslicesThescheduleralsoprioritizeshighpriorityprocessesbygiving themabiggertimeslice. Initialprocesstimeslice:parent'stimeslicesplitin2 (otherwiseprocesswouldcheatbyforking). Minimumpriority:5msor1jiffie(whicheverislarger) Defaultpriorityinjiffies:100ms Maximumpriority:800ms Note:actuallydependsonHZ. Seekernel/sched.cfordetails.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

255

DynamicprioritiesOnlyappliestoregularprocesses Forabetteruserexperience,theLinuxschedulerbootsthepriority ofinteractiveprocesses(processeswhichspendmostoftheirtime sleeping,andtaketimetoexhausttheirtimeslices).Such processesoftensleepbutneedtorespondquicklyafterwakingup (example:wordprocessorwaitingforkeypresses). Prioritybonus:upto5points. Conversely,theLinuxschedulerreducesthepriorityofcompute intensivetasks(whichquicklyexhausttheirtimeslices). Prioritypenalty:upto5points.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

256

LinuxInternals

DriverdevelopmentSleeping

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

257

Howtosleep(1)Sleepingisneededwhenauserprocessiswaitingfordatawhich arenotreadyyet.Theprocessthenputsitselfinawaitingqueue. Staticqueuedeclaration DECLARE_WAIT_QUEUE_HEAD(module_queue); Dynamicqueuedeclaration wait_que_head_tqueue; init_waitqueue_head(&queue);

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

258

Howtosleep(2)Severalwaystomakeakernelprocesssleepwait_event(queue,condition); Sleepsuntilthegivenbooleanexpressionistrue. Caution:can'tbeinterrupted(i.e.bykillingtheclientprocessinuserspace) wait_event_interruptible(queue,condition); Canbeinterrupted wait_event_timeout(queue,condition,timeout); Sleepsandautomaticallywakesupafterthegiventimeout.wait_event_interruptible_timeout(queue,condition,timeout);

Sameasabove,interruptible.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

259

Wakingup!Typicallydonebyinterrupthandlerswhendatasleeping processesarewaitingforareavailable. wake_up(&queue); Wakesupallthewaitingprocessesonthegivenqueue wake_up_interruptible(&queue); Doesthesamejob.Usuallycalledwhenprocesseswaited usingwait_event_interruptible.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

260

LinuxInternals

DriverdevelopmentInterruptmanagement

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

261

NeedforinterruptsInternalprocessorinterruptsusedbytheprocessor,for exampleformultitaskscheduling. Externalinterruptsneededbecausemostinternalandexternal devicesareslowerthantheprocessor.Betternotkeepthe processorwaitingforinputdatatobereadyordatatobe output.Whenthedeviceisreadyagain,itsendsaninterrupt togettheprocessorattentionagain.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

262

InterrupthandlerconstraintsNotrunfromausercontext: Can'ttransferdatatoandfromuserspace (needtobedonebysystemcallhandlers) InterrupthandlerexecutionismanagedbytheCPU,notby thescheduler.Handlerscan'trunactionsthatmaysleep, becausethereisnothingtoresumetheirexecution. Inparticular,needtoallocatememorywithGFP_ATOMIC Havetocompletetheirjobquicklyenough: theyshouldn'tblocktheirinterruptlinefortoolong.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

263

Registeringaninterrupthandler(1)Definedininclude/linux/interrupt.h intrequest_irq( unsignedintirq, Requestedirqchannel irqreturn_t(*handler)(...), Interrupthandler unsignedlongirq_flags, Optionmask(seenextpage) constchar*devname, Registeredname void*dev_id); Pointertosomehandlerdata CannotbeNULLandmustbeuniqueforsharedirqs! voidfree_irq(unsignedintirq,void*dev_id);

?

Whydoesdev_idhavetobeunique?Answer...

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

264

Registeringaninterrupthandler(2)irq_flagsbitvalues(canbecombined,noneisfinetoo) SA_INTERRUPT "Quick"interrupthandler.Runwithallinterruptsdisabledonthecurrentcpu. Shouldn'tneedtobeusedexceptinspecificcases(suchastimerinterrupts) SA_SHIRQ Runwithinterruptsdisabledonlyonthecurrentirqlineandonthelocalcpu. Theinterruptchannelcanbesharedbyseveraldevices. RequiresahardwarestatusregistertellingwhetheranIRQwasraisedornot. SA_SAMPLE_RANDOM Interruptscanbeusedtocontributetothesystementropypoolusedby /dev/randomand/dev/urandom.Usefultogenerategoodrandom numbers.Don'tusethisiftheinterruptbehaviorofyourdeviceispredictable!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

265

WhentoregisterthehandlerEitheratdriverinitializationtime: consumeslotsofIRQchannels! Oratdeviceopentime(firstcalltotheopenfileoperation): betterforsavingfreeIRQchannels. Needtocountthenumberoftimesthedeviceisopened,to beabletofreetheIRQchannelwhenthedeviceisnolonger inuse.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

266

Informationoninstalledhandlers/proc/interruptsCPU0 0:5616905XTPICtimer#Registeredname 1:9828XTPICi8042 2:0XTPICcascade 3:1014243XTPICorinoco_cs 7:184XTPICIntel82801DBICH4 8:1XTPICrtc 9:2XTPICacpi 11:566583XTPICehci_hcd,uhci_hcd, uhci_hcd,uhci_hcd,yenta,yenta,radeon@PCI:1:0:0 12:5466XTPICi8042 14:121043XTPICide0 15:200888XTPICide1 NMI:0#NonMaskableInterrupts ERR:0Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

267

Totalnumberofinterruptscat/proc/stat|grepintr intr819076760929671037701102775520196...Totalnumber ofinterrupts IRQ1 total IRQ2 IRQ3 total ...

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

268

Interruptchanneldetection(1)Usefulwhenadrivercanbeusedindifferentmachines/architectures SomedevicesannouncetheirIRQchannelinaregister ManualdetectionRegisteryourinterrupthandlerforallpossiblechannels Askforaninterrupt LetthecalledinterrupthandlerstoretheIRQnumberinaglobal variable. Tryagainifnointerruptwasreceived Unregisterunusedinterrupthandlers.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

269

Interruptchanneldetection(2)Kerneldetectionutilities mask=probe_irq_on(); Activateinterruptsonthedevice Deactivateinterruptsonthedevice irq=probe_irq_off(mask);>0:uniqueIRQnumberfound =0:nointerrupt.Tryagain! Pseudofilesystems) Doesn'twasteRAM:growsandshrinkstoaccommodatestoredfiles SavesRAM:noduplication;canswapoutpagestodiskwhenneeded. SeeDocumentation/filesystems/tmpfs.txtinkernelsources.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

320

LinuxInternals

TheNetworksubsystem andNetworkdevicedrivers

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

321

AviewoftheLinuxnetworkingsubsystemStack App App 1 App2 Socket Layer UDP Networking Stack Driver Stack Driver Hardware TCP IP Networking Stack Driver ICMP Bridge App3

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

322

NetworkDeviceDriverHardwareInterfacexxx TxSend Send Send SentOK SendErr Free

xxx

xxx

xxx

xxxMemory Access

Driver

Rx

Free

Free

RcvOk

RcvErr RecvCRC RcvOK

xxxDMA

xxx

xxx

xxx

Driver allocates Ring Buffers. Driver resets descriptors to initial state. Driver puts packet to be sent in Tx buffers. Device puts received packet in Rx buffers. Driver/Device update descriptors to indicate state. Usually, device indicates Rx and end of Tx with interrupt, unless polling or NAPI. Based on material by:

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

323

NetworkDeviceDriverStackInterfaceAnetworkdevicedriverprovidesinterfacetothenetwork stack. Itdoesnothaveorusemajor/minornumbers,likecharacter devices. Anetworkdriverisrepresentedbya:structnet_device

Andisregisteredvia:intregister_netdev(structnet_device*dev); intunregister_netdev(structnet_devicedev);Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Afterfillinginsomeimportantbits...

Based on material by:

324

SocketBuffersWeneedtomanipulatepacketsthroughthestack Thismanipulationinvolvesefficiently:Addingprotocolheaders/trailersdownthestack. Removingprotocolheaders/trailersupthestack.

Packetscanbechainedtogether. Eachprotocolshouldhaveconvenientaccesstoheader fields. Todoallthisthekernelprovidesthesk_buffstructure.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

325

sk_buffAnsk_buffrepresentsasinglepacket. Thisstructispassedthroughtheprotocolstack. Itholdspointerstoabufferwiththepacketdata:sk_buff mac head data nh tcp h tail telnet tailroom 326 end

headroom mac ipCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

sk_buffManipulationManipulatesk_buff:unsignedchar*skb_put(structsk_buff*skb, unsignedintlen); tail+=len unsignedchar*skb_push(structsk_buff*skb, unsignedintlen); data=len unsignedchar*skb_pull(structsk_buff*skb, unsignedintlen); data+=len

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

327

sk_buffManipulationManipulatesk_buff:intskb_headroom(conststructsk_buff*skb); datahead intskb_tailroom(conststructsk_buff*skb); endtail intskb_reserve(conststructsk_buff*skb,unsignedintlen); tail=(data+=len)

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

328

sk_buffAllocationLowlevelallocationisdonevia:structsk_buff*alloc_skb(unsignedint size,intgfp_mask);

Butitisbettertousethewrapper:structsk_buff*dev_alloc_skb(unsignedint size); Whichreservessomespaceforoptimization.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

329

sk_buffAllocationExampleImmediatelyafterallocation,weshouldreservetheneeded headroom:structsk_buff*skb; skb=dev_alloc_skb(1500); if(unlikely(!skb))break;

/*Markasbeingusedbythisdevice*/ skb>dev=dev; /*AlignIPon16byteboundaries*/ skb_reserve(skb,2);Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

330

SoftnetWasintroducedinkernel2.4.x ParallelizepackethandlingonSMPmachines Packettransmit/receiveishandledviatwosoftirqs:NET_TX_SOFTIRQFeedspacketsfromnetworkstackto driver. NET_RX_SOFTIRQFeedspacketsfromdrivertonetwork stack.

Thetransmit/receivequeuesarestoredinpercpu softnet_data.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

331

PacketReceptionThedriver:Allocatesanskb. setsupadescriptorintheringbuffersforthehardware.

ThedriverRxinterrupthandlercallsnetif_rx(skb). netif_rx(skb)Depositsthesk_buffinthepercpuinputqueue. MarkstheNET_RX_SOFTIRQtorun.

Laternet_rx_action()iscalledbyNET_RX_SOFTIEQ, whichcallsthedriverpoll()methodtofeedthepacketup.Copyright20062004,MichaelOpdenacker Based on material by: Normallypoll()issettoproccess_backlog()bynet_dev_init() Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

332

PacketReceptionOverview

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

333

PacketTransmissionEachnetworkdevicedefinesamethod:int(*hard_start_xmit)(structsk_buff*skb,struct net_device*dev);

Thismethodisindirectlycalledfromthe NET_TX_SOFTIRQ Callstothismethodareserializedviadev>xmit_lock_owner Thedrivercanmanagethetransmitqueue:voidnetif_start_queue(structnet_device*net); voidnetif_stop_queue(structnet_device*net); voidnetif_wake_queue(structnet_device*net);Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

intnetif_queue_stopped(structnet_device*net);Based on material by:

334

PacketReceptionOverview

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

335

NetworkDeviceAllocationEachnetworkdeviceisrepresentedbyastructnet_device Theyareallocatedusing:structnet_device*alloc_netdev(size,mask, setup_func);

sizesizeofourprivdatapart maskanamingpattern(e.g.eth%d) setup_funcAfunctiontopreparetherestofnet_device.

Anddeallocatedwithvoidfree_netdev(struct*net_device);

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

336

NetworkDeviceAllocation(cont.)ForEthernetwehaveashortversion:structnet_device*alloc_etherdev(size);

whichcallsalloc_netdev(size,eth%d,ether_setup);

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

337

NetworkDeviceInitializationThenet_deviceshouldbefilledwithnumerousmethods:openrequestresources,registerinterrupts,startqueues. stopdeallocatesresources,unregisterirq,stopqueue. get_statsreportstatistics set_multicast_listconfiguredeviceformulticast do_ioctldevicespecificIOCTLfunction change_mtuControldeviceMTUsetting hard_start_xmitcalledbythestacktoinitiateTx.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

338

NetworkDeviceInitialization(Cont.)Also,thedev>flagsshouldbesetaccordingtodevice capabilities:IFF_MULTICASTDevicesupportmulticast IFF_NOARPDevicedoesnotsupportARPprotocol

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

339

NAPINetnetworkAPI Optionalprovidesinterruptmitigationunderhighload Requirements:ADMAringbuffer. Abilitytoturnoffreceiveinterrupts.

Itisusedbydefininganewmethod:int(*poll)(structnet_device*dev,int*budget);

Calledbythenetworkstackperiodicallywhensignaledby thedrivertodoso.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

340

NAPI(cont.)Whenareceiveinterruptoccurs,driver:Turnsoffreceiveinterrupts. Callsnetif_rx_schedule(dev)togetstacktostart callingit'spollmethod.

PollmethodScansreceiveringbuffers,feedingpacketstothestackvia: netif_receive_skb(skb). Ifworkfinishedwithinbudgetparameter,reenablesinterrupts andcallsnetif_rex_complete(dev) Else,stackwillcallpollmethodagain.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

341

LinuxInternals

AdviceandresourcesGettinghelpandcontributions

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

342

SolvingissuesIfyoufaceanissue,anditdoesn'tlookspecifictoyourworkbut rathertothetoolsyouareusing,itisverylikelythatsomeoneelse alreadyfacedit. SearchtheInternetforsimilarerrorreports Onwebsitesormailinglistarchives (usingagoodsearchengine) Onnewsgroups:http://groups.google.com/ Youhavegreatchancesoffindingasolutionorworkaround,orat leastanexplanationforyourissue. Otherwise,reportingtheissueisuptoyou!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

343

GettinghelpIfyouhaveasupportcontract,askyourvendor Otherwise,don'thesitatetoshareyourquestionsandissues onmailinglistsEithercontacttheLinuxmailinglistforyourarchitecture(likelinux armkernelorlinuxshdev...) Orcontactthemailinglistforthesubsystemyou'redealingwith (linuxusbdevel,linuxmtd...).Don'taskthemaintainerdirectly! MostmailinglistscomewithaFAQpage.Makesureyoureadit beforecontactingthemailinglist RefrainfromcontactingtheLinuxKernelmailinglist,unlessyou're anexperienceddeveloperandneedadviceCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

344

LinuxInternals

AdviceandresourcesBugreportandpatchsubmission

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

345

ReportingLinuxbugsFirstmakesureyou'reusingthelatestversion Makesureyouinvestigatetheissueasmuchasyoucan: seeDocumentation/BUGHUNTING Makesurethebughasnotbeenreportedyet.Abugtrackingsystem (http://bugzilla.kernel.org/)existsbutveryfewkerneldevelopersuseit. Besttousewebsearchengines(accessingpublicmailinglistarchives) Ifthesubsystemyoureportabugonhasamailinglist,useit. Otherwise,contacttheofficialmaintainer(seetheMAINTAINERSfile). Alwaysgiveasmanyusefuldetailsaspossible.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

346

HowtosubmitpatchesordriversDon'tmergepatchesaddressingdifferentissues Youshouldidentifyandcontacttheofficialmaintainerforthe filestopatch. SeeDocumentation/SubmittingPatchesfordetails. Fortrivialpatches,youcancopytheTrivialPatchMonkey. Specialsubsystems:ARMplatform:it'sbesttosubmityourARMpatchestoRussell King'spatchsystem: http://www.arm.linux.org.uk/developer/patches/Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

347

Howtobecomeakerneldeveloper?GregKroahHartmangatheredusefulreferencesandadvicefor peopleinterestedincontributingtokerneldevelopment: Documentation/HOWTO(inkernelsourcessince2.6.15rc2) Donotmissthisveryusefuldocument!

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

348

LinuxInternals

AdviceandresourcesReferences

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

349

Informationsites(1)LinuxWeeklyNews http://lwn.net/ TheweeklydigestoffallLinuxandfreesoftware informationsources Indepthtechnicaldiscussionsaboutthekernel Subscribetofinancetheeditors($5/month) Articlesavailablefornonsubscribers after1week.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

350

Informationsites(2)KernelTrap http://kerneltrap.org/ Forumwebsiteforkerneldevelopers News,articles,whitepapers,discussions,polls,interviews Perfectifadigestisnotenough!

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

351

Usefulreading(1)LinuxDeviceDrivers,3rdedition,Feb2005 ByJonathanCorbet,AlessandroRubini,GregKroahHartman,O'Reilly http://www.oreilly.com/catalog/linuxdrive3/ Freelyavailableonline! Greatcompaniontotheprintedbookforeasyelectronicsearches! http://lwn.net/Kernel/LDD3/(1PDFfileperchapter) http://freeelectrons.com/community/kernel/ldd3/(singlePDFfile) AmusthavebookforLinuxdevicedriverwriters!

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

352

Usefulreading(2)LinuxKernelDevelopment,2ndEdition,Jan2005 RobertLove,NovellPress http://rlove.org/kernel_book/ Averysyntheticandpleasantwaytolearnaboutkernel subsystems(beyondtheneedsofdevicedriverwriters) UnderstandingtheLinuxKernel,3rdedition,Nov2005 DanielP.Bovet,MarcoCesati,O'Reilly http://oreilly.com/catalog/understandlk/ AnextensivereviewofLinuxkernelinternals,coveringLinux2.6atlast. Unfortunately,onlycoversthePCarchitecture.

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

353

UsefulonlineresourcesLinuxkernelmailinglistFAQ http://www.tux.org/lkml/ CompleteLinuxkernelFAQ Readthisbeforeaskingaquestiontothemailinglist KernelNewbies http://kernelnewbies.org/ Glossaries,articles,presentations,HOWTOs, recommendedreading,usefultoolsforpeople gettingfamiliarwithLinuxkernelordriver development.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

354

Internationalconferences(1)UsefulconferencesfeaturingLinuxkernelpresentations OttawaLinuxSymposium(July):http://linuxsymposium.org/ Rightafterthe(private)kernelsummit. Lotsofkerneltopics.Manycorekernelhackersstillpresent. Fosdem:http://fosdem.org(Brussels,February) Fordevelopers.Kernelpresentationsfromwellknownkernelhackers. CELinuxForum:http://celinuxforum.org/ Organizesseveralinternationaltechnicalconferences,inparticularin California(SanJose)andinJapan.NowopentononCELFmembers! Veryinterestingkerneltopicsforembeddedsystemsdevelopers.Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

355

Internationalconferences(2)linux.conf.au:http://conf.linux.org.au/(Australia/NewZealand) Featuresafewpresentationsbykeykernelhackers. LinuxKongress(Germany,September/October) http://www.linuxkongress.org/ Lotsofpresentationsonthekernelbutveryexpensiveregistrationfees. Don'tmissourfreeconferencevideoson http://freeelectrons.com/community/videos/conferences/!

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

356

LinuxInternals

AdviceandresourcesLastadvice

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

357

UsetheSource,Luke!ManyresourcesandtricksontheInternetfindyouwill,but solutionstoalltechnicalissuesonlyintheSourcelie.

ThankstoLucasArtsCopyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

358

LinuxInternals

AnnexesQuizanswers

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

359

Quizanswersrequest_irq,free_irq Q:Whydoesdev_idhavetobeuniqueforsharedIRQs? A:Otherwise,thekernelwouldhavenowayofknowingwhichhandlerto release.Alsoneededformultipledevices(disks,serialports...)managed bythesamedriver,whichrelyonthesameinterrupthandlercode. Interrupthandling Q:Whydidthekernelsegfaultatmoduleunload(forgettingtounregister ahandlerinasharedinterruptline)? A:Kernelmemoryisallocatedatmoduleloadtime,tohostmodulecode. Thismemoryisfreedatmoduleunloadtime.Ifyouforgettounregistera handlerandaninterruptcomes,thecpuwilltrytojumptotheaddressof thehandler,whichisinafreedmemoryarea.Crash!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

360

LinuxInternals

AnnexesInitrunlevels

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

361

SystemVinitrunlevels(1)IntroducedbySystemVUnix MuchmoreflexiblethaninBSD Makeitpossibletostartorstop differentservicesforeach runlevel Correspondtotheargumentgiven to/sbin/init. Runlevelsdefinedin /etc/inittab.

/etc/initabexcerpt:id:5:initdefault: #Systeminitialization. si::sysinit:/etc/rc.d/rc.sysinit l0:0:wait:/etc/rc.d/rc0 l1:1:wait:/etc/rc.d/rc1 l2:2:wait:/etc/rc.d/rc2 l3:3:wait:/etc/rc.d/rc3 l4:4:wait:/etc/rc.d/rc4 l5:5:wait:/etc/rc.d/rc5 l6:6:wait:/etc/rc.d/rc6

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

362

SystemVinitrunlevels(2)Standardlevels init0 Haltthesystem init1 Singleusermodeformaintenance init6 Rebootthesystem initS Singleusermodeformaintenance. Mountingonly/.Oftenidenticalto1 Customizablelevels:2,3,4,5 init3 Oftenmultiusermode,withonly commandlinelogin init5 Oftenmultiusermode,with graphicallogin

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

363

initscriptsAccordingto/etc/inittabsettings,initruns: First/etc/rc.d/rc.sysinitforallrunlevels Thenscriptsin/etc/rc.d/ Startingservices(1,3,5,S): runsS*scriptswiththestartoption Killingservices(0,6): runsK*scriptswiththestopoption Scriptsareruninfilenamelexicalorder Justuselsltofindouttheorder!Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

364

/etc/init.dRepositoryforallavailableinitscripts /etc/rc.d/onlycontainslinkstothe/etc/init.d/ scriptsneededforrunleveln /etc/rc1.d/example(fromFedoraCore3)K01yum>../init.d/yum K02cupsconfigdaemon>../init.d/cups configdaemon K02haldaemon>../init.d/haldaemon K02NetworkManager> ../init.d/NetworkManager K03messagebus>../init.d/messagebus K03rhnsd>../init.d/rhnsd K05anacron>../init.d/anacron K05atd>../init.d/atd S00single>../init.d/single S01sysstat>../init.d/sysstat S06cpuspeed>../init.d/cpuspeed

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

365

HandlinginitscriptsbyhandSimplycallthe/etc/init.dscripts! /etc/init.d/sshdstart Startingsshd:[OK] /etc/init.d/nfsstop ShuttingdownNFSmountd:[FAILED] ShuttingdownNFSdaemon: [FAILED]ShuttingdownNFSquotas: [FAILED] ShuttingdownNFSservices:[OK] /etc/init.d/pcmciastatus cardmgr(pid3721)isrunning... /etc/init.d/httpdrestart Stoppinghttpd:[OK] Startinghttpd:[OK]Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

366

LinksCodeexamples,additionalresources andupdatesareavailableat: http://www.codefidence.com/sourcedrop/course

Codefidencespecialistswillbedelightedtoprovideonetoone handsonconsultationandsupport.Giveusacall: http://www.codefidence.com

Copyright20062004,MichaelOpdenacker Copyright20032006,OronPeled Copyright20042006CodefidenceLtd.

Based on material by:

367