Linux-Kernel Archive: Re: Random panic in load_balance() with 3.16-rc

http://lkml.iu.edu//hypermail/linux/kernel/1407.3/00650.html (printed 1/7/2015)

Re: Random panic in load_balance() with 3.16-rc

From: Linus Torvalds
Date: Thu Jul 24 2014 14:47:25 EST
In reply to: Peter Zijlstra: "Re: Random panic in load_balance() with 3.16-rc"

On Wed, Jul 23, 2014 at 6:43 PM, Michel Dänzer <michel@xxxxxxxxxxx> wrote:
>>
>> Michel, mind doing
>>
>>     make kernel/sched/fair.s
>>
>> and sending us the resulting file?
>
> Here it is, gzipped, hope that's okay.
>
> Note that my tree is now based on 3.16-rc6.

Ok, so I'm looking at the code generation and your compiler is pure and utter *shit*.

Adding Jakub to the cc, because gcc-4.9.0 seems to be terminally broken.

Lookie here, your compiler does some absolutely insane things with the spilling, including spilling a *constant*. For chrissake, that compiler shouldn't have been allowed to graduate from kindergarten. We're talking "sloth that was dropped on the head as a baby" level retardation levels here:

        ...
        movq    $load_balance_mask, -136(%rbp)  #, %sfp
        subq    $184, %rsp                      #,
        movq    (%rdx), %rax                    # sd_22(D)->parent, sd_parent
        movl    %edi, -144(%rbp)                # this_cpu, %sfp
        movl    %ecx, -140(%rbp)                # idle, %sfp
        movq    %r8, -200(%rbp)                 # continue_balancing, %sfp
        movq    %rax, -184(%rbp)                # sd_parent, %sfp
        movq    -136(%rbp), %rax                # %sfp, tcp_ptr__
    #APP
        add     %gs:this_cpu_off, %rax          # this_cpu_off, tcp_ptr__
    #NO_APP
        ...

Note the contents of -136(%rbp). Seriously. That's an _immediate_constant_ that the compiler is spilling.

Somebody needs to raise that as a gcc bug. Because it damn well is some seriously crazy shit.

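
The spilled constant is the first half of a per-CPU address computation. The compiler loads `$load_balance_mask` (a link-time constant) and then adds the running CPU's offset via `%gs:this_cpu_off`, which yields `tcp_ptr__`. The following is a simplified user-space model of that two-step computation. It is not the kernel's per-CPU implementation. `NR_CPUS`, `load_balance_mask_template`, `per_cpu_copies`, `this_cpu_off_table` and `this_cpu_ptr_sim` are all hypothetical names. The model also assumes a flat address space in which pointer/integer conversions behave as expected.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* hypothetical CPU count for this user-space model */

/* Stand-in for the per-CPU variable's "template" symbol. Its address is a
 * link-time constant, like $load_balance_mask in the listing above. */
static unsigned long load_balance_mask_template;

/* One private copy per simulated CPU. */
static unsigned long per_cpu_copies[NR_CPUS];

/* Simulated per-CPU offsets. In the kernel, the running CPU's offset is what
 * "add %gs:this_cpu_off, %rax" reads; here a plain array is indexed instead. */
static uintptr_t this_cpu_off_table[NR_CPUS];

static void setup_per_cpu_offsets(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        /* Unsigned arithmetic: may wrap, but adding it back undoes the wrap. */
        this_cpu_off_table[cpu] = (uintptr_t)&per_cpu_copies[cpu] -
                                  (uintptr_t)&load_balance_mask_template;
}

/* Mirrors the two-step computation of tcp_ptr__ in the listing:
 *   movq $load_balance_mask, %rax   (immediate: template address)
 *   add  %gs:this_cpu_off, %rax     (plus the current CPU's offset)
 * Step one is an immediate, so a compiler can always re-create it with a
 * single instruction; that is why spilling it to the stack looks pointless. */
static unsigned long *this_cpu_ptr_sim(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    uintptr_t tcp_ptr = (uintptr_t)&load_balance_mask_template;
    tcp_ptr += this_cpu_off_table[cpu];
    return (unsigned long *)tcp_ptr;
}
```

For example, after `setup_per_cpu_offsets()`, a write through `this_cpu_ptr_sim(2)` lands in `per_cpu_copies[2]`.
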
However, that constant spilling part just counts as "too stupid to live". The real bug is this:

        movq    $load_balance_mask, -136(%rbp)  #, %sfp
        subq    $184, %rsp                      #,

where gcc creates the stack frame *after* having already used it to save that constant *deep* below the stack frame.

The x86-64 ABI specifies a 128-byte red zone under the stack pointer, and this is ok by that limit. It looks like it's illegal (136 > 128), but the fact is, we've had four "pushq"s to update %rsp since loading the frame pointer, so it's just *barely* legal with the redzoning.

But we build the kernel with -mno-red-zone. We do *not* follow the x86-64 ABI wrt redzoning, because we *cannot*: interrupts while in kernel mode *will* use the stack without a red zone. So that "-mno-red-zone" is not some "optional guideline". It's a hard and harsh requirement for the kernel, and gcc-4.9 is a buggy piece of shit for ignoring it. And your bug happens because you happen to hit an interrupt _just_ in that single-instruction window (or perhaps hit some other similar case and corrupted kernel data structures earlier).
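
The red-zone arithmetic and the interrupt window can be worked through in a small C model. The following assumptions are not from the message:
- each `pushq` moves `%rsp` down by 8 bytes;
- the stack is modeled as an array indexed downward;
- the interrupt frame sizes in the usage example are only illustrative. 40 bytes is roughly what the CPU itself pushes for an interrupt in 64-bit mode, and 168 bytes is roughly the size of x86-64 `struct pt_regs`.

All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_64_RED_ZONE 128   /* bytes below %rsp the x86-64 ABI lets code use */
#define STACK_WORDS     64    /* size of the simulated stack, in 8-byte words */

/* How far below %rsp (in bytes) a store to rbp_offset(%rbp) lands, given how
 * many 8-byte pushq instructions ran after %rbp was loaded from %rsp. */
static int depth_below_rsp(int rbp_offset, int pushes_since_frame_ptr)
{
    int rsp_minus_rbp = -8 * pushes_since_frame_ptr;  /* %rsp == %rbp - 8n */
    return rsp_minus_rbp - rbp_offset;                 /* > 0 means below %rsp */
}

/* A store that deep is legal only if it fits inside the red zone; with
 * -mno-red-zone the usable red zone is 0 bytes. */
static bool store_is_legal(int depth, int red_zone_bytes)
{
    return depth <= red_zone_bytes;
}

/* Simplified model of the race: a value is spilled `depth` bytes below the
 * stack pointer, then an interrupt arrives *before* the stack pointer is
 * lowered (the movq/subq window) and pushes `frame_bytes` of state onto the
 * same stack. Returns what a later reload of the spill slot would read. */
static uint64_t spill_then_interrupt(int depth, int frame_bytes, uint64_t value)
{
    uint64_t stack[STACK_WORDS] = {0};
    int sp = STACK_WORDS;               /* index one past the top; grows down */

    assert(depth > 0 && depth % 8 == 0 && depth / 8 <= STACK_WORDS);
    assert(frame_bytes >= 0 && frame_bytes / 8 <= STACK_WORDS);

    int slot = sp - depth / 8;
    stack[slot] = value;                /* like: movq $const, -136(%rbp) */

    /* Interrupt taken in kernel mode: state is pushed just below sp. */
    for (int i = 1; i <= frame_bytes / 8; i++)
        stack[sp - i] = 0xdeadbeefULL;

    return stack[slot];                 /* like: movq -136(%rbp), %rax */
}
```

How the model plays out:
- With four pushes, `depth_below_rsp(-136, 4)` is 104. That is inside the 128-byte user-space red zone but outside the 0-byte red zone implied by `-mno-red-zone`.
- A 168-byte frame pushed in that window overwrites the spill slot.
- A 40-byte frame alone would not overwrite it.
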

Now, I suspect that this redzoning bug might actually be related to the fact that gcc is stupid in spilling a constant. I would not be surprised if there is some liveness analysis going on to decide *when* to insert the stack decrement, and constants are being ignored because clearly liveness isn't an issue for a constant value. So the two bugs ("stupid constant spilling" and "invalid use of redzone stack") go hand in hand. But who knows.

Anyway, this is not a kernel bug. This is your compiler creating completely broken code. We may need to add a warning to make sure nobody compiles with gcc-4.9.0, and the Debian people should probably downgrade their shiny new compiler.
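
The build-time warning suggested above could use GCC's predefined version macros (`__GNUC__`, `__GNUC_MINOR__`, `__GNUC_PATCHLEVEL__`). The following is only a sketch of such a guard. It is not the check the kernel actually adopted. The helper names and message text are hypothetical, and it matches only the single release named in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: matches only the release named in this thread.
 * Whether other 4.9.x releases are affected is not established here. */
static bool gcc_version_is_blacklisted(int major, int minor, int patchlevel)
{
    return major == 4 && minor == 9 && patchlevel == 0;
}

/* Compile-time warning sketch. clang also defines __GNUC__, so exclude it. */
#if defined(__GNUC__) && !defined(__clang__) && \
    __GNUC__ == 4 && __GNUC_MINOR__ == 9 && __GNUC_PATCHLEVEL__ == 0
#warning "gcc 4.9.0 reportedly ignores -mno-red-zone in some functions; use a different compiler"
#endif

/* The same check, applied to the compiler building this file. */
static bool building_with_blacklisted_gcc(void)
{
#if defined(__GNUC__) && !defined(__clang__)
    return gcc_version_is_blacklisted(__GNUC__, __GNUC_MINOR__,
                                      __GNUC_PATCHLEVEL__);
#else
    return false;
#endif
}
```

A preprocessor warning fires once per translation unit at build time. That fits the goal of making sure "nobody compiles with gcc-4.9.0" better than a runtime check would.
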

Jakub, any ideas?

Linus
