Saving the floating-point state (Linus Torvalds)

  • 1/7/2015 Saving the floating point state (Linus Torvalds)

    http://yarchive.net/comp/linux/fp_state_save.html

    Newsgroups: fa.linux.kernel
    From: [email protected] (Linus Torvalds)
    Subject: Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
    Date: Mon, 5 Aug 2002 05:36:20 GMT

    In article, Andi Kleen wrote:
    > Ingo Molnar writes:
    >
    > > actually the opposite is true, on a 2.2 GHz P4:
    > >
    > >   $ ./lat_sig catch
    > >   Signal handler overhead: 3.091 microseconds
    > >
    > >   $ ./lat_ctx -s 0 2
    > >   2 0.90
    > >
    > > ie. *process to process* context switches are 3.4 times faster than signal
    > > delivery. Ie. we can switch to a helper thread and back, and still be
    > > faster than a *single* signal.
    >
    > This is because the signal save/restore does a lot of unnecessary stuff.
    > One optimization I implemented at one time was adding a SA_NOFP signal
    > bit that told the kernel that the signal handler did not intend
    > to modify floating point state (few signal handlers need FP). It would
    > not save the FPU state then and reached quite some speedup in signal
    > latency.
    >
    > Linux got a lot slower in signal delivery when the SSE2 support was
    > added. That got this speed back.

    This will break _horribly_ when (if) glibc starts using SSE2 for things like memcpy() etc.

    I agree that it is really sad that we have to save/restore FP on signals, but I think it's unavoidable. Your hack may work for you, but it just gets really dangerous in general. Having signals randomly subtly corrupt some SSE2 state just because the signal handler uses something like memcpy (without even realizing that that could lead to trouble) is bad, bad, bad.

    In other words, "not intending to" does not imply "will not". It's just potentially too easy to change SSE2 state by mistake.

    And yes, this signal handler thing is clearly visible on benchmarks. MUCH too clearly visible. I just didn't see any safe alternatives (and I still don't ;( )

    Linus
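
Editorial sketch (not part of the original message): the guarantee Linus is defending can be observed from user space. The C fragment below is a minimal illustration, assuming a POSIX system with `setitimer()`; the names `fp_handler` and `harmonic` are invented for this example. A handler that deliberately does floating-point work interrupts a floating-point loop about once per millisecond, and the loop still produces the same result as an undisturbed run, because the kernel saves the FP state on signal delivery and restores it on return.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t handler_calls = 0;
static volatile double scratch = 1.0;

/* A handler that deliberately uses floating point, so on typical
 * builds it overwrites FP/SSE registers the main loop may be using. */
static void fp_handler(int signo)
{
    (void)signo;
    double t = scratch;
    for (int i = 0; i < 16; i++)
        t = t * 0.5 + 0.25;
    scratch = t;
    handler_calls = handler_calls + 1;
}

/* Sum the first n terms of the harmonic series. If with_signals is
 * nonzero, a 1 ms interval timer interrupts the loop with SIGALRM. */
double harmonic(long n, int with_signals)
{
    struct sigaction sa, old_sa;
    struct itimerval on, off;
    double acc = 0.0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fp_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old_sa);

    memset(&on, 0, sizeof on);
    memset(&off, 0, sizeof off);
    on.it_interval.tv_usec = 1000;
    on.it_value.tv_usec = 1000;
    if (with_signals)
        setitimer(ITIMER_REAL, &on, NULL);

    for (long i = 1; i <= n; i++)
        acc += 1.0 / (double)i;

    setitimer(ITIMER_REAL, &off, NULL);  /* stop the timer */
    sigaction(SIGALRM, &old_sa, NULL);   /* restore the old disposition */
    return acc;
}
```

Calling `harmonic(5000000L, 0)` and `harmonic(5000000L, 1)` and comparing the two results for exact equality is the check; with a kernel that skipped the FP save for this handler, the second result could differ only occasionally, which is exactly the "blue moon" failure mode discussed in this thread.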

    Newsgroups: fa.linux.kernel
    From: Linus Torvalds
    Subject: Re: context switch vs. signal delivery [was: Re: Accelerating user
    Date: Mon, 5 Aug 2002 16:39:34 GMT

    On Mon, 5 Aug 2002, Jamie Lokier wrote:

    > Linus Torvalds wrote:
    > > I agree that it is really sad that we have to save/restore FP on
    > > signals, but I think it's unavoidable.
    >
    > Couldn't you mark the FPU as unused for the duration of the
    > handler, and let the lazy FPU mechanism save the state when it is used
    > by the signal handler?

    Nope. Believe me, I gave some thought to clever things to do.

    The kernel won't even _see_ a longjmp() out of a signal handler, so the kernel has a really hard time trying to do any clever lazy stuff.
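
Editorial sketch (not part of the original message): the longjmp() case can be shown in a few lines of C. This is a minimal illustration, assuming POSIX `sigsetjmp()`/`siglongjmp()`; the function names are invented for this example. The handler leaves by jumping straight back into the interrupted code, so it never returns through the normal signal-return path, and the kernel gets no notification that the handler has finished.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf escape_point;

/* Leaves the handler without returning: the signal frame the kernel
 * built on the stack is simply abandoned. */
static void jumping_handler(int signo)
{
    (void)signo;
    siglongjmp(escape_point, 1);
}

/* Returns 1 if control came back via siglongjmp(), 0 if raise()
 * returned normally, -1 if the handler could not be installed. */
int leave_handler_by_longjmp(void)
{
    struct sigaction sa, old_sa;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jumping_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* Second argument 1: also save the signal mask, so SIGUSR1 is
     * unblocked again after the jump. */
    if (sigsetjmp(escape_point, 1) == 0) {
        raise(SIGUSR1);
        result = 0;
    } else {
        result = 1;
    }

    sigaction(SIGUSR1, &old_sa, NULL);
    return result;
}
```

Any scheme in which the kernel defers the FP save until the handler touches the FPU, and undoes it when the handler returns, has to cope with handlers like this one that never return at all.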

    Also, people who play games with FP actually change the FP data on the stack frame, and depend on signal return to reload it. Admittedly I've only ever seen this on SIGFPE, but anyway - this is all done with integer instructions that just touch bit patterns on the stack.. The kernel can't catch it sanely.

    > For sophisticated user space uses, like the above, I'd like to see
    > a trap handling mechanism that saves only the _minimum_ state.

    I would not mind an extra per-signal flag that says "don't bother with FP saves" (the same way we already have "don't restart" etc), but I would be very nervous if glibc used it by default (even if glibc doesn't use SSE2 in memcpy, gcc itself can do it, and obviously _users_ may just do it themselves).

    So it would have to be explicitly enabled with a SA_NOFPSIGHANDLER flag or something.

    (And yes, it's the FP stuff that takes most of the time. I think the lmbench numbers for signal delivery tripled when that went in).

    Linus
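
Editorial sketch (not part of the original message): per-signal behaviour is already selected through the `sa_flags` field of `struct sigaction`, which is where an explicit opt-in like the one floated above would presumably live. The fragment below is only an illustration of that opt-in shape. `SKETCH_SA_NOFPSIGHANDLER` is a hypothetical placeholder defined here as 0, so it has no effect; the message proposes the name only as "or something", and nothing in this thread says such a flag was ever added. `SA_RESTART` is a real POSIX flag, and the function names are invented.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* HYPOTHETICAL placeholder for the opt-in discussed in the thread.
 * It is defined as 0 so that OR-ing it into sa_flags changes nothing;
 * it is not a real kernel or libc flag. */
#define SKETCH_SA_NOFPSIGHANDLER 0

static volatile sig_atomic_t integer_only_hits = 0;

/* A handler that touches only integer state. Under the proposal, only
 * handlers like this would be candidates for skipping the FP save. */
static void integer_only_handler(int signo)
{
    (void)signo;
    integer_only_hits = integer_only_hits + 1;
}

/* Install the handler, requesting the behaviour explicitly per signal
 * rather than having a library turn it on by default. Returns the
 * result of sigaction(): 0 on success, -1 on failure. */
int install_integer_only_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = integer_only_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SKETCH_SA_NOFPSIGHANDLER;
    return sigaction(signo, &sa, NULL);
}
```

The point of the design is that the promise is made by whoever writes the handler, at the call site, for one signal at a time; the risk described above is that even such a handler may end up using SSE2 through memcpy() or compiler-generated code without the author noticing.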

    Newsgroups: fa.linux.kernel
    From: Linus Torvalds
    Subject: Re: context switch vs. signal delivery [was: Re: Accelerating user mode
    Date: Mon, 5 Aug 2002 20:24:54 GMT

    On Mon, 5 Aug 2002, Oliver Neukum wrote:
    >
    > > Also, people who play games with FP actually change the FP data on the
    > > stack frame, and depend on signal return to reload it. Admittedly I've
    > > only ever seen this on SIGFPE, but anyway - this is all done with integer
    > > instructions that just touch bit patterns on the stack.. The kernel can't
    > > catch it sanely.
    >
    > Could the fp state be put on its own page and the dirty bit
    > evaluated in the decision whether to restore fpu state?

    I'm sure anything is _possible_, but there are a few problems with that approach. In particular, playing VM games tends to be quite expensive on SMP, since you need to make sure that the TLB entry for that page is invalidated on all the other CPU's before you insert the FPU page.

    Also, you'd need to play games with dirty bit handling, since the page _is_ dirty (it contains FP data), so the VM must know to write it out if it pages things. That's ok - we have separate per-page and per-TLB-entry dirty bits anyway, but right now the VM layer knows it can move the TLB-entry dirty bit into the per-page dirty bit and drop it - which wouldn't be the case if we also have a FPU dirty bit.

    That's fixable - we could just make a "software TLB dirty bit" that is updated whenever the hardware TLB dirty bit is cleared and moved into the per-page dirty bit.

    But the end result sounds rather complicated, especially since all the page table walking necessary for setting this all up is likely to be about as expensive as the thing we're trying to avoid..

    Rule of thumb: it almost never pays to be "clever".

    Linus

    Newsgroups: fa.linux.kernel
    From: Linus Torvalds
    Subject: Re: context switch vs. signal delivery [was: Re: Accelerating user
    Date: Mon, 5 Aug 2002 16:22:27 GMT

    On 5 Aug 2002, Andi Kleen wrote:
    >
    > I think the possibility at least for memcpy is rather remote. Any sane
    > SSE memcpy would only kick in for really big arguments (for small
    > memcpys it doesn't make any sense at all because of the context save/possible
    > reformatting penalty overhead). So only people doing really
    > big memcpys could be possibly hurt, and that is rather unlikely.

    And this is why the kernel _has_ to save the FP state.

    It's the "only happens in a blue moon" bugs that are the absolute _worst_ bugs. I want to optimize the kernel until I'm blue in the face, but the kernel must NEVER EVER have a "non-stable" interface.

    Signal handlers that don't restore state are hard as _hell_ to debug. Most of the time it doesn't really matter (unless the lack of restore is something really major like one of the most common integer registers), but then depending on what libraries you use, and just _exactly_ when the signal comes in, you get subtle data corruption that may not show up until much later.

    At which point your programmer wonders if he mistakenly wandered into MS-Windows land.

    No thank you. I'll take slow signal handlers over ones that _sometimes_ don't work.

    > After all Linux should give you enough rope to shoot yourself in the foot ;)

    On purpose, yes. It's ok to take careful aim, and say "I'm now shooting myself in the foot".

    And yes, it's also ok to say "I don't know what I'm doing, so I may be shooting myself in the foot" (this is obviously the most common foot-shooter).

    And if you come to me and complain about how drunk you were, and how you shot yourself in the foot by mistake due to that, I'll just ignore you.

    BUT - and this is a big BUT - if you are doing everything right, and you actually know what you're doing, and you end up shooting yourself in the foot because the kernel was taking a shortcut, then I think the kernel is _wrong_.

    And I'd rather have a slow kernel that does things right, than a fast kernel which screws with people.

    > In theory you could do a superhack: put the FP context into an unmapped
    > page on the stack and only save with lazy FPU or access to the unmapped
    > page.

    That would be extremely interesting especially with signal handlers that do a longjmp() thing.

    The real fix for a lot of programs on x86 would be for them to never ever use FP in the first place, in which case the kernel would be able to just not save and restore it at all.

    However, glibc fiddles with the fpu at startup, even for non-FP programs. Dunno what to do about that.

    Linus

    From: Linus Torvalds
    Newsgroups: fa.linux.kernel
    Subject: Re: [patch 2.6.13-rc3a] i386: inline restore_fpu
    Date: Tue, 26 Jul 2005 21:53:46 UTC

    On Tue, 26 Jul 2005, Chuck Ebbert wrote:
    >
    > Since fxsave leaves the FPU state intact, there ought to be a better way to do
    > this but it gets tricky. Maybe using the TSC to put a timestamp in every thread
    > save area?

    We used to have totally lazy FP saving, and not touch the FP state at _all_ in the scheduler except to just set the TS bit.

    It worked wonderfully well on UP, but getting it working on SMP is a major pain, since the lazy state you want to switch back into might be cached on some other CPU's registers, so we never did it on SMP. Eventually it got too painful to maintain two totally different logical code-paths between UP and SMP, and some bug or other ended up resulting in the current "lazy on a time slice level" thing which works well in SMP too.
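
Editorial sketch (not part of the original message): the "totally lazy" uniprocessor scheme described above can be modelled in a few lines of plain C. This is a toy simulation and not kernel code; the struct and function names are invented. A context switch only sets a flag standing in for the TS bit, and the save/restore work happens in the "trap" taken on the first FP instruction after a switch, and only if the registers belong to a different task.

```c
/* Toy model of one CPU doing fully lazy FP switching. */
struct toy_cpu {
    int current;    /* task currently running */
    int fpu_owner;  /* task whose FP state is live in the registers, -1 if none */
    int ts;         /* models the TS bit: the next FP instruction traps */
    long saves;     /* FP state saves performed */
    long restores;  /* FP state loads performed */
};

void toy_init(struct toy_cpu *cpu)
{
    cpu->current = 0;
    cpu->fpu_owner = -1;
    cpu->ts = 1;
    cpu->saves = 0;
    cpu->restores = 0;
}

/* Context switch: leave the FP registers alone, just set TS. */
void toy_switch_to(struct toy_cpu *cpu, int next)
{
    cpu->current = next;
    cpu->ts = 1;
}

/* The current task executes an FP instruction. */
void toy_use_fp(struct toy_cpu *cpu)
{
    if (!cpu->ts)
        return;                 /* no trap: FPU already usable */

    cpu->ts = 0;                /* trap handler clears TS */
    if (cpu->fpu_owner == cpu->current)
        return;                 /* registers still hold our state */

    if (cpu->fpu_owner >= 0)
        cpu->saves++;           /* save the previous owner's state */
    cpu->restores++;            /* load (or initialise) our state */
    cpu->fpu_owner = cpu->current;
}
```

With a single FP user that is switched out and back in, the model performs one load and no saves, which is why this worked so well on UP. The single `fpu_owner` field is also where the SMP problem shows up: with several CPUs, the state a task wants back may be live in another CPU's registers.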

    Also, a lot of the cost is really the save, and before SSE2 the fnsave would clear the FPU state, so you couldn't just do a save and try to elide just the restore in the lazy case. In SSE2 (with fxsave) we _could_ try to do that, but the thing is, I doubt it really helps.

    First off, 99% of all programs don't hit the nasty case at all, and for something broken like volanomark that _does_ hit it, I bet that there is more than one thread using the FP, so you can't just cache the FP state in the CPU _anyway_.

    So we could enhance the current state by having a "nonlazy" mode like in the example patch, except we'd have to make it a dynamic flag. Which could either be done by explicitly marking binaries we want to be non-lazy, or by just dynamically noticing that the rate of FP restores is very high.
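
Editorial sketch (not part of the original message): one way to read "dynamically noticing that the rate of FP restores is very high" is a per-task counter of consecutive timeslices in which the task needed its FP state. The fragment below is a toy illustration of that idea only. The struct, the function names and the threshold of 5 are assumptions made for this example; the message does not specify any of them.

```c
/* Hypothetical cut-off: the message does not give a number. */
#define SKETCH_EAGER_THRESHOLD 5

struct toy_task {
    unsigned fp_streak;  /* consecutive timeslices in which FP was used */
    int used_fp;         /* set when the task uses FP in the current slice */
};

/* Bookkeeping at the end of a timeslice: extend the streak if the task
 * used FP during the slice, otherwise reset it. */
void toy_end_timeslice(struct toy_task *t)
{
    if (t->used_fp)
        t->fp_streak++;
    else
        t->fp_streak = 0;
    t->used_fp = 0;
}

/* Decision at switch-in time: a task with a long enough streak gets its
 * FP state restored immediately ("nonlazy"), skipping the trap that the
 * lazy path would otherwise take on its first FP instruction. */
int toy_should_restore_eagerly(const struct toy_task *t)
{
    return t->fp_streak > SKETCH_EAGER_THRESHOLD;
}
```

A task that uses FP in every slice crosses the threshold after six slices in this sketch, while a single FP-free slice drops it back to the lazy path, so programs that rarely touch the FPU keep paying nothing.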

    Does anybody really care about volanomark? Quite frankly, I think you'd see a _lot_ more performance improvement if you could instead teach the Java stuff not to use FP all the time, so it feels a bit like papering over the _real_ bug if we'd try to optimize this abnormal and silly case in the kernel.

    Linus
