Saving the floating-point state (Linus Torvalds)
1/7/2015 Saving the floating point state (Linus Torvalds)
http://yarchive.net/comp/linux/fp_state_save.html 1/3
Newsgroups: fa.linux.kernel
From: [email protected] (Linus Torvalds)
Subject: Re: context switch vs. signal delivery [was: Re: Accelerating user mode linux]
Original-Message-ID:
Date: Mon, 5 Aug 2002 05:36:20 GMT
Message-ID:

In article, Andi Kleen wrote:
> Ingo Molnar writes:
>
>> actually the opposite is true, on a 2.2 GHz P4:
>>
>>   $ ./lat_sig catch
>>   Signal handler overhead: 3.091 microseconds
>>
>>   $ ./lat_ctx -s 0 2
>>   2 0.90
>>
>> ie. *process to process* context switches are 3.4 times faster than signal
>> delivery. Ie. we can switch to a helper thread and back, and still be
>> faster than a *single* signal.
>
> This is because the signal save/restore does a lot of unnecessary stuff.
> One optimization I implemented at one time was adding a SA_NOFP signal
> bit that told the kernel that the signal handler did not intend
> to modify floating point state (few signal handlers need FP) It would
> not save the FPU state then and reached quite some speedup in signal
> latency.
>
> Linux got a lot slower in signal delivery when the SSE2 support was
> added. That got this speed back.

This will break _horribly_ when (if) glibc starts using SSE2 for things like memcpy() etc.

I agree that it is really sad that we have to save/restore FP on signals, but I think it's unavoidable. Your hack may work for you, but it just gets really dangerous in general. Having signals randomly subtly corrupt some SSE2 state just because the signal handler uses something like memcpy (without even realizing that that could lead to trouble) is bad, bad, bad.

In other words, "not intending to" does not imply "will not". It's just potentially too easy to change SSE2 state by mistake.

And yes, this signal handler thing is clearly visible on benchmarks. MUCH too clearly visible. I just didn't see any safe alternatives (and I still don't ;()

Linus
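
The guarantee being defended above can be observed from user space. The following is a minimal sketch, assuming a POSIX system with setitimer(); the function and variable names are made up for illustration. A handler that calls memcpy() and does floating-point arithmetic fires repeatedly while the main code runs an FP loop, and the loop's result is compared with a run that had no signals.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t handler_runs = 0;
static volatile double handler_sink = 0.0;

/* Handler that touches FP state on purpose and calls memcpy(),
 * which a C library is free to implement with SSE registers. */
static void noisy_handler(int signo)
{
    char src[256], dst[256];
    (void)signo;
    memset(src, 0x5a, sizeof src);
    memcpy(dst, src, sizeof dst);
    handler_sink = handler_sink * 1.000001 + (double)dst[0];
    handler_runs++;
}

/* Plain FP loop; its result must not depend on signal arrival. */
static double accumulate(long n)
{
    double sum = 0.0;
    for (long i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Returns 1 if the loop gives the same result with and without
 * signals interrupting it, and at least one signal was handled. */
int fp_state_survives_signals(void)
{
    const long n = 20000000;
    struct itimerval every_ms = { {0, 1000}, {0, 1000} };
    struct itimerval off = { {0, 0}, {0, 0} };
    struct sigaction sa;
    double quiet, noisy;

    quiet = accumulate(n);               /* reference run, no signals */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noisy_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return 0;

    setitimer(ITIMER_REAL, &every_ms, NULL);
    noisy = accumulate(n);               /* same loop, interrupted often */
    setitimer(ITIMER_REAL, &off, NULL);

    return handler_runs > 0 && noisy == quiet;
}
```

Calling fp_state_survives_signals() from main() should return 1 on a kernel that saves and restores FP state around handlers. With a "don't save FP" shortcut of the kind discussed in this thread, the comparison could fail only occasionally, which is exactly the class of bug the message warns about.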
Newsgroups: fa.linux.kernel
From: Linus Torvalds
Subject: Re: context switch vs. signal delivery [was: Re: Accelerating user
Original-Message-ID:
Date: Mon, 5 Aug 2002 16:39:34 GMT
Message-ID:

On Mon, 5 Aug 2002, Jamie Lokier wrote:
> Linus Torvalds wrote:
>> I agree that it is really sad that we have to save/restore FP on
>> signals, but I think it's unavoidable.
>
> Couldn't you mark the FPU as unused for the duration of the
> handler, and let the lazy FPU mechanism save the state when it is used
> by the signal handler?

Nope. Believe me, I gave some thought to clever things to do.

The kernel won't even _see_ a longjmp() out of a signal handler, so the kernel has a really hard time trying to do any clever lazy stuff.
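
To make the longjmp() point concrete, here is a small sketch, assuming a POSIX system; the names bail_out and escape_handler_with_longjmp are invented for this example. The handler leaves via siglongjmp() and never returns, so the normal signal-return path, where the kernel would restore saved state, is never taken.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;

/* Handler that never returns: it jumps straight back into
 * main-line code instead of going through signal return. */
static void bail_out(int signo)
{
    (void)signo;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if control came back via siglongjmp(), 0 otherwise. */
int escape_handler_with_longjmp(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = bail_out;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;

    /* Second argument 1: save the signal mask too, so SIGUSR1
     * is unblocked again after the jump. */
    if (sigsetjmp(recover_point, 1) == 0) {
        raise(SIGUSR1);
        return 0;   /* not reached when the handler jumps */
    }
    return 1;
}
```

From the kernel's side, all that happened is that a signal frame was set up and user space carried on executing. Any scheme that defers FP saving until "the handler is done" has no reliable event to hang that on.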

Also, people who play games with FP actually change the FP data on the stack frame, and depend on signal return to reload it. Admittedly I've only ever seen this on SIGFPE, but anyway, this is all done with integer instructions that just touch bit patterns on the stack.. The kernel can't catch it sanely.

> For sophisticated user space uses, like the above, I'd like to see
> a trap handling mechanism that saves only the _minimum_ state.

I would not mind an extra per-signal flag that says "don't bother with FP saves" (the same way we already have "don't restart" etc), but I would be very nervous if glibc used it by default (even if glibc doesn't use SSE2 in memcpy, gcc itself can do it, and obviously _users_ may just do it themselves).

So it would have to be explicitly enabled with a SA_NOFPSIGHANDLER flag or something.

(And yes, it's the FP stuff that takes most of the time. I think the lmbench numbers for signal delivery tripled when that went in).

Linus
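
For readers who want to see the cost on their own machine, the sketch below times signal delivery in a loop. It is a rough stand-in and not lmbench's lat_sig: it measures raise() plus handler entry and return, and the function name mean_signal_ns is made up here. It assumes a POSIX system with clock_gettime().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t caught = 0;

static void count_handler(int signo)
{
    (void)signo;
    caught++;
}

/* Sends `rounds` signals to the calling process and returns the
 * mean wall-clock time per signal in nanoseconds, or -1.0 on error. */
double mean_signal_ns(int rounds)
{
    struct sigaction sa;
    struct timespec t0, t1;
    double ns;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_handler;
    sigemptyset(&sa.sa_mask);
    if (rounds <= 0 || sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1.0;

    caught = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++)
        raise(SIGUSR1);           /* handler runs before raise() returns */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (caught != rounds)
        return -1.0;

    ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
       + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / rounds;
}
```

Absolute numbers depend heavily on the CPU and kernel version, so the result is only meaningful when compared against another measurement taken on the same machine, such as a context-switch benchmark.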
Newsgroups: fa.linux.kernel
From: Linus Torvalds
Subject: Re: context switch vs. signal delivery [was: Re: Accelerating user mode
Original-Message-ID:
Date: Mon, 5 Aug 2002 20:24:54 GMT
Message-ID:

On Mon, 5 Aug 2002, Oliver Neukum wrote:
>
>> Also, people who play games with FP actually change the FP data on the
>> stack frame, and depend on signal return to reload it. Admittedly I've
>> only ever seen this on SIGFPE, but anyway this is all done with integer
>> instructions that just touch bit patterns on the stack.. The kernel can't
>> catch it sanely.
>
> Could the fp state be put on its own page and the dirty bit
> evaluated in the decision whether to restore fpu state?

I'm sure anything is _possible_, but there are a few problems with that approach. In particular, playing VM games tends to be quite expensive on SMP, since you need to make sure that the TLB entry for that page is invalidated on all the other CPU's before you insert the FPU page.

Also, you'd need to play games with dirty bit handling, since the page _is_ dirty (it contains FP data), so the VM must know to write it out if it pages things. That's ok - we have separate per-page and per-TLB-entry dirty bits anyway, but right now the VM layer knows it can move the TLB entry dirty bit into the per-page dirty bit and drop it - which wouldn't be the case if we also have a FPU dirty bit.

That's fixable - we could just make a "software TLB dirty bit" that is updated whenever the hardware TLB dirty bit is cleared and moved into the per-page dirty bit.

But the end result sounds rather complicated, especially since all the page table walking necessary for setting this all up is likely to be about as expensive as the thing we're trying to avoid..

Rule of thumb: it almost never pays to be "clever".

Linus
Newsgroups: fa.linux.kernel
From: Linus Torvalds
Subject: Re: context switch vs. signal delivery [was: Re: Accelerating user
Original-Message-ID:
Date: Mon, 5 Aug 2002 16:22:27 GMT
Message-ID:

On 5 Aug 2002, Andi Kleen wrote:
>
> I think the possibility at least for memcpy is rather remote. Any sane
> SSE memcpy would only kick in for really big arguments (for small
> memcpys it doesn't make any sense at all because of the context save/possible
> reformatting penalty overhead). So only people doing really
> big memcpys could be possibly hurt, and that is rather unlikely.

And this is why the kernel _has_ to save the FP state.

It's the "only happens in a blue moon" bugs that are the absolute _worst_ bugs. I want to optimize the kernel until I'm blue in the face, but the kernel must NEVER EVER have a "non-stable" interface.

Signal handlers that don't restore state are hard as _hell_ to debug. Most of the time it doesn't really matter (unless the lack of restore is something really major like one of the most common integer registers), but then depending on what libraries you use, and just _exactly_ when the signal comes in, you get subtle data corruption that may not show up until much later.

At which point your programmer wonders if he mistakenly wandered into MS-Windows land.

No thank you. I'll take slow signal handlers over ones that _sometimes_ don't work.

> After all Linux should give you enough rope to shoot yourself in the foot ;)

On purpose, yes. It's ok to take careful aim, and say "I'm now shooting myself in the foot".

And yes, it's also ok to say "I don't know what I'm doing, so I may be shooting myself in the foot" (this is obviously the most common foot-shooter).

And if you come to me and complain about how drunk you were, and how you shot yourself in the foot by mistake due to that, I'll just ignore you.

BUT - and this is a big BUT - if you are doing everything right, and you actually know what you're doing, and you end up shooting yourself in the foot because the kernel was taking a shortcut, then I think the kernel is _wrong_.

And I'd rather have a slow kernel that does things right, than a fast kernel which screws with people.

> In theory you could do a superhack: put the FP context into an unmapped
> page on the stack and only save with lazy FPU or access to the unmapped
> page.

That would be extremely interesting especially with signal handlers that do a longjmp() thing.

The real fix for a lot of programs on x86 would be for them to never ever use FP in the first place, in which case the kernel would be able to just not save and restore it at all.

However, glibc fiddles with the fpu at startup, even for non-FP programs. Dunno what to do about that.

Linus
From: Linus Torvalds
Newsgroups: fa.linux.kernel
Subject: Re: [patch 2.6.13-rc3a] i386: inline restore_fpu
Date: Tue, 26 Jul 2005 21:53:46 UTC
Message-ID:
Original-Message-ID:

On Tue, 26 Jul 2005, Chuck Ebbert wrote:
>
> Since fxsave leaves the FPU state intact, there ought to be a better way to do
> this but it gets tricky. Maybe using the TSC to put a timestamp in every thread
> save area?

We used to have totally lazy FP saving, and not touch the FP state at _all_ in the scheduler except to just set the TS bit.

It worked wonderfully well on UP, but getting it working on SMP is a major pain, since the lazy state you want to switch back into might be cached on some other CPU's registers, so we never did it on SMP. Eventually it got too painful to maintain two totally different logical code-paths between UP and SMP, and some bug or other ended up resulting in the current "lazy on a time slice level" thing which works well in SMP too.

Also, a lot of the cost is really the save, and before SSE2 the fnsave would clear the FPU state, so you couldn't just do a save and try to elide just the restore in the lazy case. In SSE2 (with fxsave) we _could_ try to do that, but the thing is, I doubt it really helps.
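
The "totally lazy" uniprocessor scheme described above can be modelled in a few lines of ordinary C. This is a toy user-space simulation and not kernel code: the structs, the single double standing in for the FP register file, and the function names are all assumptions made for illustration. The `ts` field plays the role of the TS bit, and the branch inside fp_add() plays the role of the trap taken on the first FP instruction after a switch.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    int id;
    double fp_regs;          /* saved FP context of this task */
};

struct cpu {
    struct task *current;    /* task that is scheduled right now */
    struct task *fpu_owner;  /* task whose state is live in the FPU */
    double fp_live;          /* stand-in for the hardware FP registers */
    int ts;                  /* models the "task switched" trap bit */
    int saves, restores;     /* bookkeeping for the demonstration */
};

/* Context switch: leave FP state alone, only arm the trap. */
void lazy_switch_to(struct cpu *c, struct task *next)
{
    c->current = next;
    c->ts = 1;
}

/* One FP instruction in the current task: add x to its accumulator. */
void fp_add(struct cpu *c, double x)
{
    if (c->ts) {                              /* trap on first FP use */
        c->ts = 0;
        if (c->fpu_owner != c->current) {
            if (c->fpu_owner != NULL) {       /* save previous owner */
                c->fpu_owner->fp_regs = c->fp_live;
                c->saves++;
            }
            c->fp_live = c->current->fp_regs; /* load our own state */
            c->restores++;
            c->fpu_owner = c->current;
        }
    }
    c->fp_live += x;
}
```

If task A uses FP, the CPU switches to a task that never touches FP, and then back to A, the model performs no save and no extra restore, which is the attraction of the scheme. The model has exactly one `struct cpu`; with several, A's live state could be sitting in a different CPU's `fp_live` when A is scheduled elsewhere, which is the SMP problem the message describes.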

First off, 99% of all programs don't hit the nasty case at all, and for something broken like volanomark that _does_ hit it, I bet that there is more than one thread using the FP, so you can't just cache the FP state in the CPU _anyway_.

So we could enhance the current state by having a "nonlazy" mode like in the example patch, except we'd have to make it a dynamic flag. Which could either be done by explicitly marking binaries we want to be non-lazy, or by just dynamically noticing that the rate of FP restores is very high.
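
One way to picture "dynamically noticing that the rate of FP restores is very high" is a per-task counter evaluated over a fixed window of timeslices. The sketch below is purely hypothetical: the struct, the function name, and the WINDOW and THRESHOLD values are invented here and are not taken from the patch under discussion or from any kernel.

```c
#include <assert.h>

/* Hypothetical tuning values, chosen only for the example. */
enum { WINDOW = 16, THRESHOLD = 12 };

struct fp_policy {
    unsigned switches;     /* timeslices seen in the current window */
    unsigned fp_restores;  /* of those, how many needed an FP restore */
    int nonlazy;           /* 1: restore FP state eagerly at switch time */
};

/* Account for one timeslice of a task; used_fp is nonzero if the task
 * touched FP during it. The mode is re-evaluated once per window. */
void fp_policy_account(struct fp_policy *p, int used_fp)
{
    p->switches++;
    if (used_fp)
        p->fp_restores++;

    if (p->switches == WINDOW) {
        p->nonlazy = (p->fp_restores >= THRESHOLD);
        p->switches = 0;
        p->fp_restores = 0;
    }
}
```

A task that uses FP in nearly every timeslice flips to the eager mode after one window and pays no trap per switch; a task that stops using FP drops back to lazy mode one window later.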

Does anybody really care about volanomark? Quite frankly, I think you'd see a _lot_ more performance improvement if you could instead teach the Java stuff not to use FP all the time, so it feels a bit like papering over the _real_ bug if we'd try to optimize this abnormal and silly case in the kernel.

Linus