arm assembly language: programming and architecture
TRANSCRIPT
ARMAssemblyLanguageProgramming&Architecture
MuhammadAliMazidi
SarmadNaimiSepehrNaimiJaniceMazidi
Copyright©2013MazidisandNaimis
Allrightsreserved
(Aportionofthisbookwastakenfrom“The80x86IBMPC&CompatibleComputersVol1&2:4thedition”andwaspreviouslypublishedbyPearsonEducation,
Inc.)
“Regardmanasaminerichingemsofinestimablevalue.
Educationcan,alone,causeittorevealitstreasures,andenablemankindtobenefittherefrom.”
Baha’u’llah
DedicationTothefaculty,staff,andstudentsofBIHEuniversityfortheirdedicationand
steadfastness.
PrefaceTheARMprocessor is becoming the dominantCPUarchitecture in the computer
industry. It is already the leadingarchitecture incellphonesand tablet computers.WithsuchalargenumberofcompaniesproducingARMchips,itiscertainthatthearchitecturewillmovetothelaptop,desktopandhigh-performancecomputerscurrentlydominatedbyx86 architecture from Intel and AMD. Currently the PIC and AVR microcontrollersdominate the 8-bit microcontroller market. The ARM architecture will have a majorimpactinthisareatooasdesignersbecomemorefamiliarwithitsarchitecture.ThisbookisintendedasanintroductiontoARMassemblylanguageprogrammingandarchitecture.We assumeno prior background in assembly language programmingwith otherCPUs.However,weurgeyou to studyChapter0covering the fundamentalsofdigital systemssuchashexadecimalnumbers,varioustypesofmemory,memoryandI/Ointerfacing,andmemory address decoding. Chapter 0 is available free of charge on our websitehttp://www.MicroDigitalEd.com/ARM/ARM_books.htm
UniversitiesandcollegesThis book is intended for both academic and industry readers. The answers to
reviewquestionsatendofeachsectionareprovidedatendofthechapter.Ifyouareusingthisbookforauniversitycoursethereareend-of-chapterproblemsthatcanbefoundonwww.MicroDigitalEd.com/ARM/ARM_books.htm
OurupcomingbooksintheARMseriesThisbookcoverstheAssemblylanguageprogrammingoftheARMchip.TheARM
Assemblylanguageisstandardregardlessofwhomakesthechip.TheARMlicenseesarefree to implement theon-chipperipheral (ADC,Timers, I/O,…)as theychoose.SincetheARMperipherals arenot standard among thevariousvendors,wehavededicated aseparatebooktoeachvendor.Thefollowingbooksareplannedinthisseries:
TIARMPeripheralProgrammingandInterfacing
FreescaleARMPeripheralProgrammingandInterfacing
NXPARMPeripheralProgrammingandInterfacing
AtmelARMPeripheralProgrammingandInterfacing
TheabovebooksuseClanguagetoprogramtheperipherals.
Finally,wewould like to thankprofessorsShujenChen,RabahAoufi, andClydeKnight for their reading of the book before publication. We sincerely appreciate theirinsightsandinputs.WewouldalsoliketothankFadaMahmoudi,MozhdeAmiri,KeyvanRoshani,AzaliaYaghini,ArashZamani,ParhamFazlali,andAshkanFarivarforsharingtheircommentsontheprepublicationversionofthise-book.
Contactusatthefollowingemailaddress:
TableofContentsChapter1:TheHistoryofARMandMicrocontrollers
Section1.1:IntroductiontoMicrocontrollers
Section1.2:TheARMFamilyHistory
Problems
AnswerstoReviewQuestions
Chapter2:ARMArchitectureandAssemblyLanguageProgramming
Section2.1:TheGeneralPurposeRegistersintheARM
Section2.2:TheARMMemoryMap
Section2.3:LoadandStoreInstructionsinARM
Section2.4:ARMCPSR(CurrentProgramStatusRegister)
Section2.5:ARMDataFormatandDirectives
Section2.6:IntroductiontoARMAssemblyProgramming
Section2.7:AssemblinganARMProgram
Section2.8:TheProgramCounterandProgramROMSpaceintheARM
Section2.9:SomeARMAddressingModes
Section2.10:RISCArchitectureinARM
Section2.11:ViewingRegistersandMemorywithARMKeilIDE
Problems
AnswerstoReviewQuestions
Chapter3:ArithmeticandLogicInstructionsandPrograms
Section3.1:ArithmeticInstructions
Section3.2:LogicInstructions
Section3.3:RotateandBarrelShifter
Section3.4:ShiftandRotateInstructionsinARMCortex(CaseStudy)
Section3.5:BCDandASCIIConversion
Problems
AnswerstoReviewQuestions
Chapter4:Branch,Call,andLoopinginARM
Section4.1:LoopingandBranchInstructions
Section4.2:CallingSubroutinewithBL
Section4.3:ARMTimeDelayandInstructionPipeline
Section4.4:ConditionalExecution
Problems
AnswerstoReviewQuestions
Chapter5:SignedNumbersandIEEE754FloatingPoint
Section5.1:SignedNumbersConcept
Section5.2:SignedNumberInstructionsandOperations
Section5.3:IEEE754Floating-PointStandards
Problems:
AnswerstoReviewQuestions
Chapter6:ARMMemoryMap,MemoryAccess,andStack
Section6.1:ARMMemoryMapandMemoryAccess
Section6.2:StackandStackUsageinARM
Section6.3:ARMBit-AddressableMemoryRegion
Section6.4:AdvancedIndexedAddressingMode
Section6.5:ADR,LDR,andPCRelativeAddressing
Problems
AnswerstoReviewQuestions
Chapter7:ARMPipelineandCPUEvolution
Section7.1:ARMPipelineEvolution
Section7.2:OtherCPUEnhancements
Problems
AnswerstoReviewQuestions
AppendixA:ARMCortex-M3InstructionDescription
SectionA.1:ListofARMCortex-M3Instructions
SectionA.2:ARMCortex-M3InstructionDescription
AppendixB:ARMAssemblerDirectives
SectionB.1:ListofARMAssemblerDirectives
SectionB.2:DescriptionofARMAssemblerDirectives
AppendixC:Macros
Whatisamacroandhowisitused?
Macrosvs.subroutines
AppendixD:FlowchartsandPseudocode
Flowcharts
Pseudocode
AppendixE:PassingArgumentsintoFunctions
E.1:Passingargumentsthroughregisters
E.2:Passingthroughmemoryusingreferences
E.3:Passingargumentsthroughstack
E.4:AAPCS(ARMApplicationProcedureCallStandard)
AppendixF:ASCIICodes
Chapter1:TheHistoryofARMandMicrocontrollersInSection1.1welookatthehistoryofmicrocontrollersthenweintroducesomeof
theavailablemicrocontrollers.ThehistoryofARMisprovidedinSection1.2.
Section1.1:IntroductiontoMicrocontrollersTheevolutionofMicroprocessorsandMicrocontrollers
In early computers, CPUs were designed using a number of vacuum tubes. Thevacuum tubewas bulky and consumed a lot of electricity. The invention of transistors,followed by the IC (Integrated Circuit), provided the means to put a CPU on printedcircuitboards.TheadvancesinICtechnologyallowedputtingtheentireCPUonasingleICchip.This ICwascalledamicroprocessor.Someof themicroprocessorsare thex86family of Intel used widely in desktop computers, and the 68000 of Motorola. ThemicroprocessorsdonotcontainRAM,ROM,orI/Operipherals.Asaresult,theymustbeconnectedexternallytoRAM,ROMandI/O,asshowninFigure1-1.
Figure1-1:AComputerMadebyGeneralPurposeMicroprocessor
In thenextstep, thedifferentpartsofasystem, includingCPU,RAM,ROM,andI/Os, were put together on a single IC chip and it was called microcontroller. SOC(System on Chip) andMCU (Micro Controller Unit) are other names used to refer tomicrocontrollers. Figure 1-2 shows the simplified view of the internal parts ofmicrocontrollers.
Figure1-2:SimplifiedViewoftheInternalPartsofMicrocontrollers(SOC)
Since the microcontrollers are cheap and small, they are widely used in manydevices.
TypesofComputers
Typically,computersarecategorizedinto3groups:desktopcomputers,servers,andembeddedsystems.
Desktop computers, including PCs, tablets, and laptops, are general purposecomputers.Theycanbeusedtoplaygames,readandeditarticles,anddoanyothertaskjust by running the proper application programs. The desktop computers usemicroprocessors.
Incontrast,embeddedsystemsarespecial-purposecomputers.Inembeddedsystemdevices,thesoftwareapplicationandhardwareareembeddedtogetherandaredesignedtodoaspecifictask.Forexample,theKindle,digitalcamera,vacuumcleaner,mp3player,mouse, keyboard, and printer, are some examples of embedded systems. Inmost casesembeddedsystemsrunafixedprogramandcontainamicrocontroller.Itisinterestingtonote that embedded systems are the largest class of computers though they are notnormallyconsideredascomputersbythegeneralpublic.
Serversarethefastcomputerswhichmightbeusedaswebhosts,databaseservers,andinanyapplicationinwhichweneedtoprocessahugeamountofdatasuchasweatherforecasting. Similar to desktop computers, servers are made of microprocessors but,multipleprocessorsareusuallyusedineachserver.Bothserversanddesktopcomputersare connected to a number of embedded systemdevices such asmouse, keyboard, diskcontroller,Flashstickmemoryandsoon.
ABriefHistoryoftheMicrocontrollers
Inthe1980sand1990s,IntelandMotoroladominatedthefieldofmicroprocessorsandmicrocontrollers. Intel had the x86 (8088/86, 80286, 80386, 80486, and Pentium).Motorola (nowFreescale) had the 68xxx (68000, 68010, 68020, etc.).Many embeddedsystemsusedIntel’s32-bitchipsofx86(386,486,Pentium)andMotorola’s32-bit68xxxforhigh-endembeddedproductssuchasrouters.Forexample,Ciscoroutersused68xxxfor theCPU.At the lowend, the8051 fromInteland68HC11fromMotorolawere thedominant8-bitmicrocontrollers.With the introductionofPICfromMicrochipandAVRfromAtmel, they becamemajor players in the 8-bitmarket formicrocontroller. At thetime of this writing, PIC and AVR are the leaders in terms of volume for 8-bitmicrocontrollers. In the late 1990s, the ARM microcontroller started to challenge thedominanceofIntelandMotorolainthe32-bitmarket.AlthoughbothIntelandMotorolausedRISCfeaturestoenhancetheperformanceoftheirmicroprocessors,duetotheneedtomaintain compatibilitywith legacy software, they could notmake a clean break andstart over. Intel used massive amounts of gates to keep up the performance of x86architecture and that in turn increased the power consumption of the x86 to a levelunacceptable for battery-powered embedded products.Meanwhile Freescale (Motorola)streamlinedtheinstructionsofthe68xxxCPUandcreatedanewlineofmicroprocessorscalledColdFire,whileatthesametimeworkedwithIBMtodesignanewRISCprocessorcalledPowerPC.WhilebothPowerPCandColdfireare still aliveandbeingused in the32-bit market, it is ARM which has become the leading microcontroller in the 32-bitmarket.
CurrentlyAvailableMicrocontrollers
Therearemanymicrocontrollersavailableinthemarket.SomeofthemarelistedinTable1-1.
32-bit
ARM, AVR32 (Atmel), ColdFire (Freescale), MIPS32, PIC32 (Microchip), PowerPC, TriCore(Infineon),SuperH
16-bit
MSP430(TI),HCS12(Freescale),PIC24(Microchip),dsPIC(Microchip)
8-bit
8051,AVR(Atmel),HCS08(Freescale),PIC16,PIC18
Table1-1:SomeMicrocontrollers
Introductiontosome32-bitmicrocontrollers
x86: The x86 and Pentium processors are based on the 32-bit architecture of the386.AlthoughbothIntelandAMDarepushingthex86intotheembeddedmarket,duetothehighpowerconsumptionof thesechips, theembeddedmarkethasnotembraced thex86.Intelisworkinghardtomakealow-powerversionofthe386calledAtomavailablefortheembeddedmarket.
PIC32:ItisbasedontheMIPSarchitectureandisgettingsomeattentionduetothefact it shares some of the peripherals with the PIC24/PIC18 chips and also using theMPLABfor IDE. Microchiphopes the freeMPLABIDEandengineers’knowledgeofthe 8-bit PIC will attract embedded developers to the PIC32 as they move to 32-bitsystemsfortheirhighendembeddedproducts.
ColdFire: The Freescale (formerly Motorola) is based on the venerable 680x0(68000, 68010) so popular in the 1980s and 1990s. They streamlined the 68000instructions to make it more RISC-type architecture and is the top seller of 32-bitprocessorsfromtheFreescale.InrecentyearsFreescalerevampedandredesignedthe8-bitHCS08(fromthe6808)tosharesomeoftheperipheralswithColdFireandarepushingthemunderthenameFlexis.TheyhopeengineersusetheHCS08atthelow-endandmovetoColdfireforhigh-endoftheembeddedproductswithminimumlearningcurve.
PowerPC: Thiswas developed jointly by IBM and Freescale. Itwas used in theAppleMacforafewyears.ThenAppleswitchedtox86forawhileandcurrentlyisusingARMinall theirproducts.Nowadays,bothFreescaleandIBMmarket thePowerPCforthehigh-endoftheembeddedsystems.
Howtochooseamicrocontroller
Thefollowingtwofactorscanbeimportantinchoosingamicrocontroller:
· Chipcharacteristics:Someofthefactorsinchoosingamicrocontrollerchipareclockspeed,powerconsumption,price,andon-chipmemoriesandperipherals.
· Availableresources:OtherfactorsinchoosingamicrocontrollerincludetheIDEcompiler,legacysoftware,andmultiplesourcesofproduction.
ReviewQuestions
1.Trueorfalse.Microcontrollersarenormallylessexpensivethanmicroprocessors.
2. Whencomparingasystemboardbasedonamicrocontrollerandageneral-purposemicroprocessor,whichoneischeaper?
3.Amicrocontrollernormallyhaswhichofthefollowingdeviceson-chip?
(a)RAM(b)ROM(c)I/O(d)alloftheabove
4.Ageneral-purposemicroprocessornormallyneedswhichofthefollowingdevicestobeattachedtoit?
(a)RAM(b)ROM(c)I/O(d)alloftheabove
5.Anembeddedsystemisalsocalledadedicatedsystem.Why?
6.Whatdoesthetermembeddedsystemmean?
7.Whydoeshavingmultiplesourcesofagivenproductmatter?
Section1.2:TheARMFamilyHistoryInthissection,welookattheARManditshistory.
AbriefhistoryoftheARM
TheARMcameoutofacompanycalledAcornComputers inUnitedKingdominthe1980s.ProfessorSteveFurberofManchesterUniversityworkedwithSophieWilsontodefine theARMarchitecture and instructions.TheVLSITechnologyCorp.producedthe first ARM chip in 1985 for Acorn Computers and was designated as Acorn RISCMachine(ARM).Unable tocompetewithx86(8088,80286,80386,…)PCsfromIBMandotherpersonalcomputermakers,theAcornwasforcedtopushtheARMchipintothesingle-chipmicrocontrollermarketforembeddedproducts.ThatiswhenAppleCorp.gotinterestedinusingtheARMchipforthePDA(personaldigitalassistants)products.ThisrenewedinterestinthechipledtothecreationofanewcompanycalledARM(AdvancedRISCMachine).ThisnewcompanybetitsentirefortuneonsellingtherightstothisnewCPU to other siliconmanufacturers and design houses. Since the early 1990s, an everincreasingnumberofcompanieshavelicensedtherighttomaketheARMchip.SeeTable1-2forthemajormilestonesoftheARM.
Alsoseehttp://www.arm.com/about/company-profile/milestones.phpforthelist.
Table1-2:ARMCompanymilestones(www.ARM.com)
1982
AcornproducedacomputerforBBCnamedBBCmicro.GoodsalesofthecomputermotivatedAcorntodecidetomakeitsownmicroprocessor.
1983
AcornandVLSIbegandesigningtheARMmicroprocessor.
1985
AcornComputerGroupdevelopedtheworld’sfirstcommercialRISCprocessor.TheARMv1had2500transistors,andworkedwithafrequencyof4MHz.
1987
Acorn’sARMprocessordebutsasthefirstRISCprocessorforlow-costPCs
1989
AcornintroducedARMv3withafrequencyof25MHz.Ithada4KBcacheaswell.
1990
AdvancedRISCMachines(ARM)spinsoutofAcornandAppleComputer’scollaborationeffortswithachartertocreateanewmicroprocessorstandard.VLSITechnologybecomesaninvestorandthefirstlicensee.
1991
ARMintroduceditsfirstembeddableRISCcore,theARM6solutionusingARMv3architecture.
1992
GECPlesseyandSharplicensedARMtechnology
1993
TexasInstrumentslicensedARMtechnology
ARMintroducedtheARM7core.
1995
ARMannouncedtheThumbarchitectureextension,whichgives32-bitRISCperformanceat16-bitsystemcostandoffersindustry-leadingcodedensity
ARMlaunchedSoftwareDevelopmentToolkit
1996
ARMandVLSITechnologyintroducedtheARM810microprocessor
ARMandMicrosoftworkedtogethertoextendWindowsCEtotheARMarchitecture
1997
Hyundai,Lucent,Philips,RockwellandSonylicensedARMtechnology
ARM9TDMIfamilyannounced
1998
HP,IBM,Matsushita,SeikoEpsonandQualcommlicensedARMtechnology
ARMdevelopedsynthesizableversionoftheARM7TDMIcore
ARMPartnersshippedmorethan50millionARM-poweredproducts
1999
LSILogic,STMicroelectronicsandFujitsulicensedARMtechnology
ARMannouncedsynthesizableARM9Eprocessorwithenhancedsignalprocessing
2000
Agilent,Altera,Micronas,Mitsubishi,Motorola,Sanyo,TriscendandZTEIClicensedARMtechnology
ARMlaunchedSecurCorefamilyforsmartcards
TSMCandUMCbecamemembersofARMFoundryProgram
2001
ARM’sshareofthe32-bitembeddedRISCmicroprocessormarketgrewto76.8percent
ARMannouncednewARMv6architecture
Fujitsu,GlobalUniChip,SamsungandZeevolicensedARMtechnology
ARMacquiredkeytechnologiesandanembeddeddebugdesignteamfromNoralMicrologicsLtd
2002
ARMannouncedthatithadshippedoveronebillionofitsmicroprocessorcorestodate
ARMtechnologylicensedtoSeagate,Broadcom,Philips,Matsushita,Micrel,eSilicon,ChipExpressandITRI
ARMlaunchedtheARM11micro-architecture
ARMlaunchesitsRealViewfamilyofdevelopmenttools
FlextronicsbecamethefirstARMLicensingPartnerprogrammember,allowingittosub-licenseARMtechnologytoitsowncustomers
2004
TheARMCortexfamilyofprocessors,basedontheARMv7architecture,isannounced.TheARMCortex-M3isannouncedinconjunction,asthefirstofthenewfamilyofprocessors
ARMCortex-M3processorannounced,thefirstofanewCortexfamilyofprocessorcores
MPCoremultiprocessorlaunched,thefirstintegratedmultiprocessor
OptimoDEtechnologylaunched,thegroundbreakingembeddedsignalprocessingcore
2005
ARMacquiredKeilSoftware
ARMCortex-A8processorannounced
2007
FivebillionthARMPoweredprocessorshippedtothemobiledevicemarket
ARMCortex-M1processorlaunched–thefirstARMprocessordesignedspecificallyforimplementationonFPGAs
RealViewProfilerforEmbeddedSoftwareAnalysisintroduced
ARMunveilsCortex-A9processorsforscalableperformanceandlow-powerdesigns
2008
ARMannounces10billionthprocessorshipment
ARMMali-200GPUWorldsFirsttoachieveKhronosOpenGLES2.0conformanceat1080pHDTVresolution
2009
ARMannounces2GHzcapableCortex-A9dualcoreprocessorimplementation
ARMlaunchesitssmallest,lowestpower,mostenergyefficientprocessor,Cortex-M0
2010
ARMlaunchesCortex-M4processorforhighperformancedigitalsignalcontrol
ARMtogetherwithkeyPartnersformLinarotospeedrolloutofLinux-baseddevices
MicrosoftbecomesanARMArchitectureLicensee
ARM&TSMCsignlong-termagreementtoachieveoptimizedSystems-on-ChipbasedonARMprocessors,extendingdownto20nm
ARMextendsperformancerangeofprocessorofferingwiththeCortex-A15MPCoreprocessor
ARMMalibecomesthemostwidelylicensedembeddedGPUarchitecture
ARMMali-T604GraphicsProcessingUnitintroducedprovidingindustry-leadinggraphicsperformancewithanenergy-efficientprofile
2011
MicrosoftunveilsWindowsonARMatCES2011
IBMandARMcollaboratetoprovidecomprehensivedesignplatformsdownto14nm
ARMandUMCextendpartnershipinto28nm
Cortex-A7processorlaunched
Big-Littleprocessingannounced,linkingCortex-A15andCortex-A7processors
ARMv8architectureunveiledatTechCon
AMPannouncelicenseandplansforfirstARMv8-basedprocessor
ARMMali-T658GPUlaunched
ARMexpandsR&DpresenceinTaiwanwithHsinchuDesignCenter
ARMandAvnetlaunchEmbeddedSoftwareStore(ESS)
ARM,CadenceandTSMCtapeoutfirst20nmCortex-A15multicoreprocessor
2012
ARM,GemaltoandG&Dformjointventuretodelivernext-generationmobilesecurity
FirstWindowsRT(WindowsonARM)devicesrevealed
ARM,AMD,Imagination,MediaTekandTexasInstrumentsfoundingmembersofHeterogeneousSystemArchitecture(HAS)Foundation
ARMandTSMCworktogetheronFinFETprocesstechnologyfornext-generation64-bitARMprocessors
ARMformsfirstUKforumtocreatetechnologyblueprint“InternetofThings”devices
ARMnamedoneofBritain’sTopEmployers
MITTechnologyReviewnamedARMinitslistof50MostInnovativeCompanies
Currently theARMCorp. receives its entire revenue from licensing theARM toother companies since it does not own state of the art chip fabrication facility. ThisbusinessmodelofmakingmoneyfromsellingIP(intellectualproperty)hasmadeARMoneofthemostwidelyusedCPUarchitecturesintheworld.UnlikeIntelorFreescalewhodefinethearchitectureandfabricate thechip,hundredsofcompanieswhohavelicensedtheARMIPfeelalevelplayingfieldwhenitcomestocompetingwiththeoriginatorofthechip.
ARMandApple
WhenSteveJobscamebacktoruntheApplein1996,thecompanywasindecline.Ithadlostthepersonalcomputerracethathadstarted20yearsearlier.TheintroductionofiPod in 2001 changed the fortune of that companymore than anything else.Apple hadtriedtosellaPDAcalledNewtoninthe1990sbutwasnotsuccessful.TheNewtonwasusing theARMprocessor and itwas too early for its time.The iPodused an enhancedversionofARMcalledARM7andbecameaninstantsuccess.iPodbroughttheattentionto the ARM chip that it deserved. Since then Apple has been using the ARM chip iniPhonesand iPads.Today, theARMmicrocontroller is theCPUofchoice fordesigningcell phone and other hand-held devices. In the future,ARMwillmake further in-roadsinto the tablet and laptopPCmarketnow thatMicrosoftCorphas introduced theARMversionofitsWindowsoperatingsystem.
ARMfamilyvariations
AlthoughtheARM7familyisthemostwidelyusedversion,ARMisdeterminedtopushthearchitectureintothelowendofthemicrocontrollermarketwhere8-and16-bitmicrocontrollershavebeentraditionallydominating. ForthisreasontheyhavecomeupwithamicrocontrollerversionofARMcalledCortex.Aswewillseeinfuturechapters,the Cortex family of ARM microcontrollers maintains compatibility with the ARM7without sacrificing performance.TheARMarchitecture is also being pushed into high-performancesystemswheremulticorechipssuchasIntelXeondominate.
Figure 1-3 shows some of the most widely used ARM processors. It should beemphasized that we cannot use the terms ARM family and ARM architectureinterchangeably. For example, ARM11 family is based on ARMv6 architecture and
ARMv7AisthearchitectureofCortex-Afamily.
Figure1-3:ARMFamilyandArchitecture
OneCPU,manyperipherals
ARMhasdefinedthedetailsofarchitecture,registers,instructionset,memorymap,andtimingoftheARMCPUandholdsthecopyrighttoit.Thevariousdesignhousesandsemiconductormanufacturers license the IP (intellectual property) for theCPUand canadd their own peripherals as they please. It is up to the licensee (design houses andsemiconductormanufactures)todefinethedetailsofperipheralssuchasI/Oports,serialportUART,timer,ADC,SPI,DAC,I2C,andsoon.AsaresultwhiletheCPUinstructionsand architecture are same across all the ARM chips made by different vendors, theirperipheralsarenotcompatible.ThatmeansifyouwriteaprogramfortheserialportofanARMchipmadebyTI(TexasInstrument), theprogrammightnotnecessarilyrunonanARMchipsoldbyNXP.ThisistheonlydrawbackoftheARMmicrocontroller.ThegoodnewsistheIDE(integrateddevelopmentenvironment)suchasKeil(seewww.keil.com)orIAR(seewww.IAR.com)doprovideperipherallibrariesforchipsfromvariousvendorsandmake the jobofprogrammingtheperipheralsmucheasier. Itmustbenoted that inrecentyearsARMprovidestheIPforsomeperipheralssuchasUARTandSPI,butunliketheCPUarchitecture,itsadoptionisnotmandatoryanditisuptothechipmanufacturerwhether to adopt it or not. This is in contrast to the Coldfire microcontroller fromFreescale,inwhichtheFreescaledefinesthearchitectureandperipherals,fabricates,sells,andsupportsthechip.Figure1-4showstheARMsimplifiedblockdiagramandTable1-3providesalistofsomeARMvendors.
Figure1-4:ARMSimplifiedBlockDiagram
Actel AnalogDevices Atmel
Broadcom Cypress Ember
DustNetworks Energy Freescale
Fujitso Nuvoton NXP
Renesas Samsung ST
Toshiba TexasInstruments TriadSemiconductor
Table1-3:ARMVendors
ReviewQuestions
1.Trueorfalse.TheARMCPUinstructionsareuniversalregardlessofwhomakesthechip.
2.Trueorfalse.TheperipheralsofARMmicrocontrollerarestandardizedregardlessof
whomakesthechip.
3.AnARMmicrocontrollernormallyhaswhichofthefollowingdeviceson-chip?
(a)RAM(b)Timer(c)I/O(d)alloftheabove
4.Forwhichofthefollowings,ARMhasdefinedstandard?
(a)RAMsize(b)ROMsize(c)instructionset(d)alloftheabove
SeethefollowingwebsitesforARMmicrocontrollersandARMtrainers:
http://www.ARM.com
http://www.MicroDigitalEd.com
ProblemsSection1.1:IntroductiontoMicrocontrollers
1.TrueorFalse.Ageneral-purposemicroprocessorhason-chipROM.
2.TrueorFalse.Generally,amicrocontrollerhason-chipROM.
3.TrueorFalse.Amicrocontrollerhason-chipI/Oports.
4.TrueorFalse.AmicrocontrollerhasafixedamountofRAMonthechip.
5.Whatcomponentsareusuallyputtogetherwiththemicrocontrollerontoasinglechip?
6.Intel’sPentiumchipsusedinWindowsPCsneedexternal______and_____chipstostoredataandcode.
7.ListthreeembeddedproductsattachedtoaPC.
8.Whywouldsomeonewanttouseanx86asanembeddedprocessor?
9.Givethenameandthemanufacturerofsomeofthemostwidelyused8-bitmicrocontrollers.
10.InQuestion9,whichonehasthemostmanufacturesources?
11.Inabattery-basedembeddedproduct,whatisthemostimportantfactorinchoosingamicrocontroller?
12.Inanembeddedcontrollerwithon-chipROM,whydoesthesizeoftheROMmatter?
13.Inchoosingamicrocontroller,howimportantisittohavemultiplesourcesforthatchip?
14.Whatdoestheterm“third-partysupport”mean?
Section1.2:TheARMFamilyHistory
15.WhatdoesARMstandfor?
16.Trueorfalse.InARM,architectureshavethesamenamesasfamilies.
17.Trueorfalse.In1990s,ARMwaswidelyusedinmicroprocessorworld.
18.Trueorfalse.ARMiswidelyusedinAppleproducts,likeiPhoneandiPod.
19.Trueorfalse.CurrentlytheMicrosoftWindowsdoesnotsupportARMproducts.
20.Trueorfalse.AllARMchipshavestandardinstructions.
21.Trueorfalse.AllARMchipshavestandardperipherals
22.Trueorfalse.TheARMcorp.alsomanufacturestheARMchip.
23.Trueorfalse.TheARMIPmustbelicensedfromARMcorp.
24.Trueorfalse.AgivenserialcommunicationprogramiswrittenforTIARMchip.It
shouldworkwithoutanymodificationonFreescaleARMchip
25.Trueorfalse.AgivenAssemblylanguageprogramiswrittenforagivenfamilyofARMCortexchip.AnyotherCortexARMchipcanexecutetheprogram.
26.Trueorfalse.Atthepresenttime,ARMhasjustonemanufacturer.
27.WhatisthedifferencebetweentheARMproductsofdifferentmanufacturers?
28.Namesome32-bitmicrocontrollers.
29.WhatisIntel’schallengeindecreasingthepowerconsumptionofthex86?
AnswerstoReviewQuestionsSection1.1
1.True
2.Amicrocontroller-basedsystem
3.d
4.d
5.Itisdedicatedbecauseitdoesonlyonetypeofjob.
6.Embeddedsystemmeansthattheapplication(software)andtheprocessor(hardwaresuchasCPUandmemory)areembeddedtogetherintoasinglesystem.
7.Havingmultiplesourcesforagivenpartmeansyouarenothostagetoonesupplier.Moreimportantly,competitionamongsuppliersbringsaboutlowercostforthatproduct.
Section1.2
1.True
2.False
3.d
4.c
Chapter2:ARMArchitectureandAssemblyLanguageProgramming
CPUsuseregisterstostoredatatemporarily.ToprograminAssemblylanguage,wemustunderstand the registersandarchitectureofagivenCPUand the role theyplay inprocessing data. In Section 2.1we look at the general purpose registers (GPRs) of theARM.WedemonstratetheuseofGPRswithsimpleinstructionssuchasMOVandADD.Memory map and memory access of the ARM are discussed in Sections 2.2 and 2.3,respectively. In Section 2.4 we discuss the status register’s flag bits and how they areaffectedbyarithmeticinstructions.InSection2.5welookatsomewidelyusedAssemblylanguagedirectives,pseudo-code,anddata types related to theARM.InSection2.6weexamineAssembly languageandmachine languageprogramminganddefine termssuchas mnemonics, opcode, operand, and so on. The process of assembling and creating aready-to-runprogramfortheARMisdiscussedinSection2.7.Step-by-stepexecutionofan ARM program and the role of the program counter are examined in Section 2.8.Section2.9examinessomeARMaddressingmodes.ThemeritsofRISCarchitectureareexaminedinSection2.10.Section2.11discussestheKeilIDE.
Section2.1:TheGeneralPurposeRegistersintheARMCPUsuseregisterstostoredatatemporarily.ToprograminAssemblylanguage,we
mustunderstand the registersandarchitectureofagivenCPUand the role theyplay inprocessing data. In this sectionwe look at the general purpose registers (GPRs) of theARMandwedemonstrate the use ofGPRswith simple instructions such asMOVandADD.
ARMmicrocontrollershavemany registers for arithmetic and logicoperations. IntheCPU,registersareusedtostoreinformationtemporarily.Thatinformationcouldbeabyteofdatatobeprocessed,oranaddresspointingtothedatatobefetched.AllofARMregistersare32-bitwide.The32bitsofaregisterareshowninFigure2-1.TheserangefromtheMSB(most-significantbit)D31totheLSB(least-significantbit)D0.Witha32-bitdata type,anydata larger than32bitsmustbebrokeninto32-bitchunksbefore it isprocessed.AlthoughtheARMdefaultdatasizeis32-bitmanyassemblersalsosupportthesinglebit,8-bit,and16-bitdata types,aswewillseeinfuturechapters.The32-bitdatasizeoftheARMisoftenreferredasword.Thisisincontrasttox86CPUinwhichwordisdefined as 16-bit. InARM the 16-bit data is referred to as half-word. ThereforeARMsupportsbyte,half-word(twobyte),andword(fourbytes)datatypes.
Figure2-1:ARMRegistersDataSize
InARMthereare13generalpurposeregisters.TheyareR0–R12.SeeFigure2-2.Alloftheseregistersare32bitswide.
Figure2-2:ARMRegisters
The general purpose registers in ARM are the same as the accumulator in other
microprocessors.Theycanbeusedbyallarithmeticandlogicinstructions.Tounderstandthe use of the general purpose registers,wewill show it in the context of three simpleinstructions:MOV,ADD,andSUB.TheARMcorehasthreespecialfunctionregistersofR13, R14, and R15. We will examine their use in the next section. In some ARMprocessors,wealsohaveshadowregistersinvariousoperatingmodesdesignedtospeeduptheprogramexecutionwhenCPUswitchestask.
ARMInstructionFormat
TheARMCPUusesthetri-partinstructionformatformostinstructions.Oneofthemostcommonformatis:
instructiondestination,source1,source2
Depending on the instruction the source2 can be a register, immediate (constant)value,ormemory.Thedestinationisoftenaregisterorread/writememory.
MOVinstruction
Simply stated, the MOV instruction copies data into register or from register toregister.Ithasthefollowingformats:
MOVRn,Op2;loadRnregisterwithOp2(Operand2).
;Op2canbeimmediate
Op2canbeanimmediate(constant)number#Kwhichisan8-bitvaluethatcanbe0–255indecimal,(00–FFinhex).Op2canalsobearegisterRm.RnorRmareanyoftheregistersR0toR15.Ifweseetheword“immediate”,wearedealingwithaconstantvaluethat must be provided right there with the instruction. Notice the # before immediatevalues.ThefollowinginstructionloadstheR2registerwithavalueof0x25(25inhex).
MOVR2,#0x25;loadR2with0x25(R2=0x25)
ThefollowinginstructionloadstheR1registerwiththevalue0x87(87inhex).
MOVR1,#0x87;copy0x87intoR1(R1=0x87)
ThefollowinginstructionloadsR5withthevalueofR7.
MOVR5,R7;copycontentsofR7intoR5(R5=R7)
Notice the position of the source and destination operands. As you can see, theMOVloadstherightoperandintotheleftoperand.Inotherwords,thedestinationcomesfirst.
Towrite a comment inAssembly languagewe use ‘;’. It is the same as ‘//’ inClanguage,whichcauses theremainderof the lineofcodetobe ignored.For instance, intheaboveexamplestheexpressionsmentionedafter‘;’justexplainthefunctionalityoftheinstructionstoyou,anddonothaveanyeffectsontheexecutionoftheinstructions.
When programming the registers of theARMmicrocontrollerwith an immediatevalue,thefollowingpointsshouldbenoted:
1.Weput#infrontofeveryimmediatevalue.
2.Ifwewanttopresentanumberinhex,weputa0xinfrontofit.Ifweputnothinginfrontofanumber,itisindecimal.Forexample,in“MOVR1,#50”,R1isloadedwith50indecimal,whereasin“MOVR1,#0x50”,R1isloadedwith50inhex(80indecimal).
3.Ifvalues0toFFaremovedintoa32-bitregister,therestofthebitsareassumedtobeallzeros.Forexample, in“MOVR1,#0x5” the resultwillbeR1=0x00000005;thatis,R1=00000000000000000000000000000101inbinary.
4.Movinganimmediatevaluelargerthan255(FFinhex)intotheregisterwillcauseanerror.
Note!
Wecannotloadvalueslargerthan0xFF(255)intoregistersR0toR12usingtheMOVinstruction.Forexample,thefollowinginstructionisnotvalid:
MOVR5,#0x999999;invalidinstruction
ThereasonisthefactthatalthoughtheARMinstructionis32-bitwide,only8bitsofMOVinstructioncanbeusedasanimmediatevaluewhichcantakevaluesnotlargerthan0xFF(255).
ADDinstruction
TheADDinstructionhasthefollowingformat:
ADDRd,Rn,Op2;ADDRntoOp2andstoretheresultinRd
;Op2canbeImmediatevalue#K(Kisbetween0and255)
;orRegisterRm
TheADDinstructiontellstheCPUtoaddthevalueofRntoOp2andputtheresultinto the Rd (destination) register. Aswementioned before, Op2 can be an immediatevalue#Kbetween0–255indecimal(00–FFinhex)oraregisterRm.Toaddtwonumberssuchas0x25and0x34,onecandoanyofthefollowing:
MOVR1,#0x25;copy0x25intoR1(R1=0x25)
MOVR7,#0x34;copy0x34intoR1(R7=0x34)
ADDR5,R1,R7;addvalueR7toR1andputitinR5
;(R5=R1+R7)
or
MOVR1,#0x25;load(copy)0x25intoR1(R1=0x25)
ADDR5,R1,#0x34;add0x34toR1andputitinR5
;(R5=R1+0x34)
ExecutingtheabovelinesresultsinR5=0x59(0x25+0x34=0x59)
SUBinstruction
TheSUBinstructionislikeADDinstructionformat.ItsubtractsOp2fromRnandputtheresultinRd(destination)
SUBRd,Rn,Op2;Rd=Rn–Op2
Tosubtracttwonumberssuchas0x34and0x25,onecandothefollowing:
MOVR1,#0x34;load(copy)0x34intoR1(R1=0x34)
SUBR5,R1,#0x25;R5=R1–0x25(R1=0x34–0x25)
Theoldformat
NoticethatinmostofinstructionslikeADDandSUB,RncanbeomittedifRdandRnarethesame.ThisformatisnolongerrecommendedbyUnifiedAssemblerLanguage.
Forexample,eachpairofthefollowinginstructionsarethesame.
SUBR1,R1,#0x25;R1=R1-0x25
SUBR1,#0x25;R1=R1-0x25
SUBR1,R1,R2;R1=R1-R2
SUBR1,R2;R1=R1-R2
ADDR1,R1,#0x25;R1=R1+0x25
ADDR1,#0x25;R1=R1+0x25
ADDR1,R1,R2;R1=R1+R2
ADDR1,R2;R1=R1+R2
Figure2-3showsthegeneralpurposeregisters(GPRs)andtheALUinARM.TheeffectofarithmeticandlogicoperationsonthestatusregisterwillbediscussedinSection2.4.InTable2-1youseesomeoftheARM ALUinstructions.
Figure2-3:ARMRegistersandALU
Instruction Description
ADDRd,Rn,Op2* ADDRntoOp2andplacetheresultinRd
ADCRd,Rn,Op2 ADDRntoOp2withCarryandplacetheresultinRd
ANDRd,Rn,Op2 ANDRnwithOp2andplacetheresultinRd
BICRd,Rn,Op2 ANDRnwithNOTofOp2andplacetheresultinRd
CMPRn,Op2 CompareRnwithOp2andsetthestatusbitsofCPSR**
CMNRn,Op2 CompareRnwithnegativeofOp2andsetthestatusbits
EORRd,Rn,Op2 ExclusiveORRnwithOp2andplacetheresultinRd
MVNRd,Op2 PlaceNOTofOp2inRd
MOVRd,Op2 MOVE(Copy)Op2toRd
ORRRd,Rn,Op2 ORRnwithOp2andplacetheresultinRd
RSBRd,Rn,Op2 SubtractRnfromOp2andplacetheresultinRd
RSCRd,Rn,Op2 SubtractRnfromOp2withcarryandplacetheresultinRd
SBCRd,Rn,Op2 SubtractOp2fromRnwithcarryandplacetheresultinRd
SUBRd,Rn,Op2 SubtractOp2fromRnandplacetheresultinRd
TEQRn,Op2 Exclusive-ORRnwithOp2andsetthestatusbitsofCPSR
TSTRn,Op2 ANDRnwithOp2andsetthestatusbitsofCPSR
*Op2canbeanimmediate8-bitvalue#Kwhichcanbe0–255indecimal,(00–FFinhex).Op2canalsobearegisterRm.Rd,RnandRmareanyofthegeneralpurposeregisters
**CPSRisdiscussedlaterinthischapter
***Theinstructionsarediscussedindetailinthenextchapters
Table2-1:ALUInstructionsUsingGPRs
ReviewQuestions
1.Writeinstructionstomovethevalue0x34intotheR2register.
2.Writeinstructionstoaddthevalues0x16and0xCD.PlacetheresultintheR1register.
3.Trueorfalse.NovaluecanbemoveddirectlyintotheGPRs.
4.Whatisthelargesthexvaluethatcanbemovedintoa32-bitregisterusingMOVinstruction?Whatisthedecimalequivalentofthathexvalue?
5.AlloftheregistersintheARMare_____-bit.
Section2.2:TheARMMemoryMapInthissectionwediscussthememorymapforARMfamilymembers.
TheSpecialFunctionRegistersinARM
InARM theR13,R14,R15, andCPSR (current program status register) registersare called SFRs (special function registers) since each one is dedicated to a specificfunction.Agivenspecialfunctionregisterisdedicatedtospecificfunctionsuchasstatusregister,programcounter,stackpointer,andsoon.ThefunctionofeachSFRisfixedbytheCPUdesigneratthetimeofdesignbecauseitisusedforcontrolofthemicrocontrollerorkeepingtrackofspecificCPUstatus.ThefourSFRsofR13,R14,R15,andCPSRplayanextremelyimportantrole inARM.TheR13issetasideforstackpointer.TheR14isdesignatedaslinkregisterwhichholdsthereturnaddresswhentheCPUcallsasubroutineandtheR15istheprogramcounter(PC).ThePC(programcounter)pointstotheaddressof thenext instructiontobeexecutedaswewillseeinnextsection.TheCPSR(currentprogramstatusregister)isusedforkeepingconditionflagsamongotherthings,aswewillsee in Section 2.4. In contrast to SFRs, the GPRs (R0-R12) do not have any specificfunction and are used for storing general data. For this reason some ARM instructionformats(theThumb)haveonlyR0-7buteveryvariationofARMchiphasR13-R15SFRs.The Thumb instruction format is designed to compete with the 8- and 16-bitmicrocontrollersandincreasecodedensity.
ProgramCounterintheARM
OneofthemostimportantregisterintheARMmicrocontrolleristhePC(programcounter).Aswementionedearlier,theR15istheprogramcounter.TheprogramcounterisusedbytheCPUtopointtotheaddressofthenextinstructiontobeexecuted.AstheCPUfetches theopcodefromtheprogrammemory, theprogramcounter is incrementedautomatically to point to the next instruction.Thewider the programcounter, themorememorylocationsaCPUcanaccess.Thatmeansthata32-bitprogramcountercanaccessamaximumof4G(232=4G)bytesofprogrammemorylocations.
InARMmicrocontrollerseachmemorylocationisabytewide.Inthecaseofa32-bit program counter, the code space is 4G bytes (232 = 4G), which occupies the0x00000000–0xFFFFFFFFaddressrange.TheprogramcounterintheARMfamilyis32bits wide. This means that the ARM family can access addresses 0x00000000 to0xFFFFFFFF,atotalof4Gbytesoflocations(memoryspaces).Althoughthis4Gbytesofmemory spacecanbeallocated toon-chiporoff-chipmemory;however, at the timeofthiswriting,noneofthemembersoftheARMmicrocontrollerfamilyhavetheentire4Gbytesofon-chipmemorypopulated.SeeTable2-2.
Company Device Flash(KBytes) RAM(KBytes) I/OPins
Atmel AT91SAM7X512 512 128 62
NXP LPC2367 512 58 70
ST STR750FV2 256 16 72
TI TMS470R1A256 256 12 49
Freescale MK10DX256VML7 256 64 74
Table2-2:On-chipMemorySizeforsomeARMChips
MemoryspaceallocationintheARM
TheARMhas4Gbytesofdirectlyaccessiblememoryspace.Thismemoryspacehasaddresses0to0xFFFFFFFF.The4Gbytesofmemoryspacecanbedividedintofivesections.Theyareasfollows:
1.On-chipperipheralandI/Oregisters:ThisareaisdedicatedtogeneralpurposeI/O(GPIO)andspecialfunctionregisters(SFRs)ofperipheralssuchastimers,serialcommunication,ADC,andsoon.Inotherwords,ARMusesmemory-mappedI/O.See Chapter 0 for discussion of memory-mapped I/O. The function and addresslocationofeachSFRisfixedbythechipvendoratthetimeofdesignbecauseitisused forport registersofperipherals.Thenumberof locations set aside forGPIOregistersandSFRsdependsonthepinnumbersandperipheralfunctionssupportedbythatchip.Thatnumbercanvaryfromchiptochipevenamongmembersofthesamefamily fromthesamevendor.Due to the fact thatARMdoesnotdefine thetype and number of I/O peripherals one must not expect to have same addresslocationsfortheperipheralregistersamongvariousvendors.
2. On-chipdataSRAM:ARAMspace ranging froma fewkilobytes to severalhundredkilobytesissetasidemainlyfordatastorage.ThedataRAMspaceisusedfordatavariablesandstackandisaccessedbythemicrocontrollerinstructions.Thedata RAM space is read/write memory used by the CPU for storage of datavariables, scratch pad, and stack. The ARM microcontrollers’ data SRAM sizeranges from 2K bytes to several thousand kilobytes depending on the chip. Evenwithinthesamefamily,thesizeofthedataSRAMspacevariesfromchiptochip.AlargerdataSRAMsizemeansmoredifficultiesinmanagingtheseRAMlocationsifyou use Assembly language programming. In today’s high-performancemicrocontroller, however, with over a thousand bytes of data RAM, the job ofmanagingthemishandledbytheCcompilers.Indeed,theCcompilersaretheveryreasonweneedalargedataRAMsinceitmakesiteasierforCcompilerstostoreparametersandallowsthemtoperformtheirjobsmuchfaster.TheamountandthelocationoftheSRAMspacevaryfromchiptochipintheARMchips.AsectionofthedataRAMspaceisusedbystackaswewillseeinChapter6.AlthoughmanyoftheARMmicrocontrollersusedinembeddedproductstheSRAMspaceisusedfordata;onecanalsobuyordesignanARM-basedsysteminwhichtheRAMspaceisusedforbothdataandprogramcodes.Thisiswhatweseeinthex86PCs.Insuchsystemsnormallyoneconnects theARM CPUtoexternalDRAMand theDRAMmemoryisusedforbothcodeanddata.MicrosoftWindows8usessuchasystemforARM-basedTabletcomputers.
3.On-chipEEPROM:Ablockofmemoryfrom1Kbytestoseveralthousandbytesis set aside forEEPROMmemory.Theamount and the locationof theEEPROMspace vary from chip to chip in the ARM microcontrollers. Although in someapplicationstheEEPROMisusedforprogramcodestorage,itisusedmostoftenforsavingcriticaldata.NotallARMchipshaveon-chipEEPROM.
4.On-chipFlashROM:Ablockofmemoryfromafewkilobytestoseveralhundredkilobytesissetasideforprogramspace.Theprogramspaceisusedfortheprogramcode.Intoday’sARM microcontrollerchips,thecodeROMspaceisofFlashtypememory.The amount and the location of the codeROMspace vary fromchip tochip in the ARM products. See Table 2-2 and Examples 2-1 and 2-2. The FlashmemoryofcodeROMisunderthecontrolofthePC(programcounter).ThecodeROMmemorycanalsobeusedforstorageofstaticfixeddatasuchasASCIIdatastringsandlook-uptables.
Example2-1
AgivenARMchiphasthefollowingaddressassignments.Calculatethespaceandtheamountofmemorygiventoeachsection.
(a)Addressrangeof0x00100000–0x00100FFFforEEPROM
(b)Addressrangeof0x40000000–0x40007FFFforSRAM
(c)Addressrangeof0x00000000–0x0007FFFFforFlash
(d)Addressrangeof0xFFFC0000–0xFFFFFFFFforperipherals
Solution:
(a)Withaddressspaceof0x00100000to00100FFF,wehave00100FFF–00100000=0FFFbytes.Converting0FFFtodecimal,weget4,095+1,whichisequalto4Kbytes.
(b)Withaddressspaceof0x40000000to0x40007FFF,wehave40007FFF–40000000=7FFFbytes.Converting7FFFtodecimal,weget32,767+1,whichisequalto32Kbytes.
(c)Withaddressspaceof0000to7FFFF,wehave7FFFF–0=7FFFFbytes.Converting7FFFFtodecimal,weget524,287+1,whichisequalto512Kbytes.
(d)WithaddressspaceofFFFC0000toFFFFFFFF,wehaveFFFFFFFF–FFFC0000=3FFFFbytes.Converting3FFFFtodecimal,weget262,143+1,whichisequalto256Kbytes.
SeeFigure2-4.
Example2-2
FindtheaddressspacerangeofeachofthefollowingmemoryofanARMchip:
(a)2KBofEEPROMstartingataddress0x80000000
(b)16KBofSRAMstartingataddress0x90000000
(c)64KBofFlashROMstartingataddress0xF0000000
Solution:
(a)With2Kbytesofon-chipEEPROMmemory,wehave2048bytes(2×1024=2048).Thismapstoaddresslocationsof0x80000000to0x800007FF.Noticethat0isalwaysthefirstlocation.
(b)With16Kbytesofon-chipSRAM memory,wehave16,384bytes(16×1024=16,384),and16,384locationsgives0x90000000–0x90003FFF.
(c)With64Kwehave65,536bytes(64×1024=65,536),therefore,thememoryspaceis0xF0000000to0xF000FFFF.
Figure2-4:AnExampleofARMMemoryAllocation
5. Off-chipDRAM space: A DRAMmemory ranging from fewmegabytes toseveralhundredmegabytescanbe implemented for externalmemoryconnection.Althoughmany of theARMmicrocontrollers used in embedded products use theon-chipSRAMfordata, one can alsodesign anARM-based system inwhich theRAMisusedforbothdataandprogramcodes.Thisiswhatweseeinthex86PCs.In such systems one connects theARM CPU to externalDRAM and theDRAMmemoryisusedforbothcodeanddata,justlikethex86PCs.ManyARMvendors
are pushing theARM11 chip for the high-end of themarket such as servers anddatabasecomputers. In anARM11-based server computers, theexternal (off-chip)DRAM is used and managed by the operating system while on-chip Flash,EEPROM, and SRAMmemories are used for BIOS (basic input output system),POST (power on self test), and CPU scratch pad, respectively. In such cases thesystem is not different from x86 computers currently in use, except it usesARMCPU instead of a Pentium chip from Intel or x86 from AMD. The MicrosoftWindows8usesARMmotherboardwithoff-chipDRAM
Notice thefollowingdifferencesamong theon-chipFlashROM,dataSRAM,andEEPROMmemoriesinARM microcontrollersusedforembeddedproducts:
a)The data SRAM is used by theCPU for data variables and stack,whereas theEEPROMsareconsideredtobememorythatonecanalsoaddexternallytothechip.Inotherwords, whilemanyARMmicrocontroller chips have no EEPROMmemory, it isveryunlikelyforanARMmicrocontrollertohavenoon-chipdataSRAM.
b)Theon-chipFlashROMisusedforprogramcode,while theEEPROMisusedmostoftenforcriticalsystemdatathatmustnotbelostifpoweriscutoff.RememberthatdataSRAMisvolatilememoryanditscontentsarelostifthepowertothechipiscutoff.SincevolatiledataSRAMisused fordynamicvariables (constantlychangingdata)andstack.WeneedEEPROMmemorytosecurecriticalsystemdatathatdoesnotchangeveryoftenandwillnotbelostintheeventofpowerfailure.
c)Theon-chipFlashROMisprogrammedanderasedinblocksize.Theblocksizeis8,16,32,or64bytesormoredependingonthechiptechnology.That isnot thecasewith EEPROM, since the EEPROM is byte programmable and erasable. Both theEEPROMandFlashmemorieshave limitednumberoferase/writecycles,whichcanbe100,000 or more. While all semiconductor memories have unlimited number of reads(accesses),thenumberoftimesthatwecaneraseandwritetotheFlashandEEPROMarelimitedtoaround100,000timesatthetimeofthiswriting.Asthisnumberincreases,thelikelihoodthatwewillhaveharddrivemadeoutofFlashmemorywillalsoincrease.WehavealreadyseenhowtheFlashstickmemoryhasreplacedthefloppydriveswhichweresocommonintoearly2000s.
MemorymappedI/OintheARM
TraditionalCPUssuchasx86had twodistinct spaces: the I/Ospaceandmemoryspace.Inx86,whilealloftheI/OportsareaccessedusingINandOUTinstructions,thememoryaddressspaceisaccessedusingtheMOVinstruction.IntheARMCPUwehaveonlyonespaceanditismemoryspaceanditcanbeashighas4Gigabytes.TheARMusesthis4GigabytesforbothmemoryandI/Ospace.ThismappingoftheI/OportstomemoryspaceiscalledmemorymappedI/OandwasdiscussedinChapter0.
ReviewQuestions
1.Trueorfalse.TheGPRregistersareusedforstoringdata.
2.TheR0-R12registersarecalled______.
3.TheGPRregistersinARMare_____-bit.
4.TheR13-R15registersarecalled_____.
5.TheSFRregistersinARMare______-bit.
Section2.3:LoadandStoreInstructionsinARMTheinstructionswehaveusedsofarworkedwiththeimmediate(constant)valueof
KandtheGPRs.TheyalsousedtheGPRsastheirdestination.Wesawsimpleexamplesof using MOV and ADD earlier in Section 2.1. The ARM allows direct access to alllocations in the memory. In this section we show the instructions accessing variouslocations of the memory. This is one of the most important sections in the book formastering the topic of ARM Assembly language programming. Before we embark onstudying the load and store instructions of theARM,wemust note the fact that all theinstructions of theARM are 32-bit wide. In other words, every instruction of ARM isfixedat32-bit.Aswewillseeinlatersection,thefixedsizeinstructionisoneofthemostimportantcharacteristicsofRISCarchitecture.Incaseswherethereisnoneedforallthe32-bit,theARMhasaddedzerotomaketheinstructionfixedat32-bit.
LDRRd,[Rx]instruction
LDRRd,[Rx];loadRdwiththecontentsoflocationpointed
;tobyRxregister.Rxisanaddressbetween
;0x00000000to0xFFFFFFFF
TheLDRinstruction tells theCPUto load(bring in)oneword(32-bitor4bytes)fromabaseaddresspointedtobyRxintotheGPR.Afterthisinstructionisexecuted,theRdwillhavethesamevalueasfourconsecutivelocationsinthememory.Obviouslysinceeachmemorylocationcanholdonlyonebyte(ARMisabyteaddressableCPU),andourGPR is 32-bit, the LDR will bring in 4 bytes of data from 4 consecutive memorylocations.ThelocationscanbeintheSRAMoraFlashmemory.Forexample,the“LDRR2,[R5]” instructionwill copy the contents ofmemory locations pointed to byR5 intoregisterR2.SincetheR2registeris32-bitwide,itexpectsa32-bitoperandintherangeof0x00000000 to 0xFFFFFFFF.Thatmeans theR5 register gives the base address of thememory inwhich it holds the data. Therefore if R5=0x80000, theCPUwill fetch intoregisterR2,thecontentsofmemorylocations0x80000,0x80001,0x80002,and0x80003.
Thefollowing instruction loadsR7with thecontentsof location0x40000200.SeeFigure2-5.
;assumeR5=0x40000200
LDRR7,[R5];loadR7withthecontentsoflocations
;0x40000200-0x40000203
Figure2-5:ExecutingtheLDRInstruction
STRRx,[Rd]instruction
STRRx,[Rd];storeregisterRxintolocationspointedtobyRd
TheSTRinstructiontellstheCPUtostore(copy)thecontentsoftheGPRtoabaseaddress location pointed to by the Rd register. Notice that the source register of STRinstruction isplacedbefore thedestination register.ObviouslysinceGPRis32-bitwide(4-byte)weneed four consecutivememory locations to store the contents ofGPR.ThememorylocationsmustbewritablesuchasSRAM.SeeFigure2-6.The“STRR3,[R6]”instruction will copy the contents of R3 into locations pointed to by R6. Locations0x40000200 through 0x40000203 of the SRAMmemory will have the contents of R3sinceR6=0x40000200.
Figure2-6:ExecutingtheSTRInstruction
ThefollowinginstructionstoresthecontentsofR5intolocationspointedtobyR1.Assume0x40000340isanaddressofinternalRAMlocationsandheldbyregisterR1.
;assumeR1=0x40000340
STRR5,[R1];storeR5intolocationspointedtobyR1.
LDRBRd,[Rx]instruction
LDRBRd,[Rx];loadRdwiththecontentsofthelocation
;pointedtobyRxregister.
The LDRB instruction tells the CPU to load (copy) one byte from an addresspointed tobyRxinto the lowerbyteofRd.After this instruction isexecuted, the lower
byte ofRdwill have the same value asmemory location pointed to byRx. Itmust benoted that theunusedportion (theupper24bits)of theRd registerwillbeall zeros, asshowninFigure2-7.
Figure2-7
LDRvs.LDRB
Aswementioned earlier,we canuse theLDR instruction to copy the contentsoffourconsecutivememorylocationsintoa32-bitGPR.Therearesituationsthatwedonotneed to bring 4 bytes of data intoGPR.AnUART register is such a case. TheUARTregistersaregenerally8-bitand takeonlyonememoryspace location(memorymappedI/O).UsingLDRB,we canbring intoGPRa single byte of data fromUART registers.Thisisawidelyusedinstructionforaccessingthe8-bitI/Oandperipheralports.
STRBRx,[Rd]instruction
STRBRx,[Rd];storethebyteinregisterRxinto
;locationpointedtobyRd
The STRB instruction tells the CPU to store (copy) the byte value in Rx to anaddress location pointed to by the Rd register. After this instruction is executed, thememorylocationspointedtobytheRdwillhavethesamebyteasthelowerbyteoftheRx,asshowninFigure2-8.
Figure2-8
ThefollowingprogramfirstloadstheR1registerwithvalue0x55,thenstoresthisvalueintolocation0x40000100:
;assumeR5=0x40000100
MOVR1,#0x55;R1=55(inhex)
STRBR1,[R5];copyR1locationpointedtobyR5
GeneralpurposeI/O(GPIO)arepartofthespecialfunctionregistersinthememory-mapped I/O. They are connected to the I/O pins of the ARMmicrocontroller. See thedatasheetofyourARMmicrocontroller.WecanalsostorethecontentsofaGPRintoanylocationintheSRAMregionofthedataspace.SeeExamples2-3and2-4.
Example2-3
StatethecontentsofRAMlocations0x92to0x96afterthefollowingprogramisexecuted:
MOVR1,#0x99;R1=0x99
MOVR6,#0x92;R6=0x92
STRBR1,[R6];storeR1intolocationpointedtobyR6
;(location0x92)
ADDR6,R6,#1;R6=R6+1
MOVR1,#0x85;R1=0x85
STRBR1,[R6];storeR1intolocationpointedtobyR6
;(location0x93)
ADDR6,R6,#1;R6=R6+1
MOVR1,#0x3F;R1=0x3F
STRBR1,[R6];storeR1intolocationpointedtobyR6
ADDR6,R6,#1;R6=R6+1
MOVR1,#0x63;R1=0x63
STRBR1,[R6];storeR1intolocationpointedtobyR6
ADDR6,R6,#1;R6=R6+1
MOVR1,#0x12;R1=0x12
STRBR1,[R6]
Solution:
AftertheexecutionofSTRBR1,[R6]datamemorylocation0x92hasvalue0x99;
aftertheexecutionofSTRBR1,[R6]datamemorylocation0x93hasvalue0x85;
aftertheexecutionofSTRBR1,[R6]datamemorylocation0x94hasvalue0x3F;andsoon,asshowninthechart.
Address Data
0x92 0x99
0x93 0x85
0x94 0x3F
0x95 0x63
0x96 0x12
Example2-4
StatethecontentsofR2,R1,andmemorylocation0x20afterthefollowingprogram:
MOVR2,#0x5;loadR2with5(R2=0x05)
MOVR1,#0x2;loadR1with2(R1=0x02)
ADDR2,R1,R2;R2=R1+R2
ADDR2,R1,R2;R2=R1+R2
MOVR5,#0x20;R5=0x20
STRBR2,[R5];storeR2intolocationpointedtobyR5
Solution:
TheprogramloadsR2withvalue5.ThenitloadsR1withvalue2.ThenitaddstheR1registertoR2twice.Attheend,itstorestheresultinlocation0x20ofmemory.
AfterMOVR2,#0x05
Location Data
R2 5
R1
0x20
AfterMOVR1,#0x02
Location Data
R2 5
R1 2
0x20
AfterADDR2,R1,R1
Location Data
R2 7
R1 2
0x20
AfterADDR2,R1,R1
Location Data
R2 9
R1 2
0x20
AfterSTRB[R5],R2
Location Data
R2 9
R1 2
0x20 9
STRvs.STRB
Aswementionedearlier,wecanusetheSTRinstructiontocopythecontentsof32-bitGPR into four consecutivememory locations. The I/O ports are generally 8-bit andtakeonlyonememoryspacelocation(memorymappedI/O).UsingSTRB,wecansendabyteofdata fromGPR tomemory locationsuchasan I/Oport.Again, this isawidelyusedinstructionforaccessingthe8-bitI/Oandperipheralports.
LDRHRd,[Rx]instruction
LDRHRd,[Rx];loadRdwiththehalf-wordpointed
;tobyRxregister
TheLDRH instruction tells theCPU to load (copy) half-word (16-bit or 2 bytes)from a base address pointed to byRx into the lower 16-bits ofRdRegister.After thisinstructionisexecuted,thelower16-bitofRdwillhavethesamevalueastwoconsecutivelocations in the memory pointed to by base address of Rx. It must be noted that theunusedportion(theupper16bits)oftheRdregisterwillbeallzeros,asshowninFigure2-9.
Figure2-9
Table2-3comparesLDRB,LDRH,andLDR.
DataSize Bits Decimal Hexadecimal Loadinstructionused
Byte 8 0–255 0-0xFF LDRB
Half-word 16 0–65535 0-0xFFFF LDRH
Word 32 0–232-1 0-0xFFFFFFFF LDR
Table2-3:UnsignedDataRangeinARMandassociatedLoadInstructions
STRHRx,[Rd]instruction
STRHRx,[Rd];storehalf-word(2-byte)inregisterRx
;intolocationspointedtobyRd
TheSTRHinstructiontellstheCPUtostore(copy)thelower16-bitcontentsoftheRxtoanaddresslocationpointedtobytheRdregister.Afterthisinstructionisexecuted,thememorylocationspointedtobytheRdwillhavethesamevalueasthelower16-bitofRxRegister.Thelocationsarepartofthedataread/writememoryspacesuchason-chipSRAM.Forexample,the“STRHR3,[R6]”instructionwillcopythe16-bitlowercontentsofR3intotwoconsecutivelocationspointedtobybaseregisterR6.AsyoucanseeinFigure2-10,locations0x2000and0x2001oftheSRAMmemorywillhavethecontentsofthelowerhalfwordofR3sinceR6=0x2000.
Figure2-10
InTable2-4youseeacomparisonbetweenSTRB,STRH,andSTR.
DataSize Bits Decimal Hexadecimal Loadinstructionused
Byte 8 0–255 0-0xFF STRB
Half-word 16 0–65535 0-0xFFFF STRH
Word 32 0–232-1 0-0xFFFFFFFF STR
Table2-4:UnsignedDataRangeinARMandassociatedStoreInstructions
ReviewQuestions
1.Trueorfalse.No32-bitvaluecanbeloadeddirectlyintointernalR0-R12.
2.Writeinstructionstoloadvalue0x95intolocationwithaddress0x20.
3.WriteinstructionstomovethecontentsofR2tomemorylocationpointedtobyR8.
4.Writeinstructionstoloadvaluesfrommemorylocations0x20–0x23toR4register.
5.Whatisthelargesthexvaluethatcanbemovedintoasinglelocationinthedatamemory?Whatisthedecimalequivalentofthehexvalue?
6.“LDRR6,[R3]”putstheresultin_____.
7.Whatdoes“STRBR1,[R2]”do?
8.Whatisthelargesthexvaluethatcanbemovedintofourconsecutivelocationsinthedatamemory?Whatisthedecimalequivalentofthehexvalue?
Section2.4:ARMCPSR(CurrentProgramStatusRegister)Likeallothermicroprocessors, theARMhasa flag register to indicate arithmetic
conditions such as the carry bit. The flag register in the ARM is called the currentprogramstatusregister(CPSR).Inthissection,wediscussvariousbitsofthisregisterandprovidesomeexamplesofhowitisaltered.Chapters3and4showhowtheflagbitsofthestatusregisterareused.
ARMcurrentprogramstatusregister
The status register is a 32-bit register. See Figure 2-11 for the bits of the statusregister.ThebitsC,Z,N,andVarecalledconditionalflags,meaningthat theyindicatesomeconditionsthatresultafteraninstructionisexecuted.Eachoftheconditionalflagscanbeusedtoperformaconditionalbranch(jump),aswewillseeinChapter4.
Figure2-11:CPSR(CurrentProgramStatusRegister)
The following is a brief explanationof the flagbits of the current program statusregister(CPSR).Theimpactofinstructionsonthisregisteristhendiscussed.
C,thecarryflag
This flag is set whenever there is a carry out from the D31 bit. This flag bit isaffectedaftera32-bitadditionorsubtraction.Chapter4showshowthecarryflagisused.
Z,thezeroflag
Thezeroflagreflects theresultofanarithmeticor logicoperation. If theresult iszero,thenZ=1.Therefore,Z=0iftheresultisnotzero.SeeChapter4toseehowweusetheZflagforlooping.
N,thenegativeflag
BinaryrepresentationofsignednumbersusesD31asthesignbit.Thenegativeflagreflectstheresultofanarithmeticoperation.IftheD31bitoftheresultiszero,thenN=0andtheresultispositive.IftheD31bitisone,thenN=1andtheresultisnegative.Thenegative and V flag bits are used for the signed number arithmetic operations and arediscussedinChapter5.
V,theoverflowflag
This flag is set whenever the result of a signed number operation is too large,causingthehigh-orderbittooverflowintothesignbit.Ingeneral,thecarryflagisusedtodetect errors inunsignedarithmeticoperationswhile theoverflow flag isused todetecterrorsinsignedarithmeticoperations.TheVandNflagbitsareusedforsignednumberarithmeticoperationsandarediscussedinChapter5.
TheTflagbitisusedtoindicatetheARMisinThumbstate.TheIandFflagsareusedtoenableordisabletheinterrupt.SeetheARMmanual.
Ssuffixandthestatusregister
MostofARMinstructionscanaffectthestatusbitsofCPSRaccordingtotheresult.Ifwe need an instruction to update the value of status bits inCPSR,we have to put Ssuffixattheendofinstructions.Thatmeans,forexample,ADDSinsteadofADDisused.
ADDinstructionandthestatusregister
NextweexaminetheimpactoftheSUBSandADDSinstructionsontheflagbitsCandZofthestatusregister.Someexamplesshouldclarifytheirmeanings.AlthoughalltheflagbitsC,Z,V,andNareaffectedbytheADDSandSUBSinstruction,wewillfocusonflagsCandZfornow.TheotherflagbitsarediscussedinChapter5,becausetheyrelateonlytosignednumberoperations.ExamineExample2-5toseetheimpactoftheADDSinstruction on selected flag bits. See also Example 2-6 to see the impact of the SUBSinstructiononselectedflagbits.
Example2-5
ShowthestatusoftheCandZflagsaftertheadditionof
a)0x0000009Cand0xFFFFFF64inthefollowinginstruction:
;assumeR1=0x0000009CandR2=0xFFFFFF64
ADDSR2,R1,R2;addR1toR2andplacetheresultinR2
b)0x0000009Cand0xFFFFFF69inthefollowinginstruction:
;assumeR1=0x0000009CandR2=0xFFFFFF69
ADDSR2,R1,R2;addR1toR2andplacetheresultinR2
Solution:
a)
0x0000009C 00000000000000000000000010011100
+0xFFFFFF64
+11111111111111111111111101100100
0x100000000 100000000000000000000000000000000
C=1becausethereisacarrybeyondtheD31bit.
Z=1becausetheR2(theresult)hasvalue0initaftertheaddition.
b)
0x0000009C 00000000000000000000000010011100
+0xFFFFFF69
+11111111111111111111111101101001
0x100000005 100000000000000000000000000000101
C=1becausethereisacarrybeyondtheD31bit.
Z=0becausetheR2(theresult)doesnothavevalue0initaftertheaddition.(R2=0x00000005)
Example2-6
ShowthestatusoftheZflagduringtheexecutionofthefollowingprogram:
MOVR2,#4;R2=4
MOVR3,#2;R3=2
MOVR4,#4;R4=4
SUBSR5,R2,R3;R5=R2-R3(R5=4-2=2)
SUBSR5,R2,R4;R5=R2-R4(R5=4-4=0)
Solution:
TheZflagisraisedwhentheresultiszero.Otherwise,itiscleared(zero).Thus:
After ValueofR5 Zflag
SUBSR5,R2,R3 2 0
SUBSR5,R2,R4 0 1
Notallinstructionsaffecttheflags
SomeinstructionsaffectallthefourflagbitsC,Z,V,andN(e.g.,ADDS).Butsomeinstructions affectno flagbits at all.Thebranch instructions are in this category.Someinstructionsaffectonlysomeof theflagbits.The logic instructions(e.g.,ANDS)are inthiscategory.
Table 2-5 shows the instructions and the flag bits affected by them.AppendixAprovidesacompletelistofalltheinstructionsandtheirassociatedflagbits.
Instruction FlagsAffected
ANDS C,Z,N
ORRS C,Z,N
MOVS C,Z,N
ADDS C,Z,N,V
SUBS C,Z,N,V
B Noflags
NotethatwecannotputSafterBinstruction.
Table2-5:FlagBitsAffectedbyDifferentInstructions
Flagbitsanddecisionmaking
Thereareinstructionsthatwillmakeaconditionaljump(branch)basedonthestatusof the flag bits. Table 2-6 provides some of these instructions. Chapter 4 discusses theconditionalbranchinstructionsandhowtheyareused.
Instruction FlagsAffected
BCS BranchifC=1
BCC BranchifC=0
BEQ BranchifZ=1
BNE BranchifZ=0
BMI BranchifN=1
BPL BranchifN=0
BVS BranchifV=1
BVC BranchifV=0
Table2-6:ARMBranch(Jump)InstructionsUsingFlagBits
ReviewQuestions
1.TheflagregisterintheARMiscalledthe________.
2.WhatisthesizeoftheflagregisterintheARM?
3.FindtheCandZflagbitsforthefollowingcode:
;assumeR2=0xFFFFFF9F
;assumeR1=0x00000061
ADDSR2,R1,R2
4.FindtheZflagbitforthefollowingcode:
;assumeR7=0x22
;assumeR3=0x22
ADDSR7,R3,R7
5.FindtheCandZflagbitsforthefollowingcode:
;assumeR2=0x67
;assumeR1=0x99
ADDSR2,R1,R2
Section2.5:ARMDataFormatandDirectivesInthissectionwelookatsomewidelyuseddataformatsanddirectivessupportedby
theARMassembler.
ARMdatatype
ARMhasfourdatatypes.Theyarebit,byte(8-bit),half-word(16-bit)andword(32bit).DuetothefactthatARMregistersare32-bititisthejobofprogrammer/compilertobreakdowndatalargerthan32bitstobeprocessedbytheCPU.ThedatatypesusedbytheARMcanbepositiveornegative.AdiscussionofsignednumbersisgiveninChapter5.
Dataformatrepresentation
There are several ways to represent a byte of data in the ARM assembler. Thenumberscanbeinhex,binary,decimal,orASCIIformats.Thefollowingareexamplesofhoweachworks.
Hexnumbers
TorepresentHexnumbersinanARMassemblerweput0x(or0X)infrontofthenumberlikethis:
MOVR1,#0x99
Hereareafewlinesofcodethatusethehexformat:
MOVR2,#0x75;R2=0x75
MOVR1,#0x11
ADDR2,R1,R2;R2=R1+R2=0x75+0x11=0x86
Decimalnumbers
ToindicatedecimalnumbersinsomeARMassemblerssuchasKeilwesimplyusethedecimal(e.g.,12)andnothingbeforeorafterit.Herearesomeexamplesofhowtouseit:
MOVR7,#12;R7=00001100or0Cinhex
MOVR1,#32;R1=32=0x20
Binarynumbers
To represent binary numbers in an ARM assembler we put 2_ in front of thenumber.Itisasfollows:
MOVR6,#2_10011001;R6=10011001inbinaryor99inhex
Numbersinanybasebetween2and9
To indicate a number in any base n between 2 and 9 in an ARM assembler wesimplyusethen_infrontofit.Herearesomeexamplesofhowtouseit:
MOVR7,#8_33;R7=33inbase8or011011inbinaryformat
MOVR6,#2_10011001;R6=10011001inbase2or99inhex
ASCIIcharacters
TorepresentASCIIdatainanARMassemblerweusesinglequotesasfollows:
LDRR3,#‘2’;R3=00110010or32inhex(SeeAppendixF)
This is the same as other assemblers such as the 8051 and x86. Here is anotherexample:
LDRR2,#‘9’;R2=0x39,whichishexnumberforASCII‘9’
Torepresentastring,doublequotesareused;andfordefiningASCIIstrings(morethanonecharacter),weusetheDCBdirective.
Assemblerdirectives
While instructions tell the CPU what to do, directives (also called pseudo-instructions) give directions to the assembler. For example, the MOV and ADDinstructionsarecommandstotheCPU,butEQU,END,andENTRYaredirectivestotheassembler.The followingsectionpresentssomewidelyuseddirectivesof theARMandhow they are used. The directives help us develop our program easier and make ourprogramlegible(morereadable).Table2-7showssomeassemblerdirectives.
Directive Description
AREA Instructstheassemblertoassembleanewcodeordatasection
END Informstheassemblerthatithasreachedtheendofasourcefile.
ENTRY Declaresanentrypointtoaprogram.
EQU Givesasymbolicnametoanumericconstant,aregister-relativevalueoraPC-relativevalue.
INCLUDE Itaddsthecontentsofafiletoourprogram.
Table2-7:SomeWidelyUsedARMDirective
AREA
TheAREA directive tells the assembler to define a new section ofmemory. Thememory can be code (instruction) or data and can have attributes such as ReadOnly,ReadWrite, and so on. This iswidely used to define one ormore blocks of indivisiblememoryforcodeordatatobeusedbythelinker.EveryAssemblylanguageprogramhasatleastoneAREA.Thefollowingistheformat:
AREAsectionname,attribute,attribute,…
ThefollowinglinedefinesanewareanamedMY_ASM_PROG1whichhasCODEandREADONLYattributes:
AREAMY_ASM_PROG1,CODE,READONLY
Among widely used attributes are CODE, DATA, READONLY, READWRITE,COMMON,andALIGN.Thefollowingdescribesthesewidelyusedattributes.
READWRITE isanattributegiven toanareaofmemorywhichcanbereadfromandwrittento.SinceitisREADWRITEsectionoftheprogramitisbydefaultforDATA.InARMAssemblylanguageweusethisareatosetasideSRAMmemoryforscratchpadandstack.TheAssemblerputstheREADWRITEsectionsnexttoeachotherintheSRAMmemory.
READONLY is an attribute given to an area ofmemorywhich can only be readfrom.SinceitisREADONLYsectionoftheprogramitisbydefaultforCODE.InARMAssemblylanguageweusethisareatowriteourinstructionsformachinecodeexecution.TheREADONLYsectionsareputnexttoeachotherintheflashmemory.
Note
InKeil,ThememoryspaceofREADONLYandREADWRITEaredefinedintheLinkerandTargettabsoftheProject\Options.Keilsetsthevaluesaccordingtothememorymapofthechosenchip.
CODE is an attribute given to an area of memory used for executable machineinstruction.SinceitisusedforcodesectionoftheprogramitisbydefaultREADONLYmemory. In ARM Assembly language we use this area to write our instructions. Thefollowinglinedefinesanewareaforwritingprograms:
AREAOUR_ASM_PROG,CODE,READONLY
DATA isanattributegiven toanareaofmemoryusedfordataandno instruction(machine instructions)canbeplaced in thisarea.Since it isusedfordatasectionof theprogram it is bydefault aREADWRITEmemory. InARMAssembly languageweusethisareatosetasideSRAMmemoryforscratchpadandstack.Thefollowinglinedefinesanewareafordefiningvariables:
AREAOUR_VARIABLES,DATA,READWRITE
Todefineconstantvaluesintheflashmemorywewritethefollowing:
AREAOUR_CONSTS,DATA,READONLY
COMMONisanattributegiventoanareaofDATAmemorysectionwhichcanbeusedcommonlybyseveralprogramcodes.WedonotinitializetheCOMMONsectionofthe memory since it is used by compiler exclusively. The compiler initializes theCOMMONmemoryareawithallzeros.
ALIGN is another attribute given to an area ofmemory to indicate howmemoryshouldbeallocatedaccordingtotheaddresses.WhentheALIGNisusedforCODEandREADONLYitalignedin4-bytesaddressboundarybydefaultsincetheARMinstructionsare all 32-bit (4-bytes) word. The ALIGN attribute of AREA has a number after likeALIGN=3whichindicatestheinformationshouldbeplacedinmemorywithaddressesof
23,thatis0x50000,0x50008,0x50010,0x50020,andsoon.Wewillseemoreaboutthatsoon.TheusageandimportanceofALIGNattributeisdiscussedinChapter6.
ENTRY
Another important pseudocode is the ENTRY directive. This indicates to theassemblerthebeginningoftheexecutablecode.TheENTRYdirectiveisthefirstlineoftheARMAssemblylanguagecodesectionoftheprogram,meaningthatanythingaftertheENTRY directive in the source code is considered actual machine instruction to beexecutedbytheCPU.ForagivenARMAssemblylanguageprogramwecanhaveonlyoneENTRYpoint.HavingmultipleENTRYdirectiveinanAssemblylanguageprogramwillgiveyouanerrorbyassembler.
ENDdirective
AnotherimportantpseudocodeistheENDdirective.Thisindicatestotheassemblertheendofthesource(asm)file.TheENDdirectiveisthelastlineoftheARMAssemblylanguageprogram,meaning that anything after theENDdirective in the source code isignored by the assembler. Program 2-1 shows how the AREA, ENTRY and ENDdirectivesareused.
Program2-1
;ARMAssemblyLanguageProgramToAddSomeDataandStoretheSUMinR3.
AREAPROG_2_1,CODE,READONLY
ENTRY
MOVR1,#0x25;R1=0x25
MOVR2,#0x34;R2=0x34
ADDR3,R2,R1;R3=R2+R1
HEREBHERE;stayhereforever
END
LDR
Inthelastsection,westatedthatonecannot loadvalueslarger than0xFFintothe32-bit registers ofARM since only 8 bits are used for the immediate value.TheARMassemblerprovideusapseudo-instructionof“LDRRd,=32-bit_immidiate_vlaue”toloadvaluegreaterthan0xFF.Wewillexaminehowthispseudo-instructionworksinChapter 6.Fornow,justnoticethe=signusedinthesyntax.Thefollowingpseudo-instructionloadsR7with0x112233.
LDRR7,=0x112233
Wewill use this pseudo-instruction to load 32-bit value into register extensivelythroughout the book. To load values less than 0xFF, we still use the “MOV Rd,#8-
bit_immidiate_value” instruction since it is a real instruction of ARM, therefore moreefficientincodesize.
EQU(equate)
Thisisusedtodefineaconstantvalueorafixedaddress.TheEQUdirectivedoesnotsetasidestorageforadata item,butassociatesaconstantnumberwithadataoranaddresslabelsothatwhenthelabelappearsintheprogram,itsconstantwillbesubstitutedfor the label.ThefollowingusesEQUfor thecounterconstant,and then theconstant isusedtoloadtheR2register:
COUNTEQU0x25
……….
MOVR2,#COUNT;R2=0x25
Whenexecutingtheaboveinstruction“MOVR2,#COUNT”,theregisterR2willbeloadedwiththevalue0x25.WhatistheadvantageofusingEQU?Assumethataconstant(afixedvalue)isusedthroughouttheprogram,andtheprogrammerwantstochangeitsvalue everywhere. By the use of EQU, the programmer can change it once and theassembler will change all of its occurrences throughout the program. This allows theprogrammertoavoidsearchingtheentireprogramtryingtofindeveryoccurrence.
UsingEQUforfixeddataassignment
TogetmorepracticeusingEQUtoassignfixeddata,examinethefollowing:
DATA1EQU0x39;thewaytodefinehexvalue
DATA2EQU2_00110101;thewaytodefinebinaryvalue(35inhex)
DATA3EQU39;decimalnumbers(27inhex)
DATA4EQU‘2’;ASCIIcharacters
UsingEQUforSFRaddressassignment
EQUisalsowidelyusedtoassignSFRaddresses.Examinethefollowingcode:
PORTBEQU0xF0018;SFRPortBaddress
MOVR6,#0x01;R6=0x01
LDRR2,=PORTB;R2=0xF0018
STRBR6,[R2];PortBnowhas0x01
UsingEQUforRAMaddressassignment
AnothercommonusageofEQUisfortheaddressassignmentoftheinternalSRAM.ItisexactlylikeusingEQUforSFRaddressassignment.Examinethefollowingcode:
SUMEQU0x40000120;assignRAMloctoSUM
MOVR2,#5;loadR2with5
MOVR1,#2;loadR1with2
ADDR2,R2,R1;R2=R2+R1
LDRR3,=SUM;loadR3with0x40000120
STRBR2,[R3];storetheresultSUM
This is especiallyhelpfulwhen theaddressneeds tobechanged inorder touseadifferentARMchipforagivenproject.ItismucheasiertorefertoanamethananumberwhenaccessingRAMaddresslocations.
RN(equate)
Thisisusedtodefineanameforaregister.TheRNdirectivedoesnotsetasideaseperate storage for the name, but associates a registerwith that name. It improves theclarity.Program2-2showshowweuseSUMnameforR3.
Program2-2:AnARMAssemblyLanguageProgramUsingRNDirective
;ARMAssemblyLanguageProgramToAddSomeData
;andstoretheSUMinR3.
VAL1RNR1;defineVAL1asanameforR1
VAL2RNR2;defineVAL2asanameforR2
SUMRNR3;defineSUMasanameforR3
AREAPROG_2_2,CODE,READONLY
ENTRY
MOVVAL1,#0x25;R1=0x25
MOVVAL2,#0x34;R2=0x34
ADDSUM,VAL1,VAL2;R3=R2+R1
HEREBHERE
END
INCLUDEdirective
The includedirective tells theARM assembler toadd thecontentsofa file toourprogram(likethe#includedirectiveinClanguage).
Assemblerdataallocationdirectives
In most Assembly languages there are some directives to allocate memory andinitializeitsvalue.InARMAssemblylanguageDCB,DCD,andDCWallocatememoryandinitializethem.TheSPACEdirectiveallocatesmemorywithoutinitializingit.
DCBdirective(defineconstantbyte)
TheDCBdirectiveallocatesabytesizememoryandinitializesthevalues.
MYVALUEDCB5;MYVALUE=5
MYMSAGEDCB“HELLOWORLD”;string
DCWdirective(defineconstanthalf-word)
TheDCWdirectiveallocatesahalf-wordsizememoryandinitializesthevalues.
MYDATADCW0x20,0xF230,5000,0x9CD7
DCDdirective(defineconstantword)
TheDCDdirectiveallocatesawordsizememoryandinitializesthevalues.
MYDATADCD0x200000,0xF30F5,5000000,0xFFFF9CD7
SeeTables2-8and2-9.
Directive Description
DCB Allocatesoneormorebytesofmemory,anddefinestheinitialruntimecontentsofthememory
DCW Allocatesoneormorehalfwordsofmemory, alignedon two-byteboundaries, anddefinestheinitialruntimecontentsofthememory.
DCWU Allocatesoneormorehalfwordsofmemory,anddefinestheinitialruntimecontentsofthememory.Thedataisnotaligned.
DCD Allocates one or more words of memory, aligned on four-byte boundaries, anddefinestheinitialruntimecontentsofthememory.
DCDU Allocatesoneormorewordsofmemoryanddefinestheinitialruntimecontentsofthememory.Thedataisnotaligned.
Table2-8:SomeWidelyUsedARMMemoryAllocationDirectives
DataSize Bits Decimal Hexadecimal Directive Instruction
Byte 8 0–255 0-0xFF DCB STRB/LDRB
Half-word 16 0–65535 0-0xFFFF DCW STRH/LDRH
Word 32 0–232-1 0-0xFFFFFFFF DCD STR/LDR
Table2-9:UnsignedDataRangeinARMandassociatedInstructions
In Program 2-3A you see an example of storing constant values in the programmemoryusingthedirectives.Figure2-12showshowthedataisstoredinmemory.Intheexample, theprogramgoes to locations0x00 to0x0F.TheDCBdirectivestoresdata inaddresses0x10–0x17.Asyouseeonebyteisallocatedforeachdata.TheDCDallocates4
bytesforeachdata.Asaresultthelowestbyteof0x23222120(whichis0x20)isstoredinlocation0x18andthenextbytesarestoredinthenextlocations.
Program2-3A:SampleofStoringFixedDatainProgramMemory
;storingdatainprogrammemory.
AREALOOKUP_EXAMPLE,READONLY,CODE
ENTRY
LDRR2,=OUR_FIXED_DATA;pointtoOUR_FIXED_DATA
LDRBR0,[R2];loadR0withthecontents
;ofmemorypointedtobyR2
ADDR1,R1,R0;addR0toR1
HEREBHERE;stayhereforever
OUR_FIXED_DATA
DCB0x55,0x33,1,2,3,4,5,6
DCD0x23222120,0x30
DCW0x4540,0x50
END
Figure2-12:MemoryDumpforProgram2-3A
TheDCWdirective allocates 2 bytes for eachdata.For example, the lowbyte of0x4540islocatedinaddress0x20andthehighbyteofitgoestoaddress0x21.Similarlythelowbyteof0x50islocatedinaddress0x22andthehighbyteofitinaddress0x23.
Intheprogramtoaccessthedata,firsttheR2registerisloadedwiththeaddressofOUR_FIXED_DATA.Inthisexample,OUR_FIXED_DATAhasaddress0x10. So,R2isloadedwith0x10.Then,thecontentsoflocation0x10isloadedintoregisterR0,usingtheLDRBinstruction.
SPACEdirective
Using the SPACE directivewe can allocatememory for variables. The followinglinesallocate4and2bytesofmemoryandnamethemasLONG_VARandOUR_ALFA:
LONG_VARSPACE4;Allocate4bytes
OUR_ALFASPACE2;Allocate2bytes
In the followingprogram3variablesaredefined:A,B,andC.ThenAandBareinitializedwith5and4,respectively.InthenextstepAandBareaddedtogetherandtheresultisputinC:
Program2-3B
AREAOUR_PROG,CODE,READONLY
ENTRY
;A=5
LDRR0,=A;R0=Addr.ofA
MOVR1,#5;R1=5
STRR1,[R0];init.Awith5
;B=4
LDRR0,=B;R0=Addr.ofB
MOVR1,#4;R1=4
STRR1,[R0];init.Bwith4
;R1=A
LDRR0,=A;R0=Addr.ofA
LDRR1,[R0];R1=valueofA
;R2=B
LDRR0,=B;R0=Addr.ofA
LDRR2,[R0];R2=valueofA
;C=R1+R2(C=A+B)
ADDR3,R1,R2;R3=A+B
LDRR0,=C;R0=Addr.ofC
STRR3,[R0];C=R3
loopBloop
AREAOUR_DATA,DATA,READWRITE
;AllocatesthefollowingsinSRAMmemory
ASPACE4
BSPACE4
CSPACE4
END
ADRdirective
ToloadregisterswiththeaddressesofmemorylocationswecanalsousetheADRpseudo-instructionwhichhasabetterperformance.SeeChapter6formore.ADRhasthefollowingsyntax:
ADRRn,label
For example, in Program 2-3A we can load R2 with the address ofOUR_FIXED_DATAusingthefollowingpseudo-instruction:
ADRR2,OUR_FIXED_DATA;pointtoOUR_FIXED_DATA
ALIGN
Thisisusedtomakesuredataisalignedin32-bitwordor16-bithalfwordmemoryaddress.ThefollowingusesALIGNtomakethedata32-bitwordaligned:
ALIGN4;thenextinstructionisword(4bytes)aligned
…
ALIGN2;thenextinstructionishalf-word(2bytes)aligned
…
Example2-7showstheresultofusingtheALIGNdirective.
Example2-7
ComparetheresultofusingALIGNinthefollowingprograms:
a)
AREAE2_7A,READONLY,CODE
ENTRY
ADRR2,DTA
LDRBR0,[R2]
ADDR1,R1,R0
H1BH1
DTADCB0x55
DCB0x22
END
b)
AREAE2_7B,READONLY,CODE
ENTRY
ADRR2,DTA
LDRBR0,[R2]
ADDR1,R1,R0
H1BH1
DTADCB0x55
ALIGN2
DCB0x22
END
c)
AREAE2_7C,READONLY,CODE
ENTRY
ADRR2,DTA
LDRBR0,[R2]
ADDR1,R1,R0
H1BH1
DTADCB0x55
ALIGN4
DCB0x22
END
Solution:
a)
WhenthereisnoALIGNdirectivetheDCBdirectiveallocatesthefirstemptylocationforitsdata.Inthisexample,address0x10isallocatedfor0x55.So0x22goestoaddress0x11.
b)
IntheexampletheALIGNissetto2whichmeansthedatashouldbeputinalocationwithevenaddress.The0x55goestothefirstemptylocationwhichis0x10.Thenextemptylocationis0x11whichisnotamultipleof2.So,itisfilledwith0andthenextdatagoestolocation0x12.
c)
IntheexampletheALIGNissetto4whichmeansthedatashouldgotolocationswhoseaddressismultipleof4.The0x55goestothefirstemptylocationwhichis0x10.Thenextemptylocationsare0x11,0x12,and0x13whicharenotamultipleof4.So,theyarefilledwith0sandthenextdatagoestolocation0x14.
RulesforlabelsinAssemblylanguage
Bychoosing labelnames that aremeaningful, aprogrammercanmakeaprogrammucheasier to readandmaintain.Thereareseveral rules thatnamesmust follow.First,each label name must be unique. The names used for labels in Assembly languageprogramming consist of alphabetic letters in both uppercase and lowercase, the digits 0through9,andthespecialcharactersquestionmark(?),period(.),at(@),underline(_),and dollar sign ($). The first character of the labelmust be an alphabetic character. Inotherwords,itcannotbeanumber.Everyassemblerhassomereservedwordsthatmustnot be used as labels in the program. Foremost among the reserved words are themnemonics for the instructions.Forexample,“MOV”and“ADD”are reservedbecausethey are instruction mnemonics. In addition to the mnemonics there are some other
reservedwords.Checkyourassemblerforthelistofreservedwords.
ReviewQuestions
1.GiveanexampleofhexdatarepresentationintheARMassembler.
2. Showhow to represent decimal 20 in formats of (a) hex, (b) decimal, and (c)binaryintheARMassembler.
3.WhatistheadvantageinusingtheEQUdirectivetodefineaconstantvalue?
4.Showthehexnumbervalueusedbythefollowingdirectives:
(a)ASC_DATAEQU‘4’(b)MY_DATAEQU2_00011111
5.GivethevalueinR2forthefollowing:
MYCOUNTEQU15
MOVR2,#MYCOUNT
6.Givethevalueindatamemorylocation0x200000forthefollowing:
MYCOUNTEQU0x95
MYMEMEQU0x200000
MOVR0,#MYCOUNT
LDRR2,=MYMEM
STRBR0,[R2]
7.Givethevalueindatamemory0x630000forthefollowing:
MYDATAEQU12
MYMEMEQU0x00630000
FACTOREQU0x10
MOVR1,#MYDATA
MOVR2,#FACTOR
LDRR3,=MYMEM
ADDR1R2,R1
STRBR1,[R3]
Section2.6:IntroductiontoARMAssemblyProgrammingInthissectionwediscussAssemblylanguageformatanddefinesomewidelyused
terminologyassociatedwithAssemblylanguageprogramming.
WhiletheCPUcanworkonlyinbinary,itcandosoataveryhighspeed.Itisquitetedious and slow for humans, however, to dealwith 0s and 1s in order to program thecomputer.Aprogramthatconsistsof0sand1s iscalledmachine language. In theearlydaysof thecomputer,programmerscodedprograms inmachine language.Although thehexadecimal systemwasused as amore efficientway to represent binarynumbers, theprocess of working in machine code was still cumbersome for humans. Eventually,Assembly languagesweredeveloped,whichprovidedmnemonics for themachine codeinstructions,plusotherfeaturesthatmadeprogrammingfasterandlesspronetoerror.Thetermmnemonicisfrequentlyusedincomputerscienceandengineeringliteraturetoreferto codes and abbreviations that are relatively easy to remember. Assembly languageprograms must be translated into machine code by a program called an assembler.Assemblylanguageisreferredtoasalow-levellanguagebecauseitdealsdirectlywiththeinternal structureof theCPU.Toprogram inAssembly language, theprogrammermustknowalltheregistersoftheCPUandthesizeofeach,aswellasotherdetails.
Today,onecanusemanydifferentprogramminglanguages,suchasBASIC,Pascal,C, C++, Java, and numerous others. These languages are called high-level languagesbecause the programmer does not have to be concernedwith the internal details of theCPU. Whereas an assembler is used to translate an Assembly language program intomachine code (sometimes also called object code or opcode for operation code), high-level languages are translated into machine code by a program called a compiler. Forinstance,towriteaprograminC,onemustuseaCcompilertotranslatetheprogramintomachinelanguage.NextwelookatARMAssemblylanguageformat.
StructureofAssemblylanguage
AnAssemblylanguageprogramconsistsof,amongotherthings,aseriesoflinesofAssembly language instructions. An Assembly language instruction consists of amnemonic,optionallyfollowedbytwoorthreeoperands.Theoperandsarethedataitemsbeingmanipulated,andthemnemonicsarethecommandstotheCPU,tellingitwhattodowiththoseitems.SeeProgram2-4.
Program2-4:SampleofanARMAssemblyLanguageProgram
;ARMAssemblylanguageprogramtoaddsomedataandstoretheSUMinR3.
AREAPROG_2_4,CODE,READONLY
ENTRY
MOVR1,#0x25;R1=0x25
MOVR2,#0x34;R2=0x34
ADDR3,R2,R1;R3=R2+R1
HEREBHERE
END
AnAssemblylanguageprogramisaseriesofstatements,orlines,whichareeitherAssemblylanguageinstructions,suchasADDandMOV,orstatementscalleddirectives.While instructions tell the CPUwhat to do, directives (also called pseudo-instructions)givedirectionstotheassembler.Forexample,inProgram2-4,whiletheMOVandADDinstructionsarecommandstotheCPU,ENTRYandENDaredirectivestotheassembler.The directive END tells the assembler that it is the end of the code, while ENTRY isbeginningofthecode.
AnAssemblylanguageinstructionconsistsoffourfields:
[label]mnemonic[operands][;comment]
Bracketsindicatethatafieldisoptionalandnotalllineshavethem.Bracketsshouldnotbetypedin.Regardingtheaboveformat,thefollowingpointsshouldbenoted:
1.Thelabelfieldallowstheprogramtorefertoalineofcodebyname.Thelabelfieldcannotexceedacertainnumberofcharacters.Checkyourassemblerfortherule.
2.TheAssemblylanguagemnemonic(instruction)andoperand(s)fieldstogetherperform the real work of the program and accomplish the tasks forwhich theprogramwaswritten.InAssemblylanguagestatementssuchas
MOVR3,#0x55
MOVR2,#0x67
ADDR2,R2,R3;R2=R2+R3
ADDandMOVarethemnemonicsthatproduceopcodes;the“0x55”and“0x67”arethe operands. Instead of a mnemonic and an operand, these two fields could containassembler pseudo-instructions, or directives. Remember that directives do not generateanymachinecode(opcode)andareusedonlybytheassembler,asopposedtoinstructionsthataretranslatedintomachinecode(opcode)fortheCPUtoexecute.InProgram2-4thecommands END and ENTRY are examples of directives. Many of these pseudo-instructionswerediscussedinthelastsection.
3.Thecommentfieldbeginswithasemicoloncommentindicator“;”.Commentsmaybe at the endof a lineoron a lineby themselves.The assembler ignorescomments,but theyare indispensable toprogrammers.Althoughcommentsareoptional, it isrecommendedthattheybeusedtodescribetheprograminawaythatmakesiteasierforsomeoneelsetoreadandunderstand.
4. Noticethelabel“HERE”inthelabelfieldinProgram2-4.IntheB(Branch)statementtheARMistoldtostayinthisloopindefinitely.Ifyoursystemhasamonitor program you do not need this line and should delete it from your
program.Inthenextsectionwewillseehowtocreateaready-to-runprogram.
Note!
Thefirstcolumnofeachlineisalwaysconsideredaslabel.Thus,becarefultopressaTabatthebeginningofeachlinethatdoesnothavelabel;otherwise,yourinstructionisconsideredasalabelandanerrormessagewillappearwhencompiling.
ReviewQuestions
1.Whatisthepurposeofpseudo-instructions?
2._____________aretranslatedbytheassemblerintomachinecode,whereas__________________arenot.
3.Trueorfalse.Assemblylanguageisahigh-levellanguage.
4.Whichofthefollowinginstructionsproducesopcode?Listallthatdo.
(a)MOVR6,#0x25(b)ADDR2,R1,R3(c)END(d)HEREBHERE
5.Pseudo-instructionsarealsocalled___________.
6.Trueorfalse.AssemblerdirectivesarenotusedbytheCPUitself.Theyaresimplyaguidetotheassembler.
7.InQuestion4,whichoneisanassemblerdirective?
Section2.7:AssemblinganARMProgramNowthatthebasicformofanAssemblylanguageprogramhasbeengiven,thenext
questionis:Howitiscreated,assembled,andmadereadytorun?ThestepstocreateanexecutableAssemblylanguageprogram(Figure2-13)areoutlinedasfollows:
Figure2-13:StepstoCreateaProgram
1. FirstweuseatexteditortotypeinaprogramsimilartoProgram2-4.Inthecase of ARM, we can use the Keil IDE, which has a text editor, assembler,simulator, and much more all in one software package. It is an excellentdevelopmentsoftwarethatsupportsalltheARMchipsandisfreeforuniversitystudents.Seewww.keil.comforevaluationversionof thesoftwareforstudents.Manyeditorsorwordprocessorsarealsoavailablethatcanbeusedtocreateoredittheprogram.AwidelyusededitoristheNotepadinWindows,whichcomeswith all Microsoft operating systems. Notice that the editor must be able toproduce an ASCII file. For assemblers, the file names follow the usual DOSconventions, but the source file has the extension “.a” or “.asm”. The “a”extensionforthesourcefileisusedbyanassemblerinthenextstep.
2. The“a”sourcefilecontainingtheprogramcodecreatedinstep1isfedtotheARMassembler.Theassemblerproducesanobjectfile,andalistfile.Theobjectfilehastheextension“.o”,andthelistfilehas“.lst”extension.
3.Theobjectfileplusascriptfileareusedbythelinkertoproducemapfileandhex file. The map file has the extension “.map”, and the hex file has “.hex”extension.The script file is optional and can be replacedwith some commandlineoptions.Aftera successful link, thehex file is ready tobeburned into theARM’sprogramROMandisdownloadedintotheARMchip.
Moreaboutasmandobjectfiles
Theasmfileisalsocalledthesourcefileandmusthavethe“a”or“asm”extension.Asmentioned earlier, this file is created with a text editor such asWindowsNotepad.
Manyassemblerscomewithatexteditor.Theassemblerconvertstheasmfile’sAssemblylanguage instructions intomachine languageandprovides theo (object) file.Theobjectfile,asmentionedearlier,hasan“o”asitsextension.Theobjectfileisusedasinputtoasimulatororanemulator.
Beforewecanassembleaprogramtocreateaready-to-runprogram,wemustmakesurethatitiserrorfree.TheKeiluVisionIDEprovidesuserrormessagesandweexaminethemtoseethenatureofsyntaxerrors.Theassemblerwillnotassembletheprogramuntilallthesyntaxerrorsarefixed.AsampleofanerrormessageisshowninFigure2-14.
Buildtarget‘Target1’
assemblinga1.asm…a1.asm(7):error:A1163E:UnknownopcodeMOVE,expectingopcodeorMacro
Targetnotcreated
Figure2-14:SampleofanErrorMessage
“lst”and“map”files
Themap file shows the labels defined in the program togetherwith their values.ExamineFigure2-15.ItshowstheMapfileofProgram2-4.
MemoryMapoftheimage
ImageEntrypoint:0x00000000
LoadRegionLR_1(Base:0x00000000,Size:0x00000010,Max:0xffffffff,ABSOLUTE)
ExecutionRegionER_RO(Base:0x00000000,Size:0x00000010,Max:0xffffffff,ABSOLUTE)
BaseAddrSizeTypeAttrIdxESectionNameObject
0x000000000x00000010CodeRO1*PROG_2_1a2.o
ExecutionRegionER_RW(Base:0x40000000,Size:0x00000000,Max:0xffffffff,ABSOLUTE)
****Nosectionassignedtothisexecutionregion****
ExecutionRegionER_ZI(Base:0x40000000,Size:0x00000000,Max:0xffffffff,ABSOLUTE)
****Nosectionassignedtothisexecutionregion****
Figure2-15:SampleofaMapFile
Thelst(list)file,whichisoptional,isveryusefultotheprogrammer.Thelistshowsthebinaryandsourcecode;italsoshowswhichinstructionsareusedinthesourcecode,andtheamountofmemorytheprogramuses.SeeFigure2-16.
ARMMacroAssembler Page1
100000000 ;ARMAssemblyLanguageProgramToAddSomeDataandStoretheSUMinR3.
2
00000000
300000000 AREAPROG_2_4,CODE,READONLY
400000000 ENTRY
500000000 E3A01025 MOVR1,#0x25;R1=0x25
600000004 E3A02034 MOVR2,#0x34;R2=0x34
700000008 E0823001 ADDR3,R2,R1;R3=R2+R1
80000000C EAFFFFFE HERE BHERE
900000010 END
Figure2-16:SampleofaListFileforARM
Manyassemblersassumethatthelistfileisnotwantedunlessyouindicatethatyouwant to produce it. These files can be accessed by a text editor such as Notepad anddisplayedonthemonitor,orsenttotheprintertogetahardcopy.Theprogrammerusesthelistandmapfilestolocatesyntaxerror.
There are many different ARM assemblers available for evaluation nowadays. Ifyou use theWindows operating system,ARM IAR IDE andKeil uVision can be used.Theyhavegreatfeaturesandniceenvironments.
ReviewQuestions
1.Trueorfalse.TheARMuVisionIDEandWindowsNotepadtexteditorbothproduceanASCIIfile.
2.Trueorfalse.Theextensionforthesourcefileis“a”.
3.Whichofthefollowingfilescanbeproducedbyatexteditor?
(a)myprog.a(b)myprog.obj(c)myprog.hex(d)myprog.lst
4.Whichofthefollowingfilesisproducedbyanassembler?
(a)myprog.asm(b)myprog.obj(c)myprog.hex(d)myprog.lst
Section2.8:TheProgramCounterandProgramROMSpaceintheARMIn this section we discuss the role of the program counter (PC) in executing a
programandshowhowthecodeisfetchedfromROMandexecuted.Wewillalsodiscusstheprogram(code)ROMspaceforvariousARMfamilymembers.Finally,weexaminetheHarvardarchitectureoftheARM.
ProgramcounterintheARM
ThemostimportantregisterinARMisthePC(programcounter).AswementionedearliertheR15istheprogramcounterinARM.TheprogramcounterisusedbytheCPUto point to the address of the next instruction to be executed. As the CPU fetches theopcode from the program ROM, the program counter is incremented automatically topointtothenextinstruction.Thewidertheprogramcounter,themorememorylocationsaCPUcanaccess.Thatmeansthata32-bitprogramcountercanaccessamaximumof4G(232=4G)bytesprogrammemorylocations.
Inmostmicrocontrollerseachmemorylocationis1bytewide.Inthecaseofa32-bit program counter, the memory space is 4G (232 = 4G) bytes, which occupies the0x00000000 –0xFFFFFFFF address range since it is byte-addressable. The programcounterintheARMfamilyis32bitswide.ThismeansthattheARMfamilycanaccessaddresses00000000to0xFFFFFFFF,atotalof4Gbyteofmemoryspacelocations.The4GbytesofmemoryspacelocationsareallocatedamongtheI/Operipherals,SRAM,andFlashROM.However,atthetimeofthiswriting,noneofthemembersoftheARMfamilyhavetheentire4Gbytesofmemoryspacepopulatedwithon-chipperipherals,SRAM,andFlash.
PoweruplocationforARM
Onequestionthatwemustaskaboutanymicrocontroller(ormicroprocessor)is:“Atwhat address does the CPUwake upwhen power is applied?” Eachmicroprocessor isdifferent.InthecaseoftheARMmicrocontrollers(thatis,allmembersregardlessofthefamilyandvariation),themicrocontrollerwakesupatmemoryaddress0x00000000whenit ispoweredup.BypoweringupwemeanapplyingVCCoractivatingRESETpin. Inotherwords,when theARM ispoweredup, thePC (programcounter)has thevalueof0x00000000init.ThismeansthatitexpectsthefirstopcodetobestoredatROMaddress0x00000000.For this reason, in theARMsystem, the first opcodemustbeburned intomemorylocation0x00000000ofprogramROMbecausethisiswhereitlooksforthefirstinstruction when it is booted. Next we discuss the step-by-step action of the programcounterinfetchingandexecutingasampleprogram.
PlacingcodeinprogramROM
To get a better understanding of the role of the program counter in fetching andexecutingaprogram,weexaminetheactionoftheprogramcounteraseachinstructionisfetchedandexecuted.First,weexamineoncemorethelistfileofthesampleprogramandshowhow the code is placed into theFlashROMof theARMchip.Aswe can see inFigure2-16,theopcodeandoperandforeachinstructionarelistedontheleftsideofthe
listfile.
AftertheprogramisburnedintoROMofanARMchip,theopcodeandoperandareplacedinROMmemorylocationsstartingat0x00000000asshownintheProgram2-4listfile.
Thelistshowsthataddress00000000containsE3A01025,whichistheopcodeformovingavalueintoregister,andtheoperand(inthiscase0x25)tobemovedtoanotheroperand (in this caseR1). Therefore, the instruction “MOVR1, #0x25” has amachinecodeof“E3A01025”,whereE3Aistheopcodeand01025istheoperands.SeeFigure2-16. Similarly, the machine code “E3A02034” is located in ROM memory location00000004 and represents the opcode and the operands for the instruction “MOV R2,#0x34”. In the same way, machine code “E0813002” is located in memory location00000008 and represents the opcode and the operand for the instruction “ADDR3,R1,R2”. The opcode for “HERE B HERE” and its target address are located inlocations 0000000C. Notice that all the instructions in this program are 4-byteinstructions.
Executingaprograminstructionbyinstruction
Assuming that the above program is burned into the ROM of anARM chip, thefollowingisastep-by-stepdescriptionoftheactionoftheARMuponapplyingpowertoit:
1. WhentheARMispoweredup,thePC(programcounter)has00000000andstartstofetchthefirstinstructionfromlocation00000000oftheprogramROM.InthecaseoftheaboveprogramthefirstcodeisE3A01025,whichisthecodeformovingoperand0x25 toR1.Uponexecuting thecode, theCPUplaces thevalueof25inR1.Nowoneinstructionisfinished.Theprogramcounterisnowincremented to point to 00000004 (PC = 00000004), which contains codeE3A02034,themachinecodefortheinstruction“MOVR2,#0x34”.
2. UponexecutingthemachinecodeE3A02034,thevalue0x34isloadedtoR2.Theprogramcounterisincrementedto00000008.
3. ROM location 00000008 has the machine code for instruction “ADDR3,R2,R1”.ThisinstructionisexecutedandnowPC=0000000C.
4. Now PC = 0000000C points to the next instruction, which is “HERE BHERE”.Aftertheexecutionofthisinstruction,PC=0000000C.Thiskeepstheprograminaninfiniteloop.
The fact that the program counter points at the next instruction to be executedexplains why some microprocessors (notably the x86) call the program counter theinstructionpointer.
The steps of running a code in ARM is slightly different from what mentionedabovebecauseoftheuseofpipelineinARMarchitecture.WewillexaminepipelineslaterinChapter 7.AlsoinARMCortexMchipsthepower-onResetlocationvalueisdifferentaswewillseeinChapter8.
InstructionformationoftheARM
RecallthattheARMinstructionsarealways4-byte.Nextweexploretheinstructionformationforafewoftheinstructionswehaveusedinthischapter.ThisshouldgiveyousomeinsightintotheinstructionsoftheARM.
ADDinstructionformation
TheADDisa4-byte(32-bit)instruction.SeeFigure2-17.Ofthe32bits,thefirst4bitsaresetasidefortheconditionfieldwhichwillbediscussedmoreinChapter4.Bits26and27arealways0inADDinstruction.Bit25whichisindicatedbyIdefinesthetypeofsecondoperand.Aswementionedbefore,thesecondoperandcanbeeitheraregisteroranimmediate value between 0–255. If I = 1, the second operand is an immediate valueotherwiseitshouldbearegister.Bits24to21aretheoperationcodeofADDinstruction.Whenthesebitsare0100theCPUknowsthatitshouldruntheADDinstruction.Bit20whichisindicatedbySdefineseithertheinstructionshouldupdatetheflagbitsornot.InADDinstructionthisbitiszerowhileinADDSinstructionitisone.Bits19to16definethefirstoperand(Rn).ItcanbearegisternumberbetweenR0toR15.Likewise,bits15to12definethedestinationregister(Rd).Finally,bits11to0definethesecondoperand.Aswementionedbefore,bit25(I)definesthateitherthesecondoperandshouldbearegisteroranimmediatevalue.WewilldiscussthebitsofOperand2inmoredetailinChapter3.
Figure2-17:ADDInstructionFormation
SUBinstructionformation
TheSUBisa4-byte(32-bit)instruction.Ofthe32bits,thefirst4bitsaresetasidefor theconditionfield.Bits26and27arealways0 inSUBinstruction.Bit25which isindicated by I defines the type of second operand. If I = 1, the second operand is animmediatevalueotherwiseitshouldbearegister.Bits24to21aretheoperationcodeofSUB instruction.When these bits are 0010 theCPUknows that it should run theSUBinstruction.Bit20whichisindicatedbySdefineseithertheinstructionshouldupdatetheflagbitsornot.InSUBinstructionthisbitiszerowhileinSUBSinstructionitisone.Bits19 to16define the first operand (Rn). It canbe a registernumberbetweenR0 toR15.Likewise,bits15to12definethedestinationregister(Rd).Finally,bits11to0definethesecondoperand.Aswementionedbefore,bit25(I)definesthateitherthesecondoperandshouldbearegisteroranimmediatevalue.TheformationofSUBinstructionisshowninFigure2-18.
Figure2-18:SUBInstructionFormation
Generalformationofdataprocessinginstructions
Asyoumayhavenoticed,theformationofADDandSUBinstructionsarethesame
exceptbits24to21whicharecalledtheoperationcodeandtellstheCPU whatinstructionitshouldexecute.InARM,allofthedataprocessinginstructionshavethesameformat.Figure2-19showsthegeneralformationofdataprocessinginstructions.Eachofthedataprocessing instructionhasauniqueoperationcode. InTable2-10you see the listof alldataprocessinginstructionsandtheiropcodes.
Figure2-19:GeneralFormationofDataProcessingInstructions
Branchinstructionformation
TheBisa4-byte(32-bit)instruction.SeeFigure2-20.Ofthe32bits,thefirst4bitsaresetasidefortheconditionfieldwhichwillbediscussedmoreinChapter4.Bits27to25arealways101inBinstruction.Bit24whichisindicatedbyLiszeroinBinstructionandoneinBLinstruction.Bits23to0giveusbranchtargetlocationrelativetothecurrentaddress.ThesewillbediscussedfurtherinChapter4.
Figure2-20:BranchInstructionFormation
ROMwidthintheARM
Aswehaveseenso far in this section,each locationof theaddressspaceholds1byte. Ifwe have 32 address lines, thiswill give us 232 locations,which is 4Gbytes ofmemory locationwith an addressmap of 0x00000000–0xFFFFFFFF. To bring inmoreinformation(codeordata) intotheCPUineachbuscycle,ARMincreasedthewidthofthedatabusto32bits.Inotherwords,theARMisword-addressable.Incontrast,the8051CPUisbyte-addressableonly.Inasense,thedatabusisliketrafficlanesonthehighwaywhereeachlaneis8bitswide.Themorelanes,themoreinformationwecanbringintotheCPUforprocessing.FortheARM,theinternaldatabusbetweenthecodememoryandtheCPUis32bitswide,asshowninFigure2-21.Therefore,the4Gmemoryspaceisshownas1G×32usinga32-bitworddatabussize.ThewideningofthedatapathbetweentheprogramROMand theCPU is anotherway inwhich theARMdesigners increased theprocessingpoweroftheARMfamily.Anotherreasontomakethecodememory32bitswideistomatchitwiththeinstructionwidthoftheARMbecausealloftheinstructionsare4-bytewide.Thisway, theCPUbrings inan instructionfrommemoryevery time itmakesatriptotheprogrammemory.Thatwillmakeinstructionfetchasinglecycle,aswewillseeintheChapter4wheninstructiontimingisdiscussed.
TheARMdesignershavemadeallinstructionsfixedat4-byte;thereareno1-byte,2-byte,or3-byteinstructions,asisthecasewiththex86and8051chips.ThisispartoftheRISCarchitecturalphilosophy,whichwill bediscussed later in this chapter. ItmustalsobenotedthatthedatamemorySRAMintheARMmicrocontrollerarealso4-byte.
HarvardandvonNeumannarchitecturesintheARM
In Chapter 0,we discussedHarvard andVonNeumann architecture.ARM 9 andnewerarchitecturesuseHarvardarchitecture,whichmeans that thereare separatebusesforthecodeandthedatamemory.SeeFigure2-21.TheprogrambusprovidesaccesstotheprogrammemorywhereasthedatabusisusedforbringingdatatotheCPU.
Figure2-21:Harvardvs.VonNeumannArchitecture
AswecanseeinFigure2-21,intheprogrambus,thedatabusis32bitswideandthe address bus is as wide as the PC register to enable the CPU to address the entireprogrammemory.
InSections2-2and2-3,we learnedaboutdatamemoryspaceandhowtouse theSTR and LDR instructions. When the CPU wants to execute the “LDR Rd,[Rx]”instruction, itputsRxon theaddressbusof thedatabus,and receivesdata through thedatabus.Forexample,toexecute“LDRR2,[R5]”,assumingthatR5=0x40000200,theCPUputsthevalueofR5ontheaddressbus.Thelocation0x40000200isintheSRAM(seeFigure2-4).Thus, theSRAMputs thecontentsof location0x40000200onthedatabus.TheCPUgetsthecontentsoflocation0x40000200throughthedatabusandbringsintoCPUandputsitinR2.
The “STR Rx,[Rd]” instruction is executed similarly. The CPU puts Rd on theaddressbusandthecontentsofRxonthedatabus.Thememorylocationwhoseaddressisontheaddressbusreceivesthecontentsofdatabus.
Littleendianvs.bigendianwar
ExaminetheplacingofthecodeintheARMprogrammemory,showninFigure2-22.The lowbyte goes to the lowmemory location, and the highbyte goes to the highmemory address. This convention is called little endian to contrast it with big endian.Figure2-23showsstoring thesamedatausingbigendianconvention.Theoriginof thetermsbigendianandlittleendianisfromanargumentinaGulliver’sTravelsstoryoverhowaneggshouldbeopened:fromthebigendorthelittleend.Inthebigendianmethod,thehighbytegoestothelowaddress,whereasinthelittleendianmethod,thehighbytegoestothehighaddressandthelowbytetothelowaddress.AllIntelmicroprocessorsand
many microcontrollers use the little endian convention. Freescale (formerly Motorola)microprocessors,alongwithsomemainframes,usebigendian.Thedifferencemightseemastrivialaswhethertobreakaneggfromthebigendorthelittleend,butitisanuisanceinconvertingsoftwarefromonecamptoberunonacomputeroftheothercamp.Manymicroprocessors,includingtheARM,letthesoftwaredesignerchooselittleendianorbigendianconvention.
Figure2-22:ARMProgramMemoryContentsforProgram2-4ListFile(LittleEndian)
Figure2-23:BigEndianConvention
ReviewQuestions
1.IntheARM,theprogramcounteris______bitswide.
2.Trueorfalse.EverymemberoftheARMfamilywakesupatmemory0x00000000whenitispoweredup.
3.AtwhatROMlocationdowestorethefirstopcodeofanARMprogram?
4.Trueorfalse.AlltheinstructionsintheARMare4-byteinstructions.
5.Trueorfalse.ARM9andnewerarchitecturesusevonNeumannarchitecture.
6.Trueorfalse.ARM7andolderarchitecturesusevonNeumannarchitecture.
Section2.9:SomeARMAddressingModesTheCPUcanaccessoperands(data)invariousways,calledaddressingmodes.The
number of addressing modes is determined when the microprocessor is designed andcannotbechanged.Usingadvancedaddressingmodestheaccessingofdifferentdatatypesanddatastructures(e.g.arrays,pointers,classes)arediscussedinChapter6.SomeofthesimpleARMaddressingmodesare:
1.register
2.immediate
3.registerindirect(indexedaddressingmode)
Registeraddressingmode
The register addressingmode involves the use of registers to hold the data to bemanipulated.Memoryisnotaccessedwhenthisaddressingmodeisexecuted;therefore,itisrelativelyfast.SeeFigure2-24.
Figure2-24:RegisterAddressingMode
Examplesofregisteraddressingmodeareasfollow:
MOVR6,R2;copythecontentsofR2intoR6
ADDR1,R1,R3;addthecontentsofR3tocontentsofR1
SUBR7,R7,R2;subtractR2fromR7
Immediateaddressingmode
Intheimmediateaddressingmode, thesourceoperandisaconstant.In immediateaddressingmode, as the name implies, when the instruction is assembled, the operandcomes immediately after the opcode. For this reason, this addressing mode executesquickly.SeeFigure2-25.Examples:
MOVR9,#0x25;move0x25intoR9
MOVR3,#62;loadthedecimalvalue62intoR3
ADDR6,R6,#0x40;add0x40toR6
Figure2-25:ImmediateAddressingMode
Inthefirsttwoaddressingmodes,theoperandsareeitherinsidethemicroprocessor
ortaggedalongwiththeinstruction.Inmostprograms,thedatatobeprocessedisofteninsomememorylocationoutsidetheCPU.Therearemanywaysofaccessingthedatainthedatamemoryspace.Thefollowingdescribesoneofthemethods.
RegisterIndirectAddressingMode(Indexedaddressingmode)
Intheregisterindirectaddressingmode,theaddressofthememorylocationwheretheoperandresidesisheldbyaregister.SeeFigure2-26.Forexample:
STRR5,[R6];moveR5intothememorylocation
;pointedtobyR6
LDRR10,[R3];moveintoR10thecontentsofthe
;memorylocationpointedtobyR3.
Figure2-26:RegisterIndirectAddressingMode
SampleUsage:Registerindirectaddressingmode
Usingregisterindirectaddressingmodewecanimplementthedifferentpointers.Sincetheregistersare32-bittheycanaddresstheentirememoryspace.HereyouseeasimplecodeinCanditsequivalentinAssembly:
CLanguage:char*ourPointer;
ourPointer=(char*)0x12456;//Pointtolocation12456
*ourPointer=25;//store25inlocation0x12456
ourPointer++;//pointtonextlocation
AssemblyLanguage:LDRR2,=0x12456;pointtolocation0x12456
MOVR0,#25;R0=25
STRBR0,[R2];storeR0inlocation0x12456
ADDR2,R2,#1;incrementR2topointtonextlocation
Dependingonthedatatypethatthepointerpointsto,STR/LDR,STRH/LDRH,orSTRB/LDRBmightbeused.Intheaboveexample,sinceitpointstochar(whichis8-bit)STRBisused.
SeeChapter6formoreadvancedaddressingmodes.
ReviewQuestions
1.CantheARMprogrammermakeupnewaddressingmodes?
2.Whichregisterscanbeusedfortheregisterindirectaddressingmode?
3.Whereisthedatalocatedinimmediateaddressingmode?
Section2.10:RISCArchitectureinARMThere are three ways available to microprocessor designers to increase the
processingpoweroftheCPU:
1. Increasetheclockfrequencyof thechip.Onedrawbackof thismethodisthatthehigherthefrequency,themorepowerandheatdissipation.Powerandheatdissipationisespeciallyaproblemforhand-helddevices.
2. UseHarvardarchitecturebyincreasingthenumberofbusestobringmoreinformation(codeanddata)intotheCPUtobeprocessed.Whileinthecaseofx86 and other general purpose microprocessors this architecture is veryexpensiveandunrealistic,intoday’smicrocontrollersthisisnotaproblem.AswesawinSection2.8,someoftheARMchipshaveHarvardarchitecture.
3. Change the internalarchitectureof theCPUandusewhat iscalledRISCarchitecture.
ARMhasusedall threemethods to increase theprocessingpowerof theARMmicrocontrollers.InthissectionwediscussthemeritsofRISCarchitecture.
In theearly1980s,acontroversybrokeout in thecomputerdesigncommunity,butunlikemostcontroversies, itdidnotgoaway.Since the1960s, inallmainframesandminicomputers,designersputasmanyinstructionsastheycouldthinkofintotheCPU.Someoftheseinstructionsperformedcomplextasks.Anexampleisaddingdatamemory locations and storing the sum into memory. Naturally, microprocessordesignersfollowedtheleadofminicomputerandmainframedesigners.Becausethesemicroprocessorsused sucha largenumberof instructions,manyofwhichperformedhighly complex activities, they came to be known as CISC (complex instruction setcomputer) processors. According to several studies in the 1970s, many of thesecomplex instructions etched into CPUs were never used by programmers andcompilers. The huge cost of implementing a large number of instructions (some ofthem complex) into the microprocessor, plus the fact that a good portion of thetransistorsonthechipareusedbytheinstructiondecoder,madesomedesignersthinkofsimplifyingandreducingthenumberofinstructions.Asthisconceptdeveloped,theresultingprocessorscametobeknownasRISC(reducedinstructionsetcomputer).
FeaturesofRISC
The following are some of the features ofRISC as implemented by theARMmicrocontroller.
Feature1
RISCprocessorshaveafixedinstructionsize.InaCISCmicroprocessorssuchasthex86,instructionscanbe1,2,3,oreven5bytes.Forexample,lookatthefollowinginstructionsinthex86:
CLRC;clearCarryflag,a1-byteinstruction
ADDAccumulator,#mybyte;a2-byteinstruction
LJMPtarget_address;a5-byteinstruction
This variable instruction size makes the task of the instruction decoder verydifficult because the size of the incoming instruction is never known. In a RISCarchitecture, the size of all instructions is fixed. Therefore, theCPU can decode theinstructionsquickly.This is likeabricklayerworkingwithbricksof thesamesizeasopposed tousingbricksofvariablesizes.Ofcourse, it ismuchmoreefficient tousebricks of the same size. In the last section we saw how the ARM uses 4-byteinstructions and if not all the 32 bits are needed to form the instruction it fillswithzeros.
Feature2
One of the major characteristics of RISC architecture is a large number ofregisters.AllRISCarchitectureshaveat least8or16registers.Ofthese16registers,onlya fewareassigned toadedicatedfunction.Oneadvantageofa largenumberofregistersisthatitavoidstheneedforalargestacktostoreparameters.AlthoughastackisimplementedonaRISCprocessor,itisnotasessentialasinCISCbecausesomanyregistersareavailable.InARMtheuseofalargenumberofgeneralpurposeregisters(GPRs)satisfiesthisRISCfeature.ThestackfortheARMiscoveredinChapter6.
Feature3
RISCprocessorshavea small instruction set.RISCprocessorshaveonlybasicinstructions such as ADD, SUB, MUL, LOAD, STORE, AND, OR, EOR, CALL,JUMP,andsoon.ThelimitednumberofinstructionsisoneofthecriticismsleveledattheRISCprocessorbecauseitmakesthejobofAssemblylanguageprogrammersmuchmoretediousanddifficultcomparedtoCISCAssemblylanguageprogramming.Thisisone reason that RISC is used more commonly in high-level language environmentssuchastheCprogramminglanguageratherthanAssemblylanguageenvironments.ItisinterestingtonotethatsomedefendersofCISChavecalledit“completeinstructionsetcomputer”insteadof“complexinstructionsetcomputer”becauseithasacompletesetofeverykindofinstruction.Howmanyoftheseinstructionsareusedandhowoftenisanothermatter.The limitednumberof instructions inRISC leads toprograms thatare large. Although these programs can use more memory, this is not a problembecausememoryischeap.Beforetheadventofsemiconductormemoryinthe1960s,however, CISC designers had to pack as much action as possible into a singleinstruction toget themaximumbangfor theirbuck. In theARMwehavearound50instructions. We will examine more of the instruction set for the ARM in futurechapters.
Feature4
At this point, one might ask, with all the difficulties associated with RISCprogramming, what is the gain? The most important characteristic of the RISCprocessoristhatmorethan99%ofinstructionsareexecutedwithonlyoneclockcycle,incontrasttoCISCinstructions.Evensomeofthe1%oftheRISCinstructionsthatareexecuted with two clock cycles can be executed with one clock cycle by juggling
instructionsaround(codescheduling).CodeschedulingisdiscussedinChapter7.WewillexaminetheinstructioncycletimeandpipeliningoftheARMinChapter7.
Feature5
RISCprocessorshaveseparatebusesfordataandcode.Inallthex86processors,likeallotherCISCcomputers,thereisonesetofbusesfortheaddress(e.g.,A0–A31inthe 80386) and another set of buses for data (e.g., D0–D31 in the 80386) carryingopcodes and operands in and out of the CPU. To access any section of memory,regardlessofwhetheritcontainscodeordataoperands,thesameaddressbusanddatabusareused.InmanyRISCprocessors, therearefoursetsofbuses:(1)asetofdatabusesforcarryingdata(operands)inandoutoftheCPU,(2)asetofaddressbusesforaccessingthedata,(3)asetofbusestocarrytheopcodes,and(4)asetofaddressbusesto access the opcodes. The use of separate buses for code and data operands iscommonlyreferred toasHarvardarchitecture.WeexaminedtheHarvardarchitectureoftheARMintheprevioussection.
Feature6
Because CISC has such a large number of instructions, each with so manydifferentaddressingmodes,microinstructions(microcode)areusedtoimplementthem.TheimplementationofmicroinstructionsinsidetheCPUemploysmorethan40–60%oftransistors inmanyCISCprocessors.RISCinstructions,however,dueto thesmallsetof instructions, are implementedusing thehardwiremethod.HardwiringofRISCinstructionstakesnomorethan10%ofthetransistors.
Feature7
RISC uses load/store architecture. In CISC microprocessors, data can bemanipulatedwhile it is still inmemory. For example, in instructions such as “ADDReg,Memory”, themicroprocessormust bring the contents of the external memorylocationintotheCPU,addittothecontentsoftheregister,thenmovetheresultbacktotheexternalmemorylocation.Theproblemistheremightbeadelayinaccessingthedatafromexternalmemory.Thenthewholeprocesswouldbestalled,preventingotherinstructions fromproceeding in thepipeline. InRISC,designersdidawaywith thesekindsof instructions. InRISC, instructionscanonly load fromexternalmemory intoregisters or store registers into externalmemory locations.There is nodirectwayofdoingarithmeticand logicoperationsbetweena register and thecontentsofexternalmemory locations. All these instructions must be performed by first bringing bothoperands into the registers inside the CPU, then performing the arithmetic or logicoperation,andthensendingtheresultbacktomemory.Thisideawasfirstimplementedby the Cray 1 supercomputer in 1976 and is commonly referred to as load/storearchitecture. In the last section, we saw that the arithmetic and logic operations arebetweentheGPRsregisters,butnoneinvolvesamemorylocation.Forexample,thereisno“ADDR1,RAM-Loc”instructioninARM.
In concluding this discussion of RISC processors, it is interesting to note thatRISC technologywasexploredby thescientistsat IBMin themid-1970s,but itwas
DavidPattersonof theUniversityofCaliforniaatBerkeleywho in1980brought themeritsofRISCconcepts totheattentionofcomputerscientists.Itmustalsobenotedthat in recent years CISC processors such as the Pentium have used some RISCfeatures in their design. This was the only way they could enhance the processingpowerof thex86processorsandstaycompetitive.Ofcourse, theyhad touse lotsoftransistorstodothejob,becausetheyhadtodealwithalltheCISCinstructionsofthex86processorsandthelegacysoftwareofDOS/Windows.
ReviewQuestions
1.WhatdoRISCandCISCstandfor?
2.Trueorfalse.TheCISCarchitectureexecutesthevastmajorityofitsinstructionsin2,3,ormoreclockcycles,whileRISCexecutestheminoneclock.
3.RISCprocessorsnormallyhavea_____(large,small)numberofgeneral-purposeregisters.
4.Trueorfalse.Instructionssuchas“ADDR16,ROMmemory”donotexistinRISCmicroprocessorssuchastheARM.
5.HowmanyinstructionsofARMare32-bitwide?
6.Trueorfalse.WhileCISCinstructionsareofvariablesizes,RISCinstructionsareallthesamesize.
7.WhichofthefollowingoperationsdonotexistfortheADDinstructioninRISC?
(a)registertoregister(b)immediatetoregister(c)memorytomemory
8. Trueorfalse.Harvardarchitectureusesthesameaddressanddatabusestofetchbothcodeanddata.
Section2.11:ViewingRegistersandMemorywithARMKeilIDETheARMmicroprocessorhasgreattoolsandsupportsystems,manyofthemfree
orinexpensive.ARMKeiluVisionisanassemblerandsimulatorprovidedforfreebyKeil Corporation and can be downloaded from the www.keil.com website. Seehttp://www.MicroDigitalEd.com for tutorials on how to use the Keil ARM uVisionassemblerandsimulator.
ManyassemblersandCcompilerscomewithasimulator.Simulatorsallowustoview the contents of registers and memory after executing each instruction (single-stepping). It is strongly recommended to use a simulator to single-step some of theprograms in this chapter and future chapters. Single-stepping a program with asimulatorgivesusadeeperunderstandingofmicrocontrollerarchitecture,inadditiontothefactthatwecanuseittofindtheerrorsinourprograms.
Figures2-27and2-28showscreenshots forARMsimulators fromARMKeiluVision.
Figure2-27:KeiluVisionScreenshot
Figure2-28:KeiluVisionScreenshot
ProblemsSection2.1:TheGeneralPurposeRegistersintheARM
1.ARMisa(n)_____-bitmicroprocessor.
2.Thegeneralpurposeregistersare_____bitswide.
3.ThevalueinMOVR2,#valueis_____bitswide.
4.ThelargestnumberthatanARMGPRregistercanhaveis_____inhex.
5.Whatistheresultofthefollowingcodeandwhereisitkept?
MOVR2,#0x15
MOVR1,#0x13
ADDR2,R1,R2
6.Whichofthefollowingsis(are)illegal?
(a)MOVR2,#0x50000(b)MOVR2,#0x50(c)MOVR1,#0x00
(d)MOVR1,255(e)MOVR17,#25(f)MOVR23,#0xF5
(g)MOV123,0x50
7.Whichofthefollowingis(are)illegal?
(a)ADDR2,#20,R1(b)ADDR1,R1,R2(c)ADDR5,R16,R3
8.Whatistheresultofthefollowingcodeandwhereisitkept?
MOVR9,#0x25
ADDR8,R9,#0x1F
9.Whatistheresultofthefollowingcodeandwhereisitkept?
MOVR1,#0x15
ADDR6,R1,#0xEA
10.Trueorfalse.Wehave32generalpurposeregistersintheARM.
Section2.2:TheARMMemoryMap
11.Trueorfalse.R13andR14arespecialfunctionregisters.
12.Trueorfalse.Theperipheralregistersaremappedtomemoryspace.
13.Trueorfalse.TheOn-chipFlashisthesamesizeinallmembersofARM.
14.Trueorfalse.TheOn-chipdataSRAMisthesamesizeinallmembersofARM.
15.WhatisthedifferencebetweentheEEPROManddataSRAMspaceintheARM?
16.CanwehaveanARMchipwithnoEEPROM?
17.CanwehaveanARMchipwithnodataRAM?
18.WhatisthemaximumnumberofbytesthattheARMcanaccess?
19.Findtheaddressofthelastlocationofon-chipFlashforeachofthefollowing,assumingthefirstlocationis0:
(a)ARMwith32KB(b)ARMwith8KB
(c)ARMwith64KB(d)ARMwith16KB
(e)ARMwith128KB(f)ARMwith256KB
20.Showthelowestandhighestvalues(inhex)thattheARMprogramcountercantake.
21.AgivenARMhas0x7FFFastheaddressofthelastlocationofitson-chipROM.Whatisthesizeofon-chipFlashforthisARM?
22.RepeatQuestion21for0x3FFF.
23.Findtheon-chipprogrammemorysizeinKfortheARMchipwiththefollowingaddressranges:
(a)0x0000–0x1FFF(b)0x0000–0x3FFF
(c)0x0000–0x7FFF(d)0x0000–0xFFFF
(e)0x0000–0x1FFFF(f)0x00000–0x3FFFF
24.Findtheon-chipprogrammemorysizeinKfortheARMchipswiththefollowingaddressranges:
(a)0x00000–0xFFFFFF(b)0x00000–0x7FFFF
(c)0x00000–0x7FFFFF(d)0x00000–0xFFFFF
(e)0x00000–0x1FFFFF(f)0x00000–0x3FFFFF
Section2.3:LoadandStoreInstructionsinARM
25.Showasimplecodetostorevalues0x30and0x97intolocations0x20000015and0x20000016,respectively.
26.Showasimplecodetoloadthevalue0x55intolocations0x20000030–0x20000038.
27.Trueorfalse.WecannotloadimmediatevaluesintothedataSRAMdirectly.
28.Showasimplecodetoloadthevalue0x11intolocations0x20000010–0x20000015.
29.RepeatProblem28,exceptloadthevalueintolocations0x20000034–0x2000003C.
Section2.4:ARMCPSR(CurrentProgramStatusRegister)
30.Thestatusregisterisa(n)______-bitregister.
31.WhichbitsofthestatusregisterareusedfortheCandZflagbits,respectively?
32.WhichbitsofthestatusregisterareusedfortheVandNflagbits,respectively?
33.IntheADDinstruction,whenisCraised?
34.IntheADDinstruction,whenisZraised?
35.WhatisthestatusoftheCandZflagsafterthefollowingcode?
LDRR0,=0xFFFFFFFF
LDRR1,=0xFFFFFFF1
ADDSR1,R0,R1
36.FindtheCflagvalueaftereachofthefollowingcodes:
(a)LDRR0,=0xFFFFFF54
LDRR5,=0xFFFFFFC4
ADDSR2,R5,R0
(b)MOVR3,#0
LDRR6,=0xFFFFFFFF
ADDSR3,R3,R6
(c)LDRR3,=0xFFFFFFFF
LDRR8,=0xFFFFFF05
ADDSR2,R3,R8
37.Writeasimpleprograminwhichthevalue0x55isadded5times.
Section2.5:ARMDataFormatandDirectives
38.Statethevalue(inhex)usedforeachofthefollowingdata:
MYDAT_1EQU55
MYDAT_2EQU98
MYDAT_3EQU‘G’
MYDAT_4EQU0x50
MYDAT_5EQU200
MYDAT_6EQU‘A’
MYDAT_7EQU0xAA
MYDAT_8EQU255
MYDAT_9EQU2_10010000
MYDAT_10EQU2_01111110
MYDAT_11EQU10
MYDAT_12EQU15
39.Statethevalue(inhex)foreachofthefollowingdata:
DAT_1EQU22
DAT_2EQU0x56
DAT_3EQU2_10011001
DAT_4EQU32
DAT_5EQU0xF6
DAT_6EQU2_11111011
40.Showasimplecodetoloadthevalue0x10102265intolocations0x40000030–0x4000003F.
41.Showasimplecodeto(a)loadthevalue0x23456789intolocations0x40000060–0x4000006F,and(b)addthemtogetherandplacetheresultinR9asthevaluesareadded.UseEQUtoassignthenamesTEMP0–TEMP3tolocations0x40000060–0x4000006F.
Section2.6:IntroductiontoARMAssemblyProgrammingand
Section2.7:AssemblinganARMProgram
42.Assemblylanguageisa________(low,high)-levellanguagewhileCisa________(low,high)-levellanguage.
43.OfCandAssemblylanguage,whichismoreefficientintermsofcodegeneration(i.e.,theamountofprogrammemoryspaceituses)?
44.Whichprogramproducestheobjfile?
45.Trueorfalse.Thesourcefilehastheextension“asm”.
46.Trueorfalse.Thesourcecodefilecanbeanon-ASCIIfile.
47.Trueorfalse.EverysourcefilemusthaveEQUdirective.
48.DotheEQUandENDdirectivesproduceopcodes?
49.Whyarethedirectivesalsocalledpseudocode?
50.Thefilewiththe_______extensionisdownloadedintoARMFlashROM.
51.GivethreefileextensionsproducedbyARMKeil.
Section2.8:TheProgramCounterandProgramROMSpaceintheARM
52.EveryARMfamilymemberwakesupataddress_________whenitispoweredup.
53.Aprogrammerputsthefirstopcodeataddress0x100.Whathappenswhenthe
microcontrollerispoweredup?
54.ARMinstructionsare_______bytes.
55.Writeaprogramtoaddeachofyour5-digitIDtoaregisterandplacetheresultintomemorylocation0x4000100.UsetheprogramlistingtoshowtheFlashmemoryaddressesandtheircontents.
56.Showtheplacementofdatainfollowingcode:
LDRR1,=0x22334455
LDRR2,=0x20000000
STRR1,[R2]
Usea)littleendianandb)bigendian.
57.Showtheplacementofdatainfollowingcode:
LDRR1,=0xFFEEDDCC
LDRR2,=0x2000002C
STRR1,[R2]
Usea)littleendianandb)bigendian.
58.HowwideisthememoryintheARMchip?
59.HowwideisthedatabusbetweentheCPUandtheprogrammemoryintheARM7chip?
60.In“ADDRd,Rn,operand2”,explainhowmanybitsaresetasideforRdandhowitcoverstheentireGPRsintheARMchip.
Section2.9:SomeARMAddressingModes
61.Givetheaddressingmodeforeachofthefollowing:
(a)MOVR5,R3 (b)MOVR0,#56
(c)LDRR5,[R3] (d)ADDR9,R1,R2
(e)LDRR7,[R2] (f)LDRBR1,[R4]
62.Showthecontentsofthememorylocationsaftertheexecutionofeachinstruction.
(a)LDRR2,=0x129F
LDRR1,=0x1450
LDRR2,[R1]
(b)LDRR4,=0x8C63
LDRR1,=0x2400
LDRHR4,[R1]
0x1450=(…….) 0x2400=(…….)
0x1451=(…….) 0x2401=(…….)
Section2.10:RISCArchitectureinARM
63.WhatdoRISCandCISCstandfor?
64.In______(RISC,CISC)architecturewecanhave1-,2-,3-,or4-byteinstructions.
65.In______(RISC,CISC)architectureinstructionsarefixedinsize.
66.In______(RISC,CISC)architectureinstructionsaremostlyexecutedinoneortwocycles.
67.In______(RISC,CISC)architecturewecanhaveaninstructiontoADDaregistertoexternalmemory.
68.Trueorfalse.MostinstructionsinCISCareexecutedinoneortwocycles.
AnswerstoReviewQuestionsSection2.1
1.MOVR2,#0x34
2.
MOVR1,#0x16
MOVR2,#0xCD
ADDR1,R1,R2
or
MOVR1,#0x16
ADDR1,R1,#0xCD
3.False
4.FFinhexand255indecimal
5.32
Section2.2
1.True
2.general-purposeregisters
3.32
4.Specialfunctionregisters(SFRs)
5.32
Section2.3
1.True
2.
MOVR1,#0x20
MOVR2,#0x95
STRBR2,[R1]
3.STRR2,[R8]
4.
MOVR1,#0x20
LDRR4,[R1]
5.FFinhexor255indecimal
6.R6
7.Itcopiesthelower8bitsofR1intolocationpointedtobyR2.
8.FFFFFFFFinhexor4,294,967,295indecimal(232-1)
Section2.4
1.CPSR(currentprogramstatusregister)
2.32bits
3.
Hex Binary
FFFFFF9F 11111111111111111111111110011111
+00000061 +00000000000000000000000001100001
100000000 100000000000000000000000000000000
ThisleadstoC=1andZ=1.
4.
Hex Binary
00000022 00000000000000000000000000100010
+00000022 +00000000000000000000000000100010
000000000 00000000000000000000000001000100
ThisleadstoZ=0.
5.
Hex Binary
00000067 00000000000000000000000001100111
+00000099 +00000000000000000000000010011001
00000100 00000000000000000000000100000000
ThisleadstoC=0andZ=0.
Section2.5
1.MOVR1,#0x20
2.(a)MOVR2,#0x14(b)MOVR2,#20(c)MOVR2,#2_00010100
3.Ifthevalueistobechangedlater,itcanbedoneonceinoneplaceinsteadofateveryoccurrenceandthecodebecomesmorereadable,aswell.
4.(a)0x34(b)0x1F
5.15indecimal(0x0Finhex)
6.Valueoflocation0x00000200=0x95
7.0x0C+0x10=0x1Cwillbeindatamemorylocation0x00000630.
Section2.6
1.TherealworkisperformedbyinstructionssuchasMOVandADD.Pseudo-instructions,alsocalledAssemblydirectives,directtheassemblerindoingitsjob.
2.Theinstructionmnemonics,pseudo-instructions
3.False
4.Allexcept(c)
5.Assemblerdirectives
6.True
7.(c)
Section2.7
1.True
2.True
3.(a)
4.(b)and(d)
Section2.8
1.32
2.True
3.0x00000000
4.True
5.False
6.True
Section2.9
1.No
2.Thegeneralpurposeregisters(R0toR15)
3.Itisapartoftheinstruction
Section2.10
1.RISCisreducedinstructionsetcomputer;CISCstandsforcomplexinstructionsetcomputer.
2.True
3.Large
4.True
5.Allofthem
6.True
7.(c)
8.False
Chapter3:ArithmeticandLogicInstructionsandProgramsIn this chapter, most of the arithmetic and logic instructions are discussed and
programexamples are given to illustrate the applicationof these instructions.Unsignednumbersareusedinthisdiscussionofarithmeticandlogicinstructions.InSection3.1weexamine the arithmetic instructions for unsigned numbers. The logic instructions andprograms are covered in Section 3.2. Section 3.3 is dedicated to rotate and shiftoperations.Weexamineloadingfixed(constant)valuesintoregistersusingrotateoptions,as well. In Section 3.4 we discuss the ARM Cortex instructions for rotate and shift.Section3.5isdedicatedtoBCDandASCIIdataconversion.
Section3.1:ArithmeticInstructionsUnsignednumbersaredefinedasdatainwhichallthebitsareusedtorepresentdata
andnobitsaresetasideforthepositiveornegativesign.Thismeansthattheoperandcanbe between 00 and 0xFF (0 to 255 decimal) for 8-bit data and between 0x0000 and0xFFFF(0to65535decimal)for16-bitdata.Forthe32-bitoperanditcanbebetween0and0xFFFFFFFF (0 to232 -1).SeeTable3-1.This sectioncovers theADD,SUB,andmultiplyinstructionsforunsignednumber.
DataSize Bits Decimal Hexadecimal Loadinstructionused
Byte 8 0–255 0-0xFF STRB
Half-word 16 0–65535 0-0xFFFF STRH
Word 32 0–232-1 0–0xFFFFFFFF STR
Table3‑1:UnsignedDataRangeSummaryinARM
AffectingflagsinARMinstructions
AuniquefeatureoftheexecutionofARMarithmeticinstructionsisthatitdoesnotaffect(updates)theflagsunlesswespecifyit.Thisisdifferentfromothermicrocontrollersand CPUs such as 8051 and x86. In the x86 and 8051 the arithmetic instructionsautomaticallychangetheZandCflagsregardlessofwewantitornot.Thisisnotthecasewith ARM. The default for the ARM instruction is not to affect these flags after theexecutionofarithmeticinstructionssuchasADDandSUB.TheARMassemblergivesustheoptionof telling theCPU toupdate the flagbits in theCSPR register to reflect theresult.WeoverridethedefaultbyhavingletterSintheinstruction.Forexample,wemustuseSUBSinsteadofSUBsincetheSUBinstructionwillnotupdatetheflags.TheSUBSmeanssubtractandsettheflags,whiletheSUBsimplysubtractswithouthavinganyeffectontheflags.SeeTable3-2andFigure3-1.
Figure3-1:GeneralFormationofDataProcessingInstruction
Instruction(Flagsunchanged) Instruction(Flagsupdated)
ADD Add ADDS Addandsetflags
ADC Addwithcarry ADCS Addwithcarryandsetflags
SUB SUBS SUBS Subtractandsetflags
SBC Subtractwithcarry SBCS Subtractwithcarryandsetflags
MUL Multiply MULS Multiplyandsetflags
UMULL Multiplylong UMULLS MultiplyLongandsetflags
RSB Reversesubtract RSBS Reversesubtractandsetflags
RSC Reversesubtractwithcarry RSCS Reversesubtractwithcarryandsetflags
Note:TheaboveinstructionaffectalltheZ,C,VandNflagbitsofCPSR(currentprogramstatusregister)buttheNandVflagsareforsigneddataandarediscussedinChapter5.
Table3-2:ArithmeticInstructionsandFlagBitsforUnsignedData
Additionofunsignednumbers
TheformoftheADDinstructionis
ADDRd,Rn,Op2;Rd=Rn+Op2
The instructions ADD and ADC are used to add two operands. The destinationoperandmustbearegister.TheOp2operandcanbearegisterorimmediate.Rememberthatmemory-to-registerormemory-to-memoryarithmeticandlogicoperationsareneverallowedinARMAssembly languagesince it isaRISCprocessor.The instructioncouldchangeanyoftheZ,C,N,orVbitsofthestatusflagregister,aslongasweusetheADDSinsteadofADD.TheeffectoftheADDSinstructionontheoverflow(V)andN(negative)flagsisdiscussedinChapter5sincetheyareusedinsignednumberoperations.LookatExamples3-1and3-2fortheeffectofADDSinstructiononZandCflags.
Example3-1
Showtheflagbitsofstatusregisterforthefollowingcases:
a)LDRR2,=0xFFFFFFF5;R2=0xFFFFFFF5(noticethe=sign)
MOVR3,#0x0B
ADDSR1,R2,R3;R1=R2+R3andupdatetheflags
b)LDRR2,=0xFFFFFFFF
ADDSR1,R2,#0x95;R1=R2+95andupdatetheflags
Solution:
a)
0xFFFFFFF5 11111111111111111111111111110101
+0x0000000B
+00000000000000000000000000001011
0x100000000 100000000000000000000000000000000
First,noticehowthe“LDRR2,=0xFFFFFFF5”pseudo-instructionloadsthe32-bitvalueintoR2register.AlsonoticetheuseofADDSinstructioninsteadofADDsincetheADDinstructiondoesnotupdatetheflags.Now,aftertheaddition,theR1register(destination)contains0andtheflagsareasfollows:
C=1,sincethereisacarryoutfromD31
Z=1,theresultoftheactioniszero(forthe32bits)
b)
0xFFFFFFFF 11111111111111111111111111111111
+0x00000095
+00000000000000000000000010010101
0x100000094 100000000000000000000000010010100
Aftertheaddition,theR1register(destination)contains0x94andtheflagsareasfollows:
C=1,sincethereisacarryoutfromD31
Z=0,theresultoftheactionisnotzero(forthe32bits)
Example3-2
Showtheflagbitsofstatusregisterforthefollowingcase:
LDRR2,=0xFFFFFFF1;R2=0xFFFFFFF1
MOVR3,#0x0F
ADDSR3,R3,R2;R3=R3+R2andupdatetheflags
ADDR3,R3,#0x7;R3=R3+0x7andflagsunchanged
MOVR1,R3
Solution:
0xFFFFFFF1 11111111111111111111111111110001
+0x0000000F +00000000000000000000000000001111
0x100000000 100000000000000000000000000000000
AftertheADDSaddition,theR3register(destination)contains0andtheflagsareasfollows:
C=1,sincethereisacarryoutfromD31
Z=1,theresultoftheactioniszero(forthe32bits)
Afterthe“ADDR3,R3,#0x7”addition,theR3register(destination)contains0x7(0x+07=07)andtheflagsareunchangedfrompreviousinstructionsinceweusedADDinsteadofADDS.Therefore,theZ=1andC=1.Ifweuse“ADDSR3,R3,#0x7”instructioninsteadof“ADDR3,R3,#0x7”,wewillhaveZ=0andC=0.UsetheKeilARMsimulatortoverifythis.
Comment
MicrosoftWindowscomeswithacalculator.Use it toverify thecalculations in thisandfuturechapters.Thecalculatorsupportsdatasizeofupto64-bit
NoincrementinstructioninARM
There is no increment instruction in theARMprocessor. Insteadwe useADD toperformthisaction.Theinstruction“ADDSR4,R4,#1”willincrementtheR4andplacestheresultinR4register.TheRISCprocessorseliminatetheunnecessaryinstructionsanduseanexistinginstructiontoperformtheoperation.
ADC(addwithcarry)
Thisinstructionisusedformultiword(datalargerthan32-bit)numbers.TheformoftheADCinstructionis
ADCRd,Rn,Op2;Rd=Rn+Op2+C
Indiscussingaddition,thefollowingtwocaseswillbeexamined:
1.Additionofindividualworddata
2.Additionofmultiworddata
CASE1:Additionofindividualworddata
Inpreviousexamples,inprogramsregardingaddition,thetotalsumwaspurposelykeptlessthan0xFFFFFFFF,themaximumvaluea32-bitregistercanhold.Inrealworldto calculate the total sumof anynumberof operands, the carry flag shouldbe checkedaftertheadditionofeachoperandtoseeifthetotalsumisgreaterthan0xFFFFFFFF.SeeExample3-3andProgram3-1.
Example3-3
Showtheflagbitsofstatusregisterforthefollowingcase:
LDRR2,=0xFFFFFFF1;R2=0xFFFFFFF1
MOVR3,#0x0F
ADDSR3,R3,R2;R3=R3+R2andupdatetheflags
ADDR3,R3,#0x7;R3=R3+0x7andflagsunchanged
MOVR1,R3
Solution:
0xFFFFFFF1 11111111111111111111111111110001
+0x0000000F +00000000000000000000000000001111
0x100000000 100000000000000000000000000000000
AftertheADDSaddition,theR3register(destination)contains0andtheflagsareasfollows:
C=1,sincethereisacarryoutfromD31
Z=1,theresultoftheactioniszero(forthe32bits)
Afterthe“ADDR3,R3,#0x7”addition,theR3register(destination)contains0x7(0x0+0x7=0x7)andtheflagsareunchangedfrompreviousinstructionsinceweusedADDinsteadofADDS.Therefore,theZ=1andC=1.Ifweuse“ADDSR3,R3,#0x7”instructioninsteadof“ADDR3,R3,#0x7”,wewillhaveZ=0andC=0.UsetheKeilARMsimulatortoverifythis.
Program3-1Writeaprogramtocalculatethetotalsumoffivewordsofdata.Eachdatavaluerepresentsthemassofaplanetininteger.Thedecimaldataareasfollow:1000000000,2000000000,3000000000,4000000000,and
4100000000.
AREAPROG3_1,CODE,READONLY
ENTRY
LDRR1,=1000000000
LDRR2,=2000000000
LDRR3,=3000000000
LDRR4,=4000000000
LDRR5,=4100000000
MOVR8,#0;R8=0forsavingthelowerword
MOVR9,#0;R9=0foraccumulatingthecarries
ADDSR8,R8,R1;R8=R8+R1
ADCR9,R9,#0;R9=R9+0+Carry
;(incrementR9ifthereiscarry)
ADDSR8,R8,R2;R8=R8+R2
ADCR9,R9,#0;R9=R9+0+Carry
ADDSR8,R8,R3;R8=R8+R3
ADCR9,R9,#0;R9=R9+0+Carry
ADDSR8,R8,R4;R8=R8+R4
ADCR9,R9,#0;R9=R9+0+Carry
ADDSR8,R8,R5;R8=R8+R5
ADCR9,R9,#0;R9=R9+0+Carry
HEREBHERE
END;Markendoffile
CASE2:Additionofmultiwordnumbers
AssumeaprogramisneededthatwilladdthetotalU.S.budgetforthelast100years
or themass of all the planets in the solar system. In cases like this, the numbers beingaddedcouldbeupto8byteswideormore.SinceARMregistersareonly32bitswide(4bytes),itisthejoboftheprogrammertowritethecodetobreakdowntheselargenumbersinto smaller chunks to be processed by the CPU. If a 32-bit register is used and theoperand is 8 bytes wide, that would take a total of two iterations. See Example 3-4.However,ifa16-bitregisterisused,thesameoperandswouldrequirefouriterations.Thisobviouslytakesmoretimefor theCPU.This isonereasontohavewideregisters in thedesignoftheCPU.
Example3-4
Analyzethefollowingprogramwhichadds0x35F62562FAto0x21F412963B:
LDRR0,=0xF62562FA;R0=0xF62562FA
LDRR1,=0xF412963B;R1=0xF412963B
MOVR2,#0x35;R2=0x35
MOVR3,#0x21;R3=0x21
ADDSR5,R1,R0;R5=0xF62562FA+0xF412963B
;nowC=1
ADCR6,R2,R3;R6=R2+R3+C
;=0x35+21+1=0x57
Solution:
AftertheR5=R0+R1thecarryflagisone.SinceC=1,whenADCisexecuted,R6 = R2 + R3 + C =0x35+0x21+1=0x57.
MicrosoftWindowscalculatorsupportdatasizeofup64-bit(doubleword).Useittoverifytheabovecalculations.
Subtractionofunsignednumbers
SUBRd,Rn,Op2;Rd=Rn-Op2
Insubtraction,theARMmicroprocessors(indeed,almostallmodernCPUs)usethe2’s complementmethod.Although everyCPU contains adder circuitry, itwould be too
cumbersome(and take toomany logicgates) todesignseparate subtractorcircuitry.Forthis reason, theARMuses internal adder circuitry toperform the subtractionoperation.AssumingthattheARMisexecutingsimplesubtractinstructions,onecansummarizethestepsofthehardwareoftheCPUinexecutingtheSUBinstructionforunsignednumbersasfollows:
1.Takethe2’scomplementofthesubtrahend(Op2operand).
2.Addittotheminuend(Rnoperand).
3.PlacetheresultindestinationRd.
4.Setthecarryflagifthereisacarry.
ThesefourstepsareperformedforeverySUBSinstructionbytheinternalhardwareoftheARMCPU.Itisafterthesefourstepsthattheresultisobtainedandtheflagsareset.Examples3-5through3-7illustratesthefoursteps.
Example3-5
Showthestepsinvolvedforthefollowingcases:
a)
MOVR2,#0x4F;R2=0x4F
MOVR3,#0x39;R3=0x39
SUBSR4,R2,R3;R4=R2–R3
b)
MOVR2,#0x4F;R2=0x4F
SUBSR4,R2,#0x05;R4=R2–0x05
Solution:
a)
0x4F 0000004F
–0x39 +FFFFFFC7 2’scomplementof0x39
16 100000016 (C=1step4)
Theflagswouldbesetasfollows:C=1,andZ=0.Theprogrammermustlookatthecarryflag(notthesignflag)todetermineiftheresultispositiveornegative.
b)
0x4F 0000004F
–0x05 +FFFFFFFB 2’scomplementof0x05
0x4A 10000004A (C=1step4)
Example3-6
Analyzethefollowinginstructions:
LDRR2,=0x88888888;R2=0x88888888
LDRR3,=0x33333333;R3=0x33333333
SUBSR4,R2,R3;R4=R2–R3
Solution:
Followingarethestepsfor“SUBR4,R2,R3”:
88888888 88888888
–33333333 +CCCCCCCD (2’scomplementof0x33333333)
55555555 155555555 (C=1step4)resultispositive
Notice that, unlikex86CPUs,ARMdoesnot invert the carry flagafterSUBSsoC=0 when there is borrow and C=1 when there is no borrow. It means that after theexecutionofSUBS,ifC=1,theresultispositive;ifC=0,theresultisnegativeandthedestination has the 2’s complement of the result. Normally, the result is left in 2’scomplement,butyoucantakethe2’scomplementoftheresultbyinvertingitandaddingonetoit.
Example3-7
Analyzethefollowinginstructions:
MOVR1,#0x4C;R1=0x4C
MOVR2,#0x6E;R2=0x6E
SUBSR0,R1,R2;R0=R1–R2
Solution:
Followingarethestepsfor“SUBR0,R1,R2”:
4C 0000004C
–6E +FFFFFF92 (2’scomplementof0x6E)
–22 0FFFFFFDE (C=0step4)resultisnegative
SBC(subtractwithborrow)
SBCRd,Rn,Op2;Rd=Rn–Op2–1+C
This instruction is used for subtraction of multiword (data larger than 32-bit)numbers. Notice that in some other architectures, the CPU inverts the C flag aftersubtraction so the content of carry flag is theborrowbit of subtract operation. In thosearchitectures the subtractwith borrow is implemented as “Rd=Rn –Op2 –C” but inARMthecarryflagisnotinvertedaftersubtractionandcarryflagisinvertofborrow.Toinvertthecarryflagwhilerunningthesubtractwithborrowinstructionitisimplementedas“Rd = Rn – Op2 –1+ C”SeeExample3-8.
Example3-8
Analyzethefollowingprogramwhichsubtracts0x21F62562FAfrom0x35F412963B:
LDRR0,=0xF62562FA;R0=0xF62562FA,
;noticethesyntaxforLDR
LDRR1,=0xF412963B;R1=0xF412963B
MOVR2,#0x21;R2=0x21
MOVR3,#0x35;R3=0x35
SUBSR5,R1,R0;R5=R1–R0
;=0xF412963B–0xF62562FA,andC=0
SBCR6,R3,R2;R6=R3–R2–1+C
;=0x35–0x21–1+0=0x13
Solution:
AftertheR5=R1–R0thereisaborrowsothecarryflagiscleared.SinceC=0,whenSBCisexecuted,R6=R3–R2–1+C=0x35–0x21–1+0=0x35–0x21–1=0x13.
NodecrementinstructioninARM
There isnodecrement instruction in theARMprocessor. InsteadweuseSUB toperform the action. The instruction “SUB R4,R4,#1” will decrement one from R4 andplaces the result in R4 register. The RISC processors eliminate the unnecessaryinstructionsanduseanexistinginstructiontoperformthedesiredoperation.
RSB(reversesubtract)
TheformatfortheRSBinstructionis
RSBRd,Rn,Op2;Rd=Op2–Rn
Notice the differencebetween theRSBandSUB instruction.They are essentiallythesameexcept thewaythesourceoperandsaresubtractedisreversed.This instructioncanbeusedtoget2’scomplementofa32-bitoperand.SeeExample3-9.
Example3-9
FindtheresultofR0forthefollowings:
a)
MOVR1,#0x6E;R1=0x6E
RSBR0,R1,#0;R0=0–R1
b)
MOVR1,#0x1;R1=1
RSBR0,R1,#0;R0=0–R1=0–1
Solution:
a)Followingarethestepsfor“RSBR0,R1,#0”:
0 0000000
–6E +FFFFFF92 (2’scomplement)
–6E FFFFFF92 (C=0)resultisnegative
b)Thisisonewaytogetafixedvalueof0xFFFFFFFFinaregister.Therefore,wehaveR0=0xFFFFFFFF.
RSC(reversesubtractwithcarry)
TheformoftheRSCinstructionis
RSCRd,Rn,Op2;Rd=Op2–Rn–1+C
Notice thedifferencebetween theRSBandRSCinstructions.Theyareessentiallythesameexcept thewaythesourceoperandsaresubtractedisreversed.This instructioncanbeusedtogetthe2’scomplementofthe64-bitoperand.SeeExample3-10.
Example3-10
Showhowtocreate2’scomplementofa64-bitdatainR0andR1register.TheR0holdthelower32-bit.
Solution:
LDRR0,=0xF62562FA;R0=0xF62562FA
LDRR1,=0xF812963B;R1=0xF812963B
RSBR5,R0,#0;R5=0–R0
;=0–0xF62562FA=9DA9D06andC = 0
RSCR6,R1,#0;R6=0–R1–1+C
;=0–0xF812963B – 1 + 0=7ED69C4
UseMicrosoftWindowscalculatortoverifytheabovecalculations.
Multiplicationanddivisionofunsignednumbers
Not all CPUs have instructions for multiplication and division. All the ARMprocessorshave amultiplication instructionbut not thedivision.Some familymemberssuchasARMCortexhaveboththedivisionandmultiplicationinstructions.Inthissectionwe examine the multiplication of unsigned numbers. Signed numbers multiplication istreatedinChapter5.
MultiplicationofunsignednumbersinARM
TheARMgivesyou twochoicesofunsignedmultiplication:normalmultiplyandlongmultiply.Thenormalmultiplyinstruction(MUL)isusedwhentheresultislessthan32-bit,whilethelongmultiply(MULL)mustbeusedwhentheresultisgreaterthan32-bit.SeeTable3-3.Inthissectionweexaminebothofthem.
Instruction Source1 Source2 Destination Result
MUL Rn Op2 Rd(32bits) Rd=Rn×Op2
UMULL Rn Op2 RdLo,RdHi(64bits) RdLo:RdHi=Rn×Op2
Note1:UsingMULforword×wordmultiplicationprovidesonlythelower32-bitresultinRdandtherestaredroppediftheresultisgreaterthan32-bit.Iftheresultisgreaterthan0xFFFFFFFF,thenwemustuseUMULL(unsignedMultiplyLong)instruction.
Note2:inword-by-wordmultiplicationusingMULinstruction,iftheresultisgreaterthan32-bitonlythelower32-bitissavedbyARMandtheupperpartisdroppedwithoutsettinganyflag.InsomeCPUstheCflagisusedtoindicatetheresultisgreaterthan32-bitbutthisisnotthecasewithARM.
Table3-3:UnsignedMultiplication(UMULRd,Rn,Op2)Summary
MUL(multiply)
MULRd,Rn,Op2;Rd=Rn×Op2
Innormalmultiplication,theoperandsmustbeinregisters.Afterthemultiplication,thedestinationregisterswillcontaintheresult.Seethefollowingexample:
MOVR1,#0x25;R1=0x25
MOVR2,#0x65;R2=0x65
MULR3,R1,R2;R3=R1×R2=0x65×0x25
Note that in the case of half-word times half-word or smaller sources since thedestinationregisteris32-bitthereisnoprobleminkeepingtheresultof65,535×65,535,the highest possible unsigned 16-bit data. That is not the case in word times wordmultiplicationbecause32-bit×32-bitcanproducearesultgreaterthan32-bit.IftheMULinstruction is used, the destination register will hold the lower word (32-bit) and theportion beyond 32-bit is dropped. So it is not safe to use MUL for multiplication ofnumbersgreater than65,536. Inmanymicroprocessorswhen theresultgoesbeyond thedestination register size, the C flag is raised to indicate that. This is not the casewithARM.Seethefollowingexample:
LDRR1,=100000;R1=100,000
LDRR2,=150000;R2=150,000
MULR3,R2,R1;R3isnot15,000,000,000because
;itcannotfitin32bits.
For this reason wemust use UMULL (unsignedmultiply long) instruction if theresultisgoingtobegreaterthan0xFFFFFFFF.
UMULL(unsignedmultiplylong)
UMULLRdLo,RdHi,Rn,Op2
;RdHi:RdLoRd=Rn×Op2
In unsigned long multiplication, the operands must be in registers. After themultiplication, the destination registerswill contain the result.Notice that the leftmost
register,RdLoinourcase,willholdthelowerwordandthehigherportionbeyond32-bitissavedinthesecondregister,RdHi.Seethefollowingexample:
LDRR1,=0x54000000;R1=0x54000000
LDRR2,=0x10000002;R2=0x10000002
UMULLR3,R4,R2,R1;0x54000000×0x10000002
;=0x054000000A8000000
;R3=0xA8000000,thelower32bits
;R4=0x05400000,thehigher32bits
Notice that it is the job of programmer to choose the best type ofmultiplicationdependingonthesizeofoperandsandtheresult.SeeExample3-11.
Example3-11
Writeashortprogramtomultiply0xFFFFFFFFbyitself.
Solution:
MOVR0,#1;R0=1
RSBR1,R0,#0;R1=0–R0
;=0xFFFFFFFF
RSBR2,R0,#0;R2=0xFFFFFFFF
UMULLR3,R4,R1,R2
Since 0xFFFFFFFF × 0xFFFFFFFF = 0xFFFFFFFE00000001, then R4=0xFFFFFFFEandR3=0x00000001.IfwehadusedMULinstruction,thenthe0xFFFFFFFFwouldhavebeendroppedandonly0x00000001wouldhavebeenkeptbythedestinationregister.
MultiplyandAccumulateInstructionsinARM
Insomeapplicationsuchasdigitalsignalprocessing(DSP)weneedtomultiplytworegistersandaddtheresultwithanotherregister.TheARMhasaninstructiontodobothjobs in a single instruction. The format of MLA (multiply and add) instruction is asfollows:
MLARd,Rm,Rs,Rn;Rd=Rm×Rs+Rn
Inmultiplicationandadd,theoperandsmustbeinregisters.Afterthemultiplicationandadd,thedestinationregisterwillcontaintheresult.Seethefollowingexample:
MOVR1,#100;R1=100
MOVR2,#5;R2=5
MOVR3,#40;R3=40
MLAR4,R1,R2,R3;R4=R1×R2+R3=100×5+40=540
Noticethatmultiplyandaddcanproducearesultgreaterthan32-bit, if theMLAinstruction is used, the destination register will hold the lower word (32-bit) and theportion beyond 32-bit is dropped. For this reason we must use UMLAL (unsignedmultiplyandadd long) instruction if theresult isgoing tobegreater than0xFFFFFFFF.TheformatofUMLALinstructionisasfollows:
UMLALRdLo,RdHi,Rn,Op2;RdHi:RdLo=Rn×Op2+RdHi:RdLo
Inmultiplicationandadd,theoperandsmustbeinregisters.Noticethattheaddendandthehighwordofthedestinationusethesameregisters,thetwoleftmostregistersintheinstruction.ItmeansthatthecontentsoftheregisterswhichhavetheaddendwillbechangedafterexecutionofUMLALinstruction.Seethefollowingexample:
LDRR1,=0x34000000;R1=0x34000000
LDRR2,=0x2000000;R2=0x2000000
EORR3,R3,R3;R3=0x00
LDRR4,=0x00000BBB;R4=0x00000BBB
UMLALR4,R3,R2,R1;0x34000000×0x2000000+0xBBB
;=0x068000000000000BBB
DivisionofunsignednumbersinARM
SomeARMfamilies donot have an instruction for divisionof unsignednumberssinceittakestoomanygatestoimplementit.WecanuseSUBinstructiontoperformthedivision. In the next chapter, after explaining conditional branches, we will show anexampleofunsigneddivisionusingsubtractoperation.
ReviewQuestions
1.ExplainthedifferencebetweenADDSandADDinstructions.
2.TheADCinstructionthathasthesyntax“ADCRd,Rn,Op2”means_____________.
3.ExplainwhytheZ=0forthefollowing:
MOVR2,#0x4F
MOVR4,#0xB1
ADDSR2,R4,R2
4.ExplainwhytheZ=1forthefollowing:
MOVR2,#0x4F
LDRR4,=0xFFFFFFB1
ADDSR2,R4,R2
5.ShowhowtheCPUwouldsubtract0x05from0x43.
6.IfC=1,R2=0x95,andR3=0x4Fpriortotheexecutionof“SBCR2,R2,R3”,whatwillbethecontentsofR2afterthesubtraction?
7.Inunsignedmultiplicationof“MULR2,R3,R4”,theproductwillbeplacedinregister__________.
8.Inunsignedmultiplicationof“MULR1,R2,R4”,theR2canbemaximumof___________ifR4=0xFFFFFFFF.
Section3.2:LogicInstructionsInthissectionwediscussthelogicinstructionsAND,OR,andEx-ORinthecontext
ofmanyexamples. Just likearithmetic instruction,wemustuse theS in the instructionsyntaxifwewanttoupdatetheflags.IftheSsyntaxisusedtheZflagwillbesetifandonlyiftheresultisallzeros,andtheNflagwillbesettothelogicalvalueofbit31oftheresult.TheVflagintheCPSRwillbeunaffected,andtheCflagwillbesettothecarryoutfromthebarrelshifterwhichisdiscussedinthenextsection.SeeTable3-4.
Instruction
(FlagsUnchanged)Action
Instruction
(FlagsChanged)Hexadecimal
AND ANDing ANDS Andingandsetflags
ORR ORRing ORS Oringandsetflags
EOR Exclusive-ORing EORS ExclusiveOringandsetflags
BIC BitClearing BICS Bitclearingandsetflags
Table3-4:LogicInstructionsandFlagBits
TheinstructionformatoflogicinstructionsinARMissimilartotheformatofotherdataprocessinginstructions.SeeFigure3-2.
Figure3-2:GeneralFormationofDataProcessingInstruction
AND
ANDRd,Rn,Op2;Rd=RnANDedOp2
Inputs Output Symbol
X Y XANDY
0 0 0
0 1 0
1 0 0
1 1 1
ThisinstructionwillperformabitwiselogicalANDontheoperandsandplacetheresult in thedestination.Thedestinationand the first sourceoperandsare registers.Thesecondsourceoperandcanbearegisteroranimmediatevalueoflessthan0xFF.
IfweuseANDSinsteadofANDitwillchangetheCandZflagsaccordingtotheresult.AsseeninExample 3-12,ANDcanbeusedtomaskcertainbitsoftheoperand.
Example3-12
Showtheresultsofthefollowingcases
a)
MOVR1,#0x35
ANDR2,R1,#0x0F;R2=R1ANDedwith0x0F
b)
MOVR0,#0x97
MOVR1,#0xF0
ANDR2,R0,R1;R2=R0ANDedwithR1
Solution:
a)
0x3500110101
AND0x0F00001111
0x0500000101
b)
0x9710010111
AND0xF011110000
0x9010010000
ORR
ORRRd,Rn,Op2;Rd=RnORedOp2
Inputs Output Symbol
X Y XORY
0 0 0
0 1 1
1 0 1
1 1 1
TheoperandsareORedandtheresultisplacedinthedestination.ORRcanbeusedtosetcertainbitsofanoperandtoone.Thedestinationandthefirstsourceoperandsareregisters.Thesecondsourceoperandcanbeeitheraregisteroranimmediatevalueoflessthan0xFF.
IfweuseORRSinsteadofORR,theflagswillbeupdated,justthesameasfortheANDSinstruction.SeeExample3-13.
Example3-13
Showtheresultsofthefollowingcases:
a)
MOVR1,#0x04;R1=0x04
ORRSR2,R1,#0x68;R2=R1ORed0x68
b)
MOVR0,#0x97
MOVR1,#0xF0
ORRR2,R0,R1;R2=R0ORedwithR1
Solution:
a)
0x0400000100
OR0x6801101000Flagwillbe:Z=0
0x6C01101100
b)
0x9710010111
OR0xF011110000Flagwillbeunchanged
0xF711110111
The ORR instruction can also be used to test for a zero operand. For example,“ORRSR2,R2,#0”willORtheregisterR2withzeroandmakeZ=1ifR2iszero.
EOR
EORRd,Rn,Op2;Rd=RnEx-ORedwithOp2
Inputs Output Symbol
X Y XEORY
0 0 0
0 1 1
1 0 1
1 1 0
TheEOR instructionwill Exclusive-OR the operands and place the result in thedestinationregister.EORsetstheresultbitsto1iftheyarenotequal;otherwise,theyarereset to 0. The flags are updated if we use EORS instead of EOR. The rules for theoperandsarethesameasintheANDandORinstructions.SeeExamples3-14and3-15.
Example3-14
Showtheresultsofthefollowing:
MOVR1,#0x54
EORR2,R1,#0x78;R2=R1ExOredwith0x78
Solution:
0x5401010100
XOR0x7801111000
0x2C00101100
Example3-15
TheEORinstructioncanbeusedtoclearthecontentsofaregisterbyEx-ORingitwithitself.
Showhow“EORSR1,R1,R1”clearsR1,assumingthatR1=0x45.
Solution:
0x4501000101
XOR0x4501000101
0x0000000000
EOR can also be used to see if two registers have the same value. “EORSR2,R3,R4”willmakeZ=1ifbothregistersR4andR3havethesamevalue,andiftheydo,theresult(00000000)issavedinR2,thedestination.
Another widely used application of EOR is to toggle bits of an operand. Forexample,totogglebit2ofregisterR2:
EORR2,R2,#0x04;EORR2with00000100
Thiswouldcausebit2ofR2tochangetotheoppositevalue;allotherbitswouldremainunchanged.
BIC(bitclear)
BICRd,Rn,Op2;clearcertainbitsofRnspecifiedby
;theOp2andplacetheresultinRd
Inputs Output
X Y XAND(NOTY)
0 0 0
0 1 0
1 0 1
1 1 0
TheBIC(bitclear) instruction isused toclear theselectedbitsof theRnregister.TheselectedbitsareheldbyOp2.ThebitsthatareHIGHinOp2willbeclearedandbitswith LOW will be left unchanged. For example, assuming that R3 = 00001000 theinstruction “BIC R2,R2,R3” will clear bit 3 of R2 and leaves the rest of the bitsunchanged.Inreality,theBICinstructionperformsANDoperationonRnregisterwiththecomplement ofOp2 and places the result in destination register. Look at the followingexample:
MOVR1,#0x0F
MOVR2,#0xAA
BICR3,R2,R1;nowR3=0xAAANDedwith0xF0=0xA0
Ifwewanttheflagstobeupdated,thenwemustuseBICSinsteadofBIC.
MVN(movenegative)
MVNRd,Rn;movenegativeofRntoRd
TheMVN(movenegative)instructionisusedtogenerateone’scomplementofanoperand.Forexample,theinstruction“MVNR2,#0”willmakeR2=0xFFFFFFFF.Lookatthefollowingexample:
LDRR2,=0xAAAAAAAA;R2=0xAAAAAAAA
MVNR2,R2;R2=0x55555555
WecanalsouseEx-ORinstructiontogenerateone’scomplementofanoperand.Ex-ORinganoperandwith0xFFFFFFFFwillgeneratethe1’scomplement.Seethefollowingcode:
LDRR2,=0xAAAAAAAA;R2=0xAAAAAAAA
MVNR0,#0;R0=0xFFFFFFFF
EORR2,R2,R0;R2=R2ExORedwith0xFFFFFFFF
;=0x55555555
Itmustbenotedthattheinstruction“MVNRd,#0”iswidelyusedtoloadthefixedvalueof0xFFFFFFFFintodestinationregister.Wecanusethe“LDRRd,=0xFFFFFFFF”pseudo-instruction to do the same thing, but theARMassemblerwill substitute severalrealARMinstructionsinitsplaceandthereforeittakesmorecodespace.
ReviewQuestions
1.Useoperands0x4FCAand0xC237toperform:
(a)AND(b)OR(c)XOR
2.ANDingawordoperandwith0xFFFFFFFFwillresultinwhatvalueforthewordoperand?Tosetallbitsofanoperandto0,itshouldbeANDedwith______.
3.Tosetallbitsofanoperandto1,itcouldbeORedwith_______.
4.XORinganoperandwithitselfresultsinwhatvaluefortheoperand?
5.Writeaninstructionthatsetsbit4ofR7.
6.Writeaninstructionthatclearsbit3ofR5.
Section3.3:RotateandBarrelShifter Although ARM Cortex has shift and rotate instruction, for the ARM7 we can
perform the shift and rotate operations as part of other instructions such as MOV. Inprevious sections we discussed that as the second argument of process instructions(arithmetic and logic instructions) we can use register or immediate values. In otherwords,theprocessinstructionscanbeusedinoneofthefollowingforms:
1.opcodeRd,Rn,Rs(e.g.ADDR1,R2,R3)
2.opcodeRd,Rn,immediateValue(e.g.ADDR2,R3,#5)
ARMisabletoshiftorrotatethesecondargumentbeforeusingitastheargument.Inthissectionwefirstdiscussshiftingandrotatingonregistersandthenwecovershiftingimmediatevalues.
BarrelShifter
Therearetwokindsofshifts:logicalandarithmetic.Thelogicalshiftisforunsignedoperandsandthearithmeticshiftisforsignedoperands.Logicalshiftwillbediscussedinthis section and the discussionof arithmetic shift is covered inChapter 5.UsingMOVinstructionsinARMonecanshift thecontentsofaregisterrightor left.Thenumberoftimes(orbits)thattheoperandisshiftedcanbespecifieddirectlyorthrougharegister.
LSR
Thisisthelogicalshiftright.Theoperandisshiftedrightbitbybit,andforeveryshift the LSB (least significant bit) will go to the carry flag (C) and the MSB (mostsignificantbit)isfilledwith0.Onecanuseanimmediateoperandoraregistertoholdthenumberoftimesitistobeshifted.Examples3-16and3-17shouldhelptoclarifyLSR.
Example3-16
ShowtheresultofLSRinthefollowing:
MOVR0,#0x9A;R0=0x9A
MOVSR1,R0,LSR#3;shiftR0toright3times
;andthenmove(copy)theresulttoR1
Solution:
0x9A=000000000000000000000000000000010011010
firstshift:000000000000000000000000000000001001101C=0
secondshift:000000000000000000000000000000000100110C=1
thirdshift:000000000000000000000000000000000010011C=0
Aftershiftingrightthreetimes,R1=0x00000013andC=0.Anotherwaytowritetheabovecodeis:
MOVR0,#0x9A
MOVR2,#0x03
MOVR1,R0,LSRR2;shiftR0torightR2times
;andmovetheresulttoR1
Example3-17
ShowtheresultsofLSRinthefollowing:
TIMESEQU0x4
LDRR1,=0x777;R1=0x777
MOVR2,#TIMES;R2=0x04
MOVSR3,R1,LSRR2;shiftR1rightR2numberoftimes
;andplacetheresultinR3
Solution:
Afterfourshifts,theR3willcontain0x77.ThefourLSBsarelostthroughthecarry,onebyone,and0sfillthefourMSBs.
AlthoughLSRdoesaffecttheSandZflags,theyarenotimportantinthiscase.Butnoticethatifyouwanttheflagstobeset,usetheMOVSinstructioninsteadofMOV.
OnecanusetheLSRtodivideanumberby2.SeeExample3-18.
Example3-18
ShowtheresultsofLSRinthefollowing:
LDRR0,=0x88;R0=0x88
MOVSR1,R0,LSR#3;shiftR0rightthreetimes(R1=0x11)
Solution:
Afterthethreeshifts,theR1willcontain0x11.Thisdividesthenumberby8since2tothepower3is8.
LSL
Shiftleftisalsoalogicalshift.ItisthereverseofLSR.Aftereveryshift,theLSBisfilledwith0andtheMSBgoestoC.AlltherulesarethesameasforLSR.Onecanuseanimmediateoperandora register tohold thenumberof times it is tobeshifted left.SeeExample3-19.OnecanusetheLSLtomultiplyanumberby2.SeeExample3-20.
Example3-19
ShowtheeffectsofLSLinthefollowing:
LDRR1,=0x0F000006
MOVSR2,R1,LSL#8
Solution:
00001111000000000000000000000110
C=000011110000000000000000000001100(shiftedleftonce)
C=000111100000000000000000000011000
C=001111000000000000000000000110000
C=011110000000000000000000001100000
C=111100000000000000000000011000000
C=111000000000000000000000110000000
C=110000000000000000000001100000000
C=100000000000000000000011000000000(shiftedeighttimes)
Aftereightshiftsleft,theR2registerhas0x00000600andC=1.TheeightMSBsarelostthroughthecarry,onebyone,and0sfilltheeightLSBs.Anotherwaytowritetheabovecodeis:
LDRR1,=0x0F000006
MOVR0,#0x08
MOVR2,R1,LSLR0
Example3-20
ShowtheresultsofLSLinthefollowing:
TIMESEQU0x5
LDRR1,#0x7;R1=0x7
MOVR2,#TIMES;R2=0x05
MOVR1,R1,LSLR2;shiftR1leftR2numberoftimes
;andplacetheresultinR1
Solution:
Afterthefiveshifts,theR1willcontain0x000000E0.0xE0is224indecimal.Noticethatitmultipliesnumberbypowerof2.7×32=224=0xE0since2tothepowerof5is32.
Table3-5liststhelogicalshiftoperationsinARM.
Operation Destination Source Numberofshifts
LSR(ShiftRight) Rd Rn Immediatevalue
LSR(ShiftRight) Rd Rn registerRm
LSL(ShiftLeft) Rd Rn Immediatevalue
LSL(ShiftLeft) Rd Rn registerRm
Note:Numberofshiftcannotbemorethan32.
Table3-5:LogicShiftoperationsforunsignednumbersinARM
ArithmeticshiftrightASR
ThearithmeticshiftASRisusedforsignednumbersandisdiscussedinChapter5.
Rotatingthebitsofanoperandrightandleft
Therearetwotypesofrotations.Oneisasimplerotationofthebitsoftheoperand,andtheotherisarotationthroughthecarry.Eachisexplainedbelow.
ROR(rotateright)
Inrotateright,asbitsareshiftedfromlefttorighttheyexitfromtherightend(LSB)andentertheleftend(MSB).Inaddition,aseachbitexitstheLSB,acopyofitisgiventothecarryflag.Inotherwords,inRORtheLSBismovedtotheMSBandisalsocopiedtoCflag,asshowninthediagram.Onecanuseanimmediateoperandoraregistertoholdthenumberoftimesitistoberotated.
MOVR1,#0x36
;R1=00000000000000000000000000110110
MOVSR1,R1,ROR#1
;R1=00000000000000000000000000011011C=0
MOVSR1,R1,ROR#1
;R1=10000000000000000000000000001101C=1
MOVSR1,R1,ROR#1
;R1=11000000000000000000000000000110C=1
or:
MOVR1,#0x36
;R1=00000000000000000000000000110110
MOVR0,#3
;R0=3numberoftimestorotate
MOVSR1,R1,RORR0
;R1=11000000000000000000000000000110C=1
alsolookatthefollowingcase:
LDRR2,=0xC7E5
;R2=00000000000000001100011111100101
MOVR4,#0x06
;R4=6numberoftimestorotate
MOVSR3,R2,RORR4
;R3=10010100000000000000001100011111C=1
Rotateleft
ThereisnorotateleftoptioninARM7sinceonecanusetherotateright(ROR)todothejob.Thatmeansinsteadofrotatingleftnbitswecanuserotateright32–nbitstodothe job of rotate left. Using this method does not give us the proper carry if actualinstructionofROLwasavailable.Lookatthefollowingexamples:
LDRR0,=0x00000072
;R0=00000000000000000000000001110010
MOVSR0,R0,ROR#31
;R0=00000000000000000000000011100100C=0
MOVSR0,R0,ROR#31
;R0=00000000000000000000000111001000C=0
MOVSR0,R0,ROR#31
;R0=00000000000000000000001110010000C=0
MOVSR0,R0,ROR#31
;R0=00000000000000000000011100100000C=0
or:
MOVR0,#0x72;R0=01110010
MOVR1,#28;R1=32–4=28
MOVSR0,R0,RORR1;R0=011100100000C=0
;assumeR2=0x672A
LDRR2,=0x671A
;R2=00000000000000000110011100101010
MOVSR2,R2,ROR#27
;R2=00000000000011001110010101000000
;C=0
RRXrotaterightthroughcarry
InRRX,asbitsareshiftedfromlefttoright,theyexitfromtherightend(LSB)tothecarryflag,and thecarryflagenters the leftend(MSB). Inotherwords, inRRXtheLSBismovedtoCandCismovedtotheMSB.Inreality,Cflagactsasifitispartoftheoperand.ThatmeanstheRRXislike33-bitregistersincetheCflagis33thbit.TheRRXtakesnoargumentsandthenumberoftimesanoperandtoberotatedisfixedatone.
;assumeC=0
MOVR2,#0x26
;R2=00000000000000000000000000100110
MOVSR2,R2,RRX
;R2=00000000000000000000000000010011C=0
MOVSR2,R2,RRX
;R2=00000000000000000000000000001001C=1
MOVSR2,R2,RRX
;R2=10000000000000000000000000000100C=1
MOVR2,#0x0F
;R2=00000000000000000000000000001111
MOVSR2,R2,RRX
;R2=00000000000000000000000000000111C=1
MOVSR2,R2,RRX
;R2=10000000000000000000000000000011C=1
MOVSR2,R2,RRX
;R2=11000000000000000000000000000001C=1
Table3-6liststherotateinstructionsoftheARM.
Operation Destination Source NumberofRotates
ROR(RotateRight) Rd Rn Immediatevalue
ROR(RotateRight) Rd Rn registerRm
RRX(RotateRightThroughCarry) Rd Rn 1bit
Table3-6:RotateoperationsforunsignednumbersinARM
ShiftingImmediateArguments
Wejustexaminedtherotateandshiftoperations.Onecanusetherotateoperationtoloadfixed(constant)valuesintoARMregister,aswell.ExaminetheMOVinstructionbitassignmentinFigure3-3.Ofthe32-bitopcode,theupper12bits(D31–D20)areusedfortheopcodeitself.Thelowest8(D7–D0)bitsareusedforfixedvaluesand4bits(D11–D8)areusedforthenumberoftimesrotateoperationisperformedbeforeloadedintotheregister.The8bitsforthefixedvaluegivesus00to255(00to0xFFinhex)rangeandthe4bitsoftherotatenumbergivesus0to15(0to0xFinhex)range.TheMOVinstructionscanbeconstructedtorotatean8-bitfixedvaluetotherightanumberoftimesandthenloadedintotheregister.Thenumberoftimesrotatedrightisalwaystwicethenumberintherotateportionoftheinstruction.Sincerotatevaluecanbe0–15thatgivesnumberofrotationsbetween0–30.Thismeans thatwhenever the secondoperand is an immediatevaluethenumberofrotationisanevennumber.
Figure3-3:MOVInstruction
Aswementioned earlier, theARM supports the rotate right only and there is norotateleftoption.Sotodorotateleftwemustuse32-nforthenumberofrotationright.Therefore, “MOV R0,#0xFF,#30” is the same as rotating left 2 times and “MOVR0,#0xFF,#28”isthesameasrotatingleft4times.SeeExamples3-21through3-23.
Example3-21
ShowallthepossiblecasesofusingMOVinstructionfor0xFFvalueandrotateoptions.
Solution:
MOVR0,#0xFF,#0
;R0=0xFFisrotatedright0times.R0=0x000000FF
MOVR0,#0xFF,#2
;R0=0xFFisrotatedright2times.R0=0xC000003F
MOVR0,#0xFF,#4
;R0=0xFFisrotatedright4times.R0=0xF000000F
MOVR0,#0xFF,#6
;R0=0xFFisrotatedright6times.R0=0xFC000003
MOVR0,#0xFF,#8
;R0=0xFFisrotatedright8times.R0=0xFF000000
MOVR0,#0xFF,#10
;R0=0xFFisrotatedright10times.R0=0x3FC00000
MOVR0,#0xFF,#12
;R0=0xFFisrotatedright12times.R0=0x0FF00000
MOVR0,#0xFF,#14
;R0=0xFFisrotatedright14times.R0=0x03FC0000
MOVR0,#0xFF,#16
;R0=0xFFisrotatedright16times.R0=0x00FF0000
MOVR0,#0xFF,#18
;R0=0xFFisrotatedright18times.R0=0x003FC000
MOVR0,#0xFF,#20
;R0=0xFFisrotatedright20times.R0=0x000FF000
MOVR0,#0xFF,#22
;R0=0xFFisrotatedright22times.R0=0x0003FC00
MOVR0,#0xFF,#24
;R0=0xFFisrotatedright24times.R0=0x0000FF00
MOVR0,#0xFF,#26
;R0=0xFFisrotatedright26times.R0=0x00003FC0
MOVR0,#0xFF,#28
;R0=0xFFisrotatedright28times.R0=0x00000FF0
MOVR0,#0xFF,#30
;R0=0xFFisrotatedright30times.R0=0x000003FC
Example3-22
UsingMOVinstruction,showhowtorotateleftthefixedvalueof0x99totalof(a)4,(b)8,and(c)16times.Alsogivethevalueintheregisteraftertherotation.
Solution:
Sincewedonothaverotateleftoperationwemustuserotateright32–ntimes.
MOVR1,#0x99,#28
;rotatingright28timesisthesameasrotateleft4times
MOVR2,#0x99,#24
;rotatingright24timesisthesameasrotateleft8times
MOVR3,#0x99,#16
;rotatingright16timesisthesameasrotateleft16times
Now,wehave(a)R1=0x00000990,(b)R2=0x00009900,and(c)R3=0x00990000
Example3-23
UsingtheKeilIDE,assembletheprograminExample3-21andexaminethelistfile.Compareandcontrastthecountvalueintheinstructionwithcountvalueoftheopcodes.
Solution:
Asexpected,thenumberoftimesrotatedrightistwicethenumberofrotatefield.
AlsoseeExample3-24forfurtherexamplesofrotateoperation.
Example3-24
Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.
MOVR0,#0xAA,#2
MOVR1,#0x20,#28
MOVR4,#0x99,#6
MOVR2,#0x55,#24
MOVR3,#0x01,#20
MOVR7,#0x80,#12
MOVR10,#0x0F,#14
MOVR5,#0x66,#2
Solution:
MOVR0,#0xAA,#2
;R0=0xAAisrotatedright2times.R0=0x8000002A
MOVR1,#0x20,#28
;R1=0x20isrotatedright28times.R0=0x00000200
MOVR4,#0x99,#6
;R4=0x99isrotatedright6times.R0=0x64000002
MOVR2,#0x55,#24
;R2=0x55isrotatedright24times.R0=0x00005500
MOVR3,#0x01,#20
;R3=0x01isrotatedright20times.R0=0x00001000
MOVR7,#0x80,#12
;R7=0x80isrotatedright12times.R0=0x08000000
MOVR10,#0x0F,#14
;R10=0x0Fisrotatedright14times.R0=0x003C0000
MOVR5,#0x66,#2
;R5=0x66isrotatedright2times.R0=0x80000019
WecanusetheconceptofrotatewithotherdataprocessinginstructionsoftheARMtoloadfixedvaluesintotheARMregister.SeeExample3-25.
Example3-25
Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.
MVNR0,#0xAA,#2
MVNR1,#0x20,#28
MVNR4,#0x99,#6
MVNR2,#0x55,#24
MVNR3,#0x01,#20
MVNR7,#0x80,#12
MVNR10,#0x0F,#14
MVNR5,#0x66,#2
Solution:
MVNR0,#0xAA,#2
;0xAArotatedright2times=0x8000002A;R0=0x7FFFFFD5
MVNR1,#0x20,#28
;0x20isrotatedright28times=0x00000200;R1=0xFFFFFDFF
MVNR4,#0x99,#6
;0x99isrotatedright6times=0x64000002;R4=0x9BFFFFFD
MVNR2,#0x55,#24
;0x55isrotatedright24times=0x00001A40;R2=0xFFFFAAFF
MVNR3,#0x01,#20
;0x01isrotatedright20times=0x00001000;R3=0xFFFFEFFF
MVNR7,#0x80,#12
;0x80isrotatedright12times=0x08000000;R7=0xF7FFFFFF
MVNR10,#0x0F,#14
;0x0Fisrotatedright14times=0x003C0000;R10=0xFFC3FFFF
MVNR5,#0x66,#2
;0x66isrotatedright2times=0x80000019;R5=0x7FFFFFE6
GeneralFormationofProcessinstruction
NextwewillshowhowthedifferentoperandsaresupportedbyARM.InFigure3-4youseethegeneralformationofprocessinstructions.
Figure3-4:DataProcessInstructions
InallARMinstructionsbits28–31areputasideforconditionfieldwhichiscoveredinChapter4.
Bits26and27are0inprocessinstructionsshowingthattheinstructionisaprocessinstruction and the opcode is represented by bits 24–21. Using 4 bits 16 differentinstructionsareprovidedasshowninFigure3-4.
TheSbit (bit20) shows if the flagsshouldbeupdated.WhenweaddanS (e.g.MOVS)thebitwillbesetwhichshowsthattheflagsshouldbeupdatedbytheCPU.
Bits12–15containthedestinationregister.Using4bitswecanselectregistersR0–R15.
Bits16–19representthefirstoperandregister.
Bit25(I)showsthetypeofsecondoperand.Thebitis1whenthesecondoperandisanimmediatevalue.Wheneverthesecondoperandisaregisterthisbitiszero.
Immediate values: In the case that I is 1, bits 0–7, contain an immediate valuewhichcanbeanumberbetween0–0xFF.
Second register: In the case that I is 0, bits 0–11 represent the second operandregister, togetherwith the amount of shift/rotate and type of shift/rotate.Asmentionedearlier the shift amount can be provided either by a register or an immediate value.WhenevertheIbitiscleared,bit4showsthewayshiftamountisprovided.SeeFigure3-5.
Figure3-5:DataProcessInstructions
Example3-26shouldhelptoclarifythis.
Example3-26
UsingtheKeilIDE,assemblethefollowingprogramandcomparetheinstructionwiththeprocessinstructionformat.
AREAEX3_26,CODE,READONLY
ENTRY
ADDR0,R1,R5,LSR#2
ADDR0,R1,R5,LSRR2
ADDR0,R1,R5,LSLR2
ADDR0,R1,R5
ADDR0,R1,#5,#2
ADDR0,R1,#5
H1BH1
END
Solution:
In“ADDR0,R1,R5,LSR#2”thesecondoperandisaregister.Therefore,theIbitiscleared.Sincetheshiftamountisanimmediatevalue,thebit4iscleared,aswell.Theshifttypeissetto01representingLSR.
In“ADDR0,R1,R5,LSRR2”thesecondoperandisaregister.Therefore,theIbitiscleared.Asaregisterprovidestheshiftamount,thebit4isset.
The“ADDR0,R1,R5,LSLR2”instructionisthesameas“ADDR0,R1,R5,LSRR2”exceptthattheshifttypeissetto00torepresentLSL.
Themachinecoderepresentsinfacttheinstruction“ADD R0,R1,R5,LSL#0”.But,ifweshiftanumber0bitstoleft,thenumberremainsunchanged.Asaresult,itrepresents“ADD R0,R1,R5”.
In“ADDR0,R1,#5,#2”thesecondoperandisimmediate.Therefore,theIbitisset.Inimmediateoperandsthevalueisshiftedtwicetherotatefield.Thatiswhytherotatefieldis1.
In“ADDR0,R1,#5”,theimmediatevalueisrotated0bits.Therefore,theimmediatevalueremainsunchanged.
LoadingfixedvaluesintoRegisters
In this section, we examined loading fixed values into registers. However, aswehaveseensofar,itisnotpossibletocreateeveryconstantvaluebyusingtherotateoptionoftheinstructions.
AnotherwayistousecombinationoftheseinstructionssuchasMVN,MOV,ADD,AND,OR,andsoontoloadanyvalueintoaregister.ForexamplethefollowingprogramloadsR1with0x87654321:
MOVR1,#0x21;R1=0x00000021
ORRR1,R1,#0x43,#24
;R1=0x00000021ORed0x00004300=0x00004321
ORRR1,R1,#0x65,#16
;R1=0x00004321ORed0x00650000=0x00654321
ORRR1,R1,#0x87,#8
;R1=0x00654321ORed0x87000000=0x87654321
But it is very tedious and time consuming to come upwith such combination ofinstructions to get the resultwe need. For this reason theARM assembler has pseudo-instructionsuchasLDR.TheLDRisnotarealinstructionlikeMOVandADDsinceyouwillnotfindtheopcodeforitintheARMmanual.WhentheARMassemblerassemblesaprogramwiththeLDRitreplacestheLDRwithagroupofinstructionstodothejobofloading a fixed value in themost efficient way possible. Aswe have seen in previouschapters, theLDRuses = sign for the immediate value instead of # used by theMOVinstruction. Indeed this is thewaywedistinguishbetween them.Examine the followingcodestoseethedifferencesinsyntaxes:
MOVR1,#0x55;R1=0x55
LDRR2,=0x5555;R2=0x5555
ItneedstobeemphasizedthatwemustusetheMOVinstructionforloadingfixed
valueoflessthan(orequalto)0xFFandusetheLDRpseudo-instructiononlyforloadingvalues larger than0xFF.This isdue to thefact thatMOVisareal instructionand takesless memory space when it is compiled. See Chapter 6 for more on LDR and ADRpseudo-instructions.
ReviewQuestions
1.FindthecontentsofR3afterexecutingthefollowingcode:
MOVR0,#0x04
MOVR3,R0,LSR#2
2.FindthecontentsofR4afterexecutingthefollowingcode:
LDRR1,=0xA0F2
MOVR2,#0x3
MOVR4,R1,LSRR2
3.FindthecontentsofR3afterexecutingthefollowingcode:
LDRR1,=0xA0F2
MOVR2,#0x3
MOVR3,R1,LSLR2
4.FindthecontentsofR5afterexecutingthefollowingcode:
SUBSR0,R0,R0
MOVR0,#0xAA
MOVR5,R0,ROR#4
5.FindthecontentsofR0afterexecutingthefollowingcode:
LDRR2,=0xA0F2
MOVR1,0x4
MOVR0,R2,RRXR1
6.GivetheresultinR1forthefollowing:
MVNR1,#0x01,#2
7.GivetheresultinR2forthefollowing:
MVNR2,#0x02,#28
Section3.4:ShiftandRotateInstructionsinARMCortex(CaseStudy)ARMCortexhasnew instructions specifically for shift and rotate. In this section,
wediscusstheseinstructions.ForfullinstructionsetseeAppendixA.ForthepurposeofcomparisonwehavecopiedthissectionfromAppendixA.
LSLLogicalShiftLeft
LSLRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedleft,theMSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLdoesnotupdatestheflags.
Example1:
LDRR2,=0x00000010
LSLR0,R2,#8;R0=R2isshiftedleft8times
;now,R0=0x00001000,flagsnotupdated
Example2:
LDRR0,=0x00000018
MOVR1,#12
LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes
;now,R2=0x000018000,flagsnotupdated
Example3:
LDRR0,=0x0000FF18
MOVR1,#16
LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes
;now,R2=0xFF180000,flagsnotupdated
LSLSLogicalShiftLeft(updatetheflags)
LSLSRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedleft,theMSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLSupdatestheflags.
Example1:
LDRR2,=0x00000010
LSLSR0,R2,#8;R0=R2isshiftedleft8times
;now,R0=0x00001000,C=0,N=0,Z=0
Example2:
LDRR0,=0x00000018
MOVR1,#12
LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes
;now,R2=0x000018000,C=0,N=0,Z=0
Example3:
LDRR0,=0x000FFF18
MOVR1,#16
LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes
;now,R2=0xFF180000,C=1,Z=0,N=0
Thelogicalshiftleftisusedforshiftingunsignednumbers.LSLSessentiallymultipliesRmbyapowerof2aftereachbitisshifted.
LSRLogicalShiftRight
LSRRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRdoesnotupdatetheflags.
Example1:
LDRR2,=0x00001000
LSRR0,R2,#8;R0=R2isshiftedright8times
;now,R0=0x00000010,C=0
Example2:
LDRR0,=0x000018000
MOVR1,#12
LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;now,R2=0x00000018,C=0
Example3:
LDRR0,=0x7F180000
MOVR1,#16
LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;now,R2=0x00007F18,C=0
Thelogicalshiftrightisusedforshiftingunsignednumbers.LSRessentiallydividesRmbyapowerof2aftereachbitisshifted.
LSRSLogicalShiftRight(updatetheflags)
LSRSRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRSupdatestheflags.
Example1:
LDRR2,=0x00001FFF
LSRSR0,R2,#8;R0=R2isshiftedright8times
;now,R0=0x0000001F,C=1,N=0,Z=0
Example2:
LDRR0,=0x00000018
MOVR1,#12
LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;now,R2=0x000000000,C=0,N=0,Z=1,
Example3:
LDRR0,=0x000FFF18
MOVR1,#16
LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;now,R2=0x0000000F,C=1,Z=0,N=0
The logical shift right is used for shifting unsigned numbers. LSRS essentiallydividesRmbyapowerof2aftereachbitisshifted.
RORRotateRight
RORRd,Rm,Rn;Rd=rotateRmrightRnbitpositions
Function:AseachbitofRmregisterisshiftedfromlefttoright,theyexitfromtheend(LSB)andenteredfromleftend(MSB).ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORdoesnotupdatetheflags.
Example1:
LDRR2,=0x00000010
RORR0,R2,#8;R0=R2isrotatedright8times
;now,R0=0x10000000,C=0
Example2:
LDRR0,=0x00000018
MOVR1,#12
RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes
;now,R2=0x01800000,C=0
Example3:
LDRR0,=0x0000FF18
MOVR1,#16
RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes
;now,R2=0xFF180000,C=0
RORSRotateRight(updatetheflags)
RORSRd,Rm,Rn;Rd=rotateRmrightRnbitpositions
Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andenterfromtheleftend(MSB).InadditionaseachbitexitstheLSB,acopyofitisgiventoCflag.ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORSupdatestheflags.
Example1:
LDRR2,=0x00000010
RORSR0,R2,#8;R0=R2isrotatedright8times
;now,R0=0x01000000,C=0,N=0,Z=0
Example2:
LDRR0,=0x00000018
MOVR1,#12
RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes
;now,R2=0x01800000,C=0,N=0,Z=0
Example3:
LDRR0,=0x0000FF18
MOVR1,#16
RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes
;now,R2=0xFF180000,C=1,N=0,Z=0
RRXRotateRightwithextend
RRXRd,Rm;Rd=rotateRmright1bitposition
Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXdoesnotupdatetheflags.
Example:
LDRR2,=0x00000002
RRXR0,R2;R0=R2isshiftedrightonebit
;now,R0=0x00000001
RRXSRotateRightwithextend(updatetheflags)
RRXSRd,Rm;Rd=rotateRmright1bitposition
Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXSupdatestheflags.
Example1:
LDRR2,=0x00000002
RRXSR0,R2;R0=R2isshiftedrightonebit
;now,R0=0x00000001
ReviewQuestions
1.Trueorfalse.ARMCortexhasitsowninstructionforrotateandshift.
2.FindthecontentsofR3afterexecutingthefollowingcode:
MOVR1,#0x08
RORR2,R1,#2
3.FindthecontentsofR4afterexecutingthefollowingcode:
MOVR2,#0x3
LSLR5,R3,#2
Section3.5:BCDandASCIIConversionThissectioncoversbinary,BCD,andASCIIconversionswithsomeexamples.
BCDnumbersystem
BCDstandsforbinarycodeddecimal.BCDisneededbecauseweusethedigits0to9fornumbersineverydaylife.BCDsystemiswidelyusedinreal-timeclock(RTC)oftheembeddedsystems.Binaryrepresentationof0to9iscalledBCD.IncomputerliteratureoneencounterstwotermsforBCDnumbers:(1)unpackedBCD,and(2)packedBCD.
Digit BCD
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
Table3-9:BCDCodes
UnpackedBCD
InunpackedBCD,thelower4bitsofthenumberrepresenttheBCDnumberandtherestofthebitsare0.Forexample,“00001001”and“00000101”areunpackedBCDfor9and5,respectively.InthecaseofunpackedBCDittakes1byteofmemorylocationoraregisterof8bitstoholdthenumber.
PackedBCD
In the caseofpackedBCD, a singlebytehas twoBCDnumbers in it, one in thelower4bitsandoneintheupper4bits.Forexample,“01011001”ispackedBCDfor59.Ittakesonly1byteofmemorytostorethepackedBCDoperands.ThisisonereasontousepackedBCDsinceitistwiceasefficientinstoringdata.
ASCIInumbers
InASCIIkeyboards,whenkey“0”ispressed,“0110000”(0x30)isprovidedtothecomputer.Inthesameway,0x31(0110001)isprovidedforkey“1”,andsoon,asshown
inthefollowinglist:
Key ASCII Binary(hex) BCD(unpacked)
0 30 0110000 00000000
1 31 0110001 00000001
2 32 0110010 00000010
3 33 0110011 00000011
4 34 0110100 00000100
5 45 0110101 00000101
6 36 0110110 00000110
7 37 0110111 00000111
8 38 0111000 00001000
9 39 0111001 00001001
Itmust be noted that althoughASCII is standard in theUnited States (andmanyother countries), BCD numbers have universal application. Now since the keyboard,printers,andmonitorsareallinASCII,howdoesdatagetconvertedfromASCIItoBCD,andviceversa?Thesearethesubjectscoverednext.
ASCIItounpackedBCDconversion
ToconvertASCIIdatatounpackedBCD,theprogrammermustgetridofthetagged“011” in theupper4bitsof theASCII.Todo that,eachASCIInumber isANDedwith“00001111”(0x0F).
ASCIItopackedBCDconversion
ToconvertASCIItopackedBCD,itisfirstconvertedtounpackedBCD(togetridofthe3)andthencombinedtomakepackedBCD.Forexample,for2and7thekeyboardgives0x32and0x37,respectively.Thegoalistoproduce0x27or“00100111”,whichiscalledpackedBCD,asdiscussedearlier.Thisprocessisillustratedindetailbelow.
Key ASCII UnpackedBCD PackedBCD
2 32 00000010
7 37 00000111 00100111(0x27)
MOVR1,#0x37;R1=0x37
MOVR2,#0x32;R2=0x32
ANDR1,R1,#0x0F;mask3togetunpackedBCD
ANDR2,R2,#0x0F;mask3togetunpackedBCD
MOVR3,R2,LSL#4;shiftR24bitstolefttogetR3=0x20
ORRR4,R3,R1;ORthemtogetpackedBCD,R4=0x27
PackedBCDtoASCIIconversion
Fordata tobedisplayedon themonitororbeprintedby theprinter, itmustbe inASCII format. Conversion from packed BCD to ASCII is discussed next. To convertpackedBCDtoASCII,itmustfirstbeconvertedtounpackedandthentheunpackedBCDis tagged with 011 0000 (0x30). The following shows the process of converting frompackedBCDtoASCII.
PackedBCD UnpackedBCD ASCII
0x29 0x02&0x09 0x32&0x39
00101001 00000010&00001001 0110010&0111001
MOVR0,#0x29
ANDR1,R0,#0x0F;maskupperfourbits
ORRR1,R1,#0x30;combinewith30togetASCII
MOVR2,R0,LSR#04;shiftright4bitstogetunpackedBCD
ORRR2,R2,#0x30;combinewith30togetASCII
ReviewQuestions
1.Forthefollowingdecimalnumbers,givethepackedBCDandunpackedBCDrepresentationsinbinary
(a)15(b)99
2.ForthefollowingpackedBCDnumbers,givethedecimalandunpackedBCDrepresentations.
(a)0x41(b)0x09
3.Repeatquestion2forASCII.
ProblemsSection3.1:ArithmeticInstructions
1.FindCandZflagsforeachofthefollowing.Alsoindicatetheresultoftheadditionandwheretheresultissaved.
(a)
MOVR1,#0x3F
MOVR2,#0x45
ADDSR3,R1,R2
(b)
LDRR0,=0x95999999
LDRR1,=0x94FFFF58
ADDSR1,R1,R0
(c)
LDRR0,=0xFFFFFFFF
ADDSR0,R0,#1
(d)
LDRR2,=0x00000001
LDRR1,=0xFFFFFFFF
ADDSR0,R1,R2
ADCSR0,R0,#0
(e)
LDRR0,=0xFFFFFFFE
ADDSR0,R0,#2
ADCR1,R0,#0x0
2.StatethethreestepsinvolvedinaSUBandshowthestepsforthefollowingdata.
(a)0x23–0x12(b)0x43–0x51(c)0x99–0x99
Section3.2:LogicInstructions
3.Assumethatthefollowingregisterscontainthesehexcontents:R0=0xF000,R1=0x3456,andR2=0xE390.Performthefollowingoperations.Indicatetheresultandtheregisterwhereitisstored.
Note:theoperationsareindependentofeachother.
(a)ANDR3,R2,R0 (b)ORRR3,R2,R1
(c)EORR0,R0,#0x76 (d)ANDR3,R2,R2
(e)EORR0,R0,R0 (f)ORRR3,R0,R2
(g)ANDR3,R0,#0xFF (h)ORRR3,R0,#0x99
(i)EORR3,R1,R0 (j)EORR3,R1,R1
4.GivethevalueinR2afterthefollowingcodeisexecuted:
MOVR0,#0xF0
MOVR1,#0x55
BICR2,R1,R0
5.GivethevalueinR2afterthefollowingcodeisexecuted:
LDRR1,=0x55555555
MVNR0,#0
EORR2,R1,R0
Section3.3:RotateandBarrelShifter
6.AssumingC=0,whatisthevalueofR1afterthefollowing?
MOVR1,#0x25
MOVSR1,R1,ROR#4
7.AssumingC=0,whatarethevaluesofR0andCafterthefollowing?
LDRR0,=0x3FA2
MOVR2,#8
MOVSR0,R0,RORR2
8.AssumingC=0whatisthevalueofR2andCafterthefollowing?
MOVR2,#0x55
MOVSR2,R2,RRX
9.AssumingC=0whatisthevalueofR1afterthefollowing?
MOVR1,#0xFF
MOVR3,#5
MOVSR1,R1,RORR3
10.Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.
a)MOVR1,#0x88,#4 b)MOVR0,#0x22,#22
c)MOVR2,#0x77,#8 d)MOVR4,#0x5F,#28
e)MOVR6,#0x88,#22 f)MOVR5,#0x8F,#16
g)MOVR7,#0xF0,#20 h)MOVR1,#0x33,#28
11.Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.
a)MVNR2,#0x1 b)MVNR2,#0xAA,#20
c)MVNR1,0x55,#4 d)MVNR0,#0x66,#28
e)MVNR1,#0x80,#24 f)MVNR6,#0x10,#20
g)MVNR7,#0xF0,#24 h)MVNR4,#0x99,#4
12.FindthecontentsofregistersandCflagafterexecutingeachofthefollowingcodes:
a)
MOVR0,#0x04
MOVSR1,R0,LSR#2
MOVSR3,R0,LSRR1
b)
LDRR1,=0xA0F2
MOVR2,#0x3
MOVSR3,R1,LSLR2
c)
LDRR1,=0xB085
MOVR2,#3
MOVSR4,R1,LSRR2
13.FindthecontentsofregistersandCflagafterexecutingeachofthefollowingcodes:
a)
SUBSR2,R2,R2
MOVR0,#0xAA
MOVSR1,R0,ROR#4
b)
MOVR2,#0xAA,#4
MOVR0,#1
MOVSR1,R2,RORR0
c)
LDRR1,=0x1234
MOVR2,#0x010,#2
MOVSR1,R0,RORR2
d)
MOVR0,#0xAA
MOVSR1,R0,RRX
14.UsingMOVinstruction,showhowyourotateleftthefixedvalueof0x33totalofa)4,b)8,andc)12times.Alsogivethevalueintheregisteraftertherotation.
Section3.5:BCDandASCIIConversion
15.Writeaprogramtoconvert0x76frompackedBCDnumbertoASCII.PlacetheASCIIcodesintoR1andR2.
16.For3and2thekeyboardgives0x33and0x32,respectively.Writeaprogramtoconvert0x33and0x32topackedBCDandstoretheresultinR2.
AnswerstoReviewQuestionsSection3.1:ArithmeticInstructions
1.TheADDSinstructionupdatestheflagbitswhileADDdoesnotdothat.
2.Rd=Rn+Op2+C
3.0x4F+0xB1=0x100,sinceitisabyteadditionandresultislessthan32-bittheC=0andZ=0.
4.0x4F+0xFFFFFFB1=00000000,sinceitisawordadditionandresultisgreaterthan32-bit,theC=1andZ=1.
5.
0x43 01000011 00000000000000000000000001000011
–0x05 00000101 2’scomplement= +11111111111111111111111111111011
0x3E 100000000000000000000000000111110
C=1;therefore,theresultispositive
6.R2=R2–R3–C+1=0x95–0x4F–1+1=0x46
7.R2
8.R2=1
Section3.2:LogicInstructions
1.(a)0x4202(b)0xCFFF(c)0x8DFD
2.Theoperandwillremainunchanged;allzeros
3.Allones
4.Allzeros
5.ORRR7,R7,#0x10;R7=R7ORed00010000
6.ANDR5,R5,#0x8;R5=R5ORed00001000
Section3.3:RotateandBarrelShifterOperation
1.R3=1
2.R4=0x0000141E
3.R3=0x00050790
4.R5=0xA000000A
5.R0=0x00005079
6.0xBFFFFFFF
7.0xFFFFFFDF
Section3.4:ShiftandRotateInstructionsinARMCortex
1.True
2.0x01
3.0x0C
Section3.5:BCDandASCIIConversion
1.(a)15=00010101packedBCD=0000000100000101unpackedBCD
(b)99=10011001packedBCD=0000100100001001unpackedBCD
2.(a)0x41=0000010000000001unpackedBCD=41indecimal
(b)0x09=0000000000001001unpackedBCD=9indecimal
3.(a)0x34,0x31
(b)0x30,0x39
Chapter4:Branch,Call,andLoopinginARMIn the sequence of instructions to be executed, it is often necessary to transfer
programcontroltoadifferentlocation.(e.g.whenafunctioniscalled,executionofaloopisrepeated,oraninstructionexecutesconditionally)TherearemanyinstructionsinARMto achieve this. This chapter covers the control transfer instructions available in ARMAssembly language. InSection4.1,wediscuss instructionsused for looping, aswell asinstructionsforconditionalandunconditionalbranches(jumps).Inthesecondsection,weexamine the instructions associated with calling subroutine. In Section 4.3, instructionpipeliningoftheARMisexamined.Instructiontimingandtimedelaysubroutinesarealsodiscussed in Section 4.3. In Section 4.4, we examine the conditional execution of theARMinstructionswhichisauniquefeatureofARM.
Section4.1:LoopingandBranchInstructionsInthissectionwefirstdiscusshowtoperformaloopingactioninARMandthenthe
branch(jump)instructions,bothconditionalandunconditional.
LoopinginARM
Repeating a sequenceof instructionsor anoperation a certainnumberof times iscalleda loop.The loop isoneof themostwidelyusedprogramming techniques. In theARM,thereareseveralwaystorepeatanoperationmanytimes.Onewayistorepeattheoperationoverandoveruntilitisfinished,asshownbelow:
MOVR0,#0;R0=0
MOVR1,#9;R1=9
ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x09)
ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x12)
ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x1B)
ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x24)
ADDR0,R0,R1;R0=0x2D
ADDR0,R0,R1;R0=0x36
Intheaboveprogram,weadd0x9toR0sixtimes.Thatmakes6×9=54=0x36.One problemwith the above program is that toomuch code spacewould be needed toincrease the number of repetitions to 50or 1000. Amuchbetterway is to use a loop.Next,wedescribethemethodtodoaloopinARM.
UsinginstructionBNEforlooping
TheBNE(branch ifnotequal) instructionuses thezero flag in thestatus register.TheBNEinstructionisusedasfollows:
BACK………;startoftheloop
………;bodyoftheloop
………;bodyoftheloop
SUBSRn,Rn,#1;Rn=Rn-1,settheflagZ=1ifRn=0
BNEBACK;branchifZ=0
Inthelasttwoinstructions,theRn(e.g.R2orR3)isdecremented;ifitisnotzero,itbranches(jumps)backtothetargetaddressreferredtobythelabel.Priortothestartoftheloop,theRnisloadedwiththecountervalueforthenumberofrepetitions.NoticethattheBNE instruction refers to the Z flag of the status register affected by the previousinstruction,SUBS.ThisisshowninExample4-1.
Example4-1
Writeaprogramto(a)clearR0,(b)add9toR0athousandtimes,then
(c)placethesuminR4.
UsethezeroflagandBNEinstruction.
Solution:
;–thisprogramaddsvalue9totheR0a1000times–
AREAEXAMPLE4_1,CODE,READONLY
ENTRY
LDRR2,=1000;R2=1000(decimal)forcounter
MOVR0,#0;R0=0(sum)
AGAINADDR0,R0,#9;R0=R0+9(add09toR1,R1=sum)
SUBSR2,R2,#1;R2=R2-1andsettheflags.Decrementcounter
BNEAGAIN;repeatuntilCOUNT=0(whenZ=1)
MOVR4,R0;storethesuminR4
HEREBHERE;stayhere
END
IntheprograminExample 4-1,registerR2isusedasacounter.Thecounterisfirstset to1000. Ineach iteration, theSUBSinstructiondecrements theR2andsets the flagbitsaccordingly.IfR2isnotzero(Z=0),itjumpstothetargetaddressassociatedwiththe
label“AGAIN”.ThisloopingactioncontinuesuntilR2becomeszero.AfterR2becomeszero(Z=1),itfallsthroughtheloopandexecutestheinstructionimmediatelybelowit,inthiscase“MOVR4,R0”.
ItmustbeemphasizedagainthatwemustuseSUBSinsteadofSUBsincetheSUBinstructionwillnotchange(update)theflags.SincewearemonitoringtheZflagfortheloop counter we must use SUBS instruction for the decrementing the counter. As wementioned inChapter3,manyof theARMinstructionshave theoptionofaffecting theflags.Intheseinstructionsthedefaultisnottoaffecttheflags.Thereforetomakethemtoeffect the flag we must add letter S to the instruction. That means SUBS and ADDSinstructions are different from SUB and ADD, as far as the flags are concerned. AsanotherexampleseeExample4-2.
Example4-2
Writeaprogramtoplacevalue0x55into100bytesofRAMlocations.
Solution:
AREAEXAMPLE4_2,CODE,READONLY
ENTRY
RAM_ADDREQU0x40000000;changetheaddressforyourARM
MOVR2,#25;counter(25times4=100byteblocksize)
LDRR1,=RAM_ADDR;R1=RAMAddress
LDRR0,=0x55555555;R0=0x55555555
OVERSTRR0,[R1];sendittoRAM
ADDR1,R1,#4;R1=R1+4toincrementpointer
SUBSR2,R2,#1;R2=R2–1fordec.counter
BNEOVER;keepdoingit
HEREBHERE
END
Loopingatrilliontimeswithloopinsidealoop
AsshowninExample 4-3,themaximumcountis232-1.Whathappensifwewanttorepeatanactionmoretimesthanthat?Todothat,weusealoopinsidealoop,whichis
called a nested loop. In a nested loop, we use two registers to hold the count. SeeExample 4-3.
Example4-3
ExplainwhatisthemaximumnumberoftimesthattheloopinExample4-1canberepeated?Now,writeaprogramto(a)loadtheR0registerwiththevalue0x55,and(b)complementit16,000,000,000(16billion)times.
Solution:
BecauseRxisa32-bitregister,itcanholdamaximumof0xFFFFFFFF(232–1decimal);therefore,theloopcanberepeatedamaximumof232–1times.Thisexampleshowshowtocreateanestinglooptogobeyond4billiontimes.Because16,000,000,000islargerthan0xFFFFFFFF(themaximumcapacityofanyR0–R12registers),weusetworegisterstoholdthecount.ThefollowingcodeshowshowtouseR2andR1asaregisterforcountersinanestingloop.
AREAEXAMPLE4_3,CODE,READONLY
ENTRY
MOVR0,#0x55;R0=0x55
MOVR2,#16;load16intoR2(outerloopcount)
L1LDRR1,=1000000000;R1=1,000,000,000(innerloopcount)
L2EORR0,R0,#0xFF;complementR0(R0=R0Ex-OR0xFF)
SUBSR1,R1,#1;R1=R1–1,dec.R1(innerloop)
BNEL2;repeatituntilR1=0
SUBSR2,R2,#1;R2=R2–1,dec.R2(outerloop)
BNEL1;repeatituntilR2=0
HEREBHERE;stayhere
END
In this program,R1 is used to keep the inner loop count. In the instruction “BNEL2”,whenever R1 becomes 0 it falls through and “SUBS R2,R2,#1” is executed. The nextinstructions force theCPUto load the innercountwith1,000,000,000 ifR2 isnotzero,andtheinnerloopstartsagain.ThisprocesswillcontinueuntilR2becomeszeroandtheouterloopisfinished.IfyouusetheKeilIDEtoverifytheoperationoftheaboveprogram
usesmallervaluesforcountertogothroughtheiterations.SeeFigure4-1.
Figure4-1:FlowchartforExample4-3
OtherconditionalBranches
Aswementioned inChapter 3,C andZ flags reflect the result of calculation onunsigned numbers. Table 4-1 lists available conditional branches for unsigned numbersthatuseCandZflags.Moredetailsofeach instructionareprovided inAppendixA.InTable 4-1noticethattheinstructions,suchasBEQ(BranchifZ=1)andBCS(Branchif
carry set, C = 1), jump only if a certain condition is met. Next, we examine someconditionalbranch instructionswithexamples.Theotherconditionalbranch instructionsassociatedwiththesignednumbersarediscussedinChapter5whenarithmeticoperationsforsignednumbersarediscussed.
Instruction Action
BCS/BHS branchifcarryset/branchifhigherorsame BranchifC=1
BCC/BLO branchifcarryclear/branchlower BranchifC=0
BEQ branchifequal BranchifZ=1
BNE branchifnotequal BranchifZ=0
BLS branchiflessorsame BranchifZ=1orC=0
BHI branchifhigher BranchifZ=0andC=1
Table4-1:ARMConditionalBranchInstructionsforUnsignedData
BCC(branchifcarryclear,branchifC=0)
In this instruction, thecarry flagbit inprogramstatus registers (CPSR) isused tomakethedecisionwhethertobranch.Inexecuting“BCClabel”,theprocessorlooksatthecarry flag to see if it is cleared (C = 0). If it is, the CPU starts to fetch and executeinstructionsfromtheaddressofthelabel.IfC=1, itwillnotjumpbutwillexecutethenextinstructionbelowBCC.SeeExample4-4.
Example4-4
ExaminethefollowingcodeandgivetheresultinregistersR0,R1,andR2.
MOVR1,#0;clearhighword(R1=0)
MOVR0,#0;clearlowword(R0=0)
LDRR2,=0x99999999;R2=0x99999999
ADDSR0,R0,R2;R0=R0+R2andsettheflags
BCCL1;ifC=0,jumptoL1andaddnextnumber
ADDSR1,R1,#1;ELSE,increment(R1=R1+1)
L1ADDSR0,R0,R2;R0=R0+R2andsettheflags
BCCL2;ifC=0,addnextnumber
ADDSR1,R1,#1;ifC=1,increment
L2ADDSR0,R2;R0=R0+R2andsettheflags
BCCL3;ifC=0,addnextnumber
ADDSR1,R1,#1;C=1,increment
L3ADDSR0,R2;R0=R0+R2andsettheflags
BCCL4;ifC=0,addnextnumber
ADDSR1,R1,#1;ifC=1,andsettheflags
L4
Solution:
Thisprogramadds0x99999999togetherfourtimes.
R1(highbyte) R0(lowbyte)
Atfirst 0 0
JustbeforeL1 0 0x99999999
JustbeforeL2 1 0x33333332
JustbeforeL3 1 0xCCCCCCCB
JustbeforeL4 2 0x66666664
Hereistheloopversionoftheaboveprogramthatruns10times.
AREAEXAMPLE4_4,CODE,READONLY
ENTRY
MOVR1,#0;clearhighword(R1=0)
MOVR0,#0;clearlowword(R0=0)
LDRR2,=0x99999999;R2=0x99999999
MOVR3,#10;counter
L1ADDSR0,R2;R0=R0+R2andsettheflags
BCCNEXT;ifC=0,addnextnumber
ADDR1,R1,#1;ifC=1,incrementtheupperword
NEXTSUBSR3,R3,#1;R3=R3-1andsettheflags
;(Decrementcounter)
BNEL1;nextroundifZ=0
HEREBHERE;stayhere
END
Notethatthereisalsoa“BCSlabel”instruction.IntheBCSinstruction,ifC=1itjumps to the target address. We will give more examples of these instructions in thecontextofapplications.
Comparisonofunsignednumbers
CMPRn,Op2;compareRnwithOp2andsettheflags
TheCMPinstructioncomparestwooperandsandchangestheflagsaccordingtotheresult of the comparison. The operands themselves remain unchanged. There is nodestination register and the second source operands can be a register or an immediatevaluenotlargerthan0xFF.Itmustbeemphasizedthat“CMPRn,Op2”instructionisreallyasubtractoperation.Op2issubtractedfromRn(Rn–Op2)andtheresultisdiscardedandflags are set accordingly. The contents of Rn and Op2 remain unchanged after theexecutionofCMPinstruction.AlthoughalltheC,S,Z,andVflagsreflecttheresultofthecomparison,onlyCandZareusedforunsignednumbers,asoutlinedinTable4-2.
Instruction C Z
Rn>Op2 1 0
Rn=Op2 1 1
Rn<Op2 0 0
Table4-2:FlagSettingsforCompare(CMPRn,Op2)ofUnsignedData
Lookatthefollowingcase:
LDRR1,=0x35F;R1=0x35F
LDRR2,=0xCCC;R2=0xCCC
CMPR1,R2;compare0x35Fwith0xCCC
BCCOVER;branchifC=0
MOVR1,#0;ifC=1,thenclearR1
OVERADDR2,R2,#1;R2=R2+1=0xCCC+1=0xCCD
Figure4-2showsthediagramandtheClanguageversionofthecode.
Pseudocode:
R1=0x35F
R2=0xCCC
InC:
R1=0x35F;
R2=0xCCC;
IF(R1>=R2)THEN
R1=0
ENDIF
R2=R2+1
if(R1>=R2)
{
R1=0;
}
R2=R2+1;
Figure4-2:FlowchartofifInstruction
Intheaboveprogram,R1islessthantheR2(0x35F<0xCCC);therefore,C=0andBCC(branchifcarryclear)willgototargetOVER.Incontrast,lookatthefollowing:
LDRR1,=0xFFF
LDRR2,=0x888
CMPR1,R2;compare0xFFFwith0x888
BCCNEXT
ADDR1,R1,#0x40
NEXTADDR1,R1,#0x25
In theabove,R1 isgreater thanR2 (0xFFF>0x888),which setsC=1,making“BCCNEXT”fallthroughsothat“ADDR1,R1,0x40”isexecuted.
Again,itmustbeemphasizedthatinCMPinstructions,theoperandsareunaffectedregardlessoftheresultofthecomparison.Onlytheflagsareaffected.ThisisdespitethefactthatCMPusestheSUBoperationtosetorresettheflags.Italsomaybenotedthat,unlike other arithmetic and logic instructions, there is no need to put S in the CMPinstructiontoupdatetheflags.Inotherwords,theCMPinstructionautomaticallyupdatestheflags.
Program4-1usestheCMPinstructiontosearchforthehighestbyteinaseriesof5databytes.Tosearchforthehighestvaluetheinstruction“CMPR1,R3”worksasfollows,whereR1 is the contents of thememory location brought into R1 register by the [R2]pointer.
a)IfR1<R3,thenC=0andR3becomesthebasisofthenewcomparison.
b) IfR1≥R3, thenC=1andR1 is the largerof the twovaluesand remains thebasisofcomparison.
Program4-1Assumethatthereisaclassoffivepeoplewiththefollowinggrades:69,87,96,45,and75.
Findthehighestgrade.
;searchingforhighestvalue
COUNTRNR0;COUNTisthenewnameofR0
MAXRNR1;MAXisthenewnameofR1
;(MAXhasthehighestvalue)
POINTERRNR2;POINTERisthenewnameofR2
NEXTRNR3;NEXTisthenewnameofR3
AREAPROG_4_1D,DATA,READONLY
MYDATADCD69,87,96,45,75
AREAPROG_4_1,CODE,READONLY
ENTRY
MOVCOUNT,#5;COUNT=5
MOVMAX,#0;MAX=0
LDRPOINTER,=MYDATA
;POINTER=MYDATA(addressoffirstdata)
AGAINLDRNEXT,[POINTER]
;loadcontentsofPOINTERlocationtoNEXT
CMPMAX,NEXT
;compareMAXandNEXT
BHSCTNU
;ifMAX>NEXTbranchtoCTNU
MOVMAX,NEXT;MAX=NEXT
CTNUADDPOINTER,POINTER,#4
;POINTER=POINTER+4topointtothenext
SUBSCOUNT,COUNT,#1;decrementcounter
BNEAGAIN;branchAGAINifcounterisnotzero
HEREBHERE
END
Program4-1searchesthroughfivedataitemstofindthehighestvalue.Theprogramhasavariablecalled“MAX”thatholds thehighestgradefoundsofar.Onebyone, thegradesarecompared toHighest. Ifanyof them ishigher, thatvalue isplaced inMAX.
Thiscontinuesuntilalldataitemsarechecked.AREPEAT-UNTILstructurewaschosenintheprogramdesign.Figure4-3showstheflowchartforProgram4-1.Thisdesigncouldbeusedtocodetheprograminmanydifferentlanguages.
Pseudocode:
Count=5
Highest=0
REPEAT
IF(Next>MAX)
THEN
MAX=Next
ENDIF
Incrementpointer
DecrementCount
UNTILCount=0
;nowMAXistheHighest
InC:
//InKeil,longis32-bitwide
unsignedlongmyData[5]={69,87,96,45,75};
unsignedlongcount=5;
unsignedlongmax=0;
unsignedlongnext;
unsignedlong*pointer=myData;
do
{
next=*pointer;
if(max<next)
max=next;
pointer++;
count—;
}while(count!=0);
Figure4-3:FlowchartandPseudocodeforProgram4-1
Using CMP followed by conditional branches we can make any comparison onunsigned numbers, as shown in Table 4-3. Although BCS (branch carry set) and BCC(branchcarryclear)checkthecarryflagandcanbeusedafteracompareinstruction,itisrecommendedthatBHS(branchhigherorsame)andBLO(branchbelow)beusedfortworeasons.OnereasonisthatassemblerswillunassembleBCSasBLO,andBCCasBHS,whichmaybeconfusingtobeginnerprogrammers.Anotherreasonisthat“branchhigher”and “branch below” are easier to understand than “branch carry set” and “branch carryclear,”sinceitismoreimmediatelyapparentthatonenumberislargerthananother,thanwhetheracarrywouldbegeneratedifthetwonumbersweresubtracted.
Instruction Action
BCS/BHS branchifcarryset/branchifhigherorsame BranchifRn≥Op2
BCC/BLO branchifcarryclear/branchlower BranchifRn<Op2
BEQ branchifequal BranchifRn=Op2
BNE branchifnotequal BranchifRn≠Op2
BLS branchiflessorsame BranchifRn≤Op2
BHI branchifhigher BranchifRn>Op2
Table4-3:ARMConditionalBranchInstructionsforUnsignedData
DivisionofunsignednumbersinARM
TheolderARMfamilymembersdonothaveaninstructionfordivisionofunsignednumberssinceittooktoomanygatestoimplementit.However,manyoftheARMCortexchipsimplementthedivideinstruction.InARMswithnodivideinstructionswecanuseSUB instruction to perform the division. Program 4-2 shows an example of unsigneddivisionusingsubtractoperation.Intheprogramthenumeratorisplacedinaregisterandthedenominatorissubtractedfromitrepeatedly.Thequotientisthenumberoftimeswesubtractedandtheremainderisintheregisteruponcompletion.SeeFigure4-4.
Program4-2:DivisionbyRepeatedSubtractions
AREAPROG_4_2,CODE,READONLY;Divisionbysubtractions
ENTRY
LDRR0,=2012;R0=2012(numerator)
;itwillcontainremainder
MOVR1,#10;R1=10(denominator)
MOVR2,#0;R2=0(quotient)
L1CMPR0,R1;CompareR0withR1toseeiflessthan10
BLOFINISH;ifR0<R1jumptofinish
SUBR0,R0,R1;R0=R0-R1(divisionbysubtraction)
ADDR2,R2,#1;R2=R2+1(quotientisincremented)
BL1;gotoL1(Bisdiscussedinthenextsection)
FINISHBFINISH
Pseudocode:
NUM=2012
DENOM=10
WHILENUM>=DENOM
SubtractDENOMfromNUM
IncrementQUOTIENT
ENDWHILE
InC:
R0=2012;
R1=10;
while(R0>=R1)
{
R0=R0-R1;
R2=R2+1;
}
Figure4-4:FlowchartandPseudo-codeforProgram4-2
TST(Test)
TSTRn,Op2;RnANDwithOp2andflagbitsareupdated
TheTSTinstructionisusedtotestthecontentsofregistertoseeifanybitissettoHIGH. After the operands are ANDed together the flags are updated. After the TSTinstructionifresult iszero,thenZflagisraisedandonecanuseBEQ(branchequal)tomakedecision.Lookatthefollowingexample:
MOVR0,#0x04;R0=00000100inbinary
LDRR1,=myport;portaddress
OVERLDRBR2,[R1];loadR2frommyport
TSTR2,R0;isbit2HIGH?
BEQOVER;keepchecking
InTST,likeotherlogicorarithmeticdataprocessinginstructions,theOp2canbeanimmediatevalueoflessthan0xFF.Lookatthefollowingexample:
LDRR1,=myport;portaddress
OVERLDRBR2,[R1];loadR2frommyport
TSTR2,#0x04;isbit2HIGH?
BEQOVER;keepchecking
SeeExample4-5.
Example4-5
Assumeaddresslocation0x200000isassignedtoaninputportaddressandconnectedto8DIPswitches.WriteasimpleshortprogramtocheckthePORTandwheneverbothpins4or6areLOW,R4registerisincremented.
Solution:
MYPORTEQU0x200000
MOVR0,#2_01010000;R0=0x50(01010000inbinary)
LDRR1,=MYPORT;R1=portaddress
OVERLDRBR2,[R1];getabytefromPORTandplaceitinR2
TSTR2,R0;arebits4and6LOW?
BNEOVER;keepchecking
ADDR4,R4,#1
TEQ(testequal)
TEQRn,Op2;RnEX-ORedwithOp2andflagbitsareset
TheTEQinstructionisusedtotesttoseeifthecontentsoftworegistersareequal.After the source operands are Ex-ORed together the flag bits are set according to theresult.AftertheTEQinstructionifresultis0,thenZflagisraisedandonecanuseBEQ(branch zero) tomake decision.Recall that ifweExclusive-OR a valuewith itself, theresult is zero.Look at the following example for checking the temperatureof 100on agivenport:
TEMPEQU100
MOVR0,#TEMP;R0=Temp
LDRR1,=myport;portaddress
OVERLDRBR2,[R1];loadR2frommyport
TEQR2,R0;isit100?
BNEOVER;keepchecking
Unconditionalbranch(jump)instruction
Theunconditionalbranchisajumpinwhichcontrolistransferredunconditionallytothetarget location.IntheARMtherearetwounconditionalbranches:B(branch)andBX(branchandexchange).Thisisdiscussednext.
B(Branch)
B(branch)isanunconditionaljumpthatcangotoanymemorylocationinthe32MbyteaddressspaceoftheARM.AnothersyntaxforBinstructionisBAL(branchalways).
Bhasdifferentusageslikeimplementingif/else,while,andforinstructions.Inthefollowingcodeyouseeanexampleofimplementingtheif/elseinstruction:
CMPR1,R2
BHSL1
MOVR3,#2
BOVER
L1MOVR3,#5
OVER
//inC
if(R1<R2)
{
R3=2;
}
else
{
R3=5;
}
Intheabovecode,R3isinitializedwith2whenR1islowerthanR2.Otherwise,itisinitializedwith5.
Asanexampleofimplementingthewhileinstructionseethefollowingprogram.Itcalculatesthesumofnumbersbetween1and5:
MOVR1,#1
MOVR2,#0
L1CMPR1,#5
BHIL2
ADDR2,R2,R1
ADDR1,R1,#1
BL1
L2MOVR3,#5
//inC
unsignedlongR1=1;
unsignedlongR2=0;
while(R1<=5)
{
R2=R2+R1;
R1=R1+1;
}
Thefor instructioncanbeimplementedthesamewayasthewhileinstruction.Forexample,theaboveassemblyprogramcanbeconsideredasaforloop.
Incaseswherethereisnooperatingsystemormonitorprogram,weusetheBranchto itself inorder tokeep themicrocontrollerbusy.Asimplewayofdoing that isshownbelow:
HEREBHERE;stayhere
AnothersyntaxfortheBinstructionisBAL(branchalways)asshownbelow:
HEREBALHERE;stayhere
SinceARMinstructionis32-bit,8bitsareusedfortheopcode,andtheother24bitsrepresenttheaddressofthetargetlocation.The24-bittargetaddressisshiftedlefttwiceandthatallowsajumpto–32Mto+32Mbytesofmemorylocationsfromtheaddressofcurrentinstruction.Next,weexplainthereasonforthis.
Allbranchesareshortbranches(jumps)
Itmustbenotedthatallbranchinstructions(conditionalandunconditional)areshortjumps,meaningtheaddressofthetargetmustbewithin32Mbytesoftheprogramcounter(PC). That means the short jumps cannot cover the entire address space of 4G bytes(0x00000000to0xFFFFFFFF).
Calculatingtheshortbranchaddress
AllconditionalbranchessuchasBCC,BEQ,andBNEareshortbranches.This isduetothefactthattheARMinstructionsare32-bitandsomeofthe32-bitmustbeusedforopcodeandthereforeitcannotcovertheentire4GbytesaddressspaceofARM.Inthebranchinstructiontheopcodeis8bitsandtherelativeaddressis24bits.SeeFigure4-5.Thetargetaddressisrelativetothevalueoftheprogramcounter.Iftherelativeaddressispositive, the jump is forward. If the relative address is negative, then the jump isbackwards. The relative address can covermemory space of –32Mbytes to +32Mbytes
fromcurrentlocationofprogramcounter.Thereasonfor32MBisthefactthat24bitsareshiftedlefttwice(multipliedby4)bytheARMCPUautomatically.Thatgivesus26bits.Now, since one bit is used for positive or negative sign, we have only 25 bits formagnitude.The25bitsmagnitudegivesus32Mbytes(225=32M)ineachdirection.Thatis–32Mbytesif it isbackwardand+32MBif it isforwardjump.Therefore, tocalculatethetargetaddress,therelativeaddressisshiftedlefttwiceandaddedtotheaddress ofthenextinstructiontobefetched[targetaddress=relativeaddressshiftedlefttwice+addressof the next instruction to be fetched]. It must be noted that the next instruction to befetched is two instructions below the current branch instruction. The reason is theinstructionrightbelowthebranchisalreadyinthepipeline.SeeFigure 4-5.
Figure4-5:B(Branch)Instruction
Youmight askwhyweadd the relativeaddress to theaddressof two instructionsbelowthecurrentinstruction.(whydon’tweaddtherelativeaddresstotheaddressoftheinstruction rightbelow thecurrent instructionas it is inotherCPUs).This isdue to thepipelineissuesaswewillseeinnextsectionandChapter7.
Now, to calculate the target address of the branch we add address of the nextinstructiontobefetchedtotheoffsetshiftedleft twice(multipliedby4).Thereasonforshifting left twice is tomake sure it is word aligned. Given the fact that all theARMinstructionsare4-byte (word) longandarewordaligned,with225=32Mbytesaddressspacefortheforwardandbackwardjumpsthetargetaddressofjump(branch)canbeupto8Minstructions(32MBytes/4byes=8Minstructions)fromthecurrent instructionineachdirection.Althoughthisdoesnotcovertheentire1Ginstructions(4GBytes/4byes=1Ginstructions)ofARMmemoryspace,itismorethanadequateformanyapplications.SeeExample 4-6.
Example4-6
InARM7,thenextinstructiontobefetchedis2instructionsbelowthecurrentexecutinginstruction.Usingthefollowinglistfileverifythejumpforwardaddresscalculation.
LINEADDRESSMachineMnemonicOperand
100000000AREAEXAMPLE_4_6,CODE,READONLY
200000000ENTRY
300000000E3A01015MOVR1,#0x15;R1=0x15
400000004EA000002BTHERE
500000008E3A01025MOVR1,#0x25;R1=0x25
60000000CE3A02035MOVR2,#0x35;R2=0x35
700000010E3A03045MOVR3,#0x45;R3=0x45
800000014E3A04055THEREMOVR4,#0x55;R4=0x55
900000018EAFFFFFEHEREBHERE
10END
Solution:
FirstnoticethattheBinstructioninline4jumpsforward.Tocalculatethetargetaddress,therelativeaddress(offset)isshiftedlefttwiceandaddedtothePC ofthenextinstructiontobefetched.Thepositionofthenextinstructiontobefetchedis2instructionsbelowthecurrentinstruction.RecallthateachinstructionofARMtakes4bytes.Sothenextinstructiontobefetchedis2×4bytes=8bytesbelowthecurrentinstructionaddress(00000004).Sotheaddressofthenextinstructiontobefetchedis0000004+8=000000C(thepositionofMOVinstructioninline6).Inline4theinstruction“BTHERE”hasthemachinecodeofEA000002.Todistinguishtheoperandandopcodeparts,weshouldcomparethemachinecodewiththeBinstructionformat.Inthefollowing,youseetheformatoftheBinstruction.InthisexamplethemachinecodeisEA000002.IfwecompareitwiththeBformat,weseethat,theoperandis000002andtheopcodeisEA.The000002istheoffset,relativetotheaddressofthenextinstruction.Recallthattocalculatethetargetaddress,therelativeaddress(offset)isshiftedlefttwiceandaddedtothecurrentvalueofthePC(ProgramCounter).Shiftingtheoffset(000002)lefttwiceresultsin000008andthenaddingittotheaddressofthenextinstructiontobefetched(0000000C)wehave000008+0000000C=00000014whichisexactlytheaddressofTHERElabel.Allthejumpinstructions,whosemnemonicsbeginwithB,havethesameinstructionformat,andtheopcodechangesfrominstructiontoinstruction.So,wecancalculatetheshortbranchaddressforanyofthem,aswejustdidinthisexample.
Itmustalsobenotedthatforthebackwardbranchtherelativevalueisnegative(2’scomplement).ThatisshowninExample 4-7.
Example4-7
VerifythecalculationofbackwardjumpsforthelistingofExample4-1,shownbelow.
LINEADDRESSMachineMnemonicOperand
500000000E3A02FFALDRR2,=1000;R2=1000
600000004E3A00000MOVR0,#0;R0=0,sum
700000008E2800009AGAINADDR0,R0,#9;R0=R0+9
80000000CE2522001SUBSR2,R2,#1;R2=R2-1
9000000101AFFFFFCBNEAGAIN;repeat
1000000014E1A04000MOVR4,R0;storethesuminR4
1100000018EAFFFFFEHEREBHERE;stayhere
120000001CEND
Solution:
Intheprogramlist,“BNEAGAIN”inline9hasmachinecode1AFFFFFC.Tospecifytheoperandandopcode,wecomparetheinstructionwiththebranchinstructionformat,whichyousawinthepreviousexample.Theopcodeis1Aandtheoperand(relativeoffsetaddress)isFFFFFC.TheFFFFFCgivesus–4,whichmeansthedisplacementis(–4×4=–16=–0x10).
Thebranchislocatedinaddress0x0010.Theaddressofthenextinstructiontobefetchedistwoinstructionsaheadofcurrentbranchinstruction,andeachinstructionis4-bytewide.Therefore,addressofthenextinstructiontobefetched=0x0010+(2×4)=0x0018
Whentherelativeaddressof–0x10isaddedto00000018,wehave–0x0010+0x0018=0x08
Noticethat00000008istheaddressofthelabelAGAIN.
FFFFFCisanegativenumberandthatmeansitwillbranchbackward.Forfurtherdiscussionoftheadditionofnegativenumbers,seeChapter5.
Branchingbeyond32Mbytelimit
To branch beyond the address space of 32M bytes, we use BX (branch andexchange) instruction.The “BXRn” instructionuses registerRn to hold target address.
SinceRncanbeanyoftheR0–R14registersandtheyare32-bitregisters,the“BXRn”instruction can land anywhere in the 4G bytes address space of the ARM. In theinstruction“BXR2”theR2isloadedintotheprogramcounter(R15)andCPUstartstofetch instructions from the target address pointed to by R15, the program counter. SeeFigure 4-6.Sincetheinstructionsarewordaligned,wemustmakesurethatthelowertwobitsoftheRnare0s.
Figure4-6:BX(Branchandexchange)InstructionTargetAddress
TheBXinstructionisalsousedtoswitchtoTHUMBversionoftheARMCPU.FormoreinformationseetheARM manual.
ReviewQuestions
1.ThemnemonicBNEstandsfor_______.
2. True or false. “BNEBACK”makes its decision based on the last instructionaffectingtheZflag.
3.“BNEHERE”isa___-byteinstruction.
4.In“BEQNEXT”,whichflagbitischeckedtoseeifitishigh?
5.B(ranch)isa(n)___-byteinstruction.
6.CompareBandBXinstructions.
Section4.2:CallingSubroutinewithBLAnothercontroltransferinstructionistheBL(branchandlink)instruction,whichis
used to call a subroutine. Subroutines are often used to perform tasks that need to beperformed frequently. This makes a program more structured in addition to savingmemoryspace.In theARMthere isonlyoneinstructionforcallandthat isBL(branchandlink).TouseBLinstructionforcall,wemustleavetheR14registerunusedsincethatis where the ARM CPU stores the address of the next instruction where it returns toresumeexecutingtheprogram.
BL(BranchandLink)instructionandcallingsubroutine
In the32-bit instructionBL, 8 bits are used for theopcode and theother 24bits,k23–k0, are used for the address of the target subroutine, just like in the Branchinstruction. Therefore, BL can be used to call subroutines located anywherewithin the32MaddressspaceoftheARM,asshowninFigure 4-7.
Figure4-7:BL(BranchandLink)Instruction
Tomake sure that theARMknowswhere to comeback to after executionof thecalled subroutine, the ARM automatically saves in the link register (LR), the R14, theaddressoftheinstructionimmediatelybelowtheBL.WhenasubroutineiscalledbytheBL instruction, control is transferred to that subroutine, and the processor saves thePC(program counter) in the R14 register and begins to fetch instructions from the newlocation.Afterfinishingexecutionofthesubroutine,wemustuse“BXLR“instructiontotransfercontrolbacktothecaller.Everysubroutineneeds“BXLR”asthelastinstructionforreturnaddress.
BLinstructionandtheroleoflinkerregister
When a subroutine is called using BL instruction, first the processor saves theaddress of the instruction just below theBL instruction on theR14 register (LR, linkerregister), and thencontrol is transferred to that subroutine.This ishow theCPUknows
wheretoresumewhenitreturnsfromthecalledsubroutine.
Thelinkerregisterandreturningfromsubroutine
ThereisnoRETurninstructioninARM7.Thereforeattheendofthesubroutinewemustcopythelinkerregister,R14,totheprogramcounter(R15).WhentheBLinstructionis executed the address of the instruction below the BL instruction is placed into R14register, so, when the execution of the function finishes, the address of the instructionbelow the BL must be reloaded back into the PC, and so the CPU can resume theexecutionofinstructionsbelowtheBLinstruction.
TounderstandtheroleoftheR14registerinBLinstructionandthereturn,examinetheExamples 4-8.ThefollowingpointsshouldbenotedfortheExample 4-8:
1. Notice theDELAY subroutine.Upon executing the first “BLDELAY”, theaddress of the instruction right below it, “MOVR0,#0xAA”, is savedonto theR14register,andtheARMstartstoexecuteinstructionsatDELAYsubroutine.
2.IntheDELAYsubroutine,firstthecounterR3issetto5(R3=5);therefore,theinnerloopisrepeated5times.WhenR3becomes0,controlfallstothe“BXLR”instruction,which restores the address into the program counter and returns tomainprogramtoresumeexecutingtheinstructionsaftertheBL.
Example4-8
Writeaprogramtotoggleallthebitsofaddress0x40000000bysendingtoitthevalues0x55and0xAAcontinuously.Putatimedelaybetweeneachissuingofdatatoaddresslocation.
Solution:
AREAEXAMPLE4_8,CODE,READONLY
ENTRY
RAM_ADDREQU0x40000000;changetheaddressforyourARM
LDRR1,=RAM_ADDR;R1=RAMaddress
AGAINMOVR0,#0x55;R0=0x55
STRBR0,[R1];sendittoRAM
BLDELAY;calldelay(R14=PCofnextinstruction)
MOVR0,#0xAA;R0=0xAA
STRBR0,[R1];sendittoRAM
BLDELAY;calldelay
BAGAIN;keepdoingit
;––––––—DELAYSUBROUTINE
DELAYLDRR3,=5;R3 =5,modifythisvaluefordifferentsizedelay
L1SUBSR3,R3,#1;R3=R3-1
BNEL1
BXLR;returntocaller
;––––––—endofDELAYsubroutine
END;noticetheplaceforENDdirective
UseKeilIDEsimulatorforARMtosimulatetheaboveprogramandexaminetheregistersandmemorylocation0x40000000.Youmighthavetochangetheaddress0x40000000tosomeothervaluedependingontheRAMaddressoftheARMchipyouuse.
Inaboveprogram,inplaceof“BXLR”forreturn,wecouldhaveused“BXR14”,“MOV R15,R14”,or“MOVPC,LR”instructions.Allofthemdothesamething;butitisrecommendedtousethe“BXLR”instruction.
Theamountof timedelay inExample 4-8dependson the frequencyof theARMchip.Howtocalculatethetimewillbeexplainedinthelastsectionofthischapter.
MainProgramandCallingSubroutines
In realworld projectswe divide the programs into small subroutines (also calledfunctions) and the subroutines are called from themain program.Figure 4-8 shows theformat.
;MAINprogramcallingsubroutines
AREAPogramName,CODE,READONLY
ENTRY
MAINBLSUBR_1;CallSubroutine1
BLSUBR_2;CallSubroutine1
BLSUBR_3;CallSubroutine1
HEREBALHERE;stayhere.BAListhesameasB
;––-endofMAIN
;––––––—SUBROUTINE1
SUBR_1….
….
BXLR;returntomain
;––endofsubroutine1
;––––––—SUBROUTINE2
SUBR_2….
….
BXLR;returntomain
;––endofsubroutine2
;––––––—SUBROUTINE3
SUBR_3….
….
BXLR;returntomain
;––endofsubroutine3
END;noticetheENDoffile
Figure4-8:ARMAssemblyMainProgramThatCallsSubroutines
Program4-3showsanexampleofthemainprogramcallingsubroutine.
Program4-3
;Thisprogramfillsablockofmemorywithafixedvalueand
;thentransfers(copies)theblocktonewareaofmemory
AREAPROGRAM4_3,CODE,READONLY
ENTRY
RAM1_ADDREQU0x40000000;ChangetheaddressforyourARM
RAM2_ADDREQU0x40000100;ChangetheaddressforyourARM
BLFILL;callblockfillsubroutine
BLCOPY;callblocktransfersubroutine
HEREBALHERE;BAL(branchalways)isthesameasB
;–––––-BLOCKFILLSUBROUTINE
FILLLDRR1,=RAM1_ADDR;R1=RAMAddresspointer
MOVR0,#10;counter
LDRR2,=0x55555555
L1STRR2,[R1];sendittoRAM
ADDR1,R1,#4;R1=R1+4toincrementpointer
SUBSR0,R0,#1;R0=R0-1fordeccounter
BNEL1;keepdoingit
BXLR;returntocaller
;–––––—BLOCKCOPYSUBROUTINE
COPYLDRR1,=RAM1_ADDR;R1=RAMAddresspointer(source)
LDRR2,=RAM2_ADDR;R2=RAMAddresspointer(dest.)
MOVR0,#10;counter
L2LDRR3,[R1];getfromRAM1
STRR3,[R2];sendittoRAM2
ADDR1,R1,#4;R1=R1+4toincrementpointerforRAM1
ADDR2,R2,#4;R2=R2+4toincrementpointerforRAM2
SUBSR0,R0,#1;R0=R0–1fordecrementingcounter
BNEL2;keepdoingit
BXLR;returntocaller
;–––-
END;noticetheplaceofENDdirective
Therearecasesinwhichweneedtocallasubroutinewithinacall(nestedcall).TodothatwemustusestacksincetheARMCPUcansupportonlyonecallatatimesincethereisonlyonelinkerregister(R14).
ReviewQuestions
1.ThemnemonicBLstandsfor_______.
2.Trueorfalse.“BLDELAY”savestheaddressoftheinstructionbelowBLinLRregister.
3.“BLDELAY”isa___-byteinstruction.
4.LRisan___-bitregister.
5.LRisthesameas______register.
6.ExplainthedifferencebetweenBandBLinstructions.
Section4.3:ARMTimeDelayandInstructionPipelineIn this sectionwediscusshow togeneratevarious timedelays and calculate time
delays for the ARM. We will also discuss instruction pipelining and its impact onexecutiontime.
DelaycalculationfortheARM
IncreatingatimedelayusingAssemblylanguageinstructions,onemustbemindfuloftwofactorsthatcanaffecttheaccuracyofthedelay:
1. Thecrystalfrequency:ThefrequencyofthecrystaloscillatorconnectedtotheCPUisonefactorinthetimedelaycalculation.Thedurationoftheclockperiodfortheinstructioncycleisafunctionofthiscrystalfrequency.
2. TheARMdesign: Since the 1970s, both the field of IC technology and thearchitecturaldesignofmicroprocessorshave seengreat advancements.Due to thelimitationsofICtechnologyandlimitedCPUdesignexperienceformanyyears,theinstruction cycle duration was longer. Advances in both IC technology and CPUdesign in the 1980s and 1990s havemade the single instruction cycle a commonfeatureofmanymicroprocessors.Indeed,onewaytoincreaseperformancewithoutlosingcodecompatibilitywiththeoldergenerationofagivenfamilyistoreducethenumberof instructioncycles it takes toexecutean instruction.OnemightwonderhowmicroprocessorssuchasARMareabletoexecuteaninstructioninonecycle.Thereare threeways todo that: (a)UseHarvardarchitecture toget themaximumamountofcodeanddata into theCPU,(b)useRISCarchitecturefeaturessuchasfixed-size instructions, and finally (c) use pipelining to overlap fetching andexecutionofinstructions.WehaveexaminedtheHarvardandRISCarchitecturesinChapter2.Next,wegiveabriefdiscussionofpipelining.Chapter7coverstheARMpipelineinmuchmoredetail.
Pipelining
Inearlymicroprocessorssuchasthe8085,theCPUcouldeitherfetchorexecuteatagiventime.Inotherwords,theCPUhadtofetchaninstructionfrommemory,decode,andthenexecuteit,andthenfetchthenextinstruction,decodeandexecuteit,andsoonasshown inFigure 4-9.The ideaofpipelining in its simplest form is toallow theCPUtofetch and execute at the same time. That is an instruction is being fetched while thepreviousinstructionisbeingexecuted.
Figure4-9:Non-pipelineexecution
We can use a pipeline to speed up execution of instructions. In pipelining, theprocessofexecutinginstructionsissplitintosmallstepsthatareallexecutedinparallel.Inthisway,theexecutionofmanyinstructionsisoverlapped.Onelimitationofpipeliningisthatthespeedofexecutionislimitedtothesloweststageofthepipeline.Comparethistomaking pizza. You can split the process ofmaking pizza intomany stages, such as
flatteningthedough,puttingonthetoppings,andbaking,buttheprocessislimitedtotheslowest stage, baking, no matter how fast the rest of the stages are performed. Whathappensifweusetwoorthreeovensforbakingpizzastospeeduptheprocess?Thismaywork for making pizza but not for executing programs, because in the execution ofinstructionswemustmake sure that the sequence of instructions is kept intact and thatthereisnoout-of-stepexecution.
ARMmultistageexecutionpipeline
As shown in Figure 4-10, in the ARM, each instruction is executed in 3 stages:Fetch,Decode,andExecute.
Figure4-10:PipelineinARM
In step 1, the opcode is fetched. In step 2, the opcode is decoded. In step 3, theinstruction is executed and result is written into the destination register. This 3-stagepipelinewasusedintheoriginalARM.ThenewerversionofARMmayhavemorestagesofpipeline.SeeChapter7andyourARMmanual.
InstructioncycletimefortheARM
IttakesacertainamountoftimefortheCPUtoexecuteaninstruction.Thisamountoftimeisreferredtoasmachinecycles.ThankstotheRISCarchitecture,ARMexecutesmost instructions in onemachine cycle. In theARM family, the length of themachinecycle depends on the frequency of the oscillator connected to the ARM system. Thecrystal oscillator, along with on-chip circuitry, provide the clock source for the ARMCPU.IntheARM,onemachinecycleconsistsofoneoscillatorperiod,whichmeansthatwitheachoscillatorclock,onemachinecyclepasses.Therefore,tocalculatethemachinecyclefortheARM,wetaketheinverseofthecrystalfrequency,asshowninExample 4-9.
Example4-9
ThefollowingshowsthecrystalfrequencyforfourdifferentARM-basedsystems.Findtheperiodoftheinstructioncycleineachcase.
(a)80MHz(b)160MHz(c)100MHz(d)50MHz
Solution:
(a)instructioncycleis1/80MHz=0.0125ms(microsecond)=12.5ns(nanosecond)
(b)instructioncycle=1/160MHz=0.00625ms=6.25ns
(c)instructioncycle=1/100MHz=0.01ms=10ns
(d)instructioncycle=1/50MHz=0.02ms=20ns
Branchpenalty
Theoverlappingoffetchandexecutionoftheinstructioniswidelyusedintoday’smicroprocessorssuchasARM.Fortheconceptofpipeliningtowork,weneedabufferorqueue in which an instruction is prefetched and ready to be executed. In somecircumstances,theCPUmustflushoutthequeue.Forexample,whenabranchinstructionisexecuted,theCPUstartstofetchcodesfromthenewmemorylocationandthecodeinthequeue thatwasfetchedpreviously isdiscarded. In thiscase, theexecutionunitmustwaituntil thefetchunitfetchesthenewinstruction.This iscalledabranchpenalty.Thepenaltyisanextrainstructioncycletofetchtheinstructionfromthetargetlocationinsteadofexecutingtheinstructionrightbelowthebranch.Rememberthattheinstructionbelowthe branch has already been fetched and is next in line to be decoded when the CPUbranches to a different address. This means that while the vast majority of ARMinstructions take only onemachine cycle, some instructions take threemachine cycles.These are Branch, BL (call), and all the conditional branch instructions such as BNE,BLO,andsoon.Theconditionalbranchinstructioncantakeonlyonemachinecycleifitdoes not jump.For example, theBNEwill jump ifZ=0 and that takes threemachinecycles.IfZ=1,thenitfallsthroughandittakesonlyonemachinecycle.SeeExamples 4-10and4-11.
Example4-10
ForanARMsystemof100MHz,findhowlongittakestoexecuteeachofthefollowinginstructions:
(a)MOV(b)SUB(c)B
(d)ADD(e)NOP(f)BHI
(g)BLO(h)BNE(i)EQU
Solution:
Themachinecycleforasystemof100MHzis10ns,asshowninExample4-9.Therefore,wehave:
InstructionInstructioncyclesTimetoexecute
(a)MOV11×10ns=10ns
(b)SUB11×10ns=10ns
(c)B33×10ns=30ns
(d)ADD11×10ns=10ns
(e)NOP11×10ns=10ns
Forthefollowing,duetobranchpenalty,3clockcyclesiftakenand1ifitfallsthrough:
(f)BHI3/13×10ns=30ns
(g)BLO3/13×10ns=30ns
(h)BNE3/13×10ns=30ns
(i)EQU0(directivesdonotproducemachineinstructions)
NoticethatARMchipdoesnothavetheNOPinstruction.TheARMAssemblerreplacesitwithsomeother1-cycleinstruction.OnyourARMAssembler,youmightgetawarningthatNOPdoesnotexist,butitstillcompilestheprogram.Itdoesnotgiveanerror.
DelaycalculationforARM
Adelaysubroutineconsistsoftwoparts:(1)settingacounter,and(2)aloop.Mostofthetimedelayisperformedbythebodyoftheloop,asshowninExample 4-11.
Example4-11
Findthesizeofthedelayofthecodesnippetbelowifthecrystalfrequencyis100MHz:
DELAYMOVR0,#255
AGAINNOP
NOP
SUBSR0,R0,#1
BNEAGAIN
MOVPC,LR;return
Solution:
WehavethefollowingmachinecyclesforeachinstructionoftheDELAYsubroutine:
InstructionMachineCycle
DELAYMOVR0,#255;1
AGAINNOP;1
NOP;1
SUBSR0,R0,#1;1
BNEAGAIN;3/1
MOVPC,LR;1
Therefore,wehaveatimedelayof[1+((1+1+1+3)×255)+1]×10ns=15,320ns.
NoticethatBNEtakesthreeinstructioncyclesifitjumpsback,andtakesonlyonecyclewhenfallingthroughtheloop.Thatmeanstheabovenumbershouldbe153.0ns.Becausethelasttime,whenR0iszero,theBNEtakesonlyonecyclebecauseitfallsthroughtheloop
Veryoftenwecalculatethetimedelaybasedontheinstructionsinsidetheloopandignoretheclockcyclesassociatedwiththeinstructionsoutsidetheloop.
InExample 4-11,thelargestvaluetheR0registercantakeis232=4G.Onewaytoincrease the delay is to use NOP instructions in the loop. NOP, which stands for “nooperation,” simply wastes time, but takes 4 bytes of program memory and that is tooheavyapricetopayforjustoneinstructioncycle.Abetterwayistouseanestedloop.
Loopinsidealoopdelay
Anotherwaytogetalargedelayistousealoopinsidealoop,whichisalsocalledanestedloop.SeeExample 4-12.
Example4-12
InagivenARMtraineranI/Oportisconnectedto8LEDs.ThefollowingprogramtogglestheLEDsbysendingtoit0x55and0xAAvaluescontinuously.CalculatethetimedelayfortogglingofLEDs.Assumethesystemclockfrequencyof100MHz.
Solution:
AREAExample4_12,CODE,READONLY
ENTRY
PORT_ADDREQU0x40000000;changetheaddressforyourARM
LDRR1,=PORT_ADDR;R1=portaddress
AGAINMOVR0,#0x55;R0=0x55
STRBR0,[R1];sendittoLEDs
BLDELAY;calldelay
MOVR0,#0xAA;R0=0xAA
STRBR0,[R1];sendittoLEDs
BLDELAY;calldelay
BALAGAIN;keepdoingitforever(BAListhesameasB)
;––––––—DELAYSUBROUTINE
DELAY
MOVR3,#100;R3=100,modifythisvaluefordifferentsizedelay
L1LDRR4,=250000;R4=250,000(innerloopcount)
L2SUBSR4,R4,#1;1clock
BNEL2;3clock
SUBSR3,R3,#1;R3=R3-1
BNEL1
MOVPC,LR;returntocaller
END
Ignoringthedelayassociatedwiththeouterloop,wehavethefollowingtimedelay:
[(1+3)×250,000×100]×10ns=1secondsince1/100MHz=10ns.
ExaminetheworkingfrequencyforyourARMtrainer,changetheaboveaddress0x40000000toyourARMtrainerportaddressandverifythetimedelayusingoscilloscope.
Fromthesediscussionsweconcludethat theuseofinstructionsingeneratingtimedelayisnot themostreliablemethod.Togetmoreaccurate timedelayTimersareused.All ARM microcontrollers come with on-chip Timers. We can use Keil uVision’ssimulatortoverifydelaytimeandnumberofcyclesused.Meanwhile,togetanaccuratetimedelayforagivenARMmicrocontroller,wemustuseanoscilloscopetomeasuretheexacttimedelay.
ReviewQuestions
1.Trueorfalse.IntheARM,themachinecyclelasts1clockperiodofthecrystalfrequency.
2.TheminimumnumberofmachinecyclesneededtoexecuteanARMinstructionis_____.
3.Findthemachinecycleforacrystalfrequencyof66MHz.
4.Assumingacrystalfrequencyof100MHz,findthetimedelayassociatedwiththeloopsectionofthefollowingDELAYsubroutine:
DELAYLDRR2,=50000000
HERENOP
NOP
NOP
NOP
NOP
SUBSR2,R2,#1
BNEHERE
MOVPC,LR
5.FindthemachinecycleforanARMifthecrystalfrequencyis50MHz.
6.Trueorfalse.IntheARM,theinstructionfetchingandexecutionaredoneatthesametime.
7.Trueorfalse.BandBLwillalwaystake2machinecycles.
8.Trueorfalse.TheBNEinstructionwillalwaystake3machinecycles.
Section4.4:ConditionalExecutionEverymicroprocessor has the conditional branch (jump) instruction based on the
statusofflagbitssuchasZandC.InstructionssuchasBEQ(branchequal,Z=1)orBNC(branchifnocarry,C=0)arecommoninallCPUs.TheARMCPUhasauniquefeaturethatwedonotseeinothermicroprocessors.InARM,theconceptofconditionalexecutionisimplementedforallinstructionandnotjustforbranchwhichmakesyouabletodecidetorunorignoreeachsingleinstructiondependingonthestatusofflagbits.Inotherwords,notonlythebranchinstructionbutalloftheARMinstructionscanbeconditional.Aswediscussed inChapters 2 and 3, theADD, SUB, and other arithmetic instruction do notaffect the flag bits in CPSR (current program status register) register unless they haveletterS in the syntax.Thedefault isnot toupdate the flags.Weoverride thedefaultbyhaving letterS in the instruction.Thesame thing is trueaboutconditional fieldofeachinstruction. If we do not add a condition after an instruction, it will be executedunconditionallybecausethedefaultisnottochecktheflagsandexecuteunconditionally.If we want an instruction to be executed only when a condition is met, we put theconditionsyntaxrightaftertheinstruction.
ThisfeatureofARMallowstheexecutionofaninstructionconditionallybasedonthestatusofZ,C,VandNflags.Todothat,theARMinstructionshavesetasidethemostsignificant 4bits of the instruction field for the conditions.SeeFigure4-11.The4bitsgivesus16possibleconditions.Table4-4showsthelistofallthe16possibleconditions.
Figure4-11:ConditionFieldinARMInstructions
Bits MnemonicExtension Meaning Flag
0000 EQ Equal Z=1
0001 NE Notequal Z=0
0010 CS/HS CarrySet/HigherorSame C=1
0011 CC/LO CarryClear/Lower C=0
0100 MI Minus/Negative N=1
0101 PL Plus N=0
0110 VS VSet(Overflow) V=1
0111 VC VClear(NoOverflow) V=0
1000 HI Higher C=1andZ=0
1001 HS LowerorSame C=1andZ=1
1010 GE GreaterthanorEqual N=V
1011 LT Lessthan N≠V
1100 GT Greaterthan Z=0andN=V
1101 LE LessthanorEqual Z=0orN≠V
1110 AL Always(unconditional)
1111 – NotValid
Table4-4:ARMConditioncodesfortheOpcodebits[31-28]
Note!
By default, all the instructions are executed unconditionally. As a result the AL(Always)suffixhasnoeffectontheinstruction.Forexample,BAL(BranchAlways)isexactlythesameasB(Branch).Thesameistrueforallinstructions.
Tomakeaninstructionconditional,simplyweputtheconditionsyntaxfromTable4-4infrontofit.Seethefollowingexamples:
MOVR1,#10;R1=10
MOVR2,#12;R2=12
CMPR2,R1;compare12with10,Z=0becausetheyarenotequal
MOVEQR4,#20;thislineisnotexecutedbecause
;theconditionEQisnotmet
Thefollowingcodeadds10toR1ifitisnotzero:
CMPR1,#0;compareR1with0
ADDNER1,R1,#10;thislineisexecutedifZ=0
;(ifinthelastCMPoperandswerenotequal)
NotethatwecanaddbothSandconditiontosyntaxofaninstruction.ItiscommontoputSafterthecondition.Seethefollowingexamples:
ADDNESR1,R1,#10
;thislineisexecutedandsettheflagsifZ=0
Oneadvantageofusingconditionalexecutionisitsavestimeintheexecutionofaninstruction by avoiding branch penalty. As we discussed earlier, each instruction goesthroughthreepipelinestagesoffetch,decode,andexecution.WhenabranchinstructionsuchasBCC(branchifCcleared)entersthepipeline,theinstructionbelowisalsofetchedintotheCPUnotknowingwhethertheCflagishighorlow.IftheC=0,thebranchwilljump to new location resulting in emptying the instruction pipeline queue which wasworking on the instruction below theBCC instruction. This penalty can be avoided by
usingconditionalexecutionfeatureoftheARM.WecanmaketheexecutionofmanyoftheARMinstructionsconditionalbysimplyaddingamnemonicextensiontotheendofit.For example, we can use ADDCS instead of ADD. The ADDCS means add if C=1.(ADDCS:AddifCarrySet).BythesametokentheADDEQmeansaddonlyifZ=1.
Example 4-13 shows two versions of the Example 4-4we covered earlier. In thenewversion,we use the conditional execution instructions. Simulate and compare bothversionstoseehowtheconditionalinstructionsareexecuted.
Example4-13
Thefollowingprogramadds0x99999999together10times.Comparethecodesyntaxtoseehowtheconditionalexecutionofthecodeisused.
Code1:(Example4-4withoutconditionalexecution)
AREAEXAMPLE4_4,CODE,READONLY
ENTRY
MOVR1,#0;clearhighword(R1=0)
MOVR0,#0;clearlowword(R0=0)
LDRR2,=0x99999999;R2=0x99999999
MOVR3,#10;counter
L1ADDSR0,R0,R2;R0=R0+R2andupdatetheflags
BCCNEXT;ifC=0,gotonextnumber
ADDR1,R1,#1;ifC=1,incrementtheupperword
NEXTSUBSR3,R3,#1;R3=R3-1andupdatetheflags
BNEL1;nextroundifz=0
HEREBHERE;stayhere
END
Code2:(Example4-4withconditionalexecution)
AREAEXAMPLE4_13,CODE,READONLY
ENTRY
MOVR1,#0;clearhighword(R1=0)
MOVR0,#0;clearlowword(R0=0)
LDRR2,=0x99999999;R2=0x99999999
MOVR3,#10;counter
L1ADDSR0,R0,R2;R0=R0+R2andupdatetheflags
ADDCSR1,R1,#1;ifCset(C=1),incrementtheupperword
NEXTSUBSR3,R3,#1;R3=R3-1andupdatetheflags
BNEL1;nextroundifz=0
HEREBHERE;stayhere
END
SeealsoExamples4-14and4-15.
Example4-14
RewritethemainpartofProgram4-1usingconditionalexecutionofARMinstructions
Solution:
MOVCOUNT,#5;COUNT=5
MOVMAX,#0;MAX=0
LDRPOINTER,=MYDATA
;POINTER=MYDATA(addressoffirstdata)
AGAIN
LDRNEXT,[POINTER]
;loadcontentsofPOINTERlocationtoNEXT
CMPMAX,NEXT;compareMAXandNEXT
MOVLOMAX,NEXT
;ifMAXislowerthanNEXTthenMAX=NEXT
ADDPOINTER,POINTER,#4
;POINTER=POINTER + 4topointtonext
SUBSCOUNT,COUNT,#1;decrementcounter
BNEAGAIN;branchAGAINifcounterisnotzero
Example4-15
Usingrotation,writeaprogramthatcountsthenumberof1sinR0.
Solution:
AREAEXAMPLE4_15,CODE,READONLY
ENTRY
LDRR0,=0x34F37D36
MOVR1,#0;numberof1s
MOVR2,#32;counter
BEGINMOVR0,R0,RRX;RotaterightwithcarrytheR0register
ADDCSR1,R1,#1;ifC=1thenincrementR1
SUBSR2,R2,#1;decrementcounter
BNEBEGIN;ifcounterisnotequaltozerobranchBEGIN
END
ReviewQuestions
1.Trueorfalse.AlltheARMinstructionshavetheconditionalexecutionfeature.
2.HowmanybitsoftheARMinstructionaresetasidefortheconditioncodes?
3.TheADDALstandsfor……………...
4.Trueorfalse.MOVVCisavalidinstructioninARM
5.Trueorfalse.SUBEQSisavalidinstructioninARM
ProblemsSection4.1:LoopingandBranchInstructions
1.IntheARM,loopingactionusingasingleregisterislimitedto_______iterations.
2.Ifaconditionalbranchisnottaken,whatisthenextinstructiontobeexecuted?
3.Incalculatingthetargetaddressforabranch,adisplacementisaddedtothecontentsofregister_______.
4.ThemnemonicBNEstandsfor__________.
5.WhatistheadvantageofusingBXoverB?
6.Trueorfalse.ThetargetofaBNEcanbeanywhereinthe4Gwordaddressspace.
7.Trueorfalse.AllARMbranchinstructionscanbranchtoanywhereinthe4Gbyteaddressspace.
8.DissecttheBinstruction,indicatinghowmanybitsareusedfortheoperandandtheopcode,andindicatehowfaritcanbranch.
9.Trueorfalse.Allconditionalbranchesare2-byteinstructions.
10.Showcodeforanestedlooptoperformanaction10,000,000,000times.
11.Showcodeforanestedlooptoperformanaction200,000,000,000times.
12.Findthenumberoftimesthefollowingloopisperformed:
MOVR0,#0x55
MOVR2,#40
L1LDRR1,=10000000
L2EORR0,R0,#0xFF
SUBR1,R1,#1
BNEL2
SUBR2,R2,#1
BNEL1
13.IndicatethestatusofZandCafterCMPisexecutedineachofthefollowingcases.
(a)
MOVR0,#50
MOVR1,#40
CMPR0,R1
(b)
MOVR1,#0xFF
MOVR2,#0x6F
CMPR1,R2
(c)
MOVR2,#34
MOVR3,#88
CMPR2,R3
(d)
SUBR1,R1,R1
MOVR2,#0
CMPR1,R2
(e)
EORR2,R2,R2
MOVR3,#0xFF
CMPR2,R3
(f)
EORR0,R0,R0
EORR1,R1,R1
CMPR0,R1
(g)
MOVR4,#0x78
MOVR2,#0x40
CMPR4,R2
(h)
MOVR0,#0xAA
ANDR0,R0,#0x55
CMPR0,#0
14.RewriteProgram4-1tofindthelowestgradeinthatclass.
15.ThetargetaddressofaBNEisbackwardiftherelativeaddressportionofopcodeis_______(negative,positive).
16.ThetargetaddressofaBNEisforwardiftherelativeaddressportionofopcodeis_______(negative,positive).
Section4.2:CallingSubroutinewithBL
17.BLisa(n)___-byteinstruction.
18.InARM,whichregisteristhelinkerregister?
19.Trueorfalse.TheBLtargetaddresscanbeanywhereinthe4Gbyteaddressspace.
20.DescribehowwecanreturnfromasubroutineinARM.
21.InARM,whichaddressissavedwhenBLinstructionisexecuted.
Section4.3:ARMTimeDelayandInstructionPipeline
22.Findtheoscillatorfrequencyifthemachinecycle=1.25ns.
23.Findthemachinecycleifthecrystalfrequencyis200MHz.
24.Findthemachinecycleifthecrystalfrequencyis100MHz.
25.Findthemachinecycleifthecrystalfrequencyis160MHz.
26.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasanARMwithafrequencyof80MHz:
MOVR8,#200
BACKLDRR1,=400000000
HERENOP
SUBSR1,R1,#1
BNEHERE
SUBSR8,R8,#1
BNEBACK
27.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasanARMwithafrequencyof50MHz:
MOVR2,#100
BACKLDRR0,=50000000
HERENOP
NOP
SUBSR0,R0,#1
BNEHERE
SUBSR2,R2,#1
BNEBACK
28.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasaARMwithafrequencyof40MHz:
MOVR1,#200
BACKLDRR0,#20000000
HERENOP
NOP
NOP
SUBSR0,R0,#1
BNEHERE
SUBSR1,R1,#1
BNEBACK
29.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasanARMwithafrequencyof100MHz:
MOVR8,#500
BACKLDRR1,=20000
HERENOP
NOP
NOP
SUBSR1,R1,#1
BNEHERE
SUBSR8,R8,#1
BNEBACK
30.TheARMchipdoesnothavetheNOPinstruction.ExaminethelistfilegeneratedbyyourARMassemblertoseewhatinstructionisusedinplaceoftheNOP.
Section4.4:ConditionalExecution
31.WhichbitsoftheARMinstructionaresetasideforconditionexecution?
32.Trueorfalse.OnlyADDandMOVinstructionshaveconditionalexecutionfeature.
33.Trueorfalse.InARM,theconditionalexecutionisdefault.
34.WhichflagbitisexaminedbeforetheMOVEQinstructionisexecuted?
35.StatethedifferencebetweentheADDEQandADDNEinstructions.
36.StatethedifferencebetweentheBALandBinstructions.
37.StatethedifferencebetweentheSUBCCandSUBCSinstructions.
38.StatethedifferencebetweentheANDEQandANDNEinstructions.
39.Trueorfalse.ThedecisiontoexecutetheSUBCCisbasedonthestatusofZflag.
40.Trueorfalse.ThedecisiontoexecutetheADDEQisbasedonthestatusofZflag.
AnswerstoReviewQuestions1.BranchifnotEqual
2.True
3.4
4.ZflagofCPSR(statusregister)
5.4
6.TheBcanonlybranchtoanaddresslocationwithin32MBaddressspace,whiletheBXcangoanywhereinthe4GBaddressspaceofARM.
Section4.2:CallingSubroutinewithBL
1.BranchandLink
2.True
3.4
4.32
5.R14
6.Inbothofthemthetargetaddressisrelativetothevalueoftheprogramcounterandtherelativeaddresscancovermemoryspaceof–32MBto+32MBfromcurrentlocationofprogramcounter.TheBLinstructionsavestheaddressofthenextinstructionintheLRregisterbeforejumping,whiletheBinstructionjustjumpswithoutsavinganything.
Section4.3:ARMTimeDelayandInstructionPipeline
1.True
2.1
3.MC=1/66MHz=0.015ms=15ns
4.[50,000,000×(1+1+1+1+1+1+3)]×10ns=4.5seconds
5.MachineCycle=1/50MHz=0.02ms=20ns
6.True
7.False
8.False.Ittakes3cycles,onlyifitbranchestothetargetaddress.
Section4.4:ConditionalExecution
1.True
2.4bits
3.ADDalwaysregardlessofthestatusflag.
4.True
5.True
Chapter5:SignedNumbersandIEEE754FloatingPointThischapterdealswithsignednumber instructionsandoperations. InSection5.1,
we focus on the concept of signed numbers in software engineering. Signed numberarithmeticoperationsand instructionsareexplainedalongwithexamples inSection5.2.TheIEEE754floatingpointwillbeexplainedinSection5.3.
Section5.1:SignedNumbersConceptAlldataitemsusedsofarhavebeenunsignednumbers,meaningthattheentire8-
bit,16-bitor32-bitoperandwasusedforthemagnitude.Manyapplicationsrequiresigneddata.Inthissectiontheconceptofsignednumbersisdiscussed.
Conceptofsignednumbersincomputers
Ineverydaylife,numbersareusedthatcouldbepositiveornegative.Forexample,atemperatureof5degreesbelowzerocanberepresentedas-5,and20degreesabovezeroas +20. Computersmust be able to accommodate such numbers. To do that, computerscientistshavedevisedthefollowingarrangementfortherepresentationofsignedpositiveandnegativenumbers:Themostsignificantbit(MSB)issetasideforthesign(+or-)andtherestofthebitsareusedforthemagnitude.Thesignisrepresentedby0forpositive(+)numbers and 1 for negative (-) numbers. Signed byte and word representations arediscussedbelow.
Signedbyteoperands
Insignedbyteoperands,D7(MSB) is thesignandD0toD6aresetasidefor themagnitudeofthenumber.IfD7=0,theoperandispositive,andifD7=1,itisnegative.
Therangeofpositivenumbersthatcanberepresentedbytheaboveformatis0to+127.
Dec. Binary
0 00000000
+1 00000001
… ….….
+5 00000101
… ….….
+127 01111111
Dec. Binary
-0 10000000
-1 10000001
… ….….
-5 10000101
… ….….
-127 11111111
Negativenumbersusing2’scomplement
TosaveALUcircuitry,weuse2’scomplementmethod to implement thenegativenumbers. For negative numbers, D7 is 1 but the non-sign (magnitude) portion isrepresented in 2’s complement. Although the assembler does the conversion, it is stillimportant to understand how the conversion works. To convert to negative numberrepresentation(2’scomplement),followthesesteps:
1.Writethemagnitudeofthenumberin8-bitbinary(nosign).
2.Inverteachbit.
3.Add1toit.
Dec. Binary
0 00000000
+1 00000001
… ….….
+5 00000101
… ….….
+127 01111111
Dec. Binary
0 00000000
-1 11111111
… ….….
-5 11111011
… ….….
-127 10000001
-128 10000000
Examples5-1,5-2,and5-3demonstratethesethreesteps.
Example5-1
Showhowthecomputerwouldrepresent-5.
Solution:
1.000001015in8-bitbinary
2.11111010inverteachbit
3.11111011add1(0xFB)
Thisisthesignednumberrepresentationin2’scomplementfor-5.
Example5-2
Show-34hexasitisrepresentedinternally.
Solution:
1.00110100
2.11001011
3.11001100(whichis0xCC)
Example5-3
Showtherepresentationfor-12810.
Solution:
1.10000000
2.01111111
3.10000000Noticethatthisisnotnegativezero(–0).
Fromtheexamplesaboveitisclearthattherangeofbyte-sizednegativenumbersis-1to-128.Thefollowinglistsbyte-sizedsignednumberranges:
Decimal Binary Hex
-128 10000000 80
-127 10000001 81
-126 10000010 82
… ….… ..
-2 11111110 FE
-1 11111111 FF
0 00000000 00
00000001 01
+2 00000010 02
… ….…. ..
+127 01111111 7F
Halfword-sizedsignednumbers
InARMCPUahalf-wordis16bitsinlength.SettingasidetheMSB(D15)forthesignleavesatotalof15bits(D14–D0)forthemagnitude.Thisgivesarangeof–32,768(–215)to+32,767(215–1).
Ifanumber is larger than16-bit, itmustbe treatedasa32-bitwordoperand.Thefollowingshowstherangeofsignedhalf-wordoperands.Toconvertanegativenumbertoits half-word operand representation, the steps discussed in negative byte operands areused.
Decimal Binary Hex
-32,768 1000000000000000 8000
-32,767 1000000000000001 8001
-32,766 1000000000000010 8002
… ….… ..
-2 1111111111111110 FFFE
-1 1111111111111111 FFFF
0 0000000000000000 0000
+1 0000000000000001 0001
+2 0000000000000010 0002
… ….…. …
+32,766 0111111111111110 7FFE
+32,767 0111111111111111 7FFF
UsingMicrosoftWindowscalculatorforsignednumbers
AllMicrosoftWindowsoperatingsystemscomewithahandycalculator.Useit toverifythesignednumberoperationsinthissection.
Word-sizedsignednumbers
InARMCPUs,awordis32bitsinlength.SettingasidetheMSB(D31)forthesignleavesatotalof31bits(D30–D0)forthemagnitude.Thisgivesarangeof-(231)to+(231-1).
To convert a negative number to its word operand representation, the three stepsdiscussedinnegativebyteoperandsareused.SeeExample5-4.
Example5-4
Showhowthecomputerwouldrepresent-5for(a)8-bit,(b)16-bit,and(c)32-bitdatasizes.
Solution:
(a)8-bit
1.000001015in8-bitbinary
2.11111010inverteachbit
3.11111011add1(0xFB)
(b)16-bit
1.00000000000001015in16-bitbinary
2.1111111111111010inverteachbit
3.1111111111111011add1(0xFFFB)
(c)32-bit
1.000000000000000000000000000001015in32-bitbinary
2.11111111111111111111111111111010inverteachbit
3.11111111111111111111111111111011add1(0xFFFFFFFB)
UsetheWindowscalculatortoverifytheseexamples.
Ifanumberislargerthan32-bit,itmustbetreatedasa64-bitdoublewordoperandand be processed chunk by chunk the same way as unsigned numbers. The followingshowstherangeofsignedwordoperands.
Decimal Binary Hex
-2,147,483,648 10000000000000000000000000000000 80000000
-2,147,483,647 10000000000000000000000000000001 80000001
-2,147,483,646 10000000000000000000000000000010 80000002
… … …
-2 11111111111111111111111111111110 FFFFFFFE
-1 11111111111111111111111111111111 FFFFFFFF
0 00000000000000000000000000000000 00000000
+1 00000000000000000000000000000001 00000001
+2 00000000000000000000000000000010 00000002
… … …
+2,147,483,646 01111111111111111111111111111110 7FFFFFFE
+2,147,483,647 01111111111111111111111111111111 7FFFFFFF
Table5-1showsasummaryofsigneddataranges.
DataSize Bits 2n Decimal Hexadecimal
Byte 8 -27to+27-1 -128to+127 0x80–0x7F
Half-word 16 -215to+215-
1-32,768to+32,767 0x8000–0x7FFF
Word 32 -231to+231-1 -2,147,483,648to+2,147,483,647
0x80000000–0x7FFFFFFF
Table5-1:SignedDataRangeSummary
ReviewQuestions
1.Inan8-bitoperand,bit_____isusedforthesignbit,whereasina16-bitoperand,bit_____isusedforthesignbit.Repeatfor32-bitsigneddata.
2.Computethebyte-sized2’scomplementof0x16.
3.Therangeofbyte-sizedsignedoperandsis-______to+_____.Therangeofhalfword-sizedsignedoperandsis-__________to+__________.
4.Therangeofword-sizedsignedoperandsis-__________to+__________.
5.Computethe2’scomplementof0x500000.
Section5.2:SignedNumberInstructionsandOperationsIn this section we examine issues associated with signed number arithmetic
operations.WewillalsodiscusstheARMinstructionsforsignednumbersandhowtousethem. Itmust be noted that inARM theN flag bit inCSPR is the signbit.N=0 is forpositiveandN=1fornegativenumbers.
Overflowprobleminsignednumberoperations
Whenusingsignednumbers,aseriousproblemarisesthatmustbedealtwith.Thisistheoverflowproblem.TheCPUindicatestheexistenceoftheproblembyraisingtheV(oVerflow) flag,but it isup to theprogrammer to takecareof it.TheCPUunderstandsonly0sand1sandignoresthehumanconventionofpositiveandnegativenumbers.Nowwhatisanoverflow?Iftheresultofanoperationonsignednumbersistoolargefortheregister,anoverflowoccursandtheprogrammermustbenotified.LookatExample5-5.
Example5-5
Lookatthefollowingcasefor8-bitdatasize:
+96 01100000
+70 +01000110
+166 10100110
AccordingtotheCPU,thisis-90,whichiswrong.(V=1,N=1,C=0)
Intheexampleabove,+96isaddedto+70andtheresultaccordingtotheCPUis- 90. Why? The reason is that the result was more than 8 bits could handle. The 8-bitregisterscanonlycontainupto+127.ThedesignersoftheCPUcreatedtheoverflowflagspecifically for the purpose of informing the programmer that the result of the signednumberoperationiserroneous.
Whentheoverflowflagissetin8-bitoperations
In 8-bit signed number operations, V is set to 1 if either of the following twoconditionsoccurs:
1.ThereisacarryfromD6toD7butnocarryoutofD7(C=0).
2.ThereisacarryfromD7out(C=1)butnocarryfromD6toD7.
Inotherwords,theoverflowflagissetto1ifthereisacarryfromD6toD7orfromD7out,butnotboth.ThismeansthatifthereisacarrybothfromD6toD7andfromD7out,V=0.InExample5-5,sincethereisonlyacarryfromD6toD7andnocarryfromD7out,V=1.Examples5-6,5-7,and5-8givefurtherillustrationsoftheoverflowflaginsignednumberarithmetic.
Example5-6
Examinethefollowingcase:
-128 10000000
+-2 +11111110
-130 01111110 V=1,N=0(positive),C=1
AccordingtotheCPU,theresultis+126,whichiswrong.TheerrorisindicatedbythefactthatV=1.
Example5-7
Observetheresultsofthefollowing:
;assumeR3=+7(R3=0x07)
;assumeR2=+18(R2=0x12)
ADDR2,R2,R3;(R2=0x19=+25,correct)
+7 00000111
++18 +00010010
+25 00011001
V=0,C=0,andN=0(positive).
Example5-8
Observetheresultsofthefollowing:
-2 11111110
+-5 11111011
-7 11111001
V=0,C=0,andN=1(negative);theresultiscorrectsinceV=0.
Overflowflagin16-bitoperations
Ina16-bitoperation,Vissetto1ineitheroftwocases:
1.ThereisacarryfromD14toD15butnocarryoutofD15(C=0).
2.ThereisacarryfromD15out(C=1)butnocarryfromD14toD15.
Againtheoverflowflagislow(notset)ifthereisacarryfrombothD14toD15andfromD15out.TheVissetto1onlywhenthereisacarryfromD14toD15orfromD15out,butnotfromboth.SeeExamples5-9and5-10.
Example5-9
Observetheresultsinthefollowing16-bithexnumbers:
+6E2F 0110111000101111
++13D4 0001001111010100
+8203 1000001000000011
=–0x7DFD=–32,253incorrect!
V=1,C=0,N=1
Example5-10
Observetheresultsinthefollowing16-bithexnumbers:
+542F 0101010000101111
++12E0 +0001001011100000
+670F 0110011100001111=+0x670F=+26,383(correctanswer);
V=0,C=0,N=0
Overflowflagin32-bitoperations
Ina32-bitoperation,Vissetto1ineitheroftwocases:
1.ThereisacarryfromD30toD31butnocarryoutofD31(C=0).
2.ThereisacarryfromD31out(C=1)butnocarryfromD30toD31.
Againtheoverflowflagislow(notset)ifthereisacarryfrombothD30toD31andfromD31out.TheVissetto1onlywhenthereisacarryfromD30toD31orfromD31
out,butnotfromboth.SeeExamples5-11and5-12.
Example5-11
Observetheresultsinthefollowing32-bithexnumbers:
+6E2F356F 01101110001011110011010101101111
++13D49530 +00010011110101001001010100110000
+8203CA9F 10000010000000111100101010011111 =–0x7DFC3561
incorrect!V=1,C=0,N=1
Example5-12
Observetheresultsinthefollowing32-bithexnumbers:
+542F356F 01010100001011110011010101101111
++12E09530 00010010111000001001010100110000
+670FCA9F 01100111000011111100101010011111 =+670FCA9F
correctanswer;V=0,C=0,N=0
Signextensionandavoidingerroneousresultsinsignednumberoperations
To avoid the problems associated with signed number operations, one can sign-extendtheoperand.Signextensioncopiesthesignbit(D7)ofthelowerbyteofaregisterintotheupperbitsofthe32-bitregister,orcopiesthesignbitofa16-bitregisterinto32-bit.TheLDRSB(loadregistersignedbyte)instructionloadsintotheregisterabytefrommemoryandsignextendstheD7totheentire32-bitregister.TheLDRSH(loadregistersignedhalf-word) instruction loads into the register a half-word frommemory and signextendstheD15totheentire32-bitregister.Theyworkasfollows:
LDRSB loads into the destination register a byte frommemory and sign extends(copyD7,thesignflag)toall32bits.ThisisillustratedinFigure5-1.
Figure5-1:SignExtendingaByte
Lookatthefollowingexample:
;assumelocation0x80000has+96=01100000andR1=0x80000
LDRSBR0,[R1];nowR0=00000000000000000000000001100000
;assumelocation0x80000contains-2=11111110andR2=0x80000
LDRSBR4,[R2];nowR4=11111111111111111111111111111110
Ascanbeseenintheaboveexamples,LDRSBdoesnotalterthelower8bits.Thesignof8bitsiscopiedtotherestofthe32-bitregister.
LDRSHloadsthedestinationregisterwitha16-bitsignednumberandsign-extendsto the32-bit register. It copiesD15ofRd toallbitsof theRd register.This isused forsignedhalf-wordoperandandisillustratedinFigure5-2.
Figure5-2:SignExtendingaHalf-word
Lookatthefollowingexample:
;assume0x80000contains+260=0000000100000100andR1=0x80000
LDRSHR0,[R1];R0=00000000000000000000000100000100
Anotherexample:
;assumelocation0x20000has-327660=0x8002andR2=0x20000
LDRSHR1,[R2];R1=FFFF8002
Ascanbeseenintheaboveexamples,LDRSHdoesnotalterthelower16bits.Thesignof16bitsiscopiedtotherestofthe32-bitregister.Howcantheseinstructionshelpcorrect the overflow error?To answer that question,Example 5-13 showsExample 5-5rewrittentocorrecttheoverflowproblem.
Example5-13
WriteaprogramforExample5-5tohandletheoverflowproblem.
Solution:
AREAEXAMPLE5_13,CODE,READONLY
ENTRY
LDRR1,=DATA1
LDRR2,=DATA2
LDRR3,=RESULT
LDRSBR4,[R1];R4=+96
LDRSBR5,[R2];R5=+70
ADDR4,R4,R5;R4=R4+R5=96+70=+166
STRR4,[R3];Store+166inlocationRESULT
HLBHL
DATA1DCB+96
DATA2DCB+70
AREAVARIABLES,DATA,READWRITE;ThefollowingisstoredinRAM
RESULTDCW0
END
ThefollowingisananalysisofthevaluesinExample5-13.Eachissign-extendedandthenaddedasfollows:
Sign Binarynumbers Decimal
0 0000000000000000000000001100000 +96aftersignext.
0 0000000000000000000000001000110 +70aftersignext.
0 0000000000000000000000010100110 +166
Asarule,ifthepossibilityofoverflowexists,allbyte-sizedsignednumbersshouldbesign-extendedintoaword,andsimilarly,allhalfword-sizedsignedoperandsshouldbesign-extended to a word before they are processed. This is shown in Program 5-1.Program5-1findstotalsumofagroupofsignednumberdata.
Program5-1
;Thisprogramcalculatesthesumofsignednumbers
AREAPROG5_1,CODE,READONLY
ENTRY
LDRR0,=SIGN_DAT
MOVR3,#9
MOVR2,#0
LOOPLDRSBR1,[R0]
;LoadintoR1andsignextendit.
ADDR2,R2,R1;R2=R2+R1
ADDR0,R0,#1;pointtonext
SUBSR3,R3,#1;decrementcounter
BNELOOP
LDRR0,=SUM
STRR2,[R0];StoreR2inlocationSUM
HEREBHERE
SIGN_DATDCB+13,-10,+19,+14,-18,-9,+12,-19,+16
AREAVARIABLES,DATA,READWRITE
SUMDCD0
END
Signednumbermultiplication
Signed number multiplication is similar in its operation to the unsignedmultiplication described in Chapter 3. The only difference between them is that theoperands in signed number operations can be positive or negative; therefore, the resultmust indicate the sign. InARMwehaveSMULL(signedmultiply long)butnoSMUL(signmultiply).Table5-2summarizessignednumbermultiplication;itissimilartoTable3-3inChapter3.SeeExamples5-14and5-15.
Multiplication Operand1 Operand2 Result
word×word Rm Rs RdHi=upper32-bit,RdLo=lower32-bit
Note:UsingSMULL(signedmultiplylong)forword×wordsmultiplicationprovidesthe64-bitresultinRdLoandRdHiregister.Thisisusedfor32-bit×32-bitnumbersinwhichresultcangobeyond0xFFFFFFFF.
Table5-2:SignedMultiplication(SMULLRdLo,RdHi,Rm,Rs)Summary
Example5-14
Observetheresultsofthefollowingmultiplicationofsignednumbers:
LDRR1,=-3500;R1=-3500(0xFFFFF254)
LDRR0,=-100;R0=-100(0xFFFFFF9C)
SMULLR2,R3,R0,R1
Solution:
-3500×-100=350,000=55730inhex.AfterexecutingtheaboveprogramR2andR3willcontain0x55730and00000000,respectively.
Example5-15
ThefollowingprogramissimilartoExample5-14.But,insteadofSMULL,theUMULLinstructionisused.Observetheresultsofthefollowingmultiplication:
LDRR1,=-3500;R1=-3500(0xFFFFF254)
MOVR0,#-100;R0=-100(0xFFFFFF9C)
UMULLR2,R3,R0,R1
Solution:
0xFFFFF254×0xFFFFFF9C=0xFFFFF1F000055730.Thus,R2andR3willcontain0x00055730and0xFFFFF1F0,respectively.Asyoucansee,theresultsoftheprogramsarecompletelydifferent.InthepreviousprogramtheSMULLinstructionconsiderstheoperandssignednumbersandtheresultoftwonegativenumbersbecomespositive.Asaresult,thesignbitbecomeszero,butinthisexampletheoperandsareconsideredasunsignednumbers.
Signednumbercomparison
InChapter4wesawthat theCMPinstructionaffects theZandCflags;usingtheflagswecomparedunsignednumbers.ThisinstructionaffectstheNandVflags,aswell;We can use flags Z, V, and N to compare signed numbers. The Z flag shows if thenumbersareequalornot.WhenthenumbersareequaltheZflagissettoone.NandVflagsshowiftheleftoperandisbiggerthantherightoperandornot.WhenNandVhavethesamevalue,thefirstoperandhasagreatervalue.
Insummary,afterexecutingtheinstructionCMPRn,Op2 theflagsarechangedasfollows:
Op2>RnV=N
Op2=RnZ=1
Op2<RnN≠V
Table 5-3 lists the branch instructions which check the Z, V, and N flags. TheinstructionscanbeusedtogetherwiththeCMPinstructiontocomparesignednumbers.
Instruction Action
BEQ branchequal BranchifZ=1
BNE Branchnotequal BranchifZ=0
BMI Branchminus(branchnegative) BranchifN=1
BPL Branchplus(branchpositive) BranchifN=0
BVS BranchifVset(branchoverflow) BranchifV=1
BVC BranchifVclear(branchifnooverflow) BranchifV=0
BGE Branchgreaterthanorequal BranchifN=V
BLT Branchlessthan BranchifN≠V
BGT Branchgreaterthan BranchifZ=0andN=V
BLE Branchlessthanorequal BranchifZ=1orN≠V
Table5-3:ARMConditionalBranch(Jump)InstructionsforSignedData
Program5-2findsthelowestnumberamongalistofnumbers.
Program5-2
;Findingthelowestofsignednumbers
AREAPROG5_2,CODE,READONLY
ENTRY
LDRR0,=SIGN_DAT
MOVR3,#8
LDRSBR2,[R0]
;bringintoR2thefirstsignnumberandsignextendit
ADDR0,R0,#1;pointtonext
BEGINLDRSBR1,[R0];R1=contentsofloc.pointedtobyR0
CMPR1,R2;compareR1andR2
BGENEXT
;branchtoNEXTifR1isgreaterthanorequaltoR2
MOVR2,R1
;R2=R1(usethenewnumberforcomparison)
NEXTADDR0,R0,#1;pointtonext
SUBSR3,R3,#1;decrementcounter
BNEBEGIN;ifR3isnotzerobranchBEGIN
LDRR0,=LOWEST;R0=addressofLOWEST
STRR2,[R0];storeR2inlocationSUM
HEREBHERE
SIGN_DATDCB+13,-10,+19,+14,-18,-9,+12,-19,+16
AREAVARIABLES,DATA,READWRITE
LOWESTDCD0
END
;Noticethatweusethefirstsignnumberasbasisfor
;comparisonandkeepcomparingitwithothernumbers.
;Anytimeanyofthenumberissmallerweuseitasbasis
;forcomparison.
CMNinstruction
CMNRn,Op2
In ARM we have two compare instructions: CMP and CMN. While the CMPinstructionsetstheflagsbysubtractingthesourceoperandfromthedestination,theCMNsetstheflagsbyaddingsourcetodestination.AsaresultCMNcomparesthedestinationoperandwiththenegativeofthesourceoperand:
destination>(-1×source)V=N
destination=(-1×source)Z=1
destination<(-1×source)N=negationofV
When the source operand is an immediate value, the instructions can be usedinterchangeably.Example5-16isanexampleofusingtheCMNinstruction.
Example5-16
AssumingR5hasapositivevalue,writeaprogramthatfindsitsnegativematchinanarrayofdata(OUR_DATA).
Solution:
AREAEXAMPLE5_16,CODE,READONLY
ENTRY
MOVR5,#13
LDRR0,=OUR_DATA
MOVR3,#9
BEGIN
LDRSBR1,[R0];R1=contentsofloc.pointedtobyR0
;(signextended)
CMNR1,R5;compareR1andnegativeofR5
BEQFOUND;branchifR1isequaltonegativeofR5
ADDSR0,R0,#1;incrementpointer
SUBSR3,R3,#1;decrementcounter
BNEBEGIN;ifR3isnotzerobranchBEGIN
NOT_FOUNDBNOT_FOUND
FOUNDBFOUND
OUR_DATADCB+13,-10,-13,+14,-18,-9,+12,-19,+16
END
IntheaboveprogramR5isinitializedwith13.Therefore,itfinishessearchingwhenitgetsto– 13.
Arithmeticshift
AswasdiscussedinChapter3,therearetwotypesofshifts:logicalandarithmetic.Logical shift, which is used for unsigned numbers, was discussed in Chapter 3. Thearithmetic shift is used for signednumbers. It is basically the same as the logical shift,exceptthatthesignbitiscopiedtotheshiftedbits.
ASR(arithmeticshiftright)
MOVRn,Op2,ASRcount
AsthebitsofthesourceareshiftedtotherightintoC,theemptybitsarefilledwiththesignbit.OnecanusetheASRinstructiontodivideasignednumberby2,asshownbelow:
MOVR0,#-10;R0=-10=0xFFFFFFF6
MOVR3,R0,ASR#1;R0isarithmeticshiftedrightonce
;R3=0xFFFFFFFB=-5
ReviewQuestions
1.Explainthedifferencebetweenanoverflowandacarry.
2.ExplainthepurposeoftheLDRSBandLDRSHinstructions.DemonstratetheeffectofLDRSBonR0=0xF6.DemonstratetheeffectofLDRSHonR1=0x124C.
3.Theinstructionforsignedmultiplicationis________.
4.Foreachofthefollowinginstructions,indicatetheflagconditionnecessaryforeachbranchtooccur:(a)BLE(b)BGT
Section5.3:IEEE754Floating-PointStandardsInthissectionwestudytheIEEEstandardforfloating-pointnumbers.
IEEEfloating-pointstandard
Uptothelate1970s,realnumbers(numberswithdecimalpoints)wererepresenteddifferentlyinbinaryformbydifferentcomputermanufacturers.Thismademanyprogramsincompatible for different machines. In 1980, an IEEE committee standardized thefloating-point data representation of real numbers. This standard, much of which wascontributedbyIntelbasedonthe8087mathcoprocessor,recognizedtheneedfordifferentdegreesofprecisionbydifferentapplications;therefore,itestablishedsingleprecisionanddoubleprecision.Sincealmostallsoftwareandhardwarecompaniesnowabidebythesestandards,eachoneisexplainedthoroughly.
IEEE754single-precisionfloating-pointnumbers
IEEEsingle-precision floating-pointnumbersuseonly32bitsofdata to representany real number range 2128 to 2-126, for both positive and negative numbers. Thistranslatesapproximatelytoarangeof1.2×10-38to3.4×10+38indecimalnumbers,againforbothpositiveandnegativevalues.Insomemathcoprocessorterminology,thesesingle-precision32-bitfloating-pointnumbersarereferredtoasshortreal.Assignmentofthe32bitsinthesingle-precisionformatisshowninFigure5-3.
Figure5-3:IEEE754Single-precisionFloating-pointNumbers
Tomakethehardwaredesignofthemathprocessorsmucheasierandlesstransistorconsuming, the exponent part is added to a constant of 0x7F (127 decimal). This isreferred to as a biased exponent. Conversion from real to floating point involves thefollowingsteps.
1.Therealnumberisconvertedtoitsbinaryform.
2.Thebinarynumberisrepresentedinscientificform:1.xxxxEyyyy
3.Bit31iseither0forpositiveor1fornegative.
4.Theexponentportion,yyyy,isaddedto7Ftogetthebiasedexponent,whichisplacedinbits23to30.
5.Thesignificand,xxxx,isplacedinbits22to0.
Examples5-17,5-18,and5-19demonstratethisprocess.
Example5-17
Convert9.7510tosingle-precision(shortreal)floatingpoint.
Solution:
decimal9.75=binary1001.11=scientificbinary1.00111E3
Signbit31is0forpositive.
Exponentbits30to23are10000010(3+7F=82H)afterbiasing.
Significandbits22to0are001110000000000000000…00.
Puttingitalltogethergivesthefollowingbinaryform,underwhichiswrittenthehexform:
01000001000111000000000000000000
411C0000
ThiscanbeverifiedbyusinganassemblersuchasKeil.
Example5-18
Convert0.07812510toshortIEEEfloating-pointstandardrealFP(singleprecision).
Solution:
decimal0.078125=binary0.000101=scientificbinary1.01E–4
Signbit31is0forpositive.
Exponentbits30–23are01111011(–4+7F=7B)afterbiasing.
Significandbits22–0are01000000….000.
Thisnumberwillberepresentedinbinaryandhexas
00111101101000000000000000000000
3DA00000
Example5-19
Convert–96.2710tosingle-precisionFPformat.
Solution:
decimal96.27=binary1100000.01000101000111101=
scientificbinary1.10000001000101000111101E6
Signbit31is1fornegative.
Exponentbits30–23are10000101(6+7F=85H)afterbiasing,
Fractionbits22–0are10000001000101000111101,
Thefinalforminbinaryandhexis
11000010110000001000101000111101
C2C08A3D
Itmustbenotedthatconversionofthedecimalportion0.27tobinarycanbecontinuedbeyondthepointshownabove,butbecausethefractionpartofthesingleprecisionislimitedto23bits,thiswasallthatwasshown.Forthatreason,double-precisionFPnumbersareusedinsomeapplicationstoachieveahigherdegreeofaccuracy.
IEEE754double-precisionfloating-pointnumbers
Double-precisionFP(calledlongrealbyIntel)canrepresentnumbersintherange2.3×10-308to1.7×10308,bothpositiveandnegative.Atotalof52bits(bits0to51)areforthesignificand,11bits(bits52to62)arefortheexponent,andfinally,bit63isforthesign.Theconversionprocess is the sameas for singleprecision in that the realnumbermustfirstberepresentedas1.xxxxxxxEYYYY,thenYYYYisaddedto3FFtogetthebiasedexponent.SeeFigure5-4andExample5-20.
Figure5-4:IEEE754Double-precisionFloating-pointNumbers
Example5-20
Convert152.187510todouble-precisionFP.
Solution:
decimal152.1875=binary10011000.0011=
scientificbinary1.00110000011E7
Bit63is0forpositive.
Exponentbits62–53are10000000110(7+3FF=406)afterbiasing.
Fractionbits52–0are00110000011000…..000.
0100 0000 0110 0011 0000 0110 0000 0000 0000 … 0000
4 0 6 3 0 6 0 0 0 … 0
Thisexamplewillbeverifiedbyanassemblerinthenextsection.
MathcoprocessorinARM
Usingageneral-purposemicroprocessorsuchastheARMtoperformmathematicalfunctionssuchasfloatingpointcalculation,log,sine,andothersisverytimeconsuming,notonlyfortheCPUbutalsoforprogrammerswritingsuchprograms.Intheabsenceofamath coprocessor, programmers must write subroutines using ARM instructions formathematicalfunctions.Althoughsomeofthesesubroutinesarealreadywritten,nomatterhow good the subroutine, its CPU run time will still be quite long. To acceleratecomplicated mathematical and floating point calculation, math or floating pointcoprocessorsareused.Asimplervariationofmathcoprocessoriscalledfloatingpointunit(FPU) and someARMchips comewith on-chip FPU. InKeil there is a library namedfplibtodofloatingpointcalculation.
ReviewQuestions
1.Single-precisionIEEEFPstandarduses_________bitstorepresentdata.
2.Double-precisionIEEEFPstandarduses________bitstorepresentdata.
3.TogetthebiasedexponentportionofIEEEsingle-precisionfloating-pointdataweadd________.
4.TogetthebiasedexponentportionofIEEEdouble-precisionfloating-pointdataweadd________.
5.Trueorfalse.Intheabsenceofamathprocessor,thegeneral-purposeprocessormustperformallmathcalculations.
6.Trueorfalse.AlloftheARMchipscomewiththeFPU.
Problems:Section5.1:SignedNumbersConcept
1.Showhowthe32-bitcomputerswouldrepresentthefollowingnumbersandverifyeachwithacalculator.
(a)-23 (b)+12 (c)-0x28
(d)+0x6F (e)-128 (f)+127
(g)+365 (h)-32,767
2.Showhowthe32-bitcomputerswouldrepresentthefollowingnumbersandverifyeachwithacalculator.
(a)-230 (b)+1200 (c)-0x28F
(d)+0x6FF
Section5.2:SignedNumberInstructionsandOperations
3.FindtheoverflowflagforeachcaseandverifytheresultusinganARMIDE.Dobyte-sizedcalculationonthem.
(a)(+15)+(-12) (b)(-123)+(-127) (c)(+0x25)+(+34)
(d)(-127)+(+127) (e)(+100)+(-100)
4.Sign-extendthefollowingandwritesimpleprogramsinusingARMIDEtoverifythem.
(a)-122 (b)-0x999 (c)+0x17
(d)+127 (e)-129
5.ModifyProgram5-2tofindthehighesttemperature.Verifyyourprogram.
Section5.3:IEEE754Floating-PointStandards
6.Whatisthedisadvantageofusingageneral-purposeprocessortoperformmathoperations?
7.ShowthebitassignmentoftheIEEEsingle-precisionstandard.
8.Convert(byhandcalculation)eachofthefollowingrealnumberstoIEEEsingle-precisionstandard.
(a)15.575(b)89.125(c)–1022.543(d)–0.00075
9.ShowthebitassignmentoftheIEEEdouble-precisionstandard.
10.Insingle-precisionFP(floatingpoint),thebiasedexponentiscalculatedbyadding________tothe_________portionofascientificbinarynumber.
11.Indouble-precisionFP,thebiasedexponentiscalculatedbyadding________tothe_________portionofascientificbinarynumber.
12.Convertthefollowingtodouble-precisionFP.
(a)12.9375(b)98.8125
AnswerstoReviewQuestionsSection5.1
1.D7,D15,andD31for32-bitsigneddata.
2.0x16=00010110;its2’scomplementis:11101010
3.–128to+127;–32,768to+32,767(decimal)
4.-2,147,483,648to+2,147,483,647
5.0x500000=010100000000000000000000;
Its2’scomplementis:101100000000000000000000
Section5.2
1.Cflagisraisedwhenthereisacarryoutoftheresult,butVflagisraisedwhenthereisacarrytothesignbitandnocarryoutofthesignbitorwhenthereisnocarrytothesignbitandthereisacarryoutofthesignbit.CflagisusedtoindicateoverflowinunsignedarithmeticoperationswhileVflagisinvolvedinsignedoperations.
2.TheLDRSBinstructionsignextendsthesignbitofabyteintoaword;theLDRSHinstructionsignextendsthesignbitofahalf-wordintoaword.
In0xF6thesignbitis1;thus,itissign-extendedinto0xFFFFFFF6
0x124Csign-extendedintoR1wouldbeR0=0x0000124C.
3.SMULL
(a)BLEwilljumpifVistheinverseofN,orifZ=1.
(b)BGTwilljumpifVequalsN,andifZ=0.
Section5.3
1.32
2.64
3.0x7F
4.0x3FF
5.True
6.False
Chapter6:ARMMemoryMap,MemoryAccess,andStackThis chapter discusses the issue of memory access and the stack. Section 6.1 is
dedicatedtoARMmemorymapandmemoryaccess.Wewillalsoexplaintheconceptsofalign,non-align,littleendian,andbigendiandataaccess.InSection6.2,weexaminetheuseofthestackinARM.Wediscussthebit-addressable(bit-band)SRAMandperipheralsinSection6.3.AdvancedindexedaddressingmodeisexplainedinSection6.4.InSection6.5,wedescribethePCrelativeaddressingmodeanditsuse in implementingADRandLDR.
Section6.1:ARMMemoryMapandMemoryAccessTheARMCPUuses32-bitaddressestoaccessmemoryandperipherals.Thisgives
us amaximum of 4GB (gigabytes) ofmemory space. This 4GB of directly accessiblememoryspacehasaddresses0x00000000to0xFFFFFFFF,meaningeachbyteisassignedauniqueaddress(ARMisabyte-addressableCPU).SeeFigure6-1.
Figure6-1:MemoryByteAddressinginARM
The4GBofmemoryspaceisdividedintothreeregions:code,data,andperipheraldevices.SeeTable6-1.
Addressrange Name Description
0x00000000-0x1FFFFFFF Code ROMorFlashmemory
0x20000000-0x3FFFFFFF SRAM SRAMregionusedforon-chipRAM
0x40000000-0x5FFFFFFF Peripheral On-chipperipheraladdressspace
0x60000000-0x9FFFFFFF RAM Memory,cachesupport
0xA0000000-0xDFFFFFFF Device Sharedandnon-shareddevicespace
0xE0000000-0xFFFFFFFF System PPBandvendorsystemperipherals
Table6-1:MemorySpaceAllocationinARMCortex
In other words, the ARM uses the memory mapped I/O. For the ARMmicrocontrollers,generallytheFlashROMisusedforprogramcode,SRAMforscratchpad data, and memory-mapped I/O ports for peripherals. While there is an absolutestandard for the ARM instructions that all licensees of ARMmust follow, there is no
standardforexactlocationsandtypesofmemoryandperipherals.Thereforethelicenseescanimplementthememoryandperipheralsastheychoose.ForthisreasontheamountandtheaddresslocationsofmemoryusedbyFlashROM,SRAM,andI/Operipheralsvariesamong the familymembers and chipmanufacturers. TheARMmanufacturer datasheetshouldgiveyouthedetailsofthememorymapforbothon-chipandoff-chipmemoryandperipherals.MakesuretoexaminethememorymapofagivenARMchipbeforeyoustarttoprogramit.FromTable6-1,noticethatsomeofthememoryaddressesaresetasideforthe external (off-chip) memory and peripherals. At the time of this writing, no ARMmanufacturerhaspopulatedtheentire4GBofmemoryspacewithon-chipROM,RAM,andI/Operipherals.
ARM-basedMotherboards
InARM systemsforMicrosoftWindows,Unix,andAndroidoperatingsystemstheARMmotherboardsuseDRAMfortheRAMmemory,justlikethex86andPentiumPCs.AstheARMCPUispushedintothelaptop,desktop,andtabletsPCs,andthehighendofembedded systems products such as routers,wewill see the use ofDRAM as primarymemory to store both the operating systems and the applications. In such systems, theFlashmemorywill beholding thePOST (poweron self test),BIOS (basic Input/outputsystems) andboot programs. Just likex86 system, such systemshavebothon-chip andoff-chiphighspeedSRAMforcache.Currently,thereareARMchipsonthemarketwithsome on-chip Flash ROM, SRAM, and memory decoding circuitry for connection toexternal(off-chip)memory.Thisoff-chipmemorycanbeSRAM,Flash,orDRAM.Thedatasheet for suchARMchipsprovide thedetailsofmemorymap forbothon-chipandoff-chipmemories.Next,weexaminetheARMbusesandmemoryaccess.
Figure6-2:MemoryConnectionBlockDiagraminARM
D31–D0Databus
The32-bitdatabusoftheARMprovidesthe32-bitdatapathtotheon-chipandoff-chipmemoryandperipherals.Theyaregroupedinto8-bitdatachunks,D0–D7,D8–D15,D16–D23,andD24–D31.
A31–A0
These signalsprovide the32-bit addresspath to theon-chipandoff-chipmemoryandperipherals.SincetheARMsupportsdataaccessofbyte(8bits),halfword(16bits),and word (32 bits), the buses must be able to access any of the 4 banks of memoryconnectedtothe32-bitdatabus.TheA0andA1areusedtoselectoneofthe4bytesoftheD31-D0databus.SeeFigure6-3.
Figure6-3:MemoryBlockDiagraminARM
AHBandAPBbuses
TheARMCPUisconnected to theon-chipmemoryviaanAHB(advancedhigh-performancebus).TheAHBisusednotonlyforconnectiontoon-chipROMandRAM,itisalsoused forconnection to someof thehighspeed I/Os (input/output) suchasGPIO(general purpose I/O). ARM chip also has the APB (advanced peripherals bus) busdedicated for communication with the on-chip peripherals such as timers, ADC, serialCOM,SPI, I2C,andotherperipheralports.Whileweneed the32-bitdatabusbetweenCPUand thememory (RAMandROM),many slower peripherals are 8 or 16 bits andthere is noneed for entire fast 32-bit databuspathway.For this reason,ARMuses theAHB-to-APBbridgetoaccessthesloweron-chipdevicessuchasperipherals.Alsosinceperipheralsdonotneedahighspeedbus,abridgebetweenAHBandAPBallowsgoingfromthehigherspeedbusofAHBtolowerspeedbusofperipherals.TheAHBbusallowsasingle-cycleaccess.SeeFigure6-4forAHB-to-APBbridge.
Figure6-4:AHBandAPBinARM
Buscycletime
ToaccessadevicesuchasmemoryorI/O,theCPUprovidesafixedamountoftimecalledabuscycletime.Duringthisbuscycletime,thereadorwriteoperationofmemoryorI/Omustbecompleted.ThebuscycletimeusedforaccessingmemoryisoftenreferredtoasMC(memorycycle)time.ThetimefromwhentheCPUprovidestheaddressesatitsaddresspinstowhenthedataisexpectedatitsdatapinsiscalledmemoryreadcycletime.Whileforon-chipmemorythecycletimecanbe1clock,intheoff-chipmemorythecycletimeisoften2clocks.IfmemoryisslowanditsaccesstimedoesnotmatchtheMCtimeoftheCPU,extratimecanberequestedfromtheCPUtoextendthereadcycletime.Thisextratimeiscalledawaitstate(WS).Inthe1980s,theclockspeedformemorycycletimewasthesameastheCPU’sclockspeed.Forexample,inthe20MHzprocessors,thebuseswereworkingatthesamespeedof20MHz.Thisresultedin2×50ns=100nsforthememorycycletime(1/20MHz=50ns).SeeExample6-1.
Example6-1
Calculatethememorycycletimeofa50-MHzbussystemwith
(a)0WS,
(b)1WS,and
(c)2WS.
Assumethatthebuscycletimeforoff-chipmemoryaccessis2clocks.
Solution:
1/50MHz=20nsisthebusclockperiod.Sincethebuscycletimeofzerowaitstatesis2clocks,wehave:
Memorycycletimewith0WS2×20=40ns
Memorycycletimewith1WS40+20=60ns
Memorycycletimewith2WS40+2×20=80ns
It ispreferred that all bus activitiesbe completedwith0WS.However, if the readandwriteoperationscannotbecompletedwith0WS,werequestanextensionofthebuscycletime.ThisextensionisintheformofanintegernumberofWS.Thatis,wecanhave1,2,3,andsoonWS,butnot1.25WS.
WhentheCPU’sspeedwasunder100MHz,thebusspeedwascomparabletotheCPU speed. In the 1990s theCPU speed exploded to 1GHz (gigahertz)while the busspeedmaxedoutataround200MHz.ThegapbetweentheCPUspeedandthebusspeedisoneofthebiggestproblemsinthedesignofhigh-performancesystems. ToavoidtheuseoftoomanywaitstatesininterfacingmemorytoCPU,cachememoryandotherhigh-speedDRAMsareused.ThesearediscussedinChapters9and10.
Busbandwidth
The rate of data transfer is generally called bus bandwidth. In other words, busbandwidth is a measure of how fast buses transfer information between the CPU andmemoryorperipherals.Thewiderthedatabus, thehigherthebusbandwidth.However,theadvantageofthewiderexternaldatabuscomesatthecostofincreasingthediesizeforsystem on-chip (SOC) or the printed circuit board size for off-chipmemory. Now youmightaskwhyweshouldcarehowfastbusestransferinformationbetweentheCPUandoutside, as long as theCPU isworking as fast as it can. The problem is that theCPUcannotprocess information that it doesnothave. Inotherwords, the speedof theCPUmustbematchedwiththehigherbusbandwidth;otherwise,thereisnouseforafastCPU.ThisislikedrivingaPorscheorFerrari infirstgear; it isaterribleunderusageofCPUpower.Bus bandwidth ismeasured inMB (megabytes) per second and is calculated asfollows:
busbandwidth=(1/buscycletime)×buswidthinbytes
In the above formula, bus cycle time can be for bothmemory and I/O since theARMusesthememorymappedI/O.Example6-2clarifiestheconceptofbusbandwidth.As can be seen from Example 6-2, there are twoways to increase the bus bandwidth:Eitheruseawiderdatabusorshortenthebuscycletime(ordoboth).Thatisexactlywhatmanyprocessorshavedone.Again, itmustbenotedthatalthoughtheprocessor’sspeedcango to1GHzorhigher, thebusspeed foroff-chipmemory is limited toaround200MHz.Thereasonforthisisthatthesignalsbecometoonoisyforthecircuitboardiftheyareabove100MHz.
Example6-2
CalculatememorybusbandwidthforthefollowingCPUifthebusspeedis100MHz.
(a)ARMThumbwith0WSand1WS(16-bitdatabus)
(b)ARMwith0WSand1WS(32-bitdatabus)
Assumethatthebuscycletimeforoff-chipmemoryaccessis2clocks.
Solution:
Thememorycycletimeforbothis2clocks,withzerowaitstates.Withthe100MHzbusspeedwehaveabusclockof1/100MHz=10ns.
(a)Busbandwidth=(1/(2×10ns))×2bytes=100Mbytes/second(MB/s)
With1waitstate,thememorycyclebecomes3clockcycles
3×10=30nsandthememorybusbandwidthis=(1/30ns)×2bytes=66.6MB/s
(b)Busbandwidth=(1/(2×10ns))×4bytes=200MB/s
With1waitstate,thememorycyclebecomes3clockcycles
3×10=30nsandthememorybusbandwidthis=(1/30ns)×4bytes=126.6MB/s
Fromtheaboveitcanbeseenthatthetwofactorsinfluencingbusbandwidthare:
1.Theread/writecycletimeoftheCPU
2.Thewidthofthedatabus
Codememoryregion
The 4 GB of ARMmemory space is organized as 1G × 32 bits since the ARMinstructionsare32-bit.TheinternaldatabusoftheARMis32-bit,allowingthetransferofone instruction into theCPUeveryclockcycle.This isoneof thebenefitsof theRISCfixedinstructionsize.Thefetchingofaninstructionineveryclockcyclecanworkonlyifthecodeiswordaligned,meaningeachinstructionisplacedatanaddresslocationendingwith0,4,8,orC.Example6-3showstheplacementofcodeinARMmemory.Noticethatthecodeaddressesgoupby4sincetheARMinstructionsarefixedat4byteseach.Whilecompilersensurethatcodesarewordaligned,itisjoboftheprogrammertomakesurethedatainSRAMiswordalignedtoo.Wewillexaminethisimportanttopicsoon.
Example6-3
CompileanddebugthefollowingcodeinKeilandseetheplacementofinstructionsinmemorylocations.
AREAARMex,CODE,READONLY
ENTRY
MOVR2,#0x00;R2=0x00
MOVR3,#0x35;R3=0x35
ADDR4,R3,R2
END;Markendoffile
Solution
Asyoucanseeinthefigure,thefirstMOVinstructionstartsfromlocation0x00000000,thesecondMOVinstructionstartsfromlocation0x00000004andtheADDinstructionstartsfromlocation0x00000008.
Thefollowingimagedisplaysthefirstlocationsofmemory.ThecodeofthefirstMOVinstructionislocatedinthefirstword(fourbytes)ofmemorywhichiswordaligned.Thesameruleappliesfortheotherinstructions.NotethatthecodeofMOVR2,0isE3A02000but0020A0E3isstoredinthememory.Wewilldiscussthereasoninthischapterwhenwefocusontheconceptofbigendianandlittleendian.
SRAMmemoryregion
AsectionofthememoryspaceisusedbySRAM.TheSRAMcanbeon-chiporoff-chip (external).The sameway that everyARMchip has someon-chipFlashROM forcode,aportionofthememoryregionisusedbytheon-chipSRAM.Thison-chipSRAMisusedbytheCPUforscratchpadtostoreparameters.ItisalsousedbytheCPUforthepurposeofthestack.WeexaminethestackusagebytheARMinthenextsection.Inusing
theSRAMmemory for storingparameters,wemustbecarefulwhen loadingor storingdata in the SRAM lest we use unaligned data access. Next, we discuss this importantissue.
DatamisalignmentinSRAM
ThecaseofmisaligneddatahasamajoreffectontheARMbusperformance.Ifthedataisaligned,foreverymemoryreadcycle, theARMbringsin4bytesofinformation(data or code) using theD31–D0 data bus. Such data alignment is referred to aswordalignment. Tomake dataword aligned, the least significant digits of the hex addressesmustbe0,4,8,orC(inhex).
Whilethecompilersmakesurethatprogramcodes(instructions)arealwaysaligned(Example 6-3), it is the placement of data in SRAM by the programmer that can benonaligned and therefore subject to memory access penalty. In other words, the singlecycleaccessofmemoryisalsousedbyARMtobringintoregisters4bytesofdataeveryclockcycleassumingthatthedataisaligned.Tomakesurethatdataarealsoalignedweusethealigndirective.TheuseofaligndirectiveforRAMdatamakessurethateachwordislocatedatanaddresslocationendingwithaddressof0,4,8,orC.Ifourdataiswordsize(usingDCDUdirective)thentheuseofaligndirectiveatthestartofthedatasectionguarantiesallthedataplacementswillbewordaligned.WhenawordsizedataisdefinedusingtheDCDdirective,theassembleralignsittobewordaligned.
Accessingnon-aligneddata
As we have stated many times before, ARM defines 32-bit data as a word. Theaddressofawordcanstartatanyaddresslocation.Forexample,intheinstruction“LDRR1,[R0]”ifR0=0x20000004,theaddressofthewordbeingfetchedintoR1startsatanalignedaddress.Inthecaseof“LDRR1,[R0]”ifR0=0x20000001theaddressstartsatanon-aligned address. In systems with a 32-bit data bus, accessing a word from a non-alignedaddressedlocationcanbeslower.Thisissueisimportantandappliestoall32-bitprocessors.
In the 8-bit system, accessing a word (4 bytes) is treated like accessing fourconsecutive bytes regardless of the address location. Since accessing a byte takes onememory cycle, accessing 4 bytes will take 4 memory cycles. In the 32-bit system,accessingawordwithanalignedaddresstakesonememorycycle.ThatisbecauseeachbyteiscarriedonitsowndatapathofD0–D7,D8–D15,D16–D23,andD24–D31inthesamememorycycle.However,accessingawordwithanon-alignedaddressrequirestwomemory cycles. For example, see how accessing theword in the instruction “LDRR1,[R0]” works as shown in Figure 6-5. As a case of aligned data, assume that R0 =0x80000000. In this instruction, 4 bytes contents of memory locations 0x80000000through 0x80000003 are being fetched in one cycle. In only one cycle, theARMCPUaccesseslocations0x80000000through0x80000003andputsitinR1.
Figure6-5:MemoryAccessforAlignedandNon-alignedData
Now assuming that R0 = 0x80000001 in this instruction, 8 bytes contents ofmemorylocations0x80000000through0x80000007arebeingfetchedintwoconsecutivecyclesbutonly4bytesofitareused.Inthefirstcycle,theARMCPUaccesseslocations0x80000000 through 0x80000003 and puts them in R1 only the desired three bytes oflocations0x800000001through0x80000003.Inthesecondcycle,thecontentsofmemorylocations 0x8000004 through 0x80000007 are accessed and only the desired byte of0x80000004isputintoR1.SeeExample6-4.
Example6-4
Showthedatatransferofthefollowingcasesandindicatethenumberofmemorycycletimesittakesfordatatransfer.AssumethatR2=0x4598F31E.
LDRR1,=0x40000000;R1=0x40000000
LDRR2,=0x4598F31E;R2=0x4598F31E
STRR2,[R1];StoreR2tolocation0x40000000
ADDR1,R1,#1;R1=R1+1=0x40000001
STRR2,[R1];StoreR2tolocation0x40000001
ADDR1,R1,#1;R1=R1+1=0x40000002
STRR2,[R1];StoreR2tolocation0x40000002
ADDR1,R1,#1;R1=R1+1=0x40000003
STRR2,[R1];StoreR2tolocation0x40000003
Solution:
ForthefirstSTRR2,[R1]instruction,theentire32bitsofR2isstoredintolocationswith
addresses of 0x40000000, 0x40000001, 0x40000002, and 0x40000003, The 4-bytecontent of register R2 is stored into memory locations with starting address of0x40000000 via the 32-bit data bus of D31–D0. This address is word aligned sinceaddress of the least significant digit is 0. Therefore, it takes only onememory cycle totransferthe32-bitdata.
ForthesecondSTRR2,[R1]instruction,inthefirstmemorycycle,thelower24bitsofR2is stored into locations 0x40000001, 0x40000002, and 0x40000003. In the secondmemorycycle,theupper8bitsofR2isstoredintothe0x40000004location.
ForthethirdSTRR2,[R1]instruction,inthefirstmemorycycle,thelower16bitsofR2isstoredintolocations0x40000002and0x40000003.Inthesecondmemorycycle,theupper16bitsofR2isstoredintolocations0x40000004and0x40000005.
ForthefourthSTRR2,[R1]instruction,inthefirstmemorycycle,thelower8bitsoftoR2isstoredintolocations0x40000003.Inthesecondmemorycycle,theupper24bitsofR2isstoredintothelocations0x40000004,0x40000005,and0x40000006.
Thelessontobe learnedfromthis is to trynot toputanywordsonanon-alignedaddress location ina32-bit system. Indeed this is so important thatdirectiveALIGN isspecificallydesignedforthispurpose.Next,wediscusstheissueofaligneddata.
UsingLDRinstructionwithDCDandALIGNdirectives
TheDCDandDCDUdirectivesareusedfor32-bit(word)data.TheDCDdirective
ensures32-bitdatatypesarealigned,incontrasttoDCDUwhichdoesnot.DCDisusedasfollows:
VALUE1DCD0x99775533
This ensures that VALUE1, a word-sized operand, is located in a word alignedaddress location. Therefore, an instruction accessing it will take only a singlememorycycle.SinceperformanceoftheCPUdependsonhowfastitcanfetchthedatawemustensure thatanymemoryaccess reading32-bitdata isdone inasingleclockcycle.Thismeanswemustmakesureall32-bitdataarewordaligned.ThisissoimportantthatARMhas an interrupt (exception) dedicated tomisaligneddata,meaning any time it accessesmisaligned data, it lets us know that there is a problem. The one-time use of ALIGNdirectiveatthebeginningofdataareausingDCDUmakesthedataalignedforthatgroupofdata.
UsingLDRHwithDCWandALIGNdirectives
Theproblemofmisaligneddataisalsoanissuewhenthedatasizeisinhalf-words(16-bit).InmanycasesusingDCWU,wemustusetheALIGNdirectivemultipletimesinthedataareaofagivenprogramtoensuretheyarealigned.ThisisincontrasttotheDCWdirectivewhichensuresdatatypetobehalf-wordaligned.Thisisespeciallythecasewhenwe use the LDRH instruction. See Example 6-5. Aligned data is also an issue for theThumbversionoftheARM.
Example6-5
ShowthedatatransferofthefollowingLDRHinstructionsandindicatethenumberofmemorycycletimesittakesfordatatransfer.
LDRR1,=0x80000000;R1=0x80000000
LDRR3,=0xF31E4598;R3=0xF31E4598
LDRR4,=0x1A2B3D4F;R4=0x1A2B3D4F
STRR3,[R1]
;(STRR3,[R1])storesR3tolocation0x80000000
STRR4,[R1,#4]
;(STRR4,[R1+4])storesR4tolocation0x80000004
LDRHR2,[R1]
;loadstwobytesfromlocation0x80000000toR2
LDRHR2,[R1,#1]
;loadstwobytesfromlocation0x80000001toR2
LDRHR2,[R1,#2]
;loadstwobytesfromlocation0x80000002toR2
LDRHR2,[R1,#3]
;loadstwobytesfromlocation0x80000003toR2
Solution:
IntheLDRHR2,[R1]instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000000and0x80000001areusedtogetthe16bitstoR2.Thisaddressishalfwordalignedsincetheleastsignificantdigitis0.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00004598
FortheLDRHR2,[R1,#1],instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000001and0x80000002areusedtogetthe16bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00001E45.
FortheLDRHR2,[R1,#2],instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000002and0x80000003areusedtogetthe16bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x0000F31E.
FortheLDRHR2,[R1,#3]instruction,inthefirstmemorycycle,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000003isusedtogetthelower8bitstoR2.Inthesecondmemorycycle,theaddresslocations0x80000004,0x80000005,0x80000006,and0x80000007areaccessedwhereonlythe0x80000004locationisusedtogettheupper8bitstoR2.Now,R2=0x00004FF3.
UsingLDRBwithDCBandALIGNdirectives
Theproblemofmisaligneddatadoesnotexistwhenthedatasizeisbytes.Incasessuch as using the string of ASCII characters with theDCB directive, accessing a bytetakes the same amount of time (one memory cycle) as an aligned word (4 bytes),regardlessoftheaddresslocationofthedata.TheonlyproblemwiththeLDRBisitbringsintotheCPUonlyasinglebyteofdataineachmemorycycleinsteadof4bytesifLDRis.SeeExample6-6.
Example6-6
ShowthedatatransferofthefollowingLDRBinstructionsandindicatethenumberofmemorycycletimesittakesfordatatransfer.
LDRR1,=0x80000000;R1=0x80000000
LDRR3,=0xF31E4598;R3=0xF31E4598
LDRR4,=0x1A2B3D4F;R4=0x1A2B3D4F
STRR3,[R1]
;StoreR3tolocation0x80000000
STRR4,[R1,#4]
;(STRR4,[R1+4])StoreR4tolocation0x80000004
LDRBR2,[R1]
;loadonebytefromlocation0x80000000toR2
LDRBR2,[R1,#1]
;(LDRBR2,[R1+1])loadonebytefromlocation0x80000001
LDRBR2,[R1,#2]
;(LDRBR2,[R1+2])loadonebytefromlocation0x80000002
LDRBR2,[R1,#3]
;(LDRBR2,[R1+3])loadonebytefromlocation0x80000003
Solution:
IntheLDRBR2,[R1]instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000000isusedtogetthe8bitsto R2. Therefore, it takes only one memory cycle to transfer the data. Now,R2=0x00000098.
In the LDRB R2,[R1,#1] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000001isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00000045.
In the LDRB R2,[R1,#2] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000002isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x0000001E.
In the LDRB R2,[R1,#3] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000003isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x000000F3.
Peripheralregion
Table 6-1 showed a section of memory is set aside for peripherals. The type ofperipherals and memory address locations used is unique to a vendor. The ARMmanufacturersprovidethedetailsofmemorymapfortheperipherals.Ifyouexaminethemanufacturerdatasheetyouwillseethattheyusealignedaddressestomakesurethereisnoclockpenaltyforaccessingthem.
LittleEndianvs.BigEndianwar
In storing data, the ARM follows the little endian convention. The little endianplaces the least significant byte (little end of the data) in the low address and the bigendian is the opposite. The origin of the terms big endian and little endian is from aGulliver’sTravelsstoryabouthowaneggshouldbeopened:fromthebigendorthelittleend.ARMsupportsbothlittleandbigendian.InARMlittleendianisthedefault.SomeARMchipmanufacturersprovideanoptionforchangingittobigendian.SeeExample6-7tounderstandlittleendianandbigendiandatastorage.
Example6-7
Showhowdataisplacedafterexecutionofthefollowingcodeusing
a)littleendianand
b)bigendian.
LDRR2,=0x7698E39F;R2=0x7698E39F
LDRR1,=0x80000000
STRR2,[R1]
Solution:
a)Forlittleendianwehave:
Location80000000=(9F)
Location80000001=(E3)
Location80000002=(98)
Location80000003=(76)
b)Forbigendianwehave:
Location80000000=(76)
Location80000001=(98)
Location80000002=(E3)
Location80000003=(9F)
InExample6-7,noticehowtheleastsignificantbyte(thelittleendofthedata)0x9Fgoestothelowaddress0x80000000,andthemostsignificantbyteofthedata0x76goesto thehighaddress0x80000003.Thismeans that the little endof thedatagoes in first,hencethenamelittleendian.In theARMwithbigendianoptionenabled,data isstoredtheoppositeway:Thebigend(mostsignificantbyte)goesintothelowaddressfirst,andforthisreasonitiscalledbigendian.ManyofrecentRISCprocessorsallowselectionofmode,bigendianorlittleendian.
HarvardArchitectureandARM
In recent yearsmanyARMmanufacturers are using the Harvard architecture forARMCPUs.OldARM architectures up toARM7 useVonNeumann architecture. TheHarvardarchitecturefeedstheCPUwithbothcodeanddataatthesametimeviatwosetsofbuses,oneforcodeandonefordata.ThisincreasestheprocessingpoweroftheCPU
sinceitcanbringinmoreinformation.
Figure6-6:VonNeumannvs.HarvardArchitecture
ReviewQuestions
1.InARM,alltheinstructionsare___bytes?
2.Whomakessurethatinstructionarealignedonwordboundary?
3.InARM,the_____endianisthedefault.
4.A66 MHzsystemhasamemorycycletimeof_____nsifitisusedwithazerowaitstate.
5.Tointerfacea100 MHzprocessortoa50 nsaccesstimeROM,howmanywaitstatesareneeded?
6.Trueorfalse.ARMusesbigendianformatwhenispoweredup.
Section6.2:StackandStackUsageinARMThestack isa sectionofRAMusedby theCPU tostore information temporarily.
ThisinformationcouldbedataoranaddressorCPUregisterswhencallingasubroutine.Stack is alsowidely usedwhen executing an interrupt service routine (ISR). TheCPUneedsthisstorageareabecausethereareonlyalimitednumberofregisters.
Howstacksareaccessed
IfthestackisasectionofRAM,theremustbearegisterinsidetheCPUtopointtoit.IntheARMCPUtheregisterusedtoaccessthestackisR13.
ThestoringofCPUinformationsuchastheregistersonthestackiscalledaPUSH,and loading thecontentsof thestackback intoaCPUregister iscalledaPOP. Inotherwords,aregisterispushedontothestacktosaveitandpoppedoffthestacktoretrieveit.Thefollowingdescribeseachprocess.
Pushingontothestack
Thestackpointer(SP)pointstothetopofthestack(TOS).IntheARMregisterR13isdesignatedasstackpointer.Aswepush(store)dataontothestack,thedataaresavedinSRAM (wheretheSPpointsto)andSPmustbedecremented(orincremented)topointtothe next location. In the ARM we have a choice of either incrementing the SP ordecrementing it.Notice that this is different frommany othermicroprocessors, notablyx86processors, inwhich theSPisdecrementedautomaticallywhendata ispushedontothe stack.Again itmust be emphasized thatwhile in traditionalCPUs such as x86, thestackpointerisdecrementedautomaticallybytheCPUitself,intheARMCPUwemustactuallycodetheinstructionforstackpointerdecrementation(orincrementation)forthatmatter.Therefore,topusharegisterontostackweusetheSTRandSUBinstructionsasshowninthefollowingcode:
STRRr,[R13];Rrcanbeanyregisters(R0-R12)
SUBR13,R13,#4;decrementstackpointer
Forexample,tostorethevalueofR1wecanwritethefollowinginstructions:
STRR1,[R13];storeR1ontothestack,
SUBR13,R13,#4;anddecrementSP
Poppingfromthestack
Popping(loading)thecontentsofthestackbackintoagivenregisteristheoppositeprocessofpushing.When thePOP isexecuted, theSP is incremented (ordecremented)and the top locationof thestack iscopied (loaded)back to the register.Thatmeans thestackisLIFO(Last-In-First-Out)memory.
ToretrievedatafromstackwecanusetheLDRinstruction.
ADDR13,R13,#4;incrementstackpointer
LDRRr,[R13];Rrcananyoftheregisters(R0-R13)
Forexample,thefollowinginstructionspopfromthetopofstackandcopytoR1:
ADDR13,R13,#4;incrementSP
LDRR1,[R13];load(POP)thetopofstacktoR1
InitializingthestackpointerinARM
When theARMispoweredup, theR13(SP) registercontainsvalue0.Therefore,wemustinitializetheSPatthebeginningoftheprogramsothatitpointstosomewhereinthe internal SRAM. In ARM, we can make the stack to grow from a higher memorylocationtoalowermemorylocation.Inthiscasewhenwepush(store)ontothestacktheSPisdecremented.Wecanalsomakethestacktogrowfromalowermemorylocationtoa higher memory location, therefore when we push (store) onto the stack, the SP isincremented. It is common to initialize the SP to the uppermostRAMmemory region,whichmeansaswepushdataontothestack,thestackpointermustbedecremented.
Different ARMs have different amounts of RAM. In some ARM assemblerStack_Toprepresentstheaddressoftopofthestack.So,ifwewanttoinitializetheSP,wecansimplyloadStack_Topinto theSP.Notice thatSP(R13) isa32-bit register.So,wecanuseany32-bitaddressofSRAMasastacksection.
Example 6-8showshowtoinitializetheSPandusethestoreandloadinstructionsforPUSHandPOPoperations.
Example6-8
ThefollowingARMprogramplacessomedataintoregistersandcallsasubroutinethatusesthesameregisters.Itshowshowtousethestack.Examinethestack,stackpointer,andtheregistersusedaftertheexecutionofeachinstruction.
;initializetheSPtopoint
;tothelastlocationofRAM(Stack_Top)
;AssumeStack_Top=0xFF8
LDRR13,=Stack_Top;loadSP
LDRR0,=0x125;R0=0x125
LDRR1,=0x144;R1=0x144
MOVR2,#0x56;R2=0x56
BLMY_SUB;callasubroutine
ADDR3,R0,R1
;R3=R0+R1=0x125+0x144=0x269
ADDR3,R3,R2
;R3=R3+R2=0x269+0x56=0x2BF
HEREBHERE;stayhere
;–––––––––
MY_SUB
;––––––-
;saveR0,R1,andR2onstack
;beforetheyareusedbyaloop
STRR0,[R13];saveR0onstack
SUBR13,R13,#4
;R13=R13-4,todecrementthestackpointer
STRR1,[R13];saveR1onstack
SUBR13,R13,#4
;R13=R13-4,todecrementthestackpointer
STRR2,[R13];saveR2onstack
SUBR13,R13,#4
;R13=R13-4,todecrementthestackpointer
;––—R0,R1,andR2arechanged
MOVR0,#0;R0=0
MOVR1,#0;R1=0
MOVR2,#0;R2=0
;––––––––––––––
;restoretheoriginalregisterscontentsfromstack
ADDR13,R13,#4
;R13=R13+4toincrementthestackpointer
LDRR2,[R13];restoreR2fromstack
ADDR13,R13,#4
;R13=R13+4toincrementthestackpointer
LDRR1,[R13];restoreR1fromstack
ADDR13,R13,#4
;R13=R13+4toincrementthestackpointer
LDRR0,[R13];restoreR0fromstack
BXLR;returntocaller
Solution:
Aftertheexecutionof
Contentsofsometheregisters
(inHex) Stack
R0 R1 R2 SP(R13)
LDRR13,=0xFF8 0 0 0 FF8
LDRR0,=0x125
LDRR1,=0x144
LDRR2,=0x56
125 144 56 FF8
STRR0,[R13]
SUBR13,R13,#4125 144 56 FF4
STRR1,[R13]
SUBR13,R13,#4125 144 56 FF0
STRR2,[R13]
SUBR13,R13,#4125 144 56 FEC
MOVR0,#0
MOVR1,#0
MOVR2,#0
0 0 0 FEC
ADDR13,R13,#4
LDRR2,[R13]0 0 56 FF0
ADDR13,R13,#4
LDRR1,[R13]0 144 56 FF4
ADDR13,R13,#4
LDRR0,[R13]125 144 56 FF8
ThestacklimitandnestedcallsinARM
Asmentionedearlier,wecandefinethestackanywhereintheread/writememory.So,intheARMthestackcanbeasbigasitsRAM.InARM,thestackisusedforcallsandinterrupts.Wemustrememberthatuponcallingasubroutinefromthemainprogramusing theBL instruction,R14, the linker register,keeps trackofwhere theCPUshouldreturnaftercompletingthesubroutine.Now,ifwehaveanothercallinsidethesubroutineusingtheBLinstruction,thenitisourjobtostoretheoriginalR14onthestack.Failuretodothatwillendupcrashingtheprogram.Forthisreason,wemustbeverycarefulwhenmanipulatingthestackcontents.
UsingLDMandSTMinstructionsforthestack
AnotherwaytopushregistercontentsontothestackistouseSTM(storemultiple)andLDM(loadmultiple)instructions.Inmanyinterruptandmultitaskingapplicationsweneedtosavethecontentsofmultipleregistersonthestackandrestorethemback.Pushingtheregistersontothestackoneregisteratatimeistimeconsuming.ForthisreasonARMhas theSTMandLDMinstructions.TheSTMandLDMallowtostore(push)and load(pop)multiple registerswith a single instruction.Next,weexplain each instruction andhowitisused.
STM
BelowshowsthesyntaxfortheSTM.Noticewecanspecifythenumberofregistersto be stored and destination RAM address (pointer) to which the data is pushed onto.Examinethefollowinginstructions:
STMR11,{R0-R3}
;StoreR0throughR3ontomemorypointedtobyR11
STMR8,{R0-R7}
;StoreR0throughR7ontomemorypointedtobyR8
STMR7,{R0,R3,R5}
;StoreR0,R3,R5ontomemorypointedtobyR7
STMR11,{R0-R10}
;StoreR0throughR10ontomemorypointedtobyR11
LDM
BelowshowsthesyntaxfortheLDM.Noticewecanspecifythenumberofregisters
tobeloadedanddestinationRAMaddress(pointer)fromwhichthedataispoppedfrom.Examinethefollowinginstructions:
LDMR11,{R0-R3}
;LoadR0throughR3frommemorypointedtobyR11
LDMR8,{R0-R7}
;LoadR0throughR7frommemorypointedtobyR8
LDMR7,{R0,R3,R5}
;LoadR0,R3,R5frommemorypointedtobyR7
LDMR11,{R0-R10}
;LoadR0throughR10frommemorypointedtobyR11
Example6-9showshowwecanuseSTMandLDMtosimplifyacodeandpreventunwantederrors.
Example6-9
ModifytheExample6-8usingtheLDMandSTMinstructions.
Solution:
;initializetheSPtopointto
;thelastlocationofRAM(Stack_Top)
;AssumeStack_Top=0xFF8
LDRR13,=Stack_Top;loadSP
LDRR0,=0x125;R0=0x125
LDRR1,=0x144;R1=0x144
MOVR2,#0x56;R2=0x56
BLMY_SUB;callasubroutine
ADDR3,R0,R1;R3=R0+R1
;=0x125+0x144=0x269
ADDR3,R3,R2;R3=R3+R2
;=0x269+0x56=0x2BF
HEREBHERE;stayhere
;–––––––––
MY_SUB
;–––––––––
;saveR0,R1,andR2onstack
;beforetheyareusedbyaloop
STMR13,{R0-R2};saveR0,R1,R2onstack
SUBR13,R13,#12
;––—R0,R1,andR2arechanged
MOVR0,#0;R0=0
MOVR1,#0;R1=0
MOVR2,#0;R2=0
;–––
;restoretheoriginalregisterscontentsfromstack
ADDR13,R13,#12
LDMR13,{R0-R2};restoreR0,R1,andR2fromstack
BXLR;returntocaller
;–––-
WecanalsousethePUSHandPOPpseudo-instructionstodothesamething.
CopyingablockofdatawithLDMandSTM
To copy a block of data, we bring into the CPU’s register a word of data frommemoryandthencopyitfromtheregistertoalocationinRAM.Inthatcasewecopyoneword(4bytes)atatime.Sotocopy16wordswehavetosetacounterto16foraloopiteration.We can use theLDMandSTM to do the same thingwithmuch less coding.Using LDM and STM instructions to copy a block of data, we need a register for thesourceaddressandanotheroneforthedestinationaddress.Program6-1usesR11andR12for source and destination addresses, respectively. The registers R0–R9 are used astemporaryplacefordatabeforetheyarecopiedtothedestination.Thatgivesus10words(40bytes)transferatatime.NoticethatintheSTMandLDMinstructions,registersaretransferredtoorfrommemoryintheorderofthelowesttohighest,soR0willalwaysbetransferredtoorfromalocationlowerthanthelocationofR1inthememoryandsoon.ThatiswhyweshouldnotbeworryingabouttheorderofregistersinthestackwhenweuseSTMandLDManditisnotneededtoPUSHandPOPregistersinoppositeorderasiscommoninnormalPOPandPUSHinstructions.
Program6-1:CopyingaBlockofMemory
Thisprogramcopiesablockof10words(40bytes)memoryfromsourcetodestination.TheregistersR11andR12areusedforsourceanddestinationaddresses.
AREAPROG6_1,CODE,READONLY
ENTRY
LDMR11,{R0-R9};LoadR0thruR9frommemorypointedtobyR11
STMR12,{R0-R9};StoreR0thruR9tomemorypointedtobyR12
END
ReviewQuestions
1.The______registeristhedefaultstackpointer.
2.HowdeepisthesizeofthestackintheARM?
3.WriteaprogramthatpushesR5,R6,R7,andR8intothestack.
4.WriteaprogramthatpopsR5,R6,R7,andR8fromthestack.
5.Whatdoesthefollowingprogramdo?
LDRR5,=0x40000000
LDMR5,{R1,R4}
LDRR5,=0x50000000
STMR5,{R1,R4}
Section6.3:ARMBit-AddressableMemoryRegionManymicroprocessorsallowprogramstoaccessmemoryandI/Oportsinbytesize
incrementsonly.Inotherwords,ifyouneedtocheckasinglebitofanI/Oport,youmustreadtheentirebytefirstandthenmanipulatethewholebytewithsomelogicinstructionstogetholdofthedesiredbit.ThisisnotthecasewithARMmicroprocessors.Indeed,oneofthemostimportantfeaturesoftheCPUistheabilitytoaccesstheRAMandI/Oportsinbits insteadofbytes.This isavery importantandpowerful featureof theARMusedwidelyintheembeddedsystemdesignandapplications.InthissectionweshowaddressassignmentofbitsofI/OandRAM,inadditiontowaysofprogrammingthem.
Bit-addressable(bit-band)SRAM
Ofthe4GBmemoryspaceoftheARM,anumberofbytesarebit-addressable.TheARMliteraturereferstothisregionasthebit-bandregion.SincetheARMinstructionis32-bitandtheCPUexecutescodeoneinstructionatatime,thecode(FlashROM)spaceisalwayswordsizeandwordaligned,aswehaveseenthroughoutthechapters.Thatmeansthebit-addressableregionsmustbelocatedinSRAMandI/Operipheralsregions.Thebit-addressableRAMand I/O locations vary among the familymembers and vendors.TheARM generic manual defines the location addresses of bit-band as 0x20000000 to0x200FFFFFforSRAMand0x40000000to0x400FFFFFforperipherals.Noticetheyarelocatedatthelowest1MBaddressspaceofSRAMandperipherals.SeeTable6-1.Itmustbe alsonoted that thebit-band (bit-addressable) regions are theonly region that canbeaccessedinbothbitandbyte/halfword/wordformatswhiletheotherareaofmemorymustbeaccessedinbyte/halfword/wordsize.Inotherwords,thememoryspaceregionoutsideof the bit-band must always be accessed in byte/halfword/word formats only. For theARMCortex-M,thebit-bandSRAMhasaddressesof0x20000000to0x200FFFFF.This1Mbytesbit-addressableregionisgivenaliasaddressesof0x22000000to0x23FFFFFF.Therefore, thebitaddresses0x22000000 to0x2200001Fare for the firstbyteofSRAMlocation0x20000000,and0x22000020to0x2200003FarethebitaddressesofthesecondbyteofSRAMlocation0x20000001,andsoon.SeeFigure6-7.
Figure6-7:SRAMbit-addressableregionandtheiraliasaddresses
SinceeachbyteofSRAMhas8bitsweneedanaddressforeachbit.Thismeansweneedatleast8Maddresslocationstoaccess8Mbits,oneaddressforeachbit.However,tomaketheaddressesword-alignedtheARMprovides4-bytealiasaddressforeachbit.Forexample,0x22000000to0x2200001Fisassignedtoasinglebytelocationof0x20000000.That means we have 0x22000000 to 0x23FFFFFF (total of 32M locations, as aliasaddresses)for1Mbytesofaddress.
BitmapforSRAM
FromFigure6-7onceagainnoticethefollowingfacts:
1.Thebitaddress0x22000000isassignedtoD0ofSRAMlocation0x20000000.
2.Thebitaddress0x22000004isassignedtoD1ofSRAMlocation0x20000000.
3.Thebitaddress0x22000008isassignedtoD2ofSRAMlocationof0x20000000.
4.Thebitaddress0x2200000CisassignedtoD3ofSRAMlocationof0x20000000.
5.Thebitaddress0x22000010isassignedtoD4ofSRAMlocation0x20000000.
6.Thebitaddress0x22000014isassignedtoD5ofRAMlocation0x20000000.
7.Thebitaddress0x22000018isassignedtoD6ofSRAMlocation0x20000000.
8.Thebitaddress0x2200001CisassignedtoD7ofSRAMlocation0x20000000.
NoticethatSRAMlocations0x20000000–0x200FFFFFarebothbyte-addressableand bit-addressable. The only difference is when we access it in byte (or halfword orword)weuseaddresses0x20000000to0x200FFFFF,butwhentheyareaccessedinbit,theyareaccessedviatheiraliasaddressesof0x22000000to0x23FFFFFF.Thereasonit
is called aliases is because it is same physical location but accessed by two differentaddresses. It is like a same person but different names (aliases) See Examples 6-10through6-12.
Example6-10
ThegenericARMchiphasthefollowingaddressassignments.Calculatethespaceandtheamountofmemorygiventoeachregion.
(a)Addressrangeof0x20000000–200FFFFFforSRAMbit-addressableregion
(b)Addressrangeof0x22000000–23FFFFFFforaliasaddressesofbit-addressableSRAM
Solution:
(a)200FFFFF–20000000=FFFFFbytes.ConvertingFFFFFtodecimal,weget1,048,575+1=1,048,576,whichisequalto1Mbytes.
(b)23FFFFFF–22000000=1FFFFFFbytes.Converting1FFFFFFtodecimal,weget33,554,431+1=33,554,432,whichisequalto32Mbytes.
Example6-11
WriteaprogramtosetHIGHtheD6oftheSRAMlocation0x20000001usinga)byteaddressandb)thebitaliasaddress.
Solution:
a)
LDRR1,=0x20000001;loadtheaddressofthebyte
LDRBR2,[R1];getthebyte
ORRR2,R2,#2_01000000;makeD6bithigh
;(binaryrepresentationinKeilfor0b01000000)
STRBR2,[R1];writeitback
b)FromFigure6-7wehaveaddress0x22000038asthebitaddressofD6ofSRAM
location0x20000001.
LDRR1,=0x22000038;loadthealiasaddressofthebit
MOVR2,#1;R2=1
STRBR2,[R1];WriteonetoD6
Example6-12
WriteaprogramtosetLOWtheD0bitoftheSRAMlocation0x20000005usinga)byteaddressandb)thebitaliasaddress.
Solution:
a)
LDRR1,=0x20000005;loadtheaddressofbyte
LDRBR2,[R1];getthebyte
ANDR2,R2,#2_11111110;makeD0bitlow
STRBR2,[R1];writeitback
b)FromFigure6-7wehaveaddress0x220000A0asthebitaddressofD0ofSRAMlocation0x20000005.
LDRR2,=0x220000A0;loadthealiasaddressofthebit
MOVR0,0;R0=0
STRBR0,[R2];writezerotoD0
PeripheralI/Oportbit-addressableregion
The general purpose I/O (GPIO) and peripherals such as ADC, DAC, RTC, andserialCOMport arewidelyused in the embedded systemdesign. InmanyARM-basedtrainerboardsweseetheconnectionofLEDs,switches,andLCDtotheGPIOpinsoftheARM chip. In such trainers the vendor provides the details of I/O port and peripheralconnectionstotheARMchipinadditiontotheiraddressmap.Aswediscussedearlier,theARMhas set aside1Mbytesofaddress space tobeusedbit-band (bit-addressable) I/Oand peripherals. The address space assigned to bit-band peripherals and GPIO is0x40000000 to 0x400FFFFF with address aliases of 0x42000000 to 0x43FFFFFF.Examine your trainer board data sheet for the bit-band addresses implemented on theARMchip.
Figure6-8:Peripheralsbit-addressableregionandtheiraliasaddresses.
BitmapforI/Operipherals
FromFigure6-8onceagainnoticethefollowingfacts.
1.Thebitaddress0x42000000isassignedtoD0ofperipheralslocation0x40000000.
2.Thebitaddress0x42000004isassignedtoD1ofperipheralslocation0x40000000.
3.Thebitaddress0x42000008isassignedtoD2ofperipheralslocationof0x40000000.
4.Thebitaddress0x4200000CisassignedtoD3ofperipheralslocationof0x40000000.
5.Thebitaddress0x42000010isassignedtoD4ofperipheralslocation0x40000000.
6.Thebitaddress0x42000014isassignedtoD5ofperipheralslocation0x40000000.
7.Thebitaddress0x42000018isassignedtoD6ofperipheralslocation0x40000000.
8.Thebitaddress0x4200001CisassignedtoD7ofperipheralslocation0x40000000.
Whenaccessingaperipheralport in a single-bitmanner,wemustuse theaddressaliasesof0x42000000–0x43FFFFFF.Indeedthebit-addressableperipheralsarewidelyusedinembeddedsystemdesign.
ReviewQuestions
1.Trueorfalse.AllbytesofSRAMinARMarebit-addressable.
2.Trueorfalse.AllbitsoftheI/OperipheralsinARMarebit-addressable.
3.Trueorfalse.AllROMlocationsoftheARMarebit-addressable.
4.Ofthe4GbytesofmemoryintheARM,howmanybytesarebit-addressable?Listthem.
5.HowwouldyouchecktoseewhetherbitD0oflocation0x20000002ishighorlow?
6.Findouttowhichbyteeachofthefollowingbitsbelongs.GivetheaddressoftheRAMbyteinhex.
(a)0x23000030 (b)0x23000040 (c)0x23000048
(d)0x4200003C (e)0x43FFFFFC
Section6.4:AdvancedIndexedAddressingModeInpreviouschaptersweshowedhow tousea simple indexedaddressingmode in
STR, LDR, LDM and STM. The advanced addressing modes bring very importantadvantagesintheARMandwillbediscussedinthissection.
Indexedaddressingmode
Intheindexedaddressingmode,aregisterisusedasapointertothedatalocation.TheARMprovidesthreeindexedaddressingmodes.Thesemodesare:preindex,preindexwithwriteback,andpostindex.Table6-2summarizesthesemodes.Eachoftheseindexedaddressingmodecanbeusedwithoffsetoffixedvalueoroffsetofashiftedregister.SeeTable6-3.Inthissectionwewilldiscusseachmodeindetail.
IndexedAddressingMode Syntax PointingLocationin
MemoryRmValueAfterExecution
Preindex LDRRd,[Rm,#k] Rm+#k Rm
PreindexwithWB* LDRRd,[Rm,#k]! Rm+#k Rm+#k
Postindex LDRRd,[Rm],#k Rm Rm+#k
*WBmeansWriteback
**RdandRmareanyofregistersand#kisasigned12-bitimmediatevaluebetween-4095and+4095
Table6-2:IndexedAddressinginARM
Offset Syntax PointingLocation
Fixedvalue LDRRd,[Rm,#k] Rm+#k
Shiftedregister LDRRd,[Rm,Rn,<shift>] Rm+(Rnshifted<shift>)
*RnandRmareanyofregistersand#kisasigned12-bitimmediatevaluebetween-4095and+4095
**<shift>isanyofshiftsstudiedinChapter3likeLSL#2
Table6-3:OffsetofFixedValuevs.OffsetofShiftedRegister
Preindexedaddressingmodewithfixedoffset
In thisaddressingmode,a registerandapositiveornegative immediatevalueareused as a pointer to the data location. The value of register does not change afterinstructionisexecuted.ThisaddressingmodecanbeusedwithSTR,STRB,STRH,LDR,LDRB,andLDRH.SeeExample6-13.
Example6-13
WriteaprogramtostorecontentsofR5totheSRAMlocation0x10000000to0x10000000Fusingpreindexedaddressingmodewithfixedoffset.
Solution:
LDRR5,=0x55667788
LDRR1,=0x10000000
;loadtheaddressoffirstlocation
STRR5,[R1];storeR5tolocation0x10000000
STRR5,[R1,#4]
;storeR5tolocation0x10000000+4(0x10000004)
STRR5,[R1,#8]
;storeR5tolocation0x10000000+8(0x10000008)
STRR5,[R1,#0x0C]
;storeR5tolocation0x10000000+0x0C(0x1000000C)
NoticethatafterrunningthiscodethecontentofR1isstill0x10000000
Itisacommonpracticetousearegistertopointtothefirstlocationofthememoryspace and access the different locations using proper offsets. For example see thefollowingprogram:
ADRR0,OUR_DATA;pointtoOUR_DATA
LDRBR2,[R0,#1];loadR2withBETA
…
OUR_DATA
ALFADCB0x30
BETADCB0x21
Preindexedaddressingmodewithwrite-backandfixedoffset
Thisaddressingmode is likepreindexedaddressingmodewithfixedoffsetexceptthat the calculated pointer is written back to the pointing register. We put ! after theinstructiontotelltheassemblertoenablewritebackintheinstruction.SeeExample6-14.
Example6-14
RewriteExample6-13usingpreindexedaddressingmodewithwritebackandfixedoffset.
Solution:
LDRR1,=0x10000000;loadtheaddressoffirstlocation
STRR5,[R1];storeR5tolocation0x10000000
STRR5,[R1,#4]!
;storeR5tolocation0x10000000+4(0x10000004)
;writebackmakesR1=0x10000004
STRR5,[R1,#4]!
;storeR5tolocation0x10000004+4(0x10000008)
;writebackmakesR1=0x10000008
STRR5,[R1,#4]!
;storeR5tolocation0x10000008+4(0x1000000C)
;writebackmakesR1=0x1000000C
NoticethatafterrunningthiscodethecontentofR1is0x1000000C
Postindexedaddressingmodewithfixedoffset
Thisaddressingmodeislikepreindexedaddressingmodewithwritebackandfixedoffset except that the instruction is executed on the location that Rn is pointing toregardlessofoffset value.After running the instruction, thenewvalueof thepointer iscalculatedandwrittenbacktothepointingregister.Examinethefollowinginstructions:
STRR1,[R2],#4;storeR1ontomemorypointedtoby
;R2andthenwritebackR2+4toR2
LDRBR5,[R3],#1;loadabytefrommemorypointedto
;byR3andthenwritebackR3+1toR3
Noticethatwritebackisbydefaultenabledinpostindexedaddressingandthereisnoneed toput ! after instructionsbecausepostindexingwithoutwriteback isuselessas theindex is neither used in the instruction nor written back to the pointing register. SeeExample6-15.
Example6-15
RewriteExample6-13usingpostindexedaddressingmodewithfixedoffset.
Solution:
LDRR1,=0x10000000;loadtheaddressoffirstlocation
STRR5,[R1],#4;storeR5tolocation0x10000000andwriteback
;0x10000000+4(0x10000004)toR1
STRR5,[R1],#4;storeR5tolocation0x10000004andwriteback
;0x10000004+4(0x10000008)toR1
STRR5,[R1],#4;storeR5tolocation0x10000008andwriteback
;0x10000008+4(0x1000000C)toR1
STRR5,[R1],#4;storeR5tolocation0x1000000Candwriteback
;0x1000000C+4(0x10000010)toR1
NoticethatafterrunningthiscodethecontentofR1is0x10000010.
Preindexedaddressmodewithoffsetofashiftedregister
This advancedaddressingmode is avery important feature in theARM.We startdescribingthismodefromsimplecasewithnoshift(shift=0)andthenwewillfocusonmorecomplexformats.
Simpleformatofpreindexedaddressmodewithoffsetregister
ThefollowingisthesimplesyntaxforLDRandSTR.
LDRRd,[Rm,Rn];RdisloadedfromlocationRm+Rnofmemory
STRRs,[Rm,Rn];RsisstoredtolocationRm+Rnofmemory
Thisaddressingmodeiswidelyused in implementingofobjectorientedprogramswhenwewant tomakeadynamicarrayofvariables.Example6-16showshowweusethisaddressingmodeinaccessingdifferentlocationsofanarraydefinedinmemory.
Example6-16
ExaminethevalueofR5andR6aftertheexecutionofthefollowingprogram.
POINTERRNR2
ARRAY1RNR1
AREAEXAMPLE_6_16,CODE,READONLY
ENTRY
LDRARRAY1,=MYDATA
LDRBR4,[ARRAY1]
;loadPOINTERlocationofARRAY1toR4(R4= 0x45)
MOVPOINTER,#1;POINTER=1topointtolocation1ofarray
LDRBR5,[ARRAY1,POINTER]
;LoadPOINTERlocationofARRAY1toR5
;(R5 = 0x24)
MOVPOINTER,#2;POINTER=2topointtolocation2ofarray
LDRBR6,[ARRAY1,POINTER]
;loadPOINTERlocationofARRAY1toR6
;(R6 = 0x18)
HEREBHERE
MYDATADCB0x45,0x24,0x18,0x63
END
Solution:
AfterrunningtheLDRBR4,[ARRAY1]instruction,location0ofMYDATAisloadedtoR4.NowR4=0x45.
Next,afterrunningtheLDRBR5,[ARRAY1,POINTER]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x24.
Next,afterrunningtheLDRBR6,[ARRAY1,POINTER]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x18.
NoticethatpurposelyweusedDCBandLDRBinthisexamplesoeachlocationofMYDATAtakesonebyte.
InExample6-16,wepurposelyusedDCBandLDRBsoeachlocationofMYDATAtakesonebyte.IfwedefineanarrayusingDCD,thenwewillnotbeabletouseLDRBR5,[ARRAY1,POINTER] to load thePOINTERlocationofARRAY.SeeExample6-17forclarification.
Example6-17
InExample6-16,changeMYDATADCB0x45,0x24,0x18,0x63toMYDATADCD0x45,0x2489ACF5andexaminethevalueofR5andR6aftertheexecutionofthe
followingprogram.
POINTERRNR2
ARRAY1RNR1
AREAEXAMPLE_6_17,CODE,READONLY
ENTRY
LDRARRAY1,=MYDATA
LDRBR4,[ARRAY1];loadPOINTERlocationof
;ARRAY1toR4(R4= 0x45)
MOVPOINTER,#1;POINTER=1topointto
;location1ofarray
LDRBR5,[ARRAY1,POINTER]
;LoadPOINTERlocationofARRAY1toR5
MOVPOINTER,#2;POINTER=2topointto
;location2ofarray
LDRBR6,[ARRAY1,POINTER]
;loadPOINTERlocationofARRAY1toR6
HEREBHERE
MYDATADCD0x45,0x2489ACF5
END
Solution:
AfterrunningtheLDRBR4,[ARRAY1]instruction,location0ofMYDATAisloadedtoR4.NowR4=0x45.
Next,afterrunningtheLDRBR5,[ARRAY1,POINTER]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x00.
Next,afterrunningtheLDRBR6,[ARRAY1,POINTER]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x00.
Toaccess locations of aword size arraywehave tomultiply the pointer by four.Similarly,toaccesslocationsofahalf-wordsizearraywehavetomultiplythepointerbytwo.Forexample,wecancorrectprogramofExample6-17byreplacing
MOVPOINTER,#2;POINTER=2topointtolocation2ofarray
instructionwithfollowinginstructions:
MOVPOINTER,#2;POINTER=2
MOVPOINTER,POINTER,LSL#2;POINTERisshiftedlefttwobits(×4)
;topointtoword2ofthearray
Noticethatbyshiftingleftavaluetwobits,wemultiplyitbyfour.NextwewillseehowwecanuseindexedaddressingwithshiftedregisterstocombinemultiplicationwiththeLDRandSTRinstructions.
Generalformatofpreindexedaddressmodewithoffsetregister
ThegeneralformatofindexedaddressingwithshiftedregisterforLDRandSTRisasfollows:
LDRRd,[Rm,Rn,<shift>];(ShiftedRn)+Rmisusedasthepointer
STRRd,[Rm,Rn,<shift>];(ShiftedRn)+Rmisusedasthepointer
Intheaboveinstructions<shift>canbeanyofshiftinstructionsstudiedinChapter3suchasLSL,LSR,ASRandROR.Examinethefollowinginstructions:
LDRR1,[R2,R3,LSL#2];R2+(R3 × 4)isusedasthepointer
;contentoflocationR2+(R3×4)isloadedtoR1
STRR1,[R2,R3,LSL#1];R2+(R3×2)isusedasthepointer
;R1isstoredtolocationR2 + (R3×2)
STRBR1,[R2,R3,LSL#2];R2+(R3 × 4)isusedasthepointer
;firstbyteofR1isstoredtolocationR2+(R3×4)
LDRR1,[R2,R3,LSR#2];R2+(R3/4)isusedasthepointer
;contentoflocationR2+(R3/4)is
;loadedtoR1
From the above codeswe can see that indexed addressingwith shifted register ismostly used tomultiply thepointer by a powerof two and that iswhy it is also calledindexedaddressingwith scaled register.ExamineExample6-18 to seehowwecanusescaledregisterindexingtoaccessanarrayofwords.
Notice that scaled register indexing is not supported for halfword load and storeinstructions.
Example6-18
ExaminethevalueofR5andR6aftertheexecutionofthefollowingprogram.
POINTERRNR2
ARRAY1RNR1
AREAEXAMPLE_6_18,CODE,READONLY
ENTRY
LDRARRAY1,=MYDATA
MOVPOINTER,#0;POINTER=0
LDRR4,[ARRAY1,POINTER,LSL#2];
MOVPOINTER,#1;POINTER=1
LDRR5,[ARRAY1,POINTER,LSL#2];
MOVPOINTER,#2;POINTER=2
LDRR6,[ARRAY1,POINTER,LSL#2];
HEREBHERE
MYDATADCD0x45,0x2489ACf5,0x2489AC23
END
Solution:
AfterrunningtheLDRR4,[ARRAY1,POINTER,LSL#2]]instruction,location0of
MYDATAisloadedtoR4.NowR4=0x45.
Next,afterrunningtheLDRR5,[ARRAY1,POINTER,LSL#2]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x2489ACF5.
Next,afterrunningtheLDRR6,[ARRAY1,POINTER,LSL#2]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x2489AC23.
Writebacksign(!)inpreindexedloadandstorewithscaledregister
We can force all scaled register load and store instructions to writeback thecalculated pointer to the pointing register by putting ! after each load and storeinstructions.Examinethefollowinginstructions:
LDRR1,[R2,R3,LSL#2]!
;R2+(R3×4)isusedasthepointer
;contentoflocationR2+(R3×4)isloadedtoR1
;R2=R2+(R3×4)(R2isupdated.)
STRR1,[R2,R3,LSL#1]!
;R2+(R3×2)isusedasthepointer
;R1isstoredtolocationR2+(R3×2)
;R2=R2+(R3×2)(R2isupdated.)
Scaledregisterpostindex
The following instructionsare someexamplesof scaled registerpostindex in loadandstoreinstructions:
STRR1,[R2],R3,LSL#2
;storeR1atlocationR2ofmemoryandwriteback
;R2+(R3 × 4)toR2.
LDRR1,[R2],R3,LSL#2
;loadlocationR2ofmemorytoR1andwriteback
;R2+(R3 × 4)toR2.
Look-uptable
Oneuseofarrayisforimplementinglook-uptables.Thelook-uptableisawidelyusedconceptinmicrocontrollerprogramming.Itallowsaccesstoelementsofafrequentlyusedtablewithminimumoperations.Asanexample,assumethatforacertainapplicationwe need 4 + x2 values in the range of 0 to 9. To do so, the look-up table is stored inprogrammemoryspaceandaccessedasanarray.ThisisshowninExamples6-19through6-21.
Example6-19
WriteaprogramtogetthexvaluefromR9andsendx2+2x+3toR10.AssumeR9hasthexvalueof0–9.Usealook-uptableinsteadofamultiplyinstruction.
Solution:
AREALOOKUP_EXAMP6_19,READONLY,CODE
ENTRY
ADRR2,LOOKUP;pointtoLOOKUP
LDRBR10,[R2,R9];R10=locationR9oflookuparray
HEREBHERE;stayhereforever
LOOKUPDCB3,6,11,18,27,38,51,66,83,102
END
Example6-20
WriteaprogramtogetthexvaluefromR9andsendfactorialofxtoR10.AssumeR9hasthexvalueof0–10.Usealook-uptableinsteadofamultiplyinstruction.
Solution:
AREALOOKUP_EXAMP6_20,READONLY,CODE
ENTRY
MOVR9,#5
ADRR2,LOOKUP;pointtoLOOKUP
LDRR10,[R2,R9,LSL#2];R10=locationR9oflookuparray
HEREBHERE;stayhereforever
LOOKUPDCD1,1,2,6,24,120,720,5040,40320,362880,3628800
END
Example6-21
Writeaprogramthatcalculates10tothepowerofR2andstorestheresultinR3.AssumeR2hasthexvalueof0–6.Usealook-uptableinsteadofamultiplyinstruction.
Solution:
AREALOOKUP_EXAMP_6_21,READONLY,CODE
ENTRY
ADRR1,LOOKUP;pointtoLOOKUP
LDRR3,[R1,R2,LSL#2];R3=locationR2oflookuparray
HEREBHERE;stayhereforever
LOOKUPDCD1,10,100,1000,10000,100000,1000000
END
WritebackoptionsofSTMandLDM
TheSTMandLDMinstructionsallowyoutostoreandloadmultipleregisterswithasingleinstruction.Wecanalsospecifytheactiontobetakenforthepointer.Theactioncanbeincrementordecrementbeforeorafterthepopisdone.ThisisshowninTable6-4.
Option Description
IA IncrementAfter
IB IncrementBefore
DA DecrementAfter
DB DecrementBefore
Table6-4:OptionsforLDMandSTMinstructions
IA stands for IncrementAfter and adds four (the size of register in bytes) to thepointerafterloadorstoringeachregister.
IBstands for IncrementBeforeandadds four (the sizeof register inbytes) to thepointerbeforeloadorstoringeachregister.
DAstandsforDecrementAfterandsubtractsfour(thesizeofregisterinbytes)fromthepointerafterloadorstoringeachregister.
DB stands forDecrementBefore and subtracts four (the size of register in bytes)fromthepointerbeforeloadorstoringeachregister.
ForfurtherclarificationassumethatR1=0x100.Figure6-9showsthememoryafterrunningSTMR1!,{R2,R3}witheachofIA,IB,DAandDBoptions.
Figure6-9:FourOptionsofSTMandLDMinARM
Notice that generally we have four stack structure, either it is ascending ordescending.Thestackiscalledascendingwhenitisincrementedaftereachstore(PUSH)instruction and decremented after each load (POP) instruction. It is called descendingwhen it is decremented after each store (PUSH) instruction and incremented after eachload(POP)instruction.Thestackpointercanpointtothelastfilledlocation;inthiscasethestackiscalledFullStack.Thestackpointercanpointtothenextavailablelocation,aswell;whichiscalledanEmptyStack.SeeFigure6-10formoreclarification.
Figure6-10:FourGeneralStackStructure
To implement a Full Ascending stack we have to use STMIB because the stackpointershouldincrementonstoreinstructionanditshouldbeincrementedbeforestoringeach registerbecause it shouldpoint to full location.On theotherhandwehave touseLDMDA for pop instruction because the stack pointer should decrement on loadinstructionanditshouldbedecrementedbeforeloadingeachfulllocationbecauseitwaspointing to an empty locationbeforedecrementing.Table6-5 lists appropriate load andstore instruction for each stack structure. It is difficult and prone to error to rememberwhichloadandstoreshouldbeusedwitheachstackstructure.Tosolvethisproblem,eachofloadandstoreoptionshasalternatenamewhichiseasytorememberwhenitisusedforstackoperation.ThelasttwocolumnsofTable6-5listthealternatenames.
StackStructure Load StoreLoad
(alternateNames)
Store
(alternateNames)
FullAscending LDMDA STMIB LDMFA STMFA
FullDescending LDMIA STMDB LDMFD STMFD
EmptyAscending LDMDB STMIA LDMEA STMEA
EmptyDescending LDMIB STMDA LDMED STMED
Table6-5:OptionsforLDMandSTMinstructions
Program6-2usesSTMandLDMtosimplifyprogramofExample6-8.
Program6-2:UsingSTMFAandLDMFAforStack(RepeatofExample6-8)
;UsingFullAscendingLoadandStoreforstack
AREAPROG6_2,CODE,READONLY
ENTRY
LDRR13,=Stack_Top;loadSP
LDRR0,=0x125;R0=0x125
LDRR1,=0x144;R1=0x144
MOVR2,#0x56;R2=0x56
BLMY_SUB;callasubroutine
ADDR3,R0,R1;R3=R0+R1=0x125+0x144=0x269
ADDR3,R3,R2;R3=R3+R2=0x269+0x56=0x2BF
HEREBHERE;stayhere
;–––––––––
MY_SUB
;––—saveR0,R1,andR2onstackbeforetheyareusedbyaloop
STMFAR13,{R0-R2};saveR0,R1,R2onstackusingFullAscending
;––—R0,R1,andR2arechanged
MOVR0,#0;R0=0
MOVR1,#0;R1=0
MOVR2,#0;R2=0
;–––restoretheoriginalregisterscontentsfromstack
LDMFAR13,{R0-R2}
;restoreR0,R1,andR2fromstackusingF.Ascending
BXLR;returntocaller
END
ItmustbenotedthatARM CortexhasPUSHandPOPinstructions.SeeAppendixAformoreinformation.
ReviewQuestions
1.Trueorfalse.Inafullstackthepointerpointstothelaststoredlocation.
2.Trueorfalse.Inadescendingstackthepointerincrementsaftereachpush.
3.WriteaninstructionthatpushesLR(R14)intoanemptydescendingstack.
4.WriteaninstructionthatpopsR10fromafullascendingstack.
Section6.5:ADR,LDR,andPCRelativeAddressingIn indexedaddressingmodes,anyregisters includingthePC(R15)registercanbe
usedas thepointerregister.Forexample, thefollowing instructionreads thecontentsofmemorylocationPC+4:
LDRR0,[PC,#4]
Inthisway,thedatawhichhasaknowndistancefromthecurrentexecutinglinecanbe accessed. As discussed in Chapter 4, the PC register points 8 bytes (2 instructions)ahead of executing instruction. As a result, “LDR R0,[PC,#4]” accesses a memorylocationwhoseaddressis4+8bytesaheadofthecurrentinstruction.Generallyspeaking,theaddressofthememorylocationwhichisbeingaccessedusing“LDRR0,[PC,offset]”canbefoundusingthisformula:theaddressofcurrentinstruction+8+offset.
Forinstance,if“LDRR0,[PC,#4]”islocatedinaddress0x10theeffectiveaddressis:0x10+8+4=0x1C.
Thisaddressingmodecanbeconsideredasasubsetofindexedaddressingmodes.InARMapplicationnotes,theprogramrelativeaddressingmodeisconsideredasaseparateaddressingmode.
ImplementingtheADRDirective(LDRRn,=value)
The ADR directive uses the PC relative addressing mode to load registers. Forexampleseethefollowingprogram:
AREALOOKUP_EXAMPLE,READONLY,CODE
ENTRY
ADRR2,OUR_FIXED_DATA;pointtoOUR_FIXED_DATA
LDRBR0,[R2];loadR0withthecontents
;ofmemorypointedtobyR2
ADDR1,R1,R0;addR0toR1
HEREBHERE;stayhereforever
OUR_FIXED_DATA
DCB0x55,0x33,1,2,3,4,5,6
DCD0x23222120,0x30
DCW0x4540,0x50
END
SeeFigure6-11.Atcompiletime,theADRisreplacedwith“ADDR2,PC,#0x08”.Sincetheinstructionisinaddress0x00,theinstructionaccesseslocation0+8+0x08=0x10.AsshownintheFigure,0x10istheaddressofOUR_FIXED_DATA.
Figure6-11:MemoryDumpforADRInstruction
ImplementingtheLDRDirective
To implement the LDR directive, assembler stores the value as a fixed data inprogram memory and then accesses it using the LDR instruction and the PC relativeaddressingmode. Figure 6-12 shows the implementation of Program6-3. For example,0x12345678isstoredinmemorylocations0x10–0x13,andtheLDRdirectiveisreplacedwithLDRR0,[PC,#0x08].SincePC=0, theLDRR0,=0x1234567 is located at address0x000000.Nowwehave0+8+8=16=0x10.
Program6-3:LDRDirective
AREAEXAMPLE,READONLY,CODE
ENTRY
LDRR0,=0x12345678
LDRR1,=0x86427531
ADDR2,R0,R1
H1BH1
END
Figure6-12:MemoryDumpforLDRInstruction
The same way for the LDR R1,=0x86427531 is located in ROM address0x00000004. Therefore we have 4+8+8=20=0x14, which is the address of the data0x86427531.
ReviewQuestions
1.WhichregisterisusedasthepointerinPCrelativeaddressingmode?
2.WhichdirectiveismoreoptimizedADRorLDR?Why?
ProblemsSection6.1:ARMMemoryMapandMemoryAccess
1.Whatisthebusbandwidthunit?
2.Givethevariablesthataffectthebusbandwidth.
3.Trueorfalse.Onewaytoincreasethebusbandwidthistowidenthedatabus.
4.Trueorfalse.Anincreaseinthenumberofaddressbuspinsresultsinahigherbusbandwidthforthesystem.
5.Calculatethememorybusbandwidthforthefollowingsystems.
(a)ARMof100MHzbusspeedand0WS
(b)ARMof80MHzbusspeedand1WS
6.Indicatewhichofthefollowingaddressesiswordaligned.
(a)0x1200004A (b)0x52000068 (c)0x66000082
(d)0x23FFFF86 (e)0x23FFFFF0 (f)0x4200004F
(g)0x18000014 (h)0x43FFFFF3 (i)0x44FFFF05
7.Showhowdataisplacedafterexecutionofthefollowingcodeusing(a)littleendianand(b)bigendian.
LDRR2,=0xFA98E322
LDRR1,=0x20000100
STR[R1],R2
8.Trueorfalse.InARM,instructionsarealwayswordaligned.
9.Trueorfalse.Inawordalignedaddressthelowerdigitoftheaddressis0,4,8,orC.
10.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister
LDRR1,=0x20000004
LDRD[R1],R2
11.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister
LDRR1,=0x20000102
LDRD[R1],R2
12.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister
LDRR1,=0x20000103
LDRD[R1],R2
13.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister
LDRR1,=0x20000006
LDRH[R1],R2
14.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister
LDRR1,=0x20000C10
LDRB[R1],R2
Section6.2:StackandStackUsageinARM
15.Trueorfalse.InARMtheR13isdesignatedasstackpointer.
16.WhenBLisexecuted,howmanylocationsofthestackareused?
17.WhenBisexecuted,howmanylocationsofthestackareused?
18.InARM,stackpointeris_________register.
19.DescribehowtheactionassociatedwiththereturnoperationisperformedinARM.
20.GivethesizeofthestackinARM.
21.InARM,whichaddressissavedwhenBLinstructionisexecuted.
Section6.3:ARMBit-AddressableMemoryRegion
22.WhichmemoryregionsofARMarebit-addressable?
23.Givethebit-addressableSRAMregionaddressforgenericARM.
24.Whatbitaddressesareassignedtobyteaddressof0x20000004?
25.Whatbitaddressesareassignedtobyteaddressof0x20000010?
26.Whatbitaddressesareassignedtobyteaddressof0x200FFFFF?
27.Whatbitaddressesareassignedtobyteaddressof0x20000020?
28.Whatbitaddressesareassignedtobyteaddressof0x40000008?
29.Whatbitaddressesareassignedtobyteaddressof0x4000000C?
30.Whatbitaddressesareassignedtobyteaddressof0x40000020?
31.Thefollowingarebitaddresses.Indicatewhereeachonebelongs.
(a)0x2200004C (b)0x22000068 (c)0x22000080
(d)0x23FFFF80 (e)0x23FFFF00 (f)0x4200004C
(g)0x42000014 (h)0x43FFFFF0 (i)0x43FFFF00
32.Ofthe4GbytesofmemorylocationsintheARM,howmanyofthemarealsoassignedabitaddressaswell?Indicatewhichbytesthoseare.
33.Trueorfalse.Thebit-addressableregioncannotbeaccessinbyte.
34.Trueorfalse.Thebit-addressableregioncannotbeaccessinword.
35.WriteaprogramtoseewhethertheD7bitofRAMlocation0x20000020ishigh.Ifso,senda1toD1ofRAMlocation0x20000000.
36.WriteaprogramtoseewhethertheD7bitofI/Olocation0x40000000islow.Ifso,senda0totheD0oflocation0x400FFFFF.
37.WriteaprogramtosethighallthebitsofRAMlocations0x2000000usingthefollowingmethods:
(a)byteaddresses(b)bitaddresses
38.WriteaprogramtoseewhethertheSRAMlocation0x20000000isdivisibleby8.
39.ExplainhowtheLDMinstructionworks.
40.ExplainhowtheSTMinstructionworks.
41.ExplainthedifferencebetweenLDMandLDRinstructions.
42.ExplainhowthedifferencebetweenSTMandSTRinstructions.
43.ExplaintheLDMIAoperationanditsimpactontheSP.
44.ExplaintheLDMIBoperationanditsimpactontheSP.
45.ExplaintheSTMIAoperationanditsimpactontheSP.
46.ExplaintheSTMIBoperationanditsimpactontheSP.
Section6.4:AdvancedIndexedAddressingMode
47.Trueorfalse.Writebackisbydefaultenabledinpreindexedaddressingmode.
48.Indicatetheaddressingmodeineachofthefollowinginstructions
(a)LDRR1,[R5],R2,LSL#2(b)STRR2,[R1,R0]
(c)STRR2,[R1,R0,LSL#2]!(d)STRR9,[R1],R0
49.Whatisanascendingstack?
50.Whatisthedifferencebetweenanemptyandafullstack?
51.WriteaninstructionthatstoresR0inafulldescendingstack.
52.WriteaninstructionthatloadsR9fromanemptydescendingstack.
Section6.5:ADR,LDR,andPCRelativeAddressing
53.Assumingthattheinstruction“LDRR2,[PC,#8]islocatedinaddress0x300,
calculatetheaddressofthememorylocationwhichisaccessed.
54.UsingPCrelativeaddressingmode,writeanLDRinstructionthataccessesamemorylocationwhichis0x20bytesaheadofitself.
AnswerstoReviewQuestionsSection6.1
1.4bytes
2.Compilersensurethatcodesarewordaligned.
3.littleendian
4.1/66MHz=15.15nsisthebusclockperiod.Sincethebuscycletimeofzerowaitstatesis2clocks,wehave2×15.15=30.3ns
5.1/100MHz=10nsisthebusclockperiod.50ns-10ns=40ns.TheNumberofWSis40ns/10ns=4.
6.False
Section6.2
1.R13
2.ThestackcanbeasbigasitsRAM
3.STMR13,{R5-R8}
SUBR13,R13,#16
4.ADDR13,R13,#16
LDMR13,{R5-R8}
5.Itcopiesthecontentsoflocations0x40000000–0x4000000Fintolocations0x50000000–0x5000000FusingtheLDMandSTMinstructions.
Section6.3
1.False
2.False
3.False
4.2MBytes;locations0x20000000to0x200FFFFFofSRAMand0x40000000to0x400FFFFFofGPIO
5.
LDRR0,=0x22000040
LDRR1,[R0]
CMPR1,#0
BNEL1
…
L1:
6.
(a)0x23000030-0x22000000=0x1000030;0x1000030/0x20=0x80001;thusitisinlocation0x20000000+0x80001=0x20080001
(0x1000030%32)/4=(48%32)/4=16/4=4;itisD4of0x20080001.
(b)0x1000040/0x20=0x80002;itisinlocation0x20080002
(0x1000040%0x20)/4=0;itisD0oflocation0x20080002
(c)0x1000048/0x20=80002;itisinlocation0x20080002
(0x1000048%0x20)/4=2;itisD2oflocation0x20080002
(d)0x4200003C-0x42000000=0x03C;0x03C/0x20=0x01
(0x3C%0x20)/4=0x1C/4=7;itisD7of0x40000001
(e)0x43FFFFFC–0x42000000=0x1FFFFFC;0x1FFFFFC/0x20=0xFFFFF
(0x1FFFFFC%0x20)/4=0x1C/4=7;D7of0x400FFFFF
Section6.4
1.True
2.False
3.STMEDSP!,{LR}
4.LDMFASP!,{R10}
Section6.5
1.PC(R15)
2.ADR,ToimplementtheLDRdirectivethevalueisstoredinmemory;asaresult,itusesmorememorywhiletheADRusesnomemory.
Chapter7:ARMPipelineandCPUEvolutionThis chapterwill look at pipeline evolution inARMwhile examining otherCPU
enhancements. In Section 7.1 the ARM’s pipelines are studied. Section 7.2 exploresvariousprocessorsenhancements.
Section7.1:ARMPipelineEvolutionThere aremanyways available to processor designers to increase the processing
poweroftheCPU.HerewelistsomeofthemthatareusedinARM.
1. Increase theclockfrequencyof thechip.Onedrawbackof thismethod is that thehigher the frequency, the more the power dissipation and the more difficult andexpensivethedesignofthemicroprocessorandmotherboard.
2.Increasethenumberofdatabusestobringmoreinformation(codeanddata)intotheCPU to be processed. For example, Von Neumann architecture in ARM7 has beenreplacedbyHarvardarchitectureinnewerversions.SeeChapter6.
3. Change the internal architecture of the CPU to overlap the execution of moreinstructions. This requires a lot of transistors. There are two trends for this option,superpipelineandsuperscalar.Insuperpipelining,theprocessoffetchingandexecutinginstructionsissplitintomanysmallstepsandallaredoneinparallel.Inthiswaytheexecution of many instructions is overlapped. The number of instructions beingprocessedatagiventimedependsonthenumberofpipelinestages,commonlytermedthe pipeline depth. Some designers use as many as 8 stages of pipelining. Onelimitationofsuperpipeliningisthatthespeedoftheexecutionislimitedtothesloweststage of the pipeline. Compare this to making pizza. You can split the process ofmakingpizzaintomanystages,suchasflatteningthedough,puttingonthetoppings,andbaking,buttheprocessislimitedtothesloweststage,baking,nomatterhowfastthe rest of the stages areperformed.Whathappens ifweuse twoor threeovens forbakingpizzas tospeedup theprocess? Thismayworkformakingpizzabutnot forexecutingprograms,sinceintheexecutionofinstructionswemustmakesurethatthesequenceof instructions iskept intact and that there isnoout-of-stepexecution.Thedifficultiesassociatedwithastalledpipeline(aslowdowninonestageofthepipeline,which prevents the remaining stages from advancing) has made CPU designersabandonsuperpipelininginfavorofsuperscaling.Insuperscaling,theentireexecutionunit has beendoubled and eachunit has 5 pipeline stages.Therefore, in superscalar,there is more than one execution unit and each has many stages, rather than oneexecution unit with 8 stages as in the case of a superpipelined processor. In somesuperscalarprocessors,therearetwoexecutionunitseachwith4pipelinestagesinsteadofasingleexecutionunitwith8pipelinestagesassuperpipeliningproponentswouldhave it. Inotherwords, insuperscalingwehave two(oreven three)executionunitsandastheinstructionsarefetchedtheyareissuedtothevariousexecutionunits.Usingtheanalogyofpizza,superscalarislikedoublingortriplingtheentirecrewflatteningthedough,puttingtoppingson,andbaking.Ofcourse,youwillneedalotmorepeopleinvolvedintheprocessandyouhavetohavemoreovens,butatthesametimeyouaredoublingortriplingthepizzaoutput.Incasesofrecentmicroprocessorarchitecture,avastmajorityofdesignershavechosensuperscalingoversuperpipelining.Thisrequiresnumeroustransistorstoduplicateseveralexecutionunits,justlikeneedingmorepeoplein our pizza-making analogy. Fortunately, advances in IC design have alloweddesigners access to hundreds of million transistors to throw around for the
implementationofpowerfulsuperscaling.Therearesomeproblemswithsuperscaling,suchasdatadependencyissues,whichcanbesolvedbythecompiler.
4.Combiningmorethanonecoreinasingleprocessorisanotherwayofimprovingthespeed of high end processers. Cortex-A series ofARM supports up to four cores incombinationwitheachother.
Next,wewillexaminetheissueofpipelining.SeeFigure7-1.
Figure7-1:Non-PipelinedInstructionExecutionvs.2-stagePipeline(8086)
Moreaboutpipelining
IntheearlyCPUstherewasnopipelining.Atanygivenmoment,iteitherfetchedorit executed. It couldnotdobothat the same time. In thenon-pipelinedCPU,while thebuseswerefetchingtheinstructions(opcodes)anddata,theCPUwassittingidle,andinthesameway,whentheCPUwasexecutinginstructions,busesweresittingidle.However,intheearlypipelinedCPUsuchas8086thefetchandexecutewereperformedinparallelby two sections inside the CPU called the BIU (bus interface unit) and EU (executionunit).
For the concept of pipelining to work, we need a buffer or queue in which aninstructionisprefetchedandreadytobeexecuted.Insomecircumstances,theCPUmustflush out the queue. For example, when a branch (B, BNE, BCS, and so on) or callinstructionisexecuted,theCPUstartstofetchcodesfromthenewmemorylocationandthecodeinthequeuethatwasfetchedpreviouslyisdiscarded.Inthiscase,theexecutionunit must wait until the fetch unit fetches the new instruction. This is called a branchpenalty.Thepenalty isanextra instructioncycle to fetch the instruction from the targetlocation insteadof executing the instruction rightbelow thebranch.Remember that theinstructionbelowthebranchhasalreadybeenfetchedand isnext in line tobeexecutedwhentheCPUbranchestoadifferentaddress.NotethatnewerCPUshavemorestagesintheir pipeline. For example a 3 stage pipelinemay divide the code execution to Fetch,
Decode and Execute stages.When the number of stages in a pipeline increases, morestagesshouldbeflushedoutwhenabranchinstructionisexecuted.ExamineExample7-1toseehowbranchpenaltyslowsdowntheexecutionofacode.Next,youwill seehowbranchpredictionsolvestheproblem.
Example7-1
Howmanycyclesdoesittakefora3stagepipelinedCPUtorun3iterationofthefollowingcode?
MOVR1,#0
L1ADDR2,R2,#1
BL1
MOVR3,#3
MOVR4,#4
Solution:
For the first instruction (MOV R1,#0), it takes 3 cycles to pass through the stages ofpipelineandbeexecuted.Afterthethirdcycle,oneinstructionisexecutedineachcycle.WhentheBranchinstructionisexecutedincycle5,theCPUflushesthepipelinebecausethe fetch and decode instruction in cycles 4 and 5 are not needed. It causes two clockcyclesbranchpenalty.ThesamescenariohappenseachtimetheCPUexecutesabranch.Aswecanseeinthefigure,Ittakes13cyclestorun7instructions.
Branchprediction
Branch prediction is another new feature of the new CPUs. The penalty forbranchingisveryhighforahigh-performancepipelinedmicroprocessorsuchastheARM.Forexample, inthecaseoftheBNE(branchifnotequal) instruction, if itbranches, the
pipelinemustbeflushedandrefilledwithinstructionsfromthetargetlocation.Thistakestime.Incontrast,theinstructionimmediatelybelowtheBNEisalreadyinthepipelineandisadvancingwithoutdelay.Someprocessorshave thecapability topredict andprefetchcodefrombothpossible locationsandhavethemadvancedthroughthepipelinewithoutwaiting (stalling) for the outcome of the zero flag. The ability to predict branches andavoidthebranchpenaltycanresultinasubstantialreductionintheclockcountforagivenprogram. Some CPUs have branch prediction, but with greater capability. When itencountersbranchinstructions(suchasBNE),itcreatesalistoftheminwhatiscalledthebranchtargetbuffer(BTB).TheBTBpredictsthetargetofthebranchandstartsexecutingfromthere.Whenthebranchisexecuted,theresultiscomparedwithwhatthepredictionsection of the CPU said it would do. If they match, the branch is retired. If not, allinstructions behind the branch are removed from the pool and the correct branch targetaddressisprovidedtotheBTB.FromtheretheBTBrefillsthepipelinewithinstructionsfromthenewtargetaddress.SeeExample7-2.
Example7-2
Showhowmanycyclesdoesittakefora3-stagepipelinedCPUtorun3iterationofthecodeinExample7-1?Assumethatthebranchpredictionunithaspredictedallbranches.
Solution:
Ittakes9cyclestorun7instructions.Incycles4to8instructionsarepredictedbybranchpredictionunit.
Notethatstoresareneverperformedspeculativelysincethereisnotransparentwaytoundo them.Storesarealsonever re-orderedamong themselves.Astore isdispatchedonly when both the address and the data are available and there are no older storesawaitingdispatch.
3-stagepipelineinARM7
Sincetheintroductionofthe8086microprocessorin1978,processordesignershavecometorelymoreandmoreontheconceptofpipeliningtoincreasetheprocessingpower
oftheCPU.ARM7usedtheconceptofpipeliningwiththreestagesoffetch,decode,andexecute.SeeExample7-3.
Example7-3
ShowhowthefollowingcodeisexecutedinARM7.
MOVR4,R5
ADDR1,R2,R3
SUBR6,R7,R8
Solution:
5-stagepipelineinARM9
AswementionedearliertheARM7hasa3-stagepipeline.AsshowninFigure7-2,theARM9hasextendedthepipelineto5stages.Theyare:
1.Fetch
2.Decode
3.Execute
4.Memory
5.Write
Figure7-2:5-StagePipelineinARM9
Fetch:IntheFetchstagetheinstructionsarefetchedfrommemoryandplacedinthequeueandwaittobedecoded.InthisstagetheProgramCounter(PC)isalsoincrementedby4sincetheARMinstructionsare4-bytes.
Decode: In thedecodestage the instruction isdecodedand theregister file isalso
accessedtogeteverythingreadyfortheExecutestage.
Execute:Inthisstage,anyeffectiveaddresscalculationandsignextendingofabyteor a half-word are done. In the instructions such as Load and Store this stage getseverything ready for the next stage of memory access. In instructions such as “ADDR1,R2,R3”inwhichalltheresourcesneededfortheexecutionoftheinstructionarereadybeforeitcomestothisstage,theregistersareaddedanditgoesdirectlytothewrite-backstagetowritetheresulttotheregisterfile.
Memory: For instructions such as Load and Store in which external memoryaccessesareneeded,thememorystagefetchesthedatafromtheexternalmemoryandhasthedatainsidetheCPUreadyforthenextstageofthewrite-back.Ifaninstructiondoesnotneedtoaccessmemory, thisstageisbypassedandtheresult isforwardedtothelaststageofwrite-back.Thismeansitbecomesa4-stagepipeline.
Write:Alsocalledwrite-backisthestageinwhichtheinstructioniscompletedbywriting the result to the register fileand retiring the instruction.Aswe just stated, ifaninstructiondoesnotneedtoaccessmemory,write-backisthestagerightaftertheexecutestage,meaningformanyinstructionswereallyhave4-stagepipeline.
3-stagevs.5-stagepipeline
Inthe3-stagepipelineofARM7,theexecution,thememoryaccess,andthewritingtheresulttoregisterfileareallperformedbytheExecutestage.Inthe5-stagepipeline,theCPUdecouplesthememoryaccessandexecutestage.Withthetwonewstagesofmemoryandwrite-back, theARM9 increases the processing power of theCPUby allowing theCPUtoworkconcurrentlyon5instructionsinsteadof3instructionsatagiventime.ThisisamajorenhancementoftheARM9overARM7.
ReviewQuestions
1.Whatissuperpipelining?
2.Whatisthelimitationofsuperpipeline?
3.Whatissuperscaling?
4.Trueorfalse.The5-stagepipelinehasbetterperformancethanthe3-stagepipeline.
5.Givethenamesofthe5-stagepipelineinARM9.
Section7.2:OtherCPUEnhancementsThere aremany otherways available tomicroprocessor designers to increase the
processingpoweroftheCPU.Next,weexaminesomeofthem.
SuperscalarCPUs
AnotheruniquefeatureofthemanyofnewCPUsisitssuperscalararchitecture.AlargenumberoftransistorsareusedtoputmorethanoneexecutionunitinsidetheCPU.Astheinstructionsarefetched,theyareissuedtotheseexecutionunits.Figure7-3showstheconceptofsuperscalar.
Figure7-3:SuperscalarCPUs
Issuingtwoinstructionsatthesametimetodifferentexecutionunitscanworkonlyiftheexecutionofonedoesnotdependontheotherone,inotherwords,ifthereisnodatadependency.Asanexample,lookatthefollowinginstructions.
ADDR1,R2,R3
SUBR4,R1,R5
ANDR6,R7,R8
MOVR9,R10
Intheabovecode,theADDandSUBinstructionscannotbeissuedtotwoexecutionunitssinceR1,thedestinationofthefirstinstruction,isusedimmediatelybythesecondinstruction.Thisiscalledread-after-writedependencysincetheSUBinstructionwantstoreadtheR1contents,but itmustwaituntilafter theADDisfinishedwritingit intoR1.TheproblemisthatADDwillnotwriteintoR1untilthelaststageofthepipeline,andbythen it is too late for the pipeline of the SUB instruction. This prevents the SUBinstructionfromadvancinginthepipeline,thereforecausingthepipelinetobestalleduntiltheADDfinisheswritingandthentheSUBinstructioncanadvancethroughthepipeline.ThiskindofdatadependencyraisestheclockcountfortheSUBinstruction.Whatiftheinstructionsarerescheduled?Wewilldiscussoutoforderexecutionnextinthischapter.
ADDR1,R2,R3
ANDR6,R7,R8
SUBR4,R1,R5
MOVR9,R10
If they are rescheduled as shownabove, each canbe issued to separate executionunits,allowingparallelexecutionofbothinstructionsbytwodifferentunitsoftheCPU.Since the clock count for each instruction is one, having two execution units leads toexecuting two instructionsbypairing themtogether, therebyusingonlyoneclockcountfor two instructions. In the case of the above program, if it is run on the CPU withsuperscalar it will take only 2 clocks instead of 4, assuming that two instructions arepaired together. This reordering of instructions to take advantage of the two internalexecutionunitsoftheCPUcanbedonebycompilerorCPU itselfandiscalledinstructionscheduling.Currently,somecompilersarebeingequippedtodoinstructionschedulingtoremovedependencies.Theprocessofissuingtwoinstructionstothetwoexecutionunitsiscommonlyreferredtoasinstructionpairing.
Superpipelinedandsuperscalar
Somemicroprocessors use a 10-stage pipeline for the CPU. In contrast to the 5-pipestage,althougheachpipestageofthe10-pipestageperformslesswork,therearemorestages. This means that in such processors, more instructions can be worked on andfinished at a time. These CPUs with their 10- or 12-stage pipeline are referred to assuperpipelined. Since they also have multiple execution units capable of working inparallel,theyarealsosuperscalar.Anotheradvantageofthesuperpipelinedconceptisthatitcanachieveahigherclockrate(frequency)withthegiventransistor technology.Theyalso usewhat is called out-of-order execution to increase the performance of theCPU.Thisisexplainednext.
Decouplingandout-of-orderexecution
InCPUarchitecture,whenoneof thepipelinestages isstalled, thepriorstagesoffetchanddecodearealsostalled.Inotherwords,thefetchstagestopsfetchinginstructionsif the execution stage is stalled, due, for example, to a delay in memory access. Thisdependency of fetch and execution has to be resolved in order to increase CPUperformance.ThatisexactlywhatmanydesignershavedonewiththeCPUandiscalleddecoupling the fetch and execution phases of the instructions. In these processors,instructionsarefetchedfrommemoryandplacedintoapoolcalledthe instructionpool.SeeFigure 7-4.
Figure7-4:CPUInstructionExecution
Thisfetch/decodeoftheinstructionsisdoneinthesameorderastheprogramwas
codedbytheprogrammer(orcompiler).However,whentheyareplacedintheinstructionpool theycanbeexecuted inanyorderas longas thedataneeded isavailable. Inotherwords,ifthereisnodependency,theinstructionsareexecutedoutoforder,notinthesameorder as the programmer coded them. Such a speculative execution can go 20–30instructionsdeepintotheprogram.Itisthejoboftheretireunittoprovidetheresultstothe programmer’s (visible) registers (e.g., R0, R1) according to the order in which theinstructionswerecoded.Again,itisimportanttonotethattheinstructionsarefetchedinthesameorderthattheywerecoded,butexecutedoutoforderifthereisnodependency,andultimatelyretiredinthesameorderastheywerecoded.Thisout-of-orderexecutioncanboostperformanceinmanycases.LookatExample7-4.
Example7-4
ThefollowingARMcode(a)setsthepointerforthreedifferentarrays,andthecountervalue,(b)getseachelementofARRAY_1,addsafixedvalueof100toit,andstorestheresultinARRAY_2,and(c)complementstheelementandstoresitinARRAY_3.Analyzetheexecutionofthecodeinlightoftheout-of-orderexecutionandbranchpredictioncapabilitiesofanARMCPU.
(i1)LDRR1,=ARRAY_1;loadpointer
(i2)LDRR2,=ARRAY_2;loadpointer
(i3)LDRR3,=ARRAY_3;loadpointer
(i4)MOVR4,#COUNT;loadthecounter
(i5)AGAINLDRR5,[R1];loadtheelement
(i6)ADDR5,R5,#100;addthefixvalue
(i7)ADDR1,R1,#4;updatethepointer
(i8)STRR5,[R2];storetheresult
(i9)ADDR2,R2,#4;updatethepointer
(i10)MVNR5,R5;complementtheresult
(i11)STRR5,[R3];andstoreit
(i12)ADDR3,R3,#4;updatethepointer
(i13)SUBSR4,R4,#4;
(i14)BNEAGAIN;stayintheloop
(i15)
Solution:
Thefetch/decodeunitfetchesandputsinstructionsintothepool.Sincethereisnodependencyforinstructionsi1throughi5,theyaredispatched,executed,andretiredexceptfori5.Noticethatthepointervaluesofi1toi4areimmediatevalues;therefore,theyareembeddedintotheinstructionwhenthefetch/decodeunitgetsthem.Nowi5isamemoryfetchthatcantakemanyclocks,dependingonwhethertheneededdataislocatedincacheormainmemory.Meanwhilei6,i8,i10,andi11mustwaituntilthedataisavailable.Howeveri7,i9,andi12canbeexecutedoutoforder.Moreimportantly,theBNEinstructionispredictedtogotothetargetaddressofAGAINandi5,i6,…aredispatchedoncemoreforthenextiteration.Thistimethememoryfetchwilltakeveryfewclockssinceinthepreviousdatafetch,theCPUreadsomebytesofdataintothecache.ThisprocesswillgoonuntilthelastroundofloopingwhereR4becomeszeroandfallsthrough.Atthistime,duetomisprediction,alltheinstructionsbelongingtoinstructionsi5,i6,i7,…(startoftheloop)areremovedandthewholepipelinerestartswithinstructionsbelongingtoi15,i16,andsoon.
Due to the fact that memory fetches (due to cachemisses) can takemany clockcyclesandresultinunderutilizationoftheCPU,out-of-orderexecutionisawayoffindingsomething to do for theCPU.Simply put, the idea of out-of-order execution is to lookdeepintothestreamofinstructionsandfindtheonesthatcanbeexecutedaheadofothers,providedthatresourcesareavailable.Again,it isimportanttonotethattheseprocessorswillnotimmediatelyprovidetheresultsofout-of-orderexecutionstoprogrammer-visibleregisterssuchasR0,R1,andsoon,sinceitmustmaintaintheoriginalorderofthecode.Instead,theresultsofout-of-orderexecutionsarestoredinthepoolandwaittoberetiredinthesameorderastheywerecoded.Therefore,programmer-visibleregistersareupdatedinthesamesequenceasexpectedbytheprogrammer.
Registerrenaming
Therearesomecases inwhich instructionsarenotreallydependentoneachotherbutthereisakindofimplicitdependencycalledregisterdependency.SeeExample7-5.
Example7-5
Forthefollowingcode,indicatetheinstructionsthatcanbeexecutedinparalleloroutoforder.
(i1)LDRR4,[R2];loadR4frommemorypointedtobyR2
(i2)ADDR3,R4,R7;R4+R7–>R3
(i3)ADDR6,R8,R10;R8+R10–>R6
(i4)SUBR5,R1,R9;R1-R9–>R5
(i5)ADDR6,R12,#1;R12+1–->R6
Solution:
Instructioni2cannotbeexecuteduntilthedataisbroughtinfrommemory(eithercacheormainmemoryDRAM).Therefore,i2isdependentoni1andmustwaituntiltheR4registerhasthedata.However,instructionsi3andi4canbeexecutedoutoforderandparallelwitheachothersincethereisnodependencyamongthem.Noticethati5isnotreallydependantoni3becausei5doesnotuseanyofdatageneratedbyi3.Buti5andi3cannotbeexecutedoutoforderorinparallelbecauseR6ismodifiedbybothofi3andi5.Thiskindofdependencyiscalledregisterdependencyandissolvedbyamethodcalledregisterrenaming.
InthefollowingcodenoneoftheinstructionscanbeexecutedinparallelbecauseofusingR1inallinstructions:
MOVR1,#5
ADDR3,R1,#2
MOVR1,#6
ADDR4,R1,#2
Ifyouexaminetheabovecodecarefullyyouwillseethatthefirsttwolinesofcodeare independent from the second two lines of code and we can remove the implicitdependencybychangingR1toanotherregistersuchasR2inthelasttwolinesofcode:
MOVR1,#5
ADDR3,R1,#2
MOVR2,#6
ADDR4,R2,#2
Renaming the registersbefore issuing the instructions toexecutionunit isdone inmanyofnewadvancedCPUanditiscalledregisterrenaming.
PuttingthemalltogetherinanARMCPU
InFigure7-5youcanseeatop-leveldiagramoftheARMCortexA9processor.Ithasmostofthepartsdiscussedinthischapter.
Figure7-5:Top-leveldiagramoftheARMCortexA9processor
Busfrequencyvs.internalfrequencyinCPU
Frequently you may see an advertisement for a 1-GHz or 2-GHz CPUs. It isimportanttonotethatthestatedfrequencyistheinternalfrequencyoftheCPUandnotthebusfrequency.Thisisduetothefactthatdesigninga1-GHzmotherboardisverydifficultandexpensive.Suchadesignrequiresaveryfastlogicfamilyandmemoryinadditiontoamassive simulation to avoid crosstalk and signal radiation. The bus frequency for suchsystemsiscurrentlylessthan1GHz.
ReviewQuestions
1.Trueorfalse.TheARMinstructionsetisintriadicform.
2.Trueorfalse.ThebranchpredictiontaskisperformedbycircuitryinsidetheCPU.
3.WhyaresomeCPUscalledasuperscalarprocessor?
4.Instructionschedulingisdoneby________.
5.Outoforderexecutionisarrangedby________.
ProblemsSection7.1:ARMPipelineEvolution
1.TheARM7usesapipelineof_______stages.
2.GivethenamesofthepipelinestagesintheARM7
3.TheARM9usesapipelineof_______stages.
4.GivethenamesofthepipelinestagesintheARM9
Section7.2:OtherCPUEnhancements
5.Thenumberofpipelinestagesinasuperpipelinesystemis________(less,more)thaninasuperscalarsystem.
6.Whichhasoneormoreexecutionunits,superpipelineorsuperscalar?
7.Whichpartofon-chipcacheintheARMiswriteprotected,dataorcode?
8.Whatisinstructionpairing,andwhencanithappen?
9.Whatisdatadependency,andhowisitavoided?
10.Trueorfalse.Instructionsarefetchedaccordingtotheorderinwhichtheywerewritten.
11.Trueorfalse.Instructionsareexecutedaccordingtotheorderinwhichtheywerewritten.
12.Trueorfalse.Instructionsareretiredaccordingtotheorderinwhichtheywerewritten.
13.ThevisibleregistersR0,R1,andsoon,areupdatedbywhichunitoftheCPU?
14.Trueorfalse.Amongtheinstructions,STRs(store)areneverexecutedoutoforder.
AnswerstoReviewQuestionsSection7.1
1.Insuperpipelining,theprocessoffetchingandexecutinginstructionsissplitintomanysmallstepsandallaredoneinparallel.Inthiswaytheexecutionofmanyinstructionsisoverlapped.
2.Thespeedoftheexecutionislimitedtothesloweststageofthepipeline.
3.Insuperscaling,theentireexecutionunithasbeendoubled
4.True
5.Fetch,decode,execute,memory,writeback
Section7.2
1.True
2.True
3.Sinceithastwoexecutionunits(pipelines)capableofexecutingtwoinstructionswithoneclock
4.circuitryinsidetheCPUandthecompiler
5.theCPU
AppendixA:ARMCortex-M3InstructionDescription
SectionA.1:ListofARMCortex-M3InstructionsADCAddwithCarry
ADCSAddwithCarry(andupdatetheflags)
ADDADD
ADDSADDandupdatetheflags
ADRLoadPC-RelativeAddress
ANDLogicalAND
ANDSLogicalAND(updateflags)
ASRArithmeticShiftright
ASRSArithmeticShiftright(updatetheflags)
BBranch(unconditionaljump)
BxxBranchConditional
BFCBitFieldClear
BFIBitFieldInsert
BICBitClear
BICSBitClear(updateflags)
BKPTBreakpoint
BLBranchwithLink(thisisCallinstruction)
BLXBranchIndirectwithLink
BXBranchIndirect(BXLRisusedforReturn)
CBNZCompareandBranchonNon-Zero
CBZCompareandBranchonZero
CDPCoprocessorDataprocessing
CLREXClearExclusive
CLZCountLeadingZero
CMNCompareNegative
CMPCompare
CPSIDChangeprocessorIDandDisableInterrupt
CPSIEChangeProcessorStateandEnableInterrupt
DMBDataMemoryBarrier
DSBDataSynchronizationBarrier
EORExclusiveOR
EORSExclusiveORandupdatetheflags
ISBInstructionSynchronizationBarrier
ITIf-ThenConditionBlock
LDCLoadCoprocessor
LDMLoadMultipleregisters
LDMDBLoadMultipleregistersandDecrementBeforeeachaccess
LDMEALoadMultipleregistersfromEmptyAscending
LDMFDLoadMultipleregistersFullDescending
LDMIALoadMultipleregistersandIncrementaftereachAccess
LDRLoadRegister
LDRRx,=ValueLoadRegisterwith32-bitvalue
LDRBLoadRegisterByte
LDRBTLoadRegisterBytewithTranslation
LDREX,LDREXB,LDREXHLoadRegisterExclusive
LDRHLoadRegisterHalfword
LDRSBLoadRegistersignedByte
LDRSHLoadRegisterSignedHalfword
LDRTLoadRegisterwithTranslation
LSLLogicalShiftLeft
LSLSLogicalShiftLeft(updatetheflags)
LSRLogicalShiftRight
LSRSLogicalShiftRight(updatetheflags)
MCRMovetoCoprocessorfromARMRegister
MLAMultiplyAccumulate
MLSMultiplyandSubtract
MOVMove(ARM7)
MOVMove(ARMCortex)
MOVSMove(andupdateflags)
MOVTMoveTop
MOVWMove16-bitconstant
MRCMovetoARMRegisterfromCoprocessor
MRSMovetogeneralRegisterfromSpecialregister
MSRMovetoSpecialregisterfromgeneralRegister
MULUnsignedMultiplication
MVNMoveNegative
MVNSMoveNegativeandupdatetheflags
NOPNoOperation
ORNLogicalORNot
ORNSORNotandupdateflags
ORRLogicalOR
ORRSLogicalORandupdatetheflags
POPPOPregisterfromStack
PUSHPUSHregisterontostack
RBITReverseBits
REVReversebyteorderinaword
RV16Reversebyteorderin16-bit
REVSHReversebyteorderinbottomhalfwordandsignextend
RORRotateRight
RORSRotateRight(updatetheflags)
RRXRotateRightwithextend
RRXSRotateRightwithextend(updatetheflags)
RSBReverseSubtract
RSBSReverseSubtractandupdatetheflags
SBCSubtractwithCarry(Borrow)
SBCSSubtractwithCarry(Borrow)andupdatetheflags
SBFXSignBitFieldextract
SDIVSignedDivide
SEVSendEvent
SMLALSignedMultiplyAccumulateLong
SMULLSignedMultiplyLong
SSATSignSaturate
STMStoreMultiple
STMDBStoreMultipleregisterandDecrementBefore
STMEAStoreMultipleregisterEmptyAscending
STMIAStoreMultipleregisterEmptyAscending
STMFDStoreMultipleregisterFullDescending
STRStoreRegister
STRBStoreRegisterByte
STRBTStoreRegisterBytewithTranslation
STRDStoreRegisterDouble(twowords)
STREX,STREXB,STREXHStoreRegisterExclusive
STRHStoreRegisterHalfword
STRTStoreRegister
SUBSubtract
SUBSSubtract
SVCsupervisorCall(SoftwareInterrupt)
SXTBSignExtendbyte
SXTHSignExtendHalfword
TBBTableBranchByte
TBHTableBranchhalfword
TEQTestEquivalence
TSTTest
UBFXUnsignedBitfiledextract
UDIVUnsignedDivide
UMLALUnsignedMultiplywithAccumulate
UMULLUnsignedMultiplyLong
USATUnsignedSaturate
UXBTZeroextendabyte
UXTHZeroextendhalfword
WFEWaitforevent
WFIWaitforinterrupt
SectionA.2:ARMCortex-M3InstructionDescription
ADCAddwithCarry
Flags:Unaffected.
Format:ADCRd,Rn,Op2;Rd=Rn+Op2+C
Function:IfC=1priortothisinstruction,thenafterexecutionofthisinstruction,Op2isaddedtoRnplus1andtheresultisplacedinRd.IfC=0,Op2isaddedtoRnplus0.Usedwidelyinmultiwordadditions.Aftertheexecutiontheflagsarenotupdated.TheADCSinstructionupdatestheflags.
Example1:
LDRR0,=0xFFFFFFFB;R0=0xFFFFFFFB
LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF
MOVR2,#3;R2=3
MOVR3,#4;R3=4
ADDSR4,R0,R1;R4=R0+R1,C=1
ADCR5,R2,R3;R5=R2+R3+C=R2+R3+1
ADCSAddwithCarry(andupdatetheflags)
Flags:Affected:N,Z,V,C.
Format:ADCRd,Rn,Op2;Rd=Rn+Op2+C
Function:IfC=1priortothisinstruction,thenafterexecutionofthisinstruction,Op2isaddedtoRnplus1andtheresultisplacedinRd.IfC=0,Op2isaddedtoRnplus0.Usedwidelyinmultiwordadditions.NoticetheSindicatestheflagswillbeupdated.
Example1:
LDRR0,=0xFFFFFFFB;R0=0xFFFFFFFB
LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF
MOVR2,#3;R2=3
MOVR3,#4;R3=4
ADDSR4,R0,R1;R4=R0+R1,C=1
ADCSR5,R2,R3;R5=R2+R3+C=3+4+1,Z=0,N=0,C=0,
ADDADD
Flags:Unaffected
Format:ADDRd,Rn,Op2;Rd=Rn+Op2
Function:Addssourceoperandstogetherandplacestheresultindestination.Thiswillnotupdatetheflags.ToupdatetheflagswemustuseADDS.
Example1:
LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFB
MOVR1,#0x5;R1=0x5
ADDR2,R0,R1
;R2=R0+R1=0xFFFFFFFFB+0x5=00000000
;flagsunchanged
Example2:
LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF
ADDR2,R0,#0xF1
;R2=R0+0xF1=R1=0xFFFFFFFF+0xF1=000000F0
;flagsunchanged
ADDSADDandupdatetheflags
Flags:Affected:N,Z,V,C.
Format:ADDSRd,Rn,Op2;Rd=Rn+Op2andupdatetheflags
Function:Addsourceoperandstogetherandplacestheresultindestinationandupdatetheflags.Next,weexaminethecasesofsignedandunsignednumbers.
Unsignedaddition:
Inadditionofunsignednumbers,thestatusofC,Z,N,andVmaychange,butonlyCandZareofanyusetoprogrammers.ThemostimportantoftheseflagsisC.Itbecomes1whenthereiscarryfromD31outina32-bit(D0–D31)operation.
Example1:
MOVR1,#0x45;R1=0x45
ADDSR1,R1,#0x4F;R1=0x94(0x45+0x4F=0x94),C=0,Z=0
Example2:
MOVR2,#0xFE;R2=0xFE
MOVR3,#0x75;R3=0x75
ADDSR4,R2,R3;R4=0xFE+0x75=0x73,C=0,Z=0
Example3:
LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFB
MOVR1,#0x01;R1=0x1
ADDSR2,R0,R1
;R2=R0+R1=0xFFFFFFFFF+0x1=00000000,C=1,Z=1
Example4:
LDRR0,=0xFFFF126F;R0=0xFFFF126F
LDRR1,=0xFFFF46D4;R1=0xFFFF46D4
ADDSR2,R0,R1
;R1=0xFFFF126F+0xFFFF46D5=0xFFFF5943,C=1,Z=0
Signedaddition:
Inadditionofsignednumbers,thestatusofV,Z,andNmustbenoted.Specialattentionshouldbegiventotheoverflowflag(V)sincethisindicatesifthereisanerrorintheresultoftheaddition.TherearetworulesforsettingVinsignednumberoperation.Theoverflowflagissetto1:
1.IfthereisacarryfromD30toD31andnocarryfromD31outina32-bitoperation
2.IfthereisacarryfromD31outandnocarryfromD30toD31ina32-bitoperation
NoticethatifthereisacarrybothfromD31outandfromD30toD31,thenV=0in32-bitoperations.
Example5:
MOVR1,#+8;R1=0x00000008
MOVR2,#+4;R2=0x00000004
ADDSR3,R1,R2;R3=0x0000000CN=0,V=0,C=0
NoticeN=D31=0sincetheresultispositiveandV=0sincethereisneitheracarryfromD30toD31noranycarrybeyondD31.SinceV=0,theresultiscorrect[(+8)+(+4)=(+12)].
Example6:
LDRR1,=0x+42FFFFFF;R1=0x42FFFFFF,(apositivenumber)
LDRR2,=0x+45FFFFFF;R2=0x45FFFFFF(apositivenumber)
ADDSR3,R2,R1;R3=0x88FFFFFE,C=0,N=1,Z=0,andV=1
InExample6,thecorrectresultshouldbe+,buttheresultis–.TheV=1isanindicationofthiserror.NoticethatN=D31=1sincetheresultisnegative;V=1sincethereisacarryfromD30toD31andC=0.
Example7:
LDRR0,=-12;R0=0xFFFFFFF4
LDRR1,=+17;R1=0x00000011
ADDSR2,R1,R2;R2=00000005(whichis+5andcorrect)
;N=0,Z=0,V=0,andC=1
NoticethatV=0sincethereisacarryfromD30toD31andacarryfromD31out.
Example8:
LDRR0,=-30;R0=0xFFFFFFE2
LDRR1,=+14;R2=0x0000000E
ADDSR2,R1,R0;R2=FFFFFFF0(whichis-16andcorrect)
;N=1,Z=0,V=0,andC=0
V=0sincethereisnocarryfromD31outnoranycarryfromD30toD31.
Example9:
LDRR1,=-126;R1=0xFFFFFF82
LDRR2,=-127;R2=0xFFFFFF81
ADDSR3,R2,R1;R3=0xFFFFFF03
;(whichis-253andcorrect),N=1,Z=0andV=0,C=1
V=0sincethereiscarryfromD31outandcarryfromD30toD31.
ADRLoadPC-RelativeAddress
Flags:Unaffected:
Format:ADRRd,label;Rd=addressoflabel
Function:ThisallowsloadingintoRdregisteranaddressrelativetothecurrentPC(programcounter).Thelabeltargetaddressmustbewithinthe-4,095to+4,096bytesfromtheaddressinPCregister.Thatisnofartherthan1024instructionsineitherdirectionofbackwardorforward.
Example:
ADRR3,MyMessage
HEREBHERE
MyMessageDCB“Hello”
ANDLogicalAND
Flags:Unaffected
Format:ANDRd,Rn,Op2;Rd=RnANDedOp2
Function:PerformslogicalANDontheoperands,bitbybit,storingtheresultinthedestination.Thiswillnotupdatetheflags.ToupdatetheflagswemustuseANDS.NoticethatCflagisupdatedduringcalculationofOp2whenLSRorLSLareused.
Inputs Output
X Y XANDY
0 0 0
0 1 0
1 0 0
1 1 1
Example1:
MOVR0,#0x39;R0=0x39
MOVR1,#0x0F;R1=0x0F
ANDR2,R1,R0;R2=09
;3900111001
;0F00001111
;—–––
;0900001001Flagsunchanged
Example2:
MOVR0,#0x37;R0=0x37
ANDR1,R0,#0x0F;R1=R0ANDed0x0F=07
;3700110111
;0F00001111
;—–––
;0700000111Flagsunchanged
ANDSLogicalAND(updateflags)
Flags:Affected:N,Z,C
Format:ANDRd,Rn,Op2;Rd=RnANDedOp2
Function:PerformslogicalANDontheoperands,bitbybit,storingtheresultinthedestination.Thiswillalsoupdatetheflags.NoticethatCflagisupdatedduringcalculationofOp2whenLSRorLSLareused.
Inputs Output
X Y XANDY
0 0 0
0 1 0
1 0 0
1 1 1
Example1:
MOVR0,#0x39;R0=0x00000039
MOVR1,#0x0F;R1=0x0000000F
ANDSR3,R1,R0;R3=0x00000009
;3900111001
;0F00001111
;—–––
;0900001001Z=0,N=0,C=unchanged
Example2:
MOVR1,#0x39;R1=0x00000039
MOVR2,#0xC6;R2=0x000000C6
ANDSR3,R1,R2;R3=00000000
;3900111001
;C611000110
;—–––
;0000000000Z=1,N=0
Example3:
LDRR2,=0xFFFFFF82
LDRR3,=0xFFFFFF81
ANDSR4,R2,R3;R4=0xFFFFFF80,Z=0,N=1
Example4:
LDRR1,=0x55555555
LDRR2,=0xAAAAAAAA
ANDSR3,R2,R1;R3=00000000Z=1,N=0
ASRArithmeticShiftright
Flags:Unaffected.ExceptC
Format:ASRRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsfilledwiththesignbit(MSB).ThenumberofbitstobeshiftedrightisgivenbyRnandtheresultisplacedinRdregister.Theflagsareunchanged.ToupdatetheflagsuseASRSinstruction.
Example1:
LDRR2,=0xFFFFFF82
ASRR0,R2,#6;R0=R2isshiftedright6times
;now,R0=0xFFFFFFFE
Example2:
LDRR0,=0x2000FF18
MOVR1,#12
ASRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.
;now,R2=0x0002000F
Example3:
LDRR0,=0x0000FF18
MOVR1,#16
ASRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;now,R2=0x00000000
ASRarithmeticshiftisusedforsignednumbershifting.ASRessentiallydividesRmbyapowerof2foreachbitshift.
ASRSArithmeticShiftright(updatetheflags)
Flags:N,Z,andC.
Format:ASRRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsfilledwiththesignbit(MSB).ThenumberofbitstobeshiftedrightisgivenbyRnandtheresultisplacedinRdregister.TheASRSupdatestheflags.
Example1:
LDRR2,=0xFFFFFF82
ASRSR0,R2,#6;R0=R2isshiftedright6times.
;now,R0=0xFFFFFFFE,C=0,N=1,Z=0
Example2:
LDRR0,=0x2000FF18
MOVR1,#12
ASRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.
;now,R2=0x0002000F,C=1,N=0,Z=0
Example3:
LDRR0,=0x0000FF18
MOVR1,#16
ASRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.
;now,R2=0x00000000,C=1,N=0,Z=1
ASRSarithmeticshiftisusedforsignednumbershifting.ASRSessentiallydividesRmbyapowerof2foreachbitshift.
BBranch(unconditionaljump)
Flags:Unchanged.
Format:Btarget;jumptotargetaddress
Function:Thisinstructionisusedtotransfercontrolunconditionallytoanewaddress.ThedifferencebetweenBandBListhattheBLinstructionsavestheaddressofthenextinstructiontoLR(thelinkregister,R14).ForARM7,thetargetaddressiscalculatedby(a)shiftingthe24-bitsigned(2’scomp)offsetlefttwobits,(b)sign-extendtheresultto32-bit,and(c)addittocontentsofPC(programcounter).Thismeansthetargetaddresscouldbewithinthe–32Mbytesto+32Mbytesofaddressspacefromthecurrentprogramcounter.ForARMCortexM3.thetargetaddressmustbewithin–16MBto+16MBaddressspacefromcurrentinstruction.
BxxBranchConditional
Flags:Unaffected.
Format:Bxxtarget;jumptotargetuponcondition
Function:Usedtojumptoatargetaddressifcertainconditionsaremet.InARM7,thetargetaddresscannotbemorethan–32MBto+32MBbytesaway.ForARMCortexM3.thetargetaddressmustbewithin–16MBto+16MBaddressspacefromcurrentinstruction.Theconditionsareindicatedbytheflagregister.Theconditionsthatdeterminewhetherthejumptakesplacecanbecategorizedintothreegroups:
1.flagvalues,
2.thecomparisonofunsignednumbers,and
3.thecomparisonofsignednumbers.
Eachisexplainednext.
1.“Bcondition”wheretheconditionreferstoflagvalues.Thestatusofeachbitof
theflagregisterhasbeendecidedbyexecutionofinstructionspriortothejump.Thefollowing“Bcondition”instructionscheckifacertainflagbitisraisedornot.
Instruction Condition
BCS BranchifCarrySet jumpifC=1
BCC BranchifCarryClear jumpifC=0
BEQ BranchifEqual jumpifZ=1
BNE BranchifNotEqual jumpifZ=0
BMI BranchifMinus/Negative jumpifN=1
BPL BranchifPlus/Positive jumpifN=0
BVS BranchifOverflow jumpifV=1
BVC BranchifNooverflow jumpifV=0
2.“Bcondition”wheretheconditionreferstothecomparisonofunsignednumbers.Afteracompare(CMPRn,Op2)instructionisexecuted,CandZindicatetheresultofthecomparison,asfollows:
C Z
Rn>Op2 1 0
Rn=Op2 1 1
Rn<Op2 0 0
Sincetheoperandscomparedareviewedasunsignednumbers,thefollowing“Bcondition”instructionsareused.
Instruction Condition
BHI BranchifHigher jumpifC=1andZ=0
BEQ BranchifEqual jumpifC=1andZ=1
BLS BranchifLowerorsame jumpifC=0orZ=1
Inreality,the“CMPRn,Op2”isasubtractinstruction(Rn-Op2).Afterthesubtractiontheresultisdiscardedandflagsarechangedaccordingtotheresult.NoticeinARMthesubtractaffectstheCflagsettingdifferentlyfromthex86andotherCPUs.SeetheSUB
instruction.
3.“Bcondition”wheretheconditionreferstothecomparisonofsignednumbers.Inthecaseofthesignednumbercomparison,althoughthesameinstruction,“CMPRn,Op2”,isused,theflagsusedtochecktheresultareasfollows:
Rn>Op2 V=NorZ=0
Rn=Op2 Z=1
Rn<Op2 VinverseofN
Consequently,the“Bcondition”instructionsusedaredifferent.Theyareasfollows:
Instruction
BGE BranchGreaterorEqualjumpifN=1andV=1orN=0andV=0
(V=N)
BLT BranchLessthanjumpifN=1andV=0orN=0andV=1
(NnotequaltoV)
BGT BranchGreaterthanjumpifZ=0andeitherN=1andV=1orN=0and
V=0
(N=V)
BLE BranchLessorEqualjumpifZ=1orN=1andV=0.OrN=0andV=1
(Z=1orNnotequaltoV)
BEQ BranchifEqual jumpifZ=1
All“Bcondition”instructionsareshortjumps,meaningthatthetargetaddresscannotbemore than -32Mbytesbackwardor+32Mbytes forward from thePCof the instructionfollowingthejump.InARMCortexM3itis16MBineachdirection.Whathappensifaprogrammerneedstousea“Bcondition”togotoatargetaddressbeyondthe-32MBto+32MB range?The solution is to use the “BX condition,Rm” sinceRm can be 32-bitaddressandcoverstheentire4GBaddressspaceoftheARM.Thisisshownnext.
LDRR4,=MYTARGET
ADDSR1,R2,R3
BXEQR4;branchtoaddressheldbyR4ifZ=1
MYTRGTSUBSR7,#4
NOP
NOP
….
C Z N V
Rn>Op2 0 0 0 N
Rn=Op2 0 1 0 N
Rn<Op2 1 0 1 InverseofN
BFCBitFieldClear
Flags:Unaffected.
Format:BFCRd,#LSB,#Width
Function:ClearsselectedbitsofRd.ThestartlocationoftheRdbitisindicatedby#LSBandmustbeintherangeof0–31.Howmanybitsshouldbeclearedisindicatedby#Widthandmustbeintherangeof1–32.
Example1:
LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF
BFCR1,#2,#14;nowR1=0xFFFF0003
Example2:
LDRR2,=0x999999999;R2=0x99999999
BFCR2,#8,#24;nowR2=0x00000099
BFIBitFieldInsert
Flags:Unaffected.
Format:BFIRd,Rn,#LSB,#Width
Function:SelectedbitsofRnarecopiedtoRd.ThestartlocationoftheRdbitisindicatedby#LSBandmustbeintherangeof0–31.Howmanybitsshouldbecopiedisindicated by #Width andmust be in the range of 1–32. The start bit location of Rn isalwaysbit0(D0).
Example:
LDRR1,=0xABCDABCD;R1=0xABCDABCD
LDRR2,=0x12345678;R2=0x12345678
BFIR1,R2,#4,#8;nowR1=0xABCDA78D
…
BICBitClear
Flags:Unaffected.
Format:BICRd,Rn,Op2;Rd=RnANDedwithNOTofOp2
Function:SelectedbitsofRnareclearedandplacedinRd.TheOp2providesthebitsselection.IftheselectedbitsinOp2arehighthencorrespondingbitsinRnareclearedandtheresultisplacedinRd.IftheselectedbitsinOp2arelowthecorrespondingbitsinRnareleftunchangedandtheresultisplacedinRd.Inreality,theBICperformstheANDoperation on the bits ofRnwith the complement of the bits inOp2. TheBICwill notupdatetheflags.ToupdatetheflagswemustuseBICS.
Inputs Output
X Y XAND(NOTY)
0 0 0
0 1 0
1 0 1
1 1 0
Example:
LDRR1,=0xFFFFFF00;R1=0xFFFFFF00
LDRR2,=0x99999999;R2=0x9999999
BICR3,R2,R1;nowR3=0x00000099
BICSBitClear(updateflags)
Flags:NandZ.
Format:BICSRd,Rn,Op2;Rd=RnANDedwithNOTofOp2
Function:SelectedbitsofRnareclearedandplacedinRd.TheOp2providestheselectedbits.IftheselectedbitsinOp2ishighthenthecorrespondingbitsinRnisclearedandtheresultisplacedinRd.IftheselectedbitsinOp2islowthemthecorrespondingbitsinRnisleftunchangedandtheresultisplacedinRd.Inreality,theBICSperformstheANDoperationonthebitsofRnwiththecomplementofthebitsinOp2andupdatestheflags.
Inputs Output
X Y XAND(NOTY)
0 0 0
0 1 0
1 0 1
1 1 0
Example1:
LDRR1,=0xFFFFFF00;R1=0xFFFFFF00
LDRR2,=0x99999999;R2=0x9999999
BICSR3,R2,R1;nowR3=0x00000099,N=0,Z=0
Example2:
LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF
LDRR1,=0x01234567;R1=0x01234567
BICSR2,R1,R0;nowR2=0,N=0,Z=1
Example3:
MOVR0,#0;R0=0
LDRR1=0x99999999;R1=0x99999999
BICSR2,R1,R0;nowR2=0x99999999,N=1,Z=0
BKPTBreakpoint
Flags:Unaffected.
Format:BKPT#imme_value
Function:usedbycompilertoinsertbreakpointintoprograms.UponexecutionoftheBKPTinstructiontheprogramenterstheDebugmode.SeeyourARM compilerformoreinformation
BLBranchwithLink(thisisCallinstruction)
Flags:Unchanged.
Format:BLSubroutine_Addr;transfercontroltoasubroutine
Function:Transferscontroltoasubroutine.ThisinstructionsavestheaddressoftheinstructionaftertheBLinR14(linkregister).Attheendofthesubroutinethecontrolto the instruction after theBL is achieved by copying the LR (R14) register to PC. InARM7,thetargetaddresscannotbemorethan–32MBto+32MBbytesaway.ForARMCortexM3. the target address must bewithin –16MB to +16MB address space fromcurrentinstruction.
Example:
LDRR7,=20000000
BLDELAY;CallsubroutineMY_DELAY
ADDR3,#4;addressofthisinstructionissavedinR14
…
…
DELAYSUBSR7,#4
NOP
NOP
MOVPC,R14;Return,couldhaveused“BXLR”instruction
BLXBranchIndirectwithLink
Flags:Unaffected.
Format:BLXRm;transfercontroltoasubroutinewhose
;addressisgivenbyRm
Function: Transfers control to a subroutinewhoseaddress isgivenby theRmregister. This instruction saves the address of the instruction after the BL in R14 (linkregister). At the end of the subroutine the control to the instruction after the BL isachieved by copying the LR (R14) register to PC. One can use “BX LR” as returninstruction. Notice the difference between this instruction and “BL Target_Addr”instruction. In the “BL Target_Addr” instruction the target address of the subroutine isgiven right there. However, in the “BLX Rm” instruction, the target address of thesubroutineisheldbyregisterRm.
Example:
ADRR2,DELAY
BLXR2;CallsubroutinepointedtobyR2
ADDR3,#4;addressofthisinstructionissavedinR14
…
…
DELAYSUBSR1,#4
NOP
NOP
BXLR;return
BXBranchIndirect(BXLRisusedforReturn)
Flags:Unchanged.
Format:BXRm;BXLRisusedforReturnfromasubroutine
Function:Themostwidelyusageofthisinstructionisintheformof“BXLR”for
thepurposeofreturninstructionattheendofsubroutine.
Example:
LDRR1,=20000000
BLDELAY;CallsubroutineMY_DELAY
ADDR3,#4;addressofthisinstr.issavedinR14
…
…
DELAYSUBSR1,#4
NOP
NOP
BXLR;returntocaller
CBNZCompareandBranchonNon-Zero
Flags:Unchanged.
Format:CBNZRn,Target
Function:TransferscontroltothetargetlocationifRnisnotequaltozero.TheRnmustbeintherangeofR0–R7andtargetaddresscannotbefartherthan130bytesawayfromtheinstruction.ThisinstructioncomparestheRnwithzeroandjumpsonlyifRnisnotzero.Thecomparisonhasnoeffectonflags.Thiscanbeusedforloopsinwhichthebodyoftheloopisnomorethan20instructions.
Example1:
MOVR1,#10;R1=10
L1NOP
NOP
NOP
SUBR1,R1,1;R1=R1-1
CBNZR1,L1
CBZCompareandBranchonZero
Flags:Unaffected.
Format:CBZRn,Target
Function:TransferscontroltothetargetlocationifRniszero.TheRnmustbeinthe rangeofR0–R7and target address cannot be farther than130bytes away from theinstruction.ThisinstructioncomparestheRnwithzeroandjumpsonlyifRniszero.Thecomparisonhasnoeffectonflags.Thiscanbeusedtotestaregistervalueafterreadinga
port.
Example1:
LDRR0,=MYPORT_ADR;R0=MYPORTaddress
HERELDRR2,[R0];readfromMYPORT
CBZR2,HERE;keepreadingMYPORTuntilitiszero
CDPCoprocessorDataprocessing
SeeARMCortex-MManual.
CLREXClearExclusive
SeeARMCortex-MManual.
CLZCountLeadingZero
Flags:Unchanged.
Format:CLZRd,Rn
Function:ScanstheRnregistercontentsfrommostsignificantbit(D31)towardleastsignificantbit(D0)untilitfindthefirstHIGH.ThenumberofbinaryzerobitsbeforeitencountersthefirstbinaryHIGHisplacedinRd.
Example:
LDRR3,=0x01FFFFFF
CLZR1,R3;R1=7sincethereare7zerosbeforethefirstbinary1
CMNCompareNegative
Flags:Affected:V,N,Z,C.
Format:CMNRn,Op2;setsflagsasif“Rn+Op2”
;Notice,theRn-(-Op2)=Rn+Op2
Function:ComparesRnregistervaluewiththenegativeofOp2value.ThisisdonebyRn-(negativeofOp2)whichisRn-(-Op2)=Rn+Op2.TheRnandOp2operandsarenotaltered. In other words, the CMN adds the Op2 to Rn (Rn+Op2) and sets the flagsaccordingly.ThisisthesameasADDSinstructionexcepttheoperandsareunchangedandtheresultisdiscarded.SeeBxxinstructionforpossiblecasesofcomparison.
CMPCompare
Flags:Affected:V,N,Z,C.
Format:CMPRn,Op2;setsflagsasif“Rn-Op2”
Function:Comparestwooperands.Theoperandsarenotaltered.PerformscomparisonbysubtractingtheOp2operandfromtheRnandupdatesflagsasifSUBSwereperformed.AswecanseeinSUBS,theCMPperformtheoperationofRn+2’s
compofOp2andsetstheflagsaccordingtotheresult.SeeBxxinstructionforpossiblecasesofcomparison.
CPSIDChangeprocessorIDandDisableInterrupt
Flags:Unaffected
Format:CPSIDiflag;iflagisiinPRIMASKorfinFAULTMASK
Function:UsedfordisablingtheinterruptflagsinPRIMASKorFAULTMASKregisters.SeeARMCortexmanual.
CPSIEChangeProcessorStateandEnableInterrupt
Flags:Unaffected
Format:CPSIEiflag;iflagisiinPRIMASKorfinFAULTMASK
Function:UsedforenablingtheinterruptflagsinPRIMSKorFAULTMASKregisters.SeeARMCortexmanual.
DMBDataMemoryBarrier
Flags:Unaffected
Format:DMB
Function:ItmakessurethatalltheexplicitmemoryaccessespriortoDMBinstructionarecompletedbeforetheexplicitmemoryaccessesaftertheDMB.SeeARMCortexmanual.
DSBDataSynchronizationBarrier
Flags:Unaffected
Format:DSB
Function:ItmakessurethatalltheexplicitmemoryaccessespriortoDSBinstructionarecompletedbeforetheDSBinstructionisexecuted.SeeARMCortexmanual.
EORExclusiveOR
Flags:Unaffected
Format:EORRd,Rn,Op2
Function:PerformslogicalEx-ORontheRnandOp2operands,bitbybit,storingtheresultintheRd.Thiswillnotupdatetheflags.UseEORSinstructiontoupdatestheflags.
Inputs Output
X Y XEORY
0 0 0
0 1 1
1 0 1
1 1 0
Example1:
MOVR0,#0xAA;R0=0xAA
EORR2,R0,#0xFF;now,R2=0x55
;AA10101010
;FF11111111
;—–––
;5501010101flagsunchanged
Example2:
LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA
LDRR1,=0x55555555;R1=0x55555555
EORR2,R1,R0;R2=0xFFFFFFFF
;AA10101010
;5501010101
;—–––
;FF11111111flagsunchanged
The“EORRd,Rx,Rx”canbeusedtoclearRd.
Example3:
MOVR1,#0x55
EORR2,R1,R1;R2=0
;5501010101
;5501010101
;—–––
;0000000000flagsunchanged
TocomplementthebitsofRn,EX-ORitwith0xFF.
Example4:
LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA
LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF
EORR2,R1,R0;R2=0x55555555
;AA10101010
;FF11111111
;—–––
;5501010101flagsunchanged
EORSExclusiveORandupdatetheflags
Flags:Affected:C,V,N,Z
Format:EORSRd,Rn,Op2
Function:PerformslogicalEx-ORontheRnandOp2operands,bitbybit,storingtheresultintheRd.Aftertheexecutiontheflagsareupdated.
Inputs Output
X Y XEORY
0 0 0
0 1 1
1 0 1
1 1 0
Example1:
MOVR0,#0xAA;R0=0xAA
EORSR2,R0,#0xFF;now,R2=0x55
;AA10101010
;FF11111111
;—–––
;5501010101N=0,C=0,Z=0,V=0
Example2:
LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA
LDRR1,=0x55555555;R1=0x55555555
EORSR2,R1,R0;R2=0xFFFFFFFF
;AA10101010
;5501010101
;—–––
;FF11111111N=1,C=0,Z=0,V=0
The“EORRd,Rx,Rx”canbeusedtoclearRd.
Example3:
MOVR1,#0x55
EORSR2,R1,R1;R2=0x0
;5501010101
;5501010101
;—–––
;0000000000N=0,C=0,Z=1,V=0
ISBInstructionSynchronizationBarrier
Flags:Unaffected.
Format:ISB
Function:ItflushesthepipelinetomakesuretheinstructionsexecutedrightaftertheISBinstructionarefetchedfreshfromthecacheormemory.
ITIf-ThenConditionBlock
Flags:Unaffected
Format:SeeARMmanual
Function:ItallowstheexecutionofuptofourinstructionaftertheITtobeconditional.
LDCLoadCoprocessor
SeetheARMManual
LDMLoadMultipleregisters
Flags:Unaffected.
Format:LDMRn,{Rx,Ry,..}
Function: Loadsintoregistersfromconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thedestinationregistersseparatedbycommaandplacedinbraces.IntheARMCortex,thestackisdescendingmeaningthatasinformationispushedontostackthestackpointerisdecremented.ThisIA(IncrementtheaddressaftereachAccess)isthedefaultforloading(Poping).ThisinstructioniswidelyusedforPoping(loading)multiplewordsfromdescendingstackintoCPUregisters.
Example:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
;12004=(56)
;12005=(50)
;12006=(58)
;12007=(15)
;12008=(63)
;12009=(60)
;1200A=(68)
;1200B=(39)
;1200C=(79)
;1200D=(70)
;1200E=(75)
;1200F=(92)
LDRR7,=0x12000
LDMR7,{R0,R2,R4}
;now,R0=0x82381046,R2=0x15585056,…
;thecontentsofmemorylocations0x12000-0x12003are
;movedtoregisterR0,andthecontentsofmemory
;locations0x12004-0x12007aremovedtoregister
;R2,andsoon.ThereforewehaveR0=0x82381046,
;R2=0x15585056,andR4=0x39686063.
LDMDBLoadMultipleregistersandDecrementBeforeeachaccess
Flags:Unaffected.
Format:LDMDBRn,{Rx,Ry,…}
Function:ThisisthesameasLDMEA(loadmultipleregistersfromEmptyAscending)usedforcasesinwhichthestackisascending.SeeLDMEAinstruction.
LDMEALoadMultipleregistersfromEmptyAscending
Flags:Unaffected.
Format:LDMEARn,{Rx,Ry,…}
Function:Loadsintoregistersfromconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thedestinationregistersseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultforstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.TheIA(IncrementtheaddressaftereachAccess)isthedefault.IfwechangethedefaultofdescendingstacktoascendingstackthenwehavetousetheEA(EmptyAscending).Theascendingstackmeansasinformationarepushedontostackthestackpointerisincremented.TheLDMEAisusedforPoping(loading)multiplewordsfromascendingstackintoCPUregisters.
LDMFDLoadMultipleregistersFullDescending
Flags:Unaffected.
Format:LDMFDRn,{Rx,Ry,..}
Function:ThisisthesameasLDMandLDMIA.
LDMIALoadMultipleregistersandIncrementaftereachAccess
Flags:Unaffected.
Format:LDMRn,{Rx,Ry,..}
Function: This is the same as the LDM instructions. In theARMCortex, the stack isdescending meaning that as information are pushed onto stack the stack pointer isdecremented.ThisIA(IncrementtheaddressaftereachAccess)isthedefault.WeusethisforPoping(loading)multiplewordsfromdescendingstackintoCPUregisters.
LDRLoadRegister
Flags:Unaffected.
Format:LDRRd,[Rx];loadintoRdawordfrommemory
;locationpointedtobeRx
Function: Loadsintodestinationregisterthecontentsoffourmemorylocations.The [Rx]points to addressofmemory location.This iswidelyused to load32-bit datafrommemoryintoRdregisteroftheARMsinceinthe“MOVRd,#immediate_value”theimmediatevaluecannotbelargerthan0xFF.
Example:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
LDRR0,=0x12000
LDRR1,[R0]
;now,R1=82381046.
LDRRx,=ValueLoadRegisterwith32-bitvalue
Flags:Unaffected.
Format:LDRRd,=32_bit_value;loadRdwith32-bitvalue
Function:Loadsintodestinationregistera32-bitimmediatevalue.Thisiswidelyused to load 32-bit immediate value into Rd register of the ARM since in the “MOVRd,#immediate_value”theimmediatevaluecannotbelargerthan0xFF.
Example:
LDRR0,=0x1200000;R0=0x1200000
LDRR1,=0x2FFFF;R1=0x2FFFF
LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF
LDRR1,=200000000;R1=200000000
LDRBLoadRegisterByte
Flags:Unaffected.
Format:LDRBRd,[Rx];loadintoRdabytefrommemory
;locationpointedtobeRx
Function:LoadsintodestinationregisterthecontentsofasinglememorylocationindicatedbyRx.
Example:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
LDRR0,=0x12000
LDRBR1,[R0]
;now,R0=00000046
LDRBTLoadRegisterBytewithTranslation
Flags:Unaffected.
Format:LDRBTRd,[Rx]
Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRx
andzero-extendsthebyteto32-bitword.Thatmeansazeroiscopiedtoalltheupper24bitsoftheRdregister.Usedforunprivilegedmemoryaccess.
Example:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
LDRR0,=0x12000
LDRBTR1,[R0];now,R1=00000046
LDREX,LDREXB,LDREXHLoadRegisterExclusive
Function:TheyareusedwithSTREX,STREXB,andSTREXHinstructionstoperformCPUsynchronizationoperation.SeeARMCortexmanual.
LDRHLoadRegisterHalfword
Flags:Unaffected.
Format:LDRHRd,[Rx];loadintoRda2-bytefrommemory
;locationpointedtobeRx
Function:Loadsintodestinationregisterthecontentsofthetwoconsecutivememorylocations(halfword)indicatedbyRx.
Example:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
LDRR0,=0x12000
LDRHR1,[R0]
;now,R0=00001046
LDRSBLoadRegistersignedByte
Flags:Unaffected.
Format:LDRSBRd,[Rx]
Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRxandsign-extendsthebyteto32-bitword.Thatmeansthesign(D7)ofthebyteiscopiedtoalltheupper24bitsoftheRdregister.
Example1:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(85)
;12001=(10)
;12002=(38)
;12003=(82)
LDRR0,=0x12000
LDRBR1,[R0];nowR1=FFFFFF85becauseMSBof85is1
Example2:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(15)
;12001=(20)
;12002=(3F)
;12003=(82)
LDRR0,=0x12000
LDRBR1,[R0];now,R1=00000015becauseMSBof15is0
LDRSHLoadRegisterSignedHalfword
Flags:Unaffected.
Format:LDRSHRd,[Rx]
Function:LoadsintoRdregisterahalf-word(2-byte)frommemorylocationpointedtobyRxandsign-extendsitto32-bitword.Thatmeansthesign(D15)ofthe16-bitoperandiscopiedtoalltheupper16bitsoftheRdregister.
Example1:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(46)
;12001=(F3)
;12002=(38)
;12003=(82)
LDRR0,=0x12000
LDRBR1,[R0];now,R0=FFFFF346becauseMSBofF3is1
Example2:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(4F)
;12001=(23)
;12002=(18)
;12003=(B2)
LDRR0,=0x12000
LDRBR1,[R0];now,R1=0000234FbecauseMSBof23is0
LDRTLoadRegisterwithTranslation
Flags:Unaffected
Format:LDRTRd,[Rx]
Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRxandzero-extendsthebyteto32-bitword.Thatmeansazeroiscopiedtoalltheupper24bitsoftheRdregister.Usedforunprivilegedmemoryaccess.
Example:
;Assumethefollowingmemorylocationswiththecontents:
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
LDRR0,=0x12000
LDRBR1,[R0];now,R1=00000046
LSLLogicalShiftLeft
Flags:Unaffected.
Format:LSLRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedleft,theMSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLdoesnotupdatestheflags.
Example1:
LDRR2,=0x00000010
LSLR0,R2,#8;R0=R2isshiftedleft8times
;now,R0=0x00001000,flagsnotchanged
Example2:
LDRR0,=0x00000018
MOVR1,#12
LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes
;now,R2=0x000018000,flagsnotchanged
Example3:
LDRR0,=0x0000FF18
MOVR1,#16
LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes
;now,R2=0xFF180000,flagsnotchanged
Thelogicalshiftleftusedforunsignednumbershifting.LSLessentiallymultipliesRmbyapowerof2foreachbitshift.
LSLSLogicalShiftLeft(updatetheflags)
Flags:Affected.
Format:LSLSRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedleft,theMSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLSupdatestheflags.
Example1:
LDRR2,=0x00000010
LSLSR0,R2,#8;R0=R2isshiftedleft8times
;now,R0=0x00001000,C=0,N=0,Z=0
Example2:
LDRR0,=0x00000018
MOVR1,#12
LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes
;now,R2=0x000018000,C=0,N=0,Z=0
Example3:
LDRR0,=0x000FFF18
MOVR1,#16
LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes
;now,R2=0xFF180000,C=1,Z=0,N=0
Thelogicalshiftleftusedforunsignednumbershifting.LSLSessentiallymultipliesRmbyapowerof2foreachbitshift.
LSRLogicalShiftRight
Flags:Unaffected.
Format:LSRRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRdoesnotupdatetheflags.
Example1:
LDRR2,=0x00001000
LSRR0,R2,#8;R0=R2isshiftedright8times
;now,R0=0x00000010,C=0
Example2:
LDRR0,=0x000018000
MOVR1,#12
LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;now,R2=0x00000018,C=0
Example3:
LDRR0,=0x7F180000
MOVR1,#16
LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;now,R2=0x00007F18,C=0
Thelogicalshiftrightusedforshiftingunsignednumbers.LSRessentiallydividesRmbyapowerof2foreachbitshift.
LSRSLogicalShiftRight(updatetheflags)
Flags:Affected.
Format:LSRSRd,Rm,Rn
Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRSupdatestheflags.
Example1:
LDRR2,=0x00001FFF
LSRSR0,R2,#8;R0=R2isshiftedright8times
;now,R0=0x0000001F,C=1,N=0,Z=0
Example2:
LDRR0,=0x00000018
MOVR1,#12
LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;now,R2=0x000000000,C=0,N=0,Z=1,
Example3:
LDRR0,=0x000FFF18
MOVR1,#16
LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes
;Now,R2=0x0000000F,C=1,Z=0,N=0
Thelogicalshiftrightusedforshiftingunsignednumbers.LSRSessentiallydividesRmbyapowerof2foreachbitshift.
MCRMovetoCoprocessorfromARMRegister
SeeARMManual.
MLAMultiplyAccumulate
Flags:Unaffected
Format:MLARd,Rs1,Rs2,Rs3;Rd=(Rs1×Rs2)+Rs3
Function:MultipliesanunsignedwordheldbyRs1byaunsignedwordinRs2andtheresultisaddedtoRs3andplacedinRd.
Example:
MOVR0,#0x20;R0=0x20
MOVR1,#0x50;R1=0x50
MOVR2,#0x10;R2=0x10
MLAR4,R0,R1,R2;nowR4=(0x20×0x50)+10=0xA10
MLSMultiplyandSubtract
Flags:Unaffected
Format:MLSRd,Rm,Rs,Rn;Rd=Rn-(Rs×Rm)
Function:MultipliesanunsignedwordheldbyRmbyaunsignedwordinRsandtheresultissubtractedfromRnandplacedinRd.
Example:
MOVR0,#0x20;R0=0x20
MOVR1,#0x50;R1=0x50
LDRR2,=0x1000;R2=0x1000
MLSR4,R0,R1,R2;nowR4=0x1000-(0x20×0x50)=0x600
MOVMove(ARM7)
Flags:Unaffected.
Format:MOVRd,#imm_value;Rd=imm_Value<0x200
Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFF(0–255).Aftertheexecutiontheflagsarenotupdated.TheMOVSinstructionupdatestheflags.
Example1:
MOVR0,#0x25;R0=0x25
MOVR1,#0x5F;R1=0x5F
ToloadtheARMregisterwithvaluelargerthan0xFFwemustusethe“LDRRd,=32_bit_data.”Forexample,wecanuseLDRR2,=0xFFFFFFFF.
Example2:
LDRR0,=0x2000000;R0=0x2000000
MOVMove(ARMCortex)
Flags:Unaffected.
Format:MOVRd,#imm_val;Rd=imm_val(imm_val<0x10000)
Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).
Example:
MOVR1,#0xF459;R1=0xF459
ToloadtheARMregisterwithvaluelargerthan0xFFFFwemustusethe“LDRRd,=32_bit_data.”ForexamplewecanuseLDRR2,=0xFFFFFFF.
MOVSMove(andupdateflags)
Flags:Affected:C,N,Z
Format:MOVRd,#immediate_value
Function:LoadtheRdregisterwithanimmediatevalueandupdatetheflags.
Example:
MOVSR0,#0x25;R0=0x25,N=0,Z=0,andC=0
MOVSR0,#0x0;R0=0x0,N=0,Z=1,andC=0
MOVTMoveTop
Flags:Unaffected.
Format:MOVTRd,#imm_value;imm_value<0x10000
Function:Loadstheupper16-bitofRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).Thelower16-bitoftheRdregisterremainsunchanged.
Example:
LDRR0,=0x25579934;R0=0x25579934
MOVTR0,#0xAAAA;R0=0xAAAA9934
MOVWMove16-bitconstant
Flags:Unaffected.
Format:MOVWRd,#imm_value;imm_value<0x10000
Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).
Example:
MOVWR1,#0x5555;R1=0x5555
ToloadtheARMregisterwithvaluelargerthan0xFFFFwemustusethe“LDRRd,=32_bit_data.”ForexamplewecanuseLDRR2,=0xFFFFFFF.
MRCMovetoARMRegisterfromCoprocessor
SeeARMmanual
MRSMovetogeneralRegisterfromSpecialregister
Flags:Unaffected.
Format:MRSRd,special_reg;copyspecial_regtoRd
Function:Copiesthecontentsofaspecialfunctionregistertoageneral-purposeregister.ThisinstructionalongwiththeMSRiswidelyusedtomodifythespecialfunction
registerssuchasCONTROL,PRIMASK,andISPR.Thisistheonlywaywecanaccessthespecialfunctionregisters.
Example:
MRSR1,CONTROL;R1=CONTROL
ANDR1,#0x00;maskthelower8bits
MSRCONTROL,R1
MSRMovetoSpecialregisterfromgeneralRegister
Flags:Unaffected.
Format:MSRspecial_reg,Rn;copyspecial_regtoRn
Function:Copiesthecontentsofageneral-purposeregistertospecialfunctionregister.ThisinstructionalongwiththeMRSiswidelyusedtomodifythecontentsofspecialfunctionregisterssuchasCONTROL,PRIMASK,andISPR.Thisistheonlywaywecanaccessthespecialfunctionregisters.
Example:
MRSR1,CONTROL;R1=CONTROL
ANDR1,#0x00;maskthelower8bits
MSRCONTROL,R1;maskthelower8bitsofCONTROLreg.
MULUnsignedMultiplication
Flags:Affected:N,Z,Unaffected:C,V
Format:MULRd,Rn,Rm;Rd=Rn×Rm
Function:MultipliesawordinregisterRnbyawordinregisterRmandplacestheresultinRd.
Example1:
MOVR0,#100;R0=100
MOVR1,#200;R1=200
MULR3,R0,R1;R3=R0xR1=100x200=20000
Example2:
LDRR0,=10000;R0=10000
LDRR1,=20000;R1=20000
MULR3,R0,R1;R3=R0xR1=10000x20000=200000000
MVNMoveNegative
Flags:Unaffected.
Format:MVNRd,Op2;Rd=1’scomp.ofOp2
Function:PlacesinRdthenegation(the1’scomplement)ofOp2.EachbitofOp2isinverted(logicalNOT)andplacedinRdwhileflagsremainunchanged.
Example1:
MOVR0,#0xAA;R0=0xAA
MVNR2,R0;now,R2=0xFFFFFF55
Example2:
LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA
MVNR1,R0;R1=0x55555555
Example3:
MVNR0,#0x0F;R0=0xFFFFFFF0
Example4:
MVNR2,#0x0;R0=0xFFFFFFFFwidelyusedtoloadRxwithall1s
MVNSMoveNegativeandupdatetheflags
Flags:NandZareaffected
Format:MVNSRd,Op2;Rd=1’scomp.ofOp2andupdateflags
Function:PlacesinRdthenegation(the1’scomplement)ofOp2.EachbitofOp2isinverted(logicalNOT)andplacedinRdandNandZflagsareupdated.
Example1:
MOVR0,#0xAA;R0=0xAA
MVNSR2,R0;now,R2=0xFFFFFF55,N=1,Z=0
Example2:
LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA
MVNSR1,R0;R1=0x55555555,Z=0andN=0
Example3:
LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF
MVNSR1,R0;R1=0x00000000,N=0andZ=1
Example4:
MVNR2,0x0;R0=0xFFFFFFFF,N=1,Z=0
NOPNoOperation
Flags:Unaffected.
Format:NOP
Function:Performsnooperation.Sometimesusedfortimingdelaystowasteclockcycles.UpdatesPC(programcounter)topointtonextinstructionfollowingNOP.InsomeARMCPUs,thepipelineremovestheNOPbeforeitreachestheexecutionstage.
ORNLogicalORNot
Flags:Unaffected.
Format:ORNRd,Rn,Op2;Rd=RnORedwith1’scompofOp2
Function:PerformstheORoperationonthebitsofRnwiththecomplementofthebitsinOp2.TheORNwillnotupdatetheflags.ToupdatetheflagswemustuseORNS.
Inputs Output
A B AOR(NOTB)
0 0 1
0 1 0
1 0 1
1 1 1
Example1:
LDRR1,=0xFFFFFF00;R1=0xFFFFFF00
LDRR2,=0x99999999;R2=0x9999999
ORNR3,R2,R1;nowR3=0x999999FF
Example2:
MOVR1,#0;R1=0
LDRR0=0xFFFFFFFF;R0=0xFFFFFFFF
ORNR2,R1,R0;now,R2=0x0
ORNSORNotandupdateflags
Flags:NandZareaffected
Format:ORNSRd,Rn,Op2;Rd=RnORedwith1’scompofOp2
Function:PerformstheORoperationonthebitsofRnwiththecomplementofthebitsinOp2andupdatestheflags.TheresultisplacedinRd.
Inputs Output
AOR(NOT
A B B)
0 0 1
0 1 0
1 0 1
1 1 1
Example1:
LDRR1,=0xFFFFFF00;R1=0xFFFFFF00
LDRR2,=0x99999999;R2=0x9999999
ORNSR3,R2,R1;now,R3=0x999999FF,N=1,Z=0
Example2:
LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF
LDRR1,=0x01234567;R1=0x01234567
ORNSR2,R1,R0;now,R2=0x01234567,N=0,Z=0
Example3:
MOVR1,#0;R1=0
LDRR0=0xFFFFFFFF;R0=0xFFFFFFFF
ORNSR2,R1,R0;now,R2=0x0,N=0,Z=1
ORRLogicalOR
Flags:Unaffected
Format:ORRRd,Rn,Op2;Rd=RnORedOp2
Function:PerformslogicalORonthebitsofRnandOp2,andplacestheresultinRd.Oftenusedtoturnabiton.ORRwillnotupdatetheflags.
Example1:
MOVR0,#0xAA;R0=0xAA
ORRR2,R0,#0x55;now,R2=0xFF
Example2:
LDRR0,=0x00010203;R0=00010203
LDRR1,=0x30303030
ORRR2,R0,R1;R2=0x30313233
Example3:
LDRR0,=0x55555555;R0=0x55555555
LDRR1,=0xAAAAAAAA;R0=0xAAAAAAAA
ORRR2,R1,R0;R1=0xFFFFFFFF
ORRSLogicalORandupdatetheflags
Flags:NandZareaffected.
Format:ORRSRd,Rn,Op2;Rd=RnORedOp2,updatetheflags
Function:PerformslogicalORonthebitsofRnandOp2,andplacestheresultinRd.Oftenusedtoturnabiton.ORRSupdatestheflags.
Inputs Output
X Y XORY
0 0 0
0 1 1
1 0 1
1 1 1
Example1:
MOVR0,#0xAA;R0=0xAA
ORRSR2,R0,#0x55;now,R2=0xFF,N=0,Z=0
Example2:
LDRR0,=0x00010203;R0=00010203
LDRR1,=0x30303030
ORRSR2,R0,R1;R2=0x30313233N=0,Z=0
Example3:
LDRR0,=0x55555555;R0=0x55555555
LDRR1,=0xAAAAAAAA;R0=0xAAAAAAAA
ORRSR2,R1,R0;R1=0xFFFFFFFFN=1,Z=0
POPPOPregisterfromStack
Flags:Unaffected.
Format:POP{reg_list};reg_reg=wordsofftopofstack
Function:Copiesthewordspointedtobythestackpointertotheregistersindicated
by the reg_list and increments the SP by 4, 8, 12, 16,… depending on the number ofregistersinthereg_list.
Example:
POP{R1};POPthetopwordofstacktoR1
POP{R1,R4,R7};POPthetop3wordsofstacktoR1,R4,R7
POP{R2-R6};POPthetop5wordsofstacktoR2-R6
POP{R0,R5};POPthetop2wordsofstacktoR0andR5
POP{R0-R7};POPthetop8wordsofstacktoR0-R7
ThePOPinstructionissynonymsforLDMIA.
PUSHPUSHregisterontostack
Flags:Unaffected.
Format:PUSH{reg_list};PUSHreg_listontostack
Function:Copiesthecontentsofregistersstatedinreg_listontothestackanddecrementsSPby4,8,12,16,…dependingonthenumberofregistersinreg_list.
Example:
PUSH{R1};PUSHtheR1ontotopofstack
PUSH{R1,R4,R7};PUSHR1,R4,R7ontotopofstack
PUSH{R2-R6};PUSHtheR2,R3,R4,R5,R6ontotopofstack
PUSH{R0,R5};PUSHtheR0andR5ontotopofstack
PUSH{R0-R7};PUSHtheR0throughR7ontotopofstack
ThePUSHinstructionissynonymsforSTMDB.
RBITReverseBits
Flags:Unaffected.
Format:RBITRd,Rn;ReversethebitorderofRnandplaceinRd
Function:Reversesthebitpositionorderofthe32-bitvalueinRnregisterandplacetheresultinRd.
Example:
MOVR1,#0x5F
RBITR2,R1;now,R2=0xF5000000
REVReversebyteorderinaword
Flags:Unaffected
Format:REVRd,Rn;ReversethebyteofRnandplaceitinRd
Function:Reversesthebytepositionorderofthe32-bitvalueinRnregisterandplacestheresultinRd.Thiscanbeusedtoconvertfromlittleendiantobigendianorfrombigendiantolittleendian.
Example:
LDRR1,=0x12345678
REVR2,R1;now,R2=0x78564312
RV16Reversebyteorderin16-bit
Flags:Unaffected
Format:REV16Rd,Rn;ReversethebitsifRnandplaceitinRd
Function:Reversesthe16-bitpositionorderofthe32-bitvalueinRnregisterandplacestheresultinRd.Thiscanbeusedtoconvert16-bitlittleendiantobigendianorfrom16-bitbigendiantolittleendian.
Example:
LDRR1,=0x559922FF
RV16R2,R1;now,R2=0x22FF5599
REVSHReversebyteorderinbottomhalfwordandsignextend
Flags:Unaffected
Format:REVSHRd,Rn;Rd=ReversethebyteandsignextendRn
Function:Reversesthe16-bitpositionorderofRnregisterandaftersignextendingto32-bititisplacedinRd.Thiscanbeusedtoconvertasigned16-bitlittleendianto32-bitsignedbigendianorfromsigned16-bitbigendianto32-bitsignedlittleendian.
Example:
LDRR1,=0x559922FF
REVSHR2,R1;now,R2=0x22FF5599
RORRotateRight
Flags:Unaffected.
Format:RORRd,Rm,Rn;Rd=rotateRmrightRnbitpositions
Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andenterfromleftend(MSB).ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORdoesnotupdatetheflags.
Example1:
LDRR2,=0x00000010
RORR0,R2,#8;R0=R2isrotatedright8times
;now,R0=0x10000000,C=0
Example2:
LDRR0,=0x00000018
MOVR1,#12
RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes
;now,R2=0x01800000,C=0
Example3:
LDRR0,=0x0000FF18
MOVR1,#16
RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes
;Now,R2=0xFF180000,C=0
RORSRotateRight(updatetheflags)
Flags:C,N,Zareaffected.
Format:RORSRd,Rm,Rn;Rd=rotateRmrightRnbitpositions
Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andentersfromleftend(MSB).InadditionaseachbitexitstheLSB,acopyisofitisgiventoCflag.ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORSupdatestheflags.
Example1:
LDRR2,=0x00000010
RORSR0,R2,#8;R0=R2isrotatedright8times
;now,R0=0x01000000,C=0,N=0,Z=0
Example2:
LDRR0,=0x00000018
MOVR1,#12
RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes
;now,R2=0x01800000,C=0,N=0,Z=0
Example3:
LDRR0,=0x0000FF18
MOVR1,#16
RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes
;now,R2=0xFF180000,C=1,N=0,Z=0
RRXRotateRightwithextend
Flags:Unaffected.
Format:RRXRd,Rm;Rd=rotateRmright1bitposition
Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXdoesnotupdatetheflags.
Example:
LDRR2,=0x00000002
RRXR0,R2;R0=R2isshiftedrightonebit
;now,R0=0x00000001
RRXSRotateRightwithextend(updatetheflags)
Flags:Affected.
Format:RRXSRd,Rm;Rd=rotateRmright1bitposition
Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXSupdatestheflags.
Example1:
LDRR2,=0x00000002
RRXSR0,R2;R0=R2isshiftedrightonebit
;now,R0=0x00000001
RSBReverseSubtract
Flags:Unaffected
Format:RSBRd,Rn,Op2;Rd=Op2-Rn
Function:SubtractstheRnfromtheOp2andputstheresultintheRd.TheRSBhasnoeffectonflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:
1.Takesthe2’scomplementoftheRn
2.AddsthistotheOp2
3.PlacestheresultinRd
TheOp2andRnoperandsremainunchangedbythisinstruction.
Example:
LDRR0,=0x55555555;R0=0x55555555
LDRR1,=0x99999999;R1=0x99999999
RSBR2,R0,R1;R2=R1-R0
;For“RSBR2,R0,R1”wehave:
;R2=R1-R0=0x99999999-0x55555555=
;R2=0x99999999+2’scompof0x55555555
;R2=0x99999999+0xAAAAAAAB=0x44444444
;0x99999999
;-0x55555555
;–––––
;0x44444444
RSBSReverseSubtractandupdatetheflags
Flags:Affected:V,N,Z,C.
Format:RSBSRd,Rn,Op2;Rd=Op2-Rn
Function:SubtractstheRnfromtheOp2andputstheresultintheRd.TheRSBSupdatestheflags.
ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:
1.Takesthe2’scomplementoftheRn
2.AddsthistotheOp2
3.PlacestheresultintheRd
TheRnandOp2operandsremainunchangedbythisinstruction.
Example:
LDRR0,=0x55555555;R0=0x55555555
LDRR1,=0x99999999;R1=0x99999999
RSBR2,R0,R1;R2=R1-R0
;For“RSBR2,R0,R1”wehave:
;R2=R1-R0=0x99999999-0x55555555=
;R2=0x99999999+2’scompof0x55555555
;R2=0x99999999+0xAAAAAAAB=0x44444444
;0x99999999
;-0x55555555
;–––––
;0x44444444C=0,N=0,Z=0,V=0
SBCSubtractwithCarry(Borrow)
Flags:Unaffected
Format:SBCRd,Rn,Op2;Rd=Rn–Op2–(1–C)
Function:SubtractstheOp2operandfromtheRn,placingtheresultinRd.IfC=0,itsubtracts1fromtheresult;otherwise,itoperateslikeSUB.TheSBChasnoeffectonflags.Thisisusedwidelyformultiword(64-bit)subtraction.
Example:
LDRR0,=0x55555555;R0=0x55555555
LDRR1,=0x99999999;R1=0x99999999
SUBSR2,R0,R1;R2=R0-R1
MOVR3,#0x09;R3=0x09
SBCR4,R3,#03;R4=R3-0x3
;ForSUBSwehave:
;R2=R1-R0=0x55555555-0x99999999=
;R2=0x55555555+2’scompof0x99999999
;R2=0x55555555+0x66666667=0xBBBBBBBCC=0
;ForSBCwehave:
;R4=R3-0x3=0x09-0x3-(1-C)=9-3-1
;R4=0x9+2’comp.of-4=0x9+0xFFFFFFFC=0x05
;0x0000000955555555
;-0x0000000399999999
SBCSSubtractwithCarry(Borrow)andupdatetheflags
Flags:C,Z,N,V
Format:SBCSRd,Rn,Op2;Rd=Rn–Op2–(1–C)
Function:SubtractstheOp2operandfromtheRn,placingtheresultinRd.IfC=0,itsubtracts1fromtheresult;otherwise,itexecuteslikeSUB.TheSBCSupdatestheflags.Thisisusedwidelyformultiword(64-bit)subtraction.
Example:
LDRR0,=0x55555555;R0=0x55555555
LDRR1,=0x99999999;R1=0x99999999
SUBSR2,R0,R1;R2=R0-R1
MOVR3,#0x09;R3=0x09
SBCSR4,R3,#03;R4=R3-0x3
;ForSUBSwehave:
;R2=R1-R0=0x55555555-0x99999999=
;R2=0x55555555+2’scompof0x99999999
;R2=0x55555555+0x66666667=0xBBBBBBBCC=0
;ForSBCSwehave:
;R4=R3-0x3=0x09-0x3-(1-C)=9-3-1
;R4=0x9+2’comp.of-4=0x9+0xFFFFFFFC=0x05
;0x0000000955555555
;-0x0000000399999999
;––––––
;0x10000005BBBBBBBCC=0,Z=0,V=0,N=0
SBFXSignBitFieldextract
Flags:Unaffected
Format:SBFXRd,Rn,#LSB,#Width
Function:ExtractsthebitfieldfromtheRnregisterandthenaftersignextendingitisplacedinRd.The#LSBindicateswhichbitand#Widthindicateshowmanybits.
Example1:
LDRR0,=0x00000543;R0=0x00000543
SBFXR2,R0,#8,#4;now,R2=0x00000005
Example2:
LDRR0,=0x00000C43;R0=0x00000C43
SBFXR2,R0,#4,#8;now,R2=0xFFFFFFC4
SDIVSignedDivide
Flags:Unaffected
Format:SDIVRd,Rn,Rm;Rd=Rn/Rm
Function:DividesasignedintegerwordinRnbyanothersignedintegerwordinRm.ThequotientresultisplacedinRd.IfvalueinRnregisterisnotdivisiblebythevalueinRmregister,theresultisroundedtozeroandplacedinRd.Dividebyzerocausesinterrupttype3.
Example:
LDRR0,=-20000;R0=-20000
LDRR1,=-1000;R1=-1000
SDIVR2,R0,R1;now,R2=-2000/-1000=2
SEVSendEvent
Flags:Affected.
Format:SEV
Function:Sendssignaltoalltheprocessorsinthemultiprocessorssystem.SeetheARMCortexmanual.
SMLALSignedMultiplyAccumulateLong
Flags:Unaffected
Format:SMLALRdlo,Rdhi,Rn,Rm;Rdhi:Rdlo=(Rm×Rn)+(Rdhi:Rdlo)
Function:MultipliessignedwordsinRnandRmregister,addsthe64-bitresulttoRdhi:Rdloregister,andsavesthefinalresultinRdhi:Rdlo.TheRdlo(low)andRdhi(high)arethelowerwordandhigherwordofa64-bitvalue.
Example1:
LDRR0,=0
LDRR1,=0x23
LDRR2,=-5000
LDRR3,=-4000
SMLALR0,R1,R2,R3;now,R3:R2=(R3:R2)+(R1×R0)
;=0x2300000000+(-5000×-4000)
;=0x2300000000+20000000
;=0x23000000+0x1312D00=0x2301312D00
;=>R0=0x1312D00andR1=0x23
SMULLSignedMultiplyLong
Flags:Unaffected
Format:SMULLRdlo,Rdhi,Rn,Rm;Rdhi:Rdlo=Rm×Rn
Function:MultipliessignedwordsinRnandRmregister,andsavestheresultinRdhi:Rdlo.TheRdl(low)andRdh(high)arethelowerwordandhigherwordofa64-bitvalue.
Example:
LDRR0,=-20000;R0=-20000(signed2’scomp)
LDRR1,=-1000000;R0=-100000(signed2’scomp)
SMLALR2,R3,R0,R1;now,R3:R2=R1×R0=-20000×-1000000=
;20000000000=0x4A817C800=>R3=0x4and
;R2=0xA817C800
SSATSignSaturate
Flags:Unaffected.
Format:SSATRd,#n,Rm,shift#
Function:Usedforsaturationoperation.SeeARMCortexmanual.
STMStoreMultiple
Flags:Unaffected.
Format:STMRn,{Rx,Ry,…}
Function:StoresregistersRx,Ry,…intoconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thesourceregistersareseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.ThisIA(IncrementtheaddressAftereachaccess)isthedefault.ThisinstructioniswidelyusedforPushing(storing)multipleregistersintoascendingstack.
Example:
LDRR7,=0x12000
LDRR0,=0x82381046;R0=0x82381046
LDRR2,=0x15585056;R2=0x15585056
LDRR4,=0x39686063;R4,=0x39686063
STMR7,{R0,R2,R4};now,R2=0x15585056,..
;ThecontentsofregistersR0,R2,andR4arestoredinto
;consecutivememorylocationsstartingatanaddressgivenbyR7.
;TheR0contentsarestoredintomemorylocations0x12000-0x12003,
;theR2contentsarestoredintomemorylocations0x12004through
;0x12007,andsoon.Thisisshownbelow.
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
;12004=(56)
;12005=(50)
;12006=(58)
;12007=(15)
;12008=(63)
;12009=(60)
;1200A=(68)
;1200B=(39)
STMDBStoreMultipleregisterandDecrementBefore
Flags:Unaffected.
Format:STMDBRn,{Rx,Ry,…}
Function:StoresregistersRx,Ry,…intoconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thesourceregistersareseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.SinceIA(IncrementtheaddressAftereachaccess)isthedefaultweneedtouseDB(DecrementtheaddressBeforeeachaccess)istooverwritethedefault.ThisinstructioniswidelyusedforPushing(storing)multipleregistersintoDescendingstack.
Example:
LDRR7,=0x12000
LDRR0,=0x39686063;R0=0x39686063
LDRR2,=0x15585056;R2=0x15585056
LDRR4,=0x82381046;R4,=0x82381046
STMDBR7,{R0,R2,R4}
;ThecontentsofregistersR0,R2,andR4arestoredinto
;consecutivememorylocationsstartingatanaddressgivenbyR7.
;TheR0contentsarestoredintomemorylocations0x11FFF-0x11FFC,
;theR2contentsarestoredintomemorylocations0x11FFB
;through0x11FF8,andsoon.Thisisshownbelow.
;11FF4=(46)
;11FF5=(10)
;11FF6=(38)
;11FF7=(82)
;11FF8=(56)
;11FF9=(50)
;11FFA=(58)
;11FFB=(15)
;11FFC=(63)
;11FFD=(60)
;11FFE=(68)
;11FFF=(39)
STMEAStoreMultipleregisterEmptyAscending
Flags:Unaffected.
Format:STMEARn,{Rx,Ry,…}
Function:ThisissameasSTM.
STMIAStoreMultipleregisterEmptyAscending
Flags:Unaffected.
Format:STMIARn,{Rx,Ry,…}
Function:ThisissameasSTM.
STMFDStoreMultipleregisterFullDescending
Flags:Unaffected.
Format:STMFDRn,{Rx,Ry,…}
Function:ThisisanothernameforSTMDB.TheFDisforpushingontoFullDescendingstacks
STRStoreRegister
Flags:Unaffected.
Format:STRRd,[Rx];StoreRdintomemorylocationpointedto
beRx
Function:StoresRdregisterintofourconsecutivememorylocations.The[Rx]pointstostartingaddressofmemorylocation.Thisiswidelyusedtostore32-bitregisterintomemorylocations.
Example:
LDRR1,=0x82381046;R1=0x82381046
LDRR0,=0x12000;R0=0x12000
STRR1,[R0];now,
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
STRBStoreRegisterByte
Flags:Unaffected.
Format:STRBRd,[Rn]
Function:StoresthelowestbyteoftheRdregisterintoasinglememorylocationindicatedbyRn.
Example:
LDRR1,=0x82381046;R1=0x82381046
LDRR0,=0x12000;R0=0x12000
STRBR1,[R0];now,12000=(46)
STRBTStoreRegisterBytewithTranslation
Flags:Unaffected.
Format:STRBT
Function:StoresthelowestbyteoftheRdregisterintoasinglememorylocationindicatedbyRn.ThisisthesameasSTRBbutisusedforunprivilegedmemoryaccess.SeeARMCortexmanual.
Example:
LDRR1,=0x82381046;R1=0x82381046
LDRR0,=0x12000;R0=0x12000
STRBTR1,[R0]
;now,12000=(46)
STRDStoreRegisterDouble(twowords)
Flags:Unaffected.
Format:STRDRd,[Rn]
Function:StorestworegistersofRdandRd+1into8consecutivememorylocationsindicatedbyRn.RdcanbeR0,R2,R4,R6,R8,R10,orR12.
Example:
LDRR2,=0x12000
LDRR0,=0x82381046;R0=0x82381046
LDRR1,=0x15585056;R1=0x15585056
STRDR0,R1,[R2];storeR0andR1intomemorylocationsstarting
;atanaddressgivenbyR2.Now,wehave:
;12000=(46)
;12001=(10)
;12002=(38)
;12003=(82)
;12004=(56)
;12005=(50)
;12006=(58)
;12007=(15)
STREX,STREXB,STREXHStoreRegisterExclusive
Function:TheyareusedwithLDREX,LDREXB,andLDREXHinstructionstoperformCPUsynchronizationoperation.SeeARMCortexmanual.
STRHStoreRegisterHalfword
Flags:Unaffected.
Format:STRHRd,[Rn]
Function:Storesthelower2bytesoftheRdregisterintotwoconsecutivememorylocationsindicatedbyRn.
Example:
LDRR1,=0x82381046;R1=0x82381046
LDRR0,=0x12000;R0=0x12000
STRBR1,[R0];now,12000=(46),and12001=(10)
STRTStoreRegister
Flags:Unaffected
Format:STRTRx,[Rn]
Function:StoresRxregisterintomemorylocationpointedtobyRx.ThisisthesameasSTRbutisusedforunprivilegedmemoryaccess.SeeARMCortexmanual.
Example:
LDRR1,=0x82381046;R1=0x82381046
LDRR0,=0x12000;R0=0x12000
STRTR1,[R0]
;now,12000=(0x82381046)
SUBSubtract
Flags:Unaffected
Format:SUBRd,Rn,Op2;Rd=Rn–Op2
Function:SubtractstheOp2fromtheRnandputstheresultintheRd.Hasnoeffectonflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:
1.Takesthe2’scomplementoftheOp2
2.AddsthistotheRn
3.PlacetheresultintheRd
TheRdandOp2operandsremainunchangedbythisinstruction.
Example:
LDRR0,=0x55555555;R0=0x55555555
LDRR1,=0x99999999;R1=0x99999999
SUBR2,R1,R0;R2=R1-R0
;For“SUBR2,R1,R0”wehave:
;R2=R1-R0=0x99999999-0x55555555=
;R2=0x99999999+2’scompof0x55555555
;R2=0x99999999+0xAAAAAAAB=0x44444444
;0x99999999
;-0x55555555
;–––––
;0x44444444
SUBSSubtract
Flags:Affected:V,N,Z,C.
Format:SUBRd,Rn,Op2;Rd=Rn–Op2
Function:SubtractstheOp2fromtheRnandputstheresultintheRd.TheSUBSupdatestheflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:
1.Takesthe2’scomplementoftheOp2
2.AddsthistotheRn
3.PlacetheresultintheRd
TheRdandOp2operandsremainunchangedbythisinstruction.
Example:
LDRR0,=0x55555555;R0=0x55555555
LDRR1,=0x99999999;R1=0x99999999
SUBSR2,R1,R0;R2=R1-R0
;For“SUBSR2,R1,R0”wehave:
;R2=R1-R0=0x99999999-0x55555555=
;R2=0x99999999+2’scomp.of0x55555555
;R2=0x99999999+0xAAAAAAAB=0x44444444
;0x99999999
;-0x55555555
;––––—
;0x44444444C=1,Z=0,N=0,V=0
SVCsupervisorCall(SoftwareInterrupt)
Flags:Unaffected.
Format:SVC#imm_value
Function:Itisusedbyapplicationsoftwaretogetservicesfromoperatingsystems(OS).ThisisliketheSWI(softwareinterrupt)instructioninARM7.
SXTBSignExtendbyte
Flags:Unaffected.
Format:SXTBRd,Rm
Function:ConvertsasignedbyteinRmintoasignedwordbycopyingthesignbit(D7)ofRmintoallthebitsofRd.UsedwidelytoconvertasignedbyteinRmtoasignedwordtoavoidtheoverflowprobleminsignednumberarithmetic.
Example:
MOVR1,#0xFB;R1=0xFBwhichis2’scomplementof-5
SXTBR0,R1;now,R0=0xFFFFFFFB
;R1=00000000000000000000000011111011
;nowR0=0xFFFFFFFB
;R0=11111111111111111111111111111011
SXTHSignExtendHalfword
Flags:Unaffected.
Format:SXTHRd,Rm
Function:ConvertsasignedhalfwordinRmintoasignedwordbycopyingthesignbit(D15)ofRmintoallthebitsofRd.Usedwidelytoconvertasignedhalfword(16-bit)inRmtoasignedwordtoavoidtheoverflowprobleminsignednumberarithmetic.
Example:
;assumeR1=0xFFFBwhichis2’scomplementof-5
SXTHR0,R1;now,R0=0xFFFFFFFB
;R1=00000000000000001111111111111011
;now,R0=0xFFFFFFFB
;R0=11111111111111111111111111111011
TBBTableBranchByte
Flags:Unaffected.
Format:TBB[Rn,Rm]
Function:BranchesforwardusingtableofsinglebyteoffsetusingPC-relativeaddressingmode.RnhasstartingaddressofthetableandRmisanindexintothetable.SeeARMCortexM3manual.
TBHTableBranchhalfword
Flags:Unaffected.
Format:TBH[Rn,Rm,LSL#1]
Function:BranchesforwardusingtableofhalfwordoffsetusingPC-relativeaddressingmode.RnhasstartingaddressofthetableandRmisanindexintothetable.The“LSL#1”shiftslefttheaddressoncetomakeithalfwordalignedaddress.SeeARMCortexM3
manual.
TEQTestEquivalence
Flags:Affected:NandZ
Format:TEQRn,Op2;performsRnEx-OROp2
Function:PerformsabitwiselogicalEx-ORonRnandOp2,settingflagsbutleavingthecontentsofbothRnandOp2unchanged.WhiletheEORSinstructionchangesthecontentsofthedestinationandtheflagbits,theTEQinstructionchangesonlytheflagbits.Thisiswidelyusedtoseeiftworegistersareequal.
Example1:
TEQR1,R2;checktoseeifR1=R2.IfsoZ=1.R1andR2
;remainunchanged
Example2:
TEQR2,#0x01;checktoseeifD0ofR2is1,ifsoZ=1.R2
;remainsunchanged
Example3:
TEQR1,#0xFF;checktoseeifD7_D0ofR1are1s,
;ifsoZ=1.R1remainsunchanged
TSTTest
Flags:Affected:NandZ
Format:TSTRn,Op2;performsRnANDOp2
Function:PerformsabitwiselogicalANDonRnandOp2,settingflagsbutleavingthecontentsofbothRnandOp2unchanged.WhiletheANDSinstructionchangesthecontentsofthedestinationandtheflagbits,theTSTinstructionchangesonlytheflagbits.TotestwhetherabitofRnis0or1,usetheTSTinstructionwithanOp2constantthathasthatbitsetto1andallotherbitsclearedto0.
Example1:
TSTR1,#0x01;checktoseeifD0ofR1iszero,ifsoZ=1.
;R1remainunchanged
Example2:
TSTR1,#0xFF;checktoseeifanybitsofR1iszero,ifso
;Z=1.R1remainunchanged
UBFXUnsignedBitfiledextract
Flags:Unaffected.
Format:UBFXRd,Rn,#LSB,#Width
Function:ExtractsthebitfieldfromtheRnregisterandthenzeroextendsitandplacesinRd.The#LSBindicatesfromwhichbitand#Widthindicateshowmanybits.
Example1:
LDRR0,=0x00077555;R0=0x00077555
UBFXR2,R0,#8,#4;now,R2=0x00000005
Example2:
LDRR0,=0x12345678;R0=0x12345678
UBFXR2,R0,#8,#12;now,R2=0x00000456
UDIVUnsignedDivide
Flags:Unaffected
Format:UDIVRd,Rn,Rm;Rd=Rn/Rm
Function:DividesanunsignedintegerwordinRnbyanotherunsignedintegerwordinRm.ThequotientresultisplacedinRd.IfvalueinRnregisterisnotdivisiblebythevalueinRmregister,theresultisroundedtozeroandplacedinRd.Dividebyzerocausesexceptioninterrupt.
Example1:
LDRR0,=100;R0=100
LDRR1,=2000
UDIVR2,R1,R0;now,R2=R1/R0=2000/100=20
Example2:
LDRR0,=20000;R0=20000
UDIVR2,R0,#100;now,R2=2000/100=20
UMLALUnsignedMultiplywithAccumulate
Flags:Unaffected
Format:UMLALRdLo,RdHi,Rn,Rm;RdHi:RdLo=(Rm×Rn)+(RdHi:RdLo)
Function:MultipliesunsignedwordsinRnandRmregister,addsthe64-bitresulttoRdHi:RdLoregisters,andsavesthefinalresultinRdHi:RdLo.TheRdLo(low)andRdHi(high)aretheunsignedlowerwordandhigherwordofthe64-bitvalue.
Example:
LDRR0,=20000;R0=20000
LDRR1,=1000
LDRR2,=5000
LDRR3,=4000
UMLALR2,R3,R0,R1;now,R3:R2=R1×R0+R3:R2
UMULLUnsignedMultiplyLong
Flags:Unaffected
Format:UMULLRdLo,RdHi,Rn,Rm;RdHi:RdLo=Rm×Rn
Function:MultipliesunsignedwordsinRnandRmregisters,andsavestheresultinRdHi:RdLo.TheRdLo(low)andRdHi(high)arethelowerwordandhigherwordofa64-bitvalue.
Example:
LDRR0,=20000;R0=20000
LDRR1,=10000;R1=10000
LDRR2,=50000;R2=50000
LDRR3,=40000;R3=40000
UMLALR2,R3,R0,R1;now,R3:R2=R1×R0
USATUnsignedSaturate
Flags:Unaffected
Format:USATRd,#n,Rm,shift#
Function:Usedforunsignedsaturationoperation.SeeARMCortexmanual.
UXBTZeroextendabyte
Flags:Unaffected
Format:UXBTRd,Rm
Function:ZeroextendsabyteinRmandplacesinRd.UsedwidelytoconvertabyteinRmtowordforsignednumberoperations.
Example:
MOVR1,#0xFB;R1=0xFB
UXBTR0,R1;now,R0=0x00000000FB
;R1=00000000000000000000000011111011
;nowR0=0x000000FB
;R0=000000000000000000000000011111011
UXTHZeroextendhalfword
Flags:Unaffected
Format:UXTHRd,Rm
Function:ZeroextendsahalfwordinRmandplacesinRd.UsedwidelytoconvertahalfwordinRmtowordforsignednumberoperations.
Example:
;assumeR1=0xFFFB
UXTHR0,R1;now,R0=0x00000FFFB
;R1=00000000000000001111111111111011
;now,R0=0x0000FFFB
;R0=00000000000000001111111111111011
WFEWaitforevent
Flags:Unaffected
Format:WFE
Function:Usedbypowermanagement.SeeARMCortexM3manual.
WFIWaitforinterrupt
Flags:Unaffected
Format:WFI
Function:Suspendsexecutionuntiloneofthefollowingeventsoccurs:
1.anon-maskedinterruptoccursandistaken,
2.aninterruptmaskedbyPRIMASKbecomespending,
3.aDebugEntryrequest.
SeeARMCortexmanual.
AppendixB:ARMAssemblerDirectives
SectionB.1:ListofARMAssemblerDirectives
ALIGN
AREA
DCBdirective(defineconstantbyte)
DCDdirective(defineconstantword)
DCWdirective(defineconstanthalf-word)
ENDPorENDFUNC
ENTRY
EQU(Equate)
EXPORTorGLOBAL
EXTRN(External)
FUNCTIONorPROC
INCLUDE
RN(equate)
SectionB.2:DescriptionofARMAssemblerDirectivesDirectives,orastheyaresometimescalled,pseudo-opsorpseudo-instructions,are
usedby theassembler to translateAssembly languageprograms intomachine language.Unlikethemicroprocessor’sinstructions,directivesdonotgenerateanyopcode;therefore,nomemorylocationsareoccupiedbydirectivesinthefinalhexversionoftheassemblyprogram.Tosummarize,directivesgivedirectionstotheassemblerprogramtotellithowto generate the machine code; instructions are assembled into machine code to giveinstructionstotheCPUatexecutiontime.Thefollowingaredescriptionsofthesomeofthemostwidelyuseddirectives for theARMassembler.Theyaregiven inalphabeticalorderforeaseofreference.
ALIGN
Format:
ALIGNn;nisanypowerof2from20to231
Thisisusedtomakesuredataisalignedin32-bitwordor16-bithalfwordmemoryaddress.Ifnisnotspecified,ALIGNsetsthecurrentlocationtothenextword(fourbyte)boundary.ThefollowingusesALIGNtomakethedatawordandhalfwordaligned:
ALIGN4;Thenextinstructionisword(4bytes)aligned
ALIGN;Thenextinstructionisword(4bytes)aligned
ALIGN2;Thenextinstructionishalfword(2bytes)aligned
Noticethat,thisALIGNdirectiveshouldnotbeconfusedwiththeALIGNattributeoftheAREAdirective.
AREA
Format:
AREAsectionnameattribute,attribute,…
TheAREA directive tells the assembler to define a new section ofmemory. ThememorycanbecodeordataandcanhaveattributessuchasReadOnly,ReadWrite,andsoon.This iswidelyused todefineoneormoreblocksof indivisiblememoryforcodeordatatobeusedbythelinker.EveryassemblylanguageprogramhasatleastoneAREA.
ThefollowinglinedefinesanewareanamedMY_ASM_PROG1whichhasCODEandREADONLYattributes:
AREAMY_ASM_PROG1CODE,READONLY
Among widely used attributes are CODE, DATA, READONLY, READWRITE,COMMON,andALIGN.Thefollowingdescribesthesewidelyusedattributes.
CODE is an attribute given to an area of memory used for executable machineinstruction.SinceitisusedforcodesectionoftheprogramitisbydefaultREADONLYmemory.InARMAssemblylanguageweusethisareatowriteourinstructions.
DATA isanattributegiven toanareaofmemoryusedfordataandno instruction(machine instructions)canbeplaced in thisarea.Since it isusedfordatasectionof theprogram it is bydefault aREADWRITEmemory. InARMAssembly languageweusethisareatosetasideSRAMmemoryforscratchpadandstack.
READWRITE isanattributegiven toanareaofmemorywhichcanbereadfromandwrittento.SinceitisREADWRITEsectionoftheprogramitisbydefaultforDATA.InARMAssemblylanguageweusethisareatosetasideSRAMmemoryforscratchpadandstack.
READONLY is an attribute given to an area ofmemorywhich can only be readfrom.SinceitisREADONLYsectionoftheprogramitisbydefaultforCODE.InARMAssemblylanguageweusethisareatowriteourinstructionsformachinecodeexecution.
COMMONisanattributegiventoanareaofDATAmemorysectionwhichcanbeusedcommonlybyseveralprogramcodes.WedonotinitializetheCOMMONsectionofthe memory since it is used by compiler exclusively. The compiler initializes theCOMMONmemoryareawithallzeros.
ALIGN is another attribute given to an area ofmemory to indicate howmemoryshouldbeallocatedaccordingtotheaddresses.WhentheALIGNisusedforCODEandREADONLYitalignedin4-bytesaddressboundarybydefaultsincetheARMinstructionsare all 32-bit (4-bytes) word. The ALIGN attribute of AREA has a number after likeALIGN=3whichindicatestheinformationshouldbeplacedinmemorywithaddressesof23,thatis0x50000,0x50008,0x50010,0x50020,andsoon.ThisALIGNattributeoftheAREAshouldnotbeconfusedwiththeALIGNdirective.
DCBdirective(defineconstantbyte)
Format:
labelDCBn;nbetween-128to256,byteorstring
The DCB directive allocates a byte size memory and initializes the values forreadingonly.
MYVALUEDCB5;MYVALUE=5
MYMSAGEDCB“HELLOWORLD”;string
DCDdirective(defineconstantword)
Format:
labelDCDn
The DCD directive allocates a word size memory and initializes the values forreadingonly.Thedatais32bitaligned.
MYDATADCD0x200000,0xF30F5,5000000,0xFFFF9CD7
DCWdirective(defineconstanthalf-word)
Format:
labelDCBn
TheDCWdirectiveallocatesahalf-wordsizememoryandinitializesthevaluesforreadingonly.
MYDATADCW0x20,0xF230,5000,0x9CD7
END
EveryprogrammusthaveanentryandanENDpoint.Thelabelsfortheentryandendpointsmustmatch.TheENDdirectivetellstheassemblerthatithasreachedtheendofsourcefile.
AREAPROG_C2,CODE,READONLY
ENTRY
…
END
ENDPorENDFUNC
TheENDFUNCorENDPdirectiveinformstheassemblerthatithasreachedtheendofafunction.ENDFUNCandENDParethesame.SeeFUNCTIONorPROCdirectives.
ENTRY
TheENTRYdirective shows the entry point of a program to the assembler.Eachprogrammusthaveoneentrypoint.
AREAPROG_C2,CODE,READONLY
ENTRY
…
END
EQU(Equate)
Toassignafixedvaluetoaname,oneusestheEQUdirective.Theassemblerwillreplaceeachoccurrenceofthenamewiththevalueassignedtoit.
DATA1EQU0x39;thewaytodefinehexvalue
PORTBEQU0xF0018000;SFRPortBaddress
SUM1EQU0x40000120;assignRAMloctoSUM1
Unlike data directives such asDCB,DCD, and so on, EQU does not assign anymemorystorage;therefore,itcanbedefinedatanytimeandatanyplace,andcanevenbeusedwithinthecodesegment.
EXPORTorGLOBAL
Toinformtheassemblerthatanameorsymbolwillbereferencedbyothermodules
(in other files), it is marked by the EXPORT or GLOBAL directives. If a module isreferencinganameoutsideitself, thatnamemustbedeclaredasEXTRN(orIMPORT).Correspondingly, in the module where the variable is defined, that variable must bedeclaredasEXPORTorGLOBALinordertoallowittobereferencedbyothermodules.SeetheEXTRNdirectiveforexamplesoftheuseofbothEXTRNandEXPORT.
EXTRN(External)
TheEXTRNdirectiveisusedtoindicatethatcertainvariablesandnamesusedinamodule are defined by another module. In the absence of the EXTRN directive, theassemblerwouldsearchforthedefinitionandgiveanerrorwhenitcouldn’tfindit.Theformatofthisdirectiveis:
EXTRNname
ThefollowingexampleshowshowtheEXPORTandEXTERNdirectivesareused:
;fromthemainprogram:
EXTRNMY_FUNC
…
BLMY_FUNC
…
;–––––––––––––
;MY_FUNCislocatedinadifferentfile:
AREAOUR_EXAMPLE,CODE,READONLY
EXTRNDATA1
EXPORTMY_FUNC
MY_FUNCFUNCTION
…
LDRR1,=DATA1
…
ENDFUNC
Notice that the EXTRN directive is used in the main procedure to show thatMY_FUNC is defined in another module. This is needed because MY_FUNC is notdefined in that module. Correspondingly, MY_FUNC is defined as GLOABAL in themodulewhere it is defined. EXTRN is used in theMY_FUNCmodule to declare thatoperandDATA1hasbeendefinedinanothermodule.Correspondingly,DATA1isdeclaredasGLOBALinthecallingmodule.
FUNCTIONorPROC
Often,agroupofAssemblylanguageinstructionswillbecombinedintoaprocedure
sothatitcanbecalledbyanothermodule.TheFUNCTIONandENDFUNCdirectivesareusedtoindicatethebeginningandendoftheprocedure.Seethefollowingexample:
MY_FUNCFUNCTION
…
…
ENDFUNC
INCLUDE
Whenthereisagroupofmacroswrittenandsavedinaseparatefile,theINCLUDEdirectivecanbeusedtobringthemintoanotherfile.
RN(equate)
This isusedtodefineanameforaregister.TheRNdirectivedoesnotsetasideaseparatestorageforthename,butassociatesaregisterwiththatname.ThefollowingcodeshowshowweuseRN:
VAL1RNR1;defineVAL1asanameforR1
VAL2RNR2;defineVAL2asanameforR2
SUMRNR3;defineSUMasanameforR3
AREAPROG_2_1,CODE,READONLY
ENTRY
MOVVAL1,#0x25;R1=0x25
MOVVAL2,#0x34;R2=0x34
ADDSUM,VAL1,VAL2;addR2toR1andplaceitinR3
HEREBHERE
END
AppendixC:MacrosWhatisamacroandhowisitused?
There are applications in Assembly language programming where a group ofinstructionsperformsataskthatisusedrepeatedly.Forexample,youmightneedtoaddthree registers together. So it does notmake sense to rewrite them every time they areneeded. Therefore, to reduce the time that it takes towrite these codes and reduce thepossibility of errors, the concept ofmacroswasborn.Macros allow the programmer towritethetask(setofcodestoperformaspecificjob)onceonlyandtoinvokeitwheneveritisneeded,whereveritisneeded.
MACROdefinition
Everymacrodefinitionmusthavethreeparts,asfollows:
MACRO
[$label]macroNameparameter1,parameter2,…,parameterN
……
……
MEND
The MACRO directive indicates the beginning of the macro definition and theMEND directive signals the end. What goes in between the MACRO and MENDdirectives is called the body of themacro. The namemust be unique andmust followAssembly language naming conventions. The parameters are names, or parameters, oreven registers that are mentioned in the body of themacro. After themacro has beenwritten, itcanbe invoked(orcalled)byitsname,andappropriatevaluesaresubstitutedfor parameters. For example you might want to have an instruction that adds threeregisters.Thefollowingisamacroforthepurpose:
MACRO
ADD3VAL$DEST,$ARG1,$ARG2,$ARG3
ADD$DEST,$ARG1,$ARG2
ADD$DEST,$DEST,$ARG3
MEND
The above code is the macro definition. Note that parameters $DEST, $ARG1,$ARG2,and$ARG3arementionedinthebodyofthemacro.Todistinguishparameterstheymuststartwith$.Inthefollowingexample,themacroisinvokedbyitsnamewiththeuser’sactualdata:
AREAOURCODE,READONLY,CODE
ENTRY
MOVR1,#5
MOVR2,#2
ADD3VALR0,R1,R2,#5
Theinstruction“ADD3VALR0,R1,R2,#5”invokesthemacro.
Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:
300000008E0810002ADDR0,R1,R2
40000000CE2800005ADDR0,R0,#5
DefaultValuesforparameters
Wecandefinedefaultvaluesforparametersasshownbelow:
MACRO
ADD3VAL$DEST,$ARG1=R3,$ARG2,$ARG3=#5
ADD$DEST,$ARG1,$ARG2
ADD$DEST,$DEST,$ARG3
MEND
To use the default value we put a | instead of the parameter while invoking themacro:
ADD3VALR0,R1,R2,|
Theabovecodeusesthedefaultvalueof$ARG3whichissetto#5.
Usinglabelsinmacros
In thediscussionofmacrosso far,exampleshavebeenchosen thatdonothavealabelornameinthebodyofthemacro.Thisisbecauseifamacroisexpandedmorethanonceinaprogramandthereisalabelinthelabelfieldofthebodyofthemacro,thesamelabelwouldbegeneratedmorethanonceandanassemblererrorwouldbegenerated.Toaddresstheproblemwecangiveauniquelabeltothemacrowhenweinvokeit,asshownbelow:
MACRO
$lblOUR_MACRO
CMPR1,#5
BEQ$lbl
MOVR1,#1
$lbl
MEND
AREAOURCODE,READONLY,CODE
ENTRY
MOVR1,#3
label1OUR_MACRO
MOVR1,#5
label2OUR_MACRO
HEREBHERE
Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:
2000000000AREAOURCODE,READONLY,CODE
2100000000ENTRY
2200000000E3A01003MOVR1,#3
2300000004label1OUR_MACRO
300000004E3510005CMPR1,#5
4000000080A000000BEQlabel1
50000000CE3A01001MOVR1,#1
600000010label1
2400000010E3A01005MOVR1,#5
2500000014label2OUR_MACRO
300000014E3510005CMPR1,#5
4000000180A000000BEQlabel2
50000001CE3A01001MOVR1,#1
600000020label2
2600000020EAFFFFFE
HEREBHERE
Incases that therearemore thanone label inamacro the linescanbe labeledasshownbelow:
MACRO
$lblOUR_MACRO
CMPR1,#5
BEQ$lbl.equal
MOVR1,#1
B$lbl.next
$lbl.equal
MOVR1,#2
$lbl.next
MEND
AREAOURCODE,READONLY,CODE
ENTRY
MOVR1,#3
label1OUR_MACRO
MOVR1,#5
label2OUR_MACRO
HEREBHERE
Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:
1300000000AREAOURCODE,READONLY,CODE
1400000000ENTRY
1500000000E3A01003MOVR1,#3
1600000004label1OUR_MACRO
300000004E3510005CMPR1,#5
4000000080A000001BEQlabel1equal
50000000CE3A01001MOVR1,#1
600000010EA000000Blabel1next
700000014label1equal
800000014E3A01002MOVR1,#2
900000018label1next
1700000018E3A01005MOVR1,#5
180000001Clabel2OUR_MACRO
30000001CE3510005CMPR1,#5
4000000200A000001BEQlabel2equal
500000024E3A01001MOVR1,#1
600000028EA000000Blabel2next
70000002Clabel2equal
80000002CE3A01002MOVR1,#2
900000030label2next
1900000030EAFFFFFE
HEREBHERE
Conditionalmacros
Wecanpassconditionintomacros,aswell:
MACRO
$lblOurMacro$cond
CMPR1,#5
B$cond$lbl.equal
MOVR1,#1
$lbl.equal
MEND
AREAOURCODE,READONLY,CODE
ENTRY
MOVR1,#3
label1OurMacroEQ;inthemacrocheckequality
MOVR1,#3
label2OurMacroLO;inthemacrocheckifislower
HEREBHERE
Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:
1000000000AREAOURCODE,READONLY,CODE
1100000000ENTRY
1200000000E3A01003MOVR1,#3
1300000004label1OurMacroEQ
300000004E3510005CMPR1,#5
4000000080A000000BEQlabel1equal
50000000CE3A01001MOVR1,#1
600000010label1equal
1400000010E3A01003MOVR1,#3
1500000014label2OurMacroLO
300000014E3510005CMPR1,#5
4000000183A000000BLOlabel2equal
50000001CE3A01001MOVR1,#1
600000020label2equal
1600000020EAFFFFFE
HEREBHERE
Notice that the firstB$cond is substitutedwithBEQwhile the secondB$cond issubstitutedwithBLOsincetheconditionsEQandLOareusedrespectively.
INCLUDEdirective
Assumethatthereareseveralmacrosthatareusedineveryprogram.Musttheyberewritten every time? The answer is no if the concept of the INCLUDE directive isknown.TheINCLUDEdirectiveallowsaprogrammertowritemacrosandsavetheminafile, and later bring them into any file. For example, assuming that some widely usedmacroswerewrittenandthensavedunderthefilename“MYMACRO1.S”,theINCLUDEdirectivecanbeusedtobringthisfileintoany“.asm”fileandthentheprogramcancallupon any of the macros as many times as needed. In the following example theADD3VALmacro isdefined in theMyMACRO.sfileand it isused in theexample.asmfile.
FigureC.1:DefiningaMacroinanIncludeFile
Macrosvs.subroutinesMacros and subroutines are useful in writing assembly programs, but each has
limitations.Macros increasecodesizeevery time theyare invoked.Forexample, ifyoucall a 10-instruction macro 10 times, the code size is increased by 100 instructions;whereas, if you call the same subroutine 10 times, the code size is only that of thesubroutine instructions.On theotherhand, a functioncall takes3clocksand the returninstructiontakes3clockstogetexecuted.So,usingfunctionsaddsaround6clockcycles.Thesubroutinesmightusestackspaceaswellwhencalled,whilethemacrosdonot.
AppendixD:FlowchartsandPseudocodeFlowcharts
If you have taken any previous programming courses, you are probably familiarwith flowcharting. Flowcharts use graphic symbols to represent different types ofprogramoperations.Thesesymbolsareconnected together intoa flowchart to show theflowofexecutionofaprogram.Themorecommonlyusedsymbolsareasfollows:
StartandEndpoints
Start and End points are commonly represented as rounded rectangles or ovalscontainingthewords“Start”or“End”.Smallcircleisanotherwaytoshowthem.
FigureD-1:StartandEndPoints
Decisions
Theconditionsarerepresentedindiamonds.
FigureD-2:aDecisionthatComparesAwithB
Process
Theprocessingstepsarerepresentedusingrectangles.
FigureD-3:aProcessSample
Inputsandoutputs
Inputsandoutputsarerepresentedasparallelogram.
FigureD-4:anOutputSample
Subroutines
Callingsubroutinesarerepresentedasshownbelow
FigureD-5:aSubroutineCallSample
PseudocodeFlowcharting has been standard practice in industry for decades. However, some
findlimitationsinusingflowcharts,suchasthefactthatyoucan’twritemuchinthelittleboxes, and it is hard to get the “big picture” ofwhat the programdoeswithout gettingbogged down in the details. An alternative to using flowcharts is pseudocode, whichinvolves writing brief descriptions of the flow of the code. Figures D-6 through D-10showflowchartsandpseudocodeforcommonlyusedcontrolstructures.
Structured programming uses three basic types of program control structures:sequence, control, and iteration. Sequence is simply executing instructions one afteranother. Figure D-6 shows how sequence can be represented in pseudocode andflowcharts.
FigureD-6:SEQUENCEPseudocodeversusFlowchart
NoteinFiguresD-6throughD-11that“statement”canindicateonestatementoragroupofstatements.
FigureD-7throughD-9showtwocontrolprogrammingstructures:IF-THEN-ELSEandIF-THENinbothpseudocodeandflowcharts.
FigureD-7:IFTHENELSEPseudocodeversusFlowchart
FigureD-8:anIFTHENELSESample
FigureD-9:IFTHENPseudocodeversusFlowchart
FiguresD-10andD-11showtwoiterationcontrolstructures:REPEATUNTILandWHILEDO.Bothstructuresexecuteastatementorgroupofstatementsrepeatedly.Thedifference between them is that the REPEAT UNTIL structure always executes thestatement(s) at least once, and checks the condition after each iteration, whereas theWHILEDOmaynotexecutethestatement(s)atallbecausetheconditionischeckedatthebeginningofeachiteration.
FigureD-10:REPEATUNTILPseudocodeversusFlowchart
FigureD-11:WHILEDOPseudocodeversusFlowchart
ProgramD-1finds thesumofnumbersbetween1and10.Compare theflowchartversus thepseudocode forProgramD-1 (shown inFigureD-12). In this example,moreprogram details are given than one usually finds. For example, this shows steps forinitializingandchangingvalues.Anotherprogrammermaynotincludethesestepsintheflowchart or pseudocode. It is important to remember that the purpose of flowcharts orpseudocode is to show the flow of the program and what the program does, not thespecificAssemblylanguageinstructionsthataccomplishtheprogram’sobjectives.Noticealsothatthepseudocodegivesthesameinformationinamuchmorecompactformthandoestheflowchart.Itisimportanttonotethatsometimespseudocodeiswritteninlayers,sothattheouterlevelorlayershowstheflowoftheprogramandsubsequentlevelsshowmoredetailsofhowtheprogramaccomplishesitsassignedtasks.
ProgramD-1
intmain()
{
intsum=0;
intvalue=1;
do{
sum=sum+value;
value++;
}while(value<=10);
printf(“%d”,sum);
}
FigureD-12:PseudocodeversusFlowchartforProgramD-1
AppendixE:PassingArgumentsintoFunctionsTherearedifferentwaystopassargumentstofunctions.Someofthemare:
·throughregisters
·throughmemoryusingreferences
·usingstack
E.1:PassingargumentsthroughregistersInthefollowingprogramtheBIGGERfunctiongetstwovaluesthroughR0andR1.
AftercomparingR0andR1,itreturnsthebiggervaluethroughR2.
ProgramE-1
AREAOUR_PROG,CODE,READONLY
ENTRY
MOVR0,#5;R0=5
MOVR1,#7;R1=7
BLBIGGER;BIGGER(5,7)
HEREBHERE;stayhere
;=======================================
;BIGGERreturnsthebiggervalue
;Parameters:
;R0andR1:thevaluestobecompared
;Returns:
;R2:containingthebiggervalue
;=======================================
BIGGER
CMPR0,R1
BHIL1;ifR0>R1gotoL1
MOVR2,R1;R2=R1
BXLR;return
L1MOVR2,R0;R2=R0
BXLR;return
END
Thisisafastwayofpassingargumentstothefunction.
E.2:PassingthroughmemoryusingreferencesWe can store the data in memory and pass its address through a register. In the
following program the STR_LENGTH function gets the address of a zero-ended stringthroughR0andreturnsthelengthofthestringthroughR1.
ProgramE-2
AREAOUR_PROG,CODE,READONLY
ENTRY
ADRR0,OUR_STR;R0=addr.ofOUR_STR
BLSTR_LENGTH;STR_LENGTH(&OUR_STR)
HEREBHERE;stayhere
OUR_STRDCB“HELLO!”
ALIGN4
;=======================================
;STR_LENGTHreturnsthelengthofstr
;Parameters:
;R0:addressofthestring
;Returns:
;R0:thelengthofstring
;=======================================
STR_LENGTH
MOVR2,#0
L_BEGIN
LDRBR5,[R0]
CMPR5,#0
BXEQLR;returnif0
ADDR2,R2,#1;R2=R2+1
ADDR0,R0,#1;R0=R0+1
BL_BEGIN
END
E.3:PassingargumentsthroughstackPassing through the stack is a flexible way of passing arguments. To do so, the
argumentsarepushedintothestackjustbeforecallingthefunctionandpoppedoffafterreturning. InProgramE-3, theBIGGER function gets two arguments through the stackandreturnsthebiggervaluethroughthestack.
ProgramE-3
AREAOUR_PROG,CODE,READONLY
ENTRY
;initstackpointer
LDRSP,=(0x40000000+(16*1024))
MOVR0,#5
PUSH{R0};pushArg1
MOVR0,#7
PUSH{R0};pushArg2
BLBIGGER;BIGGER(5,7)
LDRR1,[SP,#0];R1=returnvalue
POP{R0}
POP{R0}
HEREBHERE;stayhere
;=======================================
;BIGGERreturnsthebiggervalue
;Parameters:
;valuestobecompared
;Returns:
;thebiggervalue
;=======================================
BIGGER
LDRR0,[SP,#4];R0=arg1
LDRR1,[SP,#0];R1=arg2
CMPR0,R1
BHIL1;ifR0>R1gotoL1
STRR1,[SP,#0];returnR1
BXLR;return
L1STRR0,[SP,#0];returnR0
BXLR;return
END
Thismethodofpassingargumentsareusedinx86computers.
E.4:AAPCS(ARMApplicationProcedureCallStandard)TheAAPCSprovidesanstandardforimplementingthefunctionsandthefunction
callsso that thecodesmadebydifferentcompilersanddifferentprogrammerscanworkwitheachother.Someoftherulesofthestandardare:
·TheargumentsmustbesentthroughR0toR4,respectively.
·ThereturnvaluemustbesentbackthroughR0andR1.
· The functions canuseR5 toR8 for their internal calculations.But theirvalues must be retrieved before returning. To do so, we push the registersbeforeusingthemandpopthembeforereturningfromthefunction.
·ThestackmustbeusedasFullDescending
InProgramE-1mostoftheaboverulesareconsidered.ButthereturnvaluemustbeinR0insteadofR2.
InProgramE-4theaboverulesareconsidered.
ProgramE-4
AREAOUR_PROG,CODE,READONLY
ENTRY
;initstackpointer
;(changeitaccordingtoyourchip)
LDRSP,=(0x40000000+(16*1024))
MOVR0,#20
BLDELAY;DELAY(20)
HEREBHERE;stayhere
;=======================================
;DELAYwaitsforawhile
;Parameters:
;R0:theamountofwait
;Returns:
;none
;=======================================
DELAY
CMPR0,#0
BXEQLR;returnifzero
PUSH{R5};saveR5
LDRR5,=5000000;R5=5000000
L1SUBSR5,R5,#1;R5=R5-1
BNEL1;gotoL1ifR5isnotzero
POP{R5};restoreR5
BXLR;return
END
Moreinformation
FormoreinformationaboutAAPCSseethefollowingarticleorsearch“AAPCS”ontheInternet:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
AppendixF:ASCIICodes