arm assembly language: programming and architecture

ARMAssemblyLanguageProgramming&Architecture

MuhammadAliMazidi

SarmadNaimiSepehrNaimiJaniceMazidi

Copyright©2013MazidisandNaimis

Allrightsreserved

(Aportionofthisbookwastakenfrom“The80x86IBMPC&CompatibleComputersVol1&2:4thedition”andwaspreviouslypublishedbyPearsonEducation,

Inc.)

“Regardmanasaminerichingemsofinestimablevalue.

Educationcan,alone,causeittorevealitstreasures,andenablemankindtobenefittherefrom.”

Baha’u’llah

DedicationTothefaculty,staff,andstudentsofBIHEuniversityfortheirdedicationand

steadfastness.

PrefaceTheARMprocessor is becoming the dominantCPUarchitecture in the computer

industry. It is already the leadingarchitecture incellphonesand tablet computers.WithsuchalargenumberofcompaniesproducingARMchips,itiscertainthatthearchitecturewillmovetothelaptop,desktopandhigh-performancecomputerscurrentlydominatedbyx86 architecture from Intel and AMD. Currently the PIC and AVR microcontrollersdominate the 8-bit microcontroller market. The ARM architecture will have a majorimpactinthisareatooasdesignersbecomemorefamiliarwithitsarchitecture.ThisbookisintendedasanintroductiontoARMassemblylanguageprogrammingandarchitecture.We assumeno prior background in assembly language programmingwith otherCPUs.However,weurgeyou to studyChapter0covering the fundamentalsofdigital systemssuchashexadecimalnumbers,varioustypesofmemory,memoryandI/Ointerfacing,andmemory address decoding. Chapter 0 is available free of charge on our websitehttp://www.MicroDigitalEd.com/ARM/ARM_books.htm

UniversitiesandcollegesThis book is intended for both academic and industry readers. The answers to

reviewquestionsatendofeachsectionareprovidedatendofthechapter.Ifyouareusingthisbookforauniversitycoursethereareend-of-chapterproblemsthatcanbefoundonwww.MicroDigitalEd.com/ARM/ARM_books.htm

OurupcomingbooksintheARMseriesThisbookcoverstheAssemblylanguageprogrammingoftheARMchip.TheARM

Assemblylanguageisstandardregardlessofwhomakesthechip.TheARMlicenseesarefree to implement theon-chipperipheral (ADC,Timers, I/O,…)as theychoose.SincetheARMperipherals arenot standard among thevariousvendors,wehavededicated aseparatebooktoeachvendor.Thefollowingbooksareplannedinthisseries:

TIARMPeripheralProgrammingandInterfacing

FreescaleARMPeripheralProgrammingandInterfacing

NXPARMPeripheralProgrammingandInterfacing

AtmelARMPeripheralProgrammingandInterfacing

TheabovebooksuseClanguagetoprogramtheperipherals.

Finally,wewould like to thankprofessorsShujenChen,RabahAoufi, andClydeKnight for their reading of the book before publication. We sincerely appreciate theirinsightsandinputs.WewouldalsoliketothankFadaMahmoudi,MozhdeAmiri,KeyvanRoshani,AzaliaYaghini,ArashZamani,ParhamFazlali,andAshkanFarivarforsharingtheircommentsontheprepublicationversionofthise-book.

Contactusatthefollowingemailaddress:

http://www.MicroDigitalEd.com/ARM/ARM_books.htm

[email protected]

andpleaseplaceARMbookinsubjectlineofyouremail.

mailto:[email protected]

TableofContentsChapter1:TheHistoryofARMandMicrocontrollers

Section1.1:IntroductiontoMicrocontrollers

Section1.2:TheARMFamilyHistory

Problems

AnswerstoReviewQuestions

Chapter2:ARMArchitectureandAssemblyLanguageProgramming

Section2.1:TheGeneralPurposeRegistersintheARM

Section2.2:TheARMMemoryMap

Section2.3:LoadandStoreInstructionsinARM

Section2.4:ARMCPSR(CurrentProgramStatusRegister)

Section2.5:ARMDataFormatandDirectives

Section2.6:IntroductiontoARMAssemblyProgramming

Section2.7:AssemblinganARMProgram

Section2.8:TheProgramCounterandProgramROMSpaceintheARM

Section2.9:SomeARMAddressingModes

Section2.10:RISCArchitectureinARM

Section2.11:ViewingRegistersandMemorywithARMKeilIDE

Problems


Chapter3:ArithmeticandLogicInstructionsandPrograms

Section3.1:ArithmeticInstructions

Section3.2:LogicInstructions

Section3.3:RotateandBarrelShifter

Section3.4:ShiftandRotateInstructionsinARMCortex(CaseStudy)

Section3.5:BCDandASCIIConversion

Problems


Chapter4:Branch,Call,andLoopinginARM

Section4.1:LoopingandBranchInstructions

Section4.2:CallingSubroutinewithBL

Section4.3:ARMTimeDelayandInstructionPipeline

Section4.4:ConditionalExecution

Problems


Chapter5:SignedNumbersandIEEE754FloatingPoint

Section5.1:SignedNumbersConcept

Section5.2:SignedNumberInstructionsandOperations

Section5.3:IEEE754Floating-PointStandards

Problems:


Chapter6:ARMMemoryMap,MemoryAccess,andStack

Section6.1:ARMMemoryMapandMemoryAccess

Section6.2:StackandStackUsageinARM

Section6.3:ARMBit-AddressableMemoryRegion

Section6.4:AdvancedIndexedAddressingMode

Section6.5:ADR,LDR,andPCRelativeAddressing

Problems


Chapter7:ARMPipelineandCPUEvolution

Section7.1:ARMPipelineEvolution

Section7.2:OtherCPUEnhancements

Problems


AppendixA:ARMCortex-M3InstructionDescription

SectionA.1:ListofARMCortex-M3Instructions

SectionA.2:ARMCortex-M3InstructionDescription

AppendixB:ARMAssemblerDirectives

SectionB.1:ListofARMAssemblerDirectives

SectionB.2:DescriptionofARMAssemblerDirectives

AppendixC:Macros

Whatisamacroandhowisitused?

Macrosvs.subroutines

AppendixD:FlowchartsandPseudocode

Flowcharts

Pseudocode

AppendixE:PassingArgumentsintoFunctions

E.1:Passingargumentsthroughregisters

E.2:Passingthroughmemoryusingreferences

E.3:Passingargumentsthroughstack

E.4:AAPCS(ARMApplicationProcedureCallStandard)

AppendixF:ASCIICodes

Chapter1:TheHistoryofARMandMicrocontrollersInSection1.1welookatthehistoryofmicrocontrollersthenweintroducesomeof

theavailablemicrocontrollers.ThehistoryofARMisprovidedinSection1.2.

Section1.1:IntroductiontoMicrocontrollersTheevolutionofMicroprocessorsandMicrocontrollers

In early computers, CPUs were designed using a number of vacuum tubes. Thevacuum tubewas bulky and consumed a lot of electricity. The invention of transistors,followed by the IC (Integrated Circuit), provided the means to put a CPU on printedcircuitboards.TheadvancesinICtechnologyallowedputtingtheentireCPUonasingleICchip.This ICwascalledamicroprocessor.Someof themicroprocessorsare thex86family of Intel used widely in desktop computers, and the 68000 of Motorola. ThemicroprocessorsdonotcontainRAM,ROM,orI/Operipherals.Asaresult,theymustbeconnectedexternallytoRAM,ROMandI/O,asshowninFigure1-1.

Figure1-1:AComputerMadebyGeneralPurposeMicroprocessor

In thenextstep, thedifferentpartsofasystem, includingCPU,RAM,ROM,andI/Os, were put together on a single IC chip and it was called microcontroller. SOC(System on Chip) andMCU (Micro Controller Unit) are other names used to refer tomicrocontrollers. Figure 1-2 shows the simplified view of the internal parts ofmicrocontrollers.

Figure1-2:SimplifiedViewoftheInternalPartsofMicrocontrollers(SOC)

Since the microcontrollers are cheap and small, they are widely used in manydevices.

TypesofComputers

Typically,computersarecategorizedinto3groups:desktopcomputers,servers,andembeddedsystems.

Desktop computers, including PCs, tablets, and laptops, are general purposecomputers.Theycanbeusedtoplaygames,readandeditarticles,anddoanyothertaskjust by running the proper application programs. The desktop computers usemicroprocessors.

Incontrast,embeddedsystemsarespecial-purposecomputers.Inembeddedsystemdevices,thesoftwareapplicationandhardwareareembeddedtogetherandaredesignedtodoaspecifictask.Forexample,theKindle,digitalcamera,vacuumcleaner,mp3player,mouse, keyboard, and printer, are some examples of embedded systems. Inmost casesembeddedsystemsrunafixedprogramandcontainamicrocontroller.Itisinterestingtonote that embedded systems are the largest class of computers though they are notnormallyconsideredascomputersbythegeneralpublic.

Serversarethefastcomputerswhichmightbeusedaswebhosts,databaseservers,andinanyapplicationinwhichweneedtoprocessahugeamountofdatasuchasweatherforecasting. Similar to desktop computers, servers are made of microprocessors but,multipleprocessorsareusuallyusedineachserver.Bothserversanddesktopcomputersare connected to a number of embedded systemdevices such asmouse, keyboard, diskcontroller,Flashstickmemoryandsoon.

ABriefHistoryoftheMicrocontrollers

Inthe1980sand1990s,IntelandMotoroladominatedthefieldofmicroprocessorsandmicrocontrollers. Intel had the x86 (8088/86, 80286, 80386, 80486, and Pentium).Motorola (nowFreescale) had the 68xxx (68000, 68010, 68020, etc.).Many embeddedsystemsusedIntel’s32-bitchipsofx86(386,486,Pentium)andMotorola’s32-bit68xxxforhigh-endembeddedproductssuchasrouters.Forexample,Ciscoroutersused68xxxfor theCPU.At the lowend, the8051 fromInteland68HC11fromMotorolawere thedominant8-bitmicrocontrollers.With the introductionofPICfromMicrochipandAVRfromAtmel, they becamemajor players in the 8-bitmarket formicrocontroller. At thetime of this writing, PIC and AVR are the leaders in terms of volume for 8-bitmicrocontrollers. In the late 1990s, the ARM microcontroller started to challenge thedominanceofIntelandMotorolainthe32-bitmarket.AlthoughbothIntelandMotorolausedRISCfeaturestoenhancetheperformanceoftheirmicroprocessors,duetotheneedtomaintain compatibilitywith legacy software, they could notmake a clean break andstart over. Intel used massive amounts of gates to keep up the performance of x86architecture and that in turn increased the power consumption of the x86 to a levelunacceptable for battery-powered embedded products.Meanwhile Freescale (Motorola)streamlinedtheinstructionsofthe68xxxCPUandcreatedanewlineofmicroprocessorscalledColdFire,whileatthesametimeworkedwithIBMtodesignanewRISCprocessorcalledPowerPC.WhilebothPowerPCandColdfireare still aliveandbeingused in the32-bit market, it is ARM which has become the leading microcontroller in the 32-bitmarket.

CurrentlyAvailableMicrocontrollers

Therearemanymicrocontrollersavailableinthemarket.SomeofthemarelistedinTable1-1.

32-bit

ARM, AVR32 (Atmel), ColdFire (Freescale), MIPS32, PIC32 (Microchip), PowerPC, TriCore(Infineon),SuperH

16-bit

MSP430(TI),HCS12(Freescale),PIC24(Microchip),dsPIC(Microchip)

8-bit

8051,AVR(Atmel),HCS08(Freescale),PIC16,PIC18

Table1-1:SomeMicrocontrollers

Introductiontosome32-bitmicrocontrollers

x86: The x86 and Pentium processors are based on the 32-bit architecture of the386.AlthoughbothIntelandAMDarepushingthex86intotheembeddedmarket,duetothehighpowerconsumptionof thesechips, theembeddedmarkethasnotembraced thex86.Intelisworkinghardtomakealow-powerversionofthe386calledAtomavailablefortheembeddedmarket.

PIC32:ItisbasedontheMIPSarchitectureandisgettingsomeattentionduetothefact it shares some of the peripherals with the PIC24/PIC18 chips and also using theMPLABfor IDE. Microchiphopes the freeMPLABIDEandengineers’knowledgeofthe 8-bit PIC will attract embedded developers to the PIC32 as they move to 32-bitsystemsfortheirhighendembeddedproducts.

ColdFire: The Freescale (formerly Motorola) is based on the venerable 680x0(68000, 68010) so popular in the 1980s and 1990s. They streamlined the 68000instructions to make it more RISC-type architecture and is the top seller of 32-bitprocessorsfromtheFreescale.InrecentyearsFreescalerevampedandredesignedthe8-bitHCS08(fromthe6808)tosharesomeoftheperipheralswithColdFireandarepushingthemunderthenameFlexis.TheyhopeengineersusetheHCS08atthelow-endandmovetoColdfireforhigh-endoftheembeddedproductswithminimumlearningcurve.

PowerPC: Thiswas developed jointly by IBM and Freescale. Itwas used in theAppleMacforafewyears.ThenAppleswitchedtox86forawhileandcurrentlyisusingARMinall theirproducts.Nowadays,bothFreescaleandIBMmarket thePowerPCforthehigh-endoftheembeddedsystems.

Howtochooseamicrocontroller

Thefollowingtwofactorscanbeimportantinchoosingamicrocontroller:

· Chipcharacteristics:Someofthefactorsinchoosingamicrocontrollerchipareclockspeed,powerconsumption,price,andon-chipmemoriesandperipherals.

· Availableresources:OtherfactorsinchoosingamicrocontrollerincludetheIDEcompiler,legacysoftware,andmultiplesourcesofproduction.

ReviewQuestions

1.Trueorfalse.Microcontrollersarenormallylessexpensivethanmicroprocessors.

2. Whencomparingasystemboardbasedonamicrocontrollerandageneral-purposemicroprocessor,whichoneischeaper?

3.Amicrocontrollernormallyhaswhichofthefollowingdeviceson-chip?

(a)RAM(b)ROM(c)I/O(d)alloftheabove

4.Ageneral-purposemicroprocessornormallyneedswhichofthefollowingdevicestobeattachedtoit?

(a)RAM(b)ROM(c)I/O(d)alloftheabove

5.Anembeddedsystemisalsocalledadedicatedsystem.Why?

6.Whatdoesthetermembeddedsystemmean?

7.Whydoeshavingmultiplesourcesofagivenproductmatter?

Section1.2:TheARMFamilyHistoryInthissection,welookattheARManditshistory.

AbriefhistoryoftheARM

TheARMcameoutofacompanycalledAcornComputers inUnitedKingdominthe1980s.ProfessorSteveFurberofManchesterUniversityworkedwithSophieWilsontodefine theARMarchitecture and instructions.TheVLSITechnologyCorp.producedthe first ARM chip in 1985 for Acorn Computers and was designated as Acorn RISCMachine(ARM).Unable tocompetewithx86(8088,80286,80386,…)PCsfromIBMandotherpersonalcomputermakers,theAcornwasforcedtopushtheARMchipintothesingle-chipmicrocontrollermarketforembeddedproducts.ThatiswhenAppleCorp.gotinterestedinusingtheARMchipforthePDA(personaldigitalassistants)products.ThisrenewedinterestinthechipledtothecreationofanewcompanycalledARM(AdvancedRISCMachine).ThisnewcompanybetitsentirefortuneonsellingtherightstothisnewCPU to other siliconmanufacturers and design houses. Since the early 1990s, an everincreasingnumberofcompanieshavelicensedtherighttomaketheARMchip.SeeTable1-2forthemajormilestonesoftheARM.

Alsoseehttp://www.arm.com/about/company-profile/milestones.phpforthelist.

Table1-2:ARMCompanymilestones(www.ARM.com)

1982

AcornproducedacomputerforBBCnamedBBCmicro.GoodsalesofthecomputermotivatedAcorntodecidetomakeitsownmicroprocessor.

1983

AcornandVLSIbegandesigningtheARMmicroprocessor.

1985

AcornComputerGroupdevelopedtheworld’sfirstcommercialRISCprocessor.TheARMv1had2500transistors,andworkedwithafrequencyof4MHz.

1987

Acorn’sARMprocessordebutsasthefirstRISCprocessorforlow-costPCs

1989

AcornintroducedARMv3withafrequencyof25MHz.Ithada4KBcacheaswell.

1990

http://www.arm.com/about/company-profile/milestones.php

http://www.ARM.com

AdvancedRISCMachines(ARM)spinsoutofAcornandAppleComputer’scollaborationeffortswithachartertocreateanewmicroprocessorstandard.VLSITechnologybecomesaninvestorandthefirstlicensee.

1991

ARMintroduceditsfirstembeddableRISCcore,theARM6solutionusingARMv3architecture.

1992

GECPlesseyandSharplicensedARMtechnology

1993

TexasInstrumentslicensedARMtechnology

ARMintroducedtheARM7core.

1995

ARMannouncedtheThumbarchitectureextension,whichgives32-bitRISCperformanceat16-bitsystemcostandoffersindustry-leadingcodedensity

ARMlaunchedSoftwareDevelopmentToolkit

1996

ARMandVLSITechnologyintroducedtheARM810microprocessor

ARMandMicrosoftworkedtogethertoextendWindowsCEtotheARMarchitecture

1997

Hyundai,Lucent,Philips,RockwellandSonylicensedARMtechnology

ARM9TDMIfamilyannounced

1998

HP,IBM,Matsushita,SeikoEpsonandQualcommlicensedARMtechnology

ARMdevelopedsynthesizableversionoftheARM7TDMIcore

ARMPartnersshippedmorethan50millionARM-poweredproducts

1999

LSILogic,STMicroelectronicsandFujitsulicensedARMtechnology

ARMannouncedsynthesizableARM9Eprocessorwithenhancedsignalprocessing

2000

Agilent,Altera,Micronas,Mitsubishi,Motorola,Sanyo,TriscendandZTEIClicensedARMtechnology

ARMlaunchedSecurCorefamilyforsmartcards

TSMCandUMCbecamemembersofARMFoundryProgram

2001

ARM’sshareofthe32-bitembeddedRISCmicroprocessormarketgrewto76.8percent

ARMannouncednewARMv6architecture

Fujitsu,GlobalUniChip,SamsungandZeevolicensedARMtechnology

ARMacquiredkeytechnologiesandanembeddeddebugdesignteamfromNoralMicrologicsLtd

2002

ARMannouncedthatithadshippedoveronebillionofitsmicroprocessorcorestodate

ARMtechnologylicensedtoSeagate,Broadcom,Philips,Matsushita,Micrel,eSilicon,ChipExpressandITRI

ARMlaunchedtheARM11micro-architecture

ARMlaunchesitsRealViewfamilyofdevelopmenttools

FlextronicsbecamethefirstARMLicensingPartnerprogrammember,allowingittosub-licenseARMtechnologytoitsowncustomers

2004

TheARMCortexfamilyofprocessors,basedontheARMv7architecture,isannounced.TheARMCortex-M3isannouncedinconjunction,asthefirstofthenewfamilyofprocessors

ARMCortex-M3processorannounced,thefirstofanewCortexfamilyofprocessorcores

MPCoremultiprocessorlaunched,thefirstintegratedmultiprocessor

OptimoDEtechnologylaunched,thegroundbreakingembeddedsignalprocessingcore

2005

ARMacquiredKeilSoftware

ARMCortex-A8processorannounced

2007

FivebillionthARMPoweredprocessorshippedtothemobiledevicemarket

ARMCortex-M1processorlaunched–thefirstARMprocessordesignedspecificallyforimplementationonFPGAs

RealViewProfilerforEmbeddedSoftwareAnalysisintroduced

ARMunveilsCortex-A9processorsforscalableperformanceandlow-powerdesigns

2008

ARMannounces10billionthprocessorshipment

ARMMali-200GPUWorldsFirsttoachieveKhronosOpenGLES2.0conformanceat1080pHDTVresolution

2009

ARMannounces2GHzcapableCortex-A9dualcoreprocessorimplementation

ARMlaunchesitssmallest,lowestpower,mostenergyefficientprocessor,Cortex-M0

2010

ARMlaunchesCortex-M4processorforhighperformancedigitalsignalcontrol

ARMtogetherwithkeyPartnersformLinarotospeedrolloutofLinux-baseddevices

MicrosoftbecomesanARMArchitectureLicensee

ARM&TSMCsignlong-termagreementtoachieveoptimizedSystems-on-ChipbasedonARMprocessors,extendingdownto20nm

ARMextendsperformancerangeofprocessorofferingwiththeCortex-A15MPCoreprocessor

ARMMalibecomesthemostwidelylicensedembeddedGPUarchitecture

ARMMali-T604GraphicsProcessingUnitintroducedprovidingindustry-leadinggraphicsperformancewithanenergy-efficientprofile

2011

MicrosoftunveilsWindowsonARMatCES2011

IBMandARMcollaboratetoprovidecomprehensivedesignplatformsdownto14nm

ARMandUMCextendpartnershipinto28nm

Cortex-A7processorlaunched

Big-Littleprocessingannounced,linkingCortex-A15andCortex-A7processors

ARMv8architectureunveiledatTechCon

AMPannouncelicenseandplansforfirstARMv8-basedprocessor

ARMMali-T658GPUlaunched

ARMexpandsR&DpresenceinTaiwanwithHsinchuDesignCenter

ARMandAvnetlaunchEmbeddedSoftwareStore(ESS)

ARM,CadenceandTSMCtapeoutfirst20nmCortex-A15multicoreprocessor

2012

ARM,GemaltoandG&Dformjointventuretodelivernext-generationmobilesecurity

FirstWindowsRT(WindowsonARM)devicesrevealed

ARM,AMD,Imagination,MediaTekandTexasInstrumentsfoundingmembersofHeterogeneousSystemArchitecture(HAS)Foundation

ARMandTSMCworktogetheronFinFETprocesstechnologyfornext-generation64-bitARMprocessors

ARMformsfirstUKforumtocreatetechnologyblueprint“InternetofThings”devices

ARMnamedoneofBritain’sTopEmployers

MITTechnologyReviewnamedARMinitslistof50MostInnovativeCompanies

Currently theARMCorp. receives its entire revenue from licensing theARM toother companies since it does not own state of the art chip fabrication facility. ThisbusinessmodelofmakingmoneyfromsellingIP(intellectualproperty)hasmadeARMoneofthemostwidelyusedCPUarchitecturesintheworld.UnlikeIntelorFreescalewhodefinethearchitectureandfabricate thechip,hundredsofcompanieswhohavelicensedtheARMIPfeelalevelplayingfieldwhenitcomestocompetingwiththeoriginatorofthechip.

ARMandApple

WhenSteveJobscamebacktoruntheApplein1996,thecompanywasindecline.Ithadlostthepersonalcomputerracethathadstarted20yearsearlier.TheintroductionofiPod in 2001 changed the fortune of that companymore than anything else.Apple hadtriedtosellaPDAcalledNewtoninthe1990sbutwasnotsuccessful.TheNewtonwasusing theARMprocessor and itwas too early for its time.The iPodused an enhancedversionofARMcalledARM7andbecameaninstantsuccess.iPodbroughttheattentionto the ARM chip that it deserved. Since then Apple has been using the ARM chip iniPhonesand iPads.Today, theARMmicrocontroller is theCPUofchoice fordesigningcell phone and other hand-held devices. In the future,ARMwillmake further in-roadsinto the tablet and laptopPCmarketnow thatMicrosoftCorphas introduced theARMversionofitsWindowsoperatingsystem.

ARMfamilyvariations

AlthoughtheARM7familyisthemostwidelyusedversion,ARMisdeterminedtopushthearchitectureintothelowendofthemicrocontrollermarketwhere8-and16-bitmicrocontrollershavebeentraditionallydominating. ForthisreasontheyhavecomeupwithamicrocontrollerversionofARMcalledCortex.Aswewillseeinfuturechapters,the Cortex family of ARM microcontrollers maintains compatibility with the ARM7without sacrificing performance.TheARMarchitecture is also being pushed into high-performancesystemswheremulticorechipssuchasIntelXeondominate.

Figure 1-3 shows some of the most widely used ARM processors. It should beemphasized that we cannot use the terms ARM family and ARM architectureinterchangeably. For example, ARM11 family is based on ARMv6 architecture and

ARMv7AisthearchitectureofCortex-Afamily.

Figure1-3:ARMFamilyandArchitecture

OneCPU,manyperipherals

ARMhasdefinedthedetailsofarchitecture,registers,instructionset,memorymap,andtimingoftheARMCPUandholdsthecopyrighttoit.Thevariousdesignhousesandsemiconductormanufacturers license the IP (intellectual property) for theCPUand canadd their own peripherals as they please. It is up to the licensee (design houses andsemiconductormanufactures)todefinethedetailsofperipheralssuchasI/Oports,serialportUART,timer,ADC,SPI,DAC,I2C,andsoon.AsaresultwhiletheCPUinstructionsand architecture are same across all the ARM chips made by different vendors, theirperipheralsarenotcompatible.ThatmeansifyouwriteaprogramfortheserialportofanARMchipmadebyTI(TexasInstrument), theprogrammightnotnecessarilyrunonanARMchipsoldbyNXP.ThisistheonlydrawbackoftheARMmicrocontroller.ThegoodnewsistheIDE(integrateddevelopmentenvironment)suchasKeil(seewww.keil.com)orIAR(seewww.IAR.com)doprovideperipherallibrariesforchipsfromvariousvendorsandmake the jobofprogrammingtheperipheralsmucheasier. Itmustbenoted that inrecentyearsARMprovidestheIPforsomeperipheralssuchasUARTandSPI,butunliketheCPUarchitecture,itsadoptionisnotmandatoryanditisuptothechipmanufacturerwhether to adopt it or not. This is in contrast to the Coldfire microcontroller fromFreescale,inwhichtheFreescaledefinesthearchitectureandperipherals,fabricates,sells,andsupportsthechip.Figure1-4showstheARMsimplifiedblockdiagramandTable1-3providesalistofsomeARMvendors.

Figure1-4:ARMSimplifiedBlockDiagram

Actel AnalogDevices Atmel

Broadcom Cypress Ember

DustNetworks Energy Freescale

Fujitso Nuvoton NXP

Renesas Samsung ST

Toshiba TexasInstruments TriadSemiconductor

Table1-3:ARMVendors

ReviewQuestions

1.Trueorfalse.TheARMCPUinstructionsareuniversalregardlessofwhomakesthechip.

2.Trueorfalse.TheperipheralsofARMmicrocontrollerarestandardizedregardlessof

whomakesthechip.

3.AnARMmicrocontrollernormallyhaswhichofthefollowingdeviceson-chip?

(a)RAM(b)Timer(c)I/O(d)alloftheabove

4.Forwhichofthefollowings,ARMhasdefinedstandard?

(a)RAMsize(b)ROMsize(c)instructionset(d)alloftheabove

SeethefollowingwebsitesforARMmicrocontrollersandARMtrainers:

http://www.ARM.com

http://www.MicroDigitalEd.com

http://www.ARM.com


ProblemsSection1.1:IntroductiontoMicrocontrollers

1.TrueorFalse.Ageneral-purposemicroprocessorhason-chipROM.

2.TrueorFalse.Generally,amicrocontrollerhason-chipROM.

3.TrueorFalse.Amicrocontrollerhason-chipI/Oports.

4.TrueorFalse.AmicrocontrollerhasafixedamountofRAMonthechip.

5.Whatcomponentsareusuallyputtogetherwiththemicrocontrollerontoasinglechip?

6.Intel’sPentiumchipsusedinWindowsPCsneedexternal______and_____chipstostoredataandcode.

7.ListthreeembeddedproductsattachedtoaPC.

8.Whywouldsomeonewanttouseanx86asanembeddedprocessor?

9.Givethenameandthemanufacturerofsomeofthemostwidelyused8-bitmicrocontrollers.

10.InQuestion9,whichonehasthemostmanufacturesources?

11.Inabattery-basedembeddedproduct,whatisthemostimportantfactorinchoosingamicrocontroller?

12.Inanembeddedcontrollerwithon-chipROM,whydoesthesizeoftheROMmatter?

13.Inchoosingamicrocontroller,howimportantisittohavemultiplesourcesforthatchip?

14.Whatdoestheterm“third-partysupport”mean?

Section1.2:TheARMFamilyHistory

15.WhatdoesARMstandfor?

16.Trueorfalse.InARM,architectureshavethesamenamesasfamilies.

17.Trueorfalse.In1990s,ARMwaswidelyusedinmicroprocessorworld.

18.Trueorfalse.ARMiswidelyusedinAppleproducts,likeiPhoneandiPod.

19.Trueorfalse.CurrentlytheMicrosoftWindowsdoesnotsupportARMproducts.

20.Trueorfalse.AllARMchipshavestandardinstructions.

21.Trueorfalse.AllARMchipshavestandardperipherals

22.Trueorfalse.TheARMcorp.alsomanufacturestheARMchip.

23.Trueorfalse.TheARMIPmustbelicensedfromARMcorp.

24.Trueorfalse.AgivenserialcommunicationprogramiswrittenforTIARMchip.It

shouldworkwithoutanymodificationonFreescaleARMchip

25.Trueorfalse.AgivenAssemblylanguageprogramiswrittenforagivenfamilyofARMCortexchip.AnyotherCortexARMchipcanexecutetheprogram.

26.Trueorfalse.Atthepresenttime,ARMhasjustonemanufacturer.

27.WhatisthedifferencebetweentheARMproductsofdifferentmanufacturers?

28.Namesome32-bitmicrocontrollers.

29.WhatisIntel’schallengeindecreasingthepowerconsumptionofthex86?

AnswerstoReviewQuestionsSection1.1

1.True

2.Amicrocontroller-basedsystem

3.d

4.d

5.Itisdedicatedbecauseitdoesonlyonetypeofjob.

6.Embeddedsystemmeansthattheapplication(software)andtheprocessor(hardwaresuchasCPUandmemory)areembeddedtogetherintoasinglesystem.

7.Havingmultiplesourcesforagivenpartmeansyouarenothostagetoonesupplier.Moreimportantly,competitionamongsuppliersbringsaboutlowercostforthatproduct.

Section1.2

1.True

2.False

3.d

4.c

Chapter2:ARMArchitectureandAssemblyLanguageProgramming

CPUsuseregisterstostoredatatemporarily.ToprograminAssemblylanguage,wemustunderstand the registersandarchitectureofagivenCPUand the role theyplay inprocessing data. In Section 2.1we look at the general purpose registers (GPRs) of theARM.WedemonstratetheuseofGPRswithsimpleinstructionssuchasMOVandADD.Memory map and memory access of the ARM are discussed in Sections 2.2 and 2.3,respectively. In Section 2.4 we discuss the status register’s flag bits and how they areaffectedbyarithmeticinstructions.InSection2.5welookatsomewidelyusedAssemblylanguagedirectives,pseudo-code,anddata types related to theARM.InSection2.6weexamineAssembly languageandmachine languageprogramminganddefine termssuchas mnemonics, opcode, operand, and so on. The process of assembling and creating aready-to-runprogramfortheARMisdiscussedinSection2.7.Step-by-stepexecutionofan ARM program and the role of the program counter are examined in Section 2.8.Section2.9examinessomeARMaddressingmodes.ThemeritsofRISCarchitectureareexaminedinSection2.10.Section2.11discussestheKeilIDE.

Section2.1:TheGeneralPurposeRegistersintheARMCPUsuseregisterstostoredatatemporarily.ToprograminAssemblylanguage,we

mustunderstand the registersandarchitectureofagivenCPUand the role theyplay inprocessing data. In this sectionwe look at the general purpose registers (GPRs) of theARMandwedemonstrate the use ofGPRswith simple instructions such asMOVandADD.

ARMmicrocontrollershavemany registers for arithmetic and logicoperations. IntheCPU,registersareusedtostoreinformationtemporarily.Thatinformationcouldbeabyteofdatatobeprocessed,oranaddresspointingtothedatatobefetched.AllofARMregistersare32-bitwide.The32bitsofaregisterareshowninFigure2-1.TheserangefromtheMSB(most-significantbit)D31totheLSB(least-significantbit)D0.Witha32-bitdata type,anydata larger than32bitsmustbebrokeninto32-bitchunksbefore it isprocessed.AlthoughtheARMdefaultdatasizeis32-bitmanyassemblersalsosupportthesinglebit,8-bit,and16-bitdata types,aswewillseeinfuturechapters.The32-bitdatasizeoftheARMisoftenreferredasword.Thisisincontrasttox86CPUinwhichwordisdefined as 16-bit. InARM the 16-bit data is referred to as half-word. ThereforeARMsupportsbyte,half-word(twobyte),andword(fourbytes)datatypes.

Figure2-1:ARMRegistersDataSize

InARMthereare13generalpurposeregisters.TheyareR0–R12.SeeFigure2-2.Alloftheseregistersare32bitswide.

Figure2-2:ARMRegisters

The general purpose registers in ARM are the same as the accumulator in other

microprocessors.Theycanbeusedbyallarithmeticandlogicinstructions.Tounderstandthe use of the general purpose registers,wewill show it in the context of three simpleinstructions:MOV,ADD,andSUB.TheARMcorehasthreespecialfunctionregistersofR13, R14, and R15. We will examine their use in the next section. In some ARMprocessors,wealsohaveshadowregistersinvariousoperatingmodesdesignedtospeeduptheprogramexecutionwhenCPUswitchestask.

ARMInstructionFormat

TheARMCPUusesthetri-partinstructionformatformostinstructions.Oneofthemostcommonformatis:

instructiondestination,source1,source2

Depending on the instruction the source2 can be a register, immediate (constant)value,ormemory.Thedestinationisoftenaregisterorread/writememory.

MOVinstruction

Simply stated, the MOV instruction copies data into register or from register toregister.Ithasthefollowingformats:

MOVRn,Op2;loadRnregisterwithOp2(Operand2).

;Op2canbeimmediate

Op2canbeanimmediate(constant)number#Kwhichisan8-bitvaluethatcanbe0–255indecimal,(00–FFinhex).Op2canalsobearegisterRm.RnorRmareanyoftheregistersR0toR15.Ifweseetheword“immediate”,wearedealingwithaconstantvaluethat must be provided right there with the instruction. Notice the # before immediatevalues.ThefollowinginstructionloadstheR2registerwithavalueof0x25(25inhex).

MOVR2,#0x25;loadR2with0x25(R2=0x25)

ThefollowinginstructionloadstheR1registerwiththevalue0x87(87inhex).

MOVR1,#0x87;copy0x87intoR1(R1=0x87)

ThefollowinginstructionloadsR5withthevalueofR7.

MOVR5,R7;copycontentsofR7intoR5(R5=R7)

Notice the position of the source and destination operands. As you can see, theMOVloadstherightoperandintotheleftoperand.Inotherwords,thedestinationcomesfirst.

Towrite a comment inAssembly languagewe use ‘;’. It is the same as ‘//’ inClanguage,whichcauses theremainderof the lineofcodetobe ignored.For instance, intheaboveexamplestheexpressionsmentionedafter‘;’justexplainthefunctionalityoftheinstructionstoyou,anddonothaveanyeffectsontheexecutionoftheinstructions.

When programming the registers of theARMmicrocontrollerwith an immediatevalue,thefollowingpointsshouldbenoted:

1.Weput#infrontofeveryimmediatevalue.

2.Ifwewanttopresentanumberinhex,weputa0xinfrontofit.Ifweputnothinginfrontofanumber,itisindecimal.Forexample,in“MOVR1,#50”,R1isloadedwith50indecimal,whereasin“MOVR1,#0x50”,R1isloadedwith50inhex(80indecimal).

3.Ifvalues0toFFaremovedintoa32-bitregister,therestofthebitsareassumedtobeallzeros.Forexample, in“MOVR1,#0x5” the resultwillbeR1=0x00000005;thatis,R1=00000000000000000000000000000101inbinary.

4.Movinganimmediatevaluelargerthan255(FFinhex)intotheregisterwillcauseanerror.

Note!

Wecannotloadvalueslargerthan0xFF(255)intoregistersR0toR12usingtheMOVinstruction.Forexample,thefollowinginstructionisnotvalid:

MOVR5,#0x999999;invalidinstruction

ThereasonisthefactthatalthoughtheARMinstructionis32-bitwide,only8bitsofMOVinstructioncanbeusedasanimmediatevaluewhichcantakevaluesnotlargerthan0xFF(255).

ADDinstruction

TheADDinstructionhasthefollowingformat:

ADDRd,Rn,Op2;ADDRntoOp2andstoretheresultinRd

;Op2canbeImmediatevalue#K(Kisbetween0and255)

;orRegisterRm

TheADDinstructiontellstheCPUtoaddthevalueofRntoOp2andputtheresultinto the Rd (destination) register. Aswementioned before, Op2 can be an immediatevalue#Kbetween0–255indecimal(00–FFinhex)oraregisterRm.Toaddtwonumberssuchas0x25and0x34,onecandoanyofthefollowing:



ADDR5,R1,R7;addvalueR7toR1andputitinR5

;(R5=R1+R7)

or

MOVR1,#0x25;load(copy)0x25intoR1(R1=0x25)

ADDR5,R1,#0x34;add0x34toR1andputitinR5

;(R5=R1+0x34)

ExecutingtheabovelinesresultsinR5=0x59(0x25+0x34=0x59)

SUBinstruction

TheSUBinstructionislikeADDinstructionformat.ItsubtractsOp2fromRnandputtheresultinRd(destination)

SUBRd,Rn,Op2;Rd=Rn–Op2

Tosubtracttwonumberssuchas0x34and0x25,onecandothefollowing:

MOVR1,#0x34;load(copy)0x34intoR1(R1=0x34)

SUBR5,R1,#0x25;R5=R1–0x25(R1=0x34–0x25)

Theoldformat

NoticethatinmostofinstructionslikeADDandSUB,RncanbeomittedifRdandRnarethesame.ThisformatisnolongerrecommendedbyUnifiedAssemblerLanguage.

Forexample,eachpairofthefollowinginstructionsarethesame.

SUBR1,R1,#0x25;R1=R1-0x25

SUBR1,#0x25;R1=R1-0x25

SUBR1,R1,R2;R1=R1-R2

SUBR1,R2;R1=R1-R2

ADDR1,R1,#0x25;R1=R1+0x25

ADDR1,#0x25;R1=R1+0x25

ADDR1,R1,R2;R1=R1+R2

ADDR1,R2;R1=R1+R2

Figure2-3showsthegeneralpurposeregisters(GPRs)andtheALUinARM.TheeffectofarithmeticandlogicoperationsonthestatusregisterwillbediscussedinSection2.4.InTable2-1youseesomeoftheARM ALUinstructions.

Figure2-3:ARMRegistersandALU

Instruction Description

ADDRd,Rn,Op2* ADDRntoOp2andplacetheresultinRd

ADCRd,Rn,Op2 ADDRntoOp2withCarryandplacetheresultinRd

ANDRd,Rn,Op2 ANDRnwithOp2andplacetheresultinRd

BICRd,Rn,Op2 ANDRnwithNOTofOp2andplacetheresultinRd

CMPRn,Op2 CompareRnwithOp2andsetthestatusbitsofCPSR**

CMNRn,Op2 CompareRnwithnegativeofOp2andsetthestatusbits

EORRd,Rn,Op2 ExclusiveORRnwithOp2andplacetheresultinRd

MVNRd,Op2 PlaceNOTofOp2inRd

MOVRd,Op2 MOVE(Copy)Op2toRd

ORRRd,Rn,Op2 ORRnwithOp2andplacetheresultinRd

RSBRd,Rn,Op2 SubtractRnfromOp2andplacetheresultinRd

RSCRd,Rn,Op2 SubtractRnfromOp2withcarryandplacetheresultinRd

SBCRd,Rn,Op2 SubtractOp2fromRnwithcarryandplacetheresultinRd

SUBRd,Rn,Op2 SubtractOp2fromRnandplacetheresultinRd

TEQRn,Op2 Exclusive-ORRnwithOp2andsetthestatusbitsofCPSR

TSTRn,Op2 ANDRnwithOp2andsetthestatusbitsofCPSR

*Op2canbeanimmediate8-bitvalue#Kwhichcanbe0–255indecimal,(00–FFinhex).Op2canalsobearegisterRm.Rd,RnandRmareanyofthegeneralpurposeregisters

**CPSRisdiscussedlaterinthischapter

***Theinstructionsarediscussedindetailinthenextchapters

Table2-1:ALUInstructionsUsingGPRs

ReviewQuestions

1.Writeinstructionstomovethevalue0x34intotheR2register.

2.Writeinstructionstoaddthevalues0x16and0xCD.PlacetheresultintheR1register.

3.Trueorfalse.NovaluecanbemoveddirectlyintotheGPRs.

4.Whatisthelargesthexvaluethatcanbemovedintoa32-bitregisterusingMOVinstruction?Whatisthedecimalequivalentofthathexvalue?

5.AlloftheregistersintheARMare_____-bit.

Section2.2:TheARMMemoryMapInthissectionwediscussthememorymapforARMfamilymembers.

TheSpecialFunctionRegistersinARM

InARM theR13,R14,R15, andCPSR (current program status register) registersare called SFRs (special function registers) since each one is dedicated to a specificfunction.Agivenspecialfunctionregisterisdedicatedtospecificfunctionsuchasstatusregister,programcounter,stackpointer,andsoon.ThefunctionofeachSFRisfixedbytheCPUdesigneratthetimeofdesignbecauseitisusedforcontrolofthemicrocontrollerorkeepingtrackofspecificCPUstatus.ThefourSFRsofR13,R14,R15,andCPSRplayanextremelyimportantrole inARM.TheR13issetasideforstackpointer.TheR14isdesignatedaslinkregisterwhichholdsthereturnaddresswhentheCPUcallsasubroutineandtheR15istheprogramcounter(PC).ThePC(programcounter)pointstotheaddressof thenext instructiontobeexecutedaswewillseeinnextsection.TheCPSR(currentprogramstatusregister)isusedforkeepingconditionflagsamongotherthings,aswewillsee in Section 2.4. In contrast to SFRs, the GPRs (R0-R12) do not have any specificfunction and are used for storing general data. For this reason some ARM instructionformats(theThumb)haveonlyR0-7buteveryvariationofARMchiphasR13-R15SFRs.The Thumb instruction format is designed to compete with the 8- and 16-bitmicrocontrollersandincreasecodedensity.

ProgramCounterintheARM

OneofthemostimportantregisterintheARMmicrocontrolleristhePC(programcounter).Aswementionedearlier,theR15istheprogramcounter.TheprogramcounterisusedbytheCPUtopointtotheaddressofthenextinstructiontobeexecuted.AstheCPUfetches theopcodefromtheprogrammemory, theprogramcounter is incrementedautomatically to point to the next instruction.Thewider the programcounter, themorememorylocationsaCPUcanaccess.Thatmeansthata32-bitprogramcountercanaccessamaximumof4G(232=4G)bytesofprogrammemorylocations.

InARMmicrocontrollerseachmemorylocationisabytewide.Inthecaseofa32-bit program counter, the code space is 4G bytes (232 = 4G), which occupies the0x00000000–0xFFFFFFFFaddressrange.TheprogramcounterintheARMfamilyis32bits wide. This means that the ARM family can access addresses 0x00000000 to0xFFFFFFFF,atotalof4Gbytesoflocations(memoryspaces).Althoughthis4Gbytesofmemory spacecanbeallocated toon-chiporoff-chipmemory;however, at the timeofthiswriting,noneofthemembersoftheARMmicrocontrollerfamilyhavetheentire4Gbytesofon-chipmemorypopulated.SeeTable2-2.

Company Device Flash(KBytes) RAM(KBytes) I/OPins

Atmel AT91SAM7X512 512 128 62

NXP LPC2367 512 58 70

ST STR750FV2 256 16 72

TI TMS470R1A256 256 12 49

Freescale MK10DX256VML7 256 64 74

Table2-2:On-chipMemorySizeforsomeARMChips

MemoryspaceallocationintheARM

TheARMhas4Gbytesofdirectlyaccessiblememoryspace.Thismemoryspacehasaddresses0to0xFFFFFFFF.The4Gbytesofmemoryspacecanbedividedintofivesections.Theyareasfollows:

1.On-chipperipheralandI/Oregisters:ThisareaisdedicatedtogeneralpurposeI/O(GPIO)andspecialfunctionregisters(SFRs)ofperipheralssuchastimers,serialcommunication,ADC,andsoon.Inotherwords,ARMusesmemory-mappedI/O.See Chapter 0 for discussion of memory-mapped I/O. The function and addresslocationofeachSFRisfixedbythechipvendoratthetimeofdesignbecauseitisused forport registersofperipherals.Thenumberof locations set aside forGPIOregistersandSFRsdependsonthepinnumbersandperipheralfunctionssupportedbythatchip.Thatnumbercanvaryfromchiptochipevenamongmembersofthesamefamily fromthesamevendor.Due to the fact thatARMdoesnotdefine thetype and number of I/O peripherals one must not expect to have same addresslocationsfortheperipheralregistersamongvariousvendors.

2. On-chipdataSRAM:ARAMspace ranging froma fewkilobytes to severalhundredkilobytesissetasidemainlyfordatastorage.ThedataRAMspaceisusedfordatavariablesandstackandisaccessedbythemicrocontrollerinstructions.Thedata RAM space is read/write memory used by the CPU for storage of datavariables, scratch pad, and stack. The ARM microcontrollers’ data SRAM sizeranges from 2K bytes to several thousand kilobytes depending on the chip. Evenwithinthesamefamily,thesizeofthedataSRAMspacevariesfromchiptochip.AlargerdataSRAMsizemeansmoredifficultiesinmanagingtheseRAMlocationsifyou use Assembly language programming. In today’s high-performancemicrocontroller, however, with over a thousand bytes of data RAM, the job ofmanagingthemishandledbytheCcompilers.Indeed,theCcompilersaretheveryreasonweneedalargedataRAMsinceitmakesiteasierforCcompilerstostoreparametersandallowsthemtoperformtheirjobsmuchfaster.TheamountandthelocationoftheSRAMspacevaryfromchiptochipintheARMchips.AsectionofthedataRAMspaceisusedbystackaswewillseeinChapter6.AlthoughmanyoftheARMmicrocontrollersusedinembeddedproductstheSRAMspaceisusedfordata;onecanalsobuyordesignanARM-basedsysteminwhichtheRAMspaceisusedforbothdataandprogramcodes.Thisiswhatweseeinthex86PCs.Insuchsystemsnormallyoneconnects theARM CPUtoexternalDRAMand theDRAMmemoryisusedforbothcodeanddata.MicrosoftWindows8usessuchasystemforARM-basedTabletcomputers.

3.On-chipEEPROM:Ablockofmemoryfrom1Kbytestoseveralthousandbytesis set aside forEEPROMmemory.Theamount and the locationof theEEPROMspace vary from chip to chip in the ARM microcontrollers. Although in someapplicationstheEEPROMisusedforprogramcodestorage,itisusedmostoftenforsavingcriticaldata.NotallARMchipshaveon-chipEEPROM.

4.On-chipFlashROM:Ablockofmemoryfromafewkilobytestoseveralhundredkilobytesissetasideforprogramspace.Theprogramspaceisusedfortheprogramcode.Intoday’sARM microcontrollerchips,thecodeROMspaceisofFlashtypememory.The amount and the location of the codeROMspace vary fromchip tochip in the ARM products. See Table 2-2 and Examples 2-1 and 2-2. The FlashmemoryofcodeROMisunderthecontrolofthePC(programcounter).ThecodeROMmemorycanalsobeusedforstorageofstaticfixeddatasuchasASCIIdatastringsandlook-uptables.

Example2-1

AgivenARMchiphasthefollowingaddressassignments.Calculatethespaceandtheamountofmemorygiventoeachsection.

(a)Addressrangeof0x00100000–0x00100FFFforEEPROM

(b)Addressrangeof0x40000000–0x40007FFFforSRAM

(c)Addressrangeof0x00000000–0x0007FFFFforFlash

(d)Addressrangeof0xFFFC0000–0xFFFFFFFFforperipherals

Solution:

(a)Withaddressspaceof0x00100000to00100FFF,wehave00100FFF–00100000=0FFFbytes.Converting0FFFtodecimal,weget4,095+1,whichisequalto4Kbytes.

(b)Withaddressspaceof0x40000000to0x40007FFF,wehave40007FFF–40000000=7FFFbytes.Converting7FFFtodecimal,weget32,767+1,whichisequalto32Kbytes.

(c)Withaddressspaceof0000to7FFFF,wehave7FFFF–0=7FFFFbytes.Converting7FFFFtodecimal,weget524,287+1,whichisequalto512Kbytes.

(d)WithaddressspaceofFFFC0000toFFFFFFFF,wehaveFFFFFFFF–FFFC0000=3FFFFbytes.Converting3FFFFtodecimal,weget262,143+1,whichisequalto256Kbytes.

SeeFigure2-4.

Example2-2

FindtheaddressspacerangeofeachofthefollowingmemoryofanARMchip:

(a)2KBofEEPROMstartingataddress0x80000000

(b)16KBofSRAMstartingataddress0x90000000

(c)64KBofFlashROMstartingataddress0xF0000000

Solution:

(a)With2Kbytesofon-chipEEPROMmemory,wehave2048bytes(2×1024=2048).Thismapstoaddresslocationsof0x80000000to0x800007FF.Noticethat0isalwaysthefirstlocation.

(b)With16Kbytesofon-chipSRAM memory,wehave16,384bytes(16×1024=16,384),and16,384locationsgives0x90000000–0x90003FFF.

(c)With64Kwehave65,536bytes(64×1024=65,536),therefore,thememoryspaceis0xF0000000to0xF000FFFF.

Figure2-4:AnExampleofARMMemoryAllocation

5. Off-chipDRAM space: A DRAMmemory ranging from fewmegabytes toseveralhundredmegabytescanbe implemented for externalmemoryconnection.Althoughmany of theARMmicrocontrollers used in embedded products use theon-chipSRAMfordata, one can alsodesign anARM-based system inwhich theRAMisusedforbothdataandprogramcodes.Thisiswhatweseeinthex86PCs.In such systems one connects theARM CPU to externalDRAM and theDRAMmemoryisusedforbothcodeanddata,justlikethex86PCs.ManyARMvendors

are pushing theARM11 chip for the high-end of themarket such as servers anddatabasecomputers. In anARM11-based server computers, theexternal (off-chip)DRAM is used and managed by the operating system while on-chip Flash,EEPROM, and SRAMmemories are used for BIOS (basic input output system),POST (power on self test), and CPU scratch pad, respectively. In such cases thesystem is not different from x86 computers currently in use, except it usesARMCPU instead of a Pentium chip from Intel or x86 from AMD. The MicrosoftWindows8usesARMmotherboardwithoff-chipDRAM

Notice thefollowingdifferencesamong theon-chipFlashROM,dataSRAM,andEEPROMmemoriesinARM microcontrollersusedforembeddedproducts:

a)The data SRAM is used by theCPU for data variables and stack,whereas theEEPROMsareconsideredtobememorythatonecanalsoaddexternallytothechip.Inotherwords, whilemanyARMmicrocontroller chips have no EEPROMmemory, it isveryunlikelyforanARMmicrocontrollertohavenoon-chipdataSRAM.

b)Theon-chipFlashROMisusedforprogramcode,while theEEPROMisusedmostoftenforcriticalsystemdatathatmustnotbelostifpoweriscutoff.RememberthatdataSRAMisvolatilememoryanditscontentsarelostifthepowertothechipiscutoff.SincevolatiledataSRAMisused fordynamicvariables (constantlychangingdata)andstack.WeneedEEPROMmemorytosecurecriticalsystemdatathatdoesnotchangeveryoftenandwillnotbelostintheeventofpowerfailure.

c)Theon-chipFlashROMisprogrammedanderasedinblocksize.Theblocksizeis8,16,32,or64bytesormoredependingonthechiptechnology.That isnot thecasewith EEPROM, since the EEPROM is byte programmable and erasable. Both theEEPROMandFlashmemorieshave limitednumberoferase/writecycles,whichcanbe100,000 or more. While all semiconductor memories have unlimited number of reads(accesses),thenumberoftimesthatwecaneraseandwritetotheFlashandEEPROMarelimitedtoaround100,000timesatthetimeofthiswriting.Asthisnumberincreases,thelikelihoodthatwewillhaveharddrivemadeoutofFlashmemorywillalsoincrease.WehavealreadyseenhowtheFlashstickmemoryhasreplacedthefloppydriveswhichweresocommonintoearly2000s.

MemorymappedI/OintheARM

TraditionalCPUssuchasx86had twodistinct spaces: the I/Ospaceandmemoryspace.Inx86,whilealloftheI/OportsareaccessedusingINandOUTinstructions,thememoryaddressspaceisaccessedusingtheMOVinstruction.IntheARMCPUwehaveonlyonespaceanditismemoryspaceanditcanbeashighas4Gigabytes.TheARMusesthis4GigabytesforbothmemoryandI/Ospace.ThismappingoftheI/OportstomemoryspaceiscalledmemorymappedI/OandwasdiscussedinChapter0.

ReviewQuestions

1.Trueorfalse.TheGPRregistersareusedforstoringdata.

2.TheR0-R12registersarecalled______.

3.TheGPRregistersinARMare_____-bit.

4.TheR13-R15registersarecalled_____.

5.TheSFRregistersinARMare______-bit.

Section2.3:LoadandStoreInstructionsinARMTheinstructionswehaveusedsofarworkedwiththeimmediate(constant)valueof

KandtheGPRs.TheyalsousedtheGPRsastheirdestination.Wesawsimpleexamplesof using MOV and ADD earlier in Section 2.1. The ARM allows direct access to alllocations in the memory. In this section we show the instructions accessing variouslocations of the memory. This is one of the most important sections in the book formastering the topic of ARM Assembly language programming. Before we embark onstudying the load and store instructions of theARM,wemust note the fact that all theinstructions of theARM are 32-bit wide. In other words, every instruction of ARM isfixedat32-bit.Aswewillseeinlatersection,thefixedsizeinstructionisoneofthemostimportantcharacteristicsofRISCarchitecture.Incaseswherethereisnoneedforallthe32-bit,theARMhasaddedzerotomaketheinstructionfixedat32-bit.

LDRRd,[Rx]instruction

LDRRd,[Rx];loadRdwiththecontentsoflocationpointed

;tobyRxregister.Rxisanaddressbetween

;0x00000000to0xFFFFFFFF

TheLDRinstruction tells theCPUto load(bring in)oneword(32-bitor4bytes)fromabaseaddresspointedtobyRxintotheGPR.Afterthisinstructionisexecuted,theRdwillhavethesamevalueasfourconsecutivelocationsinthememory.Obviouslysinceeachmemorylocationcanholdonlyonebyte(ARMisabyteaddressableCPU),andourGPR is 32-bit, the LDR will bring in 4 bytes of data from 4 consecutive memorylocations.ThelocationscanbeintheSRAMoraFlashmemory.Forexample,the“LDRR2,[R5]” instructionwill copy the contents ofmemory locations pointed to byR5 intoregisterR2.SincetheR2registeris32-bitwide,itexpectsa32-bitoperandintherangeof0x00000000 to 0xFFFFFFFF.Thatmeans theR5 register gives the base address of thememory inwhich it holds the data. Therefore if R5=0x80000, theCPUwill fetch intoregisterR2,thecontentsofmemorylocations0x80000,0x80001,0x80002,and0x80003.

Thefollowing instruction loadsR7with thecontentsof location0x40000200.SeeFigure2-5.

;assumeR5=0x40000200

LDRR7,[R5];loadR7withthecontentsoflocations

;0x40000200-0x40000203

Figure2-5:ExecutingtheLDRInstruction

STRRx,[Rd]instruction

STRRx,[Rd];storeregisterRxintolocationspointedtobyRd

TheSTRinstructiontellstheCPUtostore(copy)thecontentsoftheGPRtoabaseaddress location pointed to by the Rd register. Notice that the source register of STRinstruction isplacedbefore thedestination register.ObviouslysinceGPRis32-bitwide(4-byte)weneed four consecutivememory locations to store the contents ofGPR.ThememorylocationsmustbewritablesuchasSRAM.SeeFigure2-6.The“STRR3,[R6]”instruction will copy the contents of R3 into locations pointed to by R6. Locations0x40000200 through 0x40000203 of the SRAMmemory will have the contents of R3sinceR6=0x40000200.

Figure2-6:ExecutingtheSTRInstruction

ThefollowinginstructionstoresthecontentsofR5intolocationspointedtobyR1.Assume0x40000340isanaddressofinternalRAMlocationsandheldbyregisterR1.


STRR5,[R1];storeR5intolocationspointedtobyR1.

LDRBRd,[Rx]instruction

LDRBRd,[Rx];loadRdwiththecontentsofthelocation

;pointedtobyRxregister.

The LDRB instruction tells the CPU to load (copy) one byte from an addresspointed tobyRxinto the lowerbyteofRd.After this instruction isexecuted, the lower

byte ofRdwill have the same value asmemory location pointed to byRx. Itmust benoted that theunusedportion (theupper24bits)of theRd registerwillbeall zeros, asshowninFigure2-7.

Figure2-7

LDRvs.LDRB

Aswementioned earlier,we canuse theLDR instruction to copy the contentsoffourconsecutivememorylocationsintoa32-bitGPR.Therearesituationsthatwedonotneed to bring 4 bytes of data intoGPR.AnUART register is such a case. TheUARTregistersaregenerally8-bitand takeonlyonememoryspace location(memorymappedI/O).UsingLDRB,we canbring intoGPRa single byte of data fromUART registers.Thisisawidelyusedinstructionforaccessingthe8-bitI/Oandperipheralports.

STRBRx,[Rd]instruction

STRBRx,[Rd];storethebyteinregisterRxinto

;locationpointedtobyRd

The STRB instruction tells the CPU to store (copy) the byte value in Rx to anaddress location pointed to by the Rd register. After this instruction is executed, thememorylocationspointedtobytheRdwillhavethesamebyteasthelowerbyteoftheRx,asshowninFigure2-8.

Figure2-8

ThefollowingprogramfirstloadstheR1registerwithvalue0x55,thenstoresthisvalueintolocation0x40000100:


MOVR1,#0x55;R1=55(inhex)

STRBR1,[R5];copyR1locationpointedtobyR5

GeneralpurposeI/O(GPIO)arepartofthespecialfunctionregistersinthememory-mapped I/O. They are connected to the I/O pins of the ARMmicrocontroller. See thedatasheetofyourARMmicrocontroller.WecanalsostorethecontentsofaGPRintoanylocationintheSRAMregionofthedataspace.SeeExamples2-3and2-4.

Example2-3

StatethecontentsofRAMlocations0x92to0x96afterthefollowingprogramisexecuted:

MOVR1,#0x99;R1=0x99

MOVR6,#0x92;R6=0x92

STRBR1,[R6];storeR1intolocationpointedtobyR6

;(location0x92)

ADDR6,R6,#1;R6=R6+1

MOVR1,#0x85;R1=0x85


;(location0x93)

ADDR6,R6,#1;R6=R6+1

MOVR1,#0x3F;R1=0x3F


ADDR6,R6,#1;R6=R6+1

MOVR1,#0x63;R1=0x63


ADDR6,R6,#1;R6=R6+1

MOVR1,#0x12;R1=0x12

STRBR1,[R6]

Solution:

AftertheexecutionofSTRBR1,[R6]datamemorylocation0x92hasvalue0x99;

aftertheexecutionofSTRBR1,[R6]datamemorylocation0x93hasvalue0x85;

aftertheexecutionofSTRBR1,[R6]datamemorylocation0x94hasvalue0x3F;andsoon,asshowninthechart.

Address Data

0x92 0x99

0x93 0x85

0x94 0x3F

0x95 0x63

0x96 0x12

Example2-4

StatethecontentsofR2,R1,andmemorylocation0x20afterthefollowingprogram:

MOVR2,#0x5;loadR2with5(R2=0x05)

MOVR1,#0x2;loadR1with2(R1=0x02)



MOVR5,#0x20;R5=0x20


Solution:

TheprogramloadsR2withvalue5.ThenitloadsR1withvalue2.ThenitaddstheR1registertoR2twice.Attheend,itstorestheresultinlocation0x20ofmemory.

AfterMOVR2,#0x05

Location Data

R2 5

R1

0x20

AfterMOVR1,#0x02

Location Data

R2 5

R1 2

0x20

AfterADDR2,R1,R1

Location Data

R2 7

R1 2

0x20

AfterADDR2,R1,R1

Location Data

R2 9

R1 2

0x20

AfterSTRB[R5],R2

Location Data

R2 9

R1 2

0x20 9

STRvs.STRB

Aswementionedearlier,wecanusetheSTRinstructiontocopythecontentsof32-bitGPR into four consecutivememory locations. The I/O ports are generally 8-bit andtakeonlyonememoryspacelocation(memorymappedI/O).UsingSTRB,wecansendabyteofdata fromGPR tomemory locationsuchasan I/Oport.Again, this isawidelyusedinstructionforaccessingthe8-bitI/Oandperipheralports.

LDRHRd,[Rx]instruction

LDRHRd,[Rx];loadRdwiththehalf-wordpointed

;tobyRxregister

TheLDRH instruction tells theCPU to load (copy) half-word (16-bit or 2 bytes)from a base address pointed to byRx into the lower 16-bits ofRdRegister.After thisinstructionisexecuted,thelower16-bitofRdwillhavethesamevalueastwoconsecutivelocations in the memory pointed to by base address of Rx. It must be noted that theunusedportion(theupper16bits)oftheRdregisterwillbeallzeros,asshowninFigure2-9.

Figure2-9

Table2-3comparesLDRB,LDRH,andLDR.

DataSize Bits Decimal Hexadecimal Loadinstructionused

Byte 8 0–255 0-0xFF LDRB

Half-word 16 0–65535 0-0xFFFF LDRH

Word 32 0–232-1 0-0xFFFFFFFF LDR

Table2-3:UnsignedDataRangeinARMandassociatedLoadInstructions

STRHRx,[Rd]instruction

STRHRx,[Rd];storehalf-word(2-byte)inregisterRx

;intolocationspointedtobyRd

TheSTRHinstructiontellstheCPUtostore(copy)thelower16-bitcontentsoftheRxtoanaddresslocationpointedtobytheRdregister.Afterthisinstructionisexecuted,thememorylocationspointedtobytheRdwillhavethesamevalueasthelower16-bitofRxRegister.Thelocationsarepartofthedataread/writememoryspacesuchason-chipSRAM.Forexample,the“STRHR3,[R6]”instructionwillcopythe16-bitlowercontentsofR3intotwoconsecutivelocationspointedtobybaseregisterR6.AsyoucanseeinFigure2-10,locations0x2000and0x2001oftheSRAMmemorywillhavethecontentsofthelowerhalfwordofR3sinceR6=0x2000.

Figure2-10

InTable2-4youseeacomparisonbetweenSTRB,STRH,andSTR.


Byte 8 0–255 0-0xFF STRB

Half-word 16 0–65535 0-0xFFFF STRH

Word 32 0–232-1 0-0xFFFFFFFF STR

Table2-4:UnsignedDataRangeinARMandassociatedStoreInstructions

ReviewQuestions

1.Trueorfalse.No32-bitvaluecanbeloadeddirectlyintointernalR0-R12.

2.Writeinstructionstoloadvalue0x95intolocationwithaddress0x20.

3.WriteinstructionstomovethecontentsofR2tomemorylocationpointedtobyR8.

4.Writeinstructionstoloadvaluesfrommemorylocations0x20–0x23toR4register.

5.Whatisthelargesthexvaluethatcanbemovedintoasinglelocationinthedatamemory?Whatisthedecimalequivalentofthehexvalue?

6.“LDRR6,[R3]”putstheresultin_____.

7.Whatdoes“STRBR1,[R2]”do?

8.Whatisthelargesthexvaluethatcanbemovedintofourconsecutivelocationsinthedatamemory?Whatisthedecimalequivalentofthehexvalue?

Section2.4:ARMCPSR(CurrentProgramStatusRegister)Likeallothermicroprocessors, theARMhasa flag register to indicate arithmetic

conditions such as the carry bit. The flag register in the ARM is called the currentprogramstatusregister(CPSR).Inthissection,wediscussvariousbitsofthisregisterandprovidesomeexamplesofhowitisaltered.Chapters3and4showhowtheflagbitsofthestatusregisterareused.

ARMcurrentprogramstatusregister

The status register is a 32-bit register. See Figure 2-11 for the bits of the statusregister.ThebitsC,Z,N,andVarecalledconditionalflags,meaningthat theyindicatesomeconditionsthatresultafteraninstructionisexecuted.Eachoftheconditionalflagscanbeusedtoperformaconditionalbranch(jump),aswewillseeinChapter4.

Figure2-11:CPSR(CurrentProgramStatusRegister)

The following is a brief explanationof the flagbits of the current program statusregister(CPSR).Theimpactofinstructionsonthisregisteristhendiscussed.

C,thecarryflag

This flag is set whenever there is a carry out from the D31 bit. This flag bit isaffectedaftera32-bitadditionorsubtraction.Chapter4showshowthecarryflagisused.

Z,thezeroflag

Thezeroflagreflects theresultofanarithmeticor logicoperation. If theresult iszero,thenZ=1.Therefore,Z=0iftheresultisnotzero.SeeChapter4toseehowweusetheZflagforlooping.

N,thenegativeflag

BinaryrepresentationofsignednumbersusesD31asthesignbit.Thenegativeflagreflectstheresultofanarithmeticoperation.IftheD31bitoftheresultiszero,thenN=0andtheresultispositive.IftheD31bitisone,thenN=1andtheresultisnegative.Thenegative and V flag bits are used for the signed number arithmetic operations and arediscussedinChapter5.

V,theoverflowflag

This flag is set whenever the result of a signed number operation is too large,causingthehigh-orderbittooverflowintothesignbit.Ingeneral,thecarryflagisusedtodetect errors inunsignedarithmeticoperationswhile theoverflow flag isused todetecterrorsinsignedarithmeticoperations.TheVandNflagbitsareusedforsignednumberarithmeticoperationsandarediscussedinChapter5.

TheTflagbitisusedtoindicatetheARMisinThumbstate.TheIandFflagsareusedtoenableordisabletheinterrupt.SeetheARMmanual.

Ssuffixandthestatusregister

MostofARMinstructionscanaffectthestatusbitsofCPSRaccordingtotheresult.Ifwe need an instruction to update the value of status bits inCPSR,we have to put Ssuffixattheendofinstructions.Thatmeans,forexample,ADDSinsteadofADDisused.

ADDinstructionandthestatusregister

NextweexaminetheimpactoftheSUBSandADDSinstructionsontheflagbitsCandZofthestatusregister.Someexamplesshouldclarifytheirmeanings.AlthoughalltheflagbitsC,Z,V,andNareaffectedbytheADDSandSUBSinstruction,wewillfocusonflagsCandZfornow.TheotherflagbitsarediscussedinChapter5,becausetheyrelateonlytosignednumberoperations.ExamineExample2-5toseetheimpactoftheADDSinstruction on selected flag bits. See also Example 2-6 to see the impact of the SUBSinstructiononselectedflagbits.

Example2-5

ShowthestatusoftheCandZflagsaftertheadditionof

a)0x0000009Cand0xFFFFFF64inthefollowinginstruction:

;assumeR1=0x0000009CandR2=0xFFFFFF64

ADDSR2,R1,R2;addR1toR2andplacetheresultinR2

b)0x0000009Cand0xFFFFFF69inthefollowinginstruction:

;assumeR1=0x0000009CandR2=0xFFFFFF69

ADDSR2,R1,R2;addR1toR2andplacetheresultinR2

Solution:

a)

0x0000009C 00000000000000000000000010011100

+0xFFFFFF64

+11111111111111111111111101100100

0x100000000 100000000000000000000000000000000

C=1becausethereisacarrybeyondtheD31bit.

Z=1becausetheR2(theresult)hasvalue0initaftertheaddition.

b)

0x0000009C 00000000000000000000000010011100

+0xFFFFFF69

+11111111111111111111111101101001

0x100000005 100000000000000000000000000000101

C=1becausethereisacarrybeyondtheD31bit.

Z=0becausetheR2(theresult)doesnothavevalue0initaftertheaddition.(R2=0x00000005)

Example2-6

ShowthestatusoftheZflagduringtheexecutionofthefollowingprogram:

MOVR2,#4;R2=4

MOVR3,#2;R3=2

MOVR4,#4;R4=4

SUBSR5,R2,R3;R5=R2-R3(R5=4-2=2)

SUBSR5,R2,R4;R5=R2-R4(R5=4-4=0)

Solution:

TheZflagisraisedwhentheresultiszero.Otherwise,itiscleared(zero).Thus:

After ValueofR5 Zflag

SUBSR5,R2,R3 2 0

SUBSR5,R2,R4 0 1

Notallinstructionsaffecttheflags

SomeinstructionsaffectallthefourflagbitsC,Z,V,andN(e.g.,ADDS).Butsomeinstructions affectno flagbits at all.Thebranch instructions are in this category.Someinstructionsaffectonlysomeof theflagbits.The logic instructions(e.g.,ANDS)are inthiscategory.

Table 2-5 shows the instructions and the flag bits affected by them.AppendixAprovidesacompletelistofalltheinstructionsandtheirassociatedflagbits.

Instruction FlagsAffected

ANDS C,Z,N

ORRS C,Z,N

MOVS C,Z,N

ADDS C,Z,N,V

SUBS C,Z,N,V

B Noflags

NotethatwecannotputSafterBinstruction.

Table2-5:FlagBitsAffectedbyDifferentInstructions

Flagbitsanddecisionmaking

Thereareinstructionsthatwillmakeaconditionaljump(branch)basedonthestatusof the flag bits. Table 2-6 provides some of these instructions. Chapter 4 discusses theconditionalbranchinstructionsandhowtheyareused.

Instruction FlagsAffected

BCS BranchifC=1

BCC BranchifC=0

BEQ BranchifZ=1

BNE BranchifZ=0

BMI BranchifN=1

BPL BranchifN=0

BVS BranchifV=1

BVC BranchifV=0

Table2-6:ARMBranch(Jump)InstructionsUsingFlagBits

ReviewQuestions

1.TheflagregisterintheARMiscalledthe________.

2.WhatisthesizeoftheflagregisterintheARM?

3.FindtheCandZflagbitsforthefollowingcode:

;assumeR2=0xFFFFFF9F


ADDSR2,R1,R2

4.FindtheZflagbitforthefollowingcode:

;assumeR7=0x22

;assumeR3=0x22

ADDSR7,R3,R7

5.FindtheCandZflagbitsforthefollowingcode:

;assumeR2=0x67

;assumeR1=0x99

ADDSR2,R1,R2

Section2.5:ARMDataFormatandDirectivesInthissectionwelookatsomewidelyuseddataformatsanddirectivessupportedby

theARMassembler.

ARMdatatype

ARMhasfourdatatypes.Theyarebit,byte(8-bit),half-word(16-bit)andword(32bit).DuetothefactthatARMregistersare32-bititisthejobofprogrammer/compilertobreakdowndatalargerthan32bitstobeprocessedbytheCPU.ThedatatypesusedbytheARMcanbepositiveornegative.AdiscussionofsignednumbersisgiveninChapter5.

Dataformatrepresentation

There are several ways to represent a byte of data in the ARM assembler. Thenumberscanbeinhex,binary,decimal,orASCIIformats.Thefollowingareexamplesofhoweachworks.

Hexnumbers

TorepresentHexnumbersinanARMassemblerweput0x(or0X)infrontofthenumberlikethis:

MOVR1,#0x99

Hereareafewlinesofcodethatusethehexformat:

MOVR2,#0x75;R2=0x75

MOVR1,#0x11

ADDR2,R1,R2;R2=R1+R2=0x75+0x11=0x86

Decimalnumbers

ToindicatedecimalnumbersinsomeARMassemblerssuchasKeilwesimplyusethedecimal(e.g.,12)andnothingbeforeorafterit.Herearesomeexamplesofhowtouseit:

MOVR7,#12;R7=00001100or0Cinhex

MOVR1,#32;R1=32=0x20

Binarynumbers

To represent binary numbers in an ARM assembler we put 2_ in front of thenumber.Itisasfollows:

MOVR6,#2_10011001;R6=10011001inbinaryor99inhex

Numbersinanybasebetween2and9

To indicate a number in any base n between 2 and 9 in an ARM assembler wesimplyusethen_infrontofit.Herearesomeexamplesofhowtouseit:

MOVR7,#8_33;R7=33inbase8or011011inbinaryformat

MOVR6,#2_10011001;R6=10011001inbase2or99inhex

ASCIIcharacters

TorepresentASCIIdatainanARMassemblerweusesinglequotesasfollows:

LDRR3,#‘2’;R3=00110010or32inhex(SeeAppendixF)

This is the same as other assemblers such as the 8051 and x86. Here is anotherexample:

LDRR2,#‘9’;R2=0x39,whichishexnumberforASCII‘9’

Torepresentastring,doublequotesareused;andfordefiningASCIIstrings(morethanonecharacter),weusetheDCBdirective.

Assemblerdirectives

While instructions tell the CPU what to do, directives (also called pseudo-instructions) give directions to the assembler. For example, the MOV and ADDinstructionsarecommandstotheCPU,butEQU,END,andENTRYaredirectivestotheassembler.The followingsectionpresentssomewidelyuseddirectivesof theARMandhow they are used. The directives help us develop our program easier and make ourprogramlegible(morereadable).Table2-7showssomeassemblerdirectives.

Directive Description

AREA Instructstheassemblertoassembleanewcodeordatasection

END Informstheassemblerthatithasreachedtheendofasourcefile.

ENTRY Declaresanentrypointtoaprogram.

EQU Givesasymbolicnametoanumericconstant,aregister-relativevalueoraPC-relativevalue.

INCLUDE Itaddsthecontentsofafiletoourprogram.

Table2-7:SomeWidelyUsedARMDirective

AREA

TheAREA directive tells the assembler to define a new section ofmemory. Thememory can be code (instruction) or data and can have attributes such as ReadOnly,ReadWrite, and so on. This iswidely used to define one ormore blocks of indivisiblememoryforcodeordatatobeusedbythelinker.EveryAssemblylanguageprogramhasatleastoneAREA.Thefollowingistheformat:

AREAsectionname,attribute,attribute,…

ThefollowinglinedefinesanewareanamedMY_ASM_PROG1whichhasCODEandREADONLYattributes:

AREAMY_ASM_PROG1,CODE,READONLY

Among widely used attributes are CODE, DATA, READONLY, READWRITE,COMMON,andALIGN.Thefollowingdescribesthesewidelyusedattributes.

READWRITE isanattributegiven toanareaofmemorywhichcanbereadfromandwrittento.SinceitisREADWRITEsectionoftheprogramitisbydefaultforDATA.InARMAssemblylanguageweusethisareatosetasideSRAMmemoryforscratchpadandstack.TheAssemblerputstheREADWRITEsectionsnexttoeachotherintheSRAMmemory.

READONLY is an attribute given to an area ofmemorywhich can only be readfrom.SinceitisREADONLYsectionoftheprogramitisbydefaultforCODE.InARMAssemblylanguageweusethisareatowriteourinstructionsformachinecodeexecution.TheREADONLYsectionsareputnexttoeachotherintheflashmemory.

Note

InKeil,ThememoryspaceofREADONLYandREADWRITEaredefinedintheLinkerandTargettabsoftheProject\Options.Keilsetsthevaluesaccordingtothememorymapofthechosenchip.

CODE is an attribute given to an area of memory used for executable machineinstruction.SinceitisusedforcodesectionoftheprogramitisbydefaultREADONLYmemory. In ARM Assembly language we use this area to write our instructions. Thefollowinglinedefinesanewareaforwritingprograms:

AREAOUR_ASM_PROG,CODE,READONLY

DATA isanattributegiven toanareaofmemoryusedfordataandno instruction(machine instructions)canbeplaced in thisarea.Since it isusedfordatasectionof theprogram it is bydefault aREADWRITEmemory. InARMAssembly languageweusethisareatosetasideSRAMmemoryforscratchpadandstack.Thefollowinglinedefinesanewareafordefiningvariables:

AREAOUR_VARIABLES,DATA,READWRITE

Todefineconstantvaluesintheflashmemorywewritethefollowing:

AREAOUR_CONSTS,DATA,READONLY

COMMONisanattributegiventoanareaofDATAmemorysectionwhichcanbeusedcommonlybyseveralprogramcodes.WedonotinitializetheCOMMONsectionofthe memory since it is used by compiler exclusively. The compiler initializes theCOMMONmemoryareawithallzeros.

ALIGN is another attribute given to an area ofmemory to indicate howmemoryshouldbeallocatedaccordingtotheaddresses.WhentheALIGNisusedforCODEandREADONLYitalignedin4-bytesaddressboundarybydefaultsincetheARMinstructionsare all 32-bit (4-bytes) word. The ALIGN attribute of AREA has a number after likeALIGN=3whichindicatestheinformationshouldbeplacedinmemorywithaddressesof

23,thatis0x50000,0x50008,0x50010,0x50020,andsoon.Wewillseemoreaboutthatsoon.TheusageandimportanceofALIGNattributeisdiscussedinChapter6.

ENTRY

Another important pseudocode is the ENTRY directive. This indicates to theassemblerthebeginningoftheexecutablecode.TheENTRYdirectiveisthefirstlineoftheARMAssemblylanguagecodesectionoftheprogram,meaningthatanythingaftertheENTRY directive in the source code is considered actual machine instruction to beexecutedbytheCPU.ForagivenARMAssemblylanguageprogramwecanhaveonlyoneENTRYpoint.HavingmultipleENTRYdirectiveinanAssemblylanguageprogramwillgiveyouanerrorbyassembler.

ENDdirective

AnotherimportantpseudocodeistheENDdirective.Thisindicatestotheassemblertheendofthesource(asm)file.TheENDdirectiveisthelastlineoftheARMAssemblylanguageprogram,meaning that anything after theENDdirective in the source code isignored by the assembler. Program 2-1 shows how the AREA, ENTRY and ENDdirectivesareused.

Program2-1

;ARMAssemblyLanguageProgramToAddSomeDataandStoretheSUMinR3.

AREAPROG_2_1,CODE,READONLY

ENTRY

MOVR1,#0x25;R1=0x25

MOVR2,#0x34;R2=0x34


HEREBHERE;stayhereforever

END

LDR

Inthelastsection,westatedthatonecannot loadvalueslarger than0xFFintothe32-bit registers ofARM since only 8 bits are used for the immediate value.TheARMassemblerprovideusapseudo-instructionof“LDRRd,=32-bit_immidiate_vlaue”toloadvaluegreaterthan0xFF.Wewillexaminehowthispseudo-instructionworksinChapter 6.Fornow,justnoticethe=signusedinthesyntax.Thefollowingpseudo-instructionloadsR7with0x112233.

LDRR7,=0x112233

Wewill use this pseudo-instruction to load 32-bit value into register extensivelythroughout the book. To load values less than 0xFF, we still use the “MOV Rd,#8-

bit_immidiate_value” instruction since it is a real instruction of ARM, therefore moreefficientincodesize.

EQU(equate)

Thisisusedtodefineaconstantvalueorafixedaddress.TheEQUdirectivedoesnotsetasidestorageforadata item,butassociatesaconstantnumberwithadataoranaddresslabelsothatwhenthelabelappearsintheprogram,itsconstantwillbesubstitutedfor the label.ThefollowingusesEQUfor thecounterconstant,and then theconstant isusedtoloadtheR2register:

COUNTEQU0x25

……….

MOVR2,#COUNT;R2=0x25

Whenexecutingtheaboveinstruction“MOVR2,#COUNT”,theregisterR2willbeloadedwiththevalue0x25.WhatistheadvantageofusingEQU?Assumethataconstant(afixedvalue)isusedthroughouttheprogram,andtheprogrammerwantstochangeitsvalue everywhere. By the use of EQU, the programmer can change it once and theassembler will change all of its occurrences throughout the program. This allows theprogrammertoavoidsearchingtheentireprogramtryingtofindeveryoccurrence.

UsingEQUforfixeddataassignment

TogetmorepracticeusingEQUtoassignfixeddata,examinethefollowing:

DATA1EQU0x39;thewaytodefinehexvalue

DATA2EQU2_00110101;thewaytodefinebinaryvalue(35inhex)

DATA3EQU39;decimalnumbers(27inhex)

DATA4EQU‘2’;ASCIIcharacters

UsingEQUforSFRaddressassignment

EQUisalsowidelyusedtoassignSFRaddresses.Examinethefollowingcode:

PORTBEQU0xF0018;SFRPortBaddress

MOVR6,#0x01;R6=0x01

LDRR2,=PORTB;R2=0xF0018

STRBR6,[R2];PortBnowhas0x01

UsingEQUforRAMaddressassignment

AnothercommonusageofEQUisfortheaddressassignmentoftheinternalSRAM.ItisexactlylikeusingEQUforSFRaddressassignment.Examinethefollowingcode:

SUMEQU0x40000120;assignRAMloctoSUM

MOVR2,#5;loadR2with5

MOVR1,#2;loadR1with2


LDRR3,=SUM;loadR3with0x40000120

STRBR2,[R3];storetheresultSUM

This is especiallyhelpfulwhen theaddressneeds tobechanged inorder touseadifferentARMchipforagivenproject.ItismucheasiertorefertoanamethananumberwhenaccessingRAMaddresslocations.

RN(equate)

Thisisusedtodefineanameforaregister.TheRNdirectivedoesnotsetasideaseperate storage for the name, but associates a registerwith that name. It improves theclarity.Program2-2showshowweuseSUMnameforR3.

Program2-2:AnARMAssemblyLanguageProgramUsingRNDirective

;ARMAssemblyLanguageProgramToAddSomeData

;andstoretheSUMinR3.

VAL1RNR1;defineVAL1asanameforR1


SUMRNR3;defineSUMasanameforR3


ENTRY

MOVVAL1,#0x25;R1=0x25


ADDSUM,VAL1,VAL2;R3=R2+R1

HEREBHERE

END

INCLUDEdirective

The includedirective tells theARM assembler toadd thecontentsofa file toourprogram(likethe#includedirectiveinClanguage).

Assemblerdataallocationdirectives

In most Assembly languages there are some directives to allocate memory andinitializeitsvalue.InARMAssemblylanguageDCB,DCD,andDCWallocatememoryandinitializethem.TheSPACEdirectiveallocatesmemorywithoutinitializingit.

DCBdirective(defineconstantbyte)

TheDCBdirectiveallocatesabytesizememoryandinitializesthevalues.

MYVALUEDCB5;MYVALUE=5

MYMSAGEDCB“HELLOWORLD”;string

DCWdirective(defineconstanthalf-word)

TheDCWdirectiveallocatesahalf-wordsizememoryandinitializesthevalues.

MYDATADCW0x20,0xF230,5000,0x9CD7

DCDdirective(defineconstantword)

TheDCDdirectiveallocatesawordsizememoryandinitializesthevalues.

MYDATADCD0x200000,0xF30F5,5000000,0xFFFF9CD7

SeeTables2-8and2-9.

Directive Description

DCB Allocatesoneormorebytesofmemory,anddefinestheinitialruntimecontentsofthememory

DCW Allocatesoneormorehalfwordsofmemory, alignedon two-byteboundaries, anddefinestheinitialruntimecontentsofthememory.

DCWU Allocatesoneormorehalfwordsofmemory,anddefinestheinitialruntimecontentsofthememory.Thedataisnotaligned.

DCD Allocates one or more words of memory, aligned on four-byte boundaries, anddefinestheinitialruntimecontentsofthememory.

DCDU Allocatesoneormorewordsofmemoryanddefinestheinitialruntimecontentsofthememory.Thedataisnotaligned.

Table2-8:SomeWidelyUsedARMMemoryAllocationDirectives

DataSize Bits Decimal Hexadecimal Directive Instruction

Byte 8 0–255 0-0xFF DCB STRB/LDRB

Half-word 16 0–65535 0-0xFFFF DCW STRH/LDRH

Word 32 0–232-1 0-0xFFFFFFFF DCD STR/LDR

Table2-9:UnsignedDataRangeinARMandassociatedInstructions

In Program 2-3A you see an example of storing constant values in the programmemoryusingthedirectives.Figure2-12showshowthedataisstoredinmemory.Intheexample, theprogramgoes to locations0x00 to0x0F.TheDCBdirectivestoresdata inaddresses0x10–0x17.Asyouseeonebyteisallocatedforeachdata.TheDCDallocates4

bytesforeachdata.Asaresultthelowestbyteof0x23222120(whichis0x20)isstoredinlocation0x18andthenextbytesarestoredinthenextlocations.

Program2-3A:SampleofStoringFixedDatainProgramMemory

;storingdatainprogrammemory.

AREALOOKUP_EXAMPLE,READONLY,CODE

ENTRY

LDRR2,=OUR_FIXED_DATA;pointtoOUR_FIXED_DATA

LDRBR0,[R2];loadR0withthecontents

;ofmemorypointedtobyR2

ADDR1,R1,R0;addR0toR1


OUR_FIXED_DATA

DCB0x55,0x33,1,2,3,4,5,6

DCD0x23222120,0x30

DCW0x4540,0x50

END

Figure2-12:MemoryDumpforProgram2-3A

TheDCWdirective allocates 2 bytes for eachdata.For example, the lowbyte of0x4540islocatedinaddress0x20andthehighbyteofitgoestoaddress0x21.Similarlythelowbyteof0x50islocatedinaddress0x22andthehighbyteofitinaddress0x23.

Intheprogramtoaccessthedata,firsttheR2registerisloadedwiththeaddressofOUR_FIXED_DATA.Inthisexample,OUR_FIXED_DATAhasaddress0x10. So,R2isloadedwith0x10.Then,thecontentsoflocation0x10isloadedintoregisterR0,usingtheLDRBinstruction.

SPACEdirective

Using the SPACE directivewe can allocatememory for variables. The followinglinesallocate4and2bytesofmemoryandnamethemasLONG_VARandOUR_ALFA:

LONG_VARSPACE4;Allocate4bytes

OUR_ALFASPACE2;Allocate2bytes

In the followingprogram3variablesaredefined:A,B,andC.ThenAandBareinitializedwith5and4,respectively.InthenextstepAandBareaddedtogetherandtheresultisputinC:

Program2-3B

AREAOUR_PROG,CODE,READONLY

ENTRY

;A=5

LDRR0,=A;R0=Addr.ofA

MOVR1,#5;R1=5

STRR1,[R0];init.Awith5

;B=4

LDRR0,=B;R0=Addr.ofB

MOVR1,#4;R1=4

STRR1,[R0];init.Bwith4

;R1=A

LDRR0,=A;R0=Addr.ofA

LDRR1,[R0];R1=valueofA

;R2=B

LDRR0,=B;R0=Addr.ofA

LDRR2,[R0];R2=valueofA

;C=R1+R2(C=A+B)

ADDR3,R1,R2;R3=A+B

LDRR0,=C;R0=Addr.ofC

STRR3,[R0];C=R3

loopBloop

AREAOUR_DATA,DATA,READWRITE

;AllocatesthefollowingsinSRAMmemory

ASPACE4

BSPACE4

CSPACE4

END

ADRdirective

ToloadregisterswiththeaddressesofmemorylocationswecanalsousetheADRpseudo-instructionwhichhasabetterperformance.SeeChapter6formore.ADRhasthefollowingsyntax:

ADRRn,label

For example, in Program 2-3A we can load R2 with the address ofOUR_FIXED_DATAusingthefollowingpseudo-instruction:

ADRR2,OUR_FIXED_DATA;pointtoOUR_FIXED_DATA

ALIGN

Thisisusedtomakesuredataisalignedin32-bitwordor16-bithalfwordmemoryaddress.ThefollowingusesALIGNtomakethedata32-bitwordaligned:

ALIGN4;thenextinstructionisword(4bytes)aligned

…

ALIGN2;thenextinstructionishalf-word(2bytes)aligned

…

Example2-7showstheresultofusingtheALIGNdirective.

Example2-7

ComparetheresultofusingALIGNinthefollowingprograms:

a)

AREAE2_7A,READONLY,CODE

ENTRY

ADRR2,DTA

LDRBR0,[R2]

ADDR1,R1,R0

H1BH1

DTADCB0x55

DCB0x22

END

b)

AREAE2_7B,READONLY,CODE

ENTRY

ADRR2,DTA

LDRBR0,[R2]

ADDR1,R1,R0

H1BH1

DTADCB0x55

ALIGN2

DCB0x22

END

c)

AREAE2_7C,READONLY,CODE

ENTRY

ADRR2,DTA

LDRBR0,[R2]

ADDR1,R1,R0

H1BH1

DTADCB0x55

ALIGN4

DCB0x22

END

Solution:

a)

WhenthereisnoALIGNdirectivetheDCBdirectiveallocatesthefirstemptylocationforitsdata.Inthisexample,address0x10isallocatedfor0x55.So0x22goestoaddress0x11.

b)

IntheexampletheALIGNissetto2whichmeansthedatashouldbeputinalocationwithevenaddress.The0x55goestothefirstemptylocationwhichis0x10.Thenextemptylocationis0x11whichisnotamultipleof2.So,itisfilledwith0andthenextdatagoestolocation0x12.

c)

IntheexampletheALIGNissetto4whichmeansthedatashouldgotolocationswhoseaddressismultipleof4.The0x55goestothefirstemptylocationwhichis0x10.Thenextemptylocationsare0x11,0x12,and0x13whicharenotamultipleof4.So,theyarefilledwith0sandthenextdatagoestolocation0x14.

RulesforlabelsinAssemblylanguage

Bychoosing labelnames that aremeaningful, aprogrammercanmakeaprogrammucheasier to readandmaintain.Thereareseveral rules thatnamesmust follow.First,each label name must be unique. The names used for labels in Assembly languageprogramming consist of alphabetic letters in both uppercase and lowercase, the digits 0through9,andthespecialcharactersquestionmark(?),period(.),at(@),underline(_),and dollar sign ($). The first character of the labelmust be an alphabetic character. Inotherwords,itcannotbeanumber.Everyassemblerhassomereservedwordsthatmustnot be used as labels in the program. Foremost among the reserved words are themnemonics for the instructions.Forexample,“MOV”and“ADD”are reservedbecausethey are instruction mnemonics. In addition to the mnemonics there are some other

reservedwords.Checkyourassemblerforthelistofreservedwords.

ReviewQuestions

1.GiveanexampleofhexdatarepresentationintheARMassembler.

2. Showhow to represent decimal 20 in formats of (a) hex, (b) decimal, and (c)binaryintheARMassembler.

3.WhatistheadvantageinusingtheEQUdirectivetodefineaconstantvalue?

4.Showthehexnumbervalueusedbythefollowingdirectives:

(a)ASC_DATAEQU‘4’(b)MY_DATAEQU2_00011111

5.GivethevalueinR2forthefollowing:

MYCOUNTEQU15

MOVR2,#MYCOUNT

6.Givethevalueindatamemorylocation0x200000forthefollowing:

MYCOUNTEQU0x95

MYMEMEQU0x200000

MOVR0,#MYCOUNT

LDRR2,=MYMEM

STRBR0,[R2]

7.Givethevalueindatamemory0x630000forthefollowing:

MYDATAEQU12

MYMEMEQU0x00630000

FACTOREQU0x10

MOVR1,#MYDATA

MOVR2,#FACTOR

LDRR3,=MYMEM

ADDR1R2,R1

STRBR1,[R3]

Section2.6:IntroductiontoARMAssemblyProgrammingInthissectionwediscussAssemblylanguageformatanddefinesomewidelyused

terminologyassociatedwithAssemblylanguageprogramming.

WhiletheCPUcanworkonlyinbinary,itcandosoataveryhighspeed.Itisquitetedious and slow for humans, however, to dealwith 0s and 1s in order to program thecomputer.Aprogramthatconsistsof0sand1s iscalledmachine language. In theearlydaysof thecomputer,programmerscodedprograms inmachine language.Although thehexadecimal systemwasused as amore efficientway to represent binarynumbers, theprocess of working in machine code was still cumbersome for humans. Eventually,Assembly languagesweredeveloped,whichprovidedmnemonics for themachine codeinstructions,plusotherfeaturesthatmadeprogrammingfasterandlesspronetoerror.Thetermmnemonicisfrequentlyusedincomputerscienceandengineeringliteraturetoreferto codes and abbreviations that are relatively easy to remember. Assembly languageprograms must be translated into machine code by a program called an assembler.Assemblylanguageisreferredtoasalow-levellanguagebecauseitdealsdirectlywiththeinternal structureof theCPU.Toprogram inAssembly language, theprogrammermustknowalltheregistersoftheCPUandthesizeofeach,aswellasotherdetails.

Today,onecanusemanydifferentprogramminglanguages,suchasBASIC,Pascal,C, C++, Java, and numerous others. These languages are called high-level languagesbecause the programmer does not have to be concernedwith the internal details of theCPU. Whereas an assembler is used to translate an Assembly language program intomachine code (sometimes also called object code or opcode for operation code), high-level languages are translated into machine code by a program called a compiler. Forinstance,towriteaprograminC,onemustuseaCcompilertotranslatetheprogramintomachinelanguage.NextwelookatARMAssemblylanguageformat.

StructureofAssemblylanguage

AnAssemblylanguageprogramconsistsof,amongotherthings,aseriesoflinesofAssembly language instructions. An Assembly language instruction consists of amnemonic,optionallyfollowedbytwoorthreeoperands.Theoperandsarethedataitemsbeingmanipulated,andthemnemonicsarethecommandstotheCPU,tellingitwhattodowiththoseitems.SeeProgram2-4.

Program2-4:SampleofanARMAssemblyLanguageProgram

;ARMAssemblylanguageprogramtoaddsomedataandstoretheSUMinR3.


ENTRY

MOVR1,#0x25;R1=0x25

MOVR2,#0x34;R2=0x34


HEREBHERE

END

AnAssemblylanguageprogramisaseriesofstatements,orlines,whichareeitherAssemblylanguageinstructions,suchasADDandMOV,orstatementscalleddirectives.While instructions tell the CPUwhat to do, directives (also called pseudo-instructions)givedirectionstotheassembler.Forexample,inProgram2-4,whiletheMOVandADDinstructionsarecommandstotheCPU,ENTRYandENDaredirectivestotheassembler.The directive END tells the assembler that it is the end of the code, while ENTRY isbeginningofthecode.

AnAssemblylanguageinstructionconsistsoffourfields:

[label]mnemonic[operands][;comment]

Bracketsindicatethatafieldisoptionalandnotalllineshavethem.Bracketsshouldnotbetypedin.Regardingtheaboveformat,thefollowingpointsshouldbenoted:

1.Thelabelfieldallowstheprogramtorefertoalineofcodebyname.Thelabelfieldcannotexceedacertainnumberofcharacters.Checkyourassemblerfortherule.

2.TheAssemblylanguagemnemonic(instruction)andoperand(s)fieldstogetherperform the real work of the program and accomplish the tasks forwhich theprogramwaswritten.InAssemblylanguagestatementssuchas

MOVR3,#0x55

MOVR2,#0x67


ADDandMOVarethemnemonicsthatproduceopcodes;the“0x55”and“0x67”arethe operands. Instead of a mnemonic and an operand, these two fields could containassembler pseudo-instructions, or directives. Remember that directives do not generateanymachinecode(opcode)andareusedonlybytheassembler,asopposedtoinstructionsthataretranslatedintomachinecode(opcode)fortheCPUtoexecute.InProgram2-4thecommands END and ENTRY are examples of directives. Many of these pseudo-instructionswerediscussedinthelastsection.

3.Thecommentfieldbeginswithasemicoloncommentindicator“;”.Commentsmaybe at the endof a lineoron a lineby themselves.The assembler ignorescomments,but theyare indispensable toprogrammers.Althoughcommentsareoptional, it isrecommendedthattheybeusedtodescribetheprograminawaythatmakesiteasierforsomeoneelsetoreadandunderstand.

4. Noticethelabel“HERE”inthelabelfieldinProgram2-4.IntheB(Branch)statementtheARMistoldtostayinthisloopindefinitely.Ifyoursystemhasamonitor program you do not need this line and should delete it from your

program.Inthenextsectionwewillseehowtocreateaready-to-runprogram.

Note!

Thefirstcolumnofeachlineisalwaysconsideredaslabel.Thus,becarefultopressaTabatthebeginningofeachlinethatdoesnothavelabel;otherwise,yourinstructionisconsideredasalabelandanerrormessagewillappearwhencompiling.

ReviewQuestions

1.Whatisthepurposeofpseudo-instructions?

2._____________aretranslatedbytheassemblerintomachinecode,whereas__________________arenot.

3.Trueorfalse.Assemblylanguageisahigh-levellanguage.

4.Whichofthefollowinginstructionsproducesopcode?Listallthatdo.

(a)MOVR6,#0x25(b)ADDR2,R1,R3(c)END(d)HEREBHERE

5.Pseudo-instructionsarealsocalled___________.

6.Trueorfalse.AssemblerdirectivesarenotusedbytheCPUitself.Theyaresimplyaguidetotheassembler.

7.InQuestion4,whichoneisanassemblerdirective?

Section2.7:AssemblinganARMProgramNowthatthebasicformofanAssemblylanguageprogramhasbeengiven,thenext

questionis:Howitiscreated,assembled,andmadereadytorun?ThestepstocreateanexecutableAssemblylanguageprogram(Figure2-13)areoutlinedasfollows:

Figure2-13:StepstoCreateaProgram

1. FirstweuseatexteditortotypeinaprogramsimilartoProgram2-4.Inthecase of ARM, we can use the Keil IDE, which has a text editor, assembler,simulator, and much more all in one software package. It is an excellentdevelopmentsoftwarethatsupportsalltheARMchipsandisfreeforuniversitystudents.Seewww.keil.comforevaluationversionof thesoftwareforstudents.Manyeditorsorwordprocessorsarealsoavailablethatcanbeusedtocreateoredittheprogram.AwidelyusededitoristheNotepadinWindows,whichcomeswith all Microsoft operating systems. Notice that the editor must be able toproduce an ASCII file. For assemblers, the file names follow the usual DOSconventions, but the source file has the extension “.a” or “.asm”. The “a”extensionforthesourcefileisusedbyanassemblerinthenextstep.

2. The“a”sourcefilecontainingtheprogramcodecreatedinstep1isfedtotheARMassembler.Theassemblerproducesanobjectfile,andalistfile.Theobjectfilehastheextension“.o”,andthelistfilehas“.lst”extension.

3.Theobjectfileplusascriptfileareusedbythelinkertoproducemapfileandhex file. The map file has the extension “.map”, and the hex file has “.hex”extension.The script file is optional and can be replacedwith some commandlineoptions.Aftera successful link, thehex file is ready tobeburned into theARM’sprogramROMandisdownloadedintotheARMchip.

Moreaboutasmandobjectfiles

Theasmfileisalsocalledthesourcefileandmusthavethe“a”or“asm”extension.Asmentioned earlier, this file is created with a text editor such asWindowsNotepad.

Manyassemblerscomewithatexteditor.Theassemblerconvertstheasmfile’sAssemblylanguage instructions intomachine languageandprovides theo (object) file.Theobjectfile,asmentionedearlier,hasan“o”asitsextension.Theobjectfileisusedasinputtoasimulatororanemulator.

Beforewecanassembleaprogramtocreateaready-to-runprogram,wemustmakesurethatitiserrorfree.TheKeiluVisionIDEprovidesuserrormessagesandweexaminethemtoseethenatureofsyntaxerrors.Theassemblerwillnotassembletheprogramuntilallthesyntaxerrorsarefixed.AsampleofanerrormessageisshowninFigure2-14.

Buildtarget‘Target1’

assemblinga1.asm…a1.asm(7):error:A1163E:UnknownopcodeMOVE,expectingopcodeorMacro

Targetnotcreated

Figure2-14:SampleofanErrorMessage

“lst”and“map”files

Themap file shows the labels defined in the program togetherwith their values.ExamineFigure2-15.ItshowstheMapfileofProgram2-4.

MemoryMapoftheimage

ImageEntrypoint:0x00000000

LoadRegionLR_1(Base:0x00000000,Size:0x00000010,Max:0xffffffff,ABSOLUTE)

ExecutionRegionER_RO(Base:0x00000000,Size:0x00000010,Max:0xffffffff,ABSOLUTE)

BaseAddrSizeTypeAttrIdxESectionNameObject

0x000000000x00000010CodeRO1*PROG_2_1a2.o

ExecutionRegionER_RW(Base:0x40000000,Size:0x00000000,Max:0xffffffff,ABSOLUTE)

****Nosectionassignedtothisexecutionregion****

ExecutionRegionER_ZI(Base:0x40000000,Size:0x00000000,Max:0xffffffff,ABSOLUTE)

****Nosectionassignedtothisexecutionregion****

Figure2-15:SampleofaMapFile

Thelst(list)file,whichisoptional,isveryusefultotheprogrammer.Thelistshowsthebinaryandsourcecode;italsoshowswhichinstructionsareusedinthesourcecode,andtheamountofmemorytheprogramuses.SeeFigure2-16.

ARMMacroAssembler Page1

100000000 ;ARMAssemblyLanguageProgramToAddSomeDataandStoretheSUMinR3.

2

00000000

300000000 AREAPROG_2_4,CODE,READONLY

400000000 ENTRY

500000000 E3A01025 MOVR1,#0x25;R1=0x25

600000004 E3A02034 MOVR2,#0x34;R2=0x34

700000008 E0823001 ADDR3,R2,R1;R3=R2+R1

80000000C EAFFFFFE HERE BHERE

900000010 END

Figure2-16:SampleofaListFileforARM

Manyassemblersassumethatthelistfileisnotwantedunlessyouindicatethatyouwant to produce it. These files can be accessed by a text editor such as Notepad anddisplayedonthemonitor,orsenttotheprintertogetahardcopy.Theprogrammerusesthelistandmapfilestolocatesyntaxerror.

There are many different ARM assemblers available for evaluation nowadays. Ifyou use theWindows operating system,ARM IAR IDE andKeil uVision can be used.Theyhavegreatfeaturesandniceenvironments.

ReviewQuestions

1.Trueorfalse.TheARMuVisionIDEandWindowsNotepadtexteditorbothproduceanASCIIfile.

2.Trueorfalse.Theextensionforthesourcefileis“a”.

3.Whichofthefollowingfilescanbeproducedbyatexteditor?

(a)myprog.a(b)myprog.obj(c)myprog.hex(d)myprog.lst

4.Whichofthefollowingfilesisproducedbyanassembler?

(a)myprog.asm(b)myprog.obj(c)myprog.hex(d)myprog.lst

Section2.8:TheProgramCounterandProgramROMSpaceintheARMIn this section we discuss the role of the program counter (PC) in executing a

programandshowhowthecodeisfetchedfromROMandexecuted.Wewillalsodiscusstheprogram(code)ROMspaceforvariousARMfamilymembers.Finally,weexaminetheHarvardarchitectureoftheARM.

ProgramcounterintheARM

ThemostimportantregisterinARMisthePC(programcounter).AswementionedearliertheR15istheprogramcounterinARM.TheprogramcounterisusedbytheCPUto point to the address of the next instruction to be executed. As the CPU fetches theopcode from the program ROM, the program counter is incremented automatically topointtothenextinstruction.Thewidertheprogramcounter,themorememorylocationsaCPUcanaccess.Thatmeansthata32-bitprogramcountercanaccessamaximumof4G(232=4G)bytesprogrammemorylocations.

Inmostmicrocontrollerseachmemorylocationis1bytewide.Inthecaseofa32-bit program counter, the memory space is 4G (232 = 4G) bytes, which occupies the0x00000000 –0xFFFFFFFF address range since it is byte-addressable. The programcounterintheARMfamilyis32bitswide.ThismeansthattheARMfamilycanaccessaddresses00000000to0xFFFFFFFF,atotalof4Gbyteofmemoryspacelocations.The4GbytesofmemoryspacelocationsareallocatedamongtheI/Operipherals,SRAM,andFlashROM.However,atthetimeofthiswriting,noneofthemembersoftheARMfamilyhavetheentire4Gbytesofmemoryspacepopulatedwithon-chipperipherals,SRAM,andFlash.

PoweruplocationforARM

Onequestionthatwemustaskaboutanymicrocontroller(ormicroprocessor)is:“Atwhat address does the CPUwake upwhen power is applied?” Eachmicroprocessor isdifferent.InthecaseoftheARMmicrocontrollers(thatis,allmembersregardlessofthefamilyandvariation),themicrocontrollerwakesupatmemoryaddress0x00000000whenit ispoweredup.BypoweringupwemeanapplyingVCCoractivatingRESETpin. Inotherwords,when theARM ispoweredup, thePC (programcounter)has thevalueof0x00000000init.ThismeansthatitexpectsthefirstopcodetobestoredatROMaddress0x00000000.For this reason, in theARMsystem, the first opcodemustbeburned intomemorylocation0x00000000ofprogramROMbecausethisiswhereitlooksforthefirstinstruction when it is booted. Next we discuss the step-by-step action of the programcounterinfetchingandexecutingasampleprogram.

PlacingcodeinprogramROM

To get a better understanding of the role of the program counter in fetching andexecutingaprogram,weexaminetheactionoftheprogramcounteraseachinstructionisfetchedandexecuted.First,weexamineoncemorethelistfileofthesampleprogramandshowhow the code is placed into theFlashROMof theARMchip.Aswe can see inFigure2-16,theopcodeandoperandforeachinstructionarelistedontheleftsideofthe

listfile.

AftertheprogramisburnedintoROMofanARMchip,theopcodeandoperandareplacedinROMmemorylocationsstartingat0x00000000asshownintheProgram2-4listfile.

Thelistshowsthataddress00000000containsE3A01025,whichistheopcodeformovingavalueintoregister,andtheoperand(inthiscase0x25)tobemovedtoanotheroperand (in this caseR1). Therefore, the instruction “MOVR1, #0x25” has amachinecodeof“E3A01025”,whereE3Aistheopcodeand01025istheoperands.SeeFigure2-16. Similarly, the machine code “E3A02034” is located in ROM memory location00000004 and represents the opcode and the operands for the instruction “MOV R2,#0x34”. In the same way, machine code “E0813002” is located in memory location00000008 and represents the opcode and the operand for the instruction “ADDR3,R1,R2”. The opcode for “HERE B HERE” and its target address are located inlocations 0000000C. Notice that all the instructions in this program are 4-byteinstructions.

Executingaprograminstructionbyinstruction

Assuming that the above program is burned into the ROM of anARM chip, thefollowingisastep-by-stepdescriptionoftheactionoftheARMuponapplyingpowertoit:

1. WhentheARMispoweredup,thePC(programcounter)has00000000andstartstofetchthefirstinstructionfromlocation00000000oftheprogramROM.InthecaseoftheaboveprogramthefirstcodeisE3A01025,whichisthecodeformovingoperand0x25 toR1.Uponexecuting thecode, theCPUplaces thevalueof25inR1.Nowoneinstructionisfinished.Theprogramcounterisnowincremented to point to 00000004 (PC = 00000004), which contains codeE3A02034,themachinecodefortheinstruction“MOVR2,#0x34”.

2. UponexecutingthemachinecodeE3A02034,thevalue0x34isloadedtoR2.Theprogramcounterisincrementedto00000008.

3. ROM location 00000008 has the machine code for instruction “ADDR3,R2,R1”.ThisinstructionisexecutedandnowPC=0000000C.

4. Now PC = 0000000C points to the next instruction, which is “HERE BHERE”.Aftertheexecutionofthisinstruction,PC=0000000C.Thiskeepstheprograminaninfiniteloop.

The fact that the program counter points at the next instruction to be executedexplains why some microprocessors (notably the x86) call the program counter theinstructionpointer.

The steps of running a code in ARM is slightly different from what mentionedabovebecauseoftheuseofpipelineinARMarchitecture.WewillexaminepipelineslaterinChapter 7.AlsoinARMCortexMchipsthepower-onResetlocationvalueisdifferentaswewillseeinChapter8.

InstructionformationoftheARM

RecallthattheARMinstructionsarealways4-byte.Nextweexploretheinstructionformationforafewoftheinstructionswehaveusedinthischapter.ThisshouldgiveyousomeinsightintotheinstructionsoftheARM.

ADDinstructionformation

TheADDisa4-byte(32-bit)instruction.SeeFigure2-17.Ofthe32bits,thefirst4bitsaresetasidefortheconditionfieldwhichwillbediscussedmoreinChapter4.Bits26and27arealways0inADDinstruction.Bit25whichisindicatedbyIdefinesthetypeofsecondoperand.Aswementionedbefore,thesecondoperandcanbeeitheraregisteroranimmediate value between 0–255. If I = 1, the second operand is an immediate valueotherwiseitshouldbearegister.Bits24to21aretheoperationcodeofADDinstruction.Whenthesebitsare0100theCPUknowsthatitshouldruntheADDinstruction.Bit20whichisindicatedbySdefineseithertheinstructionshouldupdatetheflagbitsornot.InADDinstructionthisbitiszerowhileinADDSinstructionitisone.Bits19to16definethefirstoperand(Rn).ItcanbearegisternumberbetweenR0toR15.Likewise,bits15to12definethedestinationregister(Rd).Finally,bits11to0definethesecondoperand.Aswementionedbefore,bit25(I)definesthateitherthesecondoperandshouldbearegisteroranimmediatevalue.WewilldiscussthebitsofOperand2inmoredetailinChapter3.

Figure2-17:ADDInstructionFormation

SUBinstructionformation

TheSUBisa4-byte(32-bit)instruction.Ofthe32bits,thefirst4bitsaresetasidefor theconditionfield.Bits26and27arealways0 inSUBinstruction.Bit25which isindicated by I defines the type of second operand. If I = 1, the second operand is animmediatevalueotherwiseitshouldbearegister.Bits24to21aretheoperationcodeofSUB instruction.When these bits are 0010 theCPUknows that it should run theSUBinstruction.Bit20whichisindicatedbySdefineseithertheinstructionshouldupdatetheflagbitsornot.InSUBinstructionthisbitiszerowhileinSUBSinstructionitisone.Bits19 to16define the first operand (Rn). It canbe a registernumberbetweenR0 toR15.Likewise,bits15to12definethedestinationregister(Rd).Finally,bits11to0definethesecondoperand.Aswementionedbefore,bit25(I)definesthateitherthesecondoperandshouldbearegisteroranimmediatevalue.TheformationofSUBinstructionisshowninFigure2-18.

Figure2-18:SUBInstructionFormation

Generalformationofdataprocessinginstructions

Asyoumayhavenoticed,theformationofADDandSUBinstructionsarethesame

exceptbits24to21whicharecalledtheoperationcodeandtellstheCPU whatinstructionitshouldexecute.InARM,allofthedataprocessinginstructionshavethesameformat.Figure2-19showsthegeneralformationofdataprocessinginstructions.Eachofthedataprocessing instructionhasauniqueoperationcode. InTable2-10you see the listof alldataprocessinginstructionsandtheiropcodes.

Figure2-19:GeneralFormationofDataProcessingInstructions

Branchinstructionformation

TheBisa4-byte(32-bit)instruction.SeeFigure2-20.Ofthe32bits,thefirst4bitsaresetasidefortheconditionfieldwhichwillbediscussedmoreinChapter4.Bits27to25arealways101inBinstruction.Bit24whichisindicatedbyLiszeroinBinstructionandoneinBLinstruction.Bits23to0giveusbranchtargetlocationrelativetothecurrentaddress.ThesewillbediscussedfurtherinChapter4.

Figure2-20:BranchInstructionFormation

ROMwidthintheARM

Aswehaveseenso far in this section,each locationof theaddressspaceholds1byte. Ifwe have 32 address lines, thiswill give us 232 locations,which is 4Gbytes ofmemory locationwith an addressmap of 0x00000000–0xFFFFFFFF. To bring inmoreinformation(codeordata) intotheCPUineachbuscycle,ARMincreasedthewidthofthedatabusto32bits.Inotherwords,theARMisword-addressable.Incontrast,the8051CPUisbyte-addressableonly.Inasense,thedatabusisliketrafficlanesonthehighwaywhereeachlaneis8bitswide.Themorelanes,themoreinformationwecanbringintotheCPUforprocessing.FortheARM,theinternaldatabusbetweenthecodememoryandtheCPUis32bitswide,asshowninFigure2-21.Therefore,the4Gmemoryspaceisshownas1G×32usinga32-bitworddatabussize.ThewideningofthedatapathbetweentheprogramROMand theCPU is anotherway inwhich theARMdesigners increased theprocessingpoweroftheARMfamily.Anotherreasontomakethecodememory32bitswideistomatchitwiththeinstructionwidthoftheARMbecausealloftheinstructionsare4-bytewide.Thisway, theCPUbrings inan instructionfrommemoryevery time itmakesatriptotheprogrammemory.Thatwillmakeinstructionfetchasinglecycle,aswewillseeintheChapter4wheninstructiontimingisdiscussed.

TheARMdesignershavemadeallinstructionsfixedat4-byte;thereareno1-byte,2-byte,or3-byteinstructions,asisthecasewiththex86and8051chips.ThisispartoftheRISCarchitecturalphilosophy,whichwill bediscussed later in this chapter. ItmustalsobenotedthatthedatamemorySRAMintheARMmicrocontrollerarealso4-byte.

HarvardandvonNeumannarchitecturesintheARM

In Chapter 0,we discussedHarvard andVonNeumann architecture.ARM 9 andnewerarchitecturesuseHarvardarchitecture,whichmeans that thereare separatebusesforthecodeandthedatamemory.SeeFigure2-21.TheprogrambusprovidesaccesstotheprogrammemorywhereasthedatabusisusedforbringingdatatotheCPU.

Figure2-21:Harvardvs.VonNeumannArchitecture

AswecanseeinFigure2-21,intheprogrambus,thedatabusis32bitswideandthe address bus is as wide as the PC register to enable the CPU to address the entireprogrammemory.

InSections2-2and2-3,we learnedaboutdatamemoryspaceandhowtouse theSTR and LDR instructions. When the CPU wants to execute the “LDR Rd,[Rx]”instruction, itputsRxon theaddressbusof thedatabus,and receivesdata through thedatabus.Forexample,toexecute“LDRR2,[R5]”,assumingthatR5=0x40000200,theCPUputsthevalueofR5ontheaddressbus.Thelocation0x40000200isintheSRAM(seeFigure2-4).Thus, theSRAMputs thecontentsof location0x40000200onthedatabus.TheCPUgetsthecontentsoflocation0x40000200throughthedatabusandbringsintoCPUandputsitinR2.

The “STR Rx,[Rd]” instruction is executed similarly. The CPU puts Rd on theaddressbusandthecontentsofRxonthedatabus.Thememorylocationwhoseaddressisontheaddressbusreceivesthecontentsofdatabus.

Littleendianvs.bigendianwar

ExaminetheplacingofthecodeintheARMprogrammemory,showninFigure2-22.The lowbyte goes to the lowmemory location, and the highbyte goes to the highmemory address. This convention is called little endian to contrast it with big endian.Figure2-23showsstoring thesamedatausingbigendianconvention.Theoriginof thetermsbigendianandlittleendianisfromanargumentinaGulliver’sTravelsstoryoverhowaneggshouldbeopened:fromthebigendorthelittleend.Inthebigendianmethod,thehighbytegoestothelowaddress,whereasinthelittleendianmethod,thehighbytegoestothehighaddressandthelowbytetothelowaddress.AllIntelmicroprocessorsand

many microcontrollers use the little endian convention. Freescale (formerly Motorola)microprocessors,alongwithsomemainframes,usebigendian.Thedifferencemightseemastrivialaswhethertobreakaneggfromthebigendorthelittleend,butitisanuisanceinconvertingsoftwarefromonecamptoberunonacomputeroftheothercamp.Manymicroprocessors,includingtheARM,letthesoftwaredesignerchooselittleendianorbigendianconvention.

Figure2-22:ARMProgramMemoryContentsforProgram2-4ListFile(LittleEndian)

Figure2-23:BigEndianConvention

ReviewQuestions

1.IntheARM,theprogramcounteris______bitswide.

2.Trueorfalse.EverymemberoftheARMfamilywakesupatmemory0x00000000whenitispoweredup.

3.AtwhatROMlocationdowestorethefirstopcodeofanARMprogram?

4.Trueorfalse.AlltheinstructionsintheARMare4-byteinstructions.

5.Trueorfalse.ARM9andnewerarchitecturesusevonNeumannarchitecture.

6.Trueorfalse.ARM7andolderarchitecturesusevonNeumannarchitecture.

Section2.9:SomeARMAddressingModesTheCPUcanaccessoperands(data)invariousways,calledaddressingmodes.The

number of addressing modes is determined when the microprocessor is designed andcannotbechanged.Usingadvancedaddressingmodestheaccessingofdifferentdatatypesanddatastructures(e.g.arrays,pointers,classes)arediscussedinChapter6.SomeofthesimpleARMaddressingmodesare:

1.register

2.immediate

3.registerindirect(indexedaddressingmode)

Registeraddressingmode

The register addressingmode involves the use of registers to hold the data to bemanipulated.Memoryisnotaccessedwhenthisaddressingmodeisexecuted;therefore,itisrelativelyfast.SeeFigure2-24.

Figure2-24:RegisterAddressingMode

Examplesofregisteraddressingmodeareasfollow:

MOVR6,R2;copythecontentsofR2intoR6

ADDR1,R1,R3;addthecontentsofR3tocontentsofR1

SUBR7,R7,R2;subtractR2fromR7

Immediateaddressingmode

Intheimmediateaddressingmode, thesourceoperandisaconstant.In immediateaddressingmode, as the name implies, when the instruction is assembled, the operandcomes immediately after the opcode. For this reason, this addressing mode executesquickly.SeeFigure2-25.Examples:

MOVR9,#0x25;move0x25intoR9

MOVR3,#62;loadthedecimalvalue62intoR3

ADDR6,R6,#0x40;add0x40toR6

Figure2-25:ImmediateAddressingMode

Inthefirsttwoaddressingmodes,theoperandsareeitherinsidethemicroprocessor

ortaggedalongwiththeinstruction.Inmostprograms,thedatatobeprocessedisofteninsomememorylocationoutsidetheCPU.Therearemanywaysofaccessingthedatainthedatamemoryspace.Thefollowingdescribesoneofthemethods.

RegisterIndirectAddressingMode(Indexedaddressingmode)

Intheregisterindirectaddressingmode,theaddressofthememorylocationwheretheoperandresidesisheldbyaregister.SeeFigure2-26.Forexample:

STRR5,[R6];moveR5intothememorylocation

;pointedtobyR6

LDRR10,[R3];moveintoR10thecontentsofthe

;memorylocationpointedtobyR3.

Figure2-26:RegisterIndirectAddressingMode

SampleUsage:Registerindirectaddressingmode

Usingregisterindirectaddressingmodewecanimplementthedifferentpointers.Sincetheregistersare32-bittheycanaddresstheentirememoryspace.HereyouseeasimplecodeinCanditsequivalentinAssembly:

CLanguage:char*ourPointer;

ourPointer=(char*)0x12456;//Pointtolocation12456

*ourPointer=25;//store25inlocation0x12456

ourPointer++;//pointtonextlocation

AssemblyLanguage:LDRR2,=0x12456;pointtolocation0x12456

MOVR0,#25;R0=25

STRBR0,[R2];storeR0inlocation0x12456

ADDR2,R2,#1;incrementR2topointtonextlocation

Dependingonthedatatypethatthepointerpointsto,STR/LDR,STRH/LDRH,orSTRB/LDRBmightbeused.Intheaboveexample,sinceitpointstochar(whichis8-bit)STRBisused.

SeeChapter6formoreadvancedaddressingmodes.

ReviewQuestions

1.CantheARMprogrammermakeupnewaddressingmodes?

2.Whichregisterscanbeusedfortheregisterindirectaddressingmode?

3.Whereisthedatalocatedinimmediateaddressingmode?

Section2.10:RISCArchitectureinARMThere are three ways available to microprocessor designers to increase the

processingpoweroftheCPU:

1. Increasetheclockfrequencyof thechip.Onedrawbackof thismethodisthatthehigherthefrequency,themorepowerandheatdissipation.Powerandheatdissipationisespeciallyaproblemforhand-helddevices.

2. UseHarvardarchitecturebyincreasingthenumberofbusestobringmoreinformation(codeanddata)intotheCPUtobeprocessed.Whileinthecaseofx86 and other general purpose microprocessors this architecture is veryexpensiveandunrealistic,intoday’smicrocontrollersthisisnotaproblem.AswesawinSection2.8,someoftheARMchipshaveHarvardarchitecture.

3. Change the internalarchitectureof theCPUandusewhat iscalledRISCarchitecture.

ARMhasusedall threemethods to increase theprocessingpowerof theARMmicrocontrollers.InthissectionwediscussthemeritsofRISCarchitecture.

In theearly1980s,acontroversybrokeout in thecomputerdesigncommunity,butunlikemostcontroversies, itdidnotgoaway.Since the1960s, inallmainframesandminicomputers,designersputasmanyinstructionsastheycouldthinkofintotheCPU.Someoftheseinstructionsperformedcomplextasks.Anexampleisaddingdatamemory locations and storing the sum into memory. Naturally, microprocessordesignersfollowedtheleadofminicomputerandmainframedesigners.Becausethesemicroprocessorsused sucha largenumberof instructions,manyofwhichperformedhighly complex activities, they came to be known as CISC (complex instruction setcomputer) processors. According to several studies in the 1970s, many of thesecomplex instructions etched into CPUs were never used by programmers andcompilers. The huge cost of implementing a large number of instructions (some ofthem complex) into the microprocessor, plus the fact that a good portion of thetransistorsonthechipareusedbytheinstructiondecoder,madesomedesignersthinkofsimplifyingandreducingthenumberofinstructions.Asthisconceptdeveloped,theresultingprocessorscametobeknownasRISC(reducedinstructionsetcomputer).

FeaturesofRISC

The following are some of the features ofRISC as implemented by theARMmicrocontroller.

Feature1

RISCprocessorshaveafixedinstructionsize.InaCISCmicroprocessorssuchasthex86,instructionscanbe1,2,3,oreven5bytes.Forexample,lookatthefollowinginstructionsinthex86:

CLRC;clearCarryflag,a1-byteinstruction

ADDAccumulator,#mybyte;a2-byteinstruction

LJMPtarget_address;a5-byteinstruction

This variable instruction size makes the task of the instruction decoder verydifficult because the size of the incoming instruction is never known. In a RISCarchitecture, the size of all instructions is fixed. Therefore, theCPU can decode theinstructionsquickly.This is likeabricklayerworkingwithbricksof thesamesizeasopposed tousingbricksofvariablesizes.Ofcourse, it ismuchmoreefficient tousebricks of the same size. In the last section we saw how the ARM uses 4-byteinstructions and if not all the 32 bits are needed to form the instruction it fillswithzeros.

Feature2

One of the major characteristics of RISC architecture is a large number ofregisters.AllRISCarchitectureshaveat least8or16registers.Ofthese16registers,onlya fewareassigned toadedicatedfunction.Oneadvantageofa largenumberofregistersisthatitavoidstheneedforalargestacktostoreparameters.AlthoughastackisimplementedonaRISCprocessor,itisnotasessentialasinCISCbecausesomanyregistersareavailable.InARMtheuseofalargenumberofgeneralpurposeregisters(GPRs)satisfiesthisRISCfeature.ThestackfortheARMiscoveredinChapter6.

Feature3

RISCprocessorshavea small instruction set.RISCprocessorshaveonlybasicinstructions such as ADD, SUB, MUL, LOAD, STORE, AND, OR, EOR, CALL,JUMP,andsoon.ThelimitednumberofinstructionsisoneofthecriticismsleveledattheRISCprocessorbecauseitmakesthejobofAssemblylanguageprogrammersmuchmoretediousanddifficultcomparedtoCISCAssemblylanguageprogramming.Thisisone reason that RISC is used more commonly in high-level language environmentssuchastheCprogramminglanguageratherthanAssemblylanguageenvironments.ItisinterestingtonotethatsomedefendersofCISChavecalledit“completeinstructionsetcomputer”insteadof“complexinstructionsetcomputer”becauseithasacompletesetofeverykindofinstruction.Howmanyoftheseinstructionsareusedandhowoftenisanothermatter.The limitednumberof instructions inRISC leads toprograms thatare large. Although these programs can use more memory, this is not a problembecausememoryischeap.Beforetheadventofsemiconductormemoryinthe1960s,however, CISC designers had to pack as much action as possible into a singleinstruction toget themaximumbangfor theirbuck. In theARMwehavearound50instructions. We will examine more of the instruction set for the ARM in futurechapters.

Feature4

At this point, one might ask, with all the difficulties associated with RISCprogramming, what is the gain? The most important characteristic of the RISCprocessoristhatmorethan99%ofinstructionsareexecutedwithonlyoneclockcycle,incontrasttoCISCinstructions.Evensomeofthe1%oftheRISCinstructionsthatareexecuted with two clock cycles can be executed with one clock cycle by juggling

instructionsaround(codescheduling).CodeschedulingisdiscussedinChapter7.WewillexaminetheinstructioncycletimeandpipeliningoftheARMinChapter7.

Feature5

RISCprocessorshaveseparatebusesfordataandcode.Inallthex86processors,likeallotherCISCcomputers,thereisonesetofbusesfortheaddress(e.g.,A0–A31inthe 80386) and another set of buses for data (e.g., D0–D31 in the 80386) carryingopcodes and operands in and out of the CPU. To access any section of memory,regardlessofwhetheritcontainscodeordataoperands,thesameaddressbusanddatabusareused.InmanyRISCprocessors, therearefoursetsofbuses:(1)asetofdatabusesforcarryingdata(operands)inandoutoftheCPU,(2)asetofaddressbusesforaccessingthedata,(3)asetofbusestocarrytheopcodes,and(4)asetofaddressbusesto access the opcodes. The use of separate buses for code and data operands iscommonlyreferred toasHarvardarchitecture.WeexaminedtheHarvardarchitectureoftheARMintheprevioussection.

Feature6

Because CISC has such a large number of instructions, each with so manydifferentaddressingmodes,microinstructions(microcode)areusedtoimplementthem.TheimplementationofmicroinstructionsinsidetheCPUemploysmorethan40–60%oftransistors inmanyCISCprocessors.RISCinstructions,however,dueto thesmallsetof instructions, are implementedusing thehardwiremethod.HardwiringofRISCinstructionstakesnomorethan10%ofthetransistors.

Feature7

RISC uses load/store architecture. In CISC microprocessors, data can bemanipulatedwhile it is still inmemory. For example, in instructions such as “ADDReg,Memory”, themicroprocessormust bring the contents of the external memorylocationintotheCPU,addittothecontentsoftheregister,thenmovetheresultbacktotheexternalmemorylocation.Theproblemistheremightbeadelayinaccessingthedatafromexternalmemory.Thenthewholeprocesswouldbestalled,preventingotherinstructions fromproceeding in thepipeline. InRISC,designersdidawaywith thesekindsof instructions. InRISC, instructionscanonly load fromexternalmemory intoregisters or store registers into externalmemory locations.There is nodirectwayofdoingarithmeticand logicoperationsbetweena register and thecontentsofexternalmemory locations. All these instructions must be performed by first bringing bothoperands into the registers inside the CPU, then performing the arithmetic or logicoperation,andthensendingtheresultbacktomemory.Thisideawasfirstimplementedby the Cray 1 supercomputer in 1976 and is commonly referred to as load/storearchitecture. In the last section, we saw that the arithmetic and logic operations arebetweentheGPRsregisters,butnoneinvolvesamemorylocation.Forexample,thereisno“ADDR1,RAM-Loc”instructioninARM.

In concluding this discussion of RISC processors, it is interesting to note thatRISC technologywasexploredby thescientistsat IBMin themid-1970s,but itwas

DavidPattersonof theUniversityofCaliforniaatBerkeleywho in1980brought themeritsofRISCconcepts totheattentionofcomputerscientists.Itmustalsobenotedthat in recent years CISC processors such as the Pentium have used some RISCfeatures in their design. This was the only way they could enhance the processingpowerof thex86processorsandstaycompetitive.Ofcourse, theyhad touse lotsoftransistorstodothejob,becausetheyhadtodealwithalltheCISCinstructionsofthex86processorsandthelegacysoftwareofDOS/Windows.

ReviewQuestions

1.WhatdoRISCandCISCstandfor?

2.Trueorfalse.TheCISCarchitectureexecutesthevastmajorityofitsinstructionsin2,3,ormoreclockcycles,whileRISCexecutestheminoneclock.

3.RISCprocessorsnormallyhavea_____(large,small)numberofgeneral-purposeregisters.

4.Trueorfalse.Instructionssuchas“ADDR16,ROMmemory”donotexistinRISCmicroprocessorssuchastheARM.

5.HowmanyinstructionsofARMare32-bitwide?

6.Trueorfalse.WhileCISCinstructionsareofvariablesizes,RISCinstructionsareallthesamesize.

7.WhichofthefollowingoperationsdonotexistfortheADDinstructioninRISC?

(a)registertoregister(b)immediatetoregister(c)memorytomemory

8. Trueorfalse.Harvardarchitectureusesthesameaddressanddatabusestofetchbothcodeanddata.

Section2.11:ViewingRegistersandMemorywithARMKeilIDETheARMmicroprocessorhasgreattoolsandsupportsystems,manyofthemfree

orinexpensive.ARMKeiluVisionisanassemblerandsimulatorprovidedforfreebyKeil Corporation and can be downloaded from the www.keil.com website. Seehttp://www.MicroDigitalEd.com for tutorials on how to use the Keil ARM uVisionassemblerandsimulator.

ManyassemblersandCcompilerscomewithasimulator.Simulatorsallowustoview the contents of registers and memory after executing each instruction (single-stepping). It is strongly recommended to use a simulator to single-step some of theprograms in this chapter and future chapters. Single-stepping a program with asimulatorgivesusadeeperunderstandingofmicrocontrollerarchitecture,inadditiontothefactthatwecanuseittofindtheerrorsinourprograms.

Figures2-27and2-28showscreenshots forARMsimulators fromARMKeiluVision.

Figure2-27:KeiluVisionScreenshot


Figure2-28:KeiluVisionScreenshot

ProblemsSection2.1:TheGeneralPurposeRegistersintheARM

1.ARMisa(n)_____-bitmicroprocessor.

2.Thegeneralpurposeregistersare_____bitswide.

3.ThevalueinMOVR2,#valueis_____bitswide.

4.ThelargestnumberthatanARMGPRregistercanhaveis_____inhex.

5.Whatistheresultofthefollowingcodeandwhereisitkept?

MOVR2,#0x15

MOVR1,#0x13

ADDR2,R1,R2

6.Whichofthefollowingsis(are)illegal?

(a)MOVR2,#0x50000(b)MOVR2,#0x50(c)MOVR1,#0x00

(d)MOVR1,255(e)MOVR17,#25(f)MOVR23,#0xF5

(g)MOV123,0x50

7.Whichofthefollowingis(are)illegal?

(a)ADDR2,#20,R1(b)ADDR1,R1,R2(c)ADDR5,R16,R3


MOVR9,#0x25

ADDR8,R9,#0x1F


MOVR1,#0x15

ADDR6,R1,#0xEA

10.Trueorfalse.Wehave32generalpurposeregistersintheARM.

Section2.2:TheARMMemoryMap

11.Trueorfalse.R13andR14arespecialfunctionregisters.

12.Trueorfalse.Theperipheralregistersaremappedtomemoryspace.

13.Trueorfalse.TheOn-chipFlashisthesamesizeinallmembersofARM.

14.Trueorfalse.TheOn-chipdataSRAMisthesamesizeinallmembersofARM.

15.WhatisthedifferencebetweentheEEPROManddataSRAMspaceintheARM?

16.CanwehaveanARMchipwithnoEEPROM?

17.CanwehaveanARMchipwithnodataRAM?

18.WhatisthemaximumnumberofbytesthattheARMcanaccess?

19.Findtheaddressofthelastlocationofon-chipFlashforeachofthefollowing,assumingthefirstlocationis0:

(a)ARMwith32KB(b)ARMwith8KB

(c)ARMwith64KB(d)ARMwith16KB

(e)ARMwith128KB(f)ARMwith256KB

20.Showthelowestandhighestvalues(inhex)thattheARMprogramcountercantake.

21.AgivenARMhas0x7FFFastheaddressofthelastlocationofitson-chipROM.Whatisthesizeofon-chipFlashforthisARM?

22.RepeatQuestion21for0x3FFF.

23.Findtheon-chipprogrammemorysizeinKfortheARMchipwiththefollowingaddressranges:

(a)0x0000–0x1FFF(b)0x0000–0x3FFF

(c)0x0000–0x7FFF(d)0x0000–0xFFFF

(e)0x0000–0x1FFFF(f)0x00000–0x3FFFF

24.Findtheon-chipprogrammemorysizeinKfortheARMchipswiththefollowingaddressranges:

(a)0x00000–0xFFFFFF(b)0x00000–0x7FFFF

(c)0x00000–0x7FFFFF(d)0x00000–0xFFFFF

(e)0x00000–0x1FFFFF(f)0x00000–0x3FFFFF

Section2.3:LoadandStoreInstructionsinARM

25.Showasimplecodetostorevalues0x30and0x97intolocations0x20000015and0x20000016,respectively.

26.Showasimplecodetoloadthevalue0x55intolocations0x20000030–0x20000038.

27.Trueorfalse.WecannotloadimmediatevaluesintothedataSRAMdirectly.

28.Showasimplecodetoloadthevalue0x11intolocations0x20000010–0x20000015.

29.RepeatProblem28,exceptloadthevalueintolocations0x20000034–0x2000003C.

Section2.4:ARMCPSR(CurrentProgramStatusRegister)

30.Thestatusregisterisa(n)______-bitregister.

31.WhichbitsofthestatusregisterareusedfortheCandZflagbits,respectively?

32.WhichbitsofthestatusregisterareusedfortheVandNflagbits,respectively?

33.IntheADDinstruction,whenisCraised?

34.IntheADDinstruction,whenisZraised?

35.WhatisthestatusoftheCandZflagsafterthefollowingcode?

LDRR0,=0xFFFFFFFF

LDRR1,=0xFFFFFFF1

ADDSR1,R0,R1

36.FindtheCflagvalueaftereachofthefollowingcodes:

(a)LDRR0,=0xFFFFFF54

LDRR5,=0xFFFFFFC4

ADDSR2,R5,R0

(b)MOVR3,#0

LDRR6,=0xFFFFFFFF

ADDSR3,R3,R6

(c)LDRR3,=0xFFFFFFFF

LDRR8,=0xFFFFFF05

ADDSR2,R3,R8

37.Writeasimpleprograminwhichthevalue0x55isadded5times.

Section2.5:ARMDataFormatandDirectives

38.Statethevalue(inhex)usedforeachofthefollowingdata:

MYDAT_1EQU55

MYDAT_2EQU98

MYDAT_3EQU‘G’

MYDAT_4EQU0x50

MYDAT_5EQU200

MYDAT_6EQU‘A’

MYDAT_7EQU0xAA

MYDAT_8EQU255

MYDAT_9EQU2_10010000

MYDAT_10EQU2_01111110

MYDAT_11EQU10

MYDAT_12EQU15

39.Statethevalue(inhex)foreachofthefollowingdata:

DAT_1EQU22

DAT_2EQU0x56

DAT_3EQU2_10011001

DAT_4EQU32

DAT_5EQU0xF6

DAT_6EQU2_11111011

40.Showasimplecodetoloadthevalue0x10102265intolocations0x40000030–0x4000003F.

41.Showasimplecodeto(a)loadthevalue0x23456789intolocations0x40000060–0x4000006F,and(b)addthemtogetherandplacetheresultinR9asthevaluesareadded.UseEQUtoassignthenamesTEMP0–TEMP3tolocations0x40000060–0x4000006F.

Section2.6:IntroductiontoARMAssemblyProgrammingand

Section2.7:AssemblinganARMProgram

42.Assemblylanguageisa________(low,high)-levellanguagewhileCisa________(low,high)-levellanguage.

43.OfCandAssemblylanguage,whichismoreefficientintermsofcodegeneration(i.e.,theamountofprogrammemoryspaceituses)?

44.Whichprogramproducestheobjfile?

45.Trueorfalse.Thesourcefilehastheextension“asm”.

46.Trueorfalse.Thesourcecodefilecanbeanon-ASCIIfile.

47.Trueorfalse.EverysourcefilemusthaveEQUdirective.

48.DotheEQUandENDdirectivesproduceopcodes?

49.Whyarethedirectivesalsocalledpseudocode?

50.Thefilewiththe_______extensionisdownloadedintoARMFlashROM.

51.GivethreefileextensionsproducedbyARMKeil.

Section2.8:TheProgramCounterandProgramROMSpaceintheARM

52.EveryARMfamilymemberwakesupataddress_________whenitispoweredup.

53.Aprogrammerputsthefirstopcodeataddress0x100.Whathappenswhenthe

microcontrollerispoweredup?

54.ARMinstructionsare_______bytes.

55.Writeaprogramtoaddeachofyour5-digitIDtoaregisterandplacetheresultintomemorylocation0x4000100.UsetheprogramlistingtoshowtheFlashmemoryaddressesandtheircontents.

56.Showtheplacementofdatainfollowingcode:

LDRR1,=0x22334455

LDRR2,=0x20000000

STRR1,[R2]

Usea)littleendianandb)bigendian.

57.Showtheplacementofdatainfollowingcode:

LDRR1,=0xFFEEDDCC

LDRR2,=0x2000002C

STRR1,[R2]

Usea)littleendianandb)bigendian.

58.HowwideisthememoryintheARMchip?

59.HowwideisthedatabusbetweentheCPUandtheprogrammemoryintheARM7chip?

60.In“ADDRd,Rn,operand2”,explainhowmanybitsaresetasideforRdandhowitcoverstheentireGPRsintheARMchip.

Section2.9:SomeARMAddressingModes

61.Givetheaddressingmodeforeachofthefollowing:

(a)MOVR5,R3 (b)MOVR0,#56

(c)LDRR5,[R3] (d)ADDR9,R1,R2

(e)LDRR7,[R2] (f)LDRBR1,[R4]

62.Showthecontentsofthememorylocationsaftertheexecutionofeachinstruction.

(a)LDRR2,=0x129F

LDRR1,=0x1450

LDRR2,[R1]

(b)LDRR4,=0x8C63

LDRR1,=0x2400

LDRHR4,[R1]

0x1450=(…….) 0x2400=(…….)

0x1451=(…….) 0x2401=(…….)

Section2.10:RISCArchitectureinARM

63.WhatdoRISCandCISCstandfor?

64.In______(RISC,CISC)architecturewecanhave1-,2-,3-,or4-byteinstructions.

65.In______(RISC,CISC)architectureinstructionsarefixedinsize.

66.In______(RISC,CISC)architectureinstructionsaremostlyexecutedinoneortwocycles.

67.In______(RISC,CISC)architecturewecanhaveaninstructiontoADDaregistertoexternalmemory.

68.Trueorfalse.MostinstructionsinCISCareexecutedinoneortwocycles.


1.MOVR2,#0x34

2.

MOVR1,#0x16

MOVR2,#0xCD

ADDR1,R1,R2

or

MOVR1,#0x16

ADDR1,R1,#0xCD

3.False

4.FFinhexand255indecimal

5.32

Section2.2

1.True

2.general-purposeregisters

3.32

4.Specialfunctionregisters(SFRs)

5.32

Section2.3

1.True

2.

MOVR1,#0x20

MOVR2,#0x95

STRBR2,[R1]

3.STRR2,[R8]

4.

MOVR1,#0x20

LDRR4,[R1]

5.FFinhexor255indecimal

6.R6

7.Itcopiesthelower8bitsofR1intolocationpointedtobyR2.

8.FFFFFFFFinhexor4,294,967,295indecimal(232-1)

Section2.4

1.CPSR(currentprogramstatusregister)

2.32bits

3.

Hex Binary

FFFFFF9F 11111111111111111111111110011111

+00000061 +00000000000000000000000001100001

100000000 100000000000000000000000000000000

ThisleadstoC=1andZ=1.

4.

Hex Binary

00000022 00000000000000000000000000100010

+00000022 +00000000000000000000000000100010

000000000 00000000000000000000000001000100

ThisleadstoZ=0.

5.

Hex Binary

00000067 00000000000000000000000001100111

+00000099 +00000000000000000000000010011001

00000100 00000000000000000000000100000000

ThisleadstoC=0andZ=0.

Section2.5

1.MOVR1,#0x20

2.(a)MOVR2,#0x14(b)MOVR2,#20(c)MOVR2,#2_00010100

3.Ifthevalueistobechangedlater,itcanbedoneonceinoneplaceinsteadofateveryoccurrenceandthecodebecomesmorereadable,aswell.

4.(a)0x34(b)0x1F

5.15indecimal(0x0Finhex)

6.Valueoflocation0x00000200=0x95

7.0x0C+0x10=0x1Cwillbeindatamemorylocation0x00000630.

Section2.6

1.TherealworkisperformedbyinstructionssuchasMOVandADD.Pseudo-instructions,alsocalledAssemblydirectives,directtheassemblerindoingitsjob.

2.Theinstructionmnemonics,pseudo-instructions

3.False

4.Allexcept(c)

5.Assemblerdirectives

6.True

7.(c)

Section2.7

1.True

2.True

3.(a)

4.(b)and(d)

Section2.8

1.32

2.True

3.0x00000000

4.True

5.False

6.True

Section2.9

1.No

2.Thegeneralpurposeregisters(R0toR15)

3.Itisapartoftheinstruction

Section2.10

1.RISCisreducedinstructionsetcomputer;CISCstandsforcomplexinstructionsetcomputer.

2.True

3.Large

4.True

5.Allofthem

6.True

7.(c)

8.False

Chapter3:ArithmeticandLogicInstructionsandProgramsIn this chapter, most of the arithmetic and logic instructions are discussed and

programexamples are given to illustrate the applicationof these instructions.Unsignednumbersareusedinthisdiscussionofarithmeticandlogicinstructions.InSection3.1weexamine the arithmetic instructions for unsigned numbers. The logic instructions andprograms are covered in Section 3.2. Section 3.3 is dedicated to rotate and shiftoperations.Weexamineloadingfixed(constant)valuesintoregistersusingrotateoptions,as well. In Section 3.4 we discuss the ARM Cortex instructions for rotate and shift.Section3.5isdedicatedtoBCDandASCIIdataconversion.

Section3.1:ArithmeticInstructionsUnsignednumbersaredefinedasdatainwhichallthebitsareusedtorepresentdata

andnobitsaresetasideforthepositiveornegativesign.Thismeansthattheoperandcanbe between 00 and 0xFF (0 to 255 decimal) for 8-bit data and between 0x0000 and0xFFFF(0to65535decimal)for16-bitdata.Forthe32-bitoperanditcanbebetween0and0xFFFFFFFF (0 to232 -1).SeeTable3-1.This sectioncovers theADD,SUB,andmultiplyinstructionsforunsignednumber.


Byte 8 0–255 0-0xFF STRB

Half-word 16 0–65535 0-0xFFFF STRH

Word 32 0–232-1 0–0xFFFFFFFF STR

Table3‑1:UnsignedDataRangeSummaryinARM

AffectingflagsinARMinstructions

AuniquefeatureoftheexecutionofARMarithmeticinstructionsisthatitdoesnotaffect(updates)theflagsunlesswespecifyit.Thisisdifferentfromothermicrocontrollersand CPUs such as 8051 and x86. In the x86 and 8051 the arithmetic instructionsautomaticallychangetheZandCflagsregardlessofwewantitornot.Thisisnotthecasewith ARM. The default for the ARM instruction is not to affect these flags after theexecutionofarithmeticinstructionssuchasADDandSUB.TheARMassemblergivesustheoptionof telling theCPU toupdate the flagbits in theCSPR register to reflect theresult.WeoverridethedefaultbyhavingletterSintheinstruction.Forexample,wemustuseSUBSinsteadofSUBsincetheSUBinstructionwillnotupdatetheflags.TheSUBSmeanssubtractandsettheflags,whiletheSUBsimplysubtractswithouthavinganyeffectontheflags.SeeTable3-2andFigure3-1.

Figure3-1:GeneralFormationofDataProcessingInstruction

Instruction(Flagsunchanged) Instruction(Flagsupdated)

ADD Add ADDS Addandsetflags

ADC Addwithcarry ADCS Addwithcarryandsetflags

SUB SUBS SUBS Subtractandsetflags

SBC Subtractwithcarry SBCS Subtractwithcarryandsetflags

MUL Multiply MULS Multiplyandsetflags

UMULL Multiplylong UMULLS MultiplyLongandsetflags

RSB Reversesubtract RSBS Reversesubtractandsetflags

RSC Reversesubtractwithcarry RSCS Reversesubtractwithcarryandsetflags

Note:TheaboveinstructionaffectalltheZ,C,VandNflagbitsofCPSR(currentprogramstatusregister)buttheNandVflagsareforsigneddataandarediscussedinChapter5.

Table3-2:ArithmeticInstructionsandFlagBitsforUnsignedData

Additionofunsignednumbers

TheformoftheADDinstructionis

ADDRd,Rn,Op2;Rd=Rn+Op2

The instructions ADD and ADC are used to add two operands. The destinationoperandmustbearegister.TheOp2operandcanbearegisterorimmediate.Rememberthatmemory-to-registerormemory-to-memoryarithmeticandlogicoperationsareneverallowedinARMAssembly languagesince it isaRISCprocessor.The instructioncouldchangeanyoftheZ,C,N,orVbitsofthestatusflagregister,aslongasweusetheADDSinsteadofADD.TheeffectoftheADDSinstructionontheoverflow(V)andN(negative)flagsisdiscussedinChapter5sincetheyareusedinsignednumberoperations.LookatExamples3-1and3-2fortheeffectofADDSinstructiononZandCflags.

Example3-1

Showtheflagbitsofstatusregisterforthefollowingcases:

a)LDRR2,=0xFFFFFFF5;R2=0xFFFFFFF5(noticethe=sign)

MOVR3,#0x0B

ADDSR1,R2,R3;R1=R2+R3andupdatetheflags

b)LDRR2,=0xFFFFFFFF

ADDSR1,R2,#0x95;R1=R2+95andupdatetheflags

Solution:

a)

0xFFFFFFF5 11111111111111111111111111110101

+0x0000000B

+00000000000000000000000000001011

0x100000000 100000000000000000000000000000000

First,noticehowthe“LDRR2,=0xFFFFFFF5”pseudo-instructionloadsthe32-bitvalueintoR2register.AlsonoticetheuseofADDSinstructioninsteadofADDsincetheADDinstructiondoesnotupdatetheflags.Now,aftertheaddition,theR1register(destination)contains0andtheflagsareasfollows:

C=1,sincethereisacarryoutfromD31

Z=1,theresultoftheactioniszero(forthe32bits)

b)

0xFFFFFFFF 11111111111111111111111111111111

+0x00000095

+00000000000000000000000010010101

0x100000094 100000000000000000000000010010100

Aftertheaddition,theR1register(destination)contains0x94andtheflagsareasfollows:


Z=0,theresultoftheactionisnotzero(forthe32bits)

Example3-2

Showtheflagbitsofstatusregisterforthefollowingcase:

LDRR2,=0xFFFFFFF1;R2=0xFFFFFFF1

MOVR3,#0x0F


ADDR3,R3,#0x7;R3=R3+0x7andflagsunchanged

MOVR1,R3

Solution:

0xFFFFFFF1 11111111111111111111111111110001

+0x0000000F +00000000000000000000000000001111

0x100000000 100000000000000000000000000000000

AftertheADDSaddition,theR3register(destination)contains0andtheflagsareasfollows:



Afterthe“ADDR3,R3,#0x7”addition,theR3register(destination)contains0x7(0x+07=07)andtheflagsareunchangedfrompreviousinstructionsinceweusedADDinsteadofADDS.Therefore,theZ=1andC=1.Ifweuse“ADDSR3,R3,#0x7”instructioninsteadof“ADDR3,R3,#0x7”,wewillhaveZ=0andC=0.UsetheKeilARMsimulatortoverifythis.

Comment

MicrosoftWindowscomeswithacalculator.Use it toverify thecalculations in thisandfuturechapters.Thecalculatorsupportsdatasizeofupto64-bit

NoincrementinstructioninARM

There is no increment instruction in theARMprocessor. Insteadwe useADD toperformthisaction.Theinstruction“ADDSR4,R4,#1”willincrementtheR4andplacestheresultinR4register.TheRISCprocessorseliminatetheunnecessaryinstructionsanduseanexistinginstructiontoperformtheoperation.

ADC(addwithcarry)

Thisinstructionisusedformultiword(datalargerthan32-bit)numbers.TheformoftheADCinstructionis

ADCRd,Rn,Op2;Rd=Rn+Op2+C

Indiscussingaddition,thefollowingtwocaseswillbeexamined:

1.Additionofindividualworddata

2.Additionofmultiworddata

CASE1:Additionofindividualworddata

Inpreviousexamples,inprogramsregardingaddition,thetotalsumwaspurposelykeptlessthan0xFFFFFFFF,themaximumvaluea32-bitregistercanhold.Inrealworldto calculate the total sumof anynumberof operands, the carry flag shouldbe checkedaftertheadditionofeachoperandtoseeifthetotalsumisgreaterthan0xFFFFFFFF.SeeExample3-3andProgram3-1.

Example3-3

Showtheflagbitsofstatusregisterforthefollowingcase:

LDRR2,=0xFFFFFFF1;R2=0xFFFFFFF1

MOVR3,#0x0F


ADDR3,R3,#0x7;R3=R3+0x7andflagsunchanged

MOVR1,R3

Solution:

0xFFFFFFF1 11111111111111111111111111110001

+0x0000000F +00000000000000000000000000001111

0x100000000 100000000000000000000000000000000

AftertheADDSaddition,theR3register(destination)contains0andtheflagsareasfollows:



Afterthe“ADDR3,R3,#0x7”addition,theR3register(destination)contains0x7(0x0+0x7=0x7)andtheflagsareunchangedfrompreviousinstructionsinceweusedADDinsteadofADDS.Therefore,theZ=1andC=1.Ifweuse“ADDSR3,R3,#0x7”instructioninsteadof“ADDR3,R3,#0x7”,wewillhaveZ=0andC=0.UsetheKeilARMsimulatortoverifythis.

Program3-1Writeaprogramtocalculatethetotalsumoffivewordsofdata.Eachdatavaluerepresentsthemassofaplanetininteger.Thedecimaldataareasfollow:1000000000,2000000000,3000000000,4000000000,and

4100000000.

AREAPROG3_1,CODE,READONLY

ENTRY

LDRR1,=1000000000

LDRR2,=2000000000

LDRR3,=3000000000

LDRR4,=4000000000

LDRR5,=4100000000

MOVR8,#0;R8=0forsavingthelowerword

MOVR9,#0;R9=0foraccumulatingthecarries

ADDSR8,R8,R1;R8=R8+R1

ADCR9,R9,#0;R9=R9+0+Carry

;(incrementR9ifthereiscarry)









HEREBHERE

END;Markendoffile

CASE2:Additionofmultiwordnumbers

AssumeaprogramisneededthatwilladdthetotalU.S.budgetforthelast100years

or themass of all the planets in the solar system. In cases like this, the numbers beingaddedcouldbeupto8byteswideormore.SinceARMregistersareonly32bitswide(4bytes),itisthejoboftheprogrammertowritethecodetobreakdowntheselargenumbersinto smaller chunks to be processed by the CPU. If a 32-bit register is used and theoperand is 8 bytes wide, that would take a total of two iterations. See Example 3-4.However,ifa16-bitregisterisused,thesameoperandswouldrequirefouriterations.Thisobviouslytakesmoretimefor theCPU.This isonereasontohavewideregisters in thedesignoftheCPU.

Example3-4

Analyzethefollowingprogramwhichadds0x35F62562FAto0x21F412963B:

LDRR0,=0xF62562FA;R0=0xF62562FA

LDRR1,=0xF412963B;R1=0xF412963B

MOVR2,#0x35;R2=0x35

MOVR3,#0x21;R3=0x21

ADDSR5,R1,R0;R5=0xF62562FA+0xF412963B

;nowC=1

ADCR6,R2,R3;R6=R2+R3+C

;=0x35+21+1=0x57

Solution:

AftertheR5=R0+R1thecarryflagisone.SinceC=1,whenADCisexecuted,R6 = R2 + R3 + C =0x35+0x21+1=0x57.

MicrosoftWindowscalculatorsupportdatasizeofup64-bit(doubleword).Useittoverifytheabovecalculations.

Subtractionofunsignednumbers

SUBRd,Rn,Op2;Rd=Rn-Op2

Insubtraction,theARMmicroprocessors(indeed,almostallmodernCPUs)usethe2’s complementmethod.Although everyCPU contains adder circuitry, itwould be too

cumbersome(and take toomany logicgates) todesignseparate subtractorcircuitry.Forthis reason, theARMuses internal adder circuitry toperform the subtractionoperation.AssumingthattheARMisexecutingsimplesubtractinstructions,onecansummarizethestepsofthehardwareoftheCPUinexecutingtheSUBinstructionforunsignednumbersasfollows:

1.Takethe2’scomplementofthesubtrahend(Op2operand).

2.Addittotheminuend(Rnoperand).

3.PlacetheresultindestinationRd.

4.Setthecarryflagifthereisacarry.

ThesefourstepsareperformedforeverySUBSinstructionbytheinternalhardwareoftheARMCPU.Itisafterthesefourstepsthattheresultisobtainedandtheflagsareset.Examples3-5through3-7illustratesthefoursteps.

Example3-5

Showthestepsinvolvedforthefollowingcases:

a)

MOVR2,#0x4F;R2=0x4F

MOVR3,#0x39;R3=0x39

SUBSR4,R2,R3;R4=R2–R3

b)

MOVR2,#0x4F;R2=0x4F

SUBSR4,R2,#0x05;R4=R2–0x05

Solution:

a)

0x4F 0000004F

–0x39 +FFFFFFC7 2’scomplementof0x39

16 100000016 (C=1step4)

Theflagswouldbesetasfollows:C=1,andZ=0.Theprogrammermustlookatthecarryflag(notthesignflag)todetermineiftheresultispositiveornegative.

b)

0x4F 0000004F

–0x05 +FFFFFFFB 2’scomplementof0x05

0x4A 10000004A (C=1step4)

Example3-6

Analyzethefollowinginstructions:

LDRR2,=0x88888888;R2=0x88888888

LDRR3,=0x33333333;R3=0x33333333


Solution:

Followingarethestepsfor“SUBR4,R2,R3”:

88888888 88888888

–33333333 +CCCCCCCD (2’scomplementof0x33333333)

55555555 155555555 (C=1step4)resultispositive

Notice that, unlikex86CPUs,ARMdoesnot invert the carry flagafterSUBSsoC=0 when there is borrow and C=1 when there is no borrow. It means that after theexecutionofSUBS,ifC=1,theresultispositive;ifC=0,theresultisnegativeandthedestination has the 2’s complement of the result. Normally, the result is left in 2’scomplement,butyoucantakethe2’scomplementoftheresultbyinvertingitandaddingonetoit.

Example3-7

Analyzethefollowinginstructions:

MOVR1,#0x4C;R1=0x4C

MOVR2,#0x6E;R2=0x6E


Solution:

Followingarethestepsfor“SUBR0,R1,R2”:

4C 0000004C

–6E +FFFFFF92 (2’scomplementof0x6E)

–22 0FFFFFFDE (C=0step4)resultisnegative

SBC(subtractwithborrow)

SBCRd,Rn,Op2;Rd=Rn–Op2–1+C

This instruction is used for subtraction of multiword (data larger than 32-bit)numbers. Notice that in some other architectures, the CPU inverts the C flag aftersubtraction so the content of carry flag is theborrowbit of subtract operation. In thosearchitectures the subtractwith borrow is implemented as “Rd=Rn –Op2 –C” but inARMthecarryflagisnotinvertedaftersubtractionandcarryflagisinvertofborrow.Toinvertthecarryflagwhilerunningthesubtractwithborrowinstructionitisimplementedas“Rd = Rn – Op2 –1+ C”SeeExample3-8.

Example3-8

Analyzethefollowingprogramwhichsubtracts0x21F62562FAfrom0x35F412963B:

LDRR0,=0xF62562FA;R0=0xF62562FA,

;noticethesyntaxforLDR

LDRR1,=0xF412963B;R1=0xF412963B

MOVR2,#0x21;R2=0x21

MOVR3,#0x35;R3=0x35


;=0xF412963B–0xF62562FA,andC=0

SBCR6,R3,R2;R6=R3–R2–1+C

;=0x35–0x21–1+0=0x13

Solution:

AftertheR5=R1–R0thereisaborrowsothecarryflagiscleared.SinceC=0,whenSBCisexecuted,R6=R3–R2–1+C=0x35–0x21–1+0=0x35–0x21–1=0x13.

NodecrementinstructioninARM

There isnodecrement instruction in theARMprocessor. InsteadweuseSUB toperform the action. The instruction “SUB R4,R4,#1” will decrement one from R4 andplaces the result in R4 register. The RISC processors eliminate the unnecessaryinstructionsanduseanexistinginstructiontoperformthedesiredoperation.

RSB(reversesubtract)

TheformatfortheRSBinstructionis

RSBRd,Rn,Op2;Rd=Op2–Rn

Notice the differencebetween theRSBandSUB instruction.They are essentiallythesameexcept thewaythesourceoperandsaresubtractedisreversed.This instructioncanbeusedtoget2’scomplementofa32-bitoperand.SeeExample3-9.

Example3-9

FindtheresultofR0forthefollowings:

a)

MOVR1,#0x6E;R1=0x6E

RSBR0,R1,#0;R0=0–R1

b)

MOVR1,#0x1;R1=1

RSBR0,R1,#0;R0=0–R1=0–1

Solution:

a)Followingarethestepsfor“RSBR0,R1,#0”:

0 0000000

–6E +FFFFFF92 (2’scomplement)

–6E FFFFFF92 (C=0)resultisnegative

b)Thisisonewaytogetafixedvalueof0xFFFFFFFFinaregister.Therefore,wehaveR0=0xFFFFFFFF.

RSC(reversesubtractwithcarry)

TheformoftheRSCinstructionis

RSCRd,Rn,Op2;Rd=Op2–Rn–1+C

Notice thedifferencebetween theRSBandRSCinstructions.Theyareessentiallythesameexcept thewaythesourceoperandsaresubtractedisreversed.This instructioncanbeusedtogetthe2’scomplementofthe64-bitoperand.SeeExample3-10.

Example3-10

Showhowtocreate2’scomplementofa64-bitdatainR0andR1register.TheR0holdthelower32-bit.

Solution:

LDRR0,=0xF62562FA;R0=0xF62562FA

LDRR1,=0xF812963B;R1=0xF812963B

RSBR5,R0,#0;R5=0–R0

;=0–0xF62562FA=9DA9D06andC = 0

RSCR6,R1,#0;R6=0–R1–1+C

;=0–0xF812963B – 1 + 0=7ED69C4

UseMicrosoftWindowscalculatortoverifytheabovecalculations.

Multiplicationanddivisionofunsignednumbers

Not all CPUs have instructions for multiplication and division. All the ARMprocessorshave amultiplication instructionbut not thedivision.Some familymemberssuchasARMCortexhaveboththedivisionandmultiplicationinstructions.Inthissectionwe examine the multiplication of unsigned numbers. Signed numbers multiplication istreatedinChapter5.

MultiplicationofunsignednumbersinARM

TheARMgivesyou twochoicesofunsignedmultiplication:normalmultiplyandlongmultiply.Thenormalmultiplyinstruction(MUL)isusedwhentheresultislessthan32-bit,whilethelongmultiply(MULL)mustbeusedwhentheresultisgreaterthan32-bit.SeeTable3-3.Inthissectionweexaminebothofthem.

Instruction Source1 Source2 Destination Result

MUL Rn Op2 Rd(32bits) Rd=Rn×Op2

UMULL Rn Op2 RdLo,RdHi(64bits) RdLo:RdHi=Rn×Op2

Note1:UsingMULforword×wordmultiplicationprovidesonlythelower32-bitresultinRdandtherestaredroppediftheresultisgreaterthan32-bit.Iftheresultisgreaterthan0xFFFFFFFF,thenwemustuseUMULL(unsignedMultiplyLong)instruction.

Note2:inword-by-wordmultiplicationusingMULinstruction,iftheresultisgreaterthan32-bitonlythelower32-bitissavedbyARMandtheupperpartisdroppedwithoutsettinganyflag.InsomeCPUstheCflagisusedtoindicatetheresultisgreaterthan32-bitbutthisisnotthecasewithARM.

Table3-3:UnsignedMultiplication(UMULRd,Rn,Op2)Summary

MUL(multiply)

MULRd,Rn,Op2;Rd=Rn×Op2

Innormalmultiplication,theoperandsmustbeinregisters.Afterthemultiplication,thedestinationregisterswillcontaintheresult.Seethefollowingexample:

MOVR1,#0x25;R1=0x25

MOVR2,#0x65;R2=0x65

MULR3,R1,R2;R3=R1×R2=0x65×0x25

Note that in the case of half-word times half-word or smaller sources since thedestinationregisteris32-bitthereisnoprobleminkeepingtheresultof65,535×65,535,the highest possible unsigned 16-bit data. That is not the case in word times wordmultiplicationbecause32-bit×32-bitcanproducearesultgreaterthan32-bit.IftheMULinstruction is used, the destination register will hold the lower word (32-bit) and theportion beyond 32-bit is dropped. So it is not safe to use MUL for multiplication ofnumbersgreater than65,536. Inmanymicroprocessorswhen theresultgoesbeyond thedestination register size, the C flag is raised to indicate that. This is not the casewithARM.Seethefollowingexample:

LDRR1,=100000;R1=100,000

LDRR2,=150000;R2=150,000

MULR3,R2,R1;R3isnot15,000,000,000because

;itcannotfitin32bits.

For this reason wemust use UMULL (unsignedmultiply long) instruction if theresultisgoingtobegreaterthan0xFFFFFFFF.

UMULL(unsignedmultiplylong)

UMULLRdLo,RdHi,Rn,Op2

;RdHi:RdLoRd=Rn×Op2

In unsigned long multiplication, the operands must be in registers. After themultiplication, the destination registerswill contain the result.Notice that the leftmost

register,RdLoinourcase,willholdthelowerwordandthehigherportionbeyond32-bitissavedinthesecondregister,RdHi.Seethefollowingexample:

LDRR1,=0x54000000;R1=0x54000000

LDRR2,=0x10000002;R2=0x10000002

UMULLR3,R4,R2,R1;0x54000000×0x10000002

;=0x054000000A8000000

;R3=0xA8000000,thelower32bits

;R4=0x05400000,thehigher32bits

Notice that it is the job of programmer to choose the best type ofmultiplicationdependingonthesizeofoperandsandtheresult.SeeExample3-11.

Example3-11

Writeashortprogramtomultiply0xFFFFFFFFbyitself.

Solution:

MOVR0,#1;R0=1

RSBR1,R0,#0;R1=0–R0

;=0xFFFFFFFF

RSBR2,R0,#0;R2=0xFFFFFFFF

UMULLR3,R4,R1,R2

Since 0xFFFFFFFF × 0xFFFFFFFF = 0xFFFFFFFE00000001, then R4=0xFFFFFFFEandR3=0x00000001.IfwehadusedMULinstruction,thenthe0xFFFFFFFFwouldhavebeendroppedandonly0x00000001wouldhavebeenkeptbythedestinationregister.

MultiplyandAccumulateInstructionsinARM

Insomeapplicationsuchasdigitalsignalprocessing(DSP)weneedtomultiplytworegistersandaddtheresultwithanotherregister.TheARMhasaninstructiontodobothjobs in a single instruction. The format of MLA (multiply and add) instruction is asfollows:

MLARd,Rm,Rs,Rn;Rd=Rm×Rs+Rn

Inmultiplicationandadd,theoperandsmustbeinregisters.Afterthemultiplicationandadd,thedestinationregisterwillcontaintheresult.Seethefollowingexample:

MOVR1,#100;R1=100

MOVR2,#5;R2=5

MOVR3,#40;R3=40

MLAR4,R1,R2,R3;R4=R1×R2+R3=100×5+40=540

Noticethatmultiplyandaddcanproducearesultgreaterthan32-bit, if theMLAinstruction is used, the destination register will hold the lower word (32-bit) and theportion beyond 32-bit is dropped. For this reason we must use UMLAL (unsignedmultiplyandadd long) instruction if theresult isgoing tobegreater than0xFFFFFFFF.TheformatofUMLALinstructionisasfollows:

UMLALRdLo,RdHi,Rn,Op2;RdHi:RdLo=Rn×Op2+RdHi:RdLo

Inmultiplicationandadd,theoperandsmustbeinregisters.Noticethattheaddendandthehighwordofthedestinationusethesameregisters,thetwoleftmostregistersintheinstruction.ItmeansthatthecontentsoftheregisterswhichhavetheaddendwillbechangedafterexecutionofUMLALinstruction.Seethefollowingexample:

LDRR1,=0x34000000;R1=0x34000000

LDRR2,=0x2000000;R2=0x2000000

EORR3,R3,R3;R3=0x00

LDRR4,=0x00000BBB;R4=0x00000BBB

UMLALR4,R3,R2,R1;0x34000000×0x2000000+0xBBB

;=0x068000000000000BBB

DivisionofunsignednumbersinARM

SomeARMfamilies donot have an instruction for divisionof unsignednumberssinceittakestoomanygatestoimplementit.WecanuseSUBinstructiontoperformthedivision. In the next chapter, after explaining conditional branches, we will show anexampleofunsigneddivisionusingsubtractoperation.

ReviewQuestions

1.ExplainthedifferencebetweenADDSandADDinstructions.

2.TheADCinstructionthathasthesyntax“ADCRd,Rn,Op2”means_____________.

3.ExplainwhytheZ=0forthefollowing:

MOVR2,#0x4F

MOVR4,#0xB1

ADDSR2,R4,R2

4.ExplainwhytheZ=1forthefollowing:

MOVR2,#0x4F

LDRR4,=0xFFFFFFB1

ADDSR2,R4,R2

5.ShowhowtheCPUwouldsubtract0x05from0x43.

6.IfC=1,R2=0x95,andR3=0x4Fpriortotheexecutionof“SBCR2,R2,R3”,whatwillbethecontentsofR2afterthesubtraction?

7.Inunsignedmultiplicationof“MULR2,R3,R4”,theproductwillbeplacedinregister__________.

8.Inunsignedmultiplicationof“MULR1,R2,R4”,theR2canbemaximumof___________ifR4=0xFFFFFFFF.

Section3.2:LogicInstructionsInthissectionwediscussthelogicinstructionsAND,OR,andEx-ORinthecontext

ofmanyexamples. Just likearithmetic instruction,wemustuse theS in the instructionsyntaxifwewanttoupdatetheflags.IftheSsyntaxisusedtheZflagwillbesetifandonlyiftheresultisallzeros,andtheNflagwillbesettothelogicalvalueofbit31oftheresult.TheVflagintheCPSRwillbeunaffected,andtheCflagwillbesettothecarryoutfromthebarrelshifterwhichisdiscussedinthenextsection.SeeTable3-4.

Instruction

(FlagsUnchanged)Action

Instruction

(FlagsChanged)Hexadecimal

AND ANDing ANDS Andingandsetflags

ORR ORRing ORS Oringandsetflags

EOR Exclusive-ORing EORS ExclusiveOringandsetflags

BIC BitClearing BICS Bitclearingandsetflags

Table3-4:LogicInstructionsandFlagBits

TheinstructionformatoflogicinstructionsinARMissimilartotheformatofotherdataprocessinginstructions.SeeFigure3-2.

Figure3-2:GeneralFormationofDataProcessingInstruction

AND

ANDRd,Rn,Op2;Rd=RnANDedOp2

Inputs Output Symbol

X Y XANDY

0 0 0

0 1 0

1 0 0

1 1 1

ThisinstructionwillperformabitwiselogicalANDontheoperandsandplacetheresult in thedestination.Thedestinationand the first sourceoperandsare registers.Thesecondsourceoperandcanbearegisteroranimmediatevalueoflessthan0xFF.

IfweuseANDSinsteadofANDitwillchangetheCandZflagsaccordingtotheresult.AsseeninExample 3-12,ANDcanbeusedtomaskcertainbitsoftheoperand.

Example3-12

Showtheresultsofthefollowingcases

a)

MOVR1,#0x35

ANDR2,R1,#0x0F;R2=R1ANDedwith0x0F

b)

MOVR0,#0x97

MOVR1,#0xF0

ANDR2,R0,R1;R2=R0ANDedwithR1

Solution:

a)

0x3500110101

AND0x0F00001111

0x0500000101

b)

0x9710010111

AND0xF011110000

0x9010010000

ORR

ORRRd,Rn,Op2;Rd=RnORedOp2


X Y XORY

0 0 0

0 1 1

1 0 1

1 1 1

TheoperandsareORedandtheresultisplacedinthedestination.ORRcanbeusedtosetcertainbitsofanoperandtoone.Thedestinationandthefirstsourceoperandsareregisters.Thesecondsourceoperandcanbeeitheraregisteroranimmediatevalueoflessthan0xFF.

IfweuseORRSinsteadofORR,theflagswillbeupdated,justthesameasfortheANDSinstruction.SeeExample3-13.

Example3-13

Showtheresultsofthefollowingcases:

a)

MOVR1,#0x04;R1=0x04

ORRSR2,R1,#0x68;R2=R1ORed0x68

b)

MOVR0,#0x97

MOVR1,#0xF0

ORRR2,R0,R1;R2=R0ORedwithR1

Solution:

a)

0x0400000100

OR0x6801101000Flagwillbe:Z=0

0x6C01101100

b)

0x9710010111

OR0xF011110000Flagwillbeunchanged

0xF711110111

The ORR instruction can also be used to test for a zero operand. For example,“ORRSR2,R2,#0”willORtheregisterR2withzeroandmakeZ=1ifR2iszero.

EOR

EORRd,Rn,Op2;Rd=RnEx-ORedwithOp2


X Y XEORY

0 0 0

0 1 1

1 0 1

1 1 0

TheEOR instructionwill Exclusive-OR the operands and place the result in thedestinationregister.EORsetstheresultbitsto1iftheyarenotequal;otherwise,theyarereset to 0. The flags are updated if we use EORS instead of EOR. The rules for theoperandsarethesameasintheANDandORinstructions.SeeExamples3-14and3-15.

Example3-14

Showtheresultsofthefollowing:

MOVR1,#0x54

EORR2,R1,#0x78;R2=R1ExOredwith0x78

Solution:

0x5401010100

XOR0x7801111000

0x2C00101100

Example3-15

TheEORinstructioncanbeusedtoclearthecontentsofaregisterbyEx-ORingitwithitself.

Showhow“EORSR1,R1,R1”clearsR1,assumingthatR1=0x45.

Solution:

0x4501000101

XOR0x4501000101

0x0000000000

EOR can also be used to see if two registers have the same value. “EORSR2,R3,R4”willmakeZ=1ifbothregistersR4andR3havethesamevalue,andiftheydo,theresult(00000000)issavedinR2,thedestination.

Another widely used application of EOR is to toggle bits of an operand. Forexample,totogglebit2ofregisterR2:

EORR2,R2,#0x04;EORR2with00000100

Thiswouldcausebit2ofR2tochangetotheoppositevalue;allotherbitswouldremainunchanged.

BIC(bitclear)

BICRd,Rn,Op2;clearcertainbitsofRnspecifiedby

;theOp2andplacetheresultinRd

Inputs Output

X Y XAND(NOTY)

0 0 0

0 1 0

1 0 1

1 1 0

TheBIC(bitclear) instruction isused toclear theselectedbitsof theRnregister.TheselectedbitsareheldbyOp2.ThebitsthatareHIGHinOp2willbeclearedandbitswith LOW will be left unchanged. For example, assuming that R3 = 00001000 theinstruction “BIC R2,R2,R3” will clear bit 3 of R2 and leaves the rest of the bitsunchanged.Inreality,theBICinstructionperformsANDoperationonRnregisterwiththecomplement ofOp2 and places the result in destination register. Look at the followingexample:

MOVR1,#0x0F

MOVR2,#0xAA

BICR3,R2,R1;nowR3=0xAAANDedwith0xF0=0xA0

Ifwewanttheflagstobeupdated,thenwemustuseBICSinsteadofBIC.

MVN(movenegative)

MVNRd,Rn;movenegativeofRntoRd

TheMVN(movenegative)instructionisusedtogenerateone’scomplementofanoperand.Forexample,theinstruction“MVNR2,#0”willmakeR2=0xFFFFFFFF.Lookatthefollowingexample:

LDRR2,=0xAAAAAAAA;R2=0xAAAAAAAA

MVNR2,R2;R2=0x55555555

WecanalsouseEx-ORinstructiontogenerateone’scomplementofanoperand.Ex-ORinganoperandwith0xFFFFFFFFwillgeneratethe1’scomplement.Seethefollowingcode:


MVNR0,#0;R0=0xFFFFFFFF

EORR2,R2,R0;R2=R2ExORedwith0xFFFFFFFF

;=0x55555555

Itmustbenotedthattheinstruction“MVNRd,#0”iswidelyusedtoloadthefixedvalueof0xFFFFFFFFintodestinationregister.Wecanusethe“LDRRd,=0xFFFFFFFF”pseudo-instruction to do the same thing, but theARMassemblerwill substitute severalrealARMinstructionsinitsplaceandthereforeittakesmorecodespace.

ReviewQuestions

1.Useoperands0x4FCAand0xC237toperform:

(a)AND(b)OR(c)XOR

2.ANDingawordoperandwith0xFFFFFFFFwillresultinwhatvalueforthewordoperand?Tosetallbitsofanoperandto0,itshouldbeANDedwith______.

3.Tosetallbitsofanoperandto1,itcouldbeORedwith_______.

4.XORinganoperandwithitselfresultsinwhatvaluefortheoperand?

5.Writeaninstructionthatsetsbit4ofR7.

6.Writeaninstructionthatclearsbit3ofR5.

Section3.3:RotateandBarrelShifter Although ARM Cortex has shift and rotate instruction, for the ARM7 we can

perform the shift and rotate operations as part of other instructions such as MOV. Inprevious sections we discussed that as the second argument of process instructions(arithmetic and logic instructions) we can use register or immediate values. In otherwords,theprocessinstructionscanbeusedinoneofthefollowingforms:

1.opcodeRd,Rn,Rs(e.g.ADDR1,R2,R3)

2.opcodeRd,Rn,immediateValue(e.g.ADDR2,R3,#5)

ARMisabletoshiftorrotatethesecondargumentbeforeusingitastheargument.Inthissectionwefirstdiscussshiftingandrotatingonregistersandthenwecovershiftingimmediatevalues.

BarrelShifter

Therearetwokindsofshifts:logicalandarithmetic.Thelogicalshiftisforunsignedoperandsandthearithmeticshiftisforsignedoperands.Logicalshiftwillbediscussedinthis section and the discussionof arithmetic shift is covered inChapter 5.UsingMOVinstructionsinARMonecanshift thecontentsofaregisterrightor left.Thenumberoftimes(orbits)thattheoperandisshiftedcanbespecifieddirectlyorthrougharegister.

LSR

Thisisthelogicalshiftright.Theoperandisshiftedrightbitbybit,andforeveryshift the LSB (least significant bit) will go to the carry flag (C) and the MSB (mostsignificantbit)isfilledwith0.Onecanuseanimmediateoperandoraregistertoholdthenumberoftimesitistobeshifted.Examples3-16and3-17shouldhelptoclarifyLSR.

Example3-16

ShowtheresultofLSRinthefollowing:

MOVR0,#0x9A;R0=0x9A

MOVSR1,R0,LSR#3;shiftR0toright3times

;andthenmove(copy)theresulttoR1

Solution:

0x9A=000000000000000000000000000000010011010

firstshift:000000000000000000000000000000001001101C=0

secondshift:000000000000000000000000000000000100110C=1

thirdshift:000000000000000000000000000000000010011C=0

Aftershiftingrightthreetimes,R1=0x00000013andC=0.Anotherwaytowritetheabovecodeis:

MOVR0,#0x9A

MOVR2,#0x03

MOVR1,R0,LSRR2;shiftR0torightR2times

;andmovetheresulttoR1

Example3-17

ShowtheresultsofLSRinthefollowing:

TIMESEQU0x4

LDRR1,=0x777;R1=0x777

MOVR2,#TIMES;R2=0x04

MOVSR3,R1,LSRR2;shiftR1rightR2numberoftimes

;andplacetheresultinR3

Solution:

Afterfourshifts,theR3willcontain0x77.ThefourLSBsarelostthroughthecarry,onebyone,and0sfillthefourMSBs.

AlthoughLSRdoesaffecttheSandZflags,theyarenotimportantinthiscase.Butnoticethatifyouwanttheflagstobeset,usetheMOVSinstructioninsteadofMOV.

OnecanusetheLSRtodivideanumberby2.SeeExample3-18.

Example3-18

ShowtheresultsofLSRinthefollowing:

LDRR0,=0x88;R0=0x88

MOVSR1,R0,LSR#3;shiftR0rightthreetimes(R1=0x11)

Solution:

Afterthethreeshifts,theR1willcontain0x11.Thisdividesthenumberby8since2tothepower3is8.

LSL

Shiftleftisalsoalogicalshift.ItisthereverseofLSR.Aftereveryshift,theLSBisfilledwith0andtheMSBgoestoC.AlltherulesarethesameasforLSR.Onecanuseanimmediateoperandora register tohold thenumberof times it is tobeshifted left.SeeExample3-19.OnecanusetheLSLtomultiplyanumberby2.SeeExample3-20.

Example3-19

ShowtheeffectsofLSLinthefollowing:

LDRR1,=0x0F000006

MOVSR2,R1,LSL#8

Solution:

00001111000000000000000000000110

C=000011110000000000000000000001100(shiftedleftonce)

C=000111100000000000000000000011000

C=001111000000000000000000000110000

C=011110000000000000000000001100000

C=111100000000000000000000011000000

C=111000000000000000000000110000000

C=110000000000000000000001100000000

C=100000000000000000000011000000000(shiftedeighttimes)

Aftereightshiftsleft,theR2registerhas0x00000600andC=1.TheeightMSBsarelostthroughthecarry,onebyone,and0sfilltheeightLSBs.Anotherwaytowritetheabovecodeis:

LDRR1,=0x0F000006

MOVR0,#0x08

MOVR2,R1,LSLR0

Example3-20

ShowtheresultsofLSLinthefollowing:

TIMESEQU0x5

LDRR1,#0x7;R1=0x7

MOVR2,#TIMES;R2=0x05

MOVR1,R1,LSLR2;shiftR1leftR2numberoftimes

;andplacetheresultinR1

Solution:

Afterthefiveshifts,theR1willcontain0x000000E0.0xE0is224indecimal.Noticethatitmultipliesnumberbypowerof2.7×32=224=0xE0since2tothepowerof5is32.

Table3-5liststhelogicalshiftoperationsinARM.

Operation Destination Source Numberofshifts

LSR(ShiftRight) Rd Rn Immediatevalue

LSR(ShiftRight) Rd Rn registerRm

LSL(ShiftLeft) Rd Rn Immediatevalue

LSL(ShiftLeft) Rd Rn registerRm

Note:Numberofshiftcannotbemorethan32.

Table3-5:LogicShiftoperationsforunsignednumbersinARM

ArithmeticshiftrightASR

ThearithmeticshiftASRisusedforsignednumbersandisdiscussedinChapter5.

Rotatingthebitsofanoperandrightandleft

Therearetwotypesofrotations.Oneisasimplerotationofthebitsoftheoperand,andtheotherisarotationthroughthecarry.Eachisexplainedbelow.

ROR(rotateright)

Inrotateright,asbitsareshiftedfromlefttorighttheyexitfromtherightend(LSB)andentertheleftend(MSB).Inaddition,aseachbitexitstheLSB,acopyofitisgiventothecarryflag.Inotherwords,inRORtheLSBismovedtotheMSBandisalsocopiedtoCflag,asshowninthediagram.Onecanuseanimmediateoperandoraregistertoholdthenumberoftimesitistoberotated.

MOVR1,#0x36

;R1=00000000000000000000000000110110

MOVSR1,R1,ROR#1

;R1=00000000000000000000000000011011C=0

MOVSR1,R1,ROR#1

;R1=10000000000000000000000000001101C=1

MOVSR1,R1,ROR#1

;R1=11000000000000000000000000000110C=1

or:

MOVR1,#0x36

;R1=00000000000000000000000000110110

MOVR0,#3

;R0=3numberoftimestorotate

MOVSR1,R1,RORR0

;R1=11000000000000000000000000000110C=1

alsolookatthefollowingcase:

LDRR2,=0xC7E5

;R2=00000000000000001100011111100101

MOVR4,#0x06

;R4=6numberoftimestorotate

MOVSR3,R2,RORR4

;R3=10010100000000000000001100011111C=1

Rotateleft

ThereisnorotateleftoptioninARM7sinceonecanusetherotateright(ROR)todothejob.Thatmeansinsteadofrotatingleftnbitswecanuserotateright32–nbitstodothe job of rotate left. Using this method does not give us the proper carry if actualinstructionofROLwasavailable.Lookatthefollowingexamples:

LDRR0,=0x00000072

;R0=00000000000000000000000001110010

MOVSR0,R0,ROR#31

;R0=00000000000000000000000011100100C=0

MOVSR0,R0,ROR#31

;R0=00000000000000000000000111001000C=0

MOVSR0,R0,ROR#31

;R0=00000000000000000000001110010000C=0

MOVSR0,R0,ROR#31

;R0=00000000000000000000011100100000C=0

or:

MOVR0,#0x72;R0=01110010

MOVR1,#28;R1=32–4=28

MOVSR0,R0,RORR1;R0=011100100000C=0

;assumeR2=0x672A

LDRR2,=0x671A

;R2=00000000000000000110011100101010

MOVSR2,R2,ROR#27

;R2=00000000000011001110010101000000

;C=0

RRXrotaterightthroughcarry

InRRX,asbitsareshiftedfromlefttoright,theyexitfromtherightend(LSB)tothecarryflag,and thecarryflagenters the leftend(MSB). Inotherwords, inRRXtheLSBismovedtoCandCismovedtotheMSB.Inreality,Cflagactsasifitispartoftheoperand.ThatmeanstheRRXislike33-bitregistersincetheCflagis33thbit.TheRRXtakesnoargumentsandthenumberoftimesanoperandtoberotatedisfixedatone.

;assumeC=0

MOVR2,#0x26

;R2=00000000000000000000000000100110

MOVSR2,R2,RRX

;R2=00000000000000000000000000010011C=0

MOVSR2,R2,RRX

;R2=00000000000000000000000000001001C=1

MOVSR2,R2,RRX

;R2=10000000000000000000000000000100C=1

MOVR2,#0x0F

;R2=00000000000000000000000000001111

MOVSR2,R2,RRX

;R2=00000000000000000000000000000111C=1

MOVSR2,R2,RRX

;R2=10000000000000000000000000000011C=1

MOVSR2,R2,RRX

;R2=11000000000000000000000000000001C=1

Table3-6liststherotateinstructionsoftheARM.

Operation Destination Source NumberofRotates

ROR(RotateRight) Rd Rn Immediatevalue

ROR(RotateRight) Rd Rn registerRm

RRX(RotateRightThroughCarry) Rd Rn 1bit

Table3-6:RotateoperationsforunsignednumbersinARM

ShiftingImmediateArguments

Wejustexaminedtherotateandshiftoperations.Onecanusetherotateoperationtoloadfixed(constant)valuesintoARMregister,aswell.ExaminetheMOVinstructionbitassignmentinFigure3-3.Ofthe32-bitopcode,theupper12bits(D31–D20)areusedfortheopcodeitself.Thelowest8(D7–D0)bitsareusedforfixedvaluesand4bits(D11–D8)areusedforthenumberoftimesrotateoperationisperformedbeforeloadedintotheregister.The8bitsforthefixedvaluegivesus00to255(00to0xFFinhex)rangeandthe4bitsoftherotatenumbergivesus0to15(0to0xFinhex)range.TheMOVinstructionscanbeconstructedtorotatean8-bitfixedvaluetotherightanumberoftimesandthenloadedintotheregister.Thenumberoftimesrotatedrightisalwaystwicethenumberintherotateportionoftheinstruction.Sincerotatevaluecanbe0–15thatgivesnumberofrotationsbetween0–30.Thismeans thatwhenever the secondoperand is an immediatevaluethenumberofrotationisanevennumber.

Figure3-3:MOVInstruction

Aswementioned earlier, theARM supports the rotate right only and there is norotateleftoption.Sotodorotateleftwemustuse32-nforthenumberofrotationright.Therefore, “MOV R0,#0xFF,#30” is the same as rotating left 2 times and “MOVR0,#0xFF,#28”isthesameasrotatingleft4times.SeeExamples3-21through3-23.

Example3-21

ShowallthepossiblecasesofusingMOVinstructionfor0xFFvalueandrotateoptions.

Solution:

MOVR0,#0xFF,#0

;R0=0xFFisrotatedright0times.R0=0x000000FF

MOVR0,#0xFF,#2

;R0=0xFFisrotatedright2times.R0=0xC000003F

MOVR0,#0xFF,#4

;R0=0xFFisrotatedright4times.R0=0xF000000F

MOVR0,#0xFF,#6

;R0=0xFFisrotatedright6times.R0=0xFC000003

MOVR0,#0xFF,#8

;R0=0xFFisrotatedright8times.R0=0xFF000000

MOVR0,#0xFF,#10

;R0=0xFFisrotatedright10times.R0=0x3FC00000

MOVR0,#0xFF,#12

;R0=0xFFisrotatedright12times.R0=0x0FF00000

MOVR0,#0xFF,#14


MOVR0,#0xFF,#16


MOVR0,#0xFF,#18


MOVR0,#0xFF,#20


MOVR0,#0xFF,#22


MOVR0,#0xFF,#24


MOVR0,#0xFF,#26


MOVR0,#0xFF,#28


MOVR0,#0xFF,#30

;R0=0xFFisrotatedright30times.R0=0x000003FC

Example3-22

UsingMOVinstruction,showhowtorotateleftthefixedvalueof0x99totalof(a)4,(b)8,and(c)16times.Alsogivethevalueintheregisteraftertherotation.

Solution:

Sincewedonothaverotateleftoperationwemustuserotateright32–ntimes.

MOVR1,#0x99,#28

;rotatingright28timesisthesameasrotateleft4times

MOVR2,#0x99,#24


MOVR3,#0x99,#16


Now,wehave(a)R1=0x00000990,(b)R2=0x00009900,and(c)R3=0x00990000

Example3-23

UsingtheKeilIDE,assembletheprograminExample3-21andexaminethelistfile.Compareandcontrastthecountvalueintheinstructionwithcountvalueoftheopcodes.

Solution:

Asexpected,thenumberoftimesrotatedrightistwicethenumberofrotatefield.

AlsoseeExample3-24forfurtherexamplesofrotateoperation.

Example3-24

Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.

MOVR0,#0xAA,#2

MOVR1,#0x20,#28

MOVR4,#0x99,#6

MOVR2,#0x55,#24

MOVR3,#0x01,#20

MOVR7,#0x80,#12

MOVR10,#0x0F,#14

MOVR5,#0x66,#2

Solution:

MOVR0,#0xAA,#2

;R0=0xAAisrotatedright2times.R0=0x8000002A

MOVR1,#0x20,#28

;R1=0x20isrotatedright28times.R0=0x00000200

MOVR4,#0x99,#6


MOVR2,#0x55,#24


MOVR3,#0x01,#20


MOVR7,#0x80,#12


MOVR10,#0x0F,#14

;R10=0x0Fisrotatedright14times.R0=0x003C0000

MOVR5,#0x66,#2


WecanusetheconceptofrotatewithotherdataprocessinginstructionsoftheARMtoloadfixedvaluesintotheARMregister.SeeExample3-25.

Example3-25

Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.

MVNR0,#0xAA,#2

MVNR1,#0x20,#28

MVNR4,#0x99,#6

MVNR2,#0x55,#24

MVNR3,#0x01,#20

MVNR7,#0x80,#12

MVNR10,#0x0F,#14

MVNR5,#0x66,#2

Solution:

MVNR0,#0xAA,#2

;0xAArotatedright2times=0x8000002A;R0=0x7FFFFFD5

MVNR1,#0x20,#28

;0x20isrotatedright28times=0x00000200;R1=0xFFFFFDFF

MVNR4,#0x99,#6

;0x99isrotatedright6times=0x64000002;R4=0x9BFFFFFD

MVNR2,#0x55,#24

;0x55isrotatedright24times=0x00001A40;R2=0xFFFFAAFF

MVNR3,#0x01,#20

;0x01isrotatedright20times=0x00001000;R3=0xFFFFEFFF

MVNR7,#0x80,#12

;0x80isrotatedright12times=0x08000000;R7=0xF7FFFFFF

MVNR10,#0x0F,#14

;0x0Fisrotatedright14times=0x003C0000;R10=0xFFC3FFFF

MVNR5,#0x66,#2

;0x66isrotatedright2times=0x80000019;R5=0x7FFFFFE6

GeneralFormationofProcessinstruction

NextwewillshowhowthedifferentoperandsaresupportedbyARM.InFigure3-4youseethegeneralformationofprocessinstructions.

Figure3-4:DataProcessInstructions

InallARMinstructionsbits28–31areputasideforconditionfieldwhichiscoveredinChapter4.

Bits26and27are0inprocessinstructionsshowingthattheinstructionisaprocessinstruction and the opcode is represented by bits 24–21. Using 4 bits 16 differentinstructionsareprovidedasshowninFigure3-4.

TheSbit (bit20) shows if the flagsshouldbeupdated.WhenweaddanS (e.g.MOVS)thebitwillbesetwhichshowsthattheflagsshouldbeupdatedbytheCPU.

Bits12–15containthedestinationregister.Using4bitswecanselectregistersR0–R15.

Bits16–19representthefirstoperandregister.

Bit25(I)showsthetypeofsecondoperand.Thebitis1whenthesecondoperandisanimmediatevalue.Wheneverthesecondoperandisaregisterthisbitiszero.

Immediate values: In the case that I is 1, bits 0–7, contain an immediate valuewhichcanbeanumberbetween0–0xFF.

Second register: In the case that I is 0, bits 0–11 represent the second operandregister, togetherwith the amount of shift/rotate and type of shift/rotate.Asmentionedearlier the shift amount can be provided either by a register or an immediate value.WhenevertheIbitiscleared,bit4showsthewayshiftamountisprovided.SeeFigure3-5.

Figure3-5:DataProcessInstructions

Example3-26shouldhelptoclarifythis.

Example3-26

UsingtheKeilIDE,assemblethefollowingprogramandcomparetheinstructionwiththeprocessinstructionformat.

AREAEX3_26,CODE,READONLY

ENTRY

ADDR0,R1,R5,LSR#2

ADDR0,R1,R5,LSRR2

ADDR0,R1,R5,LSLR2

ADDR0,R1,R5

ADDR0,R1,#5,#2

ADDR0,R1,#5

H1BH1

END

Solution:

In“ADDR0,R1,R5,LSR#2”thesecondoperandisaregister.Therefore,theIbitiscleared.Sincetheshiftamountisanimmediatevalue,thebit4iscleared,aswell.Theshifttypeissetto01representingLSR.

In“ADDR0,R1,R5,LSRR2”thesecondoperandisaregister.Therefore,theIbitiscleared.Asaregisterprovidestheshiftamount,thebit4isset.

The“ADDR0,R1,R5,LSLR2”instructionisthesameas“ADDR0,R1,R5,LSRR2”exceptthattheshifttypeissetto00torepresentLSL.

Themachinecoderepresentsinfacttheinstruction“ADD R0,R1,R5,LSL#0”.But,ifweshiftanumber0bitstoleft,thenumberremainsunchanged.Asaresult,itrepresents“ADD R0,R1,R5”.

In“ADDR0,R1,#5,#2”thesecondoperandisimmediate.Therefore,theIbitisset.Inimmediateoperandsthevalueisshiftedtwicetherotatefield.Thatiswhytherotatefieldis1.

In“ADDR0,R1,#5”,theimmediatevalueisrotated0bits.Therefore,theimmediatevalueremainsunchanged.

LoadingfixedvaluesintoRegisters

In this section, we examined loading fixed values into registers. However, aswehaveseensofar,itisnotpossibletocreateeveryconstantvaluebyusingtherotateoptionoftheinstructions.

AnotherwayistousecombinationoftheseinstructionssuchasMVN,MOV,ADD,AND,OR,andsoontoloadanyvalueintoaregister.ForexamplethefollowingprogramloadsR1with0x87654321:

MOVR1,#0x21;R1=0x00000021

ORRR1,R1,#0x43,#24

;R1=0x00000021ORed0x00004300=0x00004321

ORRR1,R1,#0x65,#16

;R1=0x00004321ORed0x00650000=0x00654321

ORRR1,R1,#0x87,#8

;R1=0x00654321ORed0x87000000=0x87654321

But it is very tedious and time consuming to come upwith such combination ofinstructions to get the resultwe need. For this reason theARM assembler has pseudo-instructionsuchasLDR.TheLDRisnotarealinstructionlikeMOVandADDsinceyouwillnotfindtheopcodeforitintheARMmanual.WhentheARMassemblerassemblesaprogramwiththeLDRitreplacestheLDRwithagroupofinstructionstodothejobofloading a fixed value in themost efficient way possible. Aswe have seen in previouschapters, theLDRuses = sign for the immediate value instead of # used by theMOVinstruction. Indeed this is thewaywedistinguishbetween them.Examine the followingcodestoseethedifferencesinsyntaxes:

MOVR1,#0x55;R1=0x55

LDRR2,=0x5555;R2=0x5555

ItneedstobeemphasizedthatwemustusetheMOVinstructionforloadingfixed

valueoflessthan(orequalto)0xFFandusetheLDRpseudo-instructiononlyforloadingvalues larger than0xFF.This isdue to thefact thatMOVisareal instructionand takesless memory space when it is compiled. See Chapter 6 for more on LDR and ADRpseudo-instructions.

ReviewQuestions

1.FindthecontentsofR3afterexecutingthefollowingcode:

MOVR0,#0x04

MOVR3,R0,LSR#2


LDRR1,=0xA0F2

MOVR2,#0x3

MOVR4,R1,LSRR2


LDRR1,=0xA0F2

MOVR2,#0x3

MOVR3,R1,LSLR2


SUBSR0,R0,R0

MOVR0,#0xAA

MOVR5,R0,ROR#4


LDRR2,=0xA0F2

MOVR1,0x4

MOVR0,R2,RRXR1

6.GivetheresultinR1forthefollowing:

MVNR1,#0x01,#2

7.GivetheresultinR2forthefollowing:

MVNR2,#0x02,#28

Section3.4:ShiftandRotateInstructionsinARMCortex(CaseStudy)ARMCortexhasnew instructions specifically for shift and rotate. In this section,

wediscusstheseinstructions.ForfullinstructionsetseeAppendixA.ForthepurposeofcomparisonwehavecopiedthissectionfromAppendixA.

LSLLogicalShiftLeft

LSLRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedleft,theMSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLdoesnotupdatestheflags.

Example1:

LDRR2,=0x00000010

LSLR0,R2,#8;R0=R2isshiftedleft8times

;now,R0=0x00001000,flagsnotupdated

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0x000018000,flagsnotupdated

Example3:

LDRR0,=0x0000FF18

MOVR1,#16


;now,R2=0xFF180000,flagsnotupdated

LSLSLogicalShiftLeft(updatetheflags)

LSLSRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedleft,theMSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLSupdatestheflags.

Example1:

LDRR2,=0x00000010

LSLSR0,R2,#8;R0=R2isshiftedleft8times

;now,R0=0x00001000,C=0,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0x000018000,C=0,N=0,Z=0

Example3:

LDRR0,=0x000FFF18

MOVR1,#16


;now,R2=0xFF180000,C=1,Z=0,N=0

Thelogicalshiftleftisusedforshiftingunsignednumbers.LSLSessentiallymultipliesRmbyapowerof2aftereachbitisshifted.

LSRLogicalShiftRight

LSRRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRdoesnotupdatetheflags.

Example1:

LDRR2,=0x00001000

LSRR0,R2,#8;R0=R2isshiftedright8times

;now,R0=0x00000010,C=0

Example2:

LDRR0,=0x000018000

MOVR1,#12

LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x00000018,C=0

Example3:

LDRR0,=0x7F180000

MOVR1,#16


;now,R2=0x00007F18,C=0

Thelogicalshiftrightisusedforshiftingunsignednumbers.LSRessentiallydividesRmbyapowerof2aftereachbitisshifted.

LSRSLogicalShiftRight(updatetheflags)

LSRSRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRSupdatestheflags.

Example1:

LDRR2,=0x00001FFF

LSRSR0,R2,#8;R0=R2isshiftedright8times

;now,R0=0x0000001F,C=1,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x000000000,C=0,N=0,Z=1,

Example3:

LDRR0,=0x000FFF18

MOVR1,#16


;now,R2=0x0000000F,C=1,Z=0,N=0

The logical shift right is used for shifting unsigned numbers. LSRS essentiallydividesRmbyapowerof2aftereachbitisshifted.

RORRotateRight

RORRd,Rm,Rn;Rd=rotateRmrightRnbitpositions

Function:AseachbitofRmregisterisshiftedfromlefttoright,theyexitfromtheend(LSB)andenteredfromleftend(MSB).ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORdoesnotupdatetheflags.

Example1:

LDRR2,=0x00000010

RORR0,R2,#8;R0=R2isrotatedright8times

;now,R0=0x10000000,C=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;now,R2=0x01800000,C=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16


;now,R2=0xFF180000,C=0

RORSRotateRight(updatetheflags)

RORSRd,Rm,Rn;Rd=rotateRmrightRnbitpositions

Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andenterfromtheleftend(MSB).InadditionaseachbitexitstheLSB,acopyofitisgiventoCflag.ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORSupdatestheflags.

Example1:

LDRR2,=0x00000010

RORSR0,R2,#8;R0=R2isrotatedright8times

;now,R0=0x01000000,C=0,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;now,R2=0x01800000,C=0,N=0,Z=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16


;now,R2=0xFF180000,C=1,N=0,Z=0

RRXRotateRightwithextend

RRXRd,Rm;Rd=rotateRmright1bitposition

Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXdoesnotupdatetheflags.

Example:

LDRR2,=0x00000002

RRXR0,R2;R0=R2isshiftedrightonebit

;now,R0=0x00000001

RRXSRotateRightwithextend(updatetheflags)

RRXSRd,Rm;Rd=rotateRmright1bitposition

Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXSupdatestheflags.

Example1:

LDRR2,=0x00000002

RRXSR0,R2;R0=R2isshiftedrightonebit

;now,R0=0x00000001

ReviewQuestions

1.Trueorfalse.ARMCortexhasitsowninstructionforrotateandshift.


MOVR1,#0x08

RORR2,R1,#2


MOVR2,#0x3

LSLR5,R3,#2

Section3.5:BCDandASCIIConversionThissectioncoversbinary,BCD,andASCIIconversionswithsomeexamples.

BCDnumbersystem

BCDstandsforbinarycodeddecimal.BCDisneededbecauseweusethedigits0to9fornumbersineverydaylife.BCDsystemiswidelyusedinreal-timeclock(RTC)oftheembeddedsystems.Binaryrepresentationof0to9iscalledBCD.IncomputerliteratureoneencounterstwotermsforBCDnumbers:(1)unpackedBCD,and(2)packedBCD.

Digit BCD

0 0000

1 0001

2 0010

3 0011

4 0100

5 0101

6 0110

7 0111

8 1000

9 1001

Table3-9:BCDCodes

UnpackedBCD

InunpackedBCD,thelower4bitsofthenumberrepresenttheBCDnumberandtherestofthebitsare0.Forexample,“00001001”and“00000101”areunpackedBCDfor9and5,respectively.InthecaseofunpackedBCDittakes1byteofmemorylocationoraregisterof8bitstoholdthenumber.

PackedBCD

In the caseofpackedBCD, a singlebytehas twoBCDnumbers in it, one in thelower4bitsandoneintheupper4bits.Forexample,“01011001”ispackedBCDfor59.Ittakesonly1byteofmemorytostorethepackedBCDoperands.ThisisonereasontousepackedBCDsinceitistwiceasefficientinstoringdata.

ASCIInumbers

InASCIIkeyboards,whenkey“0”ispressed,“0110000”(0x30)isprovidedtothecomputer.Inthesameway,0x31(0110001)isprovidedforkey“1”,andsoon,asshown

inthefollowinglist:

Key ASCII Binary(hex) BCD(unpacked)

0 30 0110000 00000000

1 31 0110001 00000001

2 32 0110010 00000010

3 33 0110011 00000011

4 34 0110100 00000100

5 45 0110101 00000101

6 36 0110110 00000110

7 37 0110111 00000111

8 38 0111000 00001000

9 39 0111001 00001001

Itmust be noted that althoughASCII is standard in theUnited States (andmanyother countries), BCD numbers have universal application. Now since the keyboard,printers,andmonitorsareallinASCII,howdoesdatagetconvertedfromASCIItoBCD,andviceversa?Thesearethesubjectscoverednext.

ASCIItounpackedBCDconversion

ToconvertASCIIdatatounpackedBCD,theprogrammermustgetridofthetagged“011” in theupper4bitsof theASCII.Todo that,eachASCIInumber isANDedwith“00001111”(0x0F).

ASCIItopackedBCDconversion

ToconvertASCIItopackedBCD,itisfirstconvertedtounpackedBCD(togetridofthe3)andthencombinedtomakepackedBCD.Forexample,for2and7thekeyboardgives0x32and0x37,respectively.Thegoalistoproduce0x27or“00100111”,whichiscalledpackedBCD,asdiscussedearlier.Thisprocessisillustratedindetailbelow.

Key ASCII UnpackedBCD PackedBCD

2 32 00000010

7 37 00000111 00100111(0x27)

MOVR1,#0x37;R1=0x37

MOVR2,#0x32;R2=0x32

ANDR1,R1,#0x0F;mask3togetunpackedBCD

ANDR2,R2,#0x0F;mask3togetunpackedBCD

MOVR3,R2,LSL#4;shiftR24bitstolefttogetR3=0x20

ORRR4,R3,R1;ORthemtogetpackedBCD,R4=0x27

PackedBCDtoASCIIconversion

Fordata tobedisplayedon themonitororbeprintedby theprinter, itmustbe inASCII format. Conversion from packed BCD to ASCII is discussed next. To convertpackedBCDtoASCII,itmustfirstbeconvertedtounpackedandthentheunpackedBCDis tagged with 011 0000 (0x30). The following shows the process of converting frompackedBCDtoASCII.

PackedBCD UnpackedBCD ASCII

0x29 0x02&0x09 0x32&0x39

00101001 00000010&00001001 0110010&0111001

MOVR0,#0x29

ANDR1,R0,#0x0F;maskupperfourbits

ORRR1,R1,#0x30;combinewith30togetASCII

MOVR2,R0,LSR#04;shiftright4bitstogetunpackedBCD

ORRR2,R2,#0x30;combinewith30togetASCII

ReviewQuestions

1.Forthefollowingdecimalnumbers,givethepackedBCDandunpackedBCDrepresentationsinbinary

(a)15(b)99

2.ForthefollowingpackedBCDnumbers,givethedecimalandunpackedBCDrepresentations.

(a)0x41(b)0x09

3.Repeatquestion2forASCII.

ProblemsSection3.1:ArithmeticInstructions

1.FindCandZflagsforeachofthefollowing.Alsoindicatetheresultoftheadditionandwheretheresultissaved.

(a)

MOVR1,#0x3F

MOVR2,#0x45

ADDSR3,R1,R2

(b)

LDRR0,=0x95999999

LDRR1,=0x94FFFF58

ADDSR1,R1,R0

(c)

LDRR0,=0xFFFFFFFF

ADDSR0,R0,#1

(d)

LDRR2,=0x00000001

LDRR1,=0xFFFFFFFF

ADDSR0,R1,R2

ADCSR0,R0,#0

(e)

LDRR0,=0xFFFFFFFE

ADDSR0,R0,#2

ADCR1,R0,#0x0

2.StatethethreestepsinvolvedinaSUBandshowthestepsforthefollowingdata.

(a)0x23–0x12(b)0x43–0x51(c)0x99–0x99


3.Assumethatthefollowingregisterscontainthesehexcontents:R0=0xF000,R1=0x3456,andR2=0xE390.Performthefollowingoperations.Indicatetheresultandtheregisterwhereitisstored.

Note:theoperationsareindependentofeachother.

(a)ANDR3,R2,R0 (b)ORRR3,R2,R1

(c)EORR0,R0,#0x76 (d)ANDR3,R2,R2

(e)EORR0,R0,R0 (f)ORRR3,R0,R2

(g)ANDR3,R0,#0xFF (h)ORRR3,R0,#0x99

(i)EORR3,R1,R0 (j)EORR3,R1,R1

4.GivethevalueinR2afterthefollowingcodeisexecuted:

MOVR0,#0xF0

MOVR1,#0x55

BICR2,R1,R0

5.GivethevalueinR2afterthefollowingcodeisexecuted:

LDRR1,=0x55555555

MVNR0,#0

EORR2,R1,R0

Section3.3:RotateandBarrelShifter

6.AssumingC=0,whatisthevalueofR1afterthefollowing?

MOVR1,#0x25

MOVSR1,R1,ROR#4

7.AssumingC=0,whatarethevaluesofR0andCafterthefollowing?

LDRR0,=0x3FA2

MOVR2,#8

MOVSR0,R0,RORR2

8.AssumingC=0whatisthevalueofR2andCafterthefollowing?

MOVR2,#0x55

MOVSR2,R2,RRX

9.AssumingC=0whatisthevalueofR1afterthefollowing?

MOVR1,#0xFF

MOVR3,#5

MOVSR1,R1,RORR3

10.Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.

a)MOVR1,#0x88,#4 b)MOVR0,#0x22,#22

c)MOVR2,#0x77,#8 d)MOVR4,#0x5F,#28

e)MOVR6,#0x88,#22 f)MOVR5,#0x8F,#16

g)MOVR7,#0xF0,#20 h)MOVR1,#0x33,#28

11.Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.

a)MVNR2,#0x1 b)MVNR2,#0xAA,#20

c)MVNR1,0x55,#4 d)MVNR0,#0x66,#28

e)MVNR1,#0x80,#24 f)MVNR6,#0x10,#20

g)MVNR7,#0xF0,#24 h)MVNR4,#0x99,#4

12.FindthecontentsofregistersandCflagafterexecutingeachofthefollowingcodes:

a)

MOVR0,#0x04

MOVSR1,R0,LSR#2

MOVSR3,R0,LSRR1

b)

LDRR1,=0xA0F2

MOVR2,#0x3

MOVSR3,R1,LSLR2

c)

LDRR1,=0xB085

MOVR2,#3

MOVSR4,R1,LSRR2

13.FindthecontentsofregistersandCflagafterexecutingeachofthefollowingcodes:

a)

SUBSR2,R2,R2

MOVR0,#0xAA

MOVSR1,R0,ROR#4

b)

MOVR2,#0xAA,#4

MOVR0,#1

MOVSR1,R2,RORR0

c)

LDRR1,=0x1234

MOVR2,#0x010,#2

MOVSR1,R0,RORR2

d)

MOVR0,#0xAA

MOVSR1,R0,RRX

14.UsingMOVinstruction,showhowyourotateleftthefixedvalueof0x33totalofa)4,b)8,andc)12times.Alsogivethevalueintheregisteraftertherotation.


15.Writeaprogramtoconvert0x76frompackedBCDnumbertoASCII.PlacetheASCIIcodesintoR1andR2.

16.For3and2thekeyboardgives0x33and0x32,respectively.Writeaprogramtoconvert0x33and0x32topackedBCDandstoretheresultinR2.

AnswerstoReviewQuestionsSection3.1:ArithmeticInstructions

1.TheADDSinstructionupdatestheflagbitswhileADDdoesnotdothat.

2.Rd=Rn+Op2+C

3.0x4F+0xB1=0x100,sinceitisabyteadditionandresultislessthan32-bittheC=0andZ=0.

4.0x4F+0xFFFFFFB1=00000000,sinceitisawordadditionandresultisgreaterthan32-bit,theC=1andZ=1.

5.

0x43 01000011 00000000000000000000000001000011

–0x05 00000101 2’scomplement= +11111111111111111111111111111011

0x3E 100000000000000000000000000111110

C=1;therefore,theresultispositive

6.R2=R2–R3–C+1=0x95–0x4F–1+1=0x46

7.R2

8.R2=1


1.(a)0x4202(b)0xCFFF(c)0x8DFD

2.Theoperandwillremainunchanged;allzeros

3.Allones

4.Allzeros

5.ORRR7,R7,#0x10;R7=R7ORed00010000

6.ANDR5,R5,#0x8;R5=R5ORed00001000

Section3.3:RotateandBarrelShifterOperation

1.R3=1

2.R4=0x0000141E

3.R3=0x00050790

4.R5=0xA000000A

5.R0=0x00005079

6.0xBFFFFFFF

7.0xFFFFFFDF

Section3.4:ShiftandRotateInstructionsinARMCortex

1.True

2.0x01

3.0x0C


1.(a)15=00010101packedBCD=0000000100000101unpackedBCD

(b)99=10011001packedBCD=0000100100001001unpackedBCD

2.(a)0x41=0000010000000001unpackedBCD=41indecimal

(b)0x09=0000000000001001unpackedBCD=9indecimal

3.(a)0x34,0x31

(b)0x30,0x39

Chapter4:Branch,Call,andLoopinginARMIn the sequence of instructions to be executed, it is often necessary to transfer

programcontroltoadifferentlocation.(e.g.whenafunctioniscalled,executionofaloopisrepeated,oraninstructionexecutesconditionally)TherearemanyinstructionsinARMto achieve this. This chapter covers the control transfer instructions available in ARMAssembly language. InSection4.1,wediscuss instructionsused for looping, aswell asinstructionsforconditionalandunconditionalbranches(jumps).Inthesecondsection,weexamine the instructions associated with calling subroutine. In Section 4.3, instructionpipeliningoftheARMisexamined.Instructiontimingandtimedelaysubroutinesarealsodiscussed in Section 4.3. In Section 4.4, we examine the conditional execution of theARMinstructionswhichisauniquefeatureofARM.

Section4.1:LoopingandBranchInstructionsInthissectionwefirstdiscusshowtoperformaloopingactioninARMandthenthe

branch(jump)instructions,bothconditionalandunconditional.

LoopinginARM

Repeating a sequenceof instructionsor anoperation a certainnumberof times iscalleda loop.The loop isoneof themostwidelyusedprogramming techniques. In theARM,thereareseveralwaystorepeatanoperationmanytimes.Onewayistorepeattheoperationoverandoveruntilitisfinished,asshownbelow:

MOVR0,#0;R0=0

MOVR1,#9;R1=9

ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x09)


ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x1B)


ADDR0,R0,R1;R0=0x2D

ADDR0,R0,R1;R0=0x36

Intheaboveprogram,weadd0x9toR0sixtimes.Thatmakes6×9=54=0x36.One problemwith the above program is that toomuch code spacewould be needed toincrease the number of repetitions to 50or 1000. Amuchbetterway is to use a loop.Next,wedescribethemethodtodoaloopinARM.

UsinginstructionBNEforlooping

TheBNE(branch ifnotequal) instructionuses thezero flag in thestatus register.TheBNEinstructionisusedasfollows:

BACK………;startoftheloop

………;bodyoftheloop

………;bodyoftheloop

SUBSRn,Rn,#1;Rn=Rn-1,settheflagZ=1ifRn=0

BNEBACK;branchifZ=0

Inthelasttwoinstructions,theRn(e.g.R2orR3)isdecremented;ifitisnotzero,itbranches(jumps)backtothetargetaddressreferredtobythelabel.Priortothestartoftheloop,theRnisloadedwiththecountervalueforthenumberofrepetitions.NoticethattheBNE instruction refers to the Z flag of the status register affected by the previousinstruction,SUBS.ThisisshowninExample4-1.

Example4-1

Writeaprogramto(a)clearR0,(b)add9toR0athousandtimes,then

(c)placethesuminR4.

UsethezeroflagandBNEinstruction.

Solution:

;–thisprogramaddsvalue9totheR0a1000times–

AREAEXAMPLE4_1,CODE,READONLY

ENTRY

LDRR2,=1000;R2=1000(decimal)forcounter

MOVR0,#0;R0=0(sum)

AGAINADDR0,R0,#9;R0=R0+9(add09toR1,R1=sum)

SUBSR2,R2,#1;R2=R2-1andsettheflags.Decrementcounter

BNEAGAIN;repeatuntilCOUNT=0(whenZ=1)

MOVR4,R0;storethesuminR4

HEREBHERE;stayhere

END

IntheprograminExample 4-1,registerR2isusedasacounter.Thecounterisfirstset to1000. Ineach iteration, theSUBSinstructiondecrements theR2andsets the flagbitsaccordingly.IfR2isnotzero(Z=0),itjumpstothetargetaddressassociatedwiththe

label“AGAIN”.ThisloopingactioncontinuesuntilR2becomeszero.AfterR2becomeszero(Z=1),itfallsthroughtheloopandexecutestheinstructionimmediatelybelowit,inthiscase“MOVR4,R0”.

ItmustbeemphasizedagainthatwemustuseSUBSinsteadofSUBsincetheSUBinstructionwillnotchange(update)theflags.SincewearemonitoringtheZflagfortheloop counter we must use SUBS instruction for the decrementing the counter. As wementioned inChapter3,manyof theARMinstructionshave theoptionofaffecting theflags.Intheseinstructionsthedefaultisnottoaffecttheflags.Thereforetomakethemtoeffect the flag we must add letter S to the instruction. That means SUBS and ADDSinstructions are different from SUB and ADD, as far as the flags are concerned. AsanotherexampleseeExample4-2.

Example4-2

Writeaprogramtoplacevalue0x55into100bytesofRAMlocations.

Solution:


ENTRY

RAM_ADDREQU0x40000000;changetheaddressforyourARM

MOVR2,#25;counter(25times4=100byteblocksize)

LDRR1,=RAM_ADDR;R1=RAMAddress

LDRR0,=0x55555555;R0=0x55555555

OVERSTRR0,[R1];sendittoRAM

ADDR1,R1,#4;R1=R1+4toincrementpointer

SUBSR2,R2,#1;R2=R2–1fordec.counter

BNEOVER;keepdoingit

HEREBHERE

END

Loopingatrilliontimeswithloopinsidealoop

AsshowninExample 4-3,themaximumcountis232-1.Whathappensifwewanttorepeatanactionmoretimesthanthat?Todothat,weusealoopinsidealoop,whichis

called a nested loop. In a nested loop, we use two registers to hold the count. SeeExample 4-3.

Example4-3

ExplainwhatisthemaximumnumberoftimesthattheloopinExample4-1canberepeated?Now,writeaprogramto(a)loadtheR0registerwiththevalue0x55,and(b)complementit16,000,000,000(16billion)times.

Solution:

BecauseRxisa32-bitregister,itcanholdamaximumof0xFFFFFFFF(232–1decimal);therefore,theloopcanberepeatedamaximumof232–1times.Thisexampleshowshowtocreateanestinglooptogobeyond4billiontimes.Because16,000,000,000islargerthan0xFFFFFFFF(themaximumcapacityofanyR0–R12registers),weusetworegisterstoholdthecount.ThefollowingcodeshowshowtouseR2andR1asaregisterforcountersinanestingloop.


ENTRY

MOVR0,#0x55;R0=0x55

MOVR2,#16;load16intoR2(outerloopcount)

L1LDRR1,=1000000000;R1=1,000,000,000(innerloopcount)

L2EORR0,R0,#0xFF;complementR0(R0=R0Ex-OR0xFF)

SUBSR1,R1,#1;R1=R1–1,dec.R1(innerloop)

BNEL2;repeatituntilR1=0

SUBSR2,R2,#1;R2=R2–1,dec.R2(outerloop)

BNEL1;repeatituntilR2=0

HEREBHERE;stayhere

END

In this program,R1 is used to keep the inner loop count. In the instruction “BNEL2”,whenever R1 becomes 0 it falls through and “SUBS R2,R2,#1” is executed. The nextinstructions force theCPUto load the innercountwith1,000,000,000 ifR2 isnotzero,andtheinnerloopstartsagain.ThisprocesswillcontinueuntilR2becomeszeroandtheouterloopisfinished.IfyouusetheKeilIDEtoverifytheoperationoftheaboveprogram

usesmallervaluesforcountertogothroughtheiterations.SeeFigure4-1.

Figure4-1:FlowchartforExample4-3

OtherconditionalBranches

Aswementioned inChapter 3,C andZ flags reflect the result of calculation onunsigned numbers. Table 4-1 lists available conditional branches for unsigned numbersthatuseCandZflags.Moredetailsofeach instructionareprovided inAppendixA.InTable 4-1noticethattheinstructions,suchasBEQ(BranchifZ=1)andBCS(Branchif

carry set, C = 1), jump only if a certain condition is met. Next, we examine someconditionalbranch instructionswithexamples.Theotherconditionalbranch instructionsassociatedwiththesignednumbersarediscussedinChapter5whenarithmeticoperationsforsignednumbersarediscussed.

Instruction Action

BCS/BHS branchifcarryset/branchifhigherorsame BranchifC=1

BCC/BLO branchifcarryclear/branchlower BranchifC=0

BEQ branchifequal BranchifZ=1

BNE branchifnotequal BranchifZ=0

BLS branchiflessorsame BranchifZ=1orC=0

BHI branchifhigher BranchifZ=0andC=1

Table4-1:ARMConditionalBranchInstructionsforUnsignedData

BCC(branchifcarryclear,branchifC=0)

In this instruction, thecarry flagbit inprogramstatus registers (CPSR) isused tomakethedecisionwhethertobranch.Inexecuting“BCClabel”,theprocessorlooksatthecarry flag to see if it is cleared (C = 0). If it is, the CPU starts to fetch and executeinstructionsfromtheaddressofthelabel.IfC=1, itwillnotjumpbutwillexecutethenextinstructionbelowBCC.SeeExample4-4.

Example4-4

ExaminethefollowingcodeandgivetheresultinregistersR0,R1,andR2.

MOVR1,#0;clearhighword(R1=0)

MOVR0,#0;clearlowword(R0=0)

LDRR2,=0x99999999;R2=0x99999999

ADDSR0,R0,R2;R0=R0+R2andsettheflags

BCCL1;ifC=0,jumptoL1andaddnextnumber

ADDSR1,R1,#1;ELSE,increment(R1=R1+1)

L1ADDSR0,R0,R2;R0=R0+R2andsettheflags

BCCL2;ifC=0,addnextnumber

ADDSR1,R1,#1;ifC=1,increment

L2ADDSR0,R2;R0=R0+R2andsettheflags


ADDSR1,R1,#1;C=1,increment



ADDSR1,R1,#1;ifC=1,andsettheflags

L4

Solution:

Thisprogramadds0x99999999togetherfourtimes.

R1(highbyte) R0(lowbyte)

Atfirst 0 0

JustbeforeL1 0 0x99999999


JustbeforeL3 1 0xCCCCCCCB


Hereistheloopversionoftheaboveprogramthatruns10times.


ENTRY



LDRR2,=0x99999999;R2=0x99999999

MOVR3,#10;counter


BCCNEXT;ifC=0,addnextnumber

ADDR1,R1,#1;ifC=1,incrementtheupperword

NEXTSUBSR3,R3,#1;R3=R3-1andsettheflags

;(Decrementcounter)

BNEL1;nextroundifZ=0

HEREBHERE;stayhere

END

Notethatthereisalsoa“BCSlabel”instruction.IntheBCSinstruction,ifC=1itjumps to the target address. We will give more examples of these instructions in thecontextofapplications.

Comparisonofunsignednumbers

CMPRn,Op2;compareRnwithOp2andsettheflags

TheCMPinstructioncomparestwooperandsandchangestheflagsaccordingtotheresult of the comparison. The operands themselves remain unchanged. There is nodestination register and the second source operands can be a register or an immediatevaluenotlargerthan0xFF.Itmustbeemphasizedthat“CMPRn,Op2”instructionisreallyasubtractoperation.Op2issubtractedfromRn(Rn–Op2)andtheresultisdiscardedandflags are set accordingly. The contents of Rn and Op2 remain unchanged after theexecutionofCMPinstruction.AlthoughalltheC,S,Z,andVflagsreflecttheresultofthecomparison,onlyCandZareusedforunsignednumbers,asoutlinedinTable4-2.

Instruction C Z

Rn>Op2 1 0

Rn=Op2 1 1

Rn<Op2 0 0

Table4-2:FlagSettingsforCompare(CMPRn,Op2)ofUnsignedData

Lookatthefollowingcase:

LDRR1,=0x35F;R1=0x35F

LDRR2,=0xCCC;R2=0xCCC

CMPR1,R2;compare0x35Fwith0xCCC

BCCOVER;branchifC=0

MOVR1,#0;ifC=1,thenclearR1

OVERADDR2,R2,#1;R2=R2+1=0xCCC+1=0xCCD

Figure4-2showsthediagramandtheClanguageversionofthecode.

Pseudocode:

R1=0x35F

R2=0xCCC

InC:

R1=0x35F;

R2=0xCCC;

IF(R1>=R2)THEN

R1=0

ENDIF

R2=R2+1

if(R1>=R2)

{

R1=0;

}

R2=R2+1;

Figure4-2:FlowchartofifInstruction

Intheaboveprogram,R1islessthantheR2(0x35F<0xCCC);therefore,C=0andBCC(branchifcarryclear)willgototargetOVER.Incontrast,lookatthefollowing:

LDRR1,=0xFFF

LDRR2,=0x888

CMPR1,R2;compare0xFFFwith0x888

BCCNEXT

ADDR1,R1,#0x40

NEXTADDR1,R1,#0x25

In theabove,R1 isgreater thanR2 (0xFFF>0x888),which setsC=1,making“BCCNEXT”fallthroughsothat“ADDR1,R1,0x40”isexecuted.

Again,itmustbeemphasizedthatinCMPinstructions,theoperandsareunaffectedregardlessoftheresultofthecomparison.Onlytheflagsareaffected.ThisisdespitethefactthatCMPusestheSUBoperationtosetorresettheflags.Italsomaybenotedthat,unlike other arithmetic and logic instructions, there is no need to put S in the CMPinstructiontoupdatetheflags.Inotherwords,theCMPinstructionautomaticallyupdatestheflags.

Program4-1usestheCMPinstructiontosearchforthehighestbyteinaseriesof5databytes.Tosearchforthehighestvaluetheinstruction“CMPR1,R3”worksasfollows,whereR1 is the contents of thememory location brought into R1 register by the [R2]pointer.

a)IfR1<R3,thenC=0andR3becomesthebasisofthenewcomparison.

b) IfR1≥R3, thenC=1andR1 is the largerof the twovaluesand remains thebasisofcomparison.

Program4-1Assumethatthereisaclassoffivepeoplewiththefollowinggrades:69,87,96,45,and75.

Findthehighestgrade.

;searchingforhighestvalue

COUNTRNR0;COUNTisthenewnameofR0

MAXRNR1;MAXisthenewnameofR1

;(MAXhasthehighestvalue)

POINTERRNR2;POINTERisthenewnameofR2

NEXTRNR3;NEXTisthenewnameofR3

AREAPROG_4_1D,DATA,READONLY

MYDATADCD69,87,96,45,75


ENTRY

MOVCOUNT,#5;COUNT=5

MOVMAX,#0;MAX=0

LDRPOINTER,=MYDATA

;POINTER=MYDATA(addressoffirstdata)

AGAINLDRNEXT,[POINTER]

;loadcontentsofPOINTERlocationtoNEXT

CMPMAX,NEXT

;compareMAXandNEXT

BHSCTNU

;ifMAX>NEXTbranchtoCTNU

MOVMAX,NEXT;MAX=NEXT

CTNUADDPOINTER,POINTER,#4

;POINTER=POINTER+4topointtothenext

SUBSCOUNT,COUNT,#1;decrementcounter

BNEAGAIN;branchAGAINifcounterisnotzero

HEREBHERE

END

Program4-1searchesthroughfivedataitemstofindthehighestvalue.Theprogramhasavariablecalled“MAX”thatholds thehighestgradefoundsofar.Onebyone, thegradesarecompared toHighest. Ifanyof them ishigher, thatvalue isplaced inMAX.

Thiscontinuesuntilalldataitemsarechecked.AREPEAT-UNTILstructurewaschosenintheprogramdesign.Figure4-3showstheflowchartforProgram4-1.Thisdesigncouldbeusedtocodetheprograminmanydifferentlanguages.

Pseudocode:

Count=5

Highest=0

REPEAT

IF(Next>MAX)

THEN

MAX=Next

ENDIF

Incrementpointer

DecrementCount

UNTILCount=0

;nowMAXistheHighest

InC:

//InKeil,longis32-bitwide

unsignedlongmyData[5]={69,87,96,45,75};

unsignedlongcount=5;

unsignedlongmax=0;

unsignedlongnext;

unsignedlong*pointer=myData;

do

{

next=*pointer;

if(max<next)

max=next;

pointer++;

count—;

}while(count!=0);

Figure4-3:FlowchartandPseudocodeforProgram4-1

Using CMP followed by conditional branches we can make any comparison onunsigned numbers, as shown in Table 4-3. Although BCS (branch carry set) and BCC(branchcarryclear)checkthecarryflagandcanbeusedafteracompareinstruction,itisrecommendedthatBHS(branchhigherorsame)andBLO(branchbelow)beusedfortworeasons.OnereasonisthatassemblerswillunassembleBCSasBLO,andBCCasBHS,whichmaybeconfusingtobeginnerprogrammers.Anotherreasonisthat“branchhigher”and “branch below” are easier to understand than “branch carry set” and “branch carryclear,”sinceitismoreimmediatelyapparentthatonenumberislargerthananother,thanwhetheracarrywouldbegeneratedifthetwonumbersweresubtracted.

Instruction Action

BCS/BHS branchifcarryset/branchifhigherorsame BranchifRn≥Op2

BCC/BLO branchifcarryclear/branchlower BranchifRn<Op2

BEQ branchifequal BranchifRn=Op2

BNE branchifnotequal BranchifRn≠Op2

BLS branchiflessorsame BranchifRn≤Op2

BHI branchifhigher BranchifRn>Op2

Table4-3:ARMConditionalBranchInstructionsforUnsignedData

DivisionofunsignednumbersinARM

TheolderARMfamilymembersdonothaveaninstructionfordivisionofunsignednumberssinceittooktoomanygatestoimplementit.However,manyoftheARMCortexchipsimplementthedivideinstruction.InARMswithnodivideinstructionswecanuseSUB instruction to perform the division. Program 4-2 shows an example of unsigneddivisionusingsubtractoperation.Intheprogramthenumeratorisplacedinaregisterandthedenominatorissubtractedfromitrepeatedly.Thequotientisthenumberoftimeswesubtractedandtheremainderisintheregisteruponcompletion.SeeFigure4-4.

Program4-2:DivisionbyRepeatedSubtractions

AREAPROG_4_2,CODE,READONLY;Divisionbysubtractions

ENTRY

LDRR0,=2012;R0=2012(numerator)

;itwillcontainremainder

MOVR1,#10;R1=10(denominator)

MOVR2,#0;R2=0(quotient)

L1CMPR0,R1;CompareR0withR1toseeiflessthan10

BLOFINISH;ifR0<R1jumptofinish

SUBR0,R0,R1;R0=R0-R1(divisionbysubtraction)

ADDR2,R2,#1;R2=R2+1(quotientisincremented)

BL1;gotoL1(Bisdiscussedinthenextsection)

FINISHBFINISH

Pseudocode:

NUM=2012

DENOM=10

WHILENUM>=DENOM

SubtractDENOMfromNUM

IncrementQUOTIENT

ENDWHILE

InC:

R0=2012;

R1=10;

while(R0>=R1)

{

R0=R0-R1;

R2=R2+1;

}

Figure4-4:FlowchartandPseudo-codeforProgram4-2

TST(Test)

TSTRn,Op2;RnANDwithOp2andflagbitsareupdated

TheTSTinstructionisusedtotestthecontentsofregistertoseeifanybitissettoHIGH. After the operands are ANDed together the flags are updated. After the TSTinstructionifresult iszero,thenZflagisraisedandonecanuseBEQ(branchequal)tomakedecision.Lookatthefollowingexample:

MOVR0,#0x04;R0=00000100inbinary

LDRR1,=myport;portaddress

OVERLDRBR2,[R1];loadR2frommyport

TSTR2,R0;isbit2HIGH?

BEQOVER;keepchecking

InTST,likeotherlogicorarithmeticdataprocessinginstructions,theOp2canbeanimmediatevalueoflessthan0xFF.Lookatthefollowingexample:



TSTR2,#0x04;isbit2HIGH?

BEQOVER;keepchecking

SeeExample4-5.

Example4-5

Assumeaddresslocation0x200000isassignedtoaninputportaddressandconnectedto8DIPswitches.WriteasimpleshortprogramtocheckthePORTandwheneverbothpins4or6areLOW,R4registerisincremented.

Solution:

MYPORTEQU0x200000

MOVR0,#2_01010000;R0=0x50(01010000inbinary)

LDRR1,=MYPORT;R1=portaddress

OVERLDRBR2,[R1];getabytefromPORTandplaceitinR2

TSTR2,R0;arebits4and6LOW?

BNEOVER;keepchecking

ADDR4,R4,#1

TEQ(testequal)

TEQRn,Op2;RnEX-ORedwithOp2andflagbitsareset

TheTEQinstructionisusedtotesttoseeifthecontentsoftworegistersareequal.After the source operands are Ex-ORed together the flag bits are set according to theresult.AftertheTEQinstructionifresultis0,thenZflagisraisedandonecanuseBEQ(branch zero) tomake decision.Recall that ifweExclusive-OR a valuewith itself, theresult is zero.Look at the following example for checking the temperatureof 100on agivenport:

TEMPEQU100

MOVR0,#TEMP;R0=Temp



TEQR2,R0;isit100?

BNEOVER;keepchecking

Unconditionalbranch(jump)instruction

Theunconditionalbranchisajumpinwhichcontrolistransferredunconditionallytothetarget location.IntheARMtherearetwounconditionalbranches:B(branch)andBX(branchandexchange).Thisisdiscussednext.

B(Branch)

B(branch)isanunconditionaljumpthatcangotoanymemorylocationinthe32MbyteaddressspaceoftheARM.AnothersyntaxforBinstructionisBAL(branchalways).

Bhasdifferentusageslikeimplementingif/else,while,andforinstructions.Inthefollowingcodeyouseeanexampleofimplementingtheif/elseinstruction:

CMPR1,R2

BHSL1

MOVR3,#2

BOVER

L1MOVR3,#5

OVER

//inC

if(R1<R2)

{

R3=2;

}

else

{

R3=5;

}

Intheabovecode,R3isinitializedwith2whenR1islowerthanR2.Otherwise,itisinitializedwith5.

Asanexampleofimplementingthewhileinstructionseethefollowingprogram.Itcalculatesthesumofnumbersbetween1and5:

MOVR1,#1

MOVR2,#0

L1CMPR1,#5

BHIL2

ADDR2,R2,R1

ADDR1,R1,#1

BL1

L2MOVR3,#5

//inC

unsignedlongR1=1;

unsignedlongR2=0;

while(R1<=5)

{

R2=R2+R1;

R1=R1+1;

}

Thefor instructioncanbeimplementedthesamewayasthewhileinstruction.Forexample,theaboveassemblyprogramcanbeconsideredasaforloop.

Incaseswherethereisnooperatingsystemormonitorprogram,weusetheBranchto itself inorder tokeep themicrocontrollerbusy.Asimplewayofdoing that isshownbelow:

HEREBHERE;stayhere

AnothersyntaxfortheBinstructionisBAL(branchalways)asshownbelow:

HEREBALHERE;stayhere

SinceARMinstructionis32-bit,8bitsareusedfortheopcode,andtheother24bitsrepresenttheaddressofthetargetlocation.The24-bittargetaddressisshiftedlefttwiceandthatallowsajumpto–32Mto+32Mbytesofmemorylocationsfromtheaddressofcurrentinstruction.Next,weexplainthereasonforthis.

Allbranchesareshortbranches(jumps)

Itmustbenotedthatallbranchinstructions(conditionalandunconditional)areshortjumps,meaningtheaddressofthetargetmustbewithin32Mbytesoftheprogramcounter(PC). That means the short jumps cannot cover the entire address space of 4G bytes(0x00000000to0xFFFFFFFF).

Calculatingtheshortbranchaddress

AllconditionalbranchessuchasBCC,BEQ,andBNEareshortbranches.This isduetothefactthattheARMinstructionsare32-bitandsomeofthe32-bitmustbeusedforopcodeandthereforeitcannotcovertheentire4GbytesaddressspaceofARM.Inthebranchinstructiontheopcodeis8bitsandtherelativeaddressis24bits.SeeFigure4-5.Thetargetaddressisrelativetothevalueoftheprogramcounter.Iftherelativeaddressispositive, the jump is forward. If the relative address is negative, then the jump isbackwards. The relative address can covermemory space of –32Mbytes to +32Mbytes

fromcurrentlocationofprogramcounter.Thereasonfor32MBisthefactthat24bitsareshiftedlefttwice(multipliedby4)bytheARMCPUautomatically.Thatgivesus26bits.Now, since one bit is used for positive or negative sign, we have only 25 bits formagnitude.The25bitsmagnitudegivesus32Mbytes(225=32M)ineachdirection.Thatis–32Mbytesif it isbackwardand+32MBif it isforwardjump.Therefore, tocalculatethetargetaddress,therelativeaddressisshiftedlefttwiceandaddedtotheaddress ofthenextinstructiontobefetched[targetaddress=relativeaddressshiftedlefttwice+addressof the next instruction to be fetched]. It must be noted that the next instruction to befetched is two instructions below the current branch instruction. The reason is theinstructionrightbelowthebranchisalreadyinthepipeline.SeeFigure 4-5.

Figure4-5:B(Branch)Instruction

Youmight askwhyweadd the relativeaddress to theaddressof two instructionsbelowthecurrentinstruction.(whydon’tweaddtherelativeaddresstotheaddressoftheinstruction rightbelow thecurrent instructionas it is inotherCPUs).This isdue to thepipelineissuesaswewillseeinnextsectionandChapter7.

Now, to calculate the target address of the branch we add address of the nextinstructiontobefetchedtotheoffsetshiftedleft twice(multipliedby4).Thereasonforshifting left twice is tomake sure it is word aligned. Given the fact that all theARMinstructionsare4-byte (word) longandarewordaligned,with225=32Mbytesaddressspacefortheforwardandbackwardjumpsthetargetaddressofjump(branch)canbeupto8Minstructions(32MBytes/4byes=8Minstructions)fromthecurrent instructionineachdirection.Althoughthisdoesnotcovertheentire1Ginstructions(4GBytes/4byes=1Ginstructions)ofARMmemoryspace,itismorethanadequateformanyapplications.SeeExample 4-6.

Example4-6

InARM7,thenextinstructiontobefetchedis2instructionsbelowthecurrentexecutinginstruction.Usingthefollowinglistfileverifythejumpforwardaddresscalculation.

LINEADDRESSMachineMnemonicOperand

100000000AREAEXAMPLE_4_6,CODE,READONLY

200000000ENTRY

300000000E3A01015MOVR1,#0x15;R1=0x15

400000004EA000002BTHERE

500000008E3A01025MOVR1,#0x25;R1=0x25

60000000CE3A02035MOVR2,#0x35;R2=0x35

700000010E3A03045MOVR3,#0x45;R3=0x45

800000014E3A04055THEREMOVR4,#0x55;R4=0x55

900000018EAFFFFFEHEREBHERE

10END

Solution:

FirstnoticethattheBinstructioninline4jumpsforward.Tocalculatethetargetaddress,therelativeaddress(offset)isshiftedlefttwiceandaddedtothePC ofthenextinstructiontobefetched.Thepositionofthenextinstructiontobefetchedis2instructionsbelowthecurrentinstruction.RecallthateachinstructionofARMtakes4bytes.Sothenextinstructiontobefetchedis2×4bytes=8bytesbelowthecurrentinstructionaddress(00000004).Sotheaddressofthenextinstructiontobefetchedis0000004+8=000000C(thepositionofMOVinstructioninline6).Inline4theinstruction“BTHERE”hasthemachinecodeofEA000002.Todistinguishtheoperandandopcodeparts,weshouldcomparethemachinecodewiththeBinstructionformat.Inthefollowing,youseetheformatoftheBinstruction.InthisexamplethemachinecodeisEA000002.IfwecompareitwiththeBformat,weseethat,theoperandis000002andtheopcodeisEA.The000002istheoffset,relativetotheaddressofthenextinstruction.Recallthattocalculatethetargetaddress,therelativeaddress(offset)isshiftedlefttwiceandaddedtothecurrentvalueofthePC(ProgramCounter).Shiftingtheoffset(000002)lefttwiceresultsin000008andthenaddingittotheaddressofthenextinstructiontobefetched(0000000C)wehave000008+0000000C=00000014whichisexactlytheaddressofTHERElabel.Allthejumpinstructions,whosemnemonicsbeginwithB,havethesameinstructionformat,andtheopcodechangesfrominstructiontoinstruction.So,wecancalculatetheshortbranchaddressforanyofthem,aswejustdidinthisexample.

Itmustalsobenotedthatforthebackwardbranchtherelativevalueisnegative(2’scomplement).ThatisshowninExample 4-7.

Example4-7

VerifythecalculationofbackwardjumpsforthelistingofExample4-1,shownbelow.

LINEADDRESSMachineMnemonicOperand

500000000E3A02FFALDRR2,=1000;R2=1000

600000004E3A00000MOVR0,#0;R0=0,sum

700000008E2800009AGAINADDR0,R0,#9;R0=R0+9

80000000CE2522001SUBSR2,R2,#1;R2=R2-1

9000000101AFFFFFCBNEAGAIN;repeat

1000000014E1A04000MOVR4,R0;storethesuminR4

1100000018EAFFFFFEHEREBHERE;stayhere

120000001CEND

Solution:

Intheprogramlist,“BNEAGAIN”inline9hasmachinecode1AFFFFFC.Tospecifytheoperandandopcode,wecomparetheinstructionwiththebranchinstructionformat,whichyousawinthepreviousexample.Theopcodeis1Aandtheoperand(relativeoffsetaddress)isFFFFFC.TheFFFFFCgivesus–4,whichmeansthedisplacementis(–4×4=–16=–0x10).

Thebranchislocatedinaddress0x0010.Theaddressofthenextinstructiontobefetchedistwoinstructionsaheadofcurrentbranchinstruction,andeachinstructionis4-bytewide.Therefore,addressofthenextinstructiontobefetched=0x0010+(2×4)=0x0018

Whentherelativeaddressof–0x10isaddedto00000018,wehave–0x0010+0x0018=0x08

Noticethat00000008istheaddressofthelabelAGAIN.

FFFFFCisanegativenumberandthatmeansitwillbranchbackward.Forfurtherdiscussionoftheadditionofnegativenumbers,seeChapter5.

Branchingbeyond32Mbytelimit

To branch beyond the address space of 32M bytes, we use BX (branch andexchange) instruction.The “BXRn” instructionuses registerRn to hold target address.

SinceRncanbeanyoftheR0–R14registersandtheyare32-bitregisters,the“BXRn”instruction can land anywhere in the 4G bytes address space of the ARM. In theinstruction“BXR2”theR2isloadedintotheprogramcounter(R15)andCPUstartstofetch instructions from the target address pointed to by R15, the program counter. SeeFigure 4-6.Sincetheinstructionsarewordaligned,wemustmakesurethatthelowertwobitsoftheRnare0s.

Figure4-6:BX(Branchandexchange)InstructionTargetAddress

TheBXinstructionisalsousedtoswitchtoTHUMBversionoftheARMCPU.FormoreinformationseetheARM manual.

ReviewQuestions

1.ThemnemonicBNEstandsfor_______.

2. True or false. “BNEBACK”makes its decision based on the last instructionaffectingtheZflag.

3.“BNEHERE”isa___-byteinstruction.

4.In“BEQNEXT”,whichflagbitischeckedtoseeifitishigh?

5.B(ranch)isa(n)___-byteinstruction.

6.CompareBandBXinstructions.

Section4.2:CallingSubroutinewithBLAnothercontroltransferinstructionistheBL(branchandlink)instruction,whichis

used to call a subroutine. Subroutines are often used to perform tasks that need to beperformed frequently. This makes a program more structured in addition to savingmemoryspace.In theARMthere isonlyoneinstructionforcallandthat isBL(branchandlink).TouseBLinstructionforcall,wemustleavetheR14registerunusedsincethatis where the ARM CPU stores the address of the next instruction where it returns toresumeexecutingtheprogram.

BL(BranchandLink)instructionandcallingsubroutine

In the32-bit instructionBL, 8 bits are used for theopcode and theother 24bits,k23–k0, are used for the address of the target subroutine, just like in the Branchinstruction. Therefore, BL can be used to call subroutines located anywherewithin the32MaddressspaceoftheARM,asshowninFigure 4-7.

Figure4-7:BL(BranchandLink)Instruction

Tomake sure that theARMknowswhere to comeback to after executionof thecalled subroutine, the ARM automatically saves in the link register (LR), the R14, theaddressoftheinstructionimmediatelybelowtheBL.WhenasubroutineiscalledbytheBL instruction, control is transferred to that subroutine, and the processor saves thePC(program counter) in the R14 register and begins to fetch instructions from the newlocation.Afterfinishingexecutionofthesubroutine,wemustuse“BXLR“instructiontotransfercontrolbacktothecaller.Everysubroutineneeds“BXLR”asthelastinstructionforreturnaddress.

BLinstructionandtheroleoflinkerregister

When a subroutine is called using BL instruction, first the processor saves theaddress of the instruction just below theBL instruction on theR14 register (LR, linkerregister), and thencontrol is transferred to that subroutine.This ishow theCPUknows

wheretoresumewhenitreturnsfromthecalledsubroutine.

Thelinkerregisterandreturningfromsubroutine

ThereisnoRETurninstructioninARM7.Thereforeattheendofthesubroutinewemustcopythelinkerregister,R14,totheprogramcounter(R15).WhentheBLinstructionis executed the address of the instruction below the BL instruction is placed into R14register, so, when the execution of the function finishes, the address of the instructionbelow the BL must be reloaded back into the PC, and so the CPU can resume theexecutionofinstructionsbelowtheBLinstruction.

TounderstandtheroleoftheR14registerinBLinstructionandthereturn,examinetheExamples 4-8.ThefollowingpointsshouldbenotedfortheExample 4-8:

1. Notice theDELAY subroutine.Upon executing the first “BLDELAY”, theaddress of the instruction right below it, “MOVR0,#0xAA”, is savedonto theR14register,andtheARMstartstoexecuteinstructionsatDELAYsubroutine.

2.IntheDELAYsubroutine,firstthecounterR3issetto5(R3=5);therefore,theinnerloopisrepeated5times.WhenR3becomes0,controlfallstothe“BXLR”instruction,which restores the address into the program counter and returns tomainprogramtoresumeexecutingtheinstructionsaftertheBL.

Example4-8

Writeaprogramtotoggleallthebitsofaddress0x40000000bysendingtoitthevalues0x55and0xAAcontinuously.Putatimedelaybetweeneachissuingofdatatoaddresslocation.

Solution:


ENTRY

RAM_ADDREQU0x40000000;changetheaddressforyourARM

LDRR1,=RAM_ADDR;R1=RAMaddress

AGAINMOVR0,#0x55;R0=0x55

STRBR0,[R1];sendittoRAM

BLDELAY;calldelay(R14=PCofnextinstruction)

MOVR0,#0xAA;R0=0xAA

STRBR0,[R1];sendittoRAM

BLDELAY;calldelay

BAGAIN;keepdoingit

;––––––—DELAYSUBROUTINE

DELAYLDRR3,=5;R3 =5,modifythisvaluefordifferentsizedelay

L1SUBSR3,R3,#1;R3=R3-1

BNEL1

BXLR;returntocaller

;––––––—endofDELAYsubroutine

END;noticetheplaceforENDdirective

UseKeilIDEsimulatorforARMtosimulatetheaboveprogramandexaminetheregistersandmemorylocation0x40000000.Youmighthavetochangetheaddress0x40000000tosomeothervaluedependingontheRAMaddressoftheARMchipyouuse.

Inaboveprogram,inplaceof“BXLR”forreturn,wecouldhaveused“BXR14”,“MOV R15,R14”,or“MOVPC,LR”instructions.Allofthemdothesamething;butitisrecommendedtousethe“BXLR”instruction.

Theamountof timedelay inExample 4-8dependson the frequencyof theARMchip.Howtocalculatethetimewillbeexplainedinthelastsectionofthischapter.

MainProgramandCallingSubroutines

In realworld projectswe divide the programs into small subroutines (also calledfunctions) and the subroutines are called from themain program.Figure 4-8 shows theformat.

;MAINprogramcallingsubroutines

AREAPogramName,CODE,READONLY

ENTRY

MAINBLSUBR_1;CallSubroutine1

BLSUBR_2;CallSubroutine1

BLSUBR_3;CallSubroutine1

HEREBALHERE;stayhere.BAListhesameasB

;––-endofMAIN

;––––––—SUBROUTINE1

SUBR_1….

….

BXLR;returntomain

;––endofsubroutine1


SUBR_2….

….

BXLR;returntomain



SUBR_3….

….

BXLR;returntomain


END;noticetheENDoffile

Figure4-8:ARMAssemblyMainProgramThatCallsSubroutines

Program4-3showsanexampleofthemainprogramcallingsubroutine.

Program4-3

;Thisprogramfillsablockofmemorywithafixedvalueand

;thentransfers(copies)theblocktonewareaofmemory

AREAPROGRAM4_3,CODE,READONLY

ENTRY

RAM1_ADDREQU0x40000000;ChangetheaddressforyourARM

RAM2_ADDREQU0x40000100;ChangetheaddressforyourARM

BLFILL;callblockfillsubroutine

BLCOPY;callblocktransfersubroutine

HEREBALHERE;BAL(branchalways)isthesameasB

;–––––-BLOCKFILLSUBROUTINE

FILLLDRR1,=RAM1_ADDR;R1=RAMAddresspointer

MOVR0,#10;counter

LDRR2,=0x55555555

L1STRR2,[R1];sendittoRAM

ADDR1,R1,#4;R1=R1+4toincrementpointer

SUBSR0,R0,#1;R0=R0-1fordeccounter

BNEL1;keepdoingit

BXLR;returntocaller

;–––––—BLOCKCOPYSUBROUTINE

COPYLDRR1,=RAM1_ADDR;R1=RAMAddresspointer(source)

LDRR2,=RAM2_ADDR;R2=RAMAddresspointer(dest.)

MOVR0,#10;counter

L2LDRR3,[R1];getfromRAM1

STRR3,[R2];sendittoRAM2

ADDR1,R1,#4;R1=R1+4toincrementpointerforRAM1

ADDR2,R2,#4;R2=R2+4toincrementpointerforRAM2

SUBSR0,R0,#1;R0=R0–1fordecrementingcounter

BNEL2;keepdoingit

BXLR;returntocaller

;–––-

END;noticetheplaceofENDdirective

Therearecasesinwhichweneedtocallasubroutinewithinacall(nestedcall).TodothatwemustusestacksincetheARMCPUcansupportonlyonecallatatimesincethereisonlyonelinkerregister(R14).

ReviewQuestions

1.ThemnemonicBLstandsfor_______.

2.Trueorfalse.“BLDELAY”savestheaddressoftheinstructionbelowBLinLRregister.

3.“BLDELAY”isa___-byteinstruction.

4.LRisan___-bitregister.

5.LRisthesameas______register.

6.ExplainthedifferencebetweenBandBLinstructions.

Section4.3:ARMTimeDelayandInstructionPipelineIn this sectionwediscusshow togeneratevarious timedelays and calculate time

delays for the ARM. We will also discuss instruction pipelining and its impact onexecutiontime.

DelaycalculationfortheARM

IncreatingatimedelayusingAssemblylanguageinstructions,onemustbemindfuloftwofactorsthatcanaffecttheaccuracyofthedelay:

1. Thecrystalfrequency:ThefrequencyofthecrystaloscillatorconnectedtotheCPUisonefactorinthetimedelaycalculation.Thedurationoftheclockperiodfortheinstructioncycleisafunctionofthiscrystalfrequency.

2. TheARMdesign: Since the 1970s, both the field of IC technology and thearchitecturaldesignofmicroprocessorshave seengreat advancements.Due to thelimitationsofICtechnologyandlimitedCPUdesignexperienceformanyyears,theinstruction cycle duration was longer. Advances in both IC technology and CPUdesign in the 1980s and 1990s havemade the single instruction cycle a commonfeatureofmanymicroprocessors.Indeed,onewaytoincreaseperformancewithoutlosingcodecompatibilitywiththeoldergenerationofagivenfamilyistoreducethenumberof instructioncycles it takes toexecutean instruction.OnemightwonderhowmicroprocessorssuchasARMareabletoexecuteaninstructioninonecycle.Thereare threeways todo that: (a)UseHarvardarchitecture toget themaximumamountofcodeanddata into theCPU,(b)useRISCarchitecturefeaturessuchasfixed-size instructions, and finally (c) use pipelining to overlap fetching andexecutionofinstructions.WehaveexaminedtheHarvardandRISCarchitecturesinChapter2.Next,wegiveabriefdiscussionofpipelining.Chapter7coverstheARMpipelineinmuchmoredetail.

Pipelining

Inearlymicroprocessorssuchasthe8085,theCPUcouldeitherfetchorexecuteatagiventime.Inotherwords,theCPUhadtofetchaninstructionfrommemory,decode,andthenexecuteit,andthenfetchthenextinstruction,decodeandexecuteit,andsoonasshown inFigure 4-9.The ideaofpipelining in its simplest form is toallow theCPUtofetch and execute at the same time. That is an instruction is being fetched while thepreviousinstructionisbeingexecuted.

Figure4-9:Non-pipelineexecution

We can use a pipeline to speed up execution of instructions. In pipelining, theprocessofexecutinginstructionsissplitintosmallstepsthatareallexecutedinparallel.Inthisway,theexecutionofmanyinstructionsisoverlapped.Onelimitationofpipeliningisthatthespeedofexecutionislimitedtothesloweststageofthepipeline.Comparethistomaking pizza. You can split the process ofmaking pizza intomany stages, such as

flatteningthedough,puttingonthetoppings,andbaking,buttheprocessislimitedtotheslowest stage, baking, no matter how fast the rest of the stages are performed. Whathappensifweusetwoorthreeovensforbakingpizzastospeeduptheprocess?Thismaywork for making pizza but not for executing programs, because in the execution ofinstructionswemustmake sure that the sequence of instructions is kept intact and thatthereisnoout-of-stepexecution.

ARMmultistageexecutionpipeline

As shown in Figure 4-10, in the ARM, each instruction is executed in 3 stages:Fetch,Decode,andExecute.

Figure4-10:PipelineinARM

In step 1, the opcode is fetched. In step 2, the opcode is decoded. In step 3, theinstruction is executed and result is written into the destination register. This 3-stagepipelinewasusedintheoriginalARM.ThenewerversionofARMmayhavemorestagesofpipeline.SeeChapter7andyourARMmanual.

InstructioncycletimefortheARM

IttakesacertainamountoftimefortheCPUtoexecuteaninstruction.Thisamountoftimeisreferredtoasmachinecycles.ThankstotheRISCarchitecture,ARMexecutesmost instructions in onemachine cycle. In theARM family, the length of themachinecycle depends on the frequency of the oscillator connected to the ARM system. Thecrystal oscillator, along with on-chip circuitry, provide the clock source for the ARMCPU.IntheARM,onemachinecycleconsistsofoneoscillatorperiod,whichmeansthatwitheachoscillatorclock,onemachinecyclepasses.Therefore,tocalculatethemachinecyclefortheARM,wetaketheinverseofthecrystalfrequency,asshowninExample 4-9.

Example4-9

ThefollowingshowsthecrystalfrequencyforfourdifferentARM-basedsystems.Findtheperiodoftheinstructioncycleineachcase.

(a)80MHz(b)160MHz(c)100MHz(d)50MHz

Solution:

(a)instructioncycleis1/80MHz=0.0125ms(microsecond)=12.5ns(nanosecond)

(b)instructioncycle=1/160MHz=0.00625ms=6.25ns

(c)instructioncycle=1/100MHz=0.01ms=10ns

(d)instructioncycle=1/50MHz=0.02ms=20ns

Branchpenalty

Theoverlappingoffetchandexecutionoftheinstructioniswidelyusedintoday’smicroprocessorssuchasARM.Fortheconceptofpipeliningtowork,weneedabufferorqueue in which an instruction is prefetched and ready to be executed. In somecircumstances,theCPUmustflushoutthequeue.Forexample,whenabranchinstructionisexecuted,theCPUstartstofetchcodesfromthenewmemorylocationandthecodeinthequeue thatwasfetchedpreviously isdiscarded. In thiscase, theexecutionunitmustwaituntil thefetchunitfetchesthenewinstruction.This iscalledabranchpenalty.Thepenaltyisanextrainstructioncycletofetchtheinstructionfromthetargetlocationinsteadofexecutingtheinstructionrightbelowthebranch.Rememberthattheinstructionbelowthe branch has already been fetched and is next in line to be decoded when the CPUbranches to a different address. This means that while the vast majority of ARMinstructions take only onemachine cycle, some instructions take threemachine cycles.These are Branch, BL (call), and all the conditional branch instructions such as BNE,BLO,andsoon.Theconditionalbranchinstructioncantakeonlyonemachinecycleifitdoes not jump.For example, theBNEwill jump ifZ=0 and that takes threemachinecycles.IfZ=1,thenitfallsthroughandittakesonlyonemachinecycle.SeeExamples 4-10and4-11.

Example4-10

ForanARMsystemof100MHz,findhowlongittakestoexecuteeachofthefollowinginstructions:

(a)MOV(b)SUB(c)B

(d)ADD(e)NOP(f)BHI

(g)BLO(h)BNE(i)EQU

Solution:

Themachinecycleforasystemof100MHzis10ns,asshowninExample4-9.Therefore,wehave:

InstructionInstructioncyclesTimetoexecute

(a)MOV11×10ns=10ns

(b)SUB11×10ns=10ns

(c)B33×10ns=30ns

(d)ADD11×10ns=10ns

(e)NOP11×10ns=10ns

Forthefollowing,duetobranchpenalty,3clockcyclesiftakenand1ifitfallsthrough:

(f)BHI3/13×10ns=30ns

(g)BLO3/13×10ns=30ns

(h)BNE3/13×10ns=30ns

(i)EQU0(directivesdonotproducemachineinstructions)

NoticethatARMchipdoesnothavetheNOPinstruction.TheARMAssemblerreplacesitwithsomeother1-cycleinstruction.OnyourARMAssembler,youmightgetawarningthatNOPdoesnotexist,butitstillcompilestheprogram.Itdoesnotgiveanerror.

DelaycalculationforARM

Adelaysubroutineconsistsoftwoparts:(1)settingacounter,and(2)aloop.Mostofthetimedelayisperformedbythebodyoftheloop,asshowninExample 4-11.

Example4-11

Findthesizeofthedelayofthecodesnippetbelowifthecrystalfrequencyis100MHz:

DELAYMOVR0,#255

AGAINNOP

NOP

SUBSR0,R0,#1

BNEAGAIN

MOVPC,LR;return

Solution:

WehavethefollowingmachinecyclesforeachinstructionoftheDELAYsubroutine:

InstructionMachineCycle

DELAYMOVR0,#255;1

AGAINNOP;1

NOP;1

SUBSR0,R0,#1;1

BNEAGAIN;3/1

MOVPC,LR;1

Therefore,wehaveatimedelayof[1+((1+1+1+3)×255)+1]×10ns=15,320ns.

NoticethatBNEtakesthreeinstructioncyclesifitjumpsback,andtakesonlyonecyclewhenfallingthroughtheloop.Thatmeanstheabovenumbershouldbe153.0ns.Becausethelasttime,whenR0iszero,theBNEtakesonlyonecyclebecauseitfallsthroughtheloop

Veryoftenwecalculatethetimedelaybasedontheinstructionsinsidetheloopandignoretheclockcyclesassociatedwiththeinstructionsoutsidetheloop.

InExample 4-11,thelargestvaluetheR0registercantakeis232=4G.Onewaytoincrease the delay is to use NOP instructions in the loop. NOP, which stands for “nooperation,” simply wastes time, but takes 4 bytes of program memory and that is tooheavyapricetopayforjustoneinstructioncycle.Abetterwayistouseanestedloop.

Loopinsidealoopdelay

Anotherwaytogetalargedelayistousealoopinsidealoop,whichisalsocalledanestedloop.SeeExample 4-12.

Example4-12

InagivenARMtraineranI/Oportisconnectedto8LEDs.ThefollowingprogramtogglestheLEDsbysendingtoit0x55and0xAAvaluescontinuously.CalculatethetimedelayfortogglingofLEDs.Assumethesystemclockfrequencyof100MHz.

Solution:

AREAExample4_12,CODE,READONLY

ENTRY

PORT_ADDREQU0x40000000;changetheaddressforyourARM

LDRR1,=PORT_ADDR;R1=portaddress

AGAINMOVR0,#0x55;R0=0x55

STRBR0,[R1];sendittoLEDs

BLDELAY;calldelay

MOVR0,#0xAA;R0=0xAA

STRBR0,[R1];sendittoLEDs

BLDELAY;calldelay

BALAGAIN;keepdoingitforever(BAListhesameasB)

;––––––—DELAYSUBROUTINE

DELAY

MOVR3,#100;R3=100,modifythisvaluefordifferentsizedelay

L1LDRR4,=250000;R4=250,000(innerloopcount)

L2SUBSR4,R4,#1;1clock

BNEL2;3clock

SUBSR3,R3,#1;R3=R3-1

BNEL1

MOVPC,LR;returntocaller

END

Ignoringthedelayassociatedwiththeouterloop,wehavethefollowingtimedelay:

[(1+3)×250,000×100]×10ns=1secondsince1/100MHz=10ns.

ExaminetheworkingfrequencyforyourARMtrainer,changetheaboveaddress0x40000000toyourARMtrainerportaddressandverifythetimedelayusingoscilloscope.

Fromthesediscussionsweconcludethat theuseofinstructionsingeneratingtimedelayisnot themostreliablemethod.Togetmoreaccurate timedelayTimersareused.All ARM microcontrollers come with on-chip Timers. We can use Keil uVision’ssimulatortoverifydelaytimeandnumberofcyclesused.Meanwhile,togetanaccuratetimedelayforagivenARMmicrocontroller,wemustuseanoscilloscopetomeasuretheexacttimedelay.

ReviewQuestions

1.Trueorfalse.IntheARM,themachinecyclelasts1clockperiodofthecrystalfrequency.

2.TheminimumnumberofmachinecyclesneededtoexecuteanARMinstructionis_____.

3.Findthemachinecycleforacrystalfrequencyof66MHz.

4.Assumingacrystalfrequencyof100MHz,findthetimedelayassociatedwiththeloopsectionofthefollowingDELAYsubroutine:

DELAYLDRR2,=50000000

HERENOP

NOP

NOP

NOP

NOP

SUBSR2,R2,#1

BNEHERE

MOVPC,LR

5.FindthemachinecycleforanARMifthecrystalfrequencyis50MHz.

6.Trueorfalse.IntheARM,theinstructionfetchingandexecutionaredoneatthesametime.

7.Trueorfalse.BandBLwillalwaystake2machinecycles.

8.Trueorfalse.TheBNEinstructionwillalwaystake3machinecycles.

Section4.4:ConditionalExecutionEverymicroprocessor has the conditional branch (jump) instruction based on the

statusofflagbitssuchasZandC.InstructionssuchasBEQ(branchequal,Z=1)orBNC(branchifnocarry,C=0)arecommoninallCPUs.TheARMCPUhasauniquefeaturethatwedonotseeinothermicroprocessors.InARM,theconceptofconditionalexecutionisimplementedforallinstructionandnotjustforbranchwhichmakesyouabletodecidetorunorignoreeachsingleinstructiondependingonthestatusofflagbits.Inotherwords,notonlythebranchinstructionbutalloftheARMinstructionscanbeconditional.Aswediscussed inChapters 2 and 3, theADD, SUB, and other arithmetic instruction do notaffect the flag bits in CPSR (current program status register) register unless they haveletterS in the syntax.Thedefault isnot toupdate the flags.Weoverride thedefaultbyhaving letterS in the instruction.Thesame thing is trueaboutconditional fieldofeachinstruction. If we do not add a condition after an instruction, it will be executedunconditionallybecausethedefaultisnottochecktheflagsandexecuteunconditionally.If we want an instruction to be executed only when a condition is met, we put theconditionsyntaxrightaftertheinstruction.

ThisfeatureofARMallowstheexecutionofaninstructionconditionallybasedonthestatusofZ,C,VandNflags.Todothat,theARMinstructionshavesetasidethemostsignificant 4bits of the instruction field for the conditions.SeeFigure4-11.The4bitsgivesus16possibleconditions.Table4-4showsthelistofallthe16possibleconditions.

Figure4-11:ConditionFieldinARMInstructions

Bits MnemonicExtension Meaning Flag

0000 EQ Equal Z=1

0001 NE Notequal Z=0

0010 CS/HS CarrySet/HigherorSame C=1

0011 CC/LO CarryClear/Lower C=0

0100 MI Minus/Negative N=1

0101 PL Plus N=0

0110 VS VSet(Overflow) V=1

0111 VC VClear(NoOverflow) V=0

1000 HI Higher C=1andZ=0

1001 HS LowerorSame C=1andZ=1

1010 GE GreaterthanorEqual N=V

1011 LT Lessthan N≠V

1100 GT Greaterthan Z=0andN=V

1101 LE LessthanorEqual Z=0orN≠V

1110 AL Always(unconditional)

1111 – NotValid

Table4-4:ARMConditioncodesfortheOpcodebits[31-28]

Note!

By default, all the instructions are executed unconditionally. As a result the AL(Always)suffixhasnoeffectontheinstruction.Forexample,BAL(BranchAlways)isexactlythesameasB(Branch).Thesameistrueforallinstructions.

Tomakeaninstructionconditional,simplyweputtheconditionsyntaxfromTable4-4infrontofit.Seethefollowingexamples:

MOVR1,#10;R1=10

MOVR2,#12;R2=12

CMPR2,R1;compare12with10,Z=0becausetheyarenotequal

MOVEQR4,#20;thislineisnotexecutedbecause

;theconditionEQisnotmet

Thefollowingcodeadds10toR1ifitisnotzero:

CMPR1,#0;compareR1with0

ADDNER1,R1,#10;thislineisexecutedifZ=0

;(ifinthelastCMPoperandswerenotequal)

NotethatwecanaddbothSandconditiontosyntaxofaninstruction.ItiscommontoputSafterthecondition.Seethefollowingexamples:

ADDNESR1,R1,#10

;thislineisexecutedandsettheflagsifZ=0

Oneadvantageofusingconditionalexecutionisitsavestimeintheexecutionofaninstruction by avoiding branch penalty. As we discussed earlier, each instruction goesthroughthreepipelinestagesoffetch,decode,andexecution.WhenabranchinstructionsuchasBCC(branchifCcleared)entersthepipeline,theinstructionbelowisalsofetchedintotheCPUnotknowingwhethertheCflagishighorlow.IftheC=0,thebranchwilljump to new location resulting in emptying the instruction pipeline queue which wasworking on the instruction below theBCC instruction. This penalty can be avoided by

usingconditionalexecutionfeatureoftheARM.WecanmaketheexecutionofmanyoftheARMinstructionsconditionalbysimplyaddingamnemonicextensiontotheendofit.For example, we can use ADDCS instead of ADD. The ADDCS means add if C=1.(ADDCS:AddifCarrySet).BythesametokentheADDEQmeansaddonlyifZ=1.

Example 4-13 shows two versions of the Example 4-4we covered earlier. In thenewversion,we use the conditional execution instructions. Simulate and compare bothversionstoseehowtheconditionalinstructionsareexecuted.

Example4-13

Thefollowingprogramadds0x99999999together10times.Comparethecodesyntaxtoseehowtheconditionalexecutionofthecodeisused.

Code1:(Example4-4withoutconditionalexecution)


ENTRY



LDRR2,=0x99999999;R2=0x99999999

MOVR3,#10;counter

L1ADDSR0,R0,R2;R0=R0+R2andupdatetheflags

BCCNEXT;ifC=0,gotonextnumber

ADDR1,R1,#1;ifC=1,incrementtheupperword

NEXTSUBSR3,R3,#1;R3=R3-1andupdatetheflags

BNEL1;nextroundifz=0

HEREBHERE;stayhere

END

Code2:(Example4-4withconditionalexecution)


ENTRY



LDRR2,=0x99999999;R2=0x99999999

MOVR3,#10;counter

L1ADDSR0,R0,R2;R0=R0+R2andupdatetheflags

ADDCSR1,R1,#1;ifCset(C=1),incrementtheupperword

NEXTSUBSR3,R3,#1;R3=R3-1andupdatetheflags

BNEL1;nextroundifz=0

HEREBHERE;stayhere

END

SeealsoExamples4-14and4-15.

Example4-14

RewritethemainpartofProgram4-1usingconditionalexecutionofARMinstructions

Solution:

MOVCOUNT,#5;COUNT=5

MOVMAX,#0;MAX=0

LDRPOINTER,=MYDATA

;POINTER=MYDATA(addressoffirstdata)

AGAIN

LDRNEXT,[POINTER]

;loadcontentsofPOINTERlocationtoNEXT

CMPMAX,NEXT;compareMAXandNEXT

MOVLOMAX,NEXT

;ifMAXislowerthanNEXTthenMAX=NEXT

ADDPOINTER,POINTER,#4

;POINTER=POINTER + 4topointtonext

SUBSCOUNT,COUNT,#1;decrementcounter

BNEAGAIN;branchAGAINifcounterisnotzero

Example4-15

Usingrotation,writeaprogramthatcountsthenumberof1sinR0.

Solution:


ENTRY

LDRR0,=0x34F37D36

MOVR1,#0;numberof1s

MOVR2,#32;counter

BEGINMOVR0,R0,RRX;RotaterightwithcarrytheR0register

ADDCSR1,R1,#1;ifC=1thenincrementR1

SUBSR2,R2,#1;decrementcounter

BNEBEGIN;ifcounterisnotequaltozerobranchBEGIN

END

ReviewQuestions

1.Trueorfalse.AlltheARMinstructionshavetheconditionalexecutionfeature.

2.HowmanybitsoftheARMinstructionaresetasidefortheconditioncodes?

3.TheADDALstandsfor……………...

4.Trueorfalse.MOVVCisavalidinstructioninARM

5.Trueorfalse.SUBEQSisavalidinstructioninARM

ProblemsSection4.1:LoopingandBranchInstructions

1.IntheARM,loopingactionusingasingleregisterislimitedto_______iterations.

2.Ifaconditionalbranchisnottaken,whatisthenextinstructiontobeexecuted?

3.Incalculatingthetargetaddressforabranch,adisplacementisaddedtothecontentsofregister_______.

4.ThemnemonicBNEstandsfor__________.

5.WhatistheadvantageofusingBXoverB?

6.Trueorfalse.ThetargetofaBNEcanbeanywhereinthe4Gwordaddressspace.

7.Trueorfalse.AllARMbranchinstructionscanbranchtoanywhereinthe4Gbyteaddressspace.

8.DissecttheBinstruction,indicatinghowmanybitsareusedfortheoperandandtheopcode,andindicatehowfaritcanbranch.

9.Trueorfalse.Allconditionalbranchesare2-byteinstructions.

10.Showcodeforanestedlooptoperformanaction10,000,000,000times.

11.Showcodeforanestedlooptoperformanaction200,000,000,000times.

12.Findthenumberoftimesthefollowingloopisperformed:

MOVR0,#0x55

MOVR2,#40

L1LDRR1,=10000000

L2EORR0,R0,#0xFF

SUBR1,R1,#1

BNEL2

SUBR2,R2,#1

BNEL1

13.IndicatethestatusofZandCafterCMPisexecutedineachofthefollowingcases.

(a)

MOVR0,#50

MOVR1,#40

CMPR0,R1

(b)

MOVR1,#0xFF

MOVR2,#0x6F

CMPR1,R2

(c)

MOVR2,#34

MOVR3,#88

CMPR2,R3

(d)

SUBR1,R1,R1

MOVR2,#0

CMPR1,R2

(e)

EORR2,R2,R2

MOVR3,#0xFF

CMPR2,R3

(f)

EORR0,R0,R0

EORR1,R1,R1

CMPR0,R1

(g)

MOVR4,#0x78

MOVR2,#0x40

CMPR4,R2

(h)

MOVR0,#0xAA

ANDR0,R0,#0x55

CMPR0,#0

14.RewriteProgram4-1tofindthelowestgradeinthatclass.

15.ThetargetaddressofaBNEisbackwardiftherelativeaddressportionofopcodeis_______(negative,positive).

16.ThetargetaddressofaBNEisforwardiftherelativeaddressportionofopcodeis_______(negative,positive).


17.BLisa(n)___-byteinstruction.

18.InARM,whichregisteristhelinkerregister?

19.Trueorfalse.TheBLtargetaddresscanbeanywhereinthe4Gbyteaddressspace.

20.DescribehowwecanreturnfromasubroutineinARM.

21.InARM,whichaddressissavedwhenBLinstructionisexecuted.


22.Findtheoscillatorfrequencyifthemachinecycle=1.25ns.

23.Findthemachinecycleifthecrystalfrequencyis200MHz.



26.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasanARMwithafrequencyof80MHz:

MOVR8,#200

BACKLDRR1,=400000000

HERENOP

SUBSR1,R1,#1

BNEHERE

SUBSR8,R8,#1

BNEBACK


MOVR2,#100

BACKLDRR0,=50000000

HERENOP

NOP

SUBSR0,R0,#1

BNEHERE

SUBSR2,R2,#1

BNEBACK

28.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasaARMwithafrequencyof40MHz:

MOVR1,#200

BACKLDRR0,#20000000

HERENOP

NOP

NOP

SUBSR0,R0,#1

BNEHERE

SUBSR1,R1,#1

BNEBACK


MOVR8,#500

BACKLDRR1,=20000

HERENOP

NOP

NOP

SUBSR1,R1,#1

BNEHERE

SUBSR8,R8,#1

BNEBACK

30.TheARMchipdoesnothavetheNOPinstruction.ExaminethelistfilegeneratedbyyourARMassemblertoseewhatinstructionisusedinplaceoftheNOP.


31.WhichbitsoftheARMinstructionaresetasideforconditionexecution?

32.Trueorfalse.OnlyADDandMOVinstructionshaveconditionalexecutionfeature.

33.Trueorfalse.InARM,theconditionalexecutionisdefault.

34.WhichflagbitisexaminedbeforetheMOVEQinstructionisexecuted?

35.StatethedifferencebetweentheADDEQandADDNEinstructions.

36.StatethedifferencebetweentheBALandBinstructions.

37.StatethedifferencebetweentheSUBCCandSUBCSinstructions.

38.StatethedifferencebetweentheANDEQandANDNEinstructions.

39.Trueorfalse.ThedecisiontoexecutetheSUBCCisbasedonthestatusofZflag.

40.Trueorfalse.ThedecisiontoexecutetheADDEQisbasedonthestatusofZflag.

AnswerstoReviewQuestions1.BranchifnotEqual

2.True

3.4

4.ZflagofCPSR(statusregister)

5.4

6.TheBcanonlybranchtoanaddresslocationwithin32MBaddressspace,whiletheBXcangoanywhereinthe4GBaddressspaceofARM.


1.BranchandLink

2.True

3.4

4.32

5.R14

6.Inbothofthemthetargetaddressisrelativetothevalueoftheprogramcounterandtherelativeaddresscancovermemoryspaceof–32MBto+32MBfromcurrentlocationofprogramcounter.TheBLinstructionsavestheaddressofthenextinstructionintheLRregisterbeforejumping,whiletheBinstructionjustjumpswithoutsavinganything.


1.True

2.1

3.MC=1/66MHz=0.015ms=15ns

4.[50,000,000×(1+1+1+1+1+1+3)]×10ns=4.5seconds

5.MachineCycle=1/50MHz=0.02ms=20ns

6.True

7.False

8.False.Ittakes3cycles,onlyifitbranchestothetargetaddress.


1.True

2.4bits

3.ADDalwaysregardlessofthestatusflag.

4.True

5.True

Chapter5:SignedNumbersandIEEE754FloatingPointThischapterdealswithsignednumber instructionsandoperations. InSection5.1,

we focus on the concept of signed numbers in software engineering. Signed numberarithmeticoperationsand instructionsareexplainedalongwithexamples inSection5.2.TheIEEE754floatingpointwillbeexplainedinSection5.3.

Section5.1:SignedNumbersConceptAlldataitemsusedsofarhavebeenunsignednumbers,meaningthattheentire8-

bit,16-bitor32-bitoperandwasusedforthemagnitude.Manyapplicationsrequiresigneddata.Inthissectiontheconceptofsignednumbersisdiscussed.

Conceptofsignednumbersincomputers

Ineverydaylife,numbersareusedthatcouldbepositiveornegative.Forexample,atemperatureof5degreesbelowzerocanberepresentedas-5,and20degreesabovezeroas +20. Computersmust be able to accommodate such numbers. To do that, computerscientistshavedevisedthefollowingarrangementfortherepresentationofsignedpositiveandnegativenumbers:Themostsignificantbit(MSB)issetasideforthesign(+or-)andtherestofthebitsareusedforthemagnitude.Thesignisrepresentedby0forpositive(+)numbers and 1 for negative (-) numbers. Signed byte and word representations arediscussedbelow.

Signedbyteoperands

Insignedbyteoperands,D7(MSB) is thesignandD0toD6aresetasidefor themagnitudeofthenumber.IfD7=0,theoperandispositive,andifD7=1,itisnegative.

Therangeofpositivenumbersthatcanberepresentedbytheaboveformatis0to+127.

Dec. Binary

0 00000000

+1 00000001

… ….….

+5 00000101

… ….….

+127 01111111

Dec. Binary

-0 10000000

-1 10000001

… ….….

-5 10000101

… ….….

-127 11111111

Negativenumbersusing2’scomplement

TosaveALUcircuitry,weuse2’scomplementmethod to implement thenegativenumbers. For negative numbers, D7 is 1 but the non-sign (magnitude) portion isrepresented in 2’s complement. Although the assembler does the conversion, it is stillimportant to understand how the conversion works. To convert to negative numberrepresentation(2’scomplement),followthesesteps:

1.Writethemagnitudeofthenumberin8-bitbinary(nosign).

2.Inverteachbit.

3.Add1toit.

Dec. Binary

0 00000000

+1 00000001

… ….….

+5 00000101

… ….….

+127 01111111

Dec. Binary

0 00000000

-1 11111111

… ….….

-5 11111011

… ….….

-127 10000001

-128 10000000

Examples5-1,5-2,and5-3demonstratethesethreesteps.

Example5-1

Showhowthecomputerwouldrepresent-5.

Solution:

1.000001015in8-bitbinary

2.11111010inverteachbit

3.11111011add1(0xFB)

Thisisthesignednumberrepresentationin2’scomplementfor-5.

Example5-2

Show-34hexasitisrepresentedinternally.

Solution:

1.00110100

2.11001011

3.11001100(whichis0xCC)

Example5-3

Showtherepresentationfor-12810.

Solution:

1.10000000

2.01111111

3.10000000Noticethatthisisnotnegativezero(–0).

Fromtheexamplesaboveitisclearthattherangeofbyte-sizednegativenumbersis-1to-128.Thefollowinglistsbyte-sizedsignednumberranges:

Decimal Binary Hex

-128 10000000 80

-127 10000001 81

-126 10000010 82

… ….… ..

-2 11111110 FE

-1 11111111 FF

0 00000000 00

00000001 01

+2 00000010 02

… ….…. ..

+127 01111111 7F

Halfword-sizedsignednumbers

InARMCPUahalf-wordis16bitsinlength.SettingasidetheMSB(D15)forthesignleavesatotalof15bits(D14–D0)forthemagnitude.Thisgivesarangeof–32,768(–215)to+32,767(215–1).

Ifanumber is larger than16-bit, itmustbe treatedasa32-bitwordoperand.Thefollowingshowstherangeofsignedhalf-wordoperands.Toconvertanegativenumbertoits half-word operand representation, the steps discussed in negative byte operands areused.

Decimal Binary Hex

-32,768 1000000000000000 8000

-32,767 1000000000000001 8001

-32,766 1000000000000010 8002

… ….… ..

-2 1111111111111110 FFFE

-1 1111111111111111 FFFF

0 0000000000000000 0000

+1 0000000000000001 0001

+2 0000000000000010 0002

… ….…. …

+32,766 0111111111111110 7FFE

+32,767 0111111111111111 7FFF

UsingMicrosoftWindowscalculatorforsignednumbers

AllMicrosoftWindowsoperatingsystemscomewithahandycalculator.Useit toverifythesignednumberoperationsinthissection.

Word-sizedsignednumbers

InARMCPUs,awordis32bitsinlength.SettingasidetheMSB(D31)forthesignleavesatotalof31bits(D30–D0)forthemagnitude.Thisgivesarangeof-(231)to+(231-1).

To convert a negative number to its word operand representation, the three stepsdiscussedinnegativebyteoperandsareused.SeeExample5-4.

Example5-4

Showhowthecomputerwouldrepresent-5for(a)8-bit,(b)16-bit,and(c)32-bitdatasizes.

Solution:

(a)8-bit

1.000001015in8-bitbinary


3.11111011add1(0xFB)

(b)16-bit

1.00000000000001015in16-bitbinary


3.1111111111111011add1(0xFFFB)

(c)32-bit

1.000000000000000000000000000001015in32-bitbinary

2.11111111111111111111111111111010inverteachbit

3.11111111111111111111111111111011add1(0xFFFFFFFB)

UsetheWindowscalculatortoverifytheseexamples.

Ifanumberislargerthan32-bit,itmustbetreatedasa64-bitdoublewordoperandand be processed chunk by chunk the same way as unsigned numbers. The followingshowstherangeofsignedwordoperands.

Decimal Binary Hex

-2,147,483,648 10000000000000000000000000000000 80000000

-2,147,483,647 10000000000000000000000000000001 80000001

-2,147,483,646 10000000000000000000000000000010 80000002

… … …

-2 11111111111111111111111111111110 FFFFFFFE

-1 11111111111111111111111111111111 FFFFFFFF

0 00000000000000000000000000000000 00000000

+1 00000000000000000000000000000001 00000001

+2 00000000000000000000000000000010 00000002

… … …

+2,147,483,646 01111111111111111111111111111110 7FFFFFFE

+2,147,483,647 01111111111111111111111111111111 7FFFFFFF

Table5-1showsasummaryofsigneddataranges.

DataSize Bits 2n Decimal Hexadecimal

Byte 8 -27to+27-1 -128to+127 0x80–0x7F

Half-word 16 -215to+215-

1-32,768to+32,767 0x8000–0x7FFF

Word 32 -231to+231-1 -2,147,483,648to+2,147,483,647

0x80000000–0x7FFFFFFF

Table5-1:SignedDataRangeSummary

ReviewQuestions

1.Inan8-bitoperand,bit_____isusedforthesignbit,whereasina16-bitoperand,bit_____isusedforthesignbit.Repeatfor32-bitsigneddata.

2.Computethebyte-sized2’scomplementof0x16.

3.Therangeofbyte-sizedsignedoperandsis-______to+_____.Therangeofhalfword-sizedsignedoperandsis-__________to+__________.

4.Therangeofword-sizedsignedoperandsis-__________to+__________.

5.Computethe2’scomplementof0x500000.

Section5.2:SignedNumberInstructionsandOperationsIn this section we examine issues associated with signed number arithmetic

operations.WewillalsodiscusstheARMinstructionsforsignednumbersandhowtousethem. Itmust be noted that inARM theN flag bit inCSPR is the signbit.N=0 is forpositiveandN=1fornegativenumbers.

Overflowprobleminsignednumberoperations

Whenusingsignednumbers,aseriousproblemarisesthatmustbedealtwith.Thisistheoverflowproblem.TheCPUindicatestheexistenceoftheproblembyraisingtheV(oVerflow) flag,but it isup to theprogrammer to takecareof it.TheCPUunderstandsonly0sand1sandignoresthehumanconventionofpositiveandnegativenumbers.Nowwhatisanoverflow?Iftheresultofanoperationonsignednumbersistoolargefortheregister,anoverflowoccursandtheprogrammermustbenotified.LookatExample5-5.

Example5-5

Lookatthefollowingcasefor8-bitdatasize:

+96 01100000

+70 +01000110

+166 10100110

AccordingtotheCPU,thisis-90,whichiswrong.(V=1,N=1,C=0)

Intheexampleabove,+96isaddedto+70andtheresultaccordingtotheCPUis- 90. Why? The reason is that the result was more than 8 bits could handle. The 8-bitregisterscanonlycontainupto+127.ThedesignersoftheCPUcreatedtheoverflowflagspecifically for the purpose of informing the programmer that the result of the signednumberoperationiserroneous.

Whentheoverflowflagissetin8-bitoperations

In 8-bit signed number operations, V is set to 1 if either of the following twoconditionsoccurs:

1.ThereisacarryfromD6toD7butnocarryoutofD7(C=0).

2.ThereisacarryfromD7out(C=1)butnocarryfromD6toD7.

Inotherwords,theoverflowflagissetto1ifthereisacarryfromD6toD7orfromD7out,butnotboth.ThismeansthatifthereisacarrybothfromD6toD7andfromD7out,V=0.InExample5-5,sincethereisonlyacarryfromD6toD7andnocarryfromD7out,V=1.Examples5-6,5-7,and5-8givefurtherillustrationsoftheoverflowflaginsignednumberarithmetic.

Example5-6

Examinethefollowingcase:

-128 10000000

+-2 +11111110

-130 01111110 V=1,N=0(positive),C=1

AccordingtotheCPU,theresultis+126,whichiswrong.TheerrorisindicatedbythefactthatV=1.

Example5-7

Observetheresultsofthefollowing:

;assumeR3=+7(R3=0x07)

;assumeR2=+18(R2=0x12)

ADDR2,R2,R3;(R2=0x19=+25,correct)

+7 00000111

++18 +00010010

+25 00011001

V=0,C=0,andN=0(positive).

Example5-8

Observetheresultsofthefollowing:

-2 11111110

+-5 11111011

-7 11111001

V=0,C=0,andN=1(negative);theresultiscorrectsinceV=0.

Overflowflagin16-bitoperations

Ina16-bitoperation,Vissetto1ineitheroftwocases:



Againtheoverflowflagislow(notset)ifthereisacarryfrombothD14toD15andfromD15out.TheVissetto1onlywhenthereisacarryfromD14toD15orfromD15out,butnotfromboth.SeeExamples5-9and5-10.

Example5-9

Observetheresultsinthefollowing16-bithexnumbers:

+6E2F 0110111000101111

++13D4 0001001111010100

+8203 1000001000000011

=–0x7DFD=–32,253incorrect!

V=1,C=0,N=1

Example5-10


+542F 0101010000101111

++12E0 +0001001011100000

+670F 0110011100001111=+0x670F=+26,383(correctanswer);

V=0,C=0,N=0

Overflowflagin32-bitoperations

Ina32-bitoperation,Vissetto1ineitheroftwocases:



Againtheoverflowflagislow(notset)ifthereisacarryfrombothD30toD31andfromD31out.TheVissetto1onlywhenthereisacarryfromD30toD31orfromD31

out,butnotfromboth.SeeExamples5-11and5-12.

Example5-11


+6E2F356F 01101110001011110011010101101111

++13D49530 +00010011110101001001010100110000

+8203CA9F 10000010000000111100101010011111 =–0x7DFC3561

incorrect!V=1,C=0,N=1

Example5-12


+542F356F 01010100001011110011010101101111

++12E09530 00010010111000001001010100110000

+670FCA9F 01100111000011111100101010011111 =+670FCA9F

correctanswer;V=0,C=0,N=0

Signextensionandavoidingerroneousresultsinsignednumberoperations

To avoid the problems associated with signed number operations, one can sign-extendtheoperand.Signextensioncopiesthesignbit(D7)ofthelowerbyteofaregisterintotheupperbitsofthe32-bitregister,orcopiesthesignbitofa16-bitregisterinto32-bit.TheLDRSB(loadregistersignedbyte)instructionloadsintotheregisterabytefrommemoryandsignextendstheD7totheentire32-bitregister.TheLDRSH(loadregistersignedhalf-word) instruction loads into the register a half-word frommemory and signextendstheD15totheentire32-bitregister.Theyworkasfollows:

LDRSB loads into the destination register a byte frommemory and sign extends(copyD7,thesignflag)toall32bits.ThisisillustratedinFigure5-1.

Figure5-1:SignExtendingaByte

Lookatthefollowingexample:

;assumelocation0x80000has+96=01100000andR1=0x80000

LDRSBR0,[R1];nowR0=00000000000000000000000001100000

;assumelocation0x80000contains-2=11111110andR2=0x80000

LDRSBR4,[R2];nowR4=11111111111111111111111111111110

Ascanbeseenintheaboveexamples,LDRSBdoesnotalterthelower8bits.Thesignof8bitsiscopiedtotherestofthe32-bitregister.

LDRSHloadsthedestinationregisterwitha16-bitsignednumberandsign-extendsto the32-bit register. It copiesD15ofRd toallbitsof theRd register.This isused forsignedhalf-wordoperandandisillustratedinFigure5-2.

Figure5-2:SignExtendingaHalf-word

Lookatthefollowingexample:

;assume0x80000contains+260=0000000100000100andR1=0x80000

LDRSHR0,[R1];R0=00000000000000000000000100000100

Anotherexample:

;assumelocation0x20000has-327660=0x8002andR2=0x20000

LDRSHR1,[R2];R1=FFFF8002

Ascanbeseenintheaboveexamples,LDRSHdoesnotalterthelower16bits.Thesignof16bitsiscopiedtotherestofthe32-bitregister.Howcantheseinstructionshelpcorrect the overflow error?To answer that question,Example 5-13 showsExample 5-5rewrittentocorrecttheoverflowproblem.

Example5-13

WriteaprogramforExample5-5tohandletheoverflowproblem.

Solution:


ENTRY

LDRR1,=DATA1

LDRR2,=DATA2

LDRR3,=RESULT

LDRSBR4,[R1];R4=+96

LDRSBR5,[R2];R5=+70

ADDR4,R4,R5;R4=R4+R5=96+70=+166

STRR4,[R3];Store+166inlocationRESULT

HLBHL

DATA1DCB+96

DATA2DCB+70

AREAVARIABLES,DATA,READWRITE;ThefollowingisstoredinRAM

RESULTDCW0

END

ThefollowingisananalysisofthevaluesinExample5-13.Eachissign-extendedandthenaddedasfollows:

Sign Binarynumbers Decimal

0 0000000000000000000000001100000 +96aftersignext.

0 0000000000000000000000001000110 +70aftersignext.

0 0000000000000000000000010100110 +166

Asarule,ifthepossibilityofoverflowexists,allbyte-sizedsignednumbersshouldbesign-extendedintoaword,andsimilarly,allhalfword-sizedsignedoperandsshouldbesign-extended to a word before they are processed. This is shown in Program 5-1.Program5-1findstotalsumofagroupofsignednumberdata.

Program5-1

;Thisprogramcalculatesthesumofsignednumbers


ENTRY

LDRR0,=SIGN_DAT

MOVR3,#9

MOVR2,#0

LOOPLDRSBR1,[R0]

;LoadintoR1andsignextendit.


ADDR0,R0,#1;pointtonext


BNELOOP

LDRR0,=SUM

STRR2,[R0];StoreR2inlocationSUM

HEREBHERE

SIGN_DATDCB+13,-10,+19,+14,-18,-9,+12,-19,+16

AREAVARIABLES,DATA,READWRITE

SUMDCD0

END

Signednumbermultiplication

Signed number multiplication is similar in its operation to the unsignedmultiplication described in Chapter 3. The only difference between them is that theoperands in signed number operations can be positive or negative; therefore, the resultmust indicate the sign. InARMwehaveSMULL(signedmultiply long)butnoSMUL(signmultiply).Table5-2summarizessignednumbermultiplication;itissimilartoTable3-3inChapter3.SeeExamples5-14and5-15.

Multiplication Operand1 Operand2 Result

word×word Rm Rs RdHi=upper32-bit,RdLo=lower32-bit

Note:UsingSMULL(signedmultiplylong)forword×wordsmultiplicationprovidesthe64-bitresultinRdLoandRdHiregister.Thisisusedfor32-bit×32-bitnumbersinwhichresultcangobeyond0xFFFFFFFF.

Table5-2:SignedMultiplication(SMULLRdLo,RdHi,Rm,Rs)Summary

Example5-14

Observetheresultsofthefollowingmultiplicationofsignednumbers:

LDRR1,=-3500;R1=-3500(0xFFFFF254)

LDRR0,=-100;R0=-100(0xFFFFFF9C)

SMULLR2,R3,R0,R1

Solution:

-3500×-100=350,000=55730inhex.AfterexecutingtheaboveprogramR2andR3willcontain0x55730and00000000,respectively.

Example5-15

ThefollowingprogramissimilartoExample5-14.But,insteadofSMULL,theUMULLinstructionisused.Observetheresultsofthefollowingmultiplication:

LDRR1,=-3500;R1=-3500(0xFFFFF254)

MOVR0,#-100;R0=-100(0xFFFFFF9C)

UMULLR2,R3,R0,R1

Solution:

0xFFFFF254×0xFFFFFF9C=0xFFFFF1F000055730.Thus,R2andR3willcontain0x00055730and0xFFFFF1F0,respectively.Asyoucansee,theresultsoftheprogramsarecompletelydifferent.InthepreviousprogramtheSMULLinstructionconsiderstheoperandssignednumbersandtheresultoftwonegativenumbersbecomespositive.Asaresult,thesignbitbecomeszero,butinthisexampletheoperandsareconsideredasunsignednumbers.

Signednumbercomparison

InChapter4wesawthat theCMPinstructionaffects theZandCflags;usingtheflagswecomparedunsignednumbers.ThisinstructionaffectstheNandVflags,aswell;We can use flags Z, V, and N to compare signed numbers. The Z flag shows if thenumbersareequalornot.WhenthenumbersareequaltheZflagissettoone.NandVflagsshowiftheleftoperandisbiggerthantherightoperandornot.WhenNandVhavethesamevalue,thefirstoperandhasagreatervalue.

Insummary,afterexecutingtheinstructionCMPRn,Op2 theflagsarechangedasfollows:

Op2>RnV=N

Op2=RnZ=1

Op2<RnN≠V

Table 5-3 lists the branch instructions which check the Z, V, and N flags. TheinstructionscanbeusedtogetherwiththeCMPinstructiontocomparesignednumbers.

Instruction Action

BEQ branchequal BranchifZ=1

BNE Branchnotequal BranchifZ=0

BMI Branchminus(branchnegative) BranchifN=1

BPL Branchplus(branchpositive) BranchifN=0

BVS BranchifVset(branchoverflow) BranchifV=1

BVC BranchifVclear(branchifnooverflow) BranchifV=0

BGE Branchgreaterthanorequal BranchifN=V

BLT Branchlessthan BranchifN≠V

BGT Branchgreaterthan BranchifZ=0andN=V

BLE Branchlessthanorequal BranchifZ=1orN≠V

Table5-3:ARMConditionalBranch(Jump)InstructionsforSignedData

Program5-2findsthelowestnumberamongalistofnumbers.

Program5-2

;Findingthelowestofsignednumbers


ENTRY

LDRR0,=SIGN_DAT

MOVR3,#8

LDRSBR2,[R0]

;bringintoR2thefirstsignnumberandsignextendit

ADDR0,R0,#1;pointtonext

BEGINLDRSBR1,[R0];R1=contentsofloc.pointedtobyR0

CMPR1,R2;compareR1andR2

BGENEXT

;branchtoNEXTifR1isgreaterthanorequaltoR2

MOVR2,R1

;R2=R1(usethenewnumberforcomparison)

NEXTADDR0,R0,#1;pointtonext


BNEBEGIN;ifR3isnotzerobranchBEGIN

LDRR0,=LOWEST;R0=addressofLOWEST

STRR2,[R0];storeR2inlocationSUM

HEREBHERE

SIGN_DATDCB+13,-10,+19,+14,-18,-9,+12,-19,+16

AREAVARIABLES,DATA,READWRITE

LOWESTDCD0

END

;Noticethatweusethefirstsignnumberasbasisfor

;comparisonandkeepcomparingitwithothernumbers.

;Anytimeanyofthenumberissmallerweuseitasbasis

;forcomparison.

CMNinstruction

CMNRn,Op2

In ARM we have two compare instructions: CMP and CMN. While the CMPinstructionsetstheflagsbysubtractingthesourceoperandfromthedestination,theCMNsetstheflagsbyaddingsourcetodestination.AsaresultCMNcomparesthedestinationoperandwiththenegativeofthesourceoperand:

destination>(-1×source)V=N

destination=(-1×source)Z=1

destination<(-1×source)N=negationofV

When the source operand is an immediate value, the instructions can be usedinterchangeably.Example5-16isanexampleofusingtheCMNinstruction.

Example5-16

AssumingR5hasapositivevalue,writeaprogramthatfindsitsnegativematchinanarrayofdata(OUR_DATA).

Solution:


ENTRY

MOVR5,#13

LDRR0,=OUR_DATA

MOVR3,#9

BEGIN

LDRSBR1,[R0];R1=contentsofloc.pointedtobyR0

;(signextended)

CMNR1,R5;compareR1andnegativeofR5

BEQFOUND;branchifR1isequaltonegativeofR5

ADDSR0,R0,#1;incrementpointer


BNEBEGIN;ifR3isnotzerobranchBEGIN

NOT_FOUNDBNOT_FOUND

FOUNDBFOUND

OUR_DATADCB+13,-10,-13,+14,-18,-9,+12,-19,+16

END

IntheaboveprogramR5isinitializedwith13.Therefore,itfinishessearchingwhenitgetsto– 13.

Arithmeticshift

AswasdiscussedinChapter3,therearetwotypesofshifts:logicalandarithmetic.Logical shift, which is used for unsigned numbers, was discussed in Chapter 3. Thearithmetic shift is used for signednumbers. It is basically the same as the logical shift,exceptthatthesignbitiscopiedtotheshiftedbits.

ASR(arithmeticshiftright)

MOVRn,Op2,ASRcount

AsthebitsofthesourceareshiftedtotherightintoC,theemptybitsarefilledwiththesignbit.OnecanusetheASRinstructiontodivideasignednumberby2,asshownbelow:

MOVR0,#-10;R0=-10=0xFFFFFFF6

MOVR3,R0,ASR#1;R0isarithmeticshiftedrightonce

;R3=0xFFFFFFFB=-5

ReviewQuestions

1.Explainthedifferencebetweenanoverflowandacarry.

2.ExplainthepurposeoftheLDRSBandLDRSHinstructions.DemonstratetheeffectofLDRSBonR0=0xF6.DemonstratetheeffectofLDRSHonR1=0x124C.

3.Theinstructionforsignedmultiplicationis________.

4.Foreachofthefollowinginstructions,indicatetheflagconditionnecessaryforeachbranchtooccur:(a)BLE(b)BGT

Section5.3:IEEE754Floating-PointStandardsInthissectionwestudytheIEEEstandardforfloating-pointnumbers.

IEEEfloating-pointstandard

Uptothelate1970s,realnumbers(numberswithdecimalpoints)wererepresenteddifferentlyinbinaryformbydifferentcomputermanufacturers.Thismademanyprogramsincompatible for different machines. In 1980, an IEEE committee standardized thefloating-point data representation of real numbers. This standard, much of which wascontributedbyIntelbasedonthe8087mathcoprocessor,recognizedtheneedfordifferentdegreesofprecisionbydifferentapplications;therefore,itestablishedsingleprecisionanddoubleprecision.Sincealmostallsoftwareandhardwarecompaniesnowabidebythesestandards,eachoneisexplainedthoroughly.

IEEE754single-precisionfloating-pointnumbers

IEEEsingle-precision floating-pointnumbersuseonly32bitsofdata to representany real number range 2128 to 2-126, for both positive and negative numbers. Thistranslatesapproximatelytoarangeof1.2×10-38to3.4×10+38indecimalnumbers,againforbothpositiveandnegativevalues.Insomemathcoprocessorterminology,thesesingle-precision32-bitfloating-pointnumbersarereferredtoasshortreal.Assignmentofthe32bitsinthesingle-precisionformatisshowninFigure5-3.

Figure5-3:IEEE754Single-precisionFloating-pointNumbers

Tomakethehardwaredesignofthemathprocessorsmucheasierandlesstransistorconsuming, the exponent part is added to a constant of 0x7F (127 decimal). This isreferred to as a biased exponent. Conversion from real to floating point involves thefollowingsteps.

1.Therealnumberisconvertedtoitsbinaryform.

2.Thebinarynumberisrepresentedinscientificform:1.xxxxEyyyy

3.Bit31iseither0forpositiveor1fornegative.

4.Theexponentportion,yyyy,isaddedto7Ftogetthebiasedexponent,whichisplacedinbits23to30.

5.Thesignificand,xxxx,isplacedinbits22to0.

Examples5-17,5-18,and5-19demonstratethisprocess.

Example5-17

Convert9.7510tosingle-precision(shortreal)floatingpoint.

Solution:

decimal9.75=binary1001.11=scientificbinary1.00111E3

Signbit31is0forpositive.

Exponentbits30to23are10000010(3+7F=82H)afterbiasing.

Significandbits22to0are001110000000000000000…00.

Puttingitalltogethergivesthefollowingbinaryform,underwhichiswrittenthehexform:

01000001000111000000000000000000

411C0000

ThiscanbeverifiedbyusinganassemblersuchasKeil.

Example5-18

Convert0.07812510toshortIEEEfloating-pointstandardrealFP(singleprecision).

Solution:

decimal0.078125=binary0.000101=scientificbinary1.01E–4

Signbit31is0forpositive.

Exponentbits30–23are01111011(–4+7F=7B)afterbiasing.

Significandbits22–0are01000000….000.

Thisnumberwillberepresentedinbinaryandhexas

00111101101000000000000000000000

3DA00000

Example5-19

Convert–96.2710tosingle-precisionFPformat.

Solution:

decimal96.27=binary1100000.01000101000111101=

scientificbinary1.10000001000101000111101E6

Signbit31is1fornegative.

Exponentbits30–23are10000101(6+7F=85H)afterbiasing,

Fractionbits22–0are10000001000101000111101,

Thefinalforminbinaryandhexis

11000010110000001000101000111101

C2C08A3D

Itmustbenotedthatconversionofthedecimalportion0.27tobinarycanbecontinuedbeyondthepointshownabove,butbecausethefractionpartofthesingleprecisionislimitedto23bits,thiswasallthatwasshown.Forthatreason,double-precisionFPnumbersareusedinsomeapplicationstoachieveahigherdegreeofaccuracy.

IEEE754double-precisionfloating-pointnumbers

Double-precisionFP(calledlongrealbyIntel)canrepresentnumbersintherange2.3×10-308to1.7×10308,bothpositiveandnegative.Atotalof52bits(bits0to51)areforthesignificand,11bits(bits52to62)arefortheexponent,andfinally,bit63isforthesign.Theconversionprocess is the sameas for singleprecision in that the realnumbermustfirstberepresentedas1.xxxxxxxEYYYY,thenYYYYisaddedto3FFtogetthebiasedexponent.SeeFigure5-4andExample5-20.

Figure5-4:IEEE754Double-precisionFloating-pointNumbers

Example5-20

Convert152.187510todouble-precisionFP.

Solution:

decimal152.1875=binary10011000.0011=

scientificbinary1.00110000011E7

Bit63is0forpositive.

Exponentbits62–53are10000000110(7+3FF=406)afterbiasing.

Fractionbits52–0are00110000011000…..000.

0100 0000 0110 0011 0000 0110 0000 0000 0000 … 0000

4 0 6 3 0 6 0 0 0 … 0

Thisexamplewillbeverifiedbyanassemblerinthenextsection.

MathcoprocessorinARM

Usingageneral-purposemicroprocessorsuchastheARMtoperformmathematicalfunctionssuchasfloatingpointcalculation,log,sine,andothersisverytimeconsuming,notonlyfortheCPUbutalsoforprogrammerswritingsuchprograms.Intheabsenceofamath coprocessor, programmers must write subroutines using ARM instructions formathematicalfunctions.Althoughsomeofthesesubroutinesarealreadywritten,nomatterhow good the subroutine, its CPU run time will still be quite long. To acceleratecomplicated mathematical and floating point calculation, math or floating pointcoprocessorsareused.Asimplervariationofmathcoprocessoriscalledfloatingpointunit(FPU) and someARMchips comewith on-chip FPU. InKeil there is a library namedfplibtodofloatingpointcalculation.

ReviewQuestions

1.Single-precisionIEEEFPstandarduses_________bitstorepresentdata.

2.Double-precisionIEEEFPstandarduses________bitstorepresentdata.

3.TogetthebiasedexponentportionofIEEEsingle-precisionfloating-pointdataweadd________.

4.TogetthebiasedexponentportionofIEEEdouble-precisionfloating-pointdataweadd________.

5.Trueorfalse.Intheabsenceofamathprocessor,thegeneral-purposeprocessormustperformallmathcalculations.

6.Trueorfalse.AlloftheARMchipscomewiththeFPU.

Problems:Section5.1:SignedNumbersConcept

1.Showhowthe32-bitcomputerswouldrepresentthefollowingnumbersandverifyeachwithacalculator.

(a)-23 (b)+12 (c)-0x28

(d)+0x6F (e)-128 (f)+127

(g)+365 (h)-32,767

2.Showhowthe32-bitcomputerswouldrepresentthefollowingnumbersandverifyeachwithacalculator.

(a)-230 (b)+1200 (c)-0x28F

(d)+0x6FF

Section5.2:SignedNumberInstructionsandOperations

3.FindtheoverflowflagforeachcaseandverifytheresultusinganARMIDE.Dobyte-sizedcalculationonthem.

(a)(+15)+(-12) (b)(-123)+(-127) (c)(+0x25)+(+34)

(d)(-127)+(+127) (e)(+100)+(-100)

4.Sign-extendthefollowingandwritesimpleprogramsinusingARMIDEtoverifythem.

(a)-122 (b)-0x999 (c)+0x17

(d)+127 (e)-129

5.ModifyProgram5-2tofindthehighesttemperature.Verifyyourprogram.

Section5.3:IEEE754Floating-PointStandards

6.Whatisthedisadvantageofusingageneral-purposeprocessortoperformmathoperations?

7.ShowthebitassignmentoftheIEEEsingle-precisionstandard.

8.Convert(byhandcalculation)eachofthefollowingrealnumberstoIEEEsingle-precisionstandard.

(a)15.575(b)89.125(c)–1022.543(d)–0.00075

9.ShowthebitassignmentoftheIEEEdouble-precisionstandard.

10.Insingle-precisionFP(floatingpoint),thebiasedexponentiscalculatedbyadding________tothe_________portionofascientificbinarynumber.

11.Indouble-precisionFP,thebiasedexponentiscalculatedbyadding________tothe_________portionofascientificbinarynumber.

12.Convertthefollowingtodouble-precisionFP.

(a)12.9375(b)98.8125


1.D7,D15,andD31for32-bitsigneddata.

2.0x16=00010110;its2’scomplementis:11101010

3.–128to+127;–32,768to+32,767(decimal)

4.-2,147,483,648to+2,147,483,647

5.0x500000=010100000000000000000000;

Its2’scomplementis:101100000000000000000000

Section5.2

1.Cflagisraisedwhenthereisacarryoutoftheresult,butVflagisraisedwhenthereisacarrytothesignbitandnocarryoutofthesignbitorwhenthereisnocarrytothesignbitandthereisacarryoutofthesignbit.CflagisusedtoindicateoverflowinunsignedarithmeticoperationswhileVflagisinvolvedinsignedoperations.

2.TheLDRSBinstructionsignextendsthesignbitofabyteintoaword;theLDRSHinstructionsignextendsthesignbitofahalf-wordintoaword.

In0xF6thesignbitis1;thus,itissign-extendedinto0xFFFFFFF6

0x124Csign-extendedintoR1wouldbeR0=0x0000124C.

3.SMULL

(a)BLEwilljumpifVistheinverseofN,orifZ=1.

(b)BGTwilljumpifVequalsN,andifZ=0.

Section5.3

1.32

2.64

3.0x7F

4.0x3FF

5.True

6.False

Chapter6:ARMMemoryMap,MemoryAccess,andStackThis chapter discusses the issue of memory access and the stack. Section 6.1 is

dedicatedtoARMmemorymapandmemoryaccess.Wewillalsoexplaintheconceptsofalign,non-align,littleendian,andbigendiandataaccess.InSection6.2,weexaminetheuseofthestackinARM.Wediscussthebit-addressable(bit-band)SRAMandperipheralsinSection6.3.AdvancedindexedaddressingmodeisexplainedinSection6.4.InSection6.5,wedescribethePCrelativeaddressingmodeanditsuse in implementingADRandLDR.

Section6.1:ARMMemoryMapandMemoryAccessTheARMCPUuses32-bitaddressestoaccessmemoryandperipherals.Thisgives

us amaximum of 4GB (gigabytes) ofmemory space. This 4GB of directly accessiblememoryspacehasaddresses0x00000000to0xFFFFFFFF,meaningeachbyteisassignedauniqueaddress(ARMisabyte-addressableCPU).SeeFigure6-1.

Figure6-1:MemoryByteAddressinginARM

The4GBofmemoryspaceisdividedintothreeregions:code,data,andperipheraldevices.SeeTable6-1.

Addressrange Name Description

0x00000000-0x1FFFFFFF Code ROMorFlashmemory

0x20000000-0x3FFFFFFF SRAM SRAMregionusedforon-chipRAM

0x40000000-0x5FFFFFFF Peripheral On-chipperipheraladdressspace

0x60000000-0x9FFFFFFF RAM Memory,cachesupport

0xA0000000-0xDFFFFFFF Device Sharedandnon-shareddevicespace

0xE0000000-0xFFFFFFFF System PPBandvendorsystemperipherals

Table6-1:MemorySpaceAllocationinARMCortex

In other words, the ARM uses the memory mapped I/O. For the ARMmicrocontrollers,generallytheFlashROMisusedforprogramcode,SRAMforscratchpad data, and memory-mapped I/O ports for peripherals. While there is an absolutestandard for the ARM instructions that all licensees of ARMmust follow, there is no

standardforexactlocationsandtypesofmemoryandperipherals.Thereforethelicenseescanimplementthememoryandperipheralsastheychoose.ForthisreasontheamountandtheaddresslocationsofmemoryusedbyFlashROM,SRAM,andI/Operipheralsvariesamong the familymembers and chipmanufacturers. TheARMmanufacturer datasheetshouldgiveyouthedetailsofthememorymapforbothon-chipandoff-chipmemoryandperipherals.MakesuretoexaminethememorymapofagivenARMchipbeforeyoustarttoprogramit.FromTable6-1,noticethatsomeofthememoryaddressesaresetasideforthe external (off-chip) memory and peripherals. At the time of this writing, no ARMmanufacturerhaspopulatedtheentire4GBofmemoryspacewithon-chipROM,RAM,andI/Operipherals.

ARM-basedMotherboards

InARM systemsforMicrosoftWindows,Unix,andAndroidoperatingsystemstheARMmotherboardsuseDRAMfortheRAMmemory,justlikethex86andPentiumPCs.AstheARMCPUispushedintothelaptop,desktop,andtabletsPCs,andthehighendofembedded systems products such as routers,wewill see the use ofDRAM as primarymemory to store both the operating systems and the applications. In such systems, theFlashmemorywill beholding thePOST (poweron self test),BIOS (basic Input/outputsystems) andboot programs. Just likex86 system, such systemshavebothon-chip andoff-chiphighspeedSRAMforcache.Currently,thereareARMchipsonthemarketwithsome on-chip Flash ROM, SRAM, and memory decoding circuitry for connection toexternal(off-chip)memory.Thisoff-chipmemorycanbeSRAM,Flash,orDRAM.Thedatasheet for suchARMchipsprovide thedetailsofmemorymap forbothon-chipandoff-chipmemories.Next,weexaminetheARMbusesandmemoryaccess.

Figure6-2:MemoryConnectionBlockDiagraminARM

D31–D0Databus

The32-bitdatabusoftheARMprovidesthe32-bitdatapathtotheon-chipandoff-chipmemoryandperipherals.Theyaregroupedinto8-bitdatachunks,D0–D7,D8–D15,D16–D23,andD24–D31.

A31–A0

These signalsprovide the32-bit addresspath to theon-chipandoff-chipmemoryandperipherals.SincetheARMsupportsdataaccessofbyte(8bits),halfword(16bits),and word (32 bits), the buses must be able to access any of the 4 banks of memoryconnectedtothe32-bitdatabus.TheA0andA1areusedtoselectoneofthe4bytesoftheD31-D0databus.SeeFigure6-3.

Figure6-3:MemoryBlockDiagraminARM

AHBandAPBbuses

TheARMCPUisconnected to theon-chipmemoryviaanAHB(advancedhigh-performancebus).TheAHBisusednotonlyforconnectiontoon-chipROMandRAM,itisalsoused forconnection to someof thehighspeed I/Os (input/output) suchasGPIO(general purpose I/O). ARM chip also has the APB (advanced peripherals bus) busdedicated for communication with the on-chip peripherals such as timers, ADC, serialCOM,SPI, I2C,andotherperipheralports.Whileweneed the32-bitdatabusbetweenCPUand thememory (RAMandROM),many slower peripherals are 8 or 16 bits andthere is noneed for entire fast 32-bit databuspathway.For this reason,ARMuses theAHB-to-APBbridgetoaccessthesloweron-chipdevicessuchasperipherals.Alsosinceperipheralsdonotneedahighspeedbus,abridgebetweenAHBandAPBallowsgoingfromthehigherspeedbusofAHBtolowerspeedbusofperipherals.TheAHBbusallowsasingle-cycleaccess.SeeFigure6-4forAHB-to-APBbridge.

Figure6-4:AHBandAPBinARM

Buscycletime

ToaccessadevicesuchasmemoryorI/O,theCPUprovidesafixedamountoftimecalledabuscycletime.Duringthisbuscycletime,thereadorwriteoperationofmemoryorI/Omustbecompleted.ThebuscycletimeusedforaccessingmemoryisoftenreferredtoasMC(memorycycle)time.ThetimefromwhentheCPUprovidestheaddressesatitsaddresspinstowhenthedataisexpectedatitsdatapinsiscalledmemoryreadcycletime.Whileforon-chipmemorythecycletimecanbe1clock,intheoff-chipmemorythecycletimeisoften2clocks.IfmemoryisslowanditsaccesstimedoesnotmatchtheMCtimeoftheCPU,extratimecanberequestedfromtheCPUtoextendthereadcycletime.Thisextratimeiscalledawaitstate(WS).Inthe1980s,theclockspeedformemorycycletimewasthesameastheCPU’sclockspeed.Forexample,inthe20MHzprocessors,thebuseswereworkingatthesamespeedof20MHz.Thisresultedin2×50ns=100nsforthememorycycletime(1/20MHz=50ns).SeeExample6-1.

Example6-1

Calculatethememorycycletimeofa50-MHzbussystemwith

(a)0WS,

(b)1WS,and

(c)2WS.

Assumethatthebuscycletimeforoff-chipmemoryaccessis2clocks.

Solution:

1/50MHz=20nsisthebusclockperiod.Sincethebuscycletimeofzerowaitstatesis2clocks,wehave:

Memorycycletimewith0WS2×20=40ns

Memorycycletimewith1WS40+20=60ns

Memorycycletimewith2WS40+2×20=80ns

It ispreferred that all bus activitiesbe completedwith0WS.However, if the readandwriteoperationscannotbecompletedwith0WS,werequestanextensionofthebuscycletime.ThisextensionisintheformofanintegernumberofWS.Thatis,wecanhave1,2,3,andsoonWS,butnot1.25WS.

WhentheCPU’sspeedwasunder100MHz,thebusspeedwascomparabletotheCPU speed. In the 1990s theCPU speed exploded to 1GHz (gigahertz)while the busspeedmaxedoutataround200MHz.ThegapbetweentheCPUspeedandthebusspeedisoneofthebiggestproblemsinthedesignofhigh-performancesystems. ToavoidtheuseoftoomanywaitstatesininterfacingmemorytoCPU,cachememoryandotherhigh-speedDRAMsareused.ThesearediscussedinChapters9and10.

Busbandwidth

The rate of data transfer is generally called bus bandwidth. In other words, busbandwidth is a measure of how fast buses transfer information between the CPU andmemoryorperipherals.Thewiderthedatabus, thehigherthebusbandwidth.However,theadvantageofthewiderexternaldatabuscomesatthecostofincreasingthediesizeforsystem on-chip (SOC) or the printed circuit board size for off-chipmemory. Now youmightaskwhyweshouldcarehowfastbusestransferinformationbetweentheCPUandoutside, as long as theCPU isworking as fast as it can. The problem is that theCPUcannotprocess information that it doesnothave. Inotherwords, the speedof theCPUmustbematchedwiththehigherbusbandwidth;otherwise,thereisnouseforafastCPU.ThisislikedrivingaPorscheorFerrari infirstgear; it isaterribleunderusageofCPUpower.Bus bandwidth ismeasured inMB (megabytes) per second and is calculated asfollows:

busbandwidth=(1/buscycletime)×buswidthinbytes

In the above formula, bus cycle time can be for bothmemory and I/O since theARMusesthememorymappedI/O.Example6-2clarifiestheconceptofbusbandwidth.As can be seen from Example 6-2, there are twoways to increase the bus bandwidth:Eitheruseawiderdatabusorshortenthebuscycletime(ordoboth).Thatisexactlywhatmanyprocessorshavedone.Again, itmustbenotedthatalthoughtheprocessor’sspeedcango to1GHzorhigher, thebusspeed foroff-chipmemory is limited toaround200MHz.Thereasonforthisisthatthesignalsbecometoonoisyforthecircuitboardiftheyareabove100MHz.

Example6-2

CalculatememorybusbandwidthforthefollowingCPUifthebusspeedis100MHz.

(a)ARMThumbwith0WSand1WS(16-bitdatabus)

(b)ARMwith0WSand1WS(32-bitdatabus)

Assumethatthebuscycletimeforoff-chipmemoryaccessis2clocks.

Solution:

Thememorycycletimeforbothis2clocks,withzerowaitstates.Withthe100MHzbusspeedwehaveabusclockof1/100MHz=10ns.

(a)Busbandwidth=(1/(2×10ns))×2bytes=100Mbytes/second(MB/s)

With1waitstate,thememorycyclebecomes3clockcycles

3×10=30nsandthememorybusbandwidthis=(1/30ns)×2bytes=66.6MB/s

(b)Busbandwidth=(1/(2×10ns))×4bytes=200MB/s

With1waitstate,thememorycyclebecomes3clockcycles

3×10=30nsandthememorybusbandwidthis=(1/30ns)×4bytes=126.6MB/s

Fromtheaboveitcanbeseenthatthetwofactorsinfluencingbusbandwidthare:

1.Theread/writecycletimeoftheCPU

2.Thewidthofthedatabus

Codememoryregion

The 4 GB of ARMmemory space is organized as 1G × 32 bits since the ARMinstructionsare32-bit.TheinternaldatabusoftheARMis32-bit,allowingthetransferofone instruction into theCPUeveryclockcycle.This isoneof thebenefitsof theRISCfixedinstructionsize.Thefetchingofaninstructionineveryclockcyclecanworkonlyifthecodeiswordaligned,meaningeachinstructionisplacedatanaddresslocationendingwith0,4,8,orC.Example6-3showstheplacementofcodeinARMmemory.Noticethatthecodeaddressesgoupby4sincetheARMinstructionsarefixedat4byteseach.Whilecompilersensurethatcodesarewordaligned,itisjoboftheprogrammertomakesurethedatainSRAMiswordalignedtoo.Wewillexaminethisimportanttopicsoon.

Example6-3

CompileanddebugthefollowingcodeinKeilandseetheplacementofinstructionsinmemorylocations.

AREAARMex,CODE,READONLY

ENTRY

MOVR2,#0x00;R2=0x00

MOVR3,#0x35;R3=0x35

ADDR4,R3,R2

END;Markendoffile

Solution

Asyoucanseeinthefigure,thefirstMOVinstructionstartsfromlocation0x00000000,thesecondMOVinstructionstartsfromlocation0x00000004andtheADDinstructionstartsfromlocation0x00000008.

Thefollowingimagedisplaysthefirstlocationsofmemory.ThecodeofthefirstMOVinstructionislocatedinthefirstword(fourbytes)ofmemorywhichiswordaligned.Thesameruleappliesfortheotherinstructions.NotethatthecodeofMOVR2,0isE3A02000but0020A0E3isstoredinthememory.Wewilldiscussthereasoninthischapterwhenwefocusontheconceptofbigendianandlittleendian.

SRAMmemoryregion

AsectionofthememoryspaceisusedbySRAM.TheSRAMcanbeon-chiporoff-chip (external).The sameway that everyARMchip has someon-chipFlashROM forcode,aportionofthememoryregionisusedbytheon-chipSRAM.Thison-chipSRAMisusedbytheCPUforscratchpadtostoreparameters.ItisalsousedbytheCPUforthepurposeofthestack.WeexaminethestackusagebytheARMinthenextsection.Inusing

theSRAMmemory for storingparameters,wemustbecarefulwhen loadingor storingdata in the SRAM lest we use unaligned data access. Next, we discuss this importantissue.

DatamisalignmentinSRAM

ThecaseofmisaligneddatahasamajoreffectontheARMbusperformance.Ifthedataisaligned,foreverymemoryreadcycle, theARMbringsin4bytesofinformation(data or code) using theD31–D0 data bus. Such data alignment is referred to aswordalignment. Tomake dataword aligned, the least significant digits of the hex addressesmustbe0,4,8,orC(inhex).

Whilethecompilersmakesurethatprogramcodes(instructions)arealwaysaligned(Example 6-3), it is the placement of data in SRAM by the programmer that can benonaligned and therefore subject to memory access penalty. In other words, the singlecycleaccessofmemoryisalsousedbyARMtobringintoregisters4bytesofdataeveryclockcycleassumingthatthedataisaligned.Tomakesurethatdataarealsoalignedweusethealigndirective.TheuseofaligndirectiveforRAMdatamakessurethateachwordislocatedatanaddresslocationendingwithaddressof0,4,8,orC.Ifourdataiswordsize(usingDCDUdirective)thentheuseofaligndirectiveatthestartofthedatasectionguarantiesallthedataplacementswillbewordaligned.WhenawordsizedataisdefinedusingtheDCDdirective,theassembleralignsittobewordaligned.

Accessingnon-aligneddata

As we have stated many times before, ARM defines 32-bit data as a word. Theaddressofawordcanstartatanyaddresslocation.Forexample,intheinstruction“LDRR1,[R0]”ifR0=0x20000004,theaddressofthewordbeingfetchedintoR1startsatanalignedaddress.Inthecaseof“LDRR1,[R0]”ifR0=0x20000001theaddressstartsatanon-aligned address. In systems with a 32-bit data bus, accessing a word from a non-alignedaddressedlocationcanbeslower.Thisissueisimportantandappliestoall32-bitprocessors.

In the 8-bit system, accessing a word (4 bytes) is treated like accessing fourconsecutive bytes regardless of the address location. Since accessing a byte takes onememory cycle, accessing 4 bytes will take 4 memory cycles. In the 32-bit system,accessingawordwithanalignedaddresstakesonememorycycle.ThatisbecauseeachbyteiscarriedonitsowndatapathofD0–D7,D8–D15,D16–D23,andD24–D31inthesamememorycycle.However,accessingawordwithanon-alignedaddressrequirestwomemory cycles. For example, see how accessing theword in the instruction “LDRR1,[R0]” works as shown in Figure 6-5. As a case of aligned data, assume that R0 =0x80000000. In this instruction, 4 bytes contents of memory locations 0x80000000through 0x80000003 are being fetched in one cycle. In only one cycle, theARMCPUaccesseslocations0x80000000through0x80000003andputsitinR1.

Figure6-5:MemoryAccessforAlignedandNon-alignedData

Now assuming that R0 = 0x80000001 in this instruction, 8 bytes contents ofmemorylocations0x80000000through0x80000007arebeingfetchedintwoconsecutivecyclesbutonly4bytesofitareused.Inthefirstcycle,theARMCPUaccesseslocations0x80000000 through 0x80000003 and puts them in R1 only the desired three bytes oflocations0x800000001through0x80000003.Inthesecondcycle,thecontentsofmemorylocations 0x8000004 through 0x80000007 are accessed and only the desired byte of0x80000004isputintoR1.SeeExample6-4.

Example6-4

Showthedatatransferofthefollowingcasesandindicatethenumberofmemorycycletimesittakesfordatatransfer.AssumethatR2=0x4598F31E.

LDRR1,=0x40000000;R1=0x40000000

LDRR2,=0x4598F31E;R2=0x4598F31E

STRR2,[R1];StoreR2tolocation0x40000000

ADDR1,R1,#1;R1=R1+1=0x40000001


ADDR1,R1,#1;R1=R1+1=0x40000002


ADDR1,R1,#1;R1=R1+1=0x40000003


Solution:

ForthefirstSTRR2,[R1]instruction,theentire32bitsofR2isstoredintolocationswith

addresses of 0x40000000, 0x40000001, 0x40000002, and 0x40000003, The 4-bytecontent of register R2 is stored into memory locations with starting address of0x40000000 via the 32-bit data bus of D31–D0. This address is word aligned sinceaddress of the least significant digit is 0. Therefore, it takes only onememory cycle totransferthe32-bitdata.

ForthesecondSTRR2,[R1]instruction,inthefirstmemorycycle,thelower24bitsofR2is stored into locations 0x40000001, 0x40000002, and 0x40000003. In the secondmemorycycle,theupper8bitsofR2isstoredintothe0x40000004location.

ForthethirdSTRR2,[R1]instruction,inthefirstmemorycycle,thelower16bitsofR2isstoredintolocations0x40000002and0x40000003.Inthesecondmemorycycle,theupper16bitsofR2isstoredintolocations0x40000004and0x40000005.

ForthefourthSTRR2,[R1]instruction,inthefirstmemorycycle,thelower8bitsoftoR2isstoredintolocations0x40000003.Inthesecondmemorycycle,theupper24bitsofR2isstoredintothelocations0x40000004,0x40000005,and0x40000006.

Thelessontobe learnedfromthis is to trynot toputanywordsonanon-alignedaddress location ina32-bit system. Indeed this is so important thatdirectiveALIGN isspecificallydesignedforthispurpose.Next,wediscusstheissueofaligneddata.

UsingLDRinstructionwithDCDandALIGNdirectives

TheDCDandDCDUdirectivesareusedfor32-bit(word)data.TheDCDdirective

ensures32-bitdatatypesarealigned,incontrasttoDCDUwhichdoesnot.DCDisusedasfollows:

VALUE1DCD0x99775533

This ensures that VALUE1, a word-sized operand, is located in a word alignedaddress location. Therefore, an instruction accessing it will take only a singlememorycycle.SinceperformanceoftheCPUdependsonhowfastitcanfetchthedatawemustensure thatanymemoryaccess reading32-bitdata isdone inasingleclockcycle.Thismeanswemustmakesureall32-bitdataarewordaligned.ThisissoimportantthatARMhas an interrupt (exception) dedicated tomisaligneddata,meaning any time it accessesmisaligned data, it lets us know that there is a problem. The one-time use of ALIGNdirectiveatthebeginningofdataareausingDCDUmakesthedataalignedforthatgroupofdata.

UsingLDRHwithDCWandALIGNdirectives

Theproblemofmisaligneddataisalsoanissuewhenthedatasizeisinhalf-words(16-bit).InmanycasesusingDCWU,wemustusetheALIGNdirectivemultipletimesinthedataareaofagivenprogramtoensuretheyarealigned.ThisisincontrasttotheDCWdirectivewhichensuresdatatypetobehalf-wordaligned.Thisisespeciallythecasewhenwe use the LDRH instruction. See Example 6-5. Aligned data is also an issue for theThumbversionoftheARM.

Example6-5

ShowthedatatransferofthefollowingLDRHinstructionsandindicatethenumberofmemorycycletimesittakesfordatatransfer.

LDRR1,=0x80000000;R1=0x80000000

LDRR3,=0xF31E4598;R3=0xF31E4598

LDRR4,=0x1A2B3D4F;R4=0x1A2B3D4F

STRR3,[R1]

;(STRR3,[R1])storesR3tolocation0x80000000

STRR4,[R1,#4]

;(STRR4,[R1+4])storesR4tolocation0x80000004

LDRHR2,[R1]

;loadstwobytesfromlocation0x80000000toR2

LDRHR2,[R1,#1]


LDRHR2,[R1,#2]


LDRHR2,[R1,#3]


Solution:

IntheLDRHR2,[R1]instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000000and0x80000001areusedtogetthe16bitstoR2.Thisaddressishalfwordalignedsincetheleastsignificantdigitis0.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00004598

FortheLDRHR2,[R1,#1],instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000001and0x80000002areusedtogetthe16bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00001E45.

FortheLDRHR2,[R1,#2],instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000002and0x80000003areusedtogetthe16bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x0000F31E.

FortheLDRHR2,[R1,#3]instruction,inthefirstmemorycycle,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000003isusedtogetthelower8bitstoR2.Inthesecondmemorycycle,theaddresslocations0x80000004,0x80000005,0x80000006,and0x80000007areaccessedwhereonlythe0x80000004locationisusedtogettheupper8bitstoR2.Now,R2=0x00004FF3.

UsingLDRBwithDCBandALIGNdirectives

Theproblemofmisaligneddatadoesnotexistwhenthedatasizeisbytes.Incasessuch as using the string of ASCII characters with theDCB directive, accessing a bytetakes the same amount of time (one memory cycle) as an aligned word (4 bytes),regardlessoftheaddresslocationofthedata.TheonlyproblemwiththeLDRBisitbringsintotheCPUonlyasinglebyteofdataineachmemorycycleinsteadof4bytesifLDRis.SeeExample6-6.

Example6-6

ShowthedatatransferofthefollowingLDRBinstructionsandindicatethenumberofmemorycycletimesittakesfordatatransfer.

LDRR1,=0x80000000;R1=0x80000000

LDRR3,=0xF31E4598;R3=0xF31E4598

LDRR4,=0x1A2B3D4F;R4=0x1A2B3D4F

STRR3,[R1]

;StoreR3tolocation0x80000000

STRR4,[R1,#4]

;(STRR4,[R1+4])StoreR4tolocation0x80000004

LDRBR2,[R1]

;loadonebytefromlocation0x80000000toR2

LDRBR2,[R1,#1]

;(LDRBR2,[R1+1])loadonebytefromlocation0x80000001

LDRBR2,[R1,#2]


LDRBR2,[R1,#3]


Solution:

IntheLDRBR2,[R1]instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000000isusedtogetthe8bitsto R2. Therefore, it takes only one memory cycle to transfer the data. Now,R2=0x00000098.

In the LDRB R2,[R1,#1] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000001isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00000045.

In the LDRB R2,[R1,#2] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000002isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x0000001E.

In the LDRB R2,[R1,#3] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000003isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x000000F3.

Peripheralregion

Table 6-1 showed a section of memory is set aside for peripherals. The type ofperipherals and memory address locations used is unique to a vendor. The ARMmanufacturersprovidethedetailsofmemorymapfortheperipherals.Ifyouexaminethemanufacturerdatasheetyouwillseethattheyusealignedaddressestomakesurethereisnoclockpenaltyforaccessingthem.

LittleEndianvs.BigEndianwar

In storing data, the ARM follows the little endian convention. The little endianplaces the least significant byte (little end of the data) in the low address and the bigendian is the opposite. The origin of the terms big endian and little endian is from aGulliver’sTravelsstoryabouthowaneggshouldbeopened:fromthebigendorthelittleend.ARMsupportsbothlittleandbigendian.InARMlittleendianisthedefault.SomeARMchipmanufacturersprovideanoptionforchangingittobigendian.SeeExample6-7tounderstandlittleendianandbigendiandatastorage.

Example6-7

Showhowdataisplacedafterexecutionofthefollowingcodeusing

a)littleendianand

b)bigendian.

LDRR2,=0x7698E39F;R2=0x7698E39F

LDRR1,=0x80000000

STRR2,[R1]

Solution:

a)Forlittleendianwehave:

Location80000000=(9F)

Location80000001=(E3)

Location80000002=(98)

Location80000003=(76)

b)Forbigendianwehave:

Location80000000=(76)

Location80000001=(98)

Location80000002=(E3)

Location80000003=(9F)

InExample6-7,noticehowtheleastsignificantbyte(thelittleendofthedata)0x9Fgoestothelowaddress0x80000000,andthemostsignificantbyteofthedata0x76goesto thehighaddress0x80000003.Thismeans that the little endof thedatagoes in first,hencethenamelittleendian.In theARMwithbigendianoptionenabled,data isstoredtheoppositeway:Thebigend(mostsignificantbyte)goesintothelowaddressfirst,andforthisreasonitiscalledbigendian.ManyofrecentRISCprocessorsallowselectionofmode,bigendianorlittleendian.

HarvardArchitectureandARM

In recent yearsmanyARMmanufacturers are using the Harvard architecture forARMCPUs.OldARM architectures up toARM7 useVonNeumann architecture. TheHarvardarchitecturefeedstheCPUwithbothcodeanddataatthesametimeviatwosetsofbuses,oneforcodeandonefordata.ThisincreasestheprocessingpoweroftheCPU

sinceitcanbringinmoreinformation.

Figure6-6:VonNeumannvs.HarvardArchitecture

ReviewQuestions

1.InARM,alltheinstructionsare___bytes?

2.Whomakessurethatinstructionarealignedonwordboundary?

3.InARM,the_____endianisthedefault.

4.A66 MHzsystemhasamemorycycletimeof_____nsifitisusedwithazerowaitstate.

5.Tointerfacea100 MHzprocessortoa50 nsaccesstimeROM,howmanywaitstatesareneeded?

6.Trueorfalse.ARMusesbigendianformatwhenispoweredup.

Section6.2:StackandStackUsageinARMThestack isa sectionofRAMusedby theCPU tostore information temporarily.

ThisinformationcouldbedataoranaddressorCPUregisterswhencallingasubroutine.Stack is alsowidely usedwhen executing an interrupt service routine (ISR). TheCPUneedsthisstorageareabecausethereareonlyalimitednumberofregisters.

Howstacksareaccessed

IfthestackisasectionofRAM,theremustbearegisterinsidetheCPUtopointtoit.IntheARMCPUtheregisterusedtoaccessthestackisR13.

ThestoringofCPUinformationsuchastheregistersonthestackiscalledaPUSH,and loading thecontentsof thestackback intoaCPUregister iscalledaPOP. Inotherwords,aregisterispushedontothestacktosaveitandpoppedoffthestacktoretrieveit.Thefollowingdescribeseachprocess.

Pushingontothestack

Thestackpointer(SP)pointstothetopofthestack(TOS).IntheARMregisterR13isdesignatedasstackpointer.Aswepush(store)dataontothestack,thedataaresavedinSRAM (wheretheSPpointsto)andSPmustbedecremented(orincremented)topointtothe next location. In the ARM we have a choice of either incrementing the SP ordecrementing it.Notice that this is different frommany othermicroprocessors, notablyx86processors, inwhich theSPisdecrementedautomaticallywhendata ispushedontothe stack.Again itmust be emphasized thatwhile in traditionalCPUs such as x86, thestackpointerisdecrementedautomaticallybytheCPUitself,intheARMCPUwemustactuallycodetheinstructionforstackpointerdecrementation(orincrementation)forthatmatter.Therefore,topusharegisterontostackweusetheSTRandSUBinstructionsasshowninthefollowingcode:

STRRr,[R13];Rrcanbeanyregisters(R0-R12)

SUBR13,R13,#4;decrementstackpointer

Forexample,tostorethevalueofR1wecanwritethefollowinginstructions:

STRR1,[R13];storeR1ontothestack,

SUBR13,R13,#4;anddecrementSP

Poppingfromthestack

Popping(loading)thecontentsofthestackbackintoagivenregisteristheoppositeprocessofpushing.When thePOP isexecuted, theSP is incremented (ordecremented)and the top locationof thestack iscopied (loaded)back to the register.Thatmeans thestackisLIFO(Last-In-First-Out)memory.

ToretrievedatafromstackwecanusetheLDRinstruction.

ADDR13,R13,#4;incrementstackpointer

LDRRr,[R13];Rrcananyoftheregisters(R0-R13)

Forexample,thefollowinginstructionspopfromthetopofstackandcopytoR1:

ADDR13,R13,#4;incrementSP

LDRR1,[R13];load(POP)thetopofstacktoR1

InitializingthestackpointerinARM

When theARMispoweredup, theR13(SP) registercontainsvalue0.Therefore,wemustinitializetheSPatthebeginningoftheprogramsothatitpointstosomewhereinthe internal SRAM. In ARM, we can make the stack to grow from a higher memorylocationtoalowermemorylocation.Inthiscasewhenwepush(store)ontothestacktheSPisdecremented.Wecanalsomakethestacktogrowfromalowermemorylocationtoa higher memory location, therefore when we push (store) onto the stack, the SP isincremented. It is common to initialize the SP to the uppermostRAMmemory region,whichmeansaswepushdataontothestack,thestackpointermustbedecremented.

Different ARMs have different amounts of RAM. In some ARM assemblerStack_Toprepresentstheaddressoftopofthestack.So,ifwewanttoinitializetheSP,wecansimplyloadStack_Topinto theSP.Notice thatSP(R13) isa32-bit register.So,wecanuseany32-bitaddressofSRAMasastacksection.

Example 6-8showshowtoinitializetheSPandusethestoreandloadinstructionsforPUSHandPOPoperations.

Example6-8

ThefollowingARMprogramplacessomedataintoregistersandcallsasubroutinethatusesthesameregisters.Itshowshowtousethestack.Examinethestack,stackpointer,andtheregistersusedaftertheexecutionofeachinstruction.

;initializetheSPtopoint

;tothelastlocationofRAM(Stack_Top)

;AssumeStack_Top=0xFF8

LDRR13,=Stack_Top;loadSP

LDRR0,=0x125;R0=0x125

LDRR1,=0x144;R1=0x144

MOVR2,#0x56;R2=0x56

BLMY_SUB;callasubroutine

ADDR3,R0,R1

;R3=R0+R1=0x125+0x144=0x269

ADDR3,R3,R2

;R3=R3+R2=0x269+0x56=0x2BF

HEREBHERE;stayhere

;–––––––––

MY_SUB

;––––––-

;saveR0,R1,andR2onstack

;beforetheyareusedbyaloop

STRR0,[R13];saveR0onstack

SUBR13,R13,#4

;R13=R13-4,todecrementthestackpointer


SUBR13,R13,#4



SUBR13,R13,#4


;––—R0,R1,andR2arechanged

MOVR0,#0;R0=0

MOVR1,#0;R1=0

MOVR2,#0;R2=0

;––––––––––––––

;restoretheoriginalregisterscontentsfromstack

ADDR13,R13,#4

;R13=R13+4toincrementthestackpointer

LDRR2,[R13];restoreR2fromstack

ADDR13,R13,#4



ADDR13,R13,#4



BXLR;returntocaller

Solution:

Aftertheexecutionof

Contentsofsometheregisters

(inHex) Stack

R0 R1 R2 SP(R13)

LDRR13,=0xFF8 0 0 0 FF8

LDRR0,=0x125

LDRR1,=0x144

LDRR2,=0x56

125 144 56 FF8

STRR0,[R13]

SUBR13,R13,#4125 144 56 FF4

STRR1,[R13]

SUBR13,R13,#4125 144 56 FF0

STRR2,[R13]

SUBR13,R13,#4125 144 56 FEC

MOVR0,#0

MOVR1,#0

MOVR2,#0

0 0 0 FEC

ADDR13,R13,#4

LDRR2,[R13]0 0 56 FF0

ADDR13,R13,#4

LDRR1,[R13]0 144 56 FF4

ADDR13,R13,#4

LDRR0,[R13]125 144 56 FF8

ThestacklimitandnestedcallsinARM

Asmentionedearlier,wecandefinethestackanywhereintheread/writememory.So,intheARMthestackcanbeasbigasitsRAM.InARM,thestackisusedforcallsandinterrupts.Wemustrememberthatuponcallingasubroutinefromthemainprogramusing theBL instruction,R14, the linker register,keeps trackofwhere theCPUshouldreturnaftercompletingthesubroutine.Now,ifwehaveanothercallinsidethesubroutineusingtheBLinstruction,thenitisourjobtostoretheoriginalR14onthestack.Failuretodothatwillendupcrashingtheprogram.Forthisreason,wemustbeverycarefulwhenmanipulatingthestackcontents.

UsingLDMandSTMinstructionsforthestack

AnotherwaytopushregistercontentsontothestackistouseSTM(storemultiple)andLDM(loadmultiple)instructions.Inmanyinterruptandmultitaskingapplicationsweneedtosavethecontentsofmultipleregistersonthestackandrestorethemback.Pushingtheregistersontothestackoneregisteratatimeistimeconsuming.ForthisreasonARMhas theSTMandLDMinstructions.TheSTMandLDMallowtostore(push)and load(pop)multiple registerswith a single instruction.Next,weexplain each instruction andhowitisused.

STM

BelowshowsthesyntaxfortheSTM.Noticewecanspecifythenumberofregistersto be stored and destination RAM address (pointer) to which the data is pushed onto.Examinethefollowinginstructions:

STMR11,{R0-R3}

;StoreR0throughR3ontomemorypointedtobyR11

STMR8,{R0-R7}


STMR7,{R0,R3,R5}

;StoreR0,R3,R5ontomemorypointedtobyR7

STMR11,{R0-R10}


LDM

BelowshowsthesyntaxfortheLDM.Noticewecanspecifythenumberofregisters

tobeloadedanddestinationRAMaddress(pointer)fromwhichthedataispoppedfrom.Examinethefollowinginstructions:

LDMR11,{R0-R3}

;LoadR0throughR3frommemorypointedtobyR11

LDMR8,{R0-R7}


LDMR7,{R0,R3,R5}

;LoadR0,R3,R5frommemorypointedtobyR7

LDMR11,{R0-R10}


Example6-9showshowwecanuseSTMandLDMtosimplifyacodeandpreventunwantederrors.

Example6-9

ModifytheExample6-8usingtheLDMandSTMinstructions.

Solution:

;initializetheSPtopointto

;thelastlocationofRAM(Stack_Top)

;AssumeStack_Top=0xFF8


LDRR0,=0x125;R0=0x125

LDRR1,=0x144;R1=0x144

MOVR2,#0x56;R2=0x56



;=0x125+0x144=0x269


;=0x269+0x56=0x2BF

HEREBHERE;stayhere

;–––––––––

MY_SUB

;–––––––––

;saveR0,R1,andR2onstack

;beforetheyareusedbyaloop

STMR13,{R0-R2};saveR0,R1,R2onstack

SUBR13,R13,#12


MOVR0,#0;R0=0

MOVR1,#0;R1=0

MOVR2,#0;R2=0

;–––

;restoretheoriginalregisterscontentsfromstack

ADDR13,R13,#12

LDMR13,{R0-R2};restoreR0,R1,andR2fromstack

BXLR;returntocaller

;–––-

WecanalsousethePUSHandPOPpseudo-instructionstodothesamething.

CopyingablockofdatawithLDMandSTM

To copy a block of data, we bring into the CPU’s register a word of data frommemoryandthencopyitfromtheregistertoalocationinRAM.Inthatcasewecopyoneword(4bytes)atatime.Sotocopy16wordswehavetosetacounterto16foraloopiteration.We can use theLDMandSTM to do the same thingwithmuch less coding.Using LDM and STM instructions to copy a block of data, we need a register for thesourceaddressandanotheroneforthedestinationaddress.Program6-1usesR11andR12for source and destination addresses, respectively. The registers R0–R9 are used astemporaryplacefordatabeforetheyarecopiedtothedestination.Thatgivesus10words(40bytes)transferatatime.NoticethatintheSTMandLDMinstructions,registersaretransferredtoorfrommemoryintheorderofthelowesttohighest,soR0willalwaysbetransferredtoorfromalocationlowerthanthelocationofR1inthememoryandsoon.ThatiswhyweshouldnotbeworryingabouttheorderofregistersinthestackwhenweuseSTMandLDManditisnotneededtoPUSHandPOPregistersinoppositeorderasiscommoninnormalPOPandPUSHinstructions.

Program6-1:CopyingaBlockofMemory

Thisprogramcopiesablockof10words(40bytes)memoryfromsourcetodestination.TheregistersR11andR12areusedforsourceanddestinationaddresses.


ENTRY

LDMR11,{R0-R9};LoadR0thruR9frommemorypointedtobyR11

STMR12,{R0-R9};StoreR0thruR9tomemorypointedtobyR12

END

ReviewQuestions

1.The______registeristhedefaultstackpointer.

2.HowdeepisthesizeofthestackintheARM?

3.WriteaprogramthatpushesR5,R6,R7,andR8intothestack.

4.WriteaprogramthatpopsR5,R6,R7,andR8fromthestack.

5.Whatdoesthefollowingprogramdo?

LDRR5,=0x40000000

LDMR5,{R1,R4}

LDRR5,=0x50000000

STMR5,{R1,R4}

Section6.3:ARMBit-AddressableMemoryRegionManymicroprocessorsallowprogramstoaccessmemoryandI/Oportsinbytesize

incrementsonly.Inotherwords,ifyouneedtocheckasinglebitofanI/Oport,youmustreadtheentirebytefirstandthenmanipulatethewholebytewithsomelogicinstructionstogetholdofthedesiredbit.ThisisnotthecasewithARMmicroprocessors.Indeed,oneofthemostimportantfeaturesoftheCPUistheabilitytoaccesstheRAMandI/Oportsinbits insteadofbytes.This isavery importantandpowerful featureof theARMusedwidelyintheembeddedsystemdesignandapplications.InthissectionweshowaddressassignmentofbitsofI/OandRAM,inadditiontowaysofprogrammingthem.

Bit-addressable(bit-band)SRAM

Ofthe4GBmemoryspaceoftheARM,anumberofbytesarebit-addressable.TheARMliteraturereferstothisregionasthebit-bandregion.SincetheARMinstructionis32-bitandtheCPUexecutescodeoneinstructionatatime,thecode(FlashROM)spaceisalwayswordsizeandwordaligned,aswehaveseenthroughoutthechapters.Thatmeansthebit-addressableregionsmustbelocatedinSRAMandI/Operipheralsregions.Thebit-addressableRAMand I/O locations vary among the familymembers and vendors.TheARM generic manual defines the location addresses of bit-band as 0x20000000 to0x200FFFFFforSRAMand0x40000000to0x400FFFFFforperipherals.Noticetheyarelocatedatthelowest1MBaddressspaceofSRAMandperipherals.SeeTable6-1.Itmustbe alsonoted that thebit-band (bit-addressable) regions are theonly region that canbeaccessedinbothbitandbyte/halfword/wordformatswhiletheotherareaofmemorymustbeaccessedinbyte/halfword/wordsize.Inotherwords,thememoryspaceregionoutsideof the bit-band must always be accessed in byte/halfword/word formats only. For theARMCortex-M,thebit-bandSRAMhasaddressesof0x20000000to0x200FFFFF.This1Mbytesbit-addressableregionisgivenaliasaddressesof0x22000000to0x23FFFFFF.Therefore, thebitaddresses0x22000000 to0x2200001Fare for the firstbyteofSRAMlocation0x20000000,and0x22000020to0x2200003FarethebitaddressesofthesecondbyteofSRAMlocation0x20000001,andsoon.SeeFigure6-7.

Figure6-7:SRAMbit-addressableregionandtheiraliasaddresses

SinceeachbyteofSRAMhas8bitsweneedanaddressforeachbit.Thismeansweneedatleast8Maddresslocationstoaccess8Mbits,oneaddressforeachbit.However,tomaketheaddressesword-alignedtheARMprovides4-bytealiasaddressforeachbit.Forexample,0x22000000to0x2200001Fisassignedtoasinglebytelocationof0x20000000.That means we have 0x22000000 to 0x23FFFFFF (total of 32M locations, as aliasaddresses)for1Mbytesofaddress.

BitmapforSRAM

FromFigure6-7onceagainnoticethefollowingfacts:

1.Thebitaddress0x22000000isassignedtoD0ofSRAMlocation0x20000000.


3.Thebitaddress0x22000008isassignedtoD2ofSRAMlocationof0x20000000.

4.Thebitaddress0x2200000CisassignedtoD3ofSRAMlocationof0x20000000.


6.Thebitaddress0x22000014isassignedtoD5ofRAMlocation0x20000000.


8.Thebitaddress0x2200001CisassignedtoD7ofSRAMlocation0x20000000.

NoticethatSRAMlocations0x20000000–0x200FFFFFarebothbyte-addressableand bit-addressable. The only difference is when we access it in byte (or halfword orword)weuseaddresses0x20000000to0x200FFFFF,butwhentheyareaccessedinbit,theyareaccessedviatheiraliasaddressesof0x22000000to0x23FFFFFF.Thereasonit

is called aliases is because it is same physical location but accessed by two differentaddresses. It is like a same person but different names (aliases) See Examples 6-10through6-12.

Example6-10

ThegenericARMchiphasthefollowingaddressassignments.Calculatethespaceandtheamountofmemorygiventoeachregion.

(a)Addressrangeof0x20000000–200FFFFFforSRAMbit-addressableregion

(b)Addressrangeof0x22000000–23FFFFFFforaliasaddressesofbit-addressableSRAM

Solution:

(a)200FFFFF–20000000=FFFFFbytes.ConvertingFFFFFtodecimal,weget1,048,575+1=1,048,576,whichisequalto1Mbytes.

(b)23FFFFFF–22000000=1FFFFFFbytes.Converting1FFFFFFtodecimal,weget33,554,431+1=33,554,432,whichisequalto32Mbytes.

Example6-11

WriteaprogramtosetHIGHtheD6oftheSRAMlocation0x20000001usinga)byteaddressandb)thebitaliasaddress.

Solution:

a)

LDRR1,=0x20000001;loadtheaddressofthebyte

LDRBR2,[R1];getthebyte

ORRR2,R2,#2_01000000;makeD6bithigh

;(binaryrepresentationinKeilfor0b01000000)

STRBR2,[R1];writeitback

b)FromFigure6-7wehaveaddress0x22000038asthebitaddressofD6ofSRAM

location0x20000001.

LDRR1,=0x22000038;loadthealiasaddressofthebit

MOVR2,#1;R2=1

STRBR2,[R1];WriteonetoD6

Example6-12

WriteaprogramtosetLOWtheD0bitoftheSRAMlocation0x20000005usinga)byteaddressandb)thebitaliasaddress.

Solution:

a)

LDRR1,=0x20000005;loadtheaddressofbyte

LDRBR2,[R1];getthebyte

ANDR2,R2,#2_11111110;makeD0bitlow

STRBR2,[R1];writeitback

b)FromFigure6-7wehaveaddress0x220000A0asthebitaddressofD0ofSRAMlocation0x20000005.

LDRR2,=0x220000A0;loadthealiasaddressofthebit

MOVR0,0;R0=0

STRBR0,[R2];writezerotoD0

PeripheralI/Oportbit-addressableregion

The general purpose I/O (GPIO) and peripherals such as ADC, DAC, RTC, andserialCOMport arewidelyused in the embedded systemdesign. InmanyARM-basedtrainerboardsweseetheconnectionofLEDs,switches,andLCDtotheGPIOpinsoftheARM chip. In such trainers the vendor provides the details of I/O port and peripheralconnectionstotheARMchipinadditiontotheiraddressmap.Aswediscussedearlier,theARMhas set aside1Mbytesofaddress space tobeusedbit-band (bit-addressable) I/Oand peripherals. The address space assigned to bit-band peripherals and GPIO is0x40000000 to 0x400FFFFF with address aliases of 0x42000000 to 0x43FFFFFF.Examine your trainer board data sheet for the bit-band addresses implemented on theARMchip.

Figure6-8:Peripheralsbit-addressableregionandtheiraliasaddresses.

BitmapforI/Operipherals

FromFigure6-8onceagainnoticethefollowingfacts.

1.Thebitaddress0x42000000isassignedtoD0ofperipheralslocation0x40000000.


3.Thebitaddress0x42000008isassignedtoD2ofperipheralslocationof0x40000000.

4.Thebitaddress0x4200000CisassignedtoD3ofperipheralslocationof0x40000000.




8.Thebitaddress0x4200001CisassignedtoD7ofperipheralslocation0x40000000.

Whenaccessingaperipheralport in a single-bitmanner,wemustuse theaddressaliasesof0x42000000–0x43FFFFFF.Indeedthebit-addressableperipheralsarewidelyusedinembeddedsystemdesign.

ReviewQuestions

1.Trueorfalse.AllbytesofSRAMinARMarebit-addressable.

2.Trueorfalse.AllbitsoftheI/OperipheralsinARMarebit-addressable.

3.Trueorfalse.AllROMlocationsoftheARMarebit-addressable.

4.Ofthe4GbytesofmemoryintheARM,howmanybytesarebit-addressable?Listthem.

5.HowwouldyouchecktoseewhetherbitD0oflocation0x20000002ishighorlow?

6.Findouttowhichbyteeachofthefollowingbitsbelongs.GivetheaddressoftheRAMbyteinhex.

(a)0x23000030 (b)0x23000040 (c)0x23000048

(d)0x4200003C (e)0x43FFFFFC

Section6.4:AdvancedIndexedAddressingModeInpreviouschaptersweshowedhow tousea simple indexedaddressingmode in

STR, LDR, LDM and STM. The advanced addressing modes bring very importantadvantagesintheARMandwillbediscussedinthissection.

Indexedaddressingmode

Intheindexedaddressingmode,aregisterisusedasapointertothedatalocation.TheARMprovidesthreeindexedaddressingmodes.Thesemodesare:preindex,preindexwithwriteback,andpostindex.Table6-2summarizesthesemodes.Eachoftheseindexedaddressingmodecanbeusedwithoffsetoffixedvalueoroffsetofashiftedregister.SeeTable6-3.Inthissectionwewilldiscusseachmodeindetail.

IndexedAddressingMode Syntax PointingLocationin

MemoryRmValueAfterExecution

Preindex LDRRd,[Rm,#k] Rm+#k Rm

PreindexwithWB* LDRRd,[Rm,#k]! Rm+#k Rm+#k

Postindex LDRRd,[Rm],#k Rm Rm+#k

*WBmeansWriteback

**RdandRmareanyofregistersand#kisasigned12-bitimmediatevaluebetween-4095and+4095

Table6-2:IndexedAddressinginARM

Offset Syntax PointingLocation

Fixedvalue LDRRd,[Rm,#k] Rm+#k

Shiftedregister LDRRd,[Rm,Rn,<shift>] Rm+(Rnshifted<shift>)

*RnandRmareanyofregistersand#kisasigned12-bitimmediatevaluebetween-4095and+4095

**<shift>isanyofshiftsstudiedinChapter3likeLSL#2

Table6-3:OffsetofFixedValuevs.OffsetofShiftedRegister

Preindexedaddressingmodewithfixedoffset

In thisaddressingmode,a registerandapositiveornegative immediatevalueareused as a pointer to the data location. The value of register does not change afterinstructionisexecuted.ThisaddressingmodecanbeusedwithSTR,STRB,STRH,LDR,LDRB,andLDRH.SeeExample6-13.

Example6-13

WriteaprogramtostorecontentsofR5totheSRAMlocation0x10000000to0x10000000Fusingpreindexedaddressingmodewithfixedoffset.

Solution:

LDRR5,=0x55667788

LDRR1,=0x10000000

;loadtheaddressoffirstlocation

STRR5,[R1];storeR5tolocation0x10000000

STRR5,[R1,#4]

;storeR5tolocation0x10000000+4(0x10000004)

STRR5,[R1,#8]


STRR5,[R1,#0x0C]

;storeR5tolocation0x10000000+0x0C(0x1000000C)

NoticethatafterrunningthiscodethecontentofR1isstill0x10000000

Itisacommonpracticetousearegistertopointtothefirstlocationofthememoryspace and access the different locations using proper offsets. For example see thefollowingprogram:

ADRR0,OUR_DATA;pointtoOUR_DATA

LDRBR2,[R0,#1];loadR2withBETA

…

OUR_DATA

ALFADCB0x30

BETADCB0x21

Preindexedaddressingmodewithwrite-backandfixedoffset

Thisaddressingmode is likepreindexedaddressingmodewithfixedoffsetexceptthat the calculated pointer is written back to the pointing register. We put ! after theinstructiontotelltheassemblertoenablewritebackintheinstruction.SeeExample6-14.

Example6-14

RewriteExample6-13usingpreindexedaddressingmodewithwritebackandfixedoffset.

Solution:

LDRR1,=0x10000000;loadtheaddressoffirstlocation

STRR5,[R1];storeR5tolocation0x10000000

STRR5,[R1,#4]!


;writebackmakesR1=0x10000004

STRR5,[R1,#4]!


;writebackmakesR1=0x10000008

STRR5,[R1,#4]!

;storeR5tolocation0x10000008+4(0x1000000C)

;writebackmakesR1=0x1000000C

NoticethatafterrunningthiscodethecontentofR1is0x1000000C

Postindexedaddressingmodewithfixedoffset

Thisaddressingmodeislikepreindexedaddressingmodewithwritebackandfixedoffset except that the instruction is executed on the location that Rn is pointing toregardlessofoffset value.After running the instruction, thenewvalueof thepointer iscalculatedandwrittenbacktothepointingregister.Examinethefollowinginstructions:

STRR1,[R2],#4;storeR1ontomemorypointedtoby

;R2andthenwritebackR2+4toR2

LDRBR5,[R3],#1;loadabytefrommemorypointedto

;byR3andthenwritebackR3+1toR3

Noticethatwritebackisbydefaultenabledinpostindexedaddressingandthereisnoneed toput ! after instructionsbecausepostindexingwithoutwriteback isuselessas theindex is neither used in the instruction nor written back to the pointing register. SeeExample6-15.

Example6-15

RewriteExample6-13usingpostindexedaddressingmodewithfixedoffset.

Solution:

LDRR1,=0x10000000;loadtheaddressoffirstlocation

STRR5,[R1],#4;storeR5tolocation0x10000000andwriteback

;0x10000000+4(0x10000004)toR1


;0x10000004+4(0x10000008)toR1


;0x10000008+4(0x1000000C)toR1

STRR5,[R1],#4;storeR5tolocation0x1000000Candwriteback

;0x1000000C+4(0x10000010)toR1

NoticethatafterrunningthiscodethecontentofR1is0x10000010.

Preindexedaddressmodewithoffsetofashiftedregister

This advancedaddressingmode is avery important feature in theARM.We startdescribingthismodefromsimplecasewithnoshift(shift=0)andthenwewillfocusonmorecomplexformats.

Simpleformatofpreindexedaddressmodewithoffsetregister

ThefollowingisthesimplesyntaxforLDRandSTR.

LDRRd,[Rm,Rn];RdisloadedfromlocationRm+Rnofmemory

STRRs,[Rm,Rn];RsisstoredtolocationRm+Rnofmemory

Thisaddressingmodeiswidelyused in implementingofobjectorientedprogramswhenwewant tomakeadynamicarrayofvariables.Example6-16showshowweusethisaddressingmodeinaccessingdifferentlocationsofanarraydefinedinmemory.

Example6-16

ExaminethevalueofR5andR6aftertheexecutionofthefollowingprogram.

POINTERRNR2

ARRAY1RNR1

AREAEXAMPLE_6_16,CODE,READONLY

ENTRY

LDRARRAY1,=MYDATA

LDRBR4,[ARRAY1]

;loadPOINTERlocationofARRAY1toR4(R4= 0x45)

MOVPOINTER,#1;POINTER=1topointtolocation1ofarray

LDRBR5,[ARRAY1,POINTER]

;LoadPOINTERlocationofARRAY1toR5

;(R5 = 0x24)



;loadPOINTERlocationofARRAY1toR6

;(R6 = 0x18)

HEREBHERE

MYDATADCB0x45,0x24,0x18,0x63

END

Solution:

AfterrunningtheLDRBR4,[ARRAY1]instruction,location0ofMYDATAisloadedtoR4.NowR4=0x45.

Next,afterrunningtheLDRBR5,[ARRAY1,POINTER]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x24.

Next,afterrunningtheLDRBR6,[ARRAY1,POINTER]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x18.

NoticethatpurposelyweusedDCBandLDRBinthisexamplesoeachlocationofMYDATAtakesonebyte.

InExample6-16,wepurposelyusedDCBandLDRBsoeachlocationofMYDATAtakesonebyte.IfwedefineanarrayusingDCD,thenwewillnotbeabletouseLDRBR5,[ARRAY1,POINTER] to load thePOINTERlocationofARRAY.SeeExample6-17forclarification.

Example6-17

InExample6-16,changeMYDATADCB0x45,0x24,0x18,0x63toMYDATADCD0x45,0x2489ACF5andexaminethevalueofR5andR6aftertheexecutionofthe

followingprogram.

POINTERRNR2

ARRAY1RNR1


ENTRY

LDRARRAY1,=MYDATA

LDRBR4,[ARRAY1];loadPOINTERlocationof

;ARRAY1toR4(R4= 0x45)

MOVPOINTER,#1;POINTER=1topointto

;location1ofarray


;LoadPOINTERlocationofARRAY1toR5

MOVPOINTER,#2;POINTER=2topointto

;location2ofarray


;loadPOINTERlocationofARRAY1toR6

HEREBHERE

MYDATADCD0x45,0x2489ACF5

END

Solution:

AfterrunningtheLDRBR4,[ARRAY1]instruction,location0ofMYDATAisloadedtoR4.NowR4=0x45.

Next,afterrunningtheLDRBR5,[ARRAY1,POINTER]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x00.

Next,afterrunningtheLDRBR6,[ARRAY1,POINTER]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x00.

Toaccess locations of aword size arraywehave tomultiply the pointer by four.Similarly,toaccesslocationsofahalf-wordsizearraywehavetomultiplythepointerbytwo.Forexample,wecancorrectprogramofExample6-17byreplacing


instructionwithfollowinginstructions:

MOVPOINTER,#2;POINTER=2

MOVPOINTER,POINTER,LSL#2;POINTERisshiftedlefttwobits(×4)

;topointtoword2ofthearray

Noticethatbyshiftingleftavaluetwobits,wemultiplyitbyfour.NextwewillseehowwecanuseindexedaddressingwithshiftedregisterstocombinemultiplicationwiththeLDRandSTRinstructions.

Generalformatofpreindexedaddressmodewithoffsetregister

ThegeneralformatofindexedaddressingwithshiftedregisterforLDRandSTRisasfollows:

LDRRd,[Rm,Rn,<shift>];(ShiftedRn)+Rmisusedasthepointer

STRRd,[Rm,Rn,<shift>];(ShiftedRn)+Rmisusedasthepointer

Intheaboveinstructions<shift>canbeanyofshiftinstructionsstudiedinChapter3suchasLSL,LSR,ASRandROR.Examinethefollowinginstructions:

LDRR1,[R2,R3,LSL#2];R2+(R3 × 4)isusedasthepointer

;contentoflocationR2+(R3×4)isloadedtoR1

STRR1,[R2,R3,LSL#1];R2+(R3×2)isusedasthepointer

;R1isstoredtolocationR2 + (R3×2)

STRBR1,[R2,R3,LSL#2];R2+(R3 × 4)isusedasthepointer

;firstbyteofR1isstoredtolocationR2+(R3×4)

LDRR1,[R2,R3,LSR#2];R2+(R3/4)isusedasthepointer

;contentoflocationR2+(R3/4)is

;loadedtoR1

From the above codeswe can see that indexed addressingwith shifted register ismostly used tomultiply thepointer by a powerof two and that iswhy it is also calledindexedaddressingwith scaled register.ExamineExample6-18 to seehowwecanusescaledregisterindexingtoaccessanarrayofwords.

Notice that scaled register indexing is not supported for halfword load and storeinstructions.

Example6-18

ExaminethevalueofR5andR6aftertheexecutionofthefollowingprogram.

POINTERRNR2

ARRAY1RNR1


ENTRY

LDRARRAY1,=MYDATA


LDRR4,[ARRAY1,POINTER,LSL#2];





HEREBHERE

MYDATADCD0x45,0x2489ACf5,0x2489AC23

END

Solution:

AfterrunningtheLDRR4,[ARRAY1,POINTER,LSL#2]]instruction,location0of

MYDATAisloadedtoR4.NowR4=0x45.

Next,afterrunningtheLDRR5,[ARRAY1,POINTER,LSL#2]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x2489ACF5.

Next,afterrunningtheLDRR6,[ARRAY1,POINTER,LSL#2]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x2489AC23.

Writebacksign(!)inpreindexedloadandstorewithscaledregister

We can force all scaled register load and store instructions to writeback thecalculated pointer to the pointing register by putting ! after each load and storeinstructions.Examinethefollowinginstructions:

LDRR1,[R2,R3,LSL#2]!

;R2+(R3×4)isusedasthepointer

;contentoflocationR2+(R3×4)isloadedtoR1

;R2=R2+(R3×4)(R2isupdated.)

STRR1,[R2,R3,LSL#1]!

;R2+(R3×2)isusedasthepointer

;R1isstoredtolocationR2+(R3×2)

;R2=R2+(R3×2)(R2isupdated.)

Scaledregisterpostindex

The following instructionsare someexamplesof scaled registerpostindex in loadandstoreinstructions:

STRR1,[R2],R3,LSL#2

;storeR1atlocationR2ofmemoryandwriteback

;R2+(R3 × 4)toR2.

LDRR1,[R2],R3,LSL#2

;loadlocationR2ofmemorytoR1andwriteback

;R2+(R3 × 4)toR2.

Look-uptable

Oneuseofarrayisforimplementinglook-uptables.Thelook-uptableisawidelyusedconceptinmicrocontrollerprogramming.Itallowsaccesstoelementsofafrequentlyusedtablewithminimumoperations.Asanexample,assumethatforacertainapplicationwe need 4 + x2 values in the range of 0 to 9. To do so, the look-up table is stored inprogrammemoryspaceandaccessedasanarray.ThisisshowninExamples6-19through6-21.

Example6-19

WriteaprogramtogetthexvaluefromR9andsendx2+2x+3toR10.AssumeR9hasthexvalueof0–9.Usealook-uptableinsteadofamultiplyinstruction.

Solution:

AREALOOKUP_EXAMP6_19,READONLY,CODE

ENTRY

ADRR2,LOOKUP;pointtoLOOKUP

LDRBR10,[R2,R9];R10=locationR9oflookuparray


LOOKUPDCB3,6,11,18,27,38,51,66,83,102

END

Example6-20

WriteaprogramtogetthexvaluefromR9andsendfactorialofxtoR10.AssumeR9hasthexvalueof0–10.Usealook-uptableinsteadofamultiplyinstruction.

Solution:

AREALOOKUP_EXAMP6_20,READONLY,CODE

ENTRY

MOVR9,#5


LDRR10,[R2,R9,LSL#2];R10=locationR9oflookuparray


LOOKUPDCD1,1,2,6,24,120,720,5040,40320,362880,3628800

END

Example6-21

Writeaprogramthatcalculates10tothepowerofR2andstorestheresultinR3.AssumeR2hasthexvalueof0–6.Usealook-uptableinsteadofamultiplyinstruction.

Solution:

AREALOOKUP_EXAMP_6_21,READONLY,CODE

ENTRY


LDRR3,[R1,R2,LSL#2];R3=locationR2oflookuparray


LOOKUPDCD1,10,100,1000,10000,100000,1000000

END

WritebackoptionsofSTMandLDM

TheSTMandLDMinstructionsallowyoutostoreandloadmultipleregisterswithasingleinstruction.Wecanalsospecifytheactiontobetakenforthepointer.Theactioncanbeincrementordecrementbeforeorafterthepopisdone.ThisisshowninTable6-4.

Option Description

IA IncrementAfter

IB IncrementBefore

DA DecrementAfter

DB DecrementBefore

Table6-4:OptionsforLDMandSTMinstructions

IA stands for IncrementAfter and adds four (the size of register in bytes) to thepointerafterloadorstoringeachregister.

IBstands for IncrementBeforeandadds four (the sizeof register inbytes) to thepointerbeforeloadorstoringeachregister.

DAstandsforDecrementAfterandsubtractsfour(thesizeofregisterinbytes)fromthepointerafterloadorstoringeachregister.

DB stands forDecrementBefore and subtracts four (the size of register in bytes)fromthepointerbeforeloadorstoringeachregister.

ForfurtherclarificationassumethatR1=0x100.Figure6-9showsthememoryafterrunningSTMR1!,{R2,R3}witheachofIA,IB,DAandDBoptions.

Figure6-9:FourOptionsofSTMandLDMinARM

Notice that generally we have four stack structure, either it is ascending ordescending.Thestackiscalledascendingwhenitisincrementedaftereachstore(PUSH)instruction and decremented after each load (POP) instruction. It is called descendingwhen it is decremented after each store (PUSH) instruction and incremented after eachload(POP)instruction.Thestackpointercanpointtothelastfilledlocation;inthiscasethestackiscalledFullStack.Thestackpointercanpointtothenextavailablelocation,aswell;whichiscalledanEmptyStack.SeeFigure6-10formoreclarification.

Figure6-10:FourGeneralStackStructure

To implement a Full Ascending stack we have to use STMIB because the stackpointershouldincrementonstoreinstructionanditshouldbeincrementedbeforestoringeach registerbecause it shouldpoint to full location.On theotherhandwehave touseLDMDA for pop instruction because the stack pointer should decrement on loadinstructionanditshouldbedecrementedbeforeloadingeachfulllocationbecauseitwaspointing to an empty locationbeforedecrementing.Table6-5 lists appropriate load andstore instruction for each stack structure. It is difficult and prone to error to rememberwhichloadandstoreshouldbeusedwitheachstackstructure.Tosolvethisproblem,eachofloadandstoreoptionshasalternatenamewhichiseasytorememberwhenitisusedforstackoperation.ThelasttwocolumnsofTable6-5listthealternatenames.

StackStructure Load StoreLoad

(alternateNames)

Store

(alternateNames)

FullAscending LDMDA STMIB LDMFA STMFA

FullDescending LDMIA STMDB LDMFD STMFD

EmptyAscending LDMDB STMIA LDMEA STMEA

EmptyDescending LDMIB STMDA LDMED STMED

Table6-5:OptionsforLDMandSTMinstructions

Program6-2usesSTMandLDMtosimplifyprogramofExample6-8.

Program6-2:UsingSTMFAandLDMFAforStack(RepeatofExample6-8)

;UsingFullAscendingLoadandStoreforstack


ENTRY


LDRR0,=0x125;R0=0x125

LDRR1,=0x144;R1=0x144

MOVR2,#0x56;R2=0x56


ADDR3,R0,R1;R3=R0+R1=0x125+0x144=0x269

ADDR3,R3,R2;R3=R3+R2=0x269+0x56=0x2BF

HEREBHERE;stayhere

;–––––––––

MY_SUB

;––—saveR0,R1,andR2onstackbeforetheyareusedbyaloop

STMFAR13,{R0-R2};saveR0,R1,R2onstackusingFullAscending


MOVR0,#0;R0=0

MOVR1,#0;R1=0

MOVR2,#0;R2=0

;–––restoretheoriginalregisterscontentsfromstack

LDMFAR13,{R0-R2}

;restoreR0,R1,andR2fromstackusingF.Ascending

BXLR;returntocaller

END

ItmustbenotedthatARM CortexhasPUSHandPOPinstructions.SeeAppendixAformoreinformation.

ReviewQuestions

1.Trueorfalse.Inafullstackthepointerpointstothelaststoredlocation.

2.Trueorfalse.Inadescendingstackthepointerincrementsaftereachpush.

3.WriteaninstructionthatpushesLR(R14)intoanemptydescendingstack.

4.WriteaninstructionthatpopsR10fromafullascendingstack.

Section6.5:ADR,LDR,andPCRelativeAddressingIn indexedaddressingmodes,anyregisters includingthePC(R15)registercanbe

usedas thepointerregister.Forexample, thefollowing instructionreads thecontentsofmemorylocationPC+4:

LDRR0,[PC,#4]

Inthisway,thedatawhichhasaknowndistancefromthecurrentexecutinglinecanbe accessed. As discussed in Chapter 4, the PC register points 8 bytes (2 instructions)ahead of executing instruction. As a result, “LDR R0,[PC,#4]” accesses a memorylocationwhoseaddressis4+8bytesaheadofthecurrentinstruction.Generallyspeaking,theaddressofthememorylocationwhichisbeingaccessedusing“LDRR0,[PC,offset]”canbefoundusingthisformula:theaddressofcurrentinstruction+8+offset.

Forinstance,if“LDRR0,[PC,#4]”islocatedinaddress0x10theeffectiveaddressis:0x10+8+4=0x1C.

Thisaddressingmodecanbeconsideredasasubsetofindexedaddressingmodes.InARMapplicationnotes,theprogramrelativeaddressingmodeisconsideredasaseparateaddressingmode.

ImplementingtheADRDirective(LDRRn,=value)

The ADR directive uses the PC relative addressing mode to load registers. Forexampleseethefollowingprogram:

AREALOOKUP_EXAMPLE,READONLY,CODE

ENTRY

ADRR2,OUR_FIXED_DATA;pointtoOUR_FIXED_DATA

LDRBR0,[R2];loadR0withthecontents

;ofmemorypointedtobyR2

ADDR1,R1,R0;addR0toR1


OUR_FIXED_DATA

DCB0x55,0x33,1,2,3,4,5,6

DCD0x23222120,0x30

DCW0x4540,0x50

END

SeeFigure6-11.Atcompiletime,theADRisreplacedwith“ADDR2,PC,#0x08”.Sincetheinstructionisinaddress0x00,theinstructionaccesseslocation0+8+0x08=0x10.AsshownintheFigure,0x10istheaddressofOUR_FIXED_DATA.

Figure6-11:MemoryDumpforADRInstruction

ImplementingtheLDRDirective

To implement the LDR directive, assembler stores the value as a fixed data inprogram memory and then accesses it using the LDR instruction and the PC relativeaddressingmode. Figure 6-12 shows the implementation of Program6-3. For example,0x12345678isstoredinmemorylocations0x10–0x13,andtheLDRdirectiveisreplacedwithLDRR0,[PC,#0x08].SincePC=0, theLDRR0,=0x1234567 is located at address0x000000.Nowwehave0+8+8=16=0x10.

Program6-3:LDRDirective

AREAEXAMPLE,READONLY,CODE

ENTRY

LDRR0,=0x12345678

LDRR1,=0x86427531

ADDR2,R0,R1

H1BH1

END

Figure6-12:MemoryDumpforLDRInstruction

The same way for the LDR R1,=0x86427531 is located in ROM address0x00000004. Therefore we have 4+8+8=20=0x14, which is the address of the data0x86427531.

ReviewQuestions

1.WhichregisterisusedasthepointerinPCrelativeaddressingmode?

2.WhichdirectiveismoreoptimizedADRorLDR?Why?

ProblemsSection6.1:ARMMemoryMapandMemoryAccess

1.Whatisthebusbandwidthunit?

2.Givethevariablesthataffectthebusbandwidth.

3.Trueorfalse.Onewaytoincreasethebusbandwidthistowidenthedatabus.

4.Trueorfalse.Anincreaseinthenumberofaddressbuspinsresultsinahigherbusbandwidthforthesystem.

5.Calculatethememorybusbandwidthforthefollowingsystems.

(a)ARMof100MHzbusspeedand0WS

(b)ARMof80MHzbusspeedand1WS

6.Indicatewhichofthefollowingaddressesiswordaligned.

(a)0x1200004A (b)0x52000068 (c)0x66000082

(d)0x23FFFF86 (e)0x23FFFFF0 (f)0x4200004F

(g)0x18000014 (h)0x43FFFFF3 (i)0x44FFFF05

7.Showhowdataisplacedafterexecutionofthefollowingcodeusing(a)littleendianand(b)bigendian.

LDRR2,=0xFA98E322

LDRR1,=0x20000100

STR[R1],R2

8.Trueorfalse.InARM,instructionsarealwayswordaligned.

9.Trueorfalse.Inawordalignedaddressthelowerdigitoftheaddressis0,4,8,orC.

10.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister

LDRR1,=0x20000004

LDRD[R1],R2


LDRR1,=0x20000102

LDRD[R1],R2


LDRR1,=0x20000103

LDRD[R1],R2


LDRR1,=0x20000006

LDRH[R1],R2


LDRR1,=0x20000C10

LDRB[R1],R2

Section6.2:StackandStackUsageinARM

15.Trueorfalse.InARMtheR13isdesignatedasstackpointer.

16.WhenBLisexecuted,howmanylocationsofthestackareused?

17.WhenBisexecuted,howmanylocationsofthestackareused?

18.InARM,stackpointeris_________register.

19.DescribehowtheactionassociatedwiththereturnoperationisperformedinARM.

20.GivethesizeofthestackinARM.

21.InARM,whichaddressissavedwhenBLinstructionisexecuted.

Section6.3:ARMBit-AddressableMemoryRegion

22.WhichmemoryregionsofARMarebit-addressable?

23.Givethebit-addressableSRAMregionaddressforgenericARM.

24.Whatbitaddressesareassignedtobyteaddressof0x20000004?


26.Whatbitaddressesareassignedtobyteaddressof0x200FFFFF?



29.Whatbitaddressesareassignedtobyteaddressof0x4000000C?


31.Thefollowingarebitaddresses.Indicatewhereeachonebelongs.

(a)0x2200004C (b)0x22000068 (c)0x22000080

(d)0x23FFFF80 (e)0x23FFFF00 (f)0x4200004C

(g)0x42000014 (h)0x43FFFFF0 (i)0x43FFFF00

32.Ofthe4GbytesofmemorylocationsintheARM,howmanyofthemarealsoassignedabitaddressaswell?Indicatewhichbytesthoseare.

33.Trueorfalse.Thebit-addressableregioncannotbeaccessinbyte.

34.Trueorfalse.Thebit-addressableregioncannotbeaccessinword.

35.WriteaprogramtoseewhethertheD7bitofRAMlocation0x20000020ishigh.Ifso,senda1toD1ofRAMlocation0x20000000.

36.WriteaprogramtoseewhethertheD7bitofI/Olocation0x40000000islow.Ifso,senda0totheD0oflocation0x400FFFFF.

37.WriteaprogramtosethighallthebitsofRAMlocations0x2000000usingthefollowingmethods:

(a)byteaddresses(b)bitaddresses

38.WriteaprogramtoseewhethertheSRAMlocation0x20000000isdivisibleby8.

39.ExplainhowtheLDMinstructionworks.

40.ExplainhowtheSTMinstructionworks.

41.ExplainthedifferencebetweenLDMandLDRinstructions.

42.ExplainhowthedifferencebetweenSTMandSTRinstructions.

43.ExplaintheLDMIAoperationanditsimpactontheSP.

44.ExplaintheLDMIBoperationanditsimpactontheSP.

45.ExplaintheSTMIAoperationanditsimpactontheSP.

46.ExplaintheSTMIBoperationanditsimpactontheSP.

Section6.4:AdvancedIndexedAddressingMode

47.Trueorfalse.Writebackisbydefaultenabledinpreindexedaddressingmode.

48.Indicatetheaddressingmodeineachofthefollowinginstructions

(a)LDRR1,[R5],R2,LSL#2(b)STRR2,[R1,R0]

(c)STRR2,[R1,R0,LSL#2]!(d)STRR9,[R1],R0

49.Whatisanascendingstack?

50.Whatisthedifferencebetweenanemptyandafullstack?

51.WriteaninstructionthatstoresR0inafulldescendingstack.

52.WriteaninstructionthatloadsR9fromanemptydescendingstack.

Section6.5:ADR,LDR,andPCRelativeAddressing

53.Assumingthattheinstruction“LDRR2,[PC,#8]islocatedinaddress0x300,

calculatetheaddressofthememorylocationwhichisaccessed.

54.UsingPCrelativeaddressingmode,writeanLDRinstructionthataccessesamemorylocationwhichis0x20bytesaheadofitself.


1.4bytes

2.Compilersensurethatcodesarewordaligned.

3.littleendian

4.1/66MHz=15.15nsisthebusclockperiod.Sincethebuscycletimeofzerowaitstatesis2clocks,wehave2×15.15=30.3ns

5.1/100MHz=10nsisthebusclockperiod.50ns-10ns=40ns.TheNumberofWSis40ns/10ns=4.

6.False

Section6.2

1.R13

2.ThestackcanbeasbigasitsRAM

3.STMR13,{R5-R8}

SUBR13,R13,#16

4.ADDR13,R13,#16

LDMR13,{R5-R8}

5.Itcopiesthecontentsoflocations0x40000000–0x4000000Fintolocations0x50000000–0x5000000FusingtheLDMandSTMinstructions.

Section6.3

1.False

2.False

3.False

4.2MBytes;locations0x20000000to0x200FFFFFofSRAMand0x40000000to0x400FFFFFofGPIO

5.

LDRR0,=0x22000040

LDRR1,[R0]

CMPR1,#0

BNEL1

…

L1:

6.

(a)0x23000030-0x22000000=0x1000030;0x1000030/0x20=0x80001;thusitisinlocation0x20000000+0x80001=0x20080001

(0x1000030%32)/4=(48%32)/4=16/4=4;itisD4of0x20080001.

(b)0x1000040/0x20=0x80002;itisinlocation0x20080002

(0x1000040%0x20)/4=0;itisD0oflocation0x20080002

(c)0x1000048/0x20=80002;itisinlocation0x20080002

(0x1000048%0x20)/4=2;itisD2oflocation0x20080002

(d)0x4200003C-0x42000000=0x03C;0x03C/0x20=0x01

(0x3C%0x20)/4=0x1C/4=7;itisD7of0x40000001

(e)0x43FFFFFC–0x42000000=0x1FFFFFC;0x1FFFFFC/0x20=0xFFFFF

(0x1FFFFFC%0x20)/4=0x1C/4=7;D7of0x400FFFFF

Section6.4

1.True

2.False

3.STMEDSP!,{LR}

4.LDMFASP!,{R10}

Section6.5

1.PC(R15)

2.ADR,ToimplementtheLDRdirectivethevalueisstoredinmemory;asaresult,itusesmorememorywhiletheADRusesnomemory.

Chapter7:ARMPipelineandCPUEvolutionThis chapterwill look at pipeline evolution inARMwhile examining otherCPU

enhancements. In Section 7.1 the ARM’s pipelines are studied. Section 7.2 exploresvariousprocessorsenhancements.

Section7.1:ARMPipelineEvolutionThere aremanyways available to processor designers to increase the processing

poweroftheCPU.HerewelistsomeofthemthatareusedinARM.

1. Increase theclockfrequencyof thechip.Onedrawbackof thismethod is that thehigher the frequency, the more the power dissipation and the more difficult andexpensivethedesignofthemicroprocessorandmotherboard.

2.Increasethenumberofdatabusestobringmoreinformation(codeanddata)intotheCPU to be processed. For example, Von Neumann architecture in ARM7 has beenreplacedbyHarvardarchitectureinnewerversions.SeeChapter6.

3. Change the internal architecture of the CPU to overlap the execution of moreinstructions. This requires a lot of transistors. There are two trends for this option,superpipelineandsuperscalar.Insuperpipelining,theprocessoffetchingandexecutinginstructionsissplitintomanysmallstepsandallaredoneinparallel.Inthiswaytheexecution of many instructions is overlapped. The number of instructions beingprocessedatagiventimedependsonthenumberofpipelinestages,commonlytermedthe pipeline depth. Some designers use as many as 8 stages of pipelining. Onelimitationofsuperpipeliningisthatthespeedoftheexecutionislimitedtothesloweststage of the pipeline. Compare this to making pizza. You can split the process ofmakingpizzaintomanystages,suchasflatteningthedough,puttingonthetoppings,andbaking,buttheprocessislimitedtothesloweststage,baking,nomatterhowfastthe rest of the stages areperformed.Whathappens ifweuse twoor threeovens forbakingpizzas tospeedup theprocess? Thismayworkformakingpizzabutnot forexecutingprograms,sinceintheexecutionofinstructionswemustmakesurethatthesequenceof instructions iskept intact and that there isnoout-of-stepexecution.Thedifficultiesassociatedwithastalledpipeline(aslowdowninonestageofthepipeline,which prevents the remaining stages from advancing) has made CPU designersabandonsuperpipelininginfavorofsuperscaling.Insuperscaling,theentireexecutionunit has beendoubled and eachunit has 5 pipeline stages.Therefore, in superscalar,there is more than one execution unit and each has many stages, rather than oneexecution unit with 8 stages as in the case of a superpipelined processor. In somesuperscalarprocessors,therearetwoexecutionunitseachwith4pipelinestagesinsteadofasingleexecutionunitwith8pipelinestagesassuperpipeliningproponentswouldhave it. Inotherwords, insuperscalingwehave two(oreven three)executionunitsandastheinstructionsarefetchedtheyareissuedtothevariousexecutionunits.Usingtheanalogyofpizza,superscalarislikedoublingortriplingtheentirecrewflatteningthedough,puttingtoppingson,andbaking.Ofcourse,youwillneedalotmorepeopleinvolvedintheprocessandyouhavetohavemoreovens,butatthesametimeyouaredoublingortriplingthepizzaoutput.Incasesofrecentmicroprocessorarchitecture,avastmajorityofdesignershavechosensuperscalingoversuperpipelining.Thisrequiresnumeroustransistorstoduplicateseveralexecutionunits,justlikeneedingmorepeoplein our pizza-making analogy. Fortunately, advances in IC design have alloweddesigners access to hundreds of million transistors to throw around for the

implementationofpowerfulsuperscaling.Therearesomeproblemswithsuperscaling,suchasdatadependencyissues,whichcanbesolvedbythecompiler.

4.Combiningmorethanonecoreinasingleprocessorisanotherwayofimprovingthespeed of high end processers. Cortex-A series ofARM supports up to four cores incombinationwitheachother.

Next,wewillexaminetheissueofpipelining.SeeFigure7-1.

Figure7-1:Non-PipelinedInstructionExecutionvs.2-stagePipeline(8086)

Moreaboutpipelining

IntheearlyCPUstherewasnopipelining.Atanygivenmoment,iteitherfetchedorit executed. It couldnotdobothat the same time. In thenon-pipelinedCPU,while thebuseswerefetchingtheinstructions(opcodes)anddata,theCPUwassittingidle,andinthesameway,whentheCPUwasexecutinginstructions,busesweresittingidle.However,intheearlypipelinedCPUsuchas8086thefetchandexecutewereperformedinparallelby two sections inside the CPU called the BIU (bus interface unit) and EU (executionunit).

For the concept of pipelining to work, we need a buffer or queue in which aninstructionisprefetchedandreadytobeexecuted.Insomecircumstances,theCPUmustflush out the queue. For example, when a branch (B, BNE, BCS, and so on) or callinstructionisexecuted,theCPUstartstofetchcodesfromthenewmemorylocationandthecodeinthequeuethatwasfetchedpreviouslyisdiscarded.Inthiscase,theexecutionunit must wait until the fetch unit fetches the new instruction. This is called a branchpenalty.Thepenalty isanextra instructioncycle to fetch the instruction from the targetlocation insteadof executing the instruction rightbelow thebranch.Remember that theinstructionbelowthebranchhasalreadybeenfetchedand isnext in line tobeexecutedwhentheCPUbranchestoadifferentaddress.NotethatnewerCPUshavemorestagesintheir pipeline. For example a 3 stage pipelinemay divide the code execution to Fetch,

Decode and Execute stages.When the number of stages in a pipeline increases, morestagesshouldbeflushedoutwhenabranchinstructionisexecuted.ExamineExample7-1toseehowbranchpenaltyslowsdowntheexecutionofacode.Next,youwill seehowbranchpredictionsolvestheproblem.

Example7-1

Howmanycyclesdoesittakefora3stagepipelinedCPUtorun3iterationofthefollowingcode?

MOVR1,#0

L1ADDR2,R2,#1

BL1

MOVR3,#3

MOVR4,#4

Solution:

For the first instruction (MOV R1,#0), it takes 3 cycles to pass through the stages ofpipelineandbeexecuted.Afterthethirdcycle,oneinstructionisexecutedineachcycle.WhentheBranchinstructionisexecutedincycle5,theCPUflushesthepipelinebecausethe fetch and decode instruction in cycles 4 and 5 are not needed. It causes two clockcyclesbranchpenalty.ThesamescenariohappenseachtimetheCPUexecutesabranch.Aswecanseeinthefigure,Ittakes13cyclestorun7instructions.

Branchprediction

Branch prediction is another new feature of the new CPUs. The penalty forbranchingisveryhighforahigh-performancepipelinedmicroprocessorsuchastheARM.Forexample, inthecaseoftheBNE(branchifnotequal) instruction, if itbranches, the

pipelinemustbeflushedandrefilledwithinstructionsfromthetargetlocation.Thistakestime.Incontrast,theinstructionimmediatelybelowtheBNEisalreadyinthepipelineandisadvancingwithoutdelay.Someprocessorshave thecapability topredict andprefetchcodefrombothpossible locationsandhavethemadvancedthroughthepipelinewithoutwaiting (stalling) for the outcome of the zero flag. The ability to predict branches andavoidthebranchpenaltycanresultinasubstantialreductionintheclockcountforagivenprogram. Some CPUs have branch prediction, but with greater capability. When itencountersbranchinstructions(suchasBNE),itcreatesalistoftheminwhatiscalledthebranchtargetbuffer(BTB).TheBTBpredictsthetargetofthebranchandstartsexecutingfromthere.Whenthebranchisexecuted,theresultiscomparedwithwhatthepredictionsection of the CPU said it would do. If they match, the branch is retired. If not, allinstructions behind the branch are removed from the pool and the correct branch targetaddressisprovidedtotheBTB.FromtheretheBTBrefillsthepipelinewithinstructionsfromthenewtargetaddress.SeeExample7-2.

Example7-2

Showhowmanycyclesdoesittakefora3-stagepipelinedCPUtorun3iterationofthecodeinExample7-1?Assumethatthebranchpredictionunithaspredictedallbranches.

Solution:

Ittakes9cyclestorun7instructions.Incycles4to8instructionsarepredictedbybranchpredictionunit.

Notethatstoresareneverperformedspeculativelysincethereisnotransparentwaytoundo them.Storesarealsonever re-orderedamong themselves.Astore isdispatchedonly when both the address and the data are available and there are no older storesawaitingdispatch.

3-stagepipelineinARM7

Sincetheintroductionofthe8086microprocessorin1978,processordesignershavecometorelymoreandmoreontheconceptofpipeliningtoincreasetheprocessingpower

oftheCPU.ARM7usedtheconceptofpipeliningwiththreestagesoffetch,decode,andexecute.SeeExample7-3.

Example7-3

ShowhowthefollowingcodeisexecutedinARM7.

MOVR4,R5

ADDR1,R2,R3

SUBR6,R7,R8

Solution:

5-stagepipelineinARM9

AswementionedearliertheARM7hasa3-stagepipeline.AsshowninFigure7-2,theARM9hasextendedthepipelineto5stages.Theyare:

1.Fetch

2.Decode

3.Execute

4.Memory

5.Write

Figure7-2:5-StagePipelineinARM9

Fetch:IntheFetchstagetheinstructionsarefetchedfrommemoryandplacedinthequeueandwaittobedecoded.InthisstagetheProgramCounter(PC)isalsoincrementedby4sincetheARMinstructionsare4-bytes.

Decode: In thedecodestage the instruction isdecodedand theregister file isalso

accessedtogeteverythingreadyfortheExecutestage.

Execute:Inthisstage,anyeffectiveaddresscalculationandsignextendingofabyteor a half-word are done. In the instructions such as Load and Store this stage getseverything ready for the next stage of memory access. In instructions such as “ADDR1,R2,R3”inwhichalltheresourcesneededfortheexecutionoftheinstructionarereadybeforeitcomestothisstage,theregistersareaddedanditgoesdirectlytothewrite-backstagetowritetheresulttotheregisterfile.

Memory: For instructions such as Load and Store in which external memoryaccessesareneeded,thememorystagefetchesthedatafromtheexternalmemoryandhasthedatainsidetheCPUreadyforthenextstageofthewrite-back.Ifaninstructiondoesnotneedtoaccessmemory, thisstageisbypassedandtheresult isforwardedtothelaststageofwrite-back.Thismeansitbecomesa4-stagepipeline.

Write:Alsocalledwrite-backisthestageinwhichtheinstructioniscompletedbywriting the result to the register fileand retiring the instruction.Aswe just stated, ifaninstructiondoesnotneedtoaccessmemory,write-backisthestagerightaftertheexecutestage,meaningformanyinstructionswereallyhave4-stagepipeline.

3-stagevs.5-stagepipeline

Inthe3-stagepipelineofARM7,theexecution,thememoryaccess,andthewritingtheresulttoregisterfileareallperformedbytheExecutestage.Inthe5-stagepipeline,theCPUdecouplesthememoryaccessandexecutestage.Withthetwonewstagesofmemoryandwrite-back, theARM9 increases the processing power of theCPUby allowing theCPUtoworkconcurrentlyon5instructionsinsteadof3instructionsatagiventime.ThisisamajorenhancementoftheARM9overARM7.

ReviewQuestions

1.Whatissuperpipelining?

2.Whatisthelimitationofsuperpipeline?

3.Whatissuperscaling?

4.Trueorfalse.The5-stagepipelinehasbetterperformancethanthe3-stagepipeline.

5.Givethenamesofthe5-stagepipelineinARM9.

Section7.2:OtherCPUEnhancementsThere aremany otherways available tomicroprocessor designers to increase the

processingpoweroftheCPU.Next,weexaminesomeofthem.

SuperscalarCPUs

AnotheruniquefeatureofthemanyofnewCPUsisitssuperscalararchitecture.AlargenumberoftransistorsareusedtoputmorethanoneexecutionunitinsidetheCPU.Astheinstructionsarefetched,theyareissuedtotheseexecutionunits.Figure7-3showstheconceptofsuperscalar.

Figure7-3:SuperscalarCPUs

Issuingtwoinstructionsatthesametimetodifferentexecutionunitscanworkonlyiftheexecutionofonedoesnotdependontheotherone,inotherwords,ifthereisnodatadependency.Asanexample,lookatthefollowinginstructions.

ADDR1,R2,R3

SUBR4,R1,R5

ANDR6,R7,R8

MOVR9,R10

Intheabovecode,theADDandSUBinstructionscannotbeissuedtotwoexecutionunitssinceR1,thedestinationofthefirstinstruction,isusedimmediatelybythesecondinstruction.Thisiscalledread-after-writedependencysincetheSUBinstructionwantstoreadtheR1contents,but itmustwaituntilafter theADDisfinishedwritingit intoR1.TheproblemisthatADDwillnotwriteintoR1untilthelaststageofthepipeline,andbythen it is too late for the pipeline of the SUB instruction. This prevents the SUBinstructionfromadvancinginthepipeline,thereforecausingthepipelinetobestalleduntiltheADDfinisheswritingandthentheSUBinstructioncanadvancethroughthepipeline.ThiskindofdatadependencyraisestheclockcountfortheSUBinstruction.Whatiftheinstructionsarerescheduled?Wewilldiscussoutoforderexecutionnextinthischapter.

ADDR1,R2,R3

ANDR6,R7,R8

SUBR4,R1,R5

MOVR9,R10

If they are rescheduled as shownabove, each canbe issued to separate executionunits,allowingparallelexecutionofbothinstructionsbytwodifferentunitsoftheCPU.Since the clock count for each instruction is one, having two execution units leads toexecuting two instructionsbypairing themtogether, therebyusingonlyoneclockcountfor two instructions. In the case of the above program, if it is run on the CPU withsuperscalar it will take only 2 clocks instead of 4, assuming that two instructions arepaired together. This reordering of instructions to take advantage of the two internalexecutionunitsoftheCPUcanbedonebycompilerorCPU itselfandiscalledinstructionscheduling.Currently,somecompilersarebeingequippedtodoinstructionschedulingtoremovedependencies.Theprocessofissuingtwoinstructionstothetwoexecutionunitsiscommonlyreferredtoasinstructionpairing.

Superpipelinedandsuperscalar

Somemicroprocessors use a 10-stage pipeline for the CPU. In contrast to the 5-pipestage,althougheachpipestageofthe10-pipestageperformslesswork,therearemorestages. This means that in such processors, more instructions can be worked on andfinished at a time. These CPUs with their 10- or 12-stage pipeline are referred to assuperpipelined. Since they also have multiple execution units capable of working inparallel,theyarealsosuperscalar.Anotheradvantageofthesuperpipelinedconceptisthatitcanachieveahigherclockrate(frequency)withthegiventransistor technology.Theyalso usewhat is called out-of-order execution to increase the performance of theCPU.Thisisexplainednext.

Decouplingandout-of-orderexecution

InCPUarchitecture,whenoneof thepipelinestages isstalled, thepriorstagesoffetchanddecodearealsostalled.Inotherwords,thefetchstagestopsfetchinginstructionsif the execution stage is stalled, due, for example, to a delay in memory access. Thisdependency of fetch and execution has to be resolved in order to increase CPUperformance.ThatisexactlywhatmanydesignershavedonewiththeCPUandiscalleddecoupling the fetch and execution phases of the instructions. In these processors,instructionsarefetchedfrommemoryandplacedintoapoolcalledthe instructionpool.SeeFigure 7-4.

Figure7-4:CPUInstructionExecution

Thisfetch/decodeoftheinstructionsisdoneinthesameorderastheprogramwas

codedbytheprogrammer(orcompiler).However,whentheyareplacedintheinstructionpool theycanbeexecuted inanyorderas longas thedataneeded isavailable. Inotherwords,ifthereisnodependency,theinstructionsareexecutedoutoforder,notinthesameorder as the programmer coded them. Such a speculative execution can go 20–30instructionsdeepintotheprogram.Itisthejoboftheretireunittoprovidetheresultstothe programmer’s (visible) registers (e.g., R0, R1) according to the order in which theinstructionswerecoded.Again,itisimportanttonotethattheinstructionsarefetchedinthesameorderthattheywerecoded,butexecutedoutoforderifthereisnodependency,andultimatelyretiredinthesameorderastheywerecoded.Thisout-of-orderexecutioncanboostperformanceinmanycases.LookatExample7-4.

Example7-4

ThefollowingARMcode(a)setsthepointerforthreedifferentarrays,andthecountervalue,(b)getseachelementofARRAY_1,addsafixedvalueof100toit,andstorestheresultinARRAY_2,and(c)complementstheelementandstoresitinARRAY_3.Analyzetheexecutionofthecodeinlightoftheout-of-orderexecutionandbranchpredictioncapabilitiesofanARMCPU.

(i1)LDRR1,=ARRAY_1;loadpointer



(i4)MOVR4,#COUNT;loadthecounter

(i5)AGAINLDRR5,[R1];loadtheelement

(i6)ADDR5,R5,#100;addthefixvalue

(i7)ADDR1,R1,#4;updatethepointer

(i8)STRR5,[R2];storetheresult


(i10)MVNR5,R5;complementtheresult

(i11)STRR5,[R3];andstoreit


(i13)SUBSR4,R4,#4;

(i14)BNEAGAIN;stayintheloop

(i15)

Solution:

Thefetch/decodeunitfetchesandputsinstructionsintothepool.Sincethereisnodependencyforinstructionsi1throughi5,theyaredispatched,executed,andretiredexceptfori5.Noticethatthepointervaluesofi1toi4areimmediatevalues;therefore,theyareembeddedintotheinstructionwhenthefetch/decodeunitgetsthem.Nowi5isamemoryfetchthatcantakemanyclocks,dependingonwhethertheneededdataislocatedincacheormainmemory.Meanwhilei6,i8,i10,andi11mustwaituntilthedataisavailable.Howeveri7,i9,andi12canbeexecutedoutoforder.Moreimportantly,theBNEinstructionispredictedtogotothetargetaddressofAGAINandi5,i6,…aredispatchedoncemoreforthenextiteration.Thistimethememoryfetchwilltakeveryfewclockssinceinthepreviousdatafetch,theCPUreadsomebytesofdataintothecache.ThisprocesswillgoonuntilthelastroundofloopingwhereR4becomeszeroandfallsthrough.Atthistime,duetomisprediction,alltheinstructionsbelongingtoinstructionsi5,i6,i7,…(startoftheloop)areremovedandthewholepipelinerestartswithinstructionsbelongingtoi15,i16,andsoon.

Due to the fact that memory fetches (due to cachemisses) can takemany clockcyclesandresultinunderutilizationoftheCPU,out-of-orderexecutionisawayoffindingsomething to do for theCPU.Simply put, the idea of out-of-order execution is to lookdeepintothestreamofinstructionsandfindtheonesthatcanbeexecutedaheadofothers,providedthatresourcesareavailable.Again,it isimportanttonotethattheseprocessorswillnotimmediatelyprovidetheresultsofout-of-orderexecutionstoprogrammer-visibleregisterssuchasR0,R1,andsoon,sinceitmustmaintaintheoriginalorderofthecode.Instead,theresultsofout-of-orderexecutionsarestoredinthepoolandwaittoberetiredinthesameorderastheywerecoded.Therefore,programmer-visibleregistersareupdatedinthesamesequenceasexpectedbytheprogrammer.

Registerrenaming

Therearesomecases inwhich instructionsarenotreallydependentoneachotherbutthereisakindofimplicitdependencycalledregisterdependency.SeeExample7-5.

Example7-5

Forthefollowingcode,indicatetheinstructionsthatcanbeexecutedinparalleloroutoforder.

(i1)LDRR4,[R2];loadR4frommemorypointedtobyR2

(i2)ADDR3,R4,R7;R4+R7–>R3

(i3)ADDR6,R8,R10;R8+R10–>R6

(i4)SUBR5,R1,R9;R1-R9–>R5

(i5)ADDR6,R12,#1;R12+1–->R6

Solution:

Instructioni2cannotbeexecuteduntilthedataisbroughtinfrommemory(eithercacheormainmemoryDRAM).Therefore,i2isdependentoni1andmustwaituntiltheR4registerhasthedata.However,instructionsi3andi4canbeexecutedoutoforderandparallelwitheachothersincethereisnodependencyamongthem.Noticethati5isnotreallydependantoni3becausei5doesnotuseanyofdatageneratedbyi3.Buti5andi3cannotbeexecutedoutoforderorinparallelbecauseR6ismodifiedbybothofi3andi5.Thiskindofdependencyiscalledregisterdependencyandissolvedbyamethodcalledregisterrenaming.

InthefollowingcodenoneoftheinstructionscanbeexecutedinparallelbecauseofusingR1inallinstructions:

MOVR1,#5

ADDR3,R1,#2

MOVR1,#6

ADDR4,R1,#2

Ifyouexaminetheabovecodecarefullyyouwillseethatthefirsttwolinesofcodeare independent from the second two lines of code and we can remove the implicitdependencybychangingR1toanotherregistersuchasR2inthelasttwolinesofcode:

MOVR1,#5

ADDR3,R1,#2

MOVR2,#6

ADDR4,R2,#2

Renaming the registersbefore issuing the instructions toexecutionunit isdone inmanyofnewadvancedCPUanditiscalledregisterrenaming.

PuttingthemalltogetherinanARMCPU

InFigure7-5youcanseeatop-leveldiagramoftheARMCortexA9processor.Ithasmostofthepartsdiscussedinthischapter.

Figure7-5:Top-leveldiagramoftheARMCortexA9processor

Busfrequencyvs.internalfrequencyinCPU

Frequently you may see an advertisement for a 1-GHz or 2-GHz CPUs. It isimportanttonotethatthestatedfrequencyistheinternalfrequencyoftheCPUandnotthebusfrequency.Thisisduetothefactthatdesigninga1-GHzmotherboardisverydifficultandexpensive.Suchadesignrequiresaveryfastlogicfamilyandmemoryinadditiontoamassive simulation to avoid crosstalk and signal radiation. The bus frequency for suchsystemsiscurrentlylessthan1GHz.

ReviewQuestions

1.Trueorfalse.TheARMinstructionsetisintriadicform.

2.Trueorfalse.ThebranchpredictiontaskisperformedbycircuitryinsidetheCPU.

3.WhyaresomeCPUscalledasuperscalarprocessor?

4.Instructionschedulingisdoneby________.

5.Outoforderexecutionisarrangedby________.

ProblemsSection7.1:ARMPipelineEvolution

1.TheARM7usesapipelineof_______stages.

2.GivethenamesofthepipelinestagesintheARM7

3.TheARM9usesapipelineof_______stages.

4.GivethenamesofthepipelinestagesintheARM9

Section7.2:OtherCPUEnhancements

5.Thenumberofpipelinestagesinasuperpipelinesystemis________(less,more)thaninasuperscalarsystem.

6.Whichhasoneormoreexecutionunits,superpipelineorsuperscalar?

7.Whichpartofon-chipcacheintheARMiswriteprotected,dataorcode?

8.Whatisinstructionpairing,andwhencanithappen?

9.Whatisdatadependency,andhowisitavoided?

10.Trueorfalse.Instructionsarefetchedaccordingtotheorderinwhichtheywerewritten.

11.Trueorfalse.Instructionsareexecutedaccordingtotheorderinwhichtheywerewritten.

12.Trueorfalse.Instructionsareretiredaccordingtotheorderinwhichtheywerewritten.

13.ThevisibleregistersR0,R1,andsoon,areupdatedbywhichunitoftheCPU?

14.Trueorfalse.Amongtheinstructions,STRs(store)areneverexecutedoutoforder.


1.Insuperpipelining,theprocessoffetchingandexecutinginstructionsissplitintomanysmallstepsandallaredoneinparallel.Inthiswaytheexecutionofmanyinstructionsisoverlapped.

2.Thespeedoftheexecutionislimitedtothesloweststageofthepipeline.

3.Insuperscaling,theentireexecutionunithasbeendoubled

4.True

5.Fetch,decode,execute,memory,writeback

Section7.2

1.True

2.True

3.Sinceithastwoexecutionunits(pipelines)capableofexecutingtwoinstructionswithoneclock

4.circuitryinsidetheCPUandthecompiler

5.theCPU

AppendixA:ARMCortex-M3InstructionDescription

SectionA.1:ListofARMCortex-M3InstructionsADCAddwithCarry

ADCSAddwithCarry(andupdatetheflags)

ADDADD

ADDSADDandupdatetheflags

ADRLoadPC-RelativeAddress

ANDLogicalAND

ANDSLogicalAND(updateflags)

ASRArithmeticShiftright

ASRSArithmeticShiftright(updatetheflags)

BBranch(unconditionaljump)

BxxBranchConditional

BFCBitFieldClear

BFIBitFieldInsert

BICBitClear

BICSBitClear(updateflags)

BKPTBreakpoint

BLBranchwithLink(thisisCallinstruction)

BLXBranchIndirectwithLink

BXBranchIndirect(BXLRisusedforReturn)

CBNZCompareandBranchonNon-Zero

CBZCompareandBranchonZero

CDPCoprocessorDataprocessing

CLREXClearExclusive

CLZCountLeadingZero

CMNCompareNegative

CMPCompare

CPSIDChangeprocessorIDandDisableInterrupt

CPSIEChangeProcessorStateandEnableInterrupt

DMBDataMemoryBarrier

DSBDataSynchronizationBarrier

EORExclusiveOR

EORSExclusiveORandupdatetheflags

ISBInstructionSynchronizationBarrier

ITIf-ThenConditionBlock

LDCLoadCoprocessor

LDMLoadMultipleregisters

LDMDBLoadMultipleregistersandDecrementBeforeeachaccess

LDMEALoadMultipleregistersfromEmptyAscending

LDMFDLoadMultipleregistersFullDescending

LDMIALoadMultipleregistersandIncrementaftereachAccess

LDRLoadRegister

LDRRx,=ValueLoadRegisterwith32-bitvalue

LDRBLoadRegisterByte

LDRBTLoadRegisterBytewithTranslation

LDREX,LDREXB,LDREXHLoadRegisterExclusive

LDRHLoadRegisterHalfword

LDRSBLoadRegistersignedByte

LDRSHLoadRegisterSignedHalfword

LDRTLoadRegisterwithTranslation

LSLLogicalShiftLeft




MCRMovetoCoprocessorfromARMRegister

MLAMultiplyAccumulate

MLSMultiplyandSubtract

MOVMove(ARM7)

MOVMove(ARMCortex)

MOVSMove(andupdateflags)

MOVTMoveTop

MOVWMove16-bitconstant

MRCMovetoARMRegisterfromCoprocessor

MRSMovetogeneralRegisterfromSpecialregister

MSRMovetoSpecialregisterfromgeneralRegister

MULUnsignedMultiplication

MVNMoveNegative

MVNSMoveNegativeandupdatetheflags

NOPNoOperation

ORNLogicalORNot

ORNSORNotandupdateflags

ORRLogicalOR

ORRSLogicalORandupdatetheflags

POPPOPregisterfromStack

PUSHPUSHregisterontostack

RBITReverseBits

REVReversebyteorderinaword

RV16Reversebyteorderin16-bit

REVSHReversebyteorderinbottomhalfwordandsignextend

RORRotateRight




RSBReverseSubtract

RSBSReverseSubtractandupdatetheflags

SBCSubtractwithCarry(Borrow)

SBCSSubtractwithCarry(Borrow)andupdatetheflags

SBFXSignBitFieldextract

SDIVSignedDivide

SEVSendEvent

SMLALSignedMultiplyAccumulateLong

SMULLSignedMultiplyLong

SSATSignSaturate

STMStoreMultiple

STMDBStoreMultipleregisterandDecrementBefore

STMEAStoreMultipleregisterEmptyAscending

STMIAStoreMultipleregisterEmptyAscending

STMFDStoreMultipleregisterFullDescending

STRStoreRegister

STRBStoreRegisterByte

STRBTStoreRegisterBytewithTranslation

STRDStoreRegisterDouble(twowords)

STREX,STREXB,STREXHStoreRegisterExclusive

STRHStoreRegisterHalfword

STRTStoreRegister

SUBSubtract

SUBSSubtract

SVCsupervisorCall(SoftwareInterrupt)

SXTBSignExtendbyte

SXTHSignExtendHalfword

TBBTableBranchByte

TBHTableBranchhalfword

TEQTestEquivalence

TSTTest

UBFXUnsignedBitfiledextract

UDIVUnsignedDivide

UMLALUnsignedMultiplywithAccumulate

UMULLUnsignedMultiplyLong

USATUnsignedSaturate

UXBTZeroextendabyte

UXTHZeroextendhalfword

WFEWaitforevent

WFIWaitforinterrupt

SectionA.2:ARMCortex-M3InstructionDescription

ADCAddwithCarry

Flags:Unaffected.

Format:ADCRd,Rn,Op2;Rd=Rn+Op2+C

Function:IfC=1priortothisinstruction,thenafterexecutionofthisinstruction,Op2isaddedtoRnplus1andtheresultisplacedinRd.IfC=0,Op2isaddedtoRnplus0.Usedwidelyinmultiwordadditions.Aftertheexecutiontheflagsarenotupdated.TheADCSinstructionupdatestheflags.

Example1:

LDRR0,=0xFFFFFFFB;R0=0xFFFFFFFB

LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF

MOVR2,#3;R2=3

MOVR3,#4;R3=4

ADDSR4,R0,R1;R4=R0+R1,C=1

ADCR5,R2,R3;R5=R2+R3+C=R2+R3+1

ADCSAddwithCarry(andupdatetheflags)

Flags:Affected:N,Z,V,C.

Format:ADCRd,Rn,Op2;Rd=Rn+Op2+C

Function:IfC=1priortothisinstruction,thenafterexecutionofthisinstruction,Op2isaddedtoRnplus1andtheresultisplacedinRd.IfC=0,Op2isaddedtoRnplus0.Usedwidelyinmultiwordadditions.NoticetheSindicatestheflagswillbeupdated.

Example1:

LDRR0,=0xFFFFFFFB;R0=0xFFFFFFFB


MOVR2,#3;R2=3

MOVR3,#4;R3=4

ADDSR4,R0,R1;R4=R0+R1,C=1

ADCSR5,R2,R3;R5=R2+R3+C=3+4+1,Z=0,N=0,C=0,

ADDADD

Flags:Unaffected

Format:ADDRd,Rn,Op2;Rd=Rn+Op2

Function:Addssourceoperandstogetherandplacestheresultindestination.Thiswillnotupdatetheflags.ToupdatetheflagswemustuseADDS.

Example1:

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFB

MOVR1,#0x5;R1=0x5

ADDR2,R0,R1

;R2=R0+R1=0xFFFFFFFFB+0x5=00000000

;flagsunchanged

Example2:


ADDR2,R0,#0xF1

;R2=R0+0xF1=R1=0xFFFFFFFF+0xF1=000000F0

;flagsunchanged

ADDSADDandupdatetheflags

Flags:Affected:N,Z,V,C.

Format:ADDSRd,Rn,Op2;Rd=Rn+Op2andupdatetheflags

Function:Addsourceoperandstogetherandplacestheresultindestinationandupdatetheflags.Next,weexaminethecasesofsignedandunsignednumbers.

Unsignedaddition:

Inadditionofunsignednumbers,thestatusofC,Z,N,andVmaychange,butonlyCandZareofanyusetoprogrammers.ThemostimportantoftheseflagsisC.Itbecomes1whenthereiscarryfromD31outina32-bit(D0–D31)operation.

Example1:

MOVR1,#0x45;R1=0x45

ADDSR1,R1,#0x4F;R1=0x94(0x45+0x4F=0x94),C=0,Z=0

Example2:

MOVR2,#0xFE;R2=0xFE

MOVR3,#0x75;R3=0x75

ADDSR4,R2,R3;R4=0xFE+0x75=0x73,C=0,Z=0

Example3:

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFB

MOVR1,#0x01;R1=0x1

ADDSR2,R0,R1

;R2=R0+R1=0xFFFFFFFFF+0x1=00000000,C=1,Z=1

Example4:

LDRR0,=0xFFFF126F;R0=0xFFFF126F

LDRR1,=0xFFFF46D4;R1=0xFFFF46D4

ADDSR2,R0,R1

;R1=0xFFFF126F+0xFFFF46D5=0xFFFF5943,C=1,Z=0

Signedaddition:

Inadditionofsignednumbers,thestatusofV,Z,andNmustbenoted.Specialattentionshouldbegiventotheoverflowflag(V)sincethisindicatesifthereisanerrorintheresultoftheaddition.TherearetworulesforsettingVinsignednumberoperation.Theoverflowflagissetto1:

1.IfthereisacarryfromD30toD31andnocarryfromD31outina32-bitoperation

2.IfthereisacarryfromD31outandnocarryfromD30toD31ina32-bitoperation

NoticethatifthereisacarrybothfromD31outandfromD30toD31,thenV=0in32-bitoperations.

Example5:

MOVR1,#+8;R1=0x00000008

MOVR2,#+4;R2=0x00000004

ADDSR3,R1,R2;R3=0x0000000CN=0,V=0,C=0

NoticeN=D31=0sincetheresultispositiveandV=0sincethereisneitheracarryfromD30toD31noranycarrybeyondD31.SinceV=0,theresultiscorrect[(+8)+(+4)=(+12)].

Example6:

LDRR1,=0x+42FFFFFF;R1=0x42FFFFFF,(apositivenumber)

LDRR2,=0x+45FFFFFF;R2=0x45FFFFFF(apositivenumber)

ADDSR3,R2,R1;R3=0x88FFFFFE,C=0,N=1,Z=0,andV=1

InExample6,thecorrectresultshouldbe+,buttheresultis–.TheV=1isanindicationofthiserror.NoticethatN=D31=1sincetheresultisnegative;V=1sincethereisacarryfromD30toD31andC=0.

Example7:

LDRR0,=-12;R0=0xFFFFFFF4

LDRR1,=+17;R1=0x00000011

ADDSR2,R1,R2;R2=00000005(whichis+5andcorrect)

;N=0,Z=0,V=0,andC=1

NoticethatV=0sincethereisacarryfromD30toD31andacarryfromD31out.

Example8:

LDRR0,=-30;R0=0xFFFFFFE2

LDRR1,=+14;R2=0x0000000E

ADDSR2,R1,R0;R2=FFFFFFF0(whichis-16andcorrect)

;N=1,Z=0,V=0,andC=0

V=0sincethereisnocarryfromD31outnoranycarryfromD30toD31.

Example9:

LDRR1,=-126;R1=0xFFFFFF82

LDRR2,=-127;R2=0xFFFFFF81

ADDSR3,R2,R1;R3=0xFFFFFF03

;(whichis-253andcorrect),N=1,Z=0andV=0,C=1

V=0sincethereiscarryfromD31outandcarryfromD30toD31.

ADRLoadPC-RelativeAddress

Flags:Unaffected:

Format:ADRRd,label;Rd=addressoflabel

Function:ThisallowsloadingintoRdregisteranaddressrelativetothecurrentPC(programcounter).Thelabeltargetaddressmustbewithinthe-4,095to+4,096bytesfromtheaddressinPCregister.Thatisnofartherthan1024instructionsineitherdirectionofbackwardorforward.

Example:

ADRR3,MyMessage

HEREBHERE

MyMessageDCB“Hello”

ANDLogicalAND

Flags:Unaffected

Format:ANDRd,Rn,Op2;Rd=RnANDedOp2

Function:PerformslogicalANDontheoperands,bitbybit,storingtheresultinthedestination.Thiswillnotupdatetheflags.ToupdatetheflagswemustuseANDS.NoticethatCflagisupdatedduringcalculationofOp2whenLSRorLSLareused.

Inputs Output

X Y XANDY

0 0 0

0 1 0

1 0 0

1 1 1

Example1:

MOVR0,#0x39;R0=0x39

MOVR1,#0x0F;R1=0x0F

ANDR2,R1,R0;R2=09

;3900111001

;0F00001111

;—–––

;0900001001Flagsunchanged

Example2:

MOVR0,#0x37;R0=0x37

ANDR1,R0,#0x0F;R1=R0ANDed0x0F=07

;3700110111

;0F00001111

;—–––

;0700000111Flagsunchanged

ANDSLogicalAND(updateflags)

Flags:Affected:N,Z,C

Format:ANDRd,Rn,Op2;Rd=RnANDedOp2

Function:PerformslogicalANDontheoperands,bitbybit,storingtheresultinthedestination.Thiswillalsoupdatetheflags.NoticethatCflagisupdatedduringcalculationofOp2whenLSRorLSLareused.

Inputs Output

X Y XANDY

0 0 0

0 1 0

1 0 0

1 1 1

Example1:

MOVR0,#0x39;R0=0x00000039

MOVR1,#0x0F;R1=0x0000000F

ANDSR3,R1,R0;R3=0x00000009

;3900111001

;0F00001111

;—–––

;0900001001Z=0,N=0,C=unchanged

Example2:

MOVR1,#0x39;R1=0x00000039

MOVR2,#0xC6;R2=0x000000C6

ANDSR3,R1,R2;R3=00000000

;3900111001

;C611000110

;—–––

;0000000000Z=1,N=0

Example3:

LDRR2,=0xFFFFFF82

LDRR3,=0xFFFFFF81

ANDSR4,R2,R3;R4=0xFFFFFF80,Z=0,N=1

Example4:

LDRR1,=0x55555555

LDRR2,=0xAAAAAAAA

ANDSR3,R2,R1;R3=00000000Z=1,N=0

ASRArithmeticShiftright

Flags:Unaffected.ExceptC

Format:ASRRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsfilledwiththesignbit(MSB).ThenumberofbitstobeshiftedrightisgivenbyRnandtheresultisplacedinRdregister.Theflagsareunchanged.ToupdatetheflagsuseASRSinstruction.

Example1:

LDRR2,=0xFFFFFF82

ASRR0,R2,#6;R0=R2isshiftedright6times

;now,R0=0xFFFFFFFE

Example2:

LDRR0,=0x2000FF18

MOVR1,#12

ASRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.

;now,R2=0x0002000F

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

ASRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x00000000

ASRarithmeticshiftisusedforsignednumbershifting.ASRessentiallydividesRmbyapowerof2foreachbitshift.

ASRSArithmeticShiftright(updatetheflags)

Flags:N,Z,andC.

Format:ASRRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsfilledwiththesignbit(MSB).ThenumberofbitstobeshiftedrightisgivenbyRnandtheresultisplacedinRdregister.TheASRSupdatestheflags.

Example1:

LDRR2,=0xFFFFFF82

ASRSR0,R2,#6;R0=R2isshiftedright6times.

;now,R0=0xFFFFFFFE,C=0,N=1,Z=0

Example2:

LDRR0,=0x2000FF18

MOVR1,#12

ASRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.

;now,R2=0x0002000F,C=1,N=0,Z=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

ASRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.

;now,R2=0x00000000,C=1,N=0,Z=1

ASRSarithmeticshiftisusedforsignednumbershifting.ASRSessentiallydividesRmbyapowerof2foreachbitshift.

BBranch(unconditionaljump)

Flags:Unchanged.

Format:Btarget;jumptotargetaddress

Function:Thisinstructionisusedtotransfercontrolunconditionallytoanewaddress.ThedifferencebetweenBandBListhattheBLinstructionsavestheaddressofthenextinstructiontoLR(thelinkregister,R14).ForARM7,thetargetaddressiscalculatedby(a)shiftingthe24-bitsigned(2’scomp)offsetlefttwobits,(b)sign-extendtheresultto32-bit,and(c)addittocontentsofPC(programcounter).Thismeansthetargetaddresscouldbewithinthe–32Mbytesto+32Mbytesofaddressspacefromthecurrentprogramcounter.ForARMCortexM3.thetargetaddressmustbewithin–16MBto+16MBaddressspacefromcurrentinstruction.

BxxBranchConditional

Flags:Unaffected.

Format:Bxxtarget;jumptotargetuponcondition

Function:Usedtojumptoatargetaddressifcertainconditionsaremet.InARM7,thetargetaddresscannotbemorethan–32MBto+32MBbytesaway.ForARMCortexM3.thetargetaddressmustbewithin–16MBto+16MBaddressspacefromcurrentinstruction.Theconditionsareindicatedbytheflagregister.Theconditionsthatdeterminewhetherthejumptakesplacecanbecategorizedintothreegroups:

1.flagvalues,

2.thecomparisonofunsignednumbers,and

3.thecomparisonofsignednumbers.

Eachisexplainednext.

1.“Bcondition”wheretheconditionreferstoflagvalues.Thestatusofeachbitof

theflagregisterhasbeendecidedbyexecutionofinstructionspriortothejump.Thefollowing“Bcondition”instructionscheckifacertainflagbitisraisedornot.

Instruction Condition

BCS BranchifCarrySet jumpifC=1

BCC BranchifCarryClear jumpifC=0

BEQ BranchifEqual jumpifZ=1

BNE BranchifNotEqual jumpifZ=0

BMI BranchifMinus/Negative jumpifN=1

BPL BranchifPlus/Positive jumpifN=0

BVS BranchifOverflow jumpifV=1

BVC BranchifNooverflow jumpifV=0

2.“Bcondition”wheretheconditionreferstothecomparisonofunsignednumbers.Afteracompare(CMPRn,Op2)instructionisexecuted,CandZindicatetheresultofthecomparison,asfollows:

C Z

Rn>Op2 1 0

Rn=Op2 1 1

Rn<Op2 0 0

Sincetheoperandscomparedareviewedasunsignednumbers,thefollowing“Bcondition”instructionsareused.

Instruction Condition

BHI BranchifHigher jumpifC=1andZ=0

BEQ BranchifEqual jumpifC=1andZ=1

BLS BranchifLowerorsame jumpifC=0orZ=1

Inreality,the“CMPRn,Op2”isasubtractinstruction(Rn-Op2).Afterthesubtractiontheresultisdiscardedandflagsarechangedaccordingtotheresult.NoticeinARMthesubtractaffectstheCflagsettingdifferentlyfromthex86andotherCPUs.SeetheSUB

instruction.

3.“Bcondition”wheretheconditionreferstothecomparisonofsignednumbers.Inthecaseofthesignednumbercomparison,althoughthesameinstruction,“CMPRn,Op2”,isused,theflagsusedtochecktheresultareasfollows:

Rn>Op2 V=NorZ=0

Rn=Op2 Z=1

Rn<Op2 VinverseofN

Consequently,the“Bcondition”instructionsusedaredifferent.Theyareasfollows:

Instruction

BGE BranchGreaterorEqualjumpifN=1andV=1orN=0andV=0

(V=N)

BLT BranchLessthanjumpifN=1andV=0orN=0andV=1

(NnotequaltoV)

BGT BranchGreaterthanjumpifZ=0andeitherN=1andV=1orN=0and

V=0

(N=V)

BLE BranchLessorEqualjumpifZ=1orN=1andV=0.OrN=0andV=1

(Z=1orNnotequaltoV)

BEQ BranchifEqual jumpifZ=1

All“Bcondition”instructionsareshortjumps,meaningthatthetargetaddresscannotbemore than -32Mbytesbackwardor+32Mbytes forward from thePCof the instructionfollowingthejump.InARMCortexM3itis16MBineachdirection.Whathappensifaprogrammerneedstousea“Bcondition”togotoatargetaddressbeyondthe-32MBto+32MB range?The solution is to use the “BX condition,Rm” sinceRm can be 32-bitaddressandcoverstheentire4GBaddressspaceoftheARM.Thisisshownnext.

LDRR4,=MYTARGET

ADDSR1,R2,R3

BXEQR4;branchtoaddressheldbyR4ifZ=1

MYTRGTSUBSR7,#4

NOP

NOP

….

C Z N V

Rn>Op2 0 0 0 N

Rn=Op2 0 1 0 N

Rn<Op2 1 0 1 InverseofN

BFCBitFieldClear

Flags:Unaffected.

Format:BFCRd,#LSB,#Width

Function:ClearsselectedbitsofRd.ThestartlocationoftheRdbitisindicatedby#LSBandmustbeintherangeof0–31.Howmanybitsshouldbeclearedisindicatedby#Widthandmustbeintherangeof1–32.

Example1:


BFCR1,#2,#14;nowR1=0xFFFF0003

Example2:

LDRR2,=0x999999999;R2=0x99999999

BFCR2,#8,#24;nowR2=0x00000099

BFIBitFieldInsert

Flags:Unaffected.

Format:BFIRd,Rn,#LSB,#Width

Function:SelectedbitsofRnarecopiedtoRd.ThestartlocationoftheRdbitisindicatedby#LSBandmustbeintherangeof0–31.Howmanybitsshouldbecopiedisindicated by #Width andmust be in the range of 1–32. The start bit location of Rn isalwaysbit0(D0).

Example:

LDRR1,=0xABCDABCD;R1=0xABCDABCD

LDRR2,=0x12345678;R2=0x12345678

BFIR1,R2,#4,#8;nowR1=0xABCDA78D

…

BICBitClear

Flags:Unaffected.

Format:BICRd,Rn,Op2;Rd=RnANDedwithNOTofOp2

Function:SelectedbitsofRnareclearedandplacedinRd.TheOp2providesthebitsselection.IftheselectedbitsinOp2arehighthencorrespondingbitsinRnareclearedandtheresultisplacedinRd.IftheselectedbitsinOp2arelowthecorrespondingbitsinRnareleftunchangedandtheresultisplacedinRd.Inreality,theBICperformstheANDoperation on the bits ofRnwith the complement of the bits inOp2. TheBICwill notupdatetheflags.ToupdatetheflagswemustuseBICS.

Inputs Output

X Y XAND(NOTY)

0 0 0

0 1 0

1 0 1

1 1 0

Example:

LDRR1,=0xFFFFFF00;R1=0xFFFFFF00

LDRR2,=0x99999999;R2=0x9999999

BICR3,R2,R1;nowR3=0x00000099

BICSBitClear(updateflags)

Flags:NandZ.

Format:BICSRd,Rn,Op2;Rd=RnANDedwithNOTofOp2

Function:SelectedbitsofRnareclearedandplacedinRd.TheOp2providestheselectedbits.IftheselectedbitsinOp2ishighthenthecorrespondingbitsinRnisclearedandtheresultisplacedinRd.IftheselectedbitsinOp2islowthemthecorrespondingbitsinRnisleftunchangedandtheresultisplacedinRd.Inreality,theBICSperformstheANDoperationonthebitsofRnwiththecomplementofthebitsinOp2andupdatestheflags.

Inputs Output

X Y XAND(NOTY)

0 0 0

0 1 0

1 0 1

1 1 0

Example1:


LDRR2,=0x99999999;R2=0x9999999

BICSR3,R2,R1;nowR3=0x00000099,N=0,Z=0

Example2:


LDRR1,=0x01234567;R1=0x01234567

BICSR2,R1,R0;nowR2=0,N=0,Z=1

Example3:

MOVR0,#0;R0=0

LDRR1=0x99999999;R1=0x99999999

BICSR2,R1,R0;nowR2=0x99999999,N=1,Z=0

BKPTBreakpoint

Flags:Unaffected.

Format:BKPT#imme_value

Function:usedbycompilertoinsertbreakpointintoprograms.UponexecutionoftheBKPTinstructiontheprogramenterstheDebugmode.SeeyourARM compilerformoreinformation

BLBranchwithLink(thisisCallinstruction)

Flags:Unchanged.

Format:BLSubroutine_Addr;transfercontroltoasubroutine

Function:Transferscontroltoasubroutine.ThisinstructionsavestheaddressoftheinstructionaftertheBLinR14(linkregister).Attheendofthesubroutinethecontrolto the instruction after theBL is achieved by copying the LR (R14) register to PC. InARM7,thetargetaddresscannotbemorethan–32MBto+32MBbytesaway.ForARMCortexM3. the target address must bewithin –16MB to +16MB address space fromcurrentinstruction.

Example:

LDRR7,=20000000

BLDELAY;CallsubroutineMY_DELAY

ADDR3,#4;addressofthisinstructionissavedinR14

…

…

DELAYSUBSR7,#4

NOP

NOP

MOVPC,R14;Return,couldhaveused“BXLR”instruction

BLXBranchIndirectwithLink

Flags:Unaffected.

Format:BLXRm;transfercontroltoasubroutinewhose

;addressisgivenbyRm

Function: Transfers control to a subroutinewhoseaddress isgivenby theRmregister. This instruction saves the address of the instruction after the BL in R14 (linkregister). At the end of the subroutine the control to the instruction after the BL isachieved by copying the LR (R14) register to PC. One can use “BX LR” as returninstruction. Notice the difference between this instruction and “BL Target_Addr”instruction. In the “BL Target_Addr” instruction the target address of the subroutine isgiven right there. However, in the “BLX Rm” instruction, the target address of thesubroutineisheldbyregisterRm.

Example:

ADRR2,DELAY

BLXR2;CallsubroutinepointedtobyR2

ADDR3,#4;addressofthisinstructionissavedinR14

…

…

DELAYSUBSR1,#4

NOP

NOP

BXLR;return

BXBranchIndirect(BXLRisusedforReturn)

Flags:Unchanged.

Format:BXRm;BXLRisusedforReturnfromasubroutine

Function:Themostwidelyusageofthisinstructionisintheformof“BXLR”for

thepurposeofreturninstructionattheendofsubroutine.

Example:

LDRR1,=20000000

BLDELAY;CallsubroutineMY_DELAY

ADDR3,#4;addressofthisinstr.issavedinR14

…

…

DELAYSUBSR1,#4

NOP

NOP

BXLR;returntocaller

CBNZCompareandBranchonNon-Zero

Flags:Unchanged.

Format:CBNZRn,Target

Function:TransferscontroltothetargetlocationifRnisnotequaltozero.TheRnmustbeintherangeofR0–R7andtargetaddresscannotbefartherthan130bytesawayfromtheinstruction.ThisinstructioncomparestheRnwithzeroandjumpsonlyifRnisnotzero.Thecomparisonhasnoeffectonflags.Thiscanbeusedforloopsinwhichthebodyoftheloopisnomorethan20instructions.

Example1:

MOVR1,#10;R1=10

L1NOP

NOP

NOP

SUBR1,R1,1;R1=R1-1

CBNZR1,L1

CBZCompareandBranchonZero

Flags:Unaffected.

Format:CBZRn,Target

Function:TransferscontroltothetargetlocationifRniszero.TheRnmustbeinthe rangeofR0–R7and target address cannot be farther than130bytes away from theinstruction.ThisinstructioncomparestheRnwithzeroandjumpsonlyifRniszero.Thecomparisonhasnoeffectonflags.Thiscanbeusedtotestaregistervalueafterreadinga

port.

Example1:

LDRR0,=MYPORT_ADR;R0=MYPORTaddress

HERELDRR2,[R0];readfromMYPORT

CBZR2,HERE;keepreadingMYPORTuntilitiszero

CDPCoprocessorDataprocessing

SeeARMCortex-MManual.

CLREXClearExclusive

SeeARMCortex-MManual.

CLZCountLeadingZero

Flags:Unchanged.

Format:CLZRd,Rn

Function:ScanstheRnregistercontentsfrommostsignificantbit(D31)towardleastsignificantbit(D0)untilitfindthefirstHIGH.ThenumberofbinaryzerobitsbeforeitencountersthefirstbinaryHIGHisplacedinRd.

Example:

LDRR3,=0x01FFFFFF

CLZR1,R3;R1=7sincethereare7zerosbeforethefirstbinary1

CMNCompareNegative

Flags:Affected:V,N,Z,C.

Format:CMNRn,Op2;setsflagsasif“Rn+Op2”

;Notice,theRn-(-Op2)=Rn+Op2

Function:ComparesRnregistervaluewiththenegativeofOp2value.ThisisdonebyRn-(negativeofOp2)whichisRn-(-Op2)=Rn+Op2.TheRnandOp2operandsarenotaltered. In other words, the CMN adds the Op2 to Rn (Rn+Op2) and sets the flagsaccordingly.ThisisthesameasADDSinstructionexcepttheoperandsareunchangedandtheresultisdiscarded.SeeBxxinstructionforpossiblecasesofcomparison.

CMPCompare


Format:CMPRn,Op2;setsflagsasif“Rn-Op2”

Function:Comparestwooperands.Theoperandsarenotaltered.PerformscomparisonbysubtractingtheOp2operandfromtheRnandupdatesflagsasifSUBSwereperformed.AswecanseeinSUBS,theCMPperformtheoperationofRn+2’s

compofOp2andsetstheflagsaccordingtotheresult.SeeBxxinstructionforpossiblecasesofcomparison.

CPSIDChangeprocessorIDandDisableInterrupt

Flags:Unaffected

Format:CPSIDiflag;iflagisiinPRIMASKorfinFAULTMASK

Function:UsedfordisablingtheinterruptflagsinPRIMASKorFAULTMASKregisters.SeeARMCortexmanual.

CPSIEChangeProcessorStateandEnableInterrupt

Flags:Unaffected

Format:CPSIEiflag;iflagisiinPRIMASKorfinFAULTMASK

Function:UsedforenablingtheinterruptflagsinPRIMSKorFAULTMASKregisters.SeeARMCortexmanual.

DMBDataMemoryBarrier

Flags:Unaffected

Format:DMB

Function:ItmakessurethatalltheexplicitmemoryaccessespriortoDMBinstructionarecompletedbeforetheexplicitmemoryaccessesaftertheDMB.SeeARMCortexmanual.

DSBDataSynchronizationBarrier

Flags:Unaffected

Format:DSB

Function:ItmakessurethatalltheexplicitmemoryaccessespriortoDSBinstructionarecompletedbeforetheDSBinstructionisexecuted.SeeARMCortexmanual.

EORExclusiveOR

Flags:Unaffected

Format:EORRd,Rn,Op2

Function:PerformslogicalEx-ORontheRnandOp2operands,bitbybit,storingtheresultintheRd.Thiswillnotupdatetheflags.UseEORSinstructiontoupdatestheflags.

Inputs Output

X Y XEORY

0 0 0

0 1 1

1 0 1

1 1 0

Example1:

MOVR0,#0xAA;R0=0xAA

EORR2,R0,#0xFF;now,R2=0x55

;AA10101010

;FF11111111

;—–––

;5501010101flagsunchanged

Example2:


LDRR1,=0x55555555;R1=0x55555555

EORR2,R1,R0;R2=0xFFFFFFFF

;AA10101010

;5501010101

;—–––

;FF11111111flagsunchanged

The“EORRd,Rx,Rx”canbeusedtoclearRd.

Example3:

MOVR1,#0x55

EORR2,R1,R1;R2=0

;5501010101

;5501010101

;—–––


TocomplementthebitsofRn,EX-ORitwith0xFF.

Example4:



EORR2,R1,R0;R2=0x55555555

;AA10101010

;FF11111111

;—–––


EORSExclusiveORandupdatetheflags

Flags:Affected:C,V,N,Z

Format:EORSRd,Rn,Op2

Function:PerformslogicalEx-ORontheRnandOp2operands,bitbybit,storingtheresultintheRd.Aftertheexecutiontheflagsareupdated.

Inputs Output

X Y XEORY

0 0 0

0 1 1

1 0 1

1 1 0

Example1:

MOVR0,#0xAA;R0=0xAA

EORSR2,R0,#0xFF;now,R2=0x55

;AA10101010

;FF11111111

;—–––

;5501010101N=0,C=0,Z=0,V=0

Example2:


LDRR1,=0x55555555;R1=0x55555555

EORSR2,R1,R0;R2=0xFFFFFFFF

;AA10101010

;5501010101

;—–––

;FF11111111N=1,C=0,Z=0,V=0

The“EORRd,Rx,Rx”canbeusedtoclearRd.

Example3:

MOVR1,#0x55

EORSR2,R1,R1;R2=0x0

;5501010101

;5501010101

;—–––

;0000000000N=0,C=0,Z=1,V=0

ISBInstructionSynchronizationBarrier

Flags:Unaffected.

Format:ISB

Function:ItflushesthepipelinetomakesuretheinstructionsexecutedrightaftertheISBinstructionarefetchedfreshfromthecacheormemory.

ITIf-ThenConditionBlock

Flags:Unaffected

Format:SeeARMmanual

Function:ItallowstheexecutionofuptofourinstructionaftertheITtobeconditional.

LDCLoadCoprocessor

SeetheARMManual

LDMLoadMultipleregisters

Flags:Unaffected.

Format:LDMRn,{Rx,Ry,..}

Function: Loadsintoregistersfromconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thedestinationregistersseparatedbycommaandplacedinbraces.IntheARMCortex,thestackisdescendingmeaningthatasinformationispushedontostackthestackpointerisdecremented.ThisIA(IncrementtheaddressaftereachAccess)isthedefaultforloading(Poping).ThisinstructioniswidelyusedforPoping(loading)multiplewordsfromdescendingstackintoCPUregisters.

Example:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

;12004=(56)

;12005=(50)

;12006=(58)

;12007=(15)

;12008=(63)

;12009=(60)

;1200A=(68)

;1200B=(39)

;1200C=(79)

;1200D=(70)

;1200E=(75)

;1200F=(92)

LDRR7,=0x12000

LDMR7,{R0,R2,R4}

;now,R0=0x82381046,R2=0x15585056,…

;thecontentsofmemorylocations0x12000-0x12003are

;movedtoregisterR0,andthecontentsofmemory

;locations0x12004-0x12007aremovedtoregister

;R2,andsoon.ThereforewehaveR0=0x82381046,

;R2=0x15585056,andR4=0x39686063.

LDMDBLoadMultipleregistersandDecrementBeforeeachaccess

Flags:Unaffected.

Format:LDMDBRn,{Rx,Ry,…}

Function:ThisisthesameasLDMEA(loadmultipleregistersfromEmptyAscending)usedforcasesinwhichthestackisascending.SeeLDMEAinstruction.

LDMEALoadMultipleregistersfromEmptyAscending

Flags:Unaffected.

Format:LDMEARn,{Rx,Ry,…}

Function:Loadsintoregistersfromconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thedestinationregistersseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultforstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.TheIA(IncrementtheaddressaftereachAccess)isthedefault.IfwechangethedefaultofdescendingstacktoascendingstackthenwehavetousetheEA(EmptyAscending).Theascendingstackmeansasinformationarepushedontostackthestackpointerisincremented.TheLDMEAisusedforPoping(loading)multiplewordsfromascendingstackintoCPUregisters.

LDMFDLoadMultipleregistersFullDescending

Flags:Unaffected.

Format:LDMFDRn,{Rx,Ry,..}

Function:ThisisthesameasLDMandLDMIA.

LDMIALoadMultipleregistersandIncrementaftereachAccess

Flags:Unaffected.

Format:LDMRn,{Rx,Ry,..}

Function: This is the same as the LDM instructions. In theARMCortex, the stack isdescending meaning that as information are pushed onto stack the stack pointer isdecremented.ThisIA(IncrementtheaddressaftereachAccess)isthedefault.WeusethisforPoping(loading)multiplewordsfromdescendingstackintoCPUregisters.

LDRLoadRegister

Flags:Unaffected.

Format:LDRRd,[Rx];loadintoRdawordfrommemory

;locationpointedtobeRx

Function: Loadsintodestinationregisterthecontentsoffourmemorylocations.The [Rx]points to addressofmemory location.This iswidelyused to load32-bit datafrommemoryintoRdregisteroftheARMsinceinthe“MOVRd,#immediate_value”theimmediatevaluecannotbelargerthan0xFF.

Example:


;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRR1,[R0]

;now,R1=82381046.

LDRRx,=ValueLoadRegisterwith32-bitvalue

Flags:Unaffected.

Format:LDRRd,=32_bit_value;loadRdwith32-bitvalue

Function:Loadsintodestinationregistera32-bitimmediatevalue.Thisiswidelyused to load 32-bit immediate value into Rd register of the ARM since in the “MOVRd,#immediate_value”theimmediatevaluecannotbelargerthan0xFF.

Example:

LDRR0,=0x1200000;R0=0x1200000

LDRR1,=0x2FFFF;R1=0x2FFFF


LDRR1,=200000000;R1=200000000

LDRBLoadRegisterByte

Flags:Unaffected.

Format:LDRBRd,[Rx];loadintoRdabytefrommemory


Function:LoadsintodestinationregisterthecontentsofasinglememorylocationindicatedbyRx.

Example:


;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0]

;now,R0=00000046

LDRBTLoadRegisterBytewithTranslation

Flags:Unaffected.

Format:LDRBTRd,[Rx]

Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRx

andzero-extendsthebyteto32-bitword.Thatmeansazeroiscopiedtoalltheupper24bitsoftheRdregister.Usedforunprivilegedmemoryaccess.

Example:


;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBTR1,[R0];now,R1=00000046

LDREX,LDREXB,LDREXHLoadRegisterExclusive

Function:TheyareusedwithSTREX,STREXB,andSTREXHinstructionstoperformCPUsynchronizationoperation.SeeARMCortexmanual.

LDRHLoadRegisterHalfword

Flags:Unaffected.

Format:LDRHRd,[Rx];loadintoRda2-bytefrommemory


Function:Loadsintodestinationregisterthecontentsofthetwoconsecutivememorylocations(halfword)indicatedbyRx.

Example:


;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRHR1,[R0]

;now,R0=00001046

LDRSBLoadRegistersignedByte

Flags:Unaffected.

Format:LDRSBRd,[Rx]

Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRxandsign-extendsthebyteto32-bitword.Thatmeansthesign(D7)ofthebyteiscopiedtoalltheupper24bitsoftheRdregister.

Example1:


;12000=(85)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0];nowR1=FFFFFF85becauseMSBof85is1

Example2:


;12000=(15)

;12001=(20)

;12002=(3F)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0];now,R1=00000015becauseMSBof15is0

LDRSHLoadRegisterSignedHalfword

Flags:Unaffected.

Format:LDRSHRd,[Rx]

Function:LoadsintoRdregisterahalf-word(2-byte)frommemorylocationpointedtobyRxandsign-extendsitto32-bitword.Thatmeansthesign(D15)ofthe16-bitoperandiscopiedtoalltheupper16bitsoftheRdregister.

Example1:


;12000=(46)

;12001=(F3)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0];now,R0=FFFFF346becauseMSBofF3is1

Example2:


;12000=(4F)

;12001=(23)

;12002=(18)

;12003=(B2)

LDRR0,=0x12000

LDRBR1,[R0];now,R1=0000234FbecauseMSBof23is0

LDRTLoadRegisterwithTranslation

Flags:Unaffected

Format:LDRTRd,[Rx]

Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRxandzero-extendsthebyteto32-bitword.Thatmeansazeroiscopiedtoalltheupper24bitsoftheRdregister.Usedforunprivilegedmemoryaccess.

Example:


;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0];now,R1=00000046

LSLLogicalShiftLeft

Flags:Unaffected.

Format:LSLRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedleft,theMSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLdoesnotupdatestheflags.

Example1:

LDRR2,=0x00000010

LSLR0,R2,#8;R0=R2isshiftedleft8times

;now,R0=0x00001000,flagsnotchanged

Example2:

LDRR0,=0x00000018

MOVR1,#12


;now,R2=0x000018000,flagsnotchanged

Example3:

LDRR0,=0x0000FF18

MOVR1,#16


;now,R2=0xFF180000,flagsnotchanged

Thelogicalshiftleftusedforunsignednumbershifting.LSLessentiallymultipliesRmbyapowerof2foreachbitshift.


Flags:Affected.

Format:LSLSRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedleft,theMSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLSupdatestheflags.

Example1:

LDRR2,=0x00000010

LSLSR0,R2,#8;R0=R2isshiftedleft8times

;now,R0=0x00001000,C=0,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12


;now,R2=0x000018000,C=0,N=0,Z=0

Example3:

LDRR0,=0x000FFF18

MOVR1,#16


;now,R2=0xFF180000,C=1,Z=0,N=0

Thelogicalshiftleftusedforunsignednumbershifting.LSLSessentiallymultipliesRmbyapowerof2foreachbitshift.


Flags:Unaffected.

Format:LSRRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRdoesnotupdatetheflags.

Example1:

LDRR2,=0x00001000

LSRR0,R2,#8;R0=R2isshiftedright8times

;now,R0=0x00000010,C=0

Example2:

LDRR0,=0x000018000

MOVR1,#12


;now,R2=0x00000018,C=0

Example3:

LDRR0,=0x7F180000

MOVR1,#16


;now,R2=0x00007F18,C=0

Thelogicalshiftrightusedforshiftingunsignednumbers.LSRessentiallydividesRmbyapowerof2foreachbitshift.


Flags:Affected.

Format:LSRSRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRSupdatestheflags.

Example1:

LDRR2,=0x00001FFF

LSRSR0,R2,#8;R0=R2isshiftedright8times

;now,R0=0x0000001F,C=1,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12


;now,R2=0x000000000,C=0,N=0,Z=1,

Example3:

LDRR0,=0x000FFF18

MOVR1,#16


;Now,R2=0x0000000F,C=1,Z=0,N=0

Thelogicalshiftrightusedforshiftingunsignednumbers.LSRSessentiallydividesRmbyapowerof2foreachbitshift.

MCRMovetoCoprocessorfromARMRegister

SeeARMManual.

MLAMultiplyAccumulate

Flags:Unaffected

Format:MLARd,Rs1,Rs2,Rs3;Rd=(Rs1×Rs2)+Rs3

Function:MultipliesanunsignedwordheldbyRs1byaunsignedwordinRs2andtheresultisaddedtoRs3andplacedinRd.

Example:

MOVR0,#0x20;R0=0x20

MOVR1,#0x50;R1=0x50

MOVR2,#0x10;R2=0x10

MLAR4,R0,R1,R2;nowR4=(0x20×0x50)+10=0xA10

MLSMultiplyandSubtract

Flags:Unaffected

Format:MLSRd,Rm,Rs,Rn;Rd=Rn-(Rs×Rm)

Function:MultipliesanunsignedwordheldbyRmbyaunsignedwordinRsandtheresultissubtractedfromRnandplacedinRd.

Example:

MOVR0,#0x20;R0=0x20

MOVR1,#0x50;R1=0x50

LDRR2,=0x1000;R2=0x1000

MLSR4,R0,R1,R2;nowR4=0x1000-(0x20×0x50)=0x600

MOVMove(ARM7)

Flags:Unaffected.

Format:MOVRd,#imm_value;Rd=imm_Value<0x200

Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFF(0–255).Aftertheexecutiontheflagsarenotupdated.TheMOVSinstructionupdatestheflags.

Example1:

MOVR0,#0x25;R0=0x25

MOVR1,#0x5F;R1=0x5F

ToloadtheARMregisterwithvaluelargerthan0xFFwemustusethe“LDRRd,=32_bit_data.”Forexample,wecanuseLDRR2,=0xFFFFFFFF.

Example2:

LDRR0,=0x2000000;R0=0x2000000

MOVMove(ARMCortex)

Flags:Unaffected.

Format:MOVRd,#imm_val;Rd=imm_val(imm_val<0x10000)

Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).

Example:

MOVR1,#0xF459;R1=0xF459

ToloadtheARMregisterwithvaluelargerthan0xFFFFwemustusethe“LDRRd,=32_bit_data.”ForexamplewecanuseLDRR2,=0xFFFFFFF.

MOVSMove(andupdateflags)

Flags:Affected:C,N,Z

Format:MOVRd,#immediate_value

Function:LoadtheRdregisterwithanimmediatevalueandupdatetheflags.

Example:

MOVSR0,#0x25;R0=0x25,N=0,Z=0,andC=0

MOVSR0,#0x0;R0=0x0,N=0,Z=1,andC=0

MOVTMoveTop

Flags:Unaffected.

Format:MOVTRd,#imm_value;imm_value<0x10000

Function:Loadstheupper16-bitofRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).Thelower16-bitoftheRdregisterremainsunchanged.

Example:

LDRR0,=0x25579934;R0=0x25579934

MOVTR0,#0xAAAA;R0=0xAAAA9934

MOVWMove16-bitconstant

Flags:Unaffected.

Format:MOVWRd,#imm_value;imm_value<0x10000

Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).

Example:

MOVWR1,#0x5555;R1=0x5555

ToloadtheARMregisterwithvaluelargerthan0xFFFFwemustusethe“LDRRd,=32_bit_data.”ForexamplewecanuseLDRR2,=0xFFFFFFF.

MRCMovetoARMRegisterfromCoprocessor

SeeARMmanual

MRSMovetogeneralRegisterfromSpecialregister

Flags:Unaffected.

Format:MRSRd,special_reg;copyspecial_regtoRd

Function:Copiesthecontentsofaspecialfunctionregistertoageneral-purposeregister.ThisinstructionalongwiththeMSRiswidelyusedtomodifythespecialfunction

registerssuchasCONTROL,PRIMASK,andISPR.Thisistheonlywaywecanaccessthespecialfunctionregisters.

Example:

MRSR1,CONTROL;R1=CONTROL

ANDR1,#0x00;maskthelower8bits

MSRCONTROL,R1

MSRMovetoSpecialregisterfromgeneralRegister

Flags:Unaffected.

Format:MSRspecial_reg,Rn;copyspecial_regtoRn

Function:Copiesthecontentsofageneral-purposeregistertospecialfunctionregister.ThisinstructionalongwiththeMRSiswidelyusedtomodifythecontentsofspecialfunctionregisterssuchasCONTROL,PRIMASK,andISPR.Thisistheonlywaywecanaccessthespecialfunctionregisters.

Example:

MRSR1,CONTROL;R1=CONTROL

ANDR1,#0x00;maskthelower8bits

MSRCONTROL,R1;maskthelower8bitsofCONTROLreg.

MULUnsignedMultiplication

Flags:Affected:N,Z,Unaffected:C,V

Format:MULRd,Rn,Rm;Rd=Rn×Rm

Function:MultipliesawordinregisterRnbyawordinregisterRmandplacestheresultinRd.

Example1:

MOVR0,#100;R0=100

MOVR1,#200;R1=200

MULR3,R0,R1;R3=R0xR1=100x200=20000

Example2:

LDRR0,=10000;R0=10000

LDRR1,=20000;R1=20000

MULR3,R0,R1;R3=R0xR1=10000x20000=200000000

MVNMoveNegative

Flags:Unaffected.

Format:MVNRd,Op2;Rd=1’scomp.ofOp2

Function:PlacesinRdthenegation(the1’scomplement)ofOp2.EachbitofOp2isinverted(logicalNOT)andplacedinRdwhileflagsremainunchanged.

Example1:

MOVR0,#0xAA;R0=0xAA

MVNR2,R0;now,R2=0xFFFFFF55

Example2:


MVNR1,R0;R1=0x55555555

Example3:

MVNR0,#0x0F;R0=0xFFFFFFF0

Example4:

MVNR2,#0x0;R0=0xFFFFFFFFwidelyusedtoloadRxwithall1s

MVNSMoveNegativeandupdatetheflags

Flags:NandZareaffected

Format:MVNSRd,Op2;Rd=1’scomp.ofOp2andupdateflags

Function:PlacesinRdthenegation(the1’scomplement)ofOp2.EachbitofOp2isinverted(logicalNOT)andplacedinRdandNandZflagsareupdated.

Example1:

MOVR0,#0xAA;R0=0xAA

MVNSR2,R0;now,R2=0xFFFFFF55,N=1,Z=0

Example2:


MVNSR1,R0;R1=0x55555555,Z=0andN=0

Example3:


MVNSR1,R0;R1=0x00000000,N=0andZ=1

Example4:

MVNR2,0x0;R0=0xFFFFFFFF,N=1,Z=0

NOPNoOperation

Flags:Unaffected.

Format:NOP

Function:Performsnooperation.Sometimesusedfortimingdelaystowasteclockcycles.UpdatesPC(programcounter)topointtonextinstructionfollowingNOP.InsomeARMCPUs,thepipelineremovestheNOPbeforeitreachestheexecutionstage.

ORNLogicalORNot

Flags:Unaffected.

Format:ORNRd,Rn,Op2;Rd=RnORedwith1’scompofOp2

Function:PerformstheORoperationonthebitsofRnwiththecomplementofthebitsinOp2.TheORNwillnotupdatetheflags.ToupdatetheflagswemustuseORNS.

Inputs Output

A B AOR(NOTB)

0 0 1

0 1 0

1 0 1

1 1 1

Example1:


LDRR2,=0x99999999;R2=0x9999999

ORNR3,R2,R1;nowR3=0x999999FF

Example2:

MOVR1,#0;R1=0

LDRR0=0xFFFFFFFF;R0=0xFFFFFFFF

ORNR2,R1,R0;now,R2=0x0

ORNSORNotandupdateflags

Flags:NandZareaffected

Format:ORNSRd,Rn,Op2;Rd=RnORedwith1’scompofOp2

Function:PerformstheORoperationonthebitsofRnwiththecomplementofthebitsinOp2andupdatestheflags.TheresultisplacedinRd.

Inputs Output

AOR(NOT

A B B)

0 0 1

0 1 0

1 0 1

1 1 1

Example1:


LDRR2,=0x99999999;R2=0x9999999

ORNSR3,R2,R1;now,R3=0x999999FF,N=1,Z=0

Example2:


LDRR1,=0x01234567;R1=0x01234567

ORNSR2,R1,R0;now,R2=0x01234567,N=0,Z=0

Example3:

MOVR1,#0;R1=0

LDRR0=0xFFFFFFFF;R0=0xFFFFFFFF

ORNSR2,R1,R0;now,R2=0x0,N=0,Z=1

ORRLogicalOR

Flags:Unaffected

Format:ORRRd,Rn,Op2;Rd=RnORedOp2

Function:PerformslogicalORonthebitsofRnandOp2,andplacestheresultinRd.Oftenusedtoturnabiton.ORRwillnotupdatetheflags.

Example1:

MOVR0,#0xAA;R0=0xAA

ORRR2,R0,#0x55;now,R2=0xFF

Example2:

LDRR0,=0x00010203;R0=00010203

LDRR1,=0x30303030

ORRR2,R0,R1;R2=0x30313233

Example3:

LDRR0,=0x55555555;R0=0x55555555


ORRR2,R1,R0;R1=0xFFFFFFFF

ORRSLogicalORandupdatetheflags

Flags:NandZareaffected.

Format:ORRSRd,Rn,Op2;Rd=RnORedOp2,updatetheflags

Function:PerformslogicalORonthebitsofRnandOp2,andplacestheresultinRd.Oftenusedtoturnabiton.ORRSupdatestheflags.

Inputs Output

X Y XORY

0 0 0

0 1 1

1 0 1

1 1 1

Example1:

MOVR0,#0xAA;R0=0xAA

ORRSR2,R0,#0x55;now,R2=0xFF,N=0,Z=0

Example2:

LDRR0,=0x00010203;R0=00010203

LDRR1,=0x30303030

ORRSR2,R0,R1;R2=0x30313233N=0,Z=0

Example3:

LDRR0,=0x55555555;R0=0x55555555


ORRSR2,R1,R0;R1=0xFFFFFFFFN=1,Z=0

POPPOPregisterfromStack

Flags:Unaffected.

Format:POP{reg_list};reg_reg=wordsofftopofstack

Function:Copiesthewordspointedtobythestackpointertotheregistersindicated

by the reg_list and increments the SP by 4, 8, 12, 16,… depending on the number ofregistersinthereg_list.

Example:

POP{R1};POPthetopwordofstacktoR1

POP{R1,R4,R7};POPthetop3wordsofstacktoR1,R4,R7

POP{R2-R6};POPthetop5wordsofstacktoR2-R6

POP{R0,R5};POPthetop2wordsofstacktoR0andR5

POP{R0-R7};POPthetop8wordsofstacktoR0-R7

ThePOPinstructionissynonymsforLDMIA.

PUSHPUSHregisterontostack

Flags:Unaffected.

Format:PUSH{reg_list};PUSHreg_listontostack

Function:Copiesthecontentsofregistersstatedinreg_listontothestackanddecrementsSPby4,8,12,16,…dependingonthenumberofregistersinreg_list.

Example:

PUSH{R1};PUSHtheR1ontotopofstack

PUSH{R1,R4,R7};PUSHR1,R4,R7ontotopofstack

PUSH{R2-R6};PUSHtheR2,R3,R4,R5,R6ontotopofstack

PUSH{R0,R5};PUSHtheR0andR5ontotopofstack

PUSH{R0-R7};PUSHtheR0throughR7ontotopofstack

ThePUSHinstructionissynonymsforSTMDB.

RBITReverseBits

Flags:Unaffected.

Format:RBITRd,Rn;ReversethebitorderofRnandplaceinRd

Function:Reversesthebitpositionorderofthe32-bitvalueinRnregisterandplacetheresultinRd.

Example:

MOVR1,#0x5F

RBITR2,R1;now,R2=0xF5000000

REVReversebyteorderinaword

Flags:Unaffected

Format:REVRd,Rn;ReversethebyteofRnandplaceitinRd

Function:Reversesthebytepositionorderofthe32-bitvalueinRnregisterandplacestheresultinRd.Thiscanbeusedtoconvertfromlittleendiantobigendianorfrombigendiantolittleendian.

Example:

LDRR1,=0x12345678

REVR2,R1;now,R2=0x78564312

RV16Reversebyteorderin16-bit

Flags:Unaffected

Format:REV16Rd,Rn;ReversethebitsifRnandplaceitinRd

Function:Reversesthe16-bitpositionorderofthe32-bitvalueinRnregisterandplacestheresultinRd.Thiscanbeusedtoconvert16-bitlittleendiantobigendianorfrom16-bitbigendiantolittleendian.

Example:

LDRR1,=0x559922FF

RV16R2,R1;now,R2=0x22FF5599

REVSHReversebyteorderinbottomhalfwordandsignextend

Flags:Unaffected

Format:REVSHRd,Rn;Rd=ReversethebyteandsignextendRn

Function:Reversesthe16-bitpositionorderofRnregisterandaftersignextendingto32-bititisplacedinRd.Thiscanbeusedtoconvertasigned16-bitlittleendianto32-bitsignedbigendianorfromsigned16-bitbigendianto32-bitsignedlittleendian.

Example:

LDRR1,=0x559922FF

REVSHR2,R1;now,R2=0x22FF5599

RORRotateRight

Flags:Unaffected.

Format:RORRd,Rm,Rn;Rd=rotateRmrightRnbitpositions

Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andenterfromleftend(MSB).ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORdoesnotupdatetheflags.

Example1:

LDRR2,=0x00000010

RORR0,R2,#8;R0=R2isrotatedright8times

;now,R0=0x10000000,C=0

Example2:

LDRR0,=0x00000018

MOVR1,#12


;now,R2=0x01800000,C=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16


;Now,R2=0xFF180000,C=0


Flags:C,N,Zareaffected.

Format:RORSRd,Rm,Rn;Rd=rotateRmrightRnbitpositions

Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andentersfromleftend(MSB).InadditionaseachbitexitstheLSB,acopyisofitisgiventoCflag.ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORSupdatestheflags.

Example1:

LDRR2,=0x00000010

RORSR0,R2,#8;R0=R2isrotatedright8times

;now,R0=0x01000000,C=0,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12


;now,R2=0x01800000,C=0,N=0,Z=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16


;now,R2=0xFF180000,C=1,N=0,Z=0


Flags:Unaffected.

Format:RRXRd,Rm;Rd=rotateRmright1bitposition

Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXdoesnotupdatetheflags.

Example:

LDRR2,=0x00000002

RRXR0,R2;R0=R2isshiftedrightonebit

;now,R0=0x00000001


Flags:Affected.

Format:RRXSRd,Rm;Rd=rotateRmright1bitposition

Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXSupdatestheflags.

Example1:

LDRR2,=0x00000002

RRXSR0,R2;R0=R2isshiftedrightonebit

;now,R0=0x00000001

RSBReverseSubtract

Flags:Unaffected

Format:RSBRd,Rn,Op2;Rd=Op2-Rn

Function:SubtractstheRnfromtheOp2andputstheresultintheRd.TheRSBhasnoeffectonflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:

1.Takesthe2’scomplementoftheRn

2.AddsthistotheOp2

3.PlacestheresultinRd

TheOp2andRnoperandsremainunchangedbythisinstruction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

RSBR2,R0,R1;R2=R1-R0

;For“RSBR2,R0,R1”wehave:

;R2=R1-R0=0x99999999-0x55555555=

;R2=0x99999999+2’scompof0x55555555

;R2=0x99999999+0xAAAAAAAB=0x44444444

;0x99999999

;-0x55555555

;–––––

;0x44444444

RSBSReverseSubtractandupdatetheflags


Format:RSBSRd,Rn,Op2;Rd=Op2-Rn

Function:SubtractstheRnfromtheOp2andputstheresultintheRd.TheRSBSupdatestheflags.

ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:

1.Takesthe2’scomplementoftheRn

2.AddsthistotheOp2

3.PlacestheresultintheRd

TheRnandOp2operandsremainunchangedbythisinstruction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

RSBR2,R0,R1;R2=R1-R0

;For“RSBR2,R0,R1”wehave:

;R2=R1-R0=0x99999999-0x55555555=

;R2=0x99999999+2’scompof0x55555555

;R2=0x99999999+0xAAAAAAAB=0x44444444

;0x99999999

;-0x55555555

;–––––

;0x44444444C=0,N=0,Z=0,V=0

SBCSubtractwithCarry(Borrow)

Flags:Unaffected

Format:SBCRd,Rn,Op2;Rd=Rn–Op2–(1–C)

Function:SubtractstheOp2operandfromtheRn,placingtheresultinRd.IfC=0,itsubtracts1fromtheresult;otherwise,itoperateslikeSUB.TheSBChasnoeffectonflags.Thisisusedwidelyformultiword(64-bit)subtraction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

SUBSR2,R0,R1;R2=R0-R1

MOVR3,#0x09;R3=0x09

SBCR4,R3,#03;R4=R3-0x3

;ForSUBSwehave:

;R2=R1-R0=0x55555555-0x99999999=

;R2=0x55555555+2’scompof0x99999999

;R2=0x55555555+0x66666667=0xBBBBBBBCC=0

;ForSBCwehave:

;R4=R3-0x3=0x09-0x3-(1-C)=9-3-1

;R4=0x9+2’comp.of-4=0x9+0xFFFFFFFC=0x05

;0x0000000955555555

;-0x0000000399999999

SBCSSubtractwithCarry(Borrow)andupdatetheflags

Flags:C,Z,N,V

Format:SBCSRd,Rn,Op2;Rd=Rn–Op2–(1–C)

Function:SubtractstheOp2operandfromtheRn,placingtheresultinRd.IfC=0,itsubtracts1fromtheresult;otherwise,itexecuteslikeSUB.TheSBCSupdatestheflags.Thisisusedwidelyformultiword(64-bit)subtraction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999


MOVR3,#0x09;R3=0x09

SBCSR4,R3,#03;R4=R3-0x3

;ForSUBSwehave:

;R2=R1-R0=0x55555555-0x99999999=

;R2=0x55555555+2’scompof0x99999999

;R2=0x55555555+0x66666667=0xBBBBBBBCC=0

;ForSBCSwehave:

;R4=R3-0x3=0x09-0x3-(1-C)=9-3-1

;R4=0x9+2’comp.of-4=0x9+0xFFFFFFFC=0x05

;0x0000000955555555

;-0x0000000399999999

;––––––

;0x10000005BBBBBBBCC=0,Z=0,V=0,N=0

SBFXSignBitFieldextract

Flags:Unaffected

Format:SBFXRd,Rn,#LSB,#Width

Function:ExtractsthebitfieldfromtheRnregisterandthenaftersignextendingitisplacedinRd.The#LSBindicateswhichbitand#Widthindicateshowmanybits.

Example1:

LDRR0,=0x00000543;R0=0x00000543

SBFXR2,R0,#8,#4;now,R2=0x00000005

Example2:

LDRR0,=0x00000C43;R0=0x00000C43

SBFXR2,R0,#4,#8;now,R2=0xFFFFFFC4

SDIVSignedDivide

Flags:Unaffected

Format:SDIVRd,Rn,Rm;Rd=Rn/Rm

Function:DividesasignedintegerwordinRnbyanothersignedintegerwordinRm.ThequotientresultisplacedinRd.IfvalueinRnregisterisnotdivisiblebythevalueinRmregister,theresultisroundedtozeroandplacedinRd.Dividebyzerocausesinterrupttype3.

Example:

LDRR0,=-20000;R0=-20000

LDRR1,=-1000;R1=-1000

SDIVR2,R0,R1;now,R2=-2000/-1000=2

SEVSendEvent

Flags:Affected.

Format:SEV

Function:Sendssignaltoalltheprocessorsinthemultiprocessorssystem.SeetheARMCortexmanual.

SMLALSignedMultiplyAccumulateLong

Flags:Unaffected

Format:SMLALRdlo,Rdhi,Rn,Rm;Rdhi:Rdlo=(Rm×Rn)+(Rdhi:Rdlo)

Function:MultipliessignedwordsinRnandRmregister,addsthe64-bitresulttoRdhi:Rdloregister,andsavesthefinalresultinRdhi:Rdlo.TheRdlo(low)andRdhi(high)arethelowerwordandhigherwordofa64-bitvalue.

Example1:

LDRR0,=0

LDRR1,=0x23

LDRR2,=-5000

LDRR3,=-4000

SMLALR0,R1,R2,R3;now,R3:R2=(R3:R2)+(R1×R0)

;=0x2300000000+(-5000×-4000)

;=0x2300000000+20000000

;=0x23000000+0x1312D00=0x2301312D00

;=>R0=0x1312D00andR1=0x23

SMULLSignedMultiplyLong

Flags:Unaffected

Format:SMULLRdlo,Rdhi,Rn,Rm;Rdhi:Rdlo=Rm×Rn

Function:MultipliessignedwordsinRnandRmregister,andsavestheresultinRdhi:Rdlo.TheRdl(low)andRdh(high)arethelowerwordandhigherwordofa64-bitvalue.

Example:

LDRR0,=-20000;R0=-20000(signed2’scomp)

LDRR1,=-1000000;R0=-100000(signed2’scomp)

SMLALR2,R3,R0,R1;now,R3:R2=R1×R0=-20000×-1000000=

;20000000000=0x4A817C800=>R3=0x4and

;R2=0xA817C800

SSATSignSaturate

Flags:Unaffected.

Format:SSATRd,#n,Rm,shift#

Function:Usedforsaturationoperation.SeeARMCortexmanual.

STMStoreMultiple

Flags:Unaffected.

Format:STMRn,{Rx,Ry,…}

Function:StoresregistersRx,Ry,…intoconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thesourceregistersareseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.ThisIA(IncrementtheaddressAftereachaccess)isthedefault.ThisinstructioniswidelyusedforPushing(storing)multipleregistersintoascendingstack.

Example:

LDRR7,=0x12000

LDRR0,=0x82381046;R0=0x82381046

LDRR2,=0x15585056;R2=0x15585056

LDRR4,=0x39686063;R4,=0x39686063

STMR7,{R0,R2,R4};now,R2=0x15585056,..

;ThecontentsofregistersR0,R2,andR4arestoredinto

;consecutivememorylocationsstartingatanaddressgivenbyR7.

;TheR0contentsarestoredintomemorylocations0x12000-0x12003,

;theR2contentsarestoredintomemorylocations0x12004through

;0x12007,andsoon.Thisisshownbelow.

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

;12004=(56)

;12005=(50)

;12006=(58)

;12007=(15)

;12008=(63)

;12009=(60)

;1200A=(68)

;1200B=(39)

STMDBStoreMultipleregisterandDecrementBefore

Flags:Unaffected.

Format:STMDBRn,{Rx,Ry,…}

Function:StoresregistersRx,Ry,…intoconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thesourceregistersareseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.SinceIA(IncrementtheaddressAftereachaccess)isthedefaultweneedtouseDB(DecrementtheaddressBeforeeachaccess)istooverwritethedefault.ThisinstructioniswidelyusedforPushing(storing)multipleregistersintoDescendingstack.

Example:

LDRR7,=0x12000

LDRR0,=0x39686063;R0=0x39686063

LDRR2,=0x15585056;R2=0x15585056

LDRR4,=0x82381046;R4,=0x82381046

STMDBR7,{R0,R2,R4}

;ThecontentsofregistersR0,R2,andR4arestoredinto

;consecutivememorylocationsstartingatanaddressgivenbyR7.

;TheR0contentsarestoredintomemorylocations0x11FFF-0x11FFC,

;theR2contentsarestoredintomemorylocations0x11FFB

;through0x11FF8,andsoon.Thisisshownbelow.

;11FF4=(46)

;11FF5=(10)

;11FF6=(38)

;11FF7=(82)

;11FF8=(56)

;11FF9=(50)

;11FFA=(58)

;11FFB=(15)

;11FFC=(63)

;11FFD=(60)

;11FFE=(68)

;11FFF=(39)

STMEAStoreMultipleregisterEmptyAscending

Flags:Unaffected.

Format:STMEARn,{Rx,Ry,…}

Function:ThisissameasSTM.

STMIAStoreMultipleregisterEmptyAscending

Flags:Unaffected.

Format:STMIARn,{Rx,Ry,…}

Function:ThisissameasSTM.

STMFDStoreMultipleregisterFullDescending

Flags:Unaffected.

Format:STMFDRn,{Rx,Ry,…}

Function:ThisisanothernameforSTMDB.TheFDisforpushingontoFullDescendingstacks

STRStoreRegister

Flags:Unaffected.

Format:STRRd,[Rx];StoreRdintomemorylocationpointedto

beRx

Function:StoresRdregisterintofourconsecutivememorylocations.The[Rx]pointstostartingaddressofmemorylocation.Thisiswidelyusedtostore32-bitregisterintomemorylocations.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRR1,[R0];now,

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

STRBStoreRegisterByte

Flags:Unaffected.

Format:STRBRd,[Rn]

Function:StoresthelowestbyteoftheRdregisterintoasinglememorylocationindicatedbyRn.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRBR1,[R0];now,12000=(46)

STRBTStoreRegisterBytewithTranslation

Flags:Unaffected.

Format:STRBT

Function:StoresthelowestbyteoftheRdregisterintoasinglememorylocationindicatedbyRn.ThisisthesameasSTRBbutisusedforunprivilegedmemoryaccess.SeeARMCortexmanual.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRBTR1,[R0]

;now,12000=(46)

STRDStoreRegisterDouble(twowords)

Flags:Unaffected.

Format:STRDRd,[Rn]

Function:StorestworegistersofRdandRd+1into8consecutivememorylocationsindicatedbyRn.RdcanbeR0,R2,R4,R6,R8,R10,orR12.

Example:

LDRR2,=0x12000

LDRR0,=0x82381046;R0=0x82381046

LDRR1,=0x15585056;R1=0x15585056

STRDR0,R1,[R2];storeR0andR1intomemorylocationsstarting

;atanaddressgivenbyR2.Now,wehave:

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

;12004=(56)

;12005=(50)

;12006=(58)

;12007=(15)

STREX,STREXB,STREXHStoreRegisterExclusive

Function:TheyareusedwithLDREX,LDREXB,andLDREXHinstructionstoperformCPUsynchronizationoperation.SeeARMCortexmanual.

STRHStoreRegisterHalfword

Flags:Unaffected.

Format:STRHRd,[Rn]

Function:Storesthelower2bytesoftheRdregisterintotwoconsecutivememorylocationsindicatedbyRn.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRBR1,[R0];now,12000=(46),and12001=(10)

STRTStoreRegister

Flags:Unaffected

Format:STRTRx,[Rn]

Function:StoresRxregisterintomemorylocationpointedtobyRx.ThisisthesameasSTRbutisusedforunprivilegedmemoryaccess.SeeARMCortexmanual.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRTR1,[R0]

;now,12000=(0x82381046)

SUBSubtract

Flags:Unaffected

Format:SUBRd,Rn,Op2;Rd=Rn–Op2

Function:SubtractstheOp2fromtheRnandputstheresultintheRd.Hasnoeffectonflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:

1.Takesthe2’scomplementoftheOp2

2.AddsthistotheRn

3.PlacetheresultintheRd

TheRdandOp2operandsremainunchangedbythisinstruction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

SUBR2,R1,R0;R2=R1-R0

;For“SUBR2,R1,R0”wehave:

;R2=R1-R0=0x99999999-0x55555555=

;R2=0x99999999+2’scompof0x55555555

;R2=0x99999999+0xAAAAAAAB=0x44444444

;0x99999999

;-0x55555555

;–––––

;0x44444444

SUBSSubtract


Format:SUBRd,Rn,Op2;Rd=Rn–Op2

Function:SubtractstheOp2fromtheRnandputstheresultintheRd.TheSUBSupdatestheflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:

1.Takesthe2’scomplementoftheOp2

2.AddsthistotheRn

3.PlacetheresultintheRd

TheRdandOp2operandsremainunchangedbythisinstruction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999


;For“SUBSR2,R1,R0”wehave:

;R2=R1-R0=0x99999999-0x55555555=

;R2=0x99999999+2’scomp.of0x55555555

;R2=0x99999999+0xAAAAAAAB=0x44444444

;0x99999999

;-0x55555555

;––––—

;0x44444444C=1,Z=0,N=0,V=0

SVCsupervisorCall(SoftwareInterrupt)

Flags:Unaffected.

Format:SVC#imm_value

Function:Itisusedbyapplicationsoftwaretogetservicesfromoperatingsystems(OS).ThisisliketheSWI(softwareinterrupt)instructioninARM7.

SXTBSignExtendbyte

Flags:Unaffected.

Format:SXTBRd,Rm

Function:ConvertsasignedbyteinRmintoasignedwordbycopyingthesignbit(D7)ofRmintoallthebitsofRd.UsedwidelytoconvertasignedbyteinRmtoasignedwordtoavoidtheoverflowprobleminsignednumberarithmetic.

Example:

MOVR1,#0xFB;R1=0xFBwhichis2’scomplementof-5

SXTBR0,R1;now,R0=0xFFFFFFFB

;R1=00000000000000000000000011111011

;nowR0=0xFFFFFFFB

;R0=11111111111111111111111111111011

SXTHSignExtendHalfword

Flags:Unaffected.

Format:SXTHRd,Rm

Function:ConvertsasignedhalfwordinRmintoasignedwordbycopyingthesignbit(D15)ofRmintoallthebitsofRd.Usedwidelytoconvertasignedhalfword(16-bit)inRmtoasignedwordtoavoidtheoverflowprobleminsignednumberarithmetic.

Example:

;assumeR1=0xFFFBwhichis2’scomplementof-5

SXTHR0,R1;now,R0=0xFFFFFFFB

;R1=00000000000000001111111111111011

;now,R0=0xFFFFFFFB

;R0=11111111111111111111111111111011

TBBTableBranchByte

Flags:Unaffected.

Format:TBB[Rn,Rm]

Function:BranchesforwardusingtableofsinglebyteoffsetusingPC-relativeaddressingmode.RnhasstartingaddressofthetableandRmisanindexintothetable.SeeARMCortexM3manual.

TBHTableBranchhalfword

Flags:Unaffected.

Format:TBH[Rn,Rm,LSL#1]

Function:BranchesforwardusingtableofhalfwordoffsetusingPC-relativeaddressingmode.RnhasstartingaddressofthetableandRmisanindexintothetable.The“LSL#1”shiftslefttheaddressoncetomakeithalfwordalignedaddress.SeeARMCortexM3

manual.

TEQTestEquivalence

Flags:Affected:NandZ

Format:TEQRn,Op2;performsRnEx-OROp2

Function:PerformsabitwiselogicalEx-ORonRnandOp2,settingflagsbutleavingthecontentsofbothRnandOp2unchanged.WhiletheEORSinstructionchangesthecontentsofthedestinationandtheflagbits,theTEQinstructionchangesonlytheflagbits.Thisiswidelyusedtoseeiftworegistersareequal.

Example1:

TEQR1,R2;checktoseeifR1=R2.IfsoZ=1.R1andR2

;remainunchanged

Example2:

TEQR2,#0x01;checktoseeifD0ofR2is1,ifsoZ=1.R2

;remainsunchanged

Example3:

TEQR1,#0xFF;checktoseeifD7_D0ofR1are1s,

;ifsoZ=1.R1remainsunchanged

TSTTest

Flags:Affected:NandZ

Format:TSTRn,Op2;performsRnANDOp2

Function:PerformsabitwiselogicalANDonRnandOp2,settingflagsbutleavingthecontentsofbothRnandOp2unchanged.WhiletheANDSinstructionchangesthecontentsofthedestinationandtheflagbits,theTSTinstructionchangesonlytheflagbits.TotestwhetherabitofRnis0or1,usetheTSTinstructionwithanOp2constantthathasthatbitsetto1andallotherbitsclearedto0.

Example1:

TSTR1,#0x01;checktoseeifD0ofR1iszero,ifsoZ=1.

;R1remainunchanged

Example2:

TSTR1,#0xFF;checktoseeifanybitsofR1iszero,ifso

;Z=1.R1remainunchanged

UBFXUnsignedBitfiledextract

Flags:Unaffected.

Format:UBFXRd,Rn,#LSB,#Width

Function:ExtractsthebitfieldfromtheRnregisterandthenzeroextendsitandplacesinRd.The#LSBindicatesfromwhichbitand#Widthindicateshowmanybits.

Example1:

LDRR0,=0x00077555;R0=0x00077555

UBFXR2,R0,#8,#4;now,R2=0x00000005

Example2:

LDRR0,=0x12345678;R0=0x12345678

UBFXR2,R0,#8,#12;now,R2=0x00000456

UDIVUnsignedDivide

Flags:Unaffected

Format:UDIVRd,Rn,Rm;Rd=Rn/Rm

Function:DividesanunsignedintegerwordinRnbyanotherunsignedintegerwordinRm.ThequotientresultisplacedinRd.IfvalueinRnregisterisnotdivisiblebythevalueinRmregister,theresultisroundedtozeroandplacedinRd.Dividebyzerocausesexceptioninterrupt.

Example1:

LDRR0,=100;R0=100

LDRR1,=2000

UDIVR2,R1,R0;now,R2=R1/R0=2000/100=20

Example2:

LDRR0,=20000;R0=20000

UDIVR2,R0,#100;now,R2=2000/100=20

UMLALUnsignedMultiplywithAccumulate

Flags:Unaffected

Format:UMLALRdLo,RdHi,Rn,Rm;RdHi:RdLo=(Rm×Rn)+(RdHi:RdLo)

Function:MultipliesunsignedwordsinRnandRmregister,addsthe64-bitresulttoRdHi:RdLoregisters,andsavesthefinalresultinRdHi:RdLo.TheRdLo(low)andRdHi(high)aretheunsignedlowerwordandhigherwordofthe64-bitvalue.

Example:

LDRR0,=20000;R0=20000

LDRR1,=1000

LDRR2,=5000

LDRR3,=4000

UMLALR2,R3,R0,R1;now,R3:R2=R1×R0+R3:R2

UMULLUnsignedMultiplyLong

Flags:Unaffected

Format:UMULLRdLo,RdHi,Rn,Rm;RdHi:RdLo=Rm×Rn

Function:MultipliesunsignedwordsinRnandRmregisters,andsavestheresultinRdHi:RdLo.TheRdLo(low)andRdHi(high)arethelowerwordandhigherwordofa64-bitvalue.

Example:

LDRR0,=20000;R0=20000

LDRR1,=10000;R1=10000

LDRR2,=50000;R2=50000

LDRR3,=40000;R3=40000

UMLALR2,R3,R0,R1;now,R3:R2=R1×R0

USATUnsignedSaturate

Flags:Unaffected

Format:USATRd,#n,Rm,shift#

Function:Usedforunsignedsaturationoperation.SeeARMCortexmanual.

UXBTZeroextendabyte

Flags:Unaffected

Format:UXBTRd,Rm

Function:ZeroextendsabyteinRmandplacesinRd.UsedwidelytoconvertabyteinRmtowordforsignednumberoperations.

Example:

MOVR1,#0xFB;R1=0xFB

UXBTR0,R1;now,R0=0x00000000FB

;R1=00000000000000000000000011111011

;nowR0=0x000000FB

;R0=000000000000000000000000011111011

UXTHZeroextendhalfword

Flags:Unaffected

Format:UXTHRd,Rm

Function:ZeroextendsahalfwordinRmandplacesinRd.UsedwidelytoconvertahalfwordinRmtowordforsignednumberoperations.

Example:

;assumeR1=0xFFFB

UXTHR0,R1;now,R0=0x00000FFFB

;R1=00000000000000001111111111111011

;now,R0=0x0000FFFB

;R0=00000000000000001111111111111011

WFEWaitforevent

Flags:Unaffected

Format:WFE

Function:Usedbypowermanagement.SeeARMCortexM3manual.

WFIWaitforinterrupt

Flags:Unaffected

Format:WFI

Function:Suspendsexecutionuntiloneofthefollowingeventsoccurs:

1.anon-maskedinterruptoccursandistaken,

2.aninterruptmaskedbyPRIMASKbecomespending,

3.aDebugEntryrequest.

SeeARMCortexmanual.

AppendixB:ARMAssemblerDirectives

SectionB.1:ListofARMAssemblerDirectives

ALIGN

AREA




ENDPorENDFUNC

ENTRY

EQU(Equate)

EXPORTorGLOBAL

EXTRN(External)

FUNCTIONorPROC

INCLUDE

RN(equate)

SectionB.2:DescriptionofARMAssemblerDirectivesDirectives,orastheyaresometimescalled,pseudo-opsorpseudo-instructions,are

usedby theassembler to translateAssembly languageprograms intomachine language.Unlikethemicroprocessor’sinstructions,directivesdonotgenerateanyopcode;therefore,nomemorylocationsareoccupiedbydirectivesinthefinalhexversionoftheassemblyprogram.Tosummarize,directivesgivedirectionstotheassemblerprogramtotellithowto generate the machine code; instructions are assembled into machine code to giveinstructionstotheCPUatexecutiontime.Thefollowingaredescriptionsofthesomeofthemostwidelyuseddirectives for theARMassembler.Theyaregiven inalphabeticalorderforeaseofreference.

ALIGN

Format:

ALIGNn;nisanypowerof2from20to231

Thisisusedtomakesuredataisalignedin32-bitwordor16-bithalfwordmemoryaddress.Ifnisnotspecified,ALIGNsetsthecurrentlocationtothenextword(fourbyte)boundary.ThefollowingusesALIGNtomakethedatawordandhalfwordaligned:

ALIGN4;Thenextinstructionisword(4bytes)aligned

ALIGN;Thenextinstructionisword(4bytes)aligned

ALIGN2;Thenextinstructionishalfword(2bytes)aligned

Noticethat,thisALIGNdirectiveshouldnotbeconfusedwiththeALIGNattributeoftheAREAdirective.

AREA

Format:

AREAsectionnameattribute,attribute,…

TheAREA directive tells the assembler to define a new section ofmemory. ThememorycanbecodeordataandcanhaveattributessuchasReadOnly,ReadWrite,andsoon.This iswidelyused todefineoneormoreblocksof indivisiblememoryforcodeordatatobeusedbythelinker.EveryassemblylanguageprogramhasatleastoneAREA.

ThefollowinglinedefinesanewareanamedMY_ASM_PROG1whichhasCODEandREADONLYattributes:

AREAMY_ASM_PROG1CODE,READONLY

Among widely used attributes are CODE, DATA, READONLY, READWRITE,COMMON,andALIGN.Thefollowingdescribesthesewidelyusedattributes.

CODE is an attribute given to an area of memory used for executable machineinstruction.SinceitisusedforcodesectionoftheprogramitisbydefaultREADONLYmemory.InARMAssemblylanguageweusethisareatowriteourinstructions.

DATA isanattributegiven toanareaofmemoryusedfordataandno instruction(machine instructions)canbeplaced in thisarea.Since it isusedfordatasectionof theprogram it is bydefault aREADWRITEmemory. InARMAssembly languageweusethisareatosetasideSRAMmemoryforscratchpadandstack.

READWRITE isanattributegiven toanareaofmemorywhichcanbereadfromandwrittento.SinceitisREADWRITEsectionoftheprogramitisbydefaultforDATA.InARMAssemblylanguageweusethisareatosetasideSRAMmemoryforscratchpadandstack.

READONLY is an attribute given to an area ofmemorywhich can only be readfrom.SinceitisREADONLYsectionoftheprogramitisbydefaultforCODE.InARMAssemblylanguageweusethisareatowriteourinstructionsformachinecodeexecution.

COMMONisanattributegiventoanareaofDATAmemorysectionwhichcanbeusedcommonlybyseveralprogramcodes.WedonotinitializetheCOMMONsectionofthe memory since it is used by compiler exclusively. The compiler initializes theCOMMONmemoryareawithallzeros.

ALIGN is another attribute given to an area ofmemory to indicate howmemoryshouldbeallocatedaccordingtotheaddresses.WhentheALIGNisusedforCODEandREADONLYitalignedin4-bytesaddressboundarybydefaultsincetheARMinstructionsare all 32-bit (4-bytes) word. The ALIGN attribute of AREA has a number after likeALIGN=3whichindicatestheinformationshouldbeplacedinmemorywithaddressesof23,thatis0x50000,0x50008,0x50010,0x50020,andsoon.ThisALIGNattributeoftheAREAshouldnotbeconfusedwiththeALIGNdirective.


Format:

labelDCBn;nbetween-128to256,byteorstring

The DCB directive allocates a byte size memory and initializes the values forreadingonly.

MYVALUEDCB5;MYVALUE=5

MYMSAGEDCB“HELLOWORLD”;string


Format:

labelDCDn

The DCD directive allocates a word size memory and initializes the values forreadingonly.Thedatais32bitaligned.

MYDATADCD0x200000,0xF30F5,5000000,0xFFFF9CD7


Format:

labelDCBn

TheDCWdirectiveallocatesahalf-wordsizememoryandinitializesthevaluesforreadingonly.

MYDATADCW0x20,0xF230,5000,0x9CD7

END

EveryprogrammusthaveanentryandanENDpoint.Thelabelsfortheentryandendpointsmustmatch.TheENDdirectivetellstheassemblerthatithasreachedtheendofsourcefile.

AREAPROG_C2,CODE,READONLY

ENTRY

…

END

ENDPorENDFUNC

TheENDFUNCorENDPdirectiveinformstheassemblerthatithasreachedtheendofafunction.ENDFUNCandENDParethesame.SeeFUNCTIONorPROCdirectives.

ENTRY

TheENTRYdirective shows the entry point of a program to the assembler.Eachprogrammusthaveoneentrypoint.

AREAPROG_C2,CODE,READONLY

ENTRY

…

END

EQU(Equate)

Toassignafixedvaluetoaname,oneusestheEQUdirective.Theassemblerwillreplaceeachoccurrenceofthenamewiththevalueassignedtoit.

DATA1EQU0x39;thewaytodefinehexvalue

PORTBEQU0xF0018000;SFRPortBaddress

SUM1EQU0x40000120;assignRAMloctoSUM1

Unlike data directives such asDCB,DCD, and so on, EQU does not assign anymemorystorage;therefore,itcanbedefinedatanytimeandatanyplace,andcanevenbeusedwithinthecodesegment.

EXPORTorGLOBAL

Toinformtheassemblerthatanameorsymbolwillbereferencedbyothermodules

(in other files), it is marked by the EXPORT or GLOBAL directives. If a module isreferencinganameoutsideitself, thatnamemustbedeclaredasEXTRN(orIMPORT).Correspondingly, in the module where the variable is defined, that variable must bedeclaredasEXPORTorGLOBALinordertoallowittobereferencedbyothermodules.SeetheEXTRNdirectiveforexamplesoftheuseofbothEXTRNandEXPORT.

EXTRN(External)

TheEXTRNdirectiveisusedtoindicatethatcertainvariablesandnamesusedinamodule are defined by another module. In the absence of the EXTRN directive, theassemblerwouldsearchforthedefinitionandgiveanerrorwhenitcouldn’tfindit.Theformatofthisdirectiveis:

EXTRNname

ThefollowingexampleshowshowtheEXPORTandEXTERNdirectivesareused:

;fromthemainprogram:

EXTRNMY_FUNC

…

BLMY_FUNC

…

;–––––––––––––

;MY_FUNCislocatedinadifferentfile:

AREAOUR_EXAMPLE,CODE,READONLY

EXTRNDATA1

EXPORTMY_FUNC

MY_FUNCFUNCTION

…

LDRR1,=DATA1

…

ENDFUNC

Notice that the EXTRN directive is used in the main procedure to show thatMY_FUNC is defined in another module. This is needed because MY_FUNC is notdefined in that module. Correspondingly, MY_FUNC is defined as GLOABAL in themodulewhere it is defined. EXTRN is used in theMY_FUNCmodule to declare thatoperandDATA1hasbeendefinedinanothermodule.Correspondingly,DATA1isdeclaredasGLOBALinthecallingmodule.

FUNCTIONorPROC

Often,agroupofAssemblylanguageinstructionswillbecombinedintoaprocedure

sothatitcanbecalledbyanothermodule.TheFUNCTIONandENDFUNCdirectivesareusedtoindicatethebeginningandendoftheprocedure.Seethefollowingexample:

MY_FUNCFUNCTION

…

…

ENDFUNC

INCLUDE

Whenthereisagroupofmacroswrittenandsavedinaseparatefile,theINCLUDEdirectivecanbeusedtobringthemintoanotherfile.

RN(equate)

This isusedtodefineanameforaregister.TheRNdirectivedoesnotsetasideaseparatestorageforthename,butassociatesaregisterwiththatname.ThefollowingcodeshowshowweuseRN:



SUMRNR3;defineSUMasanameforR3


ENTRY



ADDSUM,VAL1,VAL2;addR2toR1andplaceitinR3

HEREBHERE

END

AppendixC:MacrosWhatisamacroandhowisitused?

There are applications in Assembly language programming where a group ofinstructionsperformsataskthatisusedrepeatedly.Forexample,youmightneedtoaddthree registers together. So it does notmake sense to rewrite them every time they areneeded. Therefore, to reduce the time that it takes towrite these codes and reduce thepossibility of errors, the concept ofmacroswasborn.Macros allow the programmer towritethetask(setofcodestoperformaspecificjob)onceonlyandtoinvokeitwheneveritisneeded,whereveritisneeded.

MACROdefinition

Everymacrodefinitionmusthavethreeparts,asfollows:

MACRO

[$label]macroNameparameter1,parameter2,…,parameterN

……

……

MEND

The MACRO directive indicates the beginning of the macro definition and theMEND directive signals the end. What goes in between the MACRO and MENDdirectives is called the body of themacro. The namemust be unique andmust followAssembly language naming conventions. The parameters are names, or parameters, oreven registers that are mentioned in the body of themacro. After themacro has beenwritten, itcanbe invoked(orcalled)byitsname,andappropriatevaluesaresubstitutedfor parameters. For example you might want to have an instruction that adds threeregisters.Thefollowingisamacroforthepurpose:

MACRO

ADD3VAL$DEST,$ARG1,$ARG2,$ARG3

ADD$DEST,$ARG1,$ARG2

ADD$DEST,$DEST,$ARG3

MEND

The above code is the macro definition. Note that parameters $DEST, $ARG1,$ARG2,and$ARG3arementionedinthebodyofthemacro.Todistinguishparameterstheymuststartwith$.Inthefollowingexample,themacroisinvokedbyitsnamewiththeuser’sactualdata:

AREAOURCODE,READONLY,CODE

ENTRY

MOVR1,#5

MOVR2,#2

ADD3VALR0,R1,R2,#5

Theinstruction“ADD3VALR0,R1,R2,#5”invokesthemacro.

Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:

300000008E0810002ADDR0,R1,R2

40000000CE2800005ADDR0,R0,#5

DefaultValuesforparameters

Wecandefinedefaultvaluesforparametersasshownbelow:

MACRO

ADD3VAL$DEST,$ARG1=R3,$ARG2,$ARG3=#5

ADD$DEST,$ARG1,$ARG2

ADD$DEST,$DEST,$ARG3

MEND

To use the default value we put a | instead of the parameter while invoking themacro:

ADD3VALR0,R1,R2,|

Theabovecodeusesthedefaultvalueof$ARG3whichissetto#5.

Usinglabelsinmacros

In thediscussionofmacrosso far,exampleshavebeenchosen thatdonothavealabelornameinthebodyofthemacro.Thisisbecauseifamacroisexpandedmorethanonceinaprogramandthereisalabelinthelabelfieldofthebodyofthemacro,thesamelabelwouldbegeneratedmorethanonceandanassemblererrorwouldbegenerated.Toaddresstheproblemwecangiveauniquelabeltothemacrowhenweinvokeit,asshownbelow:

MACRO

$lblOUR_MACRO

CMPR1,#5

BEQ$lbl

MOVR1,#1

$lbl

MEND


ENTRY

MOVR1,#3

label1OUR_MACRO

MOVR1,#5

label2OUR_MACRO

HEREBHERE


2000000000AREAOURCODE,READONLY,CODE

2100000000ENTRY

2200000000E3A01003MOVR1,#3

2300000004label1OUR_MACRO

300000004E3510005CMPR1,#5

4000000080A000000BEQlabel1

50000000CE3A01001MOVR1,#1

600000010label1

2400000010E3A01005MOVR1,#5


300000014E3510005CMPR1,#5

4000000180A000000BEQlabel2

50000001CE3A01001MOVR1,#1

600000020label2

2600000020EAFFFFFE

HEREBHERE

Incases that therearemore thanone label inamacro the linescanbe labeledasshownbelow:

MACRO

$lblOUR_MACRO

CMPR1,#5

BEQ$lbl.equal

MOVR1,#1

B$lbl.next

$lbl.equal

MOVR1,#2

$lbl.next

MEND


ENTRY

MOVR1,#3

label1OUR_MACRO

MOVR1,#5

label2OUR_MACRO

HEREBHERE



1400000000ENTRY

1500000000E3A01003MOVR1,#3


300000004E3510005CMPR1,#5

4000000080A000001BEQlabel1equal

50000000CE3A01001MOVR1,#1

600000010EA000000Blabel1next

700000014label1equal

800000014E3A01002MOVR1,#2

900000018label1next

1700000018E3A01005MOVR1,#5

180000001Clabel2OUR_MACRO

30000001CE3510005CMPR1,#5


500000024E3A01001MOVR1,#1

600000028EA000000Blabel2next

70000002Clabel2equal

80000002CE3A01002MOVR1,#2

900000030label2next

1900000030EAFFFFFE

HEREBHERE

Conditionalmacros

Wecanpassconditionintomacros,aswell:

MACRO

$lblOurMacro$cond

CMPR1,#5

B$cond$lbl.equal

MOVR1,#1

$lbl.equal

MEND


ENTRY

MOVR1,#3

label1OurMacroEQ;inthemacrocheckequality

MOVR1,#3

label2OurMacroLO;inthemacrocheckifislower

HEREBHERE



1100000000ENTRY

1200000000E3A01003MOVR1,#3

1300000004label1OurMacroEQ

300000004E3510005CMPR1,#5


50000000CE3A01001MOVR1,#1


1400000010E3A01003MOVR1,#3

1500000014label2OurMacroLO

300000014E3510005CMPR1,#5

4000000183A000000BLOlabel2equal

50000001CE3A01001MOVR1,#1


1600000020EAFFFFFE

HEREBHERE

Notice that the firstB$cond is substitutedwithBEQwhile the secondB$cond issubstitutedwithBLOsincetheconditionsEQandLOareusedrespectively.

INCLUDEdirective

Assumethatthereareseveralmacrosthatareusedineveryprogram.Musttheyberewritten every time? The answer is no if the concept of the INCLUDE directive isknown.TheINCLUDEdirectiveallowsaprogrammertowritemacrosandsavetheminafile, and later bring them into any file. For example, assuming that some widely usedmacroswerewrittenandthensavedunderthefilename“MYMACRO1.S”,theINCLUDEdirectivecanbeusedtobringthisfileintoany“.asm”fileandthentheprogramcancallupon any of the macros as many times as needed. In the following example theADD3VALmacro isdefined in theMyMACRO.sfileand it isused in theexample.asmfile.

FigureC.1:DefiningaMacroinanIncludeFile

Macrosvs.subroutinesMacros and subroutines are useful in writing assembly programs, but each has

limitations.Macros increasecodesizeevery time theyare invoked.Forexample, ifyoucall a 10-instruction macro 10 times, the code size is increased by 100 instructions;whereas, if you call the same subroutine 10 times, the code size is only that of thesubroutine instructions.On theotherhand, a functioncall takes3clocksand the returninstructiontakes3clockstogetexecuted.So,usingfunctionsaddsaround6clockcycles.Thesubroutinesmightusestackspaceaswellwhencalled,whilethemacrosdonot.

AppendixD:FlowchartsandPseudocodeFlowcharts

If you have taken any previous programming courses, you are probably familiarwith flowcharting. Flowcharts use graphic symbols to represent different types ofprogramoperations.Thesesymbolsareconnected together intoa flowchart to show theflowofexecutionofaprogram.Themorecommonlyusedsymbolsareasfollows:

StartandEndpoints

Start and End points are commonly represented as rounded rectangles or ovalscontainingthewords“Start”or“End”.Smallcircleisanotherwaytoshowthem.

FigureD-1:StartandEndPoints

Decisions

Theconditionsarerepresentedindiamonds.

FigureD-2:aDecisionthatComparesAwithB

Process

Theprocessingstepsarerepresentedusingrectangles.

FigureD-3:aProcessSample

Inputsandoutputs

Inputsandoutputsarerepresentedasparallelogram.

FigureD-4:anOutputSample

Subroutines

Callingsubroutinesarerepresentedasshownbelow

FigureD-5:aSubroutineCallSample

PseudocodeFlowcharting has been standard practice in industry for decades. However, some

findlimitationsinusingflowcharts,suchasthefactthatyoucan’twritemuchinthelittleboxes, and it is hard to get the “big picture” ofwhat the programdoeswithout gettingbogged down in the details. An alternative to using flowcharts is pseudocode, whichinvolves writing brief descriptions of the flow of the code. Figures D-6 through D-10showflowchartsandpseudocodeforcommonlyusedcontrolstructures.

Structured programming uses three basic types of program control structures:sequence, control, and iteration. Sequence is simply executing instructions one afteranother. Figure D-6 shows how sequence can be represented in pseudocode andflowcharts.

FigureD-6:SEQUENCEPseudocodeversusFlowchart

NoteinFiguresD-6throughD-11that“statement”canindicateonestatementoragroupofstatements.

FigureD-7throughD-9showtwocontrolprogrammingstructures:IF-THEN-ELSEandIF-THENinbothpseudocodeandflowcharts.

FigureD-7:IFTHENELSEPseudocodeversusFlowchart

FigureD-8:anIFTHENELSESample

FigureD-9:IFTHENPseudocodeversusFlowchart

FiguresD-10andD-11showtwoiterationcontrolstructures:REPEATUNTILandWHILEDO.Bothstructuresexecuteastatementorgroupofstatementsrepeatedly.Thedifference between them is that the REPEAT UNTIL structure always executes thestatement(s) at least once, and checks the condition after each iteration, whereas theWHILEDOmaynotexecutethestatement(s)atallbecausetheconditionischeckedatthebeginningofeachiteration.

FigureD-10:REPEATUNTILPseudocodeversusFlowchart

FigureD-11:WHILEDOPseudocodeversusFlowchart

ProgramD-1finds thesumofnumbersbetween1and10.Compare theflowchartversus thepseudocode forProgramD-1 (shown inFigureD-12). In this example,moreprogram details are given than one usually finds. For example, this shows steps forinitializingandchangingvalues.Anotherprogrammermaynotincludethesestepsintheflowchart or pseudocode. It is important to remember that the purpose of flowcharts orpseudocode is to show the flow of the program and what the program does, not thespecificAssemblylanguageinstructionsthataccomplishtheprogram’sobjectives.Noticealsothatthepseudocodegivesthesameinformationinamuchmorecompactformthandoestheflowchart.Itisimportanttonotethatsometimespseudocodeiswritteninlayers,sothattheouterlevelorlayershowstheflowoftheprogramandsubsequentlevelsshowmoredetailsofhowtheprogramaccomplishesitsassignedtasks.

ProgramD-1

intmain()

{

intsum=0;

intvalue=1;

do{

sum=sum+value;

value++;

}while(value<=10);

printf(“%d”,sum);

}

FigureD-12:PseudocodeversusFlowchartforProgramD-1

AppendixE:PassingArgumentsintoFunctionsTherearedifferentwaystopassargumentstofunctions.Someofthemare:

·throughregisters

·throughmemoryusingreferences

·usingstack

E.1:PassingargumentsthroughregistersInthefollowingprogramtheBIGGERfunctiongetstwovaluesthroughR0andR1.

AftercomparingR0andR1,itreturnsthebiggervaluethroughR2.

ProgramE-1


ENTRY

MOVR0,#5;R0=5

MOVR1,#7;R1=7

BLBIGGER;BIGGER(5,7)

HEREBHERE;stayhere

;=======================================

;BIGGERreturnsthebiggervalue

;Parameters:

;R0andR1:thevaluestobecompared

;Returns:

;R2:containingthebiggervalue

;=======================================

BIGGER

CMPR0,R1

BHIL1;ifR0>R1gotoL1

MOVR2,R1;R2=R1

BXLR;return

L1MOVR2,R0;R2=R0

BXLR;return

END

Thisisafastwayofpassingargumentstothefunction.

E.2:PassingthroughmemoryusingreferencesWe can store the data in memory and pass its address through a register. In the

following program the STR_LENGTH function gets the address of a zero-ended stringthroughR0andreturnsthelengthofthestringthroughR1.

ProgramE-2


ENTRY

ADRR0,OUR_STR;R0=addr.ofOUR_STR

BLSTR_LENGTH;STR_LENGTH(&OUR_STR)

HEREBHERE;stayhere

OUR_STRDCB“HELLO!”

ALIGN4

;=======================================

;STR_LENGTHreturnsthelengthofstr

;Parameters:

;R0:addressofthestring

;Returns:

;R0:thelengthofstring

;=======================================

STR_LENGTH

MOVR2,#0

L_BEGIN

LDRBR5,[R0]

CMPR5,#0

BXEQLR;returnif0

ADDR2,R2,#1;R2=R2+1

ADDR0,R0,#1;R0=R0+1

BL_BEGIN

END

E.3:PassingargumentsthroughstackPassing through the stack is a flexible way of passing arguments. To do so, the

argumentsarepushedintothestackjustbeforecallingthefunctionandpoppedoffafterreturning. InProgramE-3, theBIGGER function gets two arguments through the stackandreturnsthebiggervaluethroughthestack.

ProgramE-3


ENTRY

;initstackpointer

LDRSP,=(0x40000000+(16*1024))

MOVR0,#5

PUSH{R0};pushArg1

MOVR0,#7

PUSH{R0};pushArg2

BLBIGGER;BIGGER(5,7)

LDRR1,[SP,#0];R1=returnvalue

POP{R0}

POP{R0}

HEREBHERE;stayhere

;=======================================

;BIGGERreturnsthebiggervalue

;Parameters:

;valuestobecompared

;Returns:

;thebiggervalue

;=======================================

BIGGER

LDRR0,[SP,#4];R0=arg1

LDRR1,[SP,#0];R1=arg2

CMPR0,R1

BHIL1;ifR0>R1gotoL1

STRR1,[SP,#0];returnR1

BXLR;return

L1STRR0,[SP,#0];returnR0

BXLR;return

END

Thismethodofpassingargumentsareusedinx86computers.

E.4:AAPCS(ARMApplicationProcedureCallStandard)TheAAPCSprovidesanstandardforimplementingthefunctionsandthefunction

callsso that thecodesmadebydifferentcompilersanddifferentprogrammerscanworkwitheachother.Someoftherulesofthestandardare:

·TheargumentsmustbesentthroughR0toR4,respectively.

·ThereturnvaluemustbesentbackthroughR0andR1.

· The functions canuseR5 toR8 for their internal calculations.But theirvalues must be retrieved before returning. To do so, we push the registersbeforeusingthemandpopthembeforereturningfromthefunction.

·ThestackmustbeusedasFullDescending

InProgramE-1mostoftheaboverulesareconsidered.ButthereturnvaluemustbeinR0insteadofR2.

InProgramE-4theaboverulesareconsidered.

ProgramE-4


ENTRY

;initstackpointer

;(changeitaccordingtoyourchip)

LDRSP,=(0x40000000+(16*1024))

MOVR0,#20

BLDELAY;DELAY(20)

HEREBHERE;stayhere

;=======================================

;DELAYwaitsforawhile

;Parameters:

;R0:theamountofwait

;Returns:

;none

;=======================================

DELAY

CMPR0,#0

BXEQLR;returnifzero

PUSH{R5};saveR5

LDRR5,=5000000;R5=5000000

L1SUBSR5,R5,#1;R5=R5-1

BNEL1;gotoL1ifR5isnotzero

POP{R5};restoreR5

BXLR;return

END

Moreinformation

FormoreinformationaboutAAPCSseethefollowingarticleorsearch“AAPCS”ontheInternet:

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf

AppendixF:ASCIICodes

arm assembly language: programming and architecture

Documents