arm assembly language: programming and architecture

436

Upload: others

Post on 11-Sep-2021

19 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ARM Assembly Language: Programming and Architecture
Page 2: ARM Assembly Language: Programming and Architecture

ARMAssemblyLanguageProgramming&Architecture

MuhammadAliMazidi

SarmadNaimiSepehrNaimiJaniceMazidi

Page 3: ARM Assembly Language: Programming and Architecture

Copyright©2013MazidisandNaimis

Allrightsreserved

(Aportionofthisbookwastakenfrom“The80x86IBMPC&CompatibleComputersVol1&2:4thedition”andwaspreviouslypublishedbyPearsonEducation,

Inc.)

Page 4: ARM Assembly Language: Programming and Architecture

“Regardmanasaminerichingemsofinestimablevalue.

Educationcan,alone,causeittorevealitstreasures,andenablemankindtobenefittherefrom.”

Baha’u’llah

Page 5: ARM Assembly Language: Programming and Architecture

DedicationTothefaculty,staff,andstudentsofBIHEuniversityfortheirdedicationand

steadfastness.

Page 6: ARM Assembly Language: Programming and Architecture

PrefaceTheARMprocessor is becoming the dominantCPUarchitecture in the computer

industry. It is already the leadingarchitecture incellphonesand tablet computers.WithsuchalargenumberofcompaniesproducingARMchips,itiscertainthatthearchitecturewillmovetothelaptop,desktopandhigh-performancecomputerscurrentlydominatedbyx86 architecture from Intel and AMD. Currently the PIC and AVR microcontrollersdominate the 8-bit microcontroller market. The ARM architecture will have a majorimpactinthisareatooasdesignersbecomemorefamiliarwithitsarchitecture.ThisbookisintendedasanintroductiontoARMassemblylanguageprogrammingandarchitecture.We assumeno prior background in assembly language programmingwith otherCPUs.However,weurgeyou to studyChapter0covering the fundamentalsofdigital systemssuchashexadecimalnumbers,varioustypesofmemory,memoryandI/Ointerfacing,andmemory address decoding. Chapter 0 is available free of charge on our websitehttp://www.MicroDigitalEd.com/ARM/ARM_books.htm

UniversitiesandcollegesThis book is intended for both academic and industry readers. The answers to

reviewquestionsatendofeachsectionareprovidedatendofthechapter.Ifyouareusingthisbookforauniversitycoursethereareend-of-chapterproblemsthatcanbefoundonwww.MicroDigitalEd.com/ARM/ARM_books.htm

OurupcomingbooksintheARMseriesThisbookcoverstheAssemblylanguageprogrammingoftheARMchip.TheARM

Assemblylanguageisstandardregardlessofwhomakesthechip.TheARMlicenseesarefree to implement theon-chipperipheral (ADC,Timers, I/O,…)as theychoose.SincetheARMperipherals arenot standard among thevariousvendors,wehavededicated aseparatebooktoeachvendor.Thefollowingbooksareplannedinthisseries:

TIARMPeripheralProgrammingandInterfacing

FreescaleARMPeripheralProgrammingandInterfacing

NXPARMPeripheralProgrammingandInterfacing

AtmelARMPeripheralProgrammingandInterfacing

TheabovebooksuseClanguagetoprogramtheperipherals.

Finally,wewould like to thankprofessorsShujenChen,RabahAoufi, andClydeKnight for their reading of the book before publication. We sincerely appreciate theirinsightsandinputs.WewouldalsoliketothankFadaMahmoudi,MozhdeAmiri,KeyvanRoshani,AzaliaYaghini,ArashZamani,ParhamFazlali,andAshkanFarivarforsharingtheircommentsontheprepublicationversionofthise-book.

Contactusatthefollowingemailaddress:

Page 7: ARM Assembly Language: Programming and Architecture

[email protected]

andpleaseplaceARMbookinsubjectlineofyouremail.

Page 8: ARM Assembly Language: Programming and Architecture

TableofContentsChapter1:TheHistoryofARMandMicrocontrollers

Section1.1:IntroductiontoMicrocontrollers

Section1.2:TheARMFamilyHistory

Problems

AnswerstoReviewQuestions

Chapter2:ARMArchitectureandAssemblyLanguageProgramming

Section2.1:TheGeneralPurposeRegistersintheARM

Section2.2:TheARMMemoryMap

Section2.3:LoadandStoreInstructionsinARM

Section2.4:ARMCPSR(CurrentProgramStatusRegister)

Section2.5:ARMDataFormatandDirectives

Section2.6:IntroductiontoARMAssemblyProgramming

Section2.7:AssemblinganARMProgram

Section2.8:TheProgramCounterandProgramROMSpaceintheARM

Section2.9:SomeARMAddressingModes

Section2.10:RISCArchitectureinARM

Section2.11:ViewingRegistersandMemorywithARMKeilIDE

Problems

AnswerstoReviewQuestions

Chapter3:ArithmeticandLogicInstructionsandPrograms

Section3.1:ArithmeticInstructions

Section3.2:LogicInstructions

Section3.3:RotateandBarrelShifter

Section3.4:ShiftandRotateInstructionsinARMCortex(CaseStudy)

Section3.5:BCDandASCIIConversion

Problems

AnswerstoReviewQuestions

Chapter4:Branch,Call,andLoopinginARM

Section4.1:LoopingandBranchInstructions

Page 9: ARM Assembly Language: Programming and Architecture

Section4.2:CallingSubroutinewithBL

Section4.3:ARMTimeDelayandInstructionPipeline

Section4.4:ConditionalExecution

Problems

AnswerstoReviewQuestions

Chapter5:SignedNumbersandIEEE754FloatingPoint

Section5.1:SignedNumbersConcept

Section5.2:SignedNumberInstructionsandOperations

Section5.3:IEEE754Floating-PointStandards

Problems:

AnswerstoReviewQuestions

Chapter6:ARMMemoryMap,MemoryAccess,andStack

Section6.1:ARMMemoryMapandMemoryAccess

Section6.2:StackandStackUsageinARM

Section6.3:ARMBit-AddressableMemoryRegion

Section6.4:AdvancedIndexedAddressingMode

Section6.5:ADR,LDR,andPCRelativeAddressing

Problems

AnswerstoReviewQuestions

Chapter7:ARMPipelineandCPUEvolution

Section7.1:ARMPipelineEvolution

Section7.2:OtherCPUEnhancements

Problems

AnswerstoReviewQuestions

AppendixA:ARMCortex-M3InstructionDescription

SectionA.1:ListofARMCortex-M3Instructions

SectionA.2:ARMCortex-M3InstructionDescription

AppendixB:ARMAssemblerDirectives

SectionB.1:ListofARMAssemblerDirectives

SectionB.2:DescriptionofARMAssemblerDirectives

AppendixC:Macros

Page 10: ARM Assembly Language: Programming and Architecture

Whatisamacroandhowisitused?

Macrosvs.subroutines

AppendixD:FlowchartsandPseudocode

Flowcharts

Pseudocode

AppendixE:PassingArgumentsintoFunctions

E.1:Passingargumentsthroughregisters

E.2:Passingthroughmemoryusingreferences

E.3:Passingargumentsthroughstack

E.4:AAPCS(ARMApplicationProcedureCallStandard)

AppendixF:ASCIICodes

Page 11: ARM Assembly Language: Programming and Architecture
Page 12: ARM Assembly Language: Programming and Architecture
Page 13: ARM Assembly Language: Programming and Architecture

Chapter1:TheHistoryofARMandMicrocontrollersInSection1.1welookatthehistoryofmicrocontrollersthenweintroducesomeof

theavailablemicrocontrollers.ThehistoryofARMisprovidedinSection1.2.

Page 14: ARM Assembly Language: Programming and Architecture
Page 15: ARM Assembly Language: Programming and Architecture

Section1.1:IntroductiontoMicrocontrollersTheevolutionofMicroprocessorsandMicrocontrollers

In early computers, CPUs were designed using a number of vacuum tubes. Thevacuum tubewas bulky and consumed a lot of electricity. The invention of transistors,followed by the IC (Integrated Circuit), provided the means to put a CPU on printedcircuitboards.TheadvancesinICtechnologyallowedputtingtheentireCPUonasingleICchip.This ICwascalledamicroprocessor.Someof themicroprocessorsare thex86family of Intel used widely in desktop computers, and the 68000 of Motorola. ThemicroprocessorsdonotcontainRAM,ROM,orI/Operipherals.Asaresult,theymustbeconnectedexternallytoRAM,ROMandI/O,asshowninFigure1-1.

Figure1-1:AComputerMadebyGeneralPurposeMicroprocessor

In thenextstep, thedifferentpartsofasystem, includingCPU,RAM,ROM,andI/Os, were put together on a single IC chip and it was called microcontroller. SOC(System on Chip) andMCU (Micro Controller Unit) are other names used to refer tomicrocontrollers. Figure 1-2 shows the simplified view of the internal parts ofmicrocontrollers.

Figure1-2:SimplifiedViewoftheInternalPartsofMicrocontrollers(SOC)

Since the microcontrollers are cheap and small, they are widely used in manydevices.

TypesofComputers

Typically,computersarecategorizedinto3groups:desktopcomputers,servers,andembeddedsystems.

Desktop computers, including PCs, tablets, and laptops, are general purposecomputers.Theycanbeusedtoplaygames,readandeditarticles,anddoanyothertaskjust by running the proper application programs. The desktop computers usemicroprocessors.

Page 16: ARM Assembly Language: Programming and Architecture

Incontrast,embeddedsystemsarespecial-purposecomputers.Inembeddedsystemdevices,thesoftwareapplicationandhardwareareembeddedtogetherandaredesignedtodoaspecifictask.Forexample,theKindle,digitalcamera,vacuumcleaner,mp3player,mouse, keyboard, and printer, are some examples of embedded systems. Inmost casesembeddedsystemsrunafixedprogramandcontainamicrocontroller.Itisinterestingtonote that embedded systems are the largest class of computers though they are notnormallyconsideredascomputersbythegeneralpublic.

Serversarethefastcomputerswhichmightbeusedaswebhosts,databaseservers,andinanyapplicationinwhichweneedtoprocessahugeamountofdatasuchasweatherforecasting. Similar to desktop computers, servers are made of microprocessors but,multipleprocessorsareusuallyusedineachserver.Bothserversanddesktopcomputersare connected to a number of embedded systemdevices such asmouse, keyboard, diskcontroller,Flashstickmemoryandsoon.

ABriefHistoryoftheMicrocontrollers

Inthe1980sand1990s,IntelandMotoroladominatedthefieldofmicroprocessorsandmicrocontrollers. Intel had the x86 (8088/86, 80286, 80386, 80486, and Pentium).Motorola (nowFreescale) had the 68xxx (68000, 68010, 68020, etc.).Many embeddedsystemsusedIntel’s32-bitchipsofx86(386,486,Pentium)andMotorola’s32-bit68xxxforhigh-endembeddedproductssuchasrouters.Forexample,Ciscoroutersused68xxxfor theCPU.At the lowend, the8051 fromInteland68HC11fromMotorolawere thedominant8-bitmicrocontrollers.With the introductionofPICfromMicrochipandAVRfromAtmel, they becamemajor players in the 8-bitmarket formicrocontroller. At thetime of this writing, PIC and AVR are the leaders in terms of volume for 8-bitmicrocontrollers. In the late 1990s, the ARM microcontroller started to challenge thedominanceofIntelandMotorolainthe32-bitmarket.AlthoughbothIntelandMotorolausedRISCfeaturestoenhancetheperformanceoftheirmicroprocessors,duetotheneedtomaintain compatibilitywith legacy software, they could notmake a clean break andstart over. Intel used massive amounts of gates to keep up the performance of x86architecture and that in turn increased the power consumption of the x86 to a levelunacceptable for battery-powered embedded products.Meanwhile Freescale (Motorola)streamlinedtheinstructionsofthe68xxxCPUandcreatedanewlineofmicroprocessorscalledColdFire,whileatthesametimeworkedwithIBMtodesignanewRISCprocessorcalledPowerPC.WhilebothPowerPCandColdfireare still aliveandbeingused in the32-bit market, it is ARM which has become the leading microcontroller in the 32-bitmarket.

CurrentlyAvailableMicrocontrollers

Therearemanymicrocontrollersavailableinthemarket.SomeofthemarelistedinTable1-1.

32-bit

ARM, AVR32 (Atmel), ColdFire (Freescale), MIPS32, PIC32 (Microchip), PowerPC, TriCore(Infineon),SuperH

Page 17: ARM Assembly Language: Programming and Architecture

16-bit

MSP430(TI),HCS12(Freescale),PIC24(Microchip),dsPIC(Microchip)

8-bit

8051,AVR(Atmel),HCS08(Freescale),PIC16,PIC18

Table1-1:SomeMicrocontrollers

Introductiontosome32-bitmicrocontrollers

x86: The x86 and Pentium processors are based on the 32-bit architecture of the386.AlthoughbothIntelandAMDarepushingthex86intotheembeddedmarket,duetothehighpowerconsumptionof thesechips, theembeddedmarkethasnotembraced thex86.Intelisworkinghardtomakealow-powerversionofthe386calledAtomavailablefortheembeddedmarket.

PIC32:ItisbasedontheMIPSarchitectureandisgettingsomeattentionduetothefact it shares some of the peripherals with the PIC24/PIC18 chips and also using theMPLABfor IDE. Microchiphopes the freeMPLABIDEandengineers’knowledgeofthe 8-bit PIC will attract embedded developers to the PIC32 as they move to 32-bitsystemsfortheirhighendembeddedproducts.

ColdFire: The Freescale (formerly Motorola) is based on the venerable 680x0(68000, 68010) so popular in the 1980s and 1990s. They streamlined the 68000instructions to make it more RISC-type architecture and is the top seller of 32-bitprocessorsfromtheFreescale.InrecentyearsFreescalerevampedandredesignedthe8-bitHCS08(fromthe6808)tosharesomeoftheperipheralswithColdFireandarepushingthemunderthenameFlexis.TheyhopeengineersusetheHCS08atthelow-endandmovetoColdfireforhigh-endoftheembeddedproductswithminimumlearningcurve.

PowerPC: Thiswas developed jointly by IBM and Freescale. Itwas used in theAppleMacforafewyears.ThenAppleswitchedtox86forawhileandcurrentlyisusingARMinall theirproducts.Nowadays,bothFreescaleandIBMmarket thePowerPCforthehigh-endoftheembeddedsystems.

Howtochooseamicrocontroller

Thefollowingtwofactorscanbeimportantinchoosingamicrocontroller:

· Chipcharacteristics:Someofthefactorsinchoosingamicrocontrollerchipareclockspeed,powerconsumption,price,andon-chipmemoriesandperipherals.

· Availableresources:OtherfactorsinchoosingamicrocontrollerincludetheIDEcompiler,legacysoftware,andmultiplesourcesofproduction.

ReviewQuestions

1.Trueorfalse.Microcontrollersarenormallylessexpensivethanmicroprocessors.

2. Whencomparingasystemboardbasedonamicrocontrollerandageneral-purposemicroprocessor,whichoneischeaper?

Page 18: ARM Assembly Language: Programming and Architecture

3.Amicrocontrollernormallyhaswhichofthefollowingdeviceson-chip?

(a)RAM(b)ROM(c)I/O(d)alloftheabove

4.Ageneral-purposemicroprocessornormallyneedswhichofthefollowingdevicestobeattachedtoit?

(a)RAM(b)ROM(c)I/O(d)alloftheabove

5.Anembeddedsystemisalsocalledadedicatedsystem.Why?

6.Whatdoesthetermembeddedsystemmean?

7.Whydoeshavingmultiplesourcesofagivenproductmatter?

Page 19: ARM Assembly Language: Programming and Architecture
Page 20: ARM Assembly Language: Programming and Architecture

Section1.2:TheARMFamilyHistoryInthissection,welookattheARManditshistory.

AbriefhistoryoftheARM

TheARMcameoutofacompanycalledAcornComputers inUnitedKingdominthe1980s.ProfessorSteveFurberofManchesterUniversityworkedwithSophieWilsontodefine theARMarchitecture and instructions.TheVLSITechnologyCorp.producedthe first ARM chip in 1985 for Acorn Computers and was designated as Acorn RISCMachine(ARM).Unable tocompetewithx86(8088,80286,80386,…)PCsfromIBMandotherpersonalcomputermakers,theAcornwasforcedtopushtheARMchipintothesingle-chipmicrocontrollermarketforembeddedproducts.ThatiswhenAppleCorp.gotinterestedinusingtheARMchipforthePDA(personaldigitalassistants)products.ThisrenewedinterestinthechipledtothecreationofanewcompanycalledARM(AdvancedRISCMachine).ThisnewcompanybetitsentirefortuneonsellingtherightstothisnewCPU to other siliconmanufacturers and design houses. Since the early 1990s, an everincreasingnumberofcompanieshavelicensedtherighttomaketheARMchip.SeeTable1-2forthemajormilestonesoftheARM.

Alsoseehttp://www.arm.com/about/company-profile/milestones.phpforthelist.

Table1-2:ARMCompanymilestones(www.ARM.com)

1982

AcornproducedacomputerforBBCnamedBBCmicro.GoodsalesofthecomputermotivatedAcorntodecidetomakeitsownmicroprocessor.

1983

AcornandVLSIbegandesigningtheARMmicroprocessor.

1985

AcornComputerGroupdevelopedtheworld’sfirstcommercialRISCprocessor.TheARMv1had2500transistors,andworkedwithafrequencyof4MHz.

1987

Acorn’sARMprocessordebutsasthefirstRISCprocessorforlow-costPCs

1989

AcornintroducedARMv3withafrequencyof25MHz.Ithada4KBcacheaswell.

1990

Page 21: ARM Assembly Language: Programming and Architecture

AdvancedRISCMachines(ARM)spinsoutofAcornandAppleComputer’scollaborationeffortswithachartertocreateanewmicroprocessorstandard.VLSITechnologybecomesaninvestorandthefirstlicensee.

1991

ARMintroduceditsfirstembeddableRISCcore,theARM6solutionusingARMv3architecture.

1992

GECPlesseyandSharplicensedARMtechnology

1993

TexasInstrumentslicensedARMtechnology

ARMintroducedtheARM7core.

1995

ARMannouncedtheThumbarchitectureextension,whichgives32-bitRISCperformanceat16-bitsystemcostandoffersindustry-leadingcodedensity

ARMlaunchedSoftwareDevelopmentToolkit

1996

ARMandVLSITechnologyintroducedtheARM810microprocessor

ARMandMicrosoftworkedtogethertoextendWindowsCEtotheARMarchitecture

1997

Hyundai,Lucent,Philips,RockwellandSonylicensedARMtechnology

ARM9TDMIfamilyannounced

1998

HP,IBM,Matsushita,SeikoEpsonandQualcommlicensedARMtechnology

ARMdevelopedsynthesizableversionoftheARM7TDMIcore

ARMPartnersshippedmorethan50millionARM-poweredproducts

1999

LSILogic,STMicroelectronicsandFujitsulicensedARMtechnology

ARMannouncedsynthesizableARM9Eprocessorwithenhancedsignalprocessing

Page 22: ARM Assembly Language: Programming and Architecture

2000

Agilent,Altera,Micronas,Mitsubishi,Motorola,Sanyo,TriscendandZTEIClicensedARMtechnology

ARMlaunchedSecurCorefamilyforsmartcards

TSMCandUMCbecamemembersofARMFoundryProgram

2001

ARM’sshareofthe32-bitembeddedRISCmicroprocessormarketgrewto76.8percent

ARMannouncednewARMv6architecture

Fujitsu,GlobalUniChip,SamsungandZeevolicensedARMtechnology

ARMacquiredkeytechnologiesandanembeddeddebugdesignteamfromNoralMicrologicsLtd

2002

ARMannouncedthatithadshippedoveronebillionofitsmicroprocessorcorestodate

ARMtechnologylicensedtoSeagate,Broadcom,Philips,Matsushita,Micrel,eSilicon,ChipExpressandITRI

ARMlaunchedtheARM11micro-architecture

ARMlaunchesitsRealViewfamilyofdevelopmenttools

FlextronicsbecamethefirstARMLicensingPartnerprogrammember,allowingittosub-licenseARMtechnologytoitsowncustomers

2004

TheARMCortexfamilyofprocessors,basedontheARMv7architecture,isannounced.TheARMCortex-M3isannouncedinconjunction,asthefirstofthenewfamilyofprocessors

ARMCortex-M3processorannounced,thefirstofanewCortexfamilyofprocessorcores

MPCoremultiprocessorlaunched,thefirstintegratedmultiprocessor

OptimoDEtechnologylaunched,thegroundbreakingembeddedsignalprocessingcore

2005

ARMacquiredKeilSoftware

ARMCortex-A8processorannounced

2007

FivebillionthARMPoweredprocessorshippedtothemobiledevicemarket

ARMCortex-M1processorlaunched–thefirstARMprocessordesignedspecificallyforimplementationonFPGAs

Page 23: ARM Assembly Language: Programming and Architecture

RealViewProfilerforEmbeddedSoftwareAnalysisintroduced

ARMunveilsCortex-A9processorsforscalableperformanceandlow-powerdesigns

2008

ARMannounces10billionthprocessorshipment

ARMMali-200GPUWorldsFirsttoachieveKhronosOpenGLES2.0conformanceat1080pHDTVresolution

2009

ARMannounces2GHzcapableCortex-A9dualcoreprocessorimplementation

ARMlaunchesitssmallest,lowestpower,mostenergyefficientprocessor,Cortex-M0

2010

ARMlaunchesCortex-M4processorforhighperformancedigitalsignalcontrol

ARMtogetherwithkeyPartnersformLinarotospeedrolloutofLinux-baseddevices

MicrosoftbecomesanARMArchitectureLicensee

ARM&TSMCsignlong-termagreementtoachieveoptimizedSystems-on-ChipbasedonARMprocessors,extendingdownto20nm

ARMextendsperformancerangeofprocessorofferingwiththeCortex-A15MPCoreprocessor

ARMMalibecomesthemostwidelylicensedembeddedGPUarchitecture

ARMMali-T604GraphicsProcessingUnitintroducedprovidingindustry-leadinggraphicsperformancewithanenergy-efficientprofile

2011

MicrosoftunveilsWindowsonARMatCES2011

IBMandARMcollaboratetoprovidecomprehensivedesignplatformsdownto14nm

ARMandUMCextendpartnershipinto28nm

Cortex-A7processorlaunched

Big-Littleprocessingannounced,linkingCortex-A15andCortex-A7processors

ARMv8architectureunveiledatTechCon

AMPannouncelicenseandplansforfirstARMv8-basedprocessor

ARMMali-T658GPUlaunched

ARMexpandsR&DpresenceinTaiwanwithHsinchuDesignCenter

ARMandAvnetlaunchEmbeddedSoftwareStore(ESS)

ARM,CadenceandTSMCtapeoutfirst20nmCortex-A15multicoreprocessor

Page 24: ARM Assembly Language: Programming and Architecture

2012

ARM,GemaltoandG&Dformjointventuretodelivernext-generationmobilesecurity

FirstWindowsRT(WindowsonARM)devicesrevealed

ARM,AMD,Imagination,MediaTekandTexasInstrumentsfoundingmembersofHeterogeneousSystemArchitecture(HAS)Foundation

ARMandTSMCworktogetheronFinFETprocesstechnologyfornext-generation64-bitARMprocessors

ARMformsfirstUKforumtocreatetechnologyblueprint“InternetofThings”devices

ARMnamedoneofBritain’sTopEmployers

MITTechnologyReviewnamedARMinitslistof50MostInnovativeCompanies

Currently theARMCorp. receives its entire revenue from licensing theARM toother companies since it does not own state of the art chip fabrication facility. ThisbusinessmodelofmakingmoneyfromsellingIP(intellectualproperty)hasmadeARMoneofthemostwidelyusedCPUarchitecturesintheworld.UnlikeIntelorFreescalewhodefinethearchitectureandfabricate thechip,hundredsofcompanieswhohavelicensedtheARMIPfeelalevelplayingfieldwhenitcomestocompetingwiththeoriginatorofthechip.

ARMandApple

WhenSteveJobscamebacktoruntheApplein1996,thecompanywasindecline.Ithadlostthepersonalcomputerracethathadstarted20yearsearlier.TheintroductionofiPod in 2001 changed the fortune of that companymore than anything else.Apple hadtriedtosellaPDAcalledNewtoninthe1990sbutwasnotsuccessful.TheNewtonwasusing theARMprocessor and itwas too early for its time.The iPodused an enhancedversionofARMcalledARM7andbecameaninstantsuccess.iPodbroughttheattentionto the ARM chip that it deserved. Since then Apple has been using the ARM chip iniPhonesand iPads.Today, theARMmicrocontroller is theCPUofchoice fordesigningcell phone and other hand-held devices. In the future,ARMwillmake further in-roadsinto the tablet and laptopPCmarketnow thatMicrosoftCorphas introduced theARMversionofitsWindowsoperatingsystem.

ARMfamilyvariations

AlthoughtheARM7familyisthemostwidelyusedversion,ARMisdeterminedtopushthearchitectureintothelowendofthemicrocontrollermarketwhere8-and16-bitmicrocontrollershavebeentraditionallydominating. ForthisreasontheyhavecomeupwithamicrocontrollerversionofARMcalledCortex.Aswewillseeinfuturechapters,the Cortex family of ARM microcontrollers maintains compatibility with the ARM7without sacrificing performance.TheARMarchitecture is also being pushed into high-performancesystemswheremulticorechipssuchasIntelXeondominate.

Figure 1-3 shows some of the most widely used ARM processors. It should beemphasized that we cannot use the terms ARM family and ARM architectureinterchangeably. For example, ARM11 family is based on ARMv6 architecture and

Page 25: ARM Assembly Language: Programming and Architecture

ARMv7AisthearchitectureofCortex-Afamily.

Figure1-3:ARMFamilyandArchitecture

OneCPU,manyperipherals

ARMhasdefinedthedetailsofarchitecture,registers,instructionset,memorymap,andtimingoftheARMCPUandholdsthecopyrighttoit.Thevariousdesignhousesandsemiconductormanufacturers license the IP (intellectual property) for theCPUand canadd their own peripherals as they please. It is up to the licensee (design houses andsemiconductormanufactures)todefinethedetailsofperipheralssuchasI/Oports,serialportUART,timer,ADC,SPI,DAC,I2C,andsoon.AsaresultwhiletheCPUinstructionsand architecture are same across all the ARM chips made by different vendors, theirperipheralsarenotcompatible.ThatmeansifyouwriteaprogramfortheserialportofanARMchipmadebyTI(TexasInstrument), theprogrammightnotnecessarilyrunonanARMchipsoldbyNXP.ThisistheonlydrawbackoftheARMmicrocontroller.ThegoodnewsistheIDE(integrateddevelopmentenvironment)suchasKeil(seewww.keil.com)orIAR(seewww.IAR.com)doprovideperipherallibrariesforchipsfromvariousvendorsandmake the jobofprogrammingtheperipheralsmucheasier. Itmustbenoted that inrecentyearsARMprovidestheIPforsomeperipheralssuchasUARTandSPI,butunliketheCPUarchitecture,itsadoptionisnotmandatoryanditisuptothechipmanufacturerwhether to adopt it or not. This is in contrast to the Coldfire microcontroller fromFreescale,inwhichtheFreescaledefinesthearchitectureandperipherals,fabricates,sells,andsupportsthechip.Figure1-4showstheARMsimplifiedblockdiagramandTable1-3providesalistofsomeARMvendors.

Page 26: ARM Assembly Language: Programming and Architecture

Figure1-4:ARMSimplifiedBlockDiagram

Actel AnalogDevices Atmel

Broadcom Cypress Ember

DustNetworks Energy Freescale

Fujitso Nuvoton NXP

Renesas Samsung ST

Toshiba TexasInstruments TriadSemiconductor

Table1-3:ARMVendors

ReviewQuestions

1.Trueorfalse.TheARMCPUinstructionsareuniversalregardlessofwhomakesthechip.

2.Trueorfalse.TheperipheralsofARMmicrocontrollerarestandardizedregardlessof

Page 27: ARM Assembly Language: Programming and Architecture

whomakesthechip.

3.AnARMmicrocontrollernormallyhaswhichofthefollowingdeviceson-chip?

(a)RAM(b)Timer(c)I/O(d)alloftheabove

4.Forwhichofthefollowings,ARMhasdefinedstandard?

(a)RAMsize(b)ROMsize(c)instructionset(d)alloftheabove

SeethefollowingwebsitesforARMmicrocontrollersandARMtrainers:

http://www.ARM.com

http://www.MicroDigitalEd.com

Page 28: ARM Assembly Language: Programming and Architecture

ProblemsSection1.1:IntroductiontoMicrocontrollers

1.TrueorFalse.Ageneral-purposemicroprocessorhason-chipROM.

2.TrueorFalse.Generally,amicrocontrollerhason-chipROM.

3.TrueorFalse.Amicrocontrollerhason-chipI/Oports.

4.TrueorFalse.AmicrocontrollerhasafixedamountofRAMonthechip.

5.Whatcomponentsareusuallyputtogetherwiththemicrocontrollerontoasinglechip?

6.Intel’sPentiumchipsusedinWindowsPCsneedexternal______and_____chipstostoredataandcode.

7.ListthreeembeddedproductsattachedtoaPC.

8.Whywouldsomeonewanttouseanx86asanembeddedprocessor?

9.Givethenameandthemanufacturerofsomeofthemostwidelyused8-bitmicrocontrollers.

10.InQuestion9,whichonehasthemostmanufacturesources?

11.Inabattery-basedembeddedproduct,whatisthemostimportantfactorinchoosingamicrocontroller?

12.Inanembeddedcontrollerwithon-chipROM,whydoesthesizeoftheROMmatter?

13.Inchoosingamicrocontroller,howimportantisittohavemultiplesourcesforthatchip?

14.Whatdoestheterm“third-partysupport”mean?

Section1.2:TheARMFamilyHistory

15.WhatdoesARMstandfor?

16.Trueorfalse.InARM,architectureshavethesamenamesasfamilies.

17.Trueorfalse.In1990s,ARMwaswidelyusedinmicroprocessorworld.

18.Trueorfalse.ARMiswidelyusedinAppleproducts,likeiPhoneandiPod.

19.Trueorfalse.CurrentlytheMicrosoftWindowsdoesnotsupportARMproducts.

20.Trueorfalse.AllARMchipshavestandardinstructions.

21.Trueorfalse.AllARMchipshavestandardperipherals

22.Trueorfalse.TheARMcorp.alsomanufacturestheARMchip.

23.Trueorfalse.TheARMIPmustbelicensedfromARMcorp.

24.Trueorfalse.AgivenserialcommunicationprogramiswrittenforTIARMchip.It

Page 29: ARM Assembly Language: Programming and Architecture

shouldworkwithoutanymodificationonFreescaleARMchip

25.Trueorfalse.AgivenAssemblylanguageprogramiswrittenforagivenfamilyofARMCortexchip.AnyotherCortexARMchipcanexecutetheprogram.

26.Trueorfalse.Atthepresenttime,ARMhasjustonemanufacturer.

27.WhatisthedifferencebetweentheARMproductsofdifferentmanufacturers?

28.Namesome32-bitmicrocontrollers.

29.WhatisIntel’schallengeindecreasingthepowerconsumptionofthex86?

Page 30: ARM Assembly Language: Programming and Architecture

AnswerstoReviewQuestionsSection1.1

1.True

2.Amicrocontroller-basedsystem

3.d

4.d

5.Itisdedicatedbecauseitdoesonlyonetypeofjob.

6.Embeddedsystemmeansthattheapplication(software)andtheprocessor(hardwaresuchasCPUandmemory)areembeddedtogetherintoasinglesystem.

7.Havingmultiplesourcesforagivenpartmeansyouarenothostagetoonesupplier.Moreimportantly,competitionamongsuppliersbringsaboutlowercostforthatproduct.

Section1.2

1.True

2.False

3.d

4.c

Page 31: ARM Assembly Language: Programming and Architecture
Page 32: ARM Assembly Language: Programming and Architecture
Page 33: ARM Assembly Language: Programming and Architecture

Chapter2:ARMArchitectureandAssemblyLanguageProgramming

CPUsuseregisterstostoredatatemporarily.ToprograminAssemblylanguage,wemustunderstand the registersandarchitectureofagivenCPUand the role theyplay inprocessing data. In Section 2.1we look at the general purpose registers (GPRs) of theARM.WedemonstratetheuseofGPRswithsimpleinstructionssuchasMOVandADD.Memory map and memory access of the ARM are discussed in Sections 2.2 and 2.3,respectively. In Section 2.4 we discuss the status register’s flag bits and how they areaffectedbyarithmeticinstructions.InSection2.5welookatsomewidelyusedAssemblylanguagedirectives,pseudo-code,anddata types related to theARM.InSection2.6weexamineAssembly languageandmachine languageprogramminganddefine termssuchas mnemonics, opcode, operand, and so on. The process of assembling and creating aready-to-runprogramfortheARMisdiscussedinSection2.7.Step-by-stepexecutionofan ARM program and the role of the program counter are examined in Section 2.8.Section2.9examinessomeARMaddressingmodes.ThemeritsofRISCarchitectureareexaminedinSection2.10.Section2.11discussestheKeilIDE.

Page 34: ARM Assembly Language: Programming and Architecture
Page 35: ARM Assembly Language: Programming and Architecture

Section2.1:TheGeneralPurposeRegistersintheARMCPUsuseregisterstostoredatatemporarily.ToprograminAssemblylanguage,we

mustunderstand the registersandarchitectureofagivenCPUand the role theyplay inprocessing data. In this sectionwe look at the general purpose registers (GPRs) of theARMandwedemonstrate the use ofGPRswith simple instructions such asMOVandADD.

ARMmicrocontrollershavemany registers for arithmetic and logicoperations. IntheCPU,registersareusedtostoreinformationtemporarily.Thatinformationcouldbeabyteofdatatobeprocessed,oranaddresspointingtothedatatobefetched.AllofARMregistersare32-bitwide.The32bitsofaregisterareshowninFigure2-1.TheserangefromtheMSB(most-significantbit)D31totheLSB(least-significantbit)D0.Witha32-bitdata type,anydata larger than32bitsmustbebrokeninto32-bitchunksbefore it isprocessed.AlthoughtheARMdefaultdatasizeis32-bitmanyassemblersalsosupportthesinglebit,8-bit,and16-bitdata types,aswewillseeinfuturechapters.The32-bitdatasizeoftheARMisoftenreferredasword.Thisisincontrasttox86CPUinwhichwordisdefined as 16-bit. InARM the 16-bit data is referred to as half-word. ThereforeARMsupportsbyte,half-word(twobyte),andword(fourbytes)datatypes.

Figure2-1:ARMRegistersDataSize

InARMthereare13generalpurposeregisters.TheyareR0–R12.SeeFigure2-2.Alloftheseregistersare32bitswide.

Figure2-2:ARMRegisters

The general purpose registers in ARM are the same as the accumulator in other

Page 36: ARM Assembly Language: Programming and Architecture

microprocessors.Theycanbeusedbyallarithmeticandlogicinstructions.Tounderstandthe use of the general purpose registers,wewill show it in the context of three simpleinstructions:MOV,ADD,andSUB.TheARMcorehasthreespecialfunctionregistersofR13, R14, and R15. We will examine their use in the next section. In some ARMprocessors,wealsohaveshadowregistersinvariousoperatingmodesdesignedtospeeduptheprogramexecutionwhenCPUswitchestask.

ARMInstructionFormat

TheARMCPUusesthetri-partinstructionformatformostinstructions.Oneofthemostcommonformatis:

instructiondestination,source1,source2

Depending on the instruction the source2 can be a register, immediate (constant)value,ormemory.Thedestinationisoftenaregisterorread/writememory.

MOVinstruction

Simply stated, the MOV instruction copies data into register or from register toregister.Ithasthefollowingformats:

MOVRn,Op2;loadRnregisterwithOp2(Operand2).

;Op2canbeimmediate

Op2canbeanimmediate(constant)number#Kwhichisan8-bitvaluethatcanbe0–255indecimal,(00–FFinhex).Op2canalsobearegisterRm.RnorRmareanyoftheregistersR0toR15.Ifweseetheword“immediate”,wearedealingwithaconstantvaluethat must be provided right there with the instruction. Notice the # before immediatevalues.ThefollowinginstructionloadstheR2registerwithavalueof0x25(25inhex).

MOVR2,#0x25;loadR2with0x25(R2=0x25)

ThefollowinginstructionloadstheR1registerwiththevalue0x87(87inhex).

MOVR1,#0x87;copy0x87intoR1(R1=0x87)

ThefollowinginstructionloadsR5withthevalueofR7.

MOVR5,R7;copycontentsofR7intoR5(R5=R7)

Notice the position of the source and destination operands. As you can see, theMOVloadstherightoperandintotheleftoperand.Inotherwords,thedestinationcomesfirst.

Towrite a comment inAssembly languagewe use ‘;’. It is the same as ‘//’ inClanguage,whichcauses theremainderof the lineofcodetobe ignored.For instance, intheaboveexamplestheexpressionsmentionedafter‘;’justexplainthefunctionalityoftheinstructionstoyou,anddonothaveanyeffectsontheexecutionoftheinstructions.

When programming the registers of theARMmicrocontrollerwith an immediatevalue,thefollowingpointsshouldbenoted:

Page 37: ARM Assembly Language: Programming and Architecture

1.Weput#infrontofeveryimmediatevalue.

2.Ifwewanttopresentanumberinhex,weputa0xinfrontofit.Ifweputnothinginfrontofanumber,itisindecimal.Forexample,in“MOVR1,#50”,R1isloadedwith50indecimal,whereasin“MOVR1,#0x50”,R1isloadedwith50inhex(80indecimal).

3.Ifvalues0toFFaremovedintoa32-bitregister,therestofthebitsareassumedtobeallzeros.Forexample, in“MOVR1,#0x5” the resultwillbeR1=0x00000005;thatis,R1=00000000000000000000000000000101inbinary.

4.Movinganimmediatevaluelargerthan255(FFinhex)intotheregisterwillcauseanerror.

Note!

Wecannotloadvalueslargerthan0xFF(255)intoregistersR0toR12usingtheMOVinstruction.Forexample,thefollowinginstructionisnotvalid:

MOVR5,#0x999999;invalidinstruction

ThereasonisthefactthatalthoughtheARMinstructionis32-bitwide,only8bitsofMOVinstructioncanbeusedasanimmediatevaluewhichcantakevaluesnotlargerthan0xFF(255).

ADDinstruction

TheADDinstructionhasthefollowingformat:

ADDRd,Rn,Op2;ADDRntoOp2andstoretheresultinRd

;Op2canbeImmediatevalue#K(Kisbetween0and255)

;orRegisterRm

TheADDinstructiontellstheCPUtoaddthevalueofRntoOp2andputtheresultinto the Rd (destination) register. Aswementioned before, Op2 can be an immediatevalue#Kbetween0–255indecimal(00–FFinhex)oraregisterRm.Toaddtwonumberssuchas0x25and0x34,onecandoanyofthefollowing:

MOVR1,#0x25;copy0x25intoR1(R1=0x25)

MOVR7,#0x34;copy0x34intoR1(R7=0x34)

ADDR5,R1,R7;addvalueR7toR1andputitinR5

;(R5=R1+R7)

or

MOVR1,#0x25;load(copy)0x25intoR1(R1=0x25)

ADDR5,R1,#0x34;add0x34toR1andputitinR5

;(R5=R1+0x34)

Page 38: ARM Assembly Language: Programming and Architecture

ExecutingtheabovelinesresultsinR5=0x59(0x25+0x34=0x59)

SUBinstruction

TheSUBinstructionislikeADDinstructionformat.ItsubtractsOp2fromRnandputtheresultinRd(destination)

SUBRd,Rn,Op2;Rd=Rn–Op2

Tosubtracttwonumberssuchas0x34and0x25,onecandothefollowing:

MOVR1,#0x34;load(copy)0x34intoR1(R1=0x34)

SUBR5,R1,#0x25;R5=R1–0x25(R1=0x34–0x25)

Theoldformat

NoticethatinmostofinstructionslikeADDandSUB,RncanbeomittedifRdandRnarethesame.ThisformatisnolongerrecommendedbyUnifiedAssemblerLanguage.

Forexample,eachpairofthefollowinginstructionsarethesame.

SUBR1,R1,#0x25;R1=R1-0x25

SUBR1,#0x25;R1=R1-0x25

SUBR1,R1,R2;R1=R1-R2

SUBR1,R2;R1=R1-R2

ADDR1,R1,#0x25;R1=R1+0x25

ADDR1,#0x25;R1=R1+0x25

ADDR1,R1,R2;R1=R1+R2

ADDR1,R2;R1=R1+R2

Figure2-3showsthegeneralpurposeregisters(GPRs)andtheALUinARM.TheeffectofarithmeticandlogicoperationsonthestatusregisterwillbediscussedinSection2.4.InTable2-1youseesomeoftheARM ALUinstructions.

Page 39: ARM Assembly Language: Programming and Architecture

Figure2-3:ARMRegistersandALU

Instruction Description

ADDRd,Rn,Op2* ADDRntoOp2andplacetheresultinRd

ADCRd,Rn,Op2 ADDRntoOp2withCarryandplacetheresultinRd

ANDRd,Rn,Op2 ANDRnwithOp2andplacetheresultinRd

BICRd,Rn,Op2 ANDRnwithNOTofOp2andplacetheresultinRd

CMPRn,Op2 CompareRnwithOp2andsetthestatusbitsofCPSR**

CMNRn,Op2 CompareRnwithnegativeofOp2andsetthestatusbits

EORRd,Rn,Op2 ExclusiveORRnwithOp2andplacetheresultinRd

MVNRd,Op2 PlaceNOTofOp2inRd

Page 40: ARM Assembly Language: Programming and Architecture

MOVRd,Op2 MOVE(Copy)Op2toRd

ORRRd,Rn,Op2 ORRnwithOp2andplacetheresultinRd

RSBRd,Rn,Op2 SubtractRnfromOp2andplacetheresultinRd

RSCRd,Rn,Op2 SubtractRnfromOp2withcarryandplacetheresultinRd

SBCRd,Rn,Op2 SubtractOp2fromRnwithcarryandplacetheresultinRd

SUBRd,Rn,Op2 SubtractOp2fromRnandplacetheresultinRd

TEQRn,Op2 Exclusive-ORRnwithOp2andsetthestatusbitsofCPSR

TSTRn,Op2 ANDRnwithOp2andsetthestatusbitsofCPSR

*Op2canbeanimmediate8-bitvalue#Kwhichcanbe0–255indecimal,(00–FFinhex).Op2canalsobearegisterRm.Rd,RnandRmareanyofthegeneralpurposeregisters

**CPSRisdiscussedlaterinthischapter

***Theinstructionsarediscussedindetailinthenextchapters

Table2-1:ALUInstructionsUsingGPRs

ReviewQuestions

1.Writeinstructionstomovethevalue0x34intotheR2register.

2.Writeinstructionstoaddthevalues0x16and0xCD.PlacetheresultintheR1register.

3.Trueorfalse.NovaluecanbemoveddirectlyintotheGPRs.

4.Whatisthelargesthexvaluethatcanbemovedintoa32-bitregisterusingMOVinstruction?Whatisthedecimalequivalentofthathexvalue?

5.AlloftheregistersintheARMare_____-bit.

Page 41: ARM Assembly Language: Programming and Architecture
Page 42: ARM Assembly Language: Programming and Architecture

Section2.2:TheARMMemoryMapInthissectionwediscussthememorymapforARMfamilymembers.

TheSpecialFunctionRegistersinARM

InARM theR13,R14,R15, andCPSR (current program status register) registersare called SFRs (special function registers) since each one is dedicated to a specificfunction.Agivenspecialfunctionregisterisdedicatedtospecificfunctionsuchasstatusregister,programcounter,stackpointer,andsoon.ThefunctionofeachSFRisfixedbytheCPUdesigneratthetimeofdesignbecauseitisusedforcontrolofthemicrocontrollerorkeepingtrackofspecificCPUstatus.ThefourSFRsofR13,R14,R15,andCPSRplayanextremelyimportantrole inARM.TheR13issetasideforstackpointer.TheR14isdesignatedaslinkregisterwhichholdsthereturnaddresswhentheCPUcallsasubroutineandtheR15istheprogramcounter(PC).ThePC(programcounter)pointstotheaddressof thenext instructiontobeexecutedaswewillseeinnextsection.TheCPSR(currentprogramstatusregister)isusedforkeepingconditionflagsamongotherthings,aswewillsee in Section 2.4. In contrast to SFRs, the GPRs (R0-R12) do not have any specificfunction and are used for storing general data. For this reason some ARM instructionformats(theThumb)haveonlyR0-7buteveryvariationofARMchiphasR13-R15SFRs.The Thumb instruction format is designed to compete with the 8- and 16-bitmicrocontrollersandincreasecodedensity.

ProgramCounterintheARM

OneofthemostimportantregisterintheARMmicrocontrolleristhePC(programcounter).Aswementionedearlier,theR15istheprogramcounter.TheprogramcounterisusedbytheCPUtopointtotheaddressofthenextinstructiontobeexecuted.AstheCPUfetches theopcodefromtheprogrammemory, theprogramcounter is incrementedautomatically to point to the next instruction.Thewider the programcounter, themorememorylocationsaCPUcanaccess.Thatmeansthata32-bitprogramcountercanaccessamaximumof4G(232=4G)bytesofprogrammemorylocations.

InARMmicrocontrollerseachmemorylocationisabytewide.Inthecaseofa32-bit program counter, the code space is 4G bytes (232 = 4G), which occupies the0x00000000–0xFFFFFFFFaddressrange.TheprogramcounterintheARMfamilyis32bits wide. This means that the ARM family can access addresses 0x00000000 to0xFFFFFFFF,atotalof4Gbytesoflocations(memoryspaces).Althoughthis4Gbytesofmemory spacecanbeallocated toon-chiporoff-chipmemory;however, at the timeofthiswriting,noneofthemembersoftheARMmicrocontrollerfamilyhavetheentire4Gbytesofon-chipmemorypopulated.SeeTable2-2.

Company Device Flash(KBytes) RAM(KBytes) I/OPins

Atmel AT91SAM7X512 512 128 62

NXP LPC2367 512 58 70

Page 43: ARM Assembly Language: Programming and Architecture

ST STR750FV2 256 16 72

TI TMS470R1A256 256 12 49

Freescale MK10DX256VML7 256 64 74

Table2-2:On-chipMemorySizeforsomeARMChips

MemoryspaceallocationintheARM

TheARMhas4Gbytesofdirectlyaccessiblememoryspace.Thismemoryspacehasaddresses0to0xFFFFFFFF.The4Gbytesofmemoryspacecanbedividedintofivesections.Theyareasfollows:

1.On-chipperipheralandI/Oregisters:ThisareaisdedicatedtogeneralpurposeI/O(GPIO)andspecialfunctionregisters(SFRs)ofperipheralssuchastimers,serialcommunication,ADC,andsoon.Inotherwords,ARMusesmemory-mappedI/O.See Chapter 0 for discussion of memory-mapped I/O. The function and addresslocationofeachSFRisfixedbythechipvendoratthetimeofdesignbecauseitisused forport registersofperipherals.Thenumberof locations set aside forGPIOregistersandSFRsdependsonthepinnumbersandperipheralfunctionssupportedbythatchip.Thatnumbercanvaryfromchiptochipevenamongmembersofthesamefamily fromthesamevendor.Due to the fact thatARMdoesnotdefine thetype and number of I/O peripherals one must not expect to have same addresslocationsfortheperipheralregistersamongvariousvendors.

2. On-chipdataSRAM:ARAMspace ranging froma fewkilobytes to severalhundredkilobytesissetasidemainlyfordatastorage.ThedataRAMspaceisusedfordatavariablesandstackandisaccessedbythemicrocontrollerinstructions.Thedata RAM space is read/write memory used by the CPU for storage of datavariables, scratch pad, and stack. The ARM microcontrollers’ data SRAM sizeranges from 2K bytes to several thousand kilobytes depending on the chip. Evenwithinthesamefamily,thesizeofthedataSRAMspacevariesfromchiptochip.AlargerdataSRAMsizemeansmoredifficultiesinmanagingtheseRAMlocationsifyou use Assembly language programming. In today’s high-performancemicrocontroller, however, with over a thousand bytes of data RAM, the job ofmanagingthemishandledbytheCcompilers.Indeed,theCcompilersaretheveryreasonweneedalargedataRAMsinceitmakesiteasierforCcompilerstostoreparametersandallowsthemtoperformtheirjobsmuchfaster.TheamountandthelocationoftheSRAMspacevaryfromchiptochipintheARMchips.AsectionofthedataRAMspaceisusedbystackaswewillseeinChapter6.AlthoughmanyoftheARMmicrocontrollersusedinembeddedproductstheSRAMspaceisusedfordata;onecanalsobuyordesignanARM-basedsysteminwhichtheRAMspaceisusedforbothdataandprogramcodes.Thisiswhatweseeinthex86PCs.Insuchsystemsnormallyoneconnects theARM CPUtoexternalDRAMand theDRAMmemoryisusedforbothcodeanddata.MicrosoftWindows8usessuchasystemforARM-basedTabletcomputers.

Page 44: ARM Assembly Language: Programming and Architecture

3.On-chipEEPROM:Ablockofmemoryfrom1Kbytestoseveralthousandbytesis set aside forEEPROMmemory.Theamount and the locationof theEEPROMspace vary from chip to chip in the ARM microcontrollers. Although in someapplicationstheEEPROMisusedforprogramcodestorage,itisusedmostoftenforsavingcriticaldata.NotallARMchipshaveon-chipEEPROM.

4.On-chipFlashROM:Ablockofmemoryfromafewkilobytestoseveralhundredkilobytesissetasideforprogramspace.Theprogramspaceisusedfortheprogramcode.Intoday’sARM microcontrollerchips,thecodeROMspaceisofFlashtypememory.The amount and the location of the codeROMspace vary fromchip tochip in the ARM products. See Table 2-2 and Examples 2-1 and 2-2. The FlashmemoryofcodeROMisunderthecontrolofthePC(programcounter).ThecodeROMmemorycanalsobeusedforstorageofstaticfixeddatasuchasASCIIdatastringsandlook-uptables.

Example2-1

AgivenARMchiphasthefollowingaddressassignments.Calculatethespaceandtheamountofmemorygiventoeachsection.

(a)Addressrangeof0x00100000–0x00100FFFforEEPROM

(b)Addressrangeof0x40000000–0x40007FFFforSRAM

(c)Addressrangeof0x00000000–0x0007FFFFforFlash

(d)Addressrangeof0xFFFC0000–0xFFFFFFFFforperipherals

Solution:

(a)Withaddressspaceof0x00100000to00100FFF,wehave00100FFF–00100000=0FFFbytes.Converting0FFFtodecimal,weget4,095+1,whichisequalto4Kbytes.

(b)Withaddressspaceof0x40000000to0x40007FFF,wehave40007FFF–40000000=7FFFbytes.Converting7FFFtodecimal,weget32,767+1,whichisequalto32Kbytes.

(c)Withaddressspaceof0000to7FFFF,wehave7FFFF–0=7FFFFbytes.Converting7FFFFtodecimal,weget524,287+1,whichisequalto512Kbytes.

(d)WithaddressspaceofFFFC0000toFFFFFFFF,wehaveFFFFFFFF–FFFC0000=3FFFFbytes.Converting3FFFFtodecimal,weget262,143+1,whichisequalto256Kbytes.

SeeFigure2-4.

Example2-2

Page 45: ARM Assembly Language: Programming and Architecture

FindtheaddressspacerangeofeachofthefollowingmemoryofanARMchip:

(a)2KBofEEPROMstartingataddress0x80000000

(b)16KBofSRAMstartingataddress0x90000000

(c)64KBofFlashROMstartingataddress0xF0000000

Solution:

(a)With2Kbytesofon-chipEEPROMmemory,wehave2048bytes(2×1024=2048).Thismapstoaddresslocationsof0x80000000to0x800007FF.Noticethat0isalwaysthefirstlocation.

(b)With16Kbytesofon-chipSRAM memory,wehave16,384bytes(16×1024=16,384),and16,384locationsgives0x90000000–0x90003FFF.

(c)With64Kwehave65,536bytes(64×1024=65,536),therefore,thememoryspaceis0xF0000000to0xF000FFFF.

Figure2-4:AnExampleofARMMemoryAllocation

5. Off-chipDRAM space: A DRAMmemory ranging from fewmegabytes toseveralhundredmegabytescanbe implemented for externalmemoryconnection.Althoughmany of theARMmicrocontrollers used in embedded products use theon-chipSRAMfordata, one can alsodesign anARM-based system inwhich theRAMisusedforbothdataandprogramcodes.Thisiswhatweseeinthex86PCs.In such systems one connects theARM CPU to externalDRAM and theDRAMmemoryisusedforbothcodeanddata,justlikethex86PCs.ManyARMvendors

Page 46: ARM Assembly Language: Programming and Architecture

are pushing theARM11 chip for the high-end of themarket such as servers anddatabasecomputers. In anARM11-based server computers, theexternal (off-chip)DRAM is used and managed by the operating system while on-chip Flash,EEPROM, and SRAMmemories are used for BIOS (basic input output system),POST (power on self test), and CPU scratch pad, respectively. In such cases thesystem is not different from x86 computers currently in use, except it usesARMCPU instead of a Pentium chip from Intel or x86 from AMD. The MicrosoftWindows8usesARMmotherboardwithoff-chipDRAM

Notice thefollowingdifferencesamong theon-chipFlashROM,dataSRAM,andEEPROMmemoriesinARM microcontrollersusedforembeddedproducts:

a)The data SRAM is used by theCPU for data variables and stack,whereas theEEPROMsareconsideredtobememorythatonecanalsoaddexternallytothechip.Inotherwords, whilemanyARMmicrocontroller chips have no EEPROMmemory, it isveryunlikelyforanARMmicrocontrollertohavenoon-chipdataSRAM.

b)Theon-chipFlashROMisusedforprogramcode,while theEEPROMisusedmostoftenforcriticalsystemdatathatmustnotbelostifpoweriscutoff.RememberthatdataSRAMisvolatilememoryanditscontentsarelostifthepowertothechipiscutoff.SincevolatiledataSRAMisused fordynamicvariables (constantlychangingdata)andstack.WeneedEEPROMmemorytosecurecriticalsystemdatathatdoesnotchangeveryoftenandwillnotbelostintheeventofpowerfailure.

c)Theon-chipFlashROMisprogrammedanderasedinblocksize.Theblocksizeis8,16,32,or64bytesormoredependingonthechiptechnology.That isnot thecasewith EEPROM, since the EEPROM is byte programmable and erasable. Both theEEPROMandFlashmemorieshave limitednumberoferase/writecycles,whichcanbe100,000 or more. While all semiconductor memories have unlimited number of reads(accesses),thenumberoftimesthatwecaneraseandwritetotheFlashandEEPROMarelimitedtoaround100,000timesatthetimeofthiswriting.Asthisnumberincreases,thelikelihoodthatwewillhaveharddrivemadeoutofFlashmemorywillalsoincrease.WehavealreadyseenhowtheFlashstickmemoryhasreplacedthefloppydriveswhichweresocommonintoearly2000s.

MemorymappedI/OintheARM

TraditionalCPUssuchasx86had twodistinct spaces: the I/Ospaceandmemoryspace.Inx86,whilealloftheI/OportsareaccessedusingINandOUTinstructions,thememoryaddressspaceisaccessedusingtheMOVinstruction.IntheARMCPUwehaveonlyonespaceanditismemoryspaceanditcanbeashighas4Gigabytes.TheARMusesthis4GigabytesforbothmemoryandI/Ospace.ThismappingoftheI/OportstomemoryspaceiscalledmemorymappedI/OandwasdiscussedinChapter0.

ReviewQuestions

1.Trueorfalse.TheGPRregistersareusedforstoringdata.

2.TheR0-R12registersarecalled______.

Page 47: ARM Assembly Language: Programming and Architecture

3.TheGPRregistersinARMare_____-bit.

4.TheR13-R15registersarecalled_____.

5.TheSFRregistersinARMare______-bit.

Page 48: ARM Assembly Language: Programming and Architecture
Page 49: ARM Assembly Language: Programming and Architecture

Section2.3:LoadandStoreInstructionsinARMTheinstructionswehaveusedsofarworkedwiththeimmediate(constant)valueof

KandtheGPRs.TheyalsousedtheGPRsastheirdestination.Wesawsimpleexamplesof using MOV and ADD earlier in Section 2.1. The ARM allows direct access to alllocations in the memory. In this section we show the instructions accessing variouslocations of the memory. This is one of the most important sections in the book formastering the topic of ARM Assembly language programming. Before we embark onstudying the load and store instructions of theARM,wemust note the fact that all theinstructions of theARM are 32-bit wide. In other words, every instruction of ARM isfixedat32-bit.Aswewillseeinlatersection,thefixedsizeinstructionisoneofthemostimportantcharacteristicsofRISCarchitecture.Incaseswherethereisnoneedforallthe32-bit,theARMhasaddedzerotomaketheinstructionfixedat32-bit.

LDRRd,[Rx]instruction

LDRRd,[Rx];loadRdwiththecontentsoflocationpointed

;tobyRxregister.Rxisanaddressbetween

;0x00000000to0xFFFFFFFF

TheLDRinstruction tells theCPUto load(bring in)oneword(32-bitor4bytes)fromabaseaddresspointedtobyRxintotheGPR.Afterthisinstructionisexecuted,theRdwillhavethesamevalueasfourconsecutivelocationsinthememory.Obviouslysinceeachmemorylocationcanholdonlyonebyte(ARMisabyteaddressableCPU),andourGPR is 32-bit, the LDR will bring in 4 bytes of data from 4 consecutive memorylocations.ThelocationscanbeintheSRAMoraFlashmemory.Forexample,the“LDRR2,[R5]” instructionwill copy the contents ofmemory locations pointed to byR5 intoregisterR2.SincetheR2registeris32-bitwide,itexpectsa32-bitoperandintherangeof0x00000000 to 0xFFFFFFFF.Thatmeans theR5 register gives the base address of thememory inwhich it holds the data. Therefore if R5=0x80000, theCPUwill fetch intoregisterR2,thecontentsofmemorylocations0x80000,0x80001,0x80002,and0x80003.

Thefollowing instruction loadsR7with thecontentsof location0x40000200.SeeFigure2-5.

;assumeR5=0x40000200

LDRR7,[R5];loadR7withthecontentsoflocations

;0x40000200-0x40000203

Page 50: ARM Assembly Language: Programming and Architecture

Figure2-5:ExecutingtheLDRInstruction

STRRx,[Rd]instruction

STRRx,[Rd];storeregisterRxintolocationspointedtobyRd

TheSTRinstructiontellstheCPUtostore(copy)thecontentsoftheGPRtoabaseaddress location pointed to by the Rd register. Notice that the source register of STRinstruction isplacedbefore thedestination register.ObviouslysinceGPRis32-bitwide(4-byte)weneed four consecutivememory locations to store the contents ofGPR.ThememorylocationsmustbewritablesuchasSRAM.SeeFigure2-6.The“STRR3,[R6]”instruction will copy the contents of R3 into locations pointed to by R6. Locations0x40000200 through 0x40000203 of the SRAMmemory will have the contents of R3sinceR6=0x40000200.

Figure2-6:ExecutingtheSTRInstruction

ThefollowinginstructionstoresthecontentsofR5intolocationspointedtobyR1.Assume0x40000340isanaddressofinternalRAMlocationsandheldbyregisterR1.

;assumeR1=0x40000340

STRR5,[R1];storeR5intolocationspointedtobyR1.

LDRBRd,[Rx]instruction

LDRBRd,[Rx];loadRdwiththecontentsofthelocation

;pointedtobyRxregister.

The LDRB instruction tells the CPU to load (copy) one byte from an addresspointed tobyRxinto the lowerbyteofRd.After this instruction isexecuted, the lower

Page 51: ARM Assembly Language: Programming and Architecture

byte ofRdwill have the same value asmemory location pointed to byRx. Itmust benoted that theunusedportion (theupper24bits)of theRd registerwillbeall zeros, asshowninFigure2-7.

Figure2-7

LDRvs.LDRB

Aswementioned earlier,we canuse theLDR instruction to copy the contentsoffourconsecutivememorylocationsintoa32-bitGPR.Therearesituationsthatwedonotneed to bring 4 bytes of data intoGPR.AnUART register is such a case. TheUARTregistersaregenerally8-bitand takeonlyonememoryspace location(memorymappedI/O).UsingLDRB,we canbring intoGPRa single byte of data fromUART registers.Thisisawidelyusedinstructionforaccessingthe8-bitI/Oandperipheralports.

STRBRx,[Rd]instruction

STRBRx,[Rd];storethebyteinregisterRxinto

;locationpointedtobyRd

The STRB instruction tells the CPU to store (copy) the byte value in Rx to anaddress location pointed to by the Rd register. After this instruction is executed, thememorylocationspointedtobytheRdwillhavethesamebyteasthelowerbyteoftheRx,asshowninFigure2-8.

Figure2-8

ThefollowingprogramfirstloadstheR1registerwithvalue0x55,thenstoresthisvalueintolocation0x40000100:

;assumeR5=0x40000100

Page 52: ARM Assembly Language: Programming and Architecture

MOVR1,#0x55;R1=55(inhex)

STRBR1,[R5];copyR1locationpointedtobyR5

GeneralpurposeI/O(GPIO)arepartofthespecialfunctionregistersinthememory-mapped I/O. They are connected to the I/O pins of the ARMmicrocontroller. See thedatasheetofyourARMmicrocontroller.WecanalsostorethecontentsofaGPRintoanylocationintheSRAMregionofthedataspace.SeeExamples2-3and2-4.

Example2-3

StatethecontentsofRAMlocations0x92to0x96afterthefollowingprogramisexecuted:

MOVR1,#0x99;R1=0x99

MOVR6,#0x92;R6=0x92

STRBR1,[R6];storeR1intolocationpointedtobyR6

;(location0x92)

ADDR6,R6,#1;R6=R6+1

MOVR1,#0x85;R1=0x85

STRBR1,[R6];storeR1intolocationpointedtobyR6

;(location0x93)

ADDR6,R6,#1;R6=R6+1

MOVR1,#0x3F;R1=0x3F

STRBR1,[R6];storeR1intolocationpointedtobyR6

ADDR6,R6,#1;R6=R6+1

MOVR1,#0x63;R1=0x63

STRBR1,[R6];storeR1intolocationpointedtobyR6

ADDR6,R6,#1;R6=R6+1

MOVR1,#0x12;R1=0x12

STRBR1,[R6]

Solution:

AftertheexecutionofSTRBR1,[R6]datamemorylocation0x92hasvalue0x99;

aftertheexecutionofSTRBR1,[R6]datamemorylocation0x93hasvalue0x85;

aftertheexecutionofSTRBR1,[R6]datamemorylocation0x94hasvalue0x3F;andsoon,asshowninthechart.

Address Data

Page 53: ARM Assembly Language: Programming and Architecture

0x92 0x99

0x93 0x85

0x94 0x3F

0x95 0x63

0x96 0x12

Example2-4

StatethecontentsofR2,R1,andmemorylocation0x20afterthefollowingprogram:

MOVR2,#0x5;loadR2with5(R2=0x05)

MOVR1,#0x2;loadR1with2(R1=0x02)

ADDR2,R1,R2;R2=R1+R2

ADDR2,R1,R2;R2=R1+R2

MOVR5,#0x20;R5=0x20

STRBR2,[R5];storeR2intolocationpointedtobyR5

Solution:

TheprogramloadsR2withvalue5.ThenitloadsR1withvalue2.ThenitaddstheR1registertoR2twice.Attheend,itstorestheresultinlocation0x20ofmemory.

AfterMOVR2,#0x05

Location Data

R2 5

R1

0x20

AfterMOVR1,#0x02

Location Data

R2 5

R1 2

Page 54: ARM Assembly Language: Programming and Architecture

0x20

AfterADDR2,R1,R1

Location Data

R2 7

R1 2

0x20

AfterADDR2,R1,R1

Location Data

R2 9

R1 2

0x20

AfterSTRB[R5],R2

Location Data

R2 9

R1 2

0x20 9

STRvs.STRB

Aswementionedearlier,wecanusetheSTRinstructiontocopythecontentsof32-bitGPR into four consecutivememory locations. The I/O ports are generally 8-bit andtakeonlyonememoryspacelocation(memorymappedI/O).UsingSTRB,wecansendabyteofdata fromGPR tomemory locationsuchasan I/Oport.Again, this isawidelyusedinstructionforaccessingthe8-bitI/Oandperipheralports.

LDRHRd,[Rx]instruction

Page 55: ARM Assembly Language: Programming and Architecture

LDRHRd,[Rx];loadRdwiththehalf-wordpointed

;tobyRxregister

TheLDRH instruction tells theCPU to load (copy) half-word (16-bit or 2 bytes)from a base address pointed to byRx into the lower 16-bits ofRdRegister.After thisinstructionisexecuted,thelower16-bitofRdwillhavethesamevalueastwoconsecutivelocations in the memory pointed to by base address of Rx. It must be noted that theunusedportion(theupper16bits)oftheRdregisterwillbeallzeros,asshowninFigure2-9.

Figure2-9

Table2-3comparesLDRB,LDRH,andLDR.

DataSize Bits Decimal Hexadecimal Loadinstructionused

Byte 8 0–255 0-0xFF LDRB

Half-word 16 0–65535 0-0xFFFF LDRH

Word 32 0–232-1 0-0xFFFFFFFF LDR

Table2-3:UnsignedDataRangeinARMandassociatedLoadInstructions

STRHRx,[Rd]instruction

STRHRx,[Rd];storehalf-word(2-byte)inregisterRx

;intolocationspointedtobyRd

TheSTRHinstructiontellstheCPUtostore(copy)thelower16-bitcontentsoftheRxtoanaddresslocationpointedtobytheRdregister.Afterthisinstructionisexecuted,thememorylocationspointedtobytheRdwillhavethesamevalueasthelower16-bitofRxRegister.Thelocationsarepartofthedataread/writememoryspacesuchason-chipSRAM.Forexample,the“STRHR3,[R6]”instructionwillcopythe16-bitlowercontentsofR3intotwoconsecutivelocationspointedtobybaseregisterR6.AsyoucanseeinFigure2-10,locations0x2000and0x2001oftheSRAMmemorywillhavethecontentsofthelowerhalfwordofR3sinceR6=0x2000.

Page 56: ARM Assembly Language: Programming and Architecture

Figure2-10

InTable2-4youseeacomparisonbetweenSTRB,STRH,andSTR.

DataSize Bits Decimal Hexadecimal Loadinstructionused

Byte 8 0–255 0-0xFF STRB

Half-word 16 0–65535 0-0xFFFF STRH

Word 32 0–232-1 0-0xFFFFFFFF STR

Table2-4:UnsignedDataRangeinARMandassociatedStoreInstructions

ReviewQuestions

1.Trueorfalse.No32-bitvaluecanbeloadeddirectlyintointernalR0-R12.

2.Writeinstructionstoloadvalue0x95intolocationwithaddress0x20.

3.WriteinstructionstomovethecontentsofR2tomemorylocationpointedtobyR8.

4.Writeinstructionstoloadvaluesfrommemorylocations0x20–0x23toR4register.

5.Whatisthelargesthexvaluethatcanbemovedintoasinglelocationinthedatamemory?Whatisthedecimalequivalentofthehexvalue?

6.“LDRR6,[R3]”putstheresultin_____.

7.Whatdoes“STRBR1,[R2]”do?

8.Whatisthelargesthexvaluethatcanbemovedintofourconsecutivelocationsinthedatamemory?Whatisthedecimalequivalentofthehexvalue?

Page 57: ARM Assembly Language: Programming and Architecture
Page 58: ARM Assembly Language: Programming and Architecture

Section2.4:ARMCPSR(CurrentProgramStatusRegister)Likeallothermicroprocessors, theARMhasa flag register to indicate arithmetic

conditions such as the carry bit. The flag register in the ARM is called the currentprogramstatusregister(CPSR).Inthissection,wediscussvariousbitsofthisregisterandprovidesomeexamplesofhowitisaltered.Chapters3and4showhowtheflagbitsofthestatusregisterareused.

ARMcurrentprogramstatusregister

The status register is a 32-bit register. See Figure 2-11 for the bits of the statusregister.ThebitsC,Z,N,andVarecalledconditionalflags,meaningthat theyindicatesomeconditionsthatresultafteraninstructionisexecuted.Eachoftheconditionalflagscanbeusedtoperformaconditionalbranch(jump),aswewillseeinChapter4.

Figure2-11:CPSR(CurrentProgramStatusRegister)

The following is a brief explanationof the flagbits of the current program statusregister(CPSR).Theimpactofinstructionsonthisregisteristhendiscussed.

C,thecarryflag

This flag is set whenever there is a carry out from the D31 bit. This flag bit isaffectedaftera32-bitadditionorsubtraction.Chapter4showshowthecarryflagisused.

Z,thezeroflag

Thezeroflagreflects theresultofanarithmeticor logicoperation. If theresult iszero,thenZ=1.Therefore,Z=0iftheresultisnotzero.SeeChapter4toseehowweusetheZflagforlooping.

N,thenegativeflag

BinaryrepresentationofsignednumbersusesD31asthesignbit.Thenegativeflagreflectstheresultofanarithmeticoperation.IftheD31bitoftheresultiszero,thenN=0andtheresultispositive.IftheD31bitisone,thenN=1andtheresultisnegative.Thenegative and V flag bits are used for the signed number arithmetic operations and arediscussedinChapter5.

V,theoverflowflag

This flag is set whenever the result of a signed number operation is too large,causingthehigh-orderbittooverflowintothesignbit.Ingeneral,thecarryflagisusedtodetect errors inunsignedarithmeticoperationswhile theoverflow flag isused todetecterrorsinsignedarithmeticoperations.TheVandNflagbitsareusedforsignednumberarithmeticoperationsandarediscussedinChapter5.

TheTflagbitisusedtoindicatetheARMisinThumbstate.TheIandFflagsareusedtoenableordisabletheinterrupt.SeetheARMmanual.

Page 59: ARM Assembly Language: Programming and Architecture

Ssuffixandthestatusregister

MostofARMinstructionscanaffectthestatusbitsofCPSRaccordingtotheresult.Ifwe need an instruction to update the value of status bits inCPSR,we have to put Ssuffixattheendofinstructions.Thatmeans,forexample,ADDSinsteadofADDisused.

ADDinstructionandthestatusregister

NextweexaminetheimpactoftheSUBSandADDSinstructionsontheflagbitsCandZofthestatusregister.Someexamplesshouldclarifytheirmeanings.AlthoughalltheflagbitsC,Z,V,andNareaffectedbytheADDSandSUBSinstruction,wewillfocusonflagsCandZfornow.TheotherflagbitsarediscussedinChapter5,becausetheyrelateonlytosignednumberoperations.ExamineExample2-5toseetheimpactoftheADDSinstruction on selected flag bits. See also Example 2-6 to see the impact of the SUBSinstructiononselectedflagbits.

Example2-5

ShowthestatusoftheCandZflagsaftertheadditionof

a)0x0000009Cand0xFFFFFF64inthefollowinginstruction:

;assumeR1=0x0000009CandR2=0xFFFFFF64

ADDSR2,R1,R2;addR1toR2andplacetheresultinR2

b)0x0000009Cand0xFFFFFF69inthefollowinginstruction:

;assumeR1=0x0000009CandR2=0xFFFFFF69

ADDSR2,R1,R2;addR1toR2andplacetheresultinR2

Solution:

a)

0x0000009C 00000000000000000000000010011100

+0xFFFFFF64

+11111111111111111111111101100100

0x100000000 100000000000000000000000000000000

Page 60: ARM Assembly Language: Programming and Architecture

C=1becausethereisacarrybeyondtheD31bit.

Z=1becausetheR2(theresult)hasvalue0initaftertheaddition.

b)

0x0000009C 00000000000000000000000010011100

+0xFFFFFF69

+11111111111111111111111101101001

0x100000005 100000000000000000000000000000101

C=1becausethereisacarrybeyondtheD31bit.

Z=0becausetheR2(theresult)doesnothavevalue0initaftertheaddition.(R2=0x00000005)

Example2-6

ShowthestatusoftheZflagduringtheexecutionofthefollowingprogram:

MOVR2,#4;R2=4

MOVR3,#2;R3=2

MOVR4,#4;R4=4

SUBSR5,R2,R3;R5=R2-R3(R5=4-2=2)

SUBSR5,R2,R4;R5=R2-R4(R5=4-4=0)

Solution:

TheZflagisraisedwhentheresultiszero.Otherwise,itiscleared(zero).Thus:

After ValueofR5 Zflag

SUBSR5,R2,R3 2 0

SUBSR5,R2,R4 0 1

Notallinstructionsaffecttheflags

Page 61: ARM Assembly Language: Programming and Architecture

SomeinstructionsaffectallthefourflagbitsC,Z,V,andN(e.g.,ADDS).Butsomeinstructions affectno flagbits at all.Thebranch instructions are in this category.Someinstructionsaffectonlysomeof theflagbits.The logic instructions(e.g.,ANDS)are inthiscategory.

Table 2-5 shows the instructions and the flag bits affected by them.AppendixAprovidesacompletelistofalltheinstructionsandtheirassociatedflagbits.

Instruction FlagsAffected

ANDS C,Z,N

ORRS C,Z,N

MOVS C,Z,N

ADDS C,Z,N,V

SUBS C,Z,N,V

B Noflags

NotethatwecannotputSafterBinstruction.

Table2-5:FlagBitsAffectedbyDifferentInstructions

Flagbitsanddecisionmaking

Thereareinstructionsthatwillmakeaconditionaljump(branch)basedonthestatusof the flag bits. Table 2-6 provides some of these instructions. Chapter 4 discusses theconditionalbranchinstructionsandhowtheyareused.

Instruction FlagsAffected

BCS BranchifC=1

BCC BranchifC=0

BEQ BranchifZ=1

BNE BranchifZ=0

BMI BranchifN=1

BPL BranchifN=0

BVS BranchifV=1

BVC BranchifV=0

Table2-6:ARMBranch(Jump)InstructionsUsingFlagBits

ReviewQuestions

Page 62: ARM Assembly Language: Programming and Architecture

1.TheflagregisterintheARMiscalledthe________.

2.WhatisthesizeoftheflagregisterintheARM?

3.FindtheCandZflagbitsforthefollowingcode:

;assumeR2=0xFFFFFF9F

;assumeR1=0x00000061

ADDSR2,R1,R2

4.FindtheZflagbitforthefollowingcode:

;assumeR7=0x22

;assumeR3=0x22

ADDSR7,R3,R7

5.FindtheCandZflagbitsforthefollowingcode:

;assumeR2=0x67

;assumeR1=0x99

ADDSR2,R1,R2

Page 63: ARM Assembly Language: Programming and Architecture
Page 64: ARM Assembly Language: Programming and Architecture

Section2.5:ARMDataFormatandDirectivesInthissectionwelookatsomewidelyuseddataformatsanddirectivessupportedby

theARMassembler.

ARMdatatype

ARMhasfourdatatypes.Theyarebit,byte(8-bit),half-word(16-bit)andword(32bit).DuetothefactthatARMregistersare32-bititisthejobofprogrammer/compilertobreakdowndatalargerthan32bitstobeprocessedbytheCPU.ThedatatypesusedbytheARMcanbepositiveornegative.AdiscussionofsignednumbersisgiveninChapter5.

Dataformatrepresentation

There are several ways to represent a byte of data in the ARM assembler. Thenumberscanbeinhex,binary,decimal,orASCIIformats.Thefollowingareexamplesofhoweachworks.

Hexnumbers

TorepresentHexnumbersinanARMassemblerweput0x(or0X)infrontofthenumberlikethis:

MOVR1,#0x99

Hereareafewlinesofcodethatusethehexformat:

MOVR2,#0x75;R2=0x75

MOVR1,#0x11

ADDR2,R1,R2;R2=R1+R2=0x75+0x11=0x86

Decimalnumbers

ToindicatedecimalnumbersinsomeARMassemblerssuchasKeilwesimplyusethedecimal(e.g.,12)andnothingbeforeorafterit.Herearesomeexamplesofhowtouseit:

MOVR7,#12;R7=00001100or0Cinhex

MOVR1,#32;R1=32=0x20

Binarynumbers

To represent binary numbers in an ARM assembler we put 2_ in front of thenumber.Itisasfollows:

MOVR6,#2_10011001;R6=10011001inbinaryor99inhex

Numbersinanybasebetween2and9

To indicate a number in any base n between 2 and 9 in an ARM assembler wesimplyusethen_infrontofit.Herearesomeexamplesofhowtouseit:

Page 65: ARM Assembly Language: Programming and Architecture

MOVR7,#8_33;R7=33inbase8or011011inbinaryformat

MOVR6,#2_10011001;R6=10011001inbase2or99inhex

ASCIIcharacters

TorepresentASCIIdatainanARMassemblerweusesinglequotesasfollows:

LDRR3,#‘2’;R3=00110010or32inhex(SeeAppendixF)

This is the same as other assemblers such as the 8051 and x86. Here is anotherexample:

LDRR2,#‘9’;R2=0x39,whichishexnumberforASCII‘9’

Torepresentastring,doublequotesareused;andfordefiningASCIIstrings(morethanonecharacter),weusetheDCBdirective.

Assemblerdirectives

While instructions tell the CPU what to do, directives (also called pseudo-instructions) give directions to the assembler. For example, the MOV and ADDinstructionsarecommandstotheCPU,butEQU,END,andENTRYaredirectivestotheassembler.The followingsectionpresentssomewidelyuseddirectivesof theARMandhow they are used. The directives help us develop our program easier and make ourprogramlegible(morereadable).Table2-7showssomeassemblerdirectives.

Directive Description

AREA Instructstheassemblertoassembleanewcodeordatasection

END Informstheassemblerthatithasreachedtheendofasourcefile.

ENTRY Declaresanentrypointtoaprogram.

EQU Givesasymbolicnametoanumericconstant,aregister-relativevalueoraPC-relativevalue.

INCLUDE Itaddsthecontentsofafiletoourprogram.

Table2-7:SomeWidelyUsedARMDirective

AREA

TheAREA directive tells the assembler to define a new section ofmemory. Thememory can be code (instruction) or data and can have attributes such as ReadOnly,ReadWrite, and so on. This iswidely used to define one ormore blocks of indivisiblememoryforcodeordatatobeusedbythelinker.EveryAssemblylanguageprogramhasatleastoneAREA.Thefollowingistheformat:

AREAsectionname,attribute,attribute,…

ThefollowinglinedefinesanewareanamedMY_ASM_PROG1whichhasCODEandREADONLYattributes:

Page 66: ARM Assembly Language: Programming and Architecture

AREAMY_ASM_PROG1,CODE,READONLY

Among widely used attributes are CODE, DATA, READONLY, READWRITE,COMMON,andALIGN.Thefollowingdescribesthesewidelyusedattributes.

READWRITE isanattributegiven toanareaofmemorywhichcanbereadfromandwrittento.SinceitisREADWRITEsectionoftheprogramitisbydefaultforDATA.InARMAssemblylanguageweusethisareatosetasideSRAMmemoryforscratchpadandstack.TheAssemblerputstheREADWRITEsectionsnexttoeachotherintheSRAMmemory.

READONLY is an attribute given to an area ofmemorywhich can only be readfrom.SinceitisREADONLYsectionoftheprogramitisbydefaultforCODE.InARMAssemblylanguageweusethisareatowriteourinstructionsformachinecodeexecution.TheREADONLYsectionsareputnexttoeachotherintheflashmemory.

Note

InKeil,ThememoryspaceofREADONLYandREADWRITEaredefinedintheLinkerandTargettabsoftheProject\Options.Keilsetsthevaluesaccordingtothememorymapofthechosenchip.

CODE is an attribute given to an area of memory used for executable machineinstruction.SinceitisusedforcodesectionoftheprogramitisbydefaultREADONLYmemory. In ARM Assembly language we use this area to write our instructions. Thefollowinglinedefinesanewareaforwritingprograms:

AREAOUR_ASM_PROG,CODE,READONLY

DATA isanattributegiven toanareaofmemoryusedfordataandno instruction(machine instructions)canbeplaced in thisarea.Since it isusedfordatasectionof theprogram it is bydefault aREADWRITEmemory. InARMAssembly languageweusethisareatosetasideSRAMmemoryforscratchpadandstack.Thefollowinglinedefinesanewareafordefiningvariables:

AREAOUR_VARIABLES,DATA,READWRITE

Todefineconstantvaluesintheflashmemorywewritethefollowing:

AREAOUR_CONSTS,DATA,READONLY

COMMONisanattributegiventoanareaofDATAmemorysectionwhichcanbeusedcommonlybyseveralprogramcodes.WedonotinitializetheCOMMONsectionofthe memory since it is used by compiler exclusively. The compiler initializes theCOMMONmemoryareawithallzeros.

ALIGN is another attribute given to an area ofmemory to indicate howmemoryshouldbeallocatedaccordingtotheaddresses.WhentheALIGNisusedforCODEandREADONLYitalignedin4-bytesaddressboundarybydefaultsincetheARMinstructionsare all 32-bit (4-bytes) word. The ALIGN attribute of AREA has a number after likeALIGN=3whichindicatestheinformationshouldbeplacedinmemorywithaddressesof

Page 67: ARM Assembly Language: Programming and Architecture

23,thatis0x50000,0x50008,0x50010,0x50020,andsoon.Wewillseemoreaboutthatsoon.TheusageandimportanceofALIGNattributeisdiscussedinChapter6.

ENTRY

Another important pseudocode is the ENTRY directive. This indicates to theassemblerthebeginningoftheexecutablecode.TheENTRYdirectiveisthefirstlineoftheARMAssemblylanguagecodesectionoftheprogram,meaningthatanythingaftertheENTRY directive in the source code is considered actual machine instruction to beexecutedbytheCPU.ForagivenARMAssemblylanguageprogramwecanhaveonlyoneENTRYpoint.HavingmultipleENTRYdirectiveinanAssemblylanguageprogramwillgiveyouanerrorbyassembler.

ENDdirective

AnotherimportantpseudocodeistheENDdirective.Thisindicatestotheassemblertheendofthesource(asm)file.TheENDdirectiveisthelastlineoftheARMAssemblylanguageprogram,meaning that anything after theENDdirective in the source code isignored by the assembler. Program 2-1 shows how the AREA, ENTRY and ENDdirectivesareused.

Program2-1

;ARMAssemblyLanguageProgramToAddSomeDataandStoretheSUMinR3.

AREAPROG_2_1,CODE,READONLY

ENTRY

MOVR1,#0x25;R1=0x25

MOVR2,#0x34;R2=0x34

ADDR3,R2,R1;R3=R2+R1

HEREBHERE;stayhereforever

END

LDR

Inthelastsection,westatedthatonecannot loadvalueslarger than0xFFintothe32-bit registers ofARM since only 8 bits are used for the immediate value.TheARMassemblerprovideusapseudo-instructionof“LDRRd,=32-bit_immidiate_vlaue”toloadvaluegreaterthan0xFF.Wewillexaminehowthispseudo-instructionworksinChapter 6.Fornow,justnoticethe=signusedinthesyntax.Thefollowingpseudo-instructionloadsR7with0x112233.

LDRR7,=0x112233

Wewill use this pseudo-instruction to load 32-bit value into register extensivelythroughout the book. To load values less than 0xFF, we still use the “MOV Rd,#8-

Page 68: ARM Assembly Language: Programming and Architecture

bit_immidiate_value” instruction since it is a real instruction of ARM, therefore moreefficientincodesize.

EQU(equate)

Thisisusedtodefineaconstantvalueorafixedaddress.TheEQUdirectivedoesnotsetasidestorageforadata item,butassociatesaconstantnumberwithadataoranaddresslabelsothatwhenthelabelappearsintheprogram,itsconstantwillbesubstitutedfor the label.ThefollowingusesEQUfor thecounterconstant,and then theconstant isusedtoloadtheR2register:

COUNTEQU0x25

……….

MOVR2,#COUNT;R2=0x25

Whenexecutingtheaboveinstruction“MOVR2,#COUNT”,theregisterR2willbeloadedwiththevalue0x25.WhatistheadvantageofusingEQU?Assumethataconstant(afixedvalue)isusedthroughouttheprogram,andtheprogrammerwantstochangeitsvalue everywhere. By the use of EQU, the programmer can change it once and theassembler will change all of its occurrences throughout the program. This allows theprogrammertoavoidsearchingtheentireprogramtryingtofindeveryoccurrence.

UsingEQUforfixeddataassignment

TogetmorepracticeusingEQUtoassignfixeddata,examinethefollowing:

DATA1EQU0x39;thewaytodefinehexvalue

DATA2EQU2_00110101;thewaytodefinebinaryvalue(35inhex)

DATA3EQU39;decimalnumbers(27inhex)

DATA4EQU‘2’;ASCIIcharacters

UsingEQUforSFRaddressassignment

EQUisalsowidelyusedtoassignSFRaddresses.Examinethefollowingcode:

PORTBEQU0xF0018;SFRPortBaddress

MOVR6,#0x01;R6=0x01

LDRR2,=PORTB;R2=0xF0018

STRBR6,[R2];PortBnowhas0x01

UsingEQUforRAMaddressassignment

AnothercommonusageofEQUisfortheaddressassignmentoftheinternalSRAM.ItisexactlylikeusingEQUforSFRaddressassignment.Examinethefollowingcode:

SUMEQU0x40000120;assignRAMloctoSUM

MOVR2,#5;loadR2with5

Page 69: ARM Assembly Language: Programming and Architecture

MOVR1,#2;loadR1with2

ADDR2,R2,R1;R2=R2+R1

LDRR3,=SUM;loadR3with0x40000120

STRBR2,[R3];storetheresultSUM

This is especiallyhelpfulwhen theaddressneeds tobechanged inorder touseadifferentARMchipforagivenproject.ItismucheasiertorefertoanamethananumberwhenaccessingRAMaddresslocations.

RN(equate)

Thisisusedtodefineanameforaregister.TheRNdirectivedoesnotsetasideaseperate storage for the name, but associates a registerwith that name. It improves theclarity.Program2-2showshowweuseSUMnameforR3.

Program2-2:AnARMAssemblyLanguageProgramUsingRNDirective

;ARMAssemblyLanguageProgramToAddSomeData

;andstoretheSUMinR3.

VAL1RNR1;defineVAL1asanameforR1

VAL2RNR2;defineVAL2asanameforR2

SUMRNR3;defineSUMasanameforR3

AREAPROG_2_2,CODE,READONLY

ENTRY

MOVVAL1,#0x25;R1=0x25

MOVVAL2,#0x34;R2=0x34

ADDSUM,VAL1,VAL2;R3=R2+R1

HEREBHERE

END

INCLUDEdirective

The includedirective tells theARM assembler toadd thecontentsofa file toourprogram(likethe#includedirectiveinClanguage).

Assemblerdataallocationdirectives

In most Assembly languages there are some directives to allocate memory andinitializeitsvalue.InARMAssemblylanguageDCB,DCD,andDCWallocatememoryandinitializethem.TheSPACEdirectiveallocatesmemorywithoutinitializingit.

Page 70: ARM Assembly Language: Programming and Architecture

DCBdirective(defineconstantbyte)

TheDCBdirectiveallocatesabytesizememoryandinitializesthevalues.

MYVALUEDCB5;MYVALUE=5

MYMSAGEDCB“HELLOWORLD”;string

DCWdirective(defineconstanthalf-word)

TheDCWdirectiveallocatesahalf-wordsizememoryandinitializesthevalues.

MYDATADCW0x20,0xF230,5000,0x9CD7

DCDdirective(defineconstantword)

TheDCDdirectiveallocatesawordsizememoryandinitializesthevalues.

MYDATADCD0x200000,0xF30F5,5000000,0xFFFF9CD7

SeeTables2-8and2-9.

Directive Description

DCB Allocatesoneormorebytesofmemory,anddefinestheinitialruntimecontentsofthememory

DCW Allocatesoneormorehalfwordsofmemory, alignedon two-byteboundaries, anddefinestheinitialruntimecontentsofthememory.

DCWU Allocatesoneormorehalfwordsofmemory,anddefinestheinitialruntimecontentsofthememory.Thedataisnotaligned.

DCD Allocates one or more words of memory, aligned on four-byte boundaries, anddefinestheinitialruntimecontentsofthememory.

DCDU Allocatesoneormorewordsofmemoryanddefinestheinitialruntimecontentsofthememory.Thedataisnotaligned.

Table2-8:SomeWidelyUsedARMMemoryAllocationDirectives

DataSize Bits Decimal Hexadecimal Directive Instruction

Byte 8 0–255 0-0xFF DCB STRB/LDRB

Half-word 16 0–65535 0-0xFFFF DCW STRH/LDRH

Word 32 0–232-1 0-0xFFFFFFFF DCD STR/LDR

Table2-9:UnsignedDataRangeinARMandassociatedInstructions

In Program 2-3A you see an example of storing constant values in the programmemoryusingthedirectives.Figure2-12showshowthedataisstoredinmemory.Intheexample, theprogramgoes to locations0x00 to0x0F.TheDCBdirectivestoresdata inaddresses0x10–0x17.Asyouseeonebyteisallocatedforeachdata.TheDCDallocates4

Page 71: ARM Assembly Language: Programming and Architecture

bytesforeachdata.Asaresultthelowestbyteof0x23222120(whichis0x20)isstoredinlocation0x18andthenextbytesarestoredinthenextlocations.

Program2-3A:SampleofStoringFixedDatainProgramMemory

;storingdatainprogrammemory.

AREALOOKUP_EXAMPLE,READONLY,CODE

ENTRY

LDRR2,=OUR_FIXED_DATA;pointtoOUR_FIXED_DATA

LDRBR0,[R2];loadR0withthecontents

;ofmemorypointedtobyR2

ADDR1,R1,R0;addR0toR1

HEREBHERE;stayhereforever

OUR_FIXED_DATA

DCB0x55,0x33,1,2,3,4,5,6

DCD0x23222120,0x30

DCW0x4540,0x50

END

Figure2-12:MemoryDumpforProgram2-3A

TheDCWdirective allocates 2 bytes for eachdata.For example, the lowbyte of0x4540islocatedinaddress0x20andthehighbyteofitgoestoaddress0x21.Similarlythelowbyteof0x50islocatedinaddress0x22andthehighbyteofitinaddress0x23.

Intheprogramtoaccessthedata,firsttheR2registerisloadedwiththeaddressofOUR_FIXED_DATA.Inthisexample,OUR_FIXED_DATAhasaddress0x10. So,R2isloadedwith0x10.Then,thecontentsoflocation0x10isloadedintoregisterR0,usingtheLDRBinstruction.

SPACEdirective

Using the SPACE directivewe can allocatememory for variables. The followinglinesallocate4and2bytesofmemoryandnamethemasLONG_VARandOUR_ALFA:

Page 72: ARM Assembly Language: Programming and Architecture

LONG_VARSPACE4;Allocate4bytes

OUR_ALFASPACE2;Allocate2bytes

In the followingprogram3variablesaredefined:A,B,andC.ThenAandBareinitializedwith5and4,respectively.InthenextstepAandBareaddedtogetherandtheresultisputinC:

Program2-3B

AREAOUR_PROG,CODE,READONLY

ENTRY

;A=5

LDRR0,=A;R0=Addr.ofA

MOVR1,#5;R1=5

STRR1,[R0];init.Awith5

;B=4

LDRR0,=B;R0=Addr.ofB

MOVR1,#4;R1=4

STRR1,[R0];init.Bwith4

;R1=A

LDRR0,=A;R0=Addr.ofA

LDRR1,[R0];R1=valueofA

;R2=B

LDRR0,=B;R0=Addr.ofA

LDRR2,[R0];R2=valueofA

;C=R1+R2(C=A+B)

ADDR3,R1,R2;R3=A+B

LDRR0,=C;R0=Addr.ofC

STRR3,[R0];C=R3

loopBloop

AREAOUR_DATA,DATA,READWRITE

;AllocatesthefollowingsinSRAMmemory

ASPACE4

Page 73: ARM Assembly Language: Programming and Architecture

BSPACE4

CSPACE4

END

ADRdirective

ToloadregisterswiththeaddressesofmemorylocationswecanalsousetheADRpseudo-instructionwhichhasabetterperformance.SeeChapter6formore.ADRhasthefollowingsyntax:

ADRRn,label

For example, in Program 2-3A we can load R2 with the address ofOUR_FIXED_DATAusingthefollowingpseudo-instruction:

ADRR2,OUR_FIXED_DATA;pointtoOUR_FIXED_DATA

ALIGN

Thisisusedtomakesuredataisalignedin32-bitwordor16-bithalfwordmemoryaddress.ThefollowingusesALIGNtomakethedata32-bitwordaligned:

ALIGN4;thenextinstructionisword(4bytes)aligned

ALIGN2;thenextinstructionishalf-word(2bytes)aligned

Example2-7showstheresultofusingtheALIGNdirective.

Example2-7

ComparetheresultofusingALIGNinthefollowingprograms:

a)

AREAE2_7A,READONLY,CODE

ENTRY

ADRR2,DTA

LDRBR0,[R2]

ADDR1,R1,R0

H1BH1

DTADCB0x55

Page 74: ARM Assembly Language: Programming and Architecture

DCB0x22

END

b)

AREAE2_7B,READONLY,CODE

ENTRY

ADRR2,DTA

LDRBR0,[R2]

ADDR1,R1,R0

H1BH1

DTADCB0x55

ALIGN2

DCB0x22

END

c)

AREAE2_7C,READONLY,CODE

ENTRY

ADRR2,DTA

LDRBR0,[R2]

ADDR1,R1,R0

H1BH1

DTADCB0x55

ALIGN4

DCB0x22

END

Solution:

a)

WhenthereisnoALIGNdirectivetheDCBdirectiveallocatesthefirstemptylocationforitsdata.Inthisexample,address0x10isallocatedfor0x55.So0x22goestoaddress0x11.

Page 75: ARM Assembly Language: Programming and Architecture

b)

IntheexampletheALIGNissetto2whichmeansthedatashouldbeputinalocationwithevenaddress.The0x55goestothefirstemptylocationwhichis0x10.Thenextemptylocationis0x11whichisnotamultipleof2.So,itisfilledwith0andthenextdatagoestolocation0x12.

c)

IntheexampletheALIGNissetto4whichmeansthedatashouldgotolocationswhoseaddressismultipleof4.The0x55goestothefirstemptylocationwhichis0x10.Thenextemptylocationsare0x11,0x12,and0x13whicharenotamultipleof4.So,theyarefilledwith0sandthenextdatagoestolocation0x14.

RulesforlabelsinAssemblylanguage

Bychoosing labelnames that aremeaningful, aprogrammercanmakeaprogrammucheasier to readandmaintain.Thereareseveral rules thatnamesmust follow.First,each label name must be unique. The names used for labels in Assembly languageprogramming consist of alphabetic letters in both uppercase and lowercase, the digits 0through9,andthespecialcharactersquestionmark(?),period(.),at(@),underline(_),and dollar sign ($). The first character of the labelmust be an alphabetic character. Inotherwords,itcannotbeanumber.Everyassemblerhassomereservedwordsthatmustnot be used as labels in the program. Foremost among the reserved words are themnemonics for the instructions.Forexample,“MOV”and“ADD”are reservedbecausethey are instruction mnemonics. In addition to the mnemonics there are some other

Page 76: ARM Assembly Language: Programming and Architecture

reservedwords.Checkyourassemblerforthelistofreservedwords.

ReviewQuestions

1.GiveanexampleofhexdatarepresentationintheARMassembler.

2. Showhow to represent decimal 20 in formats of (a) hex, (b) decimal, and (c)binaryintheARMassembler.

3.WhatistheadvantageinusingtheEQUdirectivetodefineaconstantvalue?

4.Showthehexnumbervalueusedbythefollowingdirectives:

(a)ASC_DATAEQU‘4’(b)MY_DATAEQU2_00011111

5.GivethevalueinR2forthefollowing:

MYCOUNTEQU15

MOVR2,#MYCOUNT

6.Givethevalueindatamemorylocation0x200000forthefollowing:

MYCOUNTEQU0x95

MYMEMEQU0x200000

MOVR0,#MYCOUNT

LDRR2,=MYMEM

STRBR0,[R2]

7.Givethevalueindatamemory0x630000forthefollowing:

MYDATAEQU12

MYMEMEQU0x00630000

FACTOREQU0x10

MOVR1,#MYDATA

MOVR2,#FACTOR

LDRR3,=MYMEM

ADDR1R2,R1

STRBR1,[R3]

Page 77: ARM Assembly Language: Programming and Architecture
Page 78: ARM Assembly Language: Programming and Architecture

Section2.6:IntroductiontoARMAssemblyProgrammingInthissectionwediscussAssemblylanguageformatanddefinesomewidelyused

terminologyassociatedwithAssemblylanguageprogramming.

WhiletheCPUcanworkonlyinbinary,itcandosoataveryhighspeed.Itisquitetedious and slow for humans, however, to dealwith 0s and 1s in order to program thecomputer.Aprogramthatconsistsof0sand1s iscalledmachine language. In theearlydaysof thecomputer,programmerscodedprograms inmachine language.Although thehexadecimal systemwasused as amore efficientway to represent binarynumbers, theprocess of working in machine code was still cumbersome for humans. Eventually,Assembly languagesweredeveloped,whichprovidedmnemonics for themachine codeinstructions,plusotherfeaturesthatmadeprogrammingfasterandlesspronetoerror.Thetermmnemonicisfrequentlyusedincomputerscienceandengineeringliteraturetoreferto codes and abbreviations that are relatively easy to remember. Assembly languageprograms must be translated into machine code by a program called an assembler.Assemblylanguageisreferredtoasalow-levellanguagebecauseitdealsdirectlywiththeinternal structureof theCPU.Toprogram inAssembly language, theprogrammermustknowalltheregistersoftheCPUandthesizeofeach,aswellasotherdetails.

Today,onecanusemanydifferentprogramminglanguages,suchasBASIC,Pascal,C, C++, Java, and numerous others. These languages are called high-level languagesbecause the programmer does not have to be concernedwith the internal details of theCPU. Whereas an assembler is used to translate an Assembly language program intomachine code (sometimes also called object code or opcode for operation code), high-level languages are translated into machine code by a program called a compiler. Forinstance,towriteaprograminC,onemustuseaCcompilertotranslatetheprogramintomachinelanguage.NextwelookatARMAssemblylanguageformat.

StructureofAssemblylanguage

AnAssemblylanguageprogramconsistsof,amongotherthings,aseriesoflinesofAssembly language instructions. An Assembly language instruction consists of amnemonic,optionallyfollowedbytwoorthreeoperands.Theoperandsarethedataitemsbeingmanipulated,andthemnemonicsarethecommandstotheCPU,tellingitwhattodowiththoseitems.SeeProgram2-4.

Program2-4:SampleofanARMAssemblyLanguageProgram

;ARMAssemblylanguageprogramtoaddsomedataandstoretheSUMinR3.

AREAPROG_2_4,CODE,READONLY

ENTRY

MOVR1,#0x25;R1=0x25

MOVR2,#0x34;R2=0x34

Page 79: ARM Assembly Language: Programming and Architecture

ADDR3,R2,R1;R3=R2+R1

HEREBHERE

END

AnAssemblylanguageprogramisaseriesofstatements,orlines,whichareeitherAssemblylanguageinstructions,suchasADDandMOV,orstatementscalleddirectives.While instructions tell the CPUwhat to do, directives (also called pseudo-instructions)givedirectionstotheassembler.Forexample,inProgram2-4,whiletheMOVandADDinstructionsarecommandstotheCPU,ENTRYandENDaredirectivestotheassembler.The directive END tells the assembler that it is the end of the code, while ENTRY isbeginningofthecode.

AnAssemblylanguageinstructionconsistsoffourfields:

[label]mnemonic[operands][;comment]

Bracketsindicatethatafieldisoptionalandnotalllineshavethem.Bracketsshouldnotbetypedin.Regardingtheaboveformat,thefollowingpointsshouldbenoted:

1.Thelabelfieldallowstheprogramtorefertoalineofcodebyname.Thelabelfieldcannotexceedacertainnumberofcharacters.Checkyourassemblerfortherule.

2.TheAssemblylanguagemnemonic(instruction)andoperand(s)fieldstogetherperform the real work of the program and accomplish the tasks forwhich theprogramwaswritten.InAssemblylanguagestatementssuchas

MOVR3,#0x55

MOVR2,#0x67

ADDR2,R2,R3;R2=R2+R3

ADDandMOVarethemnemonicsthatproduceopcodes;the“0x55”and“0x67”arethe operands. Instead of a mnemonic and an operand, these two fields could containassembler pseudo-instructions, or directives. Remember that directives do not generateanymachinecode(opcode)andareusedonlybytheassembler,asopposedtoinstructionsthataretranslatedintomachinecode(opcode)fortheCPUtoexecute.InProgram2-4thecommands END and ENTRY are examples of directives. Many of these pseudo-instructionswerediscussedinthelastsection.

3.Thecommentfieldbeginswithasemicoloncommentindicator“;”.Commentsmaybe at the endof a lineoron a lineby themselves.The assembler ignorescomments,but theyare indispensable toprogrammers.Althoughcommentsareoptional, it isrecommendedthattheybeusedtodescribetheprograminawaythatmakesiteasierforsomeoneelsetoreadandunderstand.

4. Noticethelabel“HERE”inthelabelfieldinProgram2-4.IntheB(Branch)statementtheARMistoldtostayinthisloopindefinitely.Ifyoursystemhasamonitor program you do not need this line and should delete it from your

Page 80: ARM Assembly Language: Programming and Architecture

program.Inthenextsectionwewillseehowtocreateaready-to-runprogram.

Note!

Thefirstcolumnofeachlineisalwaysconsideredaslabel.Thus,becarefultopressaTabatthebeginningofeachlinethatdoesnothavelabel;otherwise,yourinstructionisconsideredasalabelandanerrormessagewillappearwhencompiling.

ReviewQuestions

1.Whatisthepurposeofpseudo-instructions?

2._____________aretranslatedbytheassemblerintomachinecode,whereas__________________arenot.

3.Trueorfalse.Assemblylanguageisahigh-levellanguage.

4.Whichofthefollowinginstructionsproducesopcode?Listallthatdo.

(a)MOVR6,#0x25(b)ADDR2,R1,R3(c)END(d)HEREBHERE

5.Pseudo-instructionsarealsocalled___________.

6.Trueorfalse.AssemblerdirectivesarenotusedbytheCPUitself.Theyaresimplyaguidetotheassembler.

7.InQuestion4,whichoneisanassemblerdirective?

Page 81: ARM Assembly Language: Programming and Architecture
Page 82: ARM Assembly Language: Programming and Architecture

Section2.7:AssemblinganARMProgramNowthatthebasicformofanAssemblylanguageprogramhasbeengiven,thenext

questionis:Howitiscreated,assembled,andmadereadytorun?ThestepstocreateanexecutableAssemblylanguageprogram(Figure2-13)areoutlinedasfollows:

Figure2-13:StepstoCreateaProgram

1. FirstweuseatexteditortotypeinaprogramsimilartoProgram2-4.Inthecase of ARM, we can use the Keil IDE, which has a text editor, assembler,simulator, and much more all in one software package. It is an excellentdevelopmentsoftwarethatsupportsalltheARMchipsandisfreeforuniversitystudents.Seewww.keil.comforevaluationversionof thesoftwareforstudents.Manyeditorsorwordprocessorsarealsoavailablethatcanbeusedtocreateoredittheprogram.AwidelyusededitoristheNotepadinWindows,whichcomeswith all Microsoft operating systems. Notice that the editor must be able toproduce an ASCII file. For assemblers, the file names follow the usual DOSconventions, but the source file has the extension “.a” or “.asm”. The “a”extensionforthesourcefileisusedbyanassemblerinthenextstep.

2. The“a”sourcefilecontainingtheprogramcodecreatedinstep1isfedtotheARMassembler.Theassemblerproducesanobjectfile,andalistfile.Theobjectfilehastheextension“.o”,andthelistfilehas“.lst”extension.

3.Theobjectfileplusascriptfileareusedbythelinkertoproducemapfileandhex file. The map file has the extension “.map”, and the hex file has “.hex”extension.The script file is optional and can be replacedwith some commandlineoptions.Aftera successful link, thehex file is ready tobeburned into theARM’sprogramROMandisdownloadedintotheARMchip.

Moreaboutasmandobjectfiles

Theasmfileisalsocalledthesourcefileandmusthavethe“a”or“asm”extension.Asmentioned earlier, this file is created with a text editor such asWindowsNotepad.

Page 83: ARM Assembly Language: Programming and Architecture

Manyassemblerscomewithatexteditor.Theassemblerconvertstheasmfile’sAssemblylanguage instructions intomachine languageandprovides theo (object) file.Theobjectfile,asmentionedearlier,hasan“o”asitsextension.Theobjectfileisusedasinputtoasimulatororanemulator.

Beforewecanassembleaprogramtocreateaready-to-runprogram,wemustmakesurethatitiserrorfree.TheKeiluVisionIDEprovidesuserrormessagesandweexaminethemtoseethenatureofsyntaxerrors.Theassemblerwillnotassembletheprogramuntilallthesyntaxerrorsarefixed.AsampleofanerrormessageisshowninFigure2-14.

Buildtarget‘Target1’

assemblinga1.asm…a1.asm(7):error:A1163E:UnknownopcodeMOVE,expectingopcodeorMacro

Targetnotcreated

Figure2-14:SampleofanErrorMessage

“lst”and“map”files

Themap file shows the labels defined in the program togetherwith their values.ExamineFigure2-15.ItshowstheMapfileofProgram2-4.

MemoryMapoftheimage

ImageEntrypoint:0x00000000

LoadRegionLR_1(Base:0x00000000,Size:0x00000010,Max:0xffffffff,ABSOLUTE)

ExecutionRegionER_RO(Base:0x00000000,Size:0x00000010,Max:0xffffffff,ABSOLUTE)

BaseAddrSizeTypeAttrIdxESectionNameObject

0x000000000x00000010CodeRO1*PROG_2_1a2.o

ExecutionRegionER_RW(Base:0x40000000,Size:0x00000000,Max:0xffffffff,ABSOLUTE)

****Nosectionassignedtothisexecutionregion****

ExecutionRegionER_ZI(Base:0x40000000,Size:0x00000000,Max:0xffffffff,ABSOLUTE)

****Nosectionassignedtothisexecutionregion****

Figure2-15:SampleofaMapFile

Thelst(list)file,whichisoptional,isveryusefultotheprogrammer.Thelistshowsthebinaryandsourcecode;italsoshowswhichinstructionsareusedinthesourcecode,andtheamountofmemorytheprogramuses.SeeFigure2-16.

ARMMacroAssembler Page1

100000000 ;ARMAssemblyLanguageProgramToAddSomeDataandStoretheSUMinR3.

2

Page 84: ARM Assembly Language: Programming and Architecture

00000000

300000000 AREAPROG_2_4,CODE,READONLY

400000000 ENTRY

500000000 E3A01025 MOVR1,#0x25;R1=0x25

600000004 E3A02034 MOVR2,#0x34;R2=0x34

700000008 E0823001 ADDR3,R2,R1;R3=R2+R1

80000000C EAFFFFFE HERE BHERE

900000010 END

Figure2-16:SampleofaListFileforARM

Manyassemblersassumethatthelistfileisnotwantedunlessyouindicatethatyouwant to produce it. These files can be accessed by a text editor such as Notepad anddisplayedonthemonitor,orsenttotheprintertogetahardcopy.Theprogrammerusesthelistandmapfilestolocatesyntaxerror.

There are many different ARM assemblers available for evaluation nowadays. Ifyou use theWindows operating system,ARM IAR IDE andKeil uVision can be used.Theyhavegreatfeaturesandniceenvironments.

ReviewQuestions

1.Trueorfalse.TheARMuVisionIDEandWindowsNotepadtexteditorbothproduceanASCIIfile.

2.Trueorfalse.Theextensionforthesourcefileis“a”.

3.Whichofthefollowingfilescanbeproducedbyatexteditor?

(a)myprog.a(b)myprog.obj(c)myprog.hex(d)myprog.lst

4.Whichofthefollowingfilesisproducedbyanassembler?

(a)myprog.asm(b)myprog.obj(c)myprog.hex(d)myprog.lst

Page 85: ARM Assembly Language: Programming and Architecture
Page 86: ARM Assembly Language: Programming and Architecture

Section2.8:TheProgramCounterandProgramROMSpaceintheARMIn this section we discuss the role of the program counter (PC) in executing a

programandshowhowthecodeisfetchedfromROMandexecuted.Wewillalsodiscusstheprogram(code)ROMspaceforvariousARMfamilymembers.Finally,weexaminetheHarvardarchitectureoftheARM.

ProgramcounterintheARM

ThemostimportantregisterinARMisthePC(programcounter).AswementionedearliertheR15istheprogramcounterinARM.TheprogramcounterisusedbytheCPUto point to the address of the next instruction to be executed. As the CPU fetches theopcode from the program ROM, the program counter is incremented automatically topointtothenextinstruction.Thewidertheprogramcounter,themorememorylocationsaCPUcanaccess.Thatmeansthata32-bitprogramcountercanaccessamaximumof4G(232=4G)bytesprogrammemorylocations.

Inmostmicrocontrollerseachmemorylocationis1bytewide.Inthecaseofa32-bit program counter, the memory space is 4G (232 = 4G) bytes, which occupies the0x00000000 –0xFFFFFFFF address range since it is byte-addressable. The programcounterintheARMfamilyis32bitswide.ThismeansthattheARMfamilycanaccessaddresses00000000to0xFFFFFFFF,atotalof4Gbyteofmemoryspacelocations.The4GbytesofmemoryspacelocationsareallocatedamongtheI/Operipherals,SRAM,andFlashROM.However,atthetimeofthiswriting,noneofthemembersoftheARMfamilyhavetheentire4Gbytesofmemoryspacepopulatedwithon-chipperipherals,SRAM,andFlash.

PoweruplocationforARM

Onequestionthatwemustaskaboutanymicrocontroller(ormicroprocessor)is:“Atwhat address does the CPUwake upwhen power is applied?” Eachmicroprocessor isdifferent.InthecaseoftheARMmicrocontrollers(thatis,allmembersregardlessofthefamilyandvariation),themicrocontrollerwakesupatmemoryaddress0x00000000whenit ispoweredup.BypoweringupwemeanapplyingVCCoractivatingRESETpin. Inotherwords,when theARM ispoweredup, thePC (programcounter)has thevalueof0x00000000init.ThismeansthatitexpectsthefirstopcodetobestoredatROMaddress0x00000000.For this reason, in theARMsystem, the first opcodemustbeburned intomemorylocation0x00000000ofprogramROMbecausethisiswhereitlooksforthefirstinstruction when it is booted. Next we discuss the step-by-step action of the programcounterinfetchingandexecutingasampleprogram.

PlacingcodeinprogramROM

To get a better understanding of the role of the program counter in fetching andexecutingaprogram,weexaminetheactionoftheprogramcounteraseachinstructionisfetchedandexecuted.First,weexamineoncemorethelistfileofthesampleprogramandshowhow the code is placed into theFlashROMof theARMchip.Aswe can see inFigure2-16,theopcodeandoperandforeachinstructionarelistedontheleftsideofthe

Page 87: ARM Assembly Language: Programming and Architecture

listfile.

AftertheprogramisburnedintoROMofanARMchip,theopcodeandoperandareplacedinROMmemorylocationsstartingat0x00000000asshownintheProgram2-4listfile.

Thelistshowsthataddress00000000containsE3A01025,whichistheopcodeformovingavalueintoregister,andtheoperand(inthiscase0x25)tobemovedtoanotheroperand (in this caseR1). Therefore, the instruction “MOVR1, #0x25” has amachinecodeof“E3A01025”,whereE3Aistheopcodeand01025istheoperands.SeeFigure2-16. Similarly, the machine code “E3A02034” is located in ROM memory location00000004 and represents the opcode and the operands for the instruction “MOV R2,#0x34”. In the same way, machine code “E0813002” is located in memory location00000008 and represents the opcode and the operand for the instruction “ADDR3,R1,R2”. The opcode for “HERE B HERE” and its target address are located inlocations 0000000C. Notice that all the instructions in this program are 4-byteinstructions.

Executingaprograminstructionbyinstruction

Assuming that the above program is burned into the ROM of anARM chip, thefollowingisastep-by-stepdescriptionoftheactionoftheARMuponapplyingpowertoit:

1. WhentheARMispoweredup,thePC(programcounter)has00000000andstartstofetchthefirstinstructionfromlocation00000000oftheprogramROM.InthecaseoftheaboveprogramthefirstcodeisE3A01025,whichisthecodeformovingoperand0x25 toR1.Uponexecuting thecode, theCPUplaces thevalueof25inR1.Nowoneinstructionisfinished.Theprogramcounterisnowincremented to point to 00000004 (PC = 00000004), which contains codeE3A02034,themachinecodefortheinstruction“MOVR2,#0x34”.

2. UponexecutingthemachinecodeE3A02034,thevalue0x34isloadedtoR2.Theprogramcounterisincrementedto00000008.

3. ROM location 00000008 has the machine code for instruction “ADDR3,R2,R1”.ThisinstructionisexecutedandnowPC=0000000C.

4. Now PC = 0000000C points to the next instruction, which is “HERE BHERE”.Aftertheexecutionofthisinstruction,PC=0000000C.Thiskeepstheprograminaninfiniteloop.

The fact that the program counter points at the next instruction to be executedexplains why some microprocessors (notably the x86) call the program counter theinstructionpointer.

The steps of running a code in ARM is slightly different from what mentionedabovebecauseoftheuseofpipelineinARMarchitecture.WewillexaminepipelineslaterinChapter 7.AlsoinARMCortexMchipsthepower-onResetlocationvalueisdifferentaswewillseeinChapter8.

Page 88: ARM Assembly Language: Programming and Architecture

InstructionformationoftheARM

RecallthattheARMinstructionsarealways4-byte.Nextweexploretheinstructionformationforafewoftheinstructionswehaveusedinthischapter.ThisshouldgiveyousomeinsightintotheinstructionsoftheARM.

ADDinstructionformation

TheADDisa4-byte(32-bit)instruction.SeeFigure2-17.Ofthe32bits,thefirst4bitsaresetasidefortheconditionfieldwhichwillbediscussedmoreinChapter4.Bits26and27arealways0inADDinstruction.Bit25whichisindicatedbyIdefinesthetypeofsecondoperand.Aswementionedbefore,thesecondoperandcanbeeitheraregisteroranimmediate value between 0–255. If I = 1, the second operand is an immediate valueotherwiseitshouldbearegister.Bits24to21aretheoperationcodeofADDinstruction.Whenthesebitsare0100theCPUknowsthatitshouldruntheADDinstruction.Bit20whichisindicatedbySdefineseithertheinstructionshouldupdatetheflagbitsornot.InADDinstructionthisbitiszerowhileinADDSinstructionitisone.Bits19to16definethefirstoperand(Rn).ItcanbearegisternumberbetweenR0toR15.Likewise,bits15to12definethedestinationregister(Rd).Finally,bits11to0definethesecondoperand.Aswementionedbefore,bit25(I)definesthateitherthesecondoperandshouldbearegisteroranimmediatevalue.WewilldiscussthebitsofOperand2inmoredetailinChapter3.

Figure2-17:ADDInstructionFormation

SUBinstructionformation

TheSUBisa4-byte(32-bit)instruction.Ofthe32bits,thefirst4bitsaresetasidefor theconditionfield.Bits26and27arealways0 inSUBinstruction.Bit25which isindicated by I defines the type of second operand. If I = 1, the second operand is animmediatevalueotherwiseitshouldbearegister.Bits24to21aretheoperationcodeofSUB instruction.When these bits are 0010 theCPUknows that it should run theSUBinstruction.Bit20whichisindicatedbySdefineseithertheinstructionshouldupdatetheflagbitsornot.InSUBinstructionthisbitiszerowhileinSUBSinstructionitisone.Bits19 to16define the first operand (Rn). It canbe a registernumberbetweenR0 toR15.Likewise,bits15to12definethedestinationregister(Rd).Finally,bits11to0definethesecondoperand.Aswementionedbefore,bit25(I)definesthateitherthesecondoperandshouldbearegisteroranimmediatevalue.TheformationofSUBinstructionisshowninFigure2-18.

Figure2-18:SUBInstructionFormation

Generalformationofdataprocessinginstructions

Asyoumayhavenoticed,theformationofADDandSUBinstructionsarethesame

Page 89: ARM Assembly Language: Programming and Architecture

exceptbits24to21whicharecalledtheoperationcodeandtellstheCPU whatinstructionitshouldexecute.InARM,allofthedataprocessinginstructionshavethesameformat.Figure2-19showsthegeneralformationofdataprocessinginstructions.Eachofthedataprocessing instructionhasauniqueoperationcode. InTable2-10you see the listof alldataprocessinginstructionsandtheiropcodes.

Figure2-19:GeneralFormationofDataProcessingInstructions

Branchinstructionformation

TheBisa4-byte(32-bit)instruction.SeeFigure2-20.Ofthe32bits,thefirst4bitsaresetasidefortheconditionfieldwhichwillbediscussedmoreinChapter4.Bits27to25arealways101inBinstruction.Bit24whichisindicatedbyLiszeroinBinstructionandoneinBLinstruction.Bits23to0giveusbranchtargetlocationrelativetothecurrentaddress.ThesewillbediscussedfurtherinChapter4.

Figure2-20:BranchInstructionFormation

ROMwidthintheARM

Aswehaveseenso far in this section,each locationof theaddressspaceholds1byte. Ifwe have 32 address lines, thiswill give us 232 locations,which is 4Gbytes ofmemory locationwith an addressmap of 0x00000000–0xFFFFFFFF. To bring inmoreinformation(codeordata) intotheCPUineachbuscycle,ARMincreasedthewidthofthedatabusto32bits.Inotherwords,theARMisword-addressable.Incontrast,the8051CPUisbyte-addressableonly.Inasense,thedatabusisliketrafficlanesonthehighwaywhereeachlaneis8bitswide.Themorelanes,themoreinformationwecanbringintotheCPUforprocessing.FortheARM,theinternaldatabusbetweenthecodememoryandtheCPUis32bitswide,asshowninFigure2-21.Therefore,the4Gmemoryspaceisshownas1G×32usinga32-bitworddatabussize.ThewideningofthedatapathbetweentheprogramROMand theCPU is anotherway inwhich theARMdesigners increased theprocessingpoweroftheARMfamily.Anotherreasontomakethecodememory32bitswideistomatchitwiththeinstructionwidthoftheARMbecausealloftheinstructionsare4-bytewide.Thisway, theCPUbrings inan instructionfrommemoryevery time itmakesatriptotheprogrammemory.Thatwillmakeinstructionfetchasinglecycle,aswewillseeintheChapter4wheninstructiontimingisdiscussed.

TheARMdesignershavemadeallinstructionsfixedat4-byte;thereareno1-byte,2-byte,or3-byteinstructions,asisthecasewiththex86and8051chips.ThisispartoftheRISCarchitecturalphilosophy,whichwill bediscussed later in this chapter. ItmustalsobenotedthatthedatamemorySRAMintheARMmicrocontrollerarealso4-byte.

HarvardandvonNeumannarchitecturesintheARM

Page 90: ARM Assembly Language: Programming and Architecture

In Chapter 0,we discussedHarvard andVonNeumann architecture.ARM 9 andnewerarchitecturesuseHarvardarchitecture,whichmeans that thereare separatebusesforthecodeandthedatamemory.SeeFigure2-21.TheprogrambusprovidesaccesstotheprogrammemorywhereasthedatabusisusedforbringingdatatotheCPU.

Figure2-21:Harvardvs.VonNeumannArchitecture

AswecanseeinFigure2-21,intheprogrambus,thedatabusis32bitswideandthe address bus is as wide as the PC register to enable the CPU to address the entireprogrammemory.

InSections2-2and2-3,we learnedaboutdatamemoryspaceandhowtouse theSTR and LDR instructions. When the CPU wants to execute the “LDR Rd,[Rx]”instruction, itputsRxon theaddressbusof thedatabus,and receivesdata through thedatabus.Forexample,toexecute“LDRR2,[R5]”,assumingthatR5=0x40000200,theCPUputsthevalueofR5ontheaddressbus.Thelocation0x40000200isintheSRAM(seeFigure2-4).Thus, theSRAMputs thecontentsof location0x40000200onthedatabus.TheCPUgetsthecontentsoflocation0x40000200throughthedatabusandbringsintoCPUandputsitinR2.

The “STR Rx,[Rd]” instruction is executed similarly. The CPU puts Rd on theaddressbusandthecontentsofRxonthedatabus.Thememorylocationwhoseaddressisontheaddressbusreceivesthecontentsofdatabus.

Littleendianvs.bigendianwar

ExaminetheplacingofthecodeintheARMprogrammemory,showninFigure2-22.The lowbyte goes to the lowmemory location, and the highbyte goes to the highmemory address. This convention is called little endian to contrast it with big endian.Figure2-23showsstoring thesamedatausingbigendianconvention.Theoriginof thetermsbigendianandlittleendianisfromanargumentinaGulliver’sTravelsstoryoverhowaneggshouldbeopened:fromthebigendorthelittleend.Inthebigendianmethod,thehighbytegoestothelowaddress,whereasinthelittleendianmethod,thehighbytegoestothehighaddressandthelowbytetothelowaddress.AllIntelmicroprocessorsand

Page 91: ARM Assembly Language: Programming and Architecture

many microcontrollers use the little endian convention. Freescale (formerly Motorola)microprocessors,alongwithsomemainframes,usebigendian.Thedifferencemightseemastrivialaswhethertobreakaneggfromthebigendorthelittleend,butitisanuisanceinconvertingsoftwarefromonecamptoberunonacomputeroftheothercamp.Manymicroprocessors,includingtheARM,letthesoftwaredesignerchooselittleendianorbigendianconvention.

Figure2-22:ARMProgramMemoryContentsforProgram2-4ListFile(LittleEndian)

Figure2-23:BigEndianConvention

ReviewQuestions

1.IntheARM,theprogramcounteris______bitswide.

2.Trueorfalse.EverymemberoftheARMfamilywakesupatmemory0x00000000whenitispoweredup.

3.AtwhatROMlocationdowestorethefirstopcodeofanARMprogram?

4.Trueorfalse.AlltheinstructionsintheARMare4-byteinstructions.

Page 92: ARM Assembly Language: Programming and Architecture

5.Trueorfalse.ARM9andnewerarchitecturesusevonNeumannarchitecture.

6.Trueorfalse.ARM7andolderarchitecturesusevonNeumannarchitecture.

Page 93: ARM Assembly Language: Programming and Architecture
Page 94: ARM Assembly Language: Programming and Architecture

Section2.9:SomeARMAddressingModesTheCPUcanaccessoperands(data)invariousways,calledaddressingmodes.The

number of addressing modes is determined when the microprocessor is designed andcannotbechanged.Usingadvancedaddressingmodestheaccessingofdifferentdatatypesanddatastructures(e.g.arrays,pointers,classes)arediscussedinChapter6.SomeofthesimpleARMaddressingmodesare:

1.register

2.immediate

3.registerindirect(indexedaddressingmode)

Registeraddressingmode

The register addressingmode involves the use of registers to hold the data to bemanipulated.Memoryisnotaccessedwhenthisaddressingmodeisexecuted;therefore,itisrelativelyfast.SeeFigure2-24.

Figure2-24:RegisterAddressingMode

Examplesofregisteraddressingmodeareasfollow:

MOVR6,R2;copythecontentsofR2intoR6

ADDR1,R1,R3;addthecontentsofR3tocontentsofR1

SUBR7,R7,R2;subtractR2fromR7

Immediateaddressingmode

Intheimmediateaddressingmode, thesourceoperandisaconstant.In immediateaddressingmode, as the name implies, when the instruction is assembled, the operandcomes immediately after the opcode. For this reason, this addressing mode executesquickly.SeeFigure2-25.Examples:

MOVR9,#0x25;move0x25intoR9

MOVR3,#62;loadthedecimalvalue62intoR3

ADDR6,R6,#0x40;add0x40toR6

Figure2-25:ImmediateAddressingMode

Inthefirsttwoaddressingmodes,theoperandsareeitherinsidethemicroprocessor

Page 95: ARM Assembly Language: Programming and Architecture

ortaggedalongwiththeinstruction.Inmostprograms,thedatatobeprocessedisofteninsomememorylocationoutsidetheCPU.Therearemanywaysofaccessingthedatainthedatamemoryspace.Thefollowingdescribesoneofthemethods.

RegisterIndirectAddressingMode(Indexedaddressingmode)

Intheregisterindirectaddressingmode,theaddressofthememorylocationwheretheoperandresidesisheldbyaregister.SeeFigure2-26.Forexample:

STRR5,[R6];moveR5intothememorylocation

;pointedtobyR6

LDRR10,[R3];moveintoR10thecontentsofthe

;memorylocationpointedtobyR3.

Figure2-26:RegisterIndirectAddressingMode

SampleUsage:Registerindirectaddressingmode

Usingregisterindirectaddressingmodewecanimplementthedifferentpointers.Sincetheregistersare32-bittheycanaddresstheentirememoryspace.HereyouseeasimplecodeinCanditsequivalentinAssembly:

CLanguage:char*ourPointer;

ourPointer=(char*)0x12456;//Pointtolocation12456

*ourPointer=25;//store25inlocation0x12456

ourPointer++;//pointtonextlocation

AssemblyLanguage:LDRR2,=0x12456;pointtolocation0x12456

MOVR0,#25;R0=25

STRBR0,[R2];storeR0inlocation0x12456

ADDR2,R2,#1;incrementR2topointtonextlocation

Dependingonthedatatypethatthepointerpointsto,STR/LDR,STRH/LDRH,orSTRB/LDRBmightbeused.Intheaboveexample,sinceitpointstochar(whichis8-bit)STRBisused.

Page 96: ARM Assembly Language: Programming and Architecture

SeeChapter6formoreadvancedaddressingmodes.

ReviewQuestions

1.CantheARMprogrammermakeupnewaddressingmodes?

2.Whichregisterscanbeusedfortheregisterindirectaddressingmode?

3.Whereisthedatalocatedinimmediateaddressingmode?

Page 97: ARM Assembly Language: Programming and Architecture
Page 98: ARM Assembly Language: Programming and Architecture

Section2.10:RISCArchitectureinARMThere are three ways available to microprocessor designers to increase the

processingpoweroftheCPU:

1. Increasetheclockfrequencyof thechip.Onedrawbackof thismethodisthatthehigherthefrequency,themorepowerandheatdissipation.Powerandheatdissipationisespeciallyaproblemforhand-helddevices.

2. UseHarvardarchitecturebyincreasingthenumberofbusestobringmoreinformation(codeanddata)intotheCPUtobeprocessed.Whileinthecaseofx86 and other general purpose microprocessors this architecture is veryexpensiveandunrealistic,intoday’smicrocontrollersthisisnotaproblem.AswesawinSection2.8,someoftheARMchipshaveHarvardarchitecture.

3. Change the internalarchitectureof theCPUandusewhat iscalledRISCarchitecture.

ARMhasusedall threemethods to increase theprocessingpowerof theARMmicrocontrollers.InthissectionwediscussthemeritsofRISCarchitecture.

In theearly1980s,acontroversybrokeout in thecomputerdesigncommunity,butunlikemostcontroversies, itdidnotgoaway.Since the1960s, inallmainframesandminicomputers,designersputasmanyinstructionsastheycouldthinkofintotheCPU.Someoftheseinstructionsperformedcomplextasks.Anexampleisaddingdatamemory locations and storing the sum into memory. Naturally, microprocessordesignersfollowedtheleadofminicomputerandmainframedesigners.Becausethesemicroprocessorsused sucha largenumberof instructions,manyofwhichperformedhighly complex activities, they came to be known as CISC (complex instruction setcomputer) processors. According to several studies in the 1970s, many of thesecomplex instructions etched into CPUs were never used by programmers andcompilers. The huge cost of implementing a large number of instructions (some ofthem complex) into the microprocessor, plus the fact that a good portion of thetransistorsonthechipareusedbytheinstructiondecoder,madesomedesignersthinkofsimplifyingandreducingthenumberofinstructions.Asthisconceptdeveloped,theresultingprocessorscametobeknownasRISC(reducedinstructionsetcomputer).

FeaturesofRISC

The following are some of the features ofRISC as implemented by theARMmicrocontroller.

Feature1

RISCprocessorshaveafixedinstructionsize.InaCISCmicroprocessorssuchasthex86,instructionscanbe1,2,3,oreven5bytes.Forexample,lookatthefollowinginstructionsinthex86:

CLRC;clearCarryflag,a1-byteinstruction

ADDAccumulator,#mybyte;a2-byteinstruction

Page 99: ARM Assembly Language: Programming and Architecture

LJMPtarget_address;a5-byteinstruction

This variable instruction size makes the task of the instruction decoder verydifficult because the size of the incoming instruction is never known. In a RISCarchitecture, the size of all instructions is fixed. Therefore, theCPU can decode theinstructionsquickly.This is likeabricklayerworkingwithbricksof thesamesizeasopposed tousingbricksofvariablesizes.Ofcourse, it ismuchmoreefficient tousebricks of the same size. In the last section we saw how the ARM uses 4-byteinstructions and if not all the 32 bits are needed to form the instruction it fillswithzeros.

Feature2

One of the major characteristics of RISC architecture is a large number ofregisters.AllRISCarchitectureshaveat least8or16registers.Ofthese16registers,onlya fewareassigned toadedicatedfunction.Oneadvantageofa largenumberofregistersisthatitavoidstheneedforalargestacktostoreparameters.AlthoughastackisimplementedonaRISCprocessor,itisnotasessentialasinCISCbecausesomanyregistersareavailable.InARMtheuseofalargenumberofgeneralpurposeregisters(GPRs)satisfiesthisRISCfeature.ThestackfortheARMiscoveredinChapter6.

Feature3

RISCprocessorshavea small instruction set.RISCprocessorshaveonlybasicinstructions such as ADD, SUB, MUL, LOAD, STORE, AND, OR, EOR, CALL,JUMP,andsoon.ThelimitednumberofinstructionsisoneofthecriticismsleveledattheRISCprocessorbecauseitmakesthejobofAssemblylanguageprogrammersmuchmoretediousanddifficultcomparedtoCISCAssemblylanguageprogramming.Thisisone reason that RISC is used more commonly in high-level language environmentssuchastheCprogramminglanguageratherthanAssemblylanguageenvironments.ItisinterestingtonotethatsomedefendersofCISChavecalledit“completeinstructionsetcomputer”insteadof“complexinstructionsetcomputer”becauseithasacompletesetofeverykindofinstruction.Howmanyoftheseinstructionsareusedandhowoftenisanothermatter.The limitednumberof instructions inRISC leads toprograms thatare large. Although these programs can use more memory, this is not a problembecausememoryischeap.Beforetheadventofsemiconductormemoryinthe1960s,however, CISC designers had to pack as much action as possible into a singleinstruction toget themaximumbangfor theirbuck. In theARMwehavearound50instructions. We will examine more of the instruction set for the ARM in futurechapters.

Feature4

At this point, one might ask, with all the difficulties associated with RISCprogramming, what is the gain? The most important characteristic of the RISCprocessoristhatmorethan99%ofinstructionsareexecutedwithonlyoneclockcycle,incontrasttoCISCinstructions.Evensomeofthe1%oftheRISCinstructionsthatareexecuted with two clock cycles can be executed with one clock cycle by juggling

Page 100: ARM Assembly Language: Programming and Architecture

instructionsaround(codescheduling).CodeschedulingisdiscussedinChapter7.WewillexaminetheinstructioncycletimeandpipeliningoftheARMinChapter7.

Feature5

RISCprocessorshaveseparatebusesfordataandcode.Inallthex86processors,likeallotherCISCcomputers,thereisonesetofbusesfortheaddress(e.g.,A0–A31inthe 80386) and another set of buses for data (e.g., D0–D31 in the 80386) carryingopcodes and operands in and out of the CPU. To access any section of memory,regardlessofwhetheritcontainscodeordataoperands,thesameaddressbusanddatabusareused.InmanyRISCprocessors, therearefoursetsofbuses:(1)asetofdatabusesforcarryingdata(operands)inandoutoftheCPU,(2)asetofaddressbusesforaccessingthedata,(3)asetofbusestocarrytheopcodes,and(4)asetofaddressbusesto access the opcodes. The use of separate buses for code and data operands iscommonlyreferred toasHarvardarchitecture.WeexaminedtheHarvardarchitectureoftheARMintheprevioussection.

Feature6

Because CISC has such a large number of instructions, each with so manydifferentaddressingmodes,microinstructions(microcode)areusedtoimplementthem.TheimplementationofmicroinstructionsinsidetheCPUemploysmorethan40–60%oftransistors inmanyCISCprocessors.RISCinstructions,however,dueto thesmallsetof instructions, are implementedusing thehardwiremethod.HardwiringofRISCinstructionstakesnomorethan10%ofthetransistors.

Feature7

RISC uses load/store architecture. In CISC microprocessors, data can bemanipulatedwhile it is still inmemory. For example, in instructions such as “ADDReg,Memory”, themicroprocessormust bring the contents of the external memorylocationintotheCPU,addittothecontentsoftheregister,thenmovetheresultbacktotheexternalmemorylocation.Theproblemistheremightbeadelayinaccessingthedatafromexternalmemory.Thenthewholeprocesswouldbestalled,preventingotherinstructions fromproceeding in thepipeline. InRISC,designersdidawaywith thesekindsof instructions. InRISC, instructionscanonly load fromexternalmemory intoregisters or store registers into externalmemory locations.There is nodirectwayofdoingarithmeticand logicoperationsbetweena register and thecontentsofexternalmemory locations. All these instructions must be performed by first bringing bothoperands into the registers inside the CPU, then performing the arithmetic or logicoperation,andthensendingtheresultbacktomemory.Thisideawasfirstimplementedby the Cray 1 supercomputer in 1976 and is commonly referred to as load/storearchitecture. In the last section, we saw that the arithmetic and logic operations arebetweentheGPRsregisters,butnoneinvolvesamemorylocation.Forexample,thereisno“ADDR1,RAM-Loc”instructioninARM.

In concluding this discussion of RISC processors, it is interesting to note thatRISC technologywasexploredby thescientistsat IBMin themid-1970s,but itwas

Page 101: ARM Assembly Language: Programming and Architecture

DavidPattersonof theUniversityofCaliforniaatBerkeleywho in1980brought themeritsofRISCconcepts totheattentionofcomputerscientists.Itmustalsobenotedthat in recent years CISC processors such as the Pentium have used some RISCfeatures in their design. This was the only way they could enhance the processingpowerof thex86processorsandstaycompetitive.Ofcourse, theyhad touse lotsoftransistorstodothejob,becausetheyhadtodealwithalltheCISCinstructionsofthex86processorsandthelegacysoftwareofDOS/Windows.

ReviewQuestions

1.WhatdoRISCandCISCstandfor?

2.Trueorfalse.TheCISCarchitectureexecutesthevastmajorityofitsinstructionsin2,3,ormoreclockcycles,whileRISCexecutestheminoneclock.

3.RISCprocessorsnormallyhavea_____(large,small)numberofgeneral-purposeregisters.

4.Trueorfalse.Instructionssuchas“ADDR16,ROMmemory”donotexistinRISCmicroprocessorssuchastheARM.

5.HowmanyinstructionsofARMare32-bitwide?

6.Trueorfalse.WhileCISCinstructionsareofvariablesizes,RISCinstructionsareallthesamesize.

7.WhichofthefollowingoperationsdonotexistfortheADDinstructioninRISC?

(a)registertoregister(b)immediatetoregister(c)memorytomemory

8. Trueorfalse.Harvardarchitectureusesthesameaddressanddatabusestofetchbothcodeanddata.

Page 102: ARM Assembly Language: Programming and Architecture
Page 103: ARM Assembly Language: Programming and Architecture

Section2.11:ViewingRegistersandMemorywithARMKeilIDETheARMmicroprocessorhasgreattoolsandsupportsystems,manyofthemfree

orinexpensive.ARMKeiluVisionisanassemblerandsimulatorprovidedforfreebyKeil Corporation and can be downloaded from the www.keil.com website. Seehttp://www.MicroDigitalEd.com for tutorials on how to use the Keil ARM uVisionassemblerandsimulator.

ManyassemblersandCcompilerscomewithasimulator.Simulatorsallowustoview the contents of registers and memory after executing each instruction (single-stepping). It is strongly recommended to use a simulator to single-step some of theprograms in this chapter and future chapters. Single-stepping a program with asimulatorgivesusadeeperunderstandingofmicrocontrollerarchitecture,inadditiontothefactthatwecanuseittofindtheerrorsinourprograms.

Figures2-27and2-28showscreenshots forARMsimulators fromARMKeiluVision.

Figure2-27:KeiluVisionScreenshot

Page 104: ARM Assembly Language: Programming and Architecture

Figure2-28:KeiluVisionScreenshot

Page 105: ARM Assembly Language: Programming and Architecture

ProblemsSection2.1:TheGeneralPurposeRegistersintheARM

1.ARMisa(n)_____-bitmicroprocessor.

2.Thegeneralpurposeregistersare_____bitswide.

3.ThevalueinMOVR2,#valueis_____bitswide.

4.ThelargestnumberthatanARMGPRregistercanhaveis_____inhex.

5.Whatistheresultofthefollowingcodeandwhereisitkept?

MOVR2,#0x15

MOVR1,#0x13

ADDR2,R1,R2

6.Whichofthefollowingsis(are)illegal?

(a)MOVR2,#0x50000(b)MOVR2,#0x50(c)MOVR1,#0x00

(d)MOVR1,255(e)MOVR17,#25(f)MOVR23,#0xF5

(g)MOV123,0x50

7.Whichofthefollowingis(are)illegal?

(a)ADDR2,#20,R1(b)ADDR1,R1,R2(c)ADDR5,R16,R3

8.Whatistheresultofthefollowingcodeandwhereisitkept?

MOVR9,#0x25

ADDR8,R9,#0x1F

9.Whatistheresultofthefollowingcodeandwhereisitkept?

MOVR1,#0x15

ADDR6,R1,#0xEA

10.Trueorfalse.Wehave32generalpurposeregistersintheARM.

Section2.2:TheARMMemoryMap

11.Trueorfalse.R13andR14arespecialfunctionregisters.

12.Trueorfalse.Theperipheralregistersaremappedtomemoryspace.

13.Trueorfalse.TheOn-chipFlashisthesamesizeinallmembersofARM.

14.Trueorfalse.TheOn-chipdataSRAMisthesamesizeinallmembersofARM.

15.WhatisthedifferencebetweentheEEPROManddataSRAMspaceintheARM?

16.CanwehaveanARMchipwithnoEEPROM?

Page 106: ARM Assembly Language: Programming and Architecture

17.CanwehaveanARMchipwithnodataRAM?

18.WhatisthemaximumnumberofbytesthattheARMcanaccess?

19.Findtheaddressofthelastlocationofon-chipFlashforeachofthefollowing,assumingthefirstlocationis0:

(a)ARMwith32KB(b)ARMwith8KB

(c)ARMwith64KB(d)ARMwith16KB

(e)ARMwith128KB(f)ARMwith256KB

20.Showthelowestandhighestvalues(inhex)thattheARMprogramcountercantake.

21.AgivenARMhas0x7FFFastheaddressofthelastlocationofitson-chipROM.Whatisthesizeofon-chipFlashforthisARM?

22.RepeatQuestion21for0x3FFF.

23.Findtheon-chipprogrammemorysizeinKfortheARMchipwiththefollowingaddressranges:

(a)0x0000–0x1FFF(b)0x0000–0x3FFF

(c)0x0000–0x7FFF(d)0x0000–0xFFFF

(e)0x0000–0x1FFFF(f)0x00000–0x3FFFF

24.Findtheon-chipprogrammemorysizeinKfortheARMchipswiththefollowingaddressranges:

(a)0x00000–0xFFFFFF(b)0x00000–0x7FFFF

(c)0x00000–0x7FFFFF(d)0x00000–0xFFFFF

(e)0x00000–0x1FFFFF(f)0x00000–0x3FFFFF

Section2.3:LoadandStoreInstructionsinARM

25.Showasimplecodetostorevalues0x30and0x97intolocations0x20000015and0x20000016,respectively.

26.Showasimplecodetoloadthevalue0x55intolocations0x20000030–0x20000038.

27.Trueorfalse.WecannotloadimmediatevaluesintothedataSRAMdirectly.

28.Showasimplecodetoloadthevalue0x11intolocations0x20000010–0x20000015.

29.RepeatProblem28,exceptloadthevalueintolocations0x20000034–0x2000003C.

Page 107: ARM Assembly Language: Programming and Architecture

Section2.4:ARMCPSR(CurrentProgramStatusRegister)

30.Thestatusregisterisa(n)______-bitregister.

31.WhichbitsofthestatusregisterareusedfortheCandZflagbits,respectively?

32.WhichbitsofthestatusregisterareusedfortheVandNflagbits,respectively?

33.IntheADDinstruction,whenisCraised?

34.IntheADDinstruction,whenisZraised?

35.WhatisthestatusoftheCandZflagsafterthefollowingcode?

LDRR0,=0xFFFFFFFF

LDRR1,=0xFFFFFFF1

ADDSR1,R0,R1

36.FindtheCflagvalueaftereachofthefollowingcodes:

(a)LDRR0,=0xFFFFFF54

LDRR5,=0xFFFFFFC4

ADDSR2,R5,R0

(b)MOVR3,#0

LDRR6,=0xFFFFFFFF

ADDSR3,R3,R6

(c)LDRR3,=0xFFFFFFFF

LDRR8,=0xFFFFFF05

ADDSR2,R3,R8

37.Writeasimpleprograminwhichthevalue0x55isadded5times.

Section2.5:ARMDataFormatandDirectives

38.Statethevalue(inhex)usedforeachofthefollowingdata:

MYDAT_1EQU55

MYDAT_2EQU98

MYDAT_3EQU‘G’

MYDAT_4EQU0x50

MYDAT_5EQU200

MYDAT_6EQU‘A’

MYDAT_7EQU0xAA

MYDAT_8EQU255

MYDAT_9EQU2_10010000

MYDAT_10EQU2_01111110

MYDAT_11EQU10

MYDAT_12EQU15

Page 108: ARM Assembly Language: Programming and Architecture

39.Statethevalue(inhex)foreachofthefollowingdata:

DAT_1EQU22

DAT_2EQU0x56

DAT_3EQU2_10011001

DAT_4EQU32

DAT_5EQU0xF6

DAT_6EQU2_11111011

40.Showasimplecodetoloadthevalue0x10102265intolocations0x40000030–0x4000003F.

41.Showasimplecodeto(a)loadthevalue0x23456789intolocations0x40000060–0x4000006F,and(b)addthemtogetherandplacetheresultinR9asthevaluesareadded.UseEQUtoassignthenamesTEMP0–TEMP3tolocations0x40000060–0x4000006F.

Section2.6:IntroductiontoARMAssemblyProgrammingand

Section2.7:AssemblinganARMProgram

42.Assemblylanguageisa________(low,high)-levellanguagewhileCisa________(low,high)-levellanguage.

43.OfCandAssemblylanguage,whichismoreefficientintermsofcodegeneration(i.e.,theamountofprogrammemoryspaceituses)?

44.Whichprogramproducestheobjfile?

45.Trueorfalse.Thesourcefilehastheextension“asm”.

46.Trueorfalse.Thesourcecodefilecanbeanon-ASCIIfile.

47.Trueorfalse.EverysourcefilemusthaveEQUdirective.

48.DotheEQUandENDdirectivesproduceopcodes?

49.Whyarethedirectivesalsocalledpseudocode?

50.Thefilewiththe_______extensionisdownloadedintoARMFlashROM.

51.GivethreefileextensionsproducedbyARMKeil.

Section2.8:TheProgramCounterandProgramROMSpaceintheARM

52.EveryARMfamilymemberwakesupataddress_________whenitispoweredup.

53.Aprogrammerputsthefirstopcodeataddress0x100.Whathappenswhenthe

Page 109: ARM Assembly Language: Programming and Architecture

microcontrollerispoweredup?

54.ARMinstructionsare_______bytes.

55.Writeaprogramtoaddeachofyour5-digitIDtoaregisterandplacetheresultintomemorylocation0x4000100.UsetheprogramlistingtoshowtheFlashmemoryaddressesandtheircontents.

56.Showtheplacementofdatainfollowingcode:

LDRR1,=0x22334455

LDRR2,=0x20000000

STRR1,[R2]

Usea)littleendianandb)bigendian.

57.Showtheplacementofdatainfollowingcode:

LDRR1,=0xFFEEDDCC

LDRR2,=0x2000002C

STRR1,[R2]

Usea)littleendianandb)bigendian.

58.HowwideisthememoryintheARMchip?

59.HowwideisthedatabusbetweentheCPUandtheprogrammemoryintheARM7chip?

60.In“ADDRd,Rn,operand2”,explainhowmanybitsaresetasideforRdandhowitcoverstheentireGPRsintheARMchip.

Section2.9:SomeARMAddressingModes

61.Givetheaddressingmodeforeachofthefollowing:

(a)MOVR5,R3 (b)MOVR0,#56

(c)LDRR5,[R3] (d)ADDR9,R1,R2

(e)LDRR7,[R2] (f)LDRBR1,[R4]

62.Showthecontentsofthememorylocationsaftertheexecutionofeachinstruction.

(a)LDRR2,=0x129F

LDRR1,=0x1450

LDRR2,[R1]

(b)LDRR4,=0x8C63

LDRR1,=0x2400

LDRHR4,[R1]

0x1450=(…….) 0x2400=(…….)

Page 110: ARM Assembly Language: Programming and Architecture

0x1451=(…….) 0x2401=(…….)

Section2.10:RISCArchitectureinARM

63.WhatdoRISCandCISCstandfor?

64.In______(RISC,CISC)architecturewecanhave1-,2-,3-,or4-byteinstructions.

65.In______(RISC,CISC)architectureinstructionsarefixedinsize.

66.In______(RISC,CISC)architectureinstructionsaremostlyexecutedinoneortwocycles.

67.In______(RISC,CISC)architecturewecanhaveaninstructiontoADDaregistertoexternalmemory.

68.Trueorfalse.MostinstructionsinCISCareexecutedinoneortwocycles.

Page 111: ARM Assembly Language: Programming and Architecture

AnswerstoReviewQuestionsSection2.1

1.MOVR2,#0x34

2.

MOVR1,#0x16

MOVR2,#0xCD

ADDR1,R1,R2

or

MOVR1,#0x16

ADDR1,R1,#0xCD

3.False

4.FFinhexand255indecimal

5.32

Section2.2

1.True

2.general-purposeregisters

3.32

4.Specialfunctionregisters(SFRs)

5.32

Section2.3

1.True

2.

MOVR1,#0x20

MOVR2,#0x95

STRBR2,[R1]

3.STRR2,[R8]

4.

MOVR1,#0x20

LDRR4,[R1]

5.FFinhexor255indecimal

6.R6

Page 112: ARM Assembly Language: Programming and Architecture

7.Itcopiesthelower8bitsofR1intolocationpointedtobyR2.

8.FFFFFFFFinhexor4,294,967,295indecimal(232-1)

Section2.4

1.CPSR(currentprogramstatusregister)

2.32bits

3.

Hex Binary

FFFFFF9F 11111111111111111111111110011111

+00000061 +00000000000000000000000001100001

100000000 100000000000000000000000000000000

ThisleadstoC=1andZ=1.

4.

Hex Binary

00000022 00000000000000000000000000100010

+00000022 +00000000000000000000000000100010

000000000 00000000000000000000000001000100

ThisleadstoZ=0.

5.

Hex Binary

00000067 00000000000000000000000001100111

+00000099 +00000000000000000000000010011001

00000100 00000000000000000000000100000000

ThisleadstoC=0andZ=0.

Section2.5

1.MOVR1,#0x20

Page 113: ARM Assembly Language: Programming and Architecture

2.(a)MOVR2,#0x14(b)MOVR2,#20(c)MOVR2,#2_00010100

3.Ifthevalueistobechangedlater,itcanbedoneonceinoneplaceinsteadofateveryoccurrenceandthecodebecomesmorereadable,aswell.

4.(a)0x34(b)0x1F

5.15indecimal(0x0Finhex)

6.Valueoflocation0x00000200=0x95

7.0x0C+0x10=0x1Cwillbeindatamemorylocation0x00000630.

Section2.6

1.TherealworkisperformedbyinstructionssuchasMOVandADD.Pseudo-instructions,alsocalledAssemblydirectives,directtheassemblerindoingitsjob.

2.Theinstructionmnemonics,pseudo-instructions

3.False

4.Allexcept(c)

5.Assemblerdirectives

6.True

7.(c)

Section2.7

1.True

2.True

3.(a)

4.(b)and(d)

Section2.8

1.32

2.True

3.0x00000000

4.True

5.False

6.True

Section2.9

1.No

2.Thegeneralpurposeregisters(R0toR15)

Page 114: ARM Assembly Language: Programming and Architecture

3.Itisapartoftheinstruction

Section2.10

1.RISCisreducedinstructionsetcomputer;CISCstandsforcomplexinstructionsetcomputer.

2.True

3.Large

4.True

5.Allofthem

6.True

7.(c)

8.False

Page 115: ARM Assembly Language: Programming and Architecture
Page 116: ARM Assembly Language: Programming and Architecture
Page 117: ARM Assembly Language: Programming and Architecture

Chapter3:ArithmeticandLogicInstructionsandProgramsIn this chapter, most of the arithmetic and logic instructions are discussed and

programexamples are given to illustrate the applicationof these instructions.Unsignednumbersareusedinthisdiscussionofarithmeticandlogicinstructions.InSection3.1weexamine the arithmetic instructions for unsigned numbers. The logic instructions andprograms are covered in Section 3.2. Section 3.3 is dedicated to rotate and shiftoperations.Weexamineloadingfixed(constant)valuesintoregistersusingrotateoptions,as well. In Section 3.4 we discuss the ARM Cortex instructions for rotate and shift.Section3.5isdedicatedtoBCDandASCIIdataconversion.

Page 118: ARM Assembly Language: Programming and Architecture
Page 119: ARM Assembly Language: Programming and Architecture

Section3.1:ArithmeticInstructionsUnsignednumbersaredefinedasdatainwhichallthebitsareusedtorepresentdata

andnobitsaresetasideforthepositiveornegativesign.Thismeansthattheoperandcanbe between 00 and 0xFF (0 to 255 decimal) for 8-bit data and between 0x0000 and0xFFFF(0to65535decimal)for16-bitdata.Forthe32-bitoperanditcanbebetween0and0xFFFFFFFF (0 to232 -1).SeeTable3-1.This sectioncovers theADD,SUB,andmultiplyinstructionsforunsignednumber.

DataSize Bits Decimal Hexadecimal Loadinstructionused

Byte 8 0–255 0-0xFF STRB

Half-word 16 0–65535 0-0xFFFF STRH

Word 32 0–232-1 0–0xFFFFFFFF STR

Table3‑1:UnsignedDataRangeSummaryinARM

AffectingflagsinARMinstructions

AuniquefeatureoftheexecutionofARMarithmeticinstructionsisthatitdoesnotaffect(updates)theflagsunlesswespecifyit.Thisisdifferentfromothermicrocontrollersand CPUs such as 8051 and x86. In the x86 and 8051 the arithmetic instructionsautomaticallychangetheZandCflagsregardlessofwewantitornot.Thisisnotthecasewith ARM. The default for the ARM instruction is not to affect these flags after theexecutionofarithmeticinstructionssuchasADDandSUB.TheARMassemblergivesustheoptionof telling theCPU toupdate the flagbits in theCSPR register to reflect theresult.WeoverridethedefaultbyhavingletterSintheinstruction.Forexample,wemustuseSUBSinsteadofSUBsincetheSUBinstructionwillnotupdatetheflags.TheSUBSmeanssubtractandsettheflags,whiletheSUBsimplysubtractswithouthavinganyeffectontheflags.SeeTable3-2andFigure3-1.

Figure3-1:GeneralFormationofDataProcessingInstruction

Instruction(Flagsunchanged) Instruction(Flagsupdated)

ADD Add ADDS Addandsetflags

ADC Addwithcarry ADCS Addwithcarryandsetflags

SUB SUBS SUBS Subtractandsetflags

SBC Subtractwithcarry SBCS Subtractwithcarryandsetflags

MUL Multiply MULS Multiplyandsetflags

Page 120: ARM Assembly Language: Programming and Architecture

UMULL Multiplylong UMULLS MultiplyLongandsetflags

RSB Reversesubtract RSBS Reversesubtractandsetflags

RSC Reversesubtractwithcarry RSCS Reversesubtractwithcarryandsetflags

Note:TheaboveinstructionaffectalltheZ,C,VandNflagbitsofCPSR(currentprogramstatusregister)buttheNandVflagsareforsigneddataandarediscussedinChapter5.

Table3-2:ArithmeticInstructionsandFlagBitsforUnsignedData

Additionofunsignednumbers

TheformoftheADDinstructionis

ADDRd,Rn,Op2;Rd=Rn+Op2

The instructions ADD and ADC are used to add two operands. The destinationoperandmustbearegister.TheOp2operandcanbearegisterorimmediate.Rememberthatmemory-to-registerormemory-to-memoryarithmeticandlogicoperationsareneverallowedinARMAssembly languagesince it isaRISCprocessor.The instructioncouldchangeanyoftheZ,C,N,orVbitsofthestatusflagregister,aslongasweusetheADDSinsteadofADD.TheeffectoftheADDSinstructionontheoverflow(V)andN(negative)flagsisdiscussedinChapter5sincetheyareusedinsignednumberoperations.LookatExamples3-1and3-2fortheeffectofADDSinstructiononZandCflags.

Example3-1

Showtheflagbitsofstatusregisterforthefollowingcases:

a)LDRR2,=0xFFFFFFF5;R2=0xFFFFFFF5(noticethe=sign)

MOVR3,#0x0B

ADDSR1,R2,R3;R1=R2+R3andupdatetheflags

b)LDRR2,=0xFFFFFFFF

ADDSR1,R2,#0x95;R1=R2+95andupdatetheflags

Solution:

a)

0xFFFFFFF5 11111111111111111111111111110101

Page 121: ARM Assembly Language: Programming and Architecture

+0x0000000B

+00000000000000000000000000001011

0x100000000 100000000000000000000000000000000

First,noticehowthe“LDRR2,=0xFFFFFFF5”pseudo-instructionloadsthe32-bitvalueintoR2register.AlsonoticetheuseofADDSinstructioninsteadofADDsincetheADDinstructiondoesnotupdatetheflags.Now,aftertheaddition,theR1register(destination)contains0andtheflagsareasfollows:

C=1,sincethereisacarryoutfromD31

Z=1,theresultoftheactioniszero(forthe32bits)

b)

0xFFFFFFFF 11111111111111111111111111111111

+0x00000095

+00000000000000000000000010010101

0x100000094 100000000000000000000000010010100

Aftertheaddition,theR1register(destination)contains0x94andtheflagsareasfollows:

C=1,sincethereisacarryoutfromD31

Z=0,theresultoftheactionisnotzero(forthe32bits)

Example3-2

Showtheflagbitsofstatusregisterforthefollowingcase:

LDRR2,=0xFFFFFFF1;R2=0xFFFFFFF1

MOVR3,#0x0F

ADDSR3,R3,R2;R3=R3+R2andupdatetheflags

ADDR3,R3,#0x7;R3=R3+0x7andflagsunchanged

MOVR1,R3

Solution:

0xFFFFFFF1 11111111111111111111111111110001

Page 122: ARM Assembly Language: Programming and Architecture

+0x0000000F +00000000000000000000000000001111

0x100000000 100000000000000000000000000000000

AftertheADDSaddition,theR3register(destination)contains0andtheflagsareasfollows:

C=1,sincethereisacarryoutfromD31

Z=1,theresultoftheactioniszero(forthe32bits)

Afterthe“ADDR3,R3,#0x7”addition,theR3register(destination)contains0x7(0x+07=07)andtheflagsareunchangedfrompreviousinstructionsinceweusedADDinsteadofADDS.Therefore,theZ=1andC=1.Ifweuse“ADDSR3,R3,#0x7”instructioninsteadof“ADDR3,R3,#0x7”,wewillhaveZ=0andC=0.UsetheKeilARMsimulatortoverifythis.

Comment

MicrosoftWindowscomeswithacalculator.Use it toverify thecalculations in thisandfuturechapters.Thecalculatorsupportsdatasizeofupto64-bit

NoincrementinstructioninARM

There is no increment instruction in theARMprocessor. Insteadwe useADD toperformthisaction.Theinstruction“ADDSR4,R4,#1”willincrementtheR4andplacestheresultinR4register.TheRISCprocessorseliminatetheunnecessaryinstructionsanduseanexistinginstructiontoperformtheoperation.

ADC(addwithcarry)

Thisinstructionisusedformultiword(datalargerthan32-bit)numbers.TheformoftheADCinstructionis

ADCRd,Rn,Op2;Rd=Rn+Op2+C

Indiscussingaddition,thefollowingtwocaseswillbeexamined:

1.Additionofindividualworddata

2.Additionofmultiworddata

CASE1:Additionofindividualworddata

Page 123: ARM Assembly Language: Programming and Architecture

Inpreviousexamples,inprogramsregardingaddition,thetotalsumwaspurposelykeptlessthan0xFFFFFFFF,themaximumvaluea32-bitregistercanhold.Inrealworldto calculate the total sumof anynumberof operands, the carry flag shouldbe checkedaftertheadditionofeachoperandtoseeifthetotalsumisgreaterthan0xFFFFFFFF.SeeExample3-3andProgram3-1.

Example3-3

Showtheflagbitsofstatusregisterforthefollowingcase:

LDRR2,=0xFFFFFFF1;R2=0xFFFFFFF1

MOVR3,#0x0F

ADDSR3,R3,R2;R3=R3+R2andupdatetheflags

ADDR3,R3,#0x7;R3=R3+0x7andflagsunchanged

MOVR1,R3

Solution:

0xFFFFFFF1 11111111111111111111111111110001

+0x0000000F +00000000000000000000000000001111

0x100000000 100000000000000000000000000000000

AftertheADDSaddition,theR3register(destination)contains0andtheflagsareasfollows:

C=1,sincethereisacarryoutfromD31

Z=1,theresultoftheactioniszero(forthe32bits)

Afterthe“ADDR3,R3,#0x7”addition,theR3register(destination)contains0x7(0x0+0x7=0x7)andtheflagsareunchangedfrompreviousinstructionsinceweusedADDinsteadofADDS.Therefore,theZ=1andC=1.Ifweuse“ADDSR3,R3,#0x7”instructioninsteadof“ADDR3,R3,#0x7”,wewillhaveZ=0andC=0.UsetheKeilARMsimulatortoverifythis.

Page 124: ARM Assembly Language: Programming and Architecture

Program3-1Writeaprogramtocalculatethetotalsumoffivewordsofdata.Eachdatavaluerepresentsthemassofaplanetininteger.Thedecimaldataareasfollow:1000000000,2000000000,3000000000,4000000000,and

4100000000.

AREAPROG3_1,CODE,READONLY

ENTRY

LDRR1,=1000000000

LDRR2,=2000000000

LDRR3,=3000000000

LDRR4,=4000000000

LDRR5,=4100000000

MOVR8,#0;R8=0forsavingthelowerword

MOVR9,#0;R9=0foraccumulatingthecarries

ADDSR8,R8,R1;R8=R8+R1

ADCR9,R9,#0;R9=R9+0+Carry

;(incrementR9ifthereiscarry)

ADDSR8,R8,R2;R8=R8+R2

ADCR9,R9,#0;R9=R9+0+Carry

ADDSR8,R8,R3;R8=R8+R3

ADCR9,R9,#0;R9=R9+0+Carry

ADDSR8,R8,R4;R8=R8+R4

ADCR9,R9,#0;R9=R9+0+Carry

ADDSR8,R8,R5;R8=R8+R5

ADCR9,R9,#0;R9=R9+0+Carry

HEREBHERE

END;Markendoffile

CASE2:Additionofmultiwordnumbers

AssumeaprogramisneededthatwilladdthetotalU.S.budgetforthelast100years

Page 125: ARM Assembly Language: Programming and Architecture

or themass of all the planets in the solar system. In cases like this, the numbers beingaddedcouldbeupto8byteswideormore.SinceARMregistersareonly32bitswide(4bytes),itisthejoboftheprogrammertowritethecodetobreakdowntheselargenumbersinto smaller chunks to be processed by the CPU. If a 32-bit register is used and theoperand is 8 bytes wide, that would take a total of two iterations. See Example 3-4.However,ifa16-bitregisterisused,thesameoperandswouldrequirefouriterations.Thisobviouslytakesmoretimefor theCPU.This isonereasontohavewideregisters in thedesignoftheCPU.

Example3-4

Analyzethefollowingprogramwhichadds0x35F62562FAto0x21F412963B:

LDRR0,=0xF62562FA;R0=0xF62562FA

LDRR1,=0xF412963B;R1=0xF412963B

MOVR2,#0x35;R2=0x35

MOVR3,#0x21;R3=0x21

ADDSR5,R1,R0;R5=0xF62562FA+0xF412963B

;nowC=1

ADCR6,R2,R3;R6=R2+R3+C

;=0x35+21+1=0x57

Solution:

AftertheR5=R0+R1thecarryflagisone.SinceC=1,whenADCisexecuted,R6 = R2 + R3 + C =0x35+0x21+1=0x57.

MicrosoftWindowscalculatorsupportdatasizeofup64-bit(doubleword).Useittoverifytheabovecalculations.

Subtractionofunsignednumbers

SUBRd,Rn,Op2;Rd=Rn-Op2

Insubtraction,theARMmicroprocessors(indeed,almostallmodernCPUs)usethe2’s complementmethod.Although everyCPU contains adder circuitry, itwould be too

Page 126: ARM Assembly Language: Programming and Architecture

cumbersome(and take toomany logicgates) todesignseparate subtractorcircuitry.Forthis reason, theARMuses internal adder circuitry toperform the subtractionoperation.AssumingthattheARMisexecutingsimplesubtractinstructions,onecansummarizethestepsofthehardwareoftheCPUinexecutingtheSUBinstructionforunsignednumbersasfollows:

1.Takethe2’scomplementofthesubtrahend(Op2operand).

2.Addittotheminuend(Rnoperand).

3.PlacetheresultindestinationRd.

4.Setthecarryflagifthereisacarry.

ThesefourstepsareperformedforeverySUBSinstructionbytheinternalhardwareoftheARMCPU.Itisafterthesefourstepsthattheresultisobtainedandtheflagsareset.Examples3-5through3-7illustratesthefoursteps.

Example3-5

Showthestepsinvolvedforthefollowingcases:

a)

MOVR2,#0x4F;R2=0x4F

MOVR3,#0x39;R3=0x39

SUBSR4,R2,R3;R4=R2–R3

b)

MOVR2,#0x4F;R2=0x4F

SUBSR4,R2,#0x05;R4=R2–0x05

Solution:

a)

0x4F 0000004F

–0x39 +FFFFFFC7 2’scomplementof0x39

16 100000016 (C=1step4)

Theflagswouldbesetasfollows:C=1,andZ=0.Theprogrammermustlookatthecarryflag(notthesignflag)todetermineiftheresultispositiveornegative.

Page 127: ARM Assembly Language: Programming and Architecture

b)

0x4F 0000004F

–0x05 +FFFFFFFB 2’scomplementof0x05

0x4A 10000004A (C=1step4)

Example3-6

Analyzethefollowinginstructions:

LDRR2,=0x88888888;R2=0x88888888

LDRR3,=0x33333333;R3=0x33333333

SUBSR4,R2,R3;R4=R2–R3

Solution:

Followingarethestepsfor“SUBR4,R2,R3”:

88888888 88888888

–33333333 +CCCCCCCD (2’scomplementof0x33333333)

55555555 155555555 (C=1step4)resultispositive

Notice that, unlikex86CPUs,ARMdoesnot invert the carry flagafterSUBSsoC=0 when there is borrow and C=1 when there is no borrow. It means that after theexecutionofSUBS,ifC=1,theresultispositive;ifC=0,theresultisnegativeandthedestination has the 2’s complement of the result. Normally, the result is left in 2’scomplement,butyoucantakethe2’scomplementoftheresultbyinvertingitandaddingonetoit.

Example3-7

Analyzethefollowinginstructions:

MOVR1,#0x4C;R1=0x4C

MOVR2,#0x6E;R2=0x6E

SUBSR0,R1,R2;R0=R1–R2

Page 128: ARM Assembly Language: Programming and Architecture

Solution:

Followingarethestepsfor“SUBR0,R1,R2”:

4C 0000004C

–6E +FFFFFF92 (2’scomplementof0x6E)

–22 0FFFFFFDE (C=0step4)resultisnegative

SBC(subtractwithborrow)

SBCRd,Rn,Op2;Rd=Rn–Op2–1+C

This instruction is used for subtraction of multiword (data larger than 32-bit)numbers. Notice that in some other architectures, the CPU inverts the C flag aftersubtraction so the content of carry flag is theborrowbit of subtract operation. In thosearchitectures the subtractwith borrow is implemented as “Rd=Rn –Op2 –C” but inARMthecarryflagisnotinvertedaftersubtractionandcarryflagisinvertofborrow.Toinvertthecarryflagwhilerunningthesubtractwithborrowinstructionitisimplementedas“Rd = Rn – Op2 –1+ C”SeeExample3-8.

Example3-8

Analyzethefollowingprogramwhichsubtracts0x21F62562FAfrom0x35F412963B:

LDRR0,=0xF62562FA;R0=0xF62562FA,

;noticethesyntaxforLDR

LDRR1,=0xF412963B;R1=0xF412963B

MOVR2,#0x21;R2=0x21

MOVR3,#0x35;R3=0x35

SUBSR5,R1,R0;R5=R1–R0

;=0xF412963B–0xF62562FA,andC=0

SBCR6,R3,R2;R6=R3–R2–1+C

;=0x35–0x21–1+0=0x13

Solution:

AftertheR5=R1–R0thereisaborrowsothecarryflagiscleared.SinceC=0,whenSBCisexecuted,R6=R3–R2–1+C=0x35–0x21–1+0=0x35–0x21–1=0x13.

Page 129: ARM Assembly Language: Programming and Architecture

NodecrementinstructioninARM

There isnodecrement instruction in theARMprocessor. InsteadweuseSUB toperform the action. The instruction “SUB R4,R4,#1” will decrement one from R4 andplaces the result in R4 register. The RISC processors eliminate the unnecessaryinstructionsanduseanexistinginstructiontoperformthedesiredoperation.

RSB(reversesubtract)

TheformatfortheRSBinstructionis

RSBRd,Rn,Op2;Rd=Op2–Rn

Notice the differencebetween theRSBandSUB instruction.They are essentiallythesameexcept thewaythesourceoperandsaresubtractedisreversed.This instructioncanbeusedtoget2’scomplementofa32-bitoperand.SeeExample3-9.

Example3-9

FindtheresultofR0forthefollowings:

a)

MOVR1,#0x6E;R1=0x6E

RSBR0,R1,#0;R0=0–R1

b)

MOVR1,#0x1;R1=1

RSBR0,R1,#0;R0=0–R1=0–1

Solution:

a)Followingarethestepsfor“RSBR0,R1,#0”:

0 0000000

–6E +FFFFFF92 (2’scomplement)

Page 130: ARM Assembly Language: Programming and Architecture

–6E FFFFFF92 (C=0)resultisnegative

b)Thisisonewaytogetafixedvalueof0xFFFFFFFFinaregister.Therefore,wehaveR0=0xFFFFFFFF.

RSC(reversesubtractwithcarry)

TheformoftheRSCinstructionis

RSCRd,Rn,Op2;Rd=Op2–Rn–1+C

Notice thedifferencebetween theRSBandRSCinstructions.Theyareessentiallythesameexcept thewaythesourceoperandsaresubtractedisreversed.This instructioncanbeusedtogetthe2’scomplementofthe64-bitoperand.SeeExample3-10.

Example3-10

Showhowtocreate2’scomplementofa64-bitdatainR0andR1register.TheR0holdthelower32-bit.

Solution:

LDRR0,=0xF62562FA;R0=0xF62562FA

LDRR1,=0xF812963B;R1=0xF812963B

RSBR5,R0,#0;R5=0–R0

;=0–0xF62562FA=9DA9D06andC = 0

RSCR6,R1,#0;R6=0–R1–1+C

;=0–0xF812963B – 1 + 0=7ED69C4

UseMicrosoftWindowscalculatortoverifytheabovecalculations.

Multiplicationanddivisionofunsignednumbers

Not all CPUs have instructions for multiplication and division. All the ARMprocessorshave amultiplication instructionbut not thedivision.Some familymemberssuchasARMCortexhaveboththedivisionandmultiplicationinstructions.Inthissectionwe examine the multiplication of unsigned numbers. Signed numbers multiplication istreatedinChapter5.

MultiplicationofunsignednumbersinARM

TheARMgivesyou twochoicesofunsignedmultiplication:normalmultiplyandlongmultiply.Thenormalmultiplyinstruction(MUL)isusedwhentheresultislessthan32-bit,whilethelongmultiply(MULL)mustbeusedwhentheresultisgreaterthan32-bit.SeeTable3-3.Inthissectionweexaminebothofthem.

Page 131: ARM Assembly Language: Programming and Architecture

Instruction Source1 Source2 Destination Result

MUL Rn Op2 Rd(32bits) Rd=Rn×Op2

UMULL Rn Op2 RdLo,RdHi(64bits) RdLo:RdHi=Rn×Op2

Note1:UsingMULforword×wordmultiplicationprovidesonlythelower32-bitresultinRdandtherestaredroppediftheresultisgreaterthan32-bit.Iftheresultisgreaterthan0xFFFFFFFF,thenwemustuseUMULL(unsignedMultiplyLong)instruction.

Note2:inword-by-wordmultiplicationusingMULinstruction,iftheresultisgreaterthan32-bitonlythelower32-bitissavedbyARMandtheupperpartisdroppedwithoutsettinganyflag.InsomeCPUstheCflagisusedtoindicatetheresultisgreaterthan32-bitbutthisisnotthecasewithARM.

Table3-3:UnsignedMultiplication(UMULRd,Rn,Op2)Summary

MUL(multiply)

MULRd,Rn,Op2;Rd=Rn×Op2

Innormalmultiplication,theoperandsmustbeinregisters.Afterthemultiplication,thedestinationregisterswillcontaintheresult.Seethefollowingexample:

MOVR1,#0x25;R1=0x25

MOVR2,#0x65;R2=0x65

MULR3,R1,R2;R3=R1×R2=0x65×0x25

Note that in the case of half-word times half-word or smaller sources since thedestinationregisteris32-bitthereisnoprobleminkeepingtheresultof65,535×65,535,the highest possible unsigned 16-bit data. That is not the case in word times wordmultiplicationbecause32-bit×32-bitcanproducearesultgreaterthan32-bit.IftheMULinstruction is used, the destination register will hold the lower word (32-bit) and theportion beyond 32-bit is dropped. So it is not safe to use MUL for multiplication ofnumbersgreater than65,536. Inmanymicroprocessorswhen theresultgoesbeyond thedestination register size, the C flag is raised to indicate that. This is not the casewithARM.Seethefollowingexample:

LDRR1,=100000;R1=100,000

LDRR2,=150000;R2=150,000

MULR3,R2,R1;R3isnot15,000,000,000because

;itcannotfitin32bits.

For this reason wemust use UMULL (unsignedmultiply long) instruction if theresultisgoingtobegreaterthan0xFFFFFFFF.

UMULL(unsignedmultiplylong)

UMULLRdLo,RdHi,Rn,Op2

;RdHi:RdLoRd=Rn×Op2

In unsigned long multiplication, the operands must be in registers. After themultiplication, the destination registerswill contain the result.Notice that the leftmost

Page 132: ARM Assembly Language: Programming and Architecture

register,RdLoinourcase,willholdthelowerwordandthehigherportionbeyond32-bitissavedinthesecondregister,RdHi.Seethefollowingexample:

LDRR1,=0x54000000;R1=0x54000000

LDRR2,=0x10000002;R2=0x10000002

UMULLR3,R4,R2,R1;0x54000000×0x10000002

;=0x054000000A8000000

;R3=0xA8000000,thelower32bits

;R4=0x05400000,thehigher32bits

Notice that it is the job of programmer to choose the best type ofmultiplicationdependingonthesizeofoperandsandtheresult.SeeExample3-11.

Example3-11

Writeashortprogramtomultiply0xFFFFFFFFbyitself.

Solution:

MOVR0,#1;R0=1

RSBR1,R0,#0;R1=0–R0

;=0xFFFFFFFF

RSBR2,R0,#0;R2=0xFFFFFFFF

UMULLR3,R4,R1,R2

Since 0xFFFFFFFF × 0xFFFFFFFF = 0xFFFFFFFE00000001, then R4=0xFFFFFFFEandR3=0x00000001.IfwehadusedMULinstruction,thenthe0xFFFFFFFFwouldhavebeendroppedandonly0x00000001wouldhavebeenkeptbythedestinationregister.

MultiplyandAccumulateInstructionsinARM

Insomeapplicationsuchasdigitalsignalprocessing(DSP)weneedtomultiplytworegistersandaddtheresultwithanotherregister.TheARMhasaninstructiontodobothjobs in a single instruction. The format of MLA (multiply and add) instruction is asfollows:

MLARd,Rm,Rs,Rn;Rd=Rm×Rs+Rn

Inmultiplicationandadd,theoperandsmustbeinregisters.Afterthemultiplicationandadd,thedestinationregisterwillcontaintheresult.Seethefollowingexample:

MOVR1,#100;R1=100

MOVR2,#5;R2=5

Page 133: ARM Assembly Language: Programming and Architecture

MOVR3,#40;R3=40

MLAR4,R1,R2,R3;R4=R1×R2+R3=100×5+40=540

Noticethatmultiplyandaddcanproducearesultgreaterthan32-bit, if theMLAinstruction is used, the destination register will hold the lower word (32-bit) and theportion beyond 32-bit is dropped. For this reason we must use UMLAL (unsignedmultiplyandadd long) instruction if theresult isgoing tobegreater than0xFFFFFFFF.TheformatofUMLALinstructionisasfollows:

UMLALRdLo,RdHi,Rn,Op2;RdHi:RdLo=Rn×Op2+RdHi:RdLo

Inmultiplicationandadd,theoperandsmustbeinregisters.Noticethattheaddendandthehighwordofthedestinationusethesameregisters,thetwoleftmostregistersintheinstruction.ItmeansthatthecontentsoftheregisterswhichhavetheaddendwillbechangedafterexecutionofUMLALinstruction.Seethefollowingexample:

LDRR1,=0x34000000;R1=0x34000000

LDRR2,=0x2000000;R2=0x2000000

EORR3,R3,R3;R3=0x00

LDRR4,=0x00000BBB;R4=0x00000BBB

UMLALR4,R3,R2,R1;0x34000000×0x2000000+0xBBB

;=0x068000000000000BBB

DivisionofunsignednumbersinARM

SomeARMfamilies donot have an instruction for divisionof unsignednumberssinceittakestoomanygatestoimplementit.WecanuseSUBinstructiontoperformthedivision. In the next chapter, after explaining conditional branches, we will show anexampleofunsigneddivisionusingsubtractoperation.

ReviewQuestions

1.ExplainthedifferencebetweenADDSandADDinstructions.

2.TheADCinstructionthathasthesyntax“ADCRd,Rn,Op2”means_____________.

3.ExplainwhytheZ=0forthefollowing:

MOVR2,#0x4F

MOVR4,#0xB1

ADDSR2,R4,R2

4.ExplainwhytheZ=1forthefollowing:

MOVR2,#0x4F

LDRR4,=0xFFFFFFB1

Page 134: ARM Assembly Language: Programming and Architecture

ADDSR2,R4,R2

5.ShowhowtheCPUwouldsubtract0x05from0x43.

6.IfC=1,R2=0x95,andR3=0x4Fpriortotheexecutionof“SBCR2,R2,R3”,whatwillbethecontentsofR2afterthesubtraction?

7.Inunsignedmultiplicationof“MULR2,R3,R4”,theproductwillbeplacedinregister__________.

8.Inunsignedmultiplicationof“MULR1,R2,R4”,theR2canbemaximumof___________ifR4=0xFFFFFFFF.

Page 135: ARM Assembly Language: Programming and Architecture
Page 136: ARM Assembly Language: Programming and Architecture

Section3.2:LogicInstructionsInthissectionwediscussthelogicinstructionsAND,OR,andEx-ORinthecontext

ofmanyexamples. Just likearithmetic instruction,wemustuse theS in the instructionsyntaxifwewanttoupdatetheflags.IftheSsyntaxisusedtheZflagwillbesetifandonlyiftheresultisallzeros,andtheNflagwillbesettothelogicalvalueofbit31oftheresult.TheVflagintheCPSRwillbeunaffected,andtheCflagwillbesettothecarryoutfromthebarrelshifterwhichisdiscussedinthenextsection.SeeTable3-4.

Instruction

(FlagsUnchanged)Action

Instruction

(FlagsChanged)Hexadecimal

AND ANDing ANDS Andingandsetflags

ORR ORRing ORS Oringandsetflags

EOR Exclusive-ORing EORS ExclusiveOringandsetflags

BIC BitClearing BICS Bitclearingandsetflags

Table3-4:LogicInstructionsandFlagBits

TheinstructionformatoflogicinstructionsinARMissimilartotheformatofotherdataprocessinginstructions.SeeFigure3-2.

Figure3-2:GeneralFormationofDataProcessingInstruction

AND

ANDRd,Rn,Op2;Rd=RnANDedOp2

Inputs Output Symbol

X Y XANDY

0 0 0

0 1 0

1 0 0

1 1 1

ThisinstructionwillperformabitwiselogicalANDontheoperandsandplacetheresult in thedestination.Thedestinationand the first sourceoperandsare registers.Thesecondsourceoperandcanbearegisteroranimmediatevalueoflessthan0xFF.

Page 137: ARM Assembly Language: Programming and Architecture

IfweuseANDSinsteadofANDitwillchangetheCandZflagsaccordingtotheresult.AsseeninExample 3-12,ANDcanbeusedtomaskcertainbitsoftheoperand.

Example3-12

Showtheresultsofthefollowingcases

a)

MOVR1,#0x35

ANDR2,R1,#0x0F;R2=R1ANDedwith0x0F

b)

MOVR0,#0x97

MOVR1,#0xF0

ANDR2,R0,R1;R2=R0ANDedwithR1

Solution:

a)

0x3500110101

AND0x0F00001111

0x0500000101

b)

0x9710010111

AND0xF011110000

0x9010010000

ORR

ORRRd,Rn,Op2;Rd=RnORedOp2

Inputs Output Symbol

X Y XORY

0 0 0

0 1 1

1 0 1

Page 138: ARM Assembly Language: Programming and Architecture

1 1 1

TheoperandsareORedandtheresultisplacedinthedestination.ORRcanbeusedtosetcertainbitsofanoperandtoone.Thedestinationandthefirstsourceoperandsareregisters.Thesecondsourceoperandcanbeeitheraregisteroranimmediatevalueoflessthan0xFF.

IfweuseORRSinsteadofORR,theflagswillbeupdated,justthesameasfortheANDSinstruction.SeeExample3-13.

Example3-13

Showtheresultsofthefollowingcases:

a)

MOVR1,#0x04;R1=0x04

ORRSR2,R1,#0x68;R2=R1ORed0x68

b)

MOVR0,#0x97

MOVR1,#0xF0

ORRR2,R0,R1;R2=R0ORedwithR1

Solution:

a)

0x0400000100

OR0x6801101000Flagwillbe:Z=0

0x6C01101100

b)

0x9710010111

OR0xF011110000Flagwillbeunchanged

0xF711110111

The ORR instruction can also be used to test for a zero operand. For example,“ORRSR2,R2,#0”willORtheregisterR2withzeroandmakeZ=1ifR2iszero.

Page 139: ARM Assembly Language: Programming and Architecture

EOR

EORRd,Rn,Op2;Rd=RnEx-ORedwithOp2

Inputs Output Symbol

X Y XEORY

0 0 0

0 1 1

1 0 1

1 1 0

TheEOR instructionwill Exclusive-OR the operands and place the result in thedestinationregister.EORsetstheresultbitsto1iftheyarenotequal;otherwise,theyarereset to 0. The flags are updated if we use EORS instead of EOR. The rules for theoperandsarethesameasintheANDandORinstructions.SeeExamples3-14and3-15.

Example3-14

Showtheresultsofthefollowing:

MOVR1,#0x54

EORR2,R1,#0x78;R2=R1ExOredwith0x78

Solution:

0x5401010100

XOR0x7801111000

0x2C00101100

Example3-15

TheEORinstructioncanbeusedtoclearthecontentsofaregisterbyEx-ORingitwithitself.

Showhow“EORSR1,R1,R1”clearsR1,assumingthatR1=0x45.

Solution:

0x4501000101

Page 140: ARM Assembly Language: Programming and Architecture

XOR0x4501000101

0x0000000000

EOR can also be used to see if two registers have the same value. “EORSR2,R3,R4”willmakeZ=1ifbothregistersR4andR3havethesamevalue,andiftheydo,theresult(00000000)issavedinR2,thedestination.

Another widely used application of EOR is to toggle bits of an operand. Forexample,totogglebit2ofregisterR2:

EORR2,R2,#0x04;EORR2with00000100

Thiswouldcausebit2ofR2tochangetotheoppositevalue;allotherbitswouldremainunchanged.

BIC(bitclear)

BICRd,Rn,Op2;clearcertainbitsofRnspecifiedby

;theOp2andplacetheresultinRd

Inputs Output

X Y XAND(NOTY)

0 0 0

0 1 0

1 0 1

1 1 0

TheBIC(bitclear) instruction isused toclear theselectedbitsof theRnregister.TheselectedbitsareheldbyOp2.ThebitsthatareHIGHinOp2willbeclearedandbitswith LOW will be left unchanged. For example, assuming that R3 = 00001000 theinstruction “BIC R2,R2,R3” will clear bit 3 of R2 and leaves the rest of the bitsunchanged.Inreality,theBICinstructionperformsANDoperationonRnregisterwiththecomplement ofOp2 and places the result in destination register. Look at the followingexample:

MOVR1,#0x0F

MOVR2,#0xAA

BICR3,R2,R1;nowR3=0xAAANDedwith0xF0=0xA0

Ifwewanttheflagstobeupdated,thenwemustuseBICSinsteadofBIC.

Page 141: ARM Assembly Language: Programming and Architecture

MVN(movenegative)

MVNRd,Rn;movenegativeofRntoRd

TheMVN(movenegative)instructionisusedtogenerateone’scomplementofanoperand.Forexample,theinstruction“MVNR2,#0”willmakeR2=0xFFFFFFFF.Lookatthefollowingexample:

LDRR2,=0xAAAAAAAA;R2=0xAAAAAAAA

MVNR2,R2;R2=0x55555555

WecanalsouseEx-ORinstructiontogenerateone’scomplementofanoperand.Ex-ORinganoperandwith0xFFFFFFFFwillgeneratethe1’scomplement.Seethefollowingcode:

LDRR2,=0xAAAAAAAA;R2=0xAAAAAAAA

MVNR0,#0;R0=0xFFFFFFFF

EORR2,R2,R0;R2=R2ExORedwith0xFFFFFFFF

;=0x55555555

Itmustbenotedthattheinstruction“MVNRd,#0”iswidelyusedtoloadthefixedvalueof0xFFFFFFFFintodestinationregister.Wecanusethe“LDRRd,=0xFFFFFFFF”pseudo-instruction to do the same thing, but theARMassemblerwill substitute severalrealARMinstructionsinitsplaceandthereforeittakesmorecodespace.

ReviewQuestions

1.Useoperands0x4FCAand0xC237toperform:

(a)AND(b)OR(c)XOR

2.ANDingawordoperandwith0xFFFFFFFFwillresultinwhatvalueforthewordoperand?Tosetallbitsofanoperandto0,itshouldbeANDedwith______.

3.Tosetallbitsofanoperandto1,itcouldbeORedwith_______.

4.XORinganoperandwithitselfresultsinwhatvaluefortheoperand?

5.Writeaninstructionthatsetsbit4ofR7.

6.Writeaninstructionthatclearsbit3ofR5.

Page 142: ARM Assembly Language: Programming and Architecture
Page 143: ARM Assembly Language: Programming and Architecture

Section3.3:RotateandBarrelShifter Although ARM Cortex has shift and rotate instruction, for the ARM7 we can

perform the shift and rotate operations as part of other instructions such as MOV. Inprevious sections we discussed that as the second argument of process instructions(arithmetic and logic instructions) we can use register or immediate values. In otherwords,theprocessinstructionscanbeusedinoneofthefollowingforms:

1.opcodeRd,Rn,Rs(e.g.ADDR1,R2,R3)

2.opcodeRd,Rn,immediateValue(e.g.ADDR2,R3,#5)

ARMisabletoshiftorrotatethesecondargumentbeforeusingitastheargument.Inthissectionwefirstdiscussshiftingandrotatingonregistersandthenwecovershiftingimmediatevalues.

BarrelShifter

Therearetwokindsofshifts:logicalandarithmetic.Thelogicalshiftisforunsignedoperandsandthearithmeticshiftisforsignedoperands.Logicalshiftwillbediscussedinthis section and the discussionof arithmetic shift is covered inChapter 5.UsingMOVinstructionsinARMonecanshift thecontentsofaregisterrightor left.Thenumberoftimes(orbits)thattheoperandisshiftedcanbespecifieddirectlyorthrougharegister.

LSR

Thisisthelogicalshiftright.Theoperandisshiftedrightbitbybit,andforeveryshift the LSB (least significant bit) will go to the carry flag (C) and the MSB (mostsignificantbit)isfilledwith0.Onecanuseanimmediateoperandoraregistertoholdthenumberoftimesitistobeshifted.Examples3-16and3-17shouldhelptoclarifyLSR.

Example3-16

ShowtheresultofLSRinthefollowing:

MOVR0,#0x9A;R0=0x9A

MOVSR1,R0,LSR#3;shiftR0toright3times

;andthenmove(copy)theresulttoR1

Solution:

0x9A=000000000000000000000000000000010011010

firstshift:000000000000000000000000000000001001101C=0

secondshift:000000000000000000000000000000000100110C=1

thirdshift:000000000000000000000000000000000010011C=0

Page 144: ARM Assembly Language: Programming and Architecture

Aftershiftingrightthreetimes,R1=0x00000013andC=0.Anotherwaytowritetheabovecodeis:

MOVR0,#0x9A

MOVR2,#0x03

MOVR1,R0,LSRR2;shiftR0torightR2times

;andmovetheresulttoR1

Example3-17

ShowtheresultsofLSRinthefollowing:

TIMESEQU0x4

LDRR1,=0x777;R1=0x777

MOVR2,#TIMES;R2=0x04

MOVSR3,R1,LSRR2;shiftR1rightR2numberoftimes

;andplacetheresultinR3

Solution:

Afterfourshifts,theR3willcontain0x77.ThefourLSBsarelostthroughthecarry,onebyone,and0sfillthefourMSBs.

AlthoughLSRdoesaffecttheSandZflags,theyarenotimportantinthiscase.Butnoticethatifyouwanttheflagstobeset,usetheMOVSinstructioninsteadofMOV.

OnecanusetheLSRtodivideanumberby2.SeeExample3-18.

Example3-18

ShowtheresultsofLSRinthefollowing:

LDRR0,=0x88;R0=0x88

MOVSR1,R0,LSR#3;shiftR0rightthreetimes(R1=0x11)

Solution:

Afterthethreeshifts,theR1willcontain0x11.Thisdividesthenumberby8since2tothepower3is8.

Page 145: ARM Assembly Language: Programming and Architecture

LSL

Shiftleftisalsoalogicalshift.ItisthereverseofLSR.Aftereveryshift,theLSBisfilledwith0andtheMSBgoestoC.AlltherulesarethesameasforLSR.Onecanuseanimmediateoperandora register tohold thenumberof times it is tobeshifted left.SeeExample3-19.OnecanusetheLSLtomultiplyanumberby2.SeeExample3-20.

Example3-19

ShowtheeffectsofLSLinthefollowing:

LDRR1,=0x0F000006

MOVSR2,R1,LSL#8

Solution:

00001111000000000000000000000110

C=000011110000000000000000000001100(shiftedleftonce)

C=000111100000000000000000000011000

C=001111000000000000000000000110000

C=011110000000000000000000001100000

C=111100000000000000000000011000000

C=111000000000000000000000110000000

C=110000000000000000000001100000000

C=100000000000000000000011000000000(shiftedeighttimes)

Aftereightshiftsleft,theR2registerhas0x00000600andC=1.TheeightMSBsarelostthroughthecarry,onebyone,and0sfilltheeightLSBs.Anotherwaytowritetheabovecodeis:

LDRR1,=0x0F000006

MOVR0,#0x08

MOVR2,R1,LSLR0

Page 146: ARM Assembly Language: Programming and Architecture

Example3-20

ShowtheresultsofLSLinthefollowing:

TIMESEQU0x5

LDRR1,#0x7;R1=0x7

MOVR2,#TIMES;R2=0x05

MOVR1,R1,LSLR2;shiftR1leftR2numberoftimes

;andplacetheresultinR1

Solution:

Afterthefiveshifts,theR1willcontain0x000000E0.0xE0is224indecimal.Noticethatitmultipliesnumberbypowerof2.7×32=224=0xE0since2tothepowerof5is32.

Table3-5liststhelogicalshiftoperationsinARM.

Operation Destination Source Numberofshifts

LSR(ShiftRight) Rd Rn Immediatevalue

LSR(ShiftRight) Rd Rn registerRm

LSL(ShiftLeft) Rd Rn Immediatevalue

LSL(ShiftLeft) Rd Rn registerRm

Note:Numberofshiftcannotbemorethan32.

Table3-5:LogicShiftoperationsforunsignednumbersinARM

ArithmeticshiftrightASR

ThearithmeticshiftASRisusedforsignednumbersandisdiscussedinChapter5.

Rotatingthebitsofanoperandrightandleft

Therearetwotypesofrotations.Oneisasimplerotationofthebitsoftheoperand,andtheotherisarotationthroughthecarry.Eachisexplainedbelow.

ROR(rotateright)

Page 147: ARM Assembly Language: Programming and Architecture

Inrotateright,asbitsareshiftedfromlefttorighttheyexitfromtherightend(LSB)andentertheleftend(MSB).Inaddition,aseachbitexitstheLSB,acopyofitisgiventothecarryflag.Inotherwords,inRORtheLSBismovedtotheMSBandisalsocopiedtoCflag,asshowninthediagram.Onecanuseanimmediateoperandoraregistertoholdthenumberoftimesitistoberotated.

MOVR1,#0x36

;R1=00000000000000000000000000110110

MOVSR1,R1,ROR#1

;R1=00000000000000000000000000011011C=0

MOVSR1,R1,ROR#1

;R1=10000000000000000000000000001101C=1

MOVSR1,R1,ROR#1

;R1=11000000000000000000000000000110C=1

or:

MOVR1,#0x36

;R1=00000000000000000000000000110110

MOVR0,#3

;R0=3numberoftimestorotate

MOVSR1,R1,RORR0

;R1=11000000000000000000000000000110C=1

alsolookatthefollowingcase:

LDRR2,=0xC7E5

;R2=00000000000000001100011111100101

MOVR4,#0x06

;R4=6numberoftimestorotate

MOVSR3,R2,RORR4

;R3=10010100000000000000001100011111C=1

Rotateleft

ThereisnorotateleftoptioninARM7sinceonecanusetherotateright(ROR)todothejob.Thatmeansinsteadofrotatingleftnbitswecanuserotateright32–nbitstodothe job of rotate left. Using this method does not give us the proper carry if actualinstructionofROLwasavailable.Lookatthefollowingexamples:

LDRR0,=0x00000072

Page 148: ARM Assembly Language: Programming and Architecture

;R0=00000000000000000000000001110010

MOVSR0,R0,ROR#31

;R0=00000000000000000000000011100100C=0

MOVSR0,R0,ROR#31

;R0=00000000000000000000000111001000C=0

MOVSR0,R0,ROR#31

;R0=00000000000000000000001110010000C=0

MOVSR0,R0,ROR#31

;R0=00000000000000000000011100100000C=0

or:

MOVR0,#0x72;R0=01110010

MOVR1,#28;R1=32–4=28

MOVSR0,R0,RORR1;R0=011100100000C=0

;assumeR2=0x672A

LDRR2,=0x671A

;R2=00000000000000000110011100101010

MOVSR2,R2,ROR#27

;R2=00000000000011001110010101000000

;C=0

RRXrotaterightthroughcarry

InRRX,asbitsareshiftedfromlefttoright,theyexitfromtherightend(LSB)tothecarryflag,and thecarryflagenters the leftend(MSB). Inotherwords, inRRXtheLSBismovedtoCandCismovedtotheMSB.Inreality,Cflagactsasifitispartoftheoperand.ThatmeanstheRRXislike33-bitregistersincetheCflagis33thbit.TheRRXtakesnoargumentsandthenumberoftimesanoperandtoberotatedisfixedatone.

;assumeC=0

MOVR2,#0x26

;R2=00000000000000000000000000100110

MOVSR2,R2,RRX

;R2=00000000000000000000000000010011C=0

Page 149: ARM Assembly Language: Programming and Architecture

MOVSR2,R2,RRX

;R2=00000000000000000000000000001001C=1

MOVSR2,R2,RRX

;R2=10000000000000000000000000000100C=1

MOVR2,#0x0F

;R2=00000000000000000000000000001111

MOVSR2,R2,RRX

;R2=00000000000000000000000000000111C=1

MOVSR2,R2,RRX

;R2=10000000000000000000000000000011C=1

MOVSR2,R2,RRX

;R2=11000000000000000000000000000001C=1

Table3-6liststherotateinstructionsoftheARM.

Operation Destination Source NumberofRotates

ROR(RotateRight) Rd Rn Immediatevalue

ROR(RotateRight) Rd Rn registerRm

RRX(RotateRightThroughCarry) Rd Rn 1bit

Table3-6:RotateoperationsforunsignednumbersinARM

ShiftingImmediateArguments

Wejustexaminedtherotateandshiftoperations.Onecanusetherotateoperationtoloadfixed(constant)valuesintoARMregister,aswell.ExaminetheMOVinstructionbitassignmentinFigure3-3.Ofthe32-bitopcode,theupper12bits(D31–D20)areusedfortheopcodeitself.Thelowest8(D7–D0)bitsareusedforfixedvaluesand4bits(D11–D8)areusedforthenumberoftimesrotateoperationisperformedbeforeloadedintotheregister.The8bitsforthefixedvaluegivesus00to255(00to0xFFinhex)rangeandthe4bitsoftherotatenumbergivesus0to15(0to0xFinhex)range.TheMOVinstructionscanbeconstructedtorotatean8-bitfixedvaluetotherightanumberoftimesandthenloadedintotheregister.Thenumberoftimesrotatedrightisalwaystwicethenumberintherotateportionoftheinstruction.Sincerotatevaluecanbe0–15thatgivesnumberofrotationsbetween0–30.Thismeans thatwhenever the secondoperand is an immediatevaluethenumberofrotationisanevennumber.

Page 150: ARM Assembly Language: Programming and Architecture

Figure3-3:MOVInstruction

Aswementioned earlier, theARM supports the rotate right only and there is norotateleftoption.Sotodorotateleftwemustuse32-nforthenumberofrotationright.Therefore, “MOV R0,#0xFF,#30” is the same as rotating left 2 times and “MOVR0,#0xFF,#28”isthesameasrotatingleft4times.SeeExamples3-21through3-23.

Example3-21

ShowallthepossiblecasesofusingMOVinstructionfor0xFFvalueandrotateoptions.

Solution:

MOVR0,#0xFF,#0

;R0=0xFFisrotatedright0times.R0=0x000000FF

MOVR0,#0xFF,#2

;R0=0xFFisrotatedright2times.R0=0xC000003F

MOVR0,#0xFF,#4

;R0=0xFFisrotatedright4times.R0=0xF000000F

MOVR0,#0xFF,#6

;R0=0xFFisrotatedright6times.R0=0xFC000003

MOVR0,#0xFF,#8

;R0=0xFFisrotatedright8times.R0=0xFF000000

MOVR0,#0xFF,#10

;R0=0xFFisrotatedright10times.R0=0x3FC00000

MOVR0,#0xFF,#12

;R0=0xFFisrotatedright12times.R0=0x0FF00000

MOVR0,#0xFF,#14

;R0=0xFFisrotatedright14times.R0=0x03FC0000

MOVR0,#0xFF,#16

;R0=0xFFisrotatedright16times.R0=0x00FF0000

MOVR0,#0xFF,#18

;R0=0xFFisrotatedright18times.R0=0x003FC000

MOVR0,#0xFF,#20

;R0=0xFFisrotatedright20times.R0=0x000FF000

Page 151: ARM Assembly Language: Programming and Architecture

MOVR0,#0xFF,#22

;R0=0xFFisrotatedright22times.R0=0x0003FC00

MOVR0,#0xFF,#24

;R0=0xFFisrotatedright24times.R0=0x0000FF00

MOVR0,#0xFF,#26

;R0=0xFFisrotatedright26times.R0=0x00003FC0

MOVR0,#0xFF,#28

;R0=0xFFisrotatedright28times.R0=0x00000FF0

MOVR0,#0xFF,#30

;R0=0xFFisrotatedright30times.R0=0x000003FC

Example3-22

UsingMOVinstruction,showhowtorotateleftthefixedvalueof0x99totalof(a)4,(b)8,and(c)16times.Alsogivethevalueintheregisteraftertherotation.

Solution:

Sincewedonothaverotateleftoperationwemustuserotateright32–ntimes.

MOVR1,#0x99,#28

;rotatingright28timesisthesameasrotateleft4times

MOVR2,#0x99,#24

;rotatingright24timesisthesameasrotateleft8times

MOVR3,#0x99,#16

;rotatingright16timesisthesameasrotateleft16times

Now,wehave(a)R1=0x00000990,(b)R2=0x00009900,and(c)R3=0x00990000

Example3-23

Page 152: ARM Assembly Language: Programming and Architecture

UsingtheKeilIDE,assembletheprograminExample3-21andexaminethelistfile.Compareandcontrastthecountvalueintheinstructionwithcountvalueoftheopcodes.

Solution:

Asexpected,thenumberoftimesrotatedrightistwicethenumberofrotatefield.

AlsoseeExample3-24forfurtherexamplesofrotateoperation.

Example3-24

Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.

Page 153: ARM Assembly Language: Programming and Architecture

MOVR0,#0xAA,#2

MOVR1,#0x20,#28

MOVR4,#0x99,#6

MOVR2,#0x55,#24

MOVR3,#0x01,#20

MOVR7,#0x80,#12

MOVR10,#0x0F,#14

MOVR5,#0x66,#2

Solution:

MOVR0,#0xAA,#2

;R0=0xAAisrotatedright2times.R0=0x8000002A

MOVR1,#0x20,#28

;R1=0x20isrotatedright28times.R0=0x00000200

MOVR4,#0x99,#6

;R4=0x99isrotatedright6times.R0=0x64000002

MOVR2,#0x55,#24

;R2=0x55isrotatedright24times.R0=0x00005500

MOVR3,#0x01,#20

;R3=0x01isrotatedright20times.R0=0x00001000

MOVR7,#0x80,#12

;R7=0x80isrotatedright12times.R0=0x08000000

MOVR10,#0x0F,#14

;R10=0x0Fisrotatedright14times.R0=0x003C0000

MOVR5,#0x66,#2

;R5=0x66isrotatedright2times.R0=0x80000019

WecanusetheconceptofrotatewithotherdataprocessinginstructionsoftheARMtoloadfixedvaluesintotheARMregister.SeeExample3-25.

Example3-25

Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.

Page 154: ARM Assembly Language: Programming and Architecture

MVNR0,#0xAA,#2

MVNR1,#0x20,#28

MVNR4,#0x99,#6

MVNR2,#0x55,#24

MVNR3,#0x01,#20

MVNR7,#0x80,#12

MVNR10,#0x0F,#14

MVNR5,#0x66,#2

Solution:

MVNR0,#0xAA,#2

;0xAArotatedright2times=0x8000002A;R0=0x7FFFFFD5

MVNR1,#0x20,#28

;0x20isrotatedright28times=0x00000200;R1=0xFFFFFDFF

MVNR4,#0x99,#6

;0x99isrotatedright6times=0x64000002;R4=0x9BFFFFFD

MVNR2,#0x55,#24

;0x55isrotatedright24times=0x00001A40;R2=0xFFFFAAFF

MVNR3,#0x01,#20

;0x01isrotatedright20times=0x00001000;R3=0xFFFFEFFF

MVNR7,#0x80,#12

;0x80isrotatedright12times=0x08000000;R7=0xF7FFFFFF

MVNR10,#0x0F,#14

;0x0Fisrotatedright14times=0x003C0000;R10=0xFFC3FFFF

MVNR5,#0x66,#2

;0x66isrotatedright2times=0x80000019;R5=0x7FFFFFE6

GeneralFormationofProcessinstruction

NextwewillshowhowthedifferentoperandsaresupportedbyARM.InFigure3-4youseethegeneralformationofprocessinstructions.

Page 155: ARM Assembly Language: Programming and Architecture

Figure3-4:DataProcessInstructions

InallARMinstructionsbits28–31areputasideforconditionfieldwhichiscoveredinChapter4.

Bits26and27are0inprocessinstructionsshowingthattheinstructionisaprocessinstruction and the opcode is represented by bits 24–21. Using 4 bits 16 differentinstructionsareprovidedasshowninFigure3-4.

TheSbit (bit20) shows if the flagsshouldbeupdated.WhenweaddanS (e.g.MOVS)thebitwillbesetwhichshowsthattheflagsshouldbeupdatedbytheCPU.

Bits12–15containthedestinationregister.Using4bitswecanselectregistersR0–R15.

Bits16–19representthefirstoperandregister.

Bit25(I)showsthetypeofsecondoperand.Thebitis1whenthesecondoperandisanimmediatevalue.Wheneverthesecondoperandisaregisterthisbitiszero.

Immediate values: In the case that I is 1, bits 0–7, contain an immediate valuewhichcanbeanumberbetween0–0xFF.

Second register: In the case that I is 0, bits 0–11 represent the second operandregister, togetherwith the amount of shift/rotate and type of shift/rotate.Asmentionedearlier the shift amount can be provided either by a register or an immediate value.WhenevertheIbitiscleared,bit4showsthewayshiftamountisprovided.SeeFigure3-5.

Page 156: ARM Assembly Language: Programming and Architecture

Figure3-5:DataProcessInstructions

Example3-26shouldhelptoclarifythis.

Example3-26

UsingtheKeilIDE,assemblethefollowingprogramandcomparetheinstructionwiththeprocessinstructionformat.

AREAEX3_26,CODE,READONLY

ENTRY

ADDR0,R1,R5,LSR#2

ADDR0,R1,R5,LSRR2

ADDR0,R1,R5,LSLR2

ADDR0,R1,R5

ADDR0,R1,#5,#2

ADDR0,R1,#5

H1BH1

END

Page 157: ARM Assembly Language: Programming and Architecture

Solution:

In“ADDR0,R1,R5,LSR#2”thesecondoperandisaregister.Therefore,theIbitiscleared.Sincetheshiftamountisanimmediatevalue,thebit4iscleared,aswell.Theshifttypeissetto01representingLSR.

In“ADDR0,R1,R5,LSRR2”thesecondoperandisaregister.Therefore,theIbitiscleared.Asaregisterprovidestheshiftamount,thebit4isset.

The“ADDR0,R1,R5,LSLR2”instructionisthesameas“ADDR0,R1,R5,LSRR2”exceptthattheshifttypeissetto00torepresentLSL.

Themachinecoderepresentsinfacttheinstruction“ADD R0,R1,R5,LSL#0”.But,ifweshiftanumber0bitstoleft,thenumberremainsunchanged.Asaresult,itrepresents“ADD R0,R1,R5”.

Page 158: ARM Assembly Language: Programming and Architecture

In“ADDR0,R1,#5,#2”thesecondoperandisimmediate.Therefore,theIbitisset.Inimmediateoperandsthevalueisshiftedtwicetherotatefield.Thatiswhytherotatefieldis1.

In“ADDR0,R1,#5”,theimmediatevalueisrotated0bits.Therefore,theimmediatevalueremainsunchanged.

LoadingfixedvaluesintoRegisters

In this section, we examined loading fixed values into registers. However, aswehaveseensofar,itisnotpossibletocreateeveryconstantvaluebyusingtherotateoptionoftheinstructions.

AnotherwayistousecombinationoftheseinstructionssuchasMVN,MOV,ADD,AND,OR,andsoontoloadanyvalueintoaregister.ForexamplethefollowingprogramloadsR1with0x87654321:

MOVR1,#0x21;R1=0x00000021

ORRR1,R1,#0x43,#24

;R1=0x00000021ORed0x00004300=0x00004321

ORRR1,R1,#0x65,#16

;R1=0x00004321ORed0x00650000=0x00654321

ORRR1,R1,#0x87,#8

;R1=0x00654321ORed0x87000000=0x87654321

But it is very tedious and time consuming to come upwith such combination ofinstructions to get the resultwe need. For this reason theARM assembler has pseudo-instructionsuchasLDR.TheLDRisnotarealinstructionlikeMOVandADDsinceyouwillnotfindtheopcodeforitintheARMmanual.WhentheARMassemblerassemblesaprogramwiththeLDRitreplacestheLDRwithagroupofinstructionstodothejobofloading a fixed value in themost efficient way possible. Aswe have seen in previouschapters, theLDRuses = sign for the immediate value instead of # used by theMOVinstruction. Indeed this is thewaywedistinguishbetween them.Examine the followingcodestoseethedifferencesinsyntaxes:

MOVR1,#0x55;R1=0x55

LDRR2,=0x5555;R2=0x5555

ItneedstobeemphasizedthatwemustusetheMOVinstructionforloadingfixed

Page 159: ARM Assembly Language: Programming and Architecture

valueoflessthan(orequalto)0xFFandusetheLDRpseudo-instructiononlyforloadingvalues larger than0xFF.This isdue to thefact thatMOVisareal instructionand takesless memory space when it is compiled. See Chapter 6 for more on LDR and ADRpseudo-instructions.

ReviewQuestions

1.FindthecontentsofR3afterexecutingthefollowingcode:

MOVR0,#0x04

MOVR3,R0,LSR#2

2.FindthecontentsofR4afterexecutingthefollowingcode:

LDRR1,=0xA0F2

MOVR2,#0x3

MOVR4,R1,LSRR2

3.FindthecontentsofR3afterexecutingthefollowingcode:

LDRR1,=0xA0F2

MOVR2,#0x3

MOVR3,R1,LSLR2

4.FindthecontentsofR5afterexecutingthefollowingcode:

SUBSR0,R0,R0

MOVR0,#0xAA

MOVR5,R0,ROR#4

5.FindthecontentsofR0afterexecutingthefollowingcode:

LDRR2,=0xA0F2

MOVR1,0x4

MOVR0,R2,RRXR1

6.GivetheresultinR1forthefollowing:

MVNR1,#0x01,#2

7.GivetheresultinR2forthefollowing:

MVNR2,#0x02,#28

Page 160: ARM Assembly Language: Programming and Architecture
Page 161: ARM Assembly Language: Programming and Architecture

Section3.4:ShiftandRotateInstructionsinARMCortex(CaseStudy)ARMCortexhasnew instructions specifically for shift and rotate. In this section,

wediscusstheseinstructions.ForfullinstructionsetseeAppendixA.ForthepurposeofcomparisonwehavecopiedthissectionfromAppendixA.

LSLLogicalShiftLeft

LSLRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedleft,theMSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLdoesnotupdatestheflags.

Example1:

LDRR2,=0x00000010

LSLR0,R2,#8;R0=R2isshiftedleft8times

;now,R0=0x00001000,flagsnotupdated

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0x000018000,flagsnotupdated

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0xFF180000,flagsnotupdated

LSLSLogicalShiftLeft(updatetheflags)

LSLSRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedleft,theMSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLSupdatestheflags.

Example1:

LDRR2,=0x00000010

LSLSR0,R2,#8;R0=R2isshiftedleft8times

Page 162: ARM Assembly Language: Programming and Architecture

;now,R0=0x00001000,C=0,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0x000018000,C=0,N=0,Z=0

Example3:

LDRR0,=0x000FFF18

MOVR1,#16

LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0xFF180000,C=1,Z=0,N=0

Thelogicalshiftleftisusedforshiftingunsignednumbers.LSLSessentiallymultipliesRmbyapowerof2aftereachbitisshifted.

LSRLogicalShiftRight

LSRRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRdoesnotupdatetheflags.

Example1:

LDRR2,=0x00001000

LSRR0,R2,#8;R0=R2isshiftedright8times

;now,R0=0x00000010,C=0

Example2:

LDRR0,=0x000018000

MOVR1,#12

LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x00000018,C=0

Example3:

LDRR0,=0x7F180000

MOVR1,#16

Page 163: ARM Assembly Language: Programming and Architecture

LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x00007F18,C=0

Thelogicalshiftrightisusedforshiftingunsignednumbers.LSRessentiallydividesRmbyapowerof2aftereachbitisshifted.

LSRSLogicalShiftRight(updatetheflags)

LSRSRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRSupdatestheflags.

Example1:

LDRR2,=0x00001FFF

LSRSR0,R2,#8;R0=R2isshiftedright8times

;now,R0=0x0000001F,C=1,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x000000000,C=0,N=0,Z=1,

Example3:

LDRR0,=0x000FFF18

MOVR1,#16

LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x0000000F,C=1,Z=0,N=0

The logical shift right is used for shifting unsigned numbers. LSRS essentiallydividesRmbyapowerof2aftereachbitisshifted.

RORRotateRight

RORRd,Rm,Rn;Rd=rotateRmrightRnbitpositions

Function:AseachbitofRmregisterisshiftedfromlefttoright,theyexitfromtheend(LSB)andenteredfromleftend(MSB).ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORdoesnotupdatetheflags.

Page 164: ARM Assembly Language: Programming and Architecture

Example1:

LDRR2,=0x00000010

RORR0,R2,#8;R0=R2isrotatedright8times

;now,R0=0x10000000,C=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;now,R2=0x01800000,C=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;now,R2=0xFF180000,C=0

RORSRotateRight(updatetheflags)

RORSRd,Rm,Rn;Rd=rotateRmrightRnbitpositions

Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andenterfromtheleftend(MSB).InadditionaseachbitexitstheLSB,acopyofitisgiventoCflag.ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORSupdatestheflags.

Example1:

LDRR2,=0x00000010

RORSR0,R2,#8;R0=R2isrotatedright8times

;now,R0=0x01000000,C=0,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;now,R2=0x01800000,C=0,N=0,Z=0

Page 165: ARM Assembly Language: Programming and Architecture

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;now,R2=0xFF180000,C=1,N=0,Z=0

RRXRotateRightwithextend

RRXRd,Rm;Rd=rotateRmright1bitposition

Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXdoesnotupdatetheflags.

Example:

LDRR2,=0x00000002

RRXR0,R2;R0=R2isshiftedrightonebit

;now,R0=0x00000001

RRXSRotateRightwithextend(updatetheflags)

RRXSRd,Rm;Rd=rotateRmright1bitposition

Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXSupdatestheflags.

Example1:

LDRR2,=0x00000002

RRXSR0,R2;R0=R2isshiftedrightonebit

;now,R0=0x00000001

ReviewQuestions

1.Trueorfalse.ARMCortexhasitsowninstructionforrotateandshift.

2.FindthecontentsofR3afterexecutingthefollowingcode:

MOVR1,#0x08

RORR2,R1,#2

Page 166: ARM Assembly Language: Programming and Architecture

3.FindthecontentsofR4afterexecutingthefollowingcode:

MOVR2,#0x3

LSLR5,R3,#2

Page 167: ARM Assembly Language: Programming and Architecture
Page 168: ARM Assembly Language: Programming and Architecture

Section3.5:BCDandASCIIConversionThissectioncoversbinary,BCD,andASCIIconversionswithsomeexamples.

BCDnumbersystem

BCDstandsforbinarycodeddecimal.BCDisneededbecauseweusethedigits0to9fornumbersineverydaylife.BCDsystemiswidelyusedinreal-timeclock(RTC)oftheembeddedsystems.Binaryrepresentationof0to9iscalledBCD.IncomputerliteratureoneencounterstwotermsforBCDnumbers:(1)unpackedBCD,and(2)packedBCD.

Digit BCD

0 0000

1 0001

2 0010

3 0011

4 0100

5 0101

6 0110

7 0111

8 1000

9 1001

Table3-9:BCDCodes

UnpackedBCD

InunpackedBCD,thelower4bitsofthenumberrepresenttheBCDnumberandtherestofthebitsare0.Forexample,“00001001”and“00000101”areunpackedBCDfor9and5,respectively.InthecaseofunpackedBCDittakes1byteofmemorylocationoraregisterof8bitstoholdthenumber.

PackedBCD

In the caseofpackedBCD, a singlebytehas twoBCDnumbers in it, one in thelower4bitsandoneintheupper4bits.Forexample,“01011001”ispackedBCDfor59.Ittakesonly1byteofmemorytostorethepackedBCDoperands.ThisisonereasontousepackedBCDsinceitistwiceasefficientinstoringdata.

ASCIInumbers

InASCIIkeyboards,whenkey“0”ispressed,“0110000”(0x30)isprovidedtothecomputer.Inthesameway,0x31(0110001)isprovidedforkey“1”,andsoon,asshown

Page 169: ARM Assembly Language: Programming and Architecture

inthefollowinglist:

Key ASCII Binary(hex) BCD(unpacked)

0 30 0110000 00000000

1 31 0110001 00000001

2 32 0110010 00000010

3 33 0110011 00000011

4 34 0110100 00000100

5 45 0110101 00000101

6 36 0110110 00000110

7 37 0110111 00000111

8 38 0111000 00001000

9 39 0111001 00001001

Itmust be noted that althoughASCII is standard in theUnited States (andmanyother countries), BCD numbers have universal application. Now since the keyboard,printers,andmonitorsareallinASCII,howdoesdatagetconvertedfromASCIItoBCD,andviceversa?Thesearethesubjectscoverednext.

ASCIItounpackedBCDconversion

ToconvertASCIIdatatounpackedBCD,theprogrammermustgetridofthetagged“011” in theupper4bitsof theASCII.Todo that,eachASCIInumber isANDedwith“00001111”(0x0F).

ASCIItopackedBCDconversion

ToconvertASCIItopackedBCD,itisfirstconvertedtounpackedBCD(togetridofthe3)andthencombinedtomakepackedBCD.Forexample,for2and7thekeyboardgives0x32and0x37,respectively.Thegoalistoproduce0x27or“00100111”,whichiscalledpackedBCD,asdiscussedearlier.Thisprocessisillustratedindetailbelow.

Key ASCII UnpackedBCD PackedBCD

2 32 00000010

7 37 00000111 00100111(0x27)

MOVR1,#0x37;R1=0x37

Page 170: ARM Assembly Language: Programming and Architecture

MOVR2,#0x32;R2=0x32

ANDR1,R1,#0x0F;mask3togetunpackedBCD

ANDR2,R2,#0x0F;mask3togetunpackedBCD

MOVR3,R2,LSL#4;shiftR24bitstolefttogetR3=0x20

ORRR4,R3,R1;ORthemtogetpackedBCD,R4=0x27

PackedBCDtoASCIIconversion

Fordata tobedisplayedon themonitororbeprintedby theprinter, itmustbe inASCII format. Conversion from packed BCD to ASCII is discussed next. To convertpackedBCDtoASCII,itmustfirstbeconvertedtounpackedandthentheunpackedBCDis tagged with 011 0000 (0x30). The following shows the process of converting frompackedBCDtoASCII.

PackedBCD UnpackedBCD ASCII

0x29 0x02&0x09 0x32&0x39

00101001 00000010&00001001 0110010&0111001

MOVR0,#0x29

ANDR1,R0,#0x0F;maskupperfourbits

ORRR1,R1,#0x30;combinewith30togetASCII

MOVR2,R0,LSR#04;shiftright4bitstogetunpackedBCD

ORRR2,R2,#0x30;combinewith30togetASCII

ReviewQuestions

1.Forthefollowingdecimalnumbers,givethepackedBCDandunpackedBCDrepresentationsinbinary

(a)15(b)99

2.ForthefollowingpackedBCDnumbers,givethedecimalandunpackedBCDrepresentations.

(a)0x41(b)0x09

3.Repeatquestion2forASCII.

Page 171: ARM Assembly Language: Programming and Architecture

ProblemsSection3.1:ArithmeticInstructions

1.FindCandZflagsforeachofthefollowing.Alsoindicatetheresultoftheadditionandwheretheresultissaved.

(a)

MOVR1,#0x3F

MOVR2,#0x45

ADDSR3,R1,R2

(b)

LDRR0,=0x95999999

LDRR1,=0x94FFFF58

ADDSR1,R1,R0

(c)

LDRR0,=0xFFFFFFFF

ADDSR0,R0,#1

(d)

LDRR2,=0x00000001

LDRR1,=0xFFFFFFFF

ADDSR0,R1,R2

ADCSR0,R0,#0

(e)

LDRR0,=0xFFFFFFFE

ADDSR0,R0,#2

ADCR1,R0,#0x0

2.StatethethreestepsinvolvedinaSUBandshowthestepsforthefollowingdata.

(a)0x23–0x12(b)0x43–0x51(c)0x99–0x99

Section3.2:LogicInstructions

3.Assumethatthefollowingregisterscontainthesehexcontents:R0=0xF000,R1=0x3456,andR2=0xE390.Performthefollowingoperations.Indicatetheresultandtheregisterwhereitisstored.

Note:theoperationsareindependentofeachother.

(a)ANDR3,R2,R0 (b)ORRR3,R2,R1

(c)EORR0,R0,#0x76 (d)ANDR3,R2,R2

(e)EORR0,R0,R0 (f)ORRR3,R0,R2

(g)ANDR3,R0,#0xFF (h)ORRR3,R0,#0x99

(i)EORR3,R1,R0 (j)EORR3,R1,R1

Page 172: ARM Assembly Language: Programming and Architecture

4.GivethevalueinR2afterthefollowingcodeisexecuted:

MOVR0,#0xF0

MOVR1,#0x55

BICR2,R1,R0

5.GivethevalueinR2afterthefollowingcodeisexecuted:

LDRR1,=0x55555555

MVNR0,#0

EORR2,R1,R0

Section3.3:RotateandBarrelShifter

6.AssumingC=0,whatisthevalueofR1afterthefollowing?

MOVR1,#0x25

MOVSR1,R1,ROR#4

7.AssumingC=0,whatarethevaluesofR0andCafterthefollowing?

LDRR0,=0x3FA2

MOVR2,#8

MOVSR0,R0,RORR2

8.AssumingC=0whatisthevalueofR2andCafterthefollowing?

MOVR2,#0x55

MOVSR2,R2,RRX

9.AssumingC=0whatisthevalueofR1afterthefollowing?

MOVR1,#0xFF

MOVR3,#5

MOVSR1,R1,RORR3

10.Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.

a)MOVR1,#0x88,#4 b)MOVR0,#0x22,#22

c)MOVR2,#0x77,#8 d)MOVR4,#0x5F,#28

e)MOVR6,#0x88,#22 f)MOVR5,#0x8F,#16

g)MOVR7,#0xF0,#20 h)MOVR1,#0x33,#28

11.Givetheregistervalueforeachofthefollowinginstructionsafteritisexecuted.

a)MVNR2,#0x1 b)MVNR2,#0xAA,#20

Page 173: ARM Assembly Language: Programming and Architecture

c)MVNR1,0x55,#4 d)MVNR0,#0x66,#28

e)MVNR1,#0x80,#24 f)MVNR6,#0x10,#20

g)MVNR7,#0xF0,#24 h)MVNR4,#0x99,#4

12.FindthecontentsofregistersandCflagafterexecutingeachofthefollowingcodes:

a)

MOVR0,#0x04

MOVSR1,R0,LSR#2

MOVSR3,R0,LSRR1

b)

LDRR1,=0xA0F2

MOVR2,#0x3

MOVSR3,R1,LSLR2

c)

LDRR1,=0xB085

MOVR2,#3

MOVSR4,R1,LSRR2

13.FindthecontentsofregistersandCflagafterexecutingeachofthefollowingcodes:

a)

SUBSR2,R2,R2

MOVR0,#0xAA

MOVSR1,R0,ROR#4

b)

MOVR2,#0xAA,#4

MOVR0,#1

MOVSR1,R2,RORR0

c)

LDRR1,=0x1234

MOVR2,#0x010,#2

MOVSR1,R0,RORR2

d)

MOVR0,#0xAA

MOVSR1,R0,RRX

14.UsingMOVinstruction,showhowyourotateleftthefixedvalueof0x33totalofa)4,b)8,andc)12times.Alsogivethevalueintheregisteraftertherotation.

Section3.5:BCDandASCIIConversion

15.Writeaprogramtoconvert0x76frompackedBCDnumbertoASCII.PlacetheASCIIcodesintoR1andR2.

16.For3and2thekeyboardgives0x33and0x32,respectively.Writeaprogramtoconvert0x33and0x32topackedBCDandstoretheresultinR2.

Page 174: ARM Assembly Language: Programming and Architecture

AnswerstoReviewQuestionsSection3.1:ArithmeticInstructions

1.TheADDSinstructionupdatestheflagbitswhileADDdoesnotdothat.

2.Rd=Rn+Op2+C

3.0x4F+0xB1=0x100,sinceitisabyteadditionandresultislessthan32-bittheC=0andZ=0.

4.0x4F+0xFFFFFFB1=00000000,sinceitisawordadditionandresultisgreaterthan32-bit,theC=1andZ=1.

5.

0x43 01000011 00000000000000000000000001000011

–0x05 00000101 2’scomplement= +11111111111111111111111111111011

0x3E 100000000000000000000000000111110

C=1;therefore,theresultispositive

6.R2=R2–R3–C+1=0x95–0x4F–1+1=0x46

7.R2

8.R2=1

Section3.2:LogicInstructions

1.(a)0x4202(b)0xCFFF(c)0x8DFD

2.Theoperandwillremainunchanged;allzeros

3.Allones

4.Allzeros

5.ORRR7,R7,#0x10;R7=R7ORed00010000

6.ANDR5,R5,#0x8;R5=R5ORed00001000

Section3.3:RotateandBarrelShifterOperation

1.R3=1

2.R4=0x0000141E

3.R3=0x00050790

4.R5=0xA000000A

5.R0=0x00005079

6.0xBFFFFFFF

7.0xFFFFFFDF

Page 175: ARM Assembly Language: Programming and Architecture

Section3.4:ShiftandRotateInstructionsinARMCortex

1.True

2.0x01

3.0x0C

Section3.5:BCDandASCIIConversion

1.(a)15=00010101packedBCD=0000000100000101unpackedBCD

(b)99=10011001packedBCD=0000100100001001unpackedBCD

2.(a)0x41=0000010000000001unpackedBCD=41indecimal

(b)0x09=0000000000001001unpackedBCD=9indecimal

3.(a)0x34,0x31

(b)0x30,0x39

Page 176: ARM Assembly Language: Programming and Architecture
Page 177: ARM Assembly Language: Programming and Architecture
Page 178: ARM Assembly Language: Programming and Architecture

Chapter4:Branch,Call,andLoopinginARMIn the sequence of instructions to be executed, it is often necessary to transfer

programcontroltoadifferentlocation.(e.g.whenafunctioniscalled,executionofaloopisrepeated,oraninstructionexecutesconditionally)TherearemanyinstructionsinARMto achieve this. This chapter covers the control transfer instructions available in ARMAssembly language. InSection4.1,wediscuss instructionsused for looping, aswell asinstructionsforconditionalandunconditionalbranches(jumps).Inthesecondsection,weexamine the instructions associated with calling subroutine. In Section 4.3, instructionpipeliningoftheARMisexamined.Instructiontimingandtimedelaysubroutinesarealsodiscussed in Section 4.3. In Section 4.4, we examine the conditional execution of theARMinstructionswhichisauniquefeatureofARM.

Page 179: ARM Assembly Language: Programming and Architecture
Page 180: ARM Assembly Language: Programming and Architecture

Section4.1:LoopingandBranchInstructionsInthissectionwefirstdiscusshowtoperformaloopingactioninARMandthenthe

branch(jump)instructions,bothconditionalandunconditional.

LoopinginARM

Repeating a sequenceof instructionsor anoperation a certainnumberof times iscalleda loop.The loop isoneof themostwidelyusedprogramming techniques. In theARM,thereareseveralwaystorepeatanoperationmanytimes.Onewayistorepeattheoperationoverandoveruntilitisfinished,asshownbelow:

MOVR0,#0;R0=0

MOVR1,#9;R1=9

ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x09)

ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x12)

ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x1B)

ADDR0,R0,R1;R0=R0+R1,add9toR0(NowR0is0x24)

ADDR0,R0,R1;R0=0x2D

ADDR0,R0,R1;R0=0x36

Intheaboveprogram,weadd0x9toR0sixtimes.Thatmakes6×9=54=0x36.One problemwith the above program is that toomuch code spacewould be needed toincrease the number of repetitions to 50or 1000. Amuchbetterway is to use a loop.Next,wedescribethemethodtodoaloopinARM.

UsinginstructionBNEforlooping

TheBNE(branch ifnotequal) instructionuses thezero flag in thestatus register.TheBNEinstructionisusedasfollows:

BACK………;startoftheloop

………;bodyoftheloop

………;bodyoftheloop

SUBSRn,Rn,#1;Rn=Rn-1,settheflagZ=1ifRn=0

BNEBACK;branchifZ=0

Inthelasttwoinstructions,theRn(e.g.R2orR3)isdecremented;ifitisnotzero,itbranches(jumps)backtothetargetaddressreferredtobythelabel.Priortothestartoftheloop,theRnisloadedwiththecountervalueforthenumberofrepetitions.NoticethattheBNE instruction refers to the Z flag of the status register affected by the previousinstruction,SUBS.ThisisshowninExample4-1.

Example4-1

Page 181: ARM Assembly Language: Programming and Architecture

Writeaprogramto(a)clearR0,(b)add9toR0athousandtimes,then

(c)placethesuminR4.

UsethezeroflagandBNEinstruction.

Solution:

;–thisprogramaddsvalue9totheR0a1000times–

AREAEXAMPLE4_1,CODE,READONLY

ENTRY

LDRR2,=1000;R2=1000(decimal)forcounter

MOVR0,#0;R0=0(sum)

AGAINADDR0,R0,#9;R0=R0+9(add09toR1,R1=sum)

SUBSR2,R2,#1;R2=R2-1andsettheflags.Decrementcounter

BNEAGAIN;repeatuntilCOUNT=0(whenZ=1)

MOVR4,R0;storethesuminR4

HEREBHERE;stayhere

END

IntheprograminExample 4-1,registerR2isusedasacounter.Thecounterisfirstset to1000. Ineach iteration, theSUBSinstructiondecrements theR2andsets the flagbitsaccordingly.IfR2isnotzero(Z=0),itjumpstothetargetaddressassociatedwiththe

Page 182: ARM Assembly Language: Programming and Architecture

label“AGAIN”.ThisloopingactioncontinuesuntilR2becomeszero.AfterR2becomeszero(Z=1),itfallsthroughtheloopandexecutestheinstructionimmediatelybelowit,inthiscase“MOVR4,R0”.

ItmustbeemphasizedagainthatwemustuseSUBSinsteadofSUBsincetheSUBinstructionwillnotchange(update)theflags.SincewearemonitoringtheZflagfortheloop counter we must use SUBS instruction for the decrementing the counter. As wementioned inChapter3,manyof theARMinstructionshave theoptionofaffecting theflags.Intheseinstructionsthedefaultisnottoaffecttheflags.Thereforetomakethemtoeffect the flag we must add letter S to the instruction. That means SUBS and ADDSinstructions are different from SUB and ADD, as far as the flags are concerned. AsanotherexampleseeExample4-2.

Example4-2

Writeaprogramtoplacevalue0x55into100bytesofRAMlocations.

Solution:

AREAEXAMPLE4_2,CODE,READONLY

ENTRY

RAM_ADDREQU0x40000000;changetheaddressforyourARM

MOVR2,#25;counter(25times4=100byteblocksize)

LDRR1,=RAM_ADDR;R1=RAMAddress

LDRR0,=0x55555555;R0=0x55555555

OVERSTRR0,[R1];sendittoRAM

ADDR1,R1,#4;R1=R1+4toincrementpointer

SUBSR2,R2,#1;R2=R2–1fordec.counter

BNEOVER;keepdoingit

HEREBHERE

END

Loopingatrilliontimeswithloopinsidealoop

AsshowninExample 4-3,themaximumcountis232-1.Whathappensifwewanttorepeatanactionmoretimesthanthat?Todothat,weusealoopinsidealoop,whichis

Page 183: ARM Assembly Language: Programming and Architecture

called a nested loop. In a nested loop, we use two registers to hold the count. SeeExample 4-3.

Example4-3

ExplainwhatisthemaximumnumberoftimesthattheloopinExample4-1canberepeated?Now,writeaprogramto(a)loadtheR0registerwiththevalue0x55,and(b)complementit16,000,000,000(16billion)times.

Solution:

BecauseRxisa32-bitregister,itcanholdamaximumof0xFFFFFFFF(232–1decimal);therefore,theloopcanberepeatedamaximumof232–1times.Thisexampleshowshowtocreateanestinglooptogobeyond4billiontimes.Because16,000,000,000islargerthan0xFFFFFFFF(themaximumcapacityofanyR0–R12registers),weusetworegisterstoholdthecount.ThefollowingcodeshowshowtouseR2andR1asaregisterforcountersinanestingloop.

AREAEXAMPLE4_3,CODE,READONLY

ENTRY

MOVR0,#0x55;R0=0x55

MOVR2,#16;load16intoR2(outerloopcount)

L1LDRR1,=1000000000;R1=1,000,000,000(innerloopcount)

L2EORR0,R0,#0xFF;complementR0(R0=R0Ex-OR0xFF)

SUBSR1,R1,#1;R1=R1–1,dec.R1(innerloop)

BNEL2;repeatituntilR1=0

SUBSR2,R2,#1;R2=R2–1,dec.R2(outerloop)

BNEL1;repeatituntilR2=0

HEREBHERE;stayhere

END

In this program,R1 is used to keep the inner loop count. In the instruction “BNEL2”,whenever R1 becomes 0 it falls through and “SUBS R2,R2,#1” is executed. The nextinstructions force theCPUto load the innercountwith1,000,000,000 ifR2 isnotzero,andtheinnerloopstartsagain.ThisprocesswillcontinueuntilR2becomeszeroandtheouterloopisfinished.IfyouusetheKeilIDEtoverifytheoperationoftheaboveprogram

Page 184: ARM Assembly Language: Programming and Architecture

usesmallervaluesforcountertogothroughtheiterations.SeeFigure4-1.

Figure4-1:FlowchartforExample4-3

OtherconditionalBranches

Aswementioned inChapter 3,C andZ flags reflect the result of calculation onunsigned numbers. Table 4-1 lists available conditional branches for unsigned numbersthatuseCandZflags.Moredetailsofeach instructionareprovided inAppendixA.InTable 4-1noticethattheinstructions,suchasBEQ(BranchifZ=1)andBCS(Branchif

Page 185: ARM Assembly Language: Programming and Architecture

carry set, C = 1), jump only if a certain condition is met. Next, we examine someconditionalbranch instructionswithexamples.Theotherconditionalbranch instructionsassociatedwiththesignednumbersarediscussedinChapter5whenarithmeticoperationsforsignednumbersarediscussed.

Instruction Action

BCS/BHS branchifcarryset/branchifhigherorsame BranchifC=1

BCC/BLO branchifcarryclear/branchlower BranchifC=0

BEQ branchifequal BranchifZ=1

BNE branchifnotequal BranchifZ=0

BLS branchiflessorsame BranchifZ=1orC=0

BHI branchifhigher BranchifZ=0andC=1

Table4-1:ARMConditionalBranchInstructionsforUnsignedData

BCC(branchifcarryclear,branchifC=0)

In this instruction, thecarry flagbit inprogramstatus registers (CPSR) isused tomakethedecisionwhethertobranch.Inexecuting“BCClabel”,theprocessorlooksatthecarry flag to see if it is cleared (C = 0). If it is, the CPU starts to fetch and executeinstructionsfromtheaddressofthelabel.IfC=1, itwillnotjumpbutwillexecutethenextinstructionbelowBCC.SeeExample4-4.

Example4-4

ExaminethefollowingcodeandgivetheresultinregistersR0,R1,andR2.

MOVR1,#0;clearhighword(R1=0)

MOVR0,#0;clearlowword(R0=0)

LDRR2,=0x99999999;R2=0x99999999

ADDSR0,R0,R2;R0=R0+R2andsettheflags

BCCL1;ifC=0,jumptoL1andaddnextnumber

ADDSR1,R1,#1;ELSE,increment(R1=R1+1)

L1ADDSR0,R0,R2;R0=R0+R2andsettheflags

BCCL2;ifC=0,addnextnumber

ADDSR1,R1,#1;ifC=1,increment

L2ADDSR0,R2;R0=R0+R2andsettheflags

BCCL3;ifC=0,addnextnumber

Page 186: ARM Assembly Language: Programming and Architecture

ADDSR1,R1,#1;C=1,increment

L3ADDSR0,R2;R0=R0+R2andsettheflags

BCCL4;ifC=0,addnextnumber

ADDSR1,R1,#1;ifC=1,andsettheflags

L4

Solution:

Thisprogramadds0x99999999togetherfourtimes.

R1(highbyte) R0(lowbyte)

Atfirst 0 0

JustbeforeL1 0 0x99999999

JustbeforeL2 1 0x33333332

JustbeforeL3 1 0xCCCCCCCB

JustbeforeL4 2 0x66666664

Hereistheloopversionoftheaboveprogramthatruns10times.

AREAEXAMPLE4_4,CODE,READONLY

ENTRY

MOVR1,#0;clearhighword(R1=0)

MOVR0,#0;clearlowword(R0=0)

LDRR2,=0x99999999;R2=0x99999999

MOVR3,#10;counter

L1ADDSR0,R2;R0=R0+R2andsettheflags

BCCNEXT;ifC=0,addnextnumber

ADDR1,R1,#1;ifC=1,incrementtheupperword

NEXTSUBSR3,R3,#1;R3=R3-1andsettheflags

;(Decrementcounter)

BNEL1;nextroundifZ=0

Page 187: ARM Assembly Language: Programming and Architecture

HEREBHERE;stayhere

END

Notethatthereisalsoa“BCSlabel”instruction.IntheBCSinstruction,ifC=1itjumps to the target address. We will give more examples of these instructions in thecontextofapplications.

Comparisonofunsignednumbers

CMPRn,Op2;compareRnwithOp2andsettheflags

TheCMPinstructioncomparestwooperandsandchangestheflagsaccordingtotheresult of the comparison. The operands themselves remain unchanged. There is nodestination register and the second source operands can be a register or an immediatevaluenotlargerthan0xFF.Itmustbeemphasizedthat“CMPRn,Op2”instructionisreallyasubtractoperation.Op2issubtractedfromRn(Rn–Op2)andtheresultisdiscardedandflags are set accordingly. The contents of Rn and Op2 remain unchanged after theexecutionofCMPinstruction.AlthoughalltheC,S,Z,andVflagsreflecttheresultofthecomparison,onlyCandZareusedforunsignednumbers,asoutlinedinTable4-2.

Instruction C Z

Rn>Op2 1 0

Rn=Op2 1 1

Rn<Op2 0 0

Table4-2:FlagSettingsforCompare(CMPRn,Op2)ofUnsignedData

Lookatthefollowingcase:

LDRR1,=0x35F;R1=0x35F

LDRR2,=0xCCC;R2=0xCCC

CMPR1,R2;compare0x35Fwith0xCCC

BCCOVER;branchifC=0

MOVR1,#0;ifC=1,thenclearR1

OVERADDR2,R2,#1;R2=R2+1=0xCCC+1=0xCCD

Figure4-2showsthediagramandtheClanguageversionofthecode.

Pseudocode:

R1=0x35F

R2=0xCCC

InC:

R1=0x35F;

R2=0xCCC;

Page 188: ARM Assembly Language: Programming and Architecture

IF(R1>=R2)THEN

R1=0

ENDIF

R2=R2+1

if(R1>=R2)

{

R1=0;

}

R2=R2+1;

Figure4-2:FlowchartofifInstruction

Intheaboveprogram,R1islessthantheR2(0x35F<0xCCC);therefore,C=0andBCC(branchifcarryclear)willgototargetOVER.Incontrast,lookatthefollowing:

LDRR1,=0xFFF

LDRR2,=0x888

CMPR1,R2;compare0xFFFwith0x888

BCCNEXT

ADDR1,R1,#0x40

NEXTADDR1,R1,#0x25

In theabove,R1 isgreater thanR2 (0xFFF>0x888),which setsC=1,making“BCCNEXT”fallthroughsothat“ADDR1,R1,0x40”isexecuted.

Again,itmustbeemphasizedthatinCMPinstructions,theoperandsareunaffectedregardlessoftheresultofthecomparison.Onlytheflagsareaffected.ThisisdespitethefactthatCMPusestheSUBoperationtosetorresettheflags.Italsomaybenotedthat,unlike other arithmetic and logic instructions, there is no need to put S in the CMPinstructiontoupdatetheflags.Inotherwords,theCMPinstructionautomaticallyupdatestheflags.

Program4-1usestheCMPinstructiontosearchforthehighestbyteinaseriesof5databytes.Tosearchforthehighestvaluetheinstruction“CMPR1,R3”worksasfollows,whereR1 is the contents of thememory location brought into R1 register by the [R2]pointer.

a)IfR1<R3,thenC=0andR3becomesthebasisofthenewcomparison.

b) IfR1≥R3, thenC=1andR1 is the largerof the twovaluesand remains thebasisofcomparison.

Program4-1Assumethatthereisaclassoffivepeoplewiththefollowinggrades:69,87,96,45,and75.

Findthehighestgrade.

Page 189: ARM Assembly Language: Programming and Architecture

;searchingforhighestvalue

COUNTRNR0;COUNTisthenewnameofR0

MAXRNR1;MAXisthenewnameofR1

;(MAXhasthehighestvalue)

POINTERRNR2;POINTERisthenewnameofR2

NEXTRNR3;NEXTisthenewnameofR3

AREAPROG_4_1D,DATA,READONLY

MYDATADCD69,87,96,45,75

AREAPROG_4_1,CODE,READONLY

ENTRY

MOVCOUNT,#5;COUNT=5

MOVMAX,#0;MAX=0

LDRPOINTER,=MYDATA

;POINTER=MYDATA(addressoffirstdata)

AGAINLDRNEXT,[POINTER]

;loadcontentsofPOINTERlocationtoNEXT

CMPMAX,NEXT

;compareMAXandNEXT

BHSCTNU

;ifMAX>NEXTbranchtoCTNU

MOVMAX,NEXT;MAX=NEXT

CTNUADDPOINTER,POINTER,#4

;POINTER=POINTER+4topointtothenext

SUBSCOUNT,COUNT,#1;decrementcounter

BNEAGAIN;branchAGAINifcounterisnotzero

HEREBHERE

END

Program4-1searchesthroughfivedataitemstofindthehighestvalue.Theprogramhasavariablecalled“MAX”thatholds thehighestgradefoundsofar.Onebyone, thegradesarecompared toHighest. Ifanyof them ishigher, thatvalue isplaced inMAX.

Page 190: ARM Assembly Language: Programming and Architecture

Thiscontinuesuntilalldataitemsarechecked.AREPEAT-UNTILstructurewaschosenintheprogramdesign.Figure4-3showstheflowchartforProgram4-1.Thisdesigncouldbeusedtocodetheprograminmanydifferentlanguages.

Pseudocode:

Count=5

Highest=0

REPEAT

IF(Next>MAX)

THEN

MAX=Next

ENDIF

Incrementpointer

DecrementCount

UNTILCount=0

;nowMAXistheHighest

InC:

//InKeil,longis32-bitwide

unsignedlongmyData[5]={69,87,96,45,75};

unsignedlongcount=5;

unsignedlongmax=0;

unsignedlongnext;

unsignedlong*pointer=myData;

do

{

next=*pointer;

if(max<next)

Page 191: ARM Assembly Language: Programming and Architecture

max=next;

pointer++;

count—;

}while(count!=0);

Figure4-3:FlowchartandPseudocodeforProgram4-1

Using CMP followed by conditional branches we can make any comparison onunsigned numbers, as shown in Table 4-3. Although BCS (branch carry set) and BCC(branchcarryclear)checkthecarryflagandcanbeusedafteracompareinstruction,itisrecommendedthatBHS(branchhigherorsame)andBLO(branchbelow)beusedfortworeasons.OnereasonisthatassemblerswillunassembleBCSasBLO,andBCCasBHS,whichmaybeconfusingtobeginnerprogrammers.Anotherreasonisthat“branchhigher”and “branch below” are easier to understand than “branch carry set” and “branch carryclear,”sinceitismoreimmediatelyapparentthatonenumberislargerthananother,thanwhetheracarrywouldbegeneratedifthetwonumbersweresubtracted.

Instruction Action

BCS/BHS branchifcarryset/branchifhigherorsame BranchifRn≥Op2

BCC/BLO branchifcarryclear/branchlower BranchifRn<Op2

BEQ branchifequal BranchifRn=Op2

BNE branchifnotequal BranchifRn≠Op2

BLS branchiflessorsame BranchifRn≤Op2

BHI branchifhigher BranchifRn>Op2

Table4-3:ARMConditionalBranchInstructionsforUnsignedData

DivisionofunsignednumbersinARM

TheolderARMfamilymembersdonothaveaninstructionfordivisionofunsignednumberssinceittooktoomanygatestoimplementit.However,manyoftheARMCortexchipsimplementthedivideinstruction.InARMswithnodivideinstructionswecanuseSUB instruction to perform the division. Program 4-2 shows an example of unsigneddivisionusingsubtractoperation.Intheprogramthenumeratorisplacedinaregisterandthedenominatorissubtractedfromitrepeatedly.Thequotientisthenumberoftimeswesubtractedandtheremainderisintheregisteruponcompletion.SeeFigure4-4.

Program4-2:DivisionbyRepeatedSubtractions

AREAPROG_4_2,CODE,READONLY;Divisionbysubtractions

ENTRY

Page 192: ARM Assembly Language: Programming and Architecture

LDRR0,=2012;R0=2012(numerator)

;itwillcontainremainder

MOVR1,#10;R1=10(denominator)

MOVR2,#0;R2=0(quotient)

L1CMPR0,R1;CompareR0withR1toseeiflessthan10

BLOFINISH;ifR0<R1jumptofinish

SUBR0,R0,R1;R0=R0-R1(divisionbysubtraction)

ADDR2,R2,#1;R2=R2+1(quotientisincremented)

BL1;gotoL1(Bisdiscussedinthenextsection)

FINISHBFINISH

Pseudocode:

NUM=2012

DENOM=10

WHILENUM>=DENOM

SubtractDENOMfromNUM

IncrementQUOTIENT

ENDWHILE

InC:

R0=2012;

R1=10;

while(R0>=R1)

{

R0=R0-R1;

R2=R2+1;

}

Figure4-4:FlowchartandPseudo-codeforProgram4-2

Page 193: ARM Assembly Language: Programming and Architecture

TST(Test)

TSTRn,Op2;RnANDwithOp2andflagbitsareupdated

TheTSTinstructionisusedtotestthecontentsofregistertoseeifanybitissettoHIGH. After the operands are ANDed together the flags are updated. After the TSTinstructionifresult iszero,thenZflagisraisedandonecanuseBEQ(branchequal)tomakedecision.Lookatthefollowingexample:

MOVR0,#0x04;R0=00000100inbinary

LDRR1,=myport;portaddress

OVERLDRBR2,[R1];loadR2frommyport

TSTR2,R0;isbit2HIGH?

BEQOVER;keepchecking

InTST,likeotherlogicorarithmeticdataprocessinginstructions,theOp2canbeanimmediatevalueoflessthan0xFF.Lookatthefollowingexample:

LDRR1,=myport;portaddress

OVERLDRBR2,[R1];loadR2frommyport

TSTR2,#0x04;isbit2HIGH?

BEQOVER;keepchecking

SeeExample4-5.

Example4-5

Assumeaddresslocation0x200000isassignedtoaninputportaddressandconnectedto8DIPswitches.WriteasimpleshortprogramtocheckthePORTandwheneverbothpins4or6areLOW,R4registerisincremented.

Solution:

MYPORTEQU0x200000

MOVR0,#2_01010000;R0=0x50(01010000inbinary)

LDRR1,=MYPORT;R1=portaddress

OVERLDRBR2,[R1];getabytefromPORTandplaceitinR2

TSTR2,R0;arebits4and6LOW?

BNEOVER;keepchecking

ADDR4,R4,#1

Page 194: ARM Assembly Language: Programming and Architecture

TEQ(testequal)

TEQRn,Op2;RnEX-ORedwithOp2andflagbitsareset

TheTEQinstructionisusedtotesttoseeifthecontentsoftworegistersareequal.After the source operands are Ex-ORed together the flag bits are set according to theresult.AftertheTEQinstructionifresultis0,thenZflagisraisedandonecanuseBEQ(branch zero) tomake decision.Recall that ifweExclusive-OR a valuewith itself, theresult is zero.Look at the following example for checking the temperatureof 100on agivenport:

TEMPEQU100

MOVR0,#TEMP;R0=Temp

LDRR1,=myport;portaddress

OVERLDRBR2,[R1];loadR2frommyport

TEQR2,R0;isit100?

BNEOVER;keepchecking

Unconditionalbranch(jump)instruction

Theunconditionalbranchisajumpinwhichcontrolistransferredunconditionallytothetarget location.IntheARMtherearetwounconditionalbranches:B(branch)andBX(branchandexchange).Thisisdiscussednext.

B(Branch)

B(branch)isanunconditionaljumpthatcangotoanymemorylocationinthe32MbyteaddressspaceoftheARM.AnothersyntaxforBinstructionisBAL(branchalways).

Bhasdifferentusageslikeimplementingif/else,while,andforinstructions.Inthefollowingcodeyouseeanexampleofimplementingtheif/elseinstruction:

CMPR1,R2

BHSL1

MOVR3,#2

BOVER

L1MOVR3,#5

OVER

//inC

if(R1<R2)

{

R3=2;

}

else

{

R3=5;

}

Intheabovecode,R3isinitializedwith2whenR1islowerthanR2.Otherwise,itisinitializedwith5.

Page 195: ARM Assembly Language: Programming and Architecture

Asanexampleofimplementingthewhileinstructionseethefollowingprogram.Itcalculatesthesumofnumbersbetween1and5:

MOVR1,#1

MOVR2,#0

L1CMPR1,#5

BHIL2

ADDR2,R2,R1

ADDR1,R1,#1

BL1

L2MOVR3,#5

//inC

unsignedlongR1=1;

unsignedlongR2=0;

while(R1<=5)

{

R2=R2+R1;

R1=R1+1;

}

Thefor instructioncanbeimplementedthesamewayasthewhileinstruction.Forexample,theaboveassemblyprogramcanbeconsideredasaforloop.

Incaseswherethereisnooperatingsystemormonitorprogram,weusetheBranchto itself inorder tokeep themicrocontrollerbusy.Asimplewayofdoing that isshownbelow:

HEREBHERE;stayhere

AnothersyntaxfortheBinstructionisBAL(branchalways)asshownbelow:

HEREBALHERE;stayhere

SinceARMinstructionis32-bit,8bitsareusedfortheopcode,andtheother24bitsrepresenttheaddressofthetargetlocation.The24-bittargetaddressisshiftedlefttwiceandthatallowsajumpto–32Mto+32Mbytesofmemorylocationsfromtheaddressofcurrentinstruction.Next,weexplainthereasonforthis.

Allbranchesareshortbranches(jumps)

Itmustbenotedthatallbranchinstructions(conditionalandunconditional)areshortjumps,meaningtheaddressofthetargetmustbewithin32Mbytesoftheprogramcounter(PC). That means the short jumps cannot cover the entire address space of 4G bytes(0x00000000to0xFFFFFFFF).

Calculatingtheshortbranchaddress

AllconditionalbranchessuchasBCC,BEQ,andBNEareshortbranches.This isduetothefactthattheARMinstructionsare32-bitandsomeofthe32-bitmustbeusedforopcodeandthereforeitcannotcovertheentire4GbytesaddressspaceofARM.Inthebranchinstructiontheopcodeis8bitsandtherelativeaddressis24bits.SeeFigure4-5.Thetargetaddressisrelativetothevalueoftheprogramcounter.Iftherelativeaddressispositive, the jump is forward. If the relative address is negative, then the jump isbackwards. The relative address can covermemory space of –32Mbytes to +32Mbytes

Page 196: ARM Assembly Language: Programming and Architecture

fromcurrentlocationofprogramcounter.Thereasonfor32MBisthefactthat24bitsareshiftedlefttwice(multipliedby4)bytheARMCPUautomatically.Thatgivesus26bits.Now, since one bit is used for positive or negative sign, we have only 25 bits formagnitude.The25bitsmagnitudegivesus32Mbytes(225=32M)ineachdirection.Thatis–32Mbytesif it isbackwardand+32MBif it isforwardjump.Therefore, tocalculatethetargetaddress,therelativeaddressisshiftedlefttwiceandaddedtotheaddress ofthenextinstructiontobefetched[targetaddress=relativeaddressshiftedlefttwice+addressof the next instruction to be fetched]. It must be noted that the next instruction to befetched is two instructions below the current branch instruction. The reason is theinstructionrightbelowthebranchisalreadyinthepipeline.SeeFigure 4-5.

Figure4-5:B(Branch)Instruction

Youmight askwhyweadd the relativeaddress to theaddressof two instructionsbelowthecurrentinstruction.(whydon’tweaddtherelativeaddresstotheaddressoftheinstruction rightbelow thecurrent instructionas it is inotherCPUs).This isdue to thepipelineissuesaswewillseeinnextsectionandChapter7.

Now, to calculate the target address of the branch we add address of the nextinstructiontobefetchedtotheoffsetshiftedleft twice(multipliedby4).Thereasonforshifting left twice is tomake sure it is word aligned. Given the fact that all theARMinstructionsare4-byte (word) longandarewordaligned,with225=32Mbytesaddressspacefortheforwardandbackwardjumpsthetargetaddressofjump(branch)canbeupto8Minstructions(32MBytes/4byes=8Minstructions)fromthecurrent instructionineachdirection.Althoughthisdoesnotcovertheentire1Ginstructions(4GBytes/4byes=1Ginstructions)ofARMmemoryspace,itismorethanadequateformanyapplications.SeeExample 4-6.

Example4-6

InARM7,thenextinstructiontobefetchedis2instructionsbelowthecurrentexecutinginstruction.Usingthefollowinglistfileverifythejumpforwardaddresscalculation.

Page 197: ARM Assembly Language: Programming and Architecture

LINEADDRESSMachineMnemonicOperand

100000000AREAEXAMPLE_4_6,CODE,READONLY

200000000ENTRY

300000000E3A01015MOVR1,#0x15;R1=0x15

400000004EA000002BTHERE

500000008E3A01025MOVR1,#0x25;R1=0x25

60000000CE3A02035MOVR2,#0x35;R2=0x35

700000010E3A03045MOVR3,#0x45;R3=0x45

800000014E3A04055THEREMOVR4,#0x55;R4=0x55

900000018EAFFFFFEHEREBHERE

10END

Solution:

FirstnoticethattheBinstructioninline4jumpsforward.Tocalculatethetargetaddress,therelativeaddress(offset)isshiftedlefttwiceandaddedtothePC ofthenextinstructiontobefetched.Thepositionofthenextinstructiontobefetchedis2instructionsbelowthecurrentinstruction.RecallthateachinstructionofARMtakes4bytes.Sothenextinstructiontobefetchedis2×4bytes=8bytesbelowthecurrentinstructionaddress(00000004).Sotheaddressofthenextinstructiontobefetchedis0000004+8=000000C(thepositionofMOVinstructioninline6).Inline4theinstruction“BTHERE”hasthemachinecodeofEA000002.Todistinguishtheoperandandopcodeparts,weshouldcomparethemachinecodewiththeBinstructionformat.Inthefollowing,youseetheformatoftheBinstruction.InthisexamplethemachinecodeisEA000002.IfwecompareitwiththeBformat,weseethat,theoperandis000002andtheopcodeisEA.The000002istheoffset,relativetotheaddressofthenextinstruction.Recallthattocalculatethetargetaddress,therelativeaddress(offset)isshiftedlefttwiceandaddedtothecurrentvalueofthePC(ProgramCounter).Shiftingtheoffset(000002)lefttwiceresultsin000008andthenaddingittotheaddressofthenextinstructiontobefetched(0000000C)wehave000008+0000000C=00000014whichisexactlytheaddressofTHERElabel.Allthejumpinstructions,whosemnemonicsbeginwithB,havethesameinstructionformat,andtheopcodechangesfrominstructiontoinstruction.So,wecancalculatetheshortbranchaddressforanyofthem,aswejustdidinthisexample.

Itmustalsobenotedthatforthebackwardbranchtherelativevalueisnegative(2’scomplement).ThatisshowninExample 4-7.

Page 198: ARM Assembly Language: Programming and Architecture

Example4-7

VerifythecalculationofbackwardjumpsforthelistingofExample4-1,shownbelow.

LINEADDRESSMachineMnemonicOperand

500000000E3A02FFALDRR2,=1000;R2=1000

600000004E3A00000MOVR0,#0;R0=0,sum

700000008E2800009AGAINADDR0,R0,#9;R0=R0+9

80000000CE2522001SUBSR2,R2,#1;R2=R2-1

9000000101AFFFFFCBNEAGAIN;repeat

1000000014E1A04000MOVR4,R0;storethesuminR4

1100000018EAFFFFFEHEREBHERE;stayhere

120000001CEND

Solution:

Intheprogramlist,“BNEAGAIN”inline9hasmachinecode1AFFFFFC.Tospecifytheoperandandopcode,wecomparetheinstructionwiththebranchinstructionformat,whichyousawinthepreviousexample.Theopcodeis1Aandtheoperand(relativeoffsetaddress)isFFFFFC.TheFFFFFCgivesus–4,whichmeansthedisplacementis(–4×4=–16=–0x10).

Thebranchislocatedinaddress0x0010.Theaddressofthenextinstructiontobefetchedistwoinstructionsaheadofcurrentbranchinstruction,andeachinstructionis4-bytewide.Therefore,addressofthenextinstructiontobefetched=0x0010+(2×4)=0x0018

Whentherelativeaddressof–0x10isaddedto00000018,wehave–0x0010+0x0018=0x08

Noticethat00000008istheaddressofthelabelAGAIN.

FFFFFCisanegativenumberandthatmeansitwillbranchbackward.Forfurtherdiscussionoftheadditionofnegativenumbers,seeChapter5.

Branchingbeyond32Mbytelimit

To branch beyond the address space of 32M bytes, we use BX (branch andexchange) instruction.The “BXRn” instructionuses registerRn to hold target address.

Page 199: ARM Assembly Language: Programming and Architecture

SinceRncanbeanyoftheR0–R14registersandtheyare32-bitregisters,the“BXRn”instruction can land anywhere in the 4G bytes address space of the ARM. In theinstruction“BXR2”theR2isloadedintotheprogramcounter(R15)andCPUstartstofetch instructions from the target address pointed to by R15, the program counter. SeeFigure 4-6.Sincetheinstructionsarewordaligned,wemustmakesurethatthelowertwobitsoftheRnare0s.

Figure4-6:BX(Branchandexchange)InstructionTargetAddress

TheBXinstructionisalsousedtoswitchtoTHUMBversionoftheARMCPU.FormoreinformationseetheARM manual.

ReviewQuestions

1.ThemnemonicBNEstandsfor_______.

2. True or false. “BNEBACK”makes its decision based on the last instructionaffectingtheZflag.

3.“BNEHERE”isa___-byteinstruction.

4.In“BEQNEXT”,whichflagbitischeckedtoseeifitishigh?

5.B(ranch)isa(n)___-byteinstruction.

6.CompareBandBXinstructions.

Page 200: ARM Assembly Language: Programming and Architecture
Page 201: ARM Assembly Language: Programming and Architecture

Section4.2:CallingSubroutinewithBLAnothercontroltransferinstructionistheBL(branchandlink)instruction,whichis

used to call a subroutine. Subroutines are often used to perform tasks that need to beperformed frequently. This makes a program more structured in addition to savingmemoryspace.In theARMthere isonlyoneinstructionforcallandthat isBL(branchandlink).TouseBLinstructionforcall,wemustleavetheR14registerunusedsincethatis where the ARM CPU stores the address of the next instruction where it returns toresumeexecutingtheprogram.

BL(BranchandLink)instructionandcallingsubroutine

In the32-bit instructionBL, 8 bits are used for theopcode and theother 24bits,k23–k0, are used for the address of the target subroutine, just like in the Branchinstruction. Therefore, BL can be used to call subroutines located anywherewithin the32MaddressspaceoftheARM,asshowninFigure 4-7.

Figure4-7:BL(BranchandLink)Instruction

Tomake sure that theARMknowswhere to comeback to after executionof thecalled subroutine, the ARM automatically saves in the link register (LR), the R14, theaddressoftheinstructionimmediatelybelowtheBL.WhenasubroutineiscalledbytheBL instruction, control is transferred to that subroutine, and the processor saves thePC(program counter) in the R14 register and begins to fetch instructions from the newlocation.Afterfinishingexecutionofthesubroutine,wemustuse“BXLR“instructiontotransfercontrolbacktothecaller.Everysubroutineneeds“BXLR”asthelastinstructionforreturnaddress.

BLinstructionandtheroleoflinkerregister

When a subroutine is called using BL instruction, first the processor saves theaddress of the instruction just below theBL instruction on theR14 register (LR, linkerregister), and thencontrol is transferred to that subroutine.This ishow theCPUknows

Page 202: ARM Assembly Language: Programming and Architecture

wheretoresumewhenitreturnsfromthecalledsubroutine.

Thelinkerregisterandreturningfromsubroutine

ThereisnoRETurninstructioninARM7.Thereforeattheendofthesubroutinewemustcopythelinkerregister,R14,totheprogramcounter(R15).WhentheBLinstructionis executed the address of the instruction below the BL instruction is placed into R14register, so, when the execution of the function finishes, the address of the instructionbelow the BL must be reloaded back into the PC, and so the CPU can resume theexecutionofinstructionsbelowtheBLinstruction.

TounderstandtheroleoftheR14registerinBLinstructionandthereturn,examinetheExamples 4-8.ThefollowingpointsshouldbenotedfortheExample 4-8:

1. Notice theDELAY subroutine.Upon executing the first “BLDELAY”, theaddress of the instruction right below it, “MOVR0,#0xAA”, is savedonto theR14register,andtheARMstartstoexecuteinstructionsatDELAYsubroutine.

2.IntheDELAYsubroutine,firstthecounterR3issetto5(R3=5);therefore,theinnerloopisrepeated5times.WhenR3becomes0,controlfallstothe“BXLR”instruction,which restores the address into the program counter and returns tomainprogramtoresumeexecutingtheinstructionsaftertheBL.

Example4-8

Writeaprogramtotoggleallthebitsofaddress0x40000000bysendingtoitthevalues0x55and0xAAcontinuously.Putatimedelaybetweeneachissuingofdatatoaddresslocation.

Solution:

AREAEXAMPLE4_8,CODE,READONLY

ENTRY

RAM_ADDREQU0x40000000;changetheaddressforyourARM

LDRR1,=RAM_ADDR;R1=RAMaddress

AGAINMOVR0,#0x55;R0=0x55

STRBR0,[R1];sendittoRAM

BLDELAY;calldelay(R14=PCofnextinstruction)

MOVR0,#0xAA;R0=0xAA

STRBR0,[R1];sendittoRAM

BLDELAY;calldelay

BAGAIN;keepdoingit

;––––––—DELAYSUBROUTINE

Page 203: ARM Assembly Language: Programming and Architecture

DELAYLDRR3,=5;R3 =5,modifythisvaluefordifferentsizedelay

L1SUBSR3,R3,#1;R3=R3-1

BNEL1

BXLR;returntocaller

;––––––—endofDELAYsubroutine

END;noticetheplaceforENDdirective

UseKeilIDEsimulatorforARMtosimulatetheaboveprogramandexaminetheregistersandmemorylocation0x40000000.Youmighthavetochangetheaddress0x40000000tosomeothervaluedependingontheRAMaddressoftheARMchipyouuse.

Inaboveprogram,inplaceof“BXLR”forreturn,wecouldhaveused“BXR14”,“MOV R15,R14”,or“MOVPC,LR”instructions.Allofthemdothesamething;butitisrecommendedtousethe“BXLR”instruction.

Theamountof timedelay inExample 4-8dependson the frequencyof theARMchip.Howtocalculatethetimewillbeexplainedinthelastsectionofthischapter.

MainProgramandCallingSubroutines

In realworld projectswe divide the programs into small subroutines (also calledfunctions) and the subroutines are called from themain program.Figure 4-8 shows theformat.

;MAINprogramcallingsubroutines

AREAPogramName,CODE,READONLY

ENTRY

MAINBLSUBR_1;CallSubroutine1

BLSUBR_2;CallSubroutine1

BLSUBR_3;CallSubroutine1

HEREBALHERE;stayhere.BAListhesameasB

;––-endofMAIN

;––––––—SUBROUTINE1

SUBR_1….

….

BXLR;returntomain

;––endofsubroutine1

Page 204: ARM Assembly Language: Programming and Architecture

;––––––—SUBROUTINE2

SUBR_2….

….

BXLR;returntomain

;––endofsubroutine2

;––––––—SUBROUTINE3

SUBR_3….

….

BXLR;returntomain

;––endofsubroutine3

END;noticetheENDoffile

Figure4-8:ARMAssemblyMainProgramThatCallsSubroutines

Program4-3showsanexampleofthemainprogramcallingsubroutine.

Program4-3

;Thisprogramfillsablockofmemorywithafixedvalueand

;thentransfers(copies)theblocktonewareaofmemory

AREAPROGRAM4_3,CODE,READONLY

ENTRY

RAM1_ADDREQU0x40000000;ChangetheaddressforyourARM

RAM2_ADDREQU0x40000100;ChangetheaddressforyourARM

BLFILL;callblockfillsubroutine

BLCOPY;callblocktransfersubroutine

HEREBALHERE;BAL(branchalways)isthesameasB

;–––––-BLOCKFILLSUBROUTINE

FILLLDRR1,=RAM1_ADDR;R1=RAMAddresspointer

MOVR0,#10;counter

LDRR2,=0x55555555

L1STRR2,[R1];sendittoRAM

ADDR1,R1,#4;R1=R1+4toincrementpointer

SUBSR0,R0,#1;R0=R0-1fordeccounter

BNEL1;keepdoingit

Page 205: ARM Assembly Language: Programming and Architecture

BXLR;returntocaller

;–––––—BLOCKCOPYSUBROUTINE

COPYLDRR1,=RAM1_ADDR;R1=RAMAddresspointer(source)

LDRR2,=RAM2_ADDR;R2=RAMAddresspointer(dest.)

MOVR0,#10;counter

L2LDRR3,[R1];getfromRAM1

STRR3,[R2];sendittoRAM2

ADDR1,R1,#4;R1=R1+4toincrementpointerforRAM1

ADDR2,R2,#4;R2=R2+4toincrementpointerforRAM2

SUBSR0,R0,#1;R0=R0–1fordecrementingcounter

BNEL2;keepdoingit

BXLR;returntocaller

;–––-

END;noticetheplaceofENDdirective

Therearecasesinwhichweneedtocallasubroutinewithinacall(nestedcall).TodothatwemustusestacksincetheARMCPUcansupportonlyonecallatatimesincethereisonlyonelinkerregister(R14).

ReviewQuestions

1.ThemnemonicBLstandsfor_______.

2.Trueorfalse.“BLDELAY”savestheaddressoftheinstructionbelowBLinLRregister.

3.“BLDELAY”isa___-byteinstruction.

4.LRisan___-bitregister.

5.LRisthesameas______register.

6.ExplainthedifferencebetweenBandBLinstructions.

Page 206: ARM Assembly Language: Programming and Architecture
Page 207: ARM Assembly Language: Programming and Architecture

Section4.3:ARMTimeDelayandInstructionPipelineIn this sectionwediscusshow togeneratevarious timedelays and calculate time

delays for the ARM. We will also discuss instruction pipelining and its impact onexecutiontime.

DelaycalculationfortheARM

IncreatingatimedelayusingAssemblylanguageinstructions,onemustbemindfuloftwofactorsthatcanaffecttheaccuracyofthedelay:

1. Thecrystalfrequency:ThefrequencyofthecrystaloscillatorconnectedtotheCPUisonefactorinthetimedelaycalculation.Thedurationoftheclockperiodfortheinstructioncycleisafunctionofthiscrystalfrequency.

2. TheARMdesign: Since the 1970s, both the field of IC technology and thearchitecturaldesignofmicroprocessorshave seengreat advancements.Due to thelimitationsofICtechnologyandlimitedCPUdesignexperienceformanyyears,theinstruction cycle duration was longer. Advances in both IC technology and CPUdesign in the 1980s and 1990s havemade the single instruction cycle a commonfeatureofmanymicroprocessors.Indeed,onewaytoincreaseperformancewithoutlosingcodecompatibilitywiththeoldergenerationofagivenfamilyistoreducethenumberof instructioncycles it takes toexecutean instruction.OnemightwonderhowmicroprocessorssuchasARMareabletoexecuteaninstructioninonecycle.Thereare threeways todo that: (a)UseHarvardarchitecture toget themaximumamountofcodeanddata into theCPU,(b)useRISCarchitecturefeaturessuchasfixed-size instructions, and finally (c) use pipelining to overlap fetching andexecutionofinstructions.WehaveexaminedtheHarvardandRISCarchitecturesinChapter2.Next,wegiveabriefdiscussionofpipelining.Chapter7coverstheARMpipelineinmuchmoredetail.

Pipelining

Inearlymicroprocessorssuchasthe8085,theCPUcouldeitherfetchorexecuteatagiventime.Inotherwords,theCPUhadtofetchaninstructionfrommemory,decode,andthenexecuteit,andthenfetchthenextinstruction,decodeandexecuteit,andsoonasshown inFigure 4-9.The ideaofpipelining in its simplest form is toallow theCPUtofetch and execute at the same time. That is an instruction is being fetched while thepreviousinstructionisbeingexecuted.

Figure4-9:Non-pipelineexecution

We can use a pipeline to speed up execution of instructions. In pipelining, theprocessofexecutinginstructionsissplitintosmallstepsthatareallexecutedinparallel.Inthisway,theexecutionofmanyinstructionsisoverlapped.Onelimitationofpipeliningisthatthespeedofexecutionislimitedtothesloweststageofthepipeline.Comparethistomaking pizza. You can split the process ofmaking pizza intomany stages, such as

Page 208: ARM Assembly Language: Programming and Architecture

flatteningthedough,puttingonthetoppings,andbaking,buttheprocessislimitedtotheslowest stage, baking, no matter how fast the rest of the stages are performed. Whathappensifweusetwoorthreeovensforbakingpizzastospeeduptheprocess?Thismaywork for making pizza but not for executing programs, because in the execution ofinstructionswemustmake sure that the sequence of instructions is kept intact and thatthereisnoout-of-stepexecution.

ARMmultistageexecutionpipeline

As shown in Figure 4-10, in the ARM, each instruction is executed in 3 stages:Fetch,Decode,andExecute.

Figure4-10:PipelineinARM

In step 1, the opcode is fetched. In step 2, the opcode is decoded. In step 3, theinstruction is executed and result is written into the destination register. This 3-stagepipelinewasusedintheoriginalARM.ThenewerversionofARMmayhavemorestagesofpipeline.SeeChapter7andyourARMmanual.

InstructioncycletimefortheARM

IttakesacertainamountoftimefortheCPUtoexecuteaninstruction.Thisamountoftimeisreferredtoasmachinecycles.ThankstotheRISCarchitecture,ARMexecutesmost instructions in onemachine cycle. In theARM family, the length of themachinecycle depends on the frequency of the oscillator connected to the ARM system. Thecrystal oscillator, along with on-chip circuitry, provide the clock source for the ARMCPU.IntheARM,onemachinecycleconsistsofoneoscillatorperiod,whichmeansthatwitheachoscillatorclock,onemachinecyclepasses.Therefore,tocalculatethemachinecyclefortheARM,wetaketheinverseofthecrystalfrequency,asshowninExample 4-9.

Example4-9

ThefollowingshowsthecrystalfrequencyforfourdifferentARM-basedsystems.Findtheperiodoftheinstructioncycleineachcase.

(a)80MHz(b)160MHz(c)100MHz(d)50MHz

Solution:

(a)instructioncycleis1/80MHz=0.0125ms(microsecond)=12.5ns(nanosecond)

Page 209: ARM Assembly Language: Programming and Architecture

(b)instructioncycle=1/160MHz=0.00625ms=6.25ns

(c)instructioncycle=1/100MHz=0.01ms=10ns

(d)instructioncycle=1/50MHz=0.02ms=20ns

Branchpenalty

Theoverlappingoffetchandexecutionoftheinstructioniswidelyusedintoday’smicroprocessorssuchasARM.Fortheconceptofpipeliningtowork,weneedabufferorqueue in which an instruction is prefetched and ready to be executed. In somecircumstances,theCPUmustflushoutthequeue.Forexample,whenabranchinstructionisexecuted,theCPUstartstofetchcodesfromthenewmemorylocationandthecodeinthequeue thatwasfetchedpreviously isdiscarded. In thiscase, theexecutionunitmustwaituntil thefetchunitfetchesthenewinstruction.This iscalledabranchpenalty.Thepenaltyisanextrainstructioncycletofetchtheinstructionfromthetargetlocationinsteadofexecutingtheinstructionrightbelowthebranch.Rememberthattheinstructionbelowthe branch has already been fetched and is next in line to be decoded when the CPUbranches to a different address. This means that while the vast majority of ARMinstructions take only onemachine cycle, some instructions take threemachine cycles.These are Branch, BL (call), and all the conditional branch instructions such as BNE,BLO,andsoon.Theconditionalbranchinstructioncantakeonlyonemachinecycleifitdoes not jump.For example, theBNEwill jump ifZ=0 and that takes threemachinecycles.IfZ=1,thenitfallsthroughandittakesonlyonemachinecycle.SeeExamples 4-10and4-11.

Example4-10

ForanARMsystemof100MHz,findhowlongittakestoexecuteeachofthefollowinginstructions:

(a)MOV(b)SUB(c)B

(d)ADD(e)NOP(f)BHI

(g)BLO(h)BNE(i)EQU

Solution:

Themachinecycleforasystemof100MHzis10ns,asshowninExample4-9.Therefore,wehave:

InstructionInstructioncyclesTimetoexecute

(a)MOV11×10ns=10ns

Page 210: ARM Assembly Language: Programming and Architecture

(b)SUB11×10ns=10ns

(c)B33×10ns=30ns

(d)ADD11×10ns=10ns

(e)NOP11×10ns=10ns

Forthefollowing,duetobranchpenalty,3clockcyclesiftakenand1ifitfallsthrough:

(f)BHI3/13×10ns=30ns

(g)BLO3/13×10ns=30ns

(h)BNE3/13×10ns=30ns

(i)EQU0(directivesdonotproducemachineinstructions)

NoticethatARMchipdoesnothavetheNOPinstruction.TheARMAssemblerreplacesitwithsomeother1-cycleinstruction.OnyourARMAssembler,youmightgetawarningthatNOPdoesnotexist,butitstillcompilestheprogram.Itdoesnotgiveanerror.

DelaycalculationforARM

Adelaysubroutineconsistsoftwoparts:(1)settingacounter,and(2)aloop.Mostofthetimedelayisperformedbythebodyoftheloop,asshowninExample 4-11.

Example4-11

Findthesizeofthedelayofthecodesnippetbelowifthecrystalfrequencyis100MHz:

DELAYMOVR0,#255

AGAINNOP

NOP

SUBSR0,R0,#1

BNEAGAIN

MOVPC,LR;return

Solution:

WehavethefollowingmachinecyclesforeachinstructionoftheDELAYsubroutine:

InstructionMachineCycle

Page 211: ARM Assembly Language: Programming and Architecture

DELAYMOVR0,#255;1

AGAINNOP;1

NOP;1

SUBSR0,R0,#1;1

BNEAGAIN;3/1

MOVPC,LR;1

Therefore,wehaveatimedelayof[1+((1+1+1+3)×255)+1]×10ns=15,320ns.

NoticethatBNEtakesthreeinstructioncyclesifitjumpsback,andtakesonlyonecyclewhenfallingthroughtheloop.Thatmeanstheabovenumbershouldbe153.0ns.Becausethelasttime,whenR0iszero,theBNEtakesonlyonecyclebecauseitfallsthroughtheloop

Veryoftenwecalculatethetimedelaybasedontheinstructionsinsidetheloopandignoretheclockcyclesassociatedwiththeinstructionsoutsidetheloop.

InExample 4-11,thelargestvaluetheR0registercantakeis232=4G.Onewaytoincrease the delay is to use NOP instructions in the loop. NOP, which stands for “nooperation,” simply wastes time, but takes 4 bytes of program memory and that is tooheavyapricetopayforjustoneinstructioncycle.Abetterwayistouseanestedloop.

Loopinsidealoopdelay

Anotherwaytogetalargedelayistousealoopinsidealoop,whichisalsocalledanestedloop.SeeExample 4-12.

Example4-12

InagivenARMtraineranI/Oportisconnectedto8LEDs.ThefollowingprogramtogglestheLEDsbysendingtoit0x55and0xAAvaluescontinuously.CalculatethetimedelayfortogglingofLEDs.Assumethesystemclockfrequencyof100MHz.

Solution:

AREAExample4_12,CODE,READONLY

ENTRY

PORT_ADDREQU0x40000000;changetheaddressforyourARM

LDRR1,=PORT_ADDR;R1=portaddress

AGAINMOVR0,#0x55;R0=0x55

Page 212: ARM Assembly Language: Programming and Architecture

STRBR0,[R1];sendittoLEDs

BLDELAY;calldelay

MOVR0,#0xAA;R0=0xAA

STRBR0,[R1];sendittoLEDs

BLDELAY;calldelay

BALAGAIN;keepdoingitforever(BAListhesameasB)

;––––––—DELAYSUBROUTINE

DELAY

MOVR3,#100;R3=100,modifythisvaluefordifferentsizedelay

L1LDRR4,=250000;R4=250,000(innerloopcount)

L2SUBSR4,R4,#1;1clock

BNEL2;3clock

SUBSR3,R3,#1;R3=R3-1

BNEL1

MOVPC,LR;returntocaller

END

Ignoringthedelayassociatedwiththeouterloop,wehavethefollowingtimedelay:

[(1+3)×250,000×100]×10ns=1secondsince1/100MHz=10ns.

ExaminetheworkingfrequencyforyourARMtrainer,changetheaboveaddress0x40000000toyourARMtrainerportaddressandverifythetimedelayusingoscilloscope.

Fromthesediscussionsweconcludethat theuseofinstructionsingeneratingtimedelayisnot themostreliablemethod.Togetmoreaccurate timedelayTimersareused.All ARM microcontrollers come with on-chip Timers. We can use Keil uVision’ssimulatortoverifydelaytimeandnumberofcyclesused.Meanwhile,togetanaccuratetimedelayforagivenARMmicrocontroller,wemustuseanoscilloscopetomeasuretheexacttimedelay.

ReviewQuestions

1.Trueorfalse.IntheARM,themachinecyclelasts1clockperiodofthecrystalfrequency.

Page 213: ARM Assembly Language: Programming and Architecture

2.TheminimumnumberofmachinecyclesneededtoexecuteanARMinstructionis_____.

3.Findthemachinecycleforacrystalfrequencyof66MHz.

4.Assumingacrystalfrequencyof100MHz,findthetimedelayassociatedwiththeloopsectionofthefollowingDELAYsubroutine:

DELAYLDRR2,=50000000

HERENOP

NOP

NOP

NOP

NOP

SUBSR2,R2,#1

BNEHERE

MOVPC,LR

5.FindthemachinecycleforanARMifthecrystalfrequencyis50MHz.

6.Trueorfalse.IntheARM,theinstructionfetchingandexecutionaredoneatthesametime.

7.Trueorfalse.BandBLwillalwaystake2machinecycles.

8.Trueorfalse.TheBNEinstructionwillalwaystake3machinecycles.

Page 214: ARM Assembly Language: Programming and Architecture
Page 215: ARM Assembly Language: Programming and Architecture

Section4.4:ConditionalExecutionEverymicroprocessor has the conditional branch (jump) instruction based on the

statusofflagbitssuchasZandC.InstructionssuchasBEQ(branchequal,Z=1)orBNC(branchifnocarry,C=0)arecommoninallCPUs.TheARMCPUhasauniquefeaturethatwedonotseeinothermicroprocessors.InARM,theconceptofconditionalexecutionisimplementedforallinstructionandnotjustforbranchwhichmakesyouabletodecidetorunorignoreeachsingleinstructiondependingonthestatusofflagbits.Inotherwords,notonlythebranchinstructionbutalloftheARMinstructionscanbeconditional.Aswediscussed inChapters 2 and 3, theADD, SUB, and other arithmetic instruction do notaffect the flag bits in CPSR (current program status register) register unless they haveletterS in the syntax.Thedefault isnot toupdate the flags.Weoverride thedefaultbyhaving letterS in the instruction.Thesame thing is trueaboutconditional fieldofeachinstruction. If we do not add a condition after an instruction, it will be executedunconditionallybecausethedefaultisnottochecktheflagsandexecuteunconditionally.If we want an instruction to be executed only when a condition is met, we put theconditionsyntaxrightaftertheinstruction.

ThisfeatureofARMallowstheexecutionofaninstructionconditionallybasedonthestatusofZ,C,VandNflags.Todothat,theARMinstructionshavesetasidethemostsignificant 4bits of the instruction field for the conditions.SeeFigure4-11.The4bitsgivesus16possibleconditions.Table4-4showsthelistofallthe16possibleconditions.

Figure4-11:ConditionFieldinARMInstructions

Bits MnemonicExtension Meaning Flag

0000 EQ Equal Z=1

0001 NE Notequal Z=0

0010 CS/HS CarrySet/HigherorSame C=1

0011 CC/LO CarryClear/Lower C=0

0100 MI Minus/Negative N=1

0101 PL Plus N=0

0110 VS VSet(Overflow) V=1

0111 VC VClear(NoOverflow) V=0

1000 HI Higher C=1andZ=0

1001 HS LowerorSame C=1andZ=1

Page 216: ARM Assembly Language: Programming and Architecture

1010 GE GreaterthanorEqual N=V

1011 LT Lessthan N≠V

1100 GT Greaterthan Z=0andN=V

1101 LE LessthanorEqual Z=0orN≠V

1110 AL Always(unconditional)

1111 – NotValid

Table4-4:ARMConditioncodesfortheOpcodebits[31-28]

Note!

By default, all the instructions are executed unconditionally. As a result the AL(Always)suffixhasnoeffectontheinstruction.Forexample,BAL(BranchAlways)isexactlythesameasB(Branch).Thesameistrueforallinstructions.

Tomakeaninstructionconditional,simplyweputtheconditionsyntaxfromTable4-4infrontofit.Seethefollowingexamples:

MOVR1,#10;R1=10

MOVR2,#12;R2=12

CMPR2,R1;compare12with10,Z=0becausetheyarenotequal

MOVEQR4,#20;thislineisnotexecutedbecause

;theconditionEQisnotmet

Thefollowingcodeadds10toR1ifitisnotzero:

CMPR1,#0;compareR1with0

ADDNER1,R1,#10;thislineisexecutedifZ=0

;(ifinthelastCMPoperandswerenotequal)

NotethatwecanaddbothSandconditiontosyntaxofaninstruction.ItiscommontoputSafterthecondition.Seethefollowingexamples:

ADDNESR1,R1,#10

;thislineisexecutedandsettheflagsifZ=0

Oneadvantageofusingconditionalexecutionisitsavestimeintheexecutionofaninstruction by avoiding branch penalty. As we discussed earlier, each instruction goesthroughthreepipelinestagesoffetch,decode,andexecution.WhenabranchinstructionsuchasBCC(branchifCcleared)entersthepipeline,theinstructionbelowisalsofetchedintotheCPUnotknowingwhethertheCflagishighorlow.IftheC=0,thebranchwilljump to new location resulting in emptying the instruction pipeline queue which wasworking on the instruction below theBCC instruction. This penalty can be avoided by

Page 217: ARM Assembly Language: Programming and Architecture

usingconditionalexecutionfeatureoftheARM.WecanmaketheexecutionofmanyoftheARMinstructionsconditionalbysimplyaddingamnemonicextensiontotheendofit.For example, we can use ADDCS instead of ADD. The ADDCS means add if C=1.(ADDCS:AddifCarrySet).BythesametokentheADDEQmeansaddonlyifZ=1.

Example 4-13 shows two versions of the Example 4-4we covered earlier. In thenewversion,we use the conditional execution instructions. Simulate and compare bothversionstoseehowtheconditionalinstructionsareexecuted.

Example4-13

Thefollowingprogramadds0x99999999together10times.Comparethecodesyntaxtoseehowtheconditionalexecutionofthecodeisused.

Code1:(Example4-4withoutconditionalexecution)

AREAEXAMPLE4_4,CODE,READONLY

ENTRY

MOVR1,#0;clearhighword(R1=0)

MOVR0,#0;clearlowword(R0=0)

LDRR2,=0x99999999;R2=0x99999999

MOVR3,#10;counter

L1ADDSR0,R0,R2;R0=R0+R2andupdatetheflags

BCCNEXT;ifC=0,gotonextnumber

ADDR1,R1,#1;ifC=1,incrementtheupperword

NEXTSUBSR3,R3,#1;R3=R3-1andupdatetheflags

BNEL1;nextroundifz=0

HEREBHERE;stayhere

END

Code2:(Example4-4withconditionalexecution)

AREAEXAMPLE4_13,CODE,READONLY

ENTRY

MOVR1,#0;clearhighword(R1=0)

MOVR0,#0;clearlowword(R0=0)

LDRR2,=0x99999999;R2=0x99999999

MOVR3,#10;counter

Page 218: ARM Assembly Language: Programming and Architecture

L1ADDSR0,R0,R2;R0=R0+R2andupdatetheflags

ADDCSR1,R1,#1;ifCset(C=1),incrementtheupperword

NEXTSUBSR3,R3,#1;R3=R3-1andupdatetheflags

BNEL1;nextroundifz=0

HEREBHERE;stayhere

END

SeealsoExamples4-14and4-15.

Example4-14

RewritethemainpartofProgram4-1usingconditionalexecutionofARMinstructions

Solution:

MOVCOUNT,#5;COUNT=5

MOVMAX,#0;MAX=0

LDRPOINTER,=MYDATA

;POINTER=MYDATA(addressoffirstdata)

AGAIN

LDRNEXT,[POINTER]

;loadcontentsofPOINTERlocationtoNEXT

CMPMAX,NEXT;compareMAXandNEXT

MOVLOMAX,NEXT

;ifMAXislowerthanNEXTthenMAX=NEXT

ADDPOINTER,POINTER,#4

;POINTER=POINTER  +  4topointtonext

SUBSCOUNT,COUNT,#1;decrementcounter

BNEAGAIN;branchAGAINifcounterisnotzero

Example4-15

Page 219: ARM Assembly Language: Programming and Architecture

Usingrotation,writeaprogramthatcountsthenumberof1sinR0.

Solution:

AREAEXAMPLE4_15,CODE,READONLY

ENTRY

LDRR0,=0x34F37D36

MOVR1,#0;numberof1s

MOVR2,#32;counter

BEGINMOVR0,R0,RRX;RotaterightwithcarrytheR0register

ADDCSR1,R1,#1;ifC=1thenincrementR1

SUBSR2,R2,#1;decrementcounter

BNEBEGIN;ifcounterisnotequaltozerobranchBEGIN

END

ReviewQuestions

1.Trueorfalse.AlltheARMinstructionshavetheconditionalexecutionfeature.

2.HowmanybitsoftheARMinstructionaresetasidefortheconditioncodes?

3.TheADDALstandsfor……………...

4.Trueorfalse.MOVVCisavalidinstructioninARM

5.Trueorfalse.SUBEQSisavalidinstructioninARM

Page 220: ARM Assembly Language: Programming and Architecture

ProblemsSection4.1:LoopingandBranchInstructions

1.IntheARM,loopingactionusingasingleregisterislimitedto_______iterations.

2.Ifaconditionalbranchisnottaken,whatisthenextinstructiontobeexecuted?

3.Incalculatingthetargetaddressforabranch,adisplacementisaddedtothecontentsofregister_______.

4.ThemnemonicBNEstandsfor__________.

5.WhatistheadvantageofusingBXoverB?

6.Trueorfalse.ThetargetofaBNEcanbeanywhereinthe4Gwordaddressspace.

7.Trueorfalse.AllARMbranchinstructionscanbranchtoanywhereinthe4Gbyteaddressspace.

8.DissecttheBinstruction,indicatinghowmanybitsareusedfortheoperandandtheopcode,andindicatehowfaritcanbranch.

9.Trueorfalse.Allconditionalbranchesare2-byteinstructions.

10.Showcodeforanestedlooptoperformanaction10,000,000,000times.

11.Showcodeforanestedlooptoperformanaction200,000,000,000times.

12.Findthenumberoftimesthefollowingloopisperformed:

MOVR0,#0x55

MOVR2,#40

L1LDRR1,=10000000

L2EORR0,R0,#0xFF

SUBR1,R1,#1

BNEL2

SUBR2,R2,#1

BNEL1

13.IndicatethestatusofZandCafterCMPisexecutedineachofthefollowingcases.

(a)

MOVR0,#50

MOVR1,#40

CMPR0,R1

(b)

MOVR1,#0xFF

MOVR2,#0x6F

CMPR1,R2

(c)

MOVR2,#34

MOVR3,#88

CMPR2,R3

Page 221: ARM Assembly Language: Programming and Architecture

(d)

SUBR1,R1,R1

MOVR2,#0

CMPR1,R2

(e)

EORR2,R2,R2

MOVR3,#0xFF

CMPR2,R3

(f)

EORR0,R0,R0

EORR1,R1,R1

CMPR0,R1

(g)

MOVR4,#0x78

MOVR2,#0x40

CMPR4,R2

(h)

MOVR0,#0xAA

ANDR0,R0,#0x55

CMPR0,#0

14.RewriteProgram4-1tofindthelowestgradeinthatclass.

15.ThetargetaddressofaBNEisbackwardiftherelativeaddressportionofopcodeis_______(negative,positive).

16.ThetargetaddressofaBNEisforwardiftherelativeaddressportionofopcodeis_______(negative,positive).

Section4.2:CallingSubroutinewithBL

17.BLisa(n)___-byteinstruction.

18.InARM,whichregisteristhelinkerregister?

19.Trueorfalse.TheBLtargetaddresscanbeanywhereinthe4Gbyteaddressspace.

20.DescribehowwecanreturnfromasubroutineinARM.

21.InARM,whichaddressissavedwhenBLinstructionisexecuted.

Section4.3:ARMTimeDelayandInstructionPipeline

22.Findtheoscillatorfrequencyifthemachinecycle=1.25ns.

23.Findthemachinecycleifthecrystalfrequencyis200MHz.

24.Findthemachinecycleifthecrystalfrequencyis100MHz.

25.Findthemachinecycleifthecrystalfrequencyis160MHz.

26.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasanARMwithafrequencyof80MHz:

MOVR8,#200

BACKLDRR1,=400000000

Page 222: ARM Assembly Language: Programming and Architecture

HERENOP

SUBSR1,R1,#1

BNEHERE

SUBSR8,R8,#1

BNEBACK

27.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasanARMwithafrequencyof50MHz:

MOVR2,#100

BACKLDRR0,=50000000

HERENOP

NOP

SUBSR0,R0,#1

BNEHERE

SUBSR2,R2,#1

BNEBACK

28.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasaARMwithafrequencyof40MHz:

MOVR1,#200

BACKLDRR0,#20000000

HERENOP

NOP

NOP

SUBSR0,R0,#1

BNEHERE

SUBSR1,R1,#1

BNEBACK

29.FindthetimedelayforthedelaysubroutineshownbelowifthesystemhasanARMwithafrequencyof100MHz:

Page 223: ARM Assembly Language: Programming and Architecture

MOVR8,#500

BACKLDRR1,=20000

HERENOP

NOP

NOP

SUBSR1,R1,#1

BNEHERE

SUBSR8,R8,#1

BNEBACK

30.TheARMchipdoesnothavetheNOPinstruction.ExaminethelistfilegeneratedbyyourARMassemblertoseewhatinstructionisusedinplaceoftheNOP.

Section4.4:ConditionalExecution

31.WhichbitsoftheARMinstructionaresetasideforconditionexecution?

32.Trueorfalse.OnlyADDandMOVinstructionshaveconditionalexecutionfeature.

33.Trueorfalse.InARM,theconditionalexecutionisdefault.

34.WhichflagbitisexaminedbeforetheMOVEQinstructionisexecuted?

35.StatethedifferencebetweentheADDEQandADDNEinstructions.

36.StatethedifferencebetweentheBALandBinstructions.

37.StatethedifferencebetweentheSUBCCandSUBCSinstructions.

38.StatethedifferencebetweentheANDEQandANDNEinstructions.

39.Trueorfalse.ThedecisiontoexecutetheSUBCCisbasedonthestatusofZflag.

40.Trueorfalse.ThedecisiontoexecutetheADDEQisbasedonthestatusofZflag.

Page 224: ARM Assembly Language: Programming and Architecture

AnswerstoReviewQuestions1.BranchifnotEqual

2.True

3.4

4.ZflagofCPSR(statusregister)

5.4

6.TheBcanonlybranchtoanaddresslocationwithin32MBaddressspace,whiletheBXcangoanywhereinthe4GBaddressspaceofARM.

Section4.2:CallingSubroutinewithBL

1.BranchandLink

2.True

3.4

4.32

5.R14

6.Inbothofthemthetargetaddressisrelativetothevalueoftheprogramcounterandtherelativeaddresscancovermemoryspaceof–32MBto+32MBfromcurrentlocationofprogramcounter.TheBLinstructionsavestheaddressofthenextinstructionintheLRregisterbeforejumping,whiletheBinstructionjustjumpswithoutsavinganything.

Section4.3:ARMTimeDelayandInstructionPipeline

1.True

2.1

3.MC=1/66MHz=0.015ms=15ns

4.[50,000,000×(1+1+1+1+1+1+3)]×10ns=4.5seconds

5.MachineCycle=1/50MHz=0.02ms=20ns

6.True

7.False

8.False.Ittakes3cycles,onlyifitbranchestothetargetaddress.

Section4.4:ConditionalExecution

1.True

2.4bits

3.ADDalwaysregardlessofthestatusflag.

Page 225: ARM Assembly Language: Programming and Architecture

4.True

5.True

Page 226: ARM Assembly Language: Programming and Architecture
Page 227: ARM Assembly Language: Programming and Architecture
Page 228: ARM Assembly Language: Programming and Architecture

Chapter5:SignedNumbersandIEEE754FloatingPointThischapterdealswithsignednumber instructionsandoperations. InSection5.1,

we focus on the concept of signed numbers in software engineering. Signed numberarithmeticoperationsand instructionsareexplainedalongwithexamples inSection5.2.TheIEEE754floatingpointwillbeexplainedinSection5.3.

Page 229: ARM Assembly Language: Programming and Architecture
Page 230: ARM Assembly Language: Programming and Architecture

Section5.1:SignedNumbersConceptAlldataitemsusedsofarhavebeenunsignednumbers,meaningthattheentire8-

bit,16-bitor32-bitoperandwasusedforthemagnitude.Manyapplicationsrequiresigneddata.Inthissectiontheconceptofsignednumbersisdiscussed.

Conceptofsignednumbersincomputers

Ineverydaylife,numbersareusedthatcouldbepositiveornegative.Forexample,atemperatureof5degreesbelowzerocanberepresentedas-5,and20degreesabovezeroas +20. Computersmust be able to accommodate such numbers. To do that, computerscientistshavedevisedthefollowingarrangementfortherepresentationofsignedpositiveandnegativenumbers:Themostsignificantbit(MSB)issetasideforthesign(+or-)andtherestofthebitsareusedforthemagnitude.Thesignisrepresentedby0forpositive(+)numbers and 1 for negative (-) numbers. Signed byte and word representations arediscussedbelow.

Signedbyteoperands

Insignedbyteoperands,D7(MSB) is thesignandD0toD6aresetasidefor themagnitudeofthenumber.IfD7=0,theoperandispositive,andifD7=1,itisnegative.

Therangeofpositivenumbersthatcanberepresentedbytheaboveformatis0to+127.

Dec. Binary

0 00000000

+1 00000001

… ….….

+5 00000101

… ….….

+127 01111111

Dec. Binary

-0 10000000

-1 10000001

… ….….

-5 10000101

… ….….

-127 11111111

Negativenumbersusing2’scomplement

TosaveALUcircuitry,weuse2’scomplementmethod to implement thenegativenumbers. For negative numbers, D7 is 1 but the non-sign (magnitude) portion isrepresented in 2’s complement. Although the assembler does the conversion, it is stillimportant to understand how the conversion works. To convert to negative numberrepresentation(2’scomplement),followthesesteps:

Page 231: ARM Assembly Language: Programming and Architecture

1.Writethemagnitudeofthenumberin8-bitbinary(nosign).

2.Inverteachbit.

3.Add1toit.

Dec. Binary

0 00000000

+1 00000001

… ….….

+5 00000101

… ….….

+127 01111111

Dec. Binary

0 00000000

-1 11111111

… ….….

-5 11111011

… ….….

-127 10000001

-128 10000000

Examples5-1,5-2,and5-3demonstratethesethreesteps.

Example5-1

Showhowthecomputerwouldrepresent-5.

Solution:

1.000001015in8-bitbinary

2.11111010inverteachbit

3.11111011add1(0xFB)

Thisisthesignednumberrepresentationin2’scomplementfor-5.

Example5-2

Show-34hexasitisrepresentedinternally.

Solution:

Page 232: ARM Assembly Language: Programming and Architecture

1.00110100

2.11001011

3.11001100(whichis0xCC)

Example5-3

Showtherepresentationfor-12810.

Solution:

1.10000000

2.01111111

3.10000000Noticethatthisisnotnegativezero(–0).

Fromtheexamplesaboveitisclearthattherangeofbyte-sizednegativenumbersis-1to-128.Thefollowinglistsbyte-sizedsignednumberranges:

Decimal Binary Hex

-128 10000000 80

-127 10000001 81

-126 10000010 82

… ….… ..

-2 11111110 FE

-1 11111111 FF

0 00000000 00

00000001 01

+2 00000010 02

… ….…. ..

Page 233: ARM Assembly Language: Programming and Architecture

+127 01111111 7F

Halfword-sizedsignednumbers

InARMCPUahalf-wordis16bitsinlength.SettingasidetheMSB(D15)forthesignleavesatotalof15bits(D14–D0)forthemagnitude.Thisgivesarangeof–32,768(–215)to+32,767(215–1).

Ifanumber is larger than16-bit, itmustbe treatedasa32-bitwordoperand.Thefollowingshowstherangeofsignedhalf-wordoperands.Toconvertanegativenumbertoits half-word operand representation, the steps discussed in negative byte operands areused.

Decimal Binary Hex

-32,768 1000000000000000 8000

-32,767 1000000000000001 8001

-32,766 1000000000000010 8002

… ….… ..

-2 1111111111111110 FFFE

-1 1111111111111111 FFFF

0 0000000000000000 0000

+1 0000000000000001 0001

+2 0000000000000010 0002

… ….…. …

+32,766 0111111111111110 7FFE

+32,767 0111111111111111 7FFF

UsingMicrosoftWindowscalculatorforsignednumbers

AllMicrosoftWindowsoperatingsystemscomewithahandycalculator.Useit toverifythesignednumberoperationsinthissection.

Word-sizedsignednumbers

InARMCPUs,awordis32bitsinlength.SettingasidetheMSB(D31)forthesignleavesatotalof31bits(D30–D0)forthemagnitude.Thisgivesarangeof-(231)to+(231-1).

Page 234: ARM Assembly Language: Programming and Architecture

To convert a negative number to its word operand representation, the three stepsdiscussedinnegativebyteoperandsareused.SeeExample5-4.

Example5-4

Showhowthecomputerwouldrepresent-5for(a)8-bit,(b)16-bit,and(c)32-bitdatasizes.

Solution:

(a)8-bit

1.000001015in8-bitbinary

2.11111010inverteachbit

3.11111011add1(0xFB)

(b)16-bit

1.00000000000001015in16-bitbinary

2.1111111111111010inverteachbit

3.1111111111111011add1(0xFFFB)

(c)32-bit

1.000000000000000000000000000001015in32-bitbinary

2.11111111111111111111111111111010inverteachbit

3.11111111111111111111111111111011add1(0xFFFFFFFB)

UsetheWindowscalculatortoverifytheseexamples.

Ifanumberislargerthan32-bit,itmustbetreatedasa64-bitdoublewordoperandand be processed chunk by chunk the same way as unsigned numbers. The followingshowstherangeofsignedwordoperands.

Decimal Binary Hex

Page 235: ARM Assembly Language: Programming and Architecture

-2,147,483,648 10000000000000000000000000000000 80000000

-2,147,483,647 10000000000000000000000000000001 80000001

-2,147,483,646 10000000000000000000000000000010 80000002

… … …

-2 11111111111111111111111111111110 FFFFFFFE

-1 11111111111111111111111111111111 FFFFFFFF

0 00000000000000000000000000000000 00000000

+1 00000000000000000000000000000001 00000001

+2 00000000000000000000000000000010 00000002

… … …

+2,147,483,646 01111111111111111111111111111110 7FFFFFFE

+2,147,483,647 01111111111111111111111111111111 7FFFFFFF

Table5-1showsasummaryofsigneddataranges.

DataSize Bits 2n Decimal Hexadecimal

Byte 8 -27to+27-1 -128to+127 0x80–0x7F

Half-word 16 -215to+215-

1-32,768to+32,767 0x8000–0x7FFF

Word 32 -231to+231-1 -2,147,483,648to+2,147,483,647

0x80000000–0x7FFFFFFF

Table5-1:SignedDataRangeSummary

ReviewQuestions

1.Inan8-bitoperand,bit_____isusedforthesignbit,whereasina16-bitoperand,bit_____isusedforthesignbit.Repeatfor32-bitsigneddata.

2.Computethebyte-sized2’scomplementof0x16.

3.Therangeofbyte-sizedsignedoperandsis-______to+_____.Therangeofhalfword-sizedsignedoperandsis-__________to+__________.

4.Therangeofword-sizedsignedoperandsis-__________to+__________.

5.Computethe2’scomplementof0x500000.

Page 236: ARM Assembly Language: Programming and Architecture
Page 237: ARM Assembly Language: Programming and Architecture

Section5.2:SignedNumberInstructionsandOperationsIn this section we examine issues associated with signed number arithmetic

operations.WewillalsodiscusstheARMinstructionsforsignednumbersandhowtousethem. Itmust be noted that inARM theN flag bit inCSPR is the signbit.N=0 is forpositiveandN=1fornegativenumbers.

Overflowprobleminsignednumberoperations

Whenusingsignednumbers,aseriousproblemarisesthatmustbedealtwith.Thisistheoverflowproblem.TheCPUindicatestheexistenceoftheproblembyraisingtheV(oVerflow) flag,but it isup to theprogrammer to takecareof it.TheCPUunderstandsonly0sand1sandignoresthehumanconventionofpositiveandnegativenumbers.Nowwhatisanoverflow?Iftheresultofanoperationonsignednumbersistoolargefortheregister,anoverflowoccursandtheprogrammermustbenotified.LookatExample5-5.

Example5-5

Lookatthefollowingcasefor8-bitdatasize:

+96 01100000

+70 +01000110

+166 10100110

AccordingtotheCPU,thisis-90,whichiswrong.(V=1,N=1,C=0)

Intheexampleabove,+96isaddedto+70andtheresultaccordingtotheCPUis- 90. Why? The reason is that the result was more than 8 bits could handle. The 8-bitregisterscanonlycontainupto+127.ThedesignersoftheCPUcreatedtheoverflowflagspecifically for the purpose of informing the programmer that the result of the signednumberoperationiserroneous.

Whentheoverflowflagissetin8-bitoperations

In 8-bit signed number operations, V is set to 1 if either of the following twoconditionsoccurs:

1.ThereisacarryfromD6toD7butnocarryoutofD7(C=0).

2.ThereisacarryfromD7out(C=1)butnocarryfromD6toD7.

Inotherwords,theoverflowflagissetto1ifthereisacarryfromD6toD7orfromD7out,butnotboth.ThismeansthatifthereisacarrybothfromD6toD7andfromD7out,V=0.InExample5-5,sincethereisonlyacarryfromD6toD7andnocarryfromD7out,V=1.Examples5-6,5-7,and5-8givefurtherillustrationsoftheoverflowflaginsignednumberarithmetic.

Page 238: ARM Assembly Language: Programming and Architecture

Example5-6

Examinethefollowingcase:

-128 10000000

+-2 +11111110

-130 01111110 V=1,N=0(positive),C=1

AccordingtotheCPU,theresultis+126,whichiswrong.TheerrorisindicatedbythefactthatV=1.

Example5-7

Observetheresultsofthefollowing:

;assumeR3=+7(R3=0x07)

;assumeR2=+18(R2=0x12)

ADDR2,R2,R3;(R2=0x19=+25,correct)

+7 00000111

++18 +00010010

+25 00011001

V=0,C=0,andN=0(positive).

Example5-8

Observetheresultsofthefollowing:

-2 11111110

+-5 11111011

-7 11111001

V=0,C=0,andN=1(negative);theresultiscorrectsinceV=0.

Page 239: ARM Assembly Language: Programming and Architecture

Overflowflagin16-bitoperations

Ina16-bitoperation,Vissetto1ineitheroftwocases:

1.ThereisacarryfromD14toD15butnocarryoutofD15(C=0).

2.ThereisacarryfromD15out(C=1)butnocarryfromD14toD15.

Againtheoverflowflagislow(notset)ifthereisacarryfrombothD14toD15andfromD15out.TheVissetto1onlywhenthereisacarryfromD14toD15orfromD15out,butnotfromboth.SeeExamples5-9and5-10.

Example5-9

Observetheresultsinthefollowing16-bithexnumbers:

+6E2F 0110111000101111

++13D4 0001001111010100

+8203 1000001000000011

=–0x7DFD=–32,253incorrect!

V=1,C=0,N=1

Example5-10

Observetheresultsinthefollowing16-bithexnumbers:

+542F 0101010000101111

++12E0 +0001001011100000

+670F 0110011100001111=+0x670F=+26,383(correctanswer);

V=0,C=0,N=0

Overflowflagin32-bitoperations

Ina32-bitoperation,Vissetto1ineitheroftwocases:

1.ThereisacarryfromD30toD31butnocarryoutofD31(C=0).

2.ThereisacarryfromD31out(C=1)butnocarryfromD30toD31.

Againtheoverflowflagislow(notset)ifthereisacarryfrombothD30toD31andfromD31out.TheVissetto1onlywhenthereisacarryfromD30toD31orfromD31

Page 240: ARM Assembly Language: Programming and Architecture

out,butnotfromboth.SeeExamples5-11and5-12.

Example5-11

Observetheresultsinthefollowing32-bithexnumbers:

+6E2F356F 01101110001011110011010101101111

++13D49530 +00010011110101001001010100110000

+8203CA9F 10000010000000111100101010011111 =–0x7DFC3561

incorrect!V=1,C=0,N=1

Example5-12

Observetheresultsinthefollowing32-bithexnumbers:

+542F356F 01010100001011110011010101101111

++12E09530 00010010111000001001010100110000

+670FCA9F 01100111000011111100101010011111 =+670FCA9F

correctanswer;V=0,C=0,N=0

Signextensionandavoidingerroneousresultsinsignednumberoperations

To avoid the problems associated with signed number operations, one can sign-extendtheoperand.Signextensioncopiesthesignbit(D7)ofthelowerbyteofaregisterintotheupperbitsofthe32-bitregister,orcopiesthesignbitofa16-bitregisterinto32-bit.TheLDRSB(loadregistersignedbyte)instructionloadsintotheregisterabytefrommemoryandsignextendstheD7totheentire32-bitregister.TheLDRSH(loadregistersignedhalf-word) instruction loads into the register a half-word frommemory and signextendstheD15totheentire32-bitregister.Theyworkasfollows:

LDRSB loads into the destination register a byte frommemory and sign extends(copyD7,thesignflag)toall32bits.ThisisillustratedinFigure5-1.

Figure5-1:SignExtendingaByte

Lookatthefollowingexample:

Page 241: ARM Assembly Language: Programming and Architecture

;assumelocation0x80000has+96=01100000andR1=0x80000

LDRSBR0,[R1];nowR0=00000000000000000000000001100000

;assumelocation0x80000contains-2=11111110andR2=0x80000

LDRSBR4,[R2];nowR4=11111111111111111111111111111110

Ascanbeseenintheaboveexamples,LDRSBdoesnotalterthelower8bits.Thesignof8bitsiscopiedtotherestofthe32-bitregister.

LDRSHloadsthedestinationregisterwitha16-bitsignednumberandsign-extendsto the32-bit register. It copiesD15ofRd toallbitsof theRd register.This isused forsignedhalf-wordoperandandisillustratedinFigure5-2.

Figure5-2:SignExtendingaHalf-word

Lookatthefollowingexample:

;assume0x80000contains+260=0000000100000100andR1=0x80000

LDRSHR0,[R1];R0=00000000000000000000000100000100

Anotherexample:

;assumelocation0x20000has-327660=0x8002andR2=0x20000

LDRSHR1,[R2];R1=FFFF8002

Ascanbeseenintheaboveexamples,LDRSHdoesnotalterthelower16bits.Thesignof16bitsiscopiedtotherestofthe32-bitregister.Howcantheseinstructionshelpcorrect the overflow error?To answer that question,Example 5-13 showsExample 5-5rewrittentocorrecttheoverflowproblem.

Example5-13

WriteaprogramforExample5-5tohandletheoverflowproblem.

Solution:

AREAEXAMPLE5_13,CODE,READONLY

ENTRY

LDRR1,=DATA1

LDRR2,=DATA2

LDRR3,=RESULT

LDRSBR4,[R1];R4=+96

LDRSBR5,[R2];R5=+70

Page 242: ARM Assembly Language: Programming and Architecture

ADDR4,R4,R5;R4=R4+R5=96+70=+166

STRR4,[R3];Store+166inlocationRESULT

HLBHL

DATA1DCB+96

DATA2DCB+70

AREAVARIABLES,DATA,READWRITE;ThefollowingisstoredinRAM

RESULTDCW0

END

ThefollowingisananalysisofthevaluesinExample5-13.Eachissign-extendedandthenaddedasfollows:

Sign Binarynumbers Decimal

0 0000000000000000000000001100000 +96aftersignext.

0 0000000000000000000000001000110 +70aftersignext.

0 0000000000000000000000010100110 +166

Asarule,ifthepossibilityofoverflowexists,allbyte-sizedsignednumbersshouldbesign-extendedintoaword,andsimilarly,allhalfword-sizedsignedoperandsshouldbesign-extended to a word before they are processed. This is shown in Program 5-1.Program5-1findstotalsumofagroupofsignednumberdata.

Program5-1

;Thisprogramcalculatesthesumofsignednumbers

AREAPROG5_1,CODE,READONLY

ENTRY

LDRR0,=SIGN_DAT

MOVR3,#9

MOVR2,#0

LOOPLDRSBR1,[R0]

;LoadintoR1andsignextendit.

Page 243: ARM Assembly Language: Programming and Architecture

ADDR2,R2,R1;R2=R2+R1

ADDR0,R0,#1;pointtonext

SUBSR3,R3,#1;decrementcounter

BNELOOP

LDRR0,=SUM

STRR2,[R0];StoreR2inlocationSUM

HEREBHERE

SIGN_DATDCB+13,-10,+19,+14,-18,-9,+12,-19,+16

AREAVARIABLES,DATA,READWRITE

SUMDCD0

END

Signednumbermultiplication

Signed number multiplication is similar in its operation to the unsignedmultiplication described in Chapter 3. The only difference between them is that theoperands in signed number operations can be positive or negative; therefore, the resultmust indicate the sign. InARMwehaveSMULL(signedmultiply long)butnoSMUL(signmultiply).Table5-2summarizessignednumbermultiplication;itissimilartoTable3-3inChapter3.SeeExamples5-14and5-15.

Multiplication Operand1 Operand2 Result

word×word Rm Rs RdHi=upper32-bit,RdLo=lower32-bit

Note:UsingSMULL(signedmultiplylong)forword×wordsmultiplicationprovidesthe64-bitresultinRdLoandRdHiregister.Thisisusedfor32-bit×32-bitnumbersinwhichresultcangobeyond0xFFFFFFFF.

Table5-2:SignedMultiplication(SMULLRdLo,RdHi,Rm,Rs)Summary

Example5-14

Observetheresultsofthefollowingmultiplicationofsignednumbers:

LDRR1,=-3500;R1=-3500(0xFFFFF254)

LDRR0,=-100;R0=-100(0xFFFFFF9C)

SMULLR2,R3,R0,R1

Solution:

Page 244: ARM Assembly Language: Programming and Architecture

-3500×-100=350,000=55730inhex.AfterexecutingtheaboveprogramR2andR3willcontain0x55730and00000000,respectively.

Example5-15

ThefollowingprogramissimilartoExample5-14.But,insteadofSMULL,theUMULLinstructionisused.Observetheresultsofthefollowingmultiplication:

LDRR1,=-3500;R1=-3500(0xFFFFF254)

MOVR0,#-100;R0=-100(0xFFFFFF9C)

UMULLR2,R3,R0,R1

Solution:

0xFFFFF254×0xFFFFFF9C=0xFFFFF1F000055730.Thus,R2andR3willcontain0x00055730and0xFFFFF1F0,respectively.Asyoucansee,theresultsoftheprogramsarecompletelydifferent.InthepreviousprogramtheSMULLinstructionconsiderstheoperandssignednumbersandtheresultoftwonegativenumbersbecomespositive.Asaresult,thesignbitbecomeszero,butinthisexampletheoperandsareconsideredasunsignednumbers.

Signednumbercomparison

InChapter4wesawthat theCMPinstructionaffects theZandCflags;usingtheflagswecomparedunsignednumbers.ThisinstructionaffectstheNandVflags,aswell;We can use flags Z, V, and N to compare signed numbers. The Z flag shows if thenumbersareequalornot.WhenthenumbersareequaltheZflagissettoone.NandVflagsshowiftheleftoperandisbiggerthantherightoperandornot.WhenNandVhavethesamevalue,thefirstoperandhasagreatervalue.

Insummary,afterexecutingtheinstructionCMPRn,Op2 theflagsarechangedasfollows:

Op2>RnV=N

Op2=RnZ=1

Op2<RnN≠V

Table 5-3 lists the branch instructions which check the Z, V, and N flags. TheinstructionscanbeusedtogetherwiththeCMPinstructiontocomparesignednumbers.

Page 245: ARM Assembly Language: Programming and Architecture

Instruction Action

BEQ branchequal BranchifZ=1

BNE Branchnotequal BranchifZ=0

BMI Branchminus(branchnegative) BranchifN=1

BPL Branchplus(branchpositive) BranchifN=0

BVS BranchifVset(branchoverflow) BranchifV=1

BVC BranchifVclear(branchifnooverflow) BranchifV=0

BGE Branchgreaterthanorequal BranchifN=V

BLT Branchlessthan BranchifN≠V

BGT Branchgreaterthan BranchifZ=0andN=V

BLE Branchlessthanorequal BranchifZ=1orN≠V

Table5-3:ARMConditionalBranch(Jump)InstructionsforSignedData

Program5-2findsthelowestnumberamongalistofnumbers.

Program5-2

;Findingthelowestofsignednumbers

AREAPROG5_2,CODE,READONLY

ENTRY

LDRR0,=SIGN_DAT

MOVR3,#8

LDRSBR2,[R0]

;bringintoR2thefirstsignnumberandsignextendit

ADDR0,R0,#1;pointtonext

BEGINLDRSBR1,[R0];R1=contentsofloc.pointedtobyR0

CMPR1,R2;compareR1andR2

BGENEXT

;branchtoNEXTifR1isgreaterthanorequaltoR2

MOVR2,R1

;R2=R1(usethenewnumberforcomparison)

NEXTADDR0,R0,#1;pointtonext

Page 246: ARM Assembly Language: Programming and Architecture

SUBSR3,R3,#1;decrementcounter

BNEBEGIN;ifR3isnotzerobranchBEGIN

LDRR0,=LOWEST;R0=addressofLOWEST

STRR2,[R0];storeR2inlocationSUM

HEREBHERE

SIGN_DATDCB+13,-10,+19,+14,-18,-9,+12,-19,+16

AREAVARIABLES,DATA,READWRITE

LOWESTDCD0

END

;Noticethatweusethefirstsignnumberasbasisfor

;comparisonandkeepcomparingitwithothernumbers.

;Anytimeanyofthenumberissmallerweuseitasbasis

;forcomparison.

CMNinstruction

CMNRn,Op2

In ARM we have two compare instructions: CMP and CMN. While the CMPinstructionsetstheflagsbysubtractingthesourceoperandfromthedestination,theCMNsetstheflagsbyaddingsourcetodestination.AsaresultCMNcomparesthedestinationoperandwiththenegativeofthesourceoperand:

destination>(-1×source)V=N

destination=(-1×source)Z=1

destination<(-1×source)N=negationofV

When the source operand is an immediate value, the instructions can be usedinterchangeably.Example5-16isanexampleofusingtheCMNinstruction.

Example5-16

AssumingR5hasapositivevalue,writeaprogramthatfindsitsnegativematchinanarrayofdata(OUR_DATA).

Solution:

Page 247: ARM Assembly Language: Programming and Architecture

AREAEXAMPLE5_16,CODE,READONLY

ENTRY

MOVR5,#13

LDRR0,=OUR_DATA

MOVR3,#9

BEGIN

LDRSBR1,[R0];R1=contentsofloc.pointedtobyR0

;(signextended)

CMNR1,R5;compareR1andnegativeofR5

BEQFOUND;branchifR1isequaltonegativeofR5

ADDSR0,R0,#1;incrementpointer

SUBSR3,R3,#1;decrementcounter

BNEBEGIN;ifR3isnotzerobranchBEGIN

NOT_FOUNDBNOT_FOUND

FOUNDBFOUND

OUR_DATADCB+13,-10,-13,+14,-18,-9,+12,-19,+16

END

IntheaboveprogramR5isinitializedwith13.Therefore,itfinishessearchingwhenitgetsto– 13.

Arithmeticshift

AswasdiscussedinChapter3,therearetwotypesofshifts:logicalandarithmetic.Logical shift, which is used for unsigned numbers, was discussed in Chapter 3. Thearithmetic shift is used for signednumbers. It is basically the same as the logical shift,exceptthatthesignbitiscopiedtotheshiftedbits.

ASR(arithmeticshiftright)

MOVRn,Op2,ASRcount

Page 248: ARM Assembly Language: Programming and Architecture

AsthebitsofthesourceareshiftedtotherightintoC,theemptybitsarefilledwiththesignbit.OnecanusetheASRinstructiontodivideasignednumberby2,asshownbelow:

MOVR0,#-10;R0=-10=0xFFFFFFF6

MOVR3,R0,ASR#1;R0isarithmeticshiftedrightonce

;R3=0xFFFFFFFB=-5

ReviewQuestions

1.Explainthedifferencebetweenanoverflowandacarry.

2.ExplainthepurposeoftheLDRSBandLDRSHinstructions.DemonstratetheeffectofLDRSBonR0=0xF6.DemonstratetheeffectofLDRSHonR1=0x124C.

3.Theinstructionforsignedmultiplicationis________.

4.Foreachofthefollowinginstructions,indicatetheflagconditionnecessaryforeachbranchtooccur:(a)BLE(b)BGT

Page 249: ARM Assembly Language: Programming and Architecture
Page 250: ARM Assembly Language: Programming and Architecture

Section5.3:IEEE754Floating-PointStandardsInthissectionwestudytheIEEEstandardforfloating-pointnumbers.

IEEEfloating-pointstandard

Uptothelate1970s,realnumbers(numberswithdecimalpoints)wererepresenteddifferentlyinbinaryformbydifferentcomputermanufacturers.Thismademanyprogramsincompatible for different machines. In 1980, an IEEE committee standardized thefloating-point data representation of real numbers. This standard, much of which wascontributedbyIntelbasedonthe8087mathcoprocessor,recognizedtheneedfordifferentdegreesofprecisionbydifferentapplications;therefore,itestablishedsingleprecisionanddoubleprecision.Sincealmostallsoftwareandhardwarecompaniesnowabidebythesestandards,eachoneisexplainedthoroughly.

IEEE754single-precisionfloating-pointnumbers

IEEEsingle-precision floating-pointnumbersuseonly32bitsofdata to representany real number range 2128 to 2-126, for both positive and negative numbers. Thistranslatesapproximatelytoarangeof1.2×10-38to3.4×10+38indecimalnumbers,againforbothpositiveandnegativevalues.Insomemathcoprocessorterminology,thesesingle-precision32-bitfloating-pointnumbersarereferredtoasshortreal.Assignmentofthe32bitsinthesingle-precisionformatisshowninFigure5-3.

Figure5-3:IEEE754Single-precisionFloating-pointNumbers

Tomakethehardwaredesignofthemathprocessorsmucheasierandlesstransistorconsuming, the exponent part is added to a constant of 0x7F (127 decimal). This isreferred to as a biased exponent. Conversion from real to floating point involves thefollowingsteps.

1.Therealnumberisconvertedtoitsbinaryform.

2.Thebinarynumberisrepresentedinscientificform:1.xxxxEyyyy

3.Bit31iseither0forpositiveor1fornegative.

4.Theexponentportion,yyyy,isaddedto7Ftogetthebiasedexponent,whichisplacedinbits23to30.

5.Thesignificand,xxxx,isplacedinbits22to0.

Examples5-17,5-18,and5-19demonstratethisprocess.

Example5-17

Convert9.7510tosingle-precision(shortreal)floatingpoint.

Solution:

Page 251: ARM Assembly Language: Programming and Architecture

decimal9.75=binary1001.11=scientificbinary1.00111E3

Signbit31is0forpositive.

Exponentbits30to23are10000010(3+7F=82H)afterbiasing.

Significandbits22to0are001110000000000000000…00.

Puttingitalltogethergivesthefollowingbinaryform,underwhichiswrittenthehexform:

01000001000111000000000000000000

411C0000

ThiscanbeverifiedbyusinganassemblersuchasKeil.

Example5-18

Convert0.07812510toshortIEEEfloating-pointstandardrealFP(singleprecision).

Solution:

decimal0.078125=binary0.000101=scientificbinary1.01E–4

Signbit31is0forpositive.

Exponentbits30–23are01111011(–4+7F=7B)afterbiasing.

Significandbits22–0are01000000….000.

Thisnumberwillberepresentedinbinaryandhexas

00111101101000000000000000000000

3DA00000

Page 252: ARM Assembly Language: Programming and Architecture

Example5-19

Convert–96.2710tosingle-precisionFPformat.

Solution:

decimal96.27=binary1100000.01000101000111101=

scientificbinary1.10000001000101000111101E6

Signbit31is1fornegative.

Exponentbits30–23are10000101(6+7F=85H)afterbiasing,

Fractionbits22–0are10000001000101000111101,

Thefinalforminbinaryandhexis

11000010110000001000101000111101

C2C08A3D

Itmustbenotedthatconversionofthedecimalportion0.27tobinarycanbecontinuedbeyondthepointshownabove,butbecausethefractionpartofthesingleprecisionislimitedto23bits,thiswasallthatwasshown.Forthatreason,double-precisionFPnumbersareusedinsomeapplicationstoachieveahigherdegreeofaccuracy.

IEEE754double-precisionfloating-pointnumbers

Double-precisionFP(calledlongrealbyIntel)canrepresentnumbersintherange2.3×10-308to1.7×10308,bothpositiveandnegative.Atotalof52bits(bits0to51)areforthesignificand,11bits(bits52to62)arefortheexponent,andfinally,bit63isforthesign.Theconversionprocess is the sameas for singleprecision in that the realnumbermustfirstberepresentedas1.xxxxxxxEYYYY,thenYYYYisaddedto3FFtogetthebiasedexponent.SeeFigure5-4andExample5-20.

Figure5-4:IEEE754Double-precisionFloating-pointNumbers

Example5-20

Convert152.187510todouble-precisionFP.

Page 253: ARM Assembly Language: Programming and Architecture

Solution:

decimal152.1875=binary10011000.0011=

scientificbinary1.00110000011E7

Bit63is0forpositive.

Exponentbits62–53are10000000110(7+3FF=406)afterbiasing.

Fractionbits52–0are00110000011000…..000.

0100 0000 0110 0011 0000 0110 0000 0000 0000 … 0000

4 0 6 3 0 6 0 0 0 … 0

Thisexamplewillbeverifiedbyanassemblerinthenextsection.

MathcoprocessorinARM

Usingageneral-purposemicroprocessorsuchastheARMtoperformmathematicalfunctionssuchasfloatingpointcalculation,log,sine,andothersisverytimeconsuming,notonlyfortheCPUbutalsoforprogrammerswritingsuchprograms.Intheabsenceofamath coprocessor, programmers must write subroutines using ARM instructions formathematicalfunctions.Althoughsomeofthesesubroutinesarealreadywritten,nomatterhow good the subroutine, its CPU run time will still be quite long. To acceleratecomplicated mathematical and floating point calculation, math or floating pointcoprocessorsareused.Asimplervariationofmathcoprocessoriscalledfloatingpointunit(FPU) and someARMchips comewith on-chip FPU. InKeil there is a library namedfplibtodofloatingpointcalculation.

ReviewQuestions

1.Single-precisionIEEEFPstandarduses_________bitstorepresentdata.

2.Double-precisionIEEEFPstandarduses________bitstorepresentdata.

3.TogetthebiasedexponentportionofIEEEsingle-precisionfloating-pointdataweadd________.

4.TogetthebiasedexponentportionofIEEEdouble-precisionfloating-pointdataweadd________.

5.Trueorfalse.Intheabsenceofamathprocessor,thegeneral-purposeprocessormustperformallmathcalculations.

6.Trueorfalse.AlloftheARMchipscomewiththeFPU.

Page 254: ARM Assembly Language: Programming and Architecture

Problems:Section5.1:SignedNumbersConcept

1.Showhowthe32-bitcomputerswouldrepresentthefollowingnumbersandverifyeachwithacalculator.

(a)-23 (b)+12 (c)-0x28

(d)+0x6F (e)-128 (f)+127

(g)+365 (h)-32,767

2.Showhowthe32-bitcomputerswouldrepresentthefollowingnumbersandverifyeachwithacalculator.

(a)-230 (b)+1200 (c)-0x28F

(d)+0x6FF

Section5.2:SignedNumberInstructionsandOperations

3.FindtheoverflowflagforeachcaseandverifytheresultusinganARMIDE.Dobyte-sizedcalculationonthem.

(a)(+15)+(-12) (b)(-123)+(-127) (c)(+0x25)+(+34)

(d)(-127)+(+127) (e)(+100)+(-100)

4.Sign-extendthefollowingandwritesimpleprogramsinusingARMIDEtoverifythem.

(a)-122 (b)-0x999 (c)+0x17

(d)+127 (e)-129

5.ModifyProgram5-2tofindthehighesttemperature.Verifyyourprogram.

Section5.3:IEEE754Floating-PointStandards

6.Whatisthedisadvantageofusingageneral-purposeprocessortoperformmathoperations?

7.ShowthebitassignmentoftheIEEEsingle-precisionstandard.

8.Convert(byhandcalculation)eachofthefollowingrealnumberstoIEEEsingle-precisionstandard.

(a)15.575(b)89.125(c)–1022.543(d)–0.00075

Page 255: ARM Assembly Language: Programming and Architecture

9.ShowthebitassignmentoftheIEEEdouble-precisionstandard.

10.Insingle-precisionFP(floatingpoint),thebiasedexponentiscalculatedbyadding________tothe_________portionofascientificbinarynumber.

11.Indouble-precisionFP,thebiasedexponentiscalculatedbyadding________tothe_________portionofascientificbinarynumber.

12.Convertthefollowingtodouble-precisionFP.

(a)12.9375(b)98.8125

Page 256: ARM Assembly Language: Programming and Architecture

AnswerstoReviewQuestionsSection5.1

1.D7,D15,andD31for32-bitsigneddata.

2.0x16=00010110;its2’scomplementis:11101010

3.–128to+127;–32,768to+32,767(decimal)

4.-2,147,483,648to+2,147,483,647

5.0x500000=010100000000000000000000;

Its2’scomplementis:101100000000000000000000

Section5.2

1.Cflagisraisedwhenthereisacarryoutoftheresult,butVflagisraisedwhenthereisacarrytothesignbitandnocarryoutofthesignbitorwhenthereisnocarrytothesignbitandthereisacarryoutofthesignbit.CflagisusedtoindicateoverflowinunsignedarithmeticoperationswhileVflagisinvolvedinsignedoperations.

2.TheLDRSBinstructionsignextendsthesignbitofabyteintoaword;theLDRSHinstructionsignextendsthesignbitofahalf-wordintoaword.

In0xF6thesignbitis1;thus,itissign-extendedinto0xFFFFFFF6

0x124Csign-extendedintoR1wouldbeR0=0x0000124C.

3.SMULL

(a)BLEwilljumpifVistheinverseofN,orifZ=1.

(b)BGTwilljumpifVequalsN,andifZ=0.

Section5.3

1.32

2.64

3.0x7F

4.0x3FF

5.True

6.False

Page 257: ARM Assembly Language: Programming and Architecture
Page 258: ARM Assembly Language: Programming and Architecture
Page 259: ARM Assembly Language: Programming and Architecture

Chapter6:ARMMemoryMap,MemoryAccess,andStackThis chapter discusses the issue of memory access and the stack. Section 6.1 is

dedicatedtoARMmemorymapandmemoryaccess.Wewillalsoexplaintheconceptsofalign,non-align,littleendian,andbigendiandataaccess.InSection6.2,weexaminetheuseofthestackinARM.Wediscussthebit-addressable(bit-band)SRAMandperipheralsinSection6.3.AdvancedindexedaddressingmodeisexplainedinSection6.4.InSection6.5,wedescribethePCrelativeaddressingmodeanditsuse in implementingADRandLDR.

Page 260: ARM Assembly Language: Programming and Architecture
Page 261: ARM Assembly Language: Programming and Architecture

Section6.1:ARMMemoryMapandMemoryAccessTheARMCPUuses32-bitaddressestoaccessmemoryandperipherals.Thisgives

us amaximum of 4GB (gigabytes) ofmemory space. This 4GB of directly accessiblememoryspacehasaddresses0x00000000to0xFFFFFFFF,meaningeachbyteisassignedauniqueaddress(ARMisabyte-addressableCPU).SeeFigure6-1.

Figure6-1:MemoryByteAddressinginARM

The4GBofmemoryspaceisdividedintothreeregions:code,data,andperipheraldevices.SeeTable6-1.

Addressrange Name Description

0x00000000-0x1FFFFFFF Code ROMorFlashmemory

0x20000000-0x3FFFFFFF SRAM SRAMregionusedforon-chipRAM

0x40000000-0x5FFFFFFF Peripheral On-chipperipheraladdressspace

0x60000000-0x9FFFFFFF RAM Memory,cachesupport

0xA0000000-0xDFFFFFFF Device Sharedandnon-shareddevicespace

0xE0000000-0xFFFFFFFF System PPBandvendorsystemperipherals

Table6-1:MemorySpaceAllocationinARMCortex

In other words, the ARM uses the memory mapped I/O. For the ARMmicrocontrollers,generallytheFlashROMisusedforprogramcode,SRAMforscratchpad data, and memory-mapped I/O ports for peripherals. While there is an absolutestandard for the ARM instructions that all licensees of ARMmust follow, there is no

Page 262: ARM Assembly Language: Programming and Architecture

standardforexactlocationsandtypesofmemoryandperipherals.Thereforethelicenseescanimplementthememoryandperipheralsastheychoose.ForthisreasontheamountandtheaddresslocationsofmemoryusedbyFlashROM,SRAM,andI/Operipheralsvariesamong the familymembers and chipmanufacturers. TheARMmanufacturer datasheetshouldgiveyouthedetailsofthememorymapforbothon-chipandoff-chipmemoryandperipherals.MakesuretoexaminethememorymapofagivenARMchipbeforeyoustarttoprogramit.FromTable6-1,noticethatsomeofthememoryaddressesaresetasideforthe external (off-chip) memory and peripherals. At the time of this writing, no ARMmanufacturerhaspopulatedtheentire4GBofmemoryspacewithon-chipROM,RAM,andI/Operipherals.

ARM-basedMotherboards

InARM systemsforMicrosoftWindows,Unix,andAndroidoperatingsystemstheARMmotherboardsuseDRAMfortheRAMmemory,justlikethex86andPentiumPCs.AstheARMCPUispushedintothelaptop,desktop,andtabletsPCs,andthehighendofembedded systems products such as routers,wewill see the use ofDRAM as primarymemory to store both the operating systems and the applications. In such systems, theFlashmemorywill beholding thePOST (poweron self test),BIOS (basic Input/outputsystems) andboot programs. Just likex86 system, such systemshavebothon-chip andoff-chiphighspeedSRAMforcache.Currently,thereareARMchipsonthemarketwithsome on-chip Flash ROM, SRAM, and memory decoding circuitry for connection toexternal(off-chip)memory.Thisoff-chipmemorycanbeSRAM,Flash,orDRAM.Thedatasheet for suchARMchipsprovide thedetailsofmemorymap forbothon-chipandoff-chipmemories.Next,weexaminetheARMbusesandmemoryaccess.

Figure6-2:MemoryConnectionBlockDiagraminARM

D31–D0Databus

Page 263: ARM Assembly Language: Programming and Architecture

The32-bitdatabusoftheARMprovidesthe32-bitdatapathtotheon-chipandoff-chipmemoryandperipherals.Theyaregroupedinto8-bitdatachunks,D0–D7,D8–D15,D16–D23,andD24–D31.

A31–A0

These signalsprovide the32-bit addresspath to theon-chipandoff-chipmemoryandperipherals.SincetheARMsupportsdataaccessofbyte(8bits),halfword(16bits),and word (32 bits), the buses must be able to access any of the 4 banks of memoryconnectedtothe32-bitdatabus.TheA0andA1areusedtoselectoneofthe4bytesoftheD31-D0databus.SeeFigure6-3.

Figure6-3:MemoryBlockDiagraminARM

AHBandAPBbuses

TheARMCPUisconnected to theon-chipmemoryviaanAHB(advancedhigh-performancebus).TheAHBisusednotonlyforconnectiontoon-chipROMandRAM,itisalsoused forconnection to someof thehighspeed I/Os (input/output) suchasGPIO(general purpose I/O). ARM chip also has the APB (advanced peripherals bus) busdedicated for communication with the on-chip peripherals such as timers, ADC, serialCOM,SPI, I2C,andotherperipheralports.Whileweneed the32-bitdatabusbetweenCPUand thememory (RAMandROM),many slower peripherals are 8 or 16 bits andthere is noneed for entire fast 32-bit databuspathway.For this reason,ARMuses theAHB-to-APBbridgetoaccessthesloweron-chipdevicessuchasperipherals.Alsosinceperipheralsdonotneedahighspeedbus,abridgebetweenAHBandAPBallowsgoingfromthehigherspeedbusofAHBtolowerspeedbusofperipherals.TheAHBbusallowsasingle-cycleaccess.SeeFigure6-4forAHB-to-APBbridge.

Page 264: ARM Assembly Language: Programming and Architecture

Figure6-4:AHBandAPBinARM

Buscycletime

ToaccessadevicesuchasmemoryorI/O,theCPUprovidesafixedamountoftimecalledabuscycletime.Duringthisbuscycletime,thereadorwriteoperationofmemoryorI/Omustbecompleted.ThebuscycletimeusedforaccessingmemoryisoftenreferredtoasMC(memorycycle)time.ThetimefromwhentheCPUprovidestheaddressesatitsaddresspinstowhenthedataisexpectedatitsdatapinsiscalledmemoryreadcycletime.Whileforon-chipmemorythecycletimecanbe1clock,intheoff-chipmemorythecycletimeisoften2clocks.IfmemoryisslowanditsaccesstimedoesnotmatchtheMCtimeoftheCPU,extratimecanberequestedfromtheCPUtoextendthereadcycletime.Thisextratimeiscalledawaitstate(WS).Inthe1980s,theclockspeedformemorycycletimewasthesameastheCPU’sclockspeed.Forexample,inthe20MHzprocessors,thebuseswereworkingatthesamespeedof20MHz.Thisresultedin2×50ns=100nsforthememorycycletime(1/20MHz=50ns).SeeExample6-1.

Example6-1

Calculatethememorycycletimeofa50-MHzbussystemwith

(a)0WS,

(b)1WS,and

(c)2WS.

Assumethatthebuscycletimeforoff-chipmemoryaccessis2clocks.

Solution:

1/50MHz=20nsisthebusclockperiod.Sincethebuscycletimeofzerowaitstatesis2clocks,wehave:

Memorycycletimewith0WS2×20=40ns

Memorycycletimewith1WS40+20=60ns

Page 265: ARM Assembly Language: Programming and Architecture

Memorycycletimewith2WS40+2×20=80ns

It ispreferred that all bus activitiesbe completedwith0WS.However, if the readandwriteoperationscannotbecompletedwith0WS,werequestanextensionofthebuscycletime.ThisextensionisintheformofanintegernumberofWS.Thatis,wecanhave1,2,3,andsoonWS,butnot1.25WS.

WhentheCPU’sspeedwasunder100MHz,thebusspeedwascomparabletotheCPU speed. In the 1990s theCPU speed exploded to 1GHz (gigahertz)while the busspeedmaxedoutataround200MHz.ThegapbetweentheCPUspeedandthebusspeedisoneofthebiggestproblemsinthedesignofhigh-performancesystems. ToavoidtheuseoftoomanywaitstatesininterfacingmemorytoCPU,cachememoryandotherhigh-speedDRAMsareused.ThesearediscussedinChapters9and10.

Busbandwidth

The rate of data transfer is generally called bus bandwidth. In other words, busbandwidth is a measure of how fast buses transfer information between the CPU andmemoryorperipherals.Thewiderthedatabus, thehigherthebusbandwidth.However,theadvantageofthewiderexternaldatabuscomesatthecostofincreasingthediesizeforsystem on-chip (SOC) or the printed circuit board size for off-chipmemory. Now youmightaskwhyweshouldcarehowfastbusestransferinformationbetweentheCPUandoutside, as long as theCPU isworking as fast as it can. The problem is that theCPUcannotprocess information that it doesnothave. Inotherwords, the speedof theCPUmustbematchedwiththehigherbusbandwidth;otherwise,thereisnouseforafastCPU.ThisislikedrivingaPorscheorFerrari infirstgear; it isaterribleunderusageofCPUpower.Bus bandwidth ismeasured inMB (megabytes) per second and is calculated asfollows:

busbandwidth=(1/buscycletime)×buswidthinbytes

In the above formula, bus cycle time can be for bothmemory and I/O since theARMusesthememorymappedI/O.Example6-2clarifiestheconceptofbusbandwidth.As can be seen from Example 6-2, there are twoways to increase the bus bandwidth:Eitheruseawiderdatabusorshortenthebuscycletime(ordoboth).Thatisexactlywhatmanyprocessorshavedone.Again, itmustbenotedthatalthoughtheprocessor’sspeedcango to1GHzorhigher, thebusspeed foroff-chipmemory is limited toaround200MHz.Thereasonforthisisthatthesignalsbecometoonoisyforthecircuitboardiftheyareabove100MHz.

Example6-2

CalculatememorybusbandwidthforthefollowingCPUifthebusspeedis100MHz.

Page 266: ARM Assembly Language: Programming and Architecture

(a)ARMThumbwith0WSand1WS(16-bitdatabus)

(b)ARMwith0WSand1WS(32-bitdatabus)

Assumethatthebuscycletimeforoff-chipmemoryaccessis2clocks.

Solution:

Thememorycycletimeforbothis2clocks,withzerowaitstates.Withthe100MHzbusspeedwehaveabusclockof1/100MHz=10ns.

(a)Busbandwidth=(1/(2×10ns))×2bytes=100Mbytes/second(MB/s)

With1waitstate,thememorycyclebecomes3clockcycles

3×10=30nsandthememorybusbandwidthis=(1/30ns)×2bytes=66.6MB/s

(b)Busbandwidth=(1/(2×10ns))×4bytes=200MB/s

With1waitstate,thememorycyclebecomes3clockcycles

3×10=30nsandthememorybusbandwidthis=(1/30ns)×4bytes=126.6MB/s

Fromtheaboveitcanbeseenthatthetwofactorsinfluencingbusbandwidthare:

1.Theread/writecycletimeoftheCPU

2.Thewidthofthedatabus

Codememoryregion

The 4 GB of ARMmemory space is organized as 1G × 32 bits since the ARMinstructionsare32-bit.TheinternaldatabusoftheARMis32-bit,allowingthetransferofone instruction into theCPUeveryclockcycle.This isoneof thebenefitsof theRISCfixedinstructionsize.Thefetchingofaninstructionineveryclockcyclecanworkonlyifthecodeiswordaligned,meaningeachinstructionisplacedatanaddresslocationendingwith0,4,8,orC.Example6-3showstheplacementofcodeinARMmemory.Noticethatthecodeaddressesgoupby4sincetheARMinstructionsarefixedat4byteseach.Whilecompilersensurethatcodesarewordaligned,itisjoboftheprogrammertomakesurethedatainSRAMiswordalignedtoo.Wewillexaminethisimportanttopicsoon.

Example6-3

Page 267: ARM Assembly Language: Programming and Architecture

CompileanddebugthefollowingcodeinKeilandseetheplacementofinstructionsinmemorylocations.

AREAARMex,CODE,READONLY

ENTRY

MOVR2,#0x00;R2=0x00

MOVR3,#0x35;R3=0x35

ADDR4,R3,R2

END;Markendoffile

Solution

Asyoucanseeinthefigure,thefirstMOVinstructionstartsfromlocation0x00000000,thesecondMOVinstructionstartsfromlocation0x00000004andtheADDinstructionstartsfromlocation0x00000008.

Thefollowingimagedisplaysthefirstlocationsofmemory.ThecodeofthefirstMOVinstructionislocatedinthefirstword(fourbytes)ofmemorywhichiswordaligned.Thesameruleappliesfortheotherinstructions.NotethatthecodeofMOVR2,0isE3A02000but0020A0E3isstoredinthememory.Wewilldiscussthereasoninthischapterwhenwefocusontheconceptofbigendianandlittleendian.

SRAMmemoryregion

AsectionofthememoryspaceisusedbySRAM.TheSRAMcanbeon-chiporoff-chip (external).The sameway that everyARMchip has someon-chipFlashROM forcode,aportionofthememoryregionisusedbytheon-chipSRAM.Thison-chipSRAMisusedbytheCPUforscratchpadtostoreparameters.ItisalsousedbytheCPUforthepurposeofthestack.WeexaminethestackusagebytheARMinthenextsection.Inusing

Page 268: ARM Assembly Language: Programming and Architecture

theSRAMmemory for storingparameters,wemustbecarefulwhen loadingor storingdata in the SRAM lest we use unaligned data access. Next, we discuss this importantissue.

DatamisalignmentinSRAM

ThecaseofmisaligneddatahasamajoreffectontheARMbusperformance.Ifthedataisaligned,foreverymemoryreadcycle, theARMbringsin4bytesofinformation(data or code) using theD31–D0 data bus. Such data alignment is referred to aswordalignment. Tomake dataword aligned, the least significant digits of the hex addressesmustbe0,4,8,orC(inhex).

Whilethecompilersmakesurethatprogramcodes(instructions)arealwaysaligned(Example 6-3), it is the placement of data in SRAM by the programmer that can benonaligned and therefore subject to memory access penalty. In other words, the singlecycleaccessofmemoryisalsousedbyARMtobringintoregisters4bytesofdataeveryclockcycleassumingthatthedataisaligned.Tomakesurethatdataarealsoalignedweusethealigndirective.TheuseofaligndirectiveforRAMdatamakessurethateachwordislocatedatanaddresslocationendingwithaddressof0,4,8,orC.Ifourdataiswordsize(usingDCDUdirective)thentheuseofaligndirectiveatthestartofthedatasectionguarantiesallthedataplacementswillbewordaligned.WhenawordsizedataisdefinedusingtheDCDdirective,theassembleralignsittobewordaligned.

Accessingnon-aligneddata

As we have stated many times before, ARM defines 32-bit data as a word. Theaddressofawordcanstartatanyaddresslocation.Forexample,intheinstruction“LDRR1,[R0]”ifR0=0x20000004,theaddressofthewordbeingfetchedintoR1startsatanalignedaddress.Inthecaseof“LDRR1,[R0]”ifR0=0x20000001theaddressstartsatanon-aligned address. In systems with a 32-bit data bus, accessing a word from a non-alignedaddressedlocationcanbeslower.Thisissueisimportantandappliestoall32-bitprocessors.

In the 8-bit system, accessing a word (4 bytes) is treated like accessing fourconsecutive bytes regardless of the address location. Since accessing a byte takes onememory cycle, accessing 4 bytes will take 4 memory cycles. In the 32-bit system,accessingawordwithanalignedaddresstakesonememorycycle.ThatisbecauseeachbyteiscarriedonitsowndatapathofD0–D7,D8–D15,D16–D23,andD24–D31inthesamememorycycle.However,accessingawordwithanon-alignedaddressrequirestwomemory cycles. For example, see how accessing theword in the instruction “LDRR1,[R0]” works as shown in Figure 6-5. As a case of aligned data, assume that R0 =0x80000000. In this instruction, 4 bytes contents of memory locations 0x80000000through 0x80000003 are being fetched in one cycle. In only one cycle, theARMCPUaccesseslocations0x80000000through0x80000003andputsitinR1.

Page 269: ARM Assembly Language: Programming and Architecture

Figure6-5:MemoryAccessforAlignedandNon-alignedData

Now assuming that R0 = 0x80000001 in this instruction, 8 bytes contents ofmemorylocations0x80000000through0x80000007arebeingfetchedintwoconsecutivecyclesbutonly4bytesofitareused.Inthefirstcycle,theARMCPUaccesseslocations0x80000000 through 0x80000003 and puts them in R1 only the desired three bytes oflocations0x800000001through0x80000003.Inthesecondcycle,thecontentsofmemorylocations 0x8000004 through 0x80000007 are accessed and only the desired byte of0x80000004isputintoR1.SeeExample6-4.

Example6-4

Showthedatatransferofthefollowingcasesandindicatethenumberofmemorycycletimesittakesfordatatransfer.AssumethatR2=0x4598F31E.

LDRR1,=0x40000000;R1=0x40000000

LDRR2,=0x4598F31E;R2=0x4598F31E

STRR2,[R1];StoreR2tolocation0x40000000

ADDR1,R1,#1;R1=R1+1=0x40000001

STRR2,[R1];StoreR2tolocation0x40000001

ADDR1,R1,#1;R1=R1+1=0x40000002

STRR2,[R1];StoreR2tolocation0x40000002

ADDR1,R1,#1;R1=R1+1=0x40000003

STRR2,[R1];StoreR2tolocation0x40000003

Solution:

ForthefirstSTRR2,[R1]instruction,theentire32bitsofR2isstoredintolocationswith

Page 270: ARM Assembly Language: Programming and Architecture

addresses of 0x40000000, 0x40000001, 0x40000002, and 0x40000003, The 4-bytecontent of register R2 is stored into memory locations with starting address of0x40000000 via the 32-bit data bus of D31–D0. This address is word aligned sinceaddress of the least significant digit is 0. Therefore, it takes only onememory cycle totransferthe32-bitdata.

ForthesecondSTRR2,[R1]instruction,inthefirstmemorycycle,thelower24bitsofR2is stored into locations 0x40000001, 0x40000002, and 0x40000003. In the secondmemorycycle,theupper8bitsofR2isstoredintothe0x40000004location.

ForthethirdSTRR2,[R1]instruction,inthefirstmemorycycle,thelower16bitsofR2isstoredintolocations0x40000002and0x40000003.Inthesecondmemorycycle,theupper16bitsofR2isstoredintolocations0x40000004and0x40000005.

ForthefourthSTRR2,[R1]instruction,inthefirstmemorycycle,thelower8bitsoftoR2isstoredintolocations0x40000003.Inthesecondmemorycycle,theupper24bitsofR2isstoredintothelocations0x40000004,0x40000005,and0x40000006.

Thelessontobe learnedfromthis is to trynot toputanywordsonanon-alignedaddress location ina32-bit system. Indeed this is so important thatdirectiveALIGN isspecificallydesignedforthispurpose.Next,wediscusstheissueofaligneddata.

UsingLDRinstructionwithDCDandALIGNdirectives

TheDCDandDCDUdirectivesareusedfor32-bit(word)data.TheDCDdirective

Page 271: ARM Assembly Language: Programming and Architecture

ensures32-bitdatatypesarealigned,incontrasttoDCDUwhichdoesnot.DCDisusedasfollows:

VALUE1DCD0x99775533

This ensures that VALUE1, a word-sized operand, is located in a word alignedaddress location. Therefore, an instruction accessing it will take only a singlememorycycle.SinceperformanceoftheCPUdependsonhowfastitcanfetchthedatawemustensure thatanymemoryaccess reading32-bitdata isdone inasingleclockcycle.Thismeanswemustmakesureall32-bitdataarewordaligned.ThisissoimportantthatARMhas an interrupt (exception) dedicated tomisaligneddata,meaning any time it accessesmisaligned data, it lets us know that there is a problem. The one-time use of ALIGNdirectiveatthebeginningofdataareausingDCDUmakesthedataalignedforthatgroupofdata.

UsingLDRHwithDCWandALIGNdirectives

Theproblemofmisaligneddataisalsoanissuewhenthedatasizeisinhalf-words(16-bit).InmanycasesusingDCWU,wemustusetheALIGNdirectivemultipletimesinthedataareaofagivenprogramtoensuretheyarealigned.ThisisincontrasttotheDCWdirectivewhichensuresdatatypetobehalf-wordaligned.Thisisespeciallythecasewhenwe use the LDRH instruction. See Example 6-5. Aligned data is also an issue for theThumbversionoftheARM.

Example6-5

ShowthedatatransferofthefollowingLDRHinstructionsandindicatethenumberofmemorycycletimesittakesfordatatransfer.

LDRR1,=0x80000000;R1=0x80000000

LDRR3,=0xF31E4598;R3=0xF31E4598

LDRR4,=0x1A2B3D4F;R4=0x1A2B3D4F

STRR3,[R1]

;(STRR3,[R1])storesR3tolocation0x80000000

STRR4,[R1,#4]

;(STRR4,[R1+4])storesR4tolocation0x80000004

LDRHR2,[R1]

;loadstwobytesfromlocation0x80000000toR2

LDRHR2,[R1,#1]

;loadstwobytesfromlocation0x80000001toR2

LDRHR2,[R1,#2]

;loadstwobytesfromlocation0x80000002toR2

Page 272: ARM Assembly Language: Programming and Architecture

LDRHR2,[R1,#3]

;loadstwobytesfromlocation0x80000003toR2

Solution:

IntheLDRHR2,[R1]instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000000and0x80000001areusedtogetthe16bitstoR2.Thisaddressishalfwordalignedsincetheleastsignificantdigitis0.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00004598

FortheLDRHR2,[R1,#1],instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000001and0x80000002areusedtogetthe16bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00001E45.

FortheLDRHR2,[R1,#2],instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000002and0x80000003areusedtogetthe16bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x0000F31E.

FortheLDRHR2,[R1,#3]instruction,inthefirstmemorycycle,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessed,butonly0x80000003isusedtogetthelower8bitstoR2.Inthesecondmemorycycle,theaddresslocations0x80000004,0x80000005,0x80000006,and0x80000007areaccessedwhereonlythe0x80000004locationisusedtogettheupper8bitstoR2.Now,R2=0x00004FF3.

Page 273: ARM Assembly Language: Programming and Architecture

UsingLDRBwithDCBandALIGNdirectives

Theproblemofmisaligneddatadoesnotexistwhenthedatasizeisbytes.Incasessuch as using the string of ASCII characters with theDCB directive, accessing a bytetakes the same amount of time (one memory cycle) as an aligned word (4 bytes),regardlessoftheaddresslocationofthedata.TheonlyproblemwiththeLDRBisitbringsintotheCPUonlyasinglebyteofdataineachmemorycycleinsteadof4bytesifLDRis.SeeExample6-6.

Example6-6

ShowthedatatransferofthefollowingLDRBinstructionsandindicatethenumberofmemorycycletimesittakesfordatatransfer.

LDRR1,=0x80000000;R1=0x80000000

LDRR3,=0xF31E4598;R3=0xF31E4598

LDRR4,=0x1A2B3D4F;R4=0x1A2B3D4F

STRR3,[R1]

;StoreR3tolocation0x80000000

STRR4,[R1,#4]

;(STRR4,[R1+4])StoreR4tolocation0x80000004

LDRBR2,[R1]

;loadonebytefromlocation0x80000000toR2

LDRBR2,[R1,#1]

;(LDRBR2,[R1+1])loadonebytefromlocation0x80000001

LDRBR2,[R1,#2]

;(LDRBR2,[R1+2])loadonebytefromlocation0x80000002

LDRBR2,[R1,#3]

;(LDRBR2,[R1+3])loadonebytefromlocation0x80000003

Solution:

IntheLDRBR2,[R1]instruction,locationswithaddressesof0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000000isusedtogetthe8bitsto R2. Therefore, it takes only one memory cycle to transfer the data. Now,R2=0x00000098.

Page 274: ARM Assembly Language: Programming and Architecture

In the LDRB R2,[R1,#1] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000001isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x00000045.

In the LDRB R2,[R1,#2] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000002isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x0000001E.

In the LDRB R2,[R1,#3] instruction, locations with addresses of 0x80000000,0x80000001,0x80000002,and0x80000003areaccessedbutonly0x80000003isusedtogetthe8bitstoR2.Therefore,ittakesonlyonememorycycletotransferthedata.Now,R2=0x000000F3.

Peripheralregion

Table 6-1 showed a section of memory is set aside for peripherals. The type ofperipherals and memory address locations used is unique to a vendor. The ARMmanufacturersprovidethedetailsofmemorymapfortheperipherals.Ifyouexaminethemanufacturerdatasheetyouwillseethattheyusealignedaddressestomakesurethereisnoclockpenaltyforaccessingthem.

LittleEndianvs.BigEndianwar

In storing data, the ARM follows the little endian convention. The little endianplaces the least significant byte (little end of the data) in the low address and the bigendian is the opposite. The origin of the terms big endian and little endian is from aGulliver’sTravelsstoryabouthowaneggshouldbeopened:fromthebigendorthelittleend.ARMsupportsbothlittleandbigendian.InARMlittleendianisthedefault.SomeARMchipmanufacturersprovideanoptionforchangingittobigendian.SeeExample6-7tounderstandlittleendianandbigendiandatastorage.

Example6-7

Showhowdataisplacedafterexecutionofthefollowingcodeusing

a)littleendianand

b)bigendian.

LDRR2,=0x7698E39F;R2=0x7698E39F

LDRR1,=0x80000000

STRR2,[R1]

Solution:

Page 275: ARM Assembly Language: Programming and Architecture

a)Forlittleendianwehave:

Location80000000=(9F)

Location80000001=(E3)

Location80000002=(98)

Location80000003=(76)

b)Forbigendianwehave:

Location80000000=(76)

Location80000001=(98)

Location80000002=(E3)

Location80000003=(9F)

InExample6-7,noticehowtheleastsignificantbyte(thelittleendofthedata)0x9Fgoestothelowaddress0x80000000,andthemostsignificantbyteofthedata0x76goesto thehighaddress0x80000003.Thismeans that the little endof thedatagoes in first,hencethenamelittleendian.In theARMwithbigendianoptionenabled,data isstoredtheoppositeway:Thebigend(mostsignificantbyte)goesintothelowaddressfirst,andforthisreasonitiscalledbigendian.ManyofrecentRISCprocessorsallowselectionofmode,bigendianorlittleendian.

HarvardArchitectureandARM

In recent yearsmanyARMmanufacturers are using the Harvard architecture forARMCPUs.OldARM architectures up toARM7 useVonNeumann architecture. TheHarvardarchitecturefeedstheCPUwithbothcodeanddataatthesametimeviatwosetsofbuses,oneforcodeandonefordata.ThisincreasestheprocessingpoweroftheCPU

Page 276: ARM Assembly Language: Programming and Architecture

sinceitcanbringinmoreinformation.

Figure6-6:VonNeumannvs.HarvardArchitecture

ReviewQuestions

1.InARM,alltheinstructionsare___bytes?

2.Whomakessurethatinstructionarealignedonwordboundary?

3.InARM,the_____endianisthedefault.

4.A66 MHzsystemhasamemorycycletimeof_____nsifitisusedwithazerowaitstate.

5.Tointerfacea100 MHzprocessortoa50 nsaccesstimeROM,howmanywaitstatesareneeded?

6.Trueorfalse.ARMusesbigendianformatwhenispoweredup.

Page 277: ARM Assembly Language: Programming and Architecture
Page 278: ARM Assembly Language: Programming and Architecture

Section6.2:StackandStackUsageinARMThestack isa sectionofRAMusedby theCPU tostore information temporarily.

ThisinformationcouldbedataoranaddressorCPUregisterswhencallingasubroutine.Stack is alsowidely usedwhen executing an interrupt service routine (ISR). TheCPUneedsthisstorageareabecausethereareonlyalimitednumberofregisters.

Howstacksareaccessed

IfthestackisasectionofRAM,theremustbearegisterinsidetheCPUtopointtoit.IntheARMCPUtheregisterusedtoaccessthestackisR13.

ThestoringofCPUinformationsuchastheregistersonthestackiscalledaPUSH,and loading thecontentsof thestackback intoaCPUregister iscalledaPOP. Inotherwords,aregisterispushedontothestacktosaveitandpoppedoffthestacktoretrieveit.Thefollowingdescribeseachprocess.

Pushingontothestack

Thestackpointer(SP)pointstothetopofthestack(TOS).IntheARMregisterR13isdesignatedasstackpointer.Aswepush(store)dataontothestack,thedataaresavedinSRAM (wheretheSPpointsto)andSPmustbedecremented(orincremented)topointtothe next location. In the ARM we have a choice of either incrementing the SP ordecrementing it.Notice that this is different frommany othermicroprocessors, notablyx86processors, inwhich theSPisdecrementedautomaticallywhendata ispushedontothe stack.Again itmust be emphasized thatwhile in traditionalCPUs such as x86, thestackpointerisdecrementedautomaticallybytheCPUitself,intheARMCPUwemustactuallycodetheinstructionforstackpointerdecrementation(orincrementation)forthatmatter.Therefore,topusharegisterontostackweusetheSTRandSUBinstructionsasshowninthefollowingcode:

STRRr,[R13];Rrcanbeanyregisters(R0-R12)

SUBR13,R13,#4;decrementstackpointer

Forexample,tostorethevalueofR1wecanwritethefollowinginstructions:

STRR1,[R13];storeR1ontothestack,

SUBR13,R13,#4;anddecrementSP

Poppingfromthestack

Popping(loading)thecontentsofthestackbackintoagivenregisteristheoppositeprocessofpushing.When thePOP isexecuted, theSP is incremented (ordecremented)and the top locationof thestack iscopied (loaded)back to the register.Thatmeans thestackisLIFO(Last-In-First-Out)memory.

ToretrievedatafromstackwecanusetheLDRinstruction.

ADDR13,R13,#4;incrementstackpointer

LDRRr,[R13];Rrcananyoftheregisters(R0-R13)

Page 279: ARM Assembly Language: Programming and Architecture

Forexample,thefollowinginstructionspopfromthetopofstackandcopytoR1:

ADDR13,R13,#4;incrementSP

LDRR1,[R13];load(POP)thetopofstacktoR1

InitializingthestackpointerinARM

When theARMispoweredup, theR13(SP) registercontainsvalue0.Therefore,wemustinitializetheSPatthebeginningoftheprogramsothatitpointstosomewhereinthe internal SRAM. In ARM, we can make the stack to grow from a higher memorylocationtoalowermemorylocation.Inthiscasewhenwepush(store)ontothestacktheSPisdecremented.Wecanalsomakethestacktogrowfromalowermemorylocationtoa higher memory location, therefore when we push (store) onto the stack, the SP isincremented. It is common to initialize the SP to the uppermostRAMmemory region,whichmeansaswepushdataontothestack,thestackpointermustbedecremented.

Different ARMs have different amounts of RAM. In some ARM assemblerStack_Toprepresentstheaddressoftopofthestack.So,ifwewanttoinitializetheSP,wecansimplyloadStack_Topinto theSP.Notice thatSP(R13) isa32-bit register.So,wecanuseany32-bitaddressofSRAMasastacksection.

Example 6-8showshowtoinitializetheSPandusethestoreandloadinstructionsforPUSHandPOPoperations.

Example6-8

ThefollowingARMprogramplacessomedataintoregistersandcallsasubroutinethatusesthesameregisters.Itshowshowtousethestack.Examinethestack,stackpointer,andtheregistersusedaftertheexecutionofeachinstruction.

;initializetheSPtopoint

;tothelastlocationofRAM(Stack_Top)

;AssumeStack_Top=0xFF8

LDRR13,=Stack_Top;loadSP

LDRR0,=0x125;R0=0x125

LDRR1,=0x144;R1=0x144

MOVR2,#0x56;R2=0x56

BLMY_SUB;callasubroutine

ADDR3,R0,R1

;R3=R0+R1=0x125+0x144=0x269

ADDR3,R3,R2

;R3=R3+R2=0x269+0x56=0x2BF

HEREBHERE;stayhere

Page 280: ARM Assembly Language: Programming and Architecture

;–––––––––

MY_SUB

;––––––-

;saveR0,R1,andR2onstack

;beforetheyareusedbyaloop

STRR0,[R13];saveR0onstack

SUBR13,R13,#4

;R13=R13-4,todecrementthestackpointer

STRR1,[R13];saveR1onstack

SUBR13,R13,#4

;R13=R13-4,todecrementthestackpointer

STRR2,[R13];saveR2onstack

SUBR13,R13,#4

;R13=R13-4,todecrementthestackpointer

;––—R0,R1,andR2arechanged

MOVR0,#0;R0=0

MOVR1,#0;R1=0

MOVR2,#0;R2=0

;––––––––––––––

;restoretheoriginalregisterscontentsfromstack

ADDR13,R13,#4

;R13=R13+4toincrementthestackpointer

LDRR2,[R13];restoreR2fromstack

ADDR13,R13,#4

;R13=R13+4toincrementthestackpointer

LDRR1,[R13];restoreR1fromstack

ADDR13,R13,#4

;R13=R13+4toincrementthestackpointer

LDRR0,[R13];restoreR0fromstack

BXLR;returntocaller

Page 281: ARM Assembly Language: Programming and Architecture

Solution:

Aftertheexecutionof

Contentsofsometheregisters

(inHex) Stack

R0 R1 R2 SP(R13)

LDRR13,=0xFF8 0 0 0 FF8

LDRR0,=0x125

LDRR1,=0x144

LDRR2,=0x56

125 144 56 FF8

STRR0,[R13]

SUBR13,R13,#4125 144 56 FF4

STRR1,[R13]

SUBR13,R13,#4125 144 56 FF0

STRR2,[R13]

SUBR13,R13,#4125 144 56 FEC

MOVR0,#0

MOVR1,#0

MOVR2,#0

0 0 0 FEC

ADDR13,R13,#4

LDRR2,[R13]0 0 56 FF0

ADDR13,R13,#4

LDRR1,[R13]0 144 56 FF4

Page 282: ARM Assembly Language: Programming and Architecture

ADDR13,R13,#4

LDRR0,[R13]125 144 56 FF8

ThestacklimitandnestedcallsinARM

Asmentionedearlier,wecandefinethestackanywhereintheread/writememory.So,intheARMthestackcanbeasbigasitsRAM.InARM,thestackisusedforcallsandinterrupts.Wemustrememberthatuponcallingasubroutinefromthemainprogramusing theBL instruction,R14, the linker register,keeps trackofwhere theCPUshouldreturnaftercompletingthesubroutine.Now,ifwehaveanothercallinsidethesubroutineusingtheBLinstruction,thenitisourjobtostoretheoriginalR14onthestack.Failuretodothatwillendupcrashingtheprogram.Forthisreason,wemustbeverycarefulwhenmanipulatingthestackcontents.

UsingLDMandSTMinstructionsforthestack

AnotherwaytopushregistercontentsontothestackistouseSTM(storemultiple)andLDM(loadmultiple)instructions.Inmanyinterruptandmultitaskingapplicationsweneedtosavethecontentsofmultipleregistersonthestackandrestorethemback.Pushingtheregistersontothestackoneregisteratatimeistimeconsuming.ForthisreasonARMhas theSTMandLDMinstructions.TheSTMandLDMallowtostore(push)and load(pop)multiple registerswith a single instruction.Next,weexplain each instruction andhowitisused.

STM

BelowshowsthesyntaxfortheSTM.Noticewecanspecifythenumberofregistersto be stored and destination RAM address (pointer) to which the data is pushed onto.Examinethefollowinginstructions:

STMR11,{R0-R3}

;StoreR0throughR3ontomemorypointedtobyR11

STMR8,{R0-R7}

;StoreR0throughR7ontomemorypointedtobyR8

STMR7,{R0,R3,R5}

;StoreR0,R3,R5ontomemorypointedtobyR7

STMR11,{R0-R10}

;StoreR0throughR10ontomemorypointedtobyR11

LDM

BelowshowsthesyntaxfortheLDM.Noticewecanspecifythenumberofregisters

Page 283: ARM Assembly Language: Programming and Architecture

tobeloadedanddestinationRAMaddress(pointer)fromwhichthedataispoppedfrom.Examinethefollowinginstructions:

LDMR11,{R0-R3}

;LoadR0throughR3frommemorypointedtobyR11

LDMR8,{R0-R7}

;LoadR0throughR7frommemorypointedtobyR8

LDMR7,{R0,R3,R5}

;LoadR0,R3,R5frommemorypointedtobyR7

LDMR11,{R0-R10}

;LoadR0throughR10frommemorypointedtobyR11

Example6-9showshowwecanuseSTMandLDMtosimplifyacodeandpreventunwantederrors.

Example6-9

ModifytheExample6-8usingtheLDMandSTMinstructions.

Solution:

;initializetheSPtopointto

;thelastlocationofRAM(Stack_Top)

;AssumeStack_Top=0xFF8

LDRR13,=Stack_Top;loadSP

LDRR0,=0x125;R0=0x125

LDRR1,=0x144;R1=0x144

MOVR2,#0x56;R2=0x56

BLMY_SUB;callasubroutine

ADDR3,R0,R1;R3=R0+R1

;=0x125+0x144=0x269

ADDR3,R3,R2;R3=R3+R2

;=0x269+0x56=0x2BF

HEREBHERE;stayhere

;–––––––––

MY_SUB

Page 284: ARM Assembly Language: Programming and Architecture

;–––––––––

;saveR0,R1,andR2onstack

;beforetheyareusedbyaloop

STMR13,{R0-R2};saveR0,R1,R2onstack

SUBR13,R13,#12

;––—R0,R1,andR2arechanged

MOVR0,#0;R0=0

MOVR1,#0;R1=0

MOVR2,#0;R2=0

;–––

;restoretheoriginalregisterscontentsfromstack

ADDR13,R13,#12

LDMR13,{R0-R2};restoreR0,R1,andR2fromstack

BXLR;returntocaller

;–––-

WecanalsousethePUSHandPOPpseudo-instructionstodothesamething.

CopyingablockofdatawithLDMandSTM

To copy a block of data, we bring into the CPU’s register a word of data frommemoryandthencopyitfromtheregistertoalocationinRAM.Inthatcasewecopyoneword(4bytes)atatime.Sotocopy16wordswehavetosetacounterto16foraloopiteration.We can use theLDMandSTM to do the same thingwithmuch less coding.Using LDM and STM instructions to copy a block of data, we need a register for thesourceaddressandanotheroneforthedestinationaddress.Program6-1usesR11andR12for source and destination addresses, respectively. The registers R0–R9 are used astemporaryplacefordatabeforetheyarecopiedtothedestination.Thatgivesus10words(40bytes)transferatatime.NoticethatintheSTMandLDMinstructions,registersaretransferredtoorfrommemoryintheorderofthelowesttohighest,soR0willalwaysbetransferredtoorfromalocationlowerthanthelocationofR1inthememoryandsoon.ThatiswhyweshouldnotbeworryingabouttheorderofregistersinthestackwhenweuseSTMandLDManditisnotneededtoPUSHandPOPregistersinoppositeorderasiscommoninnormalPOPandPUSHinstructions.

Program6-1:CopyingaBlockofMemory

Thisprogramcopiesablockof10words(40bytes)memoryfromsourcetodestination.TheregistersR11andR12areusedforsourceanddestinationaddresses.

Page 285: ARM Assembly Language: Programming and Architecture

AREAPROG6_1,CODE,READONLY

ENTRY

LDMR11,{R0-R9};LoadR0thruR9frommemorypointedtobyR11

STMR12,{R0-R9};StoreR0thruR9tomemorypointedtobyR12

END

ReviewQuestions

1.The______registeristhedefaultstackpointer.

2.HowdeepisthesizeofthestackintheARM?

3.WriteaprogramthatpushesR5,R6,R7,andR8intothestack.

4.WriteaprogramthatpopsR5,R6,R7,andR8fromthestack.

5.Whatdoesthefollowingprogramdo?

LDRR5,=0x40000000

LDMR5,{R1,R4}

LDRR5,=0x50000000

STMR5,{R1,R4}

Page 286: ARM Assembly Language: Programming and Architecture
Page 287: ARM Assembly Language: Programming and Architecture

Section6.3:ARMBit-AddressableMemoryRegionManymicroprocessorsallowprogramstoaccessmemoryandI/Oportsinbytesize

incrementsonly.Inotherwords,ifyouneedtocheckasinglebitofanI/Oport,youmustreadtheentirebytefirstandthenmanipulatethewholebytewithsomelogicinstructionstogetholdofthedesiredbit.ThisisnotthecasewithARMmicroprocessors.Indeed,oneofthemostimportantfeaturesoftheCPUistheabilitytoaccesstheRAMandI/Oportsinbits insteadofbytes.This isavery importantandpowerful featureof theARMusedwidelyintheembeddedsystemdesignandapplications.InthissectionweshowaddressassignmentofbitsofI/OandRAM,inadditiontowaysofprogrammingthem.

Bit-addressable(bit-band)SRAM

Ofthe4GBmemoryspaceoftheARM,anumberofbytesarebit-addressable.TheARMliteraturereferstothisregionasthebit-bandregion.SincetheARMinstructionis32-bitandtheCPUexecutescodeoneinstructionatatime,thecode(FlashROM)spaceisalwayswordsizeandwordaligned,aswehaveseenthroughoutthechapters.Thatmeansthebit-addressableregionsmustbelocatedinSRAMandI/Operipheralsregions.Thebit-addressableRAMand I/O locations vary among the familymembers and vendors.TheARM generic manual defines the location addresses of bit-band as 0x20000000 to0x200FFFFFforSRAMand0x40000000to0x400FFFFFforperipherals.Noticetheyarelocatedatthelowest1MBaddressspaceofSRAMandperipherals.SeeTable6-1.Itmustbe alsonoted that thebit-band (bit-addressable) regions are theonly region that canbeaccessedinbothbitandbyte/halfword/wordformatswhiletheotherareaofmemorymustbeaccessedinbyte/halfword/wordsize.Inotherwords,thememoryspaceregionoutsideof the bit-band must always be accessed in byte/halfword/word formats only. For theARMCortex-M,thebit-bandSRAMhasaddressesof0x20000000to0x200FFFFF.This1Mbytesbit-addressableregionisgivenaliasaddressesof0x22000000to0x23FFFFFF.Therefore, thebitaddresses0x22000000 to0x2200001Fare for the firstbyteofSRAMlocation0x20000000,and0x22000020to0x2200003FarethebitaddressesofthesecondbyteofSRAMlocation0x20000001,andsoon.SeeFigure6-7.

Page 288: ARM Assembly Language: Programming and Architecture

Figure6-7:SRAMbit-addressableregionandtheiraliasaddresses

SinceeachbyteofSRAMhas8bitsweneedanaddressforeachbit.Thismeansweneedatleast8Maddresslocationstoaccess8Mbits,oneaddressforeachbit.However,tomaketheaddressesword-alignedtheARMprovides4-bytealiasaddressforeachbit.Forexample,0x22000000to0x2200001Fisassignedtoasinglebytelocationof0x20000000.That means we have 0x22000000 to 0x23FFFFFF (total of 32M locations, as aliasaddresses)for1Mbytesofaddress.

BitmapforSRAM

FromFigure6-7onceagainnoticethefollowingfacts:

1.Thebitaddress0x22000000isassignedtoD0ofSRAMlocation0x20000000.

2.Thebitaddress0x22000004isassignedtoD1ofSRAMlocation0x20000000.

3.Thebitaddress0x22000008isassignedtoD2ofSRAMlocationof0x20000000.

4.Thebitaddress0x2200000CisassignedtoD3ofSRAMlocationof0x20000000.

5.Thebitaddress0x22000010isassignedtoD4ofSRAMlocation0x20000000.

6.Thebitaddress0x22000014isassignedtoD5ofRAMlocation0x20000000.

7.Thebitaddress0x22000018isassignedtoD6ofSRAMlocation0x20000000.

8.Thebitaddress0x2200001CisassignedtoD7ofSRAMlocation0x20000000.

NoticethatSRAMlocations0x20000000–0x200FFFFFarebothbyte-addressableand bit-addressable. The only difference is when we access it in byte (or halfword orword)weuseaddresses0x20000000to0x200FFFFF,butwhentheyareaccessedinbit,theyareaccessedviatheiraliasaddressesof0x22000000to0x23FFFFFF.Thereasonit

Page 289: ARM Assembly Language: Programming and Architecture

is called aliases is because it is same physical location but accessed by two differentaddresses. It is like a same person but different names (aliases) See Examples 6-10through6-12.

Example6-10

ThegenericARMchiphasthefollowingaddressassignments.Calculatethespaceandtheamountofmemorygiventoeachregion.

(a)Addressrangeof0x20000000–200FFFFFforSRAMbit-addressableregion

(b)Addressrangeof0x22000000–23FFFFFFforaliasaddressesofbit-addressableSRAM

Solution:

(a)200FFFFF–20000000=FFFFFbytes.ConvertingFFFFFtodecimal,weget1,048,575+1=1,048,576,whichisequalto1Mbytes.

(b)23FFFFFF–22000000=1FFFFFFbytes.Converting1FFFFFFtodecimal,weget33,554,431+1=33,554,432,whichisequalto32Mbytes.

Example6-11

WriteaprogramtosetHIGHtheD6oftheSRAMlocation0x20000001usinga)byteaddressandb)thebitaliasaddress.

Solution:

a)

LDRR1,=0x20000001;loadtheaddressofthebyte

LDRBR2,[R1];getthebyte

ORRR2,R2,#2_01000000;makeD6bithigh

;(binaryrepresentationinKeilfor0b01000000)

STRBR2,[R1];writeitback

b)FromFigure6-7wehaveaddress0x22000038asthebitaddressofD6ofSRAM

Page 290: ARM Assembly Language: Programming and Architecture

location0x20000001.

LDRR1,=0x22000038;loadthealiasaddressofthebit

MOVR2,#1;R2=1

STRBR2,[R1];WriteonetoD6

Example6-12

WriteaprogramtosetLOWtheD0bitoftheSRAMlocation0x20000005usinga)byteaddressandb)thebitaliasaddress.

Solution:

a)

LDRR1,=0x20000005;loadtheaddressofbyte

LDRBR2,[R1];getthebyte

ANDR2,R2,#2_11111110;makeD0bitlow

STRBR2,[R1];writeitback

b)FromFigure6-7wehaveaddress0x220000A0asthebitaddressofD0ofSRAMlocation0x20000005.

LDRR2,=0x220000A0;loadthealiasaddressofthebit

MOVR0,0;R0=0

STRBR0,[R2];writezerotoD0

PeripheralI/Oportbit-addressableregion

The general purpose I/O (GPIO) and peripherals such as ADC, DAC, RTC, andserialCOMport arewidelyused in the embedded systemdesign. InmanyARM-basedtrainerboardsweseetheconnectionofLEDs,switches,andLCDtotheGPIOpinsoftheARM chip. In such trainers the vendor provides the details of I/O port and peripheralconnectionstotheARMchipinadditiontotheiraddressmap.Aswediscussedearlier,theARMhas set aside1Mbytesofaddress space tobeusedbit-band (bit-addressable) I/Oand peripherals. The address space assigned to bit-band peripherals and GPIO is0x40000000 to 0x400FFFFF with address aliases of 0x42000000 to 0x43FFFFFF.Examine your trainer board data sheet for the bit-band addresses implemented on theARMchip.

Page 291: ARM Assembly Language: Programming and Architecture

Figure6-8:Peripheralsbit-addressableregionandtheiraliasaddresses.

BitmapforI/Operipherals

FromFigure6-8onceagainnoticethefollowingfacts.

1.Thebitaddress0x42000000isassignedtoD0ofperipheralslocation0x40000000.

2.Thebitaddress0x42000004isassignedtoD1ofperipheralslocation0x40000000.

3.Thebitaddress0x42000008isassignedtoD2ofperipheralslocationof0x40000000.

4.Thebitaddress0x4200000CisassignedtoD3ofperipheralslocationof0x40000000.

5.Thebitaddress0x42000010isassignedtoD4ofperipheralslocation0x40000000.

6.Thebitaddress0x42000014isassignedtoD5ofperipheralslocation0x40000000.

7.Thebitaddress0x42000018isassignedtoD6ofperipheralslocation0x40000000.

8.Thebitaddress0x4200001CisassignedtoD7ofperipheralslocation0x40000000.

Whenaccessingaperipheralport in a single-bitmanner,wemustuse theaddressaliasesof0x42000000–0x43FFFFFF.Indeedthebit-addressableperipheralsarewidelyusedinembeddedsystemdesign.

ReviewQuestions

Page 292: ARM Assembly Language: Programming and Architecture

1.Trueorfalse.AllbytesofSRAMinARMarebit-addressable.

2.Trueorfalse.AllbitsoftheI/OperipheralsinARMarebit-addressable.

3.Trueorfalse.AllROMlocationsoftheARMarebit-addressable.

4.Ofthe4GbytesofmemoryintheARM,howmanybytesarebit-addressable?Listthem.

5.HowwouldyouchecktoseewhetherbitD0oflocation0x20000002ishighorlow?

6.Findouttowhichbyteeachofthefollowingbitsbelongs.GivetheaddressoftheRAMbyteinhex.

(a)0x23000030 (b)0x23000040 (c)0x23000048

(d)0x4200003C (e)0x43FFFFFC

Page 293: ARM Assembly Language: Programming and Architecture
Page 294: ARM Assembly Language: Programming and Architecture

Section6.4:AdvancedIndexedAddressingModeInpreviouschaptersweshowedhow tousea simple indexedaddressingmode in

STR, LDR, LDM and STM. The advanced addressing modes bring very importantadvantagesintheARMandwillbediscussedinthissection.

Indexedaddressingmode

Intheindexedaddressingmode,aregisterisusedasapointertothedatalocation.TheARMprovidesthreeindexedaddressingmodes.Thesemodesare:preindex,preindexwithwriteback,andpostindex.Table6-2summarizesthesemodes.Eachoftheseindexedaddressingmodecanbeusedwithoffsetoffixedvalueoroffsetofashiftedregister.SeeTable6-3.Inthissectionwewilldiscusseachmodeindetail.

IndexedAddressingMode Syntax PointingLocationin

MemoryRmValueAfterExecution

Preindex LDRRd,[Rm,#k] Rm+#k Rm

PreindexwithWB* LDRRd,[Rm,#k]! Rm+#k Rm+#k

Postindex LDRRd,[Rm],#k Rm Rm+#k

*WBmeansWriteback

**RdandRmareanyofregistersand#kisasigned12-bitimmediatevaluebetween-4095and+4095

Table6-2:IndexedAddressinginARM

Offset Syntax PointingLocation

Fixedvalue LDRRd,[Rm,#k] Rm+#k

Shiftedregister LDRRd,[Rm,Rn,<shift>] Rm+(Rnshifted<shift>)

*RnandRmareanyofregistersand#kisasigned12-bitimmediatevaluebetween-4095and+4095

**<shift>isanyofshiftsstudiedinChapter3likeLSL#2

Table6-3:OffsetofFixedValuevs.OffsetofShiftedRegister

Preindexedaddressingmodewithfixedoffset

In thisaddressingmode,a registerandapositiveornegative immediatevalueareused as a pointer to the data location. The value of register does not change afterinstructionisexecuted.ThisaddressingmodecanbeusedwithSTR,STRB,STRH,LDR,LDRB,andLDRH.SeeExample6-13.

Example6-13

WriteaprogramtostorecontentsofR5totheSRAMlocation0x10000000to0x10000000Fusingpreindexedaddressingmodewithfixedoffset.

Page 295: ARM Assembly Language: Programming and Architecture

Solution:

LDRR5,=0x55667788

LDRR1,=0x10000000

;loadtheaddressoffirstlocation

STRR5,[R1];storeR5tolocation0x10000000

STRR5,[R1,#4]

;storeR5tolocation0x10000000+4(0x10000004)

STRR5,[R1,#8]

;storeR5tolocation0x10000000+8(0x10000008)

STRR5,[R1,#0x0C]

;storeR5tolocation0x10000000+0x0C(0x1000000C)

NoticethatafterrunningthiscodethecontentofR1isstill0x10000000

Itisacommonpracticetousearegistertopointtothefirstlocationofthememoryspace and access the different locations using proper offsets. For example see thefollowingprogram:

ADRR0,OUR_DATA;pointtoOUR_DATA

LDRBR2,[R0,#1];loadR2withBETA

OUR_DATA

ALFADCB0x30

BETADCB0x21

Preindexedaddressingmodewithwrite-backandfixedoffset

Thisaddressingmode is likepreindexedaddressingmodewithfixedoffsetexceptthat the calculated pointer is written back to the pointing register. We put ! after theinstructiontotelltheassemblertoenablewritebackintheinstruction.SeeExample6-14.

Example6-14

RewriteExample6-13usingpreindexedaddressingmodewithwritebackandfixedoffset.

Solution:

LDRR1,=0x10000000;loadtheaddressoffirstlocation

Page 296: ARM Assembly Language: Programming and Architecture

STRR5,[R1];storeR5tolocation0x10000000

STRR5,[R1,#4]!

;storeR5tolocation0x10000000+4(0x10000004)

;writebackmakesR1=0x10000004

STRR5,[R1,#4]!

;storeR5tolocation0x10000004+4(0x10000008)

;writebackmakesR1=0x10000008

STRR5,[R1,#4]!

;storeR5tolocation0x10000008+4(0x1000000C)

;writebackmakesR1=0x1000000C

NoticethatafterrunningthiscodethecontentofR1is0x1000000C

Postindexedaddressingmodewithfixedoffset

Thisaddressingmodeislikepreindexedaddressingmodewithwritebackandfixedoffset except that the instruction is executed on the location that Rn is pointing toregardlessofoffset value.After running the instruction, thenewvalueof thepointer iscalculatedandwrittenbacktothepointingregister.Examinethefollowinginstructions:

STRR1,[R2],#4;storeR1ontomemorypointedtoby

;R2andthenwritebackR2+4toR2

LDRBR5,[R3],#1;loadabytefrommemorypointedto

;byR3andthenwritebackR3+1toR3

Noticethatwritebackisbydefaultenabledinpostindexedaddressingandthereisnoneed toput ! after instructionsbecausepostindexingwithoutwriteback isuselessas theindex is neither used in the instruction nor written back to the pointing register. SeeExample6-15.

Example6-15

RewriteExample6-13usingpostindexedaddressingmodewithfixedoffset.

Solution:

LDRR1,=0x10000000;loadtheaddressoffirstlocation

STRR5,[R1],#4;storeR5tolocation0x10000000andwriteback

Page 297: ARM Assembly Language: Programming and Architecture

;0x10000000+4(0x10000004)toR1

STRR5,[R1],#4;storeR5tolocation0x10000004andwriteback

;0x10000004+4(0x10000008)toR1

STRR5,[R1],#4;storeR5tolocation0x10000008andwriteback

;0x10000008+4(0x1000000C)toR1

STRR5,[R1],#4;storeR5tolocation0x1000000Candwriteback

;0x1000000C+4(0x10000010)toR1

NoticethatafterrunningthiscodethecontentofR1is0x10000010.

Preindexedaddressmodewithoffsetofashiftedregister

This advancedaddressingmode is avery important feature in theARM.We startdescribingthismodefromsimplecasewithnoshift(shift=0)andthenwewillfocusonmorecomplexformats.

Simpleformatofpreindexedaddressmodewithoffsetregister

ThefollowingisthesimplesyntaxforLDRandSTR.

LDRRd,[Rm,Rn];RdisloadedfromlocationRm+Rnofmemory

STRRs,[Rm,Rn];RsisstoredtolocationRm+Rnofmemory

Thisaddressingmodeiswidelyused in implementingofobjectorientedprogramswhenwewant tomakeadynamicarrayofvariables.Example6-16showshowweusethisaddressingmodeinaccessingdifferentlocationsofanarraydefinedinmemory.

Example6-16

ExaminethevalueofR5andR6aftertheexecutionofthefollowingprogram.

POINTERRNR2

ARRAY1RNR1

AREAEXAMPLE_6_16,CODE,READONLY

ENTRY

LDRARRAY1,=MYDATA

LDRBR4,[ARRAY1]

;loadPOINTERlocationofARRAY1toR4(R4= 0x45)

Page 298: ARM Assembly Language: Programming and Architecture

MOVPOINTER,#1;POINTER=1topointtolocation1ofarray

LDRBR5,[ARRAY1,POINTER]

;LoadPOINTERlocationofARRAY1toR5

;(R5 = 0x24)

MOVPOINTER,#2;POINTER=2topointtolocation2ofarray

LDRBR6,[ARRAY1,POINTER]

;loadPOINTERlocationofARRAY1toR6

;(R6 = 0x18)

HEREBHERE

MYDATADCB0x45,0x24,0x18,0x63

END

Solution:

AfterrunningtheLDRBR4,[ARRAY1]instruction,location0ofMYDATAisloadedtoR4.NowR4=0x45.

Next,afterrunningtheLDRBR5,[ARRAY1,POINTER]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x24.

Next,afterrunningtheLDRBR6,[ARRAY1,POINTER]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x18.

NoticethatpurposelyweusedDCBandLDRBinthisexamplesoeachlocationofMYDATAtakesonebyte.

InExample6-16,wepurposelyusedDCBandLDRBsoeachlocationofMYDATAtakesonebyte.IfwedefineanarrayusingDCD,thenwewillnotbeabletouseLDRBR5,[ARRAY1,POINTER] to load thePOINTERlocationofARRAY.SeeExample6-17forclarification.

Example6-17

InExample6-16,changeMYDATADCB0x45,0x24,0x18,0x63toMYDATADCD0x45,0x2489ACF5andexaminethevalueofR5andR6aftertheexecutionofthe

Page 299: ARM Assembly Language: Programming and Architecture

followingprogram.

POINTERRNR2

ARRAY1RNR1

AREAEXAMPLE_6_17,CODE,READONLY

ENTRY

LDRARRAY1,=MYDATA

LDRBR4,[ARRAY1];loadPOINTERlocationof

;ARRAY1toR4(R4= 0x45)

MOVPOINTER,#1;POINTER=1topointto

;location1ofarray

LDRBR5,[ARRAY1,POINTER]

;LoadPOINTERlocationofARRAY1toR5

MOVPOINTER,#2;POINTER=2topointto

;location2ofarray

LDRBR6,[ARRAY1,POINTER]

;loadPOINTERlocationofARRAY1toR6

HEREBHERE

MYDATADCD0x45,0x2489ACF5

END

Solution:

AfterrunningtheLDRBR4,[ARRAY1]instruction,location0ofMYDATAisloadedtoR4.NowR4=0x45.

Next,afterrunningtheLDRBR5,[ARRAY1,POINTER]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x00.

Next,afterrunningtheLDRBR6,[ARRAY1,POINTER]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x00.

Page 300: ARM Assembly Language: Programming and Architecture

Toaccess locations of aword size arraywehave tomultiply the pointer by four.Similarly,toaccesslocationsofahalf-wordsizearraywehavetomultiplythepointerbytwo.Forexample,wecancorrectprogramofExample6-17byreplacing

MOVPOINTER,#2;POINTER=2topointtolocation2ofarray

instructionwithfollowinginstructions:

MOVPOINTER,#2;POINTER=2

MOVPOINTER,POINTER,LSL#2;POINTERisshiftedlefttwobits(×4)

;topointtoword2ofthearray

Noticethatbyshiftingleftavaluetwobits,wemultiplyitbyfour.NextwewillseehowwecanuseindexedaddressingwithshiftedregisterstocombinemultiplicationwiththeLDRandSTRinstructions.

Generalformatofpreindexedaddressmodewithoffsetregister

ThegeneralformatofindexedaddressingwithshiftedregisterforLDRandSTRisasfollows:

LDRRd,[Rm,Rn,<shift>];(ShiftedRn)+Rmisusedasthepointer

STRRd,[Rm,Rn,<shift>];(ShiftedRn)+Rmisusedasthepointer

Intheaboveinstructions<shift>canbeanyofshiftinstructionsstudiedinChapter3suchasLSL,LSR,ASRandROR.Examinethefollowinginstructions:

LDRR1,[R2,R3,LSL#2];R2+(R3 × 4)isusedasthepointer

;contentoflocationR2+(R3×4)isloadedtoR1

STRR1,[R2,R3,LSL#1];R2+(R3×2)isusedasthepointer

;R1isstoredtolocationR2 + (R3×2)

STRBR1,[R2,R3,LSL#2];R2+(R3 × 4)isusedasthepointer

;firstbyteofR1isstoredtolocationR2+(R3×4)

LDRR1,[R2,R3,LSR#2];R2+(R3/4)isusedasthepointer

Page 301: ARM Assembly Language: Programming and Architecture

;contentoflocationR2+(R3/4)is

;loadedtoR1

From the above codeswe can see that indexed addressingwith shifted register ismostly used tomultiply thepointer by a powerof two and that iswhy it is also calledindexedaddressingwith scaled register.ExamineExample6-18 to seehowwecanusescaledregisterindexingtoaccessanarrayofwords.

Notice that scaled register indexing is not supported for halfword load and storeinstructions.

Example6-18

ExaminethevalueofR5andR6aftertheexecutionofthefollowingprogram.

POINTERRNR2

ARRAY1RNR1

AREAEXAMPLE_6_18,CODE,READONLY

ENTRY

LDRARRAY1,=MYDATA

MOVPOINTER,#0;POINTER=0

LDRR4,[ARRAY1,POINTER,LSL#2];

MOVPOINTER,#1;POINTER=1

LDRR5,[ARRAY1,POINTER,LSL#2];

MOVPOINTER,#2;POINTER=2

LDRR6,[ARRAY1,POINTER,LSL#2];

HEREBHERE

MYDATADCD0x45,0x2489ACf5,0x2489AC23

END

Solution:

AfterrunningtheLDRR4,[ARRAY1,POINTER,LSL#2]]instruction,location0of

Page 302: ARM Assembly Language: Programming and Architecture

MYDATAisloadedtoR4.NowR4=0x45.

Next,afterrunningtheLDRR5,[ARRAY1,POINTER,LSL#2]instruction,location1ofMYDATAisloadedtoR5.NowR5=0x2489ACF5.

Next,afterrunningtheLDRR6,[ARRAY1,POINTER,LSL#2]instruction,location2ofMYDATAisloadedtoR6.SothecontentofR6=0x2489AC23.

Writebacksign(!)inpreindexedloadandstorewithscaledregister

We can force all scaled register load and store instructions to writeback thecalculated pointer to the pointing register by putting ! after each load and storeinstructions.Examinethefollowinginstructions:

LDRR1,[R2,R3,LSL#2]!

;R2+(R3×4)isusedasthepointer

;contentoflocationR2+(R3×4)isloadedtoR1

;R2=R2+(R3×4)(R2isupdated.)

STRR1,[R2,R3,LSL#1]!

;R2+(R3×2)isusedasthepointer

;R1isstoredtolocationR2+(R3×2)

;R2=R2+(R3×2)(R2isupdated.)

Scaledregisterpostindex

The following instructionsare someexamplesof scaled registerpostindex in loadandstoreinstructions:

STRR1,[R2],R3,LSL#2

;storeR1atlocationR2ofmemoryandwriteback

;R2+(R3 × 4)toR2.

LDRR1,[R2],R3,LSL#2

;loadlocationR2ofmemorytoR1andwriteback

;R2+(R3 × 4)toR2.

Page 303: ARM Assembly Language: Programming and Architecture

Look-uptable

Oneuseofarrayisforimplementinglook-uptables.Thelook-uptableisawidelyusedconceptinmicrocontrollerprogramming.Itallowsaccesstoelementsofafrequentlyusedtablewithminimumoperations.Asanexample,assumethatforacertainapplicationwe need 4 + x2 values in the range of 0 to 9. To do so, the look-up table is stored inprogrammemoryspaceandaccessedasanarray.ThisisshowninExamples6-19through6-21.

Example6-19

WriteaprogramtogetthexvaluefromR9andsendx2+2x+3toR10.AssumeR9hasthexvalueof0–9.Usealook-uptableinsteadofamultiplyinstruction.

Solution:

AREALOOKUP_EXAMP6_19,READONLY,CODE

ENTRY

ADRR2,LOOKUP;pointtoLOOKUP

LDRBR10,[R2,R9];R10=locationR9oflookuparray

HEREBHERE;stayhereforever

LOOKUPDCB3,6,11,18,27,38,51,66,83,102

END

Example6-20

WriteaprogramtogetthexvaluefromR9andsendfactorialofxtoR10.AssumeR9hasthexvalueof0–10.Usealook-uptableinsteadofamultiplyinstruction.

Solution:

AREALOOKUP_EXAMP6_20,READONLY,CODE

ENTRY

MOVR9,#5

ADRR2,LOOKUP;pointtoLOOKUP

LDRR10,[R2,R9,LSL#2];R10=locationR9oflookuparray

Page 304: ARM Assembly Language: Programming and Architecture

HEREBHERE;stayhereforever

LOOKUPDCD1,1,2,6,24,120,720,5040,40320,362880,3628800

END

Example6-21

Writeaprogramthatcalculates10tothepowerofR2andstorestheresultinR3.AssumeR2hasthexvalueof0–6.Usealook-uptableinsteadofamultiplyinstruction.

Solution:

AREALOOKUP_EXAMP_6_21,READONLY,CODE

ENTRY

ADRR1,LOOKUP;pointtoLOOKUP

LDRR3,[R1,R2,LSL#2];R3=locationR2oflookuparray

HEREBHERE;stayhereforever

LOOKUPDCD1,10,100,1000,10000,100000,1000000

END

WritebackoptionsofSTMandLDM

TheSTMandLDMinstructionsallowyoutostoreandloadmultipleregisterswithasingleinstruction.Wecanalsospecifytheactiontobetakenforthepointer.Theactioncanbeincrementordecrementbeforeorafterthepopisdone.ThisisshowninTable6-4.

Option Description

IA IncrementAfter

IB IncrementBefore

DA DecrementAfter

DB DecrementBefore

Page 305: ARM Assembly Language: Programming and Architecture

Table6-4:OptionsforLDMandSTMinstructions

IA stands for IncrementAfter and adds four (the size of register in bytes) to thepointerafterloadorstoringeachregister.

IBstands for IncrementBeforeandadds four (the sizeof register inbytes) to thepointerbeforeloadorstoringeachregister.

DAstandsforDecrementAfterandsubtractsfour(thesizeofregisterinbytes)fromthepointerafterloadorstoringeachregister.

DB stands forDecrementBefore and subtracts four (the size of register in bytes)fromthepointerbeforeloadorstoringeachregister.

ForfurtherclarificationassumethatR1=0x100.Figure6-9showsthememoryafterrunningSTMR1!,{R2,R3}witheachofIA,IB,DAandDBoptions.

Figure6-9:FourOptionsofSTMandLDMinARM

Notice that generally we have four stack structure, either it is ascending ordescending.Thestackiscalledascendingwhenitisincrementedaftereachstore(PUSH)instruction and decremented after each load (POP) instruction. It is called descendingwhen it is decremented after each store (PUSH) instruction and incremented after eachload(POP)instruction.Thestackpointercanpointtothelastfilledlocation;inthiscasethestackiscalledFullStack.Thestackpointercanpointtothenextavailablelocation,aswell;whichiscalledanEmptyStack.SeeFigure6-10formoreclarification.

Page 306: ARM Assembly Language: Programming and Architecture

Figure6-10:FourGeneralStackStructure

To implement a Full Ascending stack we have to use STMIB because the stackpointershouldincrementonstoreinstructionanditshouldbeincrementedbeforestoringeach registerbecause it shouldpoint to full location.On theotherhandwehave touseLDMDA for pop instruction because the stack pointer should decrement on loadinstructionanditshouldbedecrementedbeforeloadingeachfulllocationbecauseitwaspointing to an empty locationbeforedecrementing.Table6-5 lists appropriate load andstore instruction for each stack structure. It is difficult and prone to error to rememberwhichloadandstoreshouldbeusedwitheachstackstructure.Tosolvethisproblem,eachofloadandstoreoptionshasalternatenamewhichiseasytorememberwhenitisusedforstackoperation.ThelasttwocolumnsofTable6-5listthealternatenames.

StackStructure Load StoreLoad

(alternateNames)

Store

(alternateNames)

FullAscending LDMDA STMIB LDMFA STMFA

FullDescending LDMIA STMDB LDMFD STMFD

EmptyAscending LDMDB STMIA LDMEA STMEA

EmptyDescending LDMIB STMDA LDMED STMED

Page 307: ARM Assembly Language: Programming and Architecture

Table6-5:OptionsforLDMandSTMinstructions

Program6-2usesSTMandLDMtosimplifyprogramofExample6-8.

Program6-2:UsingSTMFAandLDMFAforStack(RepeatofExample6-8)

;UsingFullAscendingLoadandStoreforstack

AREAPROG6_2,CODE,READONLY

ENTRY

LDRR13,=Stack_Top;loadSP

LDRR0,=0x125;R0=0x125

LDRR1,=0x144;R1=0x144

MOVR2,#0x56;R2=0x56

BLMY_SUB;callasubroutine

ADDR3,R0,R1;R3=R0+R1=0x125+0x144=0x269

ADDR3,R3,R2;R3=R3+R2=0x269+0x56=0x2BF

HEREBHERE;stayhere

;–––––––––

MY_SUB

;––—saveR0,R1,andR2onstackbeforetheyareusedbyaloop

STMFAR13,{R0-R2};saveR0,R1,R2onstackusingFullAscending

;––—R0,R1,andR2arechanged

MOVR0,#0;R0=0

MOVR1,#0;R1=0

MOVR2,#0;R2=0

;–––restoretheoriginalregisterscontentsfromstack

LDMFAR13,{R0-R2}

;restoreR0,R1,andR2fromstackusingF.Ascending

BXLR;returntocaller

END

ItmustbenotedthatARM CortexhasPUSHandPOPinstructions.SeeAppendixAformoreinformation.

ReviewQuestions

1.Trueorfalse.Inafullstackthepointerpointstothelaststoredlocation.

Page 308: ARM Assembly Language: Programming and Architecture

2.Trueorfalse.Inadescendingstackthepointerincrementsaftereachpush.

3.WriteaninstructionthatpushesLR(R14)intoanemptydescendingstack.

4.WriteaninstructionthatpopsR10fromafullascendingstack.

Page 309: ARM Assembly Language: Programming and Architecture
Page 310: ARM Assembly Language: Programming and Architecture

Section6.5:ADR,LDR,andPCRelativeAddressingIn indexedaddressingmodes,anyregisters includingthePC(R15)registercanbe

usedas thepointerregister.Forexample, thefollowing instructionreads thecontentsofmemorylocationPC+4:

LDRR0,[PC,#4]

Inthisway,thedatawhichhasaknowndistancefromthecurrentexecutinglinecanbe accessed. As discussed in Chapter 4, the PC register points 8 bytes (2 instructions)ahead of executing instruction. As a result, “LDR R0,[PC,#4]” accesses a memorylocationwhoseaddressis4+8bytesaheadofthecurrentinstruction.Generallyspeaking,theaddressofthememorylocationwhichisbeingaccessedusing“LDRR0,[PC,offset]”canbefoundusingthisformula:theaddressofcurrentinstruction+8+offset.

Forinstance,if“LDRR0,[PC,#4]”islocatedinaddress0x10theeffectiveaddressis:0x10+8+4=0x1C.

Thisaddressingmodecanbeconsideredasasubsetofindexedaddressingmodes.InARMapplicationnotes,theprogramrelativeaddressingmodeisconsideredasaseparateaddressingmode.

ImplementingtheADRDirective(LDRRn,=value)

The ADR directive uses the PC relative addressing mode to load registers. Forexampleseethefollowingprogram:

AREALOOKUP_EXAMPLE,READONLY,CODE

ENTRY

ADRR2,OUR_FIXED_DATA;pointtoOUR_FIXED_DATA

LDRBR0,[R2];loadR0withthecontents

;ofmemorypointedtobyR2

ADDR1,R1,R0;addR0toR1

HEREBHERE;stayhereforever

OUR_FIXED_DATA

DCB0x55,0x33,1,2,3,4,5,6

DCD0x23222120,0x30

DCW0x4540,0x50

END

SeeFigure6-11.Atcompiletime,theADRisreplacedwith“ADDR2,PC,#0x08”.Sincetheinstructionisinaddress0x00,theinstructionaccesseslocation0+8+0x08=0x10.AsshownintheFigure,0x10istheaddressofOUR_FIXED_DATA.

Page 311: ARM Assembly Language: Programming and Architecture

Figure6-11:MemoryDumpforADRInstruction

ImplementingtheLDRDirective

To implement the LDR directive, assembler stores the value as a fixed data inprogram memory and then accesses it using the LDR instruction and the PC relativeaddressingmode. Figure 6-12 shows the implementation of Program6-3. For example,0x12345678isstoredinmemorylocations0x10–0x13,andtheLDRdirectiveisreplacedwithLDRR0,[PC,#0x08].SincePC=0, theLDRR0,=0x1234567 is located at address0x000000.Nowwehave0+8+8=16=0x10.

Program6-3:LDRDirective

AREAEXAMPLE,READONLY,CODE

ENTRY

LDRR0,=0x12345678

LDRR1,=0x86427531

ADDR2,R0,R1

H1BH1

END

Figure6-12:MemoryDumpforLDRInstruction

Page 312: ARM Assembly Language: Programming and Architecture

The same way for the LDR R1,=0x86427531 is located in ROM address0x00000004. Therefore we have 4+8+8=20=0x14, which is the address of the data0x86427531.

ReviewQuestions

1.WhichregisterisusedasthepointerinPCrelativeaddressingmode?

2.WhichdirectiveismoreoptimizedADRorLDR?Why?

Page 313: ARM Assembly Language: Programming and Architecture

ProblemsSection6.1:ARMMemoryMapandMemoryAccess

1.Whatisthebusbandwidthunit?

2.Givethevariablesthataffectthebusbandwidth.

3.Trueorfalse.Onewaytoincreasethebusbandwidthistowidenthedatabus.

4.Trueorfalse.Anincreaseinthenumberofaddressbuspinsresultsinahigherbusbandwidthforthesystem.

5.Calculatethememorybusbandwidthforthefollowingsystems.

(a)ARMof100MHzbusspeedand0WS

(b)ARMof80MHzbusspeedand1WS

6.Indicatewhichofthefollowingaddressesiswordaligned.

(a)0x1200004A (b)0x52000068 (c)0x66000082

(d)0x23FFFF86 (e)0x23FFFFF0 (f)0x4200004F

(g)0x18000014 (h)0x43FFFFF3 (i)0x44FFFF05

7.Showhowdataisplacedafterexecutionofthefollowingcodeusing(a)littleendianand(b)bigendian.

LDRR2,=0xFA98E322

LDRR1,=0x20000100

STR[R1],R2

8.Trueorfalse.InARM,instructionsarealwayswordaligned.

9.Trueorfalse.Inawordalignedaddressthelowerdigitoftheaddressis0,4,8,orC.

10.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister

LDRR1,=0x20000004

LDRD[R1],R2

11.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister

LDRR1,=0x20000102

LDRD[R1],R2

12.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister

Page 314: ARM Assembly Language: Programming and Architecture

LDRR1,=0x20000103

LDRD[R1],R2

13.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister

LDRR1,=0x20000006

LDRH[R1],R2

14.Showhowmanymemorycyclesdoesittaketofetchthefollowingdataintoregister

LDRR1,=0x20000C10

LDRB[R1],R2

Section6.2:StackandStackUsageinARM

15.Trueorfalse.InARMtheR13isdesignatedasstackpointer.

16.WhenBLisexecuted,howmanylocationsofthestackareused?

17.WhenBisexecuted,howmanylocationsofthestackareused?

18.InARM,stackpointeris_________register.

19.DescribehowtheactionassociatedwiththereturnoperationisperformedinARM.

20.GivethesizeofthestackinARM.

21.InARM,whichaddressissavedwhenBLinstructionisexecuted.

Section6.3:ARMBit-AddressableMemoryRegion

22.WhichmemoryregionsofARMarebit-addressable?

23.Givethebit-addressableSRAMregionaddressforgenericARM.

24.Whatbitaddressesareassignedtobyteaddressof0x20000004?

25.Whatbitaddressesareassignedtobyteaddressof0x20000010?

26.Whatbitaddressesareassignedtobyteaddressof0x200FFFFF?

27.Whatbitaddressesareassignedtobyteaddressof0x20000020?

28.Whatbitaddressesareassignedtobyteaddressof0x40000008?

29.Whatbitaddressesareassignedtobyteaddressof0x4000000C?

30.Whatbitaddressesareassignedtobyteaddressof0x40000020?

31.Thefollowingarebitaddresses.Indicatewhereeachonebelongs.

(a)0x2200004C (b)0x22000068 (c)0x22000080

(d)0x23FFFF80 (e)0x23FFFF00 (f)0x4200004C

Page 315: ARM Assembly Language: Programming and Architecture

(g)0x42000014 (h)0x43FFFFF0 (i)0x43FFFF00

32.Ofthe4GbytesofmemorylocationsintheARM,howmanyofthemarealsoassignedabitaddressaswell?Indicatewhichbytesthoseare.

33.Trueorfalse.Thebit-addressableregioncannotbeaccessinbyte.

34.Trueorfalse.Thebit-addressableregioncannotbeaccessinword.

35.WriteaprogramtoseewhethertheD7bitofRAMlocation0x20000020ishigh.Ifso,senda1toD1ofRAMlocation0x20000000.

36.WriteaprogramtoseewhethertheD7bitofI/Olocation0x40000000islow.Ifso,senda0totheD0oflocation0x400FFFFF.

37.WriteaprogramtosethighallthebitsofRAMlocations0x2000000usingthefollowingmethods:

(a)byteaddresses(b)bitaddresses

38.WriteaprogramtoseewhethertheSRAMlocation0x20000000isdivisibleby8.

39.ExplainhowtheLDMinstructionworks.

40.ExplainhowtheSTMinstructionworks.

41.ExplainthedifferencebetweenLDMandLDRinstructions.

42.ExplainhowthedifferencebetweenSTMandSTRinstructions.

43.ExplaintheLDMIAoperationanditsimpactontheSP.

44.ExplaintheLDMIBoperationanditsimpactontheSP.

45.ExplaintheSTMIAoperationanditsimpactontheSP.

46.ExplaintheSTMIBoperationanditsimpactontheSP.

Section6.4:AdvancedIndexedAddressingMode

47.Trueorfalse.Writebackisbydefaultenabledinpreindexedaddressingmode.

48.Indicatetheaddressingmodeineachofthefollowinginstructions

(a)LDRR1,[R5],R2,LSL#2(b)STRR2,[R1,R0]

(c)STRR2,[R1,R0,LSL#2]!(d)STRR9,[R1],R0

49.Whatisanascendingstack?

50.Whatisthedifferencebetweenanemptyandafullstack?

51.WriteaninstructionthatstoresR0inafulldescendingstack.

52.WriteaninstructionthatloadsR9fromanemptydescendingstack.

Section6.5:ADR,LDR,andPCRelativeAddressing

53.Assumingthattheinstruction“LDRR2,[PC,#8]islocatedinaddress0x300,

Page 316: ARM Assembly Language: Programming and Architecture

calculatetheaddressofthememorylocationwhichisaccessed.

54.UsingPCrelativeaddressingmode,writeanLDRinstructionthataccessesamemorylocationwhichis0x20bytesaheadofitself.

Page 317: ARM Assembly Language: Programming and Architecture

AnswerstoReviewQuestionsSection6.1

1.4bytes

2.Compilersensurethatcodesarewordaligned.

3.littleendian

4.1/66MHz=15.15nsisthebusclockperiod.Sincethebuscycletimeofzerowaitstatesis2clocks,wehave2×15.15=30.3ns

5.1/100MHz=10nsisthebusclockperiod.50ns-10ns=40ns.TheNumberofWSis40ns/10ns=4.

6.False

Section6.2

1.R13

2.ThestackcanbeasbigasitsRAM

3.STMR13,{R5-R8}

SUBR13,R13,#16

4.ADDR13,R13,#16

LDMR13,{R5-R8}

5.Itcopiesthecontentsoflocations0x40000000–0x4000000Fintolocations0x50000000–0x5000000FusingtheLDMandSTMinstructions.

Section6.3

1.False

2.False

3.False

4.2MBytes;locations0x20000000to0x200FFFFFofSRAMand0x40000000to0x400FFFFFofGPIO

5.

LDRR0,=0x22000040

LDRR1,[R0]

CMPR1,#0

BNEL1

L1:

Page 318: ARM Assembly Language: Programming and Architecture

6.

(a)0x23000030-0x22000000=0x1000030;0x1000030/0x20=0x80001;thusitisinlocation0x20000000+0x80001=0x20080001

(0x1000030%32)/4=(48%32)/4=16/4=4;itisD4of0x20080001.

(b)0x1000040/0x20=0x80002;itisinlocation0x20080002

(0x1000040%0x20)/4=0;itisD0oflocation0x20080002

(c)0x1000048/0x20=80002;itisinlocation0x20080002

(0x1000048%0x20)/4=2;itisD2oflocation0x20080002

(d)0x4200003C-0x42000000=0x03C;0x03C/0x20=0x01

(0x3C%0x20)/4=0x1C/4=7;itisD7of0x40000001

(e)0x43FFFFFC–0x42000000=0x1FFFFFC;0x1FFFFFC/0x20=0xFFFFF

(0x1FFFFFC%0x20)/4=0x1C/4=7;D7of0x400FFFFF

Section6.4

1.True

2.False

3.STMEDSP!,{LR}

4.LDMFASP!,{R10}

Section6.5

1.PC(R15)

2.ADR,ToimplementtheLDRdirectivethevalueisstoredinmemory;asaresult,itusesmorememorywhiletheADRusesnomemory.

Page 319: ARM Assembly Language: Programming and Architecture
Page 320: ARM Assembly Language: Programming and Architecture
Page 321: ARM Assembly Language: Programming and Architecture

Chapter7:ARMPipelineandCPUEvolutionThis chapterwill look at pipeline evolution inARMwhile examining otherCPU

enhancements. In Section 7.1 the ARM’s pipelines are studied. Section 7.2 exploresvariousprocessorsenhancements.

Page 322: ARM Assembly Language: Programming and Architecture
Page 323: ARM Assembly Language: Programming and Architecture

Section7.1:ARMPipelineEvolutionThere aremanyways available to processor designers to increase the processing

poweroftheCPU.HerewelistsomeofthemthatareusedinARM.

1. Increase theclockfrequencyof thechip.Onedrawbackof thismethod is that thehigher the frequency, the more the power dissipation and the more difficult andexpensivethedesignofthemicroprocessorandmotherboard.

2.Increasethenumberofdatabusestobringmoreinformation(codeanddata)intotheCPU to be processed. For example, Von Neumann architecture in ARM7 has beenreplacedbyHarvardarchitectureinnewerversions.SeeChapter6.

3. Change the internal architecture of the CPU to overlap the execution of moreinstructions. This requires a lot of transistors. There are two trends for this option,superpipelineandsuperscalar.Insuperpipelining,theprocessoffetchingandexecutinginstructionsissplitintomanysmallstepsandallaredoneinparallel.Inthiswaytheexecution of many instructions is overlapped. The number of instructions beingprocessedatagiventimedependsonthenumberofpipelinestages,commonlytermedthe pipeline depth. Some designers use as many as 8 stages of pipelining. Onelimitationofsuperpipeliningisthatthespeedoftheexecutionislimitedtothesloweststage of the pipeline. Compare this to making pizza. You can split the process ofmakingpizzaintomanystages,suchasflatteningthedough,puttingonthetoppings,andbaking,buttheprocessislimitedtothesloweststage,baking,nomatterhowfastthe rest of the stages areperformed.Whathappens ifweuse twoor threeovens forbakingpizzas tospeedup theprocess? Thismayworkformakingpizzabutnot forexecutingprograms,sinceintheexecutionofinstructionswemustmakesurethatthesequenceof instructions iskept intact and that there isnoout-of-stepexecution.Thedifficultiesassociatedwithastalledpipeline(aslowdowninonestageofthepipeline,which prevents the remaining stages from advancing) has made CPU designersabandonsuperpipelininginfavorofsuperscaling.Insuperscaling,theentireexecutionunit has beendoubled and eachunit has 5 pipeline stages.Therefore, in superscalar,there is more than one execution unit and each has many stages, rather than oneexecution unit with 8 stages as in the case of a superpipelined processor. In somesuperscalarprocessors,therearetwoexecutionunitseachwith4pipelinestagesinsteadofasingleexecutionunitwith8pipelinestagesassuperpipeliningproponentswouldhave it. Inotherwords, insuperscalingwehave two(oreven three)executionunitsandastheinstructionsarefetchedtheyareissuedtothevariousexecutionunits.Usingtheanalogyofpizza,superscalarislikedoublingortriplingtheentirecrewflatteningthedough,puttingtoppingson,andbaking.Ofcourse,youwillneedalotmorepeopleinvolvedintheprocessandyouhavetohavemoreovens,butatthesametimeyouaredoublingortriplingthepizzaoutput.Incasesofrecentmicroprocessorarchitecture,avastmajorityofdesignershavechosensuperscalingoversuperpipelining.Thisrequiresnumeroustransistorstoduplicateseveralexecutionunits,justlikeneedingmorepeoplein our pizza-making analogy. Fortunately, advances in IC design have alloweddesigners access to hundreds of million transistors to throw around for the

Page 324: ARM Assembly Language: Programming and Architecture

implementationofpowerfulsuperscaling.Therearesomeproblemswithsuperscaling,suchasdatadependencyissues,whichcanbesolvedbythecompiler.

4.Combiningmorethanonecoreinasingleprocessorisanotherwayofimprovingthespeed of high end processers. Cortex-A series ofARM supports up to four cores incombinationwitheachother.

Next,wewillexaminetheissueofpipelining.SeeFigure7-1.

Figure7-1:Non-PipelinedInstructionExecutionvs.2-stagePipeline(8086)

Moreaboutpipelining

IntheearlyCPUstherewasnopipelining.Atanygivenmoment,iteitherfetchedorit executed. It couldnotdobothat the same time. In thenon-pipelinedCPU,while thebuseswerefetchingtheinstructions(opcodes)anddata,theCPUwassittingidle,andinthesameway,whentheCPUwasexecutinginstructions,busesweresittingidle.However,intheearlypipelinedCPUsuchas8086thefetchandexecutewereperformedinparallelby two sections inside the CPU called the BIU (bus interface unit) and EU (executionunit).

For the concept of pipelining to work, we need a buffer or queue in which aninstructionisprefetchedandreadytobeexecuted.Insomecircumstances,theCPUmustflush out the queue. For example, when a branch (B, BNE, BCS, and so on) or callinstructionisexecuted,theCPUstartstofetchcodesfromthenewmemorylocationandthecodeinthequeuethatwasfetchedpreviouslyisdiscarded.Inthiscase,theexecutionunit must wait until the fetch unit fetches the new instruction. This is called a branchpenalty.Thepenalty isanextra instructioncycle to fetch the instruction from the targetlocation insteadof executing the instruction rightbelow thebranch.Remember that theinstructionbelowthebranchhasalreadybeenfetchedand isnext in line tobeexecutedwhentheCPUbranchestoadifferentaddress.NotethatnewerCPUshavemorestagesintheir pipeline. For example a 3 stage pipelinemay divide the code execution to Fetch,

Page 325: ARM Assembly Language: Programming and Architecture

Decode and Execute stages.When the number of stages in a pipeline increases, morestagesshouldbeflushedoutwhenabranchinstructionisexecuted.ExamineExample7-1toseehowbranchpenaltyslowsdowntheexecutionofacode.Next,youwill seehowbranchpredictionsolvestheproblem.

Example7-1

Howmanycyclesdoesittakefora3stagepipelinedCPUtorun3iterationofthefollowingcode?

MOVR1,#0

L1ADDR2,R2,#1

BL1

MOVR3,#3

MOVR4,#4

Solution:

For the first instruction (MOV R1,#0), it takes 3 cycles to pass through the stages ofpipelineandbeexecuted.Afterthethirdcycle,oneinstructionisexecutedineachcycle.WhentheBranchinstructionisexecutedincycle5,theCPUflushesthepipelinebecausethe fetch and decode instruction in cycles 4 and 5 are not needed. It causes two clockcyclesbranchpenalty.ThesamescenariohappenseachtimetheCPUexecutesabranch.Aswecanseeinthefigure,Ittakes13cyclestorun7instructions.

Branchprediction

Branch prediction is another new feature of the new CPUs. The penalty forbranchingisveryhighforahigh-performancepipelinedmicroprocessorsuchastheARM.Forexample, inthecaseoftheBNE(branchifnotequal) instruction, if itbranches, the

Page 326: ARM Assembly Language: Programming and Architecture

pipelinemustbeflushedandrefilledwithinstructionsfromthetargetlocation.Thistakestime.Incontrast,theinstructionimmediatelybelowtheBNEisalreadyinthepipelineandisadvancingwithoutdelay.Someprocessorshave thecapability topredict andprefetchcodefrombothpossible locationsandhavethemadvancedthroughthepipelinewithoutwaiting (stalling) for the outcome of the zero flag. The ability to predict branches andavoidthebranchpenaltycanresultinasubstantialreductionintheclockcountforagivenprogram. Some CPUs have branch prediction, but with greater capability. When itencountersbranchinstructions(suchasBNE),itcreatesalistoftheminwhatiscalledthebranchtargetbuffer(BTB).TheBTBpredictsthetargetofthebranchandstartsexecutingfromthere.Whenthebranchisexecuted,theresultiscomparedwithwhatthepredictionsection of the CPU said it would do. If they match, the branch is retired. If not, allinstructions behind the branch are removed from the pool and the correct branch targetaddressisprovidedtotheBTB.FromtheretheBTBrefillsthepipelinewithinstructionsfromthenewtargetaddress.SeeExample7-2.

Example7-2

Showhowmanycyclesdoesittakefora3-stagepipelinedCPUtorun3iterationofthecodeinExample7-1?Assumethatthebranchpredictionunithaspredictedallbranches.

Solution:

Ittakes9cyclestorun7instructions.Incycles4to8instructionsarepredictedbybranchpredictionunit.

Notethatstoresareneverperformedspeculativelysincethereisnotransparentwaytoundo them.Storesarealsonever re-orderedamong themselves.Astore isdispatchedonly when both the address and the data are available and there are no older storesawaitingdispatch.

3-stagepipelineinARM7

Sincetheintroductionofthe8086microprocessorin1978,processordesignershavecometorelymoreandmoreontheconceptofpipeliningtoincreasetheprocessingpower

Page 327: ARM Assembly Language: Programming and Architecture

oftheCPU.ARM7usedtheconceptofpipeliningwiththreestagesoffetch,decode,andexecute.SeeExample7-3.

Example7-3

ShowhowthefollowingcodeisexecutedinARM7.

MOVR4,R5

ADDR1,R2,R3

SUBR6,R7,R8

Solution:

5-stagepipelineinARM9

AswementionedearliertheARM7hasa3-stagepipeline.AsshowninFigure7-2,theARM9hasextendedthepipelineto5stages.Theyare:

1.Fetch

2.Decode

3.Execute

4.Memory

5.Write

Figure7-2:5-StagePipelineinARM9

Fetch:IntheFetchstagetheinstructionsarefetchedfrommemoryandplacedinthequeueandwaittobedecoded.InthisstagetheProgramCounter(PC)isalsoincrementedby4sincetheARMinstructionsare4-bytes.

Decode: In thedecodestage the instruction isdecodedand theregister file isalso

Page 328: ARM Assembly Language: Programming and Architecture

accessedtogeteverythingreadyfortheExecutestage.

Execute:Inthisstage,anyeffectiveaddresscalculationandsignextendingofabyteor a half-word are done. In the instructions such as Load and Store this stage getseverything ready for the next stage of memory access. In instructions such as “ADDR1,R2,R3”inwhichalltheresourcesneededfortheexecutionoftheinstructionarereadybeforeitcomestothisstage,theregistersareaddedanditgoesdirectlytothewrite-backstagetowritetheresulttotheregisterfile.

Memory: For instructions such as Load and Store in which external memoryaccessesareneeded,thememorystagefetchesthedatafromtheexternalmemoryandhasthedatainsidetheCPUreadyforthenextstageofthewrite-back.Ifaninstructiondoesnotneedtoaccessmemory, thisstageisbypassedandtheresult isforwardedtothelaststageofwrite-back.Thismeansitbecomesa4-stagepipeline.

Write:Alsocalledwrite-backisthestageinwhichtheinstructioniscompletedbywriting the result to the register fileand retiring the instruction.Aswe just stated, ifaninstructiondoesnotneedtoaccessmemory,write-backisthestagerightaftertheexecutestage,meaningformanyinstructionswereallyhave4-stagepipeline.

3-stagevs.5-stagepipeline

Inthe3-stagepipelineofARM7,theexecution,thememoryaccess,andthewritingtheresulttoregisterfileareallperformedbytheExecutestage.Inthe5-stagepipeline,theCPUdecouplesthememoryaccessandexecutestage.Withthetwonewstagesofmemoryandwrite-back, theARM9 increases the processing power of theCPUby allowing theCPUtoworkconcurrentlyon5instructionsinsteadof3instructionsatagiventime.ThisisamajorenhancementoftheARM9overARM7.

ReviewQuestions

1.Whatissuperpipelining?

2.Whatisthelimitationofsuperpipeline?

3.Whatissuperscaling?

4.Trueorfalse.The5-stagepipelinehasbetterperformancethanthe3-stagepipeline.

5.Givethenamesofthe5-stagepipelineinARM9.

Page 329: ARM Assembly Language: Programming and Architecture
Page 330: ARM Assembly Language: Programming and Architecture

Section7.2:OtherCPUEnhancementsThere aremany otherways available tomicroprocessor designers to increase the

processingpoweroftheCPU.Next,weexaminesomeofthem.

SuperscalarCPUs

AnotheruniquefeatureofthemanyofnewCPUsisitssuperscalararchitecture.AlargenumberoftransistorsareusedtoputmorethanoneexecutionunitinsidetheCPU.Astheinstructionsarefetched,theyareissuedtotheseexecutionunits.Figure7-3showstheconceptofsuperscalar.

Figure7-3:SuperscalarCPUs

Issuingtwoinstructionsatthesametimetodifferentexecutionunitscanworkonlyiftheexecutionofonedoesnotdependontheotherone,inotherwords,ifthereisnodatadependency.Asanexample,lookatthefollowinginstructions.

ADDR1,R2,R3

SUBR4,R1,R5

ANDR6,R7,R8

MOVR9,R10

Intheabovecode,theADDandSUBinstructionscannotbeissuedtotwoexecutionunitssinceR1,thedestinationofthefirstinstruction,isusedimmediatelybythesecondinstruction.Thisiscalledread-after-writedependencysincetheSUBinstructionwantstoreadtheR1contents,but itmustwaituntilafter theADDisfinishedwritingit intoR1.TheproblemisthatADDwillnotwriteintoR1untilthelaststageofthepipeline,andbythen it is too late for the pipeline of the SUB instruction. This prevents the SUBinstructionfromadvancinginthepipeline,thereforecausingthepipelinetobestalleduntiltheADDfinisheswritingandthentheSUBinstructioncanadvancethroughthepipeline.ThiskindofdatadependencyraisestheclockcountfortheSUBinstruction.Whatiftheinstructionsarerescheduled?Wewilldiscussoutoforderexecutionnextinthischapter.

ADDR1,R2,R3

ANDR6,R7,R8

Page 331: ARM Assembly Language: Programming and Architecture

SUBR4,R1,R5

MOVR9,R10

If they are rescheduled as shownabove, each canbe issued to separate executionunits,allowingparallelexecutionofbothinstructionsbytwodifferentunitsoftheCPU.Since the clock count for each instruction is one, having two execution units leads toexecuting two instructionsbypairing themtogether, therebyusingonlyoneclockcountfor two instructions. In the case of the above program, if it is run on the CPU withsuperscalar it will take only 2 clocks instead of 4, assuming that two instructions arepaired together. This reordering of instructions to take advantage of the two internalexecutionunitsoftheCPUcanbedonebycompilerorCPU itselfandiscalledinstructionscheduling.Currently,somecompilersarebeingequippedtodoinstructionschedulingtoremovedependencies.Theprocessofissuingtwoinstructionstothetwoexecutionunitsiscommonlyreferredtoasinstructionpairing.

Superpipelinedandsuperscalar

Somemicroprocessors use a 10-stage pipeline for the CPU. In contrast to the 5-pipestage,althougheachpipestageofthe10-pipestageperformslesswork,therearemorestages. This means that in such processors, more instructions can be worked on andfinished at a time. These CPUs with their 10- or 12-stage pipeline are referred to assuperpipelined. Since they also have multiple execution units capable of working inparallel,theyarealsosuperscalar.Anotheradvantageofthesuperpipelinedconceptisthatitcanachieveahigherclockrate(frequency)withthegiventransistor technology.Theyalso usewhat is called out-of-order execution to increase the performance of theCPU.Thisisexplainednext.

Decouplingandout-of-orderexecution

InCPUarchitecture,whenoneof thepipelinestages isstalled, thepriorstagesoffetchanddecodearealsostalled.Inotherwords,thefetchstagestopsfetchinginstructionsif the execution stage is stalled, due, for example, to a delay in memory access. Thisdependency of fetch and execution has to be resolved in order to increase CPUperformance.ThatisexactlywhatmanydesignershavedonewiththeCPUandiscalleddecoupling the fetch and execution phases of the instructions. In these processors,instructionsarefetchedfrommemoryandplacedintoapoolcalledthe instructionpool.SeeFigure 7-4.

Figure7-4:CPUInstructionExecution

Thisfetch/decodeoftheinstructionsisdoneinthesameorderastheprogramwas

Page 332: ARM Assembly Language: Programming and Architecture

codedbytheprogrammer(orcompiler).However,whentheyareplacedintheinstructionpool theycanbeexecuted inanyorderas longas thedataneeded isavailable. Inotherwords,ifthereisnodependency,theinstructionsareexecutedoutoforder,notinthesameorder as the programmer coded them. Such a speculative execution can go 20–30instructionsdeepintotheprogram.Itisthejoboftheretireunittoprovidetheresultstothe programmer’s (visible) registers (e.g., R0, R1) according to the order in which theinstructionswerecoded.Again,itisimportanttonotethattheinstructionsarefetchedinthesameorderthattheywerecoded,butexecutedoutoforderifthereisnodependency,andultimatelyretiredinthesameorderastheywerecoded.Thisout-of-orderexecutioncanboostperformanceinmanycases.LookatExample7-4.

Example7-4

ThefollowingARMcode(a)setsthepointerforthreedifferentarrays,andthecountervalue,(b)getseachelementofARRAY_1,addsafixedvalueof100toit,andstorestheresultinARRAY_2,and(c)complementstheelementandstoresitinARRAY_3.Analyzetheexecutionofthecodeinlightoftheout-of-orderexecutionandbranchpredictioncapabilitiesofanARMCPU.

(i1)LDRR1,=ARRAY_1;loadpointer

(i2)LDRR2,=ARRAY_2;loadpointer

(i3)LDRR3,=ARRAY_3;loadpointer

(i4)MOVR4,#COUNT;loadthecounter

(i5)AGAINLDRR5,[R1];loadtheelement

(i6)ADDR5,R5,#100;addthefixvalue

(i7)ADDR1,R1,#4;updatethepointer

(i8)STRR5,[R2];storetheresult

(i9)ADDR2,R2,#4;updatethepointer

(i10)MVNR5,R5;complementtheresult

(i11)STRR5,[R3];andstoreit

(i12)ADDR3,R3,#4;updatethepointer

(i13)SUBSR4,R4,#4;

(i14)BNEAGAIN;stayintheloop

(i15)

Solution:

Page 333: ARM Assembly Language: Programming and Architecture

Thefetch/decodeunitfetchesandputsinstructionsintothepool.Sincethereisnodependencyforinstructionsi1throughi5,theyaredispatched,executed,andretiredexceptfori5.Noticethatthepointervaluesofi1toi4areimmediatevalues;therefore,theyareembeddedintotheinstructionwhenthefetch/decodeunitgetsthem.Nowi5isamemoryfetchthatcantakemanyclocks,dependingonwhethertheneededdataislocatedincacheormainmemory.Meanwhilei6,i8,i10,andi11mustwaituntilthedataisavailable.Howeveri7,i9,andi12canbeexecutedoutoforder.Moreimportantly,theBNEinstructionispredictedtogotothetargetaddressofAGAINandi5,i6,…aredispatchedoncemoreforthenextiteration.Thistimethememoryfetchwilltakeveryfewclockssinceinthepreviousdatafetch,theCPUreadsomebytesofdataintothecache.ThisprocesswillgoonuntilthelastroundofloopingwhereR4becomeszeroandfallsthrough.Atthistime,duetomisprediction,alltheinstructionsbelongingtoinstructionsi5,i6,i7,…(startoftheloop)areremovedandthewholepipelinerestartswithinstructionsbelongingtoi15,i16,andsoon.

Due to the fact that memory fetches (due to cachemisses) can takemany clockcyclesandresultinunderutilizationoftheCPU,out-of-orderexecutionisawayoffindingsomething to do for theCPU.Simply put, the idea of out-of-order execution is to lookdeepintothestreamofinstructionsandfindtheonesthatcanbeexecutedaheadofothers,providedthatresourcesareavailable.Again,it isimportanttonotethattheseprocessorswillnotimmediatelyprovidetheresultsofout-of-orderexecutionstoprogrammer-visibleregisterssuchasR0,R1,andsoon,sinceitmustmaintaintheoriginalorderofthecode.Instead,theresultsofout-of-orderexecutionsarestoredinthepoolandwaittoberetiredinthesameorderastheywerecoded.Therefore,programmer-visibleregistersareupdatedinthesamesequenceasexpectedbytheprogrammer.

Registerrenaming

Therearesomecases inwhich instructionsarenotreallydependentoneachotherbutthereisakindofimplicitdependencycalledregisterdependency.SeeExample7-5.

Example7-5

Forthefollowingcode,indicatetheinstructionsthatcanbeexecutedinparalleloroutoforder.

(i1)LDRR4,[R2];loadR4frommemorypointedtobyR2

(i2)ADDR3,R4,R7;R4+R7–>R3

(i3)ADDR6,R8,R10;R8+R10–>R6

(i4)SUBR5,R1,R9;R1-R9–>R5

(i5)ADDR6,R12,#1;R12+1–->R6

Page 334: ARM Assembly Language: Programming and Architecture

Solution:

Instructioni2cannotbeexecuteduntilthedataisbroughtinfrommemory(eithercacheormainmemoryDRAM).Therefore,i2isdependentoni1andmustwaituntiltheR4registerhasthedata.However,instructionsi3andi4canbeexecutedoutoforderandparallelwitheachothersincethereisnodependencyamongthem.Noticethati5isnotreallydependantoni3becausei5doesnotuseanyofdatageneratedbyi3.Buti5andi3cannotbeexecutedoutoforderorinparallelbecauseR6ismodifiedbybothofi3andi5.Thiskindofdependencyiscalledregisterdependencyandissolvedbyamethodcalledregisterrenaming.

InthefollowingcodenoneoftheinstructionscanbeexecutedinparallelbecauseofusingR1inallinstructions:

MOVR1,#5

ADDR3,R1,#2

MOVR1,#6

ADDR4,R1,#2

Ifyouexaminetheabovecodecarefullyyouwillseethatthefirsttwolinesofcodeare independent from the second two lines of code and we can remove the implicitdependencybychangingR1toanotherregistersuchasR2inthelasttwolinesofcode:

MOVR1,#5

ADDR3,R1,#2

MOVR2,#6

ADDR4,R2,#2

Renaming the registersbefore issuing the instructions toexecutionunit isdone inmanyofnewadvancedCPUanditiscalledregisterrenaming.

PuttingthemalltogetherinanARMCPU

InFigure7-5youcanseeatop-leveldiagramoftheARMCortexA9processor.Ithasmostofthepartsdiscussedinthischapter.

Page 335: ARM Assembly Language: Programming and Architecture

Figure7-5:Top-leveldiagramoftheARMCortexA9processor

Busfrequencyvs.internalfrequencyinCPU

Frequently you may see an advertisement for a 1-GHz or 2-GHz CPUs. It isimportanttonotethatthestatedfrequencyistheinternalfrequencyoftheCPUandnotthebusfrequency.Thisisduetothefactthatdesigninga1-GHzmotherboardisverydifficultandexpensive.Suchadesignrequiresaveryfastlogicfamilyandmemoryinadditiontoamassive simulation to avoid crosstalk and signal radiation. The bus frequency for suchsystemsiscurrentlylessthan1GHz.

ReviewQuestions

1.Trueorfalse.TheARMinstructionsetisintriadicform.

2.Trueorfalse.ThebranchpredictiontaskisperformedbycircuitryinsidetheCPU.

3.WhyaresomeCPUscalledasuperscalarprocessor?

4.Instructionschedulingisdoneby________.

5.Outoforderexecutionisarrangedby________.

Page 336: ARM Assembly Language: Programming and Architecture

ProblemsSection7.1:ARMPipelineEvolution

1.TheARM7usesapipelineof_______stages.

2.GivethenamesofthepipelinestagesintheARM7

3.TheARM9usesapipelineof_______stages.

4.GivethenamesofthepipelinestagesintheARM9

Section7.2:OtherCPUEnhancements

5.Thenumberofpipelinestagesinasuperpipelinesystemis________(less,more)thaninasuperscalarsystem.

6.Whichhasoneormoreexecutionunits,superpipelineorsuperscalar?

7.Whichpartofon-chipcacheintheARMiswriteprotected,dataorcode?

8.Whatisinstructionpairing,andwhencanithappen?

9.Whatisdatadependency,andhowisitavoided?

10.Trueorfalse.Instructionsarefetchedaccordingtotheorderinwhichtheywerewritten.

11.Trueorfalse.Instructionsareexecutedaccordingtotheorderinwhichtheywerewritten.

12.Trueorfalse.Instructionsareretiredaccordingtotheorderinwhichtheywerewritten.

13.ThevisibleregistersR0,R1,andsoon,areupdatedbywhichunitoftheCPU?

14.Trueorfalse.Amongtheinstructions,STRs(store)areneverexecutedoutoforder.

Page 337: ARM Assembly Language: Programming and Architecture

AnswerstoReviewQuestionsSection7.1

1.Insuperpipelining,theprocessoffetchingandexecutinginstructionsissplitintomanysmallstepsandallaredoneinparallel.Inthiswaytheexecutionofmanyinstructionsisoverlapped.

2.Thespeedoftheexecutionislimitedtothesloweststageofthepipeline.

3.Insuperscaling,theentireexecutionunithasbeendoubled

4.True

5.Fetch,decode,execute,memory,writeback

Section7.2

1.True

2.True

3.Sinceithastwoexecutionunits(pipelines)capableofexecutingtwoinstructionswithoneclock

4.circuitryinsidetheCPUandthecompiler

5.theCPU

Page 338: ARM Assembly Language: Programming and Architecture
Page 339: ARM Assembly Language: Programming and Architecture

AppendixA:ARMCortex-M3InstructionDescription

Page 340: ARM Assembly Language: Programming and Architecture
Page 341: ARM Assembly Language: Programming and Architecture

SectionA.1:ListofARMCortex-M3InstructionsADCAddwithCarry

ADCSAddwithCarry(andupdatetheflags)

ADDADD

ADDSADDandupdatetheflags

ADRLoadPC-RelativeAddress

ANDLogicalAND

ANDSLogicalAND(updateflags)

ASRArithmeticShiftright

ASRSArithmeticShiftright(updatetheflags)

BBranch(unconditionaljump)

BxxBranchConditional

BFCBitFieldClear

BFIBitFieldInsert

BICBitClear

BICSBitClear(updateflags)

BKPTBreakpoint

BLBranchwithLink(thisisCallinstruction)

BLXBranchIndirectwithLink

BXBranchIndirect(BXLRisusedforReturn)

CBNZCompareandBranchonNon-Zero

CBZCompareandBranchonZero

CDPCoprocessorDataprocessing

CLREXClearExclusive

CLZCountLeadingZero

CMNCompareNegative

CMPCompare

CPSIDChangeprocessorIDandDisableInterrupt

CPSIEChangeProcessorStateandEnableInterrupt

DMBDataMemoryBarrier

DSBDataSynchronizationBarrier

Page 342: ARM Assembly Language: Programming and Architecture

EORExclusiveOR

EORSExclusiveORandupdatetheflags

ISBInstructionSynchronizationBarrier

ITIf-ThenConditionBlock

LDCLoadCoprocessor

LDMLoadMultipleregisters

LDMDBLoadMultipleregistersandDecrementBeforeeachaccess

LDMEALoadMultipleregistersfromEmptyAscending

LDMFDLoadMultipleregistersFullDescending

LDMIALoadMultipleregistersandIncrementaftereachAccess

LDRLoadRegister

LDRRx,=ValueLoadRegisterwith32-bitvalue

LDRBLoadRegisterByte

LDRBTLoadRegisterBytewithTranslation

LDREX,LDREXB,LDREXHLoadRegisterExclusive

LDRHLoadRegisterHalfword

LDRSBLoadRegistersignedByte

LDRSHLoadRegisterSignedHalfword

LDRTLoadRegisterwithTranslation

LSLLogicalShiftLeft

LSLSLogicalShiftLeft(updatetheflags)

LSRLogicalShiftRight

LSRSLogicalShiftRight(updatetheflags)

MCRMovetoCoprocessorfromARMRegister

MLAMultiplyAccumulate

MLSMultiplyandSubtract

MOVMove(ARM7)

MOVMove(ARMCortex)

MOVSMove(andupdateflags)

MOVTMoveTop

MOVWMove16-bitconstant

Page 343: ARM Assembly Language: Programming and Architecture

MRCMovetoARMRegisterfromCoprocessor

MRSMovetogeneralRegisterfromSpecialregister

MSRMovetoSpecialregisterfromgeneralRegister

MULUnsignedMultiplication

MVNMoveNegative

MVNSMoveNegativeandupdatetheflags

NOPNoOperation

ORNLogicalORNot

ORNSORNotandupdateflags

ORRLogicalOR

ORRSLogicalORandupdatetheflags

POPPOPregisterfromStack

PUSHPUSHregisterontostack

RBITReverseBits

REVReversebyteorderinaword

RV16Reversebyteorderin16-bit

REVSHReversebyteorderinbottomhalfwordandsignextend

RORRotateRight

RORSRotateRight(updatetheflags)

RRXRotateRightwithextend

RRXSRotateRightwithextend(updatetheflags)

RSBReverseSubtract

RSBSReverseSubtractandupdatetheflags

SBCSubtractwithCarry(Borrow)

SBCSSubtractwithCarry(Borrow)andupdatetheflags

SBFXSignBitFieldextract

SDIVSignedDivide

SEVSendEvent

SMLALSignedMultiplyAccumulateLong

SMULLSignedMultiplyLong

SSATSignSaturate

Page 344: ARM Assembly Language: Programming and Architecture

STMStoreMultiple

STMDBStoreMultipleregisterandDecrementBefore

STMEAStoreMultipleregisterEmptyAscending

STMIAStoreMultipleregisterEmptyAscending

STMFDStoreMultipleregisterFullDescending

STRStoreRegister

STRBStoreRegisterByte

STRBTStoreRegisterBytewithTranslation

STRDStoreRegisterDouble(twowords)

STREX,STREXB,STREXHStoreRegisterExclusive

STRHStoreRegisterHalfword

STRTStoreRegister

SUBSubtract

SUBSSubtract

SVCsupervisorCall(SoftwareInterrupt)

SXTBSignExtendbyte

SXTHSignExtendHalfword

TBBTableBranchByte

TBHTableBranchhalfword

TEQTestEquivalence

TSTTest

UBFXUnsignedBitfiledextract

UDIVUnsignedDivide

UMLALUnsignedMultiplywithAccumulate

UMULLUnsignedMultiplyLong

USATUnsignedSaturate

UXBTZeroextendabyte

UXTHZeroextendhalfword

WFEWaitforevent

WFIWaitforinterrupt

Page 345: ARM Assembly Language: Programming and Architecture
Page 346: ARM Assembly Language: Programming and Architecture

SectionA.2:ARMCortex-M3InstructionDescription

ADCAddwithCarry

Flags:Unaffected.

Format:ADCRd,Rn,Op2;Rd=Rn+Op2+C

Function:IfC=1priortothisinstruction,thenafterexecutionofthisinstruction,Op2isaddedtoRnplus1andtheresultisplacedinRd.IfC=0,Op2isaddedtoRnplus0.Usedwidelyinmultiwordadditions.Aftertheexecutiontheflagsarenotupdated.TheADCSinstructionupdatestheflags.

Example1:

LDRR0,=0xFFFFFFFB;R0=0xFFFFFFFB

LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF

MOVR2,#3;R2=3

MOVR3,#4;R3=4

ADDSR4,R0,R1;R4=R0+R1,C=1

ADCR5,R2,R3;R5=R2+R3+C=R2+R3+1

ADCSAddwithCarry(andupdatetheflags)

Flags:Affected:N,Z,V,C.

Format:ADCRd,Rn,Op2;Rd=Rn+Op2+C

Function:IfC=1priortothisinstruction,thenafterexecutionofthisinstruction,Op2isaddedtoRnplus1andtheresultisplacedinRd.IfC=0,Op2isaddedtoRnplus0.Usedwidelyinmultiwordadditions.NoticetheSindicatestheflagswillbeupdated.

Example1:

LDRR0,=0xFFFFFFFB;R0=0xFFFFFFFB

LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF

MOVR2,#3;R2=3

MOVR3,#4;R3=4

ADDSR4,R0,R1;R4=R0+R1,C=1

ADCSR5,R2,R3;R5=R2+R3+C=3+4+1,Z=0,N=0,C=0,

ADDADD

Flags:Unaffected

Format:ADDRd,Rn,Op2;Rd=Rn+Op2

Page 347: ARM Assembly Language: Programming and Architecture

Function:Addssourceoperandstogetherandplacestheresultindestination.Thiswillnotupdatetheflags.ToupdatetheflagswemustuseADDS.

Example1:

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFB

MOVR1,#0x5;R1=0x5

ADDR2,R0,R1

;R2=R0+R1=0xFFFFFFFFB+0x5=00000000

;flagsunchanged

Example2:

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF

ADDR2,R0,#0xF1

;R2=R0+0xF1=R1=0xFFFFFFFF+0xF1=000000F0

;flagsunchanged

ADDSADDandupdatetheflags

Flags:Affected:N,Z,V,C.

Format:ADDSRd,Rn,Op2;Rd=Rn+Op2andupdatetheflags

Function:Addsourceoperandstogetherandplacestheresultindestinationandupdatetheflags.Next,weexaminethecasesofsignedandunsignednumbers.

Unsignedaddition:

Inadditionofunsignednumbers,thestatusofC,Z,N,andVmaychange,butonlyCandZareofanyusetoprogrammers.ThemostimportantoftheseflagsisC.Itbecomes1whenthereiscarryfromD31outina32-bit(D0–D31)operation.

Example1:

MOVR1,#0x45;R1=0x45

ADDSR1,R1,#0x4F;R1=0x94(0x45+0x4F=0x94),C=0,Z=0

Example2:

MOVR2,#0xFE;R2=0xFE

MOVR3,#0x75;R3=0x75

ADDSR4,R2,R3;R4=0xFE+0x75=0x73,C=0,Z=0

Example3:

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFB

MOVR1,#0x01;R1=0x1

Page 348: ARM Assembly Language: Programming and Architecture

ADDSR2,R0,R1

;R2=R0+R1=0xFFFFFFFFF+0x1=00000000,C=1,Z=1

Example4:

LDRR0,=0xFFFF126F;R0=0xFFFF126F

LDRR1,=0xFFFF46D4;R1=0xFFFF46D4

ADDSR2,R0,R1

;R1=0xFFFF126F+0xFFFF46D5=0xFFFF5943,C=1,Z=0

Signedaddition:

Inadditionofsignednumbers,thestatusofV,Z,andNmustbenoted.Specialattentionshouldbegiventotheoverflowflag(V)sincethisindicatesifthereisanerrorintheresultoftheaddition.TherearetworulesforsettingVinsignednumberoperation.Theoverflowflagissetto1:

1.IfthereisacarryfromD30toD31andnocarryfromD31outina32-bitoperation

2.IfthereisacarryfromD31outandnocarryfromD30toD31ina32-bitoperation

NoticethatifthereisacarrybothfromD31outandfromD30toD31,thenV=0in32-bitoperations.

Example5:

MOVR1,#+8;R1=0x00000008

MOVR2,#+4;R2=0x00000004

ADDSR3,R1,R2;R3=0x0000000CN=0,V=0,C=0

NoticeN=D31=0sincetheresultispositiveandV=0sincethereisneitheracarryfromD30toD31noranycarrybeyondD31.SinceV=0,theresultiscorrect[(+8)+(+4)=(+12)].

Example6:

LDRR1,=0x+42FFFFFF;R1=0x42FFFFFF,(apositivenumber)

LDRR2,=0x+45FFFFFF;R2=0x45FFFFFF(apositivenumber)

ADDSR3,R2,R1;R3=0x88FFFFFE,C=0,N=1,Z=0,andV=1

InExample6,thecorrectresultshouldbe+,buttheresultis–.TheV=1isanindicationofthiserror.NoticethatN=D31=1sincetheresultisnegative;V=1sincethereisacarryfromD30toD31andC=0.

Example7:

LDRR0,=-12;R0=0xFFFFFFF4

LDRR1,=+17;R1=0x00000011

Page 349: ARM Assembly Language: Programming and Architecture

ADDSR2,R1,R2;R2=00000005(whichis+5andcorrect)

;N=0,Z=0,V=0,andC=1

NoticethatV=0sincethereisacarryfromD30toD31andacarryfromD31out.

Example8:

LDRR0,=-30;R0=0xFFFFFFE2

LDRR1,=+14;R2=0x0000000E

ADDSR2,R1,R0;R2=FFFFFFF0(whichis-16andcorrect)

;N=1,Z=0,V=0,andC=0

V=0sincethereisnocarryfromD31outnoranycarryfromD30toD31.

Example9:

LDRR1,=-126;R1=0xFFFFFF82

LDRR2,=-127;R2=0xFFFFFF81

ADDSR3,R2,R1;R3=0xFFFFFF03

;(whichis-253andcorrect),N=1,Z=0andV=0,C=1

V=0sincethereiscarryfromD31outandcarryfromD30toD31.

ADRLoadPC-RelativeAddress

Flags:Unaffected:

Format:ADRRd,label;Rd=addressoflabel

Function:ThisallowsloadingintoRdregisteranaddressrelativetothecurrentPC(programcounter).Thelabeltargetaddressmustbewithinthe-4,095to+4,096bytesfromtheaddressinPCregister.Thatisnofartherthan1024instructionsineitherdirectionofbackwardorforward.

Example:

ADRR3,MyMessage

HEREBHERE

MyMessageDCB“Hello”

ANDLogicalAND

Flags:Unaffected

Format:ANDRd,Rn,Op2;Rd=RnANDedOp2

Function:PerformslogicalANDontheoperands,bitbybit,storingtheresultinthedestination.Thiswillnotupdatetheflags.ToupdatetheflagswemustuseANDS.NoticethatCflagisupdatedduringcalculationofOp2whenLSRorLSLareused.

Page 350: ARM Assembly Language: Programming and Architecture

Inputs Output

X Y XANDY

0 0 0

0 1 0

1 0 0

1 1 1

Example1:

MOVR0,#0x39;R0=0x39

MOVR1,#0x0F;R1=0x0F

ANDR2,R1,R0;R2=09

;3900111001

;0F00001111

;—–––

;0900001001Flagsunchanged

Example2:

MOVR0,#0x37;R0=0x37

ANDR1,R0,#0x0F;R1=R0ANDed0x0F=07

;3700110111

;0F00001111

;—–––

;0700000111Flagsunchanged

ANDSLogicalAND(updateflags)

Flags:Affected:N,Z,C

Format:ANDRd,Rn,Op2;Rd=RnANDedOp2

Function:PerformslogicalANDontheoperands,bitbybit,storingtheresultinthedestination.Thiswillalsoupdatetheflags.NoticethatCflagisupdatedduringcalculationofOp2whenLSRorLSLareused.

Inputs Output

X Y XANDY

Page 351: ARM Assembly Language: Programming and Architecture

0 0 0

0 1 0

1 0 0

1 1 1

Example1:

MOVR0,#0x39;R0=0x00000039

MOVR1,#0x0F;R1=0x0000000F

ANDSR3,R1,R0;R3=0x00000009

;3900111001

;0F00001111

;—–––

;0900001001Z=0,N=0,C=unchanged

Example2:

MOVR1,#0x39;R1=0x00000039

MOVR2,#0xC6;R2=0x000000C6

ANDSR3,R1,R2;R3=00000000

;3900111001

;C611000110

;—–––

;0000000000Z=1,N=0

Example3:

LDRR2,=0xFFFFFF82

LDRR3,=0xFFFFFF81

ANDSR4,R2,R3;R4=0xFFFFFF80,Z=0,N=1

Example4:

LDRR1,=0x55555555

LDRR2,=0xAAAAAAAA

ANDSR3,R2,R1;R3=00000000Z=1,N=0

ASRArithmeticShiftright

Flags:Unaffected.ExceptC

Page 352: ARM Assembly Language: Programming and Architecture

Format:ASRRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsfilledwiththesignbit(MSB).ThenumberofbitstobeshiftedrightisgivenbyRnandtheresultisplacedinRdregister.Theflagsareunchanged.ToupdatetheflagsuseASRSinstruction.

Example1:

LDRR2,=0xFFFFFF82

ASRR0,R2,#6;R0=R2isshiftedright6times

;now,R0=0xFFFFFFFE

Example2:

LDRR0,=0x2000FF18

MOVR1,#12

ASRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.

;now,R2=0x0002000F

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

ASRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x00000000

ASRarithmeticshiftisusedforsignednumbershifting.ASRessentiallydividesRmbyapowerof2foreachbitshift.

ASRSArithmeticShiftright(updatetheflags)

Flags:N,Z,andC.

Format:ASRRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsfilledwiththesignbit(MSB).ThenumberofbitstobeshiftedrightisgivenbyRnandtheresultisplacedinRdregister.TheASRSupdatestheflags.

Example1:

LDRR2,=0xFFFFFF82

ASRSR0,R2,#6;R0=R2isshiftedright6times.

;now,R0=0xFFFFFFFE,C=0,N=1,Z=0

Example2:

Page 353: ARM Assembly Language: Programming and Architecture

LDRR0,=0x2000FF18

MOVR1,#12

ASRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.

;now,R2=0x0002000F,C=1,N=0,Z=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

ASRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes.

;now,R2=0x00000000,C=1,N=0,Z=1

ASRSarithmeticshiftisusedforsignednumbershifting.ASRSessentiallydividesRmbyapowerof2foreachbitshift.

BBranch(unconditionaljump)

Flags:Unchanged.

Format:Btarget;jumptotargetaddress

Function:Thisinstructionisusedtotransfercontrolunconditionallytoanewaddress.ThedifferencebetweenBandBListhattheBLinstructionsavestheaddressofthenextinstructiontoLR(thelinkregister,R14).ForARM7,thetargetaddressiscalculatedby(a)shiftingthe24-bitsigned(2’scomp)offsetlefttwobits,(b)sign-extendtheresultto32-bit,and(c)addittocontentsofPC(programcounter).Thismeansthetargetaddresscouldbewithinthe–32Mbytesto+32Mbytesofaddressspacefromthecurrentprogramcounter.ForARMCortexM3.thetargetaddressmustbewithin–16MBto+16MBaddressspacefromcurrentinstruction.

BxxBranchConditional

Flags:Unaffected.

Format:Bxxtarget;jumptotargetuponcondition

Function:Usedtojumptoatargetaddressifcertainconditionsaremet.InARM7,thetargetaddresscannotbemorethan–32MBto+32MBbytesaway.ForARMCortexM3.thetargetaddressmustbewithin–16MBto+16MBaddressspacefromcurrentinstruction.Theconditionsareindicatedbytheflagregister.Theconditionsthatdeterminewhetherthejumptakesplacecanbecategorizedintothreegroups:

1.flagvalues,

2.thecomparisonofunsignednumbers,and

3.thecomparisonofsignednumbers.

Eachisexplainednext.

1.“Bcondition”wheretheconditionreferstoflagvalues.Thestatusofeachbitof

Page 354: ARM Assembly Language: Programming and Architecture

theflagregisterhasbeendecidedbyexecutionofinstructionspriortothejump.Thefollowing“Bcondition”instructionscheckifacertainflagbitisraisedornot.

Instruction Condition

BCS BranchifCarrySet jumpifC=1

BCC BranchifCarryClear jumpifC=0

BEQ BranchifEqual jumpifZ=1

BNE BranchifNotEqual jumpifZ=0

BMI BranchifMinus/Negative jumpifN=1

BPL BranchifPlus/Positive jumpifN=0

BVS BranchifOverflow jumpifV=1

BVC BranchifNooverflow jumpifV=0

2.“Bcondition”wheretheconditionreferstothecomparisonofunsignednumbers.Afteracompare(CMPRn,Op2)instructionisexecuted,CandZindicatetheresultofthecomparison,asfollows:

C Z

Rn>Op2 1 0

Rn=Op2 1 1

Rn<Op2 0 0

Sincetheoperandscomparedareviewedasunsignednumbers,thefollowing“Bcondition”instructionsareused.

Instruction Condition

BHI BranchifHigher jumpifC=1andZ=0

BEQ BranchifEqual jumpifC=1andZ=1

BLS BranchifLowerorsame jumpifC=0orZ=1

Inreality,the“CMPRn,Op2”isasubtractinstruction(Rn-Op2).Afterthesubtractiontheresultisdiscardedandflagsarechangedaccordingtotheresult.NoticeinARMthesubtractaffectstheCflagsettingdifferentlyfromthex86andotherCPUs.SeetheSUB

Page 355: ARM Assembly Language: Programming and Architecture

instruction.

3.“Bcondition”wheretheconditionreferstothecomparisonofsignednumbers.Inthecaseofthesignednumbercomparison,althoughthesameinstruction,“CMPRn,Op2”,isused,theflagsusedtochecktheresultareasfollows:

Rn>Op2 V=NorZ=0

Rn=Op2 Z=1

Rn<Op2 VinverseofN

Consequently,the“Bcondition”instructionsusedaredifferent.Theyareasfollows:

Instruction

BGE BranchGreaterorEqualjumpifN=1andV=1orN=0andV=0

(V=N)

BLT BranchLessthanjumpifN=1andV=0orN=0andV=1

(NnotequaltoV)

BGT BranchGreaterthanjumpifZ=0andeitherN=1andV=1orN=0and

V=0

(N=V)

BLE BranchLessorEqualjumpifZ=1orN=1andV=0.OrN=0andV=1

(Z=1orNnotequaltoV)

BEQ BranchifEqual jumpifZ=1

All“Bcondition”instructionsareshortjumps,meaningthatthetargetaddresscannotbemore than -32Mbytesbackwardor+32Mbytes forward from thePCof the instructionfollowingthejump.InARMCortexM3itis16MBineachdirection.Whathappensifaprogrammerneedstousea“Bcondition”togotoatargetaddressbeyondthe-32MBto+32MB range?The solution is to use the “BX condition,Rm” sinceRm can be 32-bitaddressandcoverstheentire4GBaddressspaceoftheARM.Thisisshownnext.

LDRR4,=MYTARGET

ADDSR1,R2,R3

BXEQR4;branchtoaddressheldbyR4ifZ=1

MYTRGTSUBSR7,#4

NOP

NOP

Page 356: ARM Assembly Language: Programming and Architecture

….

C Z N V

Rn>Op2 0 0 0 N

Rn=Op2 0 1 0 N

Rn<Op2 1 0 1 InverseofN

BFCBitFieldClear

Flags:Unaffected.

Format:BFCRd,#LSB,#Width

Function:ClearsselectedbitsofRd.ThestartlocationoftheRdbitisindicatedby#LSBandmustbeintherangeof0–31.Howmanybitsshouldbeclearedisindicatedby#Widthandmustbeintherangeof1–32.

Example1:

LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF

BFCR1,#2,#14;nowR1=0xFFFF0003

Example2:

LDRR2,=0x999999999;R2=0x99999999

BFCR2,#8,#24;nowR2=0x00000099

BFIBitFieldInsert

Flags:Unaffected.

Format:BFIRd,Rn,#LSB,#Width

Function:SelectedbitsofRnarecopiedtoRd.ThestartlocationoftheRdbitisindicatedby#LSBandmustbeintherangeof0–31.Howmanybitsshouldbecopiedisindicated by #Width andmust be in the range of 1–32. The start bit location of Rn isalwaysbit0(D0).

Example:

LDRR1,=0xABCDABCD;R1=0xABCDABCD

LDRR2,=0x12345678;R2=0x12345678

BFIR1,R2,#4,#8;nowR1=0xABCDA78D

BICBitClear

Flags:Unaffected.

Page 357: ARM Assembly Language: Programming and Architecture

Format:BICRd,Rn,Op2;Rd=RnANDedwithNOTofOp2

Function:SelectedbitsofRnareclearedandplacedinRd.TheOp2providesthebitsselection.IftheselectedbitsinOp2arehighthencorrespondingbitsinRnareclearedandtheresultisplacedinRd.IftheselectedbitsinOp2arelowthecorrespondingbitsinRnareleftunchangedandtheresultisplacedinRd.Inreality,theBICperformstheANDoperation on the bits ofRnwith the complement of the bits inOp2. TheBICwill notupdatetheflags.ToupdatetheflagswemustuseBICS.

Inputs Output

X Y XAND(NOTY)

0 0 0

0 1 0

1 0 1

1 1 0

Example:

LDRR1,=0xFFFFFF00;R1=0xFFFFFF00

LDRR2,=0x99999999;R2=0x9999999

BICR3,R2,R1;nowR3=0x00000099

BICSBitClear(updateflags)

Flags:NandZ.

Format:BICSRd,Rn,Op2;Rd=RnANDedwithNOTofOp2

Function:SelectedbitsofRnareclearedandplacedinRd.TheOp2providestheselectedbits.IftheselectedbitsinOp2ishighthenthecorrespondingbitsinRnisclearedandtheresultisplacedinRd.IftheselectedbitsinOp2islowthemthecorrespondingbitsinRnisleftunchangedandtheresultisplacedinRd.Inreality,theBICSperformstheANDoperationonthebitsofRnwiththecomplementofthebitsinOp2andupdatestheflags.

Inputs Output

X Y XAND(NOTY)

0 0 0

0 1 0

Page 358: ARM Assembly Language: Programming and Architecture

1 0 1

1 1 0

Example1:

LDRR1,=0xFFFFFF00;R1=0xFFFFFF00

LDRR2,=0x99999999;R2=0x9999999

BICSR3,R2,R1;nowR3=0x00000099,N=0,Z=0

Example2:

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF

LDRR1,=0x01234567;R1=0x01234567

BICSR2,R1,R0;nowR2=0,N=0,Z=1

Example3:

MOVR0,#0;R0=0

LDRR1=0x99999999;R1=0x99999999

BICSR2,R1,R0;nowR2=0x99999999,N=1,Z=0

BKPTBreakpoint

Flags:Unaffected.

Format:BKPT#imme_value

Function:usedbycompilertoinsertbreakpointintoprograms.UponexecutionoftheBKPTinstructiontheprogramenterstheDebugmode.SeeyourARM compilerformoreinformation

BLBranchwithLink(thisisCallinstruction)

Flags:Unchanged.

Format:BLSubroutine_Addr;transfercontroltoasubroutine

Function:Transferscontroltoasubroutine.ThisinstructionsavestheaddressoftheinstructionaftertheBLinR14(linkregister).Attheendofthesubroutinethecontrolto the instruction after theBL is achieved by copying the LR (R14) register to PC. InARM7,thetargetaddresscannotbemorethan–32MBto+32MBbytesaway.ForARMCortexM3. the target address must bewithin –16MB to +16MB address space fromcurrentinstruction.

Example:

LDRR7,=20000000

BLDELAY;CallsubroutineMY_DELAY

Page 359: ARM Assembly Language: Programming and Architecture

ADDR3,#4;addressofthisinstructionissavedinR14

DELAYSUBSR7,#4

NOP

NOP

MOVPC,R14;Return,couldhaveused“BXLR”instruction

BLXBranchIndirectwithLink

Flags:Unaffected.

Format:BLXRm;transfercontroltoasubroutinewhose

;addressisgivenbyRm

Function: Transfers control to a subroutinewhoseaddress isgivenby theRmregister. This instruction saves the address of the instruction after the BL in R14 (linkregister). At the end of the subroutine the control to the instruction after the BL isachieved by copying the LR (R14) register to PC. One can use “BX LR” as returninstruction. Notice the difference between this instruction and “BL Target_Addr”instruction. In the “BL Target_Addr” instruction the target address of the subroutine isgiven right there. However, in the “BLX Rm” instruction, the target address of thesubroutineisheldbyregisterRm.

Example:

ADRR2,DELAY

BLXR2;CallsubroutinepointedtobyR2

ADDR3,#4;addressofthisinstructionissavedinR14

DELAYSUBSR1,#4

NOP

NOP

BXLR;return

BXBranchIndirect(BXLRisusedforReturn)

Flags:Unchanged.

Format:BXRm;BXLRisusedforReturnfromasubroutine

Function:Themostwidelyusageofthisinstructionisintheformof“BXLR”for

Page 360: ARM Assembly Language: Programming and Architecture

thepurposeofreturninstructionattheendofsubroutine.

Example:

LDRR1,=20000000

BLDELAY;CallsubroutineMY_DELAY

ADDR3,#4;addressofthisinstr.issavedinR14

DELAYSUBSR1,#4

NOP

NOP

BXLR;returntocaller

CBNZCompareandBranchonNon-Zero

Flags:Unchanged.

Format:CBNZRn,Target

Function:TransferscontroltothetargetlocationifRnisnotequaltozero.TheRnmustbeintherangeofR0–R7andtargetaddresscannotbefartherthan130bytesawayfromtheinstruction.ThisinstructioncomparestheRnwithzeroandjumpsonlyifRnisnotzero.Thecomparisonhasnoeffectonflags.Thiscanbeusedforloopsinwhichthebodyoftheloopisnomorethan20instructions.

Example1:

MOVR1,#10;R1=10

L1NOP

NOP

NOP

SUBR1,R1,1;R1=R1-1

CBNZR1,L1

CBZCompareandBranchonZero

Flags:Unaffected.

Format:CBZRn,Target

Function:TransferscontroltothetargetlocationifRniszero.TheRnmustbeinthe rangeofR0–R7and target address cannot be farther than130bytes away from theinstruction.ThisinstructioncomparestheRnwithzeroandjumpsonlyifRniszero.Thecomparisonhasnoeffectonflags.Thiscanbeusedtotestaregistervalueafterreadinga

Page 361: ARM Assembly Language: Programming and Architecture

port.

Example1:

LDRR0,=MYPORT_ADR;R0=MYPORTaddress

HERELDRR2,[R0];readfromMYPORT

CBZR2,HERE;keepreadingMYPORTuntilitiszero

CDPCoprocessorDataprocessing

SeeARMCortex-MManual.

CLREXClearExclusive

SeeARMCortex-MManual.

CLZCountLeadingZero

Flags:Unchanged.

Format:CLZRd,Rn

Function:ScanstheRnregistercontentsfrommostsignificantbit(D31)towardleastsignificantbit(D0)untilitfindthefirstHIGH.ThenumberofbinaryzerobitsbeforeitencountersthefirstbinaryHIGHisplacedinRd.

Example:

LDRR3,=0x01FFFFFF

CLZR1,R3;R1=7sincethereare7zerosbeforethefirstbinary1

CMNCompareNegative

Flags:Affected:V,N,Z,C.

Format:CMNRn,Op2;setsflagsasif“Rn+Op2”

;Notice,theRn-(-Op2)=Rn+Op2

Function:ComparesRnregistervaluewiththenegativeofOp2value.ThisisdonebyRn-(negativeofOp2)whichisRn-(-Op2)=Rn+Op2.TheRnandOp2operandsarenotaltered. In other words, the CMN adds the Op2 to Rn (Rn+Op2) and sets the flagsaccordingly.ThisisthesameasADDSinstructionexcepttheoperandsareunchangedandtheresultisdiscarded.SeeBxxinstructionforpossiblecasesofcomparison.

CMPCompare

Flags:Affected:V,N,Z,C.

Format:CMPRn,Op2;setsflagsasif“Rn-Op2”

Function:Comparestwooperands.Theoperandsarenotaltered.PerformscomparisonbysubtractingtheOp2operandfromtheRnandupdatesflagsasifSUBSwereperformed.AswecanseeinSUBS,theCMPperformtheoperationofRn+2’s

Page 362: ARM Assembly Language: Programming and Architecture

compofOp2andsetstheflagsaccordingtotheresult.SeeBxxinstructionforpossiblecasesofcomparison.

CPSIDChangeprocessorIDandDisableInterrupt

Flags:Unaffected

Format:CPSIDiflag;iflagisiinPRIMASKorfinFAULTMASK

Function:UsedfordisablingtheinterruptflagsinPRIMASKorFAULTMASKregisters.SeeARMCortexmanual.

CPSIEChangeProcessorStateandEnableInterrupt

Flags:Unaffected

Format:CPSIEiflag;iflagisiinPRIMASKorfinFAULTMASK

Function:UsedforenablingtheinterruptflagsinPRIMSKorFAULTMASKregisters.SeeARMCortexmanual.

DMBDataMemoryBarrier

Flags:Unaffected

Format:DMB

Function:ItmakessurethatalltheexplicitmemoryaccessespriortoDMBinstructionarecompletedbeforetheexplicitmemoryaccessesaftertheDMB.SeeARMCortexmanual.

DSBDataSynchronizationBarrier

Flags:Unaffected

Format:DSB

Function:ItmakessurethatalltheexplicitmemoryaccessespriortoDSBinstructionarecompletedbeforetheDSBinstructionisexecuted.SeeARMCortexmanual.

EORExclusiveOR

Flags:Unaffected

Format:EORRd,Rn,Op2

Function:PerformslogicalEx-ORontheRnandOp2operands,bitbybit,storingtheresultintheRd.Thiswillnotupdatetheflags.UseEORSinstructiontoupdatestheflags.

Inputs Output

X Y XEORY

0 0 0

Page 363: ARM Assembly Language: Programming and Architecture

0 1 1

1 0 1

1 1 0

Example1:

MOVR0,#0xAA;R0=0xAA

EORR2,R0,#0xFF;now,R2=0x55

;AA10101010

;FF11111111

;—–––

;5501010101flagsunchanged

Example2:

LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA

LDRR1,=0x55555555;R1=0x55555555

EORR2,R1,R0;R2=0xFFFFFFFF

;AA10101010

;5501010101

;—–––

;FF11111111flagsunchanged

The“EORRd,Rx,Rx”canbeusedtoclearRd.

Example3:

MOVR1,#0x55

EORR2,R1,R1;R2=0

;5501010101

;5501010101

;—–––

;0000000000flagsunchanged

TocomplementthebitsofRn,EX-ORitwith0xFF.

Example4:

LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA

LDRR1,=0xFFFFFFFF;R1=0xFFFFFFFF

Page 364: ARM Assembly Language: Programming and Architecture

EORR2,R1,R0;R2=0x55555555

;AA10101010

;FF11111111

;—–––

;5501010101flagsunchanged

EORSExclusiveORandupdatetheflags

Flags:Affected:C,V,N,Z

Format:EORSRd,Rn,Op2

Function:PerformslogicalEx-ORontheRnandOp2operands,bitbybit,storingtheresultintheRd.Aftertheexecutiontheflagsareupdated.

Inputs Output

X Y XEORY

0 0 0

0 1 1

1 0 1

1 1 0

Example1:

MOVR0,#0xAA;R0=0xAA

EORSR2,R0,#0xFF;now,R2=0x55

;AA10101010

;FF11111111

;—–––

;5501010101N=0,C=0,Z=0,V=0

Example2:

LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA

LDRR1,=0x55555555;R1=0x55555555

EORSR2,R1,R0;R2=0xFFFFFFFF

;AA10101010

;5501010101

;—–––

Page 365: ARM Assembly Language: Programming and Architecture

;FF11111111N=1,C=0,Z=0,V=0

The“EORRd,Rx,Rx”canbeusedtoclearRd.

Example3:

MOVR1,#0x55

EORSR2,R1,R1;R2=0x0

;5501010101

;5501010101

;—–––

;0000000000N=0,C=0,Z=1,V=0

ISBInstructionSynchronizationBarrier

Flags:Unaffected.

Format:ISB

Function:ItflushesthepipelinetomakesuretheinstructionsexecutedrightaftertheISBinstructionarefetchedfreshfromthecacheormemory.

ITIf-ThenConditionBlock

Flags:Unaffected

Format:SeeARMmanual

Function:ItallowstheexecutionofuptofourinstructionaftertheITtobeconditional.

LDCLoadCoprocessor

SeetheARMManual

LDMLoadMultipleregisters

Flags:Unaffected.

Format:LDMRn,{Rx,Ry,..}

Function: Loadsintoregistersfromconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thedestinationregistersseparatedbycommaandplacedinbraces.IntheARMCortex,thestackisdescendingmeaningthatasinformationispushedontostackthestackpointerisdecremented.ThisIA(IncrementtheaddressaftereachAccess)isthedefaultforloading(Poping).ThisinstructioniswidelyusedforPoping(loading)multiplewordsfromdescendingstackintoCPUregisters.

Example:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(46)

Page 366: ARM Assembly Language: Programming and Architecture

;12001=(10)

;12002=(38)

;12003=(82)

;12004=(56)

;12005=(50)

;12006=(58)

;12007=(15)

;12008=(63)

;12009=(60)

;1200A=(68)

;1200B=(39)

;1200C=(79)

;1200D=(70)

;1200E=(75)

;1200F=(92)

LDRR7,=0x12000

LDMR7,{R0,R2,R4}

;now,R0=0x82381046,R2=0x15585056,…

;thecontentsofmemorylocations0x12000-0x12003are

;movedtoregisterR0,andthecontentsofmemory

;locations0x12004-0x12007aremovedtoregister

;R2,andsoon.ThereforewehaveR0=0x82381046,

;R2=0x15585056,andR4=0x39686063.

LDMDBLoadMultipleregistersandDecrementBeforeeachaccess

Flags:Unaffected.

Format:LDMDBRn,{Rx,Ry,…}

Function:ThisisthesameasLDMEA(loadmultipleregistersfromEmptyAscending)usedforcasesinwhichthestackisascending.SeeLDMEAinstruction.

LDMEALoadMultipleregistersfromEmptyAscending

Flags:Unaffected.

Page 367: ARM Assembly Language: Programming and Architecture

Format:LDMEARn,{Rx,Ry,…}

Function:Loadsintoregistersfromconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thedestinationregistersseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultforstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.TheIA(IncrementtheaddressaftereachAccess)isthedefault.IfwechangethedefaultofdescendingstacktoascendingstackthenwehavetousetheEA(EmptyAscending).Theascendingstackmeansasinformationarepushedontostackthestackpointerisincremented.TheLDMEAisusedforPoping(loading)multiplewordsfromascendingstackintoCPUregisters.

LDMFDLoadMultipleregistersFullDescending

Flags:Unaffected.

Format:LDMFDRn,{Rx,Ry,..}

Function:ThisisthesameasLDMandLDMIA.

LDMIALoadMultipleregistersandIncrementaftereachAccess

Flags:Unaffected.

Format:LDMRn,{Rx,Ry,..}

Function: This is the same as the LDM instructions. In theARMCortex, the stack isdescending meaning that as information are pushed onto stack the stack pointer isdecremented.ThisIA(IncrementtheaddressaftereachAccess)isthedefault.WeusethisforPoping(loading)multiplewordsfromdescendingstackintoCPUregisters.

LDRLoadRegister

Flags:Unaffected.

Format:LDRRd,[Rx];loadintoRdawordfrommemory

;locationpointedtobeRx

Function: Loadsintodestinationregisterthecontentsoffourmemorylocations.The [Rx]points to addressofmemory location.This iswidelyused to load32-bit datafrommemoryintoRdregisteroftheARMsinceinthe“MOVRd,#immediate_value”theimmediatevaluecannotbelargerthan0xFF.

Example:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

Page 368: ARM Assembly Language: Programming and Architecture

LDRR1,[R0]

;now,R1=82381046.

LDRRx,=ValueLoadRegisterwith32-bitvalue

Flags:Unaffected.

Format:LDRRd,=32_bit_value;loadRdwith32-bitvalue

Function:Loadsintodestinationregistera32-bitimmediatevalue.Thisiswidelyused to load 32-bit immediate value into Rd register of the ARM since in the “MOVRd,#immediate_value”theimmediatevaluecannotbelargerthan0xFF.

Example:

LDRR0,=0x1200000;R0=0x1200000

LDRR1,=0x2FFFF;R1=0x2FFFF

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF

LDRR1,=200000000;R1=200000000

LDRBLoadRegisterByte

Flags:Unaffected.

Format:LDRBRd,[Rx];loadintoRdabytefrommemory

;locationpointedtobeRx

Function:LoadsintodestinationregisterthecontentsofasinglememorylocationindicatedbyRx.

Example:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0]

;now,R0=00000046

LDRBTLoadRegisterBytewithTranslation

Flags:Unaffected.

Format:LDRBTRd,[Rx]

Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRx

Page 369: ARM Assembly Language: Programming and Architecture

andzero-extendsthebyteto32-bitword.Thatmeansazeroiscopiedtoalltheupper24bitsoftheRdregister.Usedforunprivilegedmemoryaccess.

Example:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBTR1,[R0];now,R1=00000046

LDREX,LDREXB,LDREXHLoadRegisterExclusive

Function:TheyareusedwithSTREX,STREXB,andSTREXHinstructionstoperformCPUsynchronizationoperation.SeeARMCortexmanual.

LDRHLoadRegisterHalfword

Flags:Unaffected.

Format:LDRHRd,[Rx];loadintoRda2-bytefrommemory

;locationpointedtobeRx

Function:Loadsintodestinationregisterthecontentsofthetwoconsecutivememorylocations(halfword)indicatedbyRx.

Example:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRHR1,[R0]

;now,R0=00001046

LDRSBLoadRegistersignedByte

Flags:Unaffected.

Format:LDRSBRd,[Rx]

Page 370: ARM Assembly Language: Programming and Architecture

Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRxandsign-extendsthebyteto32-bitword.Thatmeansthesign(D7)ofthebyteiscopiedtoalltheupper24bitsoftheRdregister.

Example1:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(85)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0];nowR1=FFFFFF85becauseMSBof85is1

Example2:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(15)

;12001=(20)

;12002=(3F)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0];now,R1=00000015becauseMSBof15is0

LDRSHLoadRegisterSignedHalfword

Flags:Unaffected.

Format:LDRSHRd,[Rx]

Function:LoadsintoRdregisterahalf-word(2-byte)frommemorylocationpointedtobyRxandsign-extendsitto32-bitword.Thatmeansthesign(D15)ofthe16-bitoperandiscopiedtoalltheupper16bitsoftheRdregister.

Example1:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(46)

;12001=(F3)

;12002=(38)

;12003=(82)

Page 371: ARM Assembly Language: Programming and Architecture

LDRR0,=0x12000

LDRBR1,[R0];now,R0=FFFFF346becauseMSBofF3is1

Example2:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(4F)

;12001=(23)

;12002=(18)

;12003=(B2)

LDRR0,=0x12000

LDRBR1,[R0];now,R1=0000234FbecauseMSBof23is0

LDRTLoadRegisterwithTranslation

Flags:Unaffected

Format:LDRTRd,[Rx]

Function:LoadsintoRdregisterabytefrommemorylocationpointedtobyRxandzero-extendsthebyteto32-bitword.Thatmeansazeroiscopiedtoalltheupper24bitsoftheRdregister.Usedforunprivilegedmemoryaccess.

Example:

;Assumethefollowingmemorylocationswiththecontents:

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

LDRR0,=0x12000

LDRBR1,[R0];now,R1=00000046

LSLLogicalShiftLeft

Flags:Unaffected.

Format:LSLRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedleft,theMSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLdoesnotupdatestheflags.

Page 372: ARM Assembly Language: Programming and Architecture

Example1:

LDRR2,=0x00000010

LSLR0,R2,#8;R0=R2isshiftedleft8times

;now,R0=0x00001000,flagsnotchanged

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0x000018000,flagsnotchanged

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

LSLR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0xFF180000,flagsnotchanged

Thelogicalshiftleftusedforunsignednumbershifting.LSLessentiallymultipliesRmbyapowerof2foreachbitshift.

LSLSLogicalShiftLeft(updatetheflags)

Flags:Affected.

Format:LSLSRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedleft,theMSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSLSupdatestheflags.

Example1:

LDRR2,=0x00000010

LSLSR0,R2,#8;R0=R2isshiftedleft8times

;now,R0=0x00001000,C=0,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0x000018000,C=0,N=0,Z=0

Example3:

Page 373: ARM Assembly Language: Programming and Architecture

LDRR0,=0x000FFF18

MOVR1,#16

LSLSR2,R0,R1;R2=R0isshiftedleftR1numberoftimes

;now,R2=0xFF180000,C=1,Z=0,N=0

Thelogicalshiftleftusedforunsignednumbershifting.LSLSessentiallymultipliesRmbyapowerof2foreachbitshift.

LSRLogicalShiftRight

Flags:Unaffected.

Format:LSRRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBisremovedandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRdoesnotupdatetheflags.

Example1:

LDRR2,=0x00001000

LSRR0,R2,#8;R0=R2isshiftedright8times

;now,R0=0x00000010,C=0

Example2:

LDRR0,=0x000018000

MOVR1,#12

LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x00000018,C=0

Example3:

LDRR0,=0x7F180000

MOVR1,#16

LSRR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x00007F18,C=0

Thelogicalshiftrightusedforshiftingunsignednumbers.LSRessentiallydividesRmbyapowerof2foreachbitshift.

LSRSLogicalShiftRight(updatetheflags)

Flags:Affected.

Page 374: ARM Assembly Language: Programming and Architecture

Format:LSRSRd,Rm,Rn

Function:AseachbitofRmregisterisshiftedright,theLSBiscopiedtoCflagandtheemptybitsarefilledwithzeros.ThenumberofbitstobeshiftedleftisgivenbyRnandtheresultisplacedinRdregister.TheLSRSupdatestheflags.

Example1:

LDRR2,=0x00001FFF

LSRSR0,R2,#8;R0=R2isshiftedright8times

;now,R0=0x0000001F,C=1,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;now,R2=0x000000000,C=0,N=0,Z=1,

Example3:

LDRR0,=0x000FFF18

MOVR1,#16

LSRSR2,R0,R1;R2=R0isshiftedrightR1numberoftimes

;Now,R2=0x0000000F,C=1,Z=0,N=0

Thelogicalshiftrightusedforshiftingunsignednumbers.LSRSessentiallydividesRmbyapowerof2foreachbitshift.

MCRMovetoCoprocessorfromARMRegister

SeeARMManual.

MLAMultiplyAccumulate

Flags:Unaffected

Format:MLARd,Rs1,Rs2,Rs3;Rd=(Rs1×Rs2)+Rs3

Function:MultipliesanunsignedwordheldbyRs1byaunsignedwordinRs2andtheresultisaddedtoRs3andplacedinRd.

Example:

MOVR0,#0x20;R0=0x20

MOVR1,#0x50;R1=0x50

MOVR2,#0x10;R2=0x10

MLAR4,R0,R1,R2;nowR4=(0x20×0x50)+10=0xA10

Page 375: ARM Assembly Language: Programming and Architecture

MLSMultiplyandSubtract

Flags:Unaffected

Format:MLSRd,Rm,Rs,Rn;Rd=Rn-(Rs×Rm)

Function:MultipliesanunsignedwordheldbyRmbyaunsignedwordinRsandtheresultissubtractedfromRnandplacedinRd.

Example:

MOVR0,#0x20;R0=0x20

MOVR1,#0x50;R1=0x50

LDRR2,=0x1000;R2=0x1000

MLSR4,R0,R1,R2;nowR4=0x1000-(0x20×0x50)=0x600

MOVMove(ARM7)

Flags:Unaffected.

Format:MOVRd,#imm_value;Rd=imm_Value<0x200

Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFF(0–255).Aftertheexecutiontheflagsarenotupdated.TheMOVSinstructionupdatestheflags.

Example1:

MOVR0,#0x25;R0=0x25

MOVR1,#0x5F;R1=0x5F

ToloadtheARMregisterwithvaluelargerthan0xFFwemustusethe“LDRRd,=32_bit_data.”Forexample,wecanuseLDRR2,=0xFFFFFFFF.

Example2:

LDRR0,=0x2000000;R0=0x2000000

MOVMove(ARMCortex)

Flags:Unaffected.

Format:MOVRd,#imm_val;Rd=imm_val(imm_val<0x10000)

Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).

Example:

MOVR1,#0xF459;R1=0xF459

ToloadtheARMregisterwithvaluelargerthan0xFFFFwemustusethe“LDRRd,=32_bit_data.”ForexamplewecanuseLDRR2,=0xFFFFFFF.

Page 376: ARM Assembly Language: Programming and Architecture

MOVSMove(andupdateflags)

Flags:Affected:C,N,Z

Format:MOVRd,#immediate_value

Function:LoadtheRdregisterwithanimmediatevalueandupdatetheflags.

Example:

MOVSR0,#0x25;R0=0x25,N=0,Z=0,andC=0

MOVSR0,#0x0;R0=0x0,N=0,Z=1,andC=0

MOVTMoveTop

Flags:Unaffected.

Format:MOVTRd,#imm_value;imm_value<0x10000

Function:Loadstheupper16-bitofRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).Thelower16-bitoftheRdregisterremainsunchanged.

Example:

LDRR0,=0x25579934;R0=0x25579934

MOVTR0,#0xAAAA;R0=0xAAAA9934

MOVWMove16-bitconstant

Flags:Unaffected.

Format:MOVWRd,#imm_value;imm_value<0x10000

Function:LoadtheRdregisterwithanimmediatevalue.Theimmediatevaluecannotbelargerthan0xFFFF(0–65535).

Example:

MOVWR1,#0x5555;R1=0x5555

ToloadtheARMregisterwithvaluelargerthan0xFFFFwemustusethe“LDRRd,=32_bit_data.”ForexamplewecanuseLDRR2,=0xFFFFFFF.

MRCMovetoARMRegisterfromCoprocessor

SeeARMmanual

MRSMovetogeneralRegisterfromSpecialregister

Flags:Unaffected.

Format:MRSRd,special_reg;copyspecial_regtoRd

Function:Copiesthecontentsofaspecialfunctionregistertoageneral-purposeregister.ThisinstructionalongwiththeMSRiswidelyusedtomodifythespecialfunction

Page 377: ARM Assembly Language: Programming and Architecture

registerssuchasCONTROL,PRIMASK,andISPR.Thisistheonlywaywecanaccessthespecialfunctionregisters.

Example:

MRSR1,CONTROL;R1=CONTROL

ANDR1,#0x00;maskthelower8bits

MSRCONTROL,R1

MSRMovetoSpecialregisterfromgeneralRegister

Flags:Unaffected.

Format:MSRspecial_reg,Rn;copyspecial_regtoRn

Function:Copiesthecontentsofageneral-purposeregistertospecialfunctionregister.ThisinstructionalongwiththeMRSiswidelyusedtomodifythecontentsofspecialfunctionregisterssuchasCONTROL,PRIMASK,andISPR.Thisistheonlywaywecanaccessthespecialfunctionregisters.

Example:

MRSR1,CONTROL;R1=CONTROL

ANDR1,#0x00;maskthelower8bits

MSRCONTROL,R1;maskthelower8bitsofCONTROLreg.

MULUnsignedMultiplication

Flags:Affected:N,Z,Unaffected:C,V

Format:MULRd,Rn,Rm;Rd=Rn×Rm

Function:MultipliesawordinregisterRnbyawordinregisterRmandplacestheresultinRd.

Example1:

MOVR0,#100;R0=100

MOVR1,#200;R1=200

MULR3,R0,R1;R3=R0xR1=100x200=20000

Example2:

LDRR0,=10000;R0=10000

LDRR1,=20000;R1=20000

MULR3,R0,R1;R3=R0xR1=10000x20000=200000000

MVNMoveNegative

Flags:Unaffected.

Page 378: ARM Assembly Language: Programming and Architecture

Format:MVNRd,Op2;Rd=1’scomp.ofOp2

Function:PlacesinRdthenegation(the1’scomplement)ofOp2.EachbitofOp2isinverted(logicalNOT)andplacedinRdwhileflagsremainunchanged.

Example1:

MOVR0,#0xAA;R0=0xAA

MVNR2,R0;now,R2=0xFFFFFF55

Example2:

LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA

MVNR1,R0;R1=0x55555555

Example3:

MVNR0,#0x0F;R0=0xFFFFFFF0

Example4:

MVNR2,#0x0;R0=0xFFFFFFFFwidelyusedtoloadRxwithall1s

MVNSMoveNegativeandupdatetheflags

Flags:NandZareaffected

Format:MVNSRd,Op2;Rd=1’scomp.ofOp2andupdateflags

Function:PlacesinRdthenegation(the1’scomplement)ofOp2.EachbitofOp2isinverted(logicalNOT)andplacedinRdandNandZflagsareupdated.

Example1:

MOVR0,#0xAA;R0=0xAA

MVNSR2,R0;now,R2=0xFFFFFF55,N=1,Z=0

Example2:

LDRR0,=0xAAAAAAAA;R0=0xAAAAAAAA

MVNSR1,R0;R1=0x55555555,Z=0andN=0

Example3:

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF

MVNSR1,R0;R1=0x00000000,N=0andZ=1

Example4:

MVNR2,0x0;R0=0xFFFFFFFF,N=1,Z=0

NOPNoOperation

Flags:Unaffected.

Page 379: ARM Assembly Language: Programming and Architecture

Format:NOP

Function:Performsnooperation.Sometimesusedfortimingdelaystowasteclockcycles.UpdatesPC(programcounter)topointtonextinstructionfollowingNOP.InsomeARMCPUs,thepipelineremovestheNOPbeforeitreachestheexecutionstage.

ORNLogicalORNot

Flags:Unaffected.

Format:ORNRd,Rn,Op2;Rd=RnORedwith1’scompofOp2

Function:PerformstheORoperationonthebitsofRnwiththecomplementofthebitsinOp2.TheORNwillnotupdatetheflags.ToupdatetheflagswemustuseORNS.

Inputs Output

A B AOR(NOTB)

0 0 1

0 1 0

1 0 1

1 1 1

Example1:

LDRR1,=0xFFFFFF00;R1=0xFFFFFF00

LDRR2,=0x99999999;R2=0x9999999

ORNR3,R2,R1;nowR3=0x999999FF

Example2:

MOVR1,#0;R1=0

LDRR0=0xFFFFFFFF;R0=0xFFFFFFFF

ORNR2,R1,R0;now,R2=0x0

ORNSORNotandupdateflags

Flags:NandZareaffected

Format:ORNSRd,Rn,Op2;Rd=RnORedwith1’scompofOp2

Function:PerformstheORoperationonthebitsofRnwiththecomplementofthebitsinOp2andupdatestheflags.TheresultisplacedinRd.

Inputs Output

AOR(NOT

Page 380: ARM Assembly Language: Programming and Architecture

A B B)

0 0 1

0 1 0

1 0 1

1 1 1

Example1:

LDRR1,=0xFFFFFF00;R1=0xFFFFFF00

LDRR2,=0x99999999;R2=0x9999999

ORNSR3,R2,R1;now,R3=0x999999FF,N=1,Z=0

Example2:

LDRR0,=0xFFFFFFFF;R0=0xFFFFFFFF

LDRR1,=0x01234567;R1=0x01234567

ORNSR2,R1,R0;now,R2=0x01234567,N=0,Z=0

Example3:

MOVR1,#0;R1=0

LDRR0=0xFFFFFFFF;R0=0xFFFFFFFF

ORNSR2,R1,R0;now,R2=0x0,N=0,Z=1

ORRLogicalOR

Flags:Unaffected

Format:ORRRd,Rn,Op2;Rd=RnORedOp2

Function:PerformslogicalORonthebitsofRnandOp2,andplacestheresultinRd.Oftenusedtoturnabiton.ORRwillnotupdatetheflags.

Example1:

MOVR0,#0xAA;R0=0xAA

ORRR2,R0,#0x55;now,R2=0xFF

Example2:

LDRR0,=0x00010203;R0=00010203

LDRR1,=0x30303030

ORRR2,R0,R1;R2=0x30313233

Page 381: ARM Assembly Language: Programming and Architecture

Example3:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0xAAAAAAAA;R0=0xAAAAAAAA

ORRR2,R1,R0;R1=0xFFFFFFFF

ORRSLogicalORandupdatetheflags

Flags:NandZareaffected.

Format:ORRSRd,Rn,Op2;Rd=RnORedOp2,updatetheflags

Function:PerformslogicalORonthebitsofRnandOp2,andplacestheresultinRd.Oftenusedtoturnabiton.ORRSupdatestheflags.

Inputs Output

X Y XORY

0 0 0

0 1 1

1 0 1

1 1 1

Example1:

MOVR0,#0xAA;R0=0xAA

ORRSR2,R0,#0x55;now,R2=0xFF,N=0,Z=0

Example2:

LDRR0,=0x00010203;R0=00010203

LDRR1,=0x30303030

ORRSR2,R0,R1;R2=0x30313233N=0,Z=0

Example3:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0xAAAAAAAA;R0=0xAAAAAAAA

ORRSR2,R1,R0;R1=0xFFFFFFFFN=1,Z=0

POPPOPregisterfromStack

Flags:Unaffected.

Format:POP{reg_list};reg_reg=wordsofftopofstack

Function:Copiesthewordspointedtobythestackpointertotheregistersindicated

Page 382: ARM Assembly Language: Programming and Architecture

by the reg_list and increments the SP by 4, 8, 12, 16,… depending on the number ofregistersinthereg_list.

Example:

POP{R1};POPthetopwordofstacktoR1

POP{R1,R4,R7};POPthetop3wordsofstacktoR1,R4,R7

POP{R2-R6};POPthetop5wordsofstacktoR2-R6

POP{R0,R5};POPthetop2wordsofstacktoR0andR5

POP{R0-R7};POPthetop8wordsofstacktoR0-R7

ThePOPinstructionissynonymsforLDMIA.

PUSHPUSHregisterontostack

Flags:Unaffected.

Format:PUSH{reg_list};PUSHreg_listontostack

Function:Copiesthecontentsofregistersstatedinreg_listontothestackanddecrementsSPby4,8,12,16,…dependingonthenumberofregistersinreg_list.

Example:

PUSH{R1};PUSHtheR1ontotopofstack

PUSH{R1,R4,R7};PUSHR1,R4,R7ontotopofstack

PUSH{R2-R6};PUSHtheR2,R3,R4,R5,R6ontotopofstack

PUSH{R0,R5};PUSHtheR0andR5ontotopofstack

PUSH{R0-R7};PUSHtheR0throughR7ontotopofstack

ThePUSHinstructionissynonymsforSTMDB.

RBITReverseBits

Flags:Unaffected.

Format:RBITRd,Rn;ReversethebitorderofRnandplaceinRd

Function:Reversesthebitpositionorderofthe32-bitvalueinRnregisterandplacetheresultinRd.

Example:

MOVR1,#0x5F

RBITR2,R1;now,R2=0xF5000000

REVReversebyteorderinaword

Flags:Unaffected

Format:REVRd,Rn;ReversethebyteofRnandplaceitinRd

Page 383: ARM Assembly Language: Programming and Architecture

Function:Reversesthebytepositionorderofthe32-bitvalueinRnregisterandplacestheresultinRd.Thiscanbeusedtoconvertfromlittleendiantobigendianorfrombigendiantolittleendian.

Example:

LDRR1,=0x12345678

REVR2,R1;now,R2=0x78564312

RV16Reversebyteorderin16-bit

Flags:Unaffected

Format:REV16Rd,Rn;ReversethebitsifRnandplaceitinRd

Function:Reversesthe16-bitpositionorderofthe32-bitvalueinRnregisterandplacestheresultinRd.Thiscanbeusedtoconvert16-bitlittleendiantobigendianorfrom16-bitbigendiantolittleendian.

Example:

LDRR1,=0x559922FF

RV16R2,R1;now,R2=0x22FF5599

REVSHReversebyteorderinbottomhalfwordandsignextend

Flags:Unaffected

Format:REVSHRd,Rn;Rd=ReversethebyteandsignextendRn

Function:Reversesthe16-bitpositionorderofRnregisterandaftersignextendingto32-bititisplacedinRd.Thiscanbeusedtoconvertasigned16-bitlittleendianto32-bitsignedbigendianorfromsigned16-bitbigendianto32-bitsignedlittleendian.

Example:

LDRR1,=0x559922FF

REVSHR2,R1;now,R2=0x22FF5599

RORRotateRight

Flags:Unaffected.

Format:RORRd,Rm,Rn;Rd=rotateRmrightRnbitpositions

Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andenterfromleftend(MSB).ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORdoesnotupdatetheflags.

Page 384: ARM Assembly Language: Programming and Architecture

Example1:

LDRR2,=0x00000010

RORR0,R2,#8;R0=R2isrotatedright8times

;now,R0=0x10000000,C=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;now,R2=0x01800000,C=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

RORR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;Now,R2=0xFF180000,C=0

RORSRotateRight(updatetheflags)

Flags:C,N,Zareaffected.

Format:RORSRd,Rm,Rn;Rd=rotateRmrightRnbitpositions

Function:AseachbitofRmregistershiftsfromlefttoright,theyexitfromtherightend(LSB)andentersfromleftend(MSB).InadditionaseachbitexitstheLSB,acopyisofitisgiventoCflag.ThenumberofbitstoberotatedrightisgivenbyRnandtheresultisplacedinRdregister.TheRORSupdatestheflags.

Example1:

LDRR2,=0x00000010

RORSR0,R2,#8;R0=R2isrotatedright8times

;now,R0=0x01000000,C=0,N=0,Z=0

Example2:

LDRR0,=0x00000018

MOVR1,#12

RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

Page 385: ARM Assembly Language: Programming and Architecture

;now,R2=0x01800000,C=0,N=0,Z=0

Example3:

LDRR0,=0x0000FF18

MOVR1,#16

RORSR2,R0,R1;R2=R0isrotatedrightR1numberoftimes

;now,R2=0xFF180000,C=1,N=0,Z=0

RRXRotateRightwithextend

Flags:Unaffected.

Format:RRXRd,Rm;Rd=rotateRmright1bitposition

Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXdoesnotupdatetheflags.

Example:

LDRR2,=0x00000002

RRXR0,R2;R0=R2isshiftedrightonebit

;now,R0=0x00000001

RRXSRotateRightwithextend(updatetheflags)

Flags:Affected.

Format:RRXSRd,Rm;Rd=rotateRmright1bitposition

Function:EachbitofRmregisterisshiftedfromlefttorightonebit.TheRRXSupdatestheflags.

Example1:

LDRR2,=0x00000002

RRXSR0,R2;R0=R2isshiftedrightonebit

;now,R0=0x00000001

RSBReverseSubtract

Flags:Unaffected

Page 386: ARM Assembly Language: Programming and Architecture

Format:RSBRd,Rn,Op2;Rd=Op2-Rn

Function:SubtractstheRnfromtheOp2andputstheresultintheRd.TheRSBhasnoeffectonflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:

1.Takesthe2’scomplementoftheRn

2.AddsthistotheOp2

3.PlacestheresultinRd

TheOp2andRnoperandsremainunchangedbythisinstruction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

RSBR2,R0,R1;R2=R1-R0

;For“RSBR2,R0,R1”wehave:

;R2=R1-R0=0x99999999-0x55555555=

;R2=0x99999999+2’scompof0x55555555

;R2=0x99999999+0xAAAAAAAB=0x44444444

;0x99999999

;-0x55555555

;–––––

;0x44444444

RSBSReverseSubtractandupdatetheflags

Flags:Affected:V,N,Z,C.

Format:RSBSRd,Rn,Op2;Rd=Op2-Rn

Function:SubtractstheRnfromtheOp2andputstheresultintheRd.TheRSBSupdatestheflags.

ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:

1.Takesthe2’scomplementoftheRn

2.AddsthistotheOp2

3.PlacestheresultintheRd

TheRnandOp2operandsremainunchangedbythisinstruction.

Example:

LDRR0,=0x55555555;R0=0x55555555

Page 387: ARM Assembly Language: Programming and Architecture

LDRR1,=0x99999999;R1=0x99999999

RSBR2,R0,R1;R2=R1-R0

;For“RSBR2,R0,R1”wehave:

;R2=R1-R0=0x99999999-0x55555555=

;R2=0x99999999+2’scompof0x55555555

;R2=0x99999999+0xAAAAAAAB=0x44444444

;0x99999999

;-0x55555555

;–––––

;0x44444444C=0,N=0,Z=0,V=0

SBCSubtractwithCarry(Borrow)

Flags:Unaffected

Format:SBCRd,Rn,Op2;Rd=Rn–Op2–(1–C)

Function:SubtractstheOp2operandfromtheRn,placingtheresultinRd.IfC=0,itsubtracts1fromtheresult;otherwise,itoperateslikeSUB.TheSBChasnoeffectonflags.Thisisusedwidelyformultiword(64-bit)subtraction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

SUBSR2,R0,R1;R2=R0-R1

MOVR3,#0x09;R3=0x09

SBCR4,R3,#03;R4=R3-0x3

;ForSUBSwehave:

;R2=R1-R0=0x55555555-0x99999999=

;R2=0x55555555+2’scompof0x99999999

;R2=0x55555555+0x66666667=0xBBBBBBBCC=0

;ForSBCwehave:

;R4=R3-0x3=0x09-0x3-(1-C)=9-3-1

;R4=0x9+2’comp.of-4=0x9+0xFFFFFFFC=0x05

;0x0000000955555555

;-0x0000000399999999

SBCSSubtractwithCarry(Borrow)andupdatetheflags

Page 388: ARM Assembly Language: Programming and Architecture

Flags:C,Z,N,V

Format:SBCSRd,Rn,Op2;Rd=Rn–Op2–(1–C)

Function:SubtractstheOp2operandfromtheRn,placingtheresultinRd.IfC=0,itsubtracts1fromtheresult;otherwise,itexecuteslikeSUB.TheSBCSupdatestheflags.Thisisusedwidelyformultiword(64-bit)subtraction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

SUBSR2,R0,R1;R2=R0-R1

MOVR3,#0x09;R3=0x09

SBCSR4,R3,#03;R4=R3-0x3

;ForSUBSwehave:

;R2=R1-R0=0x55555555-0x99999999=

;R2=0x55555555+2’scompof0x99999999

;R2=0x55555555+0x66666667=0xBBBBBBBCC=0

;ForSBCSwehave:

;R4=R3-0x3=0x09-0x3-(1-C)=9-3-1

;R4=0x9+2’comp.of-4=0x9+0xFFFFFFFC=0x05

;0x0000000955555555

;-0x0000000399999999

;––––––

;0x10000005BBBBBBBCC=0,Z=0,V=0,N=0

SBFXSignBitFieldextract

Flags:Unaffected

Format:SBFXRd,Rn,#LSB,#Width

Function:ExtractsthebitfieldfromtheRnregisterandthenaftersignextendingitisplacedinRd.The#LSBindicateswhichbitand#Widthindicateshowmanybits.

Example1:

LDRR0,=0x00000543;R0=0x00000543

SBFXR2,R0,#8,#4;now,R2=0x00000005

Example2:

LDRR0,=0x00000C43;R0=0x00000C43

Page 389: ARM Assembly Language: Programming and Architecture

SBFXR2,R0,#4,#8;now,R2=0xFFFFFFC4

SDIVSignedDivide

Flags:Unaffected

Format:SDIVRd,Rn,Rm;Rd=Rn/Rm

Function:DividesasignedintegerwordinRnbyanothersignedintegerwordinRm.ThequotientresultisplacedinRd.IfvalueinRnregisterisnotdivisiblebythevalueinRmregister,theresultisroundedtozeroandplacedinRd.Dividebyzerocausesinterrupttype3.

Example:

LDRR0,=-20000;R0=-20000

LDRR1,=-1000;R1=-1000

SDIVR2,R0,R1;now,R2=-2000/-1000=2

SEVSendEvent

Flags:Affected.

Format:SEV

Function:Sendssignaltoalltheprocessorsinthemultiprocessorssystem.SeetheARMCortexmanual.

SMLALSignedMultiplyAccumulateLong

Flags:Unaffected

Format:SMLALRdlo,Rdhi,Rn,Rm;Rdhi:Rdlo=(Rm×Rn)+(Rdhi:Rdlo)

Function:MultipliessignedwordsinRnandRmregister,addsthe64-bitresulttoRdhi:Rdloregister,andsavesthefinalresultinRdhi:Rdlo.TheRdlo(low)andRdhi(high)arethelowerwordandhigherwordofa64-bitvalue.

Example1:

LDRR0,=0

LDRR1,=0x23

LDRR2,=-5000

LDRR3,=-4000

SMLALR0,R1,R2,R3;now,R3:R2=(R3:R2)+(R1×R0)

;=0x2300000000+(-5000×-4000)

;=0x2300000000+20000000

;=0x23000000+0x1312D00=0x2301312D00

;=>R0=0x1312D00andR1=0x23

Page 390: ARM Assembly Language: Programming and Architecture

SMULLSignedMultiplyLong

Flags:Unaffected

Format:SMULLRdlo,Rdhi,Rn,Rm;Rdhi:Rdlo=Rm×Rn

Function:MultipliessignedwordsinRnandRmregister,andsavestheresultinRdhi:Rdlo.TheRdl(low)andRdh(high)arethelowerwordandhigherwordofa64-bitvalue.

Example:

LDRR0,=-20000;R0=-20000(signed2’scomp)

LDRR1,=-1000000;R0=-100000(signed2’scomp)

SMLALR2,R3,R0,R1;now,R3:R2=R1×R0=-20000×-1000000=

;20000000000=0x4A817C800=>R3=0x4and

;R2=0xA817C800

SSATSignSaturate

Flags:Unaffected.

Format:SSATRd,#n,Rm,shift#

Function:Usedforsaturationoperation.SeeARMCortexmanual.

STMStoreMultiple

Flags:Unaffected.

Format:STMRn,{Rx,Ry,…}

Function:StoresregistersRx,Ry,…intoconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thesourceregistersareseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.ThisIA(IncrementtheaddressAftereachaccess)isthedefault.ThisinstructioniswidelyusedforPushing(storing)multipleregistersintoascendingstack.

Example:

LDRR7,=0x12000

LDRR0,=0x82381046;R0=0x82381046

LDRR2,=0x15585056;R2=0x15585056

LDRR4,=0x39686063;R4,=0x39686063

STMR7,{R0,R2,R4};now,R2=0x15585056,..

;ThecontentsofregistersR0,R2,andR4arestoredinto

;consecutivememorylocationsstartingatanaddressgivenbyR7.

Page 391: ARM Assembly Language: Programming and Architecture

;TheR0contentsarestoredintomemorylocations0x12000-0x12003,

;theR2contentsarestoredintomemorylocations0x12004through

;0x12007,andsoon.Thisisshownbelow.

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

;12004=(56)

;12005=(50)

;12006=(58)

;12007=(15)

;12008=(63)

;12009=(60)

;1200A=(68)

;1200B=(39)

STMDBStoreMultipleregisterandDecrementBefore

Flags:Unaffected.

Format:STMDBRn,{Rx,Ry,…}

Function:StoresregistersRx,Ry,…intoconsecutivememorylocations.ThestartingaddressofmemorylocationisgivenbyRnregister.Thesourceregistersareseparatedbycommaandplacedinbraces.IntheARMCortex,thedefaultstackisdescendingmeaningthatasinformationarepushedontostackthestackpointerisdecremented.SinceIA(IncrementtheaddressAftereachaccess)isthedefaultweneedtouseDB(DecrementtheaddressBeforeeachaccess)istooverwritethedefault.ThisinstructioniswidelyusedforPushing(storing)multipleregistersintoDescendingstack.

Example:

LDRR7,=0x12000

LDRR0,=0x39686063;R0=0x39686063

LDRR2,=0x15585056;R2=0x15585056

LDRR4,=0x82381046;R4,=0x82381046

STMDBR7,{R0,R2,R4}

;ThecontentsofregistersR0,R2,andR4arestoredinto

;consecutivememorylocationsstartingatanaddressgivenbyR7.

Page 392: ARM Assembly Language: Programming and Architecture

;TheR0contentsarestoredintomemorylocations0x11FFF-0x11FFC,

;theR2contentsarestoredintomemorylocations0x11FFB

;through0x11FF8,andsoon.Thisisshownbelow.

;11FF4=(46)

;11FF5=(10)

;11FF6=(38)

;11FF7=(82)

;11FF8=(56)

;11FF9=(50)

;11FFA=(58)

;11FFB=(15)

;11FFC=(63)

;11FFD=(60)

;11FFE=(68)

;11FFF=(39)

STMEAStoreMultipleregisterEmptyAscending

Flags:Unaffected.

Format:STMEARn,{Rx,Ry,…}

Function:ThisissameasSTM.

STMIAStoreMultipleregisterEmptyAscending

Flags:Unaffected.

Format:STMIARn,{Rx,Ry,…}

Function:ThisissameasSTM.

STMFDStoreMultipleregisterFullDescending

Flags:Unaffected.

Format:STMFDRn,{Rx,Ry,…}

Function:ThisisanothernameforSTMDB.TheFDisforpushingontoFullDescendingstacks

STRStoreRegister

Flags:Unaffected.

Format:STRRd,[Rx];StoreRdintomemorylocationpointedto

Page 393: ARM Assembly Language: Programming and Architecture

beRx

Function:StoresRdregisterintofourconsecutivememorylocations.The[Rx]pointstostartingaddressofmemorylocation.Thisiswidelyusedtostore32-bitregisterintomemorylocations.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRR1,[R0];now,

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

STRBStoreRegisterByte

Flags:Unaffected.

Format:STRBRd,[Rn]

Function:StoresthelowestbyteoftheRdregisterintoasinglememorylocationindicatedbyRn.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRBR1,[R0];now,12000=(46)

STRBTStoreRegisterBytewithTranslation

Flags:Unaffected.

Format:STRBT

Function:StoresthelowestbyteoftheRdregisterintoasinglememorylocationindicatedbyRn.ThisisthesameasSTRBbutisusedforunprivilegedmemoryaccess.SeeARMCortexmanual.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRBTR1,[R0]

Page 394: ARM Assembly Language: Programming and Architecture

;now,12000=(46)

STRDStoreRegisterDouble(twowords)

Flags:Unaffected.

Format:STRDRd,[Rn]

Function:StorestworegistersofRdandRd+1into8consecutivememorylocationsindicatedbyRn.RdcanbeR0,R2,R4,R6,R8,R10,orR12.

Example:

LDRR2,=0x12000

LDRR0,=0x82381046;R0=0x82381046

LDRR1,=0x15585056;R1=0x15585056

STRDR0,R1,[R2];storeR0andR1intomemorylocationsstarting

;atanaddressgivenbyR2.Now,wehave:

;12000=(46)

;12001=(10)

;12002=(38)

;12003=(82)

;12004=(56)

;12005=(50)

;12006=(58)

;12007=(15)

STREX,STREXB,STREXHStoreRegisterExclusive

Function:TheyareusedwithLDREX,LDREXB,andLDREXHinstructionstoperformCPUsynchronizationoperation.SeeARMCortexmanual.

STRHStoreRegisterHalfword

Flags:Unaffected.

Format:STRHRd,[Rn]

Function:Storesthelower2bytesoftheRdregisterintotwoconsecutivememorylocationsindicatedbyRn.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

Page 395: ARM Assembly Language: Programming and Architecture

STRBR1,[R0];now,12000=(46),and12001=(10)

STRTStoreRegister

Flags:Unaffected

Format:STRTRx,[Rn]

Function:StoresRxregisterintomemorylocationpointedtobyRx.ThisisthesameasSTRbutisusedforunprivilegedmemoryaccess.SeeARMCortexmanual.

Example:

LDRR1,=0x82381046;R1=0x82381046

LDRR0,=0x12000;R0=0x12000

STRTR1,[R0]

;now,12000=(0x82381046)

SUBSubtract

Flags:Unaffected

Format:SUBRd,Rn,Op2;Rd=Rn–Op2

Function:SubtractstheOp2fromtheRnandputstheresultintheRd.Hasnoeffectonflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:

1.Takesthe2’scomplementoftheOp2

2.AddsthistotheRn

3.PlacetheresultintheRd

TheRdandOp2operandsremainunchangedbythisinstruction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

SUBR2,R1,R0;R2=R1-R0

;For“SUBR2,R1,R0”wehave:

;R2=R1-R0=0x99999999-0x55555555=

;R2=0x99999999+2’scompof0x55555555

;R2=0x99999999+0xAAAAAAAB=0x44444444

;0x99999999

;-0x55555555

;–––––

Page 396: ARM Assembly Language: Programming and Architecture

;0x44444444

SUBSSubtract

Flags:Affected:V,N,Z,C.

Format:SUBRd,Rn,Op2;Rd=Rn–Op2

Function:SubtractstheOp2fromtheRnandputstheresultintheRd.TheSUBSupdatestheflags.ThestepsforsubtractionperformedbytheinternalhardwareoftheCPUareasfollows:

1.Takesthe2’scomplementoftheOp2

2.AddsthistotheRn

3.PlacetheresultintheRd

TheRdandOp2operandsremainunchangedbythisinstruction.

Example:

LDRR0,=0x55555555;R0=0x55555555

LDRR1,=0x99999999;R1=0x99999999

SUBSR2,R1,R0;R2=R1-R0

;For“SUBSR2,R1,R0”wehave:

;R2=R1-R0=0x99999999-0x55555555=

;R2=0x99999999+2’scomp.of0x55555555

;R2=0x99999999+0xAAAAAAAB=0x44444444

;0x99999999

;-0x55555555

;––––—

;0x44444444C=1,Z=0,N=0,V=0

SVCsupervisorCall(SoftwareInterrupt)

Flags:Unaffected.

Format:SVC#imm_value

Function:Itisusedbyapplicationsoftwaretogetservicesfromoperatingsystems(OS).ThisisliketheSWI(softwareinterrupt)instructioninARM7.

SXTBSignExtendbyte

Flags:Unaffected.

Format:SXTBRd,Rm

Page 397: ARM Assembly Language: Programming and Architecture

Function:ConvertsasignedbyteinRmintoasignedwordbycopyingthesignbit(D7)ofRmintoallthebitsofRd.UsedwidelytoconvertasignedbyteinRmtoasignedwordtoavoidtheoverflowprobleminsignednumberarithmetic.

Example:

MOVR1,#0xFB;R1=0xFBwhichis2’scomplementof-5

SXTBR0,R1;now,R0=0xFFFFFFFB

;R1=00000000000000000000000011111011

;nowR0=0xFFFFFFFB

;R0=11111111111111111111111111111011

SXTHSignExtendHalfword

Flags:Unaffected.

Format:SXTHRd,Rm

Function:ConvertsasignedhalfwordinRmintoasignedwordbycopyingthesignbit(D15)ofRmintoallthebitsofRd.Usedwidelytoconvertasignedhalfword(16-bit)inRmtoasignedwordtoavoidtheoverflowprobleminsignednumberarithmetic.

Example:

;assumeR1=0xFFFBwhichis2’scomplementof-5

SXTHR0,R1;now,R0=0xFFFFFFFB

;R1=00000000000000001111111111111011

;now,R0=0xFFFFFFFB

;R0=11111111111111111111111111111011

TBBTableBranchByte

Flags:Unaffected.

Format:TBB[Rn,Rm]

Function:BranchesforwardusingtableofsinglebyteoffsetusingPC-relativeaddressingmode.RnhasstartingaddressofthetableandRmisanindexintothetable.SeeARMCortexM3manual.

TBHTableBranchhalfword

Flags:Unaffected.

Format:TBH[Rn,Rm,LSL#1]

Function:BranchesforwardusingtableofhalfwordoffsetusingPC-relativeaddressingmode.RnhasstartingaddressofthetableandRmisanindexintothetable.The“LSL#1”shiftslefttheaddressoncetomakeithalfwordalignedaddress.SeeARMCortexM3

Page 398: ARM Assembly Language: Programming and Architecture

manual.

TEQTestEquivalence

Flags:Affected:NandZ

Format:TEQRn,Op2;performsRnEx-OROp2

Function:PerformsabitwiselogicalEx-ORonRnandOp2,settingflagsbutleavingthecontentsofbothRnandOp2unchanged.WhiletheEORSinstructionchangesthecontentsofthedestinationandtheflagbits,theTEQinstructionchangesonlytheflagbits.Thisiswidelyusedtoseeiftworegistersareequal.

Example1:

TEQR1,R2;checktoseeifR1=R2.IfsoZ=1.R1andR2

;remainunchanged

Example2:

TEQR2,#0x01;checktoseeifD0ofR2is1,ifsoZ=1.R2

;remainsunchanged

Example3:

TEQR1,#0xFF;checktoseeifD7_D0ofR1are1s,

;ifsoZ=1.R1remainsunchanged

TSTTest

Flags:Affected:NandZ

Format:TSTRn,Op2;performsRnANDOp2

Function:PerformsabitwiselogicalANDonRnandOp2,settingflagsbutleavingthecontentsofbothRnandOp2unchanged.WhiletheANDSinstructionchangesthecontentsofthedestinationandtheflagbits,theTSTinstructionchangesonlytheflagbits.TotestwhetherabitofRnis0or1,usetheTSTinstructionwithanOp2constantthathasthatbitsetto1andallotherbitsclearedto0.

Example1:

TSTR1,#0x01;checktoseeifD0ofR1iszero,ifsoZ=1.

;R1remainunchanged

Example2:

TSTR1,#0xFF;checktoseeifanybitsofR1iszero,ifso

;Z=1.R1remainunchanged

UBFXUnsignedBitfiledextract

Page 399: ARM Assembly Language: Programming and Architecture

Flags:Unaffected.

Format:UBFXRd,Rn,#LSB,#Width

Function:ExtractsthebitfieldfromtheRnregisterandthenzeroextendsitandplacesinRd.The#LSBindicatesfromwhichbitand#Widthindicateshowmanybits.

Example1:

LDRR0,=0x00077555;R0=0x00077555

UBFXR2,R0,#8,#4;now,R2=0x00000005

Example2:

LDRR0,=0x12345678;R0=0x12345678

UBFXR2,R0,#8,#12;now,R2=0x00000456

UDIVUnsignedDivide

Flags:Unaffected

Format:UDIVRd,Rn,Rm;Rd=Rn/Rm

Function:DividesanunsignedintegerwordinRnbyanotherunsignedintegerwordinRm.ThequotientresultisplacedinRd.IfvalueinRnregisterisnotdivisiblebythevalueinRmregister,theresultisroundedtozeroandplacedinRd.Dividebyzerocausesexceptioninterrupt.

Example1:

LDRR0,=100;R0=100

LDRR1,=2000

UDIVR2,R1,R0;now,R2=R1/R0=2000/100=20

Example2:

LDRR0,=20000;R0=20000

UDIVR2,R0,#100;now,R2=2000/100=20

UMLALUnsignedMultiplywithAccumulate

Flags:Unaffected

Format:UMLALRdLo,RdHi,Rn,Rm;RdHi:RdLo=(Rm×Rn)+(RdHi:RdLo)

Function:MultipliesunsignedwordsinRnandRmregister,addsthe64-bitresulttoRdHi:RdLoregisters,andsavesthefinalresultinRdHi:RdLo.TheRdLo(low)andRdHi(high)aretheunsignedlowerwordandhigherwordofthe64-bitvalue.

Example:

LDRR0,=20000;R0=20000

Page 400: ARM Assembly Language: Programming and Architecture

LDRR1,=1000

LDRR2,=5000

LDRR3,=4000

UMLALR2,R3,R0,R1;now,R3:R2=R1×R0+R3:R2

UMULLUnsignedMultiplyLong

Flags:Unaffected

Format:UMULLRdLo,RdHi,Rn,Rm;RdHi:RdLo=Rm×Rn

Function:MultipliesunsignedwordsinRnandRmregisters,andsavestheresultinRdHi:RdLo.TheRdLo(low)andRdHi(high)arethelowerwordandhigherwordofa64-bitvalue.

Example:

LDRR0,=20000;R0=20000

LDRR1,=10000;R1=10000

LDRR2,=50000;R2=50000

LDRR3,=40000;R3=40000

UMLALR2,R3,R0,R1;now,R3:R2=R1×R0

USATUnsignedSaturate

Flags:Unaffected

Format:USATRd,#n,Rm,shift#

Function:Usedforunsignedsaturationoperation.SeeARMCortexmanual.

UXBTZeroextendabyte

Flags:Unaffected

Format:UXBTRd,Rm

Function:ZeroextendsabyteinRmandplacesinRd.UsedwidelytoconvertabyteinRmtowordforsignednumberoperations.

Example:

MOVR1,#0xFB;R1=0xFB

UXBTR0,R1;now,R0=0x00000000FB

;R1=00000000000000000000000011111011

;nowR0=0x000000FB

;R0=000000000000000000000000011111011

UXTHZeroextendhalfword

Page 401: ARM Assembly Language: Programming and Architecture

Flags:Unaffected

Format:UXTHRd,Rm

Function:ZeroextendsahalfwordinRmandplacesinRd.UsedwidelytoconvertahalfwordinRmtowordforsignednumberoperations.

Example:

;assumeR1=0xFFFB

UXTHR0,R1;now,R0=0x00000FFFB

;R1=00000000000000001111111111111011

;now,R0=0x0000FFFB

;R0=00000000000000001111111111111011

WFEWaitforevent

Flags:Unaffected

Format:WFE

Function:Usedbypowermanagement.SeeARMCortexM3manual.

WFIWaitforinterrupt

Flags:Unaffected

Format:WFI

Function:Suspendsexecutionuntiloneofthefollowingeventsoccurs:

1.anon-maskedinterruptoccursandistaken,

2.aninterruptmaskedbyPRIMASKbecomespending,

3.aDebugEntryrequest.

SeeARMCortexmanual.

Page 402: ARM Assembly Language: Programming and Architecture
Page 403: ARM Assembly Language: Programming and Architecture

AppendixB:ARMAssemblerDirectives

Page 404: ARM Assembly Language: Programming and Architecture
Page 405: ARM Assembly Language: Programming and Architecture

SectionB.1:ListofARMAssemblerDirectives

ALIGN

AREA

DCBdirective(defineconstantbyte)

DCDdirective(defineconstantword)

DCWdirective(defineconstanthalf-word)

ENDPorENDFUNC

ENTRY

EQU(Equate)

EXPORTorGLOBAL

EXTRN(External)

FUNCTIONorPROC

INCLUDE

RN(equate)

Page 406: ARM Assembly Language: Programming and Architecture
Page 407: ARM Assembly Language: Programming and Architecture

SectionB.2:DescriptionofARMAssemblerDirectivesDirectives,orastheyaresometimescalled,pseudo-opsorpseudo-instructions,are

usedby theassembler to translateAssembly languageprograms intomachine language.Unlikethemicroprocessor’sinstructions,directivesdonotgenerateanyopcode;therefore,nomemorylocationsareoccupiedbydirectivesinthefinalhexversionoftheassemblyprogram.Tosummarize,directivesgivedirectionstotheassemblerprogramtotellithowto generate the machine code; instructions are assembled into machine code to giveinstructionstotheCPUatexecutiontime.Thefollowingaredescriptionsofthesomeofthemostwidelyuseddirectives for theARMassembler.Theyaregiven inalphabeticalorderforeaseofreference.

ALIGN

Format:

ALIGNn;nisanypowerof2from20to231

Thisisusedtomakesuredataisalignedin32-bitwordor16-bithalfwordmemoryaddress.Ifnisnotspecified,ALIGNsetsthecurrentlocationtothenextword(fourbyte)boundary.ThefollowingusesALIGNtomakethedatawordandhalfwordaligned:

ALIGN4;Thenextinstructionisword(4bytes)aligned

ALIGN;Thenextinstructionisword(4bytes)aligned

ALIGN2;Thenextinstructionishalfword(2bytes)aligned

Noticethat,thisALIGNdirectiveshouldnotbeconfusedwiththeALIGNattributeoftheAREAdirective.

AREA

Format:

AREAsectionnameattribute,attribute,…

TheAREA directive tells the assembler to define a new section ofmemory. ThememorycanbecodeordataandcanhaveattributessuchasReadOnly,ReadWrite,andsoon.This iswidelyused todefineoneormoreblocksof indivisiblememoryforcodeordatatobeusedbythelinker.EveryassemblylanguageprogramhasatleastoneAREA.

ThefollowinglinedefinesanewareanamedMY_ASM_PROG1whichhasCODEandREADONLYattributes:

AREAMY_ASM_PROG1CODE,READONLY

Among widely used attributes are CODE, DATA, READONLY, READWRITE,COMMON,andALIGN.Thefollowingdescribesthesewidelyusedattributes.

CODE is an attribute given to an area of memory used for executable machineinstruction.SinceitisusedforcodesectionoftheprogramitisbydefaultREADONLYmemory.InARMAssemblylanguageweusethisareatowriteourinstructions.

Page 408: ARM Assembly Language: Programming and Architecture

DATA isanattributegiven toanareaofmemoryusedfordataandno instruction(machine instructions)canbeplaced in thisarea.Since it isusedfordatasectionof theprogram it is bydefault aREADWRITEmemory. InARMAssembly languageweusethisareatosetasideSRAMmemoryforscratchpadandstack.

READWRITE isanattributegiven toanareaofmemorywhichcanbereadfromandwrittento.SinceitisREADWRITEsectionoftheprogramitisbydefaultforDATA.InARMAssemblylanguageweusethisareatosetasideSRAMmemoryforscratchpadandstack.

READONLY is an attribute given to an area ofmemorywhich can only be readfrom.SinceitisREADONLYsectionoftheprogramitisbydefaultforCODE.InARMAssemblylanguageweusethisareatowriteourinstructionsformachinecodeexecution.

COMMONisanattributegiventoanareaofDATAmemorysectionwhichcanbeusedcommonlybyseveralprogramcodes.WedonotinitializetheCOMMONsectionofthe memory since it is used by compiler exclusively. The compiler initializes theCOMMONmemoryareawithallzeros.

ALIGN is another attribute given to an area ofmemory to indicate howmemoryshouldbeallocatedaccordingtotheaddresses.WhentheALIGNisusedforCODEandREADONLYitalignedin4-bytesaddressboundarybydefaultsincetheARMinstructionsare all 32-bit (4-bytes) word. The ALIGN attribute of AREA has a number after likeALIGN=3whichindicatestheinformationshouldbeplacedinmemorywithaddressesof23,thatis0x50000,0x50008,0x50010,0x50020,andsoon.ThisALIGNattributeoftheAREAshouldnotbeconfusedwiththeALIGNdirective.

DCBdirective(defineconstantbyte)

Format:

labelDCBn;nbetween-128to256,byteorstring

The DCB directive allocates a byte size memory and initializes the values forreadingonly.

MYVALUEDCB5;MYVALUE=5

MYMSAGEDCB“HELLOWORLD”;string

DCDdirective(defineconstantword)

Format:

labelDCDn

The DCD directive allocates a word size memory and initializes the values forreadingonly.Thedatais32bitaligned.

MYDATADCD0x200000,0xF30F5,5000000,0xFFFF9CD7

DCWdirective(defineconstanthalf-word)

Format:

Page 409: ARM Assembly Language: Programming and Architecture

labelDCBn

TheDCWdirectiveallocatesahalf-wordsizememoryandinitializesthevaluesforreadingonly.

MYDATADCW0x20,0xF230,5000,0x9CD7

END

EveryprogrammusthaveanentryandanENDpoint.Thelabelsfortheentryandendpointsmustmatch.TheENDdirectivetellstheassemblerthatithasreachedtheendofsourcefile.

AREAPROG_C2,CODE,READONLY

ENTRY

END

ENDPorENDFUNC

TheENDFUNCorENDPdirectiveinformstheassemblerthatithasreachedtheendofafunction.ENDFUNCandENDParethesame.SeeFUNCTIONorPROCdirectives.

ENTRY

TheENTRYdirective shows the entry point of a program to the assembler.Eachprogrammusthaveoneentrypoint.

AREAPROG_C2,CODE,READONLY

ENTRY

END

EQU(Equate)

Toassignafixedvaluetoaname,oneusestheEQUdirective.Theassemblerwillreplaceeachoccurrenceofthenamewiththevalueassignedtoit.

DATA1EQU0x39;thewaytodefinehexvalue

PORTBEQU0xF0018000;SFRPortBaddress

SUM1EQU0x40000120;assignRAMloctoSUM1

Unlike data directives such asDCB,DCD, and so on, EQU does not assign anymemorystorage;therefore,itcanbedefinedatanytimeandatanyplace,andcanevenbeusedwithinthecodesegment.

EXPORTorGLOBAL

Toinformtheassemblerthatanameorsymbolwillbereferencedbyothermodules

Page 410: ARM Assembly Language: Programming and Architecture

(in other files), it is marked by the EXPORT or GLOBAL directives. If a module isreferencinganameoutsideitself, thatnamemustbedeclaredasEXTRN(orIMPORT).Correspondingly, in the module where the variable is defined, that variable must bedeclaredasEXPORTorGLOBALinordertoallowittobereferencedbyothermodules.SeetheEXTRNdirectiveforexamplesoftheuseofbothEXTRNandEXPORT.

EXTRN(External)

TheEXTRNdirectiveisusedtoindicatethatcertainvariablesandnamesusedinamodule are defined by another module. In the absence of the EXTRN directive, theassemblerwouldsearchforthedefinitionandgiveanerrorwhenitcouldn’tfindit.Theformatofthisdirectiveis:

EXTRNname

ThefollowingexampleshowshowtheEXPORTandEXTERNdirectivesareused:

;fromthemainprogram:

EXTRNMY_FUNC

BLMY_FUNC

;–––––––––––––

;MY_FUNCislocatedinadifferentfile:

AREAOUR_EXAMPLE,CODE,READONLY

EXTRNDATA1

EXPORTMY_FUNC

MY_FUNCFUNCTION

LDRR1,=DATA1

ENDFUNC

Notice that the EXTRN directive is used in the main procedure to show thatMY_FUNC is defined in another module. This is needed because MY_FUNC is notdefined in that module. Correspondingly, MY_FUNC is defined as GLOABAL in themodulewhere it is defined. EXTRN is used in theMY_FUNCmodule to declare thatoperandDATA1hasbeendefinedinanothermodule.Correspondingly,DATA1isdeclaredasGLOBALinthecallingmodule.

FUNCTIONorPROC

Often,agroupofAssemblylanguageinstructionswillbecombinedintoaprocedure

Page 411: ARM Assembly Language: Programming and Architecture

sothatitcanbecalledbyanothermodule.TheFUNCTIONandENDFUNCdirectivesareusedtoindicatethebeginningandendoftheprocedure.Seethefollowingexample:

MY_FUNCFUNCTION

ENDFUNC

INCLUDE

Whenthereisagroupofmacroswrittenandsavedinaseparatefile,theINCLUDEdirectivecanbeusedtobringthemintoanotherfile.

RN(equate)

This isusedtodefineanameforaregister.TheRNdirectivedoesnotsetasideaseparatestorageforthename,butassociatesaregisterwiththatname.ThefollowingcodeshowshowweuseRN:

VAL1RNR1;defineVAL1asanameforR1

VAL2RNR2;defineVAL2asanameforR2

SUMRNR3;defineSUMasanameforR3

AREAPROG_2_1,CODE,READONLY

ENTRY

MOVVAL1,#0x25;R1=0x25

MOVVAL2,#0x34;R2=0x34

ADDSUM,VAL1,VAL2;addR2toR1andplaceitinR3

HEREBHERE

END

Page 412: ARM Assembly Language: Programming and Architecture
Page 413: ARM Assembly Language: Programming and Architecture

AppendixC:MacrosWhatisamacroandhowisitused?

There are applications in Assembly language programming where a group ofinstructionsperformsataskthatisusedrepeatedly.Forexample,youmightneedtoaddthree registers together. So it does notmake sense to rewrite them every time they areneeded. Therefore, to reduce the time that it takes towrite these codes and reduce thepossibility of errors, the concept ofmacroswasborn.Macros allow the programmer towritethetask(setofcodestoperformaspecificjob)onceonlyandtoinvokeitwheneveritisneeded,whereveritisneeded.

MACROdefinition

Everymacrodefinitionmusthavethreeparts,asfollows:

MACRO

[$label]macroNameparameter1,parameter2,…,parameterN

……

……

MEND

The MACRO directive indicates the beginning of the macro definition and theMEND directive signals the end. What goes in between the MACRO and MENDdirectives is called the body of themacro. The namemust be unique andmust followAssembly language naming conventions. The parameters are names, or parameters, oreven registers that are mentioned in the body of themacro. After themacro has beenwritten, itcanbe invoked(orcalled)byitsname,andappropriatevaluesaresubstitutedfor parameters. For example you might want to have an instruction that adds threeregisters.Thefollowingisamacroforthepurpose:

MACRO

ADD3VAL$DEST,$ARG1,$ARG2,$ARG3

ADD$DEST,$ARG1,$ARG2

ADD$DEST,$DEST,$ARG3

MEND

The above code is the macro definition. Note that parameters $DEST, $ARG1,$ARG2,and$ARG3arementionedinthebodyofthemacro.Todistinguishparameterstheymuststartwith$.Inthefollowingexample,themacroisinvokedbyitsnamewiththeuser’sactualdata:

AREAOURCODE,READONLY,CODE

ENTRY

Page 414: ARM Assembly Language: Programming and Architecture

MOVR1,#5

MOVR2,#2

ADD3VALR0,R1,R2,#5

Theinstruction“ADD3VALR0,R1,R2,#5”invokesthemacro.

Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:

300000008E0810002ADDR0,R1,R2

40000000CE2800005ADDR0,R0,#5

DefaultValuesforparameters

Wecandefinedefaultvaluesforparametersasshownbelow:

MACRO

ADD3VAL$DEST,$ARG1=R3,$ARG2,$ARG3=#5

ADD$DEST,$ARG1,$ARG2

ADD$DEST,$DEST,$ARG3

MEND

To use the default value we put a | instead of the parameter while invoking themacro:

ADD3VALR0,R1,R2,|

Theabovecodeusesthedefaultvalueof$ARG3whichissetto#5.

Usinglabelsinmacros

In thediscussionofmacrosso far,exampleshavebeenchosen thatdonothavealabelornameinthebodyofthemacro.Thisisbecauseifamacroisexpandedmorethanonceinaprogramandthereisalabelinthelabelfieldofthebodyofthemacro,thesamelabelwouldbegeneratedmorethanonceandanassemblererrorwouldbegenerated.Toaddresstheproblemwecangiveauniquelabeltothemacrowhenweinvokeit,asshownbelow:

MACRO

$lblOUR_MACRO

CMPR1,#5

BEQ$lbl

MOVR1,#1

$lbl

MEND

Page 415: ARM Assembly Language: Programming and Architecture

AREAOURCODE,READONLY,CODE

ENTRY

MOVR1,#3

label1OUR_MACRO

MOVR1,#5

label2OUR_MACRO

HEREBHERE

Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:

2000000000AREAOURCODE,READONLY,CODE

2100000000ENTRY

2200000000E3A01003MOVR1,#3

2300000004label1OUR_MACRO

300000004E3510005CMPR1,#5

4000000080A000000BEQlabel1

50000000CE3A01001MOVR1,#1

600000010label1

2400000010E3A01005MOVR1,#5

2500000014label2OUR_MACRO

300000014E3510005CMPR1,#5

4000000180A000000BEQlabel2

50000001CE3A01001MOVR1,#1

600000020label2

2600000020EAFFFFFE

HEREBHERE

Incases that therearemore thanone label inamacro the linescanbe labeledasshownbelow:

MACRO

$lblOUR_MACRO

CMPR1,#5

BEQ$lbl.equal

MOVR1,#1

B$lbl.next

Page 416: ARM Assembly Language: Programming and Architecture

$lbl.equal

MOVR1,#2

$lbl.next

MEND

AREAOURCODE,READONLY,CODE

ENTRY

MOVR1,#3

label1OUR_MACRO

MOVR1,#5

label2OUR_MACRO

HEREBHERE

Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:

1300000000AREAOURCODE,READONLY,CODE

1400000000ENTRY

1500000000E3A01003MOVR1,#3

1600000004label1OUR_MACRO

300000004E3510005CMPR1,#5

4000000080A000001BEQlabel1equal

50000000CE3A01001MOVR1,#1

600000010EA000000Blabel1next

700000014label1equal

800000014E3A01002MOVR1,#2

900000018label1next

1700000018E3A01005MOVR1,#5

180000001Clabel2OUR_MACRO

30000001CE3510005CMPR1,#5

4000000200A000001BEQlabel2equal

500000024E3A01001MOVR1,#1

600000028EA000000Blabel2next

70000002Clabel2equal

Page 417: ARM Assembly Language: Programming and Architecture

80000002CE3A01002MOVR1,#2

900000030label2next

1900000030EAFFFFFE

HEREBHERE

Conditionalmacros

Wecanpassconditionintomacros,aswell:

MACRO

$lblOurMacro$cond

CMPR1,#5

B$cond$lbl.equal

MOVR1,#1

$lbl.equal

MEND

AREAOURCODE,READONLY,CODE

ENTRY

MOVR1,#3

label1OurMacroEQ;inthemacrocheckequality

MOVR1,#3

label2OurMacroLO;inthemacrocheckifislower

HEREBHERE

Theassemblerexpandsthemacrobyprovidingthefollowingcodeinthe.LSTfile:

1000000000AREAOURCODE,READONLY,CODE

1100000000ENTRY

1200000000E3A01003MOVR1,#3

1300000004label1OurMacroEQ

300000004E3510005CMPR1,#5

4000000080A000000BEQlabel1equal

50000000CE3A01001MOVR1,#1

600000010label1equal

1400000010E3A01003MOVR1,#3

Page 418: ARM Assembly Language: Programming and Architecture

1500000014label2OurMacroLO

300000014E3510005CMPR1,#5

4000000183A000000BLOlabel2equal

50000001CE3A01001MOVR1,#1

600000020label2equal

1600000020EAFFFFFE

HEREBHERE

Notice that the firstB$cond is substitutedwithBEQwhile the secondB$cond issubstitutedwithBLOsincetheconditionsEQandLOareusedrespectively.

INCLUDEdirective

Assumethatthereareseveralmacrosthatareusedineveryprogram.Musttheyberewritten every time? The answer is no if the concept of the INCLUDE directive isknown.TheINCLUDEdirectiveallowsaprogrammertowritemacrosandsavetheminafile, and later bring them into any file. For example, assuming that some widely usedmacroswerewrittenandthensavedunderthefilename“MYMACRO1.S”,theINCLUDEdirectivecanbeusedtobringthisfileintoany“.asm”fileandthentheprogramcancallupon any of the macros as many times as needed. In the following example theADD3VALmacro isdefined in theMyMACRO.sfileand it isused in theexample.asmfile.

FigureC.1:DefiningaMacroinanIncludeFile

Page 419: ARM Assembly Language: Programming and Architecture

Macrosvs.subroutinesMacros and subroutines are useful in writing assembly programs, but each has

limitations.Macros increasecodesizeevery time theyare invoked.Forexample, ifyoucall a 10-instruction macro 10 times, the code size is increased by 100 instructions;whereas, if you call the same subroutine 10 times, the code size is only that of thesubroutine instructions.On theotherhand, a functioncall takes3clocksand the returninstructiontakes3clockstogetexecuted.So,usingfunctionsaddsaround6clockcycles.Thesubroutinesmightusestackspaceaswellwhencalled,whilethemacrosdonot.

Page 420: ARM Assembly Language: Programming and Architecture
Page 421: ARM Assembly Language: Programming and Architecture

AppendixD:FlowchartsandPseudocodeFlowcharts

If you have taken any previous programming courses, you are probably familiarwith flowcharting. Flowcharts use graphic symbols to represent different types ofprogramoperations.Thesesymbolsareconnected together intoa flowchart to show theflowofexecutionofaprogram.Themorecommonlyusedsymbolsareasfollows:

StartandEndpoints

Start and End points are commonly represented as rounded rectangles or ovalscontainingthewords“Start”or“End”.Smallcircleisanotherwaytoshowthem.

FigureD-1:StartandEndPoints

Decisions

Theconditionsarerepresentedindiamonds.

FigureD-2:aDecisionthatComparesAwithB

Process

Theprocessingstepsarerepresentedusingrectangles.

FigureD-3:aProcessSample

Inputsandoutputs

Inputsandoutputsarerepresentedasparallelogram.

FigureD-4:anOutputSample

Subroutines

Callingsubroutinesarerepresentedasshownbelow

FigureD-5:aSubroutineCallSample

Page 422: ARM Assembly Language: Programming and Architecture

PseudocodeFlowcharting has been standard practice in industry for decades. However, some

findlimitationsinusingflowcharts,suchasthefactthatyoucan’twritemuchinthelittleboxes, and it is hard to get the “big picture” ofwhat the programdoeswithout gettingbogged down in the details. An alternative to using flowcharts is pseudocode, whichinvolves writing brief descriptions of the flow of the code. Figures D-6 through D-10showflowchartsandpseudocodeforcommonlyusedcontrolstructures.

Structured programming uses three basic types of program control structures:sequence, control, and iteration. Sequence is simply executing instructions one afteranother. Figure D-6 shows how sequence can be represented in pseudocode andflowcharts.

FigureD-6:SEQUENCEPseudocodeversusFlowchart

NoteinFiguresD-6throughD-11that“statement”canindicateonestatementoragroupofstatements.

FigureD-7throughD-9showtwocontrolprogrammingstructures:IF-THEN-ELSEandIF-THENinbothpseudocodeandflowcharts.

FigureD-7:IFTHENELSEPseudocodeversusFlowchart

Page 423: ARM Assembly Language: Programming and Architecture

FigureD-8:anIFTHENELSESample

FigureD-9:IFTHENPseudocodeversusFlowchart

FiguresD-10andD-11showtwoiterationcontrolstructures:REPEATUNTILandWHILEDO.Bothstructuresexecuteastatementorgroupofstatementsrepeatedly.Thedifference between them is that the REPEAT UNTIL structure always executes thestatement(s) at least once, and checks the condition after each iteration, whereas theWHILEDOmaynotexecutethestatement(s)atallbecausetheconditionischeckedatthebeginningofeachiteration.

Page 424: ARM Assembly Language: Programming and Architecture

FigureD-10:REPEATUNTILPseudocodeversusFlowchart

FigureD-11:WHILEDOPseudocodeversusFlowchart

ProgramD-1finds thesumofnumbersbetween1and10.Compare theflowchartversus thepseudocode forProgramD-1 (shown inFigureD-12). In this example,moreprogram details are given than one usually finds. For example, this shows steps forinitializingandchangingvalues.Anotherprogrammermaynotincludethesestepsintheflowchart or pseudocode. It is important to remember that the purpose of flowcharts orpseudocode is to show the flow of the program and what the program does, not thespecificAssemblylanguageinstructionsthataccomplishtheprogram’sobjectives.Noticealsothatthepseudocodegivesthesameinformationinamuchmorecompactformthandoestheflowchart.Itisimportanttonotethatsometimespseudocodeiswritteninlayers,sothattheouterlevelorlayershowstheflowoftheprogramandsubsequentlevelsshowmoredetailsofhowtheprogramaccomplishesitsassignedtasks.

ProgramD-1

intmain()

{

intsum=0;

intvalue=1;

do{

sum=sum+value;

value++;

}while(value<=10);

Page 425: ARM Assembly Language: Programming and Architecture

printf(“%d”,sum);

}

FigureD-12:PseudocodeversusFlowchartforProgramD-1

Page 426: ARM Assembly Language: Programming and Architecture
Page 427: ARM Assembly Language: Programming and Architecture

AppendixE:PassingArgumentsintoFunctionsTherearedifferentwaystopassargumentstofunctions.Someofthemare:

·throughregisters

·throughmemoryusingreferences

·usingstack

Page 428: ARM Assembly Language: Programming and Architecture

E.1:PassingargumentsthroughregistersInthefollowingprogramtheBIGGERfunctiongetstwovaluesthroughR0andR1.

AftercomparingR0andR1,itreturnsthebiggervaluethroughR2.

ProgramE-1

AREAOUR_PROG,CODE,READONLY

ENTRY

MOVR0,#5;R0=5

MOVR1,#7;R1=7

BLBIGGER;BIGGER(5,7)

HEREBHERE;stayhere

;=======================================

;BIGGERreturnsthebiggervalue

;Parameters:

;R0andR1:thevaluestobecompared

;Returns:

;R2:containingthebiggervalue

;=======================================

BIGGER

CMPR0,R1

BHIL1;ifR0>R1gotoL1

MOVR2,R1;R2=R1

BXLR;return

L1MOVR2,R0;R2=R0

BXLR;return

END

Thisisafastwayofpassingargumentstothefunction.

Page 429: ARM Assembly Language: Programming and Architecture

E.2:PassingthroughmemoryusingreferencesWe can store the data in memory and pass its address through a register. In the

following program the STR_LENGTH function gets the address of a zero-ended stringthroughR0andreturnsthelengthofthestringthroughR1.

ProgramE-2

AREAOUR_PROG,CODE,READONLY

ENTRY

ADRR0,OUR_STR;R0=addr.ofOUR_STR

BLSTR_LENGTH;STR_LENGTH(&OUR_STR)

HEREBHERE;stayhere

OUR_STRDCB“HELLO!”

ALIGN4

;=======================================

;STR_LENGTHreturnsthelengthofstr

;Parameters:

;R0:addressofthestring

;Returns:

;R0:thelengthofstring

;=======================================

STR_LENGTH

MOVR2,#0

L_BEGIN

LDRBR5,[R0]

CMPR5,#0

BXEQLR;returnif0

ADDR2,R2,#1;R2=R2+1

ADDR0,R0,#1;R0=R0+1

BL_BEGIN

END

Page 430: ARM Assembly Language: Programming and Architecture
Page 431: ARM Assembly Language: Programming and Architecture

E.3:PassingargumentsthroughstackPassing through the stack is a flexible way of passing arguments. To do so, the

argumentsarepushedintothestackjustbeforecallingthefunctionandpoppedoffafterreturning. InProgramE-3, theBIGGER function gets two arguments through the stackandreturnsthebiggervaluethroughthestack.

ProgramE-3

AREAOUR_PROG,CODE,READONLY

ENTRY

;initstackpointer

LDRSP,=(0x40000000+(16*1024))

MOVR0,#5

PUSH{R0};pushArg1

MOVR0,#7

PUSH{R0};pushArg2

BLBIGGER;BIGGER(5,7)

LDRR1,[SP,#0];R1=returnvalue

POP{R0}

POP{R0}

HEREBHERE;stayhere

;=======================================

;BIGGERreturnsthebiggervalue

;Parameters:

;valuestobecompared

;Returns:

;thebiggervalue

;=======================================

BIGGER

LDRR0,[SP,#4];R0=arg1

Page 432: ARM Assembly Language: Programming and Architecture

LDRR1,[SP,#0];R1=arg2

CMPR0,R1

BHIL1;ifR0>R1gotoL1

STRR1,[SP,#0];returnR1

BXLR;return

L1STRR0,[SP,#0];returnR0

BXLR;return

END

Thismethodofpassingargumentsareusedinx86computers.

Page 433: ARM Assembly Language: Programming and Architecture

E.4:AAPCS(ARMApplicationProcedureCallStandard)TheAAPCSprovidesanstandardforimplementingthefunctionsandthefunction

callsso that thecodesmadebydifferentcompilersanddifferentprogrammerscanworkwitheachother.Someoftherulesofthestandardare:

·TheargumentsmustbesentthroughR0toR4,respectively.

·ThereturnvaluemustbesentbackthroughR0andR1.

· The functions canuseR5 toR8 for their internal calculations.But theirvalues must be retrieved before returning. To do so, we push the registersbeforeusingthemandpopthembeforereturningfromthefunction.

·ThestackmustbeusedasFullDescending

InProgramE-1mostoftheaboverulesareconsidered.ButthereturnvaluemustbeinR0insteadofR2.

InProgramE-4theaboverulesareconsidered.

ProgramE-4

AREAOUR_PROG,CODE,READONLY

ENTRY

;initstackpointer

;(changeitaccordingtoyourchip)

LDRSP,=(0x40000000+(16*1024))

MOVR0,#20

BLDELAY;DELAY(20)

HEREBHERE;stayhere

;=======================================

;DELAYwaitsforawhile

;Parameters:

;R0:theamountofwait

;Returns:

;none

;=======================================

DELAY

Page 434: ARM Assembly Language: Programming and Architecture

CMPR0,#0

BXEQLR;returnifzero

PUSH{R5};saveR5

LDRR5,=5000000;R5=5000000

L1SUBSR5,R5,#1;R5=R5-1

BNEL1;gotoL1ifR5isnotzero

POP{R5};restoreR5

BXLR;return

END

Moreinformation

FormoreinformationaboutAAPCSseethefollowingarticleorsearch“AAPCS”ontheInternet:

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf

Page 435: ARM Assembly Language: Programming and Architecture
Page 436: ARM Assembly Language: Programming and Architecture

AppendixF:ASCIICodes