computer architecture and assembly language · 2020-04-11 · introductionproblemdefinition...

517
Computer Architecture and Assembly Language Gabriel Laskar EPITA 2015

Upload: others

Post on 07-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Computer Architecture and Assembly Language

Gabriel Laskar

EPITA

2015

Page 2: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

License

I Copyright c© 2004-2005, ACU, Benoit PerrotI Copyright c© 2004-2008, Alexandre BecouletI Copyright c© 2009-2013, Nicolas PouillonI Copyright c© 2014, Joël PorquetI Copyright c© 2015, Gabriel Laskar

Permission is granted to copy, distributeand/or modify this document under the termsof the GNU Free Documentation License, Version1.2 or any later version published by the FreeSoftware Foundation; with the Invariant Sectionsbeing just ‘‘Copying this document’’, noFront-Cover Texts, and no Back-Cover Texts.

Page 3: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction

Part I

Introduction

Gabriel Laskar (EPITA) CAAL 2015 3 / 378

Page 4: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

1: Introduction

Problem definition

Outline

Gabriel Laskar (EPITA) CAAL 2015 4 / 378

Page 5: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

What are we trying to learn?Computer Architecture

What is in the hardware?I A bit of history of computers, current machinesI Concepts and conventions: processing, memory, communication,

optimization

How does a machine run code?I Program execution modelI Memory mapping, OS support

Gabriel Laskar (EPITA) CAAL 2015 5 / 378

Page 6: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

What are we trying to learn?Assembly Language

How to “talk” with the machine directly?I Mechanisms involvedI Assembly language structure and usageI Low-level assembly language featuresI C inline assembly

Gabriel Laskar (EPITA) CAAL 2015 6 / 378

Page 7: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiasts

I ProgrammersI Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 8: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiastsI Programmers

I Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 9: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiastsI Programmers

C/C++

I Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 10: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiastsI Programmers

C/C++, Objective-C

I Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 11: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiastsI Programmers

C/C++, Objective-C, C#

I Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 12: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiastsI Programmers

C/C++, Objective-C, C#, Java

I Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 13: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiastsI Programmers

C/C++, Objective-C, C#, Java, JS/AS

I Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 14: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiastsI Programmers at any level:

(C/C++, Objective-C, C#, Java, JS/AS, etc.)

I Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 15: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Problem definition

Who do I talk to?

I System gurusI Low-level enthusiastsI Programmers at any level:

(C/C++, Objective-C, C#, Java, JS/AS, etc.)I Wise managers

Gabriel Laskar (EPITA) CAAL 2015 7 / 378

Page 16: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Outline

1: Introduction

Problem definition

Outline

Gabriel Laskar (EPITA) CAAL 2015 8 / 378

Page 17: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Introduction Outline

Course outline

I Processor architectureI MemoryI Memory mappingI Execution flowI Object file formatsI Assembly programmingI Focus on x86I Focus on RISC processorsI CPU-aware optimizationsI Multi-/Many-core, heterogeneous systems

Gabriel Laskar (EPITA) CAAL 2015 9 / 378

Page 18: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture

Part II

Processor architecture

Gabriel Laskar (EPITA) CAAL 2015 10 / 378

Page 19: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

2: Processor architecture

Overview

Inside the processor

Processor units

Instructions

Instruction flow

Pipeline processor

CISC & RISC architectures

Gabriel Laskar (EPITA) CAAL 2015 11 / 378

Page 20: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

What a processor is...

A processor must be able to perform the following basic tasks:I Execute instructionsI Read operandsI Store results

It needs several basic units to perform those tasks:I A control unitI An arithmetic and logical unit (ALU)I A register bank

Let’s design it!

Gabriel Laskar (EPITA) CAAL 2015 12 / 378

Page 21: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

What a processor is...

A processor must be able to perform the following basic tasks:I Execute instructionsI Read operandsI Store results

It needs several basic units to perform those tasks:I A control unitI An arithmetic and logical unit (ALU)I A register bank

Let’s design it!

Gabriel Laskar (EPITA) CAAL 2015 12 / 378

Page 22: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

Basic architecture

Control Unit

ALU

R1

R0

= 0123

= 0100

= 0023

R5

R4

R3

R2

Reg

iste

rs

Gabriel Laskar (EPITA) CAAL 2015 13 / 378

Page 23: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

Basic architecture (2)

In this model, the system state is entirely contained in the processor.I This might be sufficient for a very basic processorI More features could be leveraged by adding registers or program steps

Unfortunately,I Internal memory is expensive and hard to designI There is no communicationI Updating the program may not be easy

We need an access to memory, external devices, etc.

Gabriel Laskar (EPITA) CAAL 2015 14 / 378

Page 24: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

Basic architecture (2)

In this model, the system state is entirely contained in the processor.I This might be sufficient for a very basic processorI More features could be leveraged by adding registers or program steps

Unfortunately,I Internal memory is expensive and hard to designI There is no communicationI Updating the program may not be easy

We need an access to memory, external devices, etc.

Gabriel Laskar (EPITA) CAAL 2015 14 / 378

Page 25: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

Basic architecture (2)

In this model, the system state is entirely contained in the processor.I This might be sufficient for a very basic processorI More features could be leveraged by adding registers or program steps

Unfortunately,I Internal memory is expensive and hard to designI There is no communicationI Updating the program may not be easy

We need an access to memory, external devices, etc.

Gabriel Laskar (EPITA) CAAL 2015 14 / 378

Page 26: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

Revised processor model

A processor must be able to perform the following basic tasks:I Fetch instructions from an external entity and understand them (fetch

and decode)I Execute instructionsI Store results to registers or external memory

It needs several basic units to perform those tasks:I A control unitI An arithmetic and logical unit (ALU)I A register bankI A program memory accessI A data memory access

Gabriel Laskar (EPITA) CAAL 2015 15 / 378

Page 27: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

Revised processor model

A processor must be able to perform the following basic tasks:I Fetch instructions from an external entity and understand them (fetch

and decode)I Execute instructionsI Store results to registers or external memory

It needs several basic units to perform those tasks:I A control unitI An arithmetic and logical unit (ALU)I A register bankI A program memory accessI A data memory access

Gabriel Laskar (EPITA) CAAL 2015 15 / 378

Page 28: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

Revised processor model (2)

Inst

ruct

ion

@

Co

de

Co

mm

and

R D

ata

Dat

a @

W D

ata

Control Unit

ALU

R1

R0

= 0123

= 0100

= 0023

R5

R4

R3

R2

Reg

iste

rs

Gabriel Laskar (EPITA) CAAL 2015 16 / 378

Page 29: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Overview

Processor physical layout

A processor is composed of many different units:I Caches, MMUI Integer unit, Control unit, Floating-point unit

Each unit is:I implemented as an hardware componentI made of switchable parts (transistors)

In old processors:I Units used to be independent chipsI Some were even optional “coprocessors”

Today, processors are embedded on a single chip.

Gabriel Laskar (EPITA) CAAL 2015 17 / 378

Page 30: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Inside the processor

2: Processor architecture

Overview

Inside the processor

Processor units

Instructions

Instruction flow

Pipeline processor

CISC & RISC architectures

Gabriel Laskar (EPITA) CAAL 2015 18 / 378

Page 31: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Inside the processor

Processor package

Gabriel Laskar (EPITA) CAAL 2015 19 / 378

Page 32: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Inside the processor

Processor physical layout

Gabriel Laskar (EPITA) CAAL 2015 20 / 378

Page 33: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Inside the processor

Processor physical layout

Gabriel Laskar (EPITA) CAAL 2015 21 / 378

Page 34: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Inside the processor

Processor physical layoutTransistor details

Gabriel Laskar (EPITA) CAAL 2015 22 / 378

Page 35: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Processor units

2: Processor architecture

Overview

Inside the processor

Processor units

Instructions

Instruction flow

Pipeline processor

CISC & RISC architectures

Gabriel Laskar (EPITA) CAAL 2015 23 / 378

Page 36: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Processor units

Units

I Control unit fetches anddecodes the instruction

I Registers gives the dataI ALU implements the

operationI Some instructions access

external data

Inst

ruct

ion

@

Co

de

Co

mm

and

R D

ata

Dat

a @

W D

ata

Control Unit

ALU

R1

R0

= 0123

= 0100

= 0023

R5

R4

R3

R2

Reg

iste

rs

Gabriel Laskar (EPITA) CAAL 2015 24 / 378

Page 37: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Processor units

RegistersMay be seen as variables located inside the processor

I 8, 16, 32, 64, 128, ... -bits largeI General-purpose registers:

I integer (int)I floating point (float, double)

I Specialized registers:I flags

I Zero,I Negative,I Carry,I Overflow,I etc.

I systemI Mode,I IRQ masking,I etc.

Gabriel Laskar (EPITA) CAAL 2015 25 / 378

Page 38: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Processor units

ALU: Arithmetic and Logical UnitAn unit without registers!

I Logical operationsI AND, OR, XOR, NOT, NOR

I Arithmetic operationsI addition, subtraction, multiplication, division

I ShiftsI Compares

Division is not possible without registers!

Gabriel Laskar (EPITA) CAAL 2015 26 / 378

Page 39: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instructions

2: Processor architecture

Overview

Inside the processor

Processor units

Instructions

Instruction flow

Pipeline processor

CISC & RISC architectures

Gabriel Laskar (EPITA) CAAL 2015 27 / 378

Page 40: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instructions

Instruction types

There are different instruction types:I Arithmetic and logical operationsI Control instructionsI Memory access instructions

Gabriel Laskar (EPITA) CAAL 2015 28 / 378

Page 41: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instructions

Instructions classification

Flynn’s taxonomy:I SISD: Single Instruction, Single Data

Classical Von Neumann architectureI SIMD: Single Instruction, Multiple Data

Vectorial computersI MIMD: Multiple Instruction, Multiple Data

Multiprocessor computers

Gabriel Laskar (EPITA) CAAL 2015 29 / 378

Page 42: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

2: Processor architecture

Overview

Inside the processor

Processor units

Instructions

Instruction flow

Pipeline processor

CISC & RISC architectures

Gabriel Laskar (EPITA) CAAL 2015 30 / 378

Page 43: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

Fetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 44: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

DecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 45: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

ExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 46: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 47: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

Fetch

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 48: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

DecodFetch

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 49: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

ExecDecodFetch

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 50: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

Fetch

ExecDecodFetch

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 51: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

DecodFetch

ExecDecodFetch

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 52: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

ExecDecodFetch

ExecDecodFetch

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 53: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

MemExecDecodFetch

ExecDecodFetch

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 54: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorInstruction flow

Processors may use a finite state machine or micro ops to process eachinstruction step (Fetch, Decode, Execute, Memory access, Register writeback, etc.)

A basic processor needs several clock cycles to process all steps of thecurrent instruction before starting the next one.

WriteMemExecDecodFetch

ExecDecodFetch

WriteExecDecodFetch

PC

ad

dre

ss

sub

jmp

ld

Gabriel Laskar (EPITA) CAAL 2015 31 / 378

Page 55: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorPro/cons

Cons:I SlowI The more complex the instructions are, the longer they take to get

processedI Most of the hardware is used only once for each instructionI Most of the hardware is unused most of the time

Pros:I Easy to implementI SmallI New instructions can be added just by adding new steps

Gabriel Laskar (EPITA) CAAL 2015 32 / 378

Page 56: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Instruction flow

Microprogrammed processorPro/cons

Cons:I SlowI The more complex the instructions are, the longer they take to get

processedI Most of the hardware is used only once for each instructionI Most of the hardware is unused most of the time

Pros:I Easy to implementI SmallI New instructions can be added just by adding new steps

Gabriel Laskar (EPITA) CAAL 2015 32 / 378

Page 57: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

2: Processor architecture

Overview

Inside the processor

Processor units

Instructions

Instruction flow

Pipeline processor

CISC & RISC architectures

Gabriel Laskar (EPITA) CAAL 2015 33 / 378

Page 58: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

Pipelined processorInstruction flowA pipeline architecture enables the parallel execution of severalinstructions:

I Split the execution of each instruction in several stepsI Each step performs an elementary operationI Each step is associated to a specific part of the hardwareI All parts of the hardware work in parallel

Fetch Decode Exec Memory

Instruction

memory

access

Data

memory

access

Writeback

Gabriel Laskar (EPITA) CAAL 2015 34 / 378

Page 59: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

PipelineFlow

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Fetch?

Gabriel Laskar (EPITA) CAAL 2015 35 / 378

Page 60: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

PipelineFlow

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Decod

Fetch?

Fetch

Gabriel Laskar (EPITA) CAAL 2015 35 / 378

Page 61: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

PipelineFlow

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Fetch

Decod

Exec

?

Decod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 35 / 378

Page 62: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

PipelineFlow

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

?

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 35 / 378

Page 63: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

PipelineFlow

Exec

Decod

Fetch

Mem

Write

?

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 35 / 378

Page 64: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

PipelineFlow

sub

Exec

Decod

Fetch

Mem

Write

?

Exec

Decod

Fetch

Mem

Write

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 35 / 378

Page 65: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

PipelineFlow

sub

Fetch

Decod

Exec

Write

Mem

?

Exec

Decod

Fetch

Mem

Write

Exec

Decod

Fetch

Mem

Write

add

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 35 / 378

Page 66: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

Speeding up the pipeline

Current processors extensively use the pipeline architecture to acceleratethe execution of instructions.

Because a pipeline architecture works in parallel, the slowest step delaydetermines the pipeline global cycle delay (and working frequency).

8 ns18 ns10 ns

Step 1 Step 2 Step 3

Gabriel Laskar (EPITA) CAAL 2015 36 / 378

Page 67: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture Pipeline processor

Speeding up the pipelineSplitting operations in shorter steps enables the processor frequency toincrease.

8 ns18 ns10 ns

Step 1 Step 2 Step 3

Step 1

10 ns 9 ns

Step 2.1 Step 2.2 Step 3

8 ns9 ns

Gabriel Laskar (EPITA) CAAL 2015 37 / 378

Page 68: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture CISC & RISC architectures

2: Processor architecture

Overview

Inside the processor

Processor units

Instructions

Instruction flow

Pipeline processor

CISC & RISC architectures

Gabriel Laskar (EPITA) CAAL 2015 38 / 378

Page 69: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture CISC & RISC architectures

Instruction-set classification

Based on internal architecture and instructions formats, processorarchitectures may be classified in two groups:

I Complex Instruction Set Computer (CISC)I Reduced Instruction Set Computer (RISC)

Early processor architectures were mostly CISC-based: z80, Intel x86,Motorola 68000, etc.

More recent designs are rather RISC-based: MIPS, Sparc, Alpha,PowerPC, ARM, etc.

Gabriel Laskar (EPITA) CAAL 2015 39 / 378

Page 70: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture CISC & RISC architectures

RISCCharacteristics

Pros:I Simple instructionsI Fixed-length instructionsI Decoding instructions requires simple hardware

Cons:I Programs are longer as they need more instructionsI Optimization is harder, compilers need to be smarter

Sometimes said as “Reject Important Stuff into Compiler”

Gabriel Laskar (EPITA) CAAL 2015 40 / 378

Page 71: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture CISC & RISC architectures

RISCInstruction example

sub %g1, %g2, %g3

0x200x86 0x40 0x02

00010000000000000010001000001110

%g3 sub %g1 %g2

source unused sourceformat destination opcode

Gabriel Laskar (EPITA) CAAL 2015 41 / 378

Page 72: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture CISC & RISC architectures

CISCCharacteristics

Pros:I Lots of instructions and opcodesI A single instruction can perform complex operationsI Assembly programs are easier and shorter to writeI Code compression ratio is good

Cons:I Binary instruction format has variable lengthI It requires more complex hardware and high frequencies are harder to

achieve

Modern processors often internally translate the CISC code to RISCmicrocode

Gabriel Laskar (EPITA) CAAL 2015 42 / 378

Page 73: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Processor architecture CISC & RISC architectures

CISCInstruction example

opcode opc2 immediate memorydisplacement

modr/m

Gabriel Laskar (EPITA) CAAL 2015 43 / 378

Page 74: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory

Part III

Memory

Gabriel Laskar (EPITA) CAAL 2015 44 / 378

Page 75: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories

3: Memory

MemoriesMemory typesAccess examples

Memory accessing modes

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 45 / 378

Page 76: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Memory types

3: Memory

MemoriesMemory typesAccess examples

Memory accessing modes

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 46 / 378

Page 77: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Memory types

Reasons to access memory

How does the memory work with the processor?I Memory is used to fetch instructions,I Memory is used to access data.I There may be one unique memory, or two.

I If there is one memory, instruction / data accesses must besequencialized

I If there are two, code cannot be accessed as dataConventional names:1 Von Neuman architecture2 Harvard architecture

Gabriel Laskar (EPITA) CAAL 2015 47 / 378

Page 78: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

3: Memory

MemoriesMemory typesAccess examples

Memory accessing modes

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 48 / 378

Page 79: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

Instruction fetchThe processor needs an instruction to process

Processor

Memory

Gabriel Laskar (EPITA) CAAL 2015 49 / 378

Page 80: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

Instruction fetchThe processor needs an instruction to process

Inst

ruct

ion @

Processor

Memory

Gabriel Laskar (EPITA) CAAL 2015 49 / 378

Page 81: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

Instruction fetchThe processor needs an instruction to process

Code

Inst

ruct

ion @

Processor

Memory

Gabriel Laskar (EPITA) CAAL 2015 49 / 378

Page 82: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

Data accessloadLoad the content of the memory cells pointed to by %g1 into %g2 register.

Processor

Memory

ld [%g1], %g2

Gabriel Laskar (EPITA) CAAL 2015 50 / 378

Page 83: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

Data accessloadLoad the content of the memory cells pointed to by %g1 into %g2 register.

Com

man

d

Dat

a @

Processor

Memory

ld [%g1], %g2

Gabriel Laskar (EPITA) CAAL 2015 50 / 378

Page 84: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

Data accessloadLoad the content of the memory cells pointed to by %g1 into %g2 register.

R D

ata

Com

man

d

Dat

a @

Processor

Memory

ld [%g1], %g2

Gabriel Laskar (EPITA) CAAL 2015 50 / 378

Page 85: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

Data accessstoreStores the content of %g2 register into the memory cells pointed to by %g1.

Processor

Memory

st %g2, [%g1]

Gabriel Laskar (EPITA) CAAL 2015 51 / 378

Page 86: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memories Access examples

Data accessstoreStores the content of %g2 register into the memory cells pointed to by %g1.

W D

ata

Com

man

d

Dat

a @

Processor

Memory

st %g2, [%g1]

Gabriel Laskar (EPITA) CAAL 2015 51 / 378

Page 87: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes

3: Memory

Memories

Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 52 / 378

Page 88: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Immediate addressing

3: Memory

Memories

Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 53 / 378

Page 89: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Immediate addressing

Immediate addressing

I The value of data is directly stored in the instructionI No memory access needed to get the value

In C language:int a, b = ...;

a = b + 0x831;

In assembly language:add %g1, 0x831, %g2

Gabriel Laskar (EPITA) CAAL 2015 54 / 378

Page 90: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Immediate addressing

Immediate addressing

I The value of data is directly stored in the instructionI No memory access needed to get the value

In C language:int a, b = ...;

a = b + 0x831;

In assembly language:add %g1, 0x831, %g2

Gabriel Laskar (EPITA) CAAL 2015 54 / 378

Page 91: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Immediate addressing

Immediate addressing

I The value of data is directly stored in the instructionI No memory access needed to get the value

In C language:int a, b = ...;

a = b + 0x831;

In assembly language:add %g1, 0x831, %g2

Gabriel Laskar (EPITA) CAAL 2015 54 / 378

Page 92: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Immediate addressing

Immediate addressingSparc instruction details

0x200x84

000010001000001010

%g2 sub %g1

sourceformat destination opcode

sub %g1, 0x831, %g2

0x68 0x31

0x831

1 0 1000 0011 0001

immediate

Gabriel Laskar (EPITA) CAAL 2015 55 / 378

Page 93: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Absolute addressing

3: Memory

Memories

Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 56 / 378

Page 94: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Absolute addressing

Absolute addressing

I The address of the data is stored in the instructionI A memory access is needed to get the value

In C language:int a = *(int*)0x830;

In assembly language:ld [0x830], %g1

Gabriel Laskar (EPITA) CAAL 2015 57 / 378

Page 95: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Absolute addressing

Absolute addressing

I The address of the data is stored in the instructionI A memory access is needed to get the value

In C language:int a = *(int*)0x830;

In assembly language:ld [0x830], %g1

Gabriel Laskar (EPITA) CAAL 2015 57 / 378

Page 96: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Absolute addressing

Absolute addressing

I The address of the data is stored in the instructionI A memory access is needed to get the value

In C language:int a = *(int*)0x830;

In assembly language:ld [0x830], %g1

Gabriel Laskar (EPITA) CAAL 2015 57 / 378

Page 97: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Register indirect addressing

3: Memory

Memories

Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 58 / 378

Page 98: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Register indirect addressing

Register indirect addressing

I The address of the data is stored in a registerI A memory access is needed to get the value

In C language:int a, *b = ...;

a = *b;

In assembly language:ld [%g2], %g1

Gabriel Laskar (EPITA) CAAL 2015 59 / 378

Page 99: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Register indirect addressing

Register indirect addressing

I The address of the data is stored in a registerI A memory access is needed to get the value

In C language:int a, *b = ...;

a = *b;

In assembly language:ld [%g2], %g1

Gabriel Laskar (EPITA) CAAL 2015 59 / 378

Page 100: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Register indirect addressing

Register indirect addressing

I The address of the data is stored in a registerI A memory access is needed to get the value

In C language:int a, *b = ...;

a = *b;

In assembly language:ld [%g2], %g1

Gabriel Laskar (EPITA) CAAL 2015 59 / 378

Page 101: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Complex addressing

3: Memory

Memories

Memory accessing modesImmediate addressingAbsolute addressingRegister indirect addressingComplex addressing

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 60 / 378

Page 102: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Memory accessing modes Complex addressing

Complex addressing

I Register indirect with base registerI Register indirect with offsetI And many others...

Assembly example:ld [%g2 + 0x124], %g1ld [%g2 + %g3], %g1

Gabriel Laskar (EPITA) CAAL 2015 61 / 378

Page 103: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment

3: Memory

Memories

Memory accessing modes

AlignmentMemory access alignmentStructure alignmentpacked structures

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 62 / 378

Page 104: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment Memory access alignment

3: Memory

Memories

Memory accessing modes

AlignmentMemory access alignmentStructure alignmentpacked structures

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 63 / 378

Page 105: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment Memory access alignment

Definition

Data access alignment is solely about considered data type width.I 32-bit integer access is aligned for addresses multiple of 4I 16-bit integer access is aligned for even addressesI 8-bit integer (char) accesses are always aligned!

Think about address % sizeof(type)

Gabriel Laskar (EPITA) CAAL 2015 64 / 378

Page 106: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment Memory access alignment

Access alignment

0x00

0x10

0x04

0x08

0x0c

Bus Width

Data read

0x6b0x5a

0xef0xcd 0x9f 0x8c

0xab0x89

0x56 0x780x340x01Address

Gabriel Laskar (EPITA) CAAL 2015 65 / 378

Page 107: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment Structure alignment

3: Memory

Memories

Memory accessing modes

AlignmentMemory access alignmentStructure alignmentpacked structures

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 66 / 378

Page 108: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment Structure alignment

Structure alignment

In a structure:I Fields must be in declaration orderI Fields must all be aligned

Data alignment does not depend on architecture bus width

15 160 32

struct bit_packed_s

{

int a;

short b;

short c;};

c

a

b

Gabriel Laskar (EPITA) CAAL 2015 67 / 378

Page 109: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment Structure alignment

PaddingSometimes, consecutive fields in declaration cannot be consecutive andaligned in memory

I Compilers put structure fields at aligned offsetsI Alignment may add unused padding bytes between fields

15 160 32

struct example_aligned_s

{

int b;

short c;};

char a;

a

b

c

padding bytes

Gabriel Laskar (EPITA) CAAL 2015 68 / 378

Page 110: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment Structure alignment

PaddingSometimes, consecutive fields in declaration cannot be consecutive andaligned in memory

I Compilers put structure fields at aligned offsetsI Alignment may add unused padding bytes between fields

15 160 32

struct example_aligned_s

{

int b;

short c;};

char a;

a

b

c

padding bytesGabriel Laskar (EPITA) CAAL 2015 68 / 378

Page 111: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment packed structures

3: Memory

Memories

Memory accessing modes

AlignmentMemory access alignmentStructure alignmentpacked structures

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 69 / 378

Page 112: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment packed structures

Basic packing

I Fields alignment can be ignored by compiler, on requestI Few architectures are able to access non aligned fields directlyI If non-native, unaligned access is emulated with multiple memory

accesses, shifts, ORs, etc.

struct example_packed_s

{

int b;

short c;

char a;

}

__attribute__((packed));

15 160 32

a b

c

Gabriel Laskar (EPITA) CAAL 2015 70 / 378

Page 113: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Alignment packed structures

Low-level packingI Packing can even be done at bit level!I Compiler will handle shifts and masksI Can be mixed with unionI Powerful for matching existing protocols

struct bit_packed_s

{

int a:17;

int b:5;

int c:12;

int d:16;

int e:14;

}

__attribute__((packed));

a

d

cb

e

15 160 32

Beware of endiannessGabriel Laskar (EPITA) CAAL 2015 71 / 378

Page 114: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness

3: Memory

Memories

Memory accessing modes

Alignment

EndiannessDifferent designs, different paradigmsExplainationDemo

Caches

Gabriel Laskar (EPITA) CAAL 2015 72 / 378

Page 115: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Different designs, different paradigms

3: Memory

Memories

Memory accessing modes

Alignment

EndiannessDifferent designs, different paradigmsExplainationDemo

Caches

Gabriel Laskar (EPITA) CAAL 2015 73 / 378

Page 116: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Different designs, different paradigms

Endianness

A data string represented with multiple bytes must be stored in memory.Similarly to written language, these bytes may be written “left-to-right” or“right-to-left”.

I Big-EndianI Little-EndianI Other endian modes

Gabriel Laskar (EPITA) CAAL 2015 74 / 378

Page 117: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

3: Memory

Memories

Memory accessing modes

Alignment

EndiannessDifferent designs, different paradigmsExplainationDemo

Caches

Gabriel Laskar (EPITA) CAAL 2015 75 / 378

Page 118: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

EndiannessMathematical reference

With a base b, a natural N may be decomposed in digits dk .If we naturally write it:N = dndn−1...dk ...d1d0N = dn · bn + dn−1 · bn−1 + · · ·+ dk · bk + · · ·+ d1 · b1 + d0 · b0

N = 4810310 = bbe716I With b = 10: N = 4 · 104 + 8 · 103 + 1 · 102 + 0 · 101 + 3 · 100I With b = 16: N = 11 · 163 + 11 · 162 + 14 · 161 + 7 · 160

So logically we tend to count digits from LSB: right to left

Digit no 4 3 2 1 0Base 10 value 4 8 1 0 3Base 16 value b b e 7

Gabriel Laskar (EPITA) CAAL 2015 76 / 378

Page 119: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

EndiannessMathematical reference

With a base b, a natural N may be decomposed in digits dk .If we naturally write it:N = dndn−1...dk ...d1d0N = dn · bn + dn−1 · bn−1 + · · ·+ dk · bk + · · ·+ d1 · b1 + d0 · b0

N = 4810310 = bbe716I With b = 10: N = 4 · 104 + 8 · 103 + 1 · 102 + 0 · 101 + 3 · 100I With b = 16: N = 11 · 163 + 11 · 162 + 14 · 161 + 7 · 160

So logically we tend to count digits from LSB: right to left

Digit no 4 3 2 1 0Base 10 value 4 8 1 0 3Base 16 value b b e 7

Gabriel Laskar (EPITA) CAAL 2015 76 / 378

Page 120: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

EndiannessMathematical reference

With a base b, a natural N may be decomposed in digits dk .If we naturally write it:N = dndn−1...dk ...d1d0N = dn · bn + dn−1 · bn−1 + · · ·+ dk · bk + · · ·+ d1 · b1 + d0 · b0

N = 4810310 = bbe716I With b = 10: N = 4 · 104 + 8 · 103 + 1 · 102 + 0 · 101 + 3 · 100I With b = 16: N = 11 · 163 + 11 · 162 + 14 · 161 + 7 · 160

So logically we tend to count digits from LSB: right to left

Digit no 4 3 2 1 0Base 10 value 4 8 1 0 3Base 16 value b b e 7

Gabriel Laskar (EPITA) CAAL 2015 76 / 378

Page 121: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

Memory representation

Usually, we like to represent memory in written order, the same way wewrite words on a paper sheet: left to right

0 1 2 30x00 ’A’ ’ ’ ’s’ ’i’0x04 ’m’ ’p’ ’l’ ’e’0x08 ’ ’ ’m’ ’e’ ’s’0x0c ’s’ ’a’ ’g’ ’e’

Gabriel Laskar (EPITA) CAAL 2015 77 / 378

Page 122: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

Integer memory representationThe “endianness” problem is whether to write integers

I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0

3 2 1 0

0x00

0x10

0x04

0x08

0x0c

20 21 22 23

00 02 03

10 12

40

01

31 32

41 42

13

33

43

11

30

Gabriel Laskar (EPITA) CAAL 2015 78 / 378

Page 123: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

Integer memory representationThe “endianness” problem is whether to write integers

I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0

3 2 1 0

20 21 22 23Big endian

0x00

0x10

0x04

0x08

0x0c

20 21 22 23

00 02 03

10 12

40

01

31 32

41 42

13

33

43

11

30

Gabriel Laskar (EPITA) CAAL 2015 78 / 378

Page 124: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

Integer memory representationThe “endianness” problem is whether to write integers

I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0

3 2 1 0

2023 22 21Little endian

0x00

0x10

0x04

0x08

0x0c

20 21 22 23

00 02 03

10 12

40

01

31 32

41 42

13

33

43

11

30

Gabriel Laskar (EPITA) CAAL 2015 78 / 378

Page 125: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

Integer memory representationThe “endianness” problem is whether to write integers

I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0

3 2 1 0

2023 22 21Little endian

0x00

0x10

0x04

0x08

0x0c

2023 22 21

00

10

40

30

03

13

33

43

02

12

32

42

01

31

41

11

0x00

0x10

0x04

0x08

0x0c

20 21 22 23

00 02 03

10 12

40

01

31 32

41 42

13

33

43

11

30

Gabriel Laskar (EPITA) CAAL 2015 78 / 378

Page 126: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

Integer memory representationThe “endianness” problem is whether to write integers

I in text order: the natural way (for western languages)I in index order: with digit 0 at address 0

3 2 1 0

2023 22 21Little endian

0x00

0x10

0x04

0x08

0x0c

2023 22 21

00

10

40

30

03

13

33

43

02

12

32

42

01

31

41

11

0x00

0x10

0x04

0x08

0x0c

20 21 22 23

00 02 03

10 12

40

01

31 32

41 42

13

33

43

11

30

Gabriel Laskar (EPITA) CAAL 2015 78 / 378

Page 127: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Explaination

EndiannessDo not mix everything!

I Data must be stored and fetched using the same convention.I Don’t worry about byte order in registers, it does not make sense.

Gabriel Laskar (EPITA) CAAL 2015 79 / 378

Page 128: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Demo

3: Memory

Memories

Memory accessing modes

Alignment

EndiannessDifferent designs, different paradigmsExplainationDemo

Caches

Gabriel Laskar (EPITA) CAAL 2015 80 / 378

Page 129: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Endianness Demo

Endianness Demo

int main(void){

unsigned int a = 0x12345678;hexdump(&a);

}

Gabriel Laskar (EPITA) CAAL 2015 81 / 378

Page 130: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Caches

3: Memory

Memories

Memory accessing modes

Alignment

Endianness

Caches

Gabriel Laskar (EPITA) CAAL 2015 82 / 378

Page 131: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Caches

Cache memory: reason

I Once upon a time, CPU and memory speed were the same.I Of course, they evolved:

I Moore’s law: CPU power doubles every 2 years,I Memory speed: +7% every year.

Gabriel Laskar (EPITA) CAAL 2015 83 / 378

Page 132: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Caches

Cache memory: definition

Cache memory is:I a local copy of central memory,I transparent, on-demand,I volatile (may be flushed anytime),I faster, closer to CPU than central memory,I expensive!

Gabriel Laskar (EPITA) CAAL 2015 84 / 378

Page 133: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Caches

Cache memoryHierarchy

L1

Data

CPU

L1

Data

CPU

Die−Shared L3

Ins + Data + TLB

CPU−Shared L2

Ins + Data

Instr.

L1L1

Instr.

CPU−Shared L2

Ins + Data

I There are multiple caches “levels”I with different size,I with different speed,I with different latency.

I Caches may be shared between code and data.Gabriel Laskar (EPITA) CAAL 2015 85 / 378

Page 134: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory Caches

Cache memories side-effects

Caches may also involve stangeness:I having copies of memory introduce a coherence problem,I an access to a given memory location may vary,

There are various memory operators:Programmer that you handle in C code,Assembler that the compiler generated,

CPU that the CPU does,In-cache that the cache does in the system.

Gabriel Laskar (EPITA) CAAL 2015 86 / 378

Page 135: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping

Part IV

Memory mapping

Gabriel Laskar (EPITA) CAAL 2015 87 / 378

Page 136: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Address space

4: Memory mapping

Address spaceDefinitionTranslation

Computer address spaces

Computer address space translation

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 88 / 378

Page 137: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Address space Definition

4: Memory mapping

Address spaceDefinitionTranslation

Computer address spaces

Computer address space translation

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 89 / 378

Page 138: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Address space Definition

Address spaceDefinition

An address space is a set of discrete values targetting a set of objects.I Each address points to one objectI One given object may be pointed by more than one address

1

2

3

foo

bar

baz

Gabriel Laskar (EPITA) CAAL 2015 90 / 378

Page 139: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Address space Translation

4: Memory mapping

Address spaceDefinitionTranslation

Computer address spaces

Computer address space translation

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 91 / 378

Page 140: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Address space Translation

Address space translation

Address space translation creates a new address space where each sourceaddress is mapped to a destination address.A given object can then be targetted by either addresses.

a

b

c

1

2

3

foo

bar

baz

Gabriel Laskar (EPITA) CAAL 2015 92 / 378

Page 141: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address spaces

4: Memory mapping

Address space

Computer address spacesDefinitionUsual address spaces

Computer address space translation

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 93 / 378

Page 142: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address spaces Definition

4: Memory mapping

Address space

Computer address spacesDefinitionUsual address spaces

Computer address space translation

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 94 / 378

Page 143: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address spaces Definition

Memory address spaceDefinition

A memory address spaceI is contiguousI can be mapped to another target address space (through address

space translation)All memory accesses are done with respect to an address space.

Gabriel Laskar (EPITA) CAAL 2015 95 / 378

Page 144: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address spaces Usual address spaces

4: Memory mapping

Address space

Computer address spacesDefinitionUsual address spaces

Computer address space translation

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 96 / 378

Page 145: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address spaces Usual address spaces

Computer address spaces

CPU Address space available through a CPU register. Current machineshave 32 or 64 bit pointer registers

Physical Address space actually wired between hardware components. Currentmachines have physical address spaces around 40 bits.

Virtual Address space reachable by a process. Generally 32 or 64 bits.

Gabriel Laskar (EPITA) CAAL 2015 97 / 378

Page 146: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address spaces Usual address spaces

Physical memory

Physical memory provides the lowest accessible address space in computer.Physical RAM is usually accessible as a small memory subset of thephysical address space.Physical memory can be mapped in different ways depending on physicaladdress bus implementation:

I Accessible at a given location,I Scattered at multiple locations,I Accessible (many times) at multiple locations.

Gabriel Laskar (EPITA) CAAL 2015 98 / 378

Page 147: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation

4: Memory mapping

Address space

Computer address spaces

Computer address space translationSegmentationPagination

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 99 / 378

Page 148: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

4: Memory mapping

Address space

Computer address spaces

Computer address space translationSegmentationPagination

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 100 / 378

Page 149: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Memory segments

Memory segments define an address space as a sub-region of anotheraddress space. It is mostly defined by the following attributes

I Segment base address in target address space,I Segment size,I Segment type and access rights.

Gabriel Laskar (EPITA) CAAL 2015 101 / 378

Page 150: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Simple segment

Segmentation keeps memory locations contiguous.

0

0

Source address space

Destination address space

Gabriel Laskar (EPITA) CAAL 2015 102 / 378

Page 151: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Physical address space

Physical address space

Physical memory

0

0

Gabriel Laskar (EPITA) CAAL 2015 103 / 378

Page 152: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Single mapped RAM

addr[x<n] = mem[x]

Physical address space

Physical memory

0

0

Gabriel Laskar (EPITA) CAAL 2015 104 / 378

Page 153: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Single mapped RAM

addr[x>=n] = ?

n

?

Physical address space

Physical memory

0

0

Gabriel Laskar (EPITA) CAAL 2015 104 / 378

Page 154: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Multiple/loop mapped RAM

addr[x] = mem[x%n]

Physical address space

Physical memory

0

0

Gabriel Laskar (EPITA) CAAL 2015 105 / 378

Page 155: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Memory access using segments

To access memory through a defined segment, the CPU performs thefollowing tasks:

I Check requested address against the segment size,I Add the segment base address to the requested memory address,I Check access rights.

Gabriel Laskar (EPITA) CAAL 2015 106 / 378

Page 156: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Memory access using segmentsMemory access

0

0

Segment baseaddress

Accessoffset

Effective address

Target addressspace

Process addressspace

Gabriel Laskar (EPITA) CAAL 2015 107 / 378

Page 157: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Segment descriptor table

0

0

addressspace

Target

Segment 2

Segment 3

Segment 0

sizebase

CPU

Register

1

0

0

6

9

3

3

− −

0 12

Gabriel Laskar (EPITA) CAAL 2015 108 / 378

Page 158: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Segmentation

Limitations of segmentation

Segmentation is a great thing, but is has a few limitations:I Address-space must be mapped in contiguous blocksI Thus segments are difficult to grow on-demandI A whole segment must be present in the target address space

Gabriel Laskar (EPITA) CAAL 2015 109 / 378

Page 159: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Pagination

4: Memory mapping

Address space

Computer address spaces

Computer address space translationSegmentationPagination

MMU patterns

Gabriel Laskar (EPITA) CAAL 2015 110 / 378

Page 160: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Pagination

Memory pagesModern operating systems need more than segmentation.Basic idea is to split the source address space in pages, and map eachsource page to a target page.

addressspace

Target

addressspace

Source

0

0

Gabriel Laskar (EPITA) CAAL 2015 111 / 378

Page 161: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Pagination

Pages mapping

Memory pages need to be mapped using a specific descriptor tablerecognized by the CPU. Usual attributes for a page are

I address in target address space,I type and access rights,I page size,I other attributes (cacheability, coherence, ...)

Gabriel Laskar (EPITA) CAAL 2015 112 / 378

Page 162: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Pagination

Pages descriptor table

addressspace

Target

addressspace

Source

0

0

CPU

Register

5

3

6

10

4

1

Gabriel Laskar (EPITA) CAAL 2015 113 / 378

Page 163: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping Computer address space translation Pagination

Pages mappingRationale

Splitting memory in pages allows more powerful memory management:I Address spaces may be mapped to uncontiguous target pages,

allowing memory fragmentation.I Many interesting operations can be performed on pages (sharing,

swapping, ...)

Gabriel Laskar (EPITA) CAAL 2015 114 / 378

Page 164: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns

4: Memory mapping

Address space

Computer address spaces

Computer address space translation

MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()

Gabriel Laskar (EPITA) CAAL 2015 115 / 378

Page 165: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Memory protection

4: Memory mapping

Address space

Computer address spaces

Computer address space translation

MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()

Gabriel Laskar (EPITA) CAAL 2015 116 / 378

Page 166: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Memory protection

Memory protectionMotivations

Why do we need memory protection?

Gabriel Laskar (EPITA) CAAL 2015 117 / 378

Page 167: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines
Page 168: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Memory protection

Memory protection

In order to execute secure operating system, hardware has to providememory protection systems. This protects

I the system from the hosted processesI hosted processes from each other

It may even protect the system from its own components in aMicro-Kernel approach.

Gabriel Laskar (EPITA) CAAL 2015 119 / 378

Page 169: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Memory protection

Memory protectionHow?

Several memory access checks are performed by the CPU, for each access,transparently:

I address bounds validity,I privilege level,I operation type.

Gabriel Laskar (EPITA) CAAL 2015 120 / 378

Page 170: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Memory protection

Memory protectionWhere?

I In a segmentation-based system, priviliges are per segment.I In a pagination-based system, privileges are in page descriptor table,

with a page granularity.

x86 mixes both with segmentation on top of pagination.

Gabriel Laskar (EPITA) CAAL 2015 121 / 378

Page 171: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Privileges

4: Memory mapping

Address space

Computer address spaces

Computer address space translation

MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()

Gabriel Laskar (EPITA) CAAL 2015 122 / 378

Page 172: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Privileges

Privilege levelsPrivilege level keeps memory from being accessed by non-authorized code.Different CPUs define different privilege levels.

Datasegment/page

level 0

Datasegment/page

level 1

Codesegment/page

level 1

Allowed

memory

access

memory

Illegal

access

Instruction

Target addressspace

Gabriel Laskar (EPITA) CAAL 2015 123 / 378

Page 173: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Privileges

Operation typeOperation types checking keeps code from doing unwanted memoryoperations.

segment/pageData

(read)segment/page

Data

(read/write)segment/page

Code

(exec)

Target addressspace

Instruction

ok

ReadWrite

okok

Read

writeread

Illegal

writeok

Execution Illegal Illegal

Gabriel Laskar (EPITA) CAAL 2015 124 / 378

Page 174: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Process switching

4: Memory mapping

Address space

Computer address spaces

Computer address space translation

MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()

Gabriel Laskar (EPITA) CAAL 2015 125 / 378

Page 175: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Process switching

Process switching

Pagination systems are easier to use with a separate memory address spacefor each process running on a computer:

I Switching process space implies changing the page descriptor table.I Each process has its own page descriptor table ready to be used by

the CPU.I Only the CPU register pointing to this table has to be changed to

setup a new address space.

Gabriel Laskar (EPITA) CAAL 2015 126 / 378

Page 176: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Process switching

Process switching

Process Baddress space

Process Aaddress space

Target addressspace

Process Bpage table

Process Apage table

CPU register

Gabriel Laskar (EPITA) CAAL 2015 127 / 378

Page 177: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Memory sharing

4: Memory mapping

Address space

Computer address spaces

Computer address space translation

MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()

Gabriel Laskar (EPITA) CAAL 2015 128 / 378

Page 178: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Memory sharing

Memory sharing

The memory pagination can be used to share pages between processes.A single page can be mapped in several process address spaces to permitdifferent behaviors:

I Save physical memory by not duplicating shared code and read-onlydata memory (used for shared libraries),

I Use shared memory for inter-process communication,

Gabriel Laskar (EPITA) CAAL 2015 129 / 378

Page 179: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Memory sharing

Memory sharing

Process Apage table

Process Aaddress space

Process Baddress space

Target addressspace

Process Bpage table

0 1 2 3 0 1 2

0 1 2 3 4 18........

Gabriel Laskar (EPITA) CAAL 2015 130 / 378

Page 180: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

4: Memory mapping

Address space

Computer address spaces

Computer address space translation

MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()

Gabriel Laskar (EPITA) CAAL 2015 131 / 378

Page 181: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On Write

Copy On Write (COW) is a powerful trick used in many situations.

One of the most usual ones is fork() where the whole address space of aprocess has to be cloned.

Basic idea is to make the whole memory read-only and actually copy onlywhen necessary, as late as possible.

Gabriel Laskar (EPITA) CAAL 2015 132 / 378

Page 182: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On WriteStep by step

0

0

4

10

3

6

Gabriel Laskar (EPITA) CAAL 2015 133 / 378

Page 183: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On WriteStep by step

0

0

4

10

3

6

Gabriel Laskar (EPITA) CAAL 2015 133 / 378

Page 184: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On WriteStep by step

0

0

4

10

3

6

4

10

3

6

Gabriel Laskar (EPITA) CAAL 2015 133 / 378

Page 185: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On WriteStep by step

0

0 0

4

10

3

6

4

10

3

6

Gabriel Laskar (EPITA) CAAL 2015 133 / 378

Page 186: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On WriteStep by step

0

0 0

4

10

3

6

4

10

3

6

Gabriel Laskar (EPITA) CAAL 2015 133 / 378

Page 187: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On WriteStep by step

0

0 0

4

10

3

6

4

10

3

6

Gabriel Laskar (EPITA) CAAL 2015 133 / 378

Page 188: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On WriteStep by step

0

0 0

2

10

3

6

4

10

3

6

Gabriel Laskar (EPITA) CAAL 2015 133 / 378

Page 189: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Copy On Write

Copy On WriteStep by step

0

0 0

2

10

3

6

4

10

3

6

Gabriel Laskar (EPITA) CAAL 2015 133 / 378

Page 190: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

4: Memory mapping

Address space

Computer address spaces

Computer address space translation

MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()

Gabriel Laskar (EPITA) CAAL 2015 134 / 378

Page 191: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swapping

Page swapping is a mechanism artificially enlarging Physical memory witha part of the hard-disk.

Gabriel Laskar (EPITA) CAAL 2015 135 / 378

Page 192: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

6

5

3

10

4

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 193: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

6

5

3

10

4

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 194: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

6

5

3

10

4

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 195: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

S: 3

6

5

3

10

4

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 196: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

S: 3

6

5

3

10

4

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 197: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

S: 3

disk

5

3

10

4

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 198: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

S: 3

disk

5

3

10

4

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 199: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

S: 3

disk

5

3

10

4

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 200: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

S: 3

5

3

10

4

7

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 201: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns Page swapping

Page swappingStep by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

S: 3

5

3

10

4

7

Gabriel Laskar (EPITA) CAAL 2015 136 / 378

Page 202: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns mmap()

4: Memory mapping

Address space

Computer address spaces

Computer address space translation

MMU patternsMemory protectionPrivilegesProcess switchingMemory sharingCopy On WritePage swappingmmap()

Gabriel Laskar (EPITA) CAAL 2015 137 / 378

Page 203: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns mmap()

mmap()

Make part of the memory exactly match the contents of a file.I Reflect changes immediatly between processes having file openedI Permit different protections on different parts of the fileI Lazily load parts of fileI Lazily write parts back

Gabriel Laskar (EPITA) CAAL 2015 138 / 378

Page 204: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns mmap()

mmap()Step by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

5

3

Gabriel Laskar (EPITA) CAAL 2015 139 / 378

Page 205: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns mmap()

mmap()Step by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

foo.bin

f:0

f:1

f:2

f:3

5

3

Gabriel Laskar (EPITA) CAAL 2015 139 / 378

Page 206: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns mmap()

mmap()Step by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

foo.bin

f:0

f:1

f:2

f:3

5

3

Gabriel Laskar (EPITA) CAAL 2015 139 / 378

Page 207: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns mmap()

mmap()Step by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

foo.bin

f:0

f:1

f:2

f:3

foo

.bin

:3

5

3

Gabriel Laskar (EPITA) CAAL 2015 139 / 378

Page 208: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns mmap()

mmap()Step by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

foo.bin

f:0

f:1

f:2

8

5

3

Gabriel Laskar (EPITA) CAAL 2015 139 / 378

Page 209: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Memory mapping MMU patterns mmap()

mmap()Step by step

addressspace

Source

addressspace

Target

Disk

0

0

CPU

Register

1

foo.bin

f:0

f:1

f:2

8

5

3

Gabriel Laskar (EPITA) CAAL 2015 139 / 378

Page 210: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow

Part V

Execution flow

Gabriel Laskar (EPITA) CAAL 2015 140 / 378

Page 211: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls

5: Execution flow

Branch, function callsBranch principlePipeline considerations

Function calls

Handling events

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 141 / 378

Page 212: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

5: Execution flow

Branch, function callsBranch principlePipeline considerations

Function calls

Handling events

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 142 / 378

Page 213: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Branch principle

Branching is breaking the normal incremental execution flow to go executecode somewhere else.A branch has

I a destination,I optionally a condition,I optionally a “link” feature, saving return point.

Gabriel Laskar (EPITA) CAAL 2015 143 / 378

Page 214: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Branch offset

Instructions are fetched from memory to CPU by dereferencing theprogram counter (%pc register)

I in normal execution flow, the %pc is auto-incremented

%pc

80450020

8045001C

80450018

80450014

80450010

Addresses Instructions

add %g1, %g2, %g3

mov %g0, %g1

xor %g2, %g3, %g2

sub %g3, %g2, %g1

...

Next instruction

Executed instruction

I when a branch occurs, the %pc is affected in two possible ways:relative branch: a constant offset is added to the %pcabsolute branch: an absolute address is loaded into the %pc

Gabriel Laskar (EPITA) CAAL 2015 144 / 378

Page 215: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Unconditional branch

The %pc register is always modified.I Explicit jump: gotoI Explicit infinite loop, optimized by the compiler

%pc

80450120

8045011C

80450118

80450014

80450010

Addresses Instructions

b 8045011C

mov %g0, %g1

xor %g2, %g3, %g2

sub %g3, %g2, %g1

...

Next instruction

Executed instruction

Gabriel Laskar (EPITA) CAAL 2015 145 / 378

Page 216: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Unconditional branchC listing

#include <stdio.h>

void main(void){

do {puts("hello");

} while (1);}

Gabriel Laskar (EPITA) CAAL 2015 146 / 378

Page 217: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Unconditional branchControl flow graph

return

main

puts("hello");

Gabriel Laskar (EPITA) CAAL 2015 147 / 378

Page 218: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Conditional branch

The %pc register is modified only if the condition is verifiedI if statementsI loop (for, while) statements

80450010

Addresses

80450014

80450118

8045011C

bz 8045011C

mov %g0, %g1

xor %g2, %g3, %g2

sub %g3, %g2, %g1

Instructions

Executed instruction

Potential next instruction

80450120 ...

%pc

Potential next instruction

80450018 ...

Gabriel Laskar (EPITA) CAAL 2015 148 / 378

Page 219: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Conditional branchC listing

#include <stdio.h>

void main(void){

int i = 10;

do {printf("%i\n", --i);

} while (i != 0);}

Gabriel Laskar (EPITA) CAAL 2015 149 / 378

Page 220: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Conditional branchControl flow graph

yes

no

main

i = 10

printf("%i\n");

i != 0

−−i

returnGabriel Laskar (EPITA) CAAL 2015 150 / 378

Page 221: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Conditions

The decision to take the branch is based on register contents:I conditional branch may occur if a specific bit is set in a registerI conditional branch may occur if a specific bit is clear in a registerI conditional branch may occur if a register equals a specific value

(usually zero)

Gabriel Laskar (EPITA) CAAL 2015 151 / 378

Page 222: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Branch principle

Complete examples

1. strlen2. pgcd

Gabriel Laskar (EPITA) CAAL 2015 152 / 378

Page 223: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

5: Execution flow

Branch, function callsBranch principlePipeline considerations

Function calls

Handling events

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 153 / 378

Page 224: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerations

A branch may break the pipeline, slowing the execution1. when things goes wrong, a bubble is created2. sometimes, the processor succeeds in predicting the branch target3. when a misprediction occurs, a bubble is created

Gabriel Laskar (EPITA) CAAL 2015 154 / 378

Page 225: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsBubble

When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Fetch?

Gabriel Laskar (EPITA) CAAL 2015 155 / 378

Page 226: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsBubble

When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Decod

Fetch?

Fetch

Gabriel Laskar (EPITA) CAAL 2015 155 / 378

Page 227: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsBubble

When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Fetch

Decod

Exec

?

Decod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 155 / 378

Page 228: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsBubble

When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

?

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 155 / 378

Page 229: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsBubble

When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

Write

?

!

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 155 / 378

Page 230: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsBubble

When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:

−−

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

−−

Fetch

Mem

Write

?

Exec

Decod

Fetch

Mem

Write

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 155 / 378

Page 231: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsBubble

When a branch occurs, the processor may flush the stages of the pipelinethat contains useless instructions:

add

−−

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Fetch

Decod

−−

Write

Mem

?

Exec

−−

Fetch

Mem

Write

Exec

Decod

Fetch

Mem

Write

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 155 / 378

Page 232: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsDelay slot

A delay slot is a processor feature. It’s a convention saying the instructionfollowing branches is always executed.Branch is delayed.

Exec

Decod

Fetch

Mem

Write

?

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 156 / 378

Page 233: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsDelay slot

A delay slot is a processor feature. It’s a convention saying the instructionfollowing branches is always executed.Branch is delayed.

Exec

Decod

Fetch

Mem

Write

?

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

jmp is executedinstruction following

execution flowis modified one cycle later

Gabriel Laskar (EPITA) CAAL 2015 156 / 378

Page 234: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsDelay slot

A delay slot is a processor feature. It’s a convention saying the instructionfollowing branches is always executed.Branch is delayed.

sub

Exec

Decod

Fetch

Mem

Write

?

Exec

Decod

Fetch

Mem

Write

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

jmp is executedinstruction following

execution flowis modified one cycle later

Gabriel Laskar (EPITA) CAAL 2015 156 / 378

Page 235: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsDelay slot

A delay slot is a processor feature. It’s a convention saying the instructionfollowing branches is always executed.Branch is delayed.

sub

Fetch

Decod

Exec

Write

Mem

?

Exec

Decod

Fetch

Mem

Write

Exec

Decod

Fetch

Mem

Write

add

jmp

add

not

sub

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

jmp is executedinstruction following

execution flowis modified one cycle later

Gabriel Laskar (EPITA) CAAL 2015 156 / 378

Page 236: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Fetch?

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 237: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Decod

Fetch?

Fetch

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 238: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

ld g5, [g2]

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Fetch

Decod

Exec

?

Decod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 239: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

add g6, 0x2a, g5

ld g5, [g2]

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

!

?

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 240: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

add g6, 0x2a, g5

ld g5, [g2]

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Decod

Fetch

Mem

Write

Exec!

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 241: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

add g6, 0x2a, g5

ld g5, [g2]

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Decod

Fetch

Mem

Write

Exec!

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 242: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

add g6, 0x2a, g5

ld g5, [g2]

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Decod

Fetch

Mem

Write

Exec!

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

add g6, 0x2a, g5

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 243: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

ld i3, [g6 + 0x8]

add g6, 0x2a, g5

ld g5, [g2]

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

−−

Write

?

(Exec)

(Decod)

(Fetch)

Mem

Write

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

add g6, 0x2a, g5

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 244: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

ld i3, [g6 + 0x8]

add g6, 0x2a, g5

ld g5, [g2]

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

−−

Write

?

(Exec)

(Decod)

(Fetch)

Mem

Write

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

add g6, 0x2a, g5

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 245: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Branch, function calls Pipeline considerations

Pipeline considerationsLoads (digression)

xor i4, 0x2a, i3

ld i3, [g6 + 0x8]

add g6, 0x2a, g5

ld g5, [g2]

sub g1, g2, g3

Pipeline timeline

Inst

ruct

ion

seq

uen

ce

Exec

Decod

Fetch

Mem

−−

?

Exec

Decod

Fetch

−−

Write

(Exec)

(Decod)

(Fetch)

Mem

Write

Exec

Decod

Fetch

Mem

Fetch

Decod

ExecDecod

Fetch

Fetch

Gabriel Laskar (EPITA) CAAL 2015 157 / 378

Page 246: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls

5: Execution flow

Branch, function calls

Function callsPrinciplesArgument passingCall conventions

Handling events

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 158 / 378

Page 247: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Principles

5: Execution flow

Branch, function calls

Function callsPrinciplesArgument passingCall conventions

Handling events

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 159 / 378

Page 248: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Principles

Function calls principleA function is a piece of code that returns a value depending on theparameters the user specifies as inputs, if any.

8004500

8004504

8004508

800450B

8002000

8002004

8002008

Addresses AddressesInstructions Instructions

Gabriel Laskar (EPITA) CAAL 2015 160 / 378

Page 249: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Principles

“call” instruction

I Saves next instruction address (the return path),I Jumps to function

80002000

80002004

80450018

80450014

8045001080450018

%i7

Return address

%pc add %g1, %g2, %g3

...

xor %g2, %g1, %g3

mov %g0, %g1

call 80002000

Addresses Instructions

Delay slot instruction

Executed instruction

Next instruction

Gabriel Laskar (EPITA) CAAL 2015 161 / 378

Page 250: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Principles

“ret” instruction

I Stores the result in a dedicated registerI Restores the previously-saved PC address

Return address

%i7

80450018

%pc 80450018

80450014

80450010

80002100

800020FC

800020F8

Addresses Instructions

ret

not %g1, %g2

...

Executed instruction

Delay slot instruction

call 80002000

mov %g0, %g1

xor %g2, %g1, %g3 Next instruction

Gabriel Laskar (EPITA) CAAL 2015 162 / 378

Page 251: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Argument passing

5: Execution flow

Branch, function calls

Function callsPrinciplesArgument passingCall conventions

Handling events

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 163 / 378

Page 252: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Argument passing

How to pass arguments and return values?

We need a space of shared data between the caller and the callee.I From caller to callee, for argumentsI From callee to caller, for return values

Some arguments are purely for execution purposes:I context pointers (stack, globals, ...)I return address

Gabriel Laskar (EPITA) CAAL 2015 164 / 378

Page 253: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Argument passing

Simplest case

I Non-nested function callI Less arguments than available machine registers

Gabriel Laskar (EPITA) CAAL 2015 165 / 378

Page 254: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Argument passing

Through registers

On most RISC architectures, there are registers dedicated to argumentstorage:

I Each argument is stored and preserved directly in a registerI Local variables may be held in another set of registers

Example: on SPARC architecture, the CPU has:I 8 registers (%g0, ... %g7) dedicated to global variablesI 8 registers (%i0, ... %i7) dedicated to input argumentsI 8 registers (%l0, ... %l7) dedicated to local variablesI 8 registers (%o0, ... %o7) dedicated to output arguments

A callee’s “input” registers are shared with caller’s “output” registers.

Gabriel Laskar (EPITA) CAAL 2015 166 / 378

Page 255: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Argument passing

Through global memory

Principle:I Each argument is stored and preserved directly in memoryI Each local variable is held in memory

Gabriel Laskar (EPITA) CAAL 2015 167 / 378

Page 256: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

5: Execution flow

Branch, function calls

Function callsPrinciplesArgument passingCall conventions

Handling events

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 168 / 378

Page 257: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Call conventions

I How to deal with huge amount of variables and arguments?I How to deal with recursive calls and nested function calls?I How to deal with variables bigger than registers?I How to deal with “...” (like printf)?I How to deal with dedicated registers (floats)?

Gabriel Laskar (EPITA) CAAL 2015 169 / 378

Page 258: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Problem

Consider a recursive function that needs local variables:I Each time the function is called, a new context must be allocated to

preserve each local variableI Each time the function returns, the previous context must be restoredI The function may call itself an unpredictable number of timesI These local variables cannot be reserved in a static memory space (as

global variables are).

Gabriel Laskar (EPITA) CAAL 2015 170 / 378

Page 259: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Abstract context stack

Local

context

Local

context

Local

context

calls

Function

returns

Function

void foo(int a)

{

return foo(a + 1);

}

Gabriel Laskar (EPITA) CAAL 2015 171 / 378

Page 260: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Register window

Hardware context stack implementation:I Uses a large amount of registers (example: 512 on Sparc)I Uses a logical limitation to define multiple contextsI Does context change by sliding a window on each function call

Gabriel Laskar (EPITA) CAAL 2015 172 / 378

Page 261: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Principle

%i0 − %i7

%l0 − %l7

%o0 − %o7

Sliding register windowHardware registers

%R95

%R96

%R98

%R97

%R99

%R100

%R101

%R102

%R115

%R116

%R117

%R118

%R119

%R120

Previous

Current

Next

Abstract stack layout

Local

context

Local

context

Local

context

calls

Function

returns

Function

Gabriel Laskar (EPITA) CAAL 2015 173 / 378

Page 262: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Sparc overlapping register window

registerwindow

Input registers

%i0 ... %i7

Local registers

%l0 ... %l7

Output registers

%o0 ... %o7

registerwindow

Calling fonctions context

Called fonctions context

Registeroverlap

Function call

%l0 ... %l7

Local registers

Input registers

%i0 ... %i7

Output registers

%o0 ... %o7

Gabriel Laskar (EPITA) CAAL 2015 174 / 378

Page 263: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Function prologue and epilogue

prologue: slide register windowExample:

save %sp, -96, %sp

epilogue: slide back register window (i.e. restore previous context).Example:

restore

Gabriel Laskar (EPITA) CAAL 2015 175 / 378

Page 264: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Memory stack

Register window has hard limitations:I The call depth is limited to the total amount of registersI The CPU needs a large amount of physical registers ⇒ expensive

On most systems, memory is used to implement a cheaper stack.

Gabriel Laskar (EPITA) CAAL 2015 176 / 378

Page 265: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Principle

Stack

memory

space

frame

stack

Currentvariables

Local

arguments

Called function

Return address

argumentsInput

Frame

pointer

pointer

Stack

Sliding stack frame Process memory

Previous

Current

Next

Abstract stack layout

Local

context

Local

context

Local

context

calls

Function

returns

Function

Gabriel Laskar (EPITA) CAAL 2015 177 / 378

Page 266: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Nested function calls

Local variables

Stack memory space

Called functionarguments

Return address

Inputarguments

Return address

Local variables

Called functionarguments

Inputarguments

Return address

Local variables

Currentfunctioncontext

Stackpointer

Framepointer

Functioncalls

Gabriel Laskar (EPITA) CAAL 2015 178 / 378

Page 267: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Function prologue and epilogue

prologue: saves previous frame pointer, set new frame pointer, reservespace on memory stack for local variablesExample: a function that needs three 32 bits local variables(12 bytes) on its stack:

[%sp] <- %fp%fp <- %sp%sp <- %sp - 12

epilogue: restores previous frame and stack pointers (i.e. restoreprevious context).Example:

%sp <- %fp%fp <- [%fp]

Gabriel Laskar (EPITA) CAAL 2015 179 / 378

Page 268: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Argument and local variable access

argument:Dereference the address “frame pointer + argument offset”:

ld [%fp + 16], %g1; Access to arg_a

local variable:Dereference the address “frame pointer - local variableoffset”

ld [%fp - 4], %g1; Access to var_b

Gabriel Laskar (EPITA) CAAL 2015 180 / 378

Page 269: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Function calls Call conventions

Argument and local variable accessschema

previous fp

ret address

localvariables

functionarguments

Stackpointer

Framepointer

void foo(int arg_a,

int arg_b)

{

int var_a, var_, var_c;

...

}

int var_c

int var_b

int var_a

previous fp

int arg_b

int arg_a

ret address

Gabriel Laskar (EPITA) CAAL 2015 181 / 378

Page 270: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events

5: Execution flow

Branch, function calls

Function calls

Handling eventsEventsSystem callsFaultsHardware interruptions

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 182 / 378

Page 271: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Events

5: Execution flow

Branch, function calls

Function calls

Handling eventsEventsSystem callsFaultsHardware interruptions

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 183 / 378

Page 272: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Events

Events

What are events?

How to handle them?

Gabriel Laskar (EPITA) CAAL 2015 184 / 378

Page 273: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Events

Categorizing events

An interrupt is caused by an external event.An exception is caused by instruction execution.

Unplanned DeliberateSynchronous fault syscall trap,

software interruptAsynchronous hardware

interruption

→ An incoming event must be executed with a high priority.

Gabriel Laskar (EPITA) CAAL 2015 185 / 378

Page 274: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Events

User space / kernel spaceA trap is a critical event that must be handled by the kernel in a safememory space, the kernel space.

Kernel mode

Usr.stack

User data

User code

stack

Kernel

code

Kernel

Kernel

spacememory

Processmemoryspace

User mode

Invalid

User code

User data

Usr.stack

Gabriel Laskar (EPITA) CAAL 2015 186 / 378

Page 275: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Events

Trap handler tableThe processor jumps to a trap routine defined by the operating system.The trap routine address is stored in a dedicated descriptor table.

Interruptiontable

addressregister

Kernel mode

Usr.stack

User data

User code

code

Kernel

data

Kernel

trap #4

Interruption table

Gabriel Laskar (EPITA) CAAL 2015 187 / 378

Page 276: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events System calls

5: Execution flow

Branch, function calls

Function calls

Handling eventsEventsSystem callsFaultsHardware interruptions

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 188 / 378

Page 277: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events System calls

System calls

The system call mechanism can be used by a process to request servicesfrom the operating system:

I executing a process, exitingI reading input, writing output (read, write...)I performing restricted actions such as accessing hardware devices or

accessing the memory management unit.I etc, ...

Gabriel Laskar (EPITA) CAAL 2015 189 / 378

Page 278: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events System calls

Processor modes

Hardware facilities have been implemented to ensure process isolation inmulti-user/multi-processor environment.Today, processors have two (or more) execution modes:user mode

dedicated to user applicationssupervisor mode

reserved for operating system kernel

Gabriel Laskar (EPITA) CAAL 2015 190 / 378

Page 279: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events System calls

Execution permissions

For security sake, some operations are restricted to kernel space:I peripheral input and outputI low-level memory management

System calls allow user to switch to kernel space and perform restrictedactions, under certain conditions.The kernel must check system call parameters validity.

Gabriel Laskar (EPITA) CAAL 2015 191 / 378

Page 280: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events System calls

System call implementation

1. System calls use a dedicated instruction which causes the processor tochange mode to superuser (or protected) mode.

2. Each system call is indexed by a single number: the syscall traphandler makes an indirect call through the system call dispatch tableto the handler for the specific system call.

Gabriel Laskar (EPITA) CAAL 2015 192 / 378

Page 281: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events System calls

Execution pathSystem calls run in kernel mode on the kernel memory space.

systemreturn

contextswitch

contextswitch

Kernel

code

User

codesystemcall

User

code

User mode Kernel mode

Gabriel Laskar (EPITA) CAAL 2015 193 / 378

Page 282: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events System calls

libc example

Standart C library’s read, write, pipe, ... functions are wrappers tocorresponding system calls:

_SYSENTRY(pipe)mov %o0, %o2mov SYS_pipe, %g1ta %xcc, ST_SYSCALLbcc,a,pt %xcc, 1fstw %o0, [%o2]ERROR()1: stw %o1, [%o2 + 4]retlclr %o0_SYSEND(pipe)

Gabriel Laskar (EPITA) CAAL 2015 194 / 378

Page 283: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Faults

5: Execution flow

Branch, function calls

Function calls

Handling eventsEventsSystem callsFaultsHardware interruptions

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 195 / 378

Page 284: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Faults

Faults

How to handle divide by zero?How to handle overflow?

How to handle ...

Gabriel Laskar (EPITA) CAAL 2015 196 / 378

Page 285: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Faults

Similarities with system calls

Faults are similar to system calls in some respects:I Faults occur as a result of a process executing an instructionI The kernel exception handler may return to the faulty user context

But faults are different in other respects:I Syscalls are deliberate, faults are unexpectedI Not every execution of the instruction results in a fault

Gabriel Laskar (EPITA) CAAL 2015 197 / 378

Page 286: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Faults

Handling a fault

Different actions may be taken by the operating system in response tofaults:

I kill the user processI notify the process that a fault occurred (so it may recover in its own

way)I solve the problem and resume the process transparently

Gabriel Laskar (EPITA) CAAL 2015 198 / 378

Page 287: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Faults

Execution path

Instructionfault

contextsave

processexit

User

code

contextrestore

User

code

Kernel

code

User mode Kernel mode

Gabriel Laskar (EPITA) CAAL 2015 199 / 378

Page 288: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Faults

Events interface

Unix systems can notify a user program of a fault with a signal.Signals are also used for other forms of asynchronous event notifications.

Gabriel Laskar (EPITA) CAAL 2015 200 / 378

Page 289: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Hardware interruptions

5: Execution flow

Branch, function calls

Function calls

Handling eventsEventsSystem callsFaultsHardware interruptions

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 201 / 378

Page 290: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Hardware interruptions

Hardware interruptions

How to handle devices interruptions?

Gabriel Laskar (EPITA) CAAL 2015 202 / 378

Page 291: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Handling events Hardware interruptions

Execution path

contextload

contextsave

User

code

Kernel

code

User

code

interruptionreturn

hardwareinterruption

User mode Kernel mode

Gabriel Laskar (EPITA) CAAL 2015 203 / 378

Page 292: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Multitasking

5: Execution flow

Branch, function calls

Function calls

Handling events

Multitasking

Gabriel Laskar (EPITA) CAAL 2015 204 / 378

Page 293: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Multitasking

Multitasking

A single process may not use system resources at full capacity.The idea of multitasking is to simulate the execution of concurrentexecution of many processes, using a single processor.The operating system has to:

I create and delete processes,I organize processes in memory,I schedule processes for CPU use.

Gabriel Laskar (EPITA) CAAL 2015 205 / 378

Page 294: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Multitasking

Memory mapping

Pysical memory space

B process addressingspace

A process addressingspace

A processmemory map

B processmemory map

Page tableaddress register

Gabriel Laskar (EPITA) CAAL 2015 206 / 378

Page 295: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Multitasking

Context switching

B

process

registers

A

process

registers

Real

CPU

registers

Memory

contexts

backup

Reg 1

Reg 2

FlagReg

PageReg

Gabriel Laskar (EPITA) CAAL 2015 207 / 378

Page 296: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Execution flow Multitasking

Process life

Kernel

Process A Process B Process C

context savecontext load

Timer

interruption

This approach requires a way of handling events.Gabriel Laskar (EPITA) CAAL 2015 208 / 378

Page 297: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats

Part VI

Object file formats

Gabriel Laskar (EPITA) CAAL 2015 209 / 378

Page 298: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process

6: Object file formats

Build processOverviewDevelopment toolsAnalysis tools

Binary formats

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 210 / 378

Page 299: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Overview

6: Object file formats

Build processOverviewDevelopment toolsAnalysis tools

Binary formats

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 211 / 378

Page 300: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Overview

The big picture

ld

C sourcecode

pure Ccode

Assemblycode

Objectfile

ascc1cpp

.o

.o

.o.s.i.c

.h

a.out

Header files

Gabriel Laskar (EPITA) CAAL 2015 212 / 378

Page 301: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Overview

Preprocessor

cpp as ld

C source

code

Pure C

code

Assembly

code file

Object

cc1

Executable

Header

files

.i .s .o.c

I Also called macro-processorI Transforms a text (source) file into another text (source) file:

I merges many files into one (#include)I replaces macros by their definitions (#define)I removes code considering conditions (#if*)

Example: cppI Input: C source file with directivesI Output: “pure” C source file

Gabriel Laskar (EPITA) CAAL 2015 213 / 378

Page 302: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Overview

C compiler

cpp as ld

C source

code

Pure C

code

Assembly

code file

Object

cc1

Executable

Header

files

.i .s .o.c

I Scans and parses the source fileI Analyzes the source (type checking)I Generates assembly code (another source file) for the target

architecture

Gabriel Laskar (EPITA) CAAL 2015 214 / 378

Page 303: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Overview

Assembler

cpp as ld

C source

code

Pure C

code

Assembly

code file

Object

cc1

Executable

Header

files

.i .s .o.c

I Translates instructions in binary opcode sequence for the targetarchitecture

I Collects symbols and resolve label addressesI Writes an object file

The assembler source file will be different depending on architecture!

Gabriel Laskar (EPITA) CAAL 2015 215 / 378

Page 304: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Overview

Linker

cpp as ld

C source

code

Pure C

code

Assembly

code file

Object

cc1

Executable

Header

files

.i .s .o.c

I Merges (many) object filesI Resolves external symbolsI Computes addressesI Writes an executable file

Gabriel Laskar (EPITA) CAAL 2015 216 / 378

Page 305: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Development tools

6: Object file formats

Build processOverviewDevelopment toolsAnalysis tools

Binary formats

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 217 / 378

Page 306: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Development tools

Development tools

I SPARC assembler: use aasmI Project at http://savannah.nongnu.org/projects/aasm

I Sparc, Mips, PPC, ARM, ... architecture emulatorsI SoCLib: https://www.soclib.frI QEmu

Gabriel Laskar (EPITA) CAAL 2015 218 / 378

Page 307: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Analysis tools

6: Object file formats

Build processOverviewDevelopment toolsAnalysis tools

Binary formats

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 219 / 378

Page 308: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Analysis tools

objdump

I Displays object (executable) file tables, on host architectureI Disassembles object (executable) file code-sectionsI Useful options:

I -h: display the section headersI -t: display the symbol tablesI -dx: disassemble

Gabriel Laskar (EPITA) CAAL 2015 220 / 378

Page 309: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Build process Analysis tools

readelf

I Displays ELF object (executable) file tables, on any architecture

Gabriel Laskar (EPITA) CAAL 2015 221 / 378

Page 310: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats

6: Object file formats

Build process

Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 222 / 378

Page 311: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Simple binary formats

6: Object file formats

Build process

Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 223 / 378

Page 312: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Simple binary formats

Simple binary formats

Consider an object program that does not need other object programs tobuild a binary program:

I the assembler collects code markers (labels)I the assembler resolves all labels and replaces all of them in the code

Gabriel Laskar (EPITA) CAAL 2015 224 / 378

Page 313: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Simple binary formats

Flat program

The labels are immediately replaced and written by the assembler in thebinary file:

00001200

...

90 nop

00001201 ...

my_function:

00000001

OpcodesAddress

00000006

00000000 90 nop

B8 78 56 34 12 mov eax, 0x12345678

E8 00 12 00 00 call my_function

Source code

Gabriel Laskar (EPITA) CAAL 2015 225 / 378

Page 314: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Simple binary formats

Flat binary format

In raw binary format, the whole program is written directly in file.Useful at boot time: the processor can only read raw code and there is nobinary loader.

Gabriel Laskar (EPITA) CAAL 2015 226 / 378

Page 315: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats How about code reuse and splitting

6: Object file formats

Build process

Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 227 / 378

Page 316: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats How about code reuse and splitting

Challenges

What happens when a function must be imported?

What happens when a function must be exported?

Gabriel Laskar (EPITA) CAAL 2015 228 / 378

Page 317: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats How about code reuse and splitting

Complex program

When the symbol of a called function is unresolved, the assembler leavesthe call destination address undefined:

00000006 E8 00 12 00 00

External

reference

call printf

00001200

...

90 nop

00001201 ...

my_function:

00000001

OpcodesAddress

00000006

00000000 90 nop

B8 78 56 34 12 mov eax, 0x12345678

E8 00 12 00 00 call my_function

Source code

⇒ The binary format must hold information on created “holes”

Gabriel Laskar (EPITA) CAAL 2015 229 / 378

Page 318: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats How about code reuse and splitting

Needs

To fill the holes left by the assembler later on, the binary file must hold:I A symbol table, which stores each label identifier,I A relocation table, which shows all the remaining “holes”I Labels associated to each “hole”.

More generaly, a binary file format must also hold:I A header containing general informations needed to access various

parts of the file,I Several sections holding code and data (raw data).

Gabriel Laskar (EPITA) CAAL 2015 230 / 378

Page 319: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats How about code reuse and splitting

Abstraction of a binary file format

string table

symbol table

section table

file header

.bss

reloc. table

.data

.rodata

.text

.data

.rodata

.text

(not in file)

object file sectionsobject file headers

(Information regarding sections, symbol table etc. are usually identifiedwithin the file header)

Gabriel Laskar (EPITA) CAAL 2015 231 / 378

Page 320: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats How about code reuse and splitting

The symbol table

The symbol table associates each symbol identifier to the code (or data)address it represents (when the address has been resolved):

symbol table

"printf"

"my_func"

"fopen"my_func:

...

0106

0105

0100

0000

0005

E8 ?? ?? ?? ??

90

E8 ?? ?? ?? ??

E8 00 01 00 00

E8 ?? ?? ?? ??

call printf

nop

call fopen

call my_func

call printf

Gabriel Laskar (EPITA) CAAL 2015 232 / 378

Page 321: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats How about code reuse and splitting

The relocation table

The relocation table associates each “hole” address to a label identifieraddress:

relocation table

printf

0006

0107

printf

0101

fopen

symbol table

"printf"

"my_func"

"fopen"my_func:

...

0106

0105

0100

0000

0005

E8 ?? ?? ?? ??

90

E8 ?? ?? ?? ??

E8 00 01 00 00

E8 ?? ?? ?? ??

call printf

nop

call fopen

call my_func

call printf

Gabriel Laskar (EPITA) CAAL 2015 233 / 378

Page 322: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats How about code reuse and splitting

BSS regions

Instead of keeping a bunch of empty bytes in the binary file for big staticvariables (arrays), the header can specify a whole region which must befilled with zero.BSS regions makes the file smaller and faster to load.

Gabriel Laskar (EPITA) CAAL 2015 234 / 378

Page 323: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Linking

6: Object file formats

Build process

Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 235 / 378

Page 324: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Linking

Linking

The operation that combines a list of object programs into a binaryprogram is called linking. The tasks that must be accomplished by thelinker are:1. Named sections from different object programs must be merged

together into one named section.2. Merged sections must be put together into the sections of the

memory model.3. Each use of a name in an object program’s references list must be

replaced by an address in the virtual address space.

Gabriel Laskar (EPITA) CAAL 2015 236 / 378

Page 325: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Linking

Merging object files

Combining each .text section together, each .data section together, etc.:

.data

.rodata

.text

.data

.rodata

.text

.data

.rodata

.text

Gabriel Laskar (EPITA) CAAL 2015 237 / 378

Page 326: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Linking

Zoom on .text section merging

Symbol = S Symbol = B + S

A object

section

B object

section

merged object

section

C

B

A

0x0 0x0 0x0

.text

.text

.text

Gabriel Laskar (EPITA) CAAL 2015 238 / 378

Page 327: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Linking

Symbol resolution

.text / .data

relocations symbols.data.textfile header

a.out format

raw binary format

Gabriel Laskar (EPITA) CAAL 2015 239 / 378

Page 328: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats Linking

Executable binary files

When there is neither unresolved symbols nor relocations anymore, the fileis executable.Executable files usually have a fixed constant load address on a givensystem.Unresolved/unused symbols may exist in an executable file if no relocationuses it.The strip command wipes out unused symbols from the symbol table.

Gabriel Laskar (EPITA) CAAL 2015 240 / 378

Page 329: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats File formats history

6: Object file formats

Build process

Binary formatsSimple binary formatsHow about code reuse and splittingLinkingFile formats history

Executable code loading

Gabriel Laskar (EPITA) CAAL 2015 241 / 378

Page 330: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats File formats history

a.out: Assembler Output

relocations

raw binary format

a.out format

.text / .data

.text .data symbolsfile header

I Very simpleI .. but pretty fast to load

Gabriel Laskar (EPITA) CAAL 2015 242 / 378

Page 331: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats File formats history

COFF (Common Object File Format) and ELF (Executableand Linking Format)

.data.text relocs

.data.textsymbolsrelocs

syms

file sectiontableheader

file sectiontableheader

ELF format

COFF format

I Permits use of dynamic libraries

Gabriel Laskar (EPITA) CAAL 2015 243 / 378

Page 332: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats File formats history

Mach-O (Darwin)

rodata datatextload

commands

file

header

Gabriel Laskar (EPITA) CAAL 2015 244 / 378

Page 333: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Binary formats File formats history

Mach-O (Darwin)“Fat binaries”

rodata datatextload

commands

file

header

rodata datatextload

commands

file

header

Fat arch table

Fat header

Mach−O fat binaries

Gabriel Laskar (EPITA) CAAL 2015 245 / 378

Page 334: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading

6: Object file formats

Build process

Binary formats

Executable code loadingBinary file loadingLoading processDynamic libraries

Gabriel Laskar (EPITA) CAAL 2015 246 / 378

Page 335: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Binary file loading

6: Object file formats

Build process

Binary formats

Executable code loadingBinary file loadingLoading processDynamic libraries

Gabriel Laskar (EPITA) CAAL 2015 247 / 378

Page 336: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Binary file loading

Binary file loading

Executable file format is like object file format, with the followingrestrictions:

I It has no relocations,I All addresses are resolvedI It is ready to load in memory

Gabriel Laskar (EPITA) CAAL 2015 248 / 378

Page 337: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Binary file loading

Executable format

string table

symbol table

section table

file header

.bss

reloc. table

.data

.rodata

.text

.data

.rodata

.text

(not in file)

object file sectionsobject file headersGabriel Laskar (EPITA) CAAL 2015 249 / 378

Page 338: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Binary file loading

Executable loading

Process memory structure is mapped to internal executable file format:I Loadable sections are loaded from file to memoryI Uninitialised data (.bss section) is allocated in process memoryI Internal file sections are ignoredI Heap and stack are allocated

Gabriel Laskar (EPITA) CAAL 2015 250 / 378

Page 339: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Loading process

6: Object file formats

Build process

Binary formats

Executable code loadingBinary file loadingLoading processDynamic libraries

Gabriel Laskar (EPITA) CAAL 2015 251 / 378

Page 340: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Loading process

Executable memory mapping

.data

.rodata

.text

memoryinitialized

noninitializedmemory

stack

heap

.bss

memory sections

process memory

string table

symbol table

section table

file header

.bss

reloc. table

.data

.rodata

.text

.data

.rodata

.text

(not in file)

object file sectionsobject file headers

Gabriel Laskar (EPITA) CAAL 2015 252 / 378

Page 341: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Loading process

Memory attributes

Memory attributes depends on section and area type:I .text section may be loaded in executable only region (depending on

OS)I memory used to store .rodata section will be marked as read-onlyI memory used to store .data, .bss sections, heap and stack will be

marked as read/write

Gabriel Laskar (EPITA) CAAL 2015 253 / 378

Page 342: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Loading process

Memory attributes

rw−

r−−

−−x

initializedmemory

noninitializedmemory

memory sections

from file

loaded

data

file datainternal

executable file

object file sections

executable

pages

read/write

virtual memory rights

process memory

.text

.rodata

.data

.text

.rodata

.data

.bss

heap

stack

not in file

sym. table

string table

pages

read only

pages

Gabriel Laskar (EPITA) CAAL 2015 254 / 378

Page 343: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Dynamic libraries

6: Object file formats

Build process

Binary formats

Executable code loadingBinary file loadingLoading processDynamic libraries

Gabriel Laskar (EPITA) CAAL 2015 255 / 378

Page 344: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Dynamic libraries

Dynamic libraries

Use of dynamic libraries implies:I different and more complex memory mappingI position independent codeI different way to handle relocationsI plenty of cool complicated stuff

Gabriel Laskar (EPITA) CAAL 2015 256 / 378

Page 345: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Dynamic libraries

Dynamic libs in memory

Libraries have specific memory mapping and organisation:I A dynamic library is loaded only once even if several processes use itI Extra .data and .text sections are mapped in process memory-spaceI Library .text section is mapped in several process memoryI Library .data section is copied in each process memory

Gabriel Laskar (EPITA) CAAL 2015 257 / 378

Page 346: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Dynamic libraries

Dynamic process memory layout

stack

heap heap

stack

copy

library A

.text

.data

.data

.rodata

.text .text

.rodata

.data

.data

.text

.text

.data

.data

.text

.text

.text

.data

.text

library B

executable A process A process B executable B

copy

Gabriel Laskar (EPITA) CAAL 2015 258 / 378

Page 347: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Dynamic libraries

Handling code location change

Dynamic libraries may not be mapped with the same virtual address in allprocesses:

I Address may already be in use by an other libraryI The same code must be able to run with different base addresses

Position independent code solves these problems without going throughthe complex relocation process.Relative jumps and relative data memory accesses need to be handled bythe CPU.

Gabriel Laskar (EPITA) CAAL 2015 259 / 378

Page 348: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Dynamic libraries

Position independent code

positionindependent

code

relativebranch0112 mov ...

0100 jmp +12

heap

stack

process

.text

.data

.rodata

.data

.rodata

.text

executable

.text

library

Gabriel Laskar (EPITA) CAAL 2015 260 / 378

Page 349: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Dynamic libraries

Position dependent code

Some location change are more complicated with dynamic libraries:I .data can’t be referred with relative addressing from .text section if

no relative data access is available.I .data section won’t have the same address in all process memory

maps.I .text section is common to all processes and can’t reflect .data

location differences.The solution is to use indirect memory access to reach data objects fromcode. One GOT (Global Offset Table) and PLT (Procedure Linkage Table)will hold data object addresses for each process.Some dynamic relocations and data copy will also help to solve thisproblem.

Gabriel Laskar (EPITA) CAAL 2015 261 / 378

Page 350: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Object file formats Executable code loading Dynamic libraries

Use of GOT/PLT

GOT/PLTin process A

GOT/PLT

.textcall printf.data

heap

stack

process

.text

.data

.rodata

.data

.rodata

.text

executable

.text

library

Gabriel Laskar (EPITA) CAAL 2015 262 / 378

Page 351: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming

Part VII

Assembly programming

Gabriel Laskar (EPITA) CAAL 2015 263 / 378

Page 352: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Introduction

7: Assembly programming

Introduction

Pure assembly files

Inline Assembly in C

Gabriel Laskar (EPITA) CAAL 2015 264 / 378

Page 353: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Introduction

What is assembly?

Assembly is notI binary dataI directly interpreted by the processorI don’t mix it up with machine code

Assembly isI a language, with a syntax, keywords, etc.I translated in a binary format

Gabriel Laskar (EPITA) CAAL 2015 265 / 378

Page 354: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Introduction

Benefits of assembly programming

I Improve developper’s understanding of computer architecture andprogramming

I Direct access to hardware resourcesI Access to system featuresI Speed and efficiency of programsI Low footprint of programs

Gabriel Laskar (EPITA) CAAL 2015 266 / 378

Page 355: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Introduction

Drawbacks of assembly programming

I Low-level programming forbids abstractionI Assembly code is hard to keep cleanI Slows development down: lots of keywords, unusual behaviorsI Each processor architecture has its own language: conventions,

registers, instructions, privileges, allowed arguments

Gabriel Laskar (EPITA) CAAL 2015 267 / 378

Page 356: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files

7: Assembly programming

Introduction

Pure assembly filesAnatomy of assembly codeExamples

Inline Assembly in C

Gabriel Laskar (EPITA) CAAL 2015 268 / 378

Page 357: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Anatomy of assembly code

7: Assembly programming

Introduction

Pure assembly filesAnatomy of assembly codeExamples

Inline Assembly in C

Gabriel Laskar (EPITA) CAAL 2015 269 / 378

Page 358: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Anatomy of assembly code

Assembly file organization

Source file is organized in different sections:code

Contains program binary opcodesdata

Contains program global variablesrodata

Contains read-only program variables

Gabriel Laskar (EPITA) CAAL 2015 270 / 378

Page 359: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Anatomy of assembly code

Assembly language statements

directivesDetermine the assembler behavior

instructionsChosen in the CPU instruction set, translated in binaryformat, written in output file

operandsExpressions containing register identifiers, immediateoperands, symbols

labelsBetween instructions, used to mark specific locations insource code

Gabriel Laskar (EPITA) CAAL 2015 271 / 378

Page 360: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Examples

7: Assembly programming

Introduction

Pure assembly filesAnatomy of assembly codeExamples

Inline Assembly in C

Gabriel Laskar (EPITA) CAAL 2015 272 / 378

Page 361: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Examples

Assembly language statementsExample

.mod_load asm-sparc

.define FOO 5

.section code .text.mod_asm opcodes v8

add %g0, FOO, %g1.ends

Gabriel Laskar (EPITA) CAAL 2015 273 / 378

Page 362: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Examples

Assembly language statementsExample

.mod_load asm-sparc

.mod_load out-elf32

.section code .text.mod_asm opcodes v8

mov 0x12, %g1mov 0x34533, %g2

add %g1, %g2, %g3.ends

Gabriel Laskar (EPITA) CAAL 2015 274 / 378

Page 363: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Examples

Assembly language statementsExample

.mod_load asm-sparc

.include sparc/v8.def

.section data .datalbl:

.reserve 4.ends

.section code .text.mod_asm opcodes v8

@set .data:lbl, %g2ld [%g2],%g1

.ends

Gabriel Laskar (EPITA) CAAL 2015 275 / 378

Page 364: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Examples

Assembly language statementsExample

.mod_load asm-sparc

.include sparc/v8.def

.section code .text.mod_asm opcodes v8

.export main

.proc mainretrestore

.endp.ends

Gabriel Laskar (EPITA) CAAL 2015 276 / 378

Page 365: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Pure assembly files Examples

Assembly language statementsExample

.mod_load asm-sparc

.include sparc/v8.def

.section code .text.mod_asm opcodes v8

.extern exit

.proc my_exitcall exitnop

.endp.ends

Gabriel Laskar (EPITA) CAAL 2015 277 / 378

Page 366: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C

7: Assembly programming

Introduction

Pure assembly files

Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz

Gabriel Laskar (EPITA) CAAL 2015 278 / 378

Page 367: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Presentation

7: Assembly programming

Introduction

Pure assembly files

Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz

Gabriel Laskar (EPITA) CAAL 2015 279 / 378

Page 368: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Presentation

Why?

I C can’t express everythingI Cpu-specific registers (flags, condition codes, hardware counters)I Features not addressed by language (atomic operations)I System features (Memory handling, interrupts handling, exception

handling)I Compiler are not always aware of possible optimizations

I Use of instruction side-effectsI Un-optimizable patterns (or optimization patterns specific to only one

algorithm)

Gabriel Laskar (EPITA) CAAL 2015 280 / 378

Page 369: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Presentation

Inline AssemblyCompiler’s PoV

For the compiler, the assembly you pass in is:I a raw stringI with a printf-like syntaxI passed directly to the assemblerI surounded by optional statements

I input valuesI output valuesI clobbered values

Gabriel Laskar (EPITA) CAAL 2015 281 / 378

Page 370: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Presentation

Inline AssemblyProgrammer’s PoV

For you, the assembly you pass is meant to eitherI compute a valueI change some hardware featureI have a side-effect

Gabriel Laskar (EPITA) CAAL 2015 282 / 378

Page 371: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Simple examples

7: Assembly programming

Introduction

Pure assembly files

Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz

Gabriel Laskar (EPITA) CAAL 2015 283 / 378

Page 372: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Simple examples

ExampleNo data

static inlinevoid disable_interrupts(void){

asm volatile("cli");}

Gabriel Laskar (EPITA) CAAL 2015 284 / 378

Page 373: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Simple examples

ExampleOutput only

static inlinecpu_cycle_t cpu_cycle_count(void){

uint32_t low, high;

asm("rdtsc" : "=a" (low), "=d" (high));

return (low | ((uint64_t)high << 32));}

Gabriel Laskar (EPITA) CAAL 2015 285 / 378

Page 374: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Simple examples

ExampleInput only

static inlinevoid cpu_io_write_8(uintptr_t addr, uint8_t data){

asm volatile("outb %0, %1":: "a" (data), "d" ((uint16_t)addr));

}

Gabriel Laskar (EPITA) CAAL 2015 286 / 378

Page 375: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Simple examples

ExampleIn-out

static inlineuint32_t cpu_endian_swap32(uint32_t x){asm ("bswap %0"

: "=r" (x): "0" (x));

return x;}

Gabriel Laskar (EPITA) CAAL 2015 287 / 378

Page 376: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Syntax

7: Assembly programming

Introduction

Pure assembly files

Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz

Gabriel Laskar (EPITA) CAAL 2015 288 / 378

Page 377: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Syntax

Syntax

I one statement with optional argumentsasm [volatile]("statements \n\t"

"on many lines \n\t"[: [output_variables][: [input_variables][: clobbered_registers]]]

);I arguments are referenced in-order (%0..%n), whether they are input or

output!I arguments types abide constraints, enclosed in ""

Gabriel Laskar (EPITA) CAAL 2015 289 / 378

Page 378: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Complex examples

7: Assembly programming

Introduction

Pure assembly files

Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz

Gabriel Laskar (EPITA) CAAL 2015 290 / 378

Page 379: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Complex examples

Add with carry and overflowC

uint32_t add_cv(uint32_t a, uint32_t b, uint8_t cin,uint8_t *cout, uint8_t *vout)

{uint64_t sum = (uint64_t)a + (uint64_t)b + (uint64_t)cin;*cout = sum >> 32;*vout = ( (b ^ ((uint32_t)sum)) & ~(a^b) ) >> 31;return sum;

}

Gabriel Laskar (EPITA) CAAL 2015 291 / 378

Page 380: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Complex examples

Add with carry and overflowASM

uint32_t add_cv(uint32_t a, uint32_t b, uint8_t cin,uint8_t *cout, uint8_t *vout)

{uint32_t result;

asm(" btl $0, %k5 \n"" adcl %k3, %k4 \n"" setc %b1 \n"" seto %b2 \n": "=r" (result), "=qm" (*cout), "=qm" (*vout): "r" (a), "0" (b), "r" (cin)

);

return result;}

Gabriel Laskar (EPITA) CAAL 2015 292 / 378

Page 381: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Complex examples

Atomic increment for ARMstatic inlinebool_t cpu_atomic_inc(volatile atomic_int_t *a){

reg_t tmp = 0, tmp2;

asm volatile("1: \n\t""ldrex %0, [%2] \n\t""add %0, %0, #1 \n\t""strex %1, %0, [%2] \n\t""tst %1, #1 \n\t""bne 1b \n\t": "=&r" (tmp), "=&r" (tmp2): "r" (a): "m" (*a));

return tmp != 0;}

Gabriel Laskar (EPITA) CAAL 2015 293 / 378

Page 382: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Named values

7: Assembly programming

Introduction

Pure assembly files

Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz

Gabriel Laskar (EPITA) CAAL 2015 294 / 378

Page 383: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Named values

Named values

I Using numbers for values makes code harder to write and readI GCC permits named values in inline assembly

Gabriel Laskar (EPITA) CAAL 2015 295 / 378

Page 384: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Named values

Named valuesWith numbers

static inlinebool_t cpu_atomic_inc(volatile atomic_int_t *a){

reg_t tmp = 0, tmp2;

asm volatile("1: \n\t""ldrex %0, [%2] \n\t""add %0, %0, #1 \n\t""strex %1, %0, [%2] \n\t""tst %1, #1 \n\t""bne 1b \n\t": "=&r" (tmp), "=&r" (tmp2): "r" (a): "m" (*a));

return tmp != 0;}

Gabriel Laskar (EPITA) CAAL 2015 296 / 378

Page 385: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Named values

Named valuesNamed

static inlinebool_t cpu_atomic_inc(volatile atomic_int_t *a){

reg_t tmp = 0, tmp2;

asm volatile("1: \n\t""ldrex %[tmp], [%[atomic]] \n\t""add %[tmp], %[tmp], #1 \n\t""strex %[tmp2], %[tmp], [%[atomic]] \n\t""tst %[tmp2], #1 \n\t""bne 1b \n\t": [tmp] "=&r" (tmp), [tmp2] "=&r" (tmp2), [clobber] "=m" (*a): [atomic] "r" (a));

return tmp != 0;}

Gabriel Laskar (EPITA) CAAL 2015 297 / 378

Page 386: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Quizz

7: Assembly programming

Introduction

Pure assembly files

Inline Assembly in CPresentationSimple examplesSyntaxComplex examplesNamed valuesQuizz

Gabriel Laskar (EPITA) CAAL 2015 298 / 378

Page 387: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Assembly programming Inline Assembly in C Quizz

Quizz!What does this do?

asm volatile("":::"memory");

Gabriel Laskar (EPITA) CAAL 2015 299 / 378

Page 388: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86

Part VIII

Focus on x86

Gabriel Laskar (EPITA) CAAL 2015 300 / 378

Page 389: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story

8: Focus on x86

x86 storyTimelinex86 architectureInstruction set

Gabriel Laskar (EPITA) CAAL 2015 301 / 378

Page 390: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Timeline

8: Focus on x86

x86 storyTimelinex86 architectureInstruction set

Gabriel Laskar (EPITA) CAAL 2015 302 / 378

Page 391: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Timeline

Early events and CPU development

1971 Intel 40041978 Intel 8086 & 8088, first x86 CPUs1981 Intel 801861982 Intel 80286: 16 bits, 24 address bits, protection

Gabriel Laskar (EPITA) CAAL 2015 303 / 378

Page 392: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Timeline

Successful CPU development

1985 Intel 80386: 32 bits, MMU1989 Intel 80486: on-chip cache, pipeline1993 Intel PentiumTM: superscalar, mmx, 64 address bits1995 Intel Pentium Pro: ooo1997 Intel Pentium II: internal L21999 Intel Pentium III: cpuid!1999 AMD Athlon: ooo, 3dnow

Gabriel Laskar (EPITA) CAAL 2015 304 / 378

Page 393: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Timeline

Technology barriers

2000 Intel Pentium 4: HT2001 Intel Itanium2003 AMD Athlon 642003 Intel Pentium M: P6 (PPro to PIII)2006 Intel Pentium 4 Prescott 2M: 64 bits2006 Intel Core/Core2: fusion2009 Intel Core i5/i7: ...

Gabriel Laskar (EPITA) CAAL 2015 305 / 378

Page 394: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

8: Focus on x86

x86 storyTimelinex86 architectureInstruction set

Gabriel Laskar (EPITA) CAAL 2015 306 / 378

Page 395: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

x86 architecture

The x86 architecture is:I based on a CISC instruction set.I based on little-endian memory access.I based on two operands instructions including memory access.I backwards compatible: all new x86 processors are fully compatible

with their predecessors.I very complex: Pentium IV has a 20 stages pipeline.I newer processors have less pipeline stages

Gabriel Laskar (EPITA) CAAL 2015 307 / 378

Page 396: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Available registers

First x86 generation had a few 16-bits registers available:I General purpose 16 bits registers:

I ax, bx, cx, dx, si, diI General purpose 8 bits registers:

I al, bl, cl, dlI ah, bh, ch, dh

I Stack and frame registers:I sp, bp

I Flag registers, ...

Gabriel Laskar (EPITA) CAAL 2015 308 / 378

Page 397: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Nested registers

registers8 bits

register16 bits

AH AL

AX

Gabriel Laskar (EPITA) CAAL 2015 309 / 378

Page 398: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Register extensions

I Starting with the 386 processor, all registers are now 32-bits wide.I Due to compatibility issue, 16 bits register are still present.I eax, ebx, ecx, edx, esi, edi, esp, ebp were added.I Even if registers size is increased, registers count remained the same:

only 6 registers are available for operations.

I For amd64, AMD extended the register count to 16, only availablewith 64 bits mode enabled

I rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp,I r8, r9, r10, r11, r12, r13, r14, r15

I AMD licensed the amd64 to Intel after the Itanium flop...

Gabriel Laskar (EPITA) CAAL 2015 310 / 378

Page 399: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Register extensions

I Starting with the 386 processor, all registers are now 32-bits wide.I Due to compatibility issue, 16 bits register are still present.I eax, ebx, ecx, edx, esi, edi, esp, ebp were added.I Even if registers size is increased, registers count remained the same:

only 6 registers are available for operations.I For amd64, AMD extended the register count to 16, only available

with 64 bits mode enabledI rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp,

I r8, r9, r10, r11, r12, r13, r14, r15

I AMD licensed the amd64 to Intel after the Itanium flop...

Gabriel Laskar (EPITA) CAAL 2015 310 / 378

Page 400: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Register extensions

I Starting with the 386 processor, all registers are now 32-bits wide.I Due to compatibility issue, 16 bits register are still present.I eax, ebx, ecx, edx, esi, edi, esp, ebp were added.I Even if registers size is increased, registers count remained the same:

only 6 registers are available for operations.I For amd64, AMD extended the register count to 16, only available

with 64 bits mode enabledI rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp,I r8, r9, r10, r11, r12, r13, r14, r15

I AMD licensed the amd64 to Intel after the Itanium flop...

Gabriel Laskar (EPITA) CAAL 2015 310 / 378

Page 401: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Register extensions

I Starting with the 386 processor, all registers are now 32-bits wide.I Due to compatibility issue, 16 bits register are still present.I eax, ebx, ecx, edx, esi, edi, esp, ebp were added.I Even if registers size is increased, registers count remained the same:

only 6 registers are available for operations.I For amd64, AMD extended the register count to 16, only available

with 64 bits mode enabledI rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp,I r8, r9, r10, r11, r12, r13, r14, r15

I AMD licensed the amd64 to Intel after the Itanium flop...

Gabriel Laskar (EPITA) CAAL 2015 310 / 378

Page 402: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

32/64 bits nested registers

RAX64 bitsregister

EAX 32 bitsregister

registers8 bits

register16 bits

AH AL

AX

Gabriel Laskar (EPITA) CAAL 2015 311 / 378

Page 403: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Instruction format

x86 architecture is a CISC based architecture:I All instructions have a variable length,I Instructions length can’t be determined before reading first bytes,I Many instructions with different formats are available.

Gabriel Laskar (EPITA) CAAL 2015 312 / 378

Page 404: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Instruction format

8086-80286

opcode opc2 immediate memorydisplacement

modr/m

Gabriel Laskar (EPITA) CAAL 2015 313 / 378

Page 405: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Instruction format

80386-Pentium

size sizedata

prefix prefix

address

SIBprefixopcode

opcode opc2 immediate memorydisplacement

modr/m

Gabriel Laskar (EPITA) CAAL 2015 313 / 378

Page 406: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Instruction format

3dnow!

3dnowimm

size sizedata

prefix prefix

address

SIBprefixopcode

opcode opc2 immediate memorydisplacement

modr/m

Gabriel Laskar (EPITA) CAAL 2015 313 / 378

Page 407: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Instruction format

x86-64

prefixx86−64

3dnowimm

size sizedata

prefix prefix

address

SIBprefixopcode

opcode opc2 immediate memorydisplacement

modr/m

Gabriel Laskar (EPITA) CAAL 2015 313 / 378

Page 408: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Addressing mode

The x86 architecture supports several complex addressing modes. On 32bit processors (386 and later) a memory address can contain:

I A base address registerI A address displacementI An index registerI An multiply factor on index register

Example: mov eax, [ebx + 0x12345 + ecx * 8]

Gabriel Laskar (EPITA) CAAL 2015 314 / 378

Page 409: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Wired stack management

Specific instructions are available for stack management:I push x instruction can be used to store data on the top of the stack

and decrement the stack pointer.I pop x instruction removes data from the stack.I call and ret instructions directly push and pop return address on

and from the stack.

The (e)sp register is used as stack pointer.

Gabriel Laskar (EPITA) CAAL 2015 315 / 378

Page 410: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Branch

I The x86 architecture uses a flag register to manage conditionalbranches.

I Many different instructions with different lengthes can be useddepending on jump size.

I No delayed slot are used

Gabriel Laskar (EPITA) CAAL 2015 316 / 378

Page 411: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story x86 architecture

Jump size

immediate16/32 bits

opc

E9 EB

imm.8 bits

opc

long jump(−32768 to +32767)

short jump(−128 to +127)

Gabriel Laskar (EPITA) CAAL 2015 317 / 378

Page 412: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Instruction set

8: Focus on x86

x86 storyTimelinex86 architectureInstruction set

Gabriel Laskar (EPITA) CAAL 2015 318 / 378

Page 413: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Instruction set

Instruction set

x86 instruction set is huge:I Over 580 instructions handled by Pentium 3 CPUI Over 850 opcodes handled by Pentium 3 CPU

A single instruction can be very complex:I Memory access capableI Handle different data widthsI Perform complex computation

Gabriel Laskar (EPITA) CAAL 2015 319 / 378

Page 414: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Instruction set

Instruction set extensions

Extensions to the x86 instruction set are common and all instructions arenot available on all CPUs. Starting with the pentium processor,SIMD-based instruction sets appeared:

I MMXI 3dNow!I MMX2, SSEI 3dNow2!, SSE2, SSE3, SSE4sI To be continued...

Gabriel Laskar (EPITA) CAAL 2015 320 / 378

Page 415: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Instruction set

Instruction setSIMD operations

source 1register

source 2register

destinationregister

normal operation

64 bits value

64 bits value

64 bits value

SIMD operation

4 x 16 bits

4 x 16 bits

4 x 16 bits

Gabriel Laskar (EPITA) CAAL 2015 321 / 378

Page 416: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Instruction set

x86-specific coding

Due to its amazing number of available instructions, x86 code is highlytunable for optimizations:

I High level languages compilers are not able to use all instructionsefficiently,

I Hand written assembly is often faster,I Complex Instructions can be used to process unexpected tasks.

Gabriel Laskar (EPITA) CAAL 2015 322 / 378

Page 417: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Instruction set

Instruction sethas-been parts

I aaa, aad, aam, aas, daa, dasI xlat

Gabriel Laskar (EPITA) CAAL 2015 323 / 378

Page 418: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Instruction set

x86-specific codingexample: LEA

The LEA instruction is designed to compute memory addresses. But it canbe used to add and multiply values.

lea eax, [ebx + ecx]

lea eax, [ebx * 8 + ebx]

Gabriel Laskar (EPITA) CAAL 2015 324 / 378

Page 419: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on x86 x86 story Instruction set

x86-specific codingexample: tiniest mem*()?

memcpy rep movbs

memset rep stosd

memcmp repe cmpsb

strncmp repnz cmpsb

Gabriel Laskar (EPITA) CAAL 2015 325 / 378

Page 420: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors

Part IX

Focus on RISC processors

Gabriel Laskar (EPITA) CAAL 2015 326 / 378

Page 421: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors History

9: Focus on RISC processors

History

Some instruction sets

Gabriel Laskar (EPITA) CAAL 2015 327 / 378

Page 422: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors History

Let’s see a timeline...

Alpha

EV6

21264

Alpha

EV7

21364

STI project CELL

1965 1970 1990 2000 20101975 1980 1985 1995 2005

CDC 6600

(Cray)

Data General Nova

(Edson de Castro)

RISC−I RISC−II SPARC SPARC−v8 SPARC−v9 Ultra−1 Ultra−2 Ultra−3

Mips−r2000 Mips−r3000 Mips−r4000

64−bit

Mips−r10k Mips−r12k

Mips V

Mips−r8k

Mips IV Mips32

Mips−r16k r24k

Cheetah America Bellatrix

RS−6000

RIOS 1/9

RSC PPC POWER 3

64 bits

POWER 4 POWER 5

ARM project ARM 2ARM 1 ARM 6 (v3) ARM 7

Thumb

ARM 9−tdmi ARM 11 M3 A8 M1

A9

M0

A9 >2GHz

Alpha

EV5

21164

Alpha

EV4

21064

RISC CPU

POWER 7DA

RPA

ARM

ltd

SUN

SGI

IBM

Aco

rn

Superscalar

Que

en

awar

d

PRISM

Ope

n so

urce

Har

d / S

oft c

ore

+licen

sing

Soft−

core

+licen

sing

Har

d−co

re

Super H SH−2 SH−4SH−3 SH−6

SH−7

TS1

PA−Risc

PCX−LPCX−S

PA−7000 PA−7100−LC

(SIMD)

PCX−U

PA−8000 PA−8600

PCX−W+

PA−8900

ShortfinHP

DEC

Hita

chi

A−I−

M

(ann

ounc

ed)

licen

sing

issu

es

Uns

ure

Uns

ure

Uns

ure

N64

PlayStation 2

PlayStation

PSP

Saturn

Dreamcast

3DO

GameCube

XBox 360

Wii

PS3

DS

GBA

DA

RPA

Mips

RISC ProjectDA

RPA

UC−Berkeley

Patterson

Stanford

Hennessy

Cambridge

Gabriel Laskar (EPITA) CAAL 2015 328 / 378

Page 423: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets

9: Focus on RISC processors

History

Some instruction setsMipsSPARCPPCARM

Gabriel Laskar (EPITA) CAAL 2015 329 / 378

Page 424: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets Mips

9: Focus on RISC processors

History

Some instruction setsMipsSPARCPPCARM

Gabriel Laskar (EPITA) CAAL 2015 330 / 378

Page 425: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets Mips

MipsInstruction format

012345678910111213141516171819202122232425262728293031

J/JAL offset

OP rs rt immSpecial rs rt rd ... op

I 32 GPRI hard-wired r0 to 0I 32 FPU registers, all sizes aliasedI delay slot

Gabriel Laskar (EPITA) CAAL 2015 331 / 378

Page 426: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets Mips

MipsAsm code

addiu r2, r3, 0x42 ; alu reg / immor r3, r4, r5 ; alu reg / reg

lui r2, 0x1234 ; load upper immediateori r2, r2, 0x5678 ; r2 = 0x12345678

lw r4, 0x1234(r9) ; load word from r9 + 0x1234

slt r1, r2, r3 ; test if r2 < r3bnez r1, 2f ; jump if truenop ; delay slot

lbu r2, (r3) ; load byte, not sign extended

jal foo ; pc = foo; r31 = return addressnopGabriel Laskar (EPITA) CAAL 2015 332 / 378

Page 427: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets SPARC

9: Focus on RISC processors

History

Some instruction setsMipsSPARCPPCARM

Gabriel Laskar (EPITA) CAAL 2015 333 / 378

Page 428: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets SPARC

SPARCInstruction format 012345678910111213141516171819202122232425262728293031

01 offset012345678910111213141516171819202122232425262728293031

00 rd op2 imm

00 a cond op2 imm012345678910111213141516171819202122232425262728293031

1x rd op3 rs1 0 asi rs2

1x rd op3 rs1 1 imm

1x rd op3 rs1 opf rs2

I 32 GPRI hard-wired %g0 to 0I register windowI unwindowed aliased FPU registers, 8 * 128 bits, 32 * 32 bitsI delay slotGabriel Laskar (EPITA) CAAL 2015 334 / 378

Page 429: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets SPARC

SPARCAsm code

add %g3, 0x42, %g2 ; alu reg / immor %g3, %g4, %g5 ; alu reg / reg

sethi 0x12345, %g2 ; load upper immediate (20 bits)or %g2, 0x678, %g2 ; g2 = 0x12345678

ld [%l2 + 0x1234], %g4 ; load word from l2 + 0x1234ld [%l2 + %i3], %g4 ; load word from l2 + i3

tst %l2, %l3 ; test if l2 < l3blt 3f ; jump if true

Gabriel Laskar (EPITA) CAAL 2015 335 / 378

Page 430: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets PPC

9: Focus on RISC processors

History

Some instruction setsMipsSPARCPPCARM

Gabriel Laskar (EPITA) CAAL 2015 336 / 378

Page 431: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets PPC

PowerPCInstruction format

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

op jump offset aa lk

op rd ra immop rd ra rb mb me rc

op rd ra rb op2 rc

I 32 GPRI 32 FPR, fixed internal representationI additional registers for lr, ctrI no global flags, but 8 “cause registers” crxI can merge causes with logical operations

Gabriel Laskar (EPITA) CAAL 2015 337 / 378

Page 432: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets PPC

PowerPCAsm code

addi 3, 2, 42 ; alu reg / immor. 3, 4, 5 ; alu reg / reg, update cr0

lis 3, 0x1234 ; load upper immediate (16 bits)ori 3, 3, 0x5678 ; r3 = 0x12345678

lwz 4, 9, 0x1234 ; load word from r9 + 0x1234lwzx 4, 9, 3 ; load word from r9 + r3

mtctr 4 ; put r4 in count register...bdnzl 2f ; Decrement CTR, Branch if CTR != 0

rlwinm ... ; Rotate and mask

Gabriel Laskar (EPITA) CAAL 2015 338 / 378

Page 433: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

9: Focus on RISC processors

History

Some instruction setsMipsSPARCPPCARM

Gabriel Laskar (EPITA) CAAL 2015 339 / 378

Page 434: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMInstruction features

I 16 GPR, one of them is pc (r15)I Every instruction is “guarded” by a condition. It may be:

I always, >, >=, “higher”, overflow, negative, carry, ==I their opposites

I aliased FPR: 16 double, 32 float

You can prefix any instruction with “never”:I 1/16th of the instruction set is “nop”...

Gabriel Laskar (EPITA) CAAL 2015 340 / 378

Page 435: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMInstruction features

I 16 GPR, one of them is pc (r15)I Every instruction is “guarded” by a condition. It may be:

I always, >, >=, “higher”, overflow, negative, carry, ==I their opposites

I aliased FPR: 16 double, 32 float

You can prefix any instruction with “never”:I 1/16th of the instruction set is “nop”...

Gabriel Laskar (EPITA) CAAL 2015 340 / 378

Page 436: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMInstruction format

012345678910111213141516171819202122232425262728293031

cond 00 i op s rn rd imm/reg

cond 01 i p u bw l rn rd imm/reg

cond 100 p u bw l rn reg-list

cond 101 l jmp offset

cond 11 coproc, sys

Gabriel Laskar (EPITA) CAAL 2015 341 / 378

Page 437: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMInstruction format (2)

012345678910111213141516171819202122232425262728293031

cond 00 i op s rn rd imm/reg

cond 01 i p u bw l rn rd imm/reg

1 rotate imm

0 rs 0 ty 1 rm

0 amount ty 0 rm

and r3, r7, #42 ; r3 = r7 & 0x2aadd r2, r8, r5, lsl r1 ; r2 = r8 + (r5 << r1)sub r1, r9, r1, lsl #1 ; r1 = r9 - (r1 << 1)

012345678910111213141516171819202122232425262728293031

1110 00 1 and 0 r7 r3 « 0 0x2a

1110 00 0 add 0 r8 r2 r1 0 lsl 1 r5

1110 00 0 sub 0 r9 r1 0x1 lsl 0 r1

Gabriel Laskar (EPITA) CAAL 2015 342 / 378

Page 438: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMInstruction format (2)

012345678910111213141516171819202122232425262728293031

cond 00 i op s rn rd imm/reg

cond 01 i p u bw l rn rd imm/reg

1 rotate imm

0 rs 0 ty 1 rm

0 amount ty 0 rm

and r3, r7, #42 ; r3 = r7 & 0x2aadd r2, r8, r5, lsl r1 ; r2 = r8 + (r5 << r1)sub r1, r9, r1, lsl #1 ; r1 = r9 - (r1 << 1)

012345678910111213141516171819202122232425262728293031

1110 00 1 and 0 r7 r3 « 0 0x2a

1110 00 0 add 0 r8 r2 r1 0 lsl 1 r5

1110 00 0 sub 0 r9 r1 0x1 lsl 0 r1Gabriel Laskar (EPITA) CAAL 2015 342 / 378

Page 439: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMAsm code

add r3, r2, #0x42 ; alu reg / immorr r3, r4, r5 ; alu reg / regbic r3, r4, r5, lsr #4 ; alu reg / reg & shift

cmp r2, #0 ; test if r2 < 0rsblt r2, r2, #0 ; then r2 = 0 - r2

ldr r2, #0x12345678 ; rewritten asldr r2, [pc, #offset] ; pc-relative load

streq r4, [r2, r3, lsl #4] ; store word to r2 + r3 << 4; only if last cmp is ‘eq’

str r4, [r2, #8]! ; store word to r2 + 8; and r2 = r2 + 8

offset: ; 32-bit immediates zone0x12345678Gabriel Laskar (EPITA) CAAL 2015 343 / 378

Page 440: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMBlock transfers

012345678910111213141516171819202122232425262728293031

cond 100 p u bw l rn reg-list

stmdb sp!, {r2, r3, r4, r8}push {r2, r3, r4, r8}

012345678910111213141516171819202122232425262728293031

cond 100 1 0 0 1 0 r13 0000000100011100

Gabriel Laskar (EPITA) CAAL 2015 344 / 378

Page 441: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMBlock transfers hacks (1)The link register is r14, then the following code saves r14 and restores it toPC afterwards...

push {r2, r3, r4, r8, lr}...pop {r2, r3, r4, r8, pc}

..00

..04 sp → r2

..08 r3

..0c r4

..10 r8

..14 lr

..18 old sp → xxx

..1c

..20Gabriel Laskar (EPITA) CAAL 2015 345 / 378

Page 442: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARMBlock transfers hacks (2)

Do a memcpy with some free registers, for instance, copy 16 bytes from*r0 to *r1, not using r2 nor r5, update r0 and r1 to point on next block...

ldmia r0!, {r3, r4, r6, r7}stmia r1!, {r3, r4, r6, r7}

Gabriel Laskar (EPITA) CAAL 2015 346 / 378

Page 443: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARM Instruction-space exhaustion

Eventually, ARM instruction-set continued to grow despite the obviousexhaustion of the “clean” instruction-set space.

I abuse of concatenation of seldom bits around the instruction wordI sometimes forbid r15 as an operand, and reuse other fieldsI reuse the “never” condition code

Gabriel Laskar (EPITA) CAAL 2015 347 / 378

Page 444: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARM Instruction-space exhaustionIllustration

012345678910111213141516171819202122232425262728293031

cond 00 I op s rn rd op2

1 rotate imm

0 rs 0 ty 1 rm

0 amount ty 0 rm

How to add the multiplication?

cond 00 0 000 a s rn rd rs 1001 rm

Gabriel Laskar (EPITA) CAAL 2015 348 / 378

Page 445: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

ARM Instruction-space exhaustionIllustration

012345678910111213141516171819202122232425262728293031

cond 00 I op s rn rd op2

1 rotate imm

0 rs 0 ty 1 rm

0 amount ty 0 rm

How to add the multiplication?

cond 00 0 000 a s rn rd rs 1001 rm

Gabriel Laskar (EPITA) CAAL 2015 348 / 378

Page 446: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

Thumb mode

One goal: Code compression.Concessions to pack a maximum of operations in a minimal space:

I Access to only 8 registers among 16I Implicit stack opsI Implicit pc-relative operationsI No predicates any more

Dirty hacks for making it work:I Use lower bit of r15 as a mode indicatorI Use a new bx/blx instruction

Gabriel Laskar (EPITA) CAAL 2015 349 / 378

Page 447: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

Thumb2

Keep up with thumb, this is great.Ideas:

I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumb

I 16 bit instruction words are not enough any moreNever mind

I Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops

Then Thumb-2 is a RISC with a CISC encoding!

Gabriel Laskar (EPITA) CAAL 2015 350 / 378

Page 448: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

Thumb2

Keep up with thumb, this is great.Ideas:

I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more

Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops

Then Thumb-2 is a RISC with a CISC encoding!

Gabriel Laskar (EPITA) CAAL 2015 350 / 378

Page 449: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

Thumb2

Keep up with thumb, this is great.Ideas:

I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more

Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...

I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops

Then Thumb-2 is a RISC with a CISC encoding!

Gabriel Laskar (EPITA) CAAL 2015 350 / 378

Page 450: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

Thumb2

Keep up with thumb, this is great.Ideas:

I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more

Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructions

I Only 16-bit alignment constraint for 32-bit long thumb-2 opsThen Thumb-2 is a RISC with a CISC encoding!

Gabriel Laskar (EPITA) CAAL 2015 350 / 378

Page 451: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

Thumb2

Keep up with thumb, this is great.Ideas:

I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more

Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops

Then Thumb-2 is a RISC with a CISC encoding!

Gabriel Laskar (EPITA) CAAL 2015 350 / 378

Page 452: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Focus on RISC processors Some instruction sets ARM

Thumb2

Keep up with thumb, this is great.Ideas:

I Have an asm-code compatibility with ARMI Remap the whole ARM 32-bit instruction set in 16-bit thumbI 16 bit instruction words are not enough any more

Never mindI Keep the basic 16-bit thumb encodingI When we need more, just make the instruction 32-bits long...I Decode mixed 16/32-bits instructionsI Only 16-bit alignment constraint for 32-bit long thumb-2 ops

Then Thumb-2 is a RISC with a CISC encoding!

Gabriel Laskar (EPITA) CAAL 2015 350 / 378

Page 453: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations

Part X

CPU-aware optimizations

Gabriel Laskar (EPITA) CAAL 2015 351 / 378

Page 454: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Rationale

10: CPU-aware optimizations

Rationale

Examples

Code readability guidelines

Gabriel Laskar (EPITA) CAAL 2015 352 / 378

Page 455: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Rationale

Purpose

Knowing the low-level implementation of the processors, we can:I write code easier for the compiler to translateI write code easier for the CPU to run

I because nicer with the pipelineI because nicer with the memory subsystem

Gabriel Laskar (EPITA) CAAL 2015 353 / 378

Page 456: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples

10: CPU-aware optimizations

Rationale

Examplesabs()Sign extensionPower of two detectionMask merging

Code readability guidelines

Gabriel Laskar (EPITA) CAAL 2015 354 / 378

Page 457: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

10: CPU-aware optimizations

Rationale

Examplesabs()Sign extensionPower of two detectionMask merging

Code readability guidelines

Gabriel Laskar (EPITA) CAAL 2015 355 / 378

Page 458: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

The absolute value is a good example of optimized code which is easy toimplement in an efficient and smart way.

Gabriel Laskar (EPITA) CAAL 2015 356 / 378

Page 459: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineabs() basic code

int abs(int x){

if (x < 0)return -x;

elsereturn x;

}

int abs(int x){

return (x < 0) ? -x : x;}

Gabriel Laskar (EPITA) CAAL 2015 357 / 378

Page 460: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:

I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), returnI if (x > 0), return

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 461: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)

I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), returnI if (x > 0), return

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 462: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffff

I Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), returnI if (x > 0), return

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 463: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1

I not a = a xor 0xffffffff = a xor -1

I if (x < 0), returnI if (x > 0), return

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 464: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), returnI if (x > 0), return

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 465: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), return -xI if (x > 0), return x

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 466: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), return ((not x) + 1)I if (x > 0), return x

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 467: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), return ((x ^ 0xffffffff) - 0xffffffff)I if (x > 0), return ((x ^ 0) + 0)

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 468: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), return ((x ^ 0xffffffff) - 0xffffffff)I if (x > 0), return ((x ^ 0) + 0)

How to produce -1 when x < 0?

I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 469: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineexample: abs()

Let’s go back to some mathematical principles:I Addition: a + 1 = a - (-1)I Number representation: -1 = 0xffffffffI Negation: -a = (not a) + 1I not a = a xor 0xffffffff = a xor -1

I if (x < 0), return ((x ^ 0xffffffff) - 0xffffffff)I if (x > 0), return ((x ^ 0) + 0)

How to produce -1 when x < 0?I Sign extension: (x » 31)

Gabriel Laskar (EPITA) CAAL 2015 358 / 378

Page 470: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineabs() better code

int abs(int x){

int sign_word = x >> 31;return (x ^ sign_word) - sign_word;

}

int abs(int x){

int sign_word = -(1 & (x >> 31));return (x ^ sign_word) - sign_word;

}

Gabriel Laskar (EPITA) CAAL 2015 359 / 378

Page 471: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples abs()

Nicer with the pipelineabs() better code

int abs(int x){

int sign_word = x >> 31;return (x ^ sign_word) - sign_word;

}

int abs(int x){

int sign_word = -(1 & (x >> 31));return (x ^ sign_word) - sign_word;

}

Gabriel Laskar (EPITA) CAAL 2015 359 / 378

Page 472: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

10: CPU-aware optimizations

Rationale

Examplesabs()Sign extensionPower of two detectionMask merging

Code readability guidelines

Gabriel Laskar (EPITA) CAAL 2015 360 / 378

Page 473: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary size

Sometimes, we need to sign extend a value from an arbitrary word width(say 13 bits) to a CPU word (say 32 bits).How to do it?

x 0000000000000000000snnnnnnnnnnnn

-high 11111111111111111111000000000000

result ssssssssssssssssssssnnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits - 1);return (val & high) ? (val | (-high)) : val;

}

Gabriel Laskar (EPITA) CAAL 2015 361 / 378

Page 474: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary size

Sometimes, we need to sign extend a value from an arbitrary word width(say 13 bits) to a CPU word (say 32 bits).How to do it?

x 0000000000000000000snnnnnnnnnnnn

-high 11111111111111111111000000000000

result ssssssssssssssssssssnnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits - 1);return (val & high) ? (val | (-high)) : val;

}

Gabriel Laskar (EPITA) CAAL 2015 361 / 378

Page 475: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary size

Sometimes, we need to sign extend a value from an arbitrary word width(say 13 bits) to a CPU word (say 32 bits).How to do it?

x 0000000000000000000snnnnnnnnnnnn

-high 11111111111111111111000000000000

result ssssssssssssssssssssnnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits - 1);return (val & high) ? (val | (-high)) : val;

}

Gabriel Laskar (EPITA) CAAL 2015 361 / 378

Page 476: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary size

Sometimes, we need to sign extend a value from an arbitrary word width(say 13 bits) to a CPU word (say 32 bits).How to do it?

x 0000000000000000000snnnnnnnnnnnn

-high 11111111111111111111000000000000

result ssssssssssssssssssssnnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits - 1);return (val & high) ? (val | (-high)) : val;

}

Gabriel Laskar (EPITA) CAAL 2015 361 / 378

Page 477: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary size

x 0000000000000000000snnnnnnnnnnnn

« snnnnnnnnnnnn0000000000000000000

» ssssssssssssssssssssnnnnnnnnnnnn

int sign_ext(int val, int bits){

int shift_bits = 32 - bits;return (val << shift_bits) >> shift_bits;

}

Gabriel Laskar (EPITA) CAAL 2015 362 / 378

Page 478: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary size

x 0000000000000000000snnnnnnnnnnnn

« snnnnnnnnnnnn0000000000000000000

» ssssssssssssssssssssnnnnnnnnnnnn

int sign_ext(int val, int bits){

int shift_bits = 32 - bits;return (val << shift_bits) >> shift_bits;

}

Gabriel Laskar (EPITA) CAAL 2015 362 / 378

Page 479: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary size

x 0000000000000000000snnnnnnnnnnnn

« snnnnnnnnnnnn0000000000000000000

» ssssssssssssssssssssnnnnnnnnnnnn

int sign_ext(int val, int bits){

int shift_bits = 32 - bits;return (val << shift_bits) >> shift_bits;

}

Gabriel Laskar (EPITA) CAAL 2015 362 / 378

Page 480: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary size

x 0000000000000000000snnnnnnnnnnnn

« snnnnnnnnnnnn0000000000000000000

» ssssssssssssssssssssnnnnnnnnnnnn

int sign_ext(int val, int bits){

int shift_bits = 32 - bits;return (val << shift_bits) >> shift_bits;

}

Gabriel Laskar (EPITA) CAAL 2015 362 / 378

Page 481: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnn

sb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

}

Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 482: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000

xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

}

Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 483: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

}

Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 484: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

}

Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 485: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnn

sb 00000000000000000001000000000000x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

}

Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 486: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

}

Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 487: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnn

sb 00000000000000000001000000000000x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

}

Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 488: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

}

Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 489: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Sign extension

Sign extension at an arbitrary sizex 0000000000000000000snnnnnnnnnnnnsb 0000000000000000000s000000000000xor 0000000000000000000Snnnnnnnnnnnn

x 00000000000000000001nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000000nnnnnnnnnnnn.. - sb 11111111111111111111nnnnnnnnnnnn

x 00000000000000000000nnnnnnnnnnnnsb 00000000000000000001000000000000

x ^ sb 00000000000000000001nnnnnnnnnnnn.. - sb 00000000000000000000nnnnnnnnnnnn

int sign_ext(int val, int bits){

int high = 1 << (bits-1);return (val ^ high) - high;

} Gabriel Laskar (EPITA) CAAL 2015 363 / 378

Page 490: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Power of two detection

10: CPU-aware optimizations

Rationale

Examplesabs()Sign extensionPower of two detectionMask merging

Code readability guidelines

Gabriel Laskar (EPITA) CAAL 2015 364 / 378

Page 491: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Power of two detection

Power of twoProperties

I Powers of two have only one “1” bitI Substracting 1 from a power of two flips the only “1”

Examples:I 1110010 - 1 = 1110001I 0100000 - 1 = 0011111

I Lower bits up to the lowest 1 get flippedI Other bits stay the same

int is_pow2(int n){

return !(n & (n - 1));}

Gabriel Laskar (EPITA) CAAL 2015 365 / 378

Page 492: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Power of two detection

Power of twoProperties

I Powers of two have only one “1” bitI Substracting 1 from a power of two flips the only “1”

Examples:I 1110010 - 1 = 1110001I 0100000 - 1 = 0011111

I Lower bits up to the lowest 1 get flippedI Other bits stay the same

int is_pow2(int n){

return !(n & (n - 1));}

Gabriel Laskar (EPITA) CAAL 2015 365 / 378

Page 493: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Power of two detection

Power of twoProperties

I Powers of two have only one “1” bitI Substracting 1 from a power of two flips the only “1”

Examples:I 1110010 - 1 = 1110001I 0100000 - 1 = 0011111

I Lower bits up to the lowest 1 get flippedI Other bits stay the same

int is_pow2(int n){

return !(n & (n - 1));}

Gabriel Laskar (EPITA) CAAL 2015 365 / 378

Page 494: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Mask merging

10: CPU-aware optimizations

Rationale

Examplesabs()Sign extensionPower of two detectionMask merging

Code readability guidelines

Gabriel Laskar (EPITA) CAAL 2015 366 / 378

Page 495: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Mask merging

Mask merging

01234567

where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0

mask 0 0 1 1 1 1 0 0

result 1 1 0 0 1 1 0 0

int mask_merge(int mask, int where_0, int where_1){

return (mask & where_1) | (~mask & where_0);}

Gabriel Laskar (EPITA) CAAL 2015 367 / 378

Page 496: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Mask merging

Mask merging

01234567

where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0

mask 0 0 1 1 1 1 0 0

result 1 1 0 0 1 1 0 0

int mask_merge(int mask, int where_0, int where_1){

return (mask & where_1) | (~mask & where_0);}

Gabriel Laskar (EPITA) CAAL 2015 367 / 378

Page 497: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Mask merging

Mask mergingLess instructions

01234567

where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0

where_0 ^ where_1 0 1 1 0 0 1 1 0

mask 0 0 1 1 1 1 0 0

(w0 ^ w1) &mask 0 0 1 0 0 1 0 0

((w0 ^ w1) &mask) ^ w0 1 1 0 0 1 1 0 0

int mask_merge(int mask, int where_0, int where_1){

return ((where_0 ^ where_1) & mask) ^ where_0;}

Gabriel Laskar (EPITA) CAAL 2015 368 / 378

Page 498: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Mask merging

Mask mergingLess instructions

01234567

where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0

where_0 ^ where_1 0 1 1 0 0 1 1 0

mask 0 0 1 1 1 1 0 0

(w0 ^ w1) &mask 0 0 1 0 0 1 0 0

((w0 ^ w1) &mask) ^ w0 1 1 0 0 1 1 0 0

int mask_merge(int mask, int where_0, int where_1){

return ((where_0 ^ where_1) & mask) ^ where_0;}

Gabriel Laskar (EPITA) CAAL 2015 368 / 378

Page 499: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Mask merging

Mask mergingLess instructions

01234567

where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0

where_0 ^ where_1 0 1 1 0 0 1 1 0

mask 0 0 1 1 1 1 0 0

(w0 ^ w1) &mask 0 0 1 0 0 1 0 0

((w0 ^ w1) &mask) ^ w0 1 1 0 0 1 1 0 0

int mask_merge(int mask, int where_0, int where_1){

return ((where_0 ^ where_1) & mask) ^ where_0;}

Gabriel Laskar (EPITA) CAAL 2015 368 / 378

Page 500: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Examples Mask merging

Mask mergingLess instructions

01234567

where_0 1 1 1 0 1 0 0 0where_1 1 0 0 0 1 1 1 0

where_0 ^ where_1 0 1 1 0 0 1 1 0

mask 0 0 1 1 1 1 0 0

(w0 ^ w1) &mask 0 0 1 0 0 1 0 0

((w0 ^ w1) &mask) ^ w0 1 1 0 0 1 1 0 0

int mask_merge(int mask, int where_0, int where_1){

return ((where_0 ^ where_1) & mask) ^ where_0;}

Gabriel Laskar (EPITA) CAAL 2015 368 / 378

Page 501: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Code readability guidelines

10: CPU-aware optimizations

Rationale

Examples

Code readability guidelines

Gabriel Laskar (EPITA) CAAL 2015 369 / 378

Page 502: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

CPU-aware optimizations Code readability guidelines

Code readability

If you ever code with this kind of hackI always create functions with explicit name and prototypeI eventually document the intended behaviorI may use a static inline

int abs(int x);int sign_ext(int val, int bits);int is_pow2(int n);int mask_merge(int mask, int where_0, int where_1);

Gabriel Laskar (EPITA) CAAL 2015 370 / 378

Page 503: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems

Part XI

Multi-/Many-core, heterogeneous systems

Gabriel Laskar (EPITA) CAAL 2015 371 / 378

Page 504: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

11: Multi-/Many-core, heterogeneous systems

Topology evolution

Challenges

Gabriel Laskar (EPITA) CAAL 2015 372 / 378

Page 505: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

History of hardware topologies

Video

Floppy

Serial

Keyboard

Sound

Network Hard Disk

RAM

CPU

Gabriel Laskar (EPITA) CAAL 2015 373 / 378

Page 506: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

History of hardware topologies

Floppy

Serial

BridgeBridge

VideoNetwork

Hard DiskSound

RAM

Keyboard

CPU

Gabriel Laskar (EPITA) CAAL 2015 373 / 378

Page 507: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

History of hardware topologies

Floppy

Serial

BridgeBridge

VideoNetwork

Hard DiskSound

CPU CPU

RAM

Keyboard

Gabriel Laskar (EPITA) CAAL 2015 373 / 378

Page 508: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

History of hardware topologies

CPU CPU

Floppy

Serial

BridgeRAM

BridgeBridge

Network

Hard DiskSound

Video

USB

Keyboard

Gabriel Laskar (EPITA) CAAL 2015 373 / 378

Page 509: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

History of hardware topologies

CPU CPU

BridgeRAM Video

Bridge

ATA

Net

USB

Sound

Floppy

Kbd

Mouse

PCI

USB

Serial

Gabriel Laskar (EPITA) CAAL 2015 373 / 378

Page 510: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

History of hardware topologies

CPU CPU

BridgeRAM Video

Bridge

ATA

Net

USB

Sound

Floppy

Kbd

Mouse

PCI

USB

SATA

Wlan

SerialSD

Gabriel Laskar (EPITA) CAAL 2015 373 / 378

Page 511: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

History of hardware topologies

BridgeRAM

CPU

Many core

CPU

Many core

Bridge

ATA

Net

USB

Sound

Floppy

Kbd

Mouse

PCI

USB

SATA

Wlan

SerialSD

Video

PCI−e

Gabriel Laskar (EPITA) CAAL 2015 373 / 378

Page 512: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

History of hardware topologies

Bridge

CPU

Many core

CPU

Many coreRAM RAM

Video

Bridge

Net

USB

Sound

PCI

USB

SATA

Wlan

SD

PCI−e

PCI−e

ATA Floppy

Kbd

Mouse

Serial

Gabriel Laskar (EPITA) CAAL 2015 373 / 378

Page 513: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

Future of hardware topologies?

USB

Bridge

Net

USB

Sound

SATA

Wlan

SD

ATA Floppy

Kbd

Mouse

Serial

PCI

Video

CPU

Many core

CPU

Many core

CPU

Many core

CPU

Many core

CPU

Many core

CPU

Many core

RAMRAM

RAMRAM

RAMRAM

GPGPUGPGPU

GPGPUGPGPU

GPGPUGPGPU

Gabriel Laskar (EPITA) CAAL 2015 374 / 378

Page 514: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

Hardware topologiesObservations

Busses don’t scaleI Bandwidth is shared among connected peersI They need rest cycles between elections

We don’t have busses any moreI We use networks (QPI, HyperTransport)I This forbids snoopingI Coherence has to be done explicitely

Gabriel Laskar (EPITA) CAAL 2015 375 / 378

Page 515: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Topology evolution

NUMAObservations

I There are more and more coresI Memory gets closer to the coresI Memory connections get distributed among coresI Systems become NUMA

Gabriel Laskar (EPITA) CAAL 2015 376 / 378

Page 516: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Challenges

11: Multi-/Many-core, heterogeneous systems

Topology evolution

Challenges

Gabriel Laskar (EPITA) CAAL 2015 377 / 378

Page 517: Computer Architecture and Assembly Language · 2020-04-11 · IntroductionProblemdefinition Whatarewetryingtolearn? ComputerArchitecture Whatisinthehardware? I Abitofhistoryofcomputers,currentmachines

Multi-/Many-core, heterogeneoussystems Challenges

Some scalability bottlenecks

Coherent shared-memory systemsI are hard to designI dont scale wellI get slower with the load

NUMA systemsI are hard to program forI are not well supported in all OSes

Uncoherent shared-memory systems are not ready for prime-time yet

Gabriel Laskar (EPITA) CAAL 2015 378 / 378