National School of Computer Sciences

Translation cache policies for dynamic binary translation

Saber FERJANI

TIMA Laboratory - SLS Group

18 April 2013

Saber F. (TIMA SLS) ENSI 18 April 2013 1 / 25

Who am I?

Academic and professional background
- 2010-2013: Student at the National School of Computer Sciences, Tunisia.
- 2011/2012: Robotics team leader; took part in several competitions.
- June-July 2011: Intern at Alpha Technology; designed several PCB layouts including QFP, SO, SMT and through-hole components.
- July-August 2012: Intern at STMicroelectronics; developed software for a hygrometer and an altimeter on the STM32F3 microcontroller.

http://about.me/ferjani

Context

Why?
- Hardware design is taking more and more time.
- Software development should start earlier, before the hardware is available.
- An Instruction Set Simulator (ISS) simulates a processor, called the target, on a machine with a different architecture, called the host.

How?
- Cross compilation.
- Interpretive translation.
- Dynamic binary translation.

Terminology

Simulator: reproduces only the behavior of the system.

Emulator: reproduces the inner workings of the system.

TB: Translated Block.

IR: Intermediate representation (also called op-code).

Outline

1 Introduction

2 Cache Algorithms

3 Qemu internals

4 Preliminary Results

I- Introduction

Qemu overview
- Generic, open-source machine emulator and virtualizer.
- Created by Fabrice Bellard in 2003.
- Uses portable dynamic translation.
- Supported targets: x86, arm, mips, sh4, cris, sparc, powerpc, nds32...

Qemu features
- Just-in-time (JIT) compilation support.
- Self-modifying code support.
- Direct block chaining.

Subject

Problem statement
- Simulation speed depends mainly on how often TBs are reused.
- The current policy simply flushes the entire cache when it is full.
- The translation cache policy needs to be improved to maximize TB reuse.
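To make the cost of the flush-when-full policy concrete, here is a minimal sketch under stated assumptions: the cache is limited by a number of TBs rather than by bytes, and the class `FlushOnFullCache`, the function `run` and the trace are hypothetical names invented for illustration, not Qemu code.

```python
class FlushOnFullCache:
    """Toy translation cache: when full, drop everything (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of TBs (byte sizes are ignored)
        self.blocks = set()        # target PCs that currently have a TB
        self.translations = 0      # cache misses: blocks translated or re-translated
        self.executions = 0        # total block executions
        self.flushes = 0

    def execute(self, pc):
        if pc not in self.blocks:
            if len(self.blocks) >= self.capacity:
                self.blocks.clear()    # full flush: hot blocks are lost as well
                self.flushes += 1
            self.blocks.add(pc)
            self.translations += 1
        self.executions += 1


def run(trace, capacity):
    cache = FlushOnFullCache(capacity)
    for pc in trace:
        cache.execute(pc)
    return cache


# A hot loop (blocks 0 and 1) interleaved with a stream of cold blocks.
trace = []
for cold in range(100, 110):
    trace += [0, 1, cold]

result = run(trace, capacity=4)
print(result.translations, result.flushes, result.executions)
```

The trace contains only 12 distinct blocks, so any translation count above 12 is work that a policy able to keep blocks 0 and 1 resident would avoid.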

II- Cache Algorithms

Optimal cache algorithm
- Evict the entry that will not be used for the longest time.
- Infeasible in practice, since the future is not known.

First In First Out (FIFO)
- The simplest cache replacement policy.
- Each entry stays in the cache for a constant number of insertions, whether or not it is reused.

Least Recently Used (LRU)
- An enhancement of FIFO.
- Each time an entry is referenced, it is moved to the end of the queue.

Least Frequently Used (LFU)
- Exploits overall popularity rather than temporal locality.
- The least referenced entry is always chosen for eviction.
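The three practical policies above can be compared with a small simulation. This is a sketch, not a reference implementation: the function names are made up for illustration, the cache holds a fixed number of equal-size entries, and the LFU variant forgets an entry's reference count once it is evicted.

```python
from collections import OrderedDict, deque


def fifo_hits(trace, capacity):
    """FIFO: evict the oldest inserted entry; hits do not change the order."""
    queue, resident, hits = deque(), set(), 0
    for key in trace:
        if key in resident:
            hits += 1
            continue
        if len(queue) >= capacity:
            resident.discard(queue.popleft())
        queue.append(key)
        resident.add(key)
    return hits


def lru_hits(trace, capacity):
    """LRU: like FIFO, but a hit moves the entry to the end of the queue."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)   # front of the queue = least recently used
        cache[key] = True
    return hits


def lfu_hits(trace, capacity):
    """LFU: evict the entry with the fewest references (ties: oldest insertion)."""
    counts, hits = {}, 0
    for key in trace:
        if key in counts:
            hits += 1
            counts[key] += 1
            continue
        if len(counts) >= capacity:
            victim = min(counts, key=counts.get)   # first minimum in insertion order
            del counts[victim]
        counts[key] = 1
    return hits


# Two hot blocks (0 and 1) interleaved with blocks that are used only once.
trace = []
for cold in range(100, 110):
    trace += [0, 1, cold]

for policy in (fifo_hits, lru_hits, lfu_hits):
    print(policy.__name__, policy(trace, capacity=3) / len(trace))
```

On a trace like this one, FIFO keeps evicting the hot blocks because reuse does not protect them, while LRU and LFU both keep them resident.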

III- Qemu internals

Dynamic Binary Translation in Qemu

Block chaining

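Main execution loop (flowchart):
1. Look up the TB by target PC.
2. Cached?
   - no: translate one basic block.
   - yes: reuse the cached TB.
3. Chain it to the existing basic block.
4. Execute the TB.
5. Exception handling, then back to step 1.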
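The loop in the flowchart can be sketched as a toy model. Several simplifications are assumed here: each guest block has a single successor, exception handling is left out, and `TB`, `run` and the `guest` mapping are hypothetical names, not Qemu structures.

```python
class TB:
    """Toy translated block (hypothetical structure, not Qemu's own)."""

    def __init__(self, pc, next_pc):
        self.pc = pc
        self.next_pc = next_pc   # successor in guest code, or None at program end
        self.chained = None      # direct link to the successor TB once it is known


def run(guest, entry_pc, max_blocks=1000):
    """guest maps each block's PC to the PC of its single successor."""
    cache = {}                   # translation cache indexed by target PC
    stats = {"translated": 0, "lookups": 0, "executed": 0}
    prev, pc = None, entry_pc
    while pc is not None and stats["executed"] < max_blocks:
        if prev is not None and prev.chained is not None:
            tb = prev.chained            # chained: no lookup needed
        else:
            stats["lookups"] += 1
            tb = cache.get(pc)           # look up the TB by target PC
            if tb is None:               # not cached: translate one basic block
                tb = TB(pc, guest[pc])
                cache[pc] = tb
                stats["translated"] += 1
            if prev is not None:
                prev.chained = tb        # chain the previous TB to this one
        stats["executed"] += 1           # stands in for executing the host code
        prev, pc = tb, tb.next_pc
    return stats


# A guest loop of three blocks: 0 -> 1 -> 2 -> 0 -> ...
print(run({0: 1, 1: 2, 2: 0}, entry_pc=0, max_blocks=30))
```

Once every block in the loop is chained, execution stays inside translated code and the lookup count stops growing, which is the benefit of direct block chaining.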

Focus on (Translate one basic block)

Flowchart:
1. Try to allocate space for the TB.
2. Success?
   - no: flush the entire translation cache, then allocate space for the TB (cannot fail!).
   - yes: continue.
3. Generate op and host code.
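The allocation path of this flowchart can be sketched as follows. The model is an assumption: a bump allocator that reserves a worst-case TB size before translating, because the real size is known only after code generation. `CodeBuffer`, `translate_block` and `generate` are hypothetical names.

```python
class CodeBuffer:
    """Toy bump allocator standing in for the translation cache (hypothetical)."""

    def __init__(self, size, max_tb_size):
        assert size >= max_tb_size
        self.size = size
        self.max_tb_size = max_tb_size   # worst-case size reserved before translating
        self.used = 0
        self.tbs = {}                    # target PC -> (offset, length)
        self.flush_count = 0

    def try_alloc(self):
        """Return the offset for a new TB, or None if a worst-case TB may not fit."""
        if self.used + self.max_tb_size > self.size:
            return None
        return self.used

    def flush(self):
        self.used = 0
        self.tbs.clear()
        self.flush_count += 1


def translate_block(buf, pc, generate):
    offset = buf.try_alloc()             # try to allocate space for the TB
    if offset is None:
        buf.flush()                      # no room: flush the entire translation cache
        offset = buf.try_alloc()         # cannot fail on an empty buffer
    length = generate(pc)                # generate ops and host code; size known only now
    assert length <= buf.max_tb_size
    buf.tbs[pc] = (offset, length)
    buf.used = offset + length
    return offset


buf = CodeBuffer(size=100, max_tb_size=40)
for pc in range(4):
    translate_block(buf, pc, generate=lambda pc: 30)
print(buf.flush_count, sorted(buf.tbs))
```

Reserving the worst case up front is what lets the second allocation be treated as infallible, at the price of flushing slightly before the buffer is truly full.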

Implementation constraints

Variable TB size
In basic cache algorithms, evicting one entry is always enough to bring in another. In our case, TB size is not only variable but also unknown at allocation time.

Self-modifying code
When the executed code modifies itself, the TB is re-translated into a different location. This results in many memory allocations, while only the last one is needed.

Low overhead
We need to predict whether the overhead of cache replacement stays below the cost of a cache flush; otherwise, we should simply flush the entire cache.
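The first two constraints can be illustrated with a size-aware LRU sketch. It rests on assumptions that do not hold in a real code buffer: freed space is treated as immediately reusable (no fragmentation), and the TB size is passed in as if it were known. `SizedLRU` is a hypothetical name.

```python
from collections import OrderedDict


class SizedLRU:
    """LRU over variable-size entries (toy model that ignores fragmentation)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()     # pc -> size, least recently used first

    def touch(self, pc):
        """Record a reference to an existing entry."""
        self.entries.move_to_end(pc)

    def insert(self, pc, size):
        """Insert a TB of `size` bytes and return the list of evicted PCs."""
        if size > self.capacity:
            raise ValueError("TB larger than the whole cache")
        old = self.entries.pop(pc, None)
        if old is not None:
            self.used -= old             # re-translation: drop the stale copy first
        evicted = []
        while self.used + size > self.capacity:
            victim, victim_size = self.entries.popitem(last=False)
            self.used -= victim_size     # one eviction may not be enough
            evicted.append(victim)
        self.entries[pc] = size
        self.used += size
        return evicted


cache = SizedLRU(100)
cache.insert(1, 40)
cache.insert(2, 40)
cache.touch(1)
print(cache.insert(3, 50))   # evicts the least recently used entry
print(cache.insert(4, 90))   # needs more than one eviction to fit
```

The length of the returned eviction list is a rough proxy for replacement overhead: when it grows close to the number of resident entries, a full flush is likely the cheaper option.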

IV- Preliminary Results

Goals
- Simulate the LRU and LFU algorithms.
- Compare cache hit ratios.
- Evaluate the overhead of each algorithm.

Assumptions
- TB size and cache size are ignored.
- The quota of retained entries is 1/5.
- The cache is limited only by its number of TBs.
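One possible reading of these assumptions is a partial flush: when the cache is full, keep the best 1/5 of the entries according to LRU or LFU and drop the rest. The sketch below follows that reading, which is an interpretation and not necessarily the simulator used for the results; `simulate` and its parameters are hypothetical.

```python
from collections import OrderedDict


def simulate(trace, capacity, policy, quota=0.2):
    """Return the hit ratio of a partial-flush cache holding at most `capacity` TBs.

    policy is "lru" or "lfu"; quota is the fraction of entries kept at each flush.
    quota=0 degenerates to a full flush.
    """
    cache = OrderedDict()        # pc -> reference count, least recently used first
    hits = 0
    for pc in trace:
        if pc in cache:
            hits += 1
            cache[pc] += 1
            cache.move_to_end(pc)
            continue
        if len(cache) >= capacity:
            keep = int(capacity * quota)
            if policy == "lru":
                ranked = list(cache.items())                         # by recency
            else:
                ranked = sorted(cache.items(), key=lambda kv: kv[1])  # by count
            cache = OrderedDict(ranked[len(ranked) - keep:])          # keep the best
        cache[pc] = 1
    return hits / len(trace)


# Two hot blocks interleaved with blocks that are executed only once.
trace = []
for cold in range(100, 130):
    trace += [0, 1, cold]

for policy in ("lru", "lfu"):
    print(policy, simulate(trace, capacity=10, policy=policy))
print("full flush", simulate(trace, capacity=10, policy="lru", quota=0))
```

On this synthetic trace both policies retain the two hot blocks across flushes, so they beat the full flush; real guest traces may separate LRU from LFU more clearly.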

Execution ratio = executions / translations

LFU cache hit ratio

LRU cache hit ratio

Perspectives

- Find a suitable cache replacement policy that takes the implementation constraints into account.
- Use a dynamically variable quota for retained entries.
- Add a small op-code buffer to optimize re-translation of self-modifying code.
- Divide the translation cache into multiple regions to optimize partial cache flushes.

Thanks for your attention!

Feel free to ask any questions!
