
August 8th, 2011
Kevan Thompson

Creating a Scalable Coherent L2 Cache

Motivation

Cache Background

System Overview

Methodology

Progress

Future Work

Outline

2

Goal

Create a configurable shared Last-Level Cache for use in the PolyBlaze system

Motivation

3

Introduction

4

Zia

Eric

Kevan

In modern systems, processors outperform main memory, creating a bottleneck

This problem is only exacerbated as more cores contend for the memory

This problem is reduced if each processor maintains a local copy of the data

Cache Background

5

A cache is a small amount of memory on the same die as the processor

The cache is capable of providing a lower latency and a higher throughput than the main memory

Systems may include multiple cache levels

The smallest and most local cache is the L1 cache; the next level is the L2, and so on

Caches

6

Shared Last Level Cache

Acts as a common location for data

Can be used to maintain cache coherency between processors

Does not exist in current MicroBlaze system

We will design our own shared L2 Cache to maintain cache coherency

7

Cache Speeds

In typical systems:

An L1 cache is very fast (1 or 2 cycles)

An L2 cache is slower (10’s of cycles)

Main memory is very slow (100’s of cycles)

8

Cache Speeds

In our system we expect:

The L1 cache to be very fast (1 or 2 cycles)

The L2 cache to be slower (10’s of cycles)

Main memory to be faster than typical (10’s of cycles)

In order to model the memory bottleneck of a much faster system, we’ll need to stall the Main Memory

9

Direct Mapped Cache

10

Caches store Data, a Valid Bit and a unique identifier called a tag

Tags

11

As an example, imagine a system with the following:

32-bit Address Bus, and 32-bit Word Size

64-KByte Cache with 32-Byte Line Size

Therefore we have 2048 (2^11) Lines
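As a rough sketch (not part of the original slides), the address split for this example works out to a 5-bit offset, an 11-bit index, and a 16-bit tag, assuming a direct-mapped organization:

/* Sketch: splitting a 32-bit address for the example cache above.
 * Assumptions: direct-mapped, 64 KB capacity, 32-byte lines.
 */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE    32u                        /* bytes per line         */
#define CACHE_SIZE   (64u * 1024u)              /* total bytes            */
#define NUM_LINES    (CACHE_SIZE / LINE_SIZE)   /* 2048 lines = 2^11      */
#define OFFSET_BITS  5u                         /* log2(32)               */
#define INDEX_BITS   11u                        /* log2(2048)             */
#define TAG_BITS     (32u - INDEX_BITS - OFFSET_BITS)  /* 16 tag bits     */

int main(void)
{
    uint32_t addr   = 0xDEADBEEF;                               /* example address */
    uint32_t offset = addr & (LINE_SIZE - 1u);                  /* bits [4:0]   */
    uint32_t index  = (addr >> OFFSET_BITS) & (NUM_LINES - 1u); /* bits [15:5]  */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);       /* bits [31:16] */

    printf("tag=0x%04x index=%u offset=%u\n", tag, index, offset);
    return 0;
}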

Set-Associative Cache

12

A cache with n possible entries for each address is called an n-way set-associative cache

4-Way Set-Associative Cache
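A minimal sketch of an n-way lookup, assuming 4 ways and a hypothetical set count (illustrative only; the real PolyBlaze L2 is implemented in hardware, not C):

#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS 4
#define NUM_SETS 512          /* hypothetical set count */

typedef struct {
    bool     valid;
    uint32_t tag;
} tag_entry_t;

static tag_entry_t tags[NUM_SETS][NUM_WAYS];

/* Returns the matching way on a hit, or -1 on a miss. */
static int lookup(uint32_t set, uint32_t tag)
{
    for (int way = 0; way < NUM_WAYS; way++) {
        if (tags[set][way].valid && tags[set][way].tag == tag)
            return way;
    }
    return -1;
}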

Replacement Policies

13

When a new entry must be brought into a full set, we need to decide which Way to evict from.

To do this we use a replacement policy

LRU

Clock

FIFO

LRU

14

Keep track of when each entry is accessed

Always evict the Least Recently Used

Implemented using a stack

[Figure: LRU stack with the MRU entry on top and the LRU entry on the bottom, shown being updated by Access 4 and then Access 2]

Clock

15

For each Way we store a Reference Bit

Also store a pointer to the oldest entry (the Hand)

Starting with the Hand we test and clear each R Bit until we reach one that is 0

[Figure: Clock example showing the Reference Bits of Ways 0 to 3 as the Hand sweeps past them]
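A sketch of the Clock policy described above, with one Reference Bit per way and a Hand pointer (assumed 4 ways; illustrative only):

#include <stdbool.h>

#define NUM_WAYS 4

static bool ref_bit[NUM_WAYS];   /* set on every access to that way */
static int  hand = 0;            /* points at the oldest candidate  */

/* Advance the Hand, clearing Reference Bits, until a 0 bit is found. */
static int clock_victim(void)
{
    while (ref_bit[hand]) {
        ref_bit[hand] = false;           /* give the way a second chance */
        hand = (hand + 1) % NUM_WAYS;
    }
    int victim = hand;
    hand = (hand + 1) % NUM_WAYS;        /* start here next time */
    return victim;
}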

System Overview

16

PolyBlaze L2 Cache

17

1 to 16-Way Set-Associative Cache

LRU or Clock Replacement Policy

32 or 64 Byte Line Width

64 Bit Memory Interface

Write Back Cache
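A hypothetical sketch of these build-time options collected into one structure (the field names are illustrative, not the actual generics of the design):

/* Hypothetical parameter set mirroring the options listed above. */
typedef struct {
    unsigned ways;        /* 1 to 16-way set-associative   */
    enum { POLICY_LRU, POLICY_CLOCK } policy;
    unsigned line_bytes;  /* 32 or 64                      */
    unsigned mem_if_bits; /* 64-bit memory interface       */
    int      write_back;  /* 1 = write-back cache          */
} l2_config_t;

static const l2_config_t default_cfg = {
    .ways = 4, .policy = POLICY_LRU, .line_bytes = 32,
    .mem_if_bits = 64, .write_back = 1,
};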

L2 Cache

18

Reuse Policy

19

Determines which Way is evicted on Cache Miss

Currently uses LRU Policy

Tag Bank

20

Contains Tags and Valid Bits

Stored on FPGA using BRAMs

Instantiate one bank for each Way

Control Unit

21

Finite State Machine for L2 Cache Pipelining

If a request to the NPI (Native Port Interface) is outstanding, we can service other requests from SRAM
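A much-simplified, hypothetical model of the control states, just to illustrate the flow (the real FSM is implemented in hardware and differs in detail):

/* Hypothetical control-unit states for the L2 pipeline sketch. */
typedef enum {
    ST_IDLE,         /* waiting for a request from the L1/L2 arbiter    */
    ST_TAG_CHECK,    /* compare tags and test valid bits                */
    ST_SRAM_ACCESS,  /* hit: read or write the data bank in SRAM        */
    ST_NPI_REQUEST,  /* miss: fetch the line from main memory over NPI  */
    ST_WRITE_BACK    /* evict a dirty line before the refill            */
} l2_state_t;

/* While an NPI request is outstanding, later hits can still be served
 * from SRAM, which is the pipelining behaviour described above. */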

Data Bank

22

Control interface for off-chip SRAM

SRAM

23

32-bit ZBT synchronous SRAM

1 MB

Methodology

24

Break the L2 cache into three parts, test each part separately, then combine them and test the complete system:

SRAM Controller

NPI Interface

L2 Core

Complete L2 Cache

SRAM Controller

25

Create a wrapper that connects the SRAM controller to the MicroBlaze via an FSL

Write a program that writes and reads data at every address in the SRAM, using the patterns below (a sketch of such a test follows the list)

Write all 1’s

Write all 0’s

Alternate writing all 1’s and all 0’s

Write Random data
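A minimal sketch of the kind of test program described above, assuming the SRAM appears as a memory-mapped region (the base address is a placeholder, not the real one):

#include <stdint.h>

#define SRAM_BASE   ((volatile uint32_t *)0x80000000u)  /* placeholder address   */
#define SRAM_WORDS  (1024u * 1024u / 4u)                /* 1 MB of 32-bit words  */

static int test_pattern(uint32_t pattern)
{
    for (uint32_t i = 0; i < SRAM_WORDS; i++)
        SRAM_BASE[i] = pattern;
    for (uint32_t i = 0; i < SRAM_WORDS; i++)
        if (SRAM_BASE[i] != pattern)
            return -1;                         /* mismatch: test failed */
    return 0;
}

int run_sram_tests(void)
{
    if (test_pattern(0xFFFFFFFFu)) return -1;  /* all 1's                 */
    if (test_pattern(0x00000000u)) return -1;  /* all 0's                 */
    if (test_pattern(0xAAAAAAAAu)) return -1;  /* alternating 1's and 0's */
    if (test_pattern(0x55555555u)) return -1;
    /* random data would be a further pass driven by a simple PRNG */
    return 0;
}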

NPI Interface

26

Uses a custom FSL width, so we cannot test it using the MicroBlaze

Create a hardware test bench to read and write data to all addresses

Write all 1’s

Write all 0’s

Alternate writing all 1’s and all 0’s

Write Random data


L2 Core

27

Simulate the core of the L2 cache in iSim

Write a test bench that will approximate the responses from the L1/L2 Arbiter, SRAM Controller, and NPI Interface

The test bench will write to each line multiple times to create a large number of cache misses
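As an illustration of how the stimulus could force repeated misses, a sketch that generates addresses aliasing to the same set (line size and set count are assumed placeholders, not the actual testbench values):

#include <stdint.h>

#define LINE_BYTES  32u
#define NUM_SETS    512u     /* placeholder set count */

/* Generate the k-th address that maps to the given set: same index,
 * different tag, so each new k evicts the previous line. */
static uint32_t alias_addr(uint32_t set, uint32_t k)
{
    return (k * NUM_SETS + set) * LINE_BYTES;
}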


Complete L2 Cache

28

Combine the L2 Cache with the rest of PolyBlaze

Write test programs to read and write to various regions of memory


Current Progress

29

SRAM Controller and Data Bank:

Designed and Tested

NPI Interface:

Testing and Debugging in Progress

L2 Core:

Testing and Debugging in Progress

Future Work

30

Add Clock Replacement Policy to L2 Cache

Add a Write Back Buffer to L2 Cache

Migrate the system from the XUPV5 to a BEE3 so we can create a system with more cores

Modify the L2 Cache into a NUMA system

Add Custom Hardware Accelerators to PolyBlaze

Questions?

31
