Using the Trigger Test Stand at CDF for Benchmarking CPU (and eventually GPU) Performance Wesley Ketchum (University of Chicago) 10.27.2010


Page 1

Using the Trigger Test Stand at CDF for Benchmarking CPU (and eventually GPU) Performance

Wesley Ketchum (University of Chicago)

10.27.2010

Page 2

Outline

• Overview of previous work on calculations done by a CPU
  – Description of the test stand and the components in our setup
  – Latency measurements for a track fitting algorithm, measured by the PULSARs and by internal timing in the CPU
• Preliminary studies of latency measurements for calculations done by a GPU
  – Comparisons with the CPU
  – Future work

Page 3

Goals of Previous Work Done with the CPU

• Goals:
  – Restore the CDF L2 test stand to a working state
  – Configure the PULSAR boards to transmit and receive test patterns
  – Run a simplified linear track fitting algorithm on the CPU
    • Input read in from test patterns sent via S-LINK
  – Measure latency using internal CPU timing functions and the PULSAR boards

• This work served as the required experimental project for Ho Ling Li (now a 2nd-year UChicago graduate student)
  – Help from Jian Tang (UChicago), Pierluigi Catastini and Ted Liu (FNAL)

Page 4

Flow Chart of Test Stand Setup

[Figure: flow chart of the test stand setup, with blocks for the S-LINK Tx, AUX Card, S-LINK Rx, FILAR, SOLAR, CPU, Memory and GPU]

Page 5

Physical Test Stand Setup

• PULSARs housed in a VME crate
  – Tools exist to communicate with the crate and load code into it
  – That code controls the run configuration
• PC is a retired L2 Linux machine
  – Equipped with FILAR and SOLAR cards to receive/send S-LINK packets
• “Runs” occur using the CDF RunControl DAQ software
  – A Level 1 Accept (L1A) prompts sending of the loaded test patterns

Page 6

The PULSARs

• PULSARs (PULSer And Recorder)
  – Highly configurable
    • Special-purpose firmware loaded into FPGAs defines the board function
  – Used for a variety of purposes in the L2 trigger at CDF
• S-LINK Tx
  – Test patterns are loaded into the board and sent on L1A
• AUX card
  – Attached to the back of the Tx
  – Sends out multiple copies of the S-LINK packets
• S-LINK Rx
  – Fitted with 4 mezzanine cards that read in S-LINK packets
  – Measures the time after the L1A at which each packet was received (100 ns resolution)

[Figure: photos of the S-LINK Tx, AUX Card and S-LINK Rx boards, and of an S-LINK card]

Page 7

FILAR and SOLAR Cards

• FILAR (Four Input Links for ATLAS Readout)
  – Accepts S-LINK packets, which are stored in PC memory on arrival
• SOLAR (Single Output Link for ATLAS Readout)
  – Sends out a specified region of memory in S-LINK format
• FILAR and SOLAR cards connect to the PC via PCI-X slots

[Figure: photos of the FILAR and SOLAR cards]

Page 8

PC and Track Fitting Algorithm

• The PC
  – 2.4 GHz processor
  – Pre-developed tools from L2 testing for…
    • Reading in from the FILAR
    • Sending out along the SOLAR
    • Internal timing
• Track Fitting Procedure
  1. Copy in “track” data from the S-LINK packet
  2. Retrieve the constant set used for evaluating the fit parameters
  3. Run the (linear) track fitting algorithm to calculate the fit parameters
  4. Store the calculated parameters (and internal timing info) to be sent on the SOLAR

[Figure: photo of the PC]
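The four-step procedure above can be sketched as follows. This is a minimal Python illustration of the logic only (the test-stand code ran natively on the L2 PC), and it assumes a common linearized form in which each fit parameter is a weighted sum of the track's hit coordinates plus an offset. The names `FitConstants`, `linear_fit` and `process_packet`, the numbers of coordinates and parameters, and the packet layout (a flat list of coordinates) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FitConstants:
    """Hypothetical constant set: one weight row and one offset per fit parameter."""
    weights: list[list[float]]  # n_params rows of n_coords weights
    offsets: list[float]        # one offset per parameter


def linear_fit(constants: FitConstants, coords: list[float]) -> list[float]:
    """Step 3: p_i = sum_j w_ij * x_j + q_i for each fit parameter i."""
    return [
        q + sum(w * x for w, x in zip(row, coords))
        for row, q in zip(constants.weights, constants.offsets)
    ]


def process_packet(packet: list[float], constants: FitConstants) -> list[float]:
    """Steps 1-4 for one track; the packet is assumed to be a flat list of coordinates."""
    coords = list(packet)                   # step 1: copy in the track data
    params = linear_fit(constants, coords)  # steps 2-3: apply the constant set
    return params                           # step 4: result to be sent on the SOLAR


# Toy constants and hits (not CDF values): 3 parameters from 6 coordinates.
constants = FitConstants(
    weights=[[1.0] * 6, [0.5] * 6, [0.0, 0.0, 0.0, 0.0, 0.0, 2.0]],
    offsets=[0.0, 1.0, -1.0],
)
print(process_packet([1, 2, 3, 4, 5, 6], constants))  # [21.0, 11.5, 11.0]
```

Because the fit is a fixed matrix-vector product, its cost per track is constant and known in advance, which is what makes it a convenient benchmark for latency.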

Page 9

Latency Measurement Strategy

• From the PULSARs
  – Record the arrival time of the packet coming straight from the AUX Card
  – Record the arrival time of the packet coming from the PC
    • Check that the fit parameter evaluation has been done
  – The difference is the time for the PC evaluation (neglecting the extra cable time, which is small)
• From the PC
  – Place time stamps around the running of the algorithm
  – Output the difference along the S-LINK
• Determine the latency for various numbers of iterations of the fitting algorithm (only step 3 from the previous slide)
  – Model as $T_{PC} = n\,T_{alg} + T_O$, where $n$ is the number of iterations, $T_{alg}$ the time per iteration and $T_O$ the fixed overhead
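The measurement strategy above can be illustrated with a short sketch. It assumes the PULSAR Rx reports each packet's arrival time as an integer count of 100 ns ticks after the L1A (the 100 ns resolution comes from the PULSAR slide; the tick representation is an assumption). $T_{PC}$ for each run is the difference between the PC-packet and AUX-packet arrival times, and an ordinary least-squares line through the points $(n, T_{PC})$ gives $T_{alg}$ as the slope and $T_O$ as the intercept. The function names and the tick values are hypothetical, made up for illustration, and are not measurements.

```python
TICK_NS = 100  # PULSAR Rx time resolution, assumed to be reported as 100 ns ticks


def pc_latency_ns(aux_ticks: int, pc_ticks: int) -> int:
    """T_PC: arrival of the PC packet minus arrival of the packet straight from the AUX card.

    The extra cable time on the PC path is neglected, as on the slide.
    """
    return (pc_ticks - aux_ticks) * TICK_NS


def fit_latency_model(n_iter: list[int], t_pc: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of T_PC = n * T_alg + T_O; returns (T_alg, T_O)."""
    m = len(n_iter)
    mean_n = sum(n_iter) / m
    mean_t = sum(t_pc) / m
    s_nn = sum((n - mean_n) ** 2 for n in n_iter)
    s_nt = sum((n - mean_n) * (t - mean_t) for n, t in zip(n_iter, t_pc))
    t_alg = s_nt / s_nn
    t_o = mean_t - t_alg * mean_n
    return t_alg, t_o


# Made-up tick readings (not data): (iterations, AUX tick, PC tick).
runs = [(0, 10, 60), (10, 10, 80), (100, 10, 260)]
n_values = [n for n, _, _ in runs]
latencies = [pc_latency_ns(aux, pc) for _, aux, pc in runs]
t_alg, t_o = fit_latency_model(n_values, latencies)
print(f"T_alg = {t_alg:.1f} ns per iteration, T_O = {t_o:.1f} ns")
# T_alg = 200.0 ns per iteration, T_O = 5000.0 ns
```

With real runs the points scatter; using many iteration counts lets the slope separate the per-iteration cost from the fixed read-in/read-out overhead.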

Page 10

Sample PULSAR Latency Measurements

[Figure: two panels, “Track fitting algorithm run once” and “Track fitting algorithm not run (read-in then read-out)”]

Page 11

Algorithm Times as Measured in PULSAR and PC

[Figure: two panels, on a linear scale and on a log scale]

Page 12

Internal Timing Measurements

• Having validated the CPU internal timing, place time stamps around the various steps of the track fitting procedure

[Figure: two panels, “Fitting algorithm run only once” and “Fitting algorithm run 100 times”]
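Placing time stamps around each step can be sketched like this. It uses Python's `time.perf_counter_ns` as a stand-in for the PC's internal timing functions (which the slides do not name), and the step functions are hypothetical placeholders for steps 1 to 4 of the track fitting procedure. Repeating the fit step `n_iter` times mirrors the “run once” versus “run 100 times” comparison; `time_steps` is a hypothetical helper.

```python
import time
from typing import Callable


def time_steps(steps: dict[str, Callable[[], object]], n_iter: dict[str, int]) -> dict[str, int]:
    """Time each step with monotonic nanosecond stamps taken just before and after it.

    n_iter maps a step name to how many times to repeat it (default 1).
    Returns the elapsed nanoseconds per step, in the order the steps were given.
    """
    elapsed = {}
    for name, step in steps.items():
        start = time.perf_counter_ns()
        for _ in range(n_iter.get(name, 1)):
            step()
        elapsed[name] = time.perf_counter_ns() - start
    return elapsed


# Hypothetical stand-ins for the four steps of the procedure.
track = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights = [0.5] * 6
steps = {
    "copy_in": lambda: list(track),
    "get_constants": lambda: weights,
    "fit": lambda: sum(w * x for w, x in zip(weights, track)),
    "store": lambda: [0.0, 0.0, 0.0],
}
for name, ns in time_steps(steps, {"fit": 100}).items():
    print(f"{name:14s} {ns:8d} ns")
```

In Python the interpreter overhead dominates these numbers; the sketch shows where the stamps go, not realistic values.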

Page 13

New Work with GPU

• Recently got a new machine capable of housing a GPU
  – NVIDIA GTX 285 (for computations)
  – eVGA e-GeForce 9500 GT (for display)
  – Intel Core i7 processor, 2.80 GHz
  – 6 GB RAM
  – 2 PCIe slots (GPUs) and 2 PCI-X slots (FILAR and SOLAR)
• Use the CUDA tools/framework to run the same linear track fitting algorithm for multiple tracks on a GPU
  – Focus so far has been on getting things running with the same simple code
  – Plenty of optimization still to do with just the simple code, and even more once we complicate the fitting procedure
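Fitting many tracks on a GPU is naturally data-parallel, because each track's fit is independent of the others. A common CUDA pattern, assumed here rather than taken from the slides, assigns one thread per track through the global index `blockIdx.x * blockDim.x + threadIdx.x`, with a bounds check for the last, partially filled block. The plain-Python sketch below only emulates that indexing serially to show the decomposition; `launch_emulated`, `fit_kernel` and the block size of 128 are hypothetical, and it says nothing about GPU performance.

```python
def launch_emulated(kernel, n_items: int, block_dim: int, *args) -> None:
    """Serially emulate a 1-D CUDA launch of ceil(n_items / block_dim) blocks."""
    grid_dim = (n_items + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, n_items, *args)


def fit_kernel(block_idx, block_dim, thread_idx, n_tracks, tracks, weights, offset, out):
    """One 'thread' fits one track (a single linear parameter, for brevity)."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n_tracks:                        # guard the last, partially filled block
        out[i] = offset + sum(w * x for w, x in zip(weights, tracks[i]))


# Toy batch of 300 tracks with 6 coordinates each (not CDF data).
tracks = [[float(t)] * 6 for t in range(300)]
weights = [1.0] * 6
out = [None] * len(tracks)
launch_emulated(fit_kernel, len(tracks), 128, tracks, weights, 0.5, out)
print(out[0], out[299])  # 0.5 1794.5
```

In real CUDA code the two loops in `launch_emulated` disappear: the hardware runs the threads concurrently, and the kernel body reduces to the index computation, the bounds check and the fit.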

Page 14

Recent Results with Internal Timing Measurements

Page 15

Conclusion and Outlook

• Developed a setup at the test stand to measure the latency of a track fitting algorithm in a CPU
  – Can include full readout times via the timing information in the PULSARs
• Have a new machine capable of housing GPU, FILAR and SOLAR cards
  – Makes it possible to do latency measurements for calculations done in a GPU
  – Can compare with similar calculations in a CPU
• Near future
  – Set up the new machine at the test stand in place of the old L2 PC and provide a performance benchmark

Page 16

BACKUP SLIDES

Page 17

Trigger Test Stand at CDF

[Figure: diagram of the trigger test stand, with blocks labeled Cluster, Electron, SVT Tx, SVT Rx, SLINK Merger, SLINK to PCI, mem, CPU, GPU and PCI to SLINK, connected by SLINK]

Page 18

Flow Chart of Test Stand Setup

[Figure: flow chart of the test stand setup, with blocks for the S-LINK Tx, AUX Card, S-LINK Rx, FILAR, SOLAR, CPU, Memory and GPU]