Department of Particle & Particle Astrophysics
Sea-Of-Flash-Interface
SOFI introduction and status
The PetaCache Review
Michael Huffer, [email protected]
Stanford Linear Accelerator Center
November 02, 2006
Outline
• Background
  – History of PPA involvement
  – Synergy with current activities
• Requirements
  – Usage model
  – System requirements
  – Individual client requirements
• Implementation
  – Abstract model and features
  – Building blocks
• Deliverables
  – Packaging
• Schedule
  – Status
  – Milestones
• Summary
  – Reuse
  – Conclusions
Background
• The Research Engineering Group (REG) supports a wide range of activities with limited resources
  – LSST, SNAP, ILC, SiD, EXO, LHC, LCLS, etc.
• Using these resources most effectively requires understanding:
  – core competencies
  – the requirements of future electronics systems
• Two imperatives for REG:
  – Support upcoming experiments
  – Build for the future by advancing core competencies
• What follows:
  – More detailed examples of a couple of upcoming experiments
  – The necessary core competencies
LSST
• SLAC/KIPAC is lead institution for the camera
  – Camera contains > 3 gigapixels
    • > 6 gigabytes of data/image
    • Readout time is 1-2 seconds
  – KIPAC delivers the camera DAQ system
“The Large Synoptic Survey Telescope (LSST) is a proposed ground-based 8.4-meter, 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night. In a relentless campaign of 15 second exposures, LSST will cover the available sky every three nights, opening a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects. The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy.”
SNAP
• SLAC is lead institution for all non-FPA related electronics
  – One contact every 24 hours
  – Requires data to be stored on board the instrument
  – Storage capacity is roughly 1 Terabyte (includes redundancy)
  – Examining NAND flash as the solution to the storage problem
“The Supernova/Acceleration Probe (SNAP) satellite observatory is capable of measuring thousands of distant supernovae and mapping hundreds to thousands of square degrees of the sky for gravitational lensing each year. The results will include a detailed expansion history of the universe over the last 10 billion years, determination of its spatial curvature to provide a fundamental test of inflation - the theoretical mechanism that drove the initial formation of structure in the universe, precise measures of the amounts of the key constituents of the universe, ΩM and ΩΛ, and the behavior of the dark energy and its evolution over time.”
Core competencies
• System on Chip (SOC)
  – Integrated processors and functional blocks on an FPGA
• Small-footprint, high-performance, persistent memory systems
  – NAND Flash
• Open Source R/T kernels
  – RTEMS (Real-Time Executive for Multiprocessor Systems)
• High-performance serial data transport and switching
  – MGTs (Multi-Gigabit Transceivers)
• Modern networking protocols:
  – 10 Gigabit Ethernet
  – InfiniBand
  – PCI-Express
PetaCache consistent with mission?

Project     | Uses core technology?
            | SOC | Memory | R/T kernels | H/S transport
LSST        | yes | no     | yes         | yes
SNAP        | no  | yes    | yes         | no
PetaCache   | yes | yes    | yes         | yes
Main Entry: syn·er·gy. Pronunciation: ˈsi-nər-jē. Function: noun. Inflected Form(s): plural -gies. Etymology: New Latin synergia, from Greek synergos, working together. 1 : SYNERGISM; broadly : combined action or operation. 2 : a mutually advantageous conjunction or compatibility of distinct business participants or elements (as resources or efforts).
Usage model
• System requirements:
  – Scalable, both in:
    • Storage capacity
    • Number of concurrent clients
  – Large address space
  – Random access
  – Support population evolution
• Features:
  – Changes are quasi-adiabatic
    • "Write once, read many"
  – Able to treat as a Read-Only system
• Requirements not addressed in this phase:
  – Access Control
  – Redundancy
  – Cost
[Diagram: many clients, on many hosts, reach the data storage through a distribution, transport & management layer]
“Lots of storage, shared concurrently by many clients, distributed over a large number of hosts”
Client Requirements
• Uniform access time to fetch a "fixed" amount of data from storage
  – Implies a deterministic and relatively "small" latency in round-trip time
    • where "fixed" is O(8 Kbytes) and "small" is O(200 micro-seconds)
  – Need approximately 40 Mbytes/sec between client & storage (a quick consistency check follows below)
• Access time scales independent of:
  – Address
  – Number of concurrent clients
• Two contributions to latency:
  – Storage access time (the PetaCache project focus is on this issue alone)
  – Distribution, transport, and management overhead
• The SOFI architecture attempts to address both issues
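As a quick consistency check (not on the original slide): fetching one O(8 Kbyte) chunk within an O(200 microsecond) round trip corresponds to the ~40 Mbytes/sec per client quoted above,

$$\frac{8192\ \text{bytes}}{200\ \mu\text{s}} \approx 4\times 10^{7}\ \text{bytes/sec} \approx 40\ \text{Mbytes/sec}.$$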
Abstract model
• Key features:
  – Available concurrency and bandwidth scale with storage capacity
  – Many individual "memory servers"
    • Access granularity is 8 Kbytes
    • 16 GBytes of memory/server
    • 40 Mbytes/sec/server
  – Load leveling (a sketch of the static mapping follows below)
    • Data randomly distributed over the memory servers
    • Multicast for concurrent addressing
    • Both client & server side caching
  – Two address spaces
    • Physical page access
    • Logical block access
    • Hides data distribution from the client
  – Network Attached Storage
[Diagram: clients reach many memory servers, each fronted by a Flash Memory Controller (FMC), through content-addressable switching]
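To illustrate the static load leveling described above, here is a minimal C++ sketch (not the delivered SOFI code; every name is hypothetical): a fixed pseudo-random hash of the logical coordinates picks the memory server, so every client computes the same placement without a central directory.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>

// Hypothetical logical-block coordinates; the real SOFI types are not shown
// in these slides.
struct LogicalBlock {
  std::uint64_t partition;
  std::uint64_t bundle;
  std::uint64_t block;
};

// A fixed hash gives every client the same server choice with no central
// coordination -- this is what makes the load leveling "static".
std::size_t serverFor(const LogicalBlock& b, std::size_t nServers) {
  std::size_t h = std::hash<std::uint64_t>{}(b.partition);
  h ^= std::hash<std::uint64_t>{}(b.bundle) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
  h ^= std::hash<std::uint64_t>{}(b.block)  + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
  return h % nServers;
}

int main() {
  LogicalBlock b{7, 3, 12345};
  std::cout << "block maps to server " << serverFor(b, 64) << '\n';  // e.g. 64 SAMs in a chassis
}
```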
Building Blocks

[Diagram: SOFI building blocks. A host (1 of n) runs application-specific code and the SOFI client interface, connected by 1 Gigabit Ethernet (0.1 GByte/sec) to the host inter-connect. A Cluster Inter-Connect Module (CIM) bridges the host inter-connect to the network-attached storage over 8 x 10 G-Ethernet (8 GByte/sec). Each Slice Access Module (SAM) attaches by 10 G-Ethernet on the network side and by 1 GByte/sec PGP (Pretty Good Protocol) to a Four Slice Module (FSM) holding 256 GBytes of flash.]
Four Slice Module (FSM)

[Block diagram: FSM internals. A clock/configuration FPGA carries the PGP & command encode/decode logic (CRC-In/CRC-Out), an initiator, out-bound transfer & decode and in-bound transfer & encode paths, each behind an arbiter, and four Flash Memory Controllers (FMC1-FMC4). Each DIMM holds 8 devices (32 GBytes); the module is organized as 1 x 4 slices. A PHY connects the module to the outside.]
Flash Memory Controller (FMC)
• Implemented as core IP
• Controls 16 GBytes of memory (4 devices) in units of:
  – Pages (8 Kbytes)
  – Blocks (512 Kbytes)
• Queues operations (sketched below):
  – Read Page (in units of 128-byte chunks)
  – Write Page
  – Erase Block
  – Read statistics counters
  – Read device attributes
• Transfers data at 40 Mbytes/sec
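A small C++ sketch of the queued command set listed above. The FMC itself is FPGA core IP, so the enum and struct shapes here are invented purely for illustration.

```cpp
#include <cstdint>
#include <queue>

// Operations named on the slide above; the enum/struct shapes are invented.
enum class FmcOp : std::uint8_t {
  ReadPage,        // one 8 Kbyte page, returned in 128-byte chunks
  WritePage,       // program one 8 Kbyte page
  EraseBlock,      // erase one 512 Kbyte block (64 pages)
  ReadCounters,    // statistics counters
  ReadAttributes   // device attributes
};

struct FmcCommand {
  FmcOp         op;
  std::uint32_t page;  // page index within the 16 GBytes (2^21 pages) behind one FMC
};

int main() {
  std::queue<FmcCommand> q;          // the FMC queues operations
  q.push({FmcOp::ReadPage, 42});
  q.push({FmcOp::EraseBlock, 0});
  while (!q.empty()) q.pop();        // a real controller would execute each command here
}
```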
Universal Protocol Adapter (UPA)

[Block diagram: the UPA. A Xilinx XC4VFX60 FPGA (SOC) with roughly 200 DSPs and "lots of gates"; fabric clock and MGT clock; PPC-405 processor (450 MHz); configuration memory (128 Mbytes, Samsung K9F5608); memory (512 Mbytes, Micron RLDRAM II); Multi-Gigabit Transceivers (MGT), 8 lanes; 100-baseT Ethernet; reset, reset options, and JTAG.]
The SAM is ½ of a UPA pair
UPA Features
• "Fat" memory subsystem
  – Sustains 8 Gbytes/sec
  – "Plug-in" DMA interface (PIC)
• Designed as a set of IP cores
• Designed to work in conjunction with MGT and protocol cores
• Bootstrap loader (with up to 16 boot options and images)
• Interface to configuration memory
• Open Source R/T kernel (RTEMS)
• 100 base-T Ethernet interface
• Full network stack
“Think of the UPA as a Single Board Computer (SBC) which interfaces to one or more busses through its MGTs”
UPA Customization for SAM
• Implements two cores:
  – PGP
  – 10-GE
• All 8 lanes of MGT used:
  – 4 lanes for the PGP core
  – 4 lanes for 10-GE
• Network driver to interface 10-GE to the network stack
• Executes application code to satisfy (a rough sketch follows below):
  – Server side of the SOFI client interface
    • Physical-to-logical translation
    • Server side caching
  – FSM management software
    • Proxies the FMC command set
    • Maintains bad blocks
    • Maintains available blocks
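A rough, hypothetical C++ sketch of two of the server-side jobs named above, logical-to-physical translation and bad/available-block bookkeeping. The real SAM software is not shown in these slides, so every name here is an assumption.

```cpp
#include <cstdint>
#include <set>
#include <unordered_map>

// Where a logical block currently lives; both fields are hypothetical.
struct PhysicalPage {
  std::uint16_t slice;
  std::uint32_t page;
};

class FsmManager {
public:
  // Record (or move) the physical home of a logical block.
  void remap(std::uint64_t logicalBlock, PhysicalPage where) { map_[logicalBlock] = where; }

  // Logical-to-physical translation, done on the SAM rather than by the client.
  bool translate(std::uint64_t logicalBlock, PhysicalPage& out) const {
    auto it = map_.find(logicalBlock);
    if (it == map_.end()) return false;
    out = it->second;
    return true;
  }

  // Bad-block bookkeeping mentioned on the slide.
  void markBad(std::uint32_t eraseBlock) { bad_.insert(eraseBlock); available_.erase(eraseBlock); }
  bool isBad(std::uint32_t eraseBlock) const { return bad_.count(eraseBlock) != 0; }

private:
  std::unordered_map<std::uint64_t, PhysicalPage> map_;
  std::set<std::uint32_t> bad_;        // blocks never to be allocated again
  std::set<std::uint32_t> available_;  // blocks free for new data
};

int main() {
  FsmManager m;
  m.remap(1001, {0, 42});
  PhysicalPage p{};
  return (m.translate(1001, p) && !m.isBad(0)) ? 0 : 1;
}
```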
Cluster Inter-connect Module (CIM)

[Block diagram: the CIM. A high-speed data switch (24 x 10-GE, Fulcrum FM2224) and a low-speed management switch (24 x FE + 4 x GE, Zarlink ZL33020), both managed by a UPA with two 10-GE (XAUI) ports. High-speed links go to the SAMs and to the host inter-connect data network; low-speed 100 baseT and 1000 baseT links go to the SAMs and to the host inter-connect management network.]
Client/Server Interface
• Client interface resides on the host
• Servers reside on the SAMs
• Any one client on any one host has uniform access to all flash storage
• Clients access flash through the network inter-connect
• Abstract inter-connect model
  – Delivered implementation is IP (UDP and multicast services)
• Interface delivers three types of services:
  – Random read access to objects within the store
  – Population of objects within the store (write and erase access)
  – Access to performance metrics
• Client interface is object-oriented (C++)
  – Class library (distributed as a set of binaries and header files); a sketch of such an interface follows below
• Two address spaces (physical & logical)
  – Clients access information only in logical space
  – A client is not sensitive to the actual physical location of information
  – Population distribution is pseudo-random (static load leveling)
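A minimal header-style sketch of what the C++ client class library could look like, covering only the three service families listed above. Every class and method name is hypothetical, not taken from the delivered binaries and headers.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical shape of the client class library; names are invented.
class SofiStore {
public:
  virtual ~SofiStore() = default;

  // 1. Random read access: fetch lengthBytes starting offsetBlocks blocks into
  //    a bundle of a partition (by identifier or alias), purely in logical space.
  virtual std::vector<std::uint8_t> read(const std::string& partition,
                                         const std::string& bundle,
                                         std::uint64_t offsetBlocks,
                                         std::uint64_t lengthBytes) = 0;

  // 2. Population of the store (write and erase access).
  virtual void write(const std::string& partition,
                     const std::string& bundle,
                     std::uint64_t offsetBlocks,
                     const std::vector<std::uint8_t>& data) = 0;

  // 3. Access to performance metrics.
  virtual std::string metrics() const = 0;
};
```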
Addressing

• Physical addressing (1 page = 8 Kbytes)
  – Address fields: Controller / Interconnect / Manager / Slice / Page
  – 2^0 x 2^32 x 2^2 x 2^2 x 2^21 = 2^57 pages, about 128 peta-pages (1M peta-bytes)
• Logical addressing (1 block = 8 Kbytes)
  – Address fields: Interconnect / Partition / Bundle / Block
  – Partition, bundle, and block identifiers are 64-bit (2^64) values
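Working the physical page count through, assuming the field widths above are the powers of two that the slide's product implies:

$$2^{0}\times 2^{32}\times 2^{2}\times 2^{2}\times 2^{21}=2^{57}\ \text{pages};\qquad 2^{57}\ \text{pages}\times 2^{13}\ \tfrac{\text{bytes}}{\text{page}}=2^{70}\ \text{bytes}\approx 1\text{M peta-bytes}.$$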
Using the interface
• Partition is a management tool
  – Segments the storage logically into disjoint sets
  – One-to-one correspondence between a partition and a server
  – One SAM may host more than one server
• Bundle is an organization tool
  – A bundle belongs to one (and only one) partition
  – A bundle is an access-pattern hint, allowing:
    • fetch look-ahead
    • optimization of overlapping fetches from different clients
• Both partition and bundle are assigned unique identifiers (over all time)
• Identifiers may have character names (aliases)
  – Assigned at population time
• A client query is composed of: partition/cluster/offset/length (see the example below)
  – offset is expressed in units of blocks
  – length is expressed in units of bytes
• A client may query by either identifier or alias
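To make the query form concrete, a hypothetical composition of partition/cluster/offset/length in C++. The "cluster" field is assumed here to be the bundle discussed above, and the names and values are invented.

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical query record: partition / cluster (bundle) / offset / length.
struct SofiQuery {
  std::string   partition;     // 64-bit identifier or character alias
  std::string   bundle;        // the "cluster" of the query, identifier or alias
  std::uint64_t offsetBlocks;  // units of 8 Kbyte blocks
  std::uint64_t lengthBytes;   // units of bytes
};

int main() {
  SofiQuery q{"calib-2006", "run-42", 12, 8192};  // invented names and values
  std::cout << q.partition << '/' << q.bundle << '/'
            << q.offsetBlocks << '/' << q.lengthBytes << '\n';
}
```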
Deliverables
• Two FSMs (8 slices)
  – 1/2 TByte
• Two SAMs
  – Enough to support FSM operations
• Client/Server interface (SOFI)
  – Targeted to Linux
• How will the hardware be packaged?
  – Where packaging is defined as:
    • How the building blocks are partitioned
    • The specification of the electro-mechanical interfaces
The “Chassis”

[Diagram: an 8U chassis with a passive backplane, DC power input, X2 (XENPAK MSA) optical modules, a 1U fan-tray, a 1U air-inlet, and a 1U air-outlet; Supervisor Card (8U) and Line Cards (4U).]

• 2 FSMs/card – 1/2 TByte
• 16 cards/bank – 8 TByte
• 2 banks/chassis – 64 SAMs, 1 CIM, 16 TByte
• 3 chassis/rack – 48 TByte
48 TByte facility

[Diagram: a 48 TByte facility — 1 chassis, a Catalyst 6500 (3 x 4 10GE, 2 x 48 1GE), and SOFI hosts (1 x 96) running xRootD servers.]
Schedule/Status
• Methodology:
  – Hardware
    • Implement 3 "platforms", one for each type of module
    • Decouple packaging from architectural & implementation issues
      – Evaluate layout issues concerning high-speed signals
      – Evaluate potential packaging solutions
      – Allow concurrent development of VHDL & CPU code
  – Software
    • Emulate the FSM component of the server software
      – Complete/debug in the absence of hardware
      – Allows clients an "early look" at the interface

[Diagram: the client/server software stack. Host side: client API, logical/physical translation, cache management, IP protocol implementation. SAM side: FSM interface, logical/physical translation, cache management, IP protocol implementation. The two sides meet across "the wire".]
Evaluation platforms
• UPA
  – Memory subsystem
  – Bootstrap loader
  – Configuration memory
  – RTEMS
  – Network stack / network driver interface issues
• CIM
  – Low- and high-speed management
  – Evaluate different physical interfaces (including X2)
• FSM line card (depending on packaging this could be the production prototype)
  – FMC debug
  – PGP debug
Schedule

[Gantt chart, October 2006 through March 2007. Activity phases: specification, design, implement, schematic, layout, spin/load, debug. Tracked activities: chassis/mechanical, PIC, RTEMS/UPA, UPA/PGP, UPA/10GE driver, UPA/10GE MAC, SOFI. Products: UPA platform (1), CIM platform (2), Line Card PCB (3), Supervisor PCB (4), Backplane (5).]
Milestones

Milestone                                    Date
RTEMS running on UPA evaluation platform     2nd week of December 2006
SOFI (emulation) ready                       3rd week of January 2007
Supervisor PCB ready for debug               3rd week of January 2007
Chassis & PCBs complete                      3rd week of February 2007
Start Test & Integrate                       2nd week of March 2007
Status

Product status (specification / design / implementation):
• SOFI – in-progress, in-progress
• DIMM, FCS, FSM – in-progress
• SAM – in-progress
• CIM – in-progress
• UPA – in-progress
• PGP core, 10-GE core
• The "chassis"
Products & Reuse

Product — targeted for use in:
               Petacache | LSST Camera DAQ | SNAP | LCLS DAQ | Atlas Trigger Upgrade
UPA            yes       | yes             | no   | yes      | yes
10-GE core     yes       | yes             | no   | yes      | yes
PGP core       yes       | yes             | no   | yes      | yes
FCS            yes       | no              | yes  | no       | no
CIM            yes       | yes             | no   | yes      | yes
FSM            yes       | no              | no   | no       | no
SAM            yes       | no              | no   | no       | no
DIMM           yes       | no              | no   | no       | no
SOFI           yes       | no              | no   | no       | no
The "chassis"  yes       | maybe           | no   | maybe    | maybe
Conclusions
• Robust and well-developed architecture
  – Concurrency and bandwidth scale as storage is added
  – Logical address space hides the actual data distribution from the client
  – Network Attached Storage
  – Scalable (in size and users)
• Packaging solution may need an iteration
• Schedule
  – Somewhat unstable, however:
    • sequence and activities are to a large degree correct
    • risk is in the development of the 10 GE
  – Well along the implementation road
• Well-developed synergy between PetaCache and the current activities of ESE
  – Great mechanism to develop core competencies
  – Many of the project deliverables are directly usable in other experiments