BTeV-RTES Project: Very Lightweight Agents (VLAs)
Daniel Mossé, Jae Oh, Madhura Tamhankar, John Gross
Computer Science Department, University of Pittsburgh
BTeV Workshop, Nashville, Nov 15, 2002
Shameless plug
LARTES: IEEE Workshop on Large Scale Real-Time and Embedded Systems
In conjunction with the IEEE Real-Time Systems Symposium (RTSS 2002, Dec 3-5, 2002)
December 2, 2002, Austin, TX, USA
http://www.rtss.org/LARTES.html
BTeV Test Station
Collider detectors are about the size of a small apartment building. Fermilab's two detectors, CDF and DZero, are about four stories high and weigh some 5,000 tons (10 million pounds) each. Particle collisions occur in the middle of the detectors, which are crammed with electronic instrumentation.
Each detector has about 800,000 individual pathways for recording the electronic data generated by particle collisions. Signals are carried over nearly a thousand miles of wire and cable.
Information from Fermi National Accelerator Laboratory
L1/L2/L3 Trigger Overview
Information from Fermi National Accelerator Laboratory
System Characteristics: Software Perspective
Reconfigurable node allocation
L1 runs one physics application, severely time constrained
L2/L3 runs several physics applications, with few time constraints
Multiple operating systems and differing processors: TI DSP BIOS, Linux, Windows?
Communication among system sections via a fast network
Fault tolerance is essentially absent in embedded and RT systems
L1/L2/L3 Trigger Hierarchy (diagram)
Regional L1 Manager (1): TimeSys RT Linux, Regional Manager VLA
Crate Managers (20): TimeSys RT Linux, Crate Manager VLA
Farmlet Managers (16): TimeSys RT Linux, Farmlet Manager VLA
DSPs (8): TI DSP BIOS, Low-Level VLA
Regional L2/L3 Manager (1): TimeSys RT Linux, Regional Manager VLA
Section Managers (8): RH 8.x Linux, Section Manager VLA
Linux Nodes (320): RH 8.x Linux, Low-Level VLA
Global Manager: TimeSys RT Linux, Global Manager VLA
Interconnect: Gigabit Ethernet
Data Archive: external level
Proposed Solution: Very Lightweight Agents (VLAs)
Minimize footprint
Platform independence
Monitor hardware
Monitor software
Comprehensible source code
Communication with a high-level software entity
Error prediction
Error logging and messaging
Scheduling and prioritization of test events
A minimal sketch of such an agent loop follows.
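Illustrative only: a minimal C sketch of what a VLA main loop with prioritized tests might look like. All names here (vla_test, check_buffer_fill, and so on) are hypothetical, not the actual BTeV-RTES code.

```c
#include <stdio.h>

/* Hypothetical VLA test table: each entry is a small check; the
 * array is kept sorted with the highest-priority test first. */
typedef struct {
    const char *name;
    int (*run)(void);            /* returns nonzero on error */
} vla_test;

static int check_buffer_fill(void) { /* placeholder check */ return 0; }
static int check_cpu_speed(void)   { /* placeholder check */ return 0; }

static const vla_test tests[] = {
    { "buffer fill rate", check_buffer_fill },
    { "CPU throttling",   check_cpu_speed   },
};

/* One polling pass: run every test, report any failure upward. */
void vla_poll(void)
{
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
        if (tests[i].run())
            fprintf(stderr, "VLA: %s test failed\n", tests[i].name);
}
```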
VLAs on L1 and L2/3 Nodes (diagram)
Each node type pairs a VLA with its physics application on top of the hardware and OS kernel: Level 1 farm nodes run DSP BIOS, while L1 manager nodes, L2/L3 manager nodes, and Level 2/3 farm nodes run Linux, all communicating through a network API.
VLA Error Reporting (diagram)
On Level 1/2/3 manager nodes, the VLA and the manager application sit alongside an ARMOR on top of the Linux kernel and hardware, reporting to the network through the network API; VLAs on the DSPs report up to these manager nodes.
VLA Error Prediction
Buffer overflow:
1. VLA message or application data input buffers may overflow
2. Messages or data are lost in each case
3. Detected by monitoring the fill rate and the overflow condition
4. A high fill rate indicates a high error rate (producing many messages) or undersized data buffers
Throttled CPU:
1. Throttled due to high temperature
2. Throttled by an erroneous power-saving feature
3. Causes missed deadlines due to low CPU speed
4. Potentially a critical failure if L1 data is not processed fast enough
Note that the CPU may also be throttled on purpose. A sketch of the fill-rate check follows.
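As a rough illustration of the fill-rate check above, a C sketch; the capacity, threshold, and struct layout are assumptions, not the project's actual buffer code.

```c
#include <stdbool.h>

#define BUF_CAPACITY  1024     /* assumed buffer size */
#define WARN_FILL     0.8      /* assumed threshold: warn at 80% full */

typedef struct {
    int used;                  /* entries currently queued */
    int overflows;             /* messages already dropped */
} msg_buffer;

/* Predict trouble before it happens: flag a buffer that has already
 * overflowed, or is nearly full and still growing since the last sample. */
bool buffer_at_risk(const msg_buffer *b, int used_last_sample)
{
    double fill    = (double)b->used / BUF_CAPACITY;
    bool   growing = b->used > used_last_sample;
    return b->overflows > 0 || (fill > WARN_FILL && growing);
}
```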
VLA Error Logging (diagram)
The VLA packages info:
1. Message time
2. Operational data
3. Environmental data
4. Sensor values
5. App & OS error codes
6. Beam crossing ID
Hardware and software failures pass through the VLA's message buffer and communication API to the ARMOR, which:
1. Reads messages
2. Stores/uses them for error prediction
3. Appends appropriate info
4. Sends them over TCP/IP/Ethernet, through the filters, to the data archive
A sketch of one possible record layout follows.
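One possible C layout for the packaged record, mirroring the six fields listed above; the field types and sizes are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical packaged VLA log record (field sizes assumed). */
typedef struct {
    uint64_t msg_time_us;     /* 1. message time, e.g. microseconds */
    uint8_t  op_data[16];     /* 2. operational data */
    uint8_t  env_data[16];    /* 3. environmental data */
    float    sensors[4];      /* 4. sensor values (temperature, etc.) */
    int32_t  app_error;       /* 5a. application error code */
    int32_t  os_error;        /* 5b. OS error code */
    uint32_t beam_crossing;   /* 6. beam crossing ID */
} vla_log_record;
```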
VLA Scheduling Issues
The L1 trigger application has the highest priority
The VLA must run often enough to remain effective
The VLA must internally prioritize its error tests
The VLA must preempt the L1 trigger app on critical errors
Task priorities must be alterable at run time (see the sketch below)
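On the Linux-based nodes, one way to alter a task's priority at run time is through the standard POSIX real-time scheduling calls; a minimal sketch, with the priority values as placeholders (this is not the project's scheduler code).

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Put a process under fixed-priority FIFO scheduling at `prio`. */
int set_rt_priority(pid_t pid, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    if (sched_setscheduler(pid, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}

/* On a critical error the VLA could boost itself above the trigger
 * application (pid 0 = calling process), then drop back afterwards:
 *     set_rt_priority(0, 90);  ... handle error ...  set_rt_priority(0, 10);
 */
```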
VLA Scheduling Issues (timeline diagrams)
Normal scheduling: the kernel, physics application, and VLA alternate, with the VLA confined to small slots between physics application runs.
Adaptive resource scheduling: when the physics app is unexpectedly ended, more VLAs can be scheduled in the freed time.
Alternative scheduling: a different interleaving of kernel, physics application, and VLA slots.
Concept: the VLA has the ability to control its own priority and that of other apps, based on internal decision making.
VLA Scheduling Issues (diagram): a configuration with no VLA on the node, in which an external message source (an FPGA) acts as the VLA inhibitor alongside the kernel and physics application.
VLA Status
• Current status
– VLA skeleton and timing implemented at Syracuse (poster)
– Hardware platform from Vandy
– Software (muon application) from Fermi and UIUC
– Linux drivers to use GME and the Vandy devkit
• Near term
– Muon application to run on the DSP board
– Muon application timing
– Instantiate VLAs with Vandy hardware and the muon application
VLA and Network Usage
• Network usage influences the amount of data dropped by the triggers and other filters
• Network usage is typically not considered in load-balancing algorithms (which assume the network is fast enough)
• VLAs monitor and report network usage
• Agents use this information to redistribute loads
• Network architecture to control flows on a per-process basis (http://www.netnice.org)
A sampling sketch follows.
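As a rough sketch of the monitoring side, a Linux-based VLA could sample per-interface byte counters from /proc/net/dev; the interface name is a placeholder and error handling is abbreviated.

```c
#include <stdio.h>
#include <string.h>

/* Return cumulative received bytes for `ifname`, or -1 on failure;
 * sampling twice and differencing gives a usage rate to report. */
long long rx_bytes(const char *ifname)
{
    FILE *fp = fopen("/proc/net/dev", "r");
    char line[512];
    long long bytes = -1;

    if (!fp)
        return -1;
    while (fgets(line, sizeof line, fp)) {
        char *p = strstr(line, ifname);
        if (p && p[strlen(ifname)] == ':') {
            /* the first field after "ifname:" is the RX byte count */
            sscanf(p + strlen(ifname) + 1, "%lld", &bytes);
            break;
        }
    }
    fclose(fp);
    return bytes;
}
```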