
FSW Workshop 2011

RBSP Mission Ops Flight Software Simulator – Saving & Restoring Sessions

Christopher A. Monaco JHU/APL

[email protected] 240-228-5387


Overview

2 FSW-11 October 19-21, 2011

• Mission Operations tool (FAST) that simulates the state of the spacecraft and validates command sequences against operational constraints in “faster-than-real-time”.
• The platform for the flight component of the simulator is a quad-core x86 architecture running 64-bit SUSE Linux with the real-time preemptive patch.
• The simulator executes the RBSP flight software with targeted modifications, plus an additional simulation framework and models.
• It will be used on a daily basis and terminated between sessions.
• Technical challenges encountered with FAST: Endian, Time, and Save/Restore.

RBSP FSW Architecture

[Figure: RBSP FSW architecture block diagram. Blocks: Mission Applications (Apps), Mission Libraries, HW components, cFE, Platform Support Package (PSP), Operating System Abstraction Layer (OSAL).]

• FSW uses cFE middleware
• Event/message-based architecture
• Applications are loosely coupled
• FSW applications interface only with mission libraries (and cFE, PSP and OSAL)
• Mission libraries may interface with devices: EEPROM, SSR, interface card, etc.

Simulator FSW-Component Architecture

• Simulator also uses cFE, with a different OSAL and PSP for Linux and x86
  - Minor changes to cFE
  - Minor changes to applications
  - HW-independent (FSW) libraries are used by the simulator
  - HW-dependent libraries are replaced by libraries that emulate the HW
• Applications have no dependencies outside of cFE, OSAL, PSP and a few libraries

[Figure: simulator architecture block diagram. Blocks: Linux Workstation, Emulation & Simulation Libraries, Mission Applications.]

Save/Restore Requirements

• The state file needs to be human readable, in the event that key elements need to be modified.
  - Such modifications would be infrequent and made with the assistance of simulator developers.
• The restore process must be able to load state into a new release of the simulator.
  - Not necessarily releases that take new FSW. Releases of new FSW to the spacecraft (SC) would be accompanied by a CPU reset.
  - A minor change to a model should not prevent loading the simulator state.
• Constraint: the system has significant performance requirements, so the save/restore solution cannot impact the run-time performance of the system.

Off the Shelf Solutions

• Cryopid2
  - Open source application checkpointing software
  - Handles sockets, pipes, open files, data
  - Available via SourceForge
  - Only works for 32-bit kernels; we are using a 64-bit kernel
  - Checkpoint is NOT human readable/editable
• DMTCP – Distributed MultiThreaded Checkpointing
  - Handles sockets, pipes, open files, data
  - Available via SourceForge
  - Able to build and execute on our 64-bit Linux distro
  - Checkpoint is NOT human readable
  - No noticeable performance impact
  - Several warning messages during execution
  - Checkpoints are NOT compatible across minor builds: recompiling will invalidate a checkpoint
• Berkeley Lab Checkpoint/Restart (BLCR) – planning to look at this

Custom Solution

• The state file references internal data the same way the source code does.
  - Use the real variable names: human readable.
• Build the save/restore functions into the simulation software.
  - Use the compiler to translate the xxx.yyy variable mnemonic into the relocatable address.
  - Variable xxx.yyy in the state file (if the name does not change) can be restored across releases.
• Across minor releases of the simulator the FSW does not change. The simulator executive and models may change.
  - FSW variable names do not change, and FSW state can persist across minor releases of the simulator.
• The simulation has the functionality to pause, where each thread returns to a home state. This provides a good opportunity to perform save and restore operations.
• Challenge: How to identify all of the data in the system to save/restore?

Options

• Hand pick elements that are important to save and restore, and write code for these elements?
  - Very manual approach that needs to be revisited with each version of the SW.
  - Did we save everything that is important?
• Or try to save everything and hand pick elements that should NOT be saved/restored?
  - Did we exclude something that should be saved?
  - Can some things NOT be restored?
  - How do we save everything?
• Instead of choosing specific state information that we want to save/restore, let's list everything and then find the things that we know we don't want to (or cannot) save/restore.

How to Identify All of the Internal Data?

• The highest priority state information is contained within global data and function-static data.
• Intermediate stage of the compilation process: the Translation Unit (TU)
  - Generated by the compiler front end and passed on to the compiler; contains an abstract parse tree / abstract syntax tree for the entity being compiled
  - Can be dumped into an ASCII file using -fdump-translation-unit
  - The TU covers the entire context of the source file
    - All include files have been pulled in
    - Contains information about all variables and their types in the given context

Mining the Translation Unit

• Perl library GCC::TranslationUnit – Ashley Winters, 2003
  - Reads the translation unit output and stores it internally so it can easily be traversed
• Start at the root of the TU parse tree and follow the chained nodes
• Find all the variable declarations (var_decl); this gives all of the global variable declarations.

[Figure from: “GCC Front-End Internals”, Andi Hellmund, March 6 2011. Example declaration shown: int a;]

Complex Data Structures?

• Complex data structures are also described using a tree in the Translation Unit.
  - For a[i].b.c[j].e[k].f, everything ultimately terminates with a primitive type (leaf)
• We want the leaf items in each data structure and for every array entry.
  - So we need to un-nest complex data structures and ultimately unroll arrays.
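The un-nesting and unrolling described above can be sketched as a recursive walk over a type tree. This is only an illustration in C: the TypeNode structure, the walk function, and the example type are hypothetical stand-ins for the information found in the TU dump, not the Perl tooling actually used.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the type information in a TU dump. */
typedef struct TypeNode {
    const char *name;               /* variable or field name */
    int array_len;                  /* 0 = not an array, else element count */
    int n_fields;                   /* 0 = primitive leaf */
    const struct TypeNode *fields;  /* child fields when n_fields > 0 */
} TypeNode;

typedef void (*LeafFn)(const char *path, void *ctx);

/* Un-nest structures and unroll arrays, calling emit() once per leaf. */
void walk(const TypeNode *t, const char *prefix, LeafFn emit, void *ctx)
{
    char path[256];
    char child_prefix[256];
    int reps = t->array_len > 0 ? t->array_len : 1;

    for (int i = 0; i < reps; i++) {
        if (t->array_len > 0)
            snprintf(path, sizeof path, "%s%s[%d]", prefix, t->name, i);
        else
            snprintf(path, sizeof path, "%s%s", prefix, t->name);

        if (t->n_fields == 0) {
            emit(path, ctx);        /* primitive type: this is a leaf */
        } else {
            snprintf(child_prefix, sizeof child_prefix, "%s.", path);
            for (int f = 0; f < t->n_fields; f++)
                walk(&t->fields[f], child_prefix, emit, ctx);
        }
    }
}

void print_leaf(const char *path, void *ctx) { (void)ctx; puts(path); }
void count_leaf(const char *path, void *ctx) { (void)path; ++*(int *)ctx; }

/* Example type: struct { struct { int e[2]; int f; } b[3]; int g; } a; */
static const TypeNode b_fields[] = { {"e", 2, 0, NULL}, {"f", 0, 0, NULL} };
static const TypeNode a_fields[] = { {"b", 3, 2, b_fields}, {"g", 0, 0, NULL} };
const TypeNode a_type = { "a", 0, 2, a_fields };
```

Calling walk(&a_type, "", print_leaf, NULL) lists one line per leaf, starting with a.b[0].e[0] and ending with a.g.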

Input to the Code Generator

• The product of mining the translation unit output is a data file that contains:
  - The regular expression used in the C code to match a line in the state file with an internal C variable name
  - The actual variable name with the array component “[ ]” specifying the size of the array
    - foo_array[100].boo_array[50].leaf
  - The actual variable name with the array component filled in with loop indices starting with “i”
    - foo_array[i].boo_array[j].leaf
  - Signedness, primitive type, bit length, min and max values
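The slides do not show the exact form of the generated regular expressions. As a sketch under that caveat, the function below derives one plausible POSIX extended regular expression from a sized leaf name, replacing each “[size]” with a wildcard that captures the index. The function name leaf_name_to_regex and the pattern shape are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out (current length *n, capacity cap); -1 if it does not fit. */
static int append(char *out, size_t cap, size_t *n, const char *s)
{
    size_t len = strlen(s);
    if (*n + len + 1 > cap)
        return -1;
    memcpy(out + *n, s, len + 1);
    *n += len;
    return 0;
}

/* Build an anchored regex from a name such as
 * "foo_array[100].boo_array[50].leaf". Returns 0 on success, -1 on error. */
int leaf_name_to_regex(const char *name, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    if (append(out, cap, &n, "^") != 0)
        return -1;

    for (const char *p = name; *p != '\0'; p++) {
        if (*p == '[') {
            /* Replace "[<size>]" with a wildcard capturing the index. */
            if (append(out, cap, &n, "\\[([0-9]+)\\]") != 0)
                return -1;
            while (*p != '\0' && *p != ']')
                p++;
            if (*p == '\0')
                return -1;          /* unbalanced bracket */
        } else if (*p == '.') {
            if (append(out, cap, &n, "\\.") != 0)   /* literal dot */
                return -1;
        } else {
            char one[2] = { *p, '\0' };
            if (append(out, cap, &n, one) != 0)
                return -1;
        }
    }
    return append(out, cap, &n, "$");
}
```

A single pattern of this shape matches every element of the array, which is the idea behind using wildcards for array indices rather than one pattern per element.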

Output of the Code Generator - Save

• The code generator is run for each source file that has state that should be maintained.
• It produces a save function and a restore function specific to the particular file.
• The “Save” function serially writes each leaf name and value to a specified file:
    array_var[0].leaf = 6;
    array_var[0].another_leaf = 3;
    array_var[1].leaf = 7;
    array_var[1].another_leaf = 2;
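A generated save function for the example above might look roughly like the following. This is a hand-written sketch, not actual generator output: the struct layout, the initial values, and the name example_save are assumptions chosen to reproduce the four lines shown.

```c
#include <stdio.h>

/* Hypothetical global that the generator found in the translation unit. */
struct entry { int leaf; int another_leaf; };
struct entry array_var[2] = { {6, 3}, {7, 2} };

/* Serially write each leaf as "name = value;" and return the number of
 * leaves written successfully. The array loop is the unrolled index range. */
int example_save(FILE *out)
{
    int written = 0;

    for (int i = 0; i < 2; i++) {
        if (fprintf(out, "array_var[%d].leaf = %d;\n",
                    i, array_var[i].leaf) > 0)
            written++;
        if (fprintf(out, "array_var[%d].another_leaf = %d;\n",
                    i, array_var[i].another_leaf) > 0)
            written++;
    }
    return written;
}
```

Because the output uses the same names as the source code, a line in the state file can be read, and edited if necessary, without any knowledge of memory layout.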

Output of the Code Generator - Restore

• The “Restore” function takes the variable name string and the value string as arguments.
• It attempts to match the input string against regular expressions that correspond to the variables it knows how to restore.
  - Regular expressions are created for each leaf while mining the TU file
• When it finds a match, it performs any conversions/casts necessary and does the assignment.
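The match-convert-assign sequence can be sketched with the POSIX <regex.h> interface. The slides do not name the regex library used, so that choice, the function name example_restore, and the two-element array_var are all assumptions. For brevity the sketch compiles each pattern on every call; a real generator would more likely compile them once.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical global whose leaves this file knows how to restore. */
struct entry { int leaf; int another_leaf; };
struct entry array_var[2];

enum { LEAF, ANOTHER_LEAF, N_PATTERNS };

/* One pattern per leaf; the array index is a captured wildcard. */
static const char *patterns[N_PATTERNS] = {
    "^array_var\\[([0-9]+)\\]\\.leaf$",
    "^array_var\\[([0-9]+)\\]\\.another_leaf$",
};

/* Returns 1 if the named variable was restored, 0 otherwise. */
int example_restore(const char *name, const char *value)
{
    for (int p = 0; p < N_PATTERNS; p++) {   /* linear search over leaves */
        regex_t re;
        regmatch_t m[2];
        int hit;

        if (regcomp(&re, patterns[p], REG_EXTENDED) != 0)
            continue;
        hit = (regexec(&re, name, 2, m, 0) == 0);
        regfree(&re);
        if (!hit)
            continue;

        /* m[1] is the captured index; strtol stops at the closing ']'. */
        long idx = strtol(name + m[1].rm_so, NULL, 10);
        if (idx < 0 || idx >= 2)
            return 0;                        /* index out of range */

        int v = (int)strtol(value, NULL, 10);   /* conversion and cast */
        if (p == LEAF)
            array_var[idx].leaf = v;
        else
            array_var[idx].another_leaf = v;
        return 1;
    }
    return 0;                                /* unknown variable */
}
```

A name that matches no pattern is simply skipped, which is how a state file could still load after a variable has been removed in a later simulator release.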

What about Function-Static Data?

• Function-static data cannot be referenced by name outside of the function that declares it.
• Developed a C preprocessor that replaces each function-static declaration with a unique global declaration
  - <function_name>_<original_variable_name>
• Replace all references within the scope with the new name.
• Do this prior to the previously outlined process.
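As an illustration of the rewrite, the sketch below shows a made-up function before (in the comment) and after the transformation. The function next_sequence and its variable counter are hypothetical; only the <function_name>_<original_variable_name> naming rule comes from the slide.

```c
/* Before the rewrite (hypothetical FSW source, shown as a comment):
 *
 *   int next_sequence(void)
 *   {
 *       static int counter = 0;
 *       return ++counter;
 *   }
 */

/* After the rewrite: the function-static becomes a uniquely named global,
 * so it appears as a var_decl at file scope and can be saved and restored. */
int next_sequence_counter = 0;

int next_sequence(void)
{
    return ++next_sequence_counter;   /* reference renamed within the scope */
}
```

The behaviour of the function is unchanged, but a restore function can now assign next_sequence_counter by name.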

State File

[Figure: excerpt of a state file, annotated as follows.]
• Data structures are un-nested
• Arrays are unrolled
• Organized by source file; each source file maps to a single restore function

Coordinating Save/Restore

• The FSW is an event-driven system
  - All tasks and applications pend on queues, pipes or semaphores.
• We've modified the OSAL for the simulator to be aware of the simulator state: PAUSED, RT, FASTER_THAN_RT
• The system can be paused in one of two ways:
  1. During nominal execution: any attempt to get data on queues, pipes or semaphores, or to perform a delay operation, is blocked until the simulator returns from PAUSE.
  2. Pause the events that cause the system to run (data on queues, pipes, semaphores); this allows tasks and applications to return to a “home” state.
     - Used prior to save/restore operations.
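The first pause mechanism can be sketched with a mutex and condition variable that every blocking call checks before proceeding. This is not the modified OSAL itself: the names sim_set_state, sim_wait_while_paused and example_task are hypothetical, and the use of POSIX threads is an assumption based on the Linux platform.

```c
#include <pthread.h>
#include <stddef.h>

typedef enum { SIM_PAUSED, SIM_RT, SIM_FASTER_THAN_RT } sim_state_t;

static pthread_mutex_t sim_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sim_resumed = PTHREAD_COND_INITIALIZER;
static sim_state_t     sim_state   = SIM_RT;

/* Change the simulator state and wake any tasks held by a pause. */
void sim_set_state(sim_state_t s)
{
    pthread_mutex_lock(&sim_lock);
    sim_state = s;
    if (s != SIM_PAUSED)
        pthread_cond_broadcast(&sim_resumed);
    pthread_mutex_unlock(&sim_lock);
}

/* Would be called at the top of each blocking call (queue get, semaphore
 * take, delay): the caller is held here for as long as the state is PAUSED. */
void sim_wait_while_paused(void)
{
    pthread_mutex_lock(&sim_lock);
    while (sim_state == SIM_PAUSED)
        pthread_cond_wait(&sim_resumed, &sim_lock);
    pthread_mutex_unlock(&sim_lock);
}

/* Example task body: sets *arg to 1 once it gets past the pause gate. */
void *example_task(void *arg)
{
    sim_wait_while_paused();
    *(int *)arg = 1;
    return NULL;
}
```

A task started while the state is SIM_PAUSED stays inside sim_wait_while_paused until another thread calls sim_set_state with SIM_RT or SIM_FASTER_THAN_RT.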

Run-Time Performance

• Save operation takes < 1 second to generate 20 MB of ASCII state
• Initial implementation of the Restore process
  - Single task matching up an ASCII string to an internal variable using regular expressions
  - Linear search across all internal variables until a match is found
  - Each leaf (and each set of array indices) treated as a separate item to match against
  - Restore operation took > 1 hour
• Second implementation of the Restore process
  - Four tasks (quad-core processor); each data point can be restored independently
  - Segregate the state file by source file (and Save/Restore function): 85 source files
  - Still a linear search, but on 1/85 of the original domain
  - Arrays use wildcards for array indices
  - Restore operation takes 3 minutes on a 20 MB file

The Save/Restore Code Generator Apparatus (Courtesy of Rube Goldberg)

[Figure: flow diagram of the code generator build steps, run from within the makefile, which calls external scripts & tools.]
• (Step 1) Create a redirect.c that includes the file to be compiled. Output: redirect.c
• (Steps 2 & 10) Preprocess to get one big file. Output: Preprocessed file 1
• (Steps 3 & 4, 11 & 12) Parse: replace and rename function-static variables with globals. Output: Preprocessed file 3
• (Step 5) Compile and dump the translation unit. Output: Translation Unit
• (Step 6) Parse the TU, find all global data, get types, unroll all nests, get leaf type info. Output: Global data file
• (Step 7) Remove entries specified in the prebuilt removals list. Output: Global data file
• (Steps 8 & 9) Code generator. Output: <filename>_save_restore.c
• (Step 13) Compile, this time for the object file. Output: object file with save/restore functions

Hybrid Approach

• Considering using both the COTS and custom approaches
  - COTS process checkpointing could be used on a daily basis
  - The custom approach could be used to “dump state” to:
    - View it (human readable)
    - Carry state across minor releases of FSW
• Programmatically generate a core dump file during a save operation.
  - Can be used as a back-up if the COTS product fails, or if the custom approach does not contain all of the state information.
  - The core dump can be examined using gdb.
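The slides do not say how the core dump would be generated. One common POSIX technique, shown here purely as an assumption, is to fork the process and call abort() in the child, so the child's copy of the address space is dumped while the parent keeps running. The function name dump_core_snapshot is hypothetical.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that aborts immediately. Returns 0 if the child was
 * terminated by SIGABRT, -1 otherwise. Whether a core file is actually
 * written depends on the core size limit (ulimit -c) and on the system's
 * core dump configuration. */
int dump_core_snapshot(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        abort();    /* child: default SIGABRT action terminates and dumps core */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT) ? 0 : -1;
}
```

One caveat with this approach: fork() duplicates only the calling thread, so the resulting core contains the process memory (including global data) but not the stacks of the other running threads.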

Thank You

• Comments
• Questions