fault detection, isolation and recovery design for micro

35
Fault Detection, Isolation and Recovery Design for micro-satellite avionics The 9 th ESA Workshop on Avionics, Data, Control and Software System(ADCSS), October 20~22, 2015,ESA/ESTEC,Noordwijk, Netherlands Jiang Lianxiang Associate Head of the Department of microsatellite electronic Engineering, Shandong Aerospace Electronics Technology Institute, CAST, China [email protected]

Upload: others

Post on 02-Oct-2021

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fault Detection, Isolation and Recovery Design for micro

Fault Detection, Isolation and Recovery Design for micro-satellite avionics

The 9th ESA Workshop on Avionics, Data, Control and Software System(ADCSS), October 20~22, 2015,ESA/ESTEC,Noordwijk, Netherlands

Jiang Lianxiang Associate Head of the Department of microsatellite electronic Engineering, Shandong Aerospace Electronics Technology Institute, CAST, China [email protected]

Page 2: Fault Detection, Isolation and Recovery Design for micro

Agenda

Introduction

Background

FDIR design for microsatellites

Some ideas for further Research

Conclusion

-2-

Page 3: Fault Detection, Isolation and Recovery Design for micro

CAST Key Member

of CASC

Organization

CGWIC

ALIT

CSCC

Commercial Trading Company

China’s Sole Satellite Operator

Defense Trading Company

CALT

CASC

AASPT AALPT SCAIC CAAET

SAST

Page 4: Fault Detection, Isolation and Recovery Design for micro

15 subsidiaries in 7 cities 3 Representative Offices Abroad

Lan Zhou 1809 people

Tianjin 1257 people

Beijing

16036 people

Yantai 1196 people

Inner Mongolia 583peeople

Shen Zhen 133 people

CAST Xi’an 2672 people

CAST industrial footprint

Paris

Munich

St Jose

Page 5: Fault Detection, Isolation and Recovery Design for micro

Core Business

Space

Science Exploration

Navigation

Manned Spaceflight

Remote Sensing

Telecommunications

Spin-off

CAST

Satellites Application

Elelctronic Information

Systems

Civilian Engineering Systems and Equipment

Page 6: Fault Detection, Isolation and Recovery Design for micro

Space Segments

We design, build and deliver end-to-end space systems:

168 spacecrafts delivered

88 spacecrafts in orbit

Providing worldwide customers with full range of space to ground solutions

Page 7: Fault Detection, Isolation and Recovery Design for micro

Turnkey Solutions

Telecommunications Prime contractor for over 30 telecommunications satellites

• Consultation

• Orbit Frequency Coordination Support

• System Design & Integration

• Satellite Manufacture

• Launch and In-orbit Operation Support

• KHTT

Mature DFH platform series for

telecommunication, broadcast and

customized missions

Reliable and customized payloads

Page 8: Fault Detection, Isolation and Recovery Design for micro

Remote Sensing Core Player for over 80 Remote Sensing Satellites

Space to Ground turnkey solutions Versatile LEO/SSO/GEO satellites and constellations High performance and innovative bus & payloads Reliable ground application system

Page 9: Fault Detection, Isolation and Recovery Design for micro

The 1st generation: 4 satellites

The 2nd generation:

16 satellites offers regional services for Asia-Pacific area by 2012.

35 satellites will offer global coverage by 2020.

Beidou navigation satellite applications

Navigation Prime constructor of China’s Beidou Navigation Satellite System

Intelligent transportation

Disaster Relief

Emergency Command & Control

National security

Precise Timing

Page 10: Fault Detection, Isolation and Recovery Design for micro

Prime contractor of China’s Lunar & Mars Exploration Program

Space Science Exploration A prime role in China’s Space Science Exploration Missions by delivering over 15 satellites

CAST carries out lunar exploration by three steps. Orbiting Orbiting around the moon Landing Soft-landing on the moon surface Returning Sample collection of moon surface and returning to the Earth

Page 11: Fault Detection, Isolation and Recovery Design for micro

China’s Manned Space Program is implemented by three steps.

STEP 2

STEP 3

STEP 1

Manned Space Flight

EVA, Space lab rendezvous & docking

70-ton Space station

Human Space Flight Founder of China’s Manned Space Program Successfully launched 10 Shenzhou spaceships and 1 Space Lab, will build China’s first space station

Page 12: Fault Detection, Isolation and Recovery Design for micro

High performance orbit control and propulsion subsystems

More than 90% self-manufactured Optical Camera

All types of Antennas

Reliable Self-developed Robotic Arm

Spacecraft Subsystems and Equipment Reliable supplier of spacecraft subsystems and equipment 100% subsystem and more than 90% equipments design and manufactured by CAST

Page 13: Fault Detection, Isolation and Recovery Design for micro

Satellite Applications

Satellite Applications

Electronic Information Systems and Products

Civilian Engineering Systems and Equipment

Civilian Engineering Systems and Equipment

Electronic Information

Systems and Products

Space Technology Applications

Page 14: Fault Detection, Isolation and Recovery Design for micro

Global Partners of CAST

CNES ISS DLR ESA Eutelsat SSC

SES TAS ADS Turksat

MDA COM DEV ITL

ASAL NASRDA NARSS Nigcomsat

ABE ABAE INPE CONAE

SUPARCO Arabsat NTS BASARNAS LAPAN BAKAMLA GISTDA MOSTI KACST Supremesat

Page 15: Fault Detection, Isolation and Recovery Design for micro

Shandong Aerospace Electronics Technology Institute is one of 15 subsidiaries of CAST. It deal with the space data system application, integrated avionics, computer application、TM&TC、Power control and distribution is several key filed.

Part of products:

Power Control Unit

Transceiver Integrated RTU

current limit protector

Pneumatic discharge

pressure valve

TMR OBC

Introduction-what we do

Page 16: Fault Detection, Isolation and Recovery Design for micro

Introduction

Yantai city lies in Shandong province of China, it’s a seaside city. It is famous for its sea food, wine(Zhangyu) and fruits( apple, pear, cherry, etal.)

-16-

Page 17: Fault Detection, Isolation and Recovery Design for micro

Agenda

Introduction

Background

FDIR design for microsatellites

Some ideas for further Research

Conclusion

-17-

Page 18: Fault Detection, Isolation and Recovery Design for micro

Background

microsatellite widely used in communication, remote sensing, reconnaissance,

mobile internet, etc. more and more microsatellites are sending into space in

the recent years.

quicker, better, cheaper guidelines for microsatellites

more autonomy, operate with some intelligence, operation state self-

awareness ;

flying in formation or constellation, on-board task scheduling, becoming

more reliable and fault tolerant.

FDIR technology could provide fault(failure) detection, isolation and recovery

mechanism in time. It makes satellites know their health state, so the satellites

need less ground station’s intervention and control and lower the operation

cost. FDIR could provides the quick testability, which accelerates the on-board

test.

Here, I would like to share some experience with the FDIR design and

development for avionics of microsatellites. -18-

Page 19: Fault Detection, Isolation and Recovery Design for micro

Agenda

Introduction

Background

FDIR design for microsatellites

Some ideas for further Research

Conclusion

-19-

Page 20: Fault Detection, Isolation and Recovery Design for micro

FDIR design for microsatellites

4 levels kinds of fault are defined and divided as follows:

Level 0: model level fault, such as single event upset of SRAM, these

faults could be detected by the BIT design and recovered by inner

redundancy design.

Level 1: equipments’ level fault; it could be detected by the BIT design of

equipments;

Level 2: system level fault, including CMU hardware or software fault and

subsystem’s primary parameters abnormal;

Level 3:system safety level fault, such as primary power bus abnormal,

which will affect the satellites’ safety, these faults will be detected by direct

alert signals and generate reconfigurable instructions directly by the FDIR

hardware model.

-20-

Page 21: Fault Detection, Isolation and Recovery Design for micro

FDIR design for microsatellites

FDIR design architecture

-21-

BIT, Fault-redundant design , watch dog etal is widely used in the FDIR design for

Level 0 ~ Level 2.

Urgent !!!

EquipmentUNIT

Central Data Management Unit

Fatal Failures signal detection and recovery

Alert and Safe ModeLevel3

Software WatchDog

Hardware WatchDog

BusInterface N

BusInterface R

Level2 Vital Function Alarms & on-board computer integrated diagnosis

Satellite Distribute Interface UnitHealth status

check

BusInterface N

BusInterface R

Health status check N

Health status check R

Health status check

Self-fault tolerate mechanism

EquipmentUNIT

UNIT N UNIT N

Level 1

Level 0

Communication bus

Page 22: Fault Detection, Isolation and Recovery Design for micro

FDIR design for microsatellites

FDIR Level 3 design The functions of the FDIR module is as follows: (1)Detection of the system alarm signal, including discrete signal (such as OBC or RTU’s watch dog signal) and analog signal such as primary power bus, payload current etal. (2)Generate instructions for fault recovery when enabled, the instruction is reconfigurable. (3)Safe Guard Memory: store the important operation parameters of satellites avionics, such as uploading instructions, on-board time, when the backup OBC or RTU switches on, it can get the current operation parameters; (4)On-board timer, it works as the backup on-board timer; (5)Mass memory, monitor and store the CAN bus data, provide LVDS interface(10Mbps) to data transceiver. These data is very useful for ground engineer to analyze the satellites on-board operation state, especially for fault isolation.

-22-

Page 23: Fault Detection, Isolation and Recovery Design for micro

FDIR design for microsatellites

FDIR Level 3 design

Discrete signal 1

Fault detection

logicOBT

SGM

Analog signal 1

Comparator

Analog signal n

……

Discrete signal n

Threshold value 1

Threshold value n

…… Instruct

-ion decoder

Programmable instruction sequence

Instruction generator

Microprocessor

OC instru-ctiondriver

Alert signals

FPGAOscillator

ADC

CANCAN

Mass Memory(Flash)

Comparator

Flash controller

LVDS to data transceiver

-23-

Page 24: Fault Detection, Isolation and Recovery Design for micro

FDIR design for microsatellites

FDIR module performance: Processing ability: ≥2MIPS PROM:32KB EEPROM:32KB SRAM:32KB Alarm signal input channels:8 OC instruction output:16 Enable: controlled by instruction; SGM:256KB; Bus Interface: CAN 2.0B LVDS interface: 10Mbps On-board timer: Sec:32bit,SubSec:16bit Mass Memory: 2GB Mass:≤ 0.6kg Power:≤1.5W

-24-

Page 25: Fault Detection, Isolation and Recovery Design for micro

Built-in-test(BIT) helps to implement fault detection and isolation. I. Module level BIT

BIT should be able to carry out real-time monitoring of the key parameters of the product, including work voltage and current. Generate and send the corresponding fault code.

(1) Power up BIT ROM memory check

The boot program and the application program is stored by three copies, the three copies are compared with each other before running. SRAM memory check Inner voltage signal check

(2) Periodic BIT external interface monitoring, such as CAN, RS422, etal;

OBC monitors the periodic communication with other nodes SRAM memory check Inner voltage signal check

FDIR design for microsatellites

-25-

Digital cross m

Smart controller

Input signal

BIT input signal

Control signal

待测电路待测电路UUT

待测电路待测电路UUT

Communication bus

Signal measure circuit

Output signal

BIT output signal

Page 26: Fault Detection, Isolation and Recovery Design for micro

II. System Level BIT System level BIT is a fast system level test

when system integration or on-board test. System level BIT is mainly realized by the hierarchical design and based on the underlying module BIT design. The on-board computer act as the system level bit sponsors and organizers, and is triggered into test mode by the system level BIT instruction. In the test mode, OBC sends test instruction to other module and carry out the module level test one by one. The system BIT controller collects each module level test data and develop fault detection, localization and comprehensive diagnosis.

FDIR design for microsatellites

-26-

System level BIT Controller

Smart Comprehensive

diagnosis

Module level

Module level BIT Controller

UUT 1 UUT 2 UUT n……

Module level

Module level

Page 27: Fault Detection, Isolation and Recovery Design for micro

Agenda

Introduction

Background

FDIR design for microsatellites

Some ideas for further Research

Conclusion

-27-

Page 28: Fault Detection, Isolation and Recovery Design for micro

Some Ideas for Further Research

How to lower the false alarm rate during BIT design BIT is widely used in the aircraft and automobile avionics design, now we

employ BIT design in the on-board avionics. With the high reliability and rigorous space environment, We have to solve the false alarm problem.

Firstly, the reason for false alarm is summarized here:

Uncertainty of UUT Model

Environment stress

False Alarm

Detection Algorithm

Uncertainty of Measure Data

Several measures will be adopted in the following research:

FPGA Current

(mA

Total Dose: Krad(Si)

-28-

Page 29: Fault Detection, Isolation and Recovery Design for micro

Some Ideas for Further Research

I. Characteristic parameters selection Firstly, the equipment structure is divided

according to the operating principle and function, and the analysis layer is defined and FMEA of each module is carried out to get fault mode sets and fault attributes. Then the key monitoring component is selected. Secondly, Environmental stresses are analyzed based on environmental profile. With the help of prior information, such as test data, historical knowledge, and failure analysis, fault mechanism is analyzed and summarized. Finally, according to the failure physics model of components, Key Characteristic Parameters can be selected and its exact model could be setup.

-29-

Aerospace avionics

Function module division

Analysis layers definition

FMEA

Key monitor component selection

Monitor component 1

Monitor component 2

Monitor component n Test dataEnvironmental

stress measure

Fault mechanism analysisEnvironmental stresses analysis

Historical knowledge

Characteristic parameters 1

Characteristic parameters 2

Characteristic parameters n

Page 30: Fault Detection, Isolation and Recovery Design for micro

II. Fault detection methods: Character level information fusion is proposed to use in the fault detection process. Threshold check, varing trend character and statistical character is fused to make the

decision—fault or not.

Some Ideas for Further Research

-30-

Parameter 1 sampling

Parameter 2 sampling

Parameter n sampling

Varing trend character

Varing trend character

Varing trend character

Statistical character

Statistical character

Statistical character

Character level

Information fusion

UUT Fault Alert

sample rate varing

Recognize varing trend character

Recognize statistical character

Information fusion

Threshold

t0 t1

t0 t1

t0 t1

Time Stress Measurement

Time Stress effect analyze

Threshold

Threshold

Page 31: Fault Detection, Isolation and Recovery Design for micro

III. Fault Isolation methods: As we all know, faults can transmit from one to another, a system level fault may be accused by a sub-layer fault. If the system level fault is detected, we should find the initial fault that reduced the detected fault. Petri nets (PNs), coined by Carl Petri , are an adaptive, versatile, and yet simple graphical modeling tool for representing dynamic systems. PNs have successful applications in the reliability modeling of various systems. Petri-net is selected to model the fault relationship among faults and the transmitting character of fault. But, Petri-net couldn’t describe the transient state, which may cause false alarm. So we propose to add a Warning state in the Petri-net model to describe transient state.

Initial

Warning

Fault

Fire condition (Original Petri-net

Model)

Warning condition Fire

conditionFalse Alarm

Some Ideas for Further Research

The Proposed new Petri-net Model

p1

p3p2 t1

Petri-net Model

Place

Transition Place

-31-

Page 32: Fault Detection, Isolation and Recovery Design for micro

Some Ideas for Further Research

IV. Fault Recovery: Reconfiguration is one of the most important solution for fault recovery. But now, the reconfiguration is very limited for on-board avionics. Generally speaking, the on-board avionics is a distributed processing system. if the OBCs have powerful processing ability, the software component could be deployed on different OBC dynamically, which will provide flexible reconfiguration ability. The software component running on a failed OBC could be deployed on another one, so the failed OBC don’t affect the system’s function.

-32-

Processor module

1Fault

Processor module

2Fault

Software module

Backplaneinterconnect

The software module running in

processor 1 of equipment A

Software module

The software module running in

processor 2 of equipment A

Inner Reconfigure

Processor module

1Fault

Processor module

2Fault

Software module

The software module running in

processor 2 of equipment B

System level reconfigure

Equipm-ent A

Equipm-ent B

System Bus

Backplaneinterconnect

Page 33: Fault Detection, Isolation and Recovery Design for micro

Agenda

Introduction

Background

FDIR design for microsatellites

Some ideas for further Research

Conclusion

-33-

Page 34: Fault Detection, Isolation and Recovery Design for micro

In the development of the avionics for microsatellites, we proposed a hierarchical FDIR architecture and the FDIR hardware module was developed for the fatal fault detection and recovery. It also store the vital operation parameters for state recovery when the OBCs switch on or off. It also monitor and store the CAN bus data and provide a high speed channel to the transceiver, so that more data could be transmit down to ground-station. some ideas for the next following research is proposed: parameters selection based on FMEA character level information fusion for fault detection a new Petri-net model is proposed for transient state modeling reconfiguration based on software component dynamic deployment

Conclusion

-34-

Page 35: Fault Detection, Isolation and Recovery Design for micro

Thanks for attention!

[email protected]

-35-