Temporal Convolutional Network Based Regression Approach for Estimation of Remaining Useful Life


Rongze Li1†, Zhengtian Chu1†, Wangkai Jin2†, Yaohua Wang3, Xiao Hu3∗,
Abstract—Remaining Useful Life (RUL) is an essential factor in the Prognostics and Health Management (PHM) field. A reliable and accurate RUL estimation from condition monitoring data can maximize system performance and reduce maintenance costs. Recently, with a surge of interest in deep learning (DL) and the rise of computational power, many state-of-the-art neural networks have been introduced in the PHM field. However, the previously proposed networks have drawbacks in handling sequential tasks. For example, the widely used Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network suffer from long-term dependency and vanishing gradient problems. In this paper, we adopt the Temporal Convolutional Network (TCN), which excels in sequential data processing and avoids the potential problems shared by the aforementioned models. We have evaluated TCN on the C-MAPSS dataset from NASA to examine its performance in RUL estimation. Our experimental results show that TCN outperforms all previously proposed neural networks for RUL estimation, which indicates the potential of TCN applications in the PHM field.
Index Terms—Alternative to RNN, Prognostics Health Management, Remaining Useful Life, Temporal Convolutional Network.
I. INTRODUCTION
Prognostics and Health Management (PHM) is a discipline that mainly focuses on studying the failure mechanism of a system. Applying PHM methodologies to manufacturing/industrial systems can release the full potential of a system while guaranteeing its safety by spotting potentially faulty components at an early stage. By processing sensor data as input, PHM approaches can predict a system's or a component's remaining working
1 Rongze Li and Zhengtian Chu are with the University of Nottingham.
2 Wangkai Jin is with the University of Nottingham, Ningbo, China.
3 Yaohua Wang and Xiao Hu are with the National University of Defense Technology.
∗ The corresponding author is Xiao Hu, Email: [email protected].
† Work was done during an internship at the National University of Defense Technology.
time until failure. Examples of several implementations in industry are [1] and [2]. The predicted time refers to the remaining useful life (RUL), which is essential in fault detection and maintenance decision-making.
RUL already serves as one of the standard criteria in industry, and many experts have been striving to enhance the accuracy of RUL prediction in diverse scenarios. There are mainly three types of approaches to RUL estimation: model-based prognostics, data-driven prognostics, and hybrid approaches [3]. Model-based prognostics emphasize the implementation of physical models, which can be built at several levels (e.g., micro or macro). They perform compellingly in scenarios where the degradation mechanism is known and failure thresholds can be defined. However, time and cost are two obstacles for researchers seeking to implement or reproduce these models. Data-driven prognostics, which use sensor data to simplify implementation and lower costs, are applied more often in sequence learning. They can be further categorized into statistical and classical Machine Learning (ML) approaches. With a surge of interest in Deep Learning (DL), researchers have applied many state-of-the-art deep neural networks in the PHM field. The general advantages of DL over ML in data-driven approaches are: 1) deeper network architectures contribute to more precise feature extraction; 2) DL performs better in processing temporal data; 3) DL has an advanced ability to handle large amounts of high-dimensional data. Therefore, data-driven approaches have strength in predicting system dysfunction from run-to-failure data. Their predictions come with wider confidence intervals than those of model-based prognostics, reflecting uncertainty in prediction error, degradation changes, human operations, and so on. Hybrid approaches incorporate the advantages of both aforementioned prognostic types, and their practice is closer to real-world situations.
Recently, researchers have been exploring data-driven
approaches, especially DL-based approaches, to enhance prediction accuracy. Deep learning methods perform well in processing high-volume and high-dimensional Time Series Data (TSD). This type of data requires sequence modeling, and many deep learning networks are capable of it. However, these networks still face various drawbacks caused by flawed architecture design. Networks like CNN [4], which has strength in feature extraction, perform poorly at keeping time coherence. Networks like RNN [5] and LSTM [6] were proposed to solve that problem, but they have other problems such as vanishing gradients and longer execution times.
To address these challenges, this paper applies the Temporal Convolutional Network (TCN) [7] to RUL estimation. The TCN model incorporates the strengths of causal convolution, residual connections, and dilated convolution, following a convolutional neural network (CNN) framework for sequence modeling. It excels at avoiding vanishing/exploding gradient problems, speeding up training, and changing its receptive field flexibly. Experimental results on the C-MAPSS dataset provided by NASA [8] show the exceptional performance of the proposed work. A systematic study is performed to test TCN's effectiveness among DL-based approaches, and its results show that TCN achieves high fault-prognostics accuracy in less time.
The rest of the paper is organized as follows. Section II introduces the key features and evaluation of the TCN model; Section III presents the experimental study and evaluation of different network architectures on C-MAPSS data; Section IV presents discussions and future work; Section V reviews related work.
II. TEMPORAL CONVOLUTIONAL NETWORK FOR
RUL ESTIMATION
A. Brief Introduction of TCN
The convolutional neural network (CNN) is a classical neural network that is good at image processing thanks to its excellent feature-extraction capability. At present, CNN is widely used in many fields, such as face recognition, autonomous driving, and security. Nevertheless, there was no mature CNN model applied to timing problems until the advent of the temporal convolutional network (TCN) proposed by [7]. TCN has shown great ability in solving sequence problems, and it can serve as a better alternative to RNN/LSTM for such problems.
The following sections will illustrate its working princi- ples and main advantages.
B. The principle of TCN
Generally speaking, TCN has two main characteristics. First, it maintains a causal relationship between each layer of the network: the convolution output at a time step t is determined solely by the convolution results of steps before t. Thus, data and time coherence are better protected than with the limited historical storage and possible data loss of LSTM's memory cell. Second, the architecture of the model can be flexibly adjusted to any length and can be mapped to the interfaces required by the output, similar to the RNN framework. Compared with the traditional CNN structure, TCN adds four core parts to the design: sequence modeling, causal convolutions, dilated convolutions, and residual connections. This section introduces the architecture and working principle through these four parts.
1) Sequence Modeling: A simple sequence modeling task is used to illustrate the sequence modeling characteristics of TCN. Assume the input sequence i0, ..., iT is given, and the task requires predicting the specific outputs O0, ..., OT at every step; that is, the model should predict the corresponding output Ot at a particular time point t. The key constraint of sequence modeling is that the output at time t must be generated from exactly the recorded inputs up to time t rather than from post-positional information, following the sequence of the data flow. The one-to-one mapping from inputs to outputs of a sequence modeling network can be simply expressed as:
O0, ..., OT = f(i0, ..., iT ) (1)
After the prediction, it is necessary to establish a corresponding evaluation mechanism to assess the quality of the prediction results and control the whole training procedure. The C-MAPSS data set matches the demands of sequential tasks and is suitable for the TCN architecture due to the features of its sub-datasets.
Fig. 1. An example of causal convolutions.
2) Causal Convolutions: From the introduction of sequence modeling above, two principles of TCN can be summarized. First, the length of the output after model prediction always remains the same as the input length. Second, the TCN remains blind to 'future' information and always depends on the previous inputs to complete the prediction. To maintain the first principle, TCN utilizes the 1D fully-convolutional network (FCN) [10]. The core idea of the FCN is adopting zero-padding to guarantee that each layer keeps the same length as the input layer as the signal propagates through the network. As for the second principle, TCN utilizes causal convolutions to prevent future information leakage: the current output yT is predicted from the previous inputs x0, ..., xT and the previous layer's outputs y0, ..., yT−1 so that yT approximates the actual value.
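The two principles above can be sketched in a few lines of numpy: left-padding the input with k − 1 zeros keeps the output the same length as the input (the FCN property), and because the kernel only looks backward, no future value leaks into any output position. This is a minimal illustrative sketch, not the paper's actual implementation.

```python
import numpy as np

def causal_conv1d(x, w):
    """1-D causal convolution: output[t] depends only on x[t-k+1 .. t].

    Left-pads the input with k-1 zeros so the output keeps the
    input's length; w[0] multiplies the oldest position in the window.
    """
    k = len(w)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([np.dot(w, padded[t:t + k]) for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.5])          # kernel size k = 2
y = causal_conv1d(x, w)
# y[0] uses only x[0] (plus zero padding); no future value leaks in.
print(y)                          # [0.5 1.5 2.5 3.5]
```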
Fig. 2. An example of dilated convolutions.
3) Dilated Convolutions: Although the causal convolutional structure above prevents future information leakage, it increases the number of layers in the network while keeping extremely long historical information sequences. As Figure 1 shows, the marked output in the upper right corresponds to five perceptive fields (five black balls in the input sequence) and is obtained through five layers. The size of the receptive field thus grows only linearly with the depth of the network, which can burden the learning process. To simplify the network and relieve memory pressure, TCN applies dilated convolutions [11], forming an exponential correlation between the size of the receptive field and the number of layers [12]. The following equation demonstrates the principle:
F(s) = (x ∗d f)(s) = Σ_{i=0}^{k−1} f(i) · x_{s−d·i}    (2)
where d is the dilation factor, k is the filter size, and the index s − d · i means convolving only over former states. Here x is the input sequence and f : {0, ..., k−1} is the filter. The operation F is evaluated at position s by taking a fixed step d between every two adjacent filter taps. Figure 2 shows dilated convolutions with d equal to 1, 2, and 4 respectively; the network becomes dilated and skips over intermediate historical positions. Therefore, this method keeps a large perceptive field with fewer layers and simplifies learning tasks.
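Eq. (2) can be translated almost literally into code. The sketch below (illustrative only; out-of-range positions are treated as zero, matching causal zero-padding) shows how the same two-tap filter reaches 2 steps back with d = 1 but 5 steps back with d = 4:

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """F(s) = sum_{i=0}^{k-1} f[i] * x[s - d*i], with x[j] = 0 for j < 0.

    Each filter tap reaches d steps further back, so k taps cover a
    window of (k - 1) * d + 1 past positions.
    """
    k = len(f)
    out = np.zeros(len(x))
    for s in range(len(x)):
        for i in range(k):
            j = s - d * i
            if j >= 0:                # positions before the sequence are zero
                out[s] += f[i] * x[j]
    return out

x = np.arange(1.0, 9.0)               # [1 .. 8]
f = np.array([1.0, 1.0])              # k = 2
print(dilated_causal_conv(x, f, d=1))  # adjacent pairs: x[s] + x[s-1]
print(dilated_causal_conv(x, f, d=4))  # same k, 5-step reach: x[s] + x[s-4]
```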
4) Residual Connections: The shortcut path in ResNet [16] enables a model to learn difference information, effectively letting the network fall back to an identity mapping, which avoids the vanishing and exploding gradient problems in deep models. For TCN, if the model needs to record a large amount of historical information, the final receptive field can be vast and the network can become extremely deep. Hence, TCN adopts residual connections to cope with network depth. Each residual block consists of two dilated convolution layers with ReLU [17] activations and normalization; weight normalization is adopted in place of batch normalization. In addition, spatial dropout [18] is added after the activation function. An illustration of the detailed residual block construction is in Figure 3.
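The core of the residual block is the identity shortcut: the block computes a residual F(x) with two dilated causal convolutions and adds it back to the input. The minimal single-channel sketch below keeps only that structure; normalization and spatial dropout from Fig. 3 are deliberately omitted, and all weights are hypothetical:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def causal_conv1d(x, w, d=1):
    """Dilated causal convolution with implicit zero padding on the left."""
    k = len(w)
    out = np.zeros(len(x))
    for s in range(len(x)):
        for i in range(k):
            j = s - d * i
            if j >= 0:
                out[s] += w[i] * x[j]
    return out

def residual_block(x, w1, w2, d=1):
    """Two dilated causal convolutions plus a skip connection.

    The identity shortcut means output = relu(x + F(x)); even if the
    convolutions learn nothing useful, the input passes through intact.
    """
    h = relu(causal_conv1d(x, w1, d))
    h = causal_conv1d(h, w2, d)
    return relu(x + h)                # skip connection stabilizes deep stacks

x = np.array([1.0, -2.0, 3.0, 0.5])
y = residual_block(x, np.array([0.1, 0.2]), np.array([0.3, -0.1]), d=1)
print(y.shape)                        # (4,) - sequence length preserved
```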
Fig. 3. The profile of one residual block in TCN.
Figure 4 shows a sample residual block of TCN with kernel size 3 and dilation factor 1.
Fig. 4. A sample residual block of TCN with kernel size 3 and dilation factor 1.
C. Advantages and Disadvantages
This section demonstrates the strengths and weak- nesses of TCN.
1) Advantages: (1) Convolution operations can be conducted in parallel. Therefore, TCN can process sequences efficiently while preserving long-term memory in both training and validation.
(2) Stable gradients. TCN's back-propagation path differs from the temporal direction of the sequence, which avoids the exploding and vanishing gradient problems that deep RNNs face.
(3) The TCN can possess a sizeable perceptive field even with shallow layers. Therefore, TCN is more flexible in its memory size and easier to migrate to other fields.
(4) The TCN can accept any length of input sequence by sliding one-dimensional convolutional kernels. Therefore, it is flexible to be utilized on distinct tasks.
2) Disadvantages: (1) To maintain its long-term memory and generate the predicted result, the TCN needs to occupy more memory during the testing phase.
(2) When TCN migrates to different fields, the required history length and perceptive field differ. Hence, such migration could weaken the expressiveness of the TCN model.
III. EXPERIMENTS
A. Dataset Description
The NASA C-MAPSS data set selected for this experiment is widely used in research on remaining useful life prediction. It has four sub-datasets; each sub-dataset records a different number of turbofan engines performing under several operating conditions and fault modes. Each sub-dataset is also divided into a training set and a test set of multiple multivariate time series. Each row of the data set describes one engine within a single operating cycle. The first column is the engine ID; the second is the current operating cycle; columns 3-5 are three operational settings; and columns 6-26 record the 21 sensors' measurements [13].
B. Data Preprocessing
1) Feature Selection: There are 21 sensor measurements and three setting values to choose from. However, some of these values show no apparent fluctuation and contribute little to the experiment. To reduce the complexity of the experiment and shorten training time, this work discarded those values. For FD001, the following features were discarded: the three setting values and sensors 1, 5, 10, 16, 18, 19, leaving 15 features.
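A simple way to automate this kind of screening is to drop columns whose standard deviation is essentially zero. The sketch below uses a synthetic matrix standing in for the sensor data (the constant columns and the threshold are assumptions, not the paper's actual selection procedure):

```python
import numpy as np

# Hypothetical sensor matrix: rows are cycles, columns are features.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))
data[:, 1] = 1.0           # a flat channel with no fluctuation
data[:, 3] = 518.67        # another constant channel, like sensor 1 in FD001

# Keep only columns that actually vary; the threshold is an assumption.
keep = np.std(data, axis=0) > 1e-6
print(np.flatnonzero(keep))   # indices of columns worth keeping -> [0 2]
```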
2) Data Normalization: This paper normalized all the selected features to zero mean and unit variance through standard score normalization. Let µi be the mean of the i-th feature in the corresponding data set, σi the corresponding standard deviation, xi the data to be normalized, and xi′ the normalized result:

xi′ = (xi − µi) / σi    (3)
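Eq. (3) is the familiar z-score; a one-function sketch makes the zero-mean, unit-variance property explicit:

```python
import numpy as np

def zscore(column):
    """Standard score normalization, Eq. (3): (x - mean) / std."""
    mu = column.mean()
    sigma = column.std()
    return (column - mu) / sigma

x = np.array([2.0, 4.0, 6.0, 8.0])
x_norm = zscore(x)
print(x_norm.mean())   # ~0 after normalization
print(x_norm.std())    # ~1
```

Note that in practice µi and σi should be computed on the training set only and reused on the test set, so no test-set statistics leak into training.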
Fig. 5. Piece-wise RUL of C-MAPSS data set (Maximum limit RUL is 125 time cycles)
3) Training Label Calculations: If RUL is calculated directly from the current cycle time and the total cycle time, a linear relationship is obtained. This can be inaccurate: when the engine has just entered service, its performance degradation is negligible or even absent, and only after the engine has run for a certain time does a relatively clear downward trend appear. Therefore, using a linear relationship to calculate RUL will overestimate the RUL at the beginning of
engine use. This paper chose a piece-wise linear function [14] [15] to handle the training labels, as Figure 5 shows. The maximum limit of RUL was set to 125 time-cycles in this experiment.
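The piece-wise labeling amounts to clipping the linear RUL at the chosen maximum, as this small sketch shows:

```python
def piecewise_rul(total_cycles, max_rul=125):
    """Piece-wise linear RUL labels: capped at max_rul early in life,
    then decreasing linearly to 0 at failure (as in Figure 5)."""
    return [min(max_rul, total_cycles - cycle)
            for cycle in range(1, total_cycles + 1)]

labels = piecewise_rul(200)
print(labels[0])    # 125 (early cycles are capped at the plateau)
print(labels[100])  # 99  (linear decline after the knee)
print(labels[-1])   # 0   (failure)
```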
(a) FD001 example (b) FD002 example
(c) FD003 example (d) FD004 example
Fig. 6. Sample sensor data in four training datasets (sensor2, 3, 4, 12)
4) Data Visualization: Building an efficient and practical model requires the ability to discover potential patterns in data. Figure 6 shows, for each of the four datasets, how the values of example sensors fluctuate over the cycle time. Note that the value fluctuation of these sensors in FD001 and FD003 is much smaller than in FD002 and FD004. In FD002 and FD004, the operation setting values can be clustered into 6 clusters [14], which produces six different working conditions. When the engine is working, the operating condition switches continuously and fault modes may occur, which leads to significant fluctuations in the data and makes prediction more difficult. This phenomenon also implies that a traditional CNN cannot meet the requirements when building a training model for the C-MAPSS data set; a sequence model is needed to handle operating-condition switching and fault modes.
C. Experimental Network Architecture
TCN is the proposed network that has shown high
performance on sequence modeling tasks in diverse fields. To test the advantages introduced in Section II and to verify its feasibility in RUL estimation, TCN is applied to the C-MAPSS dataset and its performance is compared with that of other innovative network models.
• Long Short-Term Memory
LSTM is a representative and effective network of the RNN family, a group of networks famous for their strength in processing sequence modeling tasks. By introducing a memory cell to store historical data, LSTM can remember sequence information longer than a plain RNN, which mitigates the vanishing or exploding gradient problems to some extent.
• One-Dimensional Convolutional Neural Network
1DCNN is a typical CNN architecture widely applied in signal verification and natural language processing. Its critical advantage is that it works well for time series analysis because its one-dimensional kernel can fully extract the features of the input sequence by scanning it thoroughly from start to end.
• Deep Convolutional Neural Network
DCNN is also a convolutional neural network model that is good at feature extraction. It was first introduced for fault diagnosis and prognosis by [14]. Based on that pioneering work, a modified DCNN is used here for comparison with TCN.
TABLE I THE CONSTRUCTION DETAILS OF TCN LAYER.
Parameters Value
Number of filters 128
Padding ’causal’
Drop rate 0.5
Batch normalization True
Layer normalization True
2) Network Architecture Settings: Detailed network settings are introduced below. 1) The TCN layer is illustrated in Table I. The whole TCN structure utilizes three stacks of residual blocks; within each stack, the dilation factors are set to 1, 2, 4, 8, 16, and 32 respectively. Additionally, dropout, batch normalization and layer normalization were added to improve the generalization ability of the TCN model. The other three networks are exhibited in Figure 7. 2) The LSTM model has five hidden layers, comprising three LSTM layers and two fully connected layers, followed by a 1-dimensional output layer. 3) The 1DCNN network consists of five one-dimensional convolutional layers, three one-dimensional pooling layers, one flatten layer and two fully connected layers; normalization layers such as batch normalization and dropout layers were also added but are not shown in Figure 7. 4) The DCNN architecture includes six convolutional layers and three pooling layers, with a flatten layer and fully connected layers at the end of the network. All of these settings were obtained through repeated experiments yielding high-quality results. Each model was trained with the same learning rate (0.001), batch size (512) and number of epochs (200).
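The dilation schedule above implies a large receptive field even with few layers. A common receptive-field formula for a TCN whose residual blocks each hold two dilated causal convolutions is 1 + 2(k − 1) · Σd over all blocks; kernel size 3 is an assumption here, since Table I does not list it:

```python
def tcn_receptive_field(kernel_size, dilations, stacks=1):
    """Receptive field of a TCN where each residual block contains two
    dilated causal convolutions: 1 + 2 * (k - 1) * sum of all dilations.
    """
    return 1 + 2 * (kernel_size - 1) * sum(dilations) * stacks

dilations = [1, 2, 4, 8, 16, 32]                      # per stack, as above
print(tcn_receptive_field(3, dilations, stacks=3))    # 757 time steps
```

With three stacks and kernel size 3, the model can look 757 time steps back, far more than any C-MAPSS trajectory length, which is consistent with the claim that dilations give a large perceptive field cheaply.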
(a) DCNN (b) LSTM (c) 1DCNN
Fig. 7. Brief illustration of DCNN, LSTM and 1DCNN’s network architecture
D. Evaluation of Experimental Results
1) Evaluation Methodology:
RMSE = √( (1/n) Σ_{i=1}^{n} h_i² )    (4)
This paper chose the Root Mean Square Error (RMSE) for evaluating RUL estimation. It measures the deviation between the predicted value and the real value and is often used as the standard for measuring the prediction results of such models. In training, this work selected Mean Square Error (MSE) as the loss function. However, since MSE values often reach thousands or even tens of thousands, they are hard to interpret. RMSE, the square root of MSE, describes the model's performance more readably without affecting the results.
S = Σ_{t=1}^{n} (e^{−h_t/13} − 1), when h_t < 0
S = Σ_{t=1}^{n} (e^{h_t/10} − 1), when h_t ≥ 0    (5)
Moreover, a score function provided by [15] is also used as an evaluation metric for RUL estimation. Let n be the total number of samples in a test set and h_t = RÛL_t − RUL_t, i.e., the estimated RUL value minus the actual RUL value. For RMSE, the penalty curve is symmetric whether the model overestimates or underestimates RUL. Unlike RMSE, when the model overestimates RUL (h_t ≥ 0), the score's penalty rises faster than when h_t < 0. If the RUL is overestimated, the engine will keep running past the end of its safe work cycle, which may cause system failure and lead to more serious consequences. Therefore, a greater penalty is imposed in this case.
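Both metrics, Eq. (4) and Eq. (5), fit in a few lines; the example errors below make the asymmetry of the score visible (a late prediction of +10 cycles is penalized more than an early one of −10):

```python
import math

def rmse(h):
    """Root Mean Square Error over prediction errors h_i = RUL_hat - RUL."""
    return math.sqrt(sum(e * e for e in h) / len(h))

def phm_score(h):
    """Asymmetric score from [15]: overestimation (h >= 0) is punished
    more heavily (divisor 10) than underestimation (divisor 13)."""
    return sum(math.exp(-e / 13) - 1 if e < 0 else math.exp(e / 10) - 1
               for e in h)

errors = [-10, 0, 10]
print(rmse(errors))       # symmetric: the sign of each error does not matter
print(phm_score(errors))  # overestimating by 10 costs more than -10 does
```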
Fig. 8. Training loss curve of TCN in FD001
2) Experimental Results and Evaluation: Figure 8 displays the training loss curve. TCN reduces the loss sharply in the first 40 epochs, and the curve flattens after about 70 epochs. According to the metrics introduced above, a
TABLE II EXPERIMENTAL RESULTS FOR DIFFERENT LEARNING METHODS
Model    FD001           FD002            FD003            FD004
         RMSE   Score    RMSE   Score     RMSE   Score     RMSE   Score
TCN      11.58  195.1    14.67  1020      12.67  228.2     17.00  1810
LSTM     12.52  291.7    21.42  3897      13.54  347.3     24.21  5806
1DCNN    13.56  326.1    21.01  3800      13.75  378.2     22.72  4178
DCNN     14.41  357.4    23.74  4050      14.00  484.1     24.23  5293
DBN[9]   15.21  417.59   27.12  9031.64   14.71  442.43    29.88  7954.51
MLP[9]   16.78  560.59   28.78  14026.72  18.47  479.85    30.96  10444.35
SVM[9]   40.72  7703.33  52.99  316483.31 46.32  22541.58  59.96  141122.19
(a) Test unit 21 (b) Test unit 24
(c) Test unit 34 (d) Test unit 81
Fig. 9. Comparison between four engines life-time RUL prediction results and the actual RUL.
network with a lower score performs better in RUL estimation. In this experiment, TCN obtains the lowest score among all seven deep learning models. Table II displays the final RMSE values and scores. It is worth noting that TCN improves each dataset's score by about 33%, 73%, 34% and 68% compared with the scores achieved by the second-best model, LSTM. To gain insight into TCN's prognostics, four test units (numbers 21, 24, 34 and 81) are selected in Figure 9 to compare the predicted values with the actual RUL. Overall, the predictions on these four test units fit the real values well despite slight fluctuations over roughly 50 cycles of each unit's time series. From these four figures it is conspicuous that the predicted RUL is relatively close to the actual values, and the estimated values almost form a linear degradation following the curves of the actual values.
Moreover, the improvement of scores on each dataset
varies greatly. The scores on FD002 and FD004 leap dramatically compared to those on FD001 and FD003. The reason could be that FD002 and FD004 contain more densely fluctuating data, which TCN processes better than LSTM. Nevertheless, the overall scores on FD002 and FD004 are still much higher than those on FD001 and FD003, likely because the larger and more complex data in FD002 and FD004 make training more difficult. The following section demonstrates detailed comparisons between these networks.
3) Comparison between networks: In this section, the experimental results will be evaluated and analyzed in the following two perspectives.
• Comparison between 1DCNN and DCNN
As can be observed from Table II, the RMSE results of 1DCNN are about 6%, 11%, 2%, and 6% lower on the four sub-datasets than those of DCNN. Simultaneously, the scores improve by about 9%, 6%, 22%, and 21% respectively. 1DCNN slightly improves on DCNN, which indicates that the full sliding of one-dimensional kernels can learn sequential features to some extent. Hence, 1DCNN can also be a choice for some simple sequence modeling tasks.
• Comparison between TCN and 1DCNN
Although 1DCNN has shown strength in processing sequential data, Table II shows that room for further improvement remains. As TCN applies one-dimensional convolutions in its architecture, the comparison between TCN and 1DCNN indicates the effectiveness of dilated convolutions and residual connections.
• Comparison between TCN and RNN This section will compare the TCN with the RNN according to each network’s experimental results and principles.
It is worth mentioning that the comparative trial was conducted to verify whether TCN can replace RNN in the PHM field. According to Table II, the RMSE results of TCN are about 8%, 32%, 6%, and 30% lower on the four sub-datasets than those of LSTM. Simultaneously, the scores improve by around 33%, 73%, 34%, and 68% respectively. The improvement is dramatic for FD002 and FD004 but modest for FD001 and FD003. This is again caused by the differing data volatility in each data set: FD002 and FD004 pose more difficult sequence modeling tasks. The results show that TCN has clear advantages in processing sequence modeling tasks. The reasons for RNN's shortcomings are explained below. RNN has two main problems: unstable gradients and non-parallelism. To address the first, TCN applies residual connections that pass information through useful blocks and skip useless ones, allowing deeper networks. An experiment was designed to compare gradient stability in the TCN and RNN models: in the testing procedure, fluctuations in the test loss reflected changes in stability. Based on the FD002 training task, the network depth was increased by adding residual blocks to the TCN model or LSTM layers to the LSTM model. Figure 10 shows the negative impact of the unstable gradient. For LSTM, the test loss kept decreasing up to three LSTM layers; gradient vanishing occurred at five layers, and the trained model's accuracy declined. For TCN, the results became more accurate as the number of blocks increased, which means the residual connections effectively stabilize gradients. Second, non-parallelizability dramatically slows down the training of the LSTM network and consumes too many computing resources; this limitation also reduces the expressive ability of the LSTM model.
Nevertheless, TCN can perform parallel computing and maintain low memory consumption during training thanks to causal convolution. Overall, the experiment provides a new idea and inspiration for the application of TCN and shows that TCN is promising to break the monopoly of RNN, and even replace it, in more research and industrial fields.
Fig. 10. The change of test loss with increased residual blocks for TCN and LSTM layers for LSTM based on FD002 data set.
IV. DISCUSSION AND FUTURE WORK
This work demonstrates significant advantages of TCN for remaining useful life estimation. Compared with the performance of other network architectures such as RNN, CNN and LSTM when trained on C-MAPSS, TCN has lower training loss and a faster loss convergence rate. These advantages deserve wider use in the PHM field and extension to other fields.
A. Extension on the Application of TCN on PHM
In addition to fault detection of mechanical components such as engines, TCN has broad application prospects in other PHM areas. For example, in the signaling systems of railway transit, train delays caused by failures of trackside equipment and the signaling system occur frequently [19]. To detect such faults in time, a TCN model can be built from the various kinds of data recorded before a fault occurs, including external environmental conditions such as temperature and humidity, parameters of trackside components, and signal fluctuations of the system. Exploring the latent relationships among these data may improve the accuracy of fault detection. Moreover, TCN can also be used to predict the health of intelligent machines.
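One plausible way to arrange such heterogeneous sensor readings into model inputs is a sliding-window construction. This is a hypothetical sketch, with invented channel names, of how the multivariate series described above could be batched for a TCN:

```python
import numpy as np

def sliding_windows(readings, window):
    """Arrange multivariate sensor readings of shape (T, n_sensors)
    into overlapping windows of shape (T - window + 1, window,
    n_sensors); each window is one input sequence for the model."""
    T = readings.shape[0]
    return np.stack([readings[t:t + window] for t in range(T - window + 1)])

# e.g. six time steps of three channels: temperature, humidity,
# and a trackside signal level (synthetic values for illustration)
readings = np.random.rand(6, 3)
batch = sliding_windows(readings, window=4)
print(batch.shape)  # (3, 4, 3)
```

Each window would then be labeled according to whether a fault followed it, turning the fault-detection problem into supervised sequence classification.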
B. TCN in Other Fields
Because TCN handles sequence modeling problems, it can be applied to speech processing, language modeling, and time series prediction. Bai et al. [7] showed that TCN is more accurate and faster on many language modeling and music modeling tasks, such as the text8 dataset [20] and JSB Chorales [21]. At the same time, TCN, as a descendant of the CNN model, can also achieve high-quality results in computer vision. For example, in sign language translation there is no explicit mapping between sign language actions and text words when expressing sentence meaning. A TCN model can not only capture the actions in hierarchical views but also learn correlations between adjacent features, reducing the difficulty and improving the accuracy of the task [22]. Future experiments will build on this model and continue to improve it to be compatible with work in other fields.
V. RELATED WORK
In addition to the work mentioned above, some related work in the RUL estimation realm is introduced here. The combined use of an enhanced deep LSTM and Gaussian Mixture Models (GMMs) was introduced by M. Sayah [23] in 2021. A novel prediction architecture [24], also taking LSTM as a basic component, was proposed by R. Guo in 2021; their prediction scheme combines empirical mode decomposition (EMD) with LSTM. Similarly, Yang et al. [25] combined LSTM with a genetic algorithm to predict remaining useful life, naming the integration GAPLS-LSTM. In short, many researchers used LSTM as the essential part of their prediction architectures, and the multi-algorithm combination was what distinguished one work from another. There is also work built on algorithms other than LSTM. For instance, J. Li [26] chose a DCNN combined with Bayesian optimization and adaptive batch normalization (AdaBN), and W. Kang [27] attempted a remaining life prediction method based on fuzzy evaluation-Gaussian process regression (FE-GPR). In this way, a wide variety of methods have been applied to the remaining useful life prediction problem.
VI. CONCLUSION
This work adopted the temporal convolutional network (TCN) to predict turbofans' RUL based on the C-MAPSS Dataset. As described, the TCN design rests on four core parts: sequence modeling, causal convolutions, dilated convolutions, and residual connections. It is good at learning sequential features of the data and retaining long historical information. To verify the effectiveness of the TCN, this paper selected three other networks (LSTM, 1DCNN, and DCNN) and trained them on the same input data. The experimental results showed that the TCN model is more accurate than the other network models, which indicates that migrating the TCN to the PHM field could have a positive impact. Moreover, the analysis of the results has shown TCN's potential to replace the RNN for sequence modeling tasks, thanks to its stable gradients and parallel computing capability. The current research and experiments preliminarily illustrate significant advantages of TCN in remaining useful life estimation.
VII. ACKNOWLEDGMENT
We thank all members of the User-Centric Computing Group and the reviewers of ICPHM 2021 for their valuable suggestions and feedback. This research is supported by the Science and Technology Planning Project of Hunan Province (2019RS2027).
REFERENCES
[1] K. Jamali, Achieving reasonable conservatism in nuclear safety analyses, Reliab. Eng. Syst. Saf. 137 (2015) 112–119.
[2] J. Park, W. Jung, A systematic framework to investigate the coverage of abnormal operating procedures in nuclear power plants, Reliab. Eng. Syst. Saf. 138 (2015) 21–30.
[3] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, D. Siegel, Prognostics and health management design for rotary machinery systems—Reviews, methodology and applications, Mech. Syst. Signal Process. 42 (2014) 314–334.
[4] G. S. Babu, P. Zhao, and X.-L. Li, “Deep convolutional neural network based regression approach for estimation of remaining useful life,” in International Conference on Database Systems for Advanced Applications. Springer, 2016, pp. 214–228.
[5] I. Sutskever, J. Martens, and G. E. Hinton, “Generating text with recurrent neural networks,” in Proc. 28th Int. Conf. Mach. Learn., 2011, pp. 1017–1024.
[6] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[7] S. Bai, J. Z. Kolter, and V. Koltun, "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling," arXiv:1803.01271 [cs.LG].
[8] E. Ramasso, A. Saxena, Review and analysis of algorithmic approaches developed for prognostics on CMAPSS dataset, in: Conference of the Prognostics and Health Management Society, 2015.
[9] C. Zhang, P. Lim, A.K. Qin, K.C. Tan, Multiobjective deep belief networks ensemble for remaining useful life estimation in prognostics, IEEE Trans. Neural Netw. Learn. Syst. 28 (2017) 2306–2318.
[10] Shelhamer E, Long J, Darrell T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39(4):640-651. doi:10.1109/TPAMI.2016.2572683
[11] Oord, Aaron van den, et al. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016).
[12] Fisher Yu and Vladlen Koltun, Multi-Scale Context Aggregation by Dilated Convolutions. arXiv:1511.07122 [cs.CV]
[13] E. Ramasso and A. Saxena, "Performance benchmarking and analysis of prognostic methods for cmapss datasets." International Journal of Prognostics and Health Management, vol. 5, no. 2, pp. 1–15, 2014.
[14] G. S. Babu, P. Zhao, and X.-L. Li, “Deep convolutional neural network based regression approach for estimation of remaining useful life,” in International Conference on Database Systems for Advanced Applications. Springer, 2016, pp. 214–228.
[15] F. O. Heimes, “Recurrent neural networks for remaining useful life estimation,” in Prognostics and Health Management, 2008. PHM 2008. International Conference on. IEEE, 2008, pp. 1–6.
[16] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
[17] Nair, Vinod and Hinton, Geoffrey E. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
[18] Srivastava, Nitish, Hinton, Geoffrey E, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1), 2014.
[19] Oyebande, B.O.; Renfrew, A.C.: 'Condition monitoring of railway electric point machines', IEE Proceedings - Electric Power Applications, 2002, 149, (6), p. 465-473, DOI: 10.1049/ip-epa:20020499
[20] Mikolov T, Sutskever I, Deoras A, et al. Subword language modeling with neural networks[J]. preprint (http://www.fit.vutbr.cz/imikolov/rnnlm/char.pdf), 2012, 8: 67.
[21] Allan, Moray, and Christopher Williams. "Harmonising chorales by probabilistic inference." Advances in neural information processing systems. 2005.
[22] Guo D, Wang S, Tian Q, et al. Dense Temporal Convolution Network for Sign Language Translation[C]//IJCAI. 2019: 744- 750.
[23] Sayah, M., Guebli, D., Noureddine, Z. et al. Deep LSTM Enhancement for RUL Prediction Using Gaussian Mixture Models. Aut. Control Comp. Sci. 55, 15–25 (2021). https://doi.org/10.3103/S0146411621010089
[24] R. Guo, Y. Wang, H. Zhang and G. Zhang, "Remaining Useful Life Prediction for Rolling Bearings Using EMD-RISI-LSTM," in IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1-12, 2021, Art no. 3509812, doi: 10.1109/TIM.2021.3051717.
[25] Yang, K, Wang, Y, Yao, Y-n, Fan, S-d. Remaining useful life prediction via long-short time memory neural network with novel partial least squares and genetic algorithm. Qual Reliab Eng Int. 2021; 37: 1080– 1098. https://doi.org/10.1002/qre.2782
[26] J. Li and D. He, "A Bayesian Optimization AdaBN-DCNN Method With Self-Optimized Structure and Hyperparameters for Domain Adaptation Remaining Useful Life Prediction," in IEEE Access, vol. 8, pp. 41482-41501, 2020, doi: 10.1109/ACCESS.2020.2976595.