incomingwork-in-progresspredictioninsemiconductor...

17
Research Article Incoming Work-In-Progress Prediction in Semiconductor Fabrication Foundry Using Long Short-Term Memory TzeChiangTin, 1,2 KangLengChiew , 1 SiewCheePhang, 2 SanNahSze, 1 andPeiSanTan 2 1 Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia 2 X-FAB Sarawak Sdn. Bhd., 1 Silicon Drive, Sama Jaya Free Industrial Zone, 93350 Kuching, Sarawak, Malaysia Correspondence should be addressed to Kang Leng Chiew; [email protected] Received 19 September 2018; Revised 22 November 2018; Accepted 12 December 2018; Published 2 January 2019 Academic Editor: Paolo Gastaldo Copyright©2019TzeChiangTinetal.isisanopenaccessarticledistributedundertheCreativeCommonsAttributionLicense, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Preventive maintenance activities require a tool to be offline for long hour in order to perform the prescribed maintenance activities. Although preventive maintenance is crucial to ensure operational reliability and efficiency of the tool, long hour of preventive maintenance activities increases the cycle time of the semiconductor fabrication foundry (Fab). erefore, this activity is usually performed when the incoming Work-in-Progress to the equipment is forecasted to be low. e current statistical forecasting approach has low accuracy because it lacks the ability to capture the time-dependent behavior of the Work-in- Progress. In this paper, we present a forecasting model that utilizes machine learning method to forecast the incoming Work-In- Progress. Specifically, our proposed model uses LSTM to forecast multistep ahead incoming Work-in-Progress prediction to an equipment group. e proposed model’s prediction results were compared with the results of the current statistical forecasting method of the Fab. e experimental results demonstrated that the proposed model performed better than the statistical forecasting method in both hit rate and Pearson’s correlation coefficient, r. 1. Introduction In semiconductor manufacturing, preventive maintenance (PM) is an activity that takes the entire tool offline to carry out prescribed maintenance activity in order to maintain or increase the operational efficiency and reliability of the tool and minimizes unanticipated failures due to faulty parts [1]. However, PM downtime can be costly because it takes significantly long hours. If there are insufficient back-up tools to process the incoming Work-in-Progress (IWIP) when the tool is taken offline for PM activities, a WIP bottleneck situation will be created which affects the linearity of the WIP distribution in the line. Reducing cycle time is one of the main goals to ensure on-time-delivery to the customers, while ensuring that the wafers have good yields. us, it is necessary to do proper PM planning to minimize cycle time impact while ensuring the tool is operational reliable. To achieve this goal, PM should be done when the tool group has low IWIP. However, the IWIP to a tool group has high variations as it is influ- enced by the conditions of the tools supplying the WIP to it, and various lots dispatching decision that changes dy- namically every day. In this paper, we present a multistep univariate IWIP prediction model to forecast the IWIP to a particular tool group in a semiconductor fabrication foundry (Fab) for the next seven days. We predict seven days ahead in this study as a requirement from the Fab. e problem domain is based on X-Fab Sarawak Sdn. Bhd., which has been abbreviated as the Fab. Long Short-Term Memory (LSTM) recurrent neural network is used in the prediction model to learn the his- torical incoming WIP pattern of the tool group to predict the future incoming WIP pattern of that tool group. LSTM has been used in various research areas such as traffic flow prediction [2], log-driven information technology system failure prediction to discover long-range structure in his- torical data [3], gesture recognition [4], voice conversion [5], and aircraft engines excess vibration events predictions [6]. Hindawi Computational Intelligence and Neuroscience Volume 2019, Article ID 8729367, 16 pages https://doi.org/10.1155/2019/8729367

Upload: others

Post on 25-Mar-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

Research ArticleIncoming Work-In-Progress Prediction in SemiconductorFabrication Foundry Using Long Short-Term Memory

Tze Chiang Tin12 Kang Leng Chiew 1 SiewChee Phang2 SanNah Sze1 and Pei San Tan2

1Faculty of Computer Science and Information Technology Universiti Malaysia Sarawak 94300 Kota SamarahanSarawak Malaysia2X-FAB Sarawak Sdn Bhd 1 Silicon Drive Sama Jaya Free Industrial Zone 93350 Kuching Sarawak Malaysia

Correspondence should be addressed to Kang Leng Chiew klchiewunimasmy

Received 19 September 2018 Revised 22 November 2018 Accepted 12 December 2018 Published 2 January 2019

Academic Editor Paolo Gastaldo

Copyright copy 2019 Tze Chiang Tin et al-is is an open access article distributed under the Creative CommonsAttribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Preventive maintenance activities require a tool to be offline for long hour in order to perform the prescribed maintenanceactivities Although preventive maintenance is crucial to ensure operational reliability and efficiency of the tool long hour ofpreventive maintenance activities increases the cycle time of the semiconductor fabrication foundry (Fab) -erefore this activityis usually performed when the incoming Work-in-Progress to the equipment is forecasted to be low -e current statisticalforecasting approach has low accuracy because it lacks the ability to capture the time-dependent behavior of the Work-in-Progress In this paper we present a forecasting model that utilizes machine learning method to forecast the incoming Work-In-Progress Specifically our proposed model uses LSTM to forecast multistep ahead incoming Work-in-Progress prediction to anequipment group -e proposed modelrsquos prediction results were compared with the results of the current statistical forecastingmethod of the Fab -e experimental results demonstrated that the proposed model performed better than the statisticalforecasting method in both hit rate and Pearsonrsquos correlation coefficient r

1 Introduction

In semiconductor manufacturing preventive maintenance(PM) is an activity that takes the entire tool offline to carryout prescribed maintenance activity in order to maintain orincrease the operational efficiency and reliability of the tooland minimizes unanticipated failures due to faulty parts [1]However PM downtime can be costly because it takessignificantly long hours If there are insufficient back-uptools to process the incoming Work-in-Progress (IWIP)when the tool is taken offline for PM activities a WIPbottleneck situation will be created which affects the linearityof the WIP distribution in the line

Reducing cycle time is one of the main goals to ensureon-time-delivery to the customers while ensuring that thewafers have good yields -us it is necessary to do properPM planning to minimize cycle time impact while ensuringthe tool is operational reliable To achieve this goal PMshould be done when the tool group has low IWIP However

the IWIP to a tool group has high variations as it is influ-enced by the conditions of the tools supplying the WIP to itand various lots dispatching decision that changes dy-namically every day

In this paper we present a multistep univariate IWIPprediction model to forecast the IWIP to a particular toolgroup in a semiconductor fabrication foundry (Fab) for thenext seven days We predict seven days ahead in this study asa requirement from the Fab -e problem domain is basedon X-Fab Sarawak Sdn Bhd which has been abbreviated asthe Fab Long Short-TermMemory (LSTM) recurrent neuralnetwork is used in the prediction model to learn the his-torical incomingWIP pattern of the tool group to predict thefuture incoming WIP pattern of that tool group LSTM hasbeen used in various research areas such as traffic flowprediction [2] log-driven information technology systemfailure prediction to discover long-range structure in his-torical data [3] gesture recognition [4] voice conversion [5]and aircraft engines excess vibration events predictions [6]

HindawiComputational Intelligence and NeuroscienceVolume 2019 Article ID 8729367 16 pageshttpsdoiorg10115520198729367

To the best of our knowledge LSTM has not yet been appliedin Fab to predict IWIP Hence the application of LSTM inour work to perform IWIP prediction is novel -e con-tributions of the proposed model are summarized as follows

(i) A machine learning-based approach to predict theincomingWIP for a tool group of interest in the FabSpecifically LSTM recurrent neural network is usedas the machine learning algorithm to predict theincoming WIP

(ii) A simplified prediction model that is capable ofmodeling the dynamic environment of the Fab anddelivers higher prediction accuracy than the Fabrsquosbaseline method

-e remainder of the paper is organized as followsSection 2 introduces the related works on-time seriesforecasting and their limitations Section 3 presents theproposed framework of this research Section 4 describes theexperimental setup presents the results and discusses themajor findings of this research Section 5 highlights thecontribution of the proposed framework and concludes thispaper

2 Literature Review

21 Research Background In the research domain of fore-casting in semiconductor manufacturing majority of theresearch works focus on forecasting the cycle time of the FabFor instance Wang et al [7] proposed an automatic factorselection to improve the prediction accuracy of cycle time(CT) forecasting of wafer lots-e authors presented that theANN input factors for the CT forecasting model in the pastresearch works are either selected manually or empiricallywhich seems arbitrary and unreliable to select input factorssince it depends on artificial experience Another approachto input selection is accomplished by selecting factors thatrepresent condition of the Fab as a whole -e examples ofthese factors are the average queuing time of each wafer inthe Fab the average number of process steps completed perday and the total wafers currently in the production lineAccording to the author such approach is too complex torepresent the wafer flow of the Fab since it is influence by theinteractions among the equipment properties wafer prop-erties product mix and production control policies Hencein the authorsrsquo work input factor selection are accomplishedthrough analyzing the collected data without artificial ex-perience to improve the accuracy scalability and compre-hensibility of the CT forecasting models

In another research Wang et al [8] attempted short-term cycle time forecasting in reentrant manufacturingsystems According to the authors most previous studiesfocus on estimating the whole CT in long-term time scaleswhich predicted the output time at the at the moment thewafers enter the production stage However considering thelong production cycle (60000ndash90000 minutes) and dy-namic production environment such long-term predictionhardly meets the needs of decision making in productioncontrol Based on the long-term forecasting comparison theauthors justified that CT forecasting in short time scales can

provide more timely advice on production control such asrebalancing work in process changing dispatching rulesand job prioritizations

Scholl et al [9] presented their work on an imple-mentation of a simulation based short-term lot arrivalforecast in a mature 200mm semiconductor Fab conductedin Infineon Technologies Dresden -e authorsrsquo workforecasts the lot arrival to the defect density measurement(DDM) work centre in Infineon Technologies -e problemdomain of the authorsrsquo research is similar to this researchwhere the intention of the lot arrival forecast is to avoid PMactivities when the WIP is expected to be high However theproposed model of Scholl et al is very specific to InfineonTechnologies and is impractical to apply in other Fab-is ismainly due to the fact that the core of the simulation engineused in the research work to perform the arrival forecastproprietary simulation engine In addition operatingmethods and products modelled are specific to InfineonTechnologies which differs from other Fab It is also im-portant to note that Infineon Technologies is a Fab withfully-automated wafer transportation system using roboticsystems as oppose to human operators in other Fabs -e lotarrival time of a fully-automated wafer delivery systembetween each operation steps is highly consistent comparedwith human-operated delivery system -ese factors madethe comparison difficult across other Fabs that do not havethe same system and facilities

Another similar research work was done by Mosinskiet al [10] In this research the authors focused on the dailydelivery predictions and a bottleneck early warning systemfor several machine groups in the Fab of their projectpartner -e forecast horizon is up to 14 days According tothe authors the prediction is based completely on statisticsextracted from historical lot data traces -e research usesthe Alternative Forecast Method (AFM) which uses just oneexclusive data source to extract the detailed historical lotmovement information

-e forecast elements of the authorsrsquo research consists ofthree main stages which are data collection statistics gen-eration and forecast calculation -e data collection stagecollects all lots that are currently Work-In-Progress (WIP)-e statistics generation step aims to calculate the cycle timefrom the start operation step A to target operation step BEach lotrsquos delivery time is estimated at the forecast stage-eforecast calculation is based on a statistical evaluation of theduration between A and B from historical data Since thetime interval differs for different products lot grouping rulesbased on product characteristics need to be applied In orderto improve the forecast accuracy another lot classificationstep is required to group the lots based on specific predefinedlot attributes such as lot priority or the lotrsquos tardiness

In the statistical generation step the weightage used inthe authorsrsquo model are Fab specific and hence should bedefined upfront if the model is being used in a different fabIn addition various manual data sanitization steps are re-quired in order to remove outliers in the data In additionthe regeneration of the cycle time statistics is resource in-tensive especially with large number of lots involvementSuch limitation makes this model impractical for most Fab

2 Computational Intelligence and Neuroscience

with large number of lots Proper lot classification is nec-essary to ensure that the cycle time statistics generated arerelevant In addition special software for lot scheduling isalso required to generate the relevant data

Due to the lack of similar research works done in thedomain of semiconductor fabrication a cross-reference tosimilar research problem in a different domain is necessaryFrom the literature reviews vehicle traffic arrival forecastingexhibits the closest similarity to the forecast ofWIP arrival inthe fab Consider the comparison of the following twomodels presented in Figures 1 and 2 Figure 1 presents thescenario geometry of a typical traffic arrival modelled byLarry [11] while Figure 2 presents a typical WIP arrivalscenario to an equipment group in a fab

In Figure 1 dr dt dl and dA each represents a trafficdetector deployed at their designated location while A and Beach denotes the two intersections of the road It is desired topredict the traffic flow approaching intersection A at de-tector dA where the actual traffic flow can be measured sothat the quality of the prediction can be assessed in real-timeIn Figure 2 S1 S2 and S3 each denotes the tool group thatsupplies lots to the equipment group W In Fab environ-ment the set of tools that perform the same wafer fabricationprocess are commonly grouped logically and termed as toolgroup Figure 3 depicts an example with six tools in Group S

S1 S2 and S3 are analogous to the three roads inter-connected at intersection B that supplies traffic to in-tersection A in Larryrsquos model [11] -e lots that are suppliedby S1 S2 and S3 becomes the WIP for W -e traffic inHeadrsquos model is therefore analogous to the WIP of W in thiswork Similar to the work of Larry [11] this research workwould like to predict the totalWIP that would arrive atW fora period of time in the future

According to Larry [11] in general traffic flow is a time-space phenomenon Many of the subsequent traffic flowprediction works have also modelled the traffic flow pre-diction as a form of time series problem A list of non-exhaustive related works include Tian and Pan [2] Williamsand Hoel [12] VanderVoort et al [13] Xie et al [14] Huanget al [15] Abadi et al [16] Fu et al [17] and Shao and Soong[18] -e models used by those research works can becategorized into two categories -e first category usesstatistical models while the second category uses machinelearning models

22 Prediction Models -e prediction models can com-monly be divided into 2 categories parametric models andnonparametric models Parametric models refer to modelswith fixed structure based on some assumptions and themodel parameters can be computed with empirical data [17]Autoregressive Integrated Moving Average (ARIMA) is oneof the most popular parametric models in time series pre-diction It was first proposed to predict short-term freewaytraffic in 1970s [19] Subsequently variants of ARIMA fortime series prediction were proposed such as Kohonen-ARIMA (KARIMA) [13] and seasonal ARIMA [12]According to [1] these models are based on the assumptionof stationary variance and mean of the time series Kalman

filter is another parametric approach to solving short-termtime series traffic flow prediction [20ndash23] In the most recentresearch Abadi et al [16] applied autoregressive model (AR)to predict traffic flow up to 30 minutes ahead -e authorsrsquowork uses complete traffic data such as the historical trafficdata collected from traffic links with traffic sensors topredict the short-term traffic flow As a case study theauthors predicted the flow of a downtown traffic in SanFrancisco USA and employed Monte Carlo simulations toevaluate their methodology-e authors reported an averageprediction error varying from two percent for five minutesprediction windows to twelve percent for 30 minutes pre-diction windows with the presence of unpredictable events

Nonparametric models refer to models with no fixedstructure and parameters [1] It is also known as data-driven

dt

B

A

dr

dA

dl

Figure 1 Geometric layout of traffic flow prediction scenario byLarry [6]

S1

W

S2 S3

Figure 2 Typical scenario ofWIP arrival to an equipment group ina fab

Tool 2 Tool 3Tool 1

Tool 5 Tool 6Tool 4

Group S

Figure 3 A logical grouping of tools into tool group

Computational Intelligence and Neuroscience 3

model Nonparametric models have gainedmuch attention insolving time series problem because of the modelsrsquo ability toaddress stochastic and nonlinear nature of time seriesproblem compared to parametric models [17] Artificialneural network (ANN) support vector machine (SVM) anddeep-learning neural networks are examples of non-parametric models -e discovery of deep-learning neuralnetwork [24] and its reported success [25] have drawn manyresearchersrsquo attention to apply deep-learning neural networkto solve various research problems Dimensionality reductionof data [26] natural language processing [27] number rec-ognition [28] object detection [29] and organ detection [30]are examples of published research works that have dem-onstrated the successful use of deep-learning neural network

LSTM a variant of deep-learning neural network hasrecently gained popularity in traffic flow prediction In [31]Duan et al have constructed 66 series of LSTM neuralnetwork for the 66 travel links in their data set and validatedthat 1-step ahead travel prediction error is relatively small In[32] Zhao et al evaluated the effectiveness of LSTM inmachine health monitoring systems by sampling data over100 thousands time steps sensory signal data and evaluatingthem over linear regression (LR) support vector regression(SVR) multilayer perceptron neural network (MLP) re-current neural network (RNN) Basic LSTM and DeepLSTM -e results showed that deep LSTM performs thebest among the evaluated methods According to the au-thors LSTM do not require any expert knowledge andfeature engineering as required by the LR SVR and MLPwhich may not be accessible in practice In addition with theintroduction of forget gates LSTM is able to capture long-term dependencies thus it is able to capture and discovermeaningful features in the signal data

In [2 32 33] the authors reported that LSTM and StackedAutoEncoders (SAE) have better performance in traffic flowpredictions than the traditional predictionmodels Accordingto [32] also LSTM reported to have better performance thanSAE In addition the comparison performed in [14] forLSTM gated recurrent units (GRU) neural network and autoregressive integrated moving average (ARIMA) in trafficprediction had demonstrated that the LSTM and GRU per-formed better than the ARIMA model

In [17] also Tian and Pan demonstrated the use of LSTMto achieve higher traffic prediction accuracy in short-termtraffic prediction as compared to [31 34] -e authors foundthat the mentioned models require the length of the inputhistorical data to be predefined and static In addition themodel cannot automatically determine the optimal time lagsWith the use of LSTM the author demonstrated that LSTMcan capture the nonlinearity and randomness of the trafficflowmore effectively Furthermore the use of LSTM also canovercome the issue of back-propagated error decay throughmemory blocks thereby increasing the accuracy of theprediction

3 Methodology

-e daily IWIP to a tool group is a form of time series data-is is because it is a sequence of values that observed

sequentially in time -e IWIP forecast to a tool group issimilar to traffic arrival forecast where the objective is toensure that there is enough capacity for the traffic to flowthrough with minimum obstruction for a given time-frame inthe future in order not to create any bottlenecks in the trafficflow-e amount ofWIP arriving to a tool group is analogicalto the number of vehicles arriving to a road junction or agroup of interlinks of interest -is research also requiresmultistep ahead forecasting approach as the research problemrequires forecasting the IWIP multiple days ahead from thelast observation in order to plan for PM activities

31 Statistical Incoming WIP Forecasting Method -eexisting solution in the Fab uses a basic statistical forecastingapproach to forecast the WIP arrival for all tool groups forthe next 7 days -e forecast is run once a week at thebeginning of each week -e calculation steps of statisticalforecasting approach are summarized in Table 1

-e existing forecasting method only caters to productswith the number of wafers ordered dominating the totalWIP in the production line -is is because the calculationrequires the number of operation steps and their respectiveTAT to calculate the forecasted arrival steps -e forecastedresults are therefore not accurate because the number ofwafers considered in the calculation differs from the actualnumber of wafers in the production line In addition themethod cannot predict the IWIP to a particular tool groupmore accurately because it does not include any algorithmsto capture the time-dependency relation between the data-is limits the ability of the Fab managers Fab managerrefers to personnel who is assigned with the managementresponsibility to oversee various aspect of the Fab to ensurethat the fabrsquos production line performs smoothly to create abetter PM activities schedule that could minimize negativeimpact to the production line -erefore it is important tocreate a forecasting model with better accuracy to assist Fabmanagers to carry out more effective PM activities planningthat can minimize the impact on CT

32 Long Short-TermMemory -e long short-termmemory(LSTM) was developed in 1997 by Hochereiter andSchmidhuber [35] to address the exploding and vanishinggradient phenomena in RNN -e presence of these twophenomena had caused RNN to suffer in the inability torecord information for longer period of time [18] In otherwords RNN is not able to capture long-term dependencies[32] -e solution to this problem is the introduction offorget gate into the neural network to avoid long-termdependencies -e forget gate is used during the trainingphase to decide when information from previous cell stateshould be forgotten In general LSTM has three gatesnamely the input gate the forget gate and the output gate-e key feature of LSTM is its gated memory cell and eachcell has the above mentioned three gates -ese gates areused to control the flow of information through each cell

Let time be denoted as t At time t the input to a LSTMcell is xt and its previous output is htminus1 -e cell input state is1113957Ct the cell output state is Ct and its previous state is Ctminus1

4 Computational Intelligence and Neuroscience

Input gate at time t is it forget gate is ft and output gate isot According to the structure of the LSTM cell Ct and ht willbe transmitted to the next cell in the network To calculate Ct

and ht we first define the following 4 equationsInput gate

it σ Wixt + Wihtminus1 + bi( 1113857 (1)

Forget gate

ft σ Wfxt + Wfhtminus1 + bf1113872 1113873 (2)

Output gate

ot σ Woxt + Wohtminus1 + bo( 1113857 (3)

Cell Input1113957Ct tan h WCxt + WChtminus1 + bC( 1113857 (4)

where Wweighted matrices b bias vector σ sigmoidfunction and tanh hyperbolic tangent function

Sigmoid function σ(x) is defined as follows

σ(x) 1

1 + exp(minusx) (5)

Hyperbolic tangent function tan h(x) is defined asfollows

tan h(x) exp(x)minus exp(minusx)

exp(x) + exp(minusx) (6)

Using equations (1) (2) and (4) we calculate the celloutput state using the following equation

Ct ft lowastCtminus1( 1113857 + it lowast 1113957Ct1113872 1113873 (7)

Lastly the hidden layer output is calculated using thefollowing equation

ht otlowast tan h Ct( 1113857 (8)

-e hidden layer of the LSTM can be stacked such thatthe architecture of neural network consists of more than oneLSTM hidden layers Figure 4 shows neural network ar-chitecture with one LSTM hidden layer while Figure 5shows neural network with two LSTM hidden layers stacked

With reference to Figure 5 each LSTM hidden layer isfully connected through recurrent connections (the con-nection is indicated by the dotted directional line) -esquares in the LSTM hidden layer represents the LSTMneurons the circles denoted with xi represents the input tothe LSTM neuron while the circles denoted with yi rep-resents the output of the LSTM neuron When the LSTMhidden layers are stacked each LSTM neuron in the lowerLSTM hidden layer is fully connected to each LSTM neuronin the LSTM hidden layer above it through feedforwardconnections which is denoted by the solid directional linebetween the stacked LSTM hidden layers

33 Proposed Method Figure 6 illustrates the proposedmethod -e historical IWIP data are first stored in a datastore to ease the management of the data -e historical dataare then extracted from the data store to be preprocessed-e preprocessing stage consists of two steps which are datascaling and data formatting

In the data scaling step the historical IWIP data to beused for supervised-learning is scaled according to thefollowing equation

y xminusmin

maxminus min (9)

where x denotes each of the historical IWIP value mindenotes the smallest historical IWIP value in the historicaldata and max denotes the largest historical IWIP value inthe historical data

-e next step is the data formatting step In time seriesdomain the term ldquolagsrdquo is commonly used to denote valuesof time steps observed prior to the prediction Generally thetime series data are separated into training and testing setwhere the training set contains the lags while the testing setcontains the actual values of future time steps -erefore inthe data formatting step let xi denotes each individual lagthe scaled historical IWIP data are formatted according tothe format tabulated in Tables 2 and 3 Tables 2 and 3 depictthe format of the training and testing dataset respectively

Following this format column X consists of a series oflags column Y consists of the number of time steps to beforecasted and column Z consists of the number of featuresused in the forecast Each row in column X will contain a setof seven IWIP points that corresponds to the number ofIWIP points to be forecasted -ese seven IWIP points aregrouped into a single set of value -e number of IWIPpoints to be forecasted is represented in column Y as timesteps Column Z has the value of one in the training datasetwhich corresponds to the single set of seven IWIP points incolumn X By putting seven IWIP points in both the trainingand testing dataset we are effectively telling LSTM that each

Table 1 Existing statistical WIP forecasting steps

Step Description

1 Given the process flow name of product P is Fretrieve all process steps for F

2For each process step s in F get the turn-around-time

of s TATs this value is predefined in themanufacturing execution system (MES) of the fab

3

Let LF denotes the total photolithography layers in FlF denotes the number of photolithography layers in F

completed per day the day-per-mask-layer (DPML)committed for P DPMLP is denoted as DPMLP

(1lF)

TATF denotes the total turn-around-time (TAT) forF denoted as TATF 1113936

Fs1TATs

-e run rate for F RRF isRRF ((LF times DPMLP)TATF)

4 Cycle time (CT) for a process step s CTs isCTs RRF times TATs

5For each lot sum the next n steps of s until the CTreaches 24 hours -e last s would be the forecasted

destination step of the lot after 24 hours

6

To forecast the destination step of the lot for the nextD day sum the next n steps of s until the CT reaches

D times 24 hours -e last s would be the forecasteddestination step of the lot for the next D day

Computational Intelligence and Neuroscience 5

future seven IWIP points are related to its immediateprevious seven IWIP points

As a nonparametric model neural network model doesnot have a fixed structure In a RNN with one hidden layerthe ability of the neural networks to discover importantrelationship in the training data during the supervised-

Data storing

Preprocessing

Data scaling

Data formatting

Supervised-learningParameter sizeidentification

Epoch

Batch size

LSTM neuron size

Stacked hidden layer size

Parameter combination evaluation and selection

Result measurement

Forecast

LSTM

f (x)

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 6 Proposed method

Input layer

Output layer

LSTM hidden layer

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 4 Nonstacked LSTM neural network

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Input layer

Output layer

LSTM hidden layer

LSTM hidden layer

Figure 5 Stacked LSTM neural network

Table 2 Data formation for training dataset

Set Training datasetX Y Z

1 (x1 x2 x3 x4 x5 x6 x7) 7 12 (x2 x3 x4 x5 x6 x7 x8) 7 13 (x3 x4 x5 x6 x7 x8 x9) 7 14 (x4 x5 x6 x7 x8 x9 x10) 7 15 (x5 x6 x7 x8 x9 x10 x11) 7 16 (x6 x7 x8 x9 x10 x11 x12) 7 17 (x7 x8 x9 x10 x11 x12 x13) 7 1

Table 3 Data formation for testing dataset

Set Testing datasetX Y

1 (x8 x9 x10 x11 x12 x13 x14) 12 (x9 x10 x11 x12 x13 x14 x15) 13 (x10 x11 x12 x13 x14 x15 x16) 14 (x11 x12 x13 x14 x15 x16 x17) 15 (x12 x13 x14 x15 x16 x17 x18) 16 (x13 x14 x15 x16 x17 x18 x19) 17 (x14 x15 x16 x17 x18 x19 x20) 1

6 Computational Intelligence and Neuroscience

learning is affected by the batch size used per epoch thenumber of epoch hidden layers and hidden neuron -ecombination of the sizes of these four parameters that resultsin stable supervised-learning and delivers the lowest forecasterror is desired

Each parameter being examined will have a list ofpredefined sizes to be tested When one of the parameters isbeing examined the remaining two parameters will be fixedto their current sizes in their respective list -is is to controlthe variation across the examinations For each combinationof the parameters the model will be tested with thatcombination to measure its performance in terms of theforecasting error and the stability of its supervised-learning

For the LSTM setup of this research we construct aLSTM model using the LSTM cell Let t denote the obser-vation time of each IWIP and x denotes the IWIP the inputof the LSTMmodel is the observed IWIP x at time t denotedas xt and the output of the LSTM model is the predictedIWIP 1113957xt+1 -rough the LSTM equations presented 1113957xt+1 istherefore calculated as

1113957xt+1 W middot ht + b (10)

where W is the weight matrix between the output layer andthe hidden layer

-e metric used to measure the forecasting error in thesupervised-learning is the root-mean-squared error (RMSE)Let P denote the actual IWIP 1113957P denote the forecasted IWIPand n denotes the total day forecasted RMSE is defined asfollows

RMSE

1n

1113944

n

j1Pj minus 1113957Pj1113872 1113873

2

11139741113972

(11)

RMSE is a frequently used evaluation metric because itmeasures the difference between the values predicted by amodel and the actually observed values

For each parameter size combination to be tested themodel will be experimented multiple times with the sameparameters setting If N denotes the number of times theexperiment was conducted there will be N number of RMSEobtained to represent the performance of the model for eachexperiment -e reason running multiple experiments foreach parameter size combination is because internallyneural network uses randomization to assign the weightsand the states of its neurons -is produces different fore-casting errors between experiments -erefore multipleexperimental runs are recommended to allow for the se-lection of the neural network model with internal settings toproduce the lowest RMSE

After the supervised-learning is completed the proposedmethod will proceed to parameter combination evaluationand selection -e evaluation of parameter combination andselection step is necessary because it is common to assumethat a particular parameter combination that gives a lowRMSE at the end of the supervised-learning directlytranslated to a good parameter combination that allowssufficient capability of the model to perform forecastHowever this assumption is misleading because a model

that has overlearned during the supervised-learning candeliver results with very low RMSE at the end of the trainingAn overlearned model will perform poorly in the actualforecast -erefore it is necessary to also measure the sta-bility of the supervised-learning of the model given aparticular combination of the four parameters During eachepoch in the supervised-learning the model will be requiredto perform two forecasts one uses a reserved set from thetraining set and other uses a reserved set from the testing setWith two forecasts performed two RMSE will be generated-e RMSE generated by using training set is the trainingerror while the RMSE generated by using the reservedtesting set is the testing error To measure the stability of thesupervised-learning the RMSE for both training error andtesting error of each epoch are collected and plotted in asingle graph With y-axis representing the RMSE and x-axisrepresenting the number of epochs Figures 7ndash9 show ex-amples of the curves that exhibited from the supervised-learning -e combination of parameter sizes that allows themodel to exhibit learning curve pattern similar to Figure 7 isthe desired selection Learning curve with pattern similar toFigure 7 signifies that the model was able to perform stablesupervised-learning with stable reduction in the RMSE ofboth training and testing phases using the selected combi-nation of parameter sizes In other words themodel was ableto discover the time-dependent relation in the given datasetsuch that it allows the model to minimize its prediction errorfor each epoch of the supervised-learning

-e combination of the four parametersrsquo sizes that en-ables the model to show stable performance in thesupervised-learning and lowest RMSE will be selected toforecast the IWIP

For each of the selected parameter combination themodel is required to forecast for three consecutive weeks-e accuracies for the forecast results will be measuredaccording to the selected measurement metrics to evaluatethe forecasting capability of the model

4 Experimental Results

41DataDescriptionandExperimentalDesign -e IWIP fora particular tool group is denoted as IWIP and can becalculated as

IWIP WIPt24 minusWIPt1( 1113857 1113944

24

t1(MOVE) (12)

where MOVE denotes the number of wafer moved per hourt1 refers to the first hour the data are collected and t24 refersto the twenty-fourth hour the data are collected In thisstudy the first hour is at 0830 while the 0730 on the next dayis the twenty-fourth hour

-e data use for this experiment is acquired from theFabrsquos internal development database with the applicationrunning hourly to collect the WIP and calculate the numberof wafers moved for each tool group in the production lineevery 24 hours Due to the Fabrsquos data security and confi-dentiality policies we are only allowed to access productionsystemrsquos data source of the company to perform data

Computational Intelligence and Neuroscience 7

collection for a specic duration Given the allowed durationfrom the Fab we were able to collect three months data tocreate a data set with 90 days of historical IWIP With eachIWIP as a data point 70 percent of the data points are usedfor the LSTM training phase and the remaining 30 percentfor testing phase

For the number of epochs numerical values of 100 and200 are selected For batch size numerical values of 10 and20 are selected for the number of hidden layers numericalvalues of 3 and 4 are selected and for the number of hiddenneuron numerical values of 384 and 512 are selected for therst hidden layer while the numerical values of 8 and 16 areselected for the subsequent layers It is worthwhile tomention that by using seven IWIP points per dataset as thenumber of previous IWIP lags to be examined each of thenumerical values for batch size denotes the number of weekspresented to the LSTMmodel per epoche neural networkis initialized with uniformly distributed weights where theranges of the weights are (minus01 01) and trained usingmean-squared-error (MSE) as the loss function Adamoptimizer is used as the optimization function with default

learning rate η 0001 β1 09 β2 0999 ε 0 andc 0 Each combination of the selected values is thenevaluated three times to obtain the three RMSE results ofeach combination

Parameter size selection is done by selecting the lowestRMSE among the three experimental runs followed by ex-amining the graphs of the supervised-learning result of thesame run that produced the lowest RMSE e desiredsupervised-learning graph should resemble the pattern il-lustrated in Figure 7 e parameter size combinations thatdo not meet the required pattern will be discarded

42 Measurement Metrics To measure the performance ofthe models two accuracy measurements are used esetwo-measurement metrics are hit rate and correlationmeasurement

Hit rate or probability of detection (POD) is theprobability that the forecasted event matches the observedevent In the context of this research work the observedevents are either low IWIP or high IWIP erefore hit ratecan be used to measure the forecast capability of the pro-posed method to match the actual IWIP events Let HRdenotes hit rate n denotes the number correct detectionand N denotes the total number of observation hit rate isexpressed as

HR n

Ntimes 100 (13)

From the requirement of the Fab it is only necessary forthe proposedmethod to be able to forecast any two days withhighest IWIP and any two days with lowest IWIP For thesefour days to be forecasted the hit rate required by the Fab is75 percent In other words at least three out of these fourdays must be detected

To measure the correlation between the actual IWIP andthe forecasted IWIP this research uses the Pearsonrsquos cor-relation coecient r Pearsonrsquos r is a measure of the linearrelationship between two vectors of variables In this re-search work these two vectors of variables are the actual

Epoch

RMSE

TrainingTesting

Figure 7 RMSE curves when the model is well learned

TrainingTesting

Epoch

RMSE

Figure 8 RMSE curves when then model is underlearned

TrainingTesting

Epoch

RMSE

Figure 9 RMSE curves when the model is overlearned(overtting)

8 Computational Intelligence and Neuroscience

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 2: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

To the best of our knowledge LSTM has not yet been appliedin Fab to predict IWIP Hence the application of LSTM inour work to perform IWIP prediction is novel -e con-tributions of the proposed model are summarized as follows

(i) A machine learning-based approach to predict theincomingWIP for a tool group of interest in the FabSpecifically LSTM recurrent neural network is usedas the machine learning algorithm to predict theincoming WIP

(ii) A simplified prediction model that is capable ofmodeling the dynamic environment of the Fab anddelivers higher prediction accuracy than the Fabrsquosbaseline method

-e remainder of the paper is organized as followsSection 2 introduces the related works on-time seriesforecasting and their limitations Section 3 presents theproposed framework of this research Section 4 describes theexperimental setup presents the results and discusses themajor findings of this research Section 5 highlights thecontribution of the proposed framework and concludes thispaper

2 Literature Review

21 Research Background In the research domain of fore-casting in semiconductor manufacturing majority of theresearch works focus on forecasting the cycle time of the FabFor instance Wang et al [7] proposed an automatic factorselection to improve the prediction accuracy of cycle time(CT) forecasting of wafer lots-e authors presented that theANN input factors for the CT forecasting model in the pastresearch works are either selected manually or empiricallywhich seems arbitrary and unreliable to select input factorssince it depends on artificial experience Another approachto input selection is accomplished by selecting factors thatrepresent condition of the Fab as a whole -e examples ofthese factors are the average queuing time of each wafer inthe Fab the average number of process steps completed perday and the total wafers currently in the production lineAccording to the author such approach is too complex torepresent the wafer flow of the Fab since it is influence by theinteractions among the equipment properties wafer prop-erties product mix and production control policies Hencein the authorsrsquo work input factor selection are accomplishedthrough analyzing the collected data without artificial ex-perience to improve the accuracy scalability and compre-hensibility of the CT forecasting models

In another research Wang et al [8] attempted short-term cycle time forecasting in reentrant manufacturingsystems According to the authors most previous studiesfocus on estimating the whole CT in long-term time scaleswhich predicted the output time at the at the moment thewafers enter the production stage However considering thelong production cycle (60000ndash90000 minutes) and dy-namic production environment such long-term predictionhardly meets the needs of decision making in productioncontrol Based on the long-term forecasting comparison theauthors justified that CT forecasting in short time scales can

provide more timely advice on production control such asrebalancing work in process changing dispatching rulesand job prioritizations

Scholl et al [9] presented their work on an imple-mentation of a simulation based short-term lot arrivalforecast in a mature 200mm semiconductor Fab conductedin Infineon Technologies Dresden -e authorsrsquo workforecasts the lot arrival to the defect density measurement(DDM) work centre in Infineon Technologies -e problemdomain of the authorsrsquo research is similar to this researchwhere the intention of the lot arrival forecast is to avoid PMactivities when the WIP is expected to be high However theproposed model of Scholl et al is very specific to InfineonTechnologies and is impractical to apply in other Fab-is ismainly due to the fact that the core of the simulation engineused in the research work to perform the arrival forecastproprietary simulation engine In addition operatingmethods and products modelled are specific to InfineonTechnologies which differs from other Fab It is also im-portant to note that Infineon Technologies is a Fab withfully-automated wafer transportation system using roboticsystems as oppose to human operators in other Fabs -e lotarrival time of a fully-automated wafer delivery systembetween each operation steps is highly consistent comparedwith human-operated delivery system -ese factors madethe comparison difficult across other Fabs that do not havethe same system and facilities

Another similar research work was done by Mosinskiet al [10] In this research the authors focused on the dailydelivery predictions and a bottleneck early warning systemfor several machine groups in the Fab of their projectpartner -e forecast horizon is up to 14 days According tothe authors the prediction is based completely on statisticsextracted from historical lot data traces -e research usesthe Alternative Forecast Method (AFM) which uses just oneexclusive data source to extract the detailed historical lotmovement information

-e forecast elements of the authorsrsquo research consists ofthree main stages which are data collection statistics gen-eration and forecast calculation -e data collection stagecollects all lots that are currently Work-In-Progress (WIP)-e statistics generation step aims to calculate the cycle timefrom the start operation step A to target operation step BEach lotrsquos delivery time is estimated at the forecast stage-eforecast calculation is based on a statistical evaluation of theduration between A and B from historical data Since thetime interval differs for different products lot grouping rulesbased on product characteristics need to be applied In orderto improve the forecast accuracy another lot classificationstep is required to group the lots based on specific predefinedlot attributes such as lot priority or the lotrsquos tardiness

In the statistical generation step the weightage used inthe authorsrsquo model are Fab specific and hence should bedefined upfront if the model is being used in a different fabIn addition various manual data sanitization steps are re-quired in order to remove outliers in the data In additionthe regeneration of the cycle time statistics is resource in-tensive especially with large number of lots involvementSuch limitation makes this model impractical for most Fab

2 Computational Intelligence and Neuroscience

with large number of lots Proper lot classification is nec-essary to ensure that the cycle time statistics generated arerelevant In addition special software for lot scheduling isalso required to generate the relevant data

Due to the lack of similar research works done in thedomain of semiconductor fabrication a cross-reference tosimilar research problem in a different domain is necessaryFrom the literature reviews vehicle traffic arrival forecastingexhibits the closest similarity to the forecast ofWIP arrival inthe fab Consider the comparison of the following twomodels presented in Figures 1 and 2 Figure 1 presents thescenario geometry of a typical traffic arrival modelled byLarry [11] while Figure 2 presents a typical WIP arrivalscenario to an equipment group in a fab

In Figure 1 dr dt dl and dA each represents a trafficdetector deployed at their designated location while A and Beach denotes the two intersections of the road It is desired topredict the traffic flow approaching intersection A at de-tector dA where the actual traffic flow can be measured sothat the quality of the prediction can be assessed in real-timeIn Figure 2 S1 S2 and S3 each denotes the tool group thatsupplies lots to the equipment group W In Fab environ-ment the set of tools that perform the same wafer fabricationprocess are commonly grouped logically and termed as toolgroup Figure 3 depicts an example with six tools in Group S

S1 S2 and S3 are analogous to the three roads inter-connected at intersection B that supplies traffic to in-tersection A in Larryrsquos model [11] -e lots that are suppliedby S1 S2 and S3 becomes the WIP for W -e traffic inHeadrsquos model is therefore analogous to the WIP of W in thiswork Similar to the work of Larry [11] this research workwould like to predict the totalWIP that would arrive atW fora period of time in the future

According to Larry [11] in general traffic flow is a time-space phenomenon Many of the subsequent traffic flowprediction works have also modelled the traffic flow pre-diction as a form of time series problem A list of non-exhaustive related works include Tian and Pan [2] Williamsand Hoel [12] VanderVoort et al [13] Xie et al [14] Huanget al [15] Abadi et al [16] Fu et al [17] and Shao and Soong[18] -e models used by those research works can becategorized into two categories -e first category usesstatistical models while the second category uses machinelearning models

22 Prediction Models -e prediction models can com-monly be divided into 2 categories parametric models andnonparametric models Parametric models refer to modelswith fixed structure based on some assumptions and themodel parameters can be computed with empirical data [17]Autoregressive Integrated Moving Average (ARIMA) is oneof the most popular parametric models in time series pre-diction It was first proposed to predict short-term freewaytraffic in 1970s [19] Subsequently variants of ARIMA fortime series prediction were proposed such as Kohonen-ARIMA (KARIMA) [13] and seasonal ARIMA [12]According to [1] these models are based on the assumptionof stationary variance and mean of the time series Kalman

filter is another parametric approach to solving short-termtime series traffic flow prediction [20ndash23] In the most recentresearch Abadi et al [16] applied autoregressive model (AR)to predict traffic flow up to 30 minutes ahead -e authorsrsquowork uses complete traffic data such as the historical trafficdata collected from traffic links with traffic sensors topredict the short-term traffic flow As a case study theauthors predicted the flow of a downtown traffic in SanFrancisco USA and employed Monte Carlo simulations toevaluate their methodology-e authors reported an averageprediction error varying from two percent for five minutesprediction windows to twelve percent for 30 minutes pre-diction windows with the presence of unpredictable events

Nonparametric models refer to models with no fixedstructure and parameters [1] It is also known as data-driven

dt

B

A

dr

dA

dl

Figure 1 Geometric layout of traffic flow prediction scenario byLarry [6]

S1

W

S2 S3

Figure 2 Typical scenario ofWIP arrival to an equipment group ina fab

Tool 2 Tool 3Tool 1

Tool 5 Tool 6Tool 4

Group S

Figure 3 A logical grouping of tools into tool group

Computational Intelligence and Neuroscience 3

model Nonparametric models have gainedmuch attention insolving time series problem because of the modelsrsquo ability toaddress stochastic and nonlinear nature of time seriesproblem compared to parametric models [17] Artificialneural network (ANN) support vector machine (SVM) anddeep-learning neural networks are examples of non-parametric models -e discovery of deep-learning neuralnetwork [24] and its reported success [25] have drawn manyresearchersrsquo attention to apply deep-learning neural networkto solve various research problems Dimensionality reductionof data [26] natural language processing [27] number rec-ognition [28] object detection [29] and organ detection [30]are examples of published research works that have dem-onstrated the successful use of deep-learning neural network

LSTM a variant of deep-learning neural network hasrecently gained popularity in traffic flow prediction In [31]Duan et al have constructed 66 series of LSTM neuralnetwork for the 66 travel links in their data set and validatedthat 1-step ahead travel prediction error is relatively small In[32] Zhao et al evaluated the effectiveness of LSTM inmachine health monitoring systems by sampling data over100 thousands time steps sensory signal data and evaluatingthem over linear regression (LR) support vector regression(SVR) multilayer perceptron neural network (MLP) re-current neural network (RNN) Basic LSTM and DeepLSTM -e results showed that deep LSTM performs thebest among the evaluated methods According to the au-thors LSTM do not require any expert knowledge andfeature engineering as required by the LR SVR and MLPwhich may not be accessible in practice In addition with theintroduction of forget gates LSTM is able to capture long-term dependencies thus it is able to capture and discovermeaningful features in the signal data

In [2 32 33] the authors reported that LSTM and StackedAutoEncoders (SAE) have better performance in traffic flowpredictions than the traditional predictionmodels Accordingto [32] also LSTM reported to have better performance thanSAE In addition the comparison performed in [14] forLSTM gated recurrent units (GRU) neural network and autoregressive integrated moving average (ARIMA) in trafficprediction had demonstrated that the LSTM and GRU per-formed better than the ARIMA model

In [17] also Tian and Pan demonstrated the use of LSTMto achieve higher traffic prediction accuracy in short-termtraffic prediction as compared to [31 34] -e authors foundthat the mentioned models require the length of the inputhistorical data to be predefined and static In addition themodel cannot automatically determine the optimal time lagsWith the use of LSTM the author demonstrated that LSTMcan capture the nonlinearity and randomness of the trafficflowmore effectively Furthermore the use of LSTM also canovercome the issue of back-propagated error decay throughmemory blocks thereby increasing the accuracy of theprediction

3 Methodology

-e daily IWIP to a tool group is a form of time series data-is is because it is a sequence of values that observed

sequentially in time -e IWIP forecast to a tool group issimilar to traffic arrival forecast where the objective is toensure that there is enough capacity for the traffic to flowthrough with minimum obstruction for a given time-frame inthe future in order not to create any bottlenecks in the trafficflow-e amount ofWIP arriving to a tool group is analogicalto the number of vehicles arriving to a road junction or agroup of interlinks of interest -is research also requiresmultistep ahead forecasting approach as the research problemrequires forecasting the IWIP multiple days ahead from thelast observation in order to plan for PM activities

31 Statistical Incoming WIP Forecasting Method -eexisting solution in the Fab uses a basic statistical forecastingapproach to forecast the WIP arrival for all tool groups forthe next 7 days -e forecast is run once a week at thebeginning of each week -e calculation steps of statisticalforecasting approach are summarized in Table 1

-e existing forecasting method only caters to productswith the number of wafers ordered dominating the totalWIP in the production line -is is because the calculationrequires the number of operation steps and their respectiveTAT to calculate the forecasted arrival steps -e forecastedresults are therefore not accurate because the number ofwafers considered in the calculation differs from the actualnumber of wafers in the production line In addition themethod cannot predict the IWIP to a particular tool groupmore accurately because it does not include any algorithmsto capture the time-dependency relation between the data-is limits the ability of the Fab managers Fab managerrefers to personnel who is assigned with the managementresponsibility to oversee various aspect of the Fab to ensurethat the fabrsquos production line performs smoothly to create abetter PM activities schedule that could minimize negativeimpact to the production line -erefore it is important tocreate a forecasting model with better accuracy to assist Fabmanagers to carry out more effective PM activities planningthat can minimize the impact on CT

32 Long Short-TermMemory -e long short-termmemory(LSTM) was developed in 1997 by Hochereiter andSchmidhuber [35] to address the exploding and vanishinggradient phenomena in RNN -e presence of these twophenomena had caused RNN to suffer in the inability torecord information for longer period of time [18] In otherwords RNN is not able to capture long-term dependencies[32] -e solution to this problem is the introduction offorget gate into the neural network to avoid long-termdependencies -e forget gate is used during the trainingphase to decide when information from previous cell stateshould be forgotten In general LSTM has three gatesnamely the input gate the forget gate and the output gate-e key feature of LSTM is its gated memory cell and eachcell has the above mentioned three gates -ese gates areused to control the flow of information through each cell

Let time be denoted as t At time t the input to a LSTMcell is xt and its previous output is htminus1 -e cell input state is1113957Ct the cell output state is Ct and its previous state is Ctminus1

4 Computational Intelligence and Neuroscience

Input gate at time t is it forget gate is ft and output gate isot According to the structure of the LSTM cell Ct and ht willbe transmitted to the next cell in the network To calculate Ct

and ht we first define the following 4 equationsInput gate

it σ Wixt + Wihtminus1 + bi( 1113857 (1)

Forget gate

ft σ Wfxt + Wfhtminus1 + bf1113872 1113873 (2)

Output gate

ot σ Woxt + Wohtminus1 + bo( 1113857 (3)

Cell Input1113957Ct tan h WCxt + WChtminus1 + bC( 1113857 (4)

where Wweighted matrices b bias vector σ sigmoidfunction and tanh hyperbolic tangent function

Sigmoid function σ(x) is defined as follows

σ(x) 1

1 + exp(minusx) (5)

Hyperbolic tangent function tan h(x) is defined asfollows

tan h(x) exp(x)minus exp(minusx)

exp(x) + exp(minusx) (6)

Using equations (1) (2) and (4) we calculate the celloutput state using the following equation

Ct ft lowastCtminus1( 1113857 + it lowast 1113957Ct1113872 1113873 (7)

Lastly the hidden layer output is calculated using thefollowing equation

ht otlowast tan h Ct( 1113857 (8)

-e hidden layer of the LSTM can be stacked such thatthe architecture of neural network consists of more than oneLSTM hidden layers Figure 4 shows neural network ar-chitecture with one LSTM hidden layer while Figure 5shows neural network with two LSTM hidden layers stacked

With reference to Figure 5 each LSTM hidden layer isfully connected through recurrent connections (the con-nection is indicated by the dotted directional line) -esquares in the LSTM hidden layer represents the LSTMneurons the circles denoted with xi represents the input tothe LSTM neuron while the circles denoted with yi rep-resents the output of the LSTM neuron When the LSTMhidden layers are stacked each LSTM neuron in the lowerLSTM hidden layer is fully connected to each LSTM neuronin the LSTM hidden layer above it through feedforwardconnections which is denoted by the solid directional linebetween the stacked LSTM hidden layers

33 Proposed Method Figure 6 illustrates the proposedmethod -e historical IWIP data are first stored in a datastore to ease the management of the data -e historical dataare then extracted from the data store to be preprocessed-e preprocessing stage consists of two steps which are datascaling and data formatting

In the data scaling step the historical IWIP data to beused for supervised-learning is scaled according to thefollowing equation

y xminusmin

maxminus min (9)

where x denotes each of the historical IWIP value mindenotes the smallest historical IWIP value in the historicaldata and max denotes the largest historical IWIP value inthe historical data

-e next step is the data formatting step In time seriesdomain the term ldquolagsrdquo is commonly used to denote valuesof time steps observed prior to the prediction Generally thetime series data are separated into training and testing setwhere the training set contains the lags while the testing setcontains the actual values of future time steps -erefore inthe data formatting step let xi denotes each individual lagthe scaled historical IWIP data are formatted according tothe format tabulated in Tables 2 and 3 Tables 2 and 3 depictthe format of the training and testing dataset respectively

Following this format column X consists of a series oflags column Y consists of the number of time steps to beforecasted and column Z consists of the number of featuresused in the forecast Each row in column X will contain a setof seven IWIP points that corresponds to the number ofIWIP points to be forecasted -ese seven IWIP points aregrouped into a single set of value -e number of IWIPpoints to be forecasted is represented in column Y as timesteps Column Z has the value of one in the training datasetwhich corresponds to the single set of seven IWIP points incolumn X By putting seven IWIP points in both the trainingand testing dataset we are effectively telling LSTM that each

Table 1 Existing statistical WIP forecasting steps

Step Description

1 Given the process flow name of product P is Fretrieve all process steps for F

2For each process step s in F get the turn-around-time

of s TATs this value is predefined in themanufacturing execution system (MES) of the fab

3

Let LF denotes the total photolithography layers in FlF denotes the number of photolithography layers in F

completed per day the day-per-mask-layer (DPML)committed for P DPMLP is denoted as DPMLP

(1lF)

TATF denotes the total turn-around-time (TAT) forF denoted as TATF 1113936

Fs1TATs

-e run rate for F RRF isRRF ((LF times DPMLP)TATF)

4 Cycle time (CT) for a process step s CTs isCTs RRF times TATs

5For each lot sum the next n steps of s until the CTreaches 24 hours -e last s would be the forecasted

destination step of the lot after 24 hours

6

To forecast the destination step of the lot for the nextD day sum the next n steps of s until the CT reaches

D times 24 hours -e last s would be the forecasteddestination step of the lot for the next D day

Computational Intelligence and Neuroscience 5

future seven IWIP points are related to its immediateprevious seven IWIP points

As a nonparametric model neural network model doesnot have a fixed structure In a RNN with one hidden layerthe ability of the neural networks to discover importantrelationship in the training data during the supervised-

Data storing

Preprocessing

Data scaling

Data formatting

Supervised-learningParameter sizeidentification

Epoch

Batch size

LSTM neuron size

Stacked hidden layer size

Parameter combination evaluation and selection

Result measurement

Forecast

LSTM

f (x)

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 6 Proposed method

Input layer

Output layer

LSTM hidden layer

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 4 Nonstacked LSTM neural network

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Input layer

Output layer

LSTM hidden layer

LSTM hidden layer

Figure 5 Stacked LSTM neural network

Table 2 Data formation for training dataset

Set Training datasetX Y Z

1 (x1 x2 x3 x4 x5 x6 x7) 7 12 (x2 x3 x4 x5 x6 x7 x8) 7 13 (x3 x4 x5 x6 x7 x8 x9) 7 14 (x4 x5 x6 x7 x8 x9 x10) 7 15 (x5 x6 x7 x8 x9 x10 x11) 7 16 (x6 x7 x8 x9 x10 x11 x12) 7 17 (x7 x8 x9 x10 x11 x12 x13) 7 1

Table 3 Data formation for testing dataset

Set Testing datasetX Y

1 (x8 x9 x10 x11 x12 x13 x14) 12 (x9 x10 x11 x12 x13 x14 x15) 13 (x10 x11 x12 x13 x14 x15 x16) 14 (x11 x12 x13 x14 x15 x16 x17) 15 (x12 x13 x14 x15 x16 x17 x18) 16 (x13 x14 x15 x16 x17 x18 x19) 17 (x14 x15 x16 x17 x18 x19 x20) 1

6 Computational Intelligence and Neuroscience

learning is affected by the batch size used per epoch thenumber of epoch hidden layers and hidden neuron -ecombination of the sizes of these four parameters that resultsin stable supervised-learning and delivers the lowest forecasterror is desired

Each parameter being examined will have a list ofpredefined sizes to be tested When one of the parameters isbeing examined the remaining two parameters will be fixedto their current sizes in their respective list -is is to controlthe variation across the examinations For each combinationof the parameters the model will be tested with thatcombination to measure its performance in terms of theforecasting error and the stability of its supervised-learning

For the LSTM setup of this research we construct aLSTM model using the LSTM cell Let t denote the obser-vation time of each IWIP and x denotes the IWIP the inputof the LSTMmodel is the observed IWIP x at time t denotedas xt and the output of the LSTM model is the predictedIWIP 1113957xt+1 -rough the LSTM equations presented 1113957xt+1 istherefore calculated as

1113957xt+1 W middot ht + b (10)

where W is the weight matrix between the output layer andthe hidden layer

-e metric used to measure the forecasting error in thesupervised-learning is the root-mean-squared error (RMSE)Let P denote the actual IWIP 1113957P denote the forecasted IWIPand n denotes the total day forecasted RMSE is defined asfollows

RMSE

1n

1113944

n

j1Pj minus 1113957Pj1113872 1113873

2

11139741113972

(11)

RMSE is a frequently used evaluation metric because itmeasures the difference between the values predicted by amodel and the actually observed values

For each parameter size combination to be tested themodel will be experimented multiple times with the sameparameters setting If N denotes the number of times theexperiment was conducted there will be N number of RMSEobtained to represent the performance of the model for eachexperiment -e reason running multiple experiments foreach parameter size combination is because internallyneural network uses randomization to assign the weightsand the states of its neurons -is produces different fore-casting errors between experiments -erefore multipleexperimental runs are recommended to allow for the se-lection of the neural network model with internal settings toproduce the lowest RMSE

After the supervised-learning is completed the proposedmethod will proceed to parameter combination evaluationand selection -e evaluation of parameter combination andselection step is necessary because it is common to assumethat a particular parameter combination that gives a lowRMSE at the end of the supervised-learning directlytranslated to a good parameter combination that allowssufficient capability of the model to perform forecastHowever this assumption is misleading because a model

that has overlearned during the supervised-learning candeliver results with very low RMSE at the end of the trainingAn overlearned model will perform poorly in the actualforecast -erefore it is necessary to also measure the sta-bility of the supervised-learning of the model given aparticular combination of the four parameters During eachepoch in the supervised-learning the model will be requiredto perform two forecasts one uses a reserved set from thetraining set and other uses a reserved set from the testing setWith two forecasts performed two RMSE will be generated-e RMSE generated by using training set is the trainingerror while the RMSE generated by using the reservedtesting set is the testing error To measure the stability of thesupervised-learning the RMSE for both training error andtesting error of each epoch are collected and plotted in asingle graph With y-axis representing the RMSE and x-axisrepresenting the number of epochs Figures 7ndash9 show ex-amples of the curves that exhibited from the supervised-learning -e combination of parameter sizes that allows themodel to exhibit learning curve pattern similar to Figure 7 isthe desired selection Learning curve with pattern similar toFigure 7 signifies that the model was able to perform stablesupervised-learning with stable reduction in the RMSE ofboth training and testing phases using the selected combi-nation of parameter sizes In other words themodel was ableto discover the time-dependent relation in the given datasetsuch that it allows the model to minimize its prediction errorfor each epoch of the supervised-learning

-e combination of the four parametersrsquo sizes that en-ables the model to show stable performance in thesupervised-learning and lowest RMSE will be selected toforecast the IWIP

For each of the selected parameter combination themodel is required to forecast for three consecutive weeks-e accuracies for the forecast results will be measuredaccording to the selected measurement metrics to evaluatethe forecasting capability of the model

4 Experimental Results

41DataDescriptionandExperimentalDesign -e IWIP fora particular tool group is denoted as IWIP and can becalculated as

IWIP WIPt24 minusWIPt1( 1113857 1113944

24

t1(MOVE) (12)

where MOVE denotes the number of wafer moved per hourt1 refers to the first hour the data are collected and t24 refersto the twenty-fourth hour the data are collected In thisstudy the first hour is at 0830 while the 0730 on the next dayis the twenty-fourth hour

-e data use for this experiment is acquired from theFabrsquos internal development database with the applicationrunning hourly to collect the WIP and calculate the numberof wafers moved for each tool group in the production lineevery 24 hours Due to the Fabrsquos data security and confi-dentiality policies we are only allowed to access productionsystemrsquos data source of the company to perform data

Computational Intelligence and Neuroscience 7

collection for a specic duration Given the allowed durationfrom the Fab we were able to collect three months data tocreate a data set with 90 days of historical IWIP With eachIWIP as a data point 70 percent of the data points are usedfor the LSTM training phase and the remaining 30 percentfor testing phase

For the number of epochs numerical values of 100 and200 are selected For batch size numerical values of 10 and20 are selected for the number of hidden layers numericalvalues of 3 and 4 are selected and for the number of hiddenneuron numerical values of 384 and 512 are selected for therst hidden layer while the numerical values of 8 and 16 areselected for the subsequent layers It is worthwhile tomention that by using seven IWIP points per dataset as thenumber of previous IWIP lags to be examined each of thenumerical values for batch size denotes the number of weekspresented to the LSTMmodel per epoche neural networkis initialized with uniformly distributed weights where theranges of the weights are (minus01 01) and trained usingmean-squared-error (MSE) as the loss function Adamoptimizer is used as the optimization function with default

learning rate η 0001 β1 09 β2 0999 ε 0 andc 0 Each combination of the selected values is thenevaluated three times to obtain the three RMSE results ofeach combination

Parameter size selection is done by selecting the lowestRMSE among the three experimental runs followed by ex-amining the graphs of the supervised-learning result of thesame run that produced the lowest RMSE e desiredsupervised-learning graph should resemble the pattern il-lustrated in Figure 7 e parameter size combinations thatdo not meet the required pattern will be discarded

42 Measurement Metrics To measure the performance ofthe models two accuracy measurements are used esetwo-measurement metrics are hit rate and correlationmeasurement

Hit rate or probability of detection (POD) is theprobability that the forecasted event matches the observedevent In the context of this research work the observedevents are either low IWIP or high IWIP erefore hit ratecan be used to measure the forecast capability of the pro-posed method to match the actual IWIP events Let HRdenotes hit rate n denotes the number correct detectionand N denotes the total number of observation hit rate isexpressed as

HR n

Ntimes 100 (13)

From the requirement of the Fab it is only necessary forthe proposedmethod to be able to forecast any two days withhighest IWIP and any two days with lowest IWIP For thesefour days to be forecasted the hit rate required by the Fab is75 percent In other words at least three out of these fourdays must be detected

To measure the correlation between the actual IWIP andthe forecasted IWIP this research uses the Pearsonrsquos cor-relation coecient r Pearsonrsquos r is a measure of the linearrelationship between two vectors of variables In this re-search work these two vectors of variables are the actual

Epoch

RMSE

TrainingTesting

Figure 7 RMSE curves when the model is well learned

TrainingTesting

Epoch

RMSE

Figure 8 RMSE curves when then model is underlearned

TrainingTesting

Epoch

RMSE

Figure 9 RMSE curves when the model is overlearned(overtting)

8 Computational Intelligence and Neuroscience

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 3: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

with large number of lots Proper lot classification is nec-essary to ensure that the cycle time statistics generated arerelevant In addition special software for lot scheduling isalso required to generate the relevant data

Due to the lack of similar research works done in thedomain of semiconductor fabrication a cross-reference tosimilar research problem in a different domain is necessaryFrom the literature reviews vehicle traffic arrival forecastingexhibits the closest similarity to the forecast ofWIP arrival inthe fab Consider the comparison of the following twomodels presented in Figures 1 and 2 Figure 1 presents thescenario geometry of a typical traffic arrival modelled byLarry [11] while Figure 2 presents a typical WIP arrivalscenario to an equipment group in a fab

In Figure 1 dr dt dl and dA each represents a trafficdetector deployed at their designated location while A and Beach denotes the two intersections of the road It is desired topredict the traffic flow approaching intersection A at de-tector dA where the actual traffic flow can be measured sothat the quality of the prediction can be assessed in real-timeIn Figure 2 S1 S2 and S3 each denotes the tool group thatsupplies lots to the equipment group W In Fab environ-ment the set of tools that perform the same wafer fabricationprocess are commonly grouped logically and termed as toolgroup Figure 3 depicts an example with six tools in Group S

S1 S2 and S3 are analogous to the three roads inter-connected at intersection B that supplies traffic to in-tersection A in Larryrsquos model [11] -e lots that are suppliedby S1 S2 and S3 becomes the WIP for W -e traffic inHeadrsquos model is therefore analogous to the WIP of W in thiswork Similar to the work of Larry [11] this research workwould like to predict the totalWIP that would arrive atW fora period of time in the future

According to Larry [11] in general traffic flow is a time-space phenomenon Many of the subsequent traffic flowprediction works have also modelled the traffic flow pre-diction as a form of time series problem A list of non-exhaustive related works include Tian and Pan [2] Williamsand Hoel [12] VanderVoort et al [13] Xie et al [14] Huanget al [15] Abadi et al [16] Fu et al [17] and Shao and Soong[18] -e models used by those research works can becategorized into two categories -e first category usesstatistical models while the second category uses machinelearning models

22 Prediction Models -e prediction models can com-monly be divided into 2 categories parametric models andnonparametric models Parametric models refer to modelswith fixed structure based on some assumptions and themodel parameters can be computed with empirical data [17]Autoregressive Integrated Moving Average (ARIMA) is oneof the most popular parametric models in time series pre-diction It was first proposed to predict short-term freewaytraffic in 1970s [19] Subsequently variants of ARIMA fortime series prediction were proposed such as Kohonen-ARIMA (KARIMA) [13] and seasonal ARIMA [12]According to [1] these models are based on the assumptionof stationary variance and mean of the time series Kalman

filter is another parametric approach to solving short-termtime series traffic flow prediction [20ndash23] In the most recentresearch Abadi et al [16] applied autoregressive model (AR)to predict traffic flow up to 30 minutes ahead -e authorsrsquowork uses complete traffic data such as the historical trafficdata collected from traffic links with traffic sensors topredict the short-term traffic flow As a case study theauthors predicted the flow of a downtown traffic in SanFrancisco USA and employed Monte Carlo simulations toevaluate their methodology-e authors reported an averageprediction error varying from two percent for five minutesprediction windows to twelve percent for 30 minutes pre-diction windows with the presence of unpredictable events

Nonparametric models refer to models with no fixedstructure and parameters [1] It is also known as data-driven

dt

B

A

dr

dA

dl

Figure 1 Geometric layout of traffic flow prediction scenario byLarry [6]

S1

W

S2 S3

Figure 2 Typical scenario ofWIP arrival to an equipment group ina fab

Tool 2 Tool 3Tool 1

Tool 5 Tool 6Tool 4

Group S

Figure 3 A logical grouping of tools into tool group

Computational Intelligence and Neuroscience 3

model Nonparametric models have gainedmuch attention insolving time series problem because of the modelsrsquo ability toaddress stochastic and nonlinear nature of time seriesproblem compared to parametric models [17] Artificialneural network (ANN) support vector machine (SVM) anddeep-learning neural networks are examples of non-parametric models -e discovery of deep-learning neuralnetwork [24] and its reported success [25] have drawn manyresearchersrsquo attention to apply deep-learning neural networkto solve various research problems Dimensionality reductionof data [26] natural language processing [27] number rec-ognition [28] object detection [29] and organ detection [30]are examples of published research works that have dem-onstrated the successful use of deep-learning neural network

LSTM a variant of deep-learning neural network hasrecently gained popularity in traffic flow prediction In [31]Duan et al have constructed 66 series of LSTM neuralnetwork for the 66 travel links in their data set and validatedthat 1-step ahead travel prediction error is relatively small In[32] Zhao et al evaluated the effectiveness of LSTM inmachine health monitoring systems by sampling data over100 thousands time steps sensory signal data and evaluatingthem over linear regression (LR) support vector regression(SVR) multilayer perceptron neural network (MLP) re-current neural network (RNN) Basic LSTM and DeepLSTM -e results showed that deep LSTM performs thebest among the evaluated methods According to the au-thors LSTM do not require any expert knowledge andfeature engineering as required by the LR SVR and MLPwhich may not be accessible in practice In addition with theintroduction of forget gates LSTM is able to capture long-term dependencies thus it is able to capture and discovermeaningful features in the signal data

In [2 32 33] the authors reported that LSTM and StackedAutoEncoders (SAE) have better performance in traffic flowpredictions than the traditional predictionmodels Accordingto [32] also LSTM reported to have better performance thanSAE In addition the comparison performed in [14] forLSTM gated recurrent units (GRU) neural network and autoregressive integrated moving average (ARIMA) in trafficprediction had demonstrated that the LSTM and GRU per-formed better than the ARIMA model

In [17] also Tian and Pan demonstrated the use of LSTMto achieve higher traffic prediction accuracy in short-termtraffic prediction as compared to [31 34] -e authors foundthat the mentioned models require the length of the inputhistorical data to be predefined and static In addition themodel cannot automatically determine the optimal time lagsWith the use of LSTM the author demonstrated that LSTMcan capture the nonlinearity and randomness of the trafficflowmore effectively Furthermore the use of LSTM also canovercome the issue of back-propagated error decay throughmemory blocks thereby increasing the accuracy of theprediction

3 Methodology

-e daily IWIP to a tool group is a form of time series data-is is because it is a sequence of values that observed

sequentially in time -e IWIP forecast to a tool group issimilar to traffic arrival forecast where the objective is toensure that there is enough capacity for the traffic to flowthrough with minimum obstruction for a given time-frame inthe future in order not to create any bottlenecks in the trafficflow-e amount ofWIP arriving to a tool group is analogicalto the number of vehicles arriving to a road junction or agroup of interlinks of interest -is research also requiresmultistep ahead forecasting approach as the research problemrequires forecasting the IWIP multiple days ahead from thelast observation in order to plan for PM activities

31 Statistical Incoming WIP Forecasting Method -eexisting solution in the Fab uses a basic statistical forecastingapproach to forecast the WIP arrival for all tool groups forthe next 7 days -e forecast is run once a week at thebeginning of each week -e calculation steps of statisticalforecasting approach are summarized in Table 1

-e existing forecasting method only caters to productswith the number of wafers ordered dominating the totalWIP in the production line -is is because the calculationrequires the number of operation steps and their respectiveTAT to calculate the forecasted arrival steps -e forecastedresults are therefore not accurate because the number ofwafers considered in the calculation differs from the actualnumber of wafers in the production line In addition themethod cannot predict the IWIP to a particular tool groupmore accurately because it does not include any algorithmsto capture the time-dependency relation between the data-is limits the ability of the Fab managers Fab managerrefers to personnel who is assigned with the managementresponsibility to oversee various aspect of the Fab to ensurethat the fabrsquos production line performs smoothly to create abetter PM activities schedule that could minimize negativeimpact to the production line -erefore it is important tocreate a forecasting model with better accuracy to assist Fabmanagers to carry out more effective PM activities planningthat can minimize the impact on CT

32 Long Short-TermMemory -e long short-termmemory(LSTM) was developed in 1997 by Hochereiter andSchmidhuber [35] to address the exploding and vanishinggradient phenomena in RNN -e presence of these twophenomena had caused RNN to suffer in the inability torecord information for longer period of time [18] In otherwords RNN is not able to capture long-term dependencies[32] -e solution to this problem is the introduction offorget gate into the neural network to avoid long-termdependencies -e forget gate is used during the trainingphase to decide when information from previous cell stateshould be forgotten In general LSTM has three gatesnamely the input gate the forget gate and the output gate-e key feature of LSTM is its gated memory cell and eachcell has the above mentioned three gates -ese gates areused to control the flow of information through each cell

Let time be denoted as t At time t the input to a LSTMcell is xt and its previous output is htminus1 -e cell input state is1113957Ct the cell output state is Ct and its previous state is Ctminus1

4 Computational Intelligence and Neuroscience

Input gate at time t is it forget gate is ft and output gate isot According to the structure of the LSTM cell Ct and ht willbe transmitted to the next cell in the network To calculate Ct

and ht we first define the following 4 equationsInput gate

it σ Wixt + Wihtminus1 + bi( 1113857 (1)

Forget gate

ft σ Wfxt + Wfhtminus1 + bf1113872 1113873 (2)

Output gate

ot σ Woxt + Wohtminus1 + bo( 1113857 (3)

Cell Input1113957Ct tan h WCxt + WChtminus1 + bC( 1113857 (4)

where Wweighted matrices b bias vector σ sigmoidfunction and tanh hyperbolic tangent function

Sigmoid function σ(x) is defined as follows

σ(x) 1

1 + exp(minusx) (5)

Hyperbolic tangent function tan h(x) is defined asfollows

tan h(x) exp(x)minus exp(minusx)

exp(x) + exp(minusx) (6)

Using equations (1) (2) and (4) we calculate the celloutput state using the following equation

Ct ft lowastCtminus1( 1113857 + it lowast 1113957Ct1113872 1113873 (7)

Lastly the hidden layer output is calculated using thefollowing equation

ht otlowast tan h Ct( 1113857 (8)

-e hidden layer of the LSTM can be stacked such thatthe architecture of neural network consists of more than oneLSTM hidden layers Figure 4 shows neural network ar-chitecture with one LSTM hidden layer while Figure 5shows neural network with two LSTM hidden layers stacked

With reference to Figure 5 each LSTM hidden layer isfully connected through recurrent connections (the con-nection is indicated by the dotted directional line) -esquares in the LSTM hidden layer represents the LSTMneurons the circles denoted with xi represents the input tothe LSTM neuron while the circles denoted with yi rep-resents the output of the LSTM neuron When the LSTMhidden layers are stacked each LSTM neuron in the lowerLSTM hidden layer is fully connected to each LSTM neuronin the LSTM hidden layer above it through feedforwardconnections which is denoted by the solid directional linebetween the stacked LSTM hidden layers

33 Proposed Method Figure 6 illustrates the proposedmethod -e historical IWIP data are first stored in a datastore to ease the management of the data -e historical dataare then extracted from the data store to be preprocessed-e preprocessing stage consists of two steps which are datascaling and data formatting

In the data scaling step the historical IWIP data to beused for supervised-learning is scaled according to thefollowing equation

y xminusmin

maxminus min (9)

where x denotes each of the historical IWIP value mindenotes the smallest historical IWIP value in the historicaldata and max denotes the largest historical IWIP value inthe historical data

-e next step is the data formatting step In time seriesdomain the term ldquolagsrdquo is commonly used to denote valuesof time steps observed prior to the prediction Generally thetime series data are separated into training and testing setwhere the training set contains the lags while the testing setcontains the actual values of future time steps -erefore inthe data formatting step let xi denotes each individual lagthe scaled historical IWIP data are formatted according tothe format tabulated in Tables 2 and 3 Tables 2 and 3 depictthe format of the training and testing dataset respectively

Following this format column X consists of a series oflags column Y consists of the number of time steps to beforecasted and column Z consists of the number of featuresused in the forecast Each row in column X will contain a setof seven IWIP points that corresponds to the number ofIWIP points to be forecasted -ese seven IWIP points aregrouped into a single set of value -e number of IWIPpoints to be forecasted is represented in column Y as timesteps Column Z has the value of one in the training datasetwhich corresponds to the single set of seven IWIP points incolumn X By putting seven IWIP points in both the trainingand testing dataset we are effectively telling LSTM that each

Table 1 Existing statistical WIP forecasting steps

Step Description

1 Given the process flow name of product P is Fretrieve all process steps for F

2For each process step s in F get the turn-around-time

of s TATs this value is predefined in themanufacturing execution system (MES) of the fab

3

Let LF denotes the total photolithography layers in FlF denotes the number of photolithography layers in F

completed per day the day-per-mask-layer (DPML)committed for P DPMLP is denoted as DPMLP

(1lF)

TATF denotes the total turn-around-time (TAT) forF denoted as TATF 1113936

Fs1TATs

-e run rate for F RRF isRRF ((LF times DPMLP)TATF)

4 Cycle time (CT) for a process step s CTs isCTs RRF times TATs

5For each lot sum the next n steps of s until the CTreaches 24 hours -e last s would be the forecasted

destination step of the lot after 24 hours

6

To forecast the destination step of the lot for the nextD day sum the next n steps of s until the CT reaches

D times 24 hours -e last s would be the forecasteddestination step of the lot for the next D day

Computational Intelligence and Neuroscience 5

future seven IWIP points are related to its immediateprevious seven IWIP points

As a nonparametric model neural network model doesnot have a fixed structure In a RNN with one hidden layerthe ability of the neural networks to discover importantrelationship in the training data during the supervised-

Data storing

Preprocessing

Data scaling

Data formatting

Supervised-learningParameter sizeidentification

Epoch

Batch size

LSTM neuron size

Stacked hidden layer size

Parameter combination evaluation and selection

Result measurement

Forecast

LSTM

f (x)

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 6 Proposed method

Input layer

Output layer

LSTM hidden layer

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 4 Nonstacked LSTM neural network

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Input layer

Output layer

LSTM hidden layer

LSTM hidden layer

Figure 5 Stacked LSTM neural network

Table 2 Data formation for training dataset

Set Training datasetX Y Z

1 (x1 x2 x3 x4 x5 x6 x7) 7 12 (x2 x3 x4 x5 x6 x7 x8) 7 13 (x3 x4 x5 x6 x7 x8 x9) 7 14 (x4 x5 x6 x7 x8 x9 x10) 7 15 (x5 x6 x7 x8 x9 x10 x11) 7 16 (x6 x7 x8 x9 x10 x11 x12) 7 17 (x7 x8 x9 x10 x11 x12 x13) 7 1

Table 3 Data formation for testing dataset

Set Testing datasetX Y

1 (x8 x9 x10 x11 x12 x13 x14) 12 (x9 x10 x11 x12 x13 x14 x15) 13 (x10 x11 x12 x13 x14 x15 x16) 14 (x11 x12 x13 x14 x15 x16 x17) 15 (x12 x13 x14 x15 x16 x17 x18) 16 (x13 x14 x15 x16 x17 x18 x19) 17 (x14 x15 x16 x17 x18 x19 x20) 1

6 Computational Intelligence and Neuroscience

learning is affected by the batch size used per epoch thenumber of epoch hidden layers and hidden neuron -ecombination of the sizes of these four parameters that resultsin stable supervised-learning and delivers the lowest forecasterror is desired

Each parameter being examined will have a list ofpredefined sizes to be tested When one of the parameters isbeing examined the remaining two parameters will be fixedto their current sizes in their respective list -is is to controlthe variation across the examinations For each combinationof the parameters the model will be tested with thatcombination to measure its performance in terms of theforecasting error and the stability of its supervised-learning

For the LSTM setup of this research we construct aLSTM model using the LSTM cell Let t denote the obser-vation time of each IWIP and x denotes the IWIP the inputof the LSTMmodel is the observed IWIP x at time t denotedas xt and the output of the LSTM model is the predictedIWIP 1113957xt+1 -rough the LSTM equations presented 1113957xt+1 istherefore calculated as

1113957xt+1 W middot ht + b (10)

where W is the weight matrix between the output layer andthe hidden layer

-e metric used to measure the forecasting error in thesupervised-learning is the root-mean-squared error (RMSE)Let P denote the actual IWIP 1113957P denote the forecasted IWIPand n denotes the total day forecasted RMSE is defined asfollows

RMSE

1n

1113944

n

j1Pj minus 1113957Pj1113872 1113873

2

11139741113972

(11)

RMSE is a frequently used evaluation metric because itmeasures the difference between the values predicted by amodel and the actually observed values

For each parameter size combination to be tested themodel will be experimented multiple times with the sameparameters setting If N denotes the number of times theexperiment was conducted there will be N number of RMSEobtained to represent the performance of the model for eachexperiment -e reason running multiple experiments foreach parameter size combination is because internallyneural network uses randomization to assign the weightsand the states of its neurons -is produces different fore-casting errors between experiments -erefore multipleexperimental runs are recommended to allow for the se-lection of the neural network model with internal settings toproduce the lowest RMSE

After the supervised-learning is completed the proposedmethod will proceed to parameter combination evaluationand selection -e evaluation of parameter combination andselection step is necessary because it is common to assumethat a particular parameter combination that gives a lowRMSE at the end of the supervised-learning directlytranslated to a good parameter combination that allowssufficient capability of the model to perform forecastHowever this assumption is misleading because a model

that has overlearned during the supervised-learning candeliver results with very low RMSE at the end of the trainingAn overlearned model will perform poorly in the actualforecast -erefore it is necessary to also measure the sta-bility of the supervised-learning of the model given aparticular combination of the four parameters During eachepoch in the supervised-learning the model will be requiredto perform two forecasts one uses a reserved set from thetraining set and other uses a reserved set from the testing setWith two forecasts performed two RMSE will be generated-e RMSE generated by using training set is the trainingerror while the RMSE generated by using the reservedtesting set is the testing error To measure the stability of thesupervised-learning the RMSE for both training error andtesting error of each epoch are collected and plotted in asingle graph With y-axis representing the RMSE and x-axisrepresenting the number of epochs Figures 7ndash9 show ex-amples of the curves that exhibited from the supervised-learning -e combination of parameter sizes that allows themodel to exhibit learning curve pattern similar to Figure 7 isthe desired selection Learning curve with pattern similar toFigure 7 signifies that the model was able to perform stablesupervised-learning with stable reduction in the RMSE ofboth training and testing phases using the selected combi-nation of parameter sizes In other words themodel was ableto discover the time-dependent relation in the given datasetsuch that it allows the model to minimize its prediction errorfor each epoch of the supervised-learning

-e combination of the four parametersrsquo sizes that en-ables the model to show stable performance in thesupervised-learning and lowest RMSE will be selected toforecast the IWIP

For each of the selected parameter combination themodel is required to forecast for three consecutive weeks-e accuracies for the forecast results will be measuredaccording to the selected measurement metrics to evaluatethe forecasting capability of the model

4 Experimental Results

41DataDescriptionandExperimentalDesign -e IWIP fora particular tool group is denoted as IWIP and can becalculated as

IWIP WIPt24 minusWIPt1( 1113857 1113944

24

t1(MOVE) (12)

where MOVE denotes the number of wafer moved per hourt1 refers to the first hour the data are collected and t24 refersto the twenty-fourth hour the data are collected In thisstudy the first hour is at 0830 while the 0730 on the next dayis the twenty-fourth hour

-e data use for this experiment is acquired from theFabrsquos internal development database with the applicationrunning hourly to collect the WIP and calculate the numberof wafers moved for each tool group in the production lineevery 24 hours Due to the Fabrsquos data security and confi-dentiality policies we are only allowed to access productionsystemrsquos data source of the company to perform data

Computational Intelligence and Neuroscience 7

collection for a specic duration Given the allowed durationfrom the Fab we were able to collect three months data tocreate a data set with 90 days of historical IWIP With eachIWIP as a data point 70 percent of the data points are usedfor the LSTM training phase and the remaining 30 percentfor testing phase

For the number of epochs numerical values of 100 and200 are selected For batch size numerical values of 10 and20 are selected for the number of hidden layers numericalvalues of 3 and 4 are selected and for the number of hiddenneuron numerical values of 384 and 512 are selected for therst hidden layer while the numerical values of 8 and 16 areselected for the subsequent layers It is worthwhile tomention that by using seven IWIP points per dataset as thenumber of previous IWIP lags to be examined each of thenumerical values for batch size denotes the number of weekspresented to the LSTMmodel per epoche neural networkis initialized with uniformly distributed weights where theranges of the weights are (minus01 01) and trained usingmean-squared-error (MSE) as the loss function Adamoptimizer is used as the optimization function with default

learning rate η 0001 β1 09 β2 0999 ε 0 andc 0 Each combination of the selected values is thenevaluated three times to obtain the three RMSE results ofeach combination

Parameter size selection is done by selecting the lowestRMSE among the three experimental runs followed by ex-amining the graphs of the supervised-learning result of thesame run that produced the lowest RMSE e desiredsupervised-learning graph should resemble the pattern il-lustrated in Figure 7 e parameter size combinations thatdo not meet the required pattern will be discarded

42 Measurement Metrics To measure the performance ofthe models two accuracy measurements are used esetwo-measurement metrics are hit rate and correlationmeasurement

Hit rate or probability of detection (POD) is theprobability that the forecasted event matches the observedevent In the context of this research work the observedevents are either low IWIP or high IWIP erefore hit ratecan be used to measure the forecast capability of the pro-posed method to match the actual IWIP events Let HRdenotes hit rate n denotes the number correct detectionand N denotes the total number of observation hit rate isexpressed as

HR n

Ntimes 100 (13)

From the requirement of the Fab it is only necessary forthe proposedmethod to be able to forecast any two days withhighest IWIP and any two days with lowest IWIP For thesefour days to be forecasted the hit rate required by the Fab is75 percent In other words at least three out of these fourdays must be detected

To measure the correlation between the actual IWIP andthe forecasted IWIP this research uses the Pearsonrsquos cor-relation coecient r Pearsonrsquos r is a measure of the linearrelationship between two vectors of variables In this re-search work these two vectors of variables are the actual

Epoch

RMSE

TrainingTesting

Figure 7 RMSE curves when the model is well learned

TrainingTesting

Epoch

RMSE

Figure 8 RMSE curves when then model is underlearned

TrainingTesting

Epoch

RMSE

Figure 9 RMSE curves when the model is overlearned(overtting)

8 Computational Intelligence and Neuroscience

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 4: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

model Nonparametric models have gainedmuch attention insolving time series problem because of the modelsrsquo ability toaddress stochastic and nonlinear nature of time seriesproblem compared to parametric models [17] Artificialneural network (ANN) support vector machine (SVM) anddeep-learning neural networks are examples of non-parametric models -e discovery of deep-learning neuralnetwork [24] and its reported success [25] have drawn manyresearchersrsquo attention to apply deep-learning neural networkto solve various research problems Dimensionality reductionof data [26] natural language processing [27] number rec-ognition [28] object detection [29] and organ detection [30]are examples of published research works that have dem-onstrated the successful use of deep-learning neural network

LSTM a variant of deep-learning neural network hasrecently gained popularity in traffic flow prediction In [31]Duan et al have constructed 66 series of LSTM neuralnetwork for the 66 travel links in their data set and validatedthat 1-step ahead travel prediction error is relatively small In[32] Zhao et al evaluated the effectiveness of LSTM inmachine health monitoring systems by sampling data over100 thousands time steps sensory signal data and evaluatingthem over linear regression (LR) support vector regression(SVR) multilayer perceptron neural network (MLP) re-current neural network (RNN) Basic LSTM and DeepLSTM -e results showed that deep LSTM performs thebest among the evaluated methods According to the au-thors LSTM do not require any expert knowledge andfeature engineering as required by the LR SVR and MLPwhich may not be accessible in practice In addition with theintroduction of forget gates LSTM is able to capture long-term dependencies thus it is able to capture and discovermeaningful features in the signal data

In [2 32 33] the authors reported that LSTM and StackedAutoEncoders (SAE) have better performance in traffic flowpredictions than the traditional predictionmodels Accordingto [32] also LSTM reported to have better performance thanSAE In addition the comparison performed in [14] forLSTM gated recurrent units (GRU) neural network and autoregressive integrated moving average (ARIMA) in trafficprediction had demonstrated that the LSTM and GRU per-formed better than the ARIMA model

In [17] also Tian and Pan demonstrated the use of LSTMto achieve higher traffic prediction accuracy in short-termtraffic prediction as compared to [31 34] -e authors foundthat the mentioned models require the length of the inputhistorical data to be predefined and static In addition themodel cannot automatically determine the optimal time lagsWith the use of LSTM the author demonstrated that LSTMcan capture the nonlinearity and randomness of the trafficflowmore effectively Furthermore the use of LSTM also canovercome the issue of back-propagated error decay throughmemory blocks thereby increasing the accuracy of theprediction

3 Methodology

-e daily IWIP to a tool group is a form of time series data-is is because it is a sequence of values that observed

sequentially in time -e IWIP forecast to a tool group issimilar to traffic arrival forecast where the objective is toensure that there is enough capacity for the traffic to flowthrough with minimum obstruction for a given time-frame inthe future in order not to create any bottlenecks in the trafficflow-e amount ofWIP arriving to a tool group is analogicalto the number of vehicles arriving to a road junction or agroup of interlinks of interest -is research also requiresmultistep ahead forecasting approach as the research problemrequires forecasting the IWIP multiple days ahead from thelast observation in order to plan for PM activities

31 Statistical Incoming WIP Forecasting Method -eexisting solution in the Fab uses a basic statistical forecastingapproach to forecast the WIP arrival for all tool groups forthe next 7 days -e forecast is run once a week at thebeginning of each week -e calculation steps of statisticalforecasting approach are summarized in Table 1

-e existing forecasting method only caters to productswith the number of wafers ordered dominating the totalWIP in the production line -is is because the calculationrequires the number of operation steps and their respectiveTAT to calculate the forecasted arrival steps -e forecastedresults are therefore not accurate because the number ofwafers considered in the calculation differs from the actualnumber of wafers in the production line In addition themethod cannot predict the IWIP to a particular tool groupmore accurately because it does not include any algorithmsto capture the time-dependency relation between the data-is limits the ability of the Fab managers Fab managerrefers to personnel who is assigned with the managementresponsibility to oversee various aspect of the Fab to ensurethat the fabrsquos production line performs smoothly to create abetter PM activities schedule that could minimize negativeimpact to the production line -erefore it is important tocreate a forecasting model with better accuracy to assist Fabmanagers to carry out more effective PM activities planningthat can minimize the impact on CT

32 Long Short-TermMemory -e long short-termmemory(LSTM) was developed in 1997 by Hochereiter andSchmidhuber [35] to address the exploding and vanishinggradient phenomena in RNN -e presence of these twophenomena had caused RNN to suffer in the inability torecord information for longer period of time [18] In otherwords RNN is not able to capture long-term dependencies[32] -e solution to this problem is the introduction offorget gate into the neural network to avoid long-termdependencies -e forget gate is used during the trainingphase to decide when information from previous cell stateshould be forgotten In general LSTM has three gatesnamely the input gate the forget gate and the output gate-e key feature of LSTM is its gated memory cell and eachcell has the above mentioned three gates -ese gates areused to control the flow of information through each cell

Let time be denoted as t At time t the input to a LSTMcell is xt and its previous output is htminus1 -e cell input state is1113957Ct the cell output state is Ct and its previous state is Ctminus1

4 Computational Intelligence and Neuroscience

Input gate at time t is it forget gate is ft and output gate isot According to the structure of the LSTM cell Ct and ht willbe transmitted to the next cell in the network To calculate Ct

and ht we first define the following 4 equationsInput gate

it σ Wixt + Wihtminus1 + bi( 1113857 (1)

Forget gate

ft σ Wfxt + Wfhtminus1 + bf1113872 1113873 (2)

Output gate

ot σ Woxt + Wohtminus1 + bo( 1113857 (3)

Cell Input1113957Ct tan h WCxt + WChtminus1 + bC( 1113857 (4)

where Wweighted matrices b bias vector σ sigmoidfunction and tanh hyperbolic tangent function

Sigmoid function σ(x) is defined as follows

σ(x) 1

1 + exp(minusx) (5)

Hyperbolic tangent function tan h(x) is defined asfollows

tan h(x) exp(x)minus exp(minusx)

exp(x) + exp(minusx) (6)

Using equations (1) (2) and (4) we calculate the celloutput state using the following equation

Ct ft lowastCtminus1( 1113857 + it lowast 1113957Ct1113872 1113873 (7)

Lastly the hidden layer output is calculated using thefollowing equation

ht otlowast tan h Ct( 1113857 (8)

-e hidden layer of the LSTM can be stacked such thatthe architecture of neural network consists of more than oneLSTM hidden layers Figure 4 shows neural network ar-chitecture with one LSTM hidden layer while Figure 5shows neural network with two LSTM hidden layers stacked

With reference to Figure 5 each LSTM hidden layer isfully connected through recurrent connections (the con-nection is indicated by the dotted directional line) -esquares in the LSTM hidden layer represents the LSTMneurons the circles denoted with xi represents the input tothe LSTM neuron while the circles denoted with yi rep-resents the output of the LSTM neuron When the LSTMhidden layers are stacked each LSTM neuron in the lowerLSTM hidden layer is fully connected to each LSTM neuronin the LSTM hidden layer above it through feedforwardconnections which is denoted by the solid directional linebetween the stacked LSTM hidden layers

33 Proposed Method Figure 6 illustrates the proposedmethod -e historical IWIP data are first stored in a datastore to ease the management of the data -e historical dataare then extracted from the data store to be preprocessed-e preprocessing stage consists of two steps which are datascaling and data formatting

In the data scaling step the historical IWIP data to beused for supervised-learning is scaled according to thefollowing equation

y xminusmin

maxminus min (9)

where x denotes each of the historical IWIP value mindenotes the smallest historical IWIP value in the historicaldata and max denotes the largest historical IWIP value inthe historical data

-e next step is the data formatting step In time seriesdomain the term ldquolagsrdquo is commonly used to denote valuesof time steps observed prior to the prediction Generally thetime series data are separated into training and testing setwhere the training set contains the lags while the testing setcontains the actual values of future time steps -erefore inthe data formatting step let xi denotes each individual lagthe scaled historical IWIP data are formatted according tothe format tabulated in Tables 2 and 3 Tables 2 and 3 depictthe format of the training and testing dataset respectively

Following this format column X consists of a series oflags column Y consists of the number of time steps to beforecasted and column Z consists of the number of featuresused in the forecast Each row in column X will contain a setof seven IWIP points that corresponds to the number ofIWIP points to be forecasted -ese seven IWIP points aregrouped into a single set of value -e number of IWIPpoints to be forecasted is represented in column Y as timesteps Column Z has the value of one in the training datasetwhich corresponds to the single set of seven IWIP points incolumn X By putting seven IWIP points in both the trainingand testing dataset we are effectively telling LSTM that each

Table 1 Existing statistical WIP forecasting steps

Step Description

1 Given the process flow name of product P is Fretrieve all process steps for F

2For each process step s in F get the turn-around-time

of s TATs this value is predefined in themanufacturing execution system (MES) of the fab

3

Let LF denotes the total photolithography layers in FlF denotes the number of photolithography layers in F

completed per day the day-per-mask-layer (DPML)committed for P DPMLP is denoted as DPMLP

(1lF)

TATF denotes the total turn-around-time (TAT) forF denoted as TATF 1113936

Fs1TATs

-e run rate for F RRF isRRF ((LF times DPMLP)TATF)

4 Cycle time (CT) for a process step s CTs isCTs RRF times TATs

5For each lot sum the next n steps of s until the CTreaches 24 hours -e last s would be the forecasted

destination step of the lot after 24 hours

6

To forecast the destination step of the lot for the nextD day sum the next n steps of s until the CT reaches

D times 24 hours -e last s would be the forecasteddestination step of the lot for the next D day

Computational Intelligence and Neuroscience 5

future seven IWIP points are related to its immediateprevious seven IWIP points

As a nonparametric model neural network model doesnot have a fixed structure In a RNN with one hidden layerthe ability of the neural networks to discover importantrelationship in the training data during the supervised-

Data storing

Preprocessing

Data scaling

Data formatting

Supervised-learningParameter sizeidentification

Epoch

Batch size

LSTM neuron size

Stacked hidden layer size

Parameter combination evaluation and selection

Result measurement

Forecast

LSTM

f (x)

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 6 Proposed method

Input layer

Output layer

LSTM hidden layer

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 4 Nonstacked LSTM neural network

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Input layer

Output layer

LSTM hidden layer

LSTM hidden layer

Figure 5 Stacked LSTM neural network

Table 2 Data formation for training dataset

Set Training datasetX Y Z

1 (x1 x2 x3 x4 x5 x6 x7) 7 12 (x2 x3 x4 x5 x6 x7 x8) 7 13 (x3 x4 x5 x6 x7 x8 x9) 7 14 (x4 x5 x6 x7 x8 x9 x10) 7 15 (x5 x6 x7 x8 x9 x10 x11) 7 16 (x6 x7 x8 x9 x10 x11 x12) 7 17 (x7 x8 x9 x10 x11 x12 x13) 7 1

Table 3 Data formation for testing dataset

Set Testing datasetX Y

1 (x8 x9 x10 x11 x12 x13 x14) 12 (x9 x10 x11 x12 x13 x14 x15) 13 (x10 x11 x12 x13 x14 x15 x16) 14 (x11 x12 x13 x14 x15 x16 x17) 15 (x12 x13 x14 x15 x16 x17 x18) 16 (x13 x14 x15 x16 x17 x18 x19) 17 (x14 x15 x16 x17 x18 x19 x20) 1

6 Computational Intelligence and Neuroscience

learning is affected by the batch size used per epoch thenumber of epoch hidden layers and hidden neuron -ecombination of the sizes of these four parameters that resultsin stable supervised-learning and delivers the lowest forecasterror is desired

Each parameter being examined will have a list ofpredefined sizes to be tested When one of the parameters isbeing examined the remaining two parameters will be fixedto their current sizes in their respective list -is is to controlthe variation across the examinations For each combinationof the parameters the model will be tested with thatcombination to measure its performance in terms of theforecasting error and the stability of its supervised-learning

For the LSTM setup of this research we construct aLSTM model using the LSTM cell Let t denote the obser-vation time of each IWIP and x denotes the IWIP the inputof the LSTMmodel is the observed IWIP x at time t denotedas xt and the output of the LSTM model is the predictedIWIP 1113957xt+1 -rough the LSTM equations presented 1113957xt+1 istherefore calculated as

1113957xt+1 W middot ht + b (10)

where W is the weight matrix between the output layer andthe hidden layer

-e metric used to measure the forecasting error in thesupervised-learning is the root-mean-squared error (RMSE)Let P denote the actual IWIP 1113957P denote the forecasted IWIPand n denotes the total day forecasted RMSE is defined asfollows

RMSE

1n

1113944

n

j1Pj minus 1113957Pj1113872 1113873

2

11139741113972

(11)

RMSE is a frequently used evaluation metric because itmeasures the difference between the values predicted by amodel and the actually observed values

For each parameter size combination to be tested themodel will be experimented multiple times with the sameparameters setting If N denotes the number of times theexperiment was conducted there will be N number of RMSEobtained to represent the performance of the model for eachexperiment -e reason running multiple experiments foreach parameter size combination is because internallyneural network uses randomization to assign the weightsand the states of its neurons -is produces different fore-casting errors between experiments -erefore multipleexperimental runs are recommended to allow for the se-lection of the neural network model with internal settings toproduce the lowest RMSE

After the supervised-learning is completed the proposedmethod will proceed to parameter combination evaluationand selection -e evaluation of parameter combination andselection step is necessary because it is common to assumethat a particular parameter combination that gives a lowRMSE at the end of the supervised-learning directlytranslated to a good parameter combination that allowssufficient capability of the model to perform forecastHowever this assumption is misleading because a model

that has overlearned during the supervised-learning candeliver results with very low RMSE at the end of the trainingAn overlearned model will perform poorly in the actualforecast -erefore it is necessary to also measure the sta-bility of the supervised-learning of the model given aparticular combination of the four parameters During eachepoch in the supervised-learning the model will be requiredto perform two forecasts one uses a reserved set from thetraining set and other uses a reserved set from the testing setWith two forecasts performed two RMSE will be generated-e RMSE generated by using training set is the trainingerror while the RMSE generated by using the reservedtesting set is the testing error To measure the stability of thesupervised-learning the RMSE for both training error andtesting error of each epoch are collected and plotted in asingle graph With y-axis representing the RMSE and x-axisrepresenting the number of epochs Figures 7ndash9 show ex-amples of the curves that exhibited from the supervised-learning -e combination of parameter sizes that allows themodel to exhibit learning curve pattern similar to Figure 7 isthe desired selection Learning curve with pattern similar toFigure 7 signifies that the model was able to perform stablesupervised-learning with stable reduction in the RMSE ofboth training and testing phases using the selected combi-nation of parameter sizes In other words themodel was ableto discover the time-dependent relation in the given datasetsuch that it allows the model to minimize its prediction errorfor each epoch of the supervised-learning

-e combination of the four parametersrsquo sizes that en-ables the model to show stable performance in thesupervised-learning and lowest RMSE will be selected toforecast the IWIP

For each of the selected parameter combination themodel is required to forecast for three consecutive weeks-e accuracies for the forecast results will be measuredaccording to the selected measurement metrics to evaluatethe forecasting capability of the model

4 Experimental Results

41DataDescriptionandExperimentalDesign -e IWIP fora particular tool group is denoted as IWIP and can becalculated as

IWIP WIPt24 minusWIPt1( 1113857 1113944

24

t1(MOVE) (12)

where MOVE denotes the number of wafer moved per hourt1 refers to the first hour the data are collected and t24 refersto the twenty-fourth hour the data are collected In thisstudy the first hour is at 0830 while the 0730 on the next dayis the twenty-fourth hour

-e data use for this experiment is acquired from theFabrsquos internal development database with the applicationrunning hourly to collect the WIP and calculate the numberof wafers moved for each tool group in the production lineevery 24 hours Due to the Fabrsquos data security and confi-dentiality policies we are only allowed to access productionsystemrsquos data source of the company to perform data

Computational Intelligence and Neuroscience 7

collection for a specic duration Given the allowed durationfrom the Fab we were able to collect three months data tocreate a data set with 90 days of historical IWIP With eachIWIP as a data point 70 percent of the data points are usedfor the LSTM training phase and the remaining 30 percentfor testing phase

For the number of epochs numerical values of 100 and200 are selected For batch size numerical values of 10 and20 are selected for the number of hidden layers numericalvalues of 3 and 4 are selected and for the number of hiddenneuron numerical values of 384 and 512 are selected for therst hidden layer while the numerical values of 8 and 16 areselected for the subsequent layers It is worthwhile tomention that by using seven IWIP points per dataset as thenumber of previous IWIP lags to be examined each of thenumerical values for batch size denotes the number of weekspresented to the LSTMmodel per epoche neural networkis initialized with uniformly distributed weights where theranges of the weights are (minus01 01) and trained usingmean-squared-error (MSE) as the loss function Adamoptimizer is used as the optimization function with default

learning rate η 0001 β1 09 β2 0999 ε 0 andc 0 Each combination of the selected values is thenevaluated three times to obtain the three RMSE results ofeach combination

Parameter size selection is done by selecting the lowestRMSE among the three experimental runs followed by ex-amining the graphs of the supervised-learning result of thesame run that produced the lowest RMSE e desiredsupervised-learning graph should resemble the pattern il-lustrated in Figure 7 e parameter size combinations thatdo not meet the required pattern will be discarded

42 Measurement Metrics To measure the performance ofthe models two accuracy measurements are used esetwo-measurement metrics are hit rate and correlationmeasurement

Hit rate or probability of detection (POD) is theprobability that the forecasted event matches the observedevent In the context of this research work the observedevents are either low IWIP or high IWIP erefore hit ratecan be used to measure the forecast capability of the pro-posed method to match the actual IWIP events Let HRdenotes hit rate n denotes the number correct detectionand N denotes the total number of observation hit rate isexpressed as

HR n

Ntimes 100 (13)

From the requirement of the Fab it is only necessary forthe proposedmethod to be able to forecast any two days withhighest IWIP and any two days with lowest IWIP For thesefour days to be forecasted the hit rate required by the Fab is75 percent In other words at least three out of these fourdays must be detected

To measure the correlation between the actual IWIP andthe forecasted IWIP this research uses the Pearsonrsquos cor-relation coecient r Pearsonrsquos r is a measure of the linearrelationship between two vectors of variables In this re-search work these two vectors of variables are the actual

Epoch

RMSE

TrainingTesting

Figure 7 RMSE curves when the model is well learned

TrainingTesting

Epoch

RMSE

Figure 8 RMSE curves when then model is underlearned

TrainingTesting

Epoch

RMSE

Figure 9 RMSE curves when the model is overlearned(overtting)

8 Computational Intelligence and Neuroscience

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 5: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

Input gate at time t is it forget gate is ft and output gate isot According to the structure of the LSTM cell Ct and ht willbe transmitted to the next cell in the network To calculate Ct

and ht we first define the following 4 equationsInput gate

it σ Wixt + Wihtminus1 + bi( 1113857 (1)

Forget gate

ft σ Wfxt + Wfhtminus1 + bf1113872 1113873 (2)

Output gate

ot σ Woxt + Wohtminus1 + bo( 1113857 (3)

Cell Input1113957Ct tan h WCxt + WChtminus1 + bC( 1113857 (4)

where Wweighted matrices b bias vector σ sigmoidfunction and tanh hyperbolic tangent function

Sigmoid function σ(x) is defined as follows

σ(x) 1

1 + exp(minusx) (5)

Hyperbolic tangent function tan h(x) is defined asfollows

tan h(x) exp(x)minus exp(minusx)

exp(x) + exp(minusx) (6)

Using equations (1) (2) and (4) we calculate the celloutput state using the following equation

Ct ft lowastCtminus1( 1113857 + it lowast 1113957Ct1113872 1113873 (7)

Lastly the hidden layer output is calculated using thefollowing equation

ht otlowast tan h Ct( 1113857 (8)

-e hidden layer of the LSTM can be stacked such thatthe architecture of neural network consists of more than oneLSTM hidden layers Figure 4 shows neural network ar-chitecture with one LSTM hidden layer while Figure 5shows neural network with two LSTM hidden layers stacked

With reference to Figure 5 each LSTM hidden layer isfully connected through recurrent connections (the con-nection is indicated by the dotted directional line) -esquares in the LSTM hidden layer represents the LSTMneurons the circles denoted with xi represents the input tothe LSTM neuron while the circles denoted with yi rep-resents the output of the LSTM neuron When the LSTMhidden layers are stacked each LSTM neuron in the lowerLSTM hidden layer is fully connected to each LSTM neuronin the LSTM hidden layer above it through feedforwardconnections which is denoted by the solid directional linebetween the stacked LSTM hidden layers

33 Proposed Method Figure 6 illustrates the proposedmethod -e historical IWIP data are first stored in a datastore to ease the management of the data -e historical dataare then extracted from the data store to be preprocessed-e preprocessing stage consists of two steps which are datascaling and data formatting

In the data scaling step the historical IWIP data to beused for supervised-learning is scaled according to thefollowing equation

y xminusmin

maxminus min (9)

where x denotes each of the historical IWIP value mindenotes the smallest historical IWIP value in the historicaldata and max denotes the largest historical IWIP value inthe historical data

-e next step is the data formatting step In time seriesdomain the term ldquolagsrdquo is commonly used to denote valuesof time steps observed prior to the prediction Generally thetime series data are separated into training and testing setwhere the training set contains the lags while the testing setcontains the actual values of future time steps -erefore inthe data formatting step let xi denotes each individual lagthe scaled historical IWIP data are formatted according tothe format tabulated in Tables 2 and 3 Tables 2 and 3 depictthe format of the training and testing dataset respectively

Following this format column X consists of a series oflags column Y consists of the number of time steps to beforecasted and column Z consists of the number of featuresused in the forecast Each row in column X will contain a setof seven IWIP points that corresponds to the number ofIWIP points to be forecasted -ese seven IWIP points aregrouped into a single set of value -e number of IWIPpoints to be forecasted is represented in column Y as timesteps Column Z has the value of one in the training datasetwhich corresponds to the single set of seven IWIP points incolumn X By putting seven IWIP points in both the trainingand testing dataset we are effectively telling LSTM that each

Table 1 Existing statistical WIP forecasting steps

Step Description

1 Given the process flow name of product P is Fretrieve all process steps for F

2For each process step s in F get the turn-around-time

of s TATs this value is predefined in themanufacturing execution system (MES) of the fab

3

Let LF denotes the total photolithography layers in FlF denotes the number of photolithography layers in F

completed per day the day-per-mask-layer (DPML)committed for P DPMLP is denoted as DPMLP

(1lF)

TATF denotes the total turn-around-time (TAT) forF denoted as TATF 1113936

Fs1TATs

-e run rate for F RRF isRRF ((LF times DPMLP)TATF)

4 Cycle time (CT) for a process step s CTs isCTs RRF times TATs

5For each lot sum the next n steps of s until the CTreaches 24 hours -e last s would be the forecasted

destination step of the lot after 24 hours

6

To forecast the destination step of the lot for the nextD day sum the next n steps of s until the CT reaches

D times 24 hours -e last s would be the forecasteddestination step of the lot for the next D day

Computational Intelligence and Neuroscience 5

future seven IWIP points are related to its immediateprevious seven IWIP points

As a nonparametric model neural network model doesnot have a fixed structure In a RNN with one hidden layerthe ability of the neural networks to discover importantrelationship in the training data during the supervised-

Data storing

Preprocessing

Data scaling

Data formatting

Supervised-learningParameter sizeidentification

Epoch

Batch size

LSTM neuron size

Stacked hidden layer size

Parameter combination evaluation and selection

Result measurement

Forecast

LSTM

f (x)

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 6 Proposed method

Input layer

Output layer

LSTM hidden layer

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 4 Nonstacked LSTM neural network

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Input layer

Output layer

LSTM hidden layer

LSTM hidden layer

Figure 5 Stacked LSTM neural network

Table 2 Data formation for training dataset

Set Training datasetX Y Z

1 (x1 x2 x3 x4 x5 x6 x7) 7 12 (x2 x3 x4 x5 x6 x7 x8) 7 13 (x3 x4 x5 x6 x7 x8 x9) 7 14 (x4 x5 x6 x7 x8 x9 x10) 7 15 (x5 x6 x7 x8 x9 x10 x11) 7 16 (x6 x7 x8 x9 x10 x11 x12) 7 17 (x7 x8 x9 x10 x11 x12 x13) 7 1

Table 3 Data formation for testing dataset

Set Testing datasetX Y

1 (x8 x9 x10 x11 x12 x13 x14) 12 (x9 x10 x11 x12 x13 x14 x15) 13 (x10 x11 x12 x13 x14 x15 x16) 14 (x11 x12 x13 x14 x15 x16 x17) 15 (x12 x13 x14 x15 x16 x17 x18) 16 (x13 x14 x15 x16 x17 x18 x19) 17 (x14 x15 x16 x17 x18 x19 x20) 1

6 Computational Intelligence and Neuroscience

learning is affected by the batch size used per epoch thenumber of epoch hidden layers and hidden neuron -ecombination of the sizes of these four parameters that resultsin stable supervised-learning and delivers the lowest forecasterror is desired

Each parameter being examined will have a list ofpredefined sizes to be tested When one of the parameters isbeing examined the remaining two parameters will be fixedto their current sizes in their respective list -is is to controlthe variation across the examinations For each combinationof the parameters the model will be tested with thatcombination to measure its performance in terms of theforecasting error and the stability of its supervised-learning

For the LSTM setup of this research we construct aLSTM model using the LSTM cell Let t denote the obser-vation time of each IWIP and x denotes the IWIP the inputof the LSTMmodel is the observed IWIP x at time t denotedas xt and the output of the LSTM model is the predictedIWIP 1113957xt+1 -rough the LSTM equations presented 1113957xt+1 istherefore calculated as

1113957xt+1 W middot ht + b (10)

where W is the weight matrix between the output layer andthe hidden layer

-e metric used to measure the forecasting error in thesupervised-learning is the root-mean-squared error (RMSE)Let P denote the actual IWIP 1113957P denote the forecasted IWIPand n denotes the total day forecasted RMSE is defined asfollows

RMSE

1n

1113944

n

j1Pj minus 1113957Pj1113872 1113873

2

11139741113972

(11)

RMSE is a frequently used evaluation metric because itmeasures the difference between the values predicted by amodel and the actually observed values

For each parameter size combination to be tested themodel will be experimented multiple times with the sameparameters setting If N denotes the number of times theexperiment was conducted there will be N number of RMSEobtained to represent the performance of the model for eachexperiment -e reason running multiple experiments foreach parameter size combination is because internallyneural network uses randomization to assign the weightsand the states of its neurons -is produces different fore-casting errors between experiments -erefore multipleexperimental runs are recommended to allow for the se-lection of the neural network model with internal settings toproduce the lowest RMSE

After the supervised-learning is completed the proposedmethod will proceed to parameter combination evaluationand selection -e evaluation of parameter combination andselection step is necessary because it is common to assumethat a particular parameter combination that gives a lowRMSE at the end of the supervised-learning directlytranslated to a good parameter combination that allowssufficient capability of the model to perform forecastHowever this assumption is misleading because a model

that has overlearned during the supervised-learning candeliver results with very low RMSE at the end of the trainingAn overlearned model will perform poorly in the actualforecast -erefore it is necessary to also measure the sta-bility of the supervised-learning of the model given aparticular combination of the four parameters During eachepoch in the supervised-learning the model will be requiredto perform two forecasts one uses a reserved set from thetraining set and other uses a reserved set from the testing setWith two forecasts performed two RMSE will be generated-e RMSE generated by using training set is the trainingerror while the RMSE generated by using the reservedtesting set is the testing error To measure the stability of thesupervised-learning the RMSE for both training error andtesting error of each epoch are collected and plotted in asingle graph With y-axis representing the RMSE and x-axisrepresenting the number of epochs Figures 7ndash9 show ex-amples of the curves that exhibited from the supervised-learning -e combination of parameter sizes that allows themodel to exhibit learning curve pattern similar to Figure 7 isthe desired selection Learning curve with pattern similar toFigure 7 signifies that the model was able to perform stablesupervised-learning with stable reduction in the RMSE ofboth training and testing phases using the selected combi-nation of parameter sizes In other words themodel was ableto discover the time-dependent relation in the given datasetsuch that it allows the model to minimize its prediction errorfor each epoch of the supervised-learning

-e combination of the four parametersrsquo sizes that en-ables the model to show stable performance in thesupervised-learning and lowest RMSE will be selected toforecast the IWIP

For each of the selected parameter combination themodel is required to forecast for three consecutive weeks-e accuracies for the forecast results will be measuredaccording to the selected measurement metrics to evaluatethe forecasting capability of the model

4 Experimental Results

41DataDescriptionandExperimentalDesign -e IWIP fora particular tool group is denoted as IWIP and can becalculated as

IWIP WIPt24 minusWIPt1( 1113857 1113944

24

t1(MOVE) (12)

where MOVE denotes the number of wafer moved per hourt1 refers to the first hour the data are collected and t24 refersto the twenty-fourth hour the data are collected In thisstudy the first hour is at 0830 while the 0730 on the next dayis the twenty-fourth hour

-e data use for this experiment is acquired from theFabrsquos internal development database with the applicationrunning hourly to collect the WIP and calculate the numberof wafers moved for each tool group in the production lineevery 24 hours Due to the Fabrsquos data security and confi-dentiality policies we are only allowed to access productionsystemrsquos data source of the company to perform data

Computational Intelligence and Neuroscience 7

collection for a specic duration Given the allowed durationfrom the Fab we were able to collect three months data tocreate a data set with 90 days of historical IWIP With eachIWIP as a data point 70 percent of the data points are usedfor the LSTM training phase and the remaining 30 percentfor testing phase

For the number of epochs numerical values of 100 and200 are selected For batch size numerical values of 10 and20 are selected for the number of hidden layers numericalvalues of 3 and 4 are selected and for the number of hiddenneuron numerical values of 384 and 512 are selected for therst hidden layer while the numerical values of 8 and 16 areselected for the subsequent layers It is worthwhile tomention that by using seven IWIP points per dataset as thenumber of previous IWIP lags to be examined each of thenumerical values for batch size denotes the number of weekspresented to the LSTMmodel per epoche neural networkis initialized with uniformly distributed weights where theranges of the weights are (minus01 01) and trained usingmean-squared-error (MSE) as the loss function Adamoptimizer is used as the optimization function with default

learning rate η 0001 β1 09 β2 0999 ε 0 andc 0 Each combination of the selected values is thenevaluated three times to obtain the three RMSE results ofeach combination

Parameter size selection is done by selecting the lowestRMSE among the three experimental runs followed by ex-amining the graphs of the supervised-learning result of thesame run that produced the lowest RMSE e desiredsupervised-learning graph should resemble the pattern il-lustrated in Figure 7 e parameter size combinations thatdo not meet the required pattern will be discarded

42 Measurement Metrics To measure the performance ofthe models two accuracy measurements are used esetwo-measurement metrics are hit rate and correlationmeasurement

Hit rate or probability of detection (POD) is theprobability that the forecasted event matches the observedevent In the context of this research work the observedevents are either low IWIP or high IWIP erefore hit ratecan be used to measure the forecast capability of the pro-posed method to match the actual IWIP events Let HRdenotes hit rate n denotes the number correct detectionand N denotes the total number of observation hit rate isexpressed as

HR n

Ntimes 100 (13)

From the requirement of the Fab it is only necessary forthe proposedmethod to be able to forecast any two days withhighest IWIP and any two days with lowest IWIP For thesefour days to be forecasted the hit rate required by the Fab is75 percent In other words at least three out of these fourdays must be detected

To measure the correlation between the actual IWIP andthe forecasted IWIP this research uses the Pearsonrsquos cor-relation coecient r Pearsonrsquos r is a measure of the linearrelationship between two vectors of variables In this re-search work these two vectors of variables are the actual

Epoch

RMSE

TrainingTesting

Figure 7 RMSE curves when the model is well learned

TrainingTesting

Epoch

RMSE

Figure 8 RMSE curves when then model is underlearned

TrainingTesting

Epoch

RMSE

Figure 9 RMSE curves when the model is overlearned(overtting)

8 Computational Intelligence and Neuroscience

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 6: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

future seven IWIP points are related to its immediateprevious seven IWIP points

As a nonparametric model neural network model doesnot have a fixed structure In a RNN with one hidden layerthe ability of the neural networks to discover importantrelationship in the training data during the supervised-

Data storing

Preprocessing

Data scaling

Data formatting

Supervised-learningParameter sizeidentification

Epoch

Batch size

LSTM neuron size

Stacked hidden layer size

Parameter combination evaluation and selection

Result measurement

Forecast

LSTM

f (x)

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 6 Proposed method

Input layer

Output layer

LSTM hidden layer

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Figure 4 Nonstacked LSTM neural network

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

y0

h0

y1 y2 yn

x0 x1 x2 xn

hnndash1h1

Input layer

Output layer

LSTM hidden layer

LSTM hidden layer

Figure 5 Stacked LSTM neural network

Table 2 Data formation for training dataset

Set Training datasetX Y Z

1 (x1 x2 x3 x4 x5 x6 x7) 7 12 (x2 x3 x4 x5 x6 x7 x8) 7 13 (x3 x4 x5 x6 x7 x8 x9) 7 14 (x4 x5 x6 x7 x8 x9 x10) 7 15 (x5 x6 x7 x8 x9 x10 x11) 7 16 (x6 x7 x8 x9 x10 x11 x12) 7 17 (x7 x8 x9 x10 x11 x12 x13) 7 1

Table 3 Data formation for testing dataset

Set Testing datasetX Y

1 (x8 x9 x10 x11 x12 x13 x14) 12 (x9 x10 x11 x12 x13 x14 x15) 13 (x10 x11 x12 x13 x14 x15 x16) 14 (x11 x12 x13 x14 x15 x16 x17) 15 (x12 x13 x14 x15 x16 x17 x18) 16 (x13 x14 x15 x16 x17 x18 x19) 17 (x14 x15 x16 x17 x18 x19 x20) 1

6 Computational Intelligence and Neuroscience

learning is affected by the batch size used per epoch thenumber of epoch hidden layers and hidden neuron -ecombination of the sizes of these four parameters that resultsin stable supervised-learning and delivers the lowest forecasterror is desired

Each parameter being examined will have a list ofpredefined sizes to be tested When one of the parameters isbeing examined the remaining two parameters will be fixedto their current sizes in their respective list -is is to controlthe variation across the examinations For each combinationof the parameters the model will be tested with thatcombination to measure its performance in terms of theforecasting error and the stability of its supervised-learning

For the LSTM setup of this research we construct aLSTM model using the LSTM cell Let t denote the obser-vation time of each IWIP and x denotes the IWIP the inputof the LSTMmodel is the observed IWIP x at time t denotedas xt and the output of the LSTM model is the predictedIWIP 1113957xt+1 -rough the LSTM equations presented 1113957xt+1 istherefore calculated as

1113957xt+1 W middot ht + b (10)

where W is the weight matrix between the output layer andthe hidden layer

-e metric used to measure the forecasting error in thesupervised-learning is the root-mean-squared error (RMSE)Let P denote the actual IWIP 1113957P denote the forecasted IWIPand n denotes the total day forecasted RMSE is defined asfollows

RMSE

1n

1113944

n

j1Pj minus 1113957Pj1113872 1113873

2

11139741113972

(11)

RMSE is a frequently used evaluation metric because itmeasures the difference between the values predicted by amodel and the actually observed values

For each parameter size combination to be tested themodel will be experimented multiple times with the sameparameters setting If N denotes the number of times theexperiment was conducted there will be N number of RMSEobtained to represent the performance of the model for eachexperiment -e reason running multiple experiments foreach parameter size combination is because internallyneural network uses randomization to assign the weightsand the states of its neurons -is produces different fore-casting errors between experiments -erefore multipleexperimental runs are recommended to allow for the se-lection of the neural network model with internal settings toproduce the lowest RMSE

After the supervised-learning is completed the proposedmethod will proceed to parameter combination evaluationand selection -e evaluation of parameter combination andselection step is necessary because it is common to assumethat a particular parameter combination that gives a lowRMSE at the end of the supervised-learning directlytranslated to a good parameter combination that allowssufficient capability of the model to perform forecastHowever this assumption is misleading because a model

that has overlearned during the supervised-learning candeliver results with very low RMSE at the end of the trainingAn overlearned model will perform poorly in the actualforecast -erefore it is necessary to also measure the sta-bility of the supervised-learning of the model given aparticular combination of the four parameters During eachepoch in the supervised-learning the model will be requiredto perform two forecasts one uses a reserved set from thetraining set and other uses a reserved set from the testing setWith two forecasts performed two RMSE will be generated-e RMSE generated by using training set is the trainingerror while the RMSE generated by using the reservedtesting set is the testing error To measure the stability of thesupervised-learning the RMSE for both training error andtesting error of each epoch are collected and plotted in asingle graph With y-axis representing the RMSE and x-axisrepresenting the number of epochs Figures 7ndash9 show ex-amples of the curves that exhibited from the supervised-learning -e combination of parameter sizes that allows themodel to exhibit learning curve pattern similar to Figure 7 isthe desired selection Learning curve with pattern similar toFigure 7 signifies that the model was able to perform stablesupervised-learning with stable reduction in the RMSE ofboth training and testing phases using the selected combi-nation of parameter sizes In other words themodel was ableto discover the time-dependent relation in the given datasetsuch that it allows the model to minimize its prediction errorfor each epoch of the supervised-learning

-e combination of the four parametersrsquo sizes that en-ables the model to show stable performance in thesupervised-learning and lowest RMSE will be selected toforecast the IWIP

For each of the selected parameter combination themodel is required to forecast for three consecutive weeks-e accuracies for the forecast results will be measuredaccording to the selected measurement metrics to evaluatethe forecasting capability of the model

4 Experimental Results

41DataDescriptionandExperimentalDesign -e IWIP fora particular tool group is denoted as IWIP and can becalculated as

IWIP WIPt24 minusWIPt1( 1113857 1113944

24

t1(MOVE) (12)

where MOVE denotes the number of wafer moved per hourt1 refers to the first hour the data are collected and t24 refersto the twenty-fourth hour the data are collected In thisstudy the first hour is at 0830 while the 0730 on the next dayis the twenty-fourth hour

-e data use for this experiment is acquired from theFabrsquos internal development database with the applicationrunning hourly to collect the WIP and calculate the numberof wafers moved for each tool group in the production lineevery 24 hours Due to the Fabrsquos data security and confi-dentiality policies we are only allowed to access productionsystemrsquos data source of the company to perform data

Computational Intelligence and Neuroscience 7

collection for a specic duration Given the allowed durationfrom the Fab we were able to collect three months data tocreate a data set with 90 days of historical IWIP With eachIWIP as a data point 70 percent of the data points are usedfor the LSTM training phase and the remaining 30 percentfor testing phase

For the number of epochs numerical values of 100 and200 are selected For batch size numerical values of 10 and20 are selected for the number of hidden layers numericalvalues of 3 and 4 are selected and for the number of hiddenneuron numerical values of 384 and 512 are selected for therst hidden layer while the numerical values of 8 and 16 areselected for the subsequent layers It is worthwhile tomention that by using seven IWIP points per dataset as thenumber of previous IWIP lags to be examined each of thenumerical values for batch size denotes the number of weekspresented to the LSTMmodel per epoche neural networkis initialized with uniformly distributed weights where theranges of the weights are (minus01 01) and trained usingmean-squared-error (MSE) as the loss function Adamoptimizer is used as the optimization function with default

learning rate η 0001 β1 09 β2 0999 ε 0 andc 0 Each combination of the selected values is thenevaluated three times to obtain the three RMSE results ofeach combination

Parameter size selection is done by selecting the lowestRMSE among the three experimental runs followed by ex-amining the graphs of the supervised-learning result of thesame run that produced the lowest RMSE e desiredsupervised-learning graph should resemble the pattern il-lustrated in Figure 7 e parameter size combinations thatdo not meet the required pattern will be discarded

42 Measurement Metrics To measure the performance ofthe models two accuracy measurements are used esetwo-measurement metrics are hit rate and correlationmeasurement

Hit rate or probability of detection (POD) is theprobability that the forecasted event matches the observedevent In the context of this research work the observedevents are either low IWIP or high IWIP erefore hit ratecan be used to measure the forecast capability of the pro-posed method to match the actual IWIP events Let HRdenotes hit rate n denotes the number correct detectionand N denotes the total number of observation hit rate isexpressed as

HR n

Ntimes 100 (13)

From the requirement of the Fab it is only necessary forthe proposedmethod to be able to forecast any two days withhighest IWIP and any two days with lowest IWIP For thesefour days to be forecasted the hit rate required by the Fab is75 percent In other words at least three out of these fourdays must be detected

To measure the correlation between the actual IWIP andthe forecasted IWIP this research uses the Pearsonrsquos cor-relation coecient r Pearsonrsquos r is a measure of the linearrelationship between two vectors of variables In this re-search work these two vectors of variables are the actual

Epoch

RMSE

TrainingTesting

Figure 7 RMSE curves when the model is well learned

TrainingTesting

Epoch

RMSE

Figure 8 RMSE curves when then model is underlearned

TrainingTesting

Epoch

RMSE

Figure 9 RMSE curves when the model is overlearned(overtting)

8 Computational Intelligence and Neuroscience

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 7: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

learning is affected by the batch size used per epoch thenumber of epoch hidden layers and hidden neuron -ecombination of the sizes of these four parameters that resultsin stable supervised-learning and delivers the lowest forecasterror is desired

Each parameter being examined will have a list ofpredefined sizes to be tested When one of the parameters isbeing examined the remaining two parameters will be fixedto their current sizes in their respective list -is is to controlthe variation across the examinations For each combinationof the parameters the model will be tested with thatcombination to measure its performance in terms of theforecasting error and the stability of its supervised-learning

For the LSTM setup of this research we construct aLSTM model using the LSTM cell Let t denote the obser-vation time of each IWIP and x denotes the IWIP the inputof the LSTMmodel is the observed IWIP x at time t denotedas xt and the output of the LSTM model is the predictedIWIP 1113957xt+1 -rough the LSTM equations presented 1113957xt+1 istherefore calculated as

1113957xt+1 W middot ht + b (10)

where W is the weight matrix between the output layer andthe hidden layer

-e metric used to measure the forecasting error in thesupervised-learning is the root-mean-squared error (RMSE)Let P denote the actual IWIP 1113957P denote the forecasted IWIPand n denotes the total day forecasted RMSE is defined asfollows

RMSE

1n

1113944

n

j1Pj minus 1113957Pj1113872 1113873

2

11139741113972

(11)

RMSE is a frequently used evaluation metric because itmeasures the difference between the values predicted by amodel and the actually observed values

For each parameter size combination to be tested themodel will be experimented multiple times with the sameparameters setting If N denotes the number of times theexperiment was conducted there will be N number of RMSEobtained to represent the performance of the model for eachexperiment -e reason running multiple experiments foreach parameter size combination is because internallyneural network uses randomization to assign the weightsand the states of its neurons -is produces different fore-casting errors between experiments -erefore multipleexperimental runs are recommended to allow for the se-lection of the neural network model with internal settings toproduce the lowest RMSE

After the supervised-learning is completed the proposedmethod will proceed to parameter combination evaluationand selection -e evaluation of parameter combination andselection step is necessary because it is common to assumethat a particular parameter combination that gives a lowRMSE at the end of the supervised-learning directlytranslated to a good parameter combination that allowssufficient capability of the model to perform forecastHowever this assumption is misleading because a model

that has overlearned during the supervised-learning candeliver results with very low RMSE at the end of the trainingAn overlearned model will perform poorly in the actualforecast -erefore it is necessary to also measure the sta-bility of the supervised-learning of the model given aparticular combination of the four parameters During eachepoch in the supervised-learning the model will be requiredto perform two forecasts one uses a reserved set from thetraining set and other uses a reserved set from the testing setWith two forecasts performed two RMSE will be generated-e RMSE generated by using training set is the trainingerror while the RMSE generated by using the reservedtesting set is the testing error To measure the stability of thesupervised-learning the RMSE for both training error andtesting error of each epoch are collected and plotted in asingle graph With y-axis representing the RMSE and x-axisrepresenting the number of epochs Figures 7ndash9 show ex-amples of the curves that exhibited from the supervised-learning -e combination of parameter sizes that allows themodel to exhibit learning curve pattern similar to Figure 7 isthe desired selection Learning curve with pattern similar toFigure 7 signifies that the model was able to perform stablesupervised-learning with stable reduction in the RMSE ofboth training and testing phases using the selected combi-nation of parameter sizes In other words themodel was ableto discover the time-dependent relation in the given datasetsuch that it allows the model to minimize its prediction errorfor each epoch of the supervised-learning

-e combination of the four parametersrsquo sizes that en-ables the model to show stable performance in thesupervised-learning and lowest RMSE will be selected toforecast the IWIP

For each of the selected parameter combination themodel is required to forecast for three consecutive weeks-e accuracies for the forecast results will be measuredaccording to the selected measurement metrics to evaluatethe forecasting capability of the model

4 Experimental Results

41DataDescriptionandExperimentalDesign -e IWIP fora particular tool group is denoted as IWIP and can becalculated as

IWIP WIPt24 minusWIPt1( 1113857 1113944

24

t1(MOVE) (12)

where MOVE denotes the number of wafer moved per hourt1 refers to the first hour the data are collected and t24 refersto the twenty-fourth hour the data are collected In thisstudy the first hour is at 0830 while the 0730 on the next dayis the twenty-fourth hour

-e data use for this experiment is acquired from theFabrsquos internal development database with the applicationrunning hourly to collect the WIP and calculate the numberof wafers moved for each tool group in the production lineevery 24 hours Due to the Fabrsquos data security and confi-dentiality policies we are only allowed to access productionsystemrsquos data source of the company to perform data

Computational Intelligence and Neuroscience 7

collection for a specic duration Given the allowed durationfrom the Fab we were able to collect three months data tocreate a data set with 90 days of historical IWIP With eachIWIP as a data point 70 percent of the data points are usedfor the LSTM training phase and the remaining 30 percentfor testing phase

For the number of epochs numerical values of 100 and200 are selected For batch size numerical values of 10 and20 are selected for the number of hidden layers numericalvalues of 3 and 4 are selected and for the number of hiddenneuron numerical values of 384 and 512 are selected for therst hidden layer while the numerical values of 8 and 16 areselected for the subsequent layers It is worthwhile tomention that by using seven IWIP points per dataset as thenumber of previous IWIP lags to be examined each of thenumerical values for batch size denotes the number of weekspresented to the LSTMmodel per epoche neural networkis initialized with uniformly distributed weights where theranges of the weights are (minus01 01) and trained usingmean-squared-error (MSE) as the loss function Adamoptimizer is used as the optimization function with default

learning rate η 0001 β1 09 β2 0999 ε 0 andc 0 Each combination of the selected values is thenevaluated three times to obtain the three RMSE results ofeach combination

Parameter size selection is done by selecting the lowestRMSE among the three experimental runs followed by ex-amining the graphs of the supervised-learning result of thesame run that produced the lowest RMSE e desiredsupervised-learning graph should resemble the pattern il-lustrated in Figure 7 e parameter size combinations thatdo not meet the required pattern will be discarded

42 Measurement Metrics To measure the performance ofthe models two accuracy measurements are used esetwo-measurement metrics are hit rate and correlationmeasurement

Hit rate or probability of detection (POD) is theprobability that the forecasted event matches the observedevent In the context of this research work the observedevents are either low IWIP or high IWIP erefore hit ratecan be used to measure the forecast capability of the pro-posed method to match the actual IWIP events Let HRdenotes hit rate n denotes the number correct detectionand N denotes the total number of observation hit rate isexpressed as

HR n

Ntimes 100 (13)

From the requirement of the Fab it is only necessary forthe proposedmethod to be able to forecast any two days withhighest IWIP and any two days with lowest IWIP For thesefour days to be forecasted the hit rate required by the Fab is75 percent In other words at least three out of these fourdays must be detected

To measure the correlation between the actual IWIP andthe forecasted IWIP this research uses the Pearsonrsquos cor-relation coecient r Pearsonrsquos r is a measure of the linearrelationship between two vectors of variables In this re-search work these two vectors of variables are the actual

Epoch

RMSE

TrainingTesting

Figure 7 RMSE curves when the model is well learned

TrainingTesting

Epoch

RMSE

Figure 8 RMSE curves when then model is underlearned

TrainingTesting

Epoch

RMSE

Figure 9 RMSE curves when the model is overlearned(overtting)

8 Computational Intelligence and Neuroscience

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 8: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

collection for a specic duration Given the allowed durationfrom the Fab we were able to collect three months data tocreate a data set with 90 days of historical IWIP With eachIWIP as a data point 70 percent of the data points are usedfor the LSTM training phase and the remaining 30 percentfor testing phase

For the number of epochs numerical values of 100 and200 are selected For batch size numerical values of 10 and20 are selected for the number of hidden layers numericalvalues of 3 and 4 are selected and for the number of hiddenneuron numerical values of 384 and 512 are selected for therst hidden layer while the numerical values of 8 and 16 areselected for the subsequent layers It is worthwhile tomention that by using seven IWIP points per dataset as thenumber of previous IWIP lags to be examined each of thenumerical values for batch size denotes the number of weekspresented to the LSTMmodel per epoche neural networkis initialized with uniformly distributed weights where theranges of the weights are (minus01 01) and trained usingmean-squared-error (MSE) as the loss function Adamoptimizer is used as the optimization function with default

learning rate η 0001 β1 09 β2 0999 ε 0 andc 0 Each combination of the selected values is thenevaluated three times to obtain the three RMSE results ofeach combination

Parameter size selection is done by selecting the lowestRMSE among the three experimental runs followed by ex-amining the graphs of the supervised-learning result of thesame run that produced the lowest RMSE e desiredsupervised-learning graph should resemble the pattern il-lustrated in Figure 7 e parameter size combinations thatdo not meet the required pattern will be discarded

42 Measurement Metrics To measure the performance ofthe models two accuracy measurements are used esetwo-measurement metrics are hit rate and correlationmeasurement

Hit rate or probability of detection (POD) is theprobability that the forecasted event matches the observedevent In the context of this research work the observedevents are either low IWIP or high IWIP erefore hit ratecan be used to measure the forecast capability of the pro-posed method to match the actual IWIP events Let HRdenotes hit rate n denotes the number correct detectionand N denotes the total number of observation hit rate isexpressed as

HR n

Ntimes 100 (13)

From the requirement of the Fab it is only necessary forthe proposedmethod to be able to forecast any two days withhighest IWIP and any two days with lowest IWIP For thesefour days to be forecasted the hit rate required by the Fab is75 percent In other words at least three out of these fourdays must be detected

To measure the correlation between the actual IWIP andthe forecasted IWIP this research uses the Pearsonrsquos cor-relation coecient r Pearsonrsquos r is a measure of the linearrelationship between two vectors of variables In this re-search work these two vectors of variables are the actual

Epoch

RMSE

TrainingTesting

Figure 7 RMSE curves when the model is well learned

TrainingTesting

Epoch

RMSE

Figure 8 RMSE curves when then model is underlearned

TrainingTesting

Epoch

RMSE

Figure 9 RMSE curves when the model is overlearned(overtting)

8 Computational Intelligence and Neuroscience

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 9: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

IWIP and the forecasted IWIP Let y denotes the actualIWIP and 1113957y denotes the forecasted IWIP Pearsonrsquos r isexpressed as

r cov(y 1113957y)

σyσ1113957y (14)

where cov is the covariance of actual IWIP and forecastedIWIP σy is the standard deviation of the actual IWIP and σ1113957yis the standard deviation of the forecasted IWIP

-e correlation coefficient takes values in the range [minus11] -e value of 1 implies that a linear equation describes therelationship between the two vectors perfectly -is meansthat all data points of the two vectors fit perfectly on astraight line on a graph -e positive sign of the coefficientindicates positive correlation -is means that as the actualIWIP increase and the forecasted IWIP increases as well-e negative sign of the coefficient indicates negative cor-relation -is means that as the actual IWIP increase theforecasted IWIP decreases as well Positive correlation istherefore desirable for the forecast results

Due to the Fabrsquos privacy protection agreement only theobtained Pearsonrsquos r will be reported while the detail cal-culations of the covariance and standard deviation will beomitted Based on the requirement of the Fab the minimumPearsonrsquos r value is 04

We conduct the experiment for three consecutive weeks-is allows us to monitor the consistency of the modelsrsquoprediction At the beginning of each week we will predictseven days ahead and measure the performance at the end ofeach week -e implementation of the proposed method isaccomplished using Python programming language andKeras [36] neural network library

43 Results Analysis and Discussions Table 4 tabulates theresults of the experiments -e parameter size combinationsobtained in Table 4 are combinations that exhibited curvepattern similar to Figure 7

Figures 10ndash12 show the graphs of the supervised-learning results for the selected three combinations re-spectively From the figures it can be seen that both lines arefar apart although they move along in descending pattern Inaddition the line graph of the training is descending slowlyand remained high at the end of the epoch However theneural network was still able to forecast the IWIP that arequite close to the actual IWIP -is is shown by the linegraph of the testingrsquos RMSE that exhibits small fluctuates Byreferring to these figures alone we are not able to identify thebest parameter size combination because all three graphsexhibit similar pattern -erefore hit rate and linear cor-relation of the forecasting results of each combination can beused to identify the best parameter size combination

-e parameters from each of the selected three com-binations were applied on the proposed LSTM model toperform the three consecutive weeks forecasting -e ex-periments were run and recorded separately for eachcombination Tables 5ndash7 tabulate the hit rate percentage forCombinations 1 2 and 3 respectively Tables 8ndash10 tabulatethe Pearsonrsquos r for Combinations 1 2 and 3 respectively

Table 11 summarizes the hit rate of Combinations 1 2 and 3while Table 12 summarizes the Pearsonrsquos r of Combinations1 2 and 3

Figures 13ndash15 shows the graph plots for the IWIPforecast for the three parameter size combinationsrespectively

From the results obtained the model performed the bestusing Combination 3 In terms of hit rate Combination 3scored the highest compare to Combinations 1 and 2 for allthree weeks Combinations 1 and 2 scored 75 percent forweek 1 but for subsequent weeks both combinations onlyscored the maximum of 50 percent In terms of Pearsonrsquos rCombination 3 has the best performance in overall compareto Combinations 1 and 2 while Combination 1 has the leastperformance Although on week 3 the Pearsonrsquos r ofCombination 3 is slightly lower than Combination 2however it is still above the Fabrsquos requirement

We then compare the forecast result using Combination3 to the statistical forecasting method used in the Fab Inorder to make the writing clearer the statistical forecastingmethod used in the Fab is abbreviated as Fab methodTables 13 and 14 tabulate the hit rate and Pearsonrsquos r of Fabrsquosmethod respectively Table 15 tabulates the comparison ofthe forecast results between the proposed method and Fabrsquosmethod Figure 16 shows the WIP forecast using Fabrsquosmethod

-e Fab method serves as the baseline to measure theperformance of the LSTM forecasting model From theresults tabulated in Table 15 the proposed method withLSTM forecasting model outperformed the Fab methodHowever both hit rate and Pearsonrsquos r of the proposedmethod is unable to remain consistent for three consecutiveweeks forecasted -e results also show that the IWIPforecasted by the Fab method consistently failed to meet therequirement of the Fab for both hit rate and Pearsonrsquos r -emain reason for the inaccuracy of the Fab method is that itonly considers for products with the number of wafersordered dominating the total WIP in the production lineHowever operators need to process other wafers from otherproducts as well Hence the wafers did not arrive on time aspredicted In addition the Fab method does not consider thenumber of tools available at each process steps to process thewafers and the total amount of time that each tool is used toprocess the wafers In real environment a tool can be takenoffline for maintenance purposes or it could be used by therespective engineers to process specially crafted wafers forresearch and development purposes Without taking intothese considerations the Fab method indirectly assumedthat the number of tools available and the time of each tooldedicated to process wafers are the same across the entireperiod of the wafer fabrication process -is assumptioncaused the forecasted results to have negative correlationwith the actual IWIP

For hit rate the proposed method only scored 50 forweek 3 while for Pearsonrsquos r the proposed method onlyscored 031 for week 2 One of the factors that caused thereduced performance of the model could be due to thereason that the size of the historical data to train the LSTMmodel is not large enough Larger historical IWIP data could

Computational Intelligence and Neuroscience 9

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 10: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

potentially allow the LSTM model of the proposed methodto discover more time-dependent relations in the Fabrsquosproduction environment With the additional time-dependent relations discovered the accuracy of themodelrsquos forecasting can be increased

e next factor that could contribute to the inconsistentresult of the model is the limited number of features thatused to represent Fabrsquos production environment Havingadditional features to represent Fabrsquos production environ-ment could allow LSTM to perform better modeling of WIParrival e examples of additional data that could serve assuch features are the actual number of equipment thatsupplies the WIP to the tool group of interest the amount oftime each equipment in the tool group is processing the

production wafers instead of performing other maintenanceactivities and the number of wafers that each equipment inthe tool group of interest has actually processed

e last factor that contributes to the inconsistent resultscould be the need for more hidden layers As the number ofhidden layers increases it creates a deeper neural networkthat could potential allow the model to capture even moretime-dependent relations in the data However in order tobenet from deeper neural network larger dataset must rstbe obtained so that the model can be properly trained

For the experiments conducted the selection of sizes forthe LSTM modelrsquos parameters and the number of experi-mental runs is largely aected by the hardware resourcesallocation and the software capability setup From the

Table 4 Parameter size selection results

CombinationParameter

RMSEEpoch Batch size n LSTM hidden layers stacked LSTM neuronsize

1 100 10 3 512 8 8 000962 100 10 3 512 8 16 000863 100 20 3 512 16 16 00091

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 10 Supervised-learning result for Combination 1

00060008

0010012001400160018

002002200240026

1 21 41 61 81

RMSE

Epoch

Training vs testing

TrainingTesting

Figure 11 Supervised-learning result for Combination 2

10 Computational Intelligence and Neuroscience

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 11: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

hardware resource perspective sucient CPU should beallocated in the computing machine while from the softwarecapability perspective parallelization should be enabled tofully utilize the available CPU With 4 CPUs allocated in avirtual machine environment and parallelization enabled inKeras it took approximately 8 hours to complete one fullexperiment One full experiment refers to the completeevaluation all the predened sizes For real productiondeployment 8 hours is too long to obtain a usable modelParallelization with sucient number of CPUs in thecomputing machine are therefore critical in the production

environment as the results should be obtained as fast aspossible in order for the managements to make the necessarydecision for production line stability Hence proper hard-ware planning is required for production deployment

5 Conclusion

PM activity is an important activity in the Fab as it maintainsor increases the operational eciency and reliability of thetool Proper PM planning is necessary as PM activity takessignicantly long time to complete thus it is desirable to

TrainingTesting

0

0005

001

0015

002

0025

003

1 21 41 61 81

RMSE

Epoch

Training vs testing

Figure 12 Supervised-learning result for Combination 3

Table 5 Hit rate for Combination 1

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L34 L H5 H L67 H H 1

HR 25

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 6 Hit rate for Combination 2

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

1 L2 L L 134 L5 H H 16 H7 H

HR 50

w3

1 L2 L3 L4 H5 H H 16 H7 L

HR 25

Computational Intelligence and Neuroscience 11

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 12: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

perform this activity when the IWIP to the tool group isexpected to be low With an IWIP prediction model that iscapable to predict the IWIP with high accuracy PM activitycan be planned and managed better to reduce its negativeimpact to the fabrsquos CT Reducing the negative impact to CTisimportant as this will enable the Fab to meet the On-Time-Delivery (OTD) committed to customers With consistencyin the OTD the logistic management of the company can beimproved as well such as proper storage place planning to

keep the fabricated wafers and scheduling their trans-portations for shipments Well-planned PM activities alsoallow better manpower planning in areas such as manpowerplanning When performing the PM activity sufficient toolengineers and tool vendors are required to be onsite toperform the prescribed maintenance activities Well-planned PM activities allow the required manpower to beproperly prepared Well-planned manpower directly con-tributes to better manpower cost planning With proper PMplanning in-place tools in the Fab can be scheduled toreceive their proper maintenances on time It is importantfor tools in the Fab to receive their appropriate maintenanceson-time to improve its productivity and lifetime extensionWith improved performance and extended lifetime thecapital investments of the company on the tools can beoptimized Reliable tool performance will also increase thetrust of the customers as chances of the fabricated wafersbeing scrapped due to unhealthy tool are minimized

In this paper we investigated LSTM to assist in PMplanning in Fab by predicting the IWIP to a tool group -ecomparison of the performance of the proposed method wasdone with an existing forecasting method from the Fab -eproposed method was trained using the historical IWIP dataprovided by the Fab which is a time series data Both hit rateand Pearsonrsquos correlation coefficient are important criteriathat determine the forecast capability -e proposed methodhas demonstrated results that outperformed the Fab methodby reaching above the requirement of the Fab for week 1 andweek 3 while the Fabmethod fails to meet Fabrsquos requirementfor all three weeks In terms of hit rate the proposed methodshows higher percentage than Fabrsquos method Following therequirement given by the Fab the results of the proposedmethod signifies that for forecast duration of seven days it isable to identify more accurately the two days that the IWIPwill be highest and the two days the IWIP will be lowest in aweek In terms of Pearsonrsquos correlation coefficient r theproposed method shows positive correlation and highervalue than the Fabrsquos method -is result signifies that theforecast results of the proposed method are able to producebetter forecasting results that have closer proportionalchanges in relation to the actual IWIP -e LSTM model

Table 7 Hit rate for Combination 3

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L L 16 L7

HR 75

w2

12 L L 134 L5 H H 167 H H 1

HR 75

w3

1 L L 12 L3 L4 H5 H H 16 H7

HR 50

Table 8 Pearsonrsquos r for combination 1

Week r

w1 031w2 006w3 034

Table 9 Pearsonrsquos r for combination 2

Week r

w1 040w2 028w3 046

Table 10 Pearsonrsquos r for combination 3

Week r

w1 042w2 031w3 043

Table 11 Summary of hit rate for Combinations 1 2 and 3

CombinationHit rate ()

w1 w2 w31 75 25 502 75 50 253 75 75 50

Table 12 Summary of Pearsonrsquos r for combinations 1 2 and 3

CombinationPearsonrsquos r

w1 w2 w31 031 006 0342 040 028 0463 042 031 043

12 Computational Intelligence and Neuroscience

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 13: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 8 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 13 IWIP forecast using Combination 1

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 10 batch sizeand 3 hidden layers (512 8 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 14 IWIP forecast using Combination 2

ndash05000

00000

05000

10000

15000

20000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using 100 epoch 20 batch sizeand 3 hidden layers (512 16 16 hidden neurons)

ActualProposed LSTM

10 upper limit10 lower limit

Figure 15 IWIP forecast using Combination 3

Computational Intelligence and Neuroscience 13

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 14: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

Table 13 Hit rate for Fab method

Week f Actual Forecast Hit

w1

12 L3 H H 14 H H 15 L6 L7 L

HR 50

w2

12 L H3 H4 L5 H L6 L7 H

HR 0

w3

1 L H2 L34 H5 H L6 L7 H

HR 0

Table 14 Pearsonrsquos r for Fab method

Week r

w1 028w2 minus011w3 minus082

Table 15 Forecast result comparison between proposed method and Fab

Hit rate () Pearsonrsquos rWeek Week

w1 w2 w3 w1 w2 w3Proposed method 750 750 500 042 031 043Fab 500 00 00 028 minus011 minus082

ndash10000

00000

10000

20000

30000

40000

50000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Inco

min

g W

IP (w

afer

)

Day

21 days IWIP forecast using Fab method

ActualProposed LSTM10 upper limit

10 lower limitFab method

Figure 16 IWIP forecast using Fab method

14 Computational Intelligence and Neuroscience

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 15: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

used in the proposed method contains memory cells tomemorize the long and short temporal features in the datawhich yields better performance for the prediction of timeseries data -erefore the proposed method will be veryuseful and benefit to the PM planning

Although the proposed method is outperformed theexisting Fabrsquos statistical method there is still room forimprovement -e first future work is to increase the size ofhistorical dataset With the use of larger historical datasetthat spans across longer historical time horizon to train theLSTM model it may be potential for the LSTM model todiscover significantWIP arrival pattern that could have beenmissed out in smaller historical dataset -e second futurework is to extend the univariate forecasting model in thisresearch to multivariate forecasting model -e reason forthis extension is to allow the inclusion of more features totrain the LSTM model so that the LSTM model can bettermodel the actual environment of the Fab -e next futurework is to increase the number of hidden layers in the LSTMforecasting model -e approach to increase the number ofhidden layer is also an initial step to experiment the potentialuse of deep-learning model in time series forecasting -elast future work is to extend the application of the proposedmethod to predict the IWIP of other types of tool group toexperiment if the proposed method is capable to deliveringthe same prediction performance -e collected predictionresults across various types of tool groups from the futurework will also allow us to generalize the proposed method tobe used as a generic IWIP prediction model for the fab

Data Availability

-e time-series data used to support the findings of thisstudy were supplied by X-Fab Sarawak Sdn Bhd underprivacy agreement and there the data cannot be made freelyavailable -e data potentially reveal sensitive informationand therefore their access is being restricted

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e funding for this project was made possible through theresearch grant from the Ministry of Education Malaysiaunder the Research Acculturation Collaborative Effort(Grant No RACEb(3)12472015(03)) -e authors wouldlike to thank X-FAB Sarawak Sdn Bhd for their support inthis research by providing the environment and resources toextract the relevant data

References

[1] J A Ramırez-Hernandez J Crabtree X Yao et al ldquoOptimalpreventive maintenance scheduling in semiconductormanufacturing systems software tool and simulation casestudiesrdquo IEEE Transactions on Semiconductor Manufacturingvol 23 no 3 pp 477ndash489 2010

[2] Y Tian and L Pan ldquoPredicting short term traffic flow by longshort-termmemory recurrent neural networkrdquo in Proceedingsof 2015 IEEE International Conference on Smart Citypp 153ndash158 Chengdu China December 2015

[3] K Zhang J Xu M R Min G Jiang K Pelechrinis andH Zhang ldquoAutomated IT system failure prediction a deeplearning approachrdquo in Proceedings of 2016 IEEE InternationalConference on Big Data (Big Data) pp 1291ndash1300 Wash-ington DC USA March 2016

[4] G Zhu L Zhang P Shen and J Song ldquoMultimodal gesturerecognition using 3D convolution and convolutional LSTMrdquoIEEE Access vol 5 pp 4517ndash4524 2017

[5] J Lai B Chen T Tan S Tong and K Yu ldquoPhone-awareLSTM-RNN for voice conversionrdquo in Proceedings of 2016IEEE 13th International Conference on Signal Processing(ICSP) pp 177ndash182 Chengdu China November 2016

[6] A ElSaid B Wild J Higgins and T Desell ldquoUsing LSTMrecurrent neural networks to predict excess vibration events inaircraft enginesrdquo in Proceedings of 2016 IEEE 12th In-ternational Conference on e-Science pp 260ndash269 BaltimoreMD USA October 2016

[7] J Wang J Zhang and X Wang ldquoA data driven cycle timeprediction with feature selection in a semiconductor waferfabrication systemrdquo IEEE Transactions on SemiconductorManufacturing vol 31 no 1 pp 173ndash182 2018

[8] J Wang J Zhang and X Wang ldquoBilateral LSTM a two-dimensional long short-term memory model with multiplymemory units for short-term cycle time forecasting in re-entrant manufacturing systemsrdquo IEEE Transactions on In-dustrial Informatics vol 14 no 2 pp 748ndash758 2018

[9] W Scholl B P Gan P Lendermann et al ldquoImplementationof a simulation-based short-term lot arrival forecast in amature 200 mm semiconductor FABrdquo in Proceedings of 2011Winter Simulation Conference (WSC) pp 1927ndash1938Phoenix AZ USA December 2011

[10] M Mosinski D Noack F S Pappert O Rose andW SchollldquoCluster based analytical method for the lot delivery forecastin semiconductor fab with wide product rangerdquo in Pro-ceedings of 2011 Winter Simulation Conference (WSC)pp 1829ndash1839 Phoenix AZ USA December 2011

[11] H K Larry ldquoEvent-based short-term traffic flow predictionmodelrdquo Transportation Research Record vol 1510 pp 45ndash521995

[12] B M Williams and L A Hoel ldquoModeling and forecastingvehicular traffic flow as a seasonal ARIMA process theoreticalbasis and empirical resultsrdquo Journal of Transportation Engi-neering vol 129 no 6 pp 664ndash672 2003

[13] M Van Der Voort M Dougherty and S Watson ldquoCom-bining Kohonen maps with ARIMA time series models toforecast traffic flowrdquo Transportation Research Part CEmerging Technologies vol 4 no 5 pp 307ndash318 1996

[14] Y Xie Y Zhang and Z Ye ldquoShort-term traffic volumeforecasting using Kalman filter with discrete wavelet de-compositionrdquo Computer-Aided Civil and Infrastructure En-gineering vol 22 no 5 pp 326ndash334 2007

[15] W Huang G Song H Hong and K Xie ldquoDeep architecturefor traffic flow prediction deep belief networks with multitasklearningrdquo IEEE Transactions on Intelligent TransportationSystems vol 15 no 5 pp 2191ndash2201 2014

[16] A Abadi T Rajabioun and P A Ioannou ldquoTraffic flowprediction for road transportation networks with limitedtraffic datardquo IEEE Transactions on Intelligent TransportationSystems vol 16 no 2 pp 653ndash662 2015

Computational Intelligence and Neuroscience 15

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 16: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

[17] R Fu Z Zhang and L Li ldquoUsing LSTM and GRU neuralnetwork methods for traffic flow predictionrdquo in Proceedings of2016 31st Youth Academic Annual Conference of ChineseAssociation of Automation (YAC) pp 324ndash328 WuhanChina November 2016

[18] H Shao and B Soong ldquoTraffic flow prediction with longshort-term memory networks (LSTMs)rdquo in Proceedings of2016 IEEE Region 10 Conference (TENCON) pp 2986ndash2989Singapore November 2016

[19] M S Ahmed and A R Cook ldquoAnalysis of freeway traffictime-series data by using Box-Jenkins Techniquesrdquo Trans-portation Research Record vol 773 no 722 pp 1ndash9 1979

[20] I Okutani ldquo-e Kalman filtering approaches in sometransportation and traffic problemsrdquo Transportation ResearchRecord vol 2 no 1 pp 397ndash416 1987

[21] I Okutani and Y J Stephanedes ldquoDynamic prediction oftraffic volume through Kalman filtering theoryrdquo Trans-portation Research Part B Methodological vol 18 no 1pp 1ndash11 1984

[22] H J H Ji A X A Xu X S X Sui and L L L Li ldquo-e appliedresearch of Kalman in the dynamic travel time predictionrdquo inProceedings of 18th International Conference on Geo-informatics pp 1ndash5 Beijing China June 2010

[23] Y Wang and M Papageorgiou ldquoReal-time freeway trafficstate estimation based on extended Kalman filter a generalapproachrdquo Transportation Research Part B Methodologicalvol 39 no 2 pp 141ndash167 2005

[24] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trendsreg in Machine Learning vol 2 no 1 pp 1ndash1272009

[25] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inProceedings of NIPS Lake Tahoe NV USA December 2012

[26] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[27] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of 25th ICML pp 160ndash167 HelsinkiFinland July 2008

[28] I J Goodfellow Y Bulatov J Ibarz S Arnoud and V ShetldquoMulti-digit number recognition from street view imageryusing deep convolutional neural networksrdquo 2013 httpsarxivorgabs13126082

[29] B Huval A Coates and A Ng ldquoDeep learning for class-generic object detectionrdquo 2013 httpsarxivorgabs13126885

[30] H C Shin M R Orton D J Collins S J Doran andM O Leach ldquoStacked autoencoders for unsupervised featurelearning and multiple organ detection in a pilot study using4D patient datardquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 35 no 8 pp 1930ndash1943 2013

[31] Y Lv Y Duan W Kang Z Li and F Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16no 2 pp 865ndash873 2015

[32] R Zhao J Wang R Yan and K Mao ldquoMachine healthmonitoring with LSTM networksrdquo in Proceedings of 2016 10thInternational Conference on Sensing Technology (ICST)pp 1ndash6 Nanjing China November 2016

[33] X Ma Z Tao Y Wang H Yu and Y Wang ldquoLong short-term memory neural network for traffic speed predictionusing remote microwave sensor datardquo Transportation

Research Part C Emerging Technologies vol 54 pp 187ndash1972015

[34] E I Vlahogianni M G Karlaftis and J C Golias ldquoOptimizedand meta-optimized neural networks for short-term trafficflow prediction a genetic approachrdquo Transportation ResearchPart C Emerging Technologies vol 13 no 3 pp 211ndash2342005

[35] S Hochreiter and J Schmidhuber ldquoLong short-term mem-oryrdquo Neural Computation vol 9 no 8 pp 1735ndash1780 1997

[36] Keras -e Python Deep Learning Library 2018 httpskerasio

16 Computational Intelligence and Neuroscience

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 17: IncomingWork-In-ProgressPredictioninSemiconductor ...downloads.hindawi.com/journals/cin/2019/8729367.pdf · sequentially in time. e IWIP forecast to a tool group is similar to traffic

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom