predictive maintenance with machine learning on weld joint ......ever since the first industrial...

UPTEC F 19058

Examensarbete 30 hpOktober 2019

Predictive maintenance with machine learning on weld joint analysed by ultrasound

Adam Hedkvist

Teknisk- naturvetenskaplig fakultet UTH-enheten Besöksadress: Ångströmlaboratoriet Lägerhyddsvägen 1 Hus 4, Plan 0 Postadress: Box 536 751 21 Uppsala Telefon: 018 – 471 30 03 Telefax: 018 – 471 30 00 Hemsida: http://www.teknat.uu.se/student

Abstract

Predictive maintenance with machine learning on weldjoint analysed by ultrasound

Adam Hedkvist

Ever since the first industrial revolution industries have had the goal to increase theirproduction. With new technology such as CPS, AI and IoT industries today are goingthrough the fourth industrial revolution denoted as industry 4.0. The new technologynot only revolutionises production, but also maintenance, making predictivemaintenance possible. Predictive maintenance seeks to predict when failure wouldoccur, instead of having scheduled maintenance or maintenance after failure alreadyoccurred. In this report a convolutional neural network (CNN) will analyse data froman ultrasound machine scanning a weld joint. The data from the ultrasound machinewill be transformed by the short time Fourier transform in order to create an imagefor the CNN. Since the data from the ultrasound is not complete, simulated data willbe created and investigated as another option for training the network. The resultsare promising, however the lack of data makes it hard to show any concrete proof.

ISSN: 1401-5757, UPTEC F 19058Examinator: Tomas NybergÄmnesgranskare: Kristiaan PelckmansHandledare: Håkan Sjörling

Populärvetenskaplig sammanfattning

I dagens industrier så sker en ny industriell revolution. Då fler och fler objekt börjar få en interne-tuppkoppling så blir det lättare för maskiner att samla och dela med sig av data under produktion.Denna data kan användas av artificiell intelligens för att försöka förhindra att fel uppstår underproduktionen. I denna rapport så har data samlas in från en ultraljudsmaskin som undersökeren svetsfog för att sedan analyseras av artificiell intelligens. Det som då vill göras är att jämföradenna artificiella intelligens med det nuvarande programmet som avgör om svetsfogen är godkändeller icke. Då datan som samlades in från ultraljudsmaskinen saknade vissa egenskaper, så hardet även testas att göra egen simulerad data. Resultatet visar att den artificiella intelligensen kanhärma det nuvarande programmet med 95% träffsäkerhet, och på den simulerade datan så är denungefär lika bra som det nuvarande programmet. Detta är ett lovande resultat, men för att kunnadra säkrare slutsatser så behövs det mer och säkrare data från undersökningen av ultraljudet.

ii

Contents1 Introduction 1

2 Problem formulation 2

3 Background 33.1 Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33.2 Ultrasound analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33.3 Time-frequency analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3.3.1 Short time Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . 53.3.2 Wavelet transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.4 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

4 Neural Networks 74.1 Mathematical description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74.2 Learning and optimiser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84.3 Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5 Convolutional neural networks 95.1 Pooling layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

6 Method 116.1 The data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116.2 Simulating data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116.3 Fully connected neural network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136.4 Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136.5 Convolutional layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146.6 Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

7 Training and validation 177.1 Fully connected network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177.2 Convolutional neural network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

7.2.1 With Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197.2.2 With wavelet transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

8 Results 23

9 Discussion 249.1 Results analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

iii

1 IntroductionIndustries have always had the goal to produce as much as possible as fast as possible. Someinventions and changes had such a big influence on the world by increasing productivity in industriesthat they classify as industrial revolutions. The first and most renown industrial revolution startedaround 1760, this is when people started going from hand-crafting into machine production [1].Large factories that housed many workers became more common to mass produce items such asclothing, chemicals and iron. A big milestone came when James Watt invented the steam engine.Previous factories had utilised water or wind power, this power is unpredictable and difficult touse. It also meant that factories was limited to certain areas. The steam engine allowed factoriesto be placed almost anywhere and gave a more stable power output, which led to a big increasein production. The second revolution started around 1870, this is when railroads and telegraphlines were built which allowed people and materials to quickly go from one place to another.This is also when the first electrical assembly line was introduced which was a drastic increasein production speed [2]. The third industrial revolution is also known as the digital revolution,this is when digital system overtook analogue systems. This allowed machines in industry to bechanged with the push of a button, instead of mechanical changes and allowed far more flexibilityin production. This age also introduced computers as well as the internet [3]. Lastly is the fourthindustrial revolution, which as of now is ongoing. It has been denoted as Industry 4.0 and it someof its major concepts include Internet of things (IoT), Internet of Services (IoS), cloud computing,machine to machine communication. It is focused on how easily information can be stored andshared with the connection that has arisen with internet and wireless communication. Industry4.0 combines the strength of optimised industrial processes with internet technology. This allowsnew strategies in manufacturing and maintenance, opening pathways to predictive maintenance.

Figure 1: Showcasing the different industrial revolutions [4].

Machine learning allows a computer to learn something without being explicitly programmed forit. The term was first coined by Arthur Samuels [5] when he created a program to play checkers inthe 1950s. Machine learning quickly became popular, but due to the difficulties of saving data, andthe relative slow processors at the time, it never saw its full potential until years later [6]. Insteadexpert system became popular. Expert systems worked by using logical if-then rules and came todominate the AI field in the 1980s. The 1980s saw a big step forward for machine learning withthe invention of backpropagation. It is used by many deep learning methods today, and neuralnetworks started becoming more popular because of it. Early 1980s also saw the first convolutionalneural network [7], the network was called a Time delay neural network which were fixed-size andshared weights along the temporal dimension.

1

2 Problem formulationThis report will look into a weld joint analysed by ultrasound. The ultrasound scans the weldjoint to approximate holes in it. A program analyses the data and either sends the weld joint toconstruction or to a manual scan, if the manual scan fails the weld joint it is sent to re-welding.This report will look into how machine learning can be utilised to analyse the data collected bythe program in an attempt to outperform the program in accuracy of labelling pass or fail on theweld joints. The data collected does not contain the manual scans performed after the programlabelled them as fail. Therefor simulated data will be created in order to compare the program toa machine learning algorithm. The problem that is being investigated can be summarised into thefollowing questions:

• Main query: How can the analysis of ultrasound data from a weld joint be improved withmachine learning?Subqueries:

– How sensitive will the network be to disturbances? If a sensor is moved slightly, and thedata looks a little different, how will that affect the machine learning algorithm, andhow can we deal with it, if it does?

– How long does it take to create an automatic process that with help from machinelearning gives reliable answers?

– How long would it take before you can start cutting down on manual measurements?

– How can machine learning in this scenario deal with and learn from a live data flow?

2

3 Background

3.1 MaintenanceHumans have been using equipment for a long time, and for about an equal amount of time equip-ment have broken down and has had to be replaced. Until the 1950s the only time equipmentwas replaced was when it had broken down and couldn’t be used anymore [8]. This is called cor-rective maintenance. Since corrective maintenance allows failure before maintenance, correctivemaintenance in today’s systems only exist if the consequence of that equipment failing is small,and the repair time is short. In 1950s Japan, engineers suggested they start with preventive main-tenance. This new strategy lubricated and replaced parts before they failed. This strategy wasuseful to reduce downtime, it started replacing parts at predetermined intervals. Some advantagesof preventive maintenance includes: Energy-savings, cost effectiveness in many capital-intensiveprocesses and have an estimated 12-18% cost savings over corrective maintenance [8]. Howeverit was expensive to switch out parts that could have lasted longer, and sometimes unexpectedfailure occurred, meaning downtime in the systems anyways. As time went on and the digitisationoccurred, it became easier to store data from the behaviour of equipment before they failed. Thisled to the final maintenance, predictive maintenance. Predictive maintenance is in theory an idealmaintenance system, it allows the equipment to run its entire life-course, only swapping it out justbefore it is about to break. This can protect from even unexpected failure, which previously hasn’tbeen possible.

3.2 Ultrasound analysisUltrasound is a very common tool to use for non-destructive testing of materials and products.As the name suggest non-destructive testing does not destroy the product during the test. Theultrasound machine has a transducer and a receiver, the transducer sends continuous pulses ofsound waves between 0.5MHz and 20MHz [9] into the material which is then bounced back to thereceiver. The receiver measure the time discrepancy between the pulse being sent and received andcan then map the interior structure of the material analysed. An example of how the transducerand receiver works can be seen in figure 2.

Figure 2: Ultrasonic testing in materials and how it is interpreted [9].

There are a lot of different ultrasound machines, the one used to analyse the weld joint in thisproject looks like something like shown in figure 3. It rotates around the weld joint to get multipleangles. The data is collected quite a bit differently than suggested in figure 2 as well. The signalis interpreted by a program that reads a specific gate on the x-axis, this gate is designed to fitover the "crack echo" that is seen in figure 3. This is because the area which the gate representsis the area which are the most sensitive to cracks. It then takes only the maximum value from

3

that gate. This is done 1500 times as the ultrasound rotates around the weld joint. This is howthe data looks that the neural network has been trained on in this project. The amplitude on they-axis represents the size of a cavity found in the weld joint. The time represents the data pointstaken when the camera moved around the weld joint.

Figure 3: How the ultrasound spins around the weld joint, and how the data gets represented.

3.3 Time-frequency analysisFourier transform has been dominant in the field of signal analysis for many, many years. Originallydeveloped by Jean Baptiste Joseph Fourier, he showed that any periodic function can be expressedas an infinite sum of periodic complex exponential functions, shown in equation 1.

X(f) =

∫ ∞−∞

x(t) · e−2jπftdt (1)

A long time after this discovery it was generalised to explain non-periodic function, and eventuallydiscrete time signals as well. It is after that generalisation it became a very suitable tool to usefor computer calculations where it is often called the discrete Fourier transform (DFT), shown inequation 2.

Xk =

N−1∑n=0

xn · e−i2πN kn (2)

However the Fourier transform can be expensive to use on its own. Thankfully there is a fasteroption in the form of the fast Fourier transform, developed by James Cooley and John Tukey in1965 [17]. What the fast Fourier transform does is it divides the original signal into sub-pieces ofsize N = N1 ∗ N2. It then performs N1 DFTs of size N2. The most popular split of the CooleyTukey algorithm is to use N2 = 2, this requires N to be a number to the power of two. It isthen possible to use a butterfly diagram to add the DFTs. This is highlighted in figure 4. Thismethod has a complexity by O(n log n) instead of the ordinary discrete Fourier transform whichhas a complexity of O(n2).

4

Figure 4: An illustration of how the butterfly effect works for DFTs.

3.3.1 Short time Fourier transform

One shortcoming of the Fourier transform is non-stationary signals. If a signal changes its frequencyover time, the Fourier transform would also be in constant change, which makes the analysis moredifficult. However, a method to tackle that is the short time Fourier transform (or STFT). TheSTFT takes a small window of the signal, and then performs the Fourier transform on it, seeequation 3. The idea is that in this small window the frequency change over time is very small,meaning that the fast Fourier transform can be used without problems. This also means that theSTFT is dependant on time and frequency as well as having an amplitude. Thus, to plot a STFTyou need a 3D image, or a 2D image with some sort of colour scheme.

FTx(t, f) =

∫t

[x(t) · ω(t− t′)] · e−j2πft (3)

However, the STFT is not perfect either, because it has an inherent problem recognised fromquantum physics. The Heisenberg uncertainty principle. In other words, the window size of theSTFT can not be perfect, if the window size is infinite, the STFT becomes ordinary Fouriertransform. If that is the case, the information of what frequencies exists in the signal is perfect,but there is no information available of during what time these frequencies appear in. If insteadthe window is infinitely small there will be no information of what frequencies exist, and onlyinformation about where in time these frequencies would show up. There are options that haveattempted to solve this problem, one is the wavelet transform.

3.3.2 Wavelet transform

The wavelet transform uses something called multiresolution analysis (MRA). As implied by thename, it utilises different frequencies and resolutions to analyse the signal. The equation for thewavelet transform can be seen in equation 4

CWTψx (τ, s) =1√s

∫x(t)ψ · ( t− τ

s)dt (4)

The wavelet transform is a function of two variables, which corresponds to the translation andscale parameters. ψ(t) corresponds to the transforming function and is usually referred to as themother function. There are some variations as to which mother function is used, with different oneshaving their own strengths and weaknesses. For this project the Ricker wavelet, or more commonlythe Mexican hat wavelet function has been used. Instead of taking a window and applying theFourier transform, the wavelet uses this mother-function to transform the signal. The scale (s inthe equation) can be compared to how scale is used in maps. A high scale shows large areas,but with little detail of the trees and/or roads. A low scale allows more detail, but a lot lessarea. Because of this, the window function is adjustable, and thus you can in scenarios wherethe frequency is rapidly changing lower the scale. This increases the frequency resolution, andwhen the frequency is not alternating as much the scale can be increased again, lowering the timeresolution, but increasing the frequency resolution. Thus the wavelet function has more options totackle the Heisenberg’s uncertainty principle than ordinary STFT.

5

3.4 MetricsThe most common metric to use when investigating classification is accuracy. Accuracy penalisesboth mislabelled positives and negatives in the same way, in reality however, a false positive anda false negative rarely has the same cost to industries. Two other common metrics can be usedto help with this issue, called precision and recall. Accuracy, precision and recall can be seen inequations 5, 6 and 7.

Accuracy =TP + TN

TP + TN + FP + FN(5)

Precision =TP

TP + FP(6)

Recall =TP

TP + FN(7)

Precision is high when false positives is low. Take, for instance, a part of a plane during qualityassurance. It gets classified as "pass" but is actually broken. This could mean later on that theentire plane gets decommissioned until they fix this broken part, leading to an overall higher costfor the company. This is an example when high precision is important. Recall is high when falsenegatives is high. For instance a patient gets the answer whether or not they have cancer. A falsepositive surely is not ideal, but in the end, harmless. A false negative however, would be muchmore devastating. This is an example when high recall is important.

6

4 Neural NetworksNeural networks was first introduced by McCulloch & Pitts [10] as a way to model nervous activity.Neurons are based on the "all or none" characteristic. The "all or none" characteristic means thatwhen the neuron has a certain threshold, if excitation exceeds this threshold it initiates an impulse.The impulse is then propagated to all nearby neurons. This simple model of a neural network waslater to be called a McCulloch-Pitts (MP) network. These characteristics of neurons is very similaras to how the modern neural networks are created, with an input layer and an output layer. Thisin turn inspired Rosenblatt to introduce the concept of association cells which is comparable to thehidden layers used today. The point of these association cells was to learn features from imagesand recognise patterns [11]. Although interest in neural networks was big with the discoveries fromMcCulloch, Pitts and Rosenblatt, it took decades before neural networks started seeing practicaluse. A big step forward was the discovery of backpropagation in neural networks by Werbos [12]in the 1980s. As time went on, and it became easier to store data, in conjunction with the morepowerful processors that arose, neural networks slowly started to show its full potential.

In machine learning and neural networks one usually makes a difference between regression andclassification. Regression is often used to specify a quantity. Classification on the other hand wouldbe to predict a label. The problem in this report is considered a classification problem.

A typical task for neural networks in a classification setting can be to classify images, facial recogni-tion, or speech recognition. In order to perform well on these tasks it needs to train. How the datatrains depends on what kind of training algorithm is used. It is divided into two sub-categories,supervised training and unsupervised training. In this project only supervised learning has beenused. Supervised training is when the data that the network is training on has been given trueoutput data as well. Given an input the network tries to predict the output and it is then comparedto the true data, the error from the guess is then calculated using a loss function. Depending onthe neural network and problem, different loss function has different advantages, some are listedin table 1. The job of the loss function is to determine how wrong the prediction of the network is.The network then uses an stochastic gradient descent optimiser to figure out how much it needsto change to the next iteration.

Table 1: Loss functions.

Name Mathematical formulaQuadratic loss C(y, y) = 1

2 (y − y)2

Binary Cross-entropy loss C(y, y) = −∑Nn=1

(ynln(yn) + (1− yn)ln(1− yn)

)Weighted binary cross-entropy loss C(y, y) = −

∑Nn=1

(ynln(yn) + λ(1− yn)ln(1− yn)

)In unsupervised learning the network is given a set of rules and parameters in order to optimisea task. A score system or something similar is given so the network knows what to optimise, andit trains by defeating itself. This can be a very powerful tool, AIs that are reliant on previousdata can only be as good as the data provided to them. Unsupervised learning allows computersto find their on strategies to evolve which can surpass what previously was known about it. Oneexample of this is google’s alpha-go AI which defeated the best human player in the world [13].For unsupervised learning previous data is not necessary but it can help in some cases to reducetraining time.

4.1 Mathematical descriptionIn figure 5 a visualisation of a neural network is shown. Data is input to the left, and worksthrough the layers going towards the right side. The input layer is connected to hidden layerswhich ultimately connect to the output layer. As seen, each node is connected to every node in thefollowing layer. These connections are called weights, where one weight is represented by a realnumber. The layers can be defined as a vector al, where l is the number of the layer. The layersconnected to each other can be represented by a matrix wl where the element wlij is the connection

7

from neuron j in layer l−1. The biases are an extra node which introduces an offset in the network,if necessary. The bias weights can be represented by a vector bl. With this information and giventhe previous layer al−1 it is now possible to calculate the next layer as a linear combination ofneurons in the previous layer:

al = wlal−1 + bl (8)This is still only a linear combination, however many hidden layers is used. To introduce non-linearity a function σ is implemented in each layer, called the activation function. This function isapplied on equation 8 such as:

al = σ(wlal−1 + bl) (9)This is called a "fully connected network", and is a common network in neural networks. Thereare other versions such as recurrent neural network and convolutional neural network, which willbe delved deeper into later.

Figure 5: A visualisation of a feed forward neural network.

There exist a lot of different activation functions, some work better than others in different cases.In table 2 some different kinds of activation functions are listed. For this project mainly RectifiedLinear Unit (ReLU) has been used. ReLU has been chosen as it has consistently been performingvery well and it is shown that deep convolutional networks with ReLU trains several times fasterthan other activation functions such as f(x) = tanh [15]. When the network has progressed throughthe different layers, it comes to the output. However in the last hidden layer, many networks havea different activation function. Classification networks usually have a sigmoid or softmax, this isbecause ReLU does not normalise the values after each layer, and for classification it helps if theoutput is between zero and one, which both the softmax and sigmoid activation function does.This helps interpreting the results as they sum up to one, thus if you have two possible outcomes(like the scenario in this report), if one of them has the value 0.1 and the other 0.9, one can saythat option two have a 90% likelihood of occurring.

Activation function Mathematical descriptionReLU max(0, x)

Sigmoid eyi∑jeyj

Softmax fi(~x) =exi∑Jj=1 e

xj

Tan f(x) = tanh(x)

Table 2: Examples of activation functions

4.2 Learning and optimiserAs mentioned the network predicts an answer, and then the loss function is calculated which tellsthe network the error of its prediction. Therefore, by minimising the loss function corresponds

8

to maximising the predicting probabilities of the network. Loss functions usually have differentstrengths, in table 1 there are three different loss functions. Quadratic loss is common to use inregression task. In classification tasks, cross entropy is very common to use.When given an error, the network uses backpropagation to calculate the gradients of the error.Since the gradients of the error is known, one can use stochastic gradient descent methods tofind minimums within the error domain. There are a few options for this, the ones tested in thisproject is the Adam optimiser and stochastic gradient descent (SGD). Stochastic gradient descentis the base that which the other optimisers are built upon. It can be a bit slower than the neweroptimisers, but it is very powerful. The Adam optimiser has been tested on various machinelearning algorithms [14] and has performed very well in those scenarios. It has the ability to utilisemoment, so if it finds a local minimum, it will initially move past it. This is helpful because itincreases the chances of finding the global minimum.

4.3 DropoutWhen training a neural network for too long it runs the risk of being overtrained, also referred toas overfitting. What that means is that the network have stopped finding patterns in the data,and instead have started to memorise specific cases. This lowers the generality of the network andmakes it harder for it to learn something new. One common method to reduce this risk is calleddropout. Dropout works by freezing a certain percentage of nodes in the network, since these arechosen randomly, the network has to make sure that all nodes must be active. This can slow downtraining by a small margin, but it increases the generalisation of the network.

5 Convolutional neural networksConvolutional neural network is designed for image recognition tasks. It is similar to the fullyconnected described above, it uses a input vector, some hidden layers and an output vector. It alsouses biases and activation functions. Instead of having a series of fully connected layers it utilisesconvolutional layers combined with pooling layers. It takes advantage of the fact that the inputis an image, so it arranges the neurons in three dimensions, height width and depth. Height andwidth is the same as the pixel size of the picture, and the depth is the colour coding, i.e one forgrayscaling and three for RBG. It utilises a window (often called kernel) which strides over thepixels in the image taking in the pixel values. This allows the network to extract features fromthe image. The more layers, and the larger they become, the more features can be saved from theimage. See figure 6 for how the kernel strides over the pixel values.

Figure 6: An example of how kernel strides in convolutional neural networks [16].

The kernel is multiplied with the image point-wise and creates a new "image" or feature map asthey are more commonly called. The new feature map is based on whatever feature that becamehighlighted in the previous one, but they are usually less generalised.

Mathematically speaking, the equation for calculating the output from a feature map in a convo-lutional network can be seen in equation 10 [19].

9

G[m,n] =∑j

∑k

h[j, k]f [m− j, n− k] (10)

Here, the input image is denoted by f and the kernel by k. The rows and column index are denotedby m and n respectively. The addition of the bias and activation function from this stage is donein the same way as the fully connected network. Dropout in convolutional neural network hasbeen discussed thoroughly with many arguing that dropout does not have the desired benefits inconvolutional neural network that it has in fully connected networks. Convolutional networks tendto have much lower parameters and connections than other networks due to the local-connectivityand shared-filter architecture. This means that it is less prone to overfitting already. However,according to [20] dropout can have great results even in convolutional networks.

5.1 Pooling layersA part of convolutional neural networks are the pooling layers. They make sure to scale down thenetwork, which is useful because too many feature maps can slow down a large network. Figure 7is a visualisation of how max pooling looks like. It works by striding over the feature map with thetensor, and instead of writing all values in the output it only writes the one with the most impacton the network, i.e the highest number. This allows the most "important" features to remain,while seeding out the unimportant ones.

Figure 7: A visualisation of max-pool layer, the highest value goes into the next feature map.

10

6 MethodAs described in section one, the goal is to create a neural network to analyse data from a weldjoint. This can be divided into four different milestones. The first milestone is being able to readand properly process the data. After the preprocessing the data was sent into a fully connectednetwork. The network itself shows what to expect from a simple solution, and gives a good refer-ence point for further improvements. It also shows that the data can be properly analysed by aneural network, more about the data exists in section 6.1. How the fully connected network wasimplemented can be seen in 6.3. The second milestone is to simulate data. Since the data is createdfrom the algorithm, a neural network trained on this data simply mimics this algorithm. So thesimulated data gives a different perspective, as it allows us to compare the existing algorithm withthe neural network, more about this in section 6.2. The third milestone is transforming the data,this is done in two separate ways, once with the short time fourier transform (STFT) and with thewavelet transform, more details can be found in section 6.4. This allows us to create images whichis needed for the final milestone, creating a convolutional neural network, this implementation isdescribed in more detail in section 6.5.

Since the goal is to create a neural network, i used Keras which is an open source library createdby Google. It is built on top of existing libraries, and utilises these libraries as background, with-out having to learn those specifically. The background chosen was Tensorflow, and the reason forthis is Tensorflow have implemented GPU-support for neural networks which can increase speedsignificantly [21]. The code is written in entirely in Python version 3.6.7. The computer used is aLenovo Legion Y530-15ICH with an Intel i5-8300H core as processor and a NVIDIA Geforce GTX1050TI graphics card.

In order to motivate the use of a convolutional neural network, a simple logistic regresion wasfirst performed on the data. This logistic regression uses the limited memory Broyden-Fletcher-Goldfarb-Shannon algorithm for training. It trained for 100 iteration and achieved an accuracy of79%.

6.1 The dataThe data that the neural network was trained on come from an algorithm that classifies pass orfail based on a set of parameters. An example of a failed run and a passed run exists in figure 8,the weld joint gets marked as fail if it fulfils one of two conditions. If it rises above the value 60in 3 consecutive data points, or if it rises above the value 40 in 3 consecutive data points between45◦ to 135◦ or 225◦ to 315◦. This area corresponds to the points 187-562 and 937-1312 on thex-axis. This is because this area of the weld joint is more sensitive to wear and tear. The datawas in csv-files with about 40000 scans in total, each scan having 1500 data points. In the 40000scans, about 28000 was labelled as pass, and 12000 was labelled as fail. However, class balancingwas used to average out the pass and fail, making the total amount of data points approximately24000. In figure 8 you can also see the effects of the more sensitive parts of the weld joint, fig-ure 8a is well above 40 in the middle, but since the area is not sensitive, the algorithm does notput it as fail. In 8b however the spike is within the more sensitive region, thus it gets labelled as fail.

6.2 Simulating dataIn order to more easily compare the results between the current algorithm and the neural network,simulated data was created. From inspecting a large amount of data, some reoccurring patternsin the data was found. The most common being spikes that lasted for about 40-100 data points inrange. In figure 8a there is a spike starting just after timestamp 750. These spikes was often thereason as to why the data was labelled as fail.

With this in mind the simulation started with taking the average and standard deviation of allcollected data. Each simulation create a new randomised average and standard deviation. Thiscreates a larger variance between simulations, without having the data in the simulations becomingtoo volatile. Then, for every point in the new simulation, the value is normally distributed fromthis average and standard deviation. The simulations amplitude increases or decreases in value if

11

(a) An example of data labelled as pass. (b) An example of data labelled as fail.

Figure 8: Two examples of data.

the new randomised value is higher or lower than the previous one. A feature the data had wasthat the relative change between two points are quite low, if a change of value would occur it willchange by only 10% of the new value. This created a good baseline similarity. However, withoutspikes, almost all data would label as pass. In order to get these spikes randomly occurring peakswas added. The number of peaks depends on the randomised standard deviation, higher standarddeviation means more peaks. When the simulation comes to an area where there is a peak, fora certain duration it only goes up by 10% of the new value, even if the new value is lower thanthe old value. Once the "peak" duration has ended, the simulation goes back to the baseline. Itsrare, but several peaks can overlap, however this behaviour has been recognised in the data on rareoccasions as well, thus it was left in on purpose. The reason why it was left in is because values ofthe simulation cannot exceed 100, nor go below zero, instead it stays at 100 or zero. Finally thedata is read into csv-files, in the same way the original data is stored. Beneath is pseudo code forhow the program works:

Algorithm 1 Pseudocode for simulating the data.1: Require: µ: Average of all data2: Require: σ: Standard deviation of all data3: Require: N : Number of simulations4: Require: Loc_start Array with start locations for the peaks5: Require: Loc_end End location for the peaks6: for N do {7: New value = Normal Distribution(µ,σ)8: if N = Loc_start then {9: for Loc_start to Loc_end do

10: Value(N) = Previous value + Peak_value11:12: N+=1

}13: if New value > Previous value then {14: Value(N) = Previous value + 0.1*new value15: else16: Value(N) = Previous value - 0.1*new value

}17: Return Value

The next step was to find a way to label the data, to know what classifies as a pass and fail.Using the same labelling as the algorithm would make the simulated data indistinguishable fromthe non-simulated data and would void the whole points of creating it in the first place. Therefor anew method of labelling the data as created. This method loops over all the values in the run, andif there are three consecutive data points exceeding the average value by 20 or more, the data islabelled as fail. The reason this was chosen is because during the ultrasound analysis, the data has

12

parameters such as gain. This can create an offset which could make the data unreadable. Thisoffset affects the algorithm, but it wouldn’t affect this method, since the average would increase inthe same amount as the offset. 40 000 pieces of data was simulated with this method. It labelled27% of them as fail, and the rest as pass. To compare to the non-simulated data, it had 28%labelled as fail, and the rest as pass.

6.3 Fully connected neural networkThe first solution that came to mind when starting the project was to create a fully connectedneural network. Since the data is only 1500 data points per run, it is possible to use all of it as inputto the neural network. So, the first thing to do was to read the csv-files of data and load it into thenetwork. Corrupted data with less than 1500 data points was deleted to make sure the networkcan be trained smoothly. In 9 you can see some different sizes tried out for the fully connectednetwork. The activation function used was ReLU, cross entropy as loss function and Adam as theoptimiser. The data was divided into a training set and test set. The test set was used for bothtesting and validation, it consisted of 20% of the total data. The data was also normalised beforetraining. Training was very fast for the fully connected network, it was trained for 10 epochs, witheach epoch taking around 3-4 seconds. When the first network had been created using the existingdata, another one was created on simulated data. This network went through the same process forfinding parameters. The parameters that were best for the network trained on ordinary data werealso the best parameters for the network trained on simulated data.

Table 3: The layers and their respective activation function used to test the fully connected network,for both simulated and non-simulated data.

First layer Second layer Third layer Output layer256 ReLU 128 ReLU 64 Softmax 2128 ReLU 64 ReLU 32 Softmax 264 ReLU 32 ReLU 16 Softmax 2

Simulated data256 ReLU 128 ReLU 64 Softmax 2128 ReLU 64 ReLU 32 Softmax 264 ReLU 32 ReLU 16 Softmax 2

6.4 TransformsSince the data represents a signal, it is possible to apply the Fourier transform to get some infor-mation about its frequency. In figure 9 is all the transform used on the same piece of data. Togenerate the wavelet transform and the STFT the scipy package signal was used in python. Forthe Fourier transform python has a built in function that does the transform. The FFT has thesame size as the original data. The STFT had each window at length 15 with an overlap of 7and the length of the FFT used in the STFT was 250. For the wavelet transform a width of 100points were used, the mexican hat was used as mother function. The STFT with these parametersgenerated an image with the size of 126x189 pixels, and the wavelet transform generated an imagewith size 300x100.

13

(a) An example of data labeled as pass. (b) An example of data transformed by the wavelettransformation.

(c) An example of data transformed by theShort time fourier transform.

(d) An example of data transformed by the ordinaryfast fourier transform.

Figure 9: An example of data and how it has been transformed.

6.5 Convolutional layerAfter the data had been read and transformed it is once again saved on the hard drive in png-files.To start training the convolutional network the data is read from the hard drive into an arraywhich contains all the images created. After the images is loaded into the array they are separatedinto training and validation sets, where 20% of the data is for validation. The images can thenbe read into the convolutional network one by one for training. The convolutional network hasbeen tested with many different hyperparameters, in table 8 some of the parameters tested canbe seen. Since there are so many hyperparameters, the optimisation went through two stages, theone below tries to decide the best size of the layers, as well as the best dropout parameter. Theyall were run with the cross entropy as loss function, Adam optimiser as learning algorithm, andwith a kernel size of 3. Lastly they were all run two times, one with no weight, and one wherefalse positives had a weight of 5. The networks created from the simulated data went through thesame process.

14

Table 4: The layers, their respective activation function and the range of the dropout with aninterval of 0.1.

Non-simulated dataFirst layer Second layer Third layer Output layer

2 ReLU 0 - 0.5 4 ReLU 0 - 0.5 8 Softmax 0 24 ReLU 0 - 0.5 8 ReLU 0 - 0.5 16 Softmax 0 28 ReLU 0 - 0.5 16 ReLU 0 - 0.5 32 Softmax 0 216 ReLU 0 - 0.5 32 ReLU 0 - 0.5 64 Softmax 0 232 ReLU 0 - 0.5 64 ReLU 0 - 0.5 128 Softmax 0 264 ReLU 0 - 0.5 128 ReLU 0 - 0.5 256 Softmax 0 2128 ReLU 0 - 0.5 256 ReLU 0 - 0.5 512 Softmax 0 2

Simulated data2 ReLU 0 - 0.5 4 ReLU 0 - 0.5 8 Softmax 0 24 ReLU 0 - 0.5 8 ReLU 0 - 0.5 16 Softmax 0 28 ReLU 0 - 0.5 16 ReLU 0 - 0.5 32 Softmax 0 216 ReLU 0 - 0.5 32 ReLU 0 - 0.5 64 Softmax 0 232 ReLU 0 - 0.5 64 ReLU 0 - 0.5 128 Softmax 0 264 ReLU 0 - 0.5 128 ReLU 0 - 0.5 256 Softmax 0 2128 ReLU 0 - 0.5 256 ReLU 0 - 0.5 512 Softmax 0 2

Since a false positive is a lot more expensive and dangerous all runs seen in table 8 not only lookedat accuracy and loss as a results, but also the amount of false positives it had. The four networksparameters can be seen in table 5.

Table 5: The top performing networks, their respective activation function and their dropout.

Non-simulated dataNetwork First layer Second layer Third layer Output layerCNN1 64 ReLU 0 128 ReLU 0 256 Softmax 0 2CNN2 16 ReLU 0 32 ReLU 0 64 Softmax 0 2CNN3 16 ReLU 0.2 32 ReLU 0.2 64 Softmax 0 2CNN4 4 ReLU 0.2 8 ReLU 0.2 16 Softmax 0 2

Simulated dataSCNN1 8 ReLU 0 16 ReLU 0 32 Softmax 0 2SCNN2 32 ReLU 0 64 ReLU 0 128 Softmax 0 2SCNN3 32 ReLU 0.1 64 ReLU 0.1 128 Softmax 0 2SCNN4 64 ReLU 0.3 128 ReLU 0.3 256 Softmax 0 2

To scale down the number of networks that had to be created for the optimisation, these parameterswere kept while changing some other parameters. The new networks this time instead changedbetween stochastic gradient descent (SGD) as learning optimiser and Adam optimiser. It alsochanged the kernel size of the networks and tried two different weight parameters. The newnetworks parameters are shown in table 6.

15

Table 6: The seconds iteration of parameter optimisations, looking into optimisers, kernel sizesand weight.

Non-simulated data Simulated dataNetwork optimiser Kernel size Weight Network optimiser Kernel size WeightCNN1 Adam 2 - 5 1 SCNN1 Adam 2 - 5 1CNN2 Adam 2 - 5 1 SCNN2 Adam 2 - 5 1CNN3 Adam 2 - 5 1 SCNN3 Adam 2 - 5 1CNN4 Adam 2 - 5 1 SCNN4 Adam 2 - 5 1CNN1 Adam 2 - 5 5 SCNN1 Adam 2 - 5 5CNN2 Adam 2 - 5 5 SCNN2 Adam 2 - 5 5CNN3 Adam 2 - 5 5 SCNN3 Adam 2 - 5 5CNN4 Adam 2 - 5 5 SCNN4 Adam 2 - 5 5CNN1 SGD 2 - 5 1 SCNN1 SGD 2 - 5 1CNN2 SGD 2 - 5 1 SCNN2 SGD 2 - 5 1CNN3 SGD 2 - 5 1 SCNN3 SGD 2 - 5 1CNN4 SGD 2 - 5 1 SCNN4 SGD 2 - 5 1CNN1 SGD 2 - 5 5 SCNN1 SGD 2 - 5 5CNN2 SGD 2 - 5 5 SCNN2 SGD 2 - 5 5CNN3 SGD 2 - 5 5 SCNN3 SGD 2 - 5 5CNN4 SGD 2 - 5 5 SCNN4 SGD 2 - 5 5

After analysing this test, looking into the false positives, the validation accuracy, and the validationloss, four finalists were picked from the non-simulated and simulated networks, and their parameterscan be seen in 7.

Table 7: The finalists that were taken from the second iteration of parameter optimisation.

Non-simulated data Simulated dataNetwork optimiser Kernel size Weight Network optimiser Kernel size WeightCNN2 Adam 4 5 SCNN1 Adam 3 1CNN3 Adam 5 1 SCNN2 SGD 4 5CNN4 Adam 2 5 SCNN4 Adam 4 1CNN4 Adam 5 5 SCNN4 Adam 5 1

6.6 WaveletIn addition to the STFT, network were trained on images transformed by the wavelet transform.The procedure here was very similar to the one done with the Fourier transform explained in thesection above. The wavelet transform constructed images out of the data that were saved onto thehard drive as PNG-files, which were read and used as input to the network. Below is a table whichhighlights the parameters that were tested with the wavelet transform. Cross entropy was used asloss function and Adam was used as optimiser.

Table 8: The layers, their respective activation function and the range of the kernel size with aninterval of one, as well as the weight on the loss function.

First layer Second layer Third layer Output layer Weight4 ReLU 2 - 5 8 ReLU 2 - 5 16 Softmax 0 2 516 ReLU 2 - 5 32 ReLU 2 - 5 64 Softmax 0 2 564 ReLU 2 - 5 128 ReLU 2 - 5 256 Softmax 0 2 5

Simulated data4 ReLU 2 - 5 8 ReLU 2 - 5 16 Softmax 0 2 516 ReLU 2 - 5 32 ReLU 2 - 5 64 Softmax 0 2 564 ReLU 2 - 5 128 ReLU 2 - 5 256 Softmax 0 2 5

16

7 Training and validationThis section will show how the networks perform during their training and testing. The networksthat will be shown are the fully connected network trained on non-simulated and simulated data.The convolutional network trained on images transformed by STFT on non-simulated and simu-lated data. Lastly the convolutional network trained on images created by the wavelet transformon both non-simulated and simulated data. The 24000 datapoints are divided into 80% trainingdata and 20% validation data. The false positives (FP) seen in the tables are based on the valida-tion data.

7.1 Fully connected networkThis will highlight the training and validation of the fully connected network. In table 9 the colourcorresponding to each network is shown.

Table 9: The layers and their respective activation function used to test the fully connected network,for both simulated and non-simulated data.

Non-simulated dataColour First layer Second layer Third layer Output layer FPBlue 256 ReLU 128 ReLU 64 Softmax 2 8.93%

Orange 128 ReLU 64 ReLU 32 Softmax 2 9.75%Green 64 ReLU 32 ReLU 16 Softmax 2 10.6%

Simulated dataOrange 256 ReLU 128 ReLU 64 Softmax 2 12.7%Green 128 ReLU 64 ReLU 32 Softmax 2 9.58%Blue 64 ReLU 32 ReLU 16 Softmax 2 39.7%

(a) The training accuracy of the fully connectedlayer on non-simulated data over 10 epochs.

(b) The training accuracy of the fully connected layeron simulated data over 10 epochs.

Figure 10: The training accuracy of the fully connected layers for both the simulated and non-simulated data.

17

(a) The training loss of the fully connectedlayer on non-simulated data over 10 epochs.

(b) The training loss of the fully connectedlayer on simulated data over 10 epochs.

Figure 11: The training loss of the fully connected layers for both the simulated and non-simulateddata.

(a) The validation accuracy of the fully connectedlayer on non-simulated data over 10 epochs.

(b) The validation accuracy of the fully connectedlayer on simulated data over 10 epochs.

Figure 12: The validation accuracy of the fully connected layers for both the simulated and non-simulated data.

(a) The validation loss of the fully connectedlayer on non-simulated data over 10 epochs.

(b) The validation loss of the fully connectedlayer on simulated data over 10 epochs.

Figure 13: The validation loss of the fully connected layers for both the simulated and non-simulated data.

18

7.2 Convolutional neural networkThis section will show how the convolutional network performed during training, for both theversion trained on data from the STFT and wavelet transform.

7.2.1 With Fourier transform

Table 10: The layers and their respective activation function used to test the fully connectednetwork, for both simulated and non-simulated data.

Non-simulated dataColour Network Optimiser Kernel size Weight FPRed CNN2 Adam 4 5 0.48%Blue CNN3 Adam 5 1 1.35%

Orange CNN4 Adam 2 5 0%Green CNN4 Adam 5 5 0.083%

Simulated dataBlue SCNN1 Adam 3 1 2%Red SCNN2 SGD 4 5 0.13%

Orange SCNN4 Adam 4 1 0.65%Green SCNN4 Adam 5 1 1.52%

(a) The training accuracy on non-simulated data. (b) The training accuracy on simulated data.

Figure 14: The training accuracy of the four top performing convolutional networks trained ondata transformed by the Fourier transform for both simulated and non-simulated data.

(a) The training loss on non-simulated data. (b) The training loss on simulated data.

Figure 15: The training loss of the four top performing convolutional networks trained on datatransformed by the Fourier transform for both simulated and non-simulated data.

19

(a) The validation accuracy on non-simulated data. (b) The validation accuracy on simulated data.

Figure 16: The validation accuracy of the four top performing convolutional networks trained ondata transformed by the Fourier transform for both simulated and non-simulated data.

(a) The validation loss on non-simulated data. (b) The validation loss on simulated data.

Figure 17: The validation loss of the four top performing convolutional networks trained on datatransformed by the fourier transform for both simulated and non-simulated data.

7.2.2 With wavelet transform

Table 11 shows which graph corresponds to which colour in the plots.

Table 11: The networks from section 6 and their corresponding colours. The layers are parametersare the size, activation function and kernel size.

Non-simulated dataColour First layer Second layer Third layer Output layer Weight FPRed 64 ReLU 5 128 ReLU 5 256 Softmax 5 2 5 2.8%Blue 16 ReLU 3 32 ReLU 3 64 Softmax 3 2 5 5.9%

Orange 64 ReLU 3 128 ReLU 3 256 Softmax 3 2 5 3.6%Green 64 ReLU 4 128 ReLU 4 256 Softmax 4 2 5 1.9%

Simulated dataOrange 4 ReLU 5 8 ReLU 5 16 Softmax 5 2 5 0.73%Blue 16 ReLU 2 32 ReLU 2 64 Softmax 2 2 5 0.83%Green 64 ReLU 2 128 ReLU 2 256 Softmax 2 2 5 4.1%Red 64 ReLU 4 128 ReLU 4 256 Softmax 4 2 5 2.8%

20

(a) The training accuracy on non-simulated data. (b) The training accuracy on simulated data.

Figure 18: The training accuracy of the top performing networks trained on simulated and non-simulated data transformed by the wavelet transformation.

(a) The training loss on non-simulated data. (b) The training loss on simulated data.

Figure 19: The training accuracy of the top performing networks trained on simulated and non-simulated data transformed by the wavelet transformation.

(a) The validation accuracy on non-simulated data. (b) The validation accuracy on simulated data.

Figure 20: The validation accuracy of the top performing networks trained on simulated andnon-simulated data transformed by the wavelet transformation.

21

(a) The validation loss on non-simulated data. (b) The validation loss on simulated data.

Figure 21: The validation loss of the top performing networks trained on simulated and non-simulated data transformed by the wavelet transformation.

22

8 ResultsAs seen in the above results, the convolutional neural network done with the STFT far outperformsthe fully connected network and the convolutional network done with the wavelet transform. Sothis network was chosen to compare to the original algorithm to see how well it would stand up toit. The confusion matrices shows which network is being tested and how it performed compared tothe algorithm, this is done on simulated data, as the algorithm would simply have 100% accuracyon non-simulated.

AlgorithmTotal accuracy: 89.8%

Precision: 84.9% Recall: 93.6%True diagnosis

Positive Negative

Prediction Positive 1827 324Negative 125 2105

Orange NetworkTotal accuracy: 92.1%


Positive Negative


Green NetworkTotal accuracy: 94.6%


Positive Negative


Blue NetworkTotal accuracy: 94.9%


Positive Negative


Red NetworkTotal accuracy: 84.5%


Positive Negative


23

9 DiscussionLooking back to section 2, there was several question that wanted to be answered in this report.The main one is how machine learning can help analyse ultrasound data on a weld joint. Theresults indicate that this also is the case. Although, with the lack of complete data, and theuncertainty of simulated data, it is hard to say for sure. Looking at the subqueries, they are veryfocused around the implementation of the algorithm in the industrial setting. This is because theoriginal idea was to do a wholesome project. First to develop a neural network and then researchas to how one would implement a neural network into a industrial setting. This was logisticallyvery difficult, and more time was instead spent on developing and testing the neural network.

9.1 Results analysisThe logistic regression performed with an accuracy of 79%. This shows that the problem at handis not a linear problem, and more advanced methods, such as neural network is better equipped tohandle it. The results from the convolutional network also solidifies this, as the accuracy increasedby such a large amount.

When testing the first fully connected network the expected outcome of the network was for it tohave a very high accuracy. The data from the algorithm is basically just a threshold with a fewif-clauses. However, when the FCCs accuracy was so low, the performance of the convolutionalnetwork was surprisingly impressive.

The importance of false positives was something that was carefully considered during the durationof the project. Using a weighted cross entropy as the loss function allowed a rather good controlover the amount of false positives in validation. This is also why instead of making just one net-work, four is highlighted. The amount of false positives increases as the accuracy goes up, but italso highlights the difficulty of having zero false positives, as the lowest on simulated data nevereven got to zero, and it performed far below the algorithm.

The results only contains the values of accuracy taken from the simulated data. However, theaccuracy on the non-simulated data is also a very important to take into consideration. Even ifthe simulated data is similar to the real data, it is impossible to make a simulation that catches allthe outlying data points. It is also impossible to say that the network would have 95% accuracy onthe real data, since the algorithm probably labels some outlying data incorrect. However, having95% accuracy shows that the network is good at finding patterns in the real data, which wouldbode well for future improvements.

As mentioned earlier, the false positives weighs more heavily than the false negative. This lessensthe viability of accuracy as a metric. Precision and recall have been added to find some sort ofbalance as to how the results should be interpreted. The two of them have problems of their own,precision would have an amazing score if the network classified 99% as false, with the exception ofthose with a very high probability of being true. This is because it only scales with true positivesand false positives. Recall is not affected by false positives, which is the more important parameterin this case. In the end, accuracy supported by precision was the metrics mainly used whenanalysing and picking networks for the results.

9.2 Future workThere are some ways to further develop this project, most importantly collecting real data. Thereare things that can be done without collecting data such as changing the structure of the network.There are many ways to create a neural network, one could try a recurrent neural network, morespecifically a long short term memory network. This could be interesting because as seen in thedata, a faulty weld joint has multiple data points building up towards a peak. Having a memorywhich remembers the previous data points might be able to draw some other conclusions than thecurrent one. Other things could include further parameter optimisation, more data, data that hasbeen more preprocessed. One could try a different approach than the Fourier or wavelet transfor-mation.

24

The best option to compare the algorithm with the neural network would be to have the datafrom the manual scans to identify the false negatives. This would allow a fair comparison betweenthe neural network and the algorithm, which would make this report have a fairly straightforwardanswer in whichever has the best performance.

If one would to gather data directly from the output of the ultrasound, some problems that arosewith the data itself might be eliminated. Remember figure 2, as mentioned it only collects datafrom the highest value in the gated section. If you would instead utilise all the data, it can bringseveral benefits. For one, there are many instances in the data which every point is just zero. Thisis because when the data is collected, it is collected above a certain noise threshold, and sometimesdue to disturbances, this threshold is above the values of the scan. Convolutional network haveshow many times that they are very noise resistant. Another thing is that there would be a lotmore data, it would be possible to simply use the fast Fourier transform on the raw data, and stillbe able to create a time-series. This would hopefully increase the resolution in the images creatingmore details for the neural network to analyse.

9.3 ConclusionThe results of the convolutional network is something that look promising. The simulation of thedata went a lot better than expected. The amount of pass and fails in the simulated data and thereal data are very similar. The accuracy of the convolutional neural network is also very similaron both data sets. The downside here is that the labelling chosen is almost completely arbitrary.If one would have some other method of analysing ultrasound data and being able to use that forlabelling this data the results would be more trustworthy.

The lack of data makes it difficult to draw any concrete solutions. However, by having a impressiveresult on the non-simulated data, and having better result on the simulated data, the overall resultsis promising.

25

References[1] https://www.britishmuseum.org/research/publications/online_research_catalogues/paper_money/paper_money_of_england__wales/the_industrial_revolution.aspx (Visited 10/07/2019)

[2] https://richmondvale.org/en/blog/second-industrial-revolution-the-technological-revolution(Visited 10/07/2019)

[3] https://www.worldatlas.com/articles/what-was-the-digital-revolution.html (Visited 16/1/2019)

[4] https://www.presentationpoint.com/blog/data-signals-triggers-industry-4-0/(Visited 2019-03-26)

[5] A. L. Samuel. July 1959. Some Studies in Machine Learning Using the Game of Checkers, inIBM Journal of Research and Development, vol. 3, no. 3, pp. 210-229.

[6] https://en.wikipedia.org/wiki/Machine_learning#History_and_relationships_to_other_fields (Visited 16/1/2019)

[7] https://en.wikipedia.org/wiki/Convolutional_neural_network#History(Visited 16/1/2019)

[8] Zhe Li 2018 Deep learning driven approaches for Predictive maintenance Doctoral thesis atNTNU

[9] Introduction to Non-Destructive Testing Techniques, Dr. Ala Hijazi (found at https://eis.hu.edu.jo/acuploads/10526/ultrasonic%20testing.pdf)

[10] McCulloch W. S. & Pitts W. 1943. A Logical Calculus of the Ideas Immanent in NervousActivity. The Bulletin of Mathematical Biophysics, (Vol. 5, Issue 4): pp. 115-133.

[11] Rosenblatt F. 1958. The Perceptron: A Probabilistic Model for Information Storage and Or-ganization in the Brain. Psychological Review, (Vol. 65, no. 6): pp. 386-408.

[12] Werbos P. J. 1974. Beyond Regression: New Tools for Prediction and Analysis in the Be-havioural Sciences. Ph.D. thesis, Harvard University.

[13] David Silver, Julian Schrittwieser, Karen Simonyan et al. Mastering the Game of Go withDeep Neural Networks and Tree Search, Nature 529 (7587): 484–489 (January 2016)

[14] Kingma D. P. & Ba J. L. 2015. Adam: A Method for Stochastic optimisation. 3rd InternationalConference for Learning Representations, San Diego.

[15] Krizhevsky A., Sutskever I. & Hinton G. E. 2012. ImageNet Classification with Deep Convo-lutional Neural Networks NIPS 25: pp. 1106-1114.

[16] Michele Cavaioni 24/2-2018, DeepLearning series: Convolutional Neural Networks

[17] Clara Holloway, Graz, March 2014, Real time spectrogram inversion

[18] - JCGM 2008 International vocabulary of metrology — Basic and general concepts and asso-ciated terms

[19] https://towardsdatascience.com/gentle-dive-into-math-behind-convolutional-neural-networks-79a07dd44cf9- Visited 2019-05-07

[20] Haibing Wu, Xiaodong Gu, 2015. Towards Dropout Training for Convolutional Neural Net-works Department of Electronic Engineering, Fudan University, Shanghai200433, China.

[21] Dan C. Cire san et al, Technical Report No. IDSIA-01-11January 2011, Dalle Molle Insti-tute for Artificial Intelligence Galleria 2, 6928 Manno, Switzerland, High-Performance NeuralNetworks for Visual Object Classification

[22] Robi polikar, 05/11/2006, The wavelet tutorial, second edition, part one

26

https://www.britishmuseum.org/research/publications/online_research_catalogues/paper_money/paper_money_of_england__wales/the_industrial_revolution.aspx



https://richmondvale.org/en/blog/second-industrial-revolution-the-technological-revolution

https://www.worldatlas.com/articles/what-was-the-digital-revolution.html

https://www.worldatlas.com/articles/what-was-the-digital-revolution.html

https://www.presentationpoint.com/blog/data-signals-triggers-industry-4-0/

https://en.wikipedia.org/wiki/Machine_learning#History_and_relationships_to_other_fields

https://en.wikipedia.org/wiki/Machine_learning#History_and_relationships_to_other_fields

https://en.wikipedia.org/wiki/Convolutional_neural_network#History

https://eis.hu.edu.jo/acuploads/10526/ultrasonic%20testing.pdf

https://eis.hu.edu.jo/acuploads/10526/ultrasonic%20testing.pdf

https://towardsdatascience.com/gentle-dive-into-math-behind-convolutional-neural-networks-79a07dd44cf9

predictive maintenance with machine learning on weld joint ......ever since the first industrial...

Documents