

Computers & Industrial Engineering 60 (2011) 801–810


Data mining for quality control: Burr detection in the drilling process

Susana Ferreiro a,*, Basilio Sierra b, Itziar Irigoien b, Eneko Gorritxategi a

a Fundación TEKNIKER, Eibar, Guipúzcoa, Spain
b University of the Basque Country, San Sebastián, Guipúzcoa, Spain


Article history: Received 25 March 2010; Received in revised form 24 January 2011; Accepted 27 January 2011; Available online 4 February 2011

Keywords: Data mining; Machine learning; Drilling process; Burr detection

0360-8352/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.cie.2011.01.018

This manuscript was processed by Area Editor Satish Bukkapatnam.
* Corresponding author.

E-mail address: [email protected] (S. Ferreiro).

Abstract

The drilling process is one of the most important operations in the aeronautic industry. It is performed on the wings of aeroplanes and its main problem lies with burr generation. At present, there is a visual inspection and manual burr elimination task subsequent to the drilling and previous to the riveting to ensure the quality of the product. These operations increase the cost and the resources required during the process. The article shows the use of data mining techniques to obtain a reliable model to detect the generation of burr during high speed drilling in dry conditions on aluminium Al 7075-T6. This makes it possible to eliminate the unproductive operations in order to optimize the process and reduce economic cost. Furthermore, this model should be able to be implemented later in a monitoring system to detect automatically and on-line whether the generated burr is out of tolerance limits or not. The article explains the whole process of data analysis, from the data preparation to the evaluation and selection of the final model.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Nowadays, practically all fields of industrial activity are moving towards the automation of their processes. This automation should ensure the quality of the product while minimizing manufacturing cost and optimizing resources.

Drilling is the most important operation for the aeronautic industry because it implies a high economic cost. This cost is a consequence of the visual inspection and burr elimination tasks. They are non-productive operations, carried out subsequent to drilling, and they should be eliminated or minimized to the maximum extent possible. A small or medium size aeroplane has more than 250,000 holes to be inspected and, if there is burr, it must be eliminated. It is necessary to eliminate this manual process and replace it with a monitoring system able to detect automatically and on-line when there is a burr. The aim of this article is to obtain a model that can be implemented into the machine to predict burr generation during the drilling process.

The technological centre 'C.I.C. MARGUNE', a Cooperative Research Centre for High Performance Manufacturing, patented an experimentally adjusted monitoring method able to detect whether the size of the generated burr is within aeronautical limits or not (Peña, Aramendi, & Rivero, 2007). This method consists of a conventional mathematical model for burr detection based on parameters extracted from the whole internal signal of the machine, and its percentage of correct classification was 92%.



Nevertheless, this model could not be implemented into the machine and there is currently no monitoring method for burr detection, so most of the research in this article was focused on obtaining a model that could be implemented in a monitoring system to predict automatically the burr generation during the drilling process. This model was derived from a process that extracts useful, understandable and previously unknown knowledge from a set of experiments. Fig. 1 shows the communication among the different phases of the knowledge extraction process.

Storage, organization and information retrieval have been automated thanks to database systems and the availability of a huge quantity of information. There are some analytic techniques based on statistics that have been used to analyze this information, but they are cryptic for people who are not very experienced with them. Data mining, as explained in Michalski, Bratko, and Kubat (1998) and Kaelbling and Cohn (2003), is a multidisciplinary field that is easy to put into practice and combines several techniques such as statistics, machine learning, decision-making support systems, and visualization, in order to extract knowledge from a data set. Each phase of the process includes a set of these techniques.

The process is iterative and interactive. It is iterative because the output of any phase may turn back to the previous steps and because some iterations are necessary to extract high-quality knowledge. It is necessary to explore various models to find the most useful one to solve the problem. In the search for a good model it may be necessary to return to previous phases and make changes in the data. Even the problem definition could be modified to give it a different approach. Moreover, the process is interactive because the expert in the problem domain should help in data preparation and evaluation.


Fig. 1. Data analysis process.


The evaluation is one of the most important phases in this process and it needs well-defined training and validation stages to decide which model offers better performance and accuracy. The idea is to estimate (train) the model with a subset of the dataset (the training dataset) and then validate it with the rest of the dataset (the test dataset).

The main contribution of this article is to explain the usefulness and benefits of data mining techniques to obtain a robust, accurate and reliable model that can be implemented into a monitoring system and used to predict burr generation during the drilling process, as shown in Fig. 2.

The rest of the article is organized as follows. Section 2 presents a review of related and previous works on the use of other data-driven models in machining, and the position of the present work in the context of previous ones. Section 3 describes the experimental dataset, the characteristics of the process, and the data selection and preparation; it defines a clean and reliable dataset. Section 4 briefly describes the concept of machine learning. Next, Section 5 explains the results of the analysis and evaluation carried out to obtain the best final model for detecting burr generation: application of machine learning algorithms without selection of variables, then with selection of variables and combination of algorithms, and finally a change of strategy to eliminate false negatives (cases in which the model predicts no burr but burr is generated). Finally, Section 6 closes the article with the most important conclusions.

2. A review of related works

The aeronautic industry, as well as other industrial sectors, must modify some of its manufacturing processes and maintenance strategies. Considering the maintenance strategy, it is necessary to minimize the cost of maintenance and to increase operational reliability, replacing the traditional "fail and fix" method with "predict and prevent", as explained in Ferreiro and Arnaiz (2010). With regard to manufacturing, the major need is to increase productivity and to optimize and automate certain processes while ensuring the quality of the product.

In both manufacturing and maintenance, it is essential to explore new technologies, and many works have been published looking into monitoring and diagnosis.

Bukkapatnam, Kumara, and Lakhtakia (1999) present a methodology based on chaos theory, wavelets and neural networks for analyzing AE signals. It involves a thorough signal characterization, followed by signal representation using wavelet packets, and state estimation using multilayer neural networks. Bukkapatnam, Kumara, and Lakhtakia (2000) develop a methodology for accurate and algorithmically simple neural network estimation by exploiting the properties of the underlying machining dynamics and its interactions with flank wear dynamics. Kamarthi et al. (2000) investigate a flank wear estimation technique in turning through wavelet representation of acoustic emission (AE) signals. The effectiveness of the wavelet representation of AE signals for flank wear estimation is investigated by conducting a set of turning experiments on AISI 6150 steel workpieces and K68 (C2) grade uncoated carbide inserts. In these experiments, flank wear is monitored through AE signals. A recurrent neural network of simple architecture is used to relate AE features to flank wear. Using this technique, flank wear estimation results are obtained for operating conditions within the range of those used during neural network training. The work in Pittner and Kamarthi (2002) deals with the assessment of process parameters or states in a given application using the features extracted from the wavelet coefficients of measured process signals. Sick (2002) describes the 'state of the art' with 138 publications dealing with on-line and indirect tool wear monitoring in turning by means of artificial neural networks. The article compares the methods applied in these publications as well as the methodologies used to select certain methods, to carry out simulation experiments, and to evaluate and present results. Rangwala and Dornfeld (2002) present a scheme that uses a feedforward neural network for learning and optimization of machining operations. The network learns by observing the effect of the input variables of the operation (such as feed rate, depth of cut, and cutting speed) on the output variables (such as cutting force, power, temperature, and surface finish of the workpiece).


Fig. 2. Elimination of unproductive operations.


The learning phase is followed by a synthesis phase during which the network predicts the input conditions to be used by the machine tool to maximize the metal removal rate subject to appropriate operating constraints. Byrne, Dornfeld, and Denkena (2003) review some of the main developments in cutting technology since the foundation of CIRP. Caprino, Teti, and de Iorio (2005) predict the residual strength of pre-fatigued glass fibre-reinforced plastic laminates through acoustic emission monitoring. An empirical correlation was found between the material residual strength and the total event counts detected at the maximum stress applied during pre-fatiguing cycles. Moreover, the correlation was improved when a previous model, relying on fracture mechanics concepts, was utilised.

Gradually, the techniques have varied, always trying to improve the results obtained so far. In How, Liu, and Lin (2003) a rough set is used to extract causal relationships between manufacturers' parameters and product quality measures. In order to identify the relationship between residual stresses and the process parameters themselves, Umbrello, Ambrogio, Filice, Guerriero, and Guido (2009) propose data mining techniques applied to the cutting process. Additionally, Malakooti and Raman (2000) define the problem of assessing the value of certain inputs when trying to minimize cost, maximize productivity and improve the surface finish. However, today drilling is one of the most important manufacturing processes requiring attention. Peña et al. (2007) have studied ways to obtain cleaner holes, free of burr, in an effort to reduce or eliminate the lubricant used in the cleaning process prior to riveting. Nevertheless, the main problem is the occurrence of burr, and several studies into this question have been carried out, such as Kim et al. (2000) or Min et al. (2001). In these studies a control chart was developed for stainless AISI 304L and AISI 4118 in order to examine the drilling process in terms of cutting

conditions and drill diameter. Hambli (2002) describes a finite element approach with a neural network to predict the burr height of the parts. Heisel, Luik, Eisseler, and Schaal (2005) propose a method based on empirical cutting examinations and the correlation between burr parameters. Lauderbaugh (2009) presents a methodology to predict burr height, force, heat flux, and temperature at breakthrough based on a statistical analysis of 2024-T351 and 7075-T6 aluminium. Gaitonde, Karnik, Achyutha, and Siddeswarappa (2008a) determine burr height and burr thickness combining response surface methodology with genetic algorithms in drilling of AISI 316L stainless steel using HSS twist drills. Gaitonde, Karnik, Achyutha, and Siddeswarappa (2008b) present an application of the Taguchi optimization method in order to minimize burr height and thickness. Chang and Bone (2010) describe an analytical model to predict burr height in vibration assisted drilling of aluminium 6061-T6.

The drilling process is particularly important in the aeronautics industry because of the need to ensure the safety of the product and meet the statutory requirements regarding the maximum size of the burrs. Holes with burr exceeding the official limit of 127 μm are not permitted, even when the percentage of these is low. Several studies based on data mining approaches, machine learning algorithms or advanced statistics have been carried out since the patent by Peña, Aramendi, Rivero, and López de LaCalle (2005), because, as mentioned in Wang (2007): ". . .human operators may never find such rules by investigating a dataset manually" and ". . .one may never be able to discover such hidden knowledge from a dataset without the assistance of computer-based data analysis and mining approaches". Initially, some of the classic machine learning algorithms were applied to solve the problem presented in this article, as detailed in Ferreiro et al. (2009).


Fig. 4. Cutting area.


That study was exploratory, without the required precision, and carried out only with the intention of seeing whether machine learning algorithms could improve the detection of burrs during the process. They improved the results of the patented model without much loss of time or need for computational resources.

Then, an in-depth study was carried out into the possibility of improvement through the use of various data mining approaches explained in Kaelbling and Cohn (2003), such as a pre-processing of the data (discretization) and a wider selection of the most influential variables in the drilling process, based on different criteria and search methods. Moreover, the criterion to select the best model among several was set by means of the evaluation of each of them plus a hypothesis test that determines whether there are significant differences. This study is explained below.

3. Experimental setup: data selection and preparation

The main objective of the present work is to improve the detection of burr during the drilling process by using data mining techniques. The model should improve the correct classification ratio obtained by the current mathematical model and be able to be implemented later into the machine, within a monitoring system, to predict automatically and on-line the burr generation.

The monitoring system should start from a database from which a model is created and later introduced into the machine. One of the fundamental tasks prior to the development of this project was to study the sensitivity of different signals to burr detection, and to treat and use them to develop the on-line monitoring system. This implied analyzing the signals and evaluating which of them carried more information about the burr. The internal signals of the machine were analyzed (torque of the spindle, power and advance force) and the studies concluded that these signals present certain advantages:

1. their acquisition is simple and does not require additional elements,

2. they form a non-intrusive method, since no elements are added to the work piece,

3. they provide an easy integration into the machine's control.

Fig. 3 shows an example of an internal signal captured during a drilling test. This signal belongs to the torque of the electro-spindle during the drilling of a hole, from the electro-spindle acceleration to the deceleration. It shows four areas: "Spindle acceleration" of the gear-head, "Approach to work piece", "Cutting" and "Spindle deceleration".

The studies of these types of signals by experts concluded that the shape of the electro-spindle torque signal in the time domain is related to the size of the burr.

Fig. 3. Electro-spindle signal caught during drilling.

It was also observed that the most representative area corresponds to the "Cutting" area, represented in Fig. 4.

Finally, the most influential and representative variables of the process, together with the "Cutting" area of the spindle torque signal, were taken into account as predictive variables, as shown in Table 1. Drill bit, cutting speed, length in entrance, advance speed, length in exit and thickness define the process, while minimum, maximum, angle, height and weight were calculated from the "Cutting" area.
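As an illustration of how such descriptors could be computed, the following is a minimal sketch (not the authors' implementation) that derives simple shape features from the torque samples of the "Cutting" area. The variable names mirror Table 1, but the exact definitions of ANG, ALT and ANC are not given in the article, so the formulas below are assumptions (ANC, called "weight" in the paper, is taken here as the duration of the cutting segment).

```python
import numpy as np

def cutting_area_features(torque, time):
    """Illustrative descriptors of the 'Cutting' area of the spindle torque signal.

    torque : 1-D array with the torque samples of the cutting segment only.
    time   : 1-D array with the corresponding time stamps (same length).
    Returns simple shape features; plausible stand-ins for MIN, MAX, ALT, ANC, ANG.
    """
    t_min, t_max = float(np.min(torque)), float(np.max(torque))
    height = t_max - t_min                      # vertical extent of the segment (ALT-like)
    width = float(time[-1] - time[0])           # duration of the cutting area (ANC-like)
    slope = np.polyfit(time, torque, deg=1)[0]  # overall slope of the segment
    angle = float(np.degrees(np.arctan(slope))) # slope expressed as an angle (ANG-like)
    return {"MIN": t_min, "MAX": t_max, "ALT": height, "ANC": width, "ANG": angle}
```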

In order to develop a model, a set of experiments was performed following a design of experiments. The material for the tests was aluminium Al 7075-T6, commonly used in aeronautical structures, and the tests were performed in a high speed CNC machining centre (3-axis) without lubricant. This machine operates at a maximum speed of 24,000 rpm, has a maximum feed rate of 120 m/min, an acceleration of 2 g (g = 9.8 m/s²), a nominal power of 27 kW and a nominal torque of 16.97 N m. The tool was a three-edged carbide drill with two different angles: point angle (130° – hard rock) and helix angle (30° – soft rock). The geometry of the drill-hole corresponded to a 10 mm diameter, and the drilled lengths were 12 and 25 mm (the plate thicknesses). The feed was set in the range 0.2–0.5 mm/rev.

Finally, there was a dataset of 106 tests plus the class to be predicted, as represented in Table 2:

- Class (Burr = no): admissible burr.
- Class (Burr = yes): non-admissible burr.

The class was categorized based on the permissible burr size imposed by the aeronautical industry, which demands a maximum size of 127 μm. The size of the burr was measured during the execution of the experiments at different angles (0°, 90°, 120°, 180°, 240°, 270°) by means of a roughness tester. The mean of these values was used to categorize the class based on the permissible burr size as defined by the aeronautical industry.
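A minimal sketch of this labelling rule, assuming the six burr measurements per hole are available in micrometres (an illustrative helper, not part of the original study):

```python
import numpy as np

AERO_LIMIT_UM = 127.0  # maximum admissible burr size (micrometres)

def burr_class(measurements_um):
    """Assign the class label from burr heights measured at several angles.

    measurements_um : burr sizes (in micrometres) measured at 0, 90, 120, 180,
    240 and 270 degrees with the roughness tester. Following the paper, the
    mean of these values is compared against the aeronautical limit.
    """
    mean_burr = float(np.mean(measurements_um))
    return "yes" if mean_burr >= AERO_LIMIT_UM else "no"   # Burr = yes / no

# Example: a hole measured at the six angles (hypothetical values)
print(burr_class([130, 140, 125, 150, 135, 128]))  # mean ≈ 134.7 -> 'yes'
```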

However, the dataset from which the algorithms learn is not always obtained through a design of experiments. Sometimes it is necessary to extract the data from complex databases which are difficult to understand and operate.

Table 1. Predictive variables.

Description           Variable   Origin          Type         Values
Drill bit             BRO        Configuration   Discrete     SR; HARD
Test num/drill bit    NUM        Configuration   Continuous
Velocity              VC         Configuration   Continuous
Length in entrance    TRM        Configuration   Discrete     8; 15; 20; 35
Advance 1             AV1        Configuration   Continuous
Advance 2             AV2        Configuration   Continuous
Length in exit        REC        Configuration   Discrete     20; 35
Thickness             ESP        Configuration   Discrete     12; 25
Minimum               MIN        Sensor          Continuous
Maximum               MAX        Sensor          Continuous
Angle                 ANG        Sensor          Continuous
Height                ALT        Sensor          Continuous
Weight                ANC        Sensor          Continuous


Table 2. Dataset structure.

BRO    NUM   VC    TRM   AV1   AV2   REC   ESP   MIN     MAX     ANG     ALT     ANC     BURR
SR     1     150   35    0.3   0.3   35    25    0.11    0.31    -42.3   2.091   10.44   YES
SR     20    200   35    0.4   0.4   35    25    0.21    0.243   -55.7   2.228   17.04   YES
HARD   24    150   8     0.3   0.5   20    25    0.07    2.08    -43.3   4.93    8.82    YES
HARD   12    250   35    0.2   0.2   35    12    0.69    0.3     31.1    3.78    14.32   YES
SR     5     200   20    0.3   0.5   20    12    0.203   -0.84   -22.5   3.689   10.77   NO


Such data, as noted in Fayyad, Piatetsky-Shapiro, and Smyth (1996), are typically too voluminous to understand and digest easily.

After defining the initial dataset, a group of experts in the drilling process carried out a detection of irrelevant or unnecessary data, anomalous data (outliers), missing values, inconsistencies, etc., as in Hand, Mannila, and Smyth (2001). Consequently, the dataset became more representative and reliable. Each task mentioned above includes a broad set of data mining techniques for its execution, such as visual inspection, histograms, box plots, and methods to replace missing values or treat inconsistencies and outliers. But these techniques are not yet commonly used by the experts, who usually rely on their knowledge and on visual inspection, as in the present work. Moreover, when the dataset is obtained from an experimental design, the data is more reliable and the effort required for the pre-treatment is lower.

Having defined the dataset, and to complete this phase of data selection and preparation, we decided to perform a supervised discretization (Kononenko, 1995) of the dataset. Discretization is beneficial and is usually performed when there is a significant threshold, when it is necessary to integrate different scales, when the error of the mean is large, when the model requires nominal variables, when the model is slow with numerical variables, etc.
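The article cites Kononenko (1995) but does not reproduce the discretization algorithm. The sketch below only illustrates the general idea with a simplified, single-cut, entropy-based split of one continuous variable against the class; MDL-based methods add a stopping criterion and may create several intervals.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_cut_point(x, y):
    """Pick the threshold on x that maximizes information gain w.r.t. class y.

    Simplified, single-split stand-in for supervised discretization.
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    base = entropy(y_sorted)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue                       # no boundary between equal values
        cut = (x_sorted[i] + x_sorted[i - 1]) / 2.0
        left, right = y_sorted[:i], y_sorted[i:]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y_sorted)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain
```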

Data selection and processing is an important task that usually takes more than 50% of the analysis time, so the results obtained later depend to a certain extent on the quality of the initial dataset.

4. What is machine learning?

Machine learning, as defined in Mitchell (1997a), is a subfield of artificial intelligence (Russel & Norvig, 1995) closely related to data mining (Michalski et al., 1998), and its aim is to develop algorithms that allow machines to learn from data, that is, to develop programs able to induce models that improve their performance over time from data. For this reason it is a knowledge induction process.

Within machine learning, different types of algorithms can be distinguished depending on their functionality and application:

- Supervised learning: these algorithms offer the possibility of learning from a categorized set of data, resulting in a predictive model able to represent and generalize the behaviour contained in the data. Once a model is created, it will be able to classify and categorize new cases of the problem that it is trying to solve.
- Unsupervised learning: these algorithms start from a set of non-categorized data, consisting of cases where the class is unknown, which is analyzed by the models in order to recognize different groups of cases with similar characteristics. The creation of these groups allows the extraction of information from the available dataset, making visible certain characteristics that would otherwise remain hidden.

Machine learning algorithms have been used for some years in specific applications. Langley and Simon (1995) offer a brief description of some of these applications, such as the diagnosis of mechanical devices, preventing breakdowns in electrical transformers, forecasting severe thunderstorms, predicting the structure of proteins or making credit decisions. However, these algorithms are increasingly widely used in medicine, as demonstrated in Inza, Merino, et al. (2001), bioinformatics, as explained in Inza et al. (2010), and industrial applications, as shown in Nieves et al. (2009), Santos, Nieves, Penya, and Bringas (2009), Correa, Bielza, de Ramirez, and Alique (2008) and Correa, Bielza, and Pamies-Teixeira (2009).

Machine learning mixes mathematical elements with statistics and computational sciences, including classification trees, induction rules, neural networks, Bayesian networks, regression algorithms, support vector machines, and clustering. The development of the present work was performed using the Weka software (Witten & Frank, 2000), which is a collection of machine learning algorithms written in Java and developed by the University of Waikato (New Zealand). At the end of this article the validity of these techniques for the industrial field is shown, and a model valid for burr prediction and detection in the drilling process is obtained in order to improve the current results of the conventional data analysis techniques.

5. Presented approach and experimental results

After a first pre-processing of the data there was a set of 106 drilling tests, made up of the parameters of the process and the variables extracted from the spindle torque signal.

The objective was to obtain a more suitable classification model, with a result of (Burr = yes | Burr = no). These classes were defined taking into account the imposed aeronautical limits.

- Burr = yes: non-acceptable burr (value equal to or greater than 127 μm).
- Burr = no: acceptable burr (value lower than 127 μm).

5.1. Evaluation

Evaluation is a very important question to bear in mind after learning the model, because the validity of the model depends on the quality of the evaluation.

The following objectives are considered:

A. To estimate the real error rate of the prediction (with new validation samples): this rate should be calculated using data that have not been used for learning the model, because the error rate calculated from training samples underestimates the error rate for new samples.

B. To select the best model from two or more models: the evaluation assesses whether one model is better than another.

Because of the importance of these two objectives, the following procedure was carried out to obtain good estimates of the error rate of the models:

1. 10-fold cross-validation was applied. This is a technique to estimate the performance of the predictive model. It randomly assigns the tests to 10 subsets {d1, d2, . . ., d10} of equal size.



Then, for each subset, the model is trained on the remaining nine subsets and tested on that subset. The final accuracy is calculated as the average of the accuracies obtained on the 10 subsets.

2. The 10-fold cross-validation was repeated ten times (using different seeds), obtaining 10 values of the percentage of correct classification.

3. The average of the 10 rates was calculated.

Finally, a ranking of the models was established, based on the correct classification rate calculated in step 3 above.
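The dataset and the Weka models are not distributed with the article, but the evaluation protocol can be reproduced with any toolkit. The following sketch uses scikit-learn stand-ins (a decision tree instead of J48/ID3, Gaussian Naive Bayes instead of Naive Bayes Simple) and placeholder data with the same shape as the 106 drilling tests; it is an illustration of the protocol, not the original experiment.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Placeholder for the 106 drilling tests (13 predictive variables, class Burr yes/no)
X, y = np.random.rand(106, 13), np.random.randint(0, 2, 106)  # stand-in data only

models = {
    "Decision tree (J48-like)": DecisionTreeClassifier(),
    "KNN (k = 1)": KNeighborsClassifier(n_neighbors=1),
    "Naive Bayes": GaussianNB(),
}

for name, model in models.items():
    seed_means = []
    for seed in range(10):                      # 10-fold CV repeated with 10 different seeds
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        seed_means.append(scores.mean())        # accuracy averaged over the 10 folds
    print(f"{name}: {100 * np.mean(seed_means):.2f}% +/- {100 * np.std(seed_means):.2f}")
```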

5.2. First approach

Firstly, several machine learning algorithms presented in Mitchell (1997b) were applied, such as classification trees, induction rules, distance-based techniques, techniques based on probabilities and neural networks.

5.2.1. Results

Tables 3 and 4 show the results obtained by some of the machine learning algorithms for the classification task (Burr = yes | Burr = no) before and after data pre-processing (discretization). It can be seen that the results without discretization are not much better; there are even algorithms for which the accuracy seems worse. For this reason, and because having the data discretized speeds up the training process of the algorithms, it was decided to work with the discretized data from here on.

It is observed that most of the machine learning algorithms provide better results than, or results very close to, the conventional approach used by the industrial partner (remember that it gives a 92% correct classification rate). The three algorithms that provide the best results in Table 4 are marked with an asterisk (ID3, Prism, and KNN). These algorithms are very intuitive, and they can be developed and implemented into the machine easily.

5.3. Second approach

Once the standard approach had been applied, we attempted to improve the models, increasing their accuracy, validity, reliability and stability as in Kazakov and Kudenko (2001), with a feature subset selection (selection of variables), as explained in Mitchell (1997a, 1997b), and a combination of classifier algorithms.

5.3.1. Feature subset selection

The aim of attribute selection, as stated in Nilsson (1996), is not to analyze the variables, because this task was done in a previous section; rather, it is a second selection that determines which variables are the most influential and which of them improve the model. This feature subset selection has some advantages:

- Noise elimination, increasing data precision and the predictive and explanatory ability of the model.
- Irrelevant data elimination, decreasing the acquisition cost and the computational cost of the database.
- Redundancy elimination, avoiding problems of inconsistencies and duplications.

Table 3. Results of machine learning algorithms before data discretization.

Type of classification              Algorithm            Mean value (%)   Standard deviation
Classification trees                J48                  92.85            1.06
                                    ID3                  94.66            1.86
Induction rules                     JRip                 89.33            1.79
                                    Prism                95.71            1.14
Distance based techniques           KNN (k = 1)          92.09            1.06
                                    KNN (k = 3)          93.14            1.26
Techniques based on probabilities   Naive Bayes simple   94.09            0.85

A wide variety of complex algorithms have been developed for the selection of variables in different applications, such as Inza, Larrañaga, Etxeberria, and Sierra (2000) and Inza, Larrañaga, and Sierra (2001). The present work combines several criteria and measures (such as information gain, explained variance, and correlation tests) together with search methods (such as exhaustive search or a ranker) in order to reach the most representative set of variables.
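As an illustration, a ranker-style selection can be reproduced with standard evaluators. The sketch below uses scikit-learn's chi-squared statistic (analogous to Criteria_2 below) and mutual information (an information-gain analogue, close to Criteria_5) on placeholder discretized data, since the original dataset and the exact Weka evaluators are not published.

```python
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif

feature_names = ["BRO", "NUM", "VC", "TRM", "AV1", "AV2", "REC",
                 "ESP", "MIN", "MAX", "ANG", "ALT", "ANC"]
# Stand-in data: discretized predictors coded as non-negative integers, plus class labels
X_disc = np.random.randint(0, 4, size=(106, len(feature_names)))
y = np.random.randint(0, 2, size=106)

# Criteria_2 analogue: chi-squared statistic of each variable w.r.t. the class
chi2_scores, _ = chi2(X_disc, y)
# Criteria_5 analogue: information gain approximated by mutual information
mi_scores = mutual_info_classif(X_disc, y, discrete_features=True, random_state=0)

# Ranker-style search: order the variables by their individual evaluation
for title, scores in [("chi-squared", chi2_scores), ("mutual information", mi_scores)]:
    ranking = sorted(zip(feature_names, scores), key=lambda t: t[1], reverse=True)
    print(title, "->", [name for name, _ in ranking[:5]])
```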

5.3.2. Combination of classifier algorithms

The combination of classifiers, described in Witten and Frank (2000), is carried out to improve classification accuracy and to decrease uncertainty. Each classifier has its own performance and its own error rate due to its different input space and/or algorithm. Therefore, by combining different types of classifiers focusing on different aspects of the task, the overall error rate can be decreased. It is very interesting to reinforce the classification ability by taking maximum profit from the fact that each model has a different set of well-classified cases.

There are several methods to combine classifiers (combination of classifiers of different types, combination of classifiers of the same type, and hybrids). For the combination of classifiers of different types, the difference is given by the method of combination (the order in which the classifiers are executed):

- In series or cascade: the combination is chained (the first model is included in the second, and so on).
- In parallel: the final classifier is obtained by means of combination strategies such as voting.
- Hierarchic: similar to cascade, but setting hierarchies amongst the classifiers in a structured way, like decision trees.

The combination of classifiers of the same type can be divided into two groups:

- Bagging: it combines the classifiers obtained from replicas of the training set generated by the re-sampling method called bootstrapping.
- Boosting: it builds the classifiers sequentially, increasing the weight of the misclassified samples in each iteration.

Finally, hybrid methods are developed from the combination of two or more paradigms (e.g. Lazy Bayesian Rules, Naïve Bayes Tree, Logistic Model Trees, etc.).
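These combination schemes are available in most toolkits. The sketch below shows a hard-voting ensemble loosely mirroring the 'Vote' of KNN (k = 1), ID3 and Prism used later in the article (ID3 and Prism have no direct scikit-learn equivalents, so an entropy-based decision tree and Gaussian Naive Bayes are used as stand-ins), together with default bagging and boosting wrappers.

```python
from sklearn.ensemble import VotingClassifier, BaggingClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Parallel combination by majority vote over heterogeneous classifiers.
vote = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=1)),
        ("tree", DecisionTreeClassifier(criterion="entropy")),  # ID3-like split criterion
        ("nb", GaussianNB()),                                    # stand-in for the rule learner
    ],
    voting="hard",
)

# Combinations of classifiers of the same type (default base learner is a decision tree).
bagged = BaggingClassifier(n_estimators=25)    # bootstrap replicas of the training set
boosted = AdaBoostClassifier(n_estimators=25)  # re-weights misclassified samples each iteration

# All three are used like any other estimator: model.fit(X_train, y_train); model.predict(X_test)
```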

5.3.3. Results

Table 5 presents the selection of variables taking into account the different criteria and search methods explained below, and Table 6 shows the results obtained when applying machine learning algorithms with these sets of variables.

Criteria

- Criteria_1: it evaluates the worth of a subset of variables by considering the individual predictive ability of each variable along with the degree of redundancy between them.
- Criteria_2: it evaluates the worth of a variable by computing the value of the chi-squared statistic with respect to the class.
- Criteria_3: it evaluates the worth of a subset of variables by the level of consistency in the class values when the training data is projected onto the subset of variables.
- Criteria_4: it evaluates the worth of a variable by measuring the gain ratio with respect to the class.
- Criteria_5: it evaluates the worth of a variable by measuring the information gain with respect to the class.

Search algorithms

- SearchAlgorithm_1: it performs an exhaustive search through the space of variable subsets, starting from the empty set of attributes, and reports the best subset found.
- SearchAlgorithm_2: it ranks variables by their individual evaluations.

This selection of variables determined which parameters of the process were relevant and, at the same time, showed that not all the parameters used by the classical algorithm were necessary.

Table 4. Results of machine learning algorithms after data discretization.

Type of classification              Algorithm            Mean value (%)   Standard deviation
Classification trees                J48                  92.19            1.17
                                    ID3                  94.95*           1.62
Induction rules                     JRip                 89.24            1.56
                                    Prism                95.81*           1.12
Distance based techniques           KNN (k = 1)          95.43*           1.67
                                    KNN (k = 3)          93.43            0.30
Techniques based on probabilities   Naive Bayes simple   94               0.78

Table 5. Selection of variables.

Criteria     Search method    Selected variables
Criteria_1   SearchMethod_1   VC, TRM, ANC, MIN
Criteria_2   SearchMethod_2   ANC, MIN, VC, ALT, AV1
Criteria_3   SearchMethod_1   BRO, VC, TRM, MIN, ALT, ANC
Criteria_4   SearchMethod_2   VC, ANC, ALT, AV1, MIN
Criteria_5   SearchMethod_2   ANC, MIN, VC, ALT, AV1

Table 6. Results of machine learning algorithms with selection of variables.

Selected variables            Algorithm            Mean value (%)   Standard deviation
VC, TRM, ANC, MIN             J48                  94.95            2.20
                              ID3                  95.52            1.56
                              JRip                 90               1.12
                              Prism                91.14            0.46
                              KNN (k = 1)          92.19            1.54
                              KNN (k = 3)          94.95            0.90
                              Naive Bayes simple   96.19*           0
ANC, MIN, VC, ALT, AV1        J48                  94               1.96
                              ID3                  94.29            0.9
                              JRip                 88.76            1.08
                              Prism                92.57            1.33
                              KNN (k = 1)          93.05            1.68
                              KNN (k = 3)          94.48            1.89
                              Naive Bayes simple   95.24            0
BRO, VC, TRM, MIN, ALT, ANC   J48                  92.48            1.14
                              ID3                  96.1*            1.65
                              JRip                 90.1             1.21
                              Prism                95.9*            1.19
                              KNN (k = 1)          96.29*           1.82
                              KNN (k = 3)          93.71            0.49
                              Naive Bayes simple   94.29            0

It is clear that some of the algorithms exceed the accuracy of the previous algorithms learnt without selection of variables. These algorithms (Naïve Bayes Simple, ID3, Prism and KNN) are identified in the table above. Naïve Bayes Simple uses speed of cut, speed of advance, weight, height and minimum as predictors, while ID3, Prism and KNN use drill bit, speed of cut, length in entrance, minimum, height and weight.

As a result, six variables were selected (BRO, VC, TRM, MIN, ALT and ANC) from the original dataset of eleven variables, and we attempted to improve the accuracy of the model by combining classifiers, as shown in Table 7.

The analysis of the different algorithms for combining classifiers showed that the 'Vote' classifier provided the best accuracy using KNN (k = 1), ID3 and Prism with the six variables extracted from the feature subset selection. It provided a 96.76% correct classification rate with a standard deviation of 1.57. The 'Boosting' classifier applied individually to the KNN (k = 1) and Prism algorithms provides an accuracy very similar to that of the 'Vote' classifier.

However, although the accuracy of the algorithms seems to have improved throughout the whole process, the top 10 algorithms (*), selected in the previous approaches and presented in Table 8, do not differ significantly in their accuracy, and it would be inappropriate to assume that one is better than another.

Table 9 shows the result of applying the one-way ANOVA technique (a statistical procedure used to determine whether the means of two or more samples are drawn from populations with the same mean) to compare the means of the accuracies using the F distribution (Ross, 2005). It tests the null hypothesis (H0: μ1 = μ2 = . . . = μn) that the groups are drawn from the same population, where μi is the mean of the ith algorithm. The ANOVA produces an F statistic, the ratio of the variance calculated among the means to the variance within the samples. If the algorithm means are drawn from the same population, the variance between the group means should be lower than the variance of the samples, following the central limit theorem. A higher ratio implies that the samples were drawn from different populations.

Before applying ANOVA, it was verified that the basic assumptions of the model hold by means of an analysis of the residuals, which according to the hypothesis should be random values, independent and normally distributed with mean 0 and homogeneous standard deviation.

The test concludes that there are no significant differences among the algorithms' accuracies, because the significance value is 0.206, which is greater than 0.05.
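The per-seed accuracy values behind Table 9 are not published, so the following sketch only illustrates the test itself with placeholder vectors (three of the ten compared algorithms are shown); scipy's f_oneway computes the F statistic and significance value of the kind reported in the table.

```python
from scipy.stats import f_oneway

# Ten correct-classification rates per algorithm (one per random seed).
# Placeholder values: the actual per-seed accuracies behind Table 9 are not published.
acc_id3   = [94.3, 95.2, 96.1, 94.9, 95.0, 94.6, 95.4, 94.8, 95.1, 94.5]
acc_prism = [95.8, 96.0, 95.5, 96.2, 95.9, 95.7, 96.1, 95.6, 95.8, 96.3]
acc_vote  = [96.5, 96.9, 96.7, 96.4, 97.0, 96.8, 96.6, 96.9, 96.7, 96.5]

f_stat, p_value = f_oneway(acc_id3, acc_prism, acc_vote)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# In the article, the significance value over the ten algorithms is 0.206 (> 0.05),
# so the null hypothesis of equal means is not rejected.
```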

Although there are no significant differences among the algorithms' accuracies, Naïve Bayes Simple is the best option. Firstly, it is a simple model to implement in the machine for burr detection. Secondly, its standard deviation is very close to zero, which means that the algorithm is very stable. And finally, it uses only a subset of the overall set of variables, which implies that it is possible to reduce the quantity of data used for the machine learning task, reducing the computation time.
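For completeness, the computation performed by such a model is simple enough to implement directly in the machine controller. The following is a minimal discrete Naive Bayes with Laplace smoothing (an illustrative sketch, not the Weka implementation), which scores P(class) · Π P(xi | class) over the selected, discretized predictors.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Minimal discrete Naive Bayes with Laplace smoothing (illustrative only).

    X : list of tuples with discretized predictor values (e.g. bins of VC, TRM, ANC, MIN)
    y : list of class labels ('yes' / 'no')
    Returns a predict(row) function for new holes.
    """
    n, n_feats = len(y), len(X[0])
    class_counts = Counter(y)
    cond_counts = defaultdict(Counter)            # (feature index, class) -> value counts
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            cond_counts[(j, label)][value] += 1
    values_per_feat = [len({row[j] for row in X}) for j in range(n_feats)]

    def predict(row):
        best_label, best_score = None, float("-inf")
        for label, count in class_counts.items():
            score = math.log(count / n)           # log prior P(class)
            for j, value in enumerate(row):       # log likelihoods with Laplace smoothing
                num = cond_counts[(j, label)][value] + 1
                den = count + values_per_feat[j]
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict

# Example with hypothetical discretized values of VC, TRM, ANC and MIN:
# predict = train_naive_bayes(X_train, y_train)
# predict(("vc_high", "trm_8", "anc_wide", "min_low"))
```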

5.4. Third approach

This approach changed the aim of the problem due to a new requirement. It was necessary to eliminate 'false negatives', that is to say, cases in which the model predicts no burr but burr is generated. This requirement is imposed by the manufacturing industry in order to reduce the high cost of the checking and to ensure 100% that the burr is detected.

The (two-state) classification had the inconvenience of having false negatives; they were not eliminated in spite of the high percentage of correct classification (even with the conventional approach).


Table 7. Results of machine learning algorithms with combination of classifiers.

Type of combination   Algorithms                Mean value (%)   Standard deviation
Classifiers of different types
Stacking              KNN (k = 1), ID3, Prism   95.24            1.42
Vote                  KNN (k = 1), ID3, Prism   96.76*           1.57
Classifiers of the same type
Boosting              KNN (k = 1)               96.48*           1.56
                      ID3                       95.33            1.31
                      Prism                     96.38*           1.17
Bagging               KNN (k = 1)               96.1             1.82
                      ID3                       95.62            1.81
                      Prism                     95.9             1.19

Table 8. Top 10 algorithms.

Variables                     Algorithm            Mean value (%)   Standard deviation
All                           ID3                  94.95*           1.62
All                           Prism                95.81*           1.12
All                           KNN (k = 1)          95.43*           1.67
VC, TRM, ANC, MIN             Naive Bayes simple   96.19*           0
BRO, VC, TRM, MIN, ALT, ANC   ID3                  96.1*            1.65
BRO, VC, TRM, MIN, ALT, ANC   Prism                95.9*            1.19
BRO, VC, TRM, MIN, ALT, ANC   KNN (k = 1)          96.29*           1.82
BRO, VC, TRM, MIN, ALT, ANC   Vote                 96.76*           1.57
BRO, VC, TRM, MIN, ALT, ANC   Boosting             96.48*           1.56
BRO, VC, TRM, MIN, ALT, ANC   Boosting             96.38*           1.17

Table 9. One-way ANOVA.

                 Sum of squares   df   Mean square   F       Sig.
Between groups   25.404           9    2.823         1.386   0.206
Within groups    183.312          90   2.037
Total            208.717          99

Table 10. Results of machine learning algorithms.

Type of classification              Algorithm            Mean value (%)   Standard deviation   False negatives
Classification trees                J48                  93.33*           1.19                 0
                                    ID3                  89.62            1.14                 0.8
Induction rules                     JRip                 93.05*           1.01                 0
                                    Prism                89.9*            0.66                 0
Distance based techniques           KNN (k = 1)          89.9             0.66                 1.2
                                    KNN (k = 3)          88.57            0                    3
Techniques based on probabilities   Naive Bayes simple   90.38            0.3                  1.2

Table 11. One-way ANOVA.

                 Sum of squares   df   Mean square   F        Sig.
Between groups   72.381           2    36.190        37.801   .000
Within groups    25.850           27   .957
Total            98.230           29

Table 12. Scheffé method.

(I) Group   (J) Group   Mean difference (I–J)   Std. error   Sig.   95% CI lower bound   95% CI upper bound
J48         JRip        .2857                   .43758       .809   -.8476               1.4191
J48         Prism       3.4286*                 .43758       .000   2.2952               4.5619
JRip        J48         -.2857                  .43758       .809   -1.4191              .8476
JRip        Prism       3.1429*                 .43758       .000   2.0095               4.2762
Prism       J48         -3.4286*                .43758       .000   -4.5619              -2.2952
Prism       JRip        -3.1429*                .43758       .000   -4.2762              -2.0095

* The mean difference is significant at the .05 level.

Fig. 5. Classification tree (J48).


Notice that predicting burr when there is none is less serious than failing to predict burr; the latter has more consequences. At this point, the aim was to reduce the number of holes being inspected, but without 'false negatives'.

In order to avoid these 'false negatives' the analysis was changed, adding one more class to introduce a margin of error:

- Class 1: burr size ≥ 127 μm.
- Class 2: 100 μm ≤ burr size < 127 μm.
- Class 3: burr size < 100 μm.

5.4.1. Results

Table 10 shows the accuracy obtained by using all the initial predictive variables, evaluated with 10 different seeds (to introduce randomness in the data partitions) as applied in the previous evaluations, and the quantity of 'false negatives', calculated as the average of the false negatives obtained from each run.

Thanks to having added one more class, the false negatives disappear, although some accuracy is lost.



The mean percentage of correct classification is worse than for the algorithms shown in the previous section. The reason is the increase in the number of classes being predicted, which makes it more difficult for the model to discriminate amongst them. Even so, the number of 'false negatives' drops to zero for some algorithms (J48, JRip and Prism), which is the requirement imposed by the industry. However, these algorithms present significant differences amongst them, because the significance level is less than 0.05 when applying one-way ANOVA (Table 11).

Next, the Scheffé method (Montgomery, 2004), a test performed after ANOVA that makes comparisons amongst means to determine significant differences between pairs of groups, shows that the difference is due to the third algorithm, Prism. Prism is worse than J48 and JRip, and this difference is significant, as shown in Table 12.

Both remaining algorithms provide a good result and eliminate false negatives as well. Besides, their implementation into the monitoring system to detect automatically and on-line the burr generation can be simple, because the JRip algorithm is based on a set of rules (defined below) and J48 is based on a classification tree (Fig. 5), which can be converted quickly and easily into a set of rules.

JRip rules:
IF (ANC ≤ 7.254) AND (AV2 > 0.225) THEN Class 2
ELSE IF (VC ≤ 112.5) THEN Class 3
ELSE Class 1
END IF
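As an illustration of how directly such a rule set maps onto monitoring code, the sketch below encodes the JRip rules above as a Python function (the function and argument names are illustrative; the variables follow Table 1).

```python
def jrip_burr_class(anc, av2, vc):
    """Burr class of a drilled hole from the JRip rule set reported above.

    anc : ANC feature of the 'Cutting' area, av2 : second advance (AV2),
    vc : cutting velocity (VC). Returns 1, 2 or 3 as defined in Section 5.4.
    """
    if anc <= 7.254 and av2 > 0.225:
        return 2      # 100 um <= burr < 127 um
    if vc <= 112.5:
        return 3      # burr below 100 um (acceptable)
    return 1          # burr >= 127 um (out of tolerance)
```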

6. Conclusions

Data mining is a multidisciplinary field that can be applied in many fields of activity for data study and analysis. Sometimes the industry holds a lot of data from which previously unknown information can be extracted.

This information brings many benefits, as shown in the article. From a simple design of experiments and data mining techniques, especially the selection of variables and machine learning algorithms, a model for the detection of burr during the drilling process was developed. This model provides better accuracy than the previously used mathematical model and, in addition, it has certain advantages. Firstly, the model is based on the internal signal of the machine and certain parameters of the conditions of the process, so the implementation would be easier and without external sensors. It can be implemented later in a monitoring system to detect automatically and on-line when the burr occurs during the drilling process. And secondly, it provides information about which parameters of the drilling process define whether there is occurrence of burr or not.

Regarding the results, almost all the developed models provide a higher accuracy than the current mathematical model. Moreover, for the final model based on Naïve Bayes the accuracy was 95% with a standard deviation equal to 0, which means it is a very stable model. At this point, it must also be mentioned that in most of the cases in which the model makes a bad prediction, it is because the burr is very close to the aeronautical limit (127 μm), which makes it more difficult to distinguish the corresponding class.

Another aspect to take into account is the fact that the prediction error does not have the same significance depending on whether the drilled hole has burr or not. The main error is produced when a sample is classified as acceptable but, actually, it is outside the acceptable limits. For the detection of these cases, the model has been improved, considerably reducing the number of inspections and operations to remove burr.


These techniques could be applied to other industrial processes such as moulding and chip removal processes, potentially obtaining good results as in the previous case and making possible the automation of some tasks which are currently done by hand. They are relatively new and novel techniques in certain industrial sectors such as metal, aeronautics or chemistry, and they should be investigated as a possible solution to some of their current problems.

Further studies could be undertaken by using Bayesian networks, a model representation for reasoning under uncertainty. Formally, its representation is a directed acyclic graph (DAG) where each node represents a random variable and the edges represent (often causal) dependence relations between them (Jensen, 1996). Each variable represents a unique event or hypothesis, with a finite set of mutually exclusive states, X = {X1, . . ., Xn}, and there must be a state for each possible value together with its conditional probabilities. It uses Bayes' theorem to update the probabilities and, taking into account the good results obtained by the Naïve Bayes algorithm, a Bayesian network would be a model worth evaluating. Moreover, it could be possible to increase the number of predictive variables, such as the type of material and the geometry of the bit. Another possibility would be to take into account the wear of the tool. At present, when the operator becomes aware of the worn condition of the tool, it has to be changed. It is clear that many other aspects of the drilling process can be usefully explored.

References

Byrne, G., Dornfeld, D., & Denkena, B. (2003). Advancing cutting technology. CIRP Annals – Manufacturing Technology, 52(2), 483–507.
Bukkapatnam, S. T. S., Kumara, S. R. T., & Lakhtakia, A. (1999). Analysis of acoustic emission signals in machining. Journal of Manufacturing Science and Engineering, 121(4), 568–571. doi:10.1115/1.2833058.
Bukkapatnam, S. T. S., Kumara, S. R. T., & Lakhtakia, A. (2000). Fractal estimation of flank wear in turning. Journal of Dynamic Systems, Measurement, and Control, 122, 89–94. http://www.okstate.edu/commsens/Papers/10journalpublic.pdf.
Caprino, G., Teti, R., & de Iorio, I. (2005). Predicting residual strength of pre-fatigued glass fibre-reinforced plastic laminates through acoustic emission monitoring. Composites Part B: Engineering, 36(5), 365–371.
Chang, Simon S. F., & Bone, G. M. (2010). Burr height model for vibration assisted drilling of aluminum 6061-T6. Precision Engineering, 34, 369–375.
Correa, M., Bielza, C., de Ramirez, M. J., & Alique, J. R. (2008). A Bayesian networks model for surface roughness prediction in the machining process. International Journal of Systems Science, 39, 1181–1192.
Correa, M., Bielza, C., & Pamies-Teixeira, J. (2009). Comparison of Bayesian networks and artificial neural networks for quality detection in a machining process. Expert Systems with Applications, 36, 7270–7279.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17, 37–54. http://www.aaai.org/ojs/index.php/aimagazine/article/view/1230/1131.pdf.
Ferreiro, S., Arana, R., Aizpurua, G., Aramendi, G., Arnaiz, A., & Sierra, B. (2009). Data mining for burr detection (in the drilling process). In Distributed computing, artificial intelligence, bioinformatics, soft computing, and ambient assisted living, Pt II, proceedings (Vol. 5518, pp. 1264–1273).
Ferreiro, S., & Arnaiz, A. (2010). Improving aircraft maintenance with innovative prognostics and health management techniques. Case of study: Brake wear degradation. In 2nd international conference on agents and artificial intelligence (ICAART 2010), Valencia, Spain (pp. 568–575).
Gaitonde, V. N., Karnik, S. R., Achyutha, B. T., & Siddeswarappa, B. (2008a). Genetic algorithm-based burr size minimization in drilling of AISI 316L stainless steel. Journal of Materials Processing Technology, 197(1–3), 225–236.
Gaitonde, V. N., Karnik, S. R., Achyutha, B. T., & Siddeswarappa, B. (2008b). Taguchi optimization in drilling of AISI 316L stainless steel to minimize burr size using multi-performance objective based on membership function. Journal of Materials Processing Technology, 202(1–3), 374–379.
Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of data mining. Cambridge: MIT Press.
Hambli, R. (2002). Prediction of burr height formation in blanking processes using neural network. International Journal of Mechanical Sciences, 44, 2089–2102.
Heisel, U., Luik, M., Eisseler, R., & Schaal, M. (2005). Prediction of parameters for the burr dimensions in short-hole drilling. Annals of the CIRP, 54(1), 79–82.
How, T., Liu, W., & Lin, L. (2003). Intelligent remote monitoring and diagnosis of manufacturing processes using an integrated approach of neural networks and rough sets. Journal of Intelligent Manufacturing, 14, 239–253.
Inza, I., Larrañaga, P., Etxeberria, R., & Sierra, B. (2000). Feature subset selection by Bayesian network-based optimization. Artificial Intelligence, 123, 157–184.
Inza, I., Merino, M., Larrañaga, P., Quiroga, J., Sierra, B., & Girala, M. (2001). Feature subset selection by genetic algorithms and estimation of distribution algorithms – A case study in the survival of cirrhotic patients treated with TIPS. Artificial Intelligence in Medicine, 23, 187–205.
Inza, I., Larrañaga, P., & Sierra, B. (2001). Feature subset selection by Bayesian networks: A comparison with genetic and sequential algorithms. International Journal of Approximate Reasoning, 27, 143–164.
Inza, I., Calvo, B., Armañanzas, R., Bengoetxea, E., Larrañaga, P., & Lozano, J. A. (2010). Machine learning: An indispensable tool in bioinformatics. In R. Matthiesen (Ed.), Bioinformatics methods in clinical research (Vol. 593, pp. 25–48). Humana Press.
Jensen, F. V. (1996). An introduction to Bayesian networks. Springer Verlag.
Kaelbling, L. P., & Cohn, D. P. (2003). Special issue on feature subset selection. Journal of Machine Learning Research, 3.
Kamarthi, S. V., Kumara, S. R. T., & Cohen, P. H. (2000). Flank wear estimation in turning through wavelet representation of acoustic emission signals. Journal of Manufacturing Science and Engineering, 122, 12–19.
Kazakov, D., & Kudenko, D. (2001). Machine learning, ILP for MAS. In Advanced course on artificial intelligence (ACAI 2001). Lecture Notes in Artificial Intelligence, 2086, 246–270 [Prague, Czech Republic].
Kim, J., Min, S., & Dornfeld, D. A. (2000). Optimization and control drilling burr formation of AISI 304L and AISI 4118 based on drilling burr control charts. International Journal of Machine Tools & Manufacture, 41, 923–936.
Kononenko, I. (1995). On biases in estimating multi-valued attributes. In 14th international joint conference on artificial intelligence, Montréal, Canada (pp. 1034–1040).
Langley, P., & Simon, H. A. (1995). Application of machine learning and rule induction. Communications of the ACM, 38(11). doi:10.1145/219717.219768.
Lauderbaugh, L. K. (2009). Analysis of the effects of process parameters on exit burrs in drilling using a combined simulation and experimental approach. Journal of Materials Processing Technology, 209(4), 1909–1919.
Malakooti, B., & Raman, V. (2000). An interactive multi-objective artificial neural network approach for machine setup optimization. Journal of Intelligent Manufacturing, 11, 41–50.
Michalski, R. S., Bratko, I., & Kubat, M. (1998). Machine learning and data mining: Methods and applications. New York: Wiley.
Min, S., Kim, J., & Dornfeld, D. A. (2001). Development of a drilling burr control chart for low alloy steel, AISI 4118. Journal of Materials Processing Technology, 113, 4–9.
Mitchell, T. M. (1997a). Does machine learning really work? AI Magazine, 18(3), 11–20.
Mitchell, T. M. (1997b). Machine learning. McGraw-Hill International Editions.
Montgomery, D. C. (2004). Design and analysis of experiments (6th ed.). John Wiley & Sons.
Nieves, J., Santos, I., Penya, Y. K., Rojas, S., Salazar, M., & Bringas, P. G. (2009). Mechanical properties prediction in high-precision foundry production. In 7th IEEE international conference on industrial informatics, Cardiff, Wales (Vol. 1–2, pp. 31–36).
Nilsson, N. J. (1996). Introduction to machine learning. Early draft of proposed textbook.
Peña, B., Aramendi, G., Rivero, A., & López de LaCalle, L. N. (2005). Monitoring of drilling for burr detection using spindle torque. International Journal of Machine Tools & Manufacture, 45, 1614–1621.
Peña, B., Aramendi, G., & Rivero, M. A. (2007). Method for monitoring burr formation in processes involving the drilling of parts. WO2007/065959A1.
Pittner, S., & Kamarthi, S. V. (2002). Feature extraction from wavelet coefficients for pattern recognition tasks. International Conference on Neural Networks, 1997(3), 1484–1489.
Rangwala, S. S., & Dornfeld, D. A. (2002). Learning and optimization of machining operations using computing abilities of neural networks. IEEE Transactions on Systems, Man and Cybernetics, 19(2), 299–314.
Ross, S. M. (2005). Introductory statistics (2nd ed.). Elsevier Inc.
Russel, S., & Norvig, P. (1995). Artificial intelligence: A modern approach. Prentice Hall.
Santos, I., Nieves, J., Penya, Y. K., & Bringas, P. G. (2009). Optimising machine-learning-based fault prediction in foundry production. In Proceedings of distributed computing, artificial intelligence, bioinformatics, soft computing, and ambient assisted living, Pt II (Vol. 5518, pp. 554–561). doi:10.1007/978-3-642-02481-8_80.
Sick, B. (2002). On-line and indirect tool wear monitoring in turning with artificial neural networks: A review of more than a decade of research. Mechanical Systems and Signal Processing, 16(4), 487–546.
Umbrello, D., Ambrogio, G., Filice, L., Guerriero, F., & Guido, R. (2009). A clustering approach for determining the optimal process parameters in cutting. Journal of Intelligent Manufacturing, 21(6), 787–795.
Wang, K. (2007). Applying data mining to manufacturing: The nature and implications. Journal of Intelligent Manufacturing, 18(4), 487–495.
Witten, I. H., & Frank, E. (2000). Data mining: Practical machine learning tools and techniques with Java implementations. San Francisco: Morgan Kaufmann.