classification of fibers using ir-spectra

1
RESULTS Feature Engineering Models CLASSIFICATION OF FIBERS USING IR-SPECTRA Karl Kaupmees, Oliver Meikar, Edgar Sepp INTRODUCTION Identification of textile fibers is important in industry (quality control), forensic science (identification of fibers on crime scene), but also in conservation and archaeology (identification of historical textile fibers). Common methods for fiber identification are microscopic observation, burning test and various solubility tests. Infrared spectroscopy (IR) has many advantages for fiber identification, because it offers highly characteristic information, is easy, fast, non-destructive and relatively inexpensive. However the analysis of IR spectra is tedious and requires a trained scientist and some peak-analyzer software. The main difficulties are spectral inhomogeneities of repeated measurements and the intrinsic similarity between the spectra of some fibers. Hence, a better method to analyze the IR specta of textiles is needed. OBJECTIVES To meet the increasing demand from the academic and industry, we aimed build a classifier that can identify fibers by their IR spectra. The initial aim is to be able to detect pure fibers at 80% probability. For successful classification, we aimed to: reduce the complexity of the dataset engineer additional features from the dataset generate data normalization tools CONCLUSIONS As the amount of data is small the best performing models can be consider more-or-less equal in performance. Feature Engineering helped to improve the classification. Most useful feature seems to be difference from local mean. Feature engineering could be developed further and additional filtering of data applied to concentrate on areas with more information and thus providing better separation. Separating different spectra works well in general, accuracy above 0.9, but there are still difficulties with more similar fibers: linen, cotton, viscose. ACKNOW- LEDGEMENTS We would like to thank researchers Signe Vahur and Pilleriin Peets from the Cultural Heritage research group, University of Tartu, for providing the data as well as domain knowledge throughout the project. DATASET AND METHODOLOGY Our dataset contains of 438 IR spectra of 12 pure textile fibers both, natural and synthetic: Cat count Name 1.11 52 silk 3.5 44 polyacrylic 1.5 44 cotton 1.6 40 linen 2.1 35 viscose 3.1 33 polyester 1.1 30 wool 1.9 24 jute 3.11 22 elastane 3.2 18 polyamide 2.3 16 acetate 3.12 15 polyetylene Each spectra contains 1700 measured datapoints. Train data – 30 spectra from each class Test data – the remaining spectra from classes with more than 30 spectra http://lisa.chem.ut.ee/IR_spectra/textile-fibres/ Right: Correlations between original data and features. Down: 3 randomly sampled spectra from all classes and correlations between them. Big numbers show random forest (RF) classification accuracy on test data when using only this feature. 0.5 0.6 0.7 0.8 0.9 1 Original data [raw], RF Original data [standardized], RF Original data [normalized], RF Diff. from global mean [raw], RF Diff. from global mean [standardized], RF Diff. from global mean [normalized], RF Diff. from local mean [raw], RF Diff. from local mean [standardized], RF Diff. from local mean [normalized], RF Slope/angle of the graph [raw], RF Slope/angle of the graph [standardized], RF Slope/angle of the graph [normalized], RF All features [normilized], RF All features [normilized], SVM All features [standardized], RF All features [standardized], SVM All features [standardized], kNN All features [raw], RF All features [raw], SVM All features, glo loc [std], ang [norm], RF All features, glo loc [std], ang [norm], SVM Diff. from local mean [standardized], SVM Diff. from local mean [standardized], kNN Diff. from local mean [normilized], SVM All features, glo loc [std], ang [norm], kNN Accuracy on test data Accuracy on training data Train Data Raw data (Importing and balancing) Fill in NaN’s using linear interpolation Calculate new Features: 1. Difference from global mean 2. Difference from local mean 3. Slope / angle of the graph Standardize / normalize Subset the data Train the model RF, kNN, SVM Test Data Predict Test data Select best model Predict unknown spectra Unknown Data Predicted 1.5 3.1 1.6 3.5 2.1 1.11 Actual 1.5 14 0 0 0 0 0 3.1 0 3 0 0 0 0 1.6 3 0 7 0 0 0 3.5 0 0 0 14 0 0 2.1 0 0 0 0 5 0 1.11 0 0 0 0 0 22 Left: Confusion matrix of model All features [normalized], RF for predicting test data. The same model was used the classify textiles (old scarfs) from restorers of Kanuti Gild. It got 3 out of 4 correct and made a mistake by predicting cotton instead of viscose with probabilities of 0.44 vs 0.33, respectively. Linen Cotton

Upload: others

Post on 04-Oct-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CLASSIFICATION OF FIBERS USING IR-SPECTRA

RESULTS

• Feature Engineering

• Models

CLASSIFICATION OF FIBERS USING IR-SPECTRAKarl Kaupmees, Oliver Meikar, Edgar Sepp

INTRODUCTION

Identification of textile fibers is important in industry (quality control),forensic science (identification of fibers on crime scene), but also inconservation and archaeology (identification of historical textilefibers).

Common methods for fiber identification are microscopic observation,burning test and various solubility tests. Infrared spectroscopy (IR)has many advantages for fiber identification, because it offers highlycharacteristic information, is easy, fast, non-destructive and relativelyinexpensive.

However the analysis of IR spectra is tedious and requires a trainedscientist and some peak-analyzer software. The main difficulties arespectral inhomogeneities of repeated measurements and the intrinsicsimilarity between the spectra of some fibers.

Hence, a better method to analyze the IR specta of textiles is needed.

OBJECTIVES

To meet the increasing demand from the academic and industry, weaimed build a classifier that can identify fibers by their IR spectra.The initial aim is to be able to detect pure fibers at 80% probability.

For successful classification, we aimed to:

• reduce the complexity of the dataset• engineer additional features from the dataset• generate data normalization tools

CONCLUSIONS• As the amount of data is small the best performing models

can be consider more-or-less equal in performance.• Feature Engineering helped to improve the classification.• Most useful feature seems to be difference from local mean.• Feature engineering could be developed further and

additional filtering of data applied to concentrate on areaswith more information and thus providing better separation.

• Separating different spectra works well in general, accuracyabove 0.9, but there are still difficulties with more similarfibers: linen, cotton, viscose.

ACKNOW-LEDGEMENTS

We would like to thankresearchers Signe Vahurand Pilleriin Peets fromthe Cultural Heritageresearch group, Universityof Tartu, for providing thedata as well as domainknowledge throughout theproject.

DATASET AND METHODOLOGY

Our dataset contains of 438 IR spectra of 12 pure textile fibers both, natural and synthetic:

Cat count Name1.11 52 silk3.5 44 polyacrylic1.5 44 cotton1.6 40 linen2.1 35 viscose3.1 33 polyester1.1 30 wool1.9 24 jute3.11 22 elastane3.2 18 polyamide2.3 16 acetate3.12 15 polyetylene

Each spectra contains 1700 measureddatapoints.

Train data – 30 spectra from each classTest data – the remaining spectra from classes with more than 30 spectra

http://lisa.chem.ut.ee/IR_spectra/textile-fibres/

Right: Correlations between original data andfeatures.Down: 3 randomly sampled spectra from all classesand correlations between them. Big numbers showrandom forest (RF) classification accuracy on testdata when using only this feature.

0.5 0.6 0.7 0.8 0.9 1

Original data [raw], RF

Original data [standardized], RF

Original data [normalized], RF

Diff. from global mean [raw], RF

Diff. from global mean [standardized], RF

Diff. from global mean [normalized], RF

Diff. from local mean [raw], RF

Diff. from local mean [standardized], RF

Diff. from local mean [normalized], RF

Slope/angle of the graph [raw], RF

Slope/angle of the graph [standardized], RF

Slope/angle of the graph [normalized], RF

All features [normilized], RF

All features [normilized], SVM

All features [standardized], RF

All features [standardized], SVM

All features [standardized], kNN

All features [raw], RF

All features [raw], SVM

All features, glo loc [std], ang [norm], RF

All features, glo loc [std], ang [norm], SVM

Diff. from local mean [standardized], SVM

Diff. from local mean [standardized], kNN

Diff. from local mean [normilized], SVM

All features, glo loc [std], ang [norm], kNN

Accuracy on test data Accuracy on training data

Train Data

Raw data(Importing and

balancing)

Fill in NaN’s using linear interpolation

Calculate new Features:1. Difference from global mean2. Difference from local mean3. Slope / angle of the graph

Standardize / normalizeSubset the data

Train the modelRF, kNN, SVM

Test Data

Predict Test data

Select bestmodel

Predictunknownspectra

Unknown Data

Predicted

1.5 3.1 1.6 3.5 2.1 1.11

Act

ual

1.5 14 0 0 0 0 0

3.1 0 3 0 0 0 0

1.6 3 0 7 0 0 0

3.5 0 0 0 14 0 0

2.1 0 0 0 0 5 0

1.11 0 0 0 0 0 22

Left: Confusion matrix of model All features[normalized], RF for predicting test data.The same model was used the classifytextiles (old scarfs) from restorers of KanutiGild. It got 3 out of 4 correct and made amistake by predicting cotton instead ofviscose with probabilities of 0.44 vs 0.33,respectively.

Linen Cotton