FT-NIR spectroscopy and Laser Diffraction particle sizing of APIs in Pharmaceutical formulations

Joana Lúcia Carrilho Figueiredo

Dissertation submitted to obtain the Master's Degree in Chemical Engineering (Integrated Master's programme)

Jury

Chair: Prof. João Carlos Moura Bordado

Supervisor: Prof. José Monteiro Cardoso de Menezes

Members: Prof. Helena Maria Rodrigues Vasconcelos Pinheiro

Dr. Paulo Alexandre de Araújo Loureiro Amaral

September 2008


ACKNOWLEDGEMENTS

This work would not have been possible without the support and encouragement of Professor José Cardoso de Menezes, under whose supervision this thesis was carried out.

I would also like to thank Dr. Paulo Amaral, Quality Director of Lusomedicamenta, who strongly promoted the implementation of this project.

I also acknowledge Cristiana Rocha, Quality Assurance supervisor of Lusomedicamenta, and thank her for her kindness, availability and readiness in collecting the raw materials and solvents.

I thank Professor Maria Joana Neiva Correia for making available her laboratory, where the samples were prepared.

Licínia Rodrigues and Pedro Ceitil are acknowledged for generously sharing their knowledge on the different topics of this thesis.

I would like to express my special gratitude to João Henriques, Ricardo Duarte, Ornella Preisner and Pedro Felizardo for taking the time to discuss ideas with me and to help me see things from a new perspective when I needed it. Vera Lourenço and Gledson Emidio, my colleagues, are thanked for their encouragement and for the great coffee breaks.

I wish to thank my best friend, Raquel Lopes, and my sister, Rita Figueiredo, for reading the draft of the thesis and for their valuable comments.

I would also like to thank Filipe Calado, my boyfriend, for his loving support.

I cannot end without thanking my friends and my family for their constant encouragement and love.

To them I dedicate this thesis.


ABSTRACT

Near-Infrared (NIR) spectroscopy combined with chemometrics, and Laser Diffraction, have proven to be suitable tools for simple and rapid analysis in the pharmaceutical industry.

This work aims at the simultaneous determination of three Active Pharmaceutical Ingredients (APIs), Paracetamol (PA), Pseudoephedrine Hydrochloride (PS) and Dextromethorphan Hydrobromide (DX), in a pharmaceutical formulation using NIR spectroscopy. In addition, the Particle Size Distribution (PSD) of each API was determined by Powder Laser Diffraction.

NIR spectra contain chemical and physical information about each of these components. To explore the potential of NIR spectroscopy and to understand the similarities and differences between the APIs, the spectra were analysed using different pre-processing methods. PA and PS are chemically more similar to each other than to DX because they share the same functional groups. Physically, PA and DX have a Gaussian PSD, while PS has a bimodal distribution. The interpretation of the physical information obtained by NIR spectroscopy agrees with the Laser Diffraction results.

Quantitative analysis of the pharmaceutical formulation was based on Partial Least Squares (PLS) regression. The accuracy of the NIR calibration models was evaluated by the root mean square error of prediction (RMSEP); the best results were 4 mg of PA, 3 mg of PS and 2 mg of DX per tablet.

The physical properties measured by both techniques were well correlated by Orthogonal Projections to Latent Structures (OPLS) analysis, with a cross-validated predictive ability of 45.9%.

NIR spectroscopy, Powder Laser Diffraction or both techniques can be used for in-process monitoring and control in pharmaceutical solid dosage production.

Keywords: API, Near Infrared Spectroscopy, PLS, Powder Laser Diffraction, Particle Size Distribution


RESUMO

Near-Infrared (NIR) spectroscopy combined with chemometrics, and Laser Diffraction, have proven to be suitable tools for simple and rapid analysis in the pharmaceutical industry.

This work aims at the simultaneous determination of three active ingredients (APIs), Paracetamol (PA), Pseudoephedrine Hydrochloride (PS) and Dextromethorphan Hydrobromide (DX), in a pharmaceutical formulation using NIR spectroscopy. In addition, the particle size distribution of each API was determined by powder Laser Diffraction.

The NIR spectra contain chemical and physical information about each of the components mentioned above. Thus, to explore the potential of NIR spectroscopy and to understand the similarities and differences between the APIs, the spectra were analysed using different pre-processing methods. PA and PS are chemically more similar to each other than to DX because they have the same functional groups. Physically, PA and DX have a Gaussian particle size distribution, whereas PS has a bimodal distribution. The interpretation of the physical results obtained by NIR spectroscopy agrees with that obtained by Laser Diffraction.

The quantitative analysis of the pharmaceutical formulation was based on Partial Least Squares (PLS) regression. The accuracy of the NIR calibration model was evaluated by the root mean square error of prediction (RMSEP), and the best results were 4 mg of PA, 3 mg of PS and 2 mg of DX per tablet.

The physical properties measured by both techniques were well correlated through Orthogonal Projections to Latent Structures (OPLS), with a cross-validated predictive ability of 45.9%.

NIR spectroscopy, powder Laser Diffraction or both techniques can be used for process monitoring and production control of solid pharmaceutical dosage forms.

Keywords: Active Ingredient, Near Infrared Spectroscopy, PLS, Powder Laser Diffraction, Particle Size Distribution


INDEX

Acknowledgements
Abstract
Resumo
Index
Index of Figures
Index of Tables
Abbreviations
1. Introduction
1.1. NIR Spectroscopy
1.1.1. Advantages vs. disadvantages
1.1.2. Applications
1.1.3. Instrumentation
1.2. Chemometrics
1.2.1. Qualitative analysis in NIR spectroscopy
1.2.1.1. Unsupervised classification methods
1.2.1.2. Supervised classification methods
1.2.2. Quantitative analysis in NIR spectroscopy
1.2.3. Spectra pre-processing
1.2.3.1. Mean centering
1.2.3.2. Autoscaling
1.2.3.3. Derivatives
1.2.3.4. Multiplicative Scatter Correction (MSC)
1.2.3.5. Standard Normal Variate (SNV)
1.2.4. Variables’ selection
1.2.5. Number of principal components (PC’s) needed
1.2.6. Outliers
1.2.7. Statistics
1.3. Powder Laser Diffraction
1.3.1. Advantages vs. disadvantages
1.3.2. Instrumentation
2. Experimental
2.1. NIR Spectroscopy
2.1.1. Sample preparation
2.1.2. Measurement
2.1.3. Software
2.2. Powder Laser Diffraction
2.2.1. Measurement
3. Results and Discussion
3.1. NIR spectroscopy and chemometric analysis of each API’s
3.2. Particle size distribution of each APIs
3.3. Quantitative analysis of API’s
3.3.1. First strategy
3.3.1.1. Calibration vs. Test sets
3.3.1.2. Data pre-processing
3.3.1.3. Variable selection
3.3.1.4. Number of PCs
3.3.1.5. Outliers
3.3.1.6. Statistics
3.3.1.7. First strategy without variable selection
3.3.1.8. First strategy with variable selection
3.3.2. Second strategy
3.3.2.1. Second strategy without variable selection
3.3.2.2. Second strategy with variable selection
3.4. Obtained results vs. other studies
3.5. Orthogonal analysis
4. Conclusions
5. Suggestions for future work
6. References
7. Appendix
7.1. Determination of Percent Relative Standard Deviation (%RSD)
7.2. Matrix design for laboratory samples
7.3. First Strategy
7.4. Second Strategy
7.5. Orthogonal analysis
7.6. Mastersizer Average Result Analysis Report

INDEX OF FIGURES

Figure 1 – The NIR region in electromagnetic spectrum [2].
Figure 2 – The NIR spectrometer with solid and tablet sampling accessory.
Figure 3 – Representation of a PCA model structure.
Figure 4 – Representation of a PLS model structure.
Figure 5 – The powder laser diffraction equipment.
Figure 6 – Detection of instrumental noise in an FT-NIR absorption spectrum of DX.
Figure 7 – FT-NIR absorption spectra of the three active principles obtained by diffuse reflectance.
Figure 8 – Chemical structure of DX, PA and PS, respectively.
Figure 9 – Scores plot of 2nd derivative (15 point Savitzky-Golay) spectra of the three APIs.
Figure 10 – Scores plot spectra of the three APIs without any pre-treatment.
Figure 11 – Scores plot spectra of six batches of DX from two different manufactures (without any pre-treatment).
Figure 12 – The measure background.
Figure 13 – Particle size distribution of the three APIs measured based on the Malvern optical model.
Figure 14 – FT-NIR MSC spectra of each calibration set (with DX (DSM)). PA concentration increases in the arrow direction between 77.1% and 92.3% (a); while PS concentration between 10.2% and 0% (b); and DX concentration among 5.4% and 0% (c).
Figure 15 – Scores plot of PA (with DX (DSM)) samples with the selected calibration and test sets based on NIR MSC and Mean Centering pre-processed spectra.
Figure 16 – Coefficient of Determination (R2) versus wavenumber of PA calibration set (with DX (DSM)).
Figure 17 – iPLS results for DX (DSM) calibration set.
Figure 18 – Optimal spectral region selected by iPLS for pre-processed previous spectrum.
Figure 19 – Diagnostic plots of GA analysis: Fitness vs. Number of variables (a); Evolution of average and best fitness (b); Evolution of number of variables (c); and Models with variable number (d).
Figure 20 – PRESS for PLS on PA calibration set (with DX (DSM)) data based on MSC, 1st derivative and Mean Centering pre-processing spectra.
Figure 21 – Analysis showing PLS Model of PA calibration set (with DX (DSM)).
Figure 22 – Studentized Residuals versus Leverage for PA calibration set (with DX (DSM)).
Figure 23 – Q residuals versus sample for PA calibration set (with DX (DSM)).
Figure 24 – Correlation between measured and predicted PA calibration set (with DX (DSM)) [●: calibration set; ▼: validation set].
Figure 25 – Correlation between measured and cross-validation predicted set (with DX (DSM)).

INDEX OF TABLES

Table 1 – The concentration range of each API for each calibration set.
Table 2 – The optical properties of APIs and dispersants.
Table 3 – Particle Size distributions obtained for different lots and APIs suppliers (percent relative standard deviation (%RSD) and weighted residual and obscuration).
Table 4 – Size distributions obtained for different batches for DX (DSM) and relative error.
Table 5 – The correlation between samples’ concentration for each calibration set with DX (DSM).
Table 6 – The best GA parameters chosen to use for DX (DSM) calibration set.
Table 7 – The best results for each calibration set without variable selection (using DX (DSM)).
Table 8 – The best results obtained for each calibration set (using DX (DSM)) with variable selection.
Table 9 – The best results obtained in the first strategy.
Table 10 – The correlation between samples’ concentration for each calibration set with DX (DSM).
(DSM)).
Table 12 – The best results for each calibration set (using DX (DSM)) using variable selection techniques.
Table 13 – The best results obtained in the first and second strategy.
Table 14 – The best results obtained in current and Alcalá’s study.
Table 15 – OPLS summary results for different pre-processing techniques of DX samples.
Table 16 – The residual in X and Y results.
Table 17 – The best results for each calibration model, using PLS regression, with variable selection (using DX (DSM)).
Table 18 – The weight percentage of each API and respectively RMSEP obtained for the best calibration set and the weight in of each active ingredient in a tablet.
Table 19 – The accuracy obtained in our study Alcalá’s study [35].
Table 20 – Matrix design for laboratory samples.
DX (Divis).
(Divis)).
Table 23 – The best results for each calibration set with variable selection (using DX (Divis)).
(DSM)).
DX (Divis).
(Divis)).
(Divis)).

ABBREVIATIONS

API – Active Pharmaceutical Ingredient

d(0.1) – Equivalent volume diameter at 10% cumulative volume

d(0.5) – Median of particle size distribution or equivalent volume diameter at 50% cumulative volume

d(0.9) – Equivalent volume diameter at 90% cumulative volume

DX – Dextromethorphan Hydrobromide

ED – Euclidean Distance

EMEA – European Medicines Agency

FT – Fourier Transform

ICH – International Conference on Harmonisation

IR – Infrared

GA – Genetic Algorithm

LED – Light Emitting Diodes

LDA – Linear Discriminant Analysis

LV – Latent Variables

MC – Mean Centering

MD – Mahalanobis Distance

MLR – Multivariate Linear Regression

MSC – Multiplicative Scatter Correction

NIR – Near-Infrared

OPLS – Orthogonal Projections to Latent Structures

PA – Paracetamol

PASG – Pharmaceutical Analytical Sciences Group

PAT – Process Analytical Technology

PC’s – Principal Components

PCA – Principal Component Analysis

PCR – Principal Component Regression

PLS – Partial Least Squares

PLS-DA – Partial Least Squares Discriminant Analysis

PRESS – Predicted Residual Error Sum of Squares

PS – Pseudoephedrine Hydrochloride

PSD – Particle Size Distribution


RI – Refractive Index

RMSECV – Root Mean Square Error of Cross-Validation

RMSEP – Root Mean Square Error of Prediction

SG – Savitzky-Golay

SIMCA – Soft Independent Modelling of Class Analogy

SNV – Standard Normal Variate

US FDA – United States Food and Drug Administration

UV – Ultraviolet

% (w/w) – Weight Percentage

1st D – First Derivative

2nd D – Second Derivative

R2 (X)p – Percentage of X variation explained by the predictive component(s) of the model

R2 (Y)p – Percentage of Y variation explained by the predictive component(s) of the model

Q2p – Percentage of variation predicted by the predictive component(s) of the model, according to cross-validation

LVp – Latent variables of the predictive part of the model

R2 (X)o – Percentage of X variation explained by the orthogonal component(s) of the model

R2 (Y)o – Percentage of Y variation explained by the orthogonal component(s) of the model

Q2o – Percentage of variation predicted by the orthogonal component(s) of the model, according to cross-validation

LVo – Latent variables of the orthogonal part of the model


1. INTRODUCTION

In recent years, the pharmaceutical industry has developed and implemented innovative approaches to ensure final product quality and to reduce production costs, in line with the Process Analytical Technology (PAT) initiative of the US Food and Drug Administration (FDA) [1, 12]. The goal of PAT is to monitor and control manufacturing processes in real time, in order to increase process understanding and to ensure that final product quality is achieved consistently [1].

Several PAT monitoring tools are available; this thesis focuses only on NIR spectroscopy and Powder Laser Diffraction.

1.1. NIR Spectroscopy

In 1800, William Herschel discovered infrared radiation by passing sunlight through a prism. However, it was only in the 1960s that Near Infrared (NIR) spectroscopy emerged in the analytical world, with the work of Karl Norris of the US Department of Agriculture [3].

Nowadays, this analytical technique is used in many industrial fields, such as the petrochemical, medical, environmental, pharmaceutical and textile industries, among others.

In the electromagnetic spectrum, the NIR region lies between the mid-infrared and the visible regions. Over the wavenumber range of approximately 4000-14000 cm-1 (wavelengths of about 2500-700 nm), the absorption due to overtone and combination bands of covalent bonds such as N-H, O-H and C-H in organic molecules can be measured with a NIR instrument (Figure 1).

Figure 1 – The NIR region in electromagnetic spectrum [2].
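Because wavenumber (in cm-1) and wavelength are reciprocal quantities, the limits quoted above can be checked with λ (nm) = 10^7 / ν̃ (cm-1). The short Python sketch below is illustrative only; the function names are hypothetical and not part of any library.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm-1 into a wavelength in nm (1 cm = 10^7 nm)."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1e7 / wavenumber_cm1


def wavelength_nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm into a wavenumber in cm-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1e7 / wavelength_nm


# Limits of the NIR range quoted in the text
for nu in (4000.0, 14000.0):
    print(f"{nu:.0f} cm-1 -> {wavenumber_to_wavelength_nm(nu):.1f} nm")
# prints "4000 cm-1 -> 2500.0 nm" and "14000 cm-1 -> 714.3 nm"
```

The upper limit of 14000 cm-1 therefore corresponds to about 714 nm, which is why the NIR range is usually quoted as roughly 700-2500 nm.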


1.1.1. Advantages vs. disadvantages

NIR spectroscopy measures the light absorbed, reflected or transmitted by a sample at given wavelengths. In the NIR region, absorption is weaker than in the adjacent spectral regions, because overtone and combination bands are much less intense than the fundamental vibrations measured in the mid-infrared. Consequently, samples usually require no prior treatment (e.g. dilution), which allows rapid and easy analysis. In addition, the method is non-destructive, so the sample can be reused. The relatively long pathlengths allowed by this weak absorption, together with the ability to measure through glass, make it possible to analyse samples in their usual solid and liquid forms.

Like other techniques, NIR spectroscopy also has some drawbacks. Its low sensitivity restricts the determination of active principles present at less than 0.01% (w/w) [10]. Moreover, NIR spectroscopy is an indirect method: it requires a reference method for calibration, and chemometric techniques (statistical and mathematical procedures) to extract and interpret the information contained in the NIR spectra of the samples.

1.1.2. Applications

The NIR spectra capture the chemical and physical variability of the samples, which can be exploited in several applications. In the pharmaceutical industry, NIR spectroscopy is applied to identify and/or quantify active pharmaceutical ingredients (APIs) and excipients, to characterise polymorphic forms, and to monitor granulation, powder blending, drying and coating of in-process product, among other uses.

1.1.3. Instrumentation

Different NIR spectrometers and sampling accessories are available to suit the wide range of NIR spectroscopy applications.

A spectrometer is made up of four components: a light source, a wavelength selector, a sampling accessory and a radiation detector. Light from the source passes through the wavelength selector, which isolates a limited region of the spectrum; this radiation then strikes the sample, and the emerging beam is collected by the detector [7].

The most frequently employed sources in NIR spectrometers are tungsten and tungsten-halogen lamps, because they emit continuous radiation. Light Emitting Diodes (LEDs) are also used, since they offer a longer lifetime, are much more efficient than other sources and can also act as wavelength selectors. However, each LED emits only a limited range of wavenumbers, and LEDs are very expensive [7-8].

Wavelength selectors provide a narrow band of radiation. Four general types of instrument are commercially available: filter instruments, LED-source instruments, dispersive optics-based instruments and interferometric (Fourier Transform) instruments. In filter instruments, a bandpass filter lets only a particular slice of the spectrum pass through [8-9].

Older dispersive instruments used prisms, which separate the different wavelengths by refraction. Current dispersive instruments use monochromators consisting of entrance and exit slits, mirrors and a grating that disperses the light. Nearly all commercial dispersive monochromators are based on diffraction gratings, because these are more efficient than the alternatives [7-8].

Fourier-Transform NIR spectrometers have several advantages over the other types of wavelength selection, because they offer the best resolution and signal-to-noise ratios. These instruments benefit from both the Jacquinot (throughput) advantage and the Fellgett (multiplex) advantage. FT instruments do not require slits to achieve resolution, so their optical throughput is higher than that of dispersive instruments (the Jacquinot advantage). Furthermore, they collect all wavelengths simultaneously, which increases the efficiency of signal detection; this is known as the multiplex (Fellgett) advantage [3-7].
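The multiplex principle can be illustrated with an idealised interferogram: at every mirror position the detector records the summed contribution of all wavenumbers, and a Fourier transform separates them again. The NumPy sketch below assumes a noise-free interferogram with two hypothetical bands (5000 and 6000 cm-1), an arbitrary sampling step, and no apodisation or phase correction, so it is not a model of any real instrument. It also shows that the nominal resolution is roughly the reciprocal of the maximum optical path difference (OPD).

```python
import numpy as np

# Hypothetical acquisition settings (illustrative only)
n_points = 4096                  # number of interferogram samples
step_cm = 1.0 / 32000.0          # OPD step in cm; Nyquist limit = 16000 cm-1
opd = np.arange(n_points) * step_cm   # optical path difference axis (cm)

# Two monochromatic "bands" with different intensities, keyed by wavenumber (cm-1)
lines = {5000.0: 1.0, 6000.0: 0.5}

# Every wavenumber contributes at every mirror position (multiplex measurement)
interferogram = sum(a * np.cos(2 * np.pi * nu * opd) for nu, a in lines.items())

# The Fourier transform recovers the spectrum; the frequency axis is in cycles/cm = cm-1
spectrum = np.abs(np.fft.rfft(interferogram))
wavenumbers = np.fft.rfftfreq(n_points, d=step_cm)

# Nominal resolution ~ 1 / maximum OPD
resolution = 1.0 / (n_points * step_cm)

# Wavenumbers of the two strongest peaks, in ascending order
top_two = np.sort(wavenumbers[np.argsort(spectrum)[-2:]])
print(f"resolution = {resolution:.4f} cm-1, peaks at {top_two} cm-1")
```

Doubling `n_points` (and hence the maximum OPD) halves the resolution interval, which is one reason why higher-resolution acquisitions take longer.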

The analyst has to choose the proper sampling and spectral acquisition mode according to the type of analysis required. For on-line¹ analysis, fibre optic probes are used, whereas if the sample is removed from the process stream and analysed at-line or in a laboratory, several spectral acquisition accessories (e.g. fibre optic probes, vial holders, powder or tablet sampling accessories) can be used, depending on the type of sample (solids, tablets, liquids, etc.) [3, 34].

Sample information in the NIR region is usually collected as an absorption spectrum through transmission or diffuse reflectance measurements. In transmission, the light passes through the sample; this mode is common for transparent liquid samples held in quartz cuvettes. In diffuse reflectance, the incident radiation is projected onto the surface of the sample and reflected at different angles; this mode is commonly used for powders and solid samples [3].
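Whichever mode is used, the measured intensity is usually reported as absorbance-like values. The plain-Python sketch below, with hypothetical function names, shows the usual transforms: A = log10(1/T) for transmission and the apparent absorbance log10(1/R) for diffuse reflectance. The Kubelka-Munk function is included only as a common alternative for diffuse reflectance; the sketch does not imply which transform a given instrument's software applies.

```python
import math


def absorbance_from_transmittance(transmittance: float) -> float:
    """Transmission mode: absorbance A = log10(1/T), with T = I/I0."""
    if not 0.0 < transmittance <= 1.0:
        raise ValueError("transmittance must be in (0, 1]")
    return math.log10(1.0 / transmittance)


def apparent_absorbance_from_reflectance(reflectance: float) -> float:
    """Diffuse reflectance mode: apparent absorbance log10(1/R), R relative to a reference."""
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must be in (0, 1]")
    return math.log10(1.0 / reflectance)


def kubelka_munk(reflectance: float) -> float:
    """Kubelka-Munk function f(R) = (1 - R)^2 / (2R), an alternative for diffuse reflectance."""
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must be in (0, 1]")
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)


# Lower reflectance (more light absorbed by the powder) gives larger values of both transforms
for r in (0.8, 0.5, 0.1):
    print(f"R = {r}: log10(1/R) = {apparent_absorbance_from_reflectance(r):.3f}, "
          f"f(R) = {kubelka_munk(r):.3f}")
```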

¹ The sample is measured without being removed from the process stream.


Detectors convert radiant energy into a measurable signal, and the choice of detector depends on the wavelength range to be measured. Silicon detectors cover a limited wavelength range (between 700 and 1100 nm), whereas InGaAs and PbS detectors are more suitable for the wider 1100-2500 nm range.

Figure 2 – The NIR spectrometer with solid and tablet sampling accessory.

Two important parameters for good spectral acquisition with an FT-NIR spectrometer are the number of scans and the resolution. The resolution is the smallest wavenumber interval that can be distinguished over the spectral range, and typically ranges from 4 to 64 cm-1. A higher resolution (smaller interval) gives a more detailed spectrum, but it also captures more noise and makes the analysis longer. Co-adding more scans improves the signal-to-noise ratio: for random noise, the noise in the averaged spectrum decreases with the square root of the number of scans. Typical values are between 16 and 128 scans [17]. Both parameters are set as a compromise between operating time and the desired analysis quality.
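A small simulation can illustrate the square-root rule for co-added scans. It assumes purely random (white) noise of fixed size in each scan and a hypothetical single-band spectrum; real instrument noise is not exactly white, so this is a sketch rather than a model of the spectrometer used here.

```python
import numpy as np

rng = np.random.default_rng(0)
wavenumbers = np.linspace(4000, 9000, 500)
# Hypothetical noise-free spectrum: a single Gaussian band
true_spectrum = 0.5 * np.exp(-((wavenumbers - 6000) / 150) ** 2)
noise_sd = 0.01  # assumed noise standard deviation of a single scan


def coadded_noise(n_scans: int) -> float:
    """Average n_scans noisy scans and return the remaining noise (std)."""
    scans = true_spectrum + rng.normal(0.0, noise_sd, size=(n_scans, wavenumbers.size))
    averaged = scans.mean(axis=0)
    return float(np.std(averaged - true_spectrum))


for n in (16, 64, 128):
    # simulated noise vs. the theoretical value noise_sd / sqrt(n)
    print(n, round(coadded_noise(n), 5), round(noise_sd / np.sqrt(n), 5))
```

Going from 16 to 64 scans should roughly halve the noise, which is why doubling the measurement time gives only a modest gain.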

1.2. Chemometrics

After the NIR spectra are collected, the multivariate data are processed and interpreted for qualitative and quantitative analysis with chemometric software. The first step is to split the data set into two groups: a calibration set and a test set. As a rule of thumb, two thirds of the data set are used for calibration and the rest for testing [14]. The model is then developed and optimised by choosing the spectral pre-processing and the best number of variables, and by identifying and eliminating outliers. Next, the model is used to predict the test set in order to check its robustness and efficiency; at a minimum, it should also be validated with unknown samples.
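The two-thirds/one-third rule of thumb can be sketched as a random split of sample indices. A random split is only one simple option (systematic designs such as Kennard-Stone are also common), and the text does not state how the split was made; the function name and seed are hypothetical.

```python
import numpy as np


def split_calibration_test(n_samples: int, cal_fraction: float = 2 / 3, seed: int = 0):
    """Randomly assign sample indices to a calibration set and a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_cal = int(round(cal_fraction * n_samples))
    return np.sort(order[:n_cal]), np.sort(order[n_cal:])


cal_idx, test_idx = split_calibration_test(19)   # e.g. 19 prepared samples
print(len(cal_idx), len(test_idx))               # 13 calibration, 6 test
```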


One of the advantages of NIR spectroscopy is that it can be used both to classify/identify samples and to quantify their constituents.

1.2.1. Qualitative analysis in NIR spectroscopy

Qualitative analysis uses NIR spectral information to identify and classify samples, for example against raw-material spectral libraries. These techniques can be unsupervised or supervised. In unsupervised classification no a priori assumption is made about the samples to be classified, whereas supervised classification requires knowledge of the class membership of the calibration samples [11].

1.2.1.1. Unsupervised classification methods

Several unsupervised methods are available, but the most common are Principal Component Analysis (PCA) and hierarchical methods.

PCA is usually the first step of data analysis, used to detect patterns in multivariate data. By reducing the dimensionality of the original data, PCA allows the relevant features of the spectra to be visualised. The original data matrix (X) is decomposed into scores, T (the coordinates of the samples in the space defined by the principal components), and loadings, L^T (the coefficients that relate the original variables to the principal components) [6]. PCA selects the directions of maximal variance, so that most of the structure of the data is retained in a lower number of dimensions (Figure 3).

Figure 3 – Representation of a PCA model structure.
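The decomposition X = T L^T + E can be sketched with a singular value decomposition of the mean-centred data. This is a generic PCA, not the PLS Toolbox implementation used in this work, and the data are random placeholders.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """PCA of mean-centred X via SVD: X_c = T @ L.T + E."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores (samples x PCs)
    L = Vt[:n_components].T                      # loadings (variables x PCs)
    explained = s**2 / np.sum(s**2)              # fraction of variance per PC
    E = Xc - T @ L.T                             # residuals not described by the model
    return T, L, explained[:n_components], E


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 200))   # hypothetical 30 spectra x 200 wavenumbers
T, L, explained, E = pca(X, 2)
print(T.shape, L.shape, explained)
```

Plotting the first two columns of T against each other gives a scores plot like those discussed in Chapter 3.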

Hierarchical methods evaluate the similarity between samples in terms of their NIR spectra and produce a sequence of clusters, which can be represented graphically as a dendrogram [12].
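A minimal sketch of agglomerative (bottom-up) hierarchical clustering is shown below, assuming Euclidean distances between spectra and average linkage; both choices are illustrative, since the text does not specify them. Library routines such as `scipy.cluster.hierarchy.linkage` do the same far more efficiently.

```python
import numpy as np


def agglomerative_clusters(X: np.ndarray, n_clusters: int):
    """Naive average-linkage agglomerative clustering on Euclidean distances.

    Every spectrum starts as its own cluster; the two clusters with the smallest
    average inter-sample distance are merged until n_clusters remain.
    Returns the cluster labels and the merge history (what a dendrogram draws).
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                      # b > a, so index a stays valid
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels, merges


# Hypothetical spectra forming three well-separated groups of five
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(5, 50)) for c in (0.0, 3.0, 6.0)])
labels, merges = agglomerative_clusters(X, 3)
print(labels)
```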


1.2.1.2. Supervised classification methods

The classical supervised classification methods most commonly used are distance-based methods, linear discriminant analysis (LDA), soft independent modelling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA) [3, 12-14].

Distance-based methods measure the similarity or dissimilarity between test samples and calibration samples. The Euclidean distance (ED) and the Mahalanobis distance (MD) are the most popular. In ED all directions in spectral space have the same weight, so points at the same distance from a class centre lie on a circle (a hypersphere in more dimensions). MD weights each direction according to the variability of the data set along it, so points at the same distance lie on an ellipsoid that follows the distribution of the data.
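The difference between the two distances can be shown in two dimensions. The sketch assumes a hypothetical class that is much more spread along one axis than the other; with real spectra, where there are more variables than samples, the covariance matrix is singular, so MD is usually computed on PCA scores instead.

```python
import numpy as np


def euclidean_distance(x, centre):
    return float(np.sqrt(np.sum((x - centre) ** 2)))


def mahalanobis_distance(x, centre, cov):
    """sqrt(d^T C^-1 d): directions of large variance count less."""
    diff = x - centre
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


rng = np.random.default_rng(2)
# Hypothetical class: std 3.0 along the first axis, 0.5 along the second
cal = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
centre = cal.mean(axis=0)
cov = np.cov(cal, rowvar=False)

a = centre + np.array([3.0, 0.0])   # 3 units along the direction of large variance
b = centre + np.array([0.0, 3.0])   # 3 units along the direction of small variance
print(euclidean_distance(a, centre), euclidean_distance(b, centre))               # equal
print(mahalanobis_distance(a, centre, cov), mahalanobis_distance(b, centre, cov))  # ~1 vs ~6
```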

LDA is similar to PCA, with the difference that LDA looks for the direction that achieves maximal separation between the classes of a data set rather than maximal variance.
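For two classes, the LDA direction can be sketched with Fisher's formula w = S_w^-1 (m1 - m2), where S_w is the pooled within-class scatter. The hypothetical data below vary mostly along one axis but are separated along the other, so PCA and LDA choose different directions.

```python
import numpy as np


def fisher_lda_direction(X1: np.ndarray, X2: np.ndarray) -> np.ndarray:
    """Two-class Fisher discriminant direction w = Sw^-1 (m1 - m2), unit length."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1)
          + np.cov(X2, rowvar=False) * (len(X2) - 1))   # pooled within-class scatter
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)


def pc1_direction(X: np.ndarray) -> np.ndarray:
    """First principal component (direction of maximal variance)."""
    Xc = X - X.mean(axis=0)
    return np.linalg.svd(Xc, full_matrices=False)[2][0]


rng = np.random.default_rng(6)
X1 = rng.normal(size=(100, 2)) * [5.0, 0.3]               # class 1
X2 = rng.normal(size=(100, 2)) * [5.0, 0.3] + [0.0, 2.0]  # class 2, shifted in y
w_lda = fisher_lda_direction(X1, X2)
w_pca = pc1_direction(np.vstack([X1, X2]))
print("LDA:", w_lda, "PCA:", w_pca)   # LDA ~ y axis, PCA ~ x axis
```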

In SIMCA, a separate principal component model is built for each class, and new samples are classified according to how well they fit each class model. It is the most widely used class-modelling technique.
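A simplified SIMCA can be sketched as one PCA model per class plus a residual distance scaled by each class's own typical residual. Real SIMCA uses statistical (F-test) limits, so a sample may be accepted by several classes or by none; here, as a simplifying assumption, each sample is just assigned to the class with the smallest scaled distance. Class names and data are hypothetical.

```python
import numpy as np


class SimcaClass:
    """PCA model of one class; samples are judged by their residual distance."""

    def __init__(self, X: np.ndarray, n_components: int):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T                                  # class loadings
        res = self._residuals(X)
        self.s0 = np.sqrt(np.mean(np.sum(res**2, axis=1)))            # typical residual

    def _residuals(self, X):
        Xc = X - self.mean
        return Xc - (Xc @ self.P) @ self.P.T

    def distance(self, X):
        """Residual distance relative to the class's own calibration residuals."""
        return np.sqrt(np.sum(self._residuals(np.atleast_2d(X)) ** 2, axis=1)) / self.s0


def simca_assign(models: dict, X: np.ndarray):
    names = list(models)
    d = np.column_stack([models[n].distance(X) for n in names])
    return [names[i] for i in d.argmin(axis=1)]


rng = np.random.default_rng(3)
load_a, load_b = rng.normal(size=(2, 50)), rng.normal(size=(2, 50))


def sample(loadings, offset, n):
    """Hypothetical class: 2 latent factors + offset + small noise."""
    return rng.normal(size=(n, 2)) @ loadings + offset + 0.05 * rng.normal(size=(n, 50))


models = {"A": SimcaClass(sample(load_a, 0.0, 20), 2),
          "B": SimcaClass(sample(load_b, 5.0, 20), 2)}
new = np.vstack([sample(load_a, 0.0, 3), sample(load_b, 5.0, 3)])
print(simca_assign(models, new))
```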

The aim of PLS-DA is to find the variables and directions in the multivariate space which

discriminate the established classes in the calibration set.

1.2.2. Quantitative analysis in NIR spectroscopy

Quantitative analysis is based on the Beer-Lambert law, which states that the measured absorbance of a sample is proportional to the analyte concentration. Quantitative models relate the NIR spectral information (X variables) to the analyte concentration or to another property determined by a reference method (Y variable). The techniques most used for quantitative NIR analysis are Multivariate Linear Regression (MLR), Principal Component Regression (PCR), Partial Least Squares (PLS) and Orthogonal PLS (OPLS) [3, 12-14].

MLR establishes a linear relationship between a small number of regression variables (e.g. absorbances at selected wavenumbers) and a property of the samples (e.g. concentration). Because of its limitations, it is less used in current applications than other methods. It can only be applied when there are more samples than variables.
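An MLR calibration on a few selected absorbances reduces to an ordinary least-squares fit of y = b0 + X b. The sketch uses hypothetical noise-free data so the coefficients can be checked, and it enforces the requirement from the text that there be more samples than variables.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of y = b0 + X @ b; returns [b0, b1, ..., bk]."""
    n, k = X.shape
    if n <= k:
        raise ValueError("MLR needs more samples than selected variables")
    A = np.column_stack([np.ones(n), X])      # intercept column + variables
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]


rng = np.random.default_rng(7)
X_sel = rng.uniform(0.1, 1.0, size=(12, 3))   # hypothetical absorbances at 3 wavenumbers
y = 1.0 + 2.0 * X_sel[:, 0] - 1.0 * X_sel[:, 1] + 0.5 * X_sel[:, 2]
coef = fit_mlr(X_sel, y)
print(coef)   # close to [1.0, 2.0, -1.0, 0.5]
```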


A PCR model is built in two steps. First, the spectral data are compressed with PCA; then the concentration data are regressed against the scores² matrix with a method similar to MLR. Because the regression uses a few scores instead of the original variables, PCR can be applied to very complex mixtures even when the number of spectral variables is larger than the number of calibration samples.
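The two PCR steps can be sketched as follows, using hypothetical two-component mixture spectra with far more variables (200) than calibration samples (15), a case in which MLR on all variables would be impossible.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """PCR: PCA on mean-centred X, then least-squares regression of y on the scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    P = Vt[:n_components].T                       # loadings
    T = (X - x_mean) @ P                          # scores
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    b = P @ q                                     # regression vector in wavenumber space
    return x_mean, y_mean, b


def predict_pcr(model, X):
    x_mean, y_mean, b = model
    return y_mean + (X - x_mean) @ b


rng = np.random.default_rng(8)
axis = np.arange(200)
pure_a = np.exp(-((axis - 60) / 10.0) ** 2)     # hypothetical analyte band
pure_b = np.exp(-((axis - 120) / 25.0) ** 2)    # hypothetical interferent band
conc = rng.uniform(0, 1, size=(20, 2))
X = conc @ np.vstack([pure_a, pure_b]) + rng.normal(0, 0.001, size=(20, 200))
y = conc[:, 0]

model = fit_pcr(X[:15], y[:15], n_components=2)  # 15 samples, 200 variables
pred = predict_pcr(model, X[15:])
print(np.round(pred - y[15:], 4))
```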

The PLS method is similar to PCR but usually produces better models with fewer components, because its components are chosen to describe the variation in X that is most relevant to Y (see 1.2.5). Nevertheless, both PCR and PLS require a large number of samples for an accurate calibration, and collinear constituent concentrations must be avoided. In PLS, the original data (X) are decomposed into scores, T, loadings, L^T, and residuals, E (the part of X not described by the model) [6] (Figure 4).

Figure 4 – Representation of a PLS model structure.
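A single-response PLS model (PLS1) can be sketched with the classical NIPALS algorithm. This is a textbook version written for illustration, not the PLS Toolbox routine used in this work; the mixture data are hypothetical.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """PLS1 by NIPALS on mean-centred data: X = T P^T + E, y = T q + f."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector (max covariance with y)
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        qa = yr @ t / tt                # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - qa * t                # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # regression vector
    return x_mean, y_mean, b, Xr          # Xr is now the residual matrix E


def predict_pls1(model, X):
    x_mean, y_mean, b, _ = model
    return y_mean + (X - x_mean) @ b


rng = np.random.default_rng(8)
axis = np.arange(200)
bands = np.vstack([np.exp(-((axis - 60) / 10.0) ** 2), np.exp(-((axis - 120) / 25.0) ** 2)])
conc = rng.uniform(0, 1, size=(20, 2))
X = conc @ bands + rng.normal(0, 0.001, size=(20, 200))
y = conc[:, 0]

model = fit_pls1(X[:15], y[:15], n_components=2)
pred = predict_pls1(model, X[15:])
E = model[3]
print(np.round(pred - y[15:], 4), E.shape)
```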

The OPLS method is a modification of PLS in which the X variables are separated into two parts: a predictive part, linearly related to the Y variables, and a part that is orthogonal (uncorrelated) to Y [36-38].
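The core OPLS idea, removing from X a component that is orthogonal to y, can be sketched for a single y and a single orthogonal component. This follows the general published O-PLS procedure; it is not a reproduction of the SIMCA-P+ implementation used in this work. The hypothetical data contain a baseline offset unrelated to y.

```python
import numpy as np


def opls_filter_one(X, y):
    """Remove one y-orthogonal component from X (one O-PLS step, single y).

    Returns the filtered (mean-centred) X, the orthogonal scores t_o and the
    predictive weight w.
    """
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc / (yc @ yc)
    w /= np.linalg.norm(w)                 # predictive weight
    t = Xc @ w                             # predictive scores
    p = Xc.T @ t / (t @ t)                 # X loading for the predictive scores
    w_o = p - (w @ p) * w                  # part of p orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = Xc @ w_o                         # y-orthogonal scores
    p_o = Xc.T @ t_o / (t_o @ t_o)         # orthogonal loading
    return Xc - np.outer(t_o, p_o), t_o, w


rng = np.random.default_rng(10)
idx = np.arange(100)
y = rng.uniform(0, 1, 20)
offset = rng.normal(0, 1, 20)              # hypothetical baseline shifts unrelated to y
X = (np.outer(y, np.exp(-((idx - 50) / 5.0) ** 2))
     + np.outer(offset, np.ones(100))
     + 0.001 * rng.normal(size=(20, 100)))
X_filtered, t_o, w = opls_filter_one(X, y)
print(np.corrcoef(t_o, y)[0, 1])           # ~0: t_o carries no y information
```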

1.2.3. Spectra pre-processing

Spectra often contain noise arising from instrumental errors, or are affected by physical effects such as light scattering. Pre-processing reduces, or even removes, these contributions and enhances the chemical signal of interest.

Several pre-treatment methods can be used to remove information unrelated to the constituents, such as mean-centering, autoscaling, first and second derivatives, multiplicative scatter correction and standard normal variate, among others. A combination of pre-processing algorithms can sometimes improve the quality of the model.

2 The individual transformed observations are called scores, while the contribution of the original variables to the principal components is given by the loadings [15].


1.2.3.1. Mean centering

Mean centering subtracts the average spectrum of the data set from each spectrum, which enhances the differences between samples. This usually improves the prediction accuracy of the model.

1.2.3.2. Autoscaling

Autoscaling, like mean centering, removes absolute intensity information; in addition, each variable is scaled to unit variance, so that all variables have the same weight regardless of their original variance. This technique is often used when the X variables are not measured in the same units.
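Both column-wise pre-treatments can be written in a few lines. The sketch assumes no variable is constant (a zero standard deviation would make autoscaling divide by zero); the small matrix is hypothetical, with two variables on very different scales.

```python
import numpy as np


def mean_center(X):
    """Subtract the mean spectrum (column means) from every spectrum."""
    return X - X.mean(axis=0)


def autoscale(X):
    """Mean-centre each variable and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


X = np.array([[0.20, 10.0],
              [0.25, 30.0],
              [0.30, 20.0]])   # two variables in very different units
print(mean_center(X))
print(autoscale(X))           # both columns now have unit variance
```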

1.2.3.3. Derivatives

Derivatives of spectral data are used to remove offset and background slope variations between samples. The first derivative removes baseline offsets, while the second derivative also eliminates slope differences between spectra and effectively reduces the influence of the physical properties of the sample [24]. The algorithm most commonly used to compute derivatives is the Savitzky-Golay (SG) method, which requires the number of data points in the smoothing window (and the polynomial order) to be specified.

The main disadvantage of derivatives is that the resulting spectra are more difficult to interpret.
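A Savitzky-Golay derivative fits a local polynomial in a moving window and differentiates it at the window centre. The numpy-only sketch below derives the SG weights from that least-squares fit; the 15-point window matches the one mentioned for Figure 9, while the polynomial order 2 is an assumption. `scipy.signal.savgol_filter(y, window_length, polyorder, deriv=..., delta=...)` offers the same operation in a library.

```python
import math
import numpy as np


def savgol_coefficients(window: int, polyorder: int, deriv: int) -> np.ndarray:
    """SG weights: derivative at the window centre of a least-squares polynomial fit."""
    half = window // 2
    z = np.arange(-half, half + 1)
    A = np.vander(z, polyorder + 1, increasing=True)   # columns z^0, z^1, ...
    return math.factorial(deriv) * np.linalg.pinv(A)[deriv]


def savgol_derivative(y, window=15, polyorder=2, deriv=2, delta=1.0):
    """Derivative of y at every point where the full window fits ('valid' part)."""
    c = savgol_coefficients(window, polyorder, deriv)
    return np.correlate(y, c, mode="valid") / delta**deriv


x = np.linspace(0, 10, 101)            # spacing 0.1
y = 3 * x**2 + 2 * x + 1               # hypothetical smooth "spectrum"
d2 = savgol_derivative(y, window=15, polyorder=2, deriv=2, delta=0.1)
print(d2[:3])                          # second derivative of 3x^2 + 2x + 1 is 6
```

Adding a constant offset leaves the first derivative unchanged, and adding a linear slope leaves the second derivative unchanged, which is the baseline-removal property described above.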

1.2.3.4. Multiplicative Scatter Correction (MSC)

MSC reduces the spectral variability caused by pathlength effects, such as differences in particle size and light scattering among samples, mainly in diffuse reflectance spectroscopy. The method calculates the average spectrum of all the calibration data and uses it as the reference spectrum; each spectrum is then regressed against this reference, and the fitted offset and slope are used to correct it [12, 17].
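The MSC steps can be sketched as follows: fit each spectrum as x_i ≈ a_i + b_i · reference and return (x_i - a_i) / b_i. The hypothetical data are one underlying spectrum with different additive and multiplicative scatter effects, which MSC should remove completely.

```python
import numpy as np


def msc(X, reference=None):
    """Multiplicative scatter correction against the mean (or a given) spectrum."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(X, dtype=float)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, 1)     # slope b_i and offset a_i
        corrected[i] = (x - a) / b
    return corrected


base = np.exp(-((np.arange(100) - 50) / 10.0) ** 2) + 0.2   # hypothetical spectrum
offsets = [0.0, 0.1, -0.05, 0.2]                            # additive scatter effects
slopes = [1.0, 1.3, 0.8, 1.1]                               # multiplicative scatter effects
X = np.array([a + b * base for a, b in zip(offsets, slopes)])
corrected = msc(X)
print(np.ptp(corrected, axis=0).max())   # ~0: all spectra now coincide
```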

1.2.3.5. Standard Normal Variate (SNV)

Like MSC, SNV removes scattering effects from spectral data, but the correction factors are determined differently. Each spectrum is corrected individually: it is first centred on its own mean and then scaled by its own standard deviation [12, 17].
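SNV works row by row, so unlike MSC it needs no reference spectrum. A minimal sketch on hypothetical spectra:

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum by its own mean and std."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd


rng = np.random.default_rng(14)
X = rng.random((4, 100))       # hypothetical spectra
Z = snv(X)
print(Z.mean(axis=1), Z.std(axis=1, ddof=1))
```

Because every spectrum is standardised independently, any per-spectrum offset or positive scale factor (for example 2X + 3) gives exactly the same SNV result.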


1.2.4. Variable selection

A variable selection algorithm reduces the number of variables, discarding those that mainly contain redundant information or noise. Several strategies can be used to select the relevant variables and obtain the ‘best’ model, such as the coefficient of determination, iPLS (MATLAB toolbox) and the genetic algorithm (GA) (MATLAB toolbox).

The coefficient of determination (R²) measures the correlation between the NIR spectral information at each variable (X) and the analyte concentration (Y). It varies between 0 and 1, and the highest values indicate the spectral regions best correlated with the analyte.
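Computing R² variable by variable gives a correlation spectrum whose peaks point to the most informative region. The sketch assumes R² is the squared Pearson correlation between each column of X and y; the data (an analyte band centred at variable 50) are hypothetical.

```python
import numpy as np


def r2_per_variable(X, y):
    """Squared correlation between each spectral variable (column of X) and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.sqrt(np.sum(Xc**2, axis=0)) * np.sqrt(yc @ yc))
    return r**2


rng = np.random.default_rng(9)
idx = np.arange(100)
y = rng.uniform(0, 1, 20)
X = np.outer(y, np.exp(-((idx - 50) / 5.0) ** 2)) + rng.normal(0, 0.01, size=(20, 100))
r2 = r2_per_variable(X, y)
print(int(np.argmax(r2)), float(r2.max()))   # best-correlated variable lies near 50
```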

Like the coefficient of determination, iPLS explores which variables of the data set carry information about Y. The method splits the spectrum into equidistant intervals and calculates a PLS model for each interval. In this way, iPLS can focus on the most important spectral regions, with less interference [18].
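A minimal iPLS sketch: split the variables into equal intervals, fit a local PLS1 model on each, and compare their cross-validation errors. It is not the Nørgaard iPLS toolbox; the 10 intervals, 2 components and 5 contiguous cross-validation blocks are assumptions, and the data are hypothetical (analyte band near variable 50, interferent near 20).

```python
import numpy as np


def pls1_coef(X, y, n_comp):
    """Regression vector of a PLS1 (NIPALS) model on mean-centred data."""
    Xr, yr = X - X.mean(axis=0), y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p, qa = Xr.T @ t / tt, yr @ t / tt
        Xr = Xr - np.outer(t, p)
        yr = yr - qa * t
        W.append(w); P.append(p); q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def rmsecv(X, y, n_comp, n_splits=5):
    """Cross-validation error with contiguous blocks of samples."""
    errors = []
    for test in np.array_split(np.arange(len(y)), n_splits):
        train = np.setdiff1d(np.arange(len(y)), test)
        b = pls1_coef(X[train], y[train], n_comp)
        pred = y[train].mean() + (X[test] - X[train].mean(axis=0)) @ b
        errors.append(pred - y[test])
    return float(np.sqrt(np.mean(np.concatenate(errors) ** 2)))


def ipls(X, y, n_intervals=10, n_comp=2):
    """(first variable, last variable, RMSECV) of a local PLS model per interval."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    return [(int(iv[0]), int(iv[-1]), rmsecv(X[:, iv], y, n_comp)) for iv in intervals]


rng = np.random.default_rng(9)
idx = np.arange(100)
y = rng.uniform(0, 1, 30)
interferent = rng.uniform(0, 1, 30)
X = (np.outer(y, np.exp(-((idx - 50) / 5.0) ** 2))
     + np.outer(interferent, np.exp(-((idx - 20) / 5.0) ** 2))
     + rng.normal(0, 0.01, size=(30, 100)))
results = ipls(X, y)
best = min(results, key=lambda r: r[2])
print(best)
```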

GA is an optimisation method based on the genetic processes of biological organisms. First, it randomly generates an initial population of individuals, represented in encoded form as chromosomes (in variable selection, each chromosome encodes a set of selected variables). Next, the fitness of each chromosome is evaluated and the genetic operators (selection, crossover and mutation) are applied. Finally, the algorithm checks whether the new population satisfies the termination conditions; if not, the cycle is repeated from the fitness step until a certain percentage of the chromosomes are identical. The main advantages of GA are its flexibility and robustness, but it carries an inherent risk of overfitting [19-21].
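The GA loop can be sketched with boolean chromosomes (True = variable selected). All design choices here are assumptions for illustration and not those of the MATLAB GA toolbox: fitness is the negative RMSE of an MLR model on a separate validation set (to limit the overfitting risk mentioned above), selection uses binary tournaments, crossover is one-point, mutation flips bits, the best chromosome is kept (elitism), and a fixed number of generations replaces the identical-chromosome stopping rule.

```python
import numpy as np

rng = np.random.default_rng(5)


def fitness(mask, X_cal, y_cal, X_val, y_val):
    """Negative validation RMSE of an MLR model built on the selected variables."""
    k = int(mask.sum())
    if k == 0 or k >= len(y_cal) - 1:          # keep more samples than variables
        return -np.inf
    A = np.column_stack([np.ones(len(y_cal)), X_cal[:, mask]])
    coef, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
    pred = coef[0] + X_val[:, mask] @ coef[1:]
    return -float(np.sqrt(np.mean((pred - y_val) ** 2)))


def ga_select(X_cal, y_cal, X_val, y_val, pop_size=30, n_gen=40, p_mut=0.02):
    n_var = X_cal.shape[1]
    pop = rng.random((pop_size, n_var)) < 0.2            # random initial chromosomes
    for _ in range(n_gen):
        fit = np.array([fitness(m, X_cal, y_cal, X_val, y_val) for m in pop])
        # Selection: binary tournaments
        pairs = rng.integers(pop_size, size=(pop_size, 2))
        winners = np.where(fit[pairs[:, 0]] >= fit[pairs[:, 1]], pairs[:, 0], pairs[:, 1])
        parents = pop[winners]
        # Crossover: one-point, between consecutive parents
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_var)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # Mutation: flip each bit with probability p_mut
        children ^= rng.random(children.shape) < p_mut
        children[0] = pop[np.argmax(fit)]                # elitism: keep the best so far
        pop = children
    fit = np.array([fitness(m, X_cal, y_cal, X_val, y_val) for m in pop])
    return pop[np.argmax(fit)]


# Hypothetical data: y depends only on variables 10 and 30
X_all = rng.normal(size=(60, 50))
y_all = X_all[:, 10] + 0.5 * X_all[:, 30] + 0.05 * rng.normal(size=60)
best_mask = ga_select(X_all[:40], y_all[:40], X_all[40:], y_all[40:])
print(np.flatnonzero(best_mask))
```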

1.2.5. Number of principal components (PC’s) needed

Since the model is based on reducing the number of variables, it is essential to select the number of PCs that best describes the data. Each PC contains different relevant information, but the first components represent the most important variation. If too many components are used, too much information (including noise) is included and the model is overfitted: it becomes dependent on the calibration data and predicts new samples poorly. On the other hand, if too few components are used, the model does not capture enough of the variability in the data (underfitting). The optimal number of PCs therefore lies between these two extremes.

The main problem in choosing the number of PCs is subjectivity. The number of PCs can be selected as the one giving the minimum Prediction Residual Error Sum of Squares (PRESS), i.e. the sum of the squared differences between the predicted and reference values of samples that were not used to build the model.

To calculate PRESS and select the number of PCs, the model must first be tested. There are two ways of doing this: self-prediction and cross-validation. Self-prediction predicts the same samples used to build the model, which does not guarantee model performance. Cross-validation predicts subsets of samples that were previously removed from the calibration set; the subsets can be selected by leave-one-out or by contiguous blocks. Cross-validation has two main advantages over self-prediction. First, it gives a more realistic assessment of model performance, since the predicted samples are not those used to build the model. Second, it makes outliers easier to detect.
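The selection procedure can be sketched as a PRESS curve: for each number of components, every subset is left out in turn, predicted by a model built on the remaining samples, and the squared errors are summed. PCR is used here only to keep the sketch short (the same loop applies to PLS); the three-component mixture data are hypothetical, so about three components should be needed.

```python
import numpy as np


def pcr_fit_predict(X_train, y_train, X_test, n_comp):
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    _, _, Vt = np.linalg.svd(X_train - x_mean, full_matrices=False)
    P = Vt[:n_comp].T
    q, *_ = np.linalg.lstsq((X_train - x_mean) @ P, y_train - y_mean, rcond=None)
    return y_mean + (X_test - x_mean) @ P @ q


def press_curve(X, y, max_comp, n_splits=None):
    """PRESS for 1..max_comp components; n_splits=None means leave-one-out,
    otherwise the samples are split into n_splits contiguous blocks."""
    n = len(y)
    folds = np.array_split(np.arange(n), n if n_splits is None else n_splits)
    press = []
    for a in range(1, max_comp + 1):
        sq = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            pred = pcr_fit_predict(X[train], y[train], X[test], a)
            sq += float(np.sum((pred - y[test]) ** 2))
        press.append(sq)
    return np.array(press)


rng = np.random.default_rng(11)
axis = np.arange(120)
bands = np.vstack([np.exp(-((axis - c) / 8.0) ** 2) for c in (30, 60, 90)])
conc = rng.uniform(0, 1, size=(24, 3))
X = conc @ bands + rng.normal(0, 0.002, size=(24, 120))
y = conc[:, 0]

press_loo = press_curve(X, y, max_comp=6)                 # leave-one-out
press_blocks = press_curve(X, y, max_comp=6, n_splits=6)  # contiguous blocks
best_loo = int(np.argmin(press_loo)) + 1
print(best_loo, np.sqrt(press_loo / len(y)))              # RMSECV = sqrt(PRESS / n)
```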

1.2.6. Outliers

An outlier is a sample whose characteristics differ from those of the calibration set. Outliers can arise from several causes, such as instrumental or experimental errors or changes in operating conditions.

If the differences between a suspected outlier and the calibration set are significant, the sample does not fit the model well and should be identified and eliminated to build an efficient model. However, not all outliers are erroneous: some observations are only slightly different from the rest, and keeping them can improve model robustness. Diagnostic tools are required to distinguish between erroneous and non-erroneous outliers, such as leverage, residuals (e.g. Y-studentised residuals and spectral residuals) and Hotelling's T².
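Three of these diagnostics can be computed from a PCA model: leverage and Hotelling's T² measure how far a sample lies within the model space, while the spectral residual Q measures what the model does not describe. The sketch omits the statistical limits normally used to flag outliers; the data are hypothetical, with sample 25 extreme within the model and sample 26 carrying a spectral feature the model does not describe.

```python
import numpy as np


def pca_diagnostics(X, n_comp):
    """Leverage, Hotelling's T^2 and Q (squared spectral residual) per sample."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_comp] * s[:n_comp]
    lam = s[:n_comp] ** 2 / (n - 1)                  # variance of each score vector
    t2 = np.sum(T**2 / lam, axis=1)                  # Hotelling's T^2
    leverage = 1.0 / n + np.sum(U[:, :n_comp] ** 2, axis=1)
    E = Xc - T @ Vt[:n_comp]
    q = np.sum(E**2, axis=1)                         # spectral residual (Q)
    return leverage, t2, q


rng = np.random.default_rng(12)
loadings = rng.normal(size=(2, 60))
normal = rng.normal(size=(25, 2)) @ loadings + 0.01 * rng.normal(size=(25, 60))
extreme = np.array([[8.0, 0.0]]) @ loadings + 0.01 * rng.normal(size=(1, 60))
spiked = rng.normal(size=(1, 2)) @ loadings + 0.01 * rng.normal(size=(1, 60))
spiked[0, 30] += 1.0                                 # feature not described by the model
X = np.vstack([normal, extreme, spiked])
leverage, t2, q = pca_diagnostics(X, n_comp=2)
print(int(np.argmax(t2)), int(np.argmax(q)))         # 25 (extreme), 26 (spiked)
```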

1.2.7. Statistics

After the calibration equation has been determined, the accuracy of the model is evaluated with parameters such as the coefficient of determination (R²), the root mean square error of cross-validation (RMSECV) and the root mean square error of prediction (RMSEP).

The coefficient of determination is calculated between the NIR-predicted and the reference values of the calibration and test sets [13].


$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(\hat{y}_{i\backslash i} - y_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \qquad (1)$$

where $\hat{y}_{i\backslash i}$ is the estimated result for sample i when the model is built with sample i removed, $y_i$ is the reference measurement for sample i, and $\bar{y}$ is the mean of the reference measurements for all samples in the calibration and test sets.

The root mean square error of cross-validation is calculated with equation (2), where n is the number of samples in the calibration set [13].

$$RMSECV = \sqrt{\frac{\sum_{i=1}^{n} \left(\hat{y}_{i\backslash i} - y_i\right)^2}{n}} \qquad (2)$$

RMSECV may give over-optimistic results, because samples from the calibration set are also used to validate the model.

For the test set, the root mean square error of prediction is calculated with equation (3), where $\hat{y}_i$ is the model estimate for test sample i and m is the number of samples in the test set [13].

$$RMSEP = \sqrt{\frac{\sum_{i=1}^{m} \left(\hat{y}_i - y_i\right)^2}{m}} \qquad (3)$$

The optimum model is the one with the lowest RMSECV and RMSEP and the highest R².
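Equations (1) to (3) translate directly into code. In the sketch, the same `rmse` function gives RMSECV when it receives cross-validated predictions and RMSEP when it receives test-set predictions; R² uses the mean of the reference values passed in. The concentration values are hypothetical.

```python
import numpy as np


def r2(y_ref, y_pred):
    """Coefficient of determination, as in equation (1)."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(1.0 - np.sum((y_pred - y_ref) ** 2) / np.sum((y_ref - y_ref.mean()) ** 2))


def rmse(y_ref, y_pred):
    """RMSECV for cross-validated predictions (eq. 2), RMSEP for a test set (eq. 3)."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_ref) ** 2)))


y_ref = [80.0, 82.5, 85.0, 87.5, 90.0]    # hypothetical reference PA contents (% w/w)
y_pred = [80.4, 82.1, 85.3, 87.2, 90.5]   # hypothetical NIR predictions
print(round(r2(y_ref, y_pred), 4), round(rmse(y_ref, y_pred), 4))   # 0.988 0.3873
```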

1.3. Powder Laser Diffraction

The particle size of APIs and excipients has a large influence on their handling and processing, and can be critical in the manufacturing process. Particle size distribution (PSD) analysis is therefore very important for process optimisation and control.

Several precise and accurate analytical methods are available for particle size characterisation. The most common are optical microscopy, analytical sieving and powder laser diffraction; the choice depends on the purpose of the measurement.


Analytical sieving is an old but inexpensive technique. It is usually applied to powders with particle sizes above about 75 µm [23].

Optical microscopy is used to observe the morphology and shape of the particles. It can generally be applied to particles between 0.5 and 100 µm, but it is not suitable as a quality or production control technique [23].

The most widely applied technique is powder laser diffraction, which was used in this study.

1.3.1. Advantages vs. disadvantages

Powder laser diffraction allows rapid measurements with a small amount of sample and without any external calibration. Moreover, the equipment is highly reproducible, very flexible, and can analyse particles either dry or dispersed in a liquid (wet).

Figure 5 – The powder laser diffraction equipment.

In wet analysis, it is essential to choose a suitable dispersant in which the sample does not dissolve.

Mie theory expresses the particle size as an equivalent sphere diameter, although most real particles are irregular [24]. Nevertheless, comparing some property of the actual particle with that of an imaginary sphere is the easiest way to describe an irregularly shaped particle with a single number [22].
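As an example of the equivalent-sphere idea, the volume-equivalent diameter is the diameter of the sphere with the same volume as the particle, d = (6V/π)^(1/3). The needle-shaped crystal below is hypothetical; laser diffraction infers its equivalent diameters from scattering rather than from a measured volume.

```python
import math


def volume_equivalent_diameter(volume_um3: float) -> float:
    """Diameter (um) of the sphere with the same volume: d = (6V / pi)^(1/3)."""
    return (6.0 * volume_um3 / math.pi) ** (1.0 / 3.0)


# Hypothetical needle-like crystal modelled as a 200 x 20 x 20 um box
d_v = volume_equivalent_diameter(200 * 20 * 20)
print(round(d_v, 1))   # about 53.5 um, although the needle is 200 um long
```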


1.3.2. Instrumentation

Powder laser diffraction is one of the most widely used techniques for particle size analysis. The sample is passed through a focused He-Ne laser beam (λ = 633 nm) [23]. The particles scatter light at an angle that is approximately inversely proportional to their size, and the scattered light is measured by photosensitive detectors. According to Mie theory, the particle size can be calculated from the scattering intensity and angle, provided that the refractive index (RI) and the absorption of the material under study are specified [22-23].
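The inverse relation between size and scattering angle can be illustrated with the Fraunhofer approximation, where the first dark ring of the diffraction pattern of a sphere lies at sin θ = 1.22 λ / d. This is a simplification: Fraunhofer theory ignores the optical properties of the material and is inaccurate for small particles, which is why the Mie model with RI and absorption is used in practice.

```python
import math


def first_minimum_angle_deg(diameter_um: float, wavelength_um: float = 0.633) -> float:
    """Angle of the first dark ring (Fraunhofer): sin(theta) = 1.22 * lambda / d."""
    s = 1.22 * wavelength_um / diameter_um
    if s >= 1:
        raise ValueError("particle too small for the Fraunhofer approximation")
    return math.degrees(math.asin(s))


for d in (1, 10, 100, 1000):   # particle diameters in um
    print(d, round(first_minimum_angle_deg(d), 3))   # larger particles -> smaller angles
```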


2. EXPERIMENTAL

A quantitative NIR spectroscopy method was developed, and the PSD of the APIs in a pharmaceutical formulation was determined by powder laser diffraction.

2.1. NIR Spectroscopy

2.1.1. Sample preparation

The pharmaceutical formulation studied is a mixture of three APIs, Paracetamol (PA), Pseudoephedrine Hydrochloride (PS) and Dextromethorphan Hydrobromide (DX), and placebo.

To develop the quantitative analysis, three independent experimental designs (vide appendix 7.2) were created, one for each API. In each calibration set, the concentrations of the selected API and of the placebo were varied by overdosing and underdosing.

The concentration range (%) of each API was chosen to cover extreme situations that could lead to production problems. Since PS and DX are minor components of the formulation, each was underdosed down to the minimum limit of 0% (w/w), which allows homogeneity problems during production to be detected. PA, the major component, was overdosed up to 92.3% (w/w), assuming the absence of placebo.

Despite the low concentrations of PS and DX, NIR spectroscopy can identify both components in a sample because their concentrations are above 0.01% (w/w) [10].

The range and nominal concentration (% w/w) of each API is shown in Table 1.

Table 1 – The concentration range of each API for each calibration set.

API | Concentration (% w/w)
PA | 84.7 ± 7.6
PS | 5.1 ± 5.1
DX | 2.7 ± 2.7

In the laboratory, 19 PA powder samples and 12 PS and DX samples were prepared, each weighing 7.5 g. During preparation, the active principles and placebo were accurately weighed on an analytical balance (0.1 mg precision) and homogenised in a small laboratory vortex mixer for 1 minute after each addition.


Currently, the studied formulation is produced with DX from two different suppliers (Divis and DSM). Therefore, two sets of samples were prepared, one with DX from each supplier.

2.1.2. Measurement

Diffuse reflectance spectra were collected with an ABB FTLA2000 FT-NIR spectrometer equipped with a tungsten-halogen source, an InAs detector and a powder sampling accessory. Before spectral acquisition, the best gain for the background and the samples was selected and the instrument was aligned. Each spectrum was the average of 64 scans at a resolution of 16 cm-1, and the spectral data covered the range from 3996.2 to 12004 cm-1.

Every day, before NIR spectra were acquired, a reference spectrum (background) was recorded using Teflon. The background measures the instrument and environmental contributions, so that these deviations can be corrected in the sample measurements [17].

All sample measurements were recorded in triplicate.

A spectrum captures many different sources of variation, such as constituent properties (e.g. concentration, drying, coating), instrument variations (e.g. detector noise), environmental conditions (e.g. laboratory temperature) and differences in sample handling, all of which affect the baseline and the absorbance. A good calibration set should represent only the different concentrations of the constituents of the mixture. Therefore, before the calibration models were built, the noise level was checked over the full wavenumber range, for spectra with both high and low absorbance.

[Figure 6 plot: absorbance (0 to 0.9) versus wavenumber (4000 to 12000 cm-1) for DX]

Figure 6 – Detection of instrumental noise in an FT-NIR absorption spectrum of DX.


As Figure 6 shows, instrumental noise was detected in the 9003.1-12004 cm-1 range. This region was removed before any pre-processing or analysis, which also eliminated other irrelevant variations.
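Removing the noisy region amounts to keeping only the variables below 9003.1 cm-1 before any pre-processing. The wavenumber grid below is hypothetical (the actual point spacing of the instrument files is not given in the text), and the helper name is illustrative.

```python
import numpy as np


def crop_spectra(wavenumbers, spectra, low_cm1, high_cm1):
    """Keep only the variables with low_cm1 <= wavenumber < high_cm1."""
    keep = (wavenumbers >= low_cm1) & (wavenumbers < high_cm1)
    return wavenumbers[keep], spectra[:, keep]


wavenumbers = np.linspace(3996.2, 12004.0, 2077)            # hypothetical grid
spectra = np.random.default_rng(13).random((3, wavenumbers.size))
wn_kept, spectra_kept = crop_spectra(wavenumbers, spectra, 3996.2, 9003.1)
print(wn_kept.min(), wn_kept.max(), spectra_kept.shape)
```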

2.1.3. Software

Data collection was controlled with GRAMS/AI software (version 7.0, Thermo Galactic, USA). Multivariate calibration was performed in MATLAB (version 6.5, MathWorks Inc., USA) with the PLS Toolbox (version 3.0, Eigenvector Research Inc., USA). Variable selection was performed with the iPLS toolbox (routine 2.1 by Nørgaard) and the GA toolbox (version 6.5, MathWorks Inc., USA). The orthogonal analysis was done in SIMCA-P+ 11.5 (MKS Umetrics, Umeå).

2.2. Powder Laser Diffraction

2.2.1. Measurement

Wet laser diffraction measurements of each API were performed with a Malvern Mastersizer MS2000 (Malvern Instruments Ltd., UK)³ using a small amount of sample. This instrument provides an accuracy of ±1% on d(0.5) [26].

Before starting a measurement, the refractive indices of the sample and of the dispersant and the absorption value of the sample had to be specified. Since no information was available, the RI of DX and the absorption values of PS and DX were estimated by trial and error, and their suitability was checked with the residual rule of thumb. The optical properties of the APIs and dispersants are summarised in Table 2.

Table 2 – The optical properties of APIs and dispersants.

API | RI of API | Absorption | Dispersant | RI of dispersant [31]
PA | 1.62 [27] | 0.32 [29] | Deionised water (20 ºC) | 1.33
PS | 1.53 [28] | 0.50 | Ether | 1.35
DX | 1.50 | 0.50 | Ether | 1.35

3 This equipment can measure particle sizes from 0.02 to 2000µm.


For PA measurements, deionised water was used instead of tap water because it allows stable measurements, according to a Malvern 2000 report [25]. The high electrolyte concentration of tap water causes the dispersion to flocculate, which results in a much larger particle size than expected.

For all measurements, the pump speed was set to 1750 rpm, which kept all the material in suspension without forming air bubbles.


3. RESULTS AND DISCUSSION

The main aim of this work was to develop and optimise a PLS calibration model that simultaneously quantifies the three active principles of the commercial pharmaceutical formulation studied. Several calibrations were built to determine the most accurate and robust one. First, the chemical and physical information contained in the NIR spectra of each pure component was analysed, in order to explore the potential of NIR spectroscopy and to understand the similarities and differences between the components. In parallel, their particle size distributions were determined by powder laser diffraction.

Finally, to correlate the results obtained by NIR spectroscopy and by powder laser diffraction, an OPLS analysis was performed.

3.1. NIR spectroscopy and chemometric analysis of each API

The development of a quantitative analysis should be preceded by an exercise that relates chemical knowledge about the APIs of the formulation to their spectra. This exercise allows some important NIR absorption bands of each active principle to be identified.

Figure 7 – FT-NIR absorption spectra of the three active principles obtained by diffuse reflectance.

Figure 7 shows strong overlap between the PA and PS absorption bands (i.e. a lack of selectivity of the NIR absorptions) at 4040-4080 cm-1 (C-H and C-C combinations), at 5880-6060 cm-1 (C-H first overtone) and at 8740-8860 cm-1 (C-H second overtone). This can be explained by the similarity of some functional groups of these APIs.

As can be seen in Figure 8, PA and PS both have an aromatic ring, a hydroxyl group (-OH) and an N-H group (a secondary amide in PA and a secondary amine in PS).

Figure 8 – Chemical structure of DX, PA and PS, respectively.

Despite the lack of selectivity between PA and PS, multivariate models can be developed, because chemometric techniques can deal with this problem and each active principle has some selective spectral ranges. For DX, the N-H first overtone band can be seen at 6520-6720 cm-1 and the C-H second overtone band at 8200-8450 cm-1. For PA, an N-H combination band at 4560-4750 cm-1 and a C-H first overtone band at 5900-6150 cm-1 can be identified. For PS, a C-H first overtone combination band is detected at 7250-7500 cm-1.

NIR spectra capture both the chemical and the physical⁴ characteristics of samples, which can be interpreted using chemometric techniques. When the spectra are pre-processed (e.g. with a second derivative), physical effects are reduced, which makes it easier to capture the chemical information in a single PC. Conversely, if no pre-processing is applied, the first PCs can capture the physical effects of the samples.

Figure 9 shows that the PCA model used two PCs, explaining 98.85% of the total variation in X, and produced three distinct clusters.

4 NIR spectroscopy can capture the particle size of pharmaceutical compounds and differences between suppliers.


Figure 9 – Scores plot of 2nd derivative (15 point Savitzky-Golay) spectra of the three APIs.

The first component (87.16% of the variance) should describe the chemical properties, because PA and PS, which are chemically more similar, lie closer to each other than to DX along PC1. This pre-processing removes almost all the irrelevant information (such as noise) but only reduces the physical characteristics of the samples. Consequently, PC2 (11.69% of the variance) presumably describes that information. Under this assumption, PA and DX should be physically more similar to each other than to PS.

This assumption was checked in the scores plot of the API spectra without any pre-processing (Figure 10).

Figure 10 – Scores plot spectra of the three APIs without any pre-treatment.

The PCA projection indicates that the first two principal components account for 98.92% of the total variance, which is quite significant. The first component represents 81.31% of the variance and is probably related to the particle size distribution. According to Figure 10, DX and PA should have more similar particle sizes than PS, because they have similar PC1 scores. This corroborates the previous assumption that the physical properties of the samples are described by PC2 of Figure 9.

Without any mathematical treatment, NIR spectra contain background variation and noise in addition to the sample information; the chemical signal of interest is reduced and the physical parameters are enhanced. Therefore, PC2 in Figure 9 should also explain physical effects such as light scattering.

In Figure 10, PS forms a compact cluster, while the PA and DX clusters are more spread out. This can be explained by the different number of batches in each cluster and by batch-to-batch variability: the PA spectral data include two lots and the DX data six, whereas PS comes from a single lot.

As mentioned above, the studied formulation is produced with DX from two different suppliers. A PCA of six batches (three from each supplier) was performed to analyse the physical differences between samples. Figure 11 shows that the first two principal components account for most of the spectral variation (99.80%), which is enough to describe the variation between samples.

Figure 11 – Scores plot of the spectra of six DX batches from two different manufacturers (without any pre-treatment).

The first PC captures most of the variance in the untreated data (94.65%), which may describe differences in particle size between samples. The distance between the two suppliers along PC1 is small (compared with Figure 10). Consequently, these results do not guarantee that the particle sizes differ between samples.


PC2 (5.15% of the variance) probably represents differences in powder compaction between replicates of each batch, which could be explained by light scattering effects.

To confirm the previous speculations about the physical properties of the APIs, their particle size distributions were measured by powder laser diffraction.

3.2. Particle size distribution of each API

In PSD analysis, two important parameters must be taken into account: the residual and the obscuration (%).

To ensure a good fit between the calculated and the measured data, the residual must be below 1% [22]. Otherwise, it may indicate incorrect RI and/or absorption values for the sample or dispersant, or a poor background.

If several measurements have a high residual, the RI and absorption values should be changed and the results recalculated.

A poor background shows an atypical light scattering pattern. An erroneous background measurement gives a wrong particle size distribution, because particle size is determined from the difference between the sample measurement and the background. Therefore, to ensure good results, all analyses that do not meet the measurement threshold parameters should be repeated.

Before the sample measurement starts, a clean and stable background is required. This means that a typical light scattering pattern must be observed in the measured background, i.e. a near-exponential decay across the detector array, with less than 20 units of scatter at ring 20 (Figure 12) [22].

Figure 12 – The measured background.


The instrument measures the amount of laser light lost through scattering by the sample, known as obscuration, which correlates with the concentration of material in the measurement zone. The Malvern 2000 software (version 5.22) has an “obscuration bar” that gives a visual indication of how much sample has been added; the obscuration should be about 10-20%. The sample analysis then proceeds [22].

The particle size distribution of each API of the studied formulation was measured by wet-dispersion powder laser diffraction (vide 2.2.1), as shown in Figure 13.

Figure 13 – Particle size distribution of the three APIs measured based on the Malvern optical model.

The PSD curves of PA and DX appear approximately Gaussian (normal), i.e. monomodal and symmetric, so that their mean, median and mode approximately coincide. The PS distribution is more complex: it is bimodal, i.e. this lot has a non-homogeneous PSD. The left peak contains almost all of the volume and ends at about 170 µm, while the right peak extends to about 720 µm (vide appendix 7.6).

Thus, the interpretation of PC1 in the scores plot of the three APIs without any pre-treatment (Figure 10) was confirmed, because the PSDs of PA and DX are more similar to each other than to that of PS.

The PSD is described by the equivalent volume diameters at 10%, 50% and 90% cumulative volume, d(0.1), d(0.5) and d(0.9), respectively.
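The d(0.1), d(0.5) and d(0.9) values can be read from the cumulative volume distribution by interpolation. The Malvern software computes them itself; this sketch only shows the principle, assuming volume percentages per size class and interpolation on log(size). The log-normal distribution (median 100 µm) is hypothetical.

```python
import math
import numpy as np


def percentile_diameters(upper_edges_um, volume_pct, fractions=(0.1, 0.5, 0.9)):
    """Interpolate d(0.1), d(0.5), d(0.9) from a volume-weighted size distribution.

    upper_edges_um: upper edge of each size class (increasing)
    volume_pct: volume percentage in each class
    """
    cum = np.cumsum(volume_pct) / np.sum(volume_pct)   # cumulative volume fraction
    log_d = np.interp(fractions, cum, np.log(upper_edges_um))
    return dict(zip(fractions, np.exp(log_d)))


# Hypothetical log-normal distribution: median 100 um, geometric SD exp(0.5)
edges = np.logspace(0, 3, 61)                          # 1 um to 1000 um
z = (np.log(edges) - np.log(100.0)) / 0.5
cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
volume = 100.0 * np.diff(cdf)                          # volume % per size class
d = percentile_diameters(edges[1:], volume)
print({k: round(float(v), 1) for k, v in d.items()})   # d(0.5) close to 100 um
```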

One PS batch, six DX batches and two PA batches were analysed. The results are summarised in Table 3, together with measurement parameters that indicate how well the calculated data fitted the measured data.


Table 3 – Particle size distributions obtained for different lots and API suppliers, with percent relative standard deviation (%RSD), weighted residual and obscuration.

API (supplier) | Batch number | d(0.1) (µm) | %RSD | d(0.5) (µm) | %RSD | d(0.9) (µm) | %RSD | Weighted residual | Obscuration (%)
DX (DSM) | 7004727 | 69.57 | 7.4 | 134.39 | 2.0 | 251.33 | 4.7 | 0.8 | 15.9
DX (DSM) | 8000874 | 63.79 | 2.5 | 128.23 | 2.4 | 235.95 | 3.7 | 0.7 | 12.9
DX (DSM) | 8002235 | 66.90 | 0.2 | 139.87 | 2.0 | 252.73 | 1.4 | 0.8 | 10.3
DX (Divis) | 7000732 | 48.46 | 2.5 | 105.87 | 0.4 | 206.04 | 0.4 | 0.2 | 14.0
DX (Divis) | 7000733 | 42.88 | 5.3 | 94.76 | 4.0 | 187.44 | 5.0 | 0.9 | 13.7
DX (Divis) | 8002055 | 65.19 | 1.4 | 135.55 | 1.0 | 252.51 | 0.1 | 0.7 | 11.0
PA (Mallinckrodt) | 7004729 | 81.44 | 7.5 | 248.77 | 3.0 | 541.29 | 5.1 | 0.8 | 11.2
PA (Mallinckrodt) | 8001124 | 94.44 | 3.0 | 255.95 | 4.1 | 490.21 | 7.1 | 0.7 | 10.7
PS (BASF) | 8000221 | 9.93 | 7.3 | 37.75 | 7.4 | 385.28 | 4.9 | 0.7 | 17.3

The DX PSD was almost identical across lots from both suppliers, except for the first two Divis lots (7000732 and 7000733), which were similar to each other and had smaller particles (d(0.5) of about 95-106 µm, compared with 128-140 µm for the other lots). This may reflect problems in the production of these two lots. The DX results confirm the previous interpretation of Figure 11 (score plot of the six lots from both suppliers): PC1 describes the particle size, which is quite similar across lots and increases from left to right (except for batch 8002055 from Divis).

There are no significant PA PSD differences between lots.

PS has an irregular PSD, since the gap between d(0.5) and d(0.9) (about 348 µm) is much larger than the gap between d(0.1) and d(0.5) (about 28 µm), whereas for PA and DX the upper gap is less than twice the lower one. This can be explained by the bimodal distribution.
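The asymmetry can be quantified directly from Table 3. The sketch below computes the span, (d(0.9) - d(0.1))/d(0.5), and the ratio of the upper gap (d(0.5) to d(0.9)) to the lower gap (d(0.1) to d(0.5)). Both indices are illustrative choices, not the method used in this work. With these indices, PS has a gap ratio of about 12.5, while every PA and DX lot stays below 2.

```python
# d(0.1), d(0.5), d(0.9) in µm, copied from Table 3
psd = {
    "DX (DSM) 7004727": (69.57, 134.39, 251.33),
    "DX (DSM) 8000874": (63.79, 128.23, 235.95),
    "DX (DSM) 8002235": (66.90, 139.87, 252.73),
    "DX (Divis) 7000732": (48.46, 105.87, 206.04),
    "DX (Divis) 7000733": (42.88, 94.76, 187.44),
    "DX (Divis) 8002055": (65.19, 135.55, 252.51),
    "PA (Mallinckrodt) 7004729": (81.44, 248.77, 541.29),
    "PA (Mallinckrodt) 8001124": (94.44, 255.95, 490.21),
    "PS (BASF) 8000221": (9.93, 37.75, 385.28),
}

def span(d10, d50, d90):
    """Relative width of the distribution: (d90 - d10) / d50."""
    return (d90 - d10) / d50

def gap_ratio(d10, d50, d90):
    """(d90 - d50) / (d50 - d10); equals 1 for a curve symmetric on a linear size axis."""
    return (d90 - d50) / (d50 - d10)

for lot, values in psd.items():
    print(f"{lot}: span = {span(*values):.2f}, gap ratio = {gap_ratio(*values):.2f}")
```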

The precision of this method was assessed by the percent relative standard deviation (%RSD), which, according to the ISO standard for laser diffraction measurements, ISO 13320-1 [30], must be less than 3% at d(0.5) and less than 5% at d(0.1) and d(0.9). For DX, the %RSD of d(0.1) for batch 7004727 from DSM, and of d(0.1) and d(0.5) for batch 7000733 from Divis, exceeded these limits. The same happened for PA, with d(0.1) and d(0.9) of batch 7004729 and d(0.5) and d(0.9) of batch 8001124, and for PS, with d(0.1) and d(0.5).
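The repeatability check can be automated with a few lines of Python. The sketch assumes a strict comparison against the limits quoted above, so values exactly at a limit (5.0 at d(0.9) for batch 7000733, 3.0 at d(0.5) for batch 7004729) are not flagged.

```python
# %RSD of d(0.1), d(0.5), d(0.9) for each lot, copied from Table 3
rsd = {
    "DX (DSM) 7004727": (7.4, 2.0, 4.7),
    "DX (DSM) 8000874": (2.5, 2.4, 3.7),
    "DX (DSM) 8002235": (0.2, 2.0, 1.4),
    "DX (Divis) 7000732": (2.5, 0.4, 0.4),
    "DX (Divis) 7000733": (5.3, 4.0, 5.0),
    "DX (Divis) 8002055": (1.4, 1.0, 0.1),
    "PA (Mallinckrodt) 7004729": (7.5, 3.0, 5.1),
    "PA (Mallinckrodt) 8001124": (3.0, 4.1, 7.1),
    "PS (BASF) 8000221": (7.3, 7.4, 4.9),
}

# Limits as stated in the text: 5% at d(0.1) and d(0.9), 3% at d(0.5)
LIMITS = {"d(0.1)": 5.0, "d(0.5)": 3.0, "d(0.9)": 5.0}

def exceedances(table):
    """Return (lot, percentile, %RSD) for every value strictly above its limit."""
    flagged = []
    for lot, values in table.items():
        for (name, limit), value in zip(LIMITS.items(), values):
            if value > limit:
                flagged.append((lot, name, value))
    return flagged

for lot, name, value in exceedances(rsd):
    print(f"{lot}: {name} %RSD = {value} (limit {LIMITS[name]})")
```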

As mentioned above, there are two further important parameters in PSD analysis: obscuration (%) and the weighted residual. In accordance with the set limits, all measurements were performed with a sufficient amount of sample (obscuration between 10.3% and 17.3%). Moreover, the rule of thumb for the residual (less than 1%) was respected, indicating that appropriate refractive index and absorption values were used and that a clean background was measured. The measurements are therefore considered reliable.

The Quality Assurance and Quality Control Departments of Lusomedicamenta provided product data sheets for each API batch. However, only the DX (DSM) data sheets contained a PSD analysis; these results were compared with those obtained by laser diffraction on the Malvern equipment at IST.

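Table 4 – Size distributions obtained for different DX (DSM) batches at IST and by the supplier, and relative error.

API | Batch number | Source | d(0.1) (µm) | Relative error | d(0.5) (µm) | Relative error | d(0.9) (µm) | Relative error
DX (DSM) | 7004727 | IST | 69.57 | | 134.39 | | 251.33 |
DX (DSM) | 7004727 | Supplier | 62.00 | 12.2% | 139.00 | 3.3% | 235.00 | 6.9%
DX (DSM) | 8000874 | IST | 63.79 | | 128.23 | | 235.95 |
DX (DSM) | 8000874 | Supplier | 61.00 | 4.6% | 136.00 | 5.7% | 231.00 | 2.1%
DX (DSM) | 8002235 | IST | 66.90 | | 139.87 | | 252.73 |
DX (DSM) | 8002235 | Supplier | 58.00 | 15.4% | 134.00 | 4.4% | 225.00 | 12.3%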

The results obtained by laser diffraction on the Malvern equipment and those reported by the supplier are quite similar, and the relative errors, which do not exceed 15.4%, are considered acceptable. The small differences between the two sets of results may be due to experimental errors or to a suboptimal refractive index (RI) in the analysis protocol at IST; the IST values were higher than the supplier's for d(0.1) and d(0.9) in all three batches, although lower for d(0.5) in two of them.
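The relative errors in Table 4 appear to be computed as |IST - supplier| / supplier. The sketch below reproduces all nine values to within 0.1 percentage point. For batch 8002235, d(0.1) gives 15.3% rather than the reported 15.4%, presumably because of rounding in the tabulated diameters.

```python
# (IST, supplier) values in µm for d(0.1), d(0.5), d(0.9), copied from Table 4
table4 = {
    "7004727": [(69.57, 62.00), (134.39, 139.00), (251.33, 235.00)],
    "8000874": [(63.79, 61.00), (128.23, 136.00), (235.95, 231.00)],
    "8002235": [(66.90, 58.00), (139.87, 134.00), (252.73, 225.00)],
}

def relative_error_pct(measured, reference):
    """Relative error of the IST value with respect to the supplier value, in %."""
    return 100.0 * abs(measured - reference) / reference

for batch, pairs in table4.items():
    print(batch, [round(relative_error_pct(m, r), 1) for m, r in pairs])
```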

3.3. Quantitative analysis of APIs

As mentioned before, the pharmaceutical formulation contains two compounds (PA and PS) with overlapping spectra, which calls for multivariate chemometric techniques. PLS is a reasonable choice for resolving overlapping signals and for quantitative analysis.

In this work, a quantitative analysis of the pharmaceutical product based on FT-NIR spectroscopy was developed. Several strategies for robust multivariate modelling with PLS regression were evaluated, using different spectral pre-processing methods, with and without variable selection.

Currently, Lusomedicamenta uses DX from both DSM and Divis. However, the choice of DX supplier is not relevant for the quantitative analysis of this formulation, because the calibration models are built on the chemical properties of DX (physical properties are minimised by spectral pre-treatment, and the materials from both suppliers have similar PSD and are within specifications). Nevertheless, two independent quantitative models, one for each supplier, were developed and gave quite similar results, as expected. For simplicity, only the DX (DSM) calibration results are presented in Results and Discussion, since these were better than those for DX (Divis); all information about the other calibration sets is available in the appendix.

3.3.1. First strategy

Three independent experimental designs were created, in each of which the concentration of one API (and consequently that of the placebo) was varied by overdosing or underdosing. The varied API and the placebo are therefore highly correlated, but the placebo does not interfere directly in the calibration model, since it is not quantified. This was considered the best procedure because it minimises the correlations between API concentrations.

Three individual calibrations (one for each API) were developed, which makes it possible to account for small deviations from linearity of a single API over the studied concentration range.

The correlation between sample concentrations in each calibration set with DX (DSM) is given in Table 5.

Table 5 – The correlation between samples’ concentration for each calibration set with DX (DSM).

PA calibration set:
R2 | PA | PS | DX | Placebo
PA | 1.00 | - | - | -
PS | 0.01 | 1.00 | - | -
DX | 0.01 | 0.14 | 1.00 | -
Placebo | 1.00 | 0.01 | 0.01 | 1.00

PS calibration set:
R2 | PA | PS | DX | Placebo
PA | 1.00 | - | - | -
PS | 0.16 | 1.00 | - | -
DX | 0.15 | 3.00E-04 | 1.00 | -
Placebo | 0.16 | 1.00 | 3.00E-04 | 1.00

DX calibration set:
R2 | PA | PS | DX | Placebo
PA | 1.00 | - | - | -
PS | 0.37 | 1.00 | - | -
DX | 0.29 | 0.18 | 1.00 | -
Placebo | 0.29 | 0.18 | 1.00 | 1.00
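The R2 of 1.00 between the varied API and the placebo in Table 5 follows from mixture closure: if the placebo makes up the remainder to 100% w/w and only one API is varied, the placebo moves in lockstep with it. The sketch assumes exactly that closure. The design values are hypothetical, and the PS nominal content of 5% is an arbitrary placeholder.

```python
import numpy as np

def r2_matrix(concentrations):
    """Squared Pearson correlation between the columns (constituents)."""
    return np.corrcoef(concentrations, rowvar=False) ** 2

rng = np.random.default_rng(1)
pa = np.linspace(77.0, 92.0, 15)               # varied by design (% w/w)
ps = 5.0 + 0.05 * rng.standard_normal(15)      # hypothetical nominal value plus weighing noise
dx = 2.7 + 0.05 * rng.standard_normal(15)      # nominal value plus weighing noise
placebo = 100.0 - pa - ps - dx                 # closure: each mixture sums to 100%

r2 = r2_matrix(np.column_stack([pa, ps, dx, placebo]))
print(np.round(r2, 2))
```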

The API concentration variation is easily detected in FT-NIR spectra (Figure 14).


Figure 14 – FT-NIR MSC spectra of each calibration set (with DX (DSM)). Concentration increases in the direction of the arrow: PA between 77.1% and 92.3% (a), PS between 0% and 10.2% (b) and DX between 0% and 5.4% (c).

In each calibration set, the measured absorbance is proportional to the API concentration (Beer’s law), and both increase in the direction of the arrow.
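For reference, Beer's law in its usual form is shown below, with the standard symbols: absorbance A, incident and transmitted intensities I0 and I, absorptivity ε, path length l and concentration c. When spectra are measured in diffuse reflectance, the apparent absorbance log10(1/R) is normally used instead, and proportionality to concentration only holds approximately; the constant k below is an empirical placeholder, not a quantity from this work.

```latex
A = \log_{10}\frac{I_0}{I} = \varepsilon\, l\, c
\qquad \text{(transmission)},
\qquad
A_{\mathrm{app}} = \log_{10}\frac{1}{R} \approx k\, c
\qquad \text{(diffuse reflectance)}
```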

3.3.1.1. Calibration vs. Test sets

The spectral data were split into two subsets: a calibration set (two thirds of all samples) and a test set (one third). The first set is used to build the model, while the second is used to test its predictions. Note that when a sample is chosen for the test set, all three of its replicates are assigned to that set. The choice of the calibration set is crucial: to ensure a robust model, it has to cover the maximum spectral variability observed in the scores plot (Figure 15), as well as the variability expected in future samples.
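The replicate rule can be sketched in a few lines. In this work the subsets were chosen from the scores plot. As a hypothetical stand-in, the sketch orders samples by mean concentration and sends every third sample to the test set, starting from the second sample so that both extremes stay in calibration. All replicates of a sample always go to the same set.

```python
import numpy as np

def split_keep_replicates(sample_ids, y, every=3):
    """Boolean masks (calibration, test) over spectra, keeping replicates together.

    Samples are ordered by their mean reference value and every `every`-th
    sample (starting with the second) is assigned to the test set.
    """
    sample_ids = np.asarray(sample_ids)
    y = np.asarray(y, dtype=float)
    unique_ids = np.unique(sample_ids)
    mean_y = np.array([y[sample_ids == s].mean() for s in unique_ids])
    ordered = unique_ids[np.argsort(mean_y)]
    test_ids = ordered[1::every]                 # extremes remain in calibration
    test = np.isin(sample_ids, test_ids)
    return ~test, test

# Hypothetical data: 9 samples, each measured as 3 replicate spectra
ids = np.repeat(np.arange(9), 3)
conc = np.repeat(np.linspace(77.1, 92.3, 9), 3)
cal, test = split_keep_replicates(ids, conc)
print(ids[test])
```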


Figure 15 – Scores plot of PA (with DX (DSM)) samples with the selected calibration and test sets based on NIR MSC and Mean Centering pre-processed spectra.

Regardless of the pre-processing used in this step, the selected subsets were almost the same.

3.3.1.2. Data pre-processing

Three approaches to pre-processing the NIR spectra were applied in this work: MSC, and first-derivative (1st D) and second-derivative (2nd D) spectra computed with the Savitzky-Golay (SG) algorithm using a 21-point moving window and a second-order polynomial.
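The pre-processing chain (MSC, then an SG derivative, then mean centering) can be sketched in NumPy. Derivatives here are taken per data point, not per cm-1, and the spectrum edges are handled by repeating the end values. That edge rule is a simplification and may differ from the software used in this work. The spectra are hypothetical scaled and offset copies of one curve, so MSC should make them identical.

```python
import numpy as np
from math import factorial

def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean (or a given) spectrum."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty(spectra.shape, dtype=float)
    for i, spectrum in enumerate(spectra):
        slope, offset = np.polyfit(ref, spectrum, 1)   # spectrum ≈ offset + slope * ref
        corrected[i] = (spectrum - offset) / slope
    return corrected

def savgol_derivative(spectra, window=21, polyorder=2, deriv=1):
    """Savitzky-Golay derivative of each row (per data point, edges padded by repetition)."""
    half = window // 2
    positions = np.arange(-half, half + 1)
    vander = np.vander(positions, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse gives the least-squares coefficient of x**deriv
    coeffs = np.linalg.pinv(vander)[deriv] * factorial(deriv)
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # np.convolve flips its kernel, so pass the reversed coefficients
    return np.array([np.convolve(row, coeffs[::-1], mode="valid") for row in padded])

def mean_center(spectra):
    """Subtract the mean spectrum (column means) of the data set."""
    return spectra - spectra.mean(axis=0)

rng = np.random.default_rng(0)
base = np.sin(np.linspace(0.0, 3.0, 200)) + 2.0
raw = np.array([a * base + b for a, b in rng.uniform(0.8, 1.2, size=(5, 2))])
processed = mean_center(savgol_derivative(msc(raw), window=21, polyorder=2, deriv=1))
```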

The calibration models were built by PLS, using the contiguous-block method for cross-validation.
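Contiguous-block cross-validation splits the samples, in their existing order and without shuffling, into consecutive blocks. Each block is predicted by a model built on the remaining blocks, and the squared errors are pooled into the RMSECV. The sketch uses a self-written NIPALS PLS1 rather than the toolbox used in this work. It assumes five blocks and a hypothetical two-constituent data set.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 by NIPALS on mean-centred data; returns (x_mean, y_mean, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # weight vector
        t = Xr @ w                             # scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = (yr @ t) / tt                      # y loading
        Xr, yr = Xr - np.outer(t, p), yr - c * t   # deflate X and y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean

def rmsecv_contiguous(X, y, n_lv, n_blocks=5):
    """RMSECV with contiguous (unshuffled) cross-validation blocks."""
    n = len(y)
    press = 0.0
    for block in np.array_split(np.arange(n), n_blocks):
        train = np.setdiff1d(np.arange(n), block)
        residuals = y[block] - pls1_predict(pls1_fit(X[train], y[train], n_lv), X[block])
        press += residuals @ residuals
    return float(np.sqrt(press / n))

rng = np.random.default_rng(0)
channels = np.arange(60)
pure_api = np.exp(-((channels - 20) / 5.0) ** 2)
pure_other = np.exp(-((channels - 40) / 6.0) ** 2)
conc_api = rng.uniform(0.0, 10.0, 30)
conc_other = rng.uniform(0.0, 10.0, 30)
X = np.outer(conc_api, pure_api) + np.outer(conc_other, pure_other)
X += 0.01 * rng.standard_normal(X.shape)
error_2lv = rmsecv_contiguous(X, conc_api, n_lv=2)
print(error_2lv)
```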

3.3.1.3. Variable selection

The calibration model can be built over the whole wavenumber range or over selected spectral ranges. In the first case, the model uses all the spectral information but is also susceptible to interferences and noise. The second procedure, on the other hand, simplifies the calibration model and can yield a more precise model, since it focuses only on the variables that respond significantly to changes in API concentration.

In this strategy, three variable selection techniques were applied: Coefficient of Determination, iPLS and Genetic Algorithm (GA).


The Coefficient of Determination (R2) measures the correlation between the NIR spectral information at each wavenumber (X variable) and the API concentration (Y variable). It was calculated at each wavenumber with a function developed in Matlab; the result is shown in Figure 16.


Figure 16 – Coefficient of Determination (R2) versus wavenumber of PA calibration set (with DX (DSM)).

This coefficient varies between 0 and 1, and the best-correlated spectral regions have the highest values. As the variable selection criterion, all wavenumber ranges with a coefficient above 0.7 were retained.
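The R2 criterion can be sketched in NumPy. This is not the Matlab function used in this work. The sketch computes R2 between each spectral variable and the concentration, then merges consecutive variables above 0.7 into ranges. The example data are constructed so that only ten hypothetical variables, 5000-5450 cm-1, track the concentration; all other columns are even in the sample index and therefore uncorrelated with it.

```python
import numpy as np

def r2_per_variable(X, y):
    """Squared Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt(yc @ yc))
    return r ** 2

def select_ranges(wavenumbers, r2, threshold=0.7):
    """Contiguous (first, last) wavenumber ranges whose R2 exceeds the threshold."""
    ranges, start = [], None
    for i, keep in enumerate(r2 > threshold):
        if keep and start is None:
            start = i
        if not keep and start is not None:
            ranges.append((float(wavenumbers[start]), float(wavenumbers[i - 1])))
            start = None
    if start is not None:
        ranges.append((float(wavenumbers[start]), float(wavenumbers[-1])))
    return ranges

n = 20
idx = np.arange(n, dtype=float) - 9.5
y = 80.0 + idx                                    # hypothetical concentrations
wn = 4000.0 + 50.0 * np.arange(101)               # hypothetical wavenumber axis (cm-1)
X = np.tile((idx ** 2)[:, None], (1, wn.size))    # even in idx, so uncorrelated with y
X[:, 20:30] = 0.01 * y[:, None]                   # variables that track the concentration
ranges = select_ranges(wn, r2_per_variable(X, y))
print(ranges)
```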

In iPLS, the spectra were split into smaller equal-width intervals, a PLS regression model was developed for each interval, and the RMSECV of each interval model was compared with that of the global model (over the whole wavenumber range). The interval with the lowest model error was chosen.
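The iPLS procedure can be sketched as follows. The sketch assumes contiguous-block cross-validation with five blocks and a self-written NIPALS PLS1, not the toolbox used here. For each equal-width interval it keeps the number of LVs with the lowest RMSECV. In the synthetic data an API band occurs only in columns 40-49, so that interval should give the lowest error; whether an interval also beats the global model depends on the data.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 by NIPALS on mean-centred data; returns (x_mean, y_mean, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p, c = Xr.T @ t / tt, (yr @ t) / tt
        Xr, yr = Xr - np.outer(t, p), yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def rmsecv(X, y, n_lv, n_blocks=5):
    """RMSECV with contiguous cross-validation blocks."""
    n, press = len(y), 0.0
    for block in np.array_split(np.arange(n), n_blocks):
        train = np.setdiff1d(np.arange(n), block)
        x_mean, y_mean, b = pls1_fit(X[train], y[train], n_lv)
        press += np.sum((y[block] - ((X[block] - x_mean) @ b + y_mean)) ** 2)
    return float(np.sqrt(press / n))

def ipls(X, y, n_intervals=20, max_lv=5):
    """For each equal-width interval: (first column, last column, best LV, RMSECV)."""
    results = []
    for cols in np.array_split(np.arange(X.shape[1]), n_intervals):
        errors = [rmsecv(X[:, cols], y, a) for a in range(1, max_lv + 1)]
        best = int(np.argmin(errors))
        results.append((int(cols[0]), int(cols[-1]), best + 1, errors[best]))
    return results

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 10.0, 30)                          # hypothetical API concentrations
other = rng.uniform(0.0, 10.0, 30)                      # hypothetical interferent
X = np.outer(other, np.linspace(0.5, 1.5, 100))         # broad interferent everywhere
X[:, 40:50] += np.outer(y, np.hanning(10) + 0.1)        # API band only in columns 40-49
X += 0.05 * rng.standard_normal(X.shape)

intervals = ipls(X, y, n_intervals=10)
best_interval = min(intervals, key=lambda r: r[3])
global_rmsecv = min(rmsecv(X, y, a) for a in range(1, 6))   # compare, as in Figure 17
print(best_interval, global_rmsecv)
```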

For each calibration set, several combinations of pre-treatments and spectral intervals were studied, and the best combination was chosen. Figure 17 shows an example for the DX calibration set with MSC, second-derivative and mean-centering pre-processed spectra.


Figure 17 – iPLS results for DX (DSM) calibration set.

As can be seen in Figure 17, the pre-processed spectra were split into 20 intervals, and the optimal number of PLS components for each interval is indicated at the bottom of each vertical bar. The RMSECV of the global model with 4 LVs is represented by a dotted line.

The fifth interval was then investigated in more detail, since it had a lower RMSECV than the global model: the 5022.3-5269.1 cm-1 range gives a better calibration than the whole wavenumber range.

Figure 18 – Optimal spectral region selected by iPLS for the pre-processed spectra.

(Figure panels: response versus wavenumber for interval number 5, 5022.27-5269.14 cm-1; RMSECV versus interval number, where the dotted line is the RMSECV of the global model with 4 LVs and the optimal numbers of LVs of the 20 interval models are 5 1 5 2 3 3 4 2 4 2 4 3 9 2 1 3 4 3 2 2.)

Interval 5 had the lowest RMSECV, using 3 PLS components (more details on this calibration are given in Table 8). Furthermore, this interval corresponds to the C=O second-overtone absorption band of DX, shown in Figure 5.

This method gives an overview of the spectral data and highlights the spectral regions of interest for PLS calibration, but it only provides information about each interval individually.

In this strategy, a Genetic Algorithm (GA) was also used for variable selection in PLS regression, on the same calibration sets.

GA is a stochastic technique inspired by natural selection, which searches for the variable subset that gives the PLS model with the lowest RMSECV.

For each calibration set, several pre-treatments and GA parameter settings were studied, and the best combination was chosen. The GA for the DX (DSM) calibration set was run with the following parameters:

Table 6 – The best GA parameters chosen for the DX (DSM) calibration set.

Parameter | Value
Population size | 64
Maximum generations | 100
Mutation rate | 0.005
Window width⁵ | 10
Convergence | 50
Initial terms | 30
Crossover | Double
Regression | PLS
Maximum LV | 8
Cross-validation | Contiguous

In this case, the GA used PLS regression with a maximum of 8 LVs, to avoid overfitting, and the contiguous-block method for cross-validation. MSC, second-derivative and mean-centering pre-processing was applied.

The algorithm started with the random generation of an initial population of 64 individuals, each represented by a chromosome that encodes which variable windows are included in the model. At each generation, the fitness of each chromosome was evaluated by its RMSECV, and the half of the individuals with the worst results was discarded. Offspring were then created with genetic operators, namely double crossover and mutation.

5 This parameter indicates how many adjacent variables should be grouped together at a time.


In double crossover, the chromosomes of two randomly chosen individuals are cut at two random points and the segments between the cuts are exchanged, creating two new individuals.

Mutation consists of changing randomly chosen bits of a chromosome from their original state, at the mutation rate of 0.005 (0.5%) given in Table 6.

The GA run ends when the convergence criterion (50% of the chromosomes identical) is met or when 100 generations have been reached; otherwise, the cycle of evaluation and offspring generation is repeated until one of these termination conditions is satisfied.
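One generation as described above (discard the worse half, refill with double-crossover offspring, mutate) can be sketched in Python. This is an illustrative reading of the procedure, not the toolbox implementation. The chromosome is a boolean mask over variable windows, and `toy_fitness` is a hypothetical stand-in for the RMSECV of a PLS model. Only the population size (64) and mutation rate (0.005) come from Table 6; the convergence criterion is not modelled.

```python
import numpy as np

def two_point_crossover(parent_a, parent_b, rng):
    """Double crossover: exchange the segment between two random cut points."""
    i, j = sorted(rng.choice(np.arange(1, parent_a.size), size=2, replace=False))
    child_a, child_b = parent_a.copy(), parent_b.copy()
    child_a[i:j], child_b[i:j] = parent_b[i:j], parent_a[i:j]
    return child_a, child_b

def mutate(chromosome, rate, rng):
    """Flip each gene (window in or out) independently with probability `rate`."""
    return np.where(rng.random(chromosome.size) < rate, ~chromosome, chromosome)

def next_generation(population, fitness, rate, rng):
    """Keep the better half (lower fitness, e.g. RMSECV) and refill with offspring."""
    scores = np.array([fitness(c) for c in population])
    survivors = population[np.argsort(scores)[: len(population) // 2]]
    n_children = len(population) - len(survivors)
    children = []
    while len(children) < n_children:
        a, b = survivors[rng.choice(len(survivors), size=2, replace=False)]
        children.extend(mutate(c, rate, rng) for c in two_point_crossover(a, b, rng))
    return np.vstack([survivors, np.array(children[:n_children])])

rng = np.random.default_rng(0)
n_windows = 30                                   # hypothetical number of variable windows
target = rng.random(n_windows) < 0.3             # hypothetical "ideal" subset

def toy_fitness(chromosome):
    """Hypothetical stand-in for RMSECV: number of windows differing from target."""
    return int(np.count_nonzero(chromosome != target))

population = rng.random((64, n_windows)) < 0.3   # about 30% of windows initially included
best_start = min(toy_fitness(c) for c in population)
for _ in range(20):
    population = next_generation(population, toy_fitness, 0.005, rng)
best_end = min(toy_fitness(c) for c in population)
print(best_start, best_end)
```

Because the better half always survives, the best fitness can never get worse from one generation to the next.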

During each run, a window displays the progress of the GA. Figure 19 shows the last generation of the GA run.

(Figure panels: (a) Fitness vs. number of windows at generation 20; (b) Evolution of average and best fitness; (c) Evolution of number of windows; (d) Models with window at generation 20.)

Figure 19 – Diagnostic plots of GA analysis: Fitness vs. Number of variables (a); Evolution of average and best fitness (b); Evolution of number of variables (c); and Models with variable number (d)

In the fitness vs. number of variables plot (a), the fitness of each individual at generation 20 is shown as green circles.

In the evolution of average and best fitness plot (b), the dashed line represents the RMSECV obtained using all variables. Over the generations, the best and average fitness curves converge towards a lower RMSECV value.

The evolution of number of variables plot (c) shows the average number of variables used in each generation.

The last plot (d) shows the variables selected at generation 20.

Because GA is stochastic, each run selected a different set of variables. Therefore, ten runs were performed for each calibration, and the variables selected in at least five of them were retained.
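Combining the ten GA runs amounts to counting how often each variable was selected. A minimal sketch, with hypothetical selection masks for ten runs over six variables:

```python
import numpy as np

def consensus_variables(runs, min_count=5):
    """Indices of variables selected in at least `min_count` GA runs.

    runs: boolean array (n_runs, n_variables), True where a run kept the variable.
    """
    counts = np.asarray(runs, dtype=bool).sum(axis=0)
    return np.flatnonzero(counts >= min_count)

# Hypothetical selections: variable 0 kept in 10 runs, 1 in 5, 2 in 4, 3 in 0, 4 in 7, 5 in 1
runs = np.zeros((10, 6), dtype=bool)
runs[:, 0] = True
runs[:5, 1] = True
runs[:4, 2] = True
runs[:7, 4] = True
runs[:1, 5] = True
print(consensus_variables(runs))
```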


3.3.1.4. Number of PCs

The number of latent variables (LVs) was chosen at the minimum PRESS. As Figure 20 shows, the error for the PA calibration set (with DX (DSM), using MSC, first-derivative and mean-centering pre-processed spectra) reaches a minimum at 3 LVs, so the PLS model was built with 3 LVs, which give the lowest RMSECV. With fewer than 3 LVs, too little information is included in the model; with more, too much information (including noise) is added.
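Choosing the number of LVs at the minimum of the cross-validation error can be sketched as a scan over 1 to `max_lv`, reusing a self-written NIPALS PLS1 with contiguous blocks (an assumption, as above). The hypothetical data contain three constituents, so at least three LVs are needed before the error drops to the noise level.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 by NIPALS on mean-centred data; returns (x_mean, y_mean, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p, c = Xr.T @ t / tt, (yr @ t) / tt
        Xr, yr = Xr - np.outer(t, p), yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def rmsecv(X, y, n_lv, n_blocks=5):
    """RMSECV with contiguous cross-validation blocks."""
    n, press = len(y), 0.0
    for block in np.array_split(np.arange(n), n_blocks):
        train = np.setdiff1d(np.arange(n), block)
        x_mean, y_mean, b = pls1_fit(X[train], y[train], n_lv)
        press += np.sum((y[block] - ((X[block] - x_mean) @ b + y_mean)) ** 2)
    return float(np.sqrt(press / n))

def choose_n_lv(X, y, max_lv=10, n_blocks=5):
    """Number of LVs at the minimum RMSECV, plus the whole RMSECV curve."""
    errors = [rmsecv(X, y, a, n_blocks) for a in range(1, max_lv + 1)]
    return int(np.argmin(errors)) + 1, errors

rng = np.random.default_rng(2)
channels = np.arange(60)
bands = np.array([np.exp(-((channels - c) / 6.0) ** 2) for c in (15, 30, 45)])
conc = rng.uniform(0.0, 10.0, size=(30, 3))     # hypothetical API plus two other constituents
X = conc @ bands + 1e-4 * rng.standard_normal((30, 60))
best_lv, errors = choose_n_lv(X, conc[:, 0], max_lv=8)
print(best_lv, np.round(errors, 4))
```

In practice a flat RMSECV curve after the minimum is often resolved in favour of the smaller model, to limit overfitting.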


Figure 20 – PRESS for PLS on PA calibration set (with DX (DSM)) data based on MSC, 1st derivative and Mean Centering pre-processing spectra.

This choice was also supported by the variance captured in Y as shown in Figure 21.

Figure 21 – Analysis showing PLS Model of PA calibration set (with DX (DSM)).


With 3 LVs, this model captures 98.77% of the cumulative variance in Y, which is a very high proportion.

3.3.1.5. Outliers

A sample that differs significantly from the rest of the calibration set may be an outlier and, if so, should be removed. However, not all outliers are erroneous, so outlier detection has to be done carefully to avoid removing samples that are representative for the model.

Several techniques exist for outlier detection; in this study, leverage combined with Y-studentized residuals, and Q residuals, were used consistently.

Leverage measures the influence a sample has on the model, while the Y-studentized residual indicates the lack of fit of a sample’s y-value [19].


Figure 22 – Studentized Residuals versus Leverage for PA calibration set (with DX (DSM)).

Sample 34 had a studentized residual of about 1.5 and a very high leverage (about 0.255). This suggested that sample 34 was an outlier, which was checked by plotting the Q residuals for each sample.


Figure 23 – Q residuals versus sample for PA calibration set (with DX (DSM)).

As can be seen in Figure 23, sample 34 had a much higher Q residual than the rest of the calibration set, confirming that it was an outlier.
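Leverage and Q residuals can be sketched from a factor model of the spectra. In this work they were obtained from the PLS model. For simplicity, the sketch uses PCA scores of mean-centred X instead. Leverage includes a 1/n term for mean centring, so that all leverages sum to 1 plus the number of components. Studentized residuals are not computed here. In the hypothetical data, an unmodelled band is added to sample 34 (zero-based index), which should give it by far the largest Q residual.

```python
import numpy as np

def leverage_and_q(X, n_components):
    """Leverage and Q residual of each sample from a PCA model of mean-centred X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]      # scores
    P = Vt[:n_components].T                          # loadings
    # 1/n accounts for mean centring; the second term is the hat-matrix diagonal of T
    leverage = 1.0 / len(X) + np.einsum("ij,jk,ik->i", T, np.linalg.inv(T.T @ T), T)
    q = ((Xc - T @ P.T) ** 2).sum(axis=1)            # variation not explained by the PCs
    return leverage, q

rng = np.random.default_rng(0)
channels = np.arange(80)
band_a = np.exp(-((channels - 25) / 6.0) ** 2)
band_b = np.exp(-((channels - 55) / 6.0) ** 2)
X = np.outer(rng.uniform(1, 2, 40), band_a) + np.outer(rng.uniform(1, 2, 40), band_b)
X += 0.001 * rng.standard_normal(X.shape)
X[34] += 0.2 * np.exp(-((channels - 70) / 3.0) ** 2)   # unmodelled feature in one sample
leverage, q = leverage_and_q(X, n_components=2)
print(int(np.argmax(q)), round(float(leverage[34]), 3))
```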

Once detected, outliers were removed before building the final model.

3.3.1.6. Statistics

After the calibration model is built, the RMSECV and R2 are calculated to give a first evaluation of its performance by cross-validation on the calibration data. Finally, the predictive ability of the model is measured by the RMSEP on the test set, i.e. samples not used in model development.
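For reference, the figures of merit used below are commonly defined as follows. Here n_c and n_t are the numbers of calibration and test samples, ŷ_{i,CV} is the prediction for calibration sample i from the model built without its cross-validation block, and ȳ is the mean reference value. RMSEC has the same form as RMSECV but uses fitted values. Some software packages divide by n_c - A - 1 (A = number of LVs) instead of n_c, so exact conventions may differ from those used to produce the tables.

```latex
\mathrm{RMSECV} = \sqrt{\frac{1}{n_c}\sum_{i=1}^{n_c}\left(\hat{y}_{i,\mathrm{CV}} - y_i\right)^2},
\qquad
\mathrm{RMSEP} = \sqrt{\frac{1}{n_t}\sum_{j=1}^{n_t}\left(\hat{y}_j - y_j\right)^2},
\qquad
R^2 = 1 - \frac{\sum_i \left(\hat{y}_i - y_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```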

(Plot of Y predicted versus Y measured for PA: R2 = 0.9863, RMSEC = 0.53486, RMSECV = 0.73284, RMSEP = 0.61823.)

Figure 24 – Correlation between measured and predicted PA calibration set (with DX (DSM)) [●: calibration set; ▼: validation set].


Figure 24 shows the performance of the PA calibration model developed on MSC, first-derivative and mean-centering pre-processed spectra over the whole wavenumber range. This model was built with 3 LVs and was selected as the best PA model without variable selection (more details are given in Table 7).

3.3.1.7. First strategy without variable selection

Several models without variable selection were developed, following the steps described above. The characteristics of the best models for each API are summarized in Table 7.

Table 7 – The best results for each calibration set without variable selection (using DX (DSM)).

Compound | Pre-processing | Range (cm-1) | LV | R2 | RMSECV (%) | RMSEP (%)
PA | MSC and MC | 3996.2-9003.1 | 4 | 0.99 | 0.77 | 0.6
PA | MSC, 1st D and MC | 3996.2-9003.1 | 3 | 0.99 | 0.73 | 0.62
PS | MSC and MC | 3996.2-9003.1 | 3 | 0.98 | 0.57 | 0.64
PS | MSC, 1st D and MC | 3996.2-9003.1 | 3 | 0.97 | 0.61 | 0.64
DX | MSC, 1st D and MC | 3996.2-9003.1 | 4 | 0.98 | 0.43 | 0.37
DX | MSC, 2nd D and MC | 3996.2-9003.1 | 4 | 0.98 | 0.42 | 0.37

The accuracy of each model can be assessed through its coefficient of determination (R2), RMSECV and RMSEP.

R2 is a statistical measure of how well the regression line approximates the real data points. An R2 of 1.0 indicates that the fitted model explains all the variability in y.

The RMSECV and RMSEP estimate the cross-validation and prediction errors of the model, and a good calibration should have low values of both. Small differences between RMSEP and RMSECV indicate that the model is robust not only for the observations in the calibration set but also for the prediction set. These small differences also suggest that the pre-processing applied to the raw data was appropriate.

The R2 calculated for each model was very high, between 0.97 and 0.99.

Although the first PA model has a slightly lower RMSEP, the difference between its RMSECV and RMSEP is larger than for the second model. The MSC, first-derivative and mean-centering model is therefore the best, with only 3 latent variables, an RMSECV of 0.73 and an RMSEP of 0.62.

For both PS and DX, the second model has the same RMSEP as the first, but its RMSECV and RMSEP are closer to each other; for PS these two values are almost equal. In this application, MSC, first-derivative and MC, and MSC, second-derivative and MC appear to give better PS and DX models, respectively, than the other pre-processing methods. The RMSECV and RMSEP are 0.61 and 0.64 for the PS model, and 0.42 and 0.37 for the DX model.

In general, the results obtained are good. Over the whole wavenumber range, besides capturing concentration variability, the model also includes irrelevant information and noise, which can be avoided with variable selection. With variable selection, only the regions containing relevant information are included in the model, which can improve its predictive performance.

New models with variable selection were therefore developed with the aim of obtaining better results.

3.3.1.8. First strategy with variable selection

For each calibration set, three variable selection methods (Coefficient of Determination, iPLS and Genetic Algorithm) were applied with the aim of identifying one or more regions where the concentration variation of each API was most significant. Several pre-treatments and variable subsets were tested, but only the models with the best predictive capabilities are shown in Table 8.

Table 8 – The best results obtained for each calibration set (using DX (DSM)) with variable selection⁶.

Method | Compound | Pre-processing | Range (cm-1) | LV | R2 | RMSECV (%) | RMSEP (%)
R2 | PA | MSC, 1st D and MC | 4196.8-4428.2; 4644.2-5400.3; 5786-5971.2; 6056-7259.5; 8023.3-8663.6 | 3 | 0.99 | 0.76 | 0.67
R2 | PS | MSC, 1st D and MC | 4158.2-4204.5; 4289.4-4412.8; 4482.2-4544; 4621.1-4698.3; 5045.4-5361.7; 5770.6-5832.3; 5978.9-6079.2; 6287.5-7282.7; 7467.8-7529.5; 8061.9-8231.6; 8447.6-8686.7; 8794.8-8833.3 | 2 | 0.98 | 0.57 | 0.62
R2 | DX | MSC, 2nd D and MC | 4844.8-4906.5; 5030-5346.3; 5678-5863.2; 6626.9-6657.8; 6765.8-7028.1 | 3 | 0.98 | 0.34 | 0.33
iPLS | PA | MSC, 2nd D and MC | 6688.6-7012.7 | 2 | 0.98 | 0.74 | 0.74
iPLS | PS | MSC and MC | 4003.9-4497.7 | 3 | 0.98 | 0.61 | 0.66
iPLS | DX | MSC, 2nd D and MC | 5022.3-5269.1 | 3 | 0.98 | 0.38 | 0.38
GA | PA | MSC, 2nd D and MC | Vide appendix 7.3 | 3 | 0.98 | 0.7 | 0.74
GA | PS | MSC, 2nd D and MC | Vide appendix 7.3 | 3 | | 0.54 | 0.56
GA | DX | MSC, 2nd D | Vide appendix 7.3 | | | |

6 For simplicity, the ranges selected by GA for each calibration set are not included in this table; they are available in appendix 7.3.
Good R2 values were obtained for all models. Although the PA model obtained with the R2 technique had the lowest RMSEP, the model based on iPLS variable selection was preferred, because its RMSECV and RMSEP were identical (both 0.74). This model, with MSC, second-derivative and mean-centering pre-processing and 2 PLS factors, performed better than the others.

The PS model with GA variable selection and the DX model with R2 variable selection were very robust, since their prediction errors were practically the same as their cross-validation errors. Both models performed best with MSC, second-derivative and mean-centering pre-processing and 3 latent variables. The PS model had an RMSECV of 0.54 and an RMSEP of 0.56, and the DX model 0.34 and 0.33, respectively.

In general, the models developed with variable selection gave better results than those built over the full wavenumber range (Table 9).

Table 9 – The best results obtained in the first strategy.

Compound | Without variable selection: RMSECV (%) | Without variable selection: RMSEP (%) | With variable selection: RMSECV (%) | With variable selection: RMSEP (%)
PA | 0.73 | 0.62 | 0.74 | 0.74
PS | 0.61 | 0.64 | 0.54 | 0.56
DX | 0.42 | 0.37 | 0.34 | 0.33

Thus, calibration models based on a small subset of wavenumbers have a lower prediction error (with the exception of PA) and are more accurate.

3.3.2. Second strategy

The first strategy minimizes the correlation between constituent concentrations in the three independent calibration sets (one for each API). However, this procedure is somewhat idealized, since it does not take into account interactions between APIs.

The first option considered was to pool all the spectra from the three calibration sets. However, during the development of one of the API models it became clear that this was not the best strategy, as can be seen in Figure 25.


Figure 25 – Correlation between measured and cross-validation predicted set of PA (with DX (DSM)).

There was no linear relationship between the measured and predicted values, so this data set could not be used to build an accurate calibration model.

A new strategy was then adopted: to each calibration set, the first three and last three spectra of each of the other two sets (i.e. the three replicates of their extreme samples) were added. In this way, the concentrations of all APIs were increased or decreased in each calibration set, taking into account interactions between APIs. The correlation between sample concentrations with DX (DSM) is given in Table 10 (the values for the other calibration sets are available in appendix 7.4).

Table 10 – The correlation between samples’ concentration for each calibration set with DX (DSM).

PA calibration set:
R2 | PA | PS | DX | Placebo
PA | 1.00 | - | - | -
PS | 9.00E-07 | 1.00 | - | -
DX | 3.00E-10 | 6.00E-07 | 1.00 | -
Placebo | 0.75 | 0.19 | 0.05 | 1.00

PS calibration set:
R2 | PA | PS | DX | Placebo
PA | 1.00 | - | - | -
PS | 8.00E-07 | 1.00 | - | -
DX | 1.00E-07 | 3.00E-05 | 1.00 | -
Placebo | 0.60 | 0.32 | 0.08 | 1.00

DX calibration set:
R2 | PA | PS | DX | Placebo
PA | 1.00 | - | - | -
PS | 2.00E-07 | 1.00 | - | -
DX | 6.00E-09 | 3.00E-05 | 1.00 | -
Placebo | 0.62 | 0.29 | 0.09 | 1.00

In this strategy, the correlation between the API whose concentration was varied and the placebo was lower than in the first strategy for every calibration set. However, the other APIs (whose concentrations were not varied by design) became more correlated with the placebo.

3.3.2.1. Second strategy without variable selection

Several calibration models were built on differently pre-processed data, and the best-performing models for each calibration set are summarized in Table 11.

Table 11 – The best results for each calibration set without variable selection (using DX (DSM)).

Compound | Pre-processing | Range (cm-1) | LV | R2 | RMSECV (%) | RMSEP (%)
PA | MSC and MC | 3996.2-9003.1 | 2 | 0.93 | 1.21 | 2.04
PA | MSC, 2nd D and MC | 3996.2-9003.1 | 3 | 0.94 | 1.16 | 2.10
PS | MSC, 1st D and MC | 3996.2-9003.1 | 5 | 0.93 | 0.77 | 1.08
PS | MSC, 2nd D and MC | 3996.2-9003.1 | 5 | 0.94 | 0.72 | 1.24
DX | MSC and MC | 3996.2-9003.1 | 6 | 0.95 | 0.36 | 0.43
DX | MSC, 1st D and MC | 3996.2-9003.1 | 6 | 0.94 | 0.37 | 0.42

Comparing the models, the lowest RMSECV values for the PA and PS calibrations were obtained with MSC, second-derivative and mean-centering pre-processing, while for the DX calibration they were obtained with MSC and mean centering.

The PS and DX models need 5 and 6 PLS factors, respectively, whereas the PA models require only 2 or 3 latent variables. Compared with the previous results, these models use more PLS factors without achieving better RMSECV or RMSEP values. The PA models appear to be underfitted, since the differences between RMSECV and RMSEP are significant; they may therefore not capture enough of the variability in the data.

3.3.2.2. Second strategy with variable selection

The results over the whole wavenumber range were not satisfactory. The same variable selection techniques were therefore applied, with the aim of finding the spectral regions specific to each API and obtaining better results. The models with the lowest RMSECV and RMSEP for each calibration set are summarized in Table 12.

Table 12 – The best results for each calibration set (using DX (DSM)) using variable selection techniques.

Method | Compound | Pre-processing | Range (cm-1) | LV | R2 | RMSECV (%) | RMSEP (%)
R2 | PA | MSC, 2nd D and MC | 4775.4-4991.4; 5099.4-5423.4; 6534.3-7182.4; 8216.2-8285.6 | 2 | 0.95 | 1.04 | 1.72
R2 | PS | MSC, 1st D and MC | 4335.7-4520.8 | 2 | 0.96 | 0.64 | 1.47
R2 | DX | MSC, 2nd D and MC | 4868-5091.7; 5608.6-5932.6 | 4 | 0.94 | 0.37 | 0.46
iPLS | PA | MSC, 2nd D and MC | 4767.7-5114.8 | 2 | 0.88 | 1.52 | 1.38
iPLS | PS | MSC, 2nd D and MC | 4505.4-4752.3 | 2 | 0.84 | 1.29 | 1.95
iPLS | DX | MSC, 2nd D and MC | 4844.8-5114.8 | 2 | 0.93 | 0.39 | 0.44
GA | PA | MSC, 2nd D and MC | 6873.8-7197.8; 7645.3-8177.6; 8393.6-8686.7 | 3 | 0.91 | 1.34 | 1.37
GA | PS | MSC, 2nd D and MC | 4150.5-4374.2; 4844.8-4991.4; 5384.9-5454.3; 5616.3-5685.7; 5847.7-6071.5; 6387.8-6457.2; 7159.2-7460.1; 7930.7-8000.1; 8162.1-8540.2 | 4 | 0.89 | 0.93 | 1.08
GA | DX | MSC, 1st D and MC | 5400.3-6194.9; 6310.6-7058.9; 7174.7-7753.3; 7923-8609.6 | 5 | 0.97 | 0.28 | 0.46

The best results, in terms of the lowest RMSEP and the similarity between prediction and cross-validation errors, were obtained for the PA and PS models with MSC, second-derivative and MC pre-processing and GA variable selection. The PA calibration gave an RMSECV of 1.34 and an RMSEP of 1.37 with 3 PLS factors, and the PS calibration an RMSECV of 0.93 and an RMSEP of 1.08 with 4 latent variables. The corresponding R2 values were 0.91 and 0.89, respectively.

With the same pre-treatment, the best DX model was built with iPLS variable selection. This model needs 2 latent variables and has an R2 of 0.93, an RMSECV of 0.39 and an RMSEP of 0.44.

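In conclusion, variable selection considerably improved the PA model (RMSEP decreased from 2.04-2.10 to 1.37), whereas for PS and DX the prediction errors were similar to those obtained over the whole wavenumber range (RMSEP of 1.08 and 0.44, compared with 1.08-1.24 and 0.42-0.43). In addition, the models developed in the first strategy gave better results than those of the second strategy, as can be seen in Table 13.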

Table 13 – The best results obtained in the first and second strategy.

Compound | First strategy: RMSECV (%) | First strategy: RMSEP (%) | Second strategy: RMSECV (%) | Second strategy: RMSEP (%)
PA | 0.74 | 0.74 | 1.34 | 1.37
PS | 0.54 | 0.56 | 0.93 | 1.08
DX | 0.34 | 0.33 | 0.39 | 0.44

3.4. Obtained results vs. other studies

A similar study of an analogous pharmaceutical formulation, covering two of the APIs studied here, was carried out by M. Alcalá [35]. Since independent calibration models were developed for each API in both studies, their results can be compared.

Table 14 – The best results obtained in the current study and in Alcalá’s study.

Parameter | Current study, PA | Current study, DX | Alcalá’s study, PA | Alcalá’s study, DX
% (w/w) | 84.7 | 2.7 | 6.5 | 0.2
Pre-processing | MSC, 2nd D and MC | MSC, 2nd D and MC | 1st D | 2nd D
Range (cm-1) | 6688.6-7012.7 | 4844.8-4906.5; 5030-5346.3; 5678-5863.2; 6626.9-6657.8; 6765.8-7028.1 | 4255-9090 | 4255-9090
LV | 2 | 3 | 4 | 5
RMSECV (%) | 0.74 | 0.34 | 2.7 | 1.7
RMSEP (%) | 0.74 | 0.33 | 2.2 | 3.3

In these two studies, the PA and DX calibration models were built with different pre-processing methods and spectral ranges. In the current work, both models needed fewer LVs and had lower and more similar RMSECV and RMSEP values than in Alcalá’s study, although the API contents of the two formulations differ (Table 14). These models may therefore be more robust. However, to confirm this, external validation and validation according to ICH (International Conference on Harmonisation), EMEA (European Medicines Agency) and PASG (Pharmaceutical Analytical Sciences Group) guidance must be carried out before the models are used in the pharmaceutical industry.

3.5. Orthogonal analysis

OPLS modelling was performed to establish the correlation between the NIR spectroscopy data of DX (X block) and its laser diffraction PSD (Y block).

This method separates the systematic variation in the X block into two parts: one that is linearly related to Y, and another that is unrelated (orthogonal) to Y. This separation facilitates model interpretation: the components related to Y are called predictive (p), and those unrelated to Y are called orthogonal (o).
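For a single Y variable, the separation can be sketched with the usual O-PLS filtering step. First take the PLS weight w. Then remove from the X loading the part along w, which gives an orthogonal weight. Finally subtract the resulting orthogonal component from X and compute the predictive score on the filtered data. This NumPy sketch is not the software used in this work. The synthetic data contain one y-related band and one band driven by an unrelated variable z; by construction, each orthogonal score has zero covariance with y.

```python
import numpy as np

def opls_single_y(X, y, n_orthogonal=1):
    """Single-y O-PLS: strip Y-orthogonal components, then take one predictive score."""
    Xr = X - X.mean(axis=0)
    yc = y - y.mean()
    orthogonal_scores = []
    for _ in range(n_orthogonal):
        w = Xr.T @ yc
        w /= np.linalg.norm(w)                 # predictive weight
        t = Xr @ w
        p = Xr.T @ t / (t @ t)                 # loading of the predictive score
        w_o = p - (w @ p) * w                  # part of p orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = Xr @ w_o                         # orthogonal score (zero covariance with y)
        p_o = Xr.T @ t_o / (t_o @ t_o)
        Xr = Xr - np.outer(t_o, p_o)           # remove the orthogonal variation
        orthogonal_scores.append(t_o)
    w = Xr.T @ yc
    w /= np.linalg.norm(w)
    t_p = Xr @ w                               # predictive score on the filtered data
    return t_p, np.column_stack(orthogonal_scores), Xr

rng = np.random.default_rng(3)
channels = np.arange(50)
band_y = np.exp(-((channels - 15) / 5.0) ** 2)
band_z = np.exp(-((channels - 30) / 8.0) ** 2)
y = rng.uniform(0.0, 1.0, 25)                  # hypothetical Y (e.g. a PSD score)
z = rng.uniform(0.0, 1.0, 25)                  # hypothetical variation unrelated to y
X = np.outer(y, band_y) + np.outer(z, band_z) + 0.001 * rng.standard_normal((25, 50))
t_p, T_o, X_filtered = opls_single_y(X, y, n_orthogonal=1)
print(np.corrcoef(t_p, y)[0, 1] ** 2)
```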

First, to simplify the application of this technique, the three DX PSD descriptors [d(0.1), d(0.5) and d(0.9)] were compressed into a single LV by PCA. One LV described the PSD properly, since the fraction of the data explained by the model, R2 (X), and the fraction of variation predicted according to cross-validation, Q2, were 0.98 and 0.95, respectively.
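The compression step can be illustrated with the six DX lot values from Table 3. The sketch autoscales d(0.1), d(0.5) and d(0.9) and takes the first principal component by SVD. Because it uses the tabulated lot means rather than the individual measurements, it is not expected to reproduce R2 (X) = 0.98 exactly, and Q2 is not computed.

```python
import numpy as np

# d(0.1), d(0.5), d(0.9) (µm) of the six DX lots, copied from Table 3
psd = np.array([
    [69.57, 134.39, 251.33],
    [63.79, 128.23, 235.95],
    [66.90, 139.87, 252.73],
    [48.46, 105.87, 206.04],
    [42.88, 94.76, 187.44],
    [65.19, 135.55, 252.51],
])

# Autoscale so each percentile weighs equally, then PCA by SVD
Z = (psd - psd.mean(axis=0)) / psd.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
pc1_scores = U[:, 0] * s[0]        # one value per lot, usable as a single Y variable
print(f"R2X of the first component: {explained[0]:.3f}")
```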

Next, OPLS models were developed for two different pre-processing options.

Table 15 – OPLS summary results for different pre-processing techniques of DX samples.

Data set | R2 (X)p | R2 (Y)p | Q2p | LV p | R2 (X)o | R2 (Y)o | Q2o | LV o
Without pre-processing | 0.712 | 0.526 | 0.459 | 1 | 0.283 | 0.144 | 0.096 | 1
Second derivative⁷ | 0.299 | 0.302 | 0.285 | 1 | 0.701 | 0.679 | 0.468 | 4

In the first model, without any pre-processing, physical properties such as the DX particle size are more evident in the spectra, as mentioned above. In this model, the PSD information (Y) was therefore predicted from the physical information contained in the spectral data (X). As expected, the predictive part is much more influential than the orthogonal part, because the X and Y blocks are strongly related: the predictive explained variance is much higher than the orthogonal one for the same number of latent variables (Q2p = 0.459 versus Q2o = 0.096).

In the second model, the spectral data (X) were pre-processed with the second derivative, which minimises the physical effects and emphasises the chemical properties. Consequently, the opposite of the first model occurs: the orthogonal part has more influence than the predictive part, as confirmed by the Q2 values (Q2o = 0.468 versus Q2p = 0.285). Thus, the particle size information (Y) is almost unrelated to the spectral data (X).

7 The second derivative of the spectral data was calculated with the Savitzky-Golay (SG) algorithm using a 15-point moving window.
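Footnote 7 fixes only the window width. A minimal NumPy sketch of a Savitzky-Golay second derivative follows, in which the polynomial order (2) and the point spacing (`delta = 1`) are assumptions, and the function name is hypothetical. Within each 15-point window it fits a polynomial by least squares and returns twice the quadratic coefficient, which is the second derivative at the window centre. SciPy's `scipy.signal.savgol_filter(y, 15, 2, deriv=2)` applies the same filter and also handles the edges; the version below simply drops edge points that lack a full window, to keep the weights explicit.

```python
import numpy as np

def savgol_second_derivative(spectrum, window=15, polyorder=2, delta=1.0):
    """Savitzky-Golay second derivative with a `window`-point moving window.

    polyorder=2 and delta (point spacing) are assumptions; the text only
    states the 15-point window. Edge points lacking a full window are dropped.
    """
    if window % 2 == 0 or window <= polyorder or polyorder < 2:
        raise ValueError("window must be odd and > polyorder, and polyorder >= 2")
    half = window // 2
    j = np.arange(-half, half + 1, dtype=float)
    # Least-squares fit over the window: y ~ a0 + a1*j + a2*j**2 + ...
    A = np.vander(j, polyorder + 1, increasing=True)
    # Row 2 of the pseudo-inverse yields a2; the second derivative at j = 0 is 2*a2.
    coeffs = 2.0 * np.linalg.pinv(A)[2] / delta ** 2
    # Slide the filter along the spectrum (one output per full window).
    return np.correlate(np.asarray(spectrum, dtype=float), coeffs, mode="valid")
```

Because the filter is a fixed set of weights, it removes constant offsets and linear baselines exactly. This is consistent with the text's statement that the second derivative suppresses physical (baseline-type) effects and emphasises chemical band shapes.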


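The residuals describe the fraction of the X and Y variation that is explained neither by the predictive nor by the orthogonal components (Table 16). The X residuals are practically zero for both data sets, which indicates that the spectra have a low noise level. The Y residual is 0.33 without pre-processing and 0.02 with the second derivative.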

Table 16 – The residuals in X and Y.

Data set                 Residual X  Residual Y
Without pre-processing   0.01        0.33
Second derivative        0.00        0.02
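The values in Table 16 are consistent with residual = 1 − (R2p + R2o), using the R2 values of Table 15. This relation is inferred here from the two tables and is not stated explicitly in the text. The arithmetic can be checked as follows:

```python
# R2 values copied from Table 15 (p = predictive, o = orthogonal).
table15 = {
    "Without pre-processing": {"R2Xp": 0.712, "R2Xo": 0.283, "R2Yp": 0.526, "R2Yo": 0.144},
    "Second derivative": {"R2Xp": 0.299, "R2Xo": 0.701, "R2Yp": 0.302, "R2Yo": 0.679},
}

def residuals(row):
    """Fraction of X and Y variation left after the predictive and orthogonal parts."""
    res_x = 1.0 - (row["R2Xp"] + row["R2Xo"])
    res_y = 1.0 - (row["R2Yp"] + row["R2Yo"])
    return res_x, res_y

for name, row in table15.items():
    res_x, res_y = residuals(row)
    print(f"{name}: residual X = {res_x:.3f}, residual Y = {res_y:.3f}")
```

Rounded to two decimals, these match Table 16 (0.01 and 0.33 without pre-processing, 0.00 and 0.02 for the second derivative).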


4. CONCLUSIONS

Near Infrared (NIR) spectroscopy, in combination with chemometrics, enables quantitative analysis of a pharmaceutical preparation. The purpose of this work was to accurately quantify the concentration of each of the three Active Pharmaceutical Ingredients (APIs) in a solid pharmaceutical dosage form.

NIR spectroscopy offers major advantages over conventional methods, because it requires no sample preparation and is non-destructive. It can also distinguish between the chemical and physical properties of samples.

To learn more about the APIs, the chemical properties of each API were studied by NIR spectroscopy using second-derivative pre-processing (which minimises the physical effects). Comparing the FT-NIR absorption spectra of the three APIs (Figure 7) with their chemical structures (Figure 8), the overlap of several absorption bands was evident, especially between Paracetamol (PA) and Pseudoephedrine Hydrochloride (PS). This can be explained by the similarity of some of the APIs' functional groups. This lack of selectivity can be overcome with multivariate chemometric techniques, such as Partial Least Squares (PLS), combined with variable selection. The quantitative determination of API content in the pharmaceutical product was therefore developed using PLS regression, overcoming the overlap between the PA and PS absorption bands.
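For readers unfamiliar with PLS, the sketch below is a textbook PLS1 (single-response) regression using the NIPALS algorithm in NumPy. It is an illustration, not the PLSplus IQ [17] or iToolbox [18] software cited in the references. Mean centring corresponds to the "MC" step in the tables; MSC and derivative pre-processing are assumed to have been applied to X beforehand. The function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 regression by NIPALS (textbook sketch).

    X: (n, p) pre-processed spectra; y: (n,) API concentrations.
    Returns (coef, x_mean, y_mean) so that y_hat = (X - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # mean centring (MC)
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)               # weight vector
        t = Xk @ w                           # scores of this latent variable
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    # Regression vector expressed for the original (centred) X variables.
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

Because each latent variable is built from the covariance between the whole spectrum and the concentration, bands shared by two APIs do not prevent the model from finding a direction specific to one of them. This is why overlapping PA and PS bands can still be handled.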

The differences between the physical properties of the samples were identified by NIR spectroscopy (without any pre-processing) and by powder laser diffraction. According to both techniques, the Particle Size Distributions (PSD) of PA and Dextromethorphan Hydrobromide (DX) were more similar to each other than to that of PS. As Figure 13 shows, PA and DX had Gaussian distributions, while PS had a bimodal distribution, indicating a heterogeneous PSD.

Powder laser diffraction is a fast and useful analytical tool for the particle characterization of the API batches, with adequate precision over a wide particle size range.
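The d(0.1), d(0.5) and d(0.9) values used in Section 3.5 are the diameters below which 10 %, 50 % and 90 % of the sample volume lies. As an illustration, they can be read off a volume-weighted distribution by interpolating its cumulative curve. The linear interpolation and the class-edge layout are assumptions, and the function name is hypothetical. The Mastersizer 2000 software [22] may interpolate differently, for example on a logarithmic size axis.

```python
import numpy as np

def d_values(edges_um, volume_pct, fractions=(0.1, 0.5, 0.9)):
    """Percentile diameters from a volume-weighted PSD (minimal sketch).

    edges_um: size-class edges in µm, ascending (one more than the classes);
    volume_pct: % of sample volume in each class.
    Linear interpolation on the cumulative curve is an assumption.
    """
    edges = np.asarray(edges_um, dtype=float)
    vol = np.asarray(volume_pct, dtype=float)
    if edges.size != vol.size + 1:
        raise ValueError("need one more class edge than volume values")
    # Cumulative undersize fraction at each class edge, starting from 0.
    cumulative = np.concatenate(([0.0], np.cumsum(vol))) / vol.sum()
    return {f"d({f})": float(np.interp(f, cumulative, edges)) for f in fractions}
```

For a bimodal distribution such as that of PS, these three numbers alone hide the two modes, which is one reason the full distribution (Figure 13) was also examined.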

For the NIR calibrations, an experimental design with low correlation between concentrations was created to guarantee a calibration set suitable for building robust models. Three calibration sets of laboratory samples were prepared; in each, only the concentrations of one API and of placebo were varied, by overdosing and underdosing. Each calibration set was used to predict only one API, so that only small deviations from linearity of that API over the studied concentration range had to be taken into account. This approach (the first strategy) solved the selectivity problems, because each calibration set captures the concentration variation of only one API.
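Tables 21 and 25 express the design criterion above as R2 between pairs of component concentrations. A minimal NumPy sketch of that calculation follows; the function name is hypothetical. It also shows why, in a first-strategy calibration set, the varied API and placebo reach R2 = 1: when only those two change and the tablet total is fixed, one is a linear function of the other.

```python
import numpy as np

def pairwise_r2(conc):
    """Squared Pearson correlation (R2) between every pair of columns of `conc`
    (samples x components, e.g. PA, PS, DX and placebo concentrations)."""
    r = np.corrcoef(np.asarray(conc, dtype=float), rowvar=False)
    return r ** 2
```

For example, with hypothetical PA levels taken from the first rows of Table 20 and placebo = 100 − 5.08 − 2.67 − PA, `pairwise_r2` returns R2 = 1 for the PA/placebo pair.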


The first strategy is rather idealised, since it does not take interactions between APIs into account. A second strategy was therefore considered, in which some spectra from the other calibration sets were added, so that the concentrations of all APIs were increased or decreased in each calibration set. However, although the correlation between the varied API and placebo decreased, the other APIs (whose concentrations were not varied) and placebo became more correlated. Consequently, the results obtained with the second strategy were worse than those of the first strategy.

Variable selection was also applied to find the spectral regions with the least non-linearity of response, i.e., to retain the spectral information of the components of interest and remove irrelevant or noisy signals. Better results were obtained with variable selection.
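Variable selection with the coefficient of determination (the "R2" method in Tables 17, 23 and 27) is not described in detail in this section. One plausible reading, sketched below as an assumption, keeps the wavenumbers whose absorbance has a high R2 with the API concentration. It then merges neighbouring wavenumbers into regions like those listed in the tables. The function name and the 0.9 threshold are hypothetical.

```python
import numpy as np

def r2_selected_regions(wavenumbers, spectra, conc, threshold=0.9):
    """Select spectral regions whose absorbance correlates with concentration.

    wavenumbers: (p,) axis; spectra: (n, p) pre-processed spectra;
    conc: (n,) API concentrations. Variables with R2 >= threshold are kept
    and adjacent kept variables are merged into (start, end) regions.
    The threshold of 0.9 is a hypothetical default.
    """
    wn = np.asarray(wavenumbers, dtype=float)
    A = np.asarray(spectra, dtype=float)
    c = np.asarray(conc, dtype=float)
    Ac = A - A.mean(axis=0)
    cc = c - c.mean()
    with np.errstate(divide="ignore", invalid="ignore"):
        # Pearson r between every column of A and the concentration vector.
        r = (Ac.T @ cc) / (np.linalg.norm(Ac, axis=0) * np.linalg.norm(cc))
    keep = np.nan_to_num(r ** 2) >= threshold   # constant columns give NaN and are dropped
    regions, start = [], None
    for i, k in enumerate(keep):
        if k and start is None:
            start = i                           # a new region begins
        if start is not None and (not k or i == len(keep) - 1):
            end = i if k else i - 1             # last kept index of the region
            regions.append((float(wn[start]), float(wn[end])))
            start = None
    return regions
```

The selected regions would then be passed to the PLS calibration in place of the full 3996.2-9003.1 cm⁻¹ range.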

The prediction ability of the proposed methods is summarized in Table 17.

Table 17 – The best results for each calibration model, using PLS regression, with variable selection (using DX (DSM)).

Compound  Method  Pre-processing     LV  R2    RMSECV (%)  RMSEP (%)  Range (cm⁻¹)
PA        iPLS    MSC, 2nd D and MC  2   0.98  0.74        0.74       6688.6-7012.7
PS        GA      MSC, 2nd D and MC  3   0.98  0.54        0.56       See Appendix 7.3 (Table 24)
DX        R2      MSC, 2nd D and MC  3   0.98  0.34        0.33       4844.8-4906.5; 5030-5346.3; 5678-5863.2; 6626.9-6657.8; 6765.8-7028.1

These three calibration models gave the lowest RMSECV and RMSEP values, together with high R2 values.
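RMSECV and RMSEP, reported in Tables 17 and 22 to 27, are both root mean squared errors in % (w/w). They differ only in which predictions are compared with the reference values: cross-validation predictions of the calibration samples, or predictions of an independent test set. A minimal sketch follows, assuming the usual definition RMSE = sqrt(Σ(ŷᵢ − yᵢ)² / n); the function name is hypothetical.

```python
import numpy as np

def rmse(y_ref, y_pred):
    """Root mean squared error, in the units of y (here % w/w).

    RMSECV: y_pred are cross-validation predictions of the calibration samples.
    RMSEP:  y_pred are predictions of an independent test set.
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_ref) ** 2)))
```

For example, `rmse(y_test, y_test_pred)` gives RMSEP, where `y_test` and `y_test_pred` are hypothetical arrays of reference and predicted concentrations.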

The calibration models developed can be used for quality control of the studied pharmaceutical formulation. To demonstrate the potential of this method, the absolute error of each API in a tablet was calculated.

Table 18 – The weight percentage of each API, the corresponding RMSEP obtained with the best calibration model, and the resulting weight of each active ingredient in a tablet.

Compound  % (w/w)  RMSEP (%)  Weight (mg per tablet)
PA        84.75    0.74       500 ± 4
PS        5.08     0.56       30 ± 3
DX        2.67     0.33       16 ± 2

As Table 18 shows, this technique quantifies each API with small absolute errors.
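Table 18 converts each RMSEP (% w/w) into an absolute error per tablet. The tabulated values are consistent with multiplying each percentage by a tablet mass of about 590 mg. That mass is inferred here from the PA content (500 mg at 84.75 % w/w) and is not stated in the text. The arithmetic can be reproduced as follows:

```python
# Nominal content and RMSEP (both % w/w) taken from Table 18.
nominal_pct = {"PA": 84.75, "PS": 5.08, "DX": 2.67}
rmsep_pct = {"PA": 0.74, "PS": 0.56, "DX": 0.33}

# Assumption: tablet mass inferred from the 500 mg PA content, 500 / 0.8475 ≈ 590 mg.
tablet_mg = 500 / (nominal_pct["PA"] / 100)

results = {}
for api, pct in nominal_pct.items():
    content_mg = pct / 100 * tablet_mg              # nominal API mass per tablet
    error_mg = rmsep_pct[api] / 100 * tablet_mg     # RMSEP expressed in mg per tablet
    results[api] = (content_mg, error_mg)
    print(f"{api}: {content_mg:.0f} ± {error_mg:.0f} mg")
# PA: 500 ± 4 mg
# PS: 30 ± 3 mg
# DX: 16 ± 2 mg
```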


In addition, the results were compared with those of a literature study on a similar pharmaceutical formulation [35].

Table 19 – The accuracy obtained in the current study and in Alcalá’s study [35].

Compound  Current study (mg)  Alcalá’s study
PA        500 ± 4             650 ± 22
DX        16 ± 2              20 ± 33

The PA and DX models built in the current study have smaller absolute errors than those reported in Alcalá’s study (±4 versus ±22 for PA, and ±2 versus ±33 for DX). The models developed here are therefore more accurate than those developed by Alcalá.

The OPLS model shows how well the Y and X variables, in this case the powder laser diffraction and NIR spectroscopy data respectively, are correlated. As mentioned above, different pre-treatments emphasise either the physical or the chemical properties of the samples in the NIR spectra. Without any pre-processing, the powder laser diffraction data can be predicted from the NIR spectra (based on the physical properties of the samples), since a relatively high Q2p (0.459) was obtained. However, when second-derivative pre-processing was used, X and Y were, as expected, much less related, because the NIR spectra then reflect mainly the chemical properties of the samples. Consequently, Q2o is higher than Q2p (0.468 and 0.285, respectively).

In conclusion, these two powerful techniques can be used in parallel for the quantification and quality control of solid dosage formulations in the pharmaceutical industry.


5. SUGGESTIONS FOR FUTURE WORK

In this study, a quantitative method for the three APIs in a solid dosage formulation was developed with NIR spectroscopy and chemometric techniques, to ensure the quality control of the pharmaceutical product. Good results were obtained for the multivariate calibration models; however, their accuracy must still be validated with external samples and according to ICH, EMEA and PASG guidance.

For the quantitative analysis, three independent sets of laboratory samples were produced, one for each API. In each calibration set, the concentrations of the selected API and of placebo were varied by overdosing and underdosing. Instead of three sets of laboratory samples, a single set could be prepared in which the placebo and all API concentrations are varied randomly (avoiding correlation between APIs). Such a set would also have greater concentration variability and more samples, so that all three calibration models could be built from it alone.

In this work, the potential of NIR spectroscopy combined with chemometrics, and of laser diffraction, was studied. The combination of both techniques proved to be suitable for quality control of the end product. These tools could also be used to monitor and control the manufacturing process in real time, in line with the PAT initiative. During granulation, for example, powder laser diffraction could reduce or eliminate the potential risk of particle segregation within the product, while NIR spectroscopy could assure the quality control of the end product.


6. REFERENCES

[1] Guidance for Industry. PAT – a framework for innovative pharmaceutical manufacturing and quality assurance; U.S. Food and Drug Administration; Rockville, MD, USA; 2003

[2] http://www.asdi.com/nir-chart_grid_rev-3.pdf (July 2008)

[3] H. W. Siesler, Y. Ozaki, S. Kawata, H. M. Heise; Near-Infrared Spectroscopy: Principles, Instruments, Applications; Wiley-VCH; New York; 2002

[4] Barbara Stuart; Infrared Spectroscopy: Fundamentals and Applications; John Wiley & Sons, Ltd; New York; 2004

[5] J. Luypaert, D. L. Massart, Y. Vander Heyden; Near-infrared spectroscopy applications in pharmaceutical analysis; Talanta; Volume 72, Issue 3; 15 May 2007; Pages 865-883

[6] Matthias Otto; Chemometrics: Statistics and Computer Application in Analytical Chemistry; Wiley-VCH; Weinheim, Germany; 1999

[7] Daniel C. Harris; Quantitative Chemical Analysis – Third Volume; Fifth Edition; Freeman; New York; 1995

[8] Skoog, West, Holler; Analytical Chemistry: An Introduction – Sixth Edition; New York; 1997

[9] Emil W. Ciurczak, James K. Drennen III; Pharmaceutical and Medical Applications of Near-Infrared Spectroscopy; Marcel Dekker Inc.; New York; 2002

[10] Celio Pasquini; Near Infrared Spectroscopy: Fundamentals, Practical Aspects and Analytical Applications; J. Braz. Chem. Soc.; Volume 14 Nº2; São Paulo; March/April 2003

[11] Bernhard Lendl, Bo Karlberg; Advancing from unsupervised, single variable-based methods: A challenge for qualitative analysis; Trends in Analytical Chemistry; Volume 24 Nº6; 2005

[12] Katherine A. Bakeev; Process Analytical Technology: Spectroscopic Tools and Implementation Strategies for the Chemical and Pharmaceutical Industries; Blackwell Publishing; New York; 2005

[13] Tormod Næs, Tomas Isaksson, Tom Fearn, Tony Davies; A user-friendly guide to Multivariate Calibration and Classification; NIR Publications; Chichester, U.K.; 2002

[14] Yves Roggo, Pascal Chalus, Lene Maurer, Carmen Lema-Martinez, Aurélie Edmond, Nadine Jent; A review of Near Infrared spectroscopy and Chemometrics in pharmaceutical technologies; Journal of Pharmaceutical and Biomedical Analysis; Volume 44, Issue 3; 27 July 2007; Pages 683-700

[15] Mei-Lin Wu, You-Shao Wang; Using chemometrics to evaluate anthropogenic effects in Daya Bay, China; Estuarine, Coastal and Shelf Science; Volume 72, Issue 4; May 2007; Pages 732-742

[16] F. González, R. Pous; Quality control in manufacturing process by near infrared spectroscopy; Journal of Pharmaceutical and Biomedical Analysis; Volume 13 Nº4; April 1995; Pages 419-423

[17] PLSplus IQ user’s guide; Thermo Electron Corporation; Salem, NH, USA

[18] Lars Nørgaard; iToolbox Manual; July 2004; Denmark

[19] Matlab manual version 6.5; Mathworks Inc.; 2005

[20] B. Üstün, W. J. Melssen, M. Oudenhuijzen, L. M. C. Buydens; Determination of optimal support vector regression parameters by genetic algorithms and simplex optimization; Analytica Chimica Acta; Volume 544 Nº1-2; May 2005; Pages 292-305

[21] Yibin Ying, Yande Liu; Non-destructive measurement of internal quality in pear using genetic algorithms and FT-NIR spectroscopy; Journal of Food Engineering; Volume 84 Nº2; 2008; Pages 206-213

[22] Mastersizer 2000 user manual; Malvern Instruments; 2007

[23] Alan Rawle; Technical Paper: Basic Principles of Particle Size Analysis; Malvern Instruments; New York; 1995

[24] S. Sonja Sekulic, John Wakeman, Phil Doherty, Perry A. Hailey; Automated system for the on-line monitoring of powder blending processes using near-infrared spectroscopy: Part II. Qualitative approaches to blend evaluation; Journal of Pharmaceutical and Biomedical Analysis; Volume 17, Issue 8; 30 September 1998; Pages 1285-1309

[25] Application Note: Wet method development for laser diffraction measurements; Malvern Instruments

[26] Application Note: Method validation for laser diffraction measurements; Malvern Instruments

[27] http://www.chemspider.com/RecordView.aspx?id=1906 (July 2008)

[28] http://www.chemspider.com/RecordView.aspx?rid=13bf38e9-a8ac-4992-a826-191ff5964465 (July 2008)

[29] Weng Li Yoon, Roger D. Jee, Andrew Charvill, Gerard Lee, Anthony C. Moffat; Application of near-infrared spectroscopy to the determination of the sites of manufacture of proprietary products; Journal of Pharmaceutical and Biomedical Analysis; Volume 34, Issue 5; 20 March 2004; Pages 933-944

[30] A.P. Tinke, K. Vanhoutte, F. Vanhoutt, M. De Smet, H. De Winter; Laser diffraction and image analysis as a supportive analytical tool in the pharmaceutical development of immediate release direct compression formulations; International Journal of Pharmaceutics; Volume 297 Nº1-2; 2005; Pages 80-88

[31] Sample dispersion & Refractive index guide; man. 0079 version 3.1; Malvern Instruments; 1997

[32] http://www.dsm.com (August 2008)

[33] M. Blanco, A. Eustaquio, J. M. González, D. Serrano; Identification and quantitation assays for intact tablets of two related pharmaceutical preparations by reflectance near-infrared spectroscopy: validation of the procedure; Journal of Pharmaceutical and Biomedical Analysis; Volume 22 Nº1; 2000; Pages 139-148

[34] http://www.abb.com (July 2008)

[35] M. Blanco, M. Alcalá; Simultaneous quantitation of five principles in a pharmaceutical preparation: Development and validation of a near infrared spectroscopy method; European Journal of Pharmaceutical Sciences; Volume 27 Nº2-3; 2006; Pages 280-286

[36] Mattias Hedenström, Susanne Wiklund, Björn Sundberg, Ulf Edlund; Visualization and interpretation of OPLS models based on 2D NMR data; Chemometrics and Intelligent Laboratory Systems; Volume 92 Nº2; 2008; Pages 110-117

[37] Svante Wold, Johan Trygg, Anders Berglund, Henrik Antti; Some recent developments in PLS modeling; Chemometrics and Intelligent Laboratory Systems; Volume 58, Issue 2; 28 October 2001; Pages 131-150

[38] Jon Gabrielsson, Hans Jonsson, Christian Airiau, Bernd Schmidt, Richard Escott, Johan Trygg; OPLS methodology for analysis of pre-processing effect on spectroscopy data; Chemometrics and Intelligent Laboratory Systems; Volume 84, Issue 1-2; 1 December 2006; Pages 153-158

7. APPENDIX

7.1. Determination of Percent Relative Standard Deviation (%RSD)

Precision is often measured by the standard deviation of a set of measurements. The standard deviation s of a set of n repeat measurements is defined as

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \qquad (4) $$

where $x_i$ is a single measurement and $\bar{x}$ is the mean measurement.

A lower standard deviation indicates better precision of the set of repeat measurements.

The relative precision of two or more methods of measurement is compared by calculating their percent relative standard deviation (%RSD) from the standard deviation s and the mean measurement $\bar{x}$, according to the equation:

$$ \%\mathrm{RSD} = \frac{s}{\bar{x}} \times 100 \qquad (5) $$
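Equations 4 and 5 translate directly into code. The sketch below uses only the Python standard library; the function name and example values are illustrative, not measurements from this work.

```python
import math

def percent_rsd(values):
    """Sample standard deviation (eq. 4, n - 1 denominator) and %RSD (eq. 5)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two repeat measurements")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return s, 100.0 * s / mean

s, rsd = percent_rsd([10.0, 12.0, 14.0])   # s = 2.0, %RSD ≈ 16.67
```

The n − 1 denominator gives the sample (not population) standard deviation, which is appropriate when s is estimated from a small number of repeat measurements.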

7.2. Matrix design for laboratory samples

For the production of laboratory samples, a matrix design was created based on three independent calibration sets. In each calibration set, the concentration of the API was reduced by adding small amounts of placebo, as shown in Table 20.

Table 20 – Matrix design for laboratory samples⁸ (concentrations in % w/w).

Sample  PA    PS    DX
1       92.3  10.2  5.4
2       91.4  10.2  5.3
3       90.6  9.2   4.8
4       89.7  8.1   4.3
5       88.8  7.1   3.7
6       88.0  6.1   3.2
7       87.1  5.1   2.7
8       86.3  4.1   2.1
9       85.5  3.1   1.6
10      84.7  2.0   1.1
11      83.8  1.0   0.5
12      82.9  0.0   0.0
13      82.1  -     -
14      81.2  -     -
15      80.4  -     -
16      79.5  -     -
17      78.7  -     -
18      77.8  -     -
19      77.1  -     -

8 For each calibration set, only the concentrations of the API represented and of placebo were varied.

7.3. First Strategy

For simplicity, only the results of the calibration sets developed with DX (DSM) were presented in the Results and Discussion.

The correlation between pairs of components for each calibration set with DX (Divis) is shown in Table 21.

Table 21 – The correlation (R2) between samples’ concentrations for each calibration set with DX (Divis).

PA calibration set
R2       PA        PS    DX    Placebo
PA       1         -     -     -
PS       0.08      1     -     -
DX       0.09      0.01  1     -
Placebo  1         0.08  0.09  1

PS calibration set
R2       PA        PS    DX    Placebo
PA       1         -     -     -
PS       0.17      1     -     -
DX       5.00E-05  0.03  1     -
Placebo  0.17      1     0.03  1

DX calibration set
R2       PA        PS    DX    Placebo
PA       1         -     -     -
PS       0.35      1     -     -
DX       0.01      0.09  1     -
Placebo  0.01      0.09  1     1

Several models without variable selection were developed following the steps described previously. The characteristics of the best models for each API are summarized in Table 22.


Table 22 – The best results for each calibration set without variable selection (using DX (Divis)).

Compound  Pre-processing     Range (cm⁻¹)   LV  R2    RMSECV (%)  RMSEP (%)
PA        MSC and MC         3996.2-9003.1  3   0.98  1.26        0.67
PA        MSC, 1st D and MC  3996.2-9003.1  3   0.95  1.21        1.29
PS        MSC and MC         3996.2-9003.1  2   0.97  0.58        0.79
PS        MSC, 1st D and MC  3996.2-9003.1  2   0.97  0.65        0.59
DX        MSC, 1st D and MC  3996.2-9003.1  3   0.97  0.33        0.29
DX        MSC, 2nd D and MC  3996.2-9003.1  3   0.98  0.34        0.23

Three different variable selection techniques were applied to each calibration set, with the aim of finding one or more characteristic regions where the concentration variation of each API is most significant. Calibration models for each API were thus developed with three variable selection methods (Coefficient of Determination, iPLS and Genetic Algorithm), but only the models with the best predictive capabilities are shown in Table 23.

Table 23 – The best results for each calibration set with variable selection (using DX (Divis)).

Method  Compound  Pre-processing     LV  R2    RMSECV (%)  RMSEP (%)  Range (cm⁻¹)
R2      PA        MSC, 1st D and MC  3   0.98  1.18        0.60       4327.9-4443.7; 4821.7-4922; 4968.3-5408; 5801.5-5895; 6110-7251.8; 7992.4-8517
R2      PS        MSC and MC         2   0.91  0.65        1.99       5006.8-5400.3; 6256.6-7305.8
R2      DX        MSC, 1st D and MC  3   0.99  0.26        0.25       4790.8-4891.1; 5045.4-5400.3; 5786-5809.2; 6295.2-6410.9; 6889.2-7182.4; 8239.3-8324.2
iPLS    PA        MSC, 2nd D and MC  5   0.97  1.16        1.35       4505.38-4999.12
iPLS    PS        MSC, 1st D and MC  4   0.97  0.59        0.83       7020.4-7344.4
iPLS    DX        MSC, 1st D and MC  4   0.98  0.30        0.39       5022.3-5269.1
GA      PA        MSC, 2nd D and MC  4   0.97  1.18        1.14       4158.2-4219.8; 4327.9-4428.2; 4783.1-4814; 5269.1-5485.1; 6295.2-6364.6; 6534.3-6673.2; 6765.8-6827.5; 7004.9-7097.5; 7321.2-7406.1; 7498.7-7668.4; 7791.8-7845.8; 7969.3-8216.2
GA      PS        MSC, 2nd D and MC  2   0.96  0.87        0.71       4042.5-4158.2; 4351.1-4405.1; 4520.8-4698.3; 4883.4-4945.1; 5261.4-5377.1; 5508.3-5693.4; 5816.9-5932.6; 6156.3-6356.9; 6480.3-6619.52; 6704.1; 6773.5-6935.5; 7043.5-7113; 7236.4-7275; 7375.3-7552.7; 7637.6; 7768.7-7899.9; 8154.4-8262.4; 8370.4-8586.5; 8694.5-8733; 8887.3-9003.1
GA      DX        MSC, 1st D and MC  4   0.99  0.27        0.29       4621.1-4767.7; 4852.5-4922; 5006.8-5076.3; 5469.7-5616.3; 6549.8-6619.2; 7004.9-7097.5; 7321.2-7406.1; 7498.7-7668.4; 7791.8-7845.8; 7969.3-8216.2

For simplicity, the ranges selected by GA variable selection for each calibration set were not included in Table 9. The three calibration models that used the Genetic Algorithm are therefore shown in Table 24.

Table 24 – The best results for each calibration set with variable selection (using DX (DSM)).

Compound  Pre-processing     LV  R2    RMSECV (%)  RMSEP (%)  Range (cm⁻¹)
PA        MSC, 2nd D and MC  3   0.98  0.70        0.74       4158.2-4219.9; 4327.9-4428.2; 4783.1-4814; 5269.1-5485.1; 6295.2-6364.6; 6534.3-6673.2; 6765.8-6827.5; 7004.9-7097.5; 7321.2-7406.1; 7498.7-7668.4; 7791.8-7845.8; 7969.3-8216.2
PS        MSC, 1st D and MC  3   0.98  0.54        0.56       4235.4-4690.5; 5932.6-6002; 6318.3-6387.8; 6704.1-6773.5; 6935.5-7004.9; 7475.5-7545; 8941.3-9003.1
DX        MSC, 2nd D and MC  3   0.98  0.29        0.35       4621.1-4767.7; 4852.5-4922; 5006.8-5076.3; 5469.7-5616.3; 6549.8-6619.2; 7012.7-7082.1; 7167-7313.5; 7861.3-7930.7; 8324.2-8393.6; 8555.6-8856.5

7.4. Second Strategy

In this strategy, the first three and last three spectra of the other calibration sets (including three replicates) were added to each previously developed calibration set. The correlation between pairs of components, calculated with DX (Divis), is shown in Table 25.


Table 25 – The correlation (R2) between samples’ concentrations for each calibration set with DX (Divis).

PA calibration set
R2       PA        PS        DX    Placebo
PA       1.00      -         -     -
PS       4.00E-07  1.00      -     -
DX       3.00E-07  2.00E-06  1.00  -
Placebo  0.75      0.19      0.05  1.00

PS calibration set
R2       PA        PS        DX    Placebo
PA       1.00      -         -     -
PS       2.00E-07  1.00      -     -
DX       2.00E-07  3.00E-05  1.00  -
Placebo  0.62      0.29      0.09  1.00

DX calibration set
R2       PA        PS        DX    Placebo
PA       1.00      -         -     -
PS       8.00E-07  1.00      -     -
DX       1.00E-07  3.00E-05  1.00  -
Placebo  0.60      0.32      0.08  1.00

Several calibration models were built on differently pre-processed data, and the best-performing model for each calibration set was chosen and summarized in Table 26.

Table 26 – The best results for each calibration set without variable selection (using DX (Divis)).

Compound  Pre-processing                             Range (cm⁻¹)   LV  R2    RMSECV (%)  RMSEP (%)
PA        MSC and mean centering                     3996.2-9003.1  6   0.90  1.40        2.01
PA        MSC, first derivative and mean centering   3996.2-9003.1  6   0.89  1.46        1.75
PS                                                   3996.2-9003.1  3   0.94  0.77        0.80
PS        MSC, second derivative and mean centering  3996.2-9003.1  4   0.94  0.81        0.82
DX                                                   3996.2-9003.1  6   0.96  0.31        0.47
DX        MSC, second derivative and mean centering  3996.2-9003.1  6   0.95  0.34        0.35

Three different variable selection techniques were applied to each calibration set, but only the models with the best predictive capabilities are shown in Table 27.


Table 27 – The best results for each calibration set with variable selection (using DX (Divis)).

Method  Compound  Pre-processing                             LV  R2    RMSECV (%)  RMSEP (%)  Range (cm⁻¹)
R2      PA        MSC, first derivative and mean centering   6   0.91  1.34        2.11       4852.5-4914.3; 4983.7-5392.6; 6110-6310.6
R2      PS                                                   3   0.95  0.7         0.76       4142.8-4189.1; 4297.1-4389.7; 4490-4667.4; 5986.6-6056; 8794.8-8825.6
R2      DX        MSC, first derivative and mean centering   4   0.95  0.35        0.41       4790.8-4860.3; 5145.7-5161.1; 8239.3-8277.9
iPLS    PA        MSC, second derivative and mean centering  5   0.95  1.03        1.86       4767.7-5292.3
iPLS    PS                                                   3   0.84  0.97        1.30       4505.4-4752.3
iPLS    DX        MSC, first derivative and mean centering   3   0.95  0.33        0.38       8007.9-8501.6
GA      PA        MSC, first derivative and mean centering   3   0.95  1.41        1.79       6210-6310.6; 6572.9-6619.2; 7128.4-7197.8; 7444.7-7521.8; 8787-8841
GA      PS                                                   4   0.94  0.9         0.64       3996.2-4760; 4999.1-5454.3; 5847.7-5917.2; 7930.7-8771.6
GA      DX        MSC, first derivative and mean centering   5   0.97  0.23        0.32       4898.8-5546.9; 6302.9-6434.1; 7344.4-7691.6; 8216.2-8409

7.5. Orthogonal analysis

The performance of the models is usually expressed by the standard goodness-of-fit measure, R2, and by the fraction of the total variation of Y that can be predicted, Q2.

The explained variation of X and Y is given by equations 6 and 7, respectively:

$$ R^2(X) = 1 - \frac{SS(E)}{SS(X)} \qquad (6) $$

$$ R^2(Y) = 1 - \frac{SS(F)}{SS(Y)} \qquad (7) $$


where SS denotes the sum of squares, and E and F are the residual matrices of X and Y, respectively.

The fraction of the total variation of Y that can be predicted by a component is given by the following equation:

$$ Q^2 = 1 - \frac{\mathrm{PRESS}}{SS(Y)} \qquad (8) $$

The prediction error sum of squares (PRESS) is the sum of the squared differences between the observed Y values and the values predicted for each observation while it was kept out of the model.
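Equations 7 and 8 differ only in the residuals they use: SS(F) from the fitted model, versus PRESS from predictions of observations that were left out. To make the difference concrete, the sketch below computes both with leave-one-out cross-validation for a straight-line model. The straight-line model is a deliberately simple stand-in for OPLS, leave-one-out is an assumed cross-validation scheme, and the function name is hypothetical. For this least-squares model, each left-out residual is at least as large as the ordinary residual, so PRESS ≥ SS(F) and Q2 ≤ R2(Y).

```python
import numpy as np

def r2_and_q2(x, y):
    """R2(Y) (eq. 7) and Q2 (eq. 8) for a straight-line model y = a + b*x.

    PRESS is obtained by leave-one-out: each observation is kept out, the
    model is refitted, and the left-out value is predicted. The straight-line
    model is an illustrative stand-in, not the OPLS model of Section 3.5.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_y = np.sum((y - y.mean()) ** 2)          # SS(Y) of the mean-centred response
    b, a = np.polyfit(x, y, 1)                  # polyfit returns [slope, intercept]
    ss_f = np.sum((y - (a + b * x)) ** 2)       # SS(F), residuals of the full model
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # leave observation i out
        bi, ai = np.polyfit(x[keep], y[keep], 1)
        press += (y[i] - (ai + bi * x[i])) ** 2
    return 1.0 - ss_f / ss_y, 1.0 - press / ss_y
```

A large gap between R2(Y) and Q2 signals a model that fits the calibration data better than it predicts new samples.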

7.6. Mastersizer Average Result Analysis Report
