Soft-sensor Development for Hydrocracker Product Quality Prediction

Paula Sofia Lourenço Barbosa

Thesis to obtain the Master of Science Degree in

Chemical Engineering

Supervisors: Professor Dr. Carla Isabel Costa Pinheiro (I.S.T. Portugal)

Eng. Dora Luísa Rodrigues Moura Nogueira (GalpEnergia S.A.)

Examination Committee

Chairperson: Professor Dr. Sebastião Manuel Tavares da Silva Alves

Supervisor: Professor Dr. Carla Isabel Costa Pinheiro

Member of the Committee: Professor Dr. José Monteiro Cardoso de Menezes

June 2014

Man replies:

You created night, I the lamp

You created clay, and I the cup

You-desert, mountain peak and valley

I-flower bed, park and orchard

It is I who grind a mirror out of stone

And brew elixir from poison

Excerpt from ‘Dialog Between Man and God’ by Muhammad Iqbal.

Summary

To maximize its production capacity, as well as the yield obtained from each barrel of oil, the Sines Refinery invested in the installation of a hydrocracking unit. Since all fuels produced are subject to strict regulation, tight control over their quality is required. Therefore, with the aim of implementing advanced control on the unit, a first approach was made to predicting the quality of the Y produced by means of a soft-sensor.

To develop the soft-sensor for predicting the quality of Y, the unit was studied, variables of interest were selected, and their historical data were collected and analysed. Step tests were also carried out on the real industrial unit to better understand the dynamics and behaviour of the fractionator. A multivariate analysis was then performed using Principal Components Analysis followed by Partial Least Squares regression to obtain a linear model that could best predict the quality of Y.

Four models (A, B, C and D) were built using different data sets. These models were good detectors of process faults, because their equations included the variables whose values differed greatly from their historical data. All of these models followed the process dynamics and gave good predictions of the quality variable Y. Model C gives the best predictions and is the best choice for implementation in the DCS as an inferential sensor providing real-time predictions of the quality variable Y.

Keywords: PCA, PLS, Multivariate Analysis, Hydrocracking, Quality Prediction, Soft-Sensor.

Abstract

To maximize its productive capacity and the revenue from each barrel of oil, the GalpEnergia Sines Refinery has invested in a hydrocracking unit. Given that all fuels are subject to strict regulation, tight control over their quality is necessary. Therefore, as a step towards implementing advanced control on the unit, this work takes a first approach to predicting a quality variable of the diesel produced by means of a soft-sensor.

To develop the soft-sensor for quality prediction, variables of interest were selected and their historical data were collected and analyzed. Step tests were performed in the real industrial plant in order to better understand the dynamic behaviour of the fractionator.

Four soft-sensors were developed using Principal Components Analysis followed by Partial Least Squares regression to obtain linear models capable of quality prediction. The soft-sensors developed were good detectors of process faults because they included the faulty variables as predictors.

All soft-sensors followed the process dynamics and gave good predictions of the quality variable Y. Model C gives the best predictions and is the best choice for implementation in the DCS as an inferential sensor that provides real-time Y predictions to the operators and can also be used for control purposes.

Keywords: PCA, PLS, Multivariate Analysis, Hydrocracking, Quality prediction, Soft-Sensor.

Acknowledgements

Many people guided and helped me along this path. To all of them I owe thanks, heightened respect and the certainty that what I learned during this period will be very useful throughout my professional career.

I would first like to thank my supervisors, Professor Carla Pinheiro and Eng. Dora Nogueira, for everything they taught me and for their tireless motivation throughout this work. The support given in periods of greater stress, and the calm and patience with which they guided me during the development of this thesis, deserve all my gratitude and respect and make them the example to follow in my future career.

Next, I would like to thank Eng. José Roque for the opportunity to do an internship at Galp and for all the support provided, and Eng. Cristina Ângelo for her availability in answering questions about some of the software used.

I owe special thanks to all the operational and technical teams of Fábrica III of the Sines Refinery. I would like to thank Eng. Hugo Carabineiro for the great opportunity to carry out tests on the Hydrocracking unit and for all the time he spent answering questions, for all the feedback, opinions and data he provided and, above all, for the welcome and the kind way in which, from the very first minute, he made me feel comfortable in an environment that was unknown to me. I would also like to thank Mr. Manuel dos Santos, not only for the warm welcome at the Refinery but also for providing information that helped me understand the hydrocracking process and the operation of the refinery. I also have to thank him for his endless motivation during this thesis and for the good humour with which he gave his time to help me with this work. I would also like to mention the support of Eng. Eurico Correia and Eng. António Pinto, who gave their time and knowledge during the planning and execution of the tests on the Fractionator. I cannot forget the commitment, dedication and help of the control room shift supervisors, Mr. Paulo Azevedo and Mr. Joaquim Santiago, who closely followed all the tests performed. I would also like to thank the console operators, Mr. Mário Oliveira and Mr. Jorge Elias, for their intense and tireless dedication to the tests, for what they taught me and for the excellent and warm welcome at their workplace.

Besides these professionals directly connected to the area studied in this work, I also have to thank the colleagues who shared the day-to-day with me on the 10th floor of Torre C at Galp. Their great friendliness and good humour made the working environment light and pleasant.

All these excellent professionals showed me that Galp Energia is worth far more for its Human Capital than for its annual profits.

I cannot forget some of my excellent colleagues from Técnico, among them João Pedro Ferraz, Mafalda Lancinha, Inês Lino, Juliana Mota, Marisa Pardal, Sara Bernardo and Ana Paias, among others, whose companionship and friendship helped me through the most difficult moments at IST and who built moments of pure joy with me. I am deeply grateful to all of them for all the moments I shared with them.

Last but never least, I would like to thank my parents for all their words of motivation and for all their dedication and love; without their constant dedication I would not have been able to finish my degree. I thank my brother in particular for his example of hard work and commitment, as well as for the twisted sense of humour he shares with me.

To everyone named here, without exception, I want to express once again my most heartfelt

Thank You Very Much.

Contents

1. Introduction

1.1 Supply/Demand

1.2 Demand by Sector

1.3 Market Trends

1.4 Industry Profile

1.5 Thesis Drivers and Overview

2. Hydrocracking Process Overview

3. State of the Art

3.1 Soft-sensor Definition and Application in Industrial Processes

3.2 Soft-sensor Methodology

3.3 Data Driven Methods for Soft-sensing

3.3.1 Principal Components Analysis (PCA)

3.3.2 Partial Least Squares (PLS)

4. Implementation of Step Tests

4.1 Step Tests Planning

4.1.1 Historical Data Analysis and Variable Selection

4.1.2 Sensitivity Analysis

4.2 Step Tests Results

5. Soft-sensor Development

5.1 Model A

5.1.1 Principal Components Analysis

5.1.2 Partial Least Squares

5.1.3 Model Calibration

5.1.4 Model Validation

5.2 Model B

5.2.2 Partial Least Squares

5.3 Model C

5.4 Model D

5.5 Model Results Summary

5.6 Soft-Sensor Fault Detection

6. Conclusion

7. Future Work

8. Bibliography/References

List of Figures

Figure 1.1 - World supply of primary energy. [1]

Figure 1.2 - Percentage shares of oil demand by sector in 2010 and 2035. [1]

Figure 1.3 - Global product demand, 2012 and 2035. [1]

Figure 1.4 - Crude prices in US dollars; these include Saharan Blend, Girassol, Oriente, Iran Heavy, Basra Light, Kuwait Export, Es Sider, Bonny Light, Qatar Marin, Arab Light, Murban and Merey. [2]

Figure 1.5 - Global capacity requirements by process. [1]

Figure 1.6 - Crude products. Source: Skrebowski, Energy Institute Oil Depletion Conference, 2008.

Figure 4.1 - Graphical User Interface for Petro-SIM™ after the fractionator was built and modelled.

Figure 4.3 - Laboratory results for Y during step tests.

Figure 5.1 - Eigenvalues and cross-validation RMSECV curves for Model A data.

Figure 5.2 - Scores plots: Q residuals vs Hotelling's T² (top) and confidence ellipse on PC1 vs PC2 plot (bottom).

Figure 5.3 - Correlation map for Model A.

Figure 5.4 - Loadings plot PC1 vs PC2 for Model A.

Figure 5.5 - RMSECV Y (Y) vs number of LV plot for Model A.

Figure 5.6 - PLS scores plots for Model A: Q residuals vs Hotelling's T² (left) and scores on LV1 vs scores on LV2 (right).

Figure 5.7 - Model A calibration results.

Figure 5.8 - Parity plot of the calibration step.

Figure 5.9 - Model A validation results.

Figure 5.10 - Parity plot of the validation step of Model A.

Figure 5.11 - Eigenvalues and cross-validation RMSECV curves for Model B data.

Figure 5.12 - […] plot (bottom).

Figure 5.13 - Eigenvalues and cross-validation RMSECV curves for Model B data (without outliers).

Figure 5.14 - […] plot (bottom), from Model B, without outliers.

Figure 5.15 - Correlation map for Model B (without outliers).

Figure 5.16 - Loadings plot PC1 vs PC2 for Model B.

Figure 5.17 - RMSECV Y (Y) vs number of LV plot for Model B.

Figure 5.18 - PLS scores plots for Model B: Q residuals vs Hotelling's T² (left) and scores on LV1 vs scores on LV2 (right).

Figure 5.19 - Calibration results for Model B.

Figure 5.20 - Parity plot of the calibration step of Model B.

Figure 5.21 - Validation results for Model B.

Figure 5.22 - Parity plot of the validation step of Model B.

Figure 5.23 - Eigenvalues and cross-validation RMSECV curves for Model C data.

Figure 5.24 - Scores plots for Model C.

Figure 5.25 - Eigenvalues and cross-validation RMSECV curves for Model C data (without outliers).

Figure 5.26 - Scores plots for Model C (without outliers).

Figure 5.27 - Correlations map for Model C (without outliers).

Figure 5.28 - Loadings map for Model C.

Figure 5.29 - RMSECV Y (Y) vs number of LV plot for Model C.

Figure 5.30 - PLS scores plots for Model C: Q residuals vs Hotelling's T² (left) and scores on LV1 vs scores on LV2 (right).

Figure 5.31 - Calibration results for Model C.

Figure 5.32 - Parity plot of the calibration step of Model C.

Figure 5.33 - Validation results for Model C.

Figure 5.34 - Parity plot of the validation step of Model C.

Figure 5.35 - Eigenvalues and cross-validation RMSECV curves for Model D data.

Figure 5.36 - Scores plots for Model D.

Figure 5.37 - Eigenvalues and cross-validation RMSECV curves for Model D data (without outliers).

Figure 5.38 - Scores plots for Model D (without outliers).

Figure 5.39 - Correlations map for Model D (without outliers).

Figure 5.40 - Loadings map for Model D.

Figure 5.41 - RMSECV Y vs number of LV plot for Model D.

Figure 5.42 - PLS scores plots for Model D: Q residuals vs Hotelling's T² (left) and scores on LV1 vs scores on LV2 (right).

Figure 5.43 - Calibration results for Model D.

Figure 5.44 - Calibration parity plot for Model D.

Figure 5.45 - Validation results for Model D.

Figure 5.46 - Validation parity plot for Model D.

Figure 5.47 - Model A validation for the dataset 1st of November 2013 to January 13th of 2014.

Figure 5.48 - Model B validation for the dataset 1st of November 2013 to January 13th of 2014.

Figure 5.49 - Model C validation for the dataset 1st of November 2013 to January 13th of 2014.

Figure 5.50 - 'Corrected' Model A validation for the dataset 1st of November 2013 to January 13th of 2014.

Figure 5.51 - 'Corrected' Model B validation for the dataset 1st of November 2013 to January 13th of 2014.

Figure 5.52 - 'Corrected' Model C validation for the dataset 1st of November 2013 to January 13th of 2014.

List of Tables

Table 4.1 - Simulation results for the negative step tests.

Table 4.2 - Simulation results for positive step tests.

Table 4.3 - Simulation results for both positive and negative tests in variable X22.

Table 4.4 - Fractionator step tests scheduling.

Table 4.5 - Y magnitude of variation for each step test on variables X3, X10, X12 and X9.

Table 4.6 - Y magnitude of variation for each step test on variables X13 and X8.

Table 4.7 - Y magnitude of variation for each step test on variable X22.

Table 5.2 - PCA results obtained for Model A data.

Table 5.3 - PLS results of Model A data.

Table 5.4 - PCA results for Model B.

Table 5.5 - PCA results for Model B (without outliers).

Table 5.6 - PLS results for Model B data.

Table 5.7 - PCA results for Model C.

Table 5.8 - PCA results for Model C (without outliers).

Table 5.9 - PLS results for Model C.

Table 5.10 - PCA results for Model D.

Table 5.11 - PCA results for Model D (without outliers).

Table 5.12 - PLS results for Model D.

Table 5.13 - Model results summary.

Table 5.14 - Performance criteria VAF for the modelling results.

Abbreviations

ANN Artificial Neural Networks

b/d barrels of oil per day

DCS Distributed Control System

LV Latent Variable

mb/d million barrels of oil per day

mboe/d million barrels of oil equivalent per day

MSE Mean Square Error

MSEP Mean Square Error of Prediction

NFS Neuro-Fuzzy Systems

OECD Organization for Economic Co-operation and Development

OPEC Organization of the Petroleum Exporting Countries

PC Principal Component

PCA Principal Components Analysis

PLS Partial Least Squares

RMSEC Root-Mean-Square Error of Calibration

RMSECV Root-Mean-Square Error of Cross-Validation

RTDB Real Time Database

SVM Support Vector Machines

VAF Variance Accounted For

VGO Vacuum Gas Oil

Nomenclature

Symbol Description

b Inner linear regression coefficient

E Principal Components Analysis residual matrix

F Partial Least Squares residual matrix

m Number of matrix columns

n Number of matrix rows

N Number of samples

p Input loading vector

pᵀ Transpose of the input loading vector

P Loading matrix from the decomposition of matrix X

Pᵀ Transpose of matrix P

q Output loading vector

qᵀ Transpose of the output loading vector

Q Loading matrix from the decomposition of matrix Y

Qᵀ Transpose of matrix Q

R² Coefficient of determination

s Number of splits

t Input score vector

T Score matrix from the decomposition of matrix X

u Output score vector

uᵀ Transpose of the output score vector

U Score matrix from the decomposition of matrix Y

x Laboratory analysis

x̂ Model prediction

x̄ Mean value of the laboratory analysis

X Input matrix

Y Output matrix
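As a reading aid, the symbols listed above can be related through the standard PCA and PLS decompositions. The forms below are the usual textbook ones and are an assumption about how the notation is used later in this thesis, not a quotation from it. The component index a, the number of retained components A and the sample index i are introduced here only for illustration; \hat{x}_i is the model prediction for sample i and \bar{x} is the mean of the laboratory analyses.

```latex
% PCA decomposition of the input matrix (E is the PCA residual matrix)
X = T P^{T} + E = \sum_{a=1}^{A} t_a \, p_a^{T} + E

% PLS outer relation for the output matrix (F is the PLS residual matrix)
Y = U Q^{T} + F = \sum_{a=1}^{A} u_a \, q_a^{T} + F

% PLS inner relation linking output scores to input scores
\hat{u}_a = b_a \, t_a

% Coefficient of determination over N samples
R^{2} = 1 - \frac{\sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^{2}}
                 {\sum_{i=1}^{N} \left( x_i - \bar{x} \right)^{2}}
```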

1. Introduction

Living without energy seems unthinkable nowadays. The need for energy stems from the need for comfort, which includes heating, technology and the ability to move and travel. Most of the technological world we live in today would not exist without fuels, particularly fossil fuels such as petroleum, coal and natural gas.

1.1 Energy Supply and Demand

Fossil fuels accounted for 82% of energy supply in 2010 and are expected to account for 80% of the global total in 2035. In 2010, oil demand was 81,2 mboe/d, a 32,2% share of the fuel mix (figure 1.1); coal demand was 69,8 mboe/d (27,7%) and gas demand was 54,8 mboe/d (21,7%). The remaining 18,4% was distributed between nuclear, hydro, biomass and other renewables. For 2035, oil demand is predicted to be 100,2 mboe/d (26,3% of the fuel mix), coal demand 104,0 mboe/d (27,2%) and gas demand 99,8 mboe/d (26,0%). By 2035 the global oil use per head will average just 3,2 barrels, up from 2,4 in 2010 [1].

Figure 1.1 - World supply of primary energy.[1]
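The shares quoted in section 1.1 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the choice of Python and the names `ENERGY_2010`, `ENERGY_2035`, `fossil_share`, `implied_total` and `growth_percent` are assumptions made here, and the numbers are the ones quoted in the text with decimal commas written as points.

```python
# Figures quoted in section 1.1 (decimal commas written as points).
# fuel -> (demand in mboe/d, share of primary energy supply in %)
ENERGY_2010 = {"oil": (81.2, 32.2), "coal": (69.8, 27.7), "gas": (54.8, 21.7)}
ENERGY_2035 = {"oil": (100.2, 26.3), "coal": (104.0, 27.2), "gas": (99.8, 26.0)}


def fossil_share(data):
    """Sum of the oil, coal and gas shares, in percent."""
    return sum(share for _, share in data.values())


def implied_total(data):
    """Total primary energy (mboe/d) implied by each fuel's demand and share."""
    return {fuel: demand / (share / 100.0) for fuel, (demand, share) in data.items()}


def growth_percent(fuel):
    """Relative demand growth of one fuel between 2010 and 2035, in percent."""
    start, end = ENERGY_2010[fuel][0], ENERGY_2035[fuel][0]
    return 100.0 * (end - start) / start


if __name__ == "__main__":
    for year, data in (("2010", ENERGY_2010), ("2035", ENERGY_2035)):
        print("%s fossil share: %.1f%%" % (year, fossil_share(data)))
        for fuel, total in implied_total(data).items():
            print("  total implied by %s: %.0f mboe/d" % (fuel, total))
    for fuel in ENERGY_2010:
        print("%s demand growth 2010-2035: %.1f%%" % (fuel, growth_percent(fuel)))
```

The three shares sum to 81,6% for 2010 and 79,5% for 2035, which round to the 82% and 80% stated in the text, and the totals implied by each fuel agree with one another to within rounding of the quoted shares.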

1.2 Demand by Sector

Of all the sectors that consume oil, the transportation of people and goods (road, aviation, rail and marine transport) is the main one; the others are the petrochemical industry, the agricultural/commercial/residential sector and electricity generation. In 2010 transportation accounted for 52% of all oil use, and the prediction for 2035 is 60% (figure 1.2). Furthermore, transportation is the main driver of the overall increase in oil consumption. This increase is stimulated by demographic changes, higher wealth levels and increasing urbanization, among other factors, all of which lead to more passenger car ownership. Although car ownership will grow in OECD (Organization for Economic Co-operation and Development) states, the major pull in car demand will come from developing Asian countries and China, the latter having the largest growth in oil demand. Between 2010 and 2035 the number of cars in OECD countries will rise by 125 million, whereas in China alone the rise is far more dramatic, at about 380 million cars. The overall car fleet in 2035 will be 1,9 billion cars, more than double the 2010 figure. Global oil demand for transportation was 34,6 mboe/d in 2010 and is predicted to reach 44,6 mboe/d in 2035. [1]

Figure 1.2 – Percentage shares of oil demand by sector in 2010 and 2035. [1]

Although air traffic is expected to rise, it is held back by the financial crisis in the OECD, since OECD countries account for 72% of aviation oil demand, and the global recession left aviation oil consumption in 2010 lower than in 2000. These facts show how closely aviation oil demand is linked to economic activity. It is also heavily influenced by jet fuel prices. Demand for aviation oil was 5 mboe/d in 2010 and predictions point to 7,2 mboe/d in 2035, with developing countries leading the growth in demand.

Oil supply from non-OPEC (Organization of the Petroleum Exporting Countries) countries was 46,4 mb/d in 2012 and is estimated at 45,9 mb/d by 2035, after passing through a peak of 50,3 mb/d in 2020. As for OPEC countries, supply was 31 mb/d in 2012 and is estimated to reach 37 mb/d in 2035.

Demand for refined products, in the particular case of the middle distillates, was 32,3 mb/d in 2012 (36,3% of all distilled products) and is estimated to reach 44,1 mb/d in 2035 (40,7% of all distilled products). Middle distillates have the highest growth rate of all the distillates, with diesel oil in particular growing the fastest (figure 1.3). [1]

Figure 1.3 – Global product demand, 2012 and 2035. [1]

1.3 Market Trends

Crude prices have increased dramatically since the mid-2000s, especially since 2006. A likely reason is the rapid (and continuing) growth of Asian economies, which are sustained by large quantities of oil consumption. In 2008 the US faced its longest recession since the Great Depression and therefore cut back its oil trade, causing crude prices to decline. In response, OPEC decided to decrease production by the end of that year. This decrease in production, together with continuing demand in China, pushed prices up, and they rose steadily from the middle of 2009. After 2011, prices surpassed the previous peak of 2008 because of the civil war and loss of production in Libya, and continued to increase due to unrest in Middle Eastern and North African countries.

Figure 1.4 - Crude prices in US dollars; these include Saharan Blend, Girassol, Oriente, Iran Heavy, Basra Light, Kuwait Export, Es Sider, Bonny Light, Qatar Marin, Arab Light, Murban and Merey. [2]

1.4 Industry Profile

Refining capacity used to be measured by distillation capacity, but nowadays conversion capacity and product quality improvement play the vital role in processing raw crude fractions into more valuable products, especially now that the trend is towards higher demand for lighter products with more restrictive quality specifications. All new refinery projects have high levels of desulphurization and secondary processing, giving them the ability to produce high yields of light, clean products that comply with the most advanced specifications. Moreover, these new projects are designed to refine heavy, low-quality crudes as well as better-quality grades. The prediction for 2035 is that conversion capacity will increase more than distillation capacity (figure 1.5). Among the conversion projects, hydrocracking will have the highest growth, because it is the primary means of producing incremental distillate once straight-run fractions from crude have been maximized. [1]

Figure 1.5 - Global Capacity requirements by process. [1]

With the rise in crude prices comes the need to make the most of each barrel of oil, especially now that a barrel trades at about 105 US dollars. Figure 1.6 shows the percentage yields of the products obtained from a barrel of crude. Since the oil price is rising and demand for middle distillates is growing, there is a need to convert the heavier fractions from crude distillation into lighter distillates, preferably diesel.

Figure 1.6 - Crude products. Source: Skrebowski, Energy Institute Oil Depletion Conference, 2008.

Sulphur removal from Diesel proves to be the greatest challenge of the refining industry, due

to the greater need for additional processing units and the higher costs involved. Diesel quality specifications

vary between geographic regions. In the EU, Japan, Hong Kong, New Zealand, Australia, South Korea,

Taiwan, Argentina, Armenia and Singapore the limit for Sulphur concentrations for on-road Diesel is

10 ppm. In the US, Canada and Chile the limitation is of 15 ppm. In some countries, there is even a

variation between cities. In China, the limit is 350 ppm with the exception of Beijing that has a

limitation of 10 ppm, and selected cities in the country have a limitation of 50 ppm. For India, the

nationwide limitation is of 350 ppm, but for selected cities is of 50 ppm. Belarus and Thailand have a

limitation of 50 ppm. The region with the lowest Diesel quality is Africa, having sulphur limits between

2000-3000 ppm, with the notable exception of South Africa, which plans to reduce the limitation to 10 ppm

by 2017. [1]

In 2009 there were about 195 hydrocracker units operating worldwide, processing about

4,000,000 b/d of feedstock [3]. At start of run, the vast majority of hydrocrackers can reach a

near-zero sulphur content. Hydrocracker designs include single-stage (either once-through or with recycle)

or multiple-stage (usually two-stage) hydrocrackers, and can run in vacuum gas oil (VGO) cracking

mode or in light cycle oil mode. In the single-stage, once-through hydrocracker there is only one

reactor and the bottom of the fractionator (the unconverted oil) is not recycled for further cracking,

and the feedstock usually needs to be hydrotreated to remove ammonia and sulphur (or the reactor

is equipped with catalyst to perform this pre-treatment task). Single-stage hydrocrackers with

recycling are the most used configurations because the uncracked residual from the bottoms of the

fractionator returns to the reactor for further cracking, increasing the reaction’s overall yield.

Two-stage hydrocrackers use two reactors, with the unconverted oil from the bottom of the fractionator

recycled to the second reactor for further cracking. Since the first stage reactor performs both

hydrotreating and hydrocracking, the second stage reactor feed is almost entirely free of ammonia

and sulphur. [3]

Hydrocracker technology has become a key process to convert low-value, high-sulphur,

heavy-oil fractions into valuable products. This is of particular importance in an environment where

rising crude prices (figure 1.4) have shrunk the profit margin and have forced refineries to

consider upgrading poorer quality crudes and difficult hydrocracker feedstocks, and also where tight

fuel regulation and emissions legislation are major operational constraints. [4]

1.5 Thesis Outline

The goal of the work presented in this thesis is to develop a soft-sensor that enables the Galp

Refinery at Sines to predict the quality of Y produced by their hydrocracker, and to use it in advanced

control design of the unit. Having such a sensor enables the refinery to find ways to increase the

production of excellent quality Y to meet both market demand and also the existing regulatory

limitations. This study is necessary and important since it was the first of its kind and there was no

previous work on soft-sensor design for quality prediction of hydrocracking products in an online unit.

Starting with the description of the hydrocracking process at the refinery in Chapter 2, we will

proceed by describing the chemometric tools used to develop the soft-sensor (PCA and PLS), in

Chapter 3. Chapter 4 will describe the steps taken to plan and perform the step tests in the

hydrocracker unit and also the results of the tests. Chapter 5 will describe the development and

results of calibration and validation of the soft-sensors. Chapter 6 presents the conclusions of this

study, and in the last chapter, Chapter 7, future work to be done to improve the soft-sensor is

suggested.

2. Hydrocracking Process Overview

The hydrocracking process is a catalytic process used for cracking complex high-boiling, high

molecular weight hydrocarbon mixtures [5] into more valuable low-boiling products [6] like kerosene,

diesel and naphtha. Hydrocracking is a very important and flexible refinery process because it can

process a large variety of gas oils, manufacturing products with low sulphur content and high smoke

point jet fuel, in order to meet the demand for cleaner and environmentally friendly fuels [5].

In this process, the cracking of carbon-carbon single bonds and the hydrogenation of the

double bonds are complementary phenomena [7], because the cracking reaction provides olefins for

hydrogenation and hydrogenation liberates the heat for cracking [6]. The hydrogenation reactions are

highly exothermic and the cracking reactions are slightly endothermic, making the overall process

highly exothermic. Hydrogenation reactions extend not only to olefins but also to aromatic, sulphur,

nitrogen and oxygen compounds [6], making the separation of these pollutants easier and making it

less costly to meet the current fuel specifications.

This chapter will describe the hydrocracking process at Sines Galp Refinery, which is

important to the comprehension of the subject at hand. The description will begin at the hydrogen

make-up compression section, followed by the reaction section (the filter system, reactor feed

section, reaction system and effluent cooling), the effluent separation, fractionator section and

storage.

The aim of the make-up compression section is to compress hydrogen to ensure a continuous

supply of hydrogen to the reaction section to preserve system pressure, since hydrogen is consumed

in the reaction and is also lost by dissolution in the hydrocarbon liquid and eventually through leaks.

This make-up compression section is composed of three parallel compression trains, each

with three stages of compression. In normal operation, only two of the three trains are working,

with the third kept as a spare. The feed gas is divided between the two operating trains, compressed to

the desired reaction section pressure and then combined and fed to the reaction section.

Vacuum Gas Oil (VGO) is the fresh feed for the first stage reactor (A-01) in the reaction

section and is pre-heated in the kerosene/fresh feed exchanger (B-01), followed by further heating in

the diesel/fresh feed exchanger (B-02), being afterwards sent to the filter system (C-01 A/B/C). The

filters are designed to clear particulate material from the fresh feed that could plug the catalyst bed in

the first stage reactor, causing not only catalyst deactivation but also pressure drop problems.

After filtration, the feed is sent to the filtered feed surge drum (D-01). From this drum, the feed

is pumped to the reactor system pressure. The D-01 is designed to prevent fluctuations and losses

of feed to the pumps and reaction section.

The oil feed to the second stage reactor (A-02) is the unconverted oil from the first and second

stage reactors and comes from the fractionator (D-02) bottoms. This stream is cooled by heat

exchange with the feed to the fractionator furnace in the fractionator feed and bottom exchanger B-

03 and in the fractionator bottom steam generator B-04. The second stage reactor feed stream is

then pumped to the reaction section.

After leaving their feed pumps, the feeds to the first and second stage reactors are each mixed

with pre-heated hydrogen, either make-up hydrogen from the make-up compression section

or recycled hydrogen. The combined feed mixture to the first stage reactor is heated in the first

stage feed/effluent exchanger (B-05 A/B) and afterwards in the first stage furnace (E-01). The

second stage feed mixture is heated in the second stage reactor/effluent exchanger (B-06) and

afterwards in the second stage furnace (E-02).

The heated gas/oil mixtures are fed to their respective reactors: the first stage reactor

(which has six catalyst beds and two types of catalyst, one for hydrotreating and the other for

hydrocracking) and the second stage reactor (which has four catalyst beds containing only

hydrocracking catalyst). As soon as the feed contacts the catalyst, the reaction begins and, because the

reactions are highly exothermic, the temperature of both the mixture and the catalyst beds

increases. To prevent excessive heating and to control the reaction temperature, a quench gas

(hydrogen) is introduced between the catalyst beds of each reactor section.

After reaction, the reactor effluents consist of product oil, excess hydrogen not consumed in

the reaction and light gases formed during hydrocracking. The stream leaving the first stage reactor

is cooled by heat exchange with the reactor’s feed in B-05 A/B and then mixed with the second stage

effluent. The second stage effluent is cooled by exchanging heat with the fractionator feed in B-07

and afterwards it is combined with the first stage reactor effluent. This mixture is further cooled in B-08

and then sent to a steam generator (B-09) to complete the cooling before feeding the hot high

pressure separator (HHPS), D-03. The D-03 is designed to separate the excess hydrogen from the

reaction liquids, enabling the recycling of the hydrogen gas to the reaction section, in order to reduce

the cost of producing hydrogen. The remaining liquid products are then let down in pressure by the

power recovery engine (F-01) and are then flashed in the hot low pressure separator (D-04). This high

temperature, low pressure flash enables the separation of dissolved hydrogen gas in the liquid, and

the gas is recycled to the reaction section.

The D-04 bottoms is fed to the product stripper (D-05) to separate H2S, LPG and some

naphtha from the liquid reaction product. This stripper has three packed beds. The stripper bottoms

is heated in B-08 by heat exchange with the reactor effluents, and also by exchanging heat with the

second stage reactor effluent in B-07. This stream is then further heated by heat exchange with the

fractionator bottoms stream, in the fractionator bottoms/feed exchanger (B-10), before being sent to

the fractionator feed furnace (E-03).

The fractionator feed is heated in the fractionator feed furnace with the aim of generating

enough vapour so that overflash is produced in the column (in this case, overflash is defined as

the ratio of the volumetric liquid rate going to the stripping section to the total volumetric rate of the distillate

products).

In normal operation of the product fractionator (D-02), light naphtha is sent overhead, heavy

naphtha, kerosene and diesel are drawn as side cuts, and the diesel draw is split into product and

pumparound. The unconverted oil is drawn as the bottoms and the feed enters the column in the

flash zone.

Superheated low pressure steam is used in the fractionator’s stripping section to recover any

products from the bottoms before it is pumped from the column. The bottoms stream is cooled by heat

exchange with the feed in the fractionator bottom and feed exchanger (B-10).

Before being sent to the fractionator reflux drum, the overhead vapour is totally condensed at

the fractionator overhead air cooler (B-11) and also at the overhead trim cooler (B-12). The reflux

drum is a horizontal vessel designed to separate oil from water, which is collected at the boot of the

vessel and sent to the injection water drum (D-06). Part of the oil (light naphtha) is pumped to the

light ends section and the remaining is pumped back as reflux to the fractionator.

Heavy naphtha is drawn and flows to the heavy naphtha stripper (D-07). This stripper has valve

trays and a thermosiphon reboiler that exchanges heat with the diesel pumparound. The heavy

naphtha vapour is returned to the fractionator, and the bottoms is pumped to the light ends section.

Kerosene is drawn from the fractionator column and is sent to the kerosene stripper (D-08),

which is similar to D-07, that is, it has trays and a thermosiphon reboiler that exchanges heat with the

diesel pumparound. The stripper’s vapour is returned to the fractionator, and the bottoms is pumped,

cooled and sent to storage.

Diesel is drawn from a chimney tray of the fractionator and the flow is split between a

pumparound stream and also a stream fed to the diesel side stripper (D-09). This stripper uses

superheated low pressure steam to remove light components from the product. Its overhead vapour

is returned to the fractionator and the diesel stripper bottoms is cooled by exchanging heat in the

first stage fresh feed exchanger (B-13), being further cooled in the deethanizer

bottoms reboiler (B-14). Because the diesel is steam stripped, the water must be removed to meet

product specifications; therefore, the stream is sent to the diesel vacuum drier air cooler (B-15), and

afterwards to the diesel vacuum drier (D-10). The bottoms of D-10 is then cooled in the diesel air

cooler (B-16) and later in the diesel trim cooler (B-17). Part of the resulting stream is sent to the

cold low pressure separator (D-11) in the reaction section, to be used as sponge oil. D-11 is designed to release

a hydrogen rich vapour which, after amine treating, is recycled to the reactors. The remaining

diesel is sent to storage.

The diesel pumparound stream removed from the fractionator reduces column traffic above

the diesel tray side cut and removes valuable high temperature heat that provides heat for four

column reboilers and also produces medium pressure steam before it returns to the fractionator. So,

this stream is pumped to the kerosene stripper reboiler (B-18), the heavy naphtha reboiler (B-19), the

naphtha splitter reboiler (B-20) and the naphtha stabilizer reboiler (B-21) and finally to the medium

pressure steam generator (B-22) to ensure a continuous pumparound heat removal before entering the

fractionator [8].

3. State of the Art

This chapter presents the state of the art of soft-sensors and their scope and useful

application in process industries. Moreover, soft-sensor development and its difficulties will be

discussed, and data-driven methods for soft-model development, particularly Principal

Components Analysis (PCA) and Partial Least Squares (PLS), will be characterized and discussed.

3.1 Soft-Sensors for industrial processes

Chemical plants are usually highly instrumented and have a large number of sensors that

collect measured data for process control and monitoring. About two decades ago researchers

began using the large amount of data to build predictive models, which in the

process industry are called soft-sensors. The term soft-sensor is a combination of the words ‘software’ (mainly

because the models are developed in computer programs) and ‘sensors’, because these models

provide information similar to that of hardware sensors. Soft-sensors are often divided into two

categories: model-driven and data-driven [9,10]. Model-driven sensors (also called white-box models)

are most commonly based on First Principle Models that describe the physical and chemical

properties of the process [9,10]. They are developed primarily for the planning of the plants and usually

describe only ideal process steady states rather than real process dynamics, so they are neither

useful nor suitable for the description of any dynamic state. They also describe a simplified theoretical

background rather than real-life process conditions [9] and are somewhat computationally intensive for real-time applications [10,11].

Data-driven models do not have this disadvantage because they are based on data measured

within the processing plants, thus describing the true conditions of the process in a better way [9,10] and

providing the real-time information necessary for effective quality control [11]. Data-driven models are also

known as black-box techniques because the model itself has no knowledge about the process and is

based on empirical observations of the process. These models are based on real-life measurements

recorded, stored and provided as historical data [9].

The span of tasks performed by Soft-Sensors is quite broad but the most common use is the

prediction of process variables that can only be known either at low sampling rates or through off-line

analysis [9,12]. These variables are usually very important for process control because they are

usually related to the process output quality and it is naturally important and necessary to deliver

additional information about these variables at higher sampling rate or lower financial burden[9,13],

hence the use of soft-sensors. Another field of application of soft-sensors is process monitoring

and process fault detection, by finding the state of the process and identifying the source of any

deviation. As previously said, real industrial plants have many sensors, and there is a certain

probability of a sensor failing. Detecting this failure is also a soft-sensor task. In addition, the soft-sensor can act

as a backup sensor while the hardware sensor is replaced or, if the soft-sensor proves to be

good, it can act as a replacement for the hardware measuring device. [9]

Measuring variables that define product quality is a major problem in process industries.

These variables, called primary or quality variables, quantify the productivity or the specifications

upon which the product is sold, like purity or physical or chemical properties, and they are the most

difficult to measure online. The online variables that are easy to access and measure are often

called secondary variables. Examples are temperature, pressure and flow rate, and they can be used to infer

primary variables. Because of the nature of chemical and processing engineering systems, the

dynamics and state of the secondary variables reflect the dynamics and state of the primary

variables, meaning that changes in secondary variables are indicative of changes in product quality.

The technique of using secondary variables to generate estimates of product quality is usually called

‘soft-sensing’, and these inferential estimators are usually used in place of direct on-line measurement of

controlled variables if direct measurements are expensive, unreliable or add large lag[13].

Soft-sensors have been used for estimating product composition in distillation columns and

particle size distributions in a grinding circuit, monitoring emissions of NOx, SO2 and CO2 in industrial

boilers and furnaces, and ensuring high and consistent product quality in the pharmaceutical industry and

process reliability[11]. They have also been used as a feed oil classifier to determine feed oil type by

estimation of kerosene dry point[14], for modelling an activated sludge plant to detect shifts of

various kinds in the process[15], for modelling product quality in a crude desalting and dehydration

process[13], for oil sludge depository classification for waste treatment[16], to study the influence of

minerals on the taste of bottled tap water[17], for modelling ground-level ozone and the factors affecting

its concentrations[18], and for the prediction of product quality in the catalytic hydrocracking of vacuum gas

oil[10], to name a few.

3.2 Soft-Sensor methodology

There are some problems affecting the development of soft-sensors, and usually they are

related to measurement noise, missing values, co-linear features and varying sampling rates.

In addition, process plants are usually dynamic environments in which abrupt changes can occur,

for example in the quality of the process input, resulting in a deterioration of prediction

accuracy[9].

A challenging issue in soft-sensor development is data co-linearity. Typically, measured

data in the process industry are strongly co-linear as a result of partial redundancy in the sensor

arrangement (for example, two neighbouring temperature sensors will collect strongly correlated

measurements). As the measurements are usually collected for process control purposes, a

large amount of data accumulates that is data rich but information poor. For soft-sensor

modelling, the requirements are of another kind: only informative variables are required and any other

information just adds to model complexity, having a negative effect on model training and

performance. To deal with this problem, two methods are widely accepted, PCA and PLS, which

transform the input variables into a new reduced space with less co-linearity[9].
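
The co-linearity described above can be illustrated with a minimal Python sketch. The data are synthetic and purely hypothetical (two neighbouring temperature sensors reading the same underlying temperature, plus an unrelated flow measurement), and the noise levels are assumptions chosen only for illustration, not values from the unit studied in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one underlying tray temperature seen by two
# neighbouring sensors, each with its own small measurement noise.
n_samples = 500
true_temp = 300.0 + 5.0 * rng.standard_normal(n_samples)
sensor_a = true_temp + 0.2 * rng.standard_normal(n_samples)
sensor_b = true_temp + 0.2 * rng.standard_normal(n_samples)
flow = 50.0 + 2.0 * rng.standard_normal(n_samples)  # unrelated measurement

X = np.column_stack([sensor_a, sensor_b, flow])

# Correlation matrix of the measurements (variables are in columns).
corr = np.corrcoef(X, rowvar=False)

# An eigenvalue of the correlation matrix close to zero indicates that
# some columns carry almost the same information (co-linearity).
eigvals = np.linalg.eigvalsh(corr)
condition_number = eigvals.max() / eigvals.min()

print(np.round(corr, 3))
print("condition number:", round(condition_number, 1))
```

In this sketch the two temperature columns are almost perfectly correlated, so the correlation matrix is badly conditioned. This is the situation in which a direct least-squares regression becomes unstable and projection methods such as PCA and PLS are preferred.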

The presence of missing data presents difficulties in model development. Since it is necessary

to use the maximum amount of samples to develop a model, missing data or removal of incomplete

data decreases the accuracy of model estimates. Also, when the soft-model is applied and used to

estimate the quality variable as a part of a control system, the sensor must be able to deal with the

failure of some online measurements and still be able to provide reliable estimates[19]. Since the

probability of having representative data is larger in large datasets than in small ones, missing data should

cause fewer problems in large datasets than in small datasets, because in large datasets any direction

is still fairly well represented, at least as long as one works in subspaces of projections, like PCA or

PLS[20].

In soft-model construction methodology there are no widely accepted guidelines, but there are

steps that are frequently taken in its development. The presented procedure is rather general but

summarises the most common steps in model development.

The usual first step is an initial data inspection, in which the data structure is overviewed and any

obvious problems are identified, like locked variables having a constant value. The next step is to

assess model complexity, that is, to decide whether a simple regression model is enough or a

more powerful tool like PCA or PLS analysis, for example, is needed to develop the soft-sensor. Also,

assessing the target variable is very important, because there has to be enough variation in the

output variable, and it must be understood whether it can be modelled at all.

Then one proceeds to the selection of historical data and identification of steady states. In this

stage a dataset is selected for training and another for validation. The stationary parts of the data are

identified, selected and used in model development. Next, data must be pre-processed. A typical

pre-processing step is to normalise the data to zero-mean and unit variance (as required for PCA),

but other types of pre-processing are also employed, such as handling missing data, outlier detection

and replacement, selection of relevant variables, and handling of drifting data. The data processing

is usually done iteratively until the developer considers the data and the model ready for validation.

Data pre-processing is considered to be the most time-consuming step, demanding manual work and

expert knowledge of the underlying process.
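
The normalisation to zero mean and unit variance mentioned above can be sketched in Python as follows. The function name `autoscale` and the example numbers are hypothetical. The sketch assumes that the scaling parameters are estimated on the training set only and then reused on the validation set, which is common practice but is not stated in the text.

```python
import numpy as np

def autoscale(X_train, X_new=None):
    """Centre each column to zero mean and scale it to unit variance.

    The mean and standard deviation are computed from X_train only and
    are then applied unchanged to X_new (for example, validation data).
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    if np.any(std == 0):
        # A locked variable (constant value) cannot be scaled.
        raise ValueError("constant variable found; remove it before scaling")
    scaled_train = (X_train - mean) / std
    scaled_new = None if X_new is None else (X_new - mean) / std
    return scaled_train, scaled_new, mean, std

# Hypothetical example: two process variables, four training samples.
X_train = np.array([[350.0, 12.1],
                    [352.0, 12.4],
                    [349.0, 11.9],
                    [355.0, 12.8]])
X_valid = np.array([[351.0, 12.0]])

Z_train, Z_valid, mean, std = autoscale(X_train, X_valid)
print(np.round(Z_train, 3))
print(np.round(Z_valid, 3))
```

Reusing the training mean and standard deviation keeps the validation data independent of the calibration step, so the validation error is not flattered by information taken from the validation set itself.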

Following pre-processing, the next phase is model selection, training and validation. Selection

of model type is critical for soft-sensor performance. There is no unified theoretical approach for

this step. Usually the model type and its parameters are selected in an ad hoc manner, and the

selection is often subject to the developer’s past experience, expertise, and personal preference.

However, there are some techniques that can be adopted, such as starting with a simple model type

and then increasing model complexity as long as significant model improvement can be observed (by

assessing model performance with independent data). After finding the optimal model structure and

training the model, the soft-sensor has to be validated with independent data. The evaluation of its

performance can be done numerically, using the Mean Square Error (MSE), which measures

the average squared distance between the predicted and the real values, and by visual representation

of the predictions. One disadvantage of the latter method is that the final decision on whether the model

performs adequately is rather subjective, depending on the model developer’s experience.
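
As a small illustration of the numerical evaluation, the MSE can be computed as in the Python sketch below. The function name and the laboratory and soft-sensor values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def mean_square_error(y_true, y_pred):
    """Average squared distance between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical laboratory values and soft-sensor predictions.
y_lab = [10.0, 12.0, 11.0, 13.0]
y_soft = [10.5, 11.5, 11.0, 14.0]

mse = mean_square_error(y_lab, y_soft)
print(mse)  # squared errors 0.25, 0.25, 0.0 and 1.0 average to 0.375
```

Because the errors are squared, a single large deviation weighs more heavily than several small ones, which is worth keeping in mind when the validation set contains outliers.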

Finally, after its development, the soft-sensor has to be maintained and tuned on a regular basis.

This is necessary due to the fact that drifts and other changes in the data deteriorate the

performance of the soft-sensor, and have to be compensated by adapting or re-developing the

model[9].

3.3 Data-driven methods for soft-sensing

Using soft-sensors in crude oil distillation with varying feedstock is still a difficult problem to

solve because the relationship between easily measured process variables and the hard-to-measure

quality variables varies with the types of crude processed. Moreover, most of the refineries

use mixed sources of crude oil with varying blending ratios, and the relationship between process

variables and quality variables varies with different crude oils or blends[14].

Hydrocracked products are separated into different fractions that constitute the blending

stocks for the final products. The product quality is significantly influenced by operating conditions

and the cracking yield is reduced with time by catalyst deactivation. Therefore, the continuous

monitoring of product quality is very important especially to avoid off-spec petroleum fractions, that

usually cause problems downstream at the blending stage[10].

It is usually difficult to get precise and reliable product composition measurements without time

delay because most composition analysers have significant time lags and their reliability is usually

quite low. Tray temperature could be used as an indication of composition, but the presence of off-key

components in multicomponent mixtures, column pressure, and also feed rate jumps can affect tray

temperatures, preventing them from being an exact indicator of composition. Due to the strong correlation

between tray temperature measurements, Principal Components Analysis (PCA) or Partial Least

Squares (PLS) methods should be applied[21].

The most used modelling techniques applied to data-driven soft-sensors are Principal

Component Analysis (PCA) in combination with a regression model, Partial Least Squares (PLS),

Artificial Neural Networks (ANN), Neuro-Fuzzy Systems (NFS) and Support Vector Machines (SVMs)

[9].

Several reasons motivate the multivariate approach to a problem. Process deviations are not

always detected by looking at one variable at a time, and often these deviations occur

simultaneously in many variables, and even a very small variation can have a significant

influence on product quality. If the drift of the process towards an out-of-control state can be detected at an early

stage, corrective measures can be taken sooner to avoid such states. Also, if many variables have

been measured, the effect of noise can be drastically diminished by modelling correlation structures

among the different variables [15], and by the reduction of data dimension by using, for example, PCA.

In this thesis PCA and PLS methods will be used because they are widely accepted and they

are usually the first approach to soft-model development for process control.

3.3.1 Principal Components Analysis (PCA)

Noise can be found in almost all variables of the majority of datasets. Latent variable models

like PCA and PLS estimate the relevant part and the noise of each variable and therefore are used in

the present work[20]. Principal Component Analysis (PCA) was used for analysing the data so that

only the secondary variables important to the determination of product quality were selected[13].

Using PCA, the data can be described using far fewer variables than the original variables with no

significant loss of information, and also, PCA often produces linear combinations of variables that are

useful predictors of particular processes[12]. Mathematically, PCA relies on an eigenvector

decomposition of the covariance or correlation matrix of the process variables. Here X represents a

matrix (n x m) whose rows correspond to the samples and whose columns correspond to the

variables. PCA then decomposes the data matrix X into the sum of the outer products of vectors $t_i$ and

$p_i$ (i = 1, 2, 3, ..., k) plus a residual matrix E, equation 2.1 and 2.2 (matrix form).

$$X = t_1 p_1^T + t_2 p_2^T + \cdots + t_k p_k^T + E \qquad (2.1)$$

Or,

$$X = T P^T + E \qquad (2.2)$$

Where $P^T$ is made up of the $p_i$ as rows and T of the $t_i$ as columns, and k in equation 2.1 must be less than or equal to the smaller dimension of X, i.e. $k \le \min\{n, m\}$. Vectors $t_i$ (n x 1) and $p_i$ (m x 1) are the ith score vector and loading vector, respectively. Loading vectors are orthogonal and of unit length

and score vectors are also orthogonal. Loading vector $p_1$ defines the direction of greatest

variability, and score vector $t_1$ (also known as the first principal component) represents the projection

of each row of X onto $p_1$, being the linear combination of the columns of X explaining the greatest

amount of variability ($t_1 = X p_1$). The second principal component is also the linear combination of the columns in X explaining the next greatest amount of variability ($t_2 = X p_2$) subject to the condition that it is orthogonal to the first principal component. Principal components are ordered in decreasing

variability. Since the X columns are highly correlated, the first few principal components can explain

the majority of data variability[21].
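
The decomposition of X into scores, loadings and residuals can be sketched with NumPy as below. This is a minimal illustration, not the procedure used in this thesis: it assumes mean-centred data, obtains the loadings from a singular value decomposition (one of several equivalent routes), and uses synthetic data with two hypothetical latent directions.

```python
import numpy as np

def pca(X, k):
    """Return scores T (n x k), loadings P (m x k), residuals E (n x m)
    and the fraction of variance explained by each retained component."""
    Xc = X - X.mean(axis=0)                     # mean-centre the columns
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                # orthonormal loading vectors
    T = Xc @ P                                  # score vectors, t_i = X p_i
    E = Xc - T @ P.T                            # unfitted variation
    explained = s[:k] ** 2 / np.sum(s ** 2)
    return T, P, E, explained

# Synthetic data: six correlated variables driven by two latent directions.
rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 2))
mixing = rng.standard_normal((2, 6))
X = latent @ mixing + 0.05 * rng.standard_normal((200, 6))

T, P, E, explained = pca(X, k=2)
print("explained variance per component:", np.round(explained, 3))
print("residual sum of squares:", round(float(np.sum(E ** 2)), 3))
```

Since the six columns are generated from only two latent directions, the first two principal components capture nearly all of the variability, which mirrors the statement that a few components suffice when the columns of X are highly correlated.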

In equation 2.1, k represents the number of principal components to retain, and E (n x m) is

the residuals matrix of unfitted variation (or noise) [21]. The matrix product of T and $P^T$ reproduces the

most important variation in X. This matrix is a projection of the X-data onto a new low-dimensional

space, where it can be effectively analysed. This reduction on space dimensionality is achieved due

to correlations between the variables in matrix X, and this is the main reason why this method is

specifically advantageous for data analysis with a large number of mutually correlated variables[16].

The $p_i$ vectors are the eigenvectors of the covariance matrix, that is, in equation 2.3:

$$\mathrm{cov}(X)\, p_i = \lambda_i p_i \qquad (2.3)$$

Where $\lambda_i$ is the eigenvalue associated with the eigenvector $p_i$. In PCA the $p_i$ are the loadings and contain information on how variables are related to each other. The $t_i$ form an orthogonal set while the $p_i$ are orthonormal. In equation 2.4 note that

$$X p_i = t_i \quad \text{or} \quad T = X P \qquad (2.4)$$

The pairs $t_i$ and $p_i$ are in descending order of $\lambda_i$, the first pair capturing a larger

amount of information than any other pair in the decomposition, and each subsequent pair capturing the

greatest possible amount of the remaining variance[12].
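
The eigenvector relation and the link between scores and loadings can be checked numerically with the short Python sketch below. The data are synthetic and the variable names are hypothetical. The sketch assumes mean-centred data and the sample covariance matrix with n - 1 in the denominator.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 4))
Xc = X - X.mean(axis=0)                  # mean-centred data

cov = np.cov(Xc, rowvar=False)           # sample covariance matrix (4 x 4)

# eigh returns eigenvalues in ascending order, so reorder them so that
# the first pair captures the largest amount of variance.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
P = eigvecs[:, order]                    # loadings p_i as columns

T = Xc @ P                               # scores, T = X P

# cov(X) p_i = lambda_i p_i for every i
eig_residual = cov @ P - P * lam

# The variance of each score vector equals its eigenvalue.
score_var = T.var(axis=0, ddof=1)

print("eigenvalues:    ", np.round(lam, 4))
print("score variances:", np.round(score_var, 4))
```

The agreement between the eigenvalues and the score variances is why ordering the pairs by eigenvalue is the same as ordering the principal components by the amount of variability they explain.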

The higher the loading of a variable, the more it contributes to explaining the variation of a

particular principal component. Only variables with loadings higher than 50% should be selected

for principal component interpretation, and any principal component with an eigenvalue equal to or greater

than one is usually considered of statistical relevance[13]. The matrices T and P provide valuable

information on the internal data structure. These matrices are interpreted based on the fact that

correlation between two variables (or similarity between two samples) is a function of distance in the

PC-Space[16]. Pairwise scores plots are often referred to as ‘sample maps’ revealing their grouping

and outliers. Similarly, the loading plots (variable maps) show variable correlations. The distance

from the origin to a sample in the score plot or a variable on the loadings plot along a certain PC

reflects their importance in regard to that PC[16].
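
The two selection rules quoted above (components with an eigenvalue of at least one, variables with loadings above 50%) can be sketched in Python as follows. Several choices here are assumptions and are not taken from the cited references: the eigenvalues are computed from the correlation matrix (autoscaled data), where the eigenvalue rule is often called the Kaiser criterion, and the loadings are expressed as correlations between variable and component, that is, eigenvector elements multiplied by the square root of the eigenvalue. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def kaiser_selection(X, loading_threshold=0.5):
    """Flag components with eigenvalue >= 1 and variables whose
    correlation-scaled loading exceeds the threshold in absolute value."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= 1.0
    # Assumption: loadings scaled to variable/component correlations.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    important = np.abs(loadings) > loading_threshold
    return eigvals, keep, loadings, important

# Synthetic data: three variables follow one source, two follow another.
rng = np.random.default_rng(3)
n = 1000
group_a = rng.standard_normal(n)
group_b = rng.standard_normal(n)
X = np.column_stack(
    [group_a + 0.2 * rng.standard_normal(n) for _ in range(3)]
    + [group_b + 0.2 * rng.standard_normal(n) for _ in range(2)]
)

eigvals, keep, loadings, important = kaiser_selection(X)
print("eigenvalues:", np.round(eigvals, 3))
print("retained components:", int(keep.sum()))
print(important)
```

With this construction two components are retained, the first dominated by the three variables of the first group and the second by the two variables of the second group. A different loading convention (for example unit-length eigenvectors) would change the numerical threshold, so the 50% value should be read together with the scaling that is used.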

The number of principal components to be retained in the model is usually determined by

cross-validation, and the dataset for building a model is divided into training and testing (validating)

data sets[21]. In this study, the training and testing data come from the process data records,

which are recorded and collected from the DCS, and from the corresponding laboratory analyses.

One of the limitations of pure PCA is that it can only handle linear relationships in the data effectively and cannot deal with non-linearity. Another disadvantage is the need to select the optimal number of principal components (which can be addressed by using cross-validation techniques). A further problem is that the principal components describe the input space very well but do not explain the relations between the input and output data, which is usually what has to be modelled[13].

3.3.2 Partial Least Squares Regression (PLS)

The regression problem, that is, the modelling of response variables (primary variables) Y by means of a set of predictor variables (secondary variables) X, is one of the most common problems in data analysis in science and technology. One example is relating the quality and quantity of manufactured products (Y) to the conditions of the manufacturing process[22].

The PLS algorithm focuses on the covariance matrix that links the input and output data spaces. This method decomposes the input and output simultaneously while keeping the orthogonality constraint, so that the model is focussed on the relation between the input and output variables[9]. PLS can be seen as an extension of PCA. This method is concerned with two data blocks, X and Y, and the objective is to model X in such a way that Y can be predicted as well as possible, maximizing the covariance between matrices X and Y. Matrix X is decomposed into a score matrix T and a loading matrix P[9,21], as shown in equations 2.5 and 2.6 (matrix form):

$X = \sum_{j=1}^{a} t_j\, p_j^{T}$ (2.5)

$X = T P^{T}$ (2.6)

In a similar way, Y can be decomposed into a score matrix U and a loading matrix Q, as in equations 2.7 and 2.8 (matrix form):

$Y = \sum_{j=1}^{a} u_j\, q_j^{T}$ (2.7)

$Y = U Q^{T}$ (2.8)

where $a$ is the number of latent variables. Most of the variance of matrix Y is explained by the first latent variable extracted from the matrices X and Y. In a similar way, the second latent variable is extracted from the residual matrices, which contain the variance not described by the first latent variable, and so on. Once the optimal number of latent variables has been calculated, the remaining variance is considered noise[9].

The objective of this method is to fit a linear relationship between the independent X variables and the dependent Y variables by performing a least squares regression between each pair of corresponding t and u latent vectors, equation 2.9:

$\hat{u}_j = t_j\, b_j, \quad j = 1, 2, \ldots, a$ (2.9)

where $b_j$ is the coefficient from the inner linear regression between the jth latent variables $t_j$ and $u_j$, that is, in equation 2.10:

$b_j = u_j^{T} t_j \left( t_j^{T} t_j \right)^{-1}$ (2.10)

Linear PLS decomposes the X and Y matrices into a sum of rank-one matrices. Each rank-one matrix is the product of a score vector and its corresponding loading vector: the input score vectors t with the input loading vectors p for X, and the predicted output score vectors û with the output loading vectors q for Y[21].
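For a single response variable, the decomposition described above can be sketched with the NIPALS algorithm. This is a minimal illustration assuming NumPy, mean-centred synthetic data and a hypothetical function name (pls1_nipals); it is not the implementation used in this work. With one response the output score is the (deflated) response itself, so the inner coefficient is the least-squares regression of the response on the input score t.

```python
import numpy as np

def pls1_nipals(X, y, n_lv):
    """Single-response PLS by NIPALS; X and y are assumed mean-centred."""
    X, y = X.copy(), y.copy()
    W, P, T, b = [], [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        w = w / np.linalg.norm(w)       # weight vector of unit length
        t = X @ w                       # input score vector
        tt = t @ t
        p = X.T @ t / tt                # input loading vector
        bj = (y @ t) / tt               # inner regression coefficient
        X = X - np.outer(t, p)          # deflate X by the rank-one term t p^T
        y = y - bj * t                  # deflate the response
        W.append(w)
        P.append(p)
        T.append(t)
        b.append(bj)
    W, P, T = np.array(W).T, np.array(P).T, np.array(T).T
    b = np.array(b)
    beta = W @ np.linalg.solve(P.T @ W, b)   # regression vector: y_hat = X @ beta
    return T, P, b, beta

# Hypothetical data: 60 samples, 5 predictors, one response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3]) + 0.05 * rng.normal(size=60)
Xc, yc = X - X.mean(axis=0), y - y.mean()

T, P, b, beta = pls1_nipals(Xc, yc, n_lv=2)
y_hat = Xc @ beta                       # identical to T @ b
```

When as many latent variables as predictors are extracted, this model coincides with ordinary least squares; the benefit of PLS comes from stopping earlier, at the number of latent variables chosen by cross-validation.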

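The prediction performance of the PLS method was characterized by the Root Mean-Square Error of Cross-Validation (RMSECV) (equation 2.11) [24]:

$\mathrm{RMSECV} = \sqrt{\dfrac{\sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}{n}}$ (2.11)

where $\hat{y}_i$ is the cross-validation prediction for sample i, $y_i$ is the corresponding measured value and n is the number of samples.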
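A minimal sketch of how the RMSECV can be computed is given below. It assumes NumPy, contiguous folds and synthetic data, and it uses ordinary least squares as a stand-in model so that the example stays short; the helper names (rmsecv, fit_ols, predict_ols) are hypothetical, and any model with a fit and a predict step, such as PLS, could be passed instead.

```python
import numpy as np

def rmsecv(X, y, fit, predict, n_folds=5):
    """Root mean-square error of cross-validation over contiguous folds."""
    n = len(y)
    indices = np.arange(n)
    squared_error = 0.0
    for test_idx in np.array_split(indices, n_folds):
        train_idx = np.setdiff1d(indices, test_idx)
        model = fit(X[train_idx], y[train_idx])      # fit without the held-out fold
        residuals = predict(model, X[test_idx]) - y[test_idx]
        squared_error += float(residuals @ residuals)
    return float(np.sqrt(squared_error / n))

def fit_ols(X, y):
    """Stand-in model: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Hypothetical data: a noise-free and a noisy linear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y_clean = 2.0 + X @ np.array([1.0, -2.0, 0.5])
y_noisy = y_clean + 0.1 * rng.normal(size=100)

print(rmsecv(X, y_clean, fit_ols, predict_ols))
print(rmsecv(X, y_noisy, fit_ols, predict_ols))
```

Because every prediction is made for a sample that was left out of the fit, the RMSECV of the noisy response stays close to the noise level instead of shrinking as the model becomes more flexible.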

PLS is a simple and powerful approach to data analysis for complex problems because of its flexibility and its ability to deal with incomplete and noisy data with many variables and observations (measurements). In this study, PLS models only one response variable, although the method is able to model several response variables[22]. The disadvantage of PLS is that, like PCA, it can only model linear relations in the data [9].

4. Implementation of Step Tests

Performing step tests in the unit was of great importance and was proposed early in this work. The unit is new and had never been subjected to step tests, so these tests were planned and performed in order to better understand its response and behaviour. By better understanding the process performance through the step tests, we hoped to develop a soft-sensor that could explain and predict the behaviour of Y (the primary/quality variable) even when the unit operates outside the specified operating temperature, pressure and flow values.

This chapter presents the planning and the development of the tests carried out in the

Refinery.

4.1 Step Tests Planning

4.1.1 Historical Data Analysis and Variable Selection

Since step tests had never been performed in this unit, the first step was to select the variables that would have an influence on Y. This selection drew on the study of the fractionator together with the insight and experience of the Refinery Team and the Thesis Supervisors, and after some exchange of ideas and suggestions it was agreed that the variables X3, X8, X9, X10, X12, X13 and X22 were to be tested.

The next step was to build a preliminary model from the available historical data using PCA followed by PLS (obtained in a similar fashion to that described in chapter 5). This model was to be used only to assess whether a given step test would indeed influence the quality variable Y and, if it did, how long the quality variable took to stabilize. We then looked into the historical data and checked whether there were disturbances in the previously selected secondary variables that could be considered a ‘step-test’ (such as a sudden decline or increase in rate or temperature). Using those ‘step-test’ values, we calculated the Y results and estimated the settling time of the quality variable for each step test. After carefully analysing the data, it was found that the settling time between tests had to be at least 2 hours for variables X8, X9 and X10, at least 3 hours for variables X3, X12 and X13, and at least 5 hours for variable X22. The sequence of variable testing was agreed with the Refinery Team in order to reduce the overall impact on the operating conditions.
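The notion of settling time used above can be illustrated with a short sketch. It is only an assumption of how such an estimate could be made, not the procedure applied to the historical data: the function name, the 2% tolerance band and the first-order response with a 0.5 h time constant are all hypothetical.

```python
import math

def settling_time(times_h, values, band=0.02):
    """First time after which the response stays within
    +/- band * |final - initial| of its final value."""
    initial, final = values[0], values[-1]
    tolerance = band * abs(final - initial)
    settled_at = times_h[-1]
    # Walk backwards from the end until the response leaves the band.
    for t, v in zip(reversed(times_h), reversed(values)):
        if abs(v - final) > tolerance:
            break
        settled_at = t
    return settled_at

# Hypothetical first-order response to a step, sampled every 0.1 h for 5 h.
times = [i / 10 for i in range(51)]
response = [1.0 - math.exp(-t / 0.5) for t in times]

print(settling_time(times, response))   # prints 2.0 for this synthetic response
```

A waiting time between consecutive tests would then be chosen no shorter than the estimated settling time, so that each sample reflects a stabilized quality variable.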

4.1.2 Sensitivity Analysis

To evaluate the impact that the tests could have on the quality variable Y, the fractionator was modelled using the simulator Petro-SIM™ version 4.1 (a modelling platform for refiners, petrochemical and gas processing plants, from KBC Oil and Gas Consulting). To model the fractionator in this software, the fluid package chosen was Peng-Robinson-LK, and all the rates, temperatures and pressures used were the ‘Base Case’ values from the manual of Chevron, the licensor of the hydrocracking unit of the Sines Refinery [8].

To start modelling the unit, one must first define the mixture of the feed stream, that is, define

the viscosity, standard density and ASTM D86 distillation for each oil compound of the feed stream.

Then, the compounds must be blended and the feed composition must also be described. After this

procedure this stream is included in the unit and its flow, temperature and pressure are defined.

Following this procedure, the side strippers were installed, as well as the stripping vapour streams,

and the pumparound and bottom stream.

The next step is to assign the pressure and tray efficiencies of the column. A screenshot of the fractionator after modelling is shown in figure 4.1.

Figure 4.1 – Graphical User Interface for Petro-SIM™ after the fractionator was built and modelled.

After modelling the unit in Petro-SIM™, the amplitudes of the step tests were tested to assess their influence on Y. Based on previous tests in other units of the Refinery, it was decided to test whether negative steps of -3%, -5%, -7%, -10%, -13% and -15%, and positive steps of 3%, 5%, 7%, 10%, 13% and 15%, on each of the chosen secondary variables (except X22) had any influence on the quality variable. The results obtained for each of these simulation tests are shown in tables 4.1, 4.2 and 4.3. Tables 4.1 and 4.2 show the simulation test results for X3, X8, X9, X10, X12 and X13. Each row of a table gives the percentage deviation in the quality variable caused by the test on the corresponding secondary variable.

Table 4.1 – Simulation results for the negative step tests: Y deviation (%) for each step test amplitude.

Input   -1%       -3%       -5%        -7%       -10%      -13%      -15%
X3      0,41      1,46      2,54       3,71      8,20
X8      -0,05     -0,17     -0,29      -0,42     -0,61     -0,88     -0,95
X9      -0,09     -0,29     -0,49      -0,71     -1,05     -1,40     -1,64
X10     -0,61     -0,61     -1,27      -1,60     -2,97     -4,61     -5,72
X12     -6×10⁻⁵   1×10⁻⁴    4,5×10⁻⁵   1×10⁻⁴    6×10⁻⁵    8×10⁻⁵    1×10⁻⁴
X13     -0,13     -0,43     -0,72      -1,00     -1,39     -1,39     -1,61

Table 4.2 – Simulation results for the positive step tests: Y deviation (%) for each step test amplitude.

Input   1%        3%        5%         7%        10%       13%       15%
X3      -0,39     -1,09     -1,64      -1,60     -1,99     -2,26     -2,41
X8      0,00      0,16      0,26       0,36      0,55      0,72      0,83
X9      0,09      0,25      0,44       0,63      0,90      1,14      1,29
X10     0,52      1,05      1,52       1,54      2,44      2,36      2,79
X12     -3×10⁻⁶   9×10⁻⁷    -5×10⁻⁵    -4×10⁻⁵   -7×10⁻⁵   -6×10⁻⁵   -3×10⁻⁵
X13     0,13      0,39      0,70       0,98      1,35      1,68      1,88

Table 4.3 shows the simulation step test results for variable X22. The amplitude of the disturbances was different because of the nature of the variable: the step amplitudes used for the previous variables would not have any noticeable effect on Y, so the amplitudes of the step tests were increased in this case.

Table 4.3 – Simulation results for both positive and negative tests in variable X22: Y deviation (%) for each step test amplitude.

Input   -20%      -10%      +10%      +20%      +35%      +40%
X22     -0,141    -0,112    -0,056    -0,028    0,014     0,028
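One way to summarize the simulation results of Tables 4.1 and 4.2 is a single sensitivity per input, expressed as percentage Y deviation per percentage step. The sketch below is an illustrative calculation and not part of the original analysis: it fits a straight line through the origin by least squares for the inputs whose rows are complete in both tables (X8, X9, X10 and X13); X3 is left out because its row in Table 4.1 is incomplete, and X12 because its deviations are close to zero. The function and variable names are hypothetical.

```python
# Step amplitudes (%) shared by Tables 4.1 (negative) and 4.2 (positive).
amplitudes = [1, 3, 5, 7, 10, 13, 15]

# Y deviation (%) transcribed from Table 4.1, ordered -1% ... -15%.
negative = {
    "X8":  [-0.05, -0.17, -0.29, -0.42, -0.61, -0.88, -0.95],
    "X9":  [-0.09, -0.29, -0.49, -0.71, -1.05, -1.40, -1.64],
    "X10": [-0.61, -0.61, -1.27, -1.60, -2.97, -4.61, -5.72],
    "X13": [-0.13, -0.43, -0.72, -1.00, -1.39, -1.39, -1.61],
}
# Y deviation (%) transcribed from Table 4.2, ordered 1% ... 15%.
positive = {
    "X8":  [0.00, 0.16, 0.26, 0.36, 0.55, 0.72, 0.83],
    "X9":  [0.09, 0.25, 0.44, 0.63, 0.90, 1.14, 1.29],
    "X10": [0.52, 1.05, 1.52, 1.54, 2.44, 2.36, 2.79],
    "X13": [0.13, 0.39, 0.70, 0.98, 1.35, 1.68, 1.88],
}

def gain_through_origin(name):
    """Least-squares slope of Y deviation against step amplitude, no intercept."""
    xs = [-a for a in amplitudes] + amplitudes
    ys = negative[name] + positive[name]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

gains = {name: gain_through_origin(name) for name in negative}
ranking = sorted(gains, key=gains.get, reverse=True)

for name in ranking:
    print(f"{name}: {gains[name]:.3f} % Y per % step")
```

A single slope hides the asymmetry between negative and positive steps that is visible for X10, so it should be read only as a rough ordering of the inputs.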

Analysing the previous tables, one might be tempted to conclude that step tests of this magnitude on these variables have little influence on Y. However, as can be seen in chapter 5, most of these variables are present in the models developed, which demonstrates their importance. Moreover, the amplitude of the step tests had to be such that the production of the unit would not be greatly affected, hence the small magnitude of the tests.

4.2 Step Tests Results

The scheduling and sequence of the variable tests were organized to suit the Refinery, in order to minimize the impact on the production profile and quality. The step tests were performed, as far as possible, without disrupting the Refinery’s routines. For each test a sample was taken and its time stamp was recorded. Samples were only taken after the calculations with the preliminary model showed that the quality variable Y had stabilized after a given step test. The schedule of the step tests can be seen in table 4.4.

Table 4.4 - Fractionator step tests scheduling.

Day     Predicted time   Actual time   Variable   Test magnitude   Sample time
Day 1   14:00            14:17         X3         -1 %             16:45
        17:00            16:50         X3         +1 %             20:10
        20:00            20:16         X3         -3 %             23:16
        23:00            23:16         X3         +3 %             01:57
Day 2   2:00             2:00          X10        -3 %             03:53
        4:00             4:00          X10        +5 %             05:53
        6:00             6:00          X10        -5 %             08:07
        8:00             8:10          X10        -7 %             09:57
        10:00            10:00         X10        +5 %             11:57
        12:00            12:04         X10        +5 %             13:05
        14:00            13:40         X12        +7 %             16:21
        17:00            16:27         X12        -1 %             18:52
        20:00            19:00         X12        +2 %             20:55
        23:00            21:20         X12        +5 %             22:57
Day 3   2:00             23:03         X12        -5 %             01:01
        2:00             01:58         X9         +7 %             03:12
        4:00             03:15         X9         -5 %             05:11
        6:00             05:16         X9         -7 %             06:40
        8:00             07:04         X9         +5 %             08:45
        10:00            09:00         X9         -5 %             10:19
        14:00            13:53         X13        +7 %             17:00
        17:00            17:07         X13        -5 %             18:45
Day 4   24:00            03:23         X13        -7 %             05:17
        3:00             05:18         X13        +5 %             07:25
        6:00             07:27         X13        +5 %             08:45
        9:00             09:21         X8         +13 %            10:40
        11:00            10:48         X8         +1 %             11:54
        13:00            11:59         X8         -5 %             13:47
        15:00            13:53         X8         +11 %            15:06
        16:00            15:10         X8         +11 %            16:30
        18:00            16:41         X22        -15 %            18:47
        22:00            19:46         X22        +15 %            22:10
Day 5   3:00             22:17         X22        +15 %            00:53

[Figure 4.2 plot: Y (vertical axis from 0,99 to 1,035) against sample number (horizontal axis from 0 to 40), with separate series for the pre-test samples and for the step tests on X3, X10, X12, X9, X13, X8 and X22.]

Table 4.4 shows the predicted starting time for each test and the actual time the test was

started as well as the variables to be tested, the magnitude of the test, the time the sample was

taken and the result of that same sample. Most of the time the step tests started earlier because the

quality variable Y stabilized earlier than predicted, and the next test could be made sooner.

As expected, the planned step tests had a clear effect on Y. All Y results (real values or predicted values) presented here and throughout this thesis are shown in a dimensionless form, as seen in equation 4.1:

$Y = \dfrac{\text{value of the quality variable}}{\text{set point value of the quality variable}}$ (4.1)

The target range of [0.97, 1.03] for Y, expressed in this dimensionless form, was covered. Having Y cover a wide range of values allows the dynamic data to capture the influence of a wider range of process conditions on the quality variable. The laboratory analysis error for Y is 0.30% of the set point value of the quality variable, and most step test results were larger than 0.30%.

As noted in subchapter 4.1.2, most of these variables end up appearing in the models developed in the next chapter. Variables X3, X10, X13, X8 and X22 appear in Model B, variables X3 and X13 appear in Model C, and variable X13 appears in Model A.

Figure 4.2 – Laboratory results for Y during step tests.

Tables 4.5 to 4.7 show the effect that consecutive step tests had on the quality variable. From the analysis of figure 4.3 and these tables, we can infer which of the selected step test variables most influenced the response of the quality variable Y: X3, X10, X13 and X22.

Table 4.5 – Y magnitude of variation for each step test on variables X3, X10, X12 and X9.

Variable   Test magnitude   Y magnitude of variation
X3         -1 %             -0,47 %
           +1 %             0,71 %
           -3 %             -0,68 %
           +3 %             -1,07 %
X10        -3 %             1,89 %
           +5 %             -1,55 %
           -5 %             -0,19 %
           -7 %             2,30 %
           +5 %             -0,92 %
           +5 %             -0,03 %
X12        +7 %             0,30 %
           -1 %             -0,05 %
           +2 %             -0,30 %
           +5 %             0,05 %
           -5 %             0,49 %
X9         +7 %             -0,49 %
           -5 %             -0,60 %
           -7 %             0,50 %
           +5 %             0,27 %
           -5 %             1,53 %

Table 4.6 - Y magnitude of variation for each step test on variables X13 and X8.

Variable   Test magnitude   Y magnitude of variation
X13        +7 %             -3,56 %
           -5 %             0,08 %
           -7 %             0,92 %
           +5 %             0,69 %
           +5 %             -0,22 %
X8         +13 %            0,14 %
           +1 %             -0,05 %
           -5 %             0,14 %
           +11 %            0,08 %
           +11 %            0,11 %

Table 4.7 - Y magnitude of variation for each step test on variable X22.

Variable   Test magnitude   Y magnitude of variation
X22        -15 %            -1,10 %
           +15 %            0,80 %
           +15 %            -2,17 %
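The two statements made about these results, that most responses exceed the 0.30% laboratory error and that X3, X10, X13 and X22 are the most influential variables, can be checked with a short calculation. The sketch below is illustrative only: the values are transcribed from Tables 4.5 to 4.7 with the rows assigned to variables in the test order of Table 4.4 (an assumption of this transcription), and the mean absolute response is used here as one possible summary of influence, which is not necessarily the criterion applied in this work.

```python
LAB_ERROR = 0.30   # laboratory analysis error, % of the set point value

# Y magnitude of variation (%) per step test, from Tables 4.5 to 4.7.
responses = {
    "X3":  [-0.47, 0.71, -0.68, -1.07],
    "X10": [1.89, -1.55, -0.19, 2.30, -0.92, -0.03],
    "X12": [0.30, -0.05, -0.30, 0.05, 0.49],
    "X9":  [-0.49, -0.60, 0.50, 0.27, 1.53],
    "X13": [-3.56, 0.08, 0.92, 0.69, -0.22],
    "X8":  [0.14, -0.05, 0.14, 0.08, 0.11],
    "X22": [-1.10, 0.80, -2.17],
}

# Number of tests per variable whose response is larger than the lab error.
above = {v: sum(abs(r) > LAB_ERROR for r in rs) for v, rs in responses.items()}
n_tests = sum(len(rs) for rs in responses.values())

# Mean absolute response per variable (assumed summary of influence).
mean_abs = {v: sum(abs(r) for r in rs) / len(rs) for v, rs in responses.items()}
ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)

print(f"{sum(above.values())} of {n_tests} responses exceed {LAB_ERROR} %")
print("ranking by mean absolute response:", ranking)
```

With this summary the four highest-ranked variables are X22, X10, X13 and X3, and 19 of the 33 responses are larger than the laboratory error.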

5. Soft-sensor Development

The first approach to soft-modelling usually relies on the most widely accepted linear tools, such as PCA and PLS regression. The main advantage of methods like PCA and PLS is that they can cope with highly correlated variables. This characteristic is suitable for analysing data from

hydrocracking process units, because hydrocracking processes are multivariable systems and many

of these variables are mutually correlated. To perform this type of analysis and model development,

historical plant data for selected variables was collected and step tests were performed for carefully

chosen variables and process conditions.

In this section the quality variable Y is predicted using 25 online variables available in the database. These variables come from flowmeters and from temperature and pressure sensors, and all are measured online. Selecting which variables should be included in the soft-sensor is a complex task, and the strategy consists in finding a good variable subset capable of making accurate predictions. In this study two methods are used to obtain the soft-sensors that predict the quality variable: Partial Least Squares (PLS) as a linear modelling tool, and Principal Component Analysis (PCA) as a tool to select a good set of model variables and to strip outliers and noise from the models.

Datasets were collected directly from the Digital Control System (DCS) and the Real Time Database (RTDB) of the Refinery and were used to build four models. The soft-sensors obtained from these data were labelled Model A, B, C and D, and the datasets are:

Model A: The soft-sensor is obtained from training data collected during the week of the step tests, from October 27th to October 31st, 2013, using the same data to calibrate the model.

Model B: The soft-sensor is obtained from training data collected between August 1st and October 31st, 2013, using the same data to calibrate the model.

Model C: The soft-sensor is obtained using training data collected in
