Soft-sensor Development for Hydrocracker Product
Quality Prediction
Paula Sofia Lourenço Barbosa
Thesis to obtain the Master of Science Degree in
Chemical Engineering
Supervisors: Professor Dr. Carla Isabel Costa Pinheiro (I.S.T. Portugal)
Eng. Dora Luísa Rodrigues Moura Nogueira (GalpEnergia S.A.)
Examination Committee
Chairperson: Professor Dr. Sebastião Manuel Tavares da Silva Alves
Supervisor: Professor Dr. Carla Isabel Costa Pinheiro
Member of the Committee: Professor Dr. José Monteiro Cardoso de Menezes
June 2014
Man replies:
You created night, I the lamp
You created clay, and I the cup
You-desert, mountain peak and valley
I-flower bed, park and orchard
It is I who grind a mirror out of stone
And brew elixir from poison
Excerpt from ‘Dialog Between Man and God’ by Muhammad Iqbal.
Summary
In order to maximize its production capacity, as well as the yield obtained from each barrel of
oil, the Sines Refinery invested in the installation of a hydrocracking unit. Since all fuels
produced are subject to strict regulation, tight control over their quality is required. Therefore,
with the aim of implementing advanced control in the unit, a first approach was made to predicting
the quality of the Y produced by means of a soft-sensor.
To develop the soft-sensor for Y quality prediction, the unit was studied, variables of interest
were chosen, and their historical data were collected and analysed. Step tests were also carried out
on the real plant to gain better knowledge of the dynamics and behaviour of the fractionator. A
multivariate analysis was then performed using Principal Components Analysis followed by Partial
Least Squares regression to obtain a linear model that could best predict the quality of Y.
Four models (A, B, C and D) were built using different data sets. These models were good
detectors of process faults, because their equations included the variables whose values differed
greatly from their historical data. All of these models followed the process dynamics and gave good
predictions of the quality variable Y. Model C gives the best predictions and is the best choice for
implementation in the DCS system as an inferential sensor providing real-time predictions of the
quality variable Y.
Keywords: PCA, PLS, Multivariate Analysis, Hydrocracking, Quality Prediction, Soft-Sensor.
Abstract
To maximize its productive capacity and the revenue from each barrel of oil, the GalpEnergia
Sines Refinery has invested in a hydrocracking unit. Given that all fuels are subject to strict
regulation, it is necessary to have tight control over their quality. Therefore, as a step towards
implementing advanced control on the unit, this work presents a first approach to predicting a
quality variable of the diesel produced by means of a soft-sensor.
To develop the soft-sensor for quality prediction, variables of interest were selected and their
historical data were collected and analyzed. Step tests were performed on the real industrial plant
in order to better understand the dynamic behaviour of the fractionator.
Four soft-sensors were developed using Principal Components Analysis followed by Partial
Least Squares regression to obtain linear models capable of quality prediction. The soft-sensors
developed were good detectors of process faults because the faulty variables were among the
variables they used for prediction.
All soft-sensors followed the process dynamics and gave good predictions of the quality variable
Y. Model C gives the best predictions and is the best choice for implementation in the DCS
system as an inferential sensor, both to provide real-time predictions of Y to the operators
and to be used for control purposes.
Keywords: PCA, PLS, Multivariate Analysis, Hydrocracking, Quality prediction, Soft-Sensor.
Acknowledgements
Many people guided and helped me along this path. To all of them I owe my thanks, a
heightened respect, and the certainty that what I learned during this period will be very useful
throughout my professional career.
I would first like to thank my supervisors, Professor Carla Pinheiro and Eng. Dora Nogueira,
for everything they taught me and for their tireless motivation throughout this work. The support
given in periods of greater stress, and the calm and patience with which they guided me during the
development of this thesis, deserve all my gratitude and respect, and make them the example to
follow in my future career.
Next, I would like to thank Eng. José Roque for the opportunity to do an internship at Galp and
for all the support provided, and Eng. Cristina Ângelo for her availability to answer questions
about some of the software used.
I owe special thanks to all the operational and technical teams of Fábrica III of the Sines
Refinery. I would like to thank Eng. Hugo Carabineiro for the great opportunity to run tests on the
Hydrocracking unit, for all the time he spent answering questions, for all the feedback and opinions
given, for all the data provided and, above all, for the welcome and the kind way in which, from the
very first minute, he made me feel comfortable in an environment unfamiliar to me. I would also like
to thank Mr. Manuel dos Santos, not only for the warm welcome at the Refinery, but also for
providing information that helped me understand the hydrocracking process and the operation of the
refinery. I must also thank him for his endless encouragement in the completion of this thesis and
for the good humour with which he gave his time to help me in this work. I would also like to
mention the support of Eng. Eurico Correia and Eng. António Pinto, who gave their time and knowledge
during the planning and execution of the tests on the Fractionator. I cannot forget the commitment,
dedication and help of the control room shift supervisors, Mr. Paulo Azevedo and Mr. Joaquim
Santiago, who closely followed all the tests performed. I would also like to thank the console
operators, Mr. Mário Oliveira and Mr. Jorge Elias, for their intense and tireless dedication to the
tests, for what they taught me, and for the excellent and warm welcome to their workplace.
Besides these professionals directly connected to the subject of this work, I must also thank
the colleagues who shared the day-to-day with me on the 10th floor of Torre C at Galp. Their great
friendliness and good humour made the work environment light and pleasant.
All these excellent professionals showed me that Galp Energia is worth far more for its Human
Capital than for its annual profits.
I cannot forget some of my excellent colleagues from Técnico, among them João Pedro Ferraz,
Mafalda Lancinha, Inês Lino, Juliana Mota, Marisa Pardal, Sara Bernardo and Ana Paias, among
others, whose companionship and friendship helped me through the most difficult moments at IST,
and who built moments of pure joy with me. I deeply thank them all for every moment I shared with
them.
Last but never least, I would like to thank my parents for all their words of encouragement and
for all their dedication and love; without their constant dedication I would not have been able to
finish the degree. I thank my brother in particular for his example of hard work and commitment, as
well as for the twisted sense of humour he shares with me.
To everyone named here, without exception, I want to express once again my most heartfelt
Thank you very much.
Contents
1. Introduction
1.1 Supply/Demand
1.2 Demand by Sector
1.3 Market Trends
1.4 Industry Profile
1.5 Thesis Drivers and Overview
2. Hydrocracking Process Overview
3. State of the Art
3.1 Soft-sensor Definition and Application in Industrial Processes
3.2 Soft-sensor Methodology
3.3 Data Driven Methods for Soft-sensing
3.3.1 Principal Components Analysis (PCA)
3.3.2 Partial Least Squares (PLS)
4. Implementation of Step Tests
4.1 Step Tests Planning
4.1.1 Historical Data Analysis and Variable Selection
4.1.2 Sensitivity Analysis
4.2 Step Tests Results
5. Soft-sensor Development
5.1 Model A
5.1.1 Principal Components Analysis
5.1.2 Partial Least Squares
5.1.3 Model Calibration
5.1.4 Model Validation
5.2 Model B
5.2.1 Principal Components Analysis
5.2.2 Partial Least Squares
5.2.3 Model Calibration
5.2.4 Model Validation
5.3 Model C
5.3.1 Principal Components Analysis
5.3.2 Partial Least Squares
5.3.3 Model Calibration
5.3.4 Model Validation
5.4 Model D
5.4.1 Principal Components Analysis
5.4.2 Partial Least Squares
5.4.3 Model Calibration
5.4.4 Model Validation
5.5 Model Results Summary
5.6 Soft-Sensor Fault Detection
6. Conclusion
7. Future Work
8. Bibliography/References
List of Figures
Figure 1.1 - World supply of primary energy. [1]
Figure 1.2 – Percentage shares of oil demand by sector in 2010 and 2035. [1]
Figure 1.3 – Global product demand, 2012 and 2035. [1]
Figure 1.4 - Crude Prices in US dollars, these include Saharan Blend, Girassol, Oriente, Iran Heavy, Basra Light, Kuwait Export, Es Sider, Bonny Light, Qatar Marin, Arab Light, Murban and Merey. [2]
Figure 1.5 - Global Capacity requirements by process. [1]
Figure 1.6 - Crude products, source: Skrebowski Energy Institute Oil Depletion Conference, 2008.
Figure 4.1 – Graphical User Interface for Petro-SIM™ after the fractionator was built and modelled.
Figure 4.3 – Laboratory results for Y during step tests.
Figure 5.1 - Eigenvalues and cross-validation RMSECV curves for Model A data.
Figure 5.2 - Scores plots: Q residuals vs Hotelling T2 (top) and confidence ellipse on PC1 vs PC2 plot (bottom).
Figure 5.3 - Correlation Map for Model A.
Figure 5.4 - Loadings plot PC1 vs PC2 for Model A.
Figure 5.5 - RMSECV Y (Y) vs number of LV plot for Model A.
Figure 5.6 - PLS scores plots for Model A: Q Residuals vs Hotelling's T2 (left) and Scores on LV1 vs Scores on LV2 (right).
Figure 5.7 - Model A calibration results.
Figure 5.8 – Parity plot of the calibration step.
Figure 5.9 – Model A validation results.
Figure 5.10 – Parity plot of the validation step of Model A.
Figure 5.11 – Eigenvalues and cross-validation RMSECV curves for Model B data.
Figure 5.12 - Scores plots: Q residuals vs Hotelling T2 (top) and confidence ellipse on PC1 vs PC2 plot (bottom).
Figure 5.13 - Eigenvalues and cross-validation RMSECV curves for Model B data (without outliers).
Figure 5.14 - Scores plots: Q residuals vs Hotelling T2 (top) and confidence ellipse on PC1 vs PC2 plot (bottom), from Model B, without outliers.
Figure 5.15 – Correlation Map for Model B (without outliers).
Figure 5.16 - Loadings plot PC1 vs PC2 for Model B.
Figure 5.17 - RMSECV Y (Y) vs number of LV plot for Model B.
Figure 5.18 – PLS scores plots for Model B: Q Residuals vs Hotelling's T2 (left) and Scores on LV1 vs Scores on LV2 (right).
Figure 5.19 – Calibration Results for Model B.
Figure 5.20 – Parity plot of the calibration step of Model B.
Figure 5.21 – Validation Results for Model B.
Figure 5.22 - Parity plot of the validation step of Model B.
Figure 5.23 - Eigenvalues and cross-validation RMSECV curves for Model C data.
Figure 5.24 - Scores plots for Model C.
Figure 5.25 - Eigenvalues and cross-validation RMSECV curves for Model C data (without outliers).
Figure 5.26 – Scores plots for Model C (without outliers).
Figure 5.27 - Correlations map for Model C (without outliers).
Figure 5.28 - Loadings map for Model C.
Figure 5.29 - RMSECV Y (Y) vs number of LV plot for Model C.
Figure 5.30 - PLS scores plots for Model C: Q Residuals vs Hotelling's T2 (left) and Scores on LV1 vs Scores on LV2 (right).
Figure 5.31 - Calibration Results for Model C.
Figure 5.32 – Parity plot of the calibration step of Model C.
Figure 5.33 - Validation Results for Model C.
Figure 5.34 – Parity plot of the validation step of Model C.
Figure 5.35 – Eigenvalues and cross-validation RMSECV curves for Model D data.
Figure 5.36 – Scores plots for Model D.
Figure 5.37 - Eigenvalues and cross-validation RMSECV curves for Model D data (without outliers).
Figure 5.38 – Scores plots for Model D (without outliers).
Figure 5.39 – Correlations map for Model D (without outliers).
Figure 5.40 – Loadings map for Model D.
Figure 5.41 – RMSECV Y vs number of LV plot for Model D.
Figure 5.42 – PLS scores plots for Model D: Q Residuals vs Hotelling's T2 (left) and Scores on LV1 vs Scores on LV2 (right).
Figure 5.43 – Calibration results for Model D.
Figure 5.44 – Calibration parity plot for Model D.
Figure 5.45 – Validation results for Model D.
Figure 5.46 – Validation parity plot for Model D.
Figure 5.47 – Model A validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.48 - Model B validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.49 - Model C validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.50 – ‘Corrected’ Model A validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.51 - ‘Corrected’ Model B validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.52 - ‘Corrected’ Model C validation for the dataset 1st of November 2013 to January 13th of 2014.
List of Tables
Table 4.1 – Simulation results for the negative step tests.
Table 4.2 – Simulation results for positive step tests.
Table 4.3 – Simulation results for both positive and negative tests in variable X22.
Table 4.4 - Fractionator step tests scheduling.
Table 4.5 – Y magnitude of variation for each step test on variables X3, X10, X12 and X9.
Table 4.6 - Y magnitude of variation for each step test on variables X13 and X8.
Table 4.7 - Y magnitude of variation for each step test on variable X22.
Table 5.2 - PCA results obtained for Model A data.
Table 5.3 - PLS results of Model A data.
Table 5.4 - PCA results for Model B.
Table 5.5 - PCA results for Model B (without outliers).
Table 5.6 – PLS results for Model B data.
Table 5.7 - PCA results for Model C.
Table 5.8 - PCA results for Model C (without outliers).
Table 5.9 - PLS results for Model C.
Table 5.10 – PCA results for Model D.
Table 5.11 – PCA results for Model D (without outliers).
Table 5.12 – PLS results for Model D.
Table 5.13 – Model results summary.
Table 5.14 – Performance criteria VAF for the modelling results.
Abbreviations
ANN Artificial Neural Networks
b/d barrels of oil per day
DCS Distributed Control System
LV Latent Variable
mb/d million barrels of oil per day
mboe/d million barrels of oil equivalent per day
MSE Mean Square Error
MSEP Mean Square Error of Prediction
NFS Neuro-Fuzzy Systems
OECD Organization for Economic Co-operation and Development
OPEC Organization of the Petroleum Exporting Countries
PC Principal Component
PCA Principal Components Analysis
PLS Partial Least Squares
RMSEC Root-Mean-Square Error of Calibration
RMSECV Root-Mean-Square Error of Cross-Validation
RTDB Real Time Database
SVM Support Vector Machines
VAF Variance Accounted For
VGO Vacuum Gas Oil
Nomenclature
Symbol Description
b Inner linear regression coefficient
E Principal Components Analysis residual matrix
F Partial Least Squares residual matrix
m Number of matrix columns
n Number of matrix rows
N Number of Samples
p Input loading vector
pT Transpose input loading vector
P Loading matrix from Matrix X decomposition
PT Transpose Matrix of P
q Output loading vector
qT Transpose output loading vector
Q Loading matrix from matrix Y decomposition
QT Transpose Matrix of Q
R2 Coefficient of Determination
s Number of splits
t Input score vector
T Score matrix from matrix X decomposition
u Output score vector
uT Transpose output score vector
U Score matrix from matrix Y decomposition
x Laboratory analysis
x̂ Model Prediction
x̄ Mean value of the Laboratory Analysis
X Input Matrix
Y Output Matrix
1. Introduction
Living without energy seems unthinkable nowadays. The need for energy comes from the
need for comfort, which includes heating, technology and the ability to move and travel. Most
of the technological world we live in today would not exist without fuels, and in particular
without fossil fuels such as petroleum, coal and natural gas.
1.1 Energy Supply and Demand
Fossil fuels accounted for 82% of the energy supply in 2010 and are expected to account for 80%
of the global total in 2035. In 2010, oil demand was 81.2 mboe/d, a 32.2% share of the fuel mix
(figure 1.1); coal demand was 69.8 mboe/d (27.7%) and gas supply was 54.8 mboe/d (21.7%). The
remaining 18.4% was distributed between nuclear, hydro, biomass and other renewables. For 2035,
oil demand is predicted to be 100.2 mboe/d (26.3% of the fuel mix), coal demand 104.0 mboe/d
(27.2%) and gas demand 99.8 mboe/d (26.0%). By 2035, global oil use per head is expected to
average just 3.2 barrels, up from 2.4 in 2010 [1].
Figure 1.1 - World supply of primary energy.[1]
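As a rough consistency check on the demand figures and shares quoted in this section, dividing each fuel's demand by its share should give roughly the same total primary energy supply for a given year. The short Python sketch below does this arithmetic; it is an illustration only and not part of the outlook cited as [1], and the names `implied_total` and `summarize` are hypothetical helpers introduced here.

```python
# Demand (mboe/d) and share of total primary energy (%), as quoted in the text.
data_2010 = {"oil": (81.2, 32.2), "coal": (69.8, 27.7), "gas": (54.8, 21.7)}
data_2035 = {"oil": (100.2, 26.3), "coal": (104.0, 27.2), "gas": (99.8, 26.0)}


def implied_total(demand, share_percent):
    """Total supply implied by one fuel's demand and its percentage share."""
    return demand / (share_percent / 100.0)


def summarize(data):
    """Return the implied total per fuel and the combined fossil share (%)."""
    totals = {fuel: implied_total(d, s) for fuel, (d, s) in data.items()}
    fossil_share = sum(s for _, s in data.values())
    return totals, fossil_share


for year, data in (("2010", data_2010), ("2035", data_2035)):
    totals, fossil_share = summarize(data)
    rounded = {fuel: round(t, 1) for fuel, t in totals.items()}
    print(year, rounded, f"fossil share = {fossil_share:.1f}%")
```

With the quoted figures, the three implied totals for each year lie close to one another (near 252 mboe/d for 2010 and near 382 mboe/d for 2035), and the summed fossil shares of 81.6% and 79.5% agree with the rounded 82% and 80% stated in the text.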
1.2 Demand by Sector
Of all the oil-consuming sectors, the transportation of people and goods (road, aviation, rail
and marine transport) is the main use of oil; the other sectors are the petrochemical industry,
the agricultural/commercial/residential sector and electricity generation. In 2010 transportation
accounted for 52% of all oil use, and the prediction for 2035 is 60% (figure 1.2). Transportation
is also the main driver of the overall increase in oil consumption. This increase is stimulated by
demographic changes, higher wealth levels and increasing urbanization, among other factors, all of
which lead to more passenger car ownership. Although car ownership will grow in OECD
(Organization for Economic Co-operation and Development) states, the major pull in car demand will
come from developing Asian countries and China, the latter having the largest growth in oil demand.
Between 2010 and 2035 the number of cars in OECD countries is expected to rise by 125 million,
whereas in China alone the rise is far more dramatic, at about 380 million cars. The overall car
fleet in 2035 is expected to be 1.9 billion cars, more than double the 2010 number. Global oil
demand for transportation was 34.6 mboe/d in 2010 and is predicted to reach 44.6 mboe/d in 2035. [1]
Figure 1.2 – Percentage shares of oil demand by sector in 2010 and 2035. [1]
Although air traffic is expected to rise, it has been held back by the financial crisis in the
OECD: OECD countries account for 72% of aviation oil demand, and the global recession left aviation
oil consumption lower in 2010 than in 2000. These facts show how closely aviation oil demand is
linked to economic activity. It is also heavily influenced by jet fuel prices. Demand for aviation
oil was 5 mboe/d in 2010 and is predicted to reach 7.2 mboe/d in 2035, with developing countries
leading the growth in demand.
Oil supply in non-OPEC (Organization of the Petroleum Exporting Countries) countries was
46.4 mb/d in 2012 and is estimated at 45.9 mb/d by 2035, after passing through a peak of 50.3
mb/d in 2020. In OPEC countries, supply was 31 mb/d in 2012 and is estimated to reach 37 mb/d
in 2035.
Regarding refined product demand, demand for middle distillates in particular was 32.3 mb/d in
2012 (36.3% of all refined products) and is estimated to reach 44.1 mb/d in 2035 (40.7% of all
refined products). Middle distillates have the highest growth rate of all the distillates, with
diesel oil growing fastest (figure 1.3). [1]
Figure 1.3 – Global product demand, 2012 and 2035. [1]
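The growth implied by the demand figures quoted in this section can be expressed as a compound annual rate. The sketch below assumes constant year-on-year growth between the two quoted years, which is a simplification made here for illustration and not a method taken from [1]; `cagr` is a hypothetical helper name.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate, assuming constant year-on-year growth."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# (start, end, number of years) for the figures quoted in the text.
series = {
    "middle distillates (mb/d), 2012-2035": (32.3, 44.1, 2035 - 2012),
    "aviation oil (mboe/d), 2010-2035": (5.0, 7.2, 2035 - 2010),
    "transport oil (mboe/d), 2010-2035": (34.6, 44.6, 2035 - 2010),
}

rates = {name: cagr(a, b, n) for name, (a, b, n) in series.items()}

for name, rate in rates.items():
    print(f"{name}: {rate * 100:.2f} % per year")
```

Under this assumption all three series grow by roughly 1 to 1.5% per year, which puts the quoted totals on a common footing even though they cover different periods and units.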
1.3 Market Trends
Crude prices have increased dramatically since the mid-2000s, especially since 2006. One
likely reason is the rapid growth of Asian economies, which is sustained by the consumption of
large quantities of oil. In 2008 the US faced its longest recession since the Great Depression and
reduced its oil purchases, causing crude prices to decline. In response, OPEC decided to decrease
production by the end of that year. This decrease in production, together with continuing demand
in China, pushed prices up, and they rose steadily from the middle of 2009. After 2011, prices
surpassed the previous peak of 2008 because of the civil war and loss of production in Libya, and
continued to increase due to unrest in Middle Eastern and North African countries.
Figure 1.4 - Crude Prices in US dollars, these include Saharan Blend, Girassol, Oriente, Iran Heavy, Basra Light, Kuwait Export, Es Sider, Bonny Light, Qatar Marin, Arab Light, Murban and Merey. [2]
1.4 Industry Profile
Refining capacity used to be measured by distillation capacity, but nowadays the capacity for
conversion and for product quality improvement plays the vital role in processing raw crude
fractions into more valuable products, especially now that the trend is towards higher demand for
lighter products with more restrictive quality specifications. All new refinery projects have high
levels of desulphurization and secondary processing, which allow them to produce high yields of
light, clean products that comply with the most advanced specifications. Moreover, these new
projects are designed to refine heavy, low-quality crudes as well as better-quality grades. The
prediction for 2035 is that conversion capacity will increase more than distillation capacity
(figure 1.5). Among the conversion projects, hydrocracking will have the highest growth, because it
is the primary means of producing incremental distillate once the straight-run fractions from crude
have been maximized. [1]
Figure 1.5 - Global Capacity requirements by process. [1]
With the rise in crude prices comes the need to make the most of the oil barrel, especially
now that the barrel is traded at about 105 US dollars. Figure 1.6 shows the percentage yields of
the products obtained from a crude barrel. Since the oil price is rising and the demand for
middle distillates is growing, there is a need to convert the heavier fractions of the crude
distillation into lighter distillates, preferably diesel.
Figure 1.6 - Crude products, source: Skrebowski Energy Institute Oil Depletion Conference, 2008.
Sulphur removal from diesel is the greatest challenge facing the refining industry, because it
requires additional processing units and higher costs. Diesel quality specifications
vary between geographic regions. In the EU, Japan, Hong Kong, New Zealand, Australia, South Korea,
Taiwan, Argentina, Armenia and Singapore the limit on sulphur concentration for on-road diesel is
10 ppm. In the US, Canada and Chile the limit is 15 ppm. In some countries there is even
variation between cities. In China the limit is 350 ppm, except in Beijing, which has a
limit of 10 ppm, and in selected cities that have a limit of 50 ppm. In India the
nationwide limit is 350 ppm, but in selected cities it is 50 ppm. Belarus and Thailand have a
limit of 50 ppm. The region with the lowest diesel quality is Africa, with sulphur limits between
2000 and 3000 ppm, with the notable exception of South Africa, which plans to reduce its limit to
10 ppm by 2017. [1]
In 2009 there were about 195 hydrocracker units operating worldwide, processing about
4,000,000 b/d of feedstock [3]. At start of run, the vast majority of hydrocrackers can reach a
near-zero sulphur content. Hydrocracker designs include single-stage (either once-through or with
recycle) and multiple-stage (usually two-stage) configurations, and can run in vacuum gas oil (VGO)
cracking mode or on light-cycle oil. In the single-stage, once-through hydrocracker there is only one
reactor and the fractionator bottoms (the unconverted oil) is not recycled for further cracking.
It is usually necessary to hydrotreat the feedstock to remove ammonia and sulphur (or the reactor
is equipped with catalyst to perform this pre-treatment task). Single-stage hydrocrackers with
recycle are the most widely used configuration because the uncracked residue from the
fractionator bottoms returns to the reactor for further cracking, increasing the overall yield of
the reaction. Two-stage hydrocrackers use two reactors, with the unconverted oil from the
fractionator bottoms recycled to the second reactor for further cracking. Since the first stage
reactor performs both hydrotreating and hydrocracking, the second stage reactor feed is almost
entirely free of ammonia and sulphur. [3]
Hydrocracker technology has become a key process for converting low-value, high-sulphur,
heavy-oil fractions into valuable products. This is of particular importance in an environment where
rising crude prices (figure 1.4) have shrunk the profit margin and have forced refineries to
consider processing poorer quality crudes and difficult hydrocracker feedstocks, and where tight
fuel regulation and emissions legislation are major operational constraints. [4]
1.5 Thesis Outline
The goal of the work presented in this thesis is to develop a soft-sensor that enables the Galp
Refinery at Sines to predict the quality of Y produced by its hydrocracker, and to use it in the
advanced control design of the unit. Having such a sensor enables the refinery to find ways to
increase the production of excellent quality Y to meet both market demand and the existing
regulatory limitations. This study is necessary and important since it was the first of its kind:
there was no previous work on soft-sensor design for quality prediction of hydrocracking products
in an online unit.
Starting with the description of the hydrocracking process at the refinery in Chapter 2, we
proceed by describing the chemometric tools used to develop the soft-sensor (PCA and PLS) in
Chapter 3. Chapter 4 describes the steps taken to plan and perform the step tests in the
hydrocracker unit, together with the results of the tests. Chapter 5 describes the development and
the results of calibration and validation of the soft-sensors. Chapter 6 presents the conclusions
of this study, and the last chapter, Chapter 7, suggests future work to improve the
soft-sensor.
2. Hydrocracking Process Overview
The hydrocracking process is a catalytic process used for cracking complex, high-boiling, high
molecular weight hydrocarbon mixtures [5] into more valuable low-boiling products [6] like kerosene,
diesel and naphtha. Hydrocracking is a very important and flexible refinery process because it can
process a large variety of gas oils, manufacturing products with low sulphur content and high smoke
point jet fuel, in order to meet the demand for cleaner and environmentally friendly fuels [5].
In this process, the cracking of carbon-carbon single bonds and the hydrogenation of
double bonds are complementary phenomena [7], because the cracking reaction provides olefins for
hydrogenation and hydrogenation liberates the heat for cracking [6]. The hydrogenation reactions are
highly exothermic and the cracking reactions are slightly endothermic, making the overall process
highly exothermic. Hydrogenation reactions extend not only to olefins but also to aromatic, sulphur,
nitrogen and oxygen compounds [6], making the separation of these pollutants easier and making it
less costly to meet the current fuel specifications.
This chapter will describe the hydrocracking process at Sines Galp Refinery, which is
important to the comprehension of the subject at hand. The description will begin at the hydrogen
make-up compression section, followed by the reaction section (the filter system, reactor feed
section, reaction system and effluent cooling), the effluent separation, fractionator section and
storage.
The aim of the make-up compression section is to compress hydrogen to ensure a continuous
supply to the reaction section and so preserve system pressure, since hydrogen is consumed
in the reaction and is also lost by dissolution in the hydrocarbon liquid and eventually through
leaks. This make-up compression section is composed of three parallel compression trains, each
with three stages of compression. In normal operation, only two of the three trains are working,
leaving the third as a spare. The feed gas is divided between the two operating trains, compressed
to the desired reaction section pressure and then combined and fed to the reaction section.
Vacuum Gas Oil (VGO) is the fresh feed for the first stage reactor (A-01) in the reaction
section and is pre-heated in the kerosene/fresh feed exchanger (B-01), followed by further heating in
the diesel/fresh feed exchanger (B-02), being afterwards sent to the filter system (C-01 A/B/C). The
filters are designed to clear particulate material from the fresh feed that could plug the catalyst bed in
the first stage reactor, causing not only catalyst deactivation but also pressure drop problems.
After filtration, the feed is sent to the filtered feed surge drum (D-01). From this drum, the feed
is pumped to the reactor system pressure. The D-01 is designed to prevent fluctuations and losses
of feed to the pumps and reaction section.
The oil feed to the second stage reactor (A-02) is the unconverted oil from the first and second
stage reactors and comes from the fractionator (D-02) bottoms. This stream is cooled by heat
exchange with the feed to the fractionator furnace in the fractionator feed and bottom exchanger B-
03 and in the fractionator bottom steam generator B-04. The second stage reactor feed stream is
then pumped to the reaction section.
After leaving their feed pumps, the feeds to the first and second stage reactors are each mixed
with pre-heated hydrogen, either make-up hydrogen from the make-up compression section
or recycled hydrogen. The combined feed mixture to the first stage reactor is heated in the first
stage feed/effluent exchanger (B-05 A/B) and afterwards in the first stage furnace (E-01). The
second stage feed mixture is heated in the second stage reactor/effluent exchanger (B-06) and
afterwards in the second stage furnace (B-02).
The heated gas/oil mixtures are fed to their respective reactors: the first stage reactor,
which has six catalyst beds and two types of catalyst (one for hydrotreating and the other for
hydrocracking), and the second stage reactor, which has four catalyst beds containing only
hydrocracking catalyst. As soon as the feed contacts the catalyst, the reaction begins and, because
the reactions are highly exothermic, the temperature of the mixture and of the catalyst beds
increases. To prevent excessive heating and to control the reaction temperature, a quench gas
(hydrogen) is introduced between the catalyst beds of each reactor.
After reaction, the reactor effluents consist of product oil, excess hydrogen not consumed in
the reaction and light gases formed during hydrocracking. The stream leaving the first stage reactor
is cooled by heat exchange with the reactor feed in B-05 A/B and then mixed with the second stage
effluent. The second stage effluent is cooled by exchanging heat with the fractionator feed in B-07
before it is combined with the first stage reactor effluent. This mixture is further cooled in B-08
and then sent to a steam generator (B-09) to complete the cooling before feeding the hot high
pressure separator (HHPS), D-03. The D-03 is designed to separate the excess hydrogen from the
reaction liquids, enabling the recycling of the hydrogen gas to the reaction section in order to
reduce the cost of producing hydrogen. The remaining liquid products are then let down in pressure
by the power recovery engine (F-01) and flashed in the hot low pressure separator (D-04). This high
temperature, low pressure flash enables the separation of the hydrogen gas dissolved in the liquid,
and the gas is recycled to the reaction section.
The D-04 bottoms is fed to the product stripper (D-05) to separate H2S, LPG and some
naphtha from the liquid reaction product. This stripper has three packed beds. The stripper bottoms
is heated in B-08 by heat exchange with the combined reactor effluent, and also in B-07 by heat
exchange with the second stage reactor effluent. This stream is then further heated by heat
exchange with the fractionator bottoms stream in the fractionator bottoms/feed exchanger (B-10),
before being sent to the fractionator feed furnace (E-03).
The fractionator feed is heated in the fractionator feed furnace with the aim of producing
enough vapour that overflash is produced in the column (in this case, overflash is defined as
the ratio of the volumetric liquid rate going to the stripping section to the total volumetric rate
of the distillate products).
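As a rough illustration of the overflash definition above, the ratio can be computed directly from volumetric rates. This is only a sketch: the function name, the stream names and all numerical values below are hypothetical and are not plant data.

```python
def overflash_ratio(liquid_to_stripping, distillate_rates):
    """Overflash as defined in the text: volumetric liquid rate going to the
    stripping section divided by the total volumetric rate of the distillates."""
    total_distillate = sum(distillate_rates.values())
    if total_distillate <= 0:
        raise ValueError("total distillate rate must be positive")
    return liquid_to_stripping / total_distillate


# Hypothetical volumetric rates in m3/h (illustrative values only).
distillates = {
    "light_naphtha": 20.0,
    "heavy_naphtha": 30.0,
    "kerosene": 50.0,
    "diesel": 100.0,
}
ratio = overflash_ratio(10.0, distillates)
print(f"overflash = {ratio:.3f}")
```

Both rates must be expressed in the same volumetric units for the ratio to be meaningful.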
In normal operation of the product fractionator (D-02), light naphtha is sent overhead, heavy
naphtha, kerosene and diesel are drawn through side cuts, and the diesel draw is split into product
and pumparound. The unconverted oil is drawn as the bottoms and the feed enters the column in the
flash zone.
Superheated low pressure steam is used in the fractionator’s stripping section to recover any
products from the bottoms before it is pumped from the column. The bottoms stream is cooled by heat
exchange with the feed in the fractionator bottom and feed exchanger (B-10).
Before being sent to the fractionator reflux drum, the overhead vapour is totally condensed in
the fractionator overhead air cooler (B-11) and in the overhead trim cooler (B-12). The reflux
drum is a horizontal vessel designed to separate oil from water, which is collected at the boot of
the vessel and sent to the injection water drum (D-06). Part of the oil (light naphtha) is pumped to
the light ends section and the remainder is pumped back as reflux to the fractionator.
Heavy naphtha is drawn and flows to the heavy naphtha stripper (D-07). This stripper has valve
trays and a thermosiphon reboiler that exchanges heat with the diesel pumparound. The heavy
naphtha vapour is returned to the fractionator, and the bottoms is pumped to the light ends section.
Kerosene is drawn from the fractionator column and is sent to the kerosene stripper (D-08)
which, like D-07, has trays and a thermosiphon reboiler that exchanges heat with the
diesel pumparound. The stripper’s vapour is returned to the fractionator, and the bottoms is pumped,
cooled and sent to storage.
Diesel is drawn from a chimney tray of the fractionator and the flow is split between a
pumparound stream and a stream fed to the diesel side stripper (D-09). This stripper uses
superheated low pressure steam to remove light components from the product. Its overhead vapour
is returned to the fractionator and the diesel stripper bottoms is cooled by exchanging heat in the
first stage fresh feed exchanger B-13, and further cooled by reboiling the deethanizer bottoms in
B-14 (deethanizer bottoms reboiler). Because the diesel is steam stripped, the water must be
removed to meet product specifications; therefore, the stream is sent to the diesel vacuum drier
air cooler (B-15), and afterwards to the diesel vacuum drier (D-10). The bottoms of D-10 is then
cooled in the diesel air cooler (B-16) and later in the diesel trim cooler (B-17). Part of the
resulting stream is sent to the cold low pressure separator in the reaction section (D-11, designed
to release hydrogen rich vapour, which after amine treating is recycled to the reactors) to be used
as sponge oil. The remaining diesel is sent to storage.
The diesel pumparound stream removed from the fractionator reduces column traffic above
the diesel tray side cut and removes valuable high temperature heat that provides heat for four
column reboilers and also produces medium pressure steam before it returns to the fractionator. So,
this stream is pumped to the kerosene stripper reboiler (B-18), the heavy naphtha reboiler (B-19), the
naphtha splitter reboiler (B-20) and the naphtha stabilizer reboiler (B-21) and finally to the medium
pressure generator (B-22) to ensure a continuous pumparound heat removal before entering the
fractionator [8].
3. State of the Art
This chapter presents the state of the art of soft-sensors and their scope and
application in process industries. Soft-sensor development and its difficulties are
discussed, and data-driven methods for soft-model development, particularly Principal
Components Analysis (PCA) and Partial Least Squares (PLS), are characterized and discussed.
3.1 Soft-Sensors for industrial processes
Chemical plants are usually highly instrumented and have a large number of sensors that
collect measured data for process control and monitoring. About two decades ago researchers
began using this large amount of data to build predictive models, which in the process industry
are called Soft-Sensors. The term soft-sensor is a combination of the words ‘software’ (mainly
because the models are developed in computer programs) and ‘sensors’ (because these models
provide information similar to that of hardware sensors). Soft-sensors are often divided into two
categories: model-driven and data-driven [9,10]. Model-driven sensors (also called white-box models)
are most commonly based on First Principle Models that describe the physical and chemical
properties of the process [9,10]. They are developed primarily for the planning of the plants and
usually describe only the ideal or optimal process steady-state rather than the real process
dynamics, which makes them unsuitable for describing any dynamic state. They also describe a
simplified theoretical background rather than real-life process conditions [9] and are somewhat
computationally intensive for real-time applications [10,11].
Data-driven models do not have these disadvantages because they are based on data measured
within the processing plants, thus describing the true conditions of the process in a better way
[9,10] and providing the real-time information necessary for effective quality control [11].
Data-driven models are also known as black-box techniques because the model itself has no knowledge
about the process and is based on empirical observations of the process. These models are based on
real-life measurements that are recorded, stored and provided as historical data. [9]
The span of tasks performed by Soft-Sensors is quite broad, but the most common use is the
prediction of process variables that can only be determined either at low sampling rates or through
off-line analysis [9,12]. These variables are often very important for process control because they
are related to the process output quality, so it is important to deliver
additional information about them at a higher sampling rate or lower financial burden [9,13],
hence the use of soft-sensors. Another field of application of soft-sensors is process monitoring
and process fault detection, which involves finding the state of the process and identifying the
source of a deviation. As previously said, real industrial plants have many sensors, and there is a certain
probability of a sensor failing. Detecting such a failure is also a soft-sensor task; in addition,
the soft-sensor can act as a backup while the hardware sensor is replaced or, if it proves to be
good, as a replacement for the hardware measuring device. [9]
Measuring variables that define product quality is a major problem in process industries.
These variables, called primary or quality variables, quantify the productivity or the
specifications upon which the product is sold, such as purity or physical or chemical properties,
and they are the most difficult to measure online. The online variables that are easy to access and
measure are often called secondary variables; examples are temperature, pressure and flow rate, and
they can be used to infer primary variables. Because of the nature of chemical and process
engineering systems, the dynamics and state of the secondary variables reflect the dynamics and
state of the primary variables, meaning that changes in secondary variables are indicative of
changes in product quality. The technique of using secondary variables to generate estimates of
product quality is usually called ‘soft-sensing’, and these inferential estimators are usually used
in place of direct on-line measurement of controlled variables when direct measurements are
expensive, unreliable or add a large lag [13].
Soft-sensors have been used for estimating product composition in distillation columns and
particle size distributions in a grinding circuit, for monitoring emissions of NOx, SO2 and CO2 in
industrial boilers and furnaces, and for ensuring high and consistent product quality and process
reliability in the pharmaceutical industry [11]. They have also been used as a feed oil classifier
to determine feed oil type by estimation of kerosene dry point [14], for modelling an activated
sludge plant to detect shifts of various kinds in the process [15], for modelling product quality
in a crude desalting and dehydration process [13], for oil sludge depository classification for
waste treatment [16], for studying the influence of minerals on the taste of bottled tap water [17],
for modelling ground-level ozone and the factors affecting its concentrations [18], and for
predicting product quality in the catalytic hydrocracking of vacuum gas oil [10], to name a few.
3.2 Soft-Sensor methodology
Several problems affect the development of soft-sensors, and they are usually
related to measurement noise, missing values, co-linear features and varying sampling rates.
In addition, process plants are usually dynamic environments in which abrupt changes can occur,
for example a change in the quality of the process input, which results in a deterioration of
prediction accuracy [9].
A challenging issue in soft-sensor development is data co-linearity: measured
data in the process industry is typically strongly co-linear, as a result of partial redundancy in
the sensor arrangement (for example, two neighbouring temperature sensors will collect strongly
correlated measurements). As the measurements collected are usually for process control purposes, there is a
large amount of accumulated data that is data rich but information poor. For soft-sensor
modelling the requirements are of another kind: only informative variables are required, and any
other information just adds to model complexity, having a negative effect on model training and
performance. To deal with this problem, two methods are widely accepted, PCA and PLS, which
transform the input variables into a new reduced space with less co-linearity [9].
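The co-linearity described above can be made visible with a small synthetic example. The sketch below assumes NumPy; the signals are simulated (two hypothetical neighbouring temperature sensors and an unrelated flow measurement) and do not come from the unit studied here. A correlation close to one and a near-zero eigenvalue of the correlation matrix both indicate that one variable is almost redundant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated signals (hypothetical): two neighbouring temperature sensors
# and a flow measurement that is unrelated to them.
t1 = 350.0 + rng.normal(0.0, 2.0, n)
t2 = t1 + rng.normal(0.0, 0.1, n)       # nearly redundant with t1
flow = 100.0 + rng.normal(0.0, 5.0, n)

X = np.column_stack([t1, t2, flow])
corr = np.corrcoef(X, rowvar=False)

# Eigenvalues of the correlation matrix in ascending order: a value close
# to zero means some linear combination of variables carries no information.
eigvals = np.linalg.eigvalsh(corr)

print(np.round(corr, 3))
print("smallest eigenvalue:", eigvals[0])
```

Projection methods such as PCA and PLS exploit exactly this structure by replacing the correlated variables with a smaller number of latent variables.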
The presence of missing data presents difficulties in model development. Since it is desirable
to use as many samples as possible to develop a model, missing data or the removal of incomplete
data decreases the accuracy of model estimates. Also, when the soft-model is applied and used to
estimate the quality variable as part of a control system, the sensor must be able to deal with the
failure of some online measurements and still provide reliable estimates [19]. Since
representative data is more likely to be found in large datasets than in small ones, missing data
should cause fewer problems in large datasets, because in large datasets any direction
is still fairly well represented, at least as long as one works in subspaces of projections, as in
PCA or PLS [20].
There are no widely accepted guidelines for soft-model construction, but there are
steps that are frequently taken in its development. The procedure presented here is rather general
but summarises the most common steps in model development.
The usual first step is an initial data inspection, in which the data structure is overviewed
and any obvious problems are identified, such as locked variables having a constant value. The next
step is to assess model complexity, that is, to decide whether a simple regression model is enough
or whether a more powerful tool, such as PCA or PLS analysis, is needed to develop the soft-sensor.
Assessing the target variable is also very important, because there has to be enough variation in
the output variable, and it must be understood whether it can be modelled at all.
Then one proceeds to the selection of historical data and the identification of steady states.
In this stage one dataset is selected for training and another for validation. The stationary parts
of the data are identified, selected and used in model development. Next, the data must be
pre-processed. A typical pre-processing step is to normalise the data to zero mean and unit variance
(as required for PCA), but other types of pre-processing are also employed, such as handling missing
data, outlier detection and replacement, selection of relevant variables, and handling of drifting
data. The data processing is usually done iteratively until the developer considers the data and the
model ready for validation. Data pre-processing is considered to be the most time-consuming step,
demanding a great deal of manual work and expert knowledge of the underlying process.
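The normalisation to zero mean and unit variance mentioned above can be sketched as follows, assuming NumPy. The function name `autoscale` and the small data matrix are hypothetical; the sketch also drops locked (constant) variables and ignores missing values when computing the statistics, which is one simple choice among several and not necessarily the one used in this work.

```python
import numpy as np


def autoscale(X):
    """Scale each column to zero mean and unit variance, ignoring NaNs.

    Columns with zero variance (locked variables) are dropped.
    Returns the scaled matrix and a boolean mask of the columns kept.
    """
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=1)
    keep = std > 0
    return (X[:, keep] - mean[keep]) / std[keep], keep


# Hypothetical data: column 1 is a locked variable, column 2 has a missing value.
X = np.array([
    [350.0, 10.0, 5.0],
    [352.0, 10.0, 7.0],
    [351.0, 10.0, np.nan],
    [355.0, 10.0, 6.0],
])
Z, keep = autoscale(X)
print(keep)
print(np.round(Z, 3))
```

The means and standard deviations estimated on the training data would have to be stored and reused when scaling validation or online data.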
Following pre-processing, the next phase is model selection, training and validation. The
selection of model type is critical for soft-sensor performance. There is no unified theoretical
approach for this step, and usually the model type and its parameters are selected in an ad hoc
manner, often depending on the developer’s past experience, expertise and personal preference.
However, there are some techniques that can be adopted, such as starting with a simple model type
and then increasing model complexity as long as significant model improvement can be observed (by
assessing model performance with independent data). After finding the optimal model structure and
training the model, the soft-sensor has to be validated with independent data. Its
performance can be evaluated numerically, using the Mean Square Error (MSE), which measures
the average squared distance between the predicted and the real value, and by visual representation
of the predictions. One disadvantage of this last method is that the final decision on whether the
model performs adequately is rather subjective, depending on the model developer’s experience.
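The Mean Square Error used for numerical validation is simple to compute. The following minimal sketch uses plain Python; the laboratory values and predictions are invented for illustration only.

```python
def mean_square_error(y_true, y_pred):
    """Average squared distance between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical laboratory values and soft-sensor predictions.
y_lab = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 2.5, 4.0]
mse = mean_square_error(y_lab, y_hat)
print(f"MSE = {mse:.3f}")
```

To follow the strategy described above, the same function would be evaluated on independent data for models of increasing complexity, stopping when the MSE no longer improves significantly.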
Finally, after its development, the soft-sensor has to be maintained and tuned on a regular
basis. This is necessary because drifts and other changes in the data deteriorate the
performance of the soft-sensor and have to be compensated for by adapting or re-developing the
model [9].
3.3 Data-driven methods for soft-sensing
Using soft-sensors in crude oil distillation with varying feedstock is still a difficult problem
because the relationship between the easily measured process variables and the
quality variables that are difficult to measure varies with the type of crude processed. Moreover,
most refineries use mixed sources of crude oil with varying blending ratios, and the relationship
between process variables and quality variables varies with different crude oils or blends [14].
Hydrocracked products are separated into different fractions that constitute the blending
stocks for the final products. The product quality is significantly influenced by operating
conditions, and the cracking yield is reduced with time by catalyst deactivation. Therefore, the
continuous monitoring of product quality is very important, especially to avoid off-spec petroleum
fractions, which usually cause problems downstream at the blending stage [10].
It is usually difficult to get precise and reliable product composition measurements without
time delay, because most composition analysers have significant time lags and their reliability is
usually quite low. Tray temperature could be used as an indication of composition, but the presence
of off-key components in multicomponent mixtures, column pressure, and jumps in feed rate can affect
tray temperatures, preventing them from being an exact indicator of composition. Due to the strong
correlation between tray temperature measurements, Principal Components Analysis (PCA) or Partial
Least Squares (PLS) methods should be applied [21].
The modelling techniques most used for data-driven soft-sensors are Principal
Component Analysis (PCA) in combination with a regression model, Partial Least Squares (PLS),
Artificial Neural Networks (ANN), Neuro-Fuzzy Systems (NFS) and Support Vector Machines (SVMs)
[9].
Several reasons motivate the multivariate approach to a problem. Process deviations are not
always detected by looking at one variable at a time; often these deviations occur
simultaneously in many variables, and even a very small variation can have a significant
influence on product quality. If a process drift towards an out-of-control state can be detected
in its early stages, corrective measures can be taken sooner to avoid such states. Also, if many
variables have been measured, the effect of noise can be drastically diminished by modelling the
correlation structures among the different variables [15] and by reducing the data dimension using,
for example, PCA.
In this thesis PCA and PLS methods will be used because they are widely accepted and are
usually the first approach to soft-model development for process control.
3.3.1 Principal Components Analysis (PCA)
Noise can be found in almost all variables of the majority of datasets. Latent variable models
like PCA and PLS estimate the relevant part and the noise of each variable and are therefore used in
the present work [20]. Principal Component Analysis (PCA) was used for analysing the data so that
only the secondary variables important to the determination of product quality were selected [13].
Using PCA, the data can be described using far fewer variables than the original variables with no
significant loss of information, and PCA often produces linear combinations of variables that are
useful predictors of particular processes [12]. Mathematically, PCA relies on an eigenvector
decomposition of the covariance or correlation matrix of the process variables. Here X represents a
matrix (n x m) whose rows correspond to the samples and whose columns correspond to the
variables. PCA then decomposes the data matrix X into the sum of the outer products of vectors $t_i$
and $p_i$ (i = 1, 2, 3, ..., k) plus a residual matrix E, equations 2.1 and 2.2 (matrix form).
$X = t_1 p_1^T + t_2 p_2^T + \cdots + t_k p_k^T + E$ (2.1)
Or,
$X = T P^T + E$ (2.2)
Where $P^T$ is made up of the $p_i^T$ as rows and T of the $t_i$ as columns, and k in equation 2.1
must be less than or equal to the smaller dimension of X, i.e. $k \le \min(n, m)$. Vectors $t_i$
(n x 1) and $p_i$ (m x 1) are the ith score vector and loading vector, respectively. Score vectors
are mutually orthogonal, and loading vectors are orthogonal and of unit length. Loading vector
$p_1$ defines the direction of greatest variability, and score vector $t_1$ (also known as the
first principal component) represents the projection of each row (sample) of X onto $p_1$, being
the linear combination of the columns of X explaining the greatest amount of variability
($t_1 = X p_1$). The second principal component is the linear combination of the columns of X
explaining the next greatest amount of variability ($t_2 = X p_2$), subject to the condition that
it is orthogonal to the first principal component. Principal components are ordered by decreasing
variability. Since the X columns are highly correlated, the first few principal components can explain
the majority of data variability[21].
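The decomposition of equations 2.1 and 2.2 can be reproduced numerically. The sketch below assumes NumPy and uses synthetic data (four correlated variables driven by two hypothetical latent factors); it obtains the loadings from a singular value decomposition, which gives the same directions as the eigenvector decomposition of the covariance matrix, and is not the software actually used in this work.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 50, 4, 2

# Synthetic data: 4 correlated variables driven by 2 latent factors plus noise.
latent = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, m))
X = latent @ mixing + 0.05 * rng.normal(size=(n, m))
X = X - X.mean(axis=0)              # mean-centre each column

# Rows of Vt are the loading vectors, ordered by decreasing singular value.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:k].T                        # loadings p_i as columns (m x k)
T = X @ P                           # scores t_i = X p_i as columns (n x k)
E = X - T @ P.T                     # residual matrix (n x m)

explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"variance explained by {k} components: {explained:.3f}")
```

Because the four columns are strongly correlated, two components are enough to reproduce almost all of the variation, and the residual matrix E contains mostly noise.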
In equation 2.1, k represents the number of principal components to retain, and E (n x m) is
the residual matrix of unfitted variation (or noise) [21]. The matrix product of T and $P^T$
reproduces the most important variation in X. This product is a projection of the X-data onto a new
low-dimensional space, where it can be effectively analysed. This reduction in space dimensionality
is achieved due to the correlations between the variables in matrix X, and this is the main reason
why the method is particularly advantageous for analysing data with a large number of mutually
correlated variables [16].
The $p_i$ vectors are the eigenvectors of the covariance matrix, that is, in equation 2.3:
$\mathrm{cov}(X)\, p_i = \lambda_i p_i$ (2.3)
Where $\lambda_i$ is the eigenvalue associated with the eigenvector $p_i$. In PCA the $p_i$ are the
loadings and contain information on how variables are related to each other. The $t_i$ form an
orthogonal set while the $p_i$ are orthonormal. In equation 2.4 note that
$X p_i = t_i$ or $T = X P$ (2.4)
The pairs $t_i$ and $p_i$ are in descending order of $\lambda_i$: the first pair captures a larger
amount of information than any other pair in the decomposition, and each subsequent pair captures
the greatest possible amount of the remaining variance [12].
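The relations in equations 2.3 and 2.4 can be checked numerically. The following sketch assumes NumPy and synthetic data; `numpy.linalg.eigh` returns eigenvalues in ascending order, so the pairs are re-sorted in descending order to match the convention in the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic, mean-centred data with three correlated variables.
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
X = X - X.mean(axis=0)

C = np.cov(X, rowvar=False)             # covariance matrix of the columns of X
eigvals, eigvecs = np.linalg.eigh(C)    # symmetric eigen-decomposition
order = np.argsort(eigvals)[::-1]       # descending order of eigenvalue
lam, P = eigvals[order], eigvecs[:, order]

T = X @ P                               # equation 2.4: T = X P

# Equation 2.3: cov(X) p_i = lambda_i p_i, checked for all i at once.
print(np.allclose(C @ P, P * lam))
# The variance of each score vector equals its eigenvalue.
print(np.allclose(T.var(axis=0, ddof=1), lam))
```

The second check illustrates why the eigenvalue measures the amount of variance captured by each pair of score and loading vectors.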
The higher the loading of a variable, the more it contributes to explaining the variation of a
particular principal component. Only variables with loadings higher than 50% should be selected
for principal component interpretation, and any principal component with an eigenvalue equal to or
greater than one is usually considered of statistical relevance [13]. The matrices T and P provide
valuable information on the internal data structure. These matrices are interpreted based on the
fact that the correlation between two variables (or the similarity between two samples) is a
function of distance in the PC-space [16]. Pairwise score plots are often referred to as ‘sample
maps’ and reveal the grouping of samples and outliers. Similarly, the loading plots (variable maps)
show variable correlations. The distance from the origin to a sample in the score plot, or to a
variable in the loading plot, along a certain PC reflects its importance with regard to that PC [16].
The number of principal components to be retained in the model is usually determined by
cross-validation, with the dataset used to build the model divided into training and testing (validation)
sets[21]. In this study, the training and testing data come from the process data records
collected from the DCS and from the corresponding laboratory analyses.
One of the limitations of pure PCA is that it can only effectively handle linear relationships in
the data and cannot deal with non-linearity. Another disadvantage is the need to select the optimal
number of principal components (which can be addressed by using cross-validation techniques).
A further problem is that the principal components describe the input space very well but do not
explain the relations between the input and output data, which is usually what has to be modelled[13].
3.3.2 Partial Least Squares Regression (PLS)
The regression problem, that is, the modelling of response variables (primary variables) Y, by
means of a set of predictor variables (secondary variables) X, is one of the most common problems
in data-analysis in science and technology, and one example of such problems may include relating
the quality and quantity of manufactured products (Y) to the conditions of the manufacturing
process[22].
The PLS algorithm works with the covariance matrix that links the input and
output data spaces. The method decomposes the input and output simultaneously while keeping the
orthogonality constraint, so that the model is focussed on the relation between the input and output
variables[9]. PLS can be seen as an extension of PCA. It is concerned with two data
blocks, X and Y, and the objective is to model X in such a way that Y can be predicted as well as
possible, by maximizing the covariance between matrices X and Y. Matrix X is decomposed into a score
matrix T and a loading matrix P[9,21], as shown in equations 2.5 and 2.6 (matrix form):
$$X = \sum_{j=1}^{k} t_j\, p_j^{T} \qquad (2.5)$$
$$X = T P^{T} \qquad (2.6)$$
In a similar way, Y can be decomposed into a score matrix U and a loading matrix Q, as in
equations 2.7 and 2.8 (matrix form):
$$Y = \sum_{j=1}^{k} u_j\, q_j^{T} \qquad (2.7)$$
$$Y = U Q^{T} \qquad (2.8)$$
Most of the variance of matrix Y is explained by the first latent variable extracted from
the matrices X and Y. The second latent variable is then extracted from the residual
matrices, that is, from the variation not described by the first latent variable, and so on. Once the optimal number of
latent variables has been calculated, the remaining variance is considered noise[9].
The objective of this method is to fit a linear relationship between the independent X variables
and the dependent Y variables by performing a least squares regression between each pair of
corresponding t and u latent vectors, equation 2.9:
$$\hat{u}_j = t_j\, b_j, \qquad j = 1, 2, \ldots, k \qquad (2.9)$$
where $b_j$ is the coefficient of the inner linear regression between the jth latent variables $t_j$
and $u_j$, that is, in equation 2.10:
$$b_j = u_j^{T} t_j \left( t_j^{T} t_j \right)^{-1} \qquad (2.10)$$
Linear PLS leads to the decomposition of the X and Y matrices into a number of rank-one
matrices. Each rank-one matrix is the product of a score vector (the input score
vector t for X, the predicted output score vector û for Y) and the corresponding loading
vector (p for X, q for Y) [21].
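The decomposition and the inner regression described above can be sketched for a single response variable, which is the case modelled in this study. The sketch below uses the NIPALS form of PLS1 in NumPy. This is an assumption made for illustration: the text does not state which PLS algorithm or software was used, and the function name `pls1_nipals` and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_nipals(X, y, k):
    """PLS1 (one response) by NIPALS. Returns scores T, loadings P,
    inner coefficients b and the regression vector beta for centred data."""
    Xr = X - X.mean(axis=0)          # centred X, deflated inside the loop
    yr = y - y.mean()                # centred y, deflated inside the loop
    n, m = Xr.shape
    T, P, W, b = np.zeros((n, k)), np.zeros((m, k)), np.zeros((m, k)), np.zeros(k)
    for j in range(k):
        w = Xr.T @ yr                # weight: direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                   # input score vector t_j
        p = Xr.T @ t / (t @ t)       # input loading vector p_j
        b[j] = (yr @ t) / (t @ t)    # inner coefficient b_j, with u_j taken as the y residual
        Xr = Xr - np.outer(t, p)     # remove the rank-one part explained by t_j
        yr = yr - b[j] * t
        T[:, j], P[:, j], W[:, j] = t, p, w
    beta = W @ np.linalg.solve(P.T @ W, b)   # coefficients in terms of the centred X
    return T, P, b, beta

# Synthetic example: 4 correlated predictors, one response.
rng = np.random.default_rng(1)
Z = rng.normal(size=(100, 4))
X = Z + 0.5 * Z[:, [0]]
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)

T, P, b, beta = pls1_nipals(X, y, k=2)
y_hat = y.mean() + (X - X.mean(axis=0)) @ beta   # fitted response with 2 latent variables
```

Each pass of the loop extracts one latent variable from the residual matrices left by the previous one. When as many latent variables as predictors are extracted, the regression vector coincides with the ordinary least squares solution.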
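The prediction performance of the PLS method was characterized by the Root Mean-Square Error of
Cross-Validation (RMSECV) (equation 2.11) [24]:
$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^{2}}{n}} \qquad (2.11)$$
where $\hat{y}_i$ is the value predicted for sample i when that sample is left out of the calibration, $y_i$ is the corresponding measured value and n is the number of samples.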
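A possible way of computing the RMSECV is sketched below in Python. The splitting scheme (contiguous folds, five by default), the function name `rmsecv` and the ordinary least squares stand-in model are assumptions chosen to keep the example self-contained. They do not describe the cross-validation settings used in this work, and any calibration routine with the same `fit`/`predict` shape, such as a PLS model, could be passed instead.

```python
import numpy as np

def rmsecv(X, y, fit, predict, n_folds=5):
    """Root mean-square error of cross-validation over contiguous folds."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    sq_err = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        model = fit(X[train_idx], y[train_idx])       # calibrate without the fold
        y_hat = predict(model, X[test_idx])           # predict the left-out samples
        sq_err += np.sum((y_hat - y[test_idx]) ** 2)
    return np.sqrt(sq_err / n)

# Stand-in calibration model (ordinary least squares with intercept).
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]

X_demo = np.arange(20.0).reshape(-1, 1)
y_demo = 2.0 * X_demo[:, 0] + 1.0          # noise-free line, so the error is close to zero
err = rmsecv(X_demo, y_demo, ols_fit, ols_predict)
```

Every sample is predicted exactly once by a model that did not see it, so the sum in the numerator runs over all n samples.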
PLS is a simple and powerful approach for data-analysis for complex problems because of its
flexibility and ability to deal with incomplete and noisy data with multiple variables and observations
(measurements). In this study, PLS will only model one variable, but the method is able to model
several response variables[22]. The disadvantage of PLS is that like PCA, it can only model linear
relations between the data [9].
4. Implementation of Step Tests
Performing step tests in the unit was of great importance and was proposed early in this
work. The unit is new and had never been submitted to step tests; these tests were therefore
planned and performed in order to better understand its response and behaviour. By better
understanding the process performance through the step tests, we hoped to
develop a soft-sensor that could explain and predict the behaviour of Y (the primary/quality variable)
even when the unit operates outside the specified operating temperature, pressure and flow
values.
This chapter presents the planning and the development of the tests carried out in the
Refinery.
4.1 Step Tests Planning
4.1.1 Historical Data Analysis and Variable Selection
Since step tests had never been performed in this unit, the first approach was to select the
variables that would have an influence on Y. This first step combined the study of the fractionator
with the insight and experience of the Refinery Team and the Thesis Supervisors, and after
some exchange of ideas and suggestions it was agreed that the variables X3, X8, X9, X10, X12, X13
and X22 were to be tested.
The next step was to build a preliminary model from the available historical data, using PCA
followed by PLS (obtained in a similar fashion as described in chapter 5). This model was to be used
only to assess whether a given step test would indeed influence the quality variable Y and, if it did, how
long the quality variable took to stabilize. We then looked into the historical data and checked whether
there were disturbances in the previously selected secondary variables that could be considered a
‘step-test’ (such as a sudden decrease or increase in rate or temperature). Using those ‘step-test’ values,
we calculated the Y results and estimated the settling time of the quality variable for each
step test. After carefully analysing the data, it was found that for variables X8, X9 and X10 the settling
time between tests had to be at least 2 hours; for variables X3, X12 and X13, at least 3 hours; and
for variable X22, at least 5 hours. The sequence for testing the variables was agreed with the Refinery
Team in order to reduce the overall impact on the operating conditions.
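The estimation of the time Y needs to stabilize (settle) after a disturbance can be illustrated with the sketch below. The criterion used here is an assumption, since the text does not state how stabilization was judged: Y is taken as settled once it stays within a band of ±0.3% around its final value, with the final value estimated from the last five samples. The function name `settling_time` and the first-order test response are hypothetical.

```python
import numpy as np

def settling_time(t, y, t_step, band=0.003):
    """Time after t_step at which y enters, and stays inside, a relative
    band around its final value. Returns None if y has not settled."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    after = t >= t_step
    ta, ya = t[after], y[after]
    final = ya[-5:].mean()                         # final value from the last 5 samples
    outside = np.abs(ya - final) > band * abs(final)
    if not outside.any():
        return 0.0                                 # never left the band
    last_out = np.nonzero(outside)[0][-1]          # last sample outside the band
    if last_out == len(ya) - 1:
        return None                                # still moving at the end of the record
    return ta[last_out + 1] - t_step

# Hypothetical predicted Y: first-order response to a step applied at 60 min.
t = np.arange(0, 601, 5.0)                         # minutes, 5 min sampling
y = np.where(t >= 60, 1.0 + 0.02 * (1 - np.exp(-(t - 60) / 30)), 1.0)
ts = settling_time(t, y, t_step=60)
```

Applied to the predictions of the preliminary model around each historical disturbance, a calculation of this kind would give the minimum waiting time between consecutive tests on the same variable.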
4.1.2 Sensitivity Analysis
To evaluate the impact that the tests could have on the quality variable Y, the fractionator was
modelled using the simulator Petro-SIM™ version 4.1 (a modelling platform for refiners, petrochemical
and gas processing plants, from KBC Oil and Gas Consulting). To model the fractionator in this
software, the fluid package chosen was Peng-Robinson-LK, and all the rates, temperatures and
pressures used were the ‘Base Case’ values from the manual of Chevron, the licensor of the
hydrocracking unit of the Sines Refinery [8].
To start modelling the unit, one must first define the mixture of the feed stream, that is, define
the viscosity, standard density and ASTM D86 distillation for each oil compound of the feed stream.
Then, the compounds must be blended and the feed composition must also be described. After this
procedure this stream is included in the unit and its flow, temperature and pressure are defined.
Following this procedure, the side strippers were installed, as well as the stripping vapour streams,
and the pumparound and bottom stream.
The next step is to assign the pressure and tray efficiencies of the column. A screenshot of
the fractionator after modelling is shown in figure 4.1.
Figure 4.1 – Graphical User Interface for Petro-SIM™ after the fractionator was built and modelled.
After modelling the unit in Petro-SIM™, the amplitudes of the step tests were tested to assess
their influence on Y. Based on previous tests in other units of the Refinery, it was decided to test whether
steps of -3%, -5%, -7%, -10%, -13% and -15%, and of +3%, +5%,
+7%, +10%, +13% and +15%, on each of the chosen secondary variables (except X22) had any influence
on the quality variable. The results obtained for each of these simulation tests are shown in tables
4.1, 4.2 and 4.3. Tables 4.1 and 4.2 show the simulation test results for X3, X8, X9, X10, X12 and
X13. Each row of these tables gives the percentage deviation in the quality variable caused by the
test on the corresponding secondary variable.
Table 4.1 – Simulation results for the negative step tests: Y deviation (%) for each step test amplitude.
Input  -1%       -3%      -5%        -7%      -10%     -13%     -15%
X3     0.41      1.46     2.54       3.71     8.20
X8     -0.05     -0.17    -0.29      -0.42    -0.61    -0.88    -0.95
X9     -0.09     -0.29    -0.49      -0.71    -1.05    -1.40    -1.64
X10    -0.61     -0.61    -1.27      -1.60    -2.97    -4.61    -5.72
X12    -6x10^-5  1x10^-4  4.5x10^-5  1x10^-4  6x10^-5  8x10^-5  1x10^-4
X13    -0.13     -0.43    -0.72      -1.00    -1.39    -1.39    -1.61
Table 4.2 – Simulation results for the positive step tests: Y deviation (%) for each step test amplitude.
Input  +1%       +3%      +5%       +7%       +10%      +13%      +15%
X3     -0.39     -1.09    -1.64     -1.60     -1.99     -2.26     -2.41
X8     0.00      0.16     0.26      0.36      0.55      0.72      0.83
X9     0.09      0.25     0.44      0.63      0.90      1.14      1.29
X10    0.52      1.05     1.52      1.54      2.44      2.36      2.79
X12    -3x10^-6  9x10^-7  -5x10^-5  -4x10^-5  -7x10^-5  -6x10^-5  -3x10^-5
X13    0.13      0.39     0.70      0.98      1.35      1.68      1.88
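One way to summarise Table 4.2 is as an approximate steady-state gain for each input, that is, the percentage deviation of Y per percent of step amplitude. The sketch below fits a straight line through the origin to each row by least squares. The choice of this summary statistic is an assumption made for illustration and is not part of the original analysis. X12 is left out because its simulated deviations are of the order of 10^-5 % or smaller.

```python
import numpy as np

steps = np.array([1, 3, 5, 7, 10, 13, 15], dtype=float)   # step amplitude, %

# Y deviation (%) for the positive step tests, copied from Table 4.2.
y_dev = {
    "X3":  [-0.39, -1.09, -1.64, -1.60, -1.99, -2.26, -2.41],
    "X8":  [0.00, 0.16, 0.26, 0.36, 0.55, 0.72, 0.83],
    "X9":  [0.09, 0.25, 0.44, 0.63, 0.90, 1.14, 1.29],
    "X10": [0.52, 1.05, 1.52, 1.54, 2.44, 2.36, 2.79],
    "X13": [0.13, 0.39, 0.70, 0.98, 1.35, 1.68, 1.88],
}

# Least-squares slope of a line through the origin: sum(x*y) / sum(x*x).
gains = {v: float(steps @ np.array(d) / (steps @ steps)) for v, d in y_dev.items()}
most_sensitive = max(gains, key=lambda v: abs(gains[v]))

for name, g in gains.items():
    print(f"{name}: {g:+.3f} % of Y per % of step")
```

A single gain is only a rough description for inputs whose response is clearly non-linear in the table, such as X3, whose deviation grows much more slowly above +5%.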
Table 4.3 shows the results of the simulated step tests for variable X22. The
amplitudes of the disturbances were different because of the nature of the variable: the step amplitudes
used for the previous variables would not have had any noticeable effect on Y, so the amplitudes of the
step tests were increased in this case.
Table 4.3 – Simulation results for both positive and negative tests in variable X22.
Input  Step test (%)    -20     -10     +10     +20     +35    +40
X22    Y deviation (%)  -0.141  -0.112  -0.056  -0.028  0.014  0.028
Analysing the previous tables, one might be tempted to conclude that step tests of these magnitudes
on these variables have little influence on Y. However, as can be seen in chapter 5,
most of these variables are present in the models developed, demonstrating their importance.
Moreover, the amplitude of the step tests had to be such that the production of the unit would not be
largely affected, hence the small magnitude of the tests.
4.2 Step Tests Results
The scheduling and sequence of the variable tests were organized to accommodate the
Refinery's convenience, in order to minimize the impact on the production profile and quality. The
step tests were performed as far as possible without disrupting the Refinery’s routines. For each
test a sample was taken and its time stamp was recorded. Samples were
only taken after the calculations using the preliminary model showed that the quality variable Y had
stabilized after a given step test. The scheduling of the step tests can be seen in table 4.4.
Table 4.4 - Fractionator step tests scheduling.
Day    Predicted time  Actual time  Variable  Test magnitude  Sample time
Day 1  14:00           14:17        X3        -1%             16:45
       17:00           16:50        X3        +1%             20:10
       20:00           20:16        X3        -3%             23:16
       23:00           23:16        X3        +3%             01:57
Day 2  2:00            2:00         X10       -3%             03:53
       4:00            4:00         X10       +5%             05:53
       6:00            6:00         X10       -5%             08:07
       8:00            8:10         X10       -7%             09:57
       10:00           10:00        X10       +5%             11:57
       12:00           12:04        X10       +5%             13:05
       14:00           13:40        X12       +7%             16:21
       17:00           16:27        X12       -1%             18:52
       20:00           19:00        X12       +2%             20:55
       23:00           21:20        X12       +5%             22:57
Day 3  2:00            23:03        X12       -5%             01:01
       2:00            01:58        X9        +7%             03:12
       4:00            03:15        X9        -5%             05:11
       6:00            05:16        X9        -7%             06:40
       8:00            07:04        X9        +5%             08:45
       10:00           09:00        X9        -5%             10:19
       14:00           13:53        X13       +7%             17:00
       17:00           17:07        X13       -5%             18:45
Day 4  24:00           03:23        X13       -7%             05:17
       3:00            05:18        X13       +5%             07:25
       6:00            07:27        X13       +5%             08:45
       9:00            09:21        X8        +13%            10:40
       11:00           10:48        X8        +1%             11:54
       13:00           11:59        X8        -5%             13:47
       15:00           13:53        X8        +11%            15:06
       16:00           15:10        X8        +11%            16:30
       18:00           16:41        X22       -15%            18:47
       22:00           19:46        X22       +15%            22:10
Day 5  3:00            22:17        X22       +15%            00:53
[Plot: Y (axis from 0.99 to 1.035) versus sample number (0 to 40), with series for the pre-test samples and for the step tests on X3, X10, X12, X9, X13, X8 and X22.]
Table 4.4 shows the predicted starting time for each test and the actual time the test was
started, as well as the variable tested, the magnitude of the test, the time the sample was
taken and the result of that same sample. Most of the time the step tests started earlier because the
quality variable Y stabilized earlier than predicted, and the next test could be made sooner.
As expected, the planned step tests had a clear effect on Y. All Y results (real values or
predicted values) presented here and throughout this thesis are shown in dimensionless form, as
seen in equation 4.1:
% � BCDEFGHIJFKLG�AKGHI (4.1)
The target range of [0.97, 1.03] for Y was covered. Having Y cover a wide
range of values allows the dynamic data to capture the influence of a wider range of process
conditions on the quality variable. The laboratory analysis error for Y is 0.30% of the set point value
of the quality variable, and most step test results were larger than 0.30%.
As noted in the subchapter 4.1.2, most of these variables end up appearing in the models
developed in the next chapter. Variables X3, X10, X13, X8 and X22 appear in Model B, variables X3
and X13 appear in Model C, and variable X13 appears in Model A.
Figure 4.2 – Laboratory results for Y during step tests.
Tables 4.5 to 4.7 show the effect that consecutive step tests had on the quality
variable. From the analysis of figure 4.3 and the above-mentioned tables, we can infer which
of the selected step test variables most influenced the response of the quality variable Y: they are
X3, X10, X13, and X22.
Table 4.5 – Y magnitude of variation for each step test on variables X3, X10, X12 and X9
Variable  Test magnitude  Y magnitude of variation
X3        -1%             -0.47%
          +1%             0.71%
          -3%             -0.68%
          +3%             -1.07%
X10       -3%             1.89%
          +5%             -1.55%
          -5%             -0.19%
          -7%             2.30%
          +5%             -0.92%
          +5%             -0.03%
X12       +7%             0.30%
          -1%             -0.05%
          +2%             -0.30%
          +5%             0.05%
          -5%             0.49%
X9        +7%             -0.49%
          -5%             -0.60%
          -7%             0.50%
          +5%             0.27%
          -5%             1.53%

Table 4.6 - Y magnitude of variation for each step test on variables X13 and X8.
Variable  Test magnitude  Y magnitude of variation
X13       +7%             -3.56%
          -5%             0.08%
          -7%             0.92%
          +5%             0.69%
          +5%             -0.22%
X8        +13%            0.14%
          +1%             -0.05%
          -5%             0.14%
          +11%            0.08%
          +11%            0.11%

Table 4.7 - Y magnitude of variation for each step test on variable X22.
Variable  Test magnitude  Y magnitude of variation
X22       -15%            -1.10%
          +15%            0.80%
          +15%            -2.17%
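The values in Tables 4.5 to 4.7 can be checked against the laboratory analysis error of 0.30% and used to rank the tested variables. The sketch below does this with the mean absolute Y variation per variable. That statistic is an assumption chosen for illustration: the text does not state how the most influential variables were identified.

```python
LAB_ERROR = 0.30  # laboratory analysis error, % of the set point value

# Y magnitude of variation (%) per step test, copied from Tables 4.5 to 4.7.
y_variation = {
    "X3":  [-0.47, 0.71, -0.68, -1.07],
    "X10": [1.89, -1.55, -0.19, 2.30, -0.92, -0.03],
    "X12": [0.30, -0.05, -0.30, 0.05, 0.49],
    "X9":  [-0.49, -0.60, 0.50, 0.27, 1.53],
    "X13": [-3.56, 0.08, 0.92, 0.69, -0.22],
    "X8":  [0.14, -0.05, 0.14, 0.08, 0.11],
    "X22": [-1.10, 0.80, -2.17],
}

# Mean absolute variation of Y for each tested variable, largest first.
mean_abs = {v: sum(abs(d) for d in vals) / len(vals) for v, vals in y_variation.items()}
ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)

# Number of step tests whose effect on Y exceeds the laboratory error.
n_tests = sum(len(vals) for vals in y_variation.values())
n_above = sum(abs(d) > LAB_ERROR for vals in y_variation.values() for d in vals)

print(ranking)
print(f"{n_above} of {n_tests} step tests exceed the laboratory error")
```

With this statistic the four highest-ranked variables are X22, X10, X13 and X3, the same set the text identifies as most influential, and 19 of the 33 step tests produce a variation larger than the laboratory error.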
5. Soft-sensor Development
The first approach to soft-modelling is usually by the use of the most widely accepted linear
tools, like PCA and PLS regression. The main advantage of methods like PCA and PLS is that they
can cope with highly correlated variables. This characteristic is suitable for analysing data from
hydrocracking process units, because hydrocracking processes are multivariable systems and many
of these variables are mutually correlated. To perform this type of analysis and model development,
historical plant data for selected variables was collected and step tests were performed for carefully
chosen variables and process conditions.
In this section the quality variable Y is predicted using 25 online variables available in the
database. These variables include flowmeters, temperature and pressure sensors and all are online
measured variables. The selection of which variables should be included in the soft-sensor is a
complex task and the strategy consists in finding a good variable subset capable of making accurate
predictions. In this study two methods are used to obtain the soft-sensors that predict the quality
variable: Partial Least Squares (PLS) as a linear modelling tool, and Principal Component Analysis
(PCA) as a tool to select a good model variable set and to remove outliers and
noise from the models.
Datasets were collected directly from the Digital Control System (DCS) and the Real Time
Database (RTDB) of the Refinery and were used to build four models. The soft-sensors obtained
from these data were labelled Model A, B, C and D, and the datasets are:
Model A: The soft-sensor is obtained from training data collected during the week of the step
tests, from October 27th to October 31st, 2013, using the same data to calibrate the model.
Model B: The soft-sensor is obtained from training data collected in 2013 between August
1st and October 31st, using the same data to calibrate the model.
Model C: The soft-sensor is obtained using training data collected in