Multi-camera System for Stone Slab Scanning

João Henrique de Agrela Vital

Thesis to obtain the Master of Science Degree in

Mechanical Engineering

Supervisors: Prof. Jorge Manuel Mateus Martins, Prof. Pedro Daniel Dinis Teodoro

Examination Committee

Chairperson: Prof. Paulo Jorge Coelho Ramalho Oliveira

Supervisor: Prof. Jorge Manuel Mateus Martins

Member of the Committee: Prof. João Rogério Caldas Pinto

June 2018

Acknowledgements

A sincere acknowledgement to my supervisors, Prof. Jorge M. M. Martins and Prof. Pedro D. D. Teodoro, who guided me through the project by sharing their experience.

I would also like to express my gratitude to the Frontwave, S.A. team, who were always welcoming and ready to help when needed. Special thanks to Eng. Nuno Reis, for helping me with everything related to hardware and structure assembly.

Finally, a warm thank you to my family, who are the foundation of my education and values, for supporting me unconditionally. To my girlfriend, for being by my side and cheering me on. To Manas Pitch, for being the family that I chose inside the academic institution, who accompanied me throughout the years and made it a very happy journey.

Resumo

With Industry 4.0, some sectors underwent a complete revolution. However, the stone processing industry still relies on sub-optimal processes. Because companies are typically family-run and the raw material is highly variable, this sector is very resistant to change. Companies like Frontwave, S.A. develop projects that aim to raise the sector's standards and believe that the first step is to describe the geometry and colour of the final products. Acquiring this data will allow the next operations on the product to be carefully planned, avoiding waste. The plan can then be sent to a CNC machine or any other physical processing machine for a clean and planned execution. The image of the product can also be used for classification, stock management and remote sales. This project proposes a new solution for acquiring an image of the product. The development aimed at increasing image resolution and minimising the costs associated with producing the machine. The project culminated in a system consisting of an array of cameras and their respective controllers. These modules are used for a first stage of distributed processing, sending the resulting contributions to a PC, where the final image is reconstructed. The process is based on current algorithms such as coarse-to-fine image registration, data mapping using radial basis functions, and intelligent image fusion methods, which were adapted and implemented to serve the process at hand. Comparing the results of the project with the current machine, produced by Frontwave, S.A., the proposed solution achieves an image resolution ten times higher and saves 30% in imaging equipment costs. Placing the cameras closer to the product also reduces the size of the structure, along the camera-to-product dimension, by approximately 80%.

Keywords: Industry 4.0, Computer Vision, Ornamental Stone, Digitisation

Abstract

At the dawn of Industry 4.0, some sectors found their methods completely revolutionized. However, the stone industry still relies on old, sub-optimal processes. Being traditionally a family business and dealing with non-standardised raw materials, this industry is very resistant to change. Companies like Frontwave, S.A. are dedicated to raising the stone industry to higher standards and believe that the first step is to create an accurate description of the geometry and colour of the final products, in the form of an image. This data will allow the next operations to be carefully planned, avoiding waste. These plans can then be sent to a CNC machine or any other processing machine, for a clean and planned execution. Additionally, the image may be used for product classification, stock management, non-store retailing and post-processing planning. This thesis proposes a new solution for the acquisition of a picture describing a stone slab. The development was driven by the goal of achieving the highest image resolution at minimal cost. The resulting system consists of an array of cameras and their respective controllers. The controller modules serve as a primary processing stage, sending their outputs to a PC, which reconstructs the final image. State-of-the-art methods like coarse-to-fine matching, radial basis function warping and multi-resolution splining were adapted and implemented to achieve the best results with the least computational expense. Compared with the current scanning machine developed by Frontwave, S.A., the proposed solution achieves a resolution ten times higher and saves 30% in imaging equipment costs. Additionally, the camera-to-slab distance was reduced by 80%, allowing for a much slimmer scanner.

Keywords: Industry 4.0, Computer Vision, Ornamental Stone, Scanner

Table of Contents

Acknowledgements
Resumo
Abstract
Table of Contents
List of Figures
List of Tables
Abbreviations
Nomenclature

1 Introduction
    1.1 Industry Paradigm
    1.2 Ornamental Stone Resources and Production in Portugal
    1.3 The Company
    1.4 The challenge
    1.5 Contributions
    1.6 Thesis Structure

2 Background
    2.1 Stone Scanning Machines
    2.2 Feature Detection and Matching
        2.2.1 Feature detection
        2.2.2 Feature Description
        2.2.3 Feature matching
    2.3 Image Fusion
    2.4 Colour Balancing
    2.5 Colour Correction

6 Results
    6.1 Single Image Reconstruction
    6.2 Feature Matching
    6.3 Image Fusion
    6.4 Computation Time and Efficiency

7 Conclusions
    7.1 Future Work
        7.1.1 Automatic Calibration
        7.1.2 Real-time validation
        7.1.3 Implementing GANS for up-scaling

References

List of Figures

1.1 Industrial evolution time-line.
1.2 Total production of ornamental stones from 1992 to 2002, in thousands of tonnes.
1.3 Revenues from exports of ornamental stones from 2005 to 2013.
1.4 Distribution of mining sites in Portugal according to the type of stone extracted.
2.1 Classical example of the aperture problem.
2.2 Image filtered with Gaussian filters of different sizes and standard deviations.
2.3 Edges highlighted using the Roberts, Prewitt, Sobel, Canny and Laplacian of Gaussian methods.
2.4 Corners detected using the Harris and the Shi-Tomasi methods, from left to right.
2.5 Blobs detected using different detection methods.
2.6 Example of Gaussian and Laplacian pyramids built using Burt and Adelson's method.
2.7 Graphical explanation of the theory behind the SIFT descriptor.
2.8 Plots showing different feathering window types.
2.9 Images representing overlapping regions.
2.10 Results of merging the images.
2.11 Graphical representation of the feathering windows used by the different methods for image blending.
2.12 Image resulting from splining the two sample images using the multi-resolution method proposed by Burt and Adelson.
2.13 Model of the colour checker used in the project.
6.1 Image reconstruction using different approaches.
6.2 Close-ups of Figure 6.1.
6.3 Example of matches found with the SIFT algorithm.
6.4 Results from different matching methods.
6.5 Images containing high frequency information blended using the tested methods.
6.6 Images containing low frequency information blended using the tested methods.

List of Tables

1.1 SWOT analysis of the Stone Industry in Portugal.
6.1 Matching methods comparison, relative to the reference match made by hand.
6.2 Final output dimensions.

Abbreviations

BMIE: Block Matching with Initial Estimation
CEVALOR: Centro Tecnológico para Aproveitamento e Valorização das Rochas Ornamentais e Industriais
DLT: Direct Linear Transformation
DoG: Difference of Gaussians
GANS: Generative Adversarial Networks
GCDF: Gaussian Cumulative Distribution Function
GLOH: Gradient Location and Orientation Histogram
LoG: Laplacian of Gaussian
MSER: Maximally Stable Extremal Regions
PC: Personal Computer
PCA: Principal Component Analysis
RAM: Random Access Memory
RANSAC: Random Sample Consensus
RBF: Radial Basis Function
RGB: Red, Green and Blue
SIFT: Scale Invariant Feature Transform
SURF: Speeded Up Robust Features
SWOT: Strengths, Weaknesses, Opportunities and Threats

Nomenclature

Image Analysis

I: Image
g(σ): Gaussian filter
σ: Standard deviation
I_σ: Gaussian-filtered image
∗: Convolution operator
∇²: Laplacian operator
I_i: Image derivative along axis i
S: Structure tensor (second moment matrix)
λ_i: i-th eigenvalue of a matrix
H: Hessian matrix
LoG: Laplacian of Gaussian

Image Matching

H: Homography matrix
R: Rotation matrix
T: Translation matrix

Radial Basis Functions

Φ: Form function
p(x): Polynomial function

General

J: Cost function
G: Colour homogenization gains matrix
E: Error function

Chapter 1

Introduction

This chapter presents the general context of the project. Given its industry orientation, an overview of current industrial standards is provided in the first section, Industry Paradigm. The project is motivated by the needs of potential clients and validated by the current state of the sector in Portugal, taken as a first market approach. Both topics are addressed in the second section, Ornamental Stone Resources and Production in Portugal. The development was supported by and took place at Frontwave, S.A.'s facilities in Pêro Pinheiro; a brief summary of the company is given in the third section, The Company. The project's goals are defined in the fourth section, The challenge, and the contributions and achievements are presented in the fifth section, Contributions. Finally, the sixth section, Thesis Structure, outlines the structure of the remaining document.

1.1 Industry Paradigm

During the last century, industrial activities have experienced significant development through standardisation, automation and production management. Although these developments are available to any kind of industry, the nature of each business makes the strategies easier or harder to implement. To date, technological development has been split into four revolutions, portrayed in Figure 1.1. The first traces back to 1780 with the first steam-powered loom, which relocated production from homes to factories. The second revolution started 100 years later with continuous production driven by the division of labour and the introduction of conveyor belts. The third was marked by the introduction of programmable logic controllers, which enabled digital programming of autonomous systems. This paradigm still rules today's modern systems engineering and allows for efficient and flexible automation systems. The introduction of internet technologies into industry announced the arrival of the fourth industrial revolution. Industry 4.0 is closely related to the implementation of cyber-physical systems. This means that every product, component and entity in the industrial process has an identity on the network, enabling permanent communication and data traffic. This data can be used in optimization algorithms for dynamic scheduling, opening new paths for autonomous product navigation through the production line.

Figure 1.1: Industrial evolution time-line.

Many companies, organizations and universities are currently working on the transition of industrial paradigm, following prerequisites such as:

1. Investment Protection: Industry 4.0 should be stepwise introducible into existing plants.

2. Stability: Industry 4.0 should not compromise production.

3. Data Privacy: access to data related to production and services must be controlled to protect the

company’s know-how.

4. Cybersecurity: Production systems must be programmed not to cause damage to the environment,

economy or humans.

"For Industrie 4.0, the term revolution does not refer to the technical realization but to the ability to meet

today’s as well as future challenges." [1].

1.2 Ornamental Stone Resources and Production in Portugal

Portugal is one of the world's leading producers of ornamental stones, ranking 9th in worldwide production. It has internationally renowned products such as white and pink marbles, and produces large quantities of light cream limestones, grey, yellow and pink granites, and dark grey slate. Figure 1.2 shows the total production of ornamental stones from 1992 to 2002, as reported in the study conducted by Sobreiro [2], and Figure 1.3 shows the revenues from international trade of ornamental stone from 2005 to 2012, as reported by the Espírito Santo Research team [3]. Carvalho et al. [4] estimate a total resource availability of 410 million cubic meters, of which 274 million refer to granite, 76 million to limestone, 51 million to marble and 9 million to slate. Marble extraction comes mainly from the region of Estremoz and Borba. The main limestone mining sites are located in the regions of Leiria and Coimbra, and the main mining sites for granite are located in the north, around Monção and Valença. Figure 1.4 shows the distribution of mining sites in Portugal.

Figure 1.2: Total production of ornamental stones from 1992 to 2002, in thousands of tonnes.

Figure 1.3: Revenues from exports of ornamental stones from 2005 to 2013.

A study conducted by Banco Espírito Santo [3] shows that traditional producers like Portugal, Spain and Italy are losing market share to countries like China, India and Turkey, which have high resource availability and financial support for the development of the sector. This reinforces the need for technological advancement as well as financial incentives for the implementation of new market strategies to keep up with these competitors. A Strengths, Weaknesses, Opportunities and Threats (SWOT) analysis¹ of the stone industry sector in Portugal reveals that the large quantity and high quality of the internationally recognised stones, as well as the know-how and long-lasting tradition, represent strong points. The sector is threatened by strong competition from countries like China, India and Turkey and by the development of products which replace stone. The sector's weaknesses include the lack of a marketing strategy, of competent management teams and of inter-company cooperation. Finally, there are opportunities in finding new production solutions and uses for stone products, as well as in reaching out to international markets. Table 1.1 shows a complete and detailed scheme of the SWOT analysis.

¹ A SWOT analysis, an acronym for Strengths, Weaknesses, Opportunities and Threats, is a business assessment tool and serves as a decision tool for the future development of a company in its own commercial sector. The theory was introduced by Albert S. Humphrey in the 1960s.

Centro Tecnológico para Aproveitamento e Valorização das Rochas Ornamentais e Industriais (CEVALOR) [5] processed this analysis and summarized the critical concerns for the future success of the stone industry in the following topics:

1. Improve marketing and communication strategies;

2. Increase the globalization efforts;

3. Specialization in non-standard products;

4. Increase the added value by extending the supply chain closer to the final consumer;

5. Investment in Human Resources qualification and training.

Internal origin (product/company attributes)

Strengths (helpful to achieve the objective):
• Large quantity and high quality of resources
• Globally renowned products
• Products exclusive to Portugal
• Own know-how and technology
• Long-lasting tradition of the business

Weaknesses (harmful to achieve the objective):
• Marketing and Management Strategy
• Multiple small companies
• Low inter-company cooperation
• Poor human resource skills

External origin (environment/market attributes)

Opportunities (helpful to achieve the objective):
• New production solutions
• Alternative uses for ornamental stone
• Globalization
• New markets
• Training of Human resources

Threats (harmful to achieve the objective):
• Strong competitors (China, India)
• Alternative products
• Environmental issues

Table 1.1: SWOT analysis of the Stone Industry in Portugal.

The processing of ornamental stones is mechanized and automated. However, production management and quality control still rely on human decision. This places the industry at level 3.0, meaning that there is margin for improvement and work to be done to bring it to a higher technological level. As stated in the previous section, the type of business may present resistance to change, making the transition from one industrial paradigm to another harder. In Portugal, the stone industry resists change for the following reasons:

1. The stone industry traces back thousands of years, since stone started to be extracted and processed, and in Portugal it is usually a family-based business;

2. Stone is a natural product, making standardisation difficult due to the high variance of its characteristics, i.e. dimensions, patterns and physical properties.

Figures 1.2 and 1.3 show that production and sales are increasing. However, market analysis shows the need for innovation to keep the business sustainable and profitable.

Figure 1.4: Distribution of mining sites in Portugal according to the type of stone extracted.

1.3 The Company

Frontwave, S.A. [6] is a company dedicated to developing solutions which bring the stone industry to

higher standards. Over the years of activity, the owners of stone processing factories showed the need

to describe their final products digitally, in the form of an image, with the primary objective of stock

management and serving as marketing material. Additionally, it could be used for quality control, material

classification and as a reference for future processing of the slab. Consequently, the company started a

project with the goal of obtaining an image of the polished slabs using a machine placed at the end of the

production line. The project is called Stone Scan and is in the course of being successfully introduced in

the market. The images acquired may be used to showcase the factory’s products on-line, making the

information available worldwide and portraying an appealing and clean view of the product, leading to an

easier communication with a potential client anywhere in the world. This is in line with the conclusions

drawn from the SWOT analysis presented in Section 1.2.

1.4 The challenge

The first scanning machines provided a strong step towards the intended target. However, the image resolution and the system's adaptability and flexibility to the production lines do not yet meet the desired goals. In addition, the cost of production was high for the results obtained. The challenge given by the company was to design a machine that outperforms the previously developed one, with the following requirements, relative to the Stone Scan machine:

1. Better resolution.

2. Lighter structure with less volume.

3. Modularity of the machine, i.e. the possibility to change the number of cameras to fit the production line.

4. A cheaper solution.

The goal was to achieve better results using cheaper hardware, making it a more competitive product. This poses several problems to solve, such as multi-camera calibration and stitching and overall colour correction. The results of the first machine built and the need for technological advancement in the stone sector provided validation and motivation to carry out the development of a new version of the scanning machine, using state-of-the-art algorithms and carefully selected hardware to fulfil the goals to the best possible degree.

1.5 Contributions

The project encompassed the fundamental steps of a mechanical project: hardware and structure design, software development, and validation through testing of the final prototype. The main contributions in each step were as follows:

• Hardware and Structure Design - Inspired by the layout of existing machines, the structure design and the chosen hardware significantly reduce size and cost while achieving better performance. While the structure is similar to the available machines, the innovation lies in the type of hardware used for this solution.

• Software Development - The software was designed for camera modularity and flexibility and consists of a chain of events resulting in an image portraying the stone slab. The procedure is supported by existing algorithms which were adapted and chained together to produce the final output. This phase of the project consisted of the following steps:

1. Creation of a local network for communication between Linux devices (camera controllers) and Windows devices (PC), using the SSH and SCP protocols.

2. Development of Python scripts for synchronized image acquisition.

3. Development of Python scripts for image reconstruction, including geometric and colour corrections.

4. Development of MATLAB® scripts for feature matching, colour mapping and image blending.

• Testing - During the testing phase, the methods used in the project were tested and evaluated against state-of-the-art methods available in MATLAB® toolboxes. The results can serve as a baseline for future development of applications working in similar conditions.

To the best of my knowledge, after extensive research and reading, this solution is innovative both in the hardware used and in the chain of algorithms which leads to the final output.

1.6 Thesis Structure

The rest of the document is divided into the following chapters:

Chapter 2 – Background

Provides an insight into the current solutions available for the scanning of stone slabs as well as a theoretical background on state-of-the-art feature detection and matching, colour balancing and image fusion techniques. Some served as support in the development of the solution proposed in this document and some are included for comparison and further information.

??

This chapter is confidential and is presented in Appendix ??. It presents the hardware used in the physical setup as well as the connectivity between the devices in the system. The final section of this chapter outlines the main processes needed to carry out the scanning procedure and defines the requirements and constraints taken into consideration in the development of the system.

??

This chapter is confidential and is presented in Appendix ??. The system's successful implementation depends strongly on a correct calibration procedure. This chapter takes the reader through the calibration of the system, describing the algorithms used in each step.

??

This chapter is confidential and is presented in Appendix ??. The data acquired in the calibration

process is used in the image acquisition and processing such as camera rotation corrections and colour

balancing. This chapter covers the process from image acquisition to the final panoramic output.

Chapter 6 – Results

To assess the performance of the system and validate the choices made in its development, this chapter

presents and compares the results with different approaches using state of the art algorithms.

Chapter 7 – Conclusions

A summary of the achievements of the project is presented. The final part of this chapter is dedicated to

the proposed future work, improvements of the scanning machine, and scientific research which can be

done from its outputs.

Chapter 2

Background

This chapter presents the theoretical basis as well as the existing state-of-the-art algorithms and equipment used to support the development. An overview of the existing equipment for stone slab scanning is presented in the first section, Stone Scanning Machines. The device developed uses a multi-camera array for scanning, hence the need to match the resulting images from each camera. The second section, Feature Detection and Matching, provides a review spanning from the basics of feature detection and matching to the current state-of-the-art algorithms. Merging the matched regions can be challenging due to small discrepancies in size, rotation and colour of the features. The third section, Image Fusion, presents a review of different feathering methods along with the advantages and disadvantages of each. The colour distribution of matched regions often differs slightly between cameras, creating panoramas with uneven colour. The fourth section, Colour Balancing, provides a review of methods to solve this issue and create seamless panoramas. Finally, the whole process misses the point if the portrayed colours do not correspond to what is perceived under standard lighting conditions, such as sunlight. The fifth section, Colour Correction, details the theory behind the current state-of-the-art method for colour correction. For a complete version of the document, the reading of this chapter should be followed by ??, in Appendix ??.

2.1 Stone Scanning Machines

At first thought, a simple solution comes to mind: take a single picture of the full slab. This is impractical because slabs may measure up to 2 x 3 meters, making it necessary to move the camera away from the slab, which leads to a very large structure in an environment where lighting conditions are difficult to control. This would cause a decrease in resolution, colour adulteration and reflections, none of which are desired. To solve this issue, current solutions take images closer to the slab and stitch the outputs to create an image of the full slab. There are currently four solutions on the market: the Bstone Scaner [7] by Bstone, the Taglio Scanner [8] by Taglio, MapaScan [9] by MapaStone and Iris StoneScan by D2 Technologies [10]. The Bstone is a portable device, capable of manually scanning slabs up to 500 x 600 mm, off the production line. The Taglio Scanner and MapaScan are very similar solutions, to be installed on the production line and relying on a single high-resolution camera to perform the acquisition. The StoneScan differs from the latter two by using two high-end cameras instead of just one.

2.2 Feature Detection and Matching

Computer vision started being implemented in industry around the 1990s and was predicted to revolutionise manufacturing processes and integrate controllable processes [11]. Nowadays, visual sensors, i.e. cameras, are widely used for surveillance, for creating maps and panoramas, and in industry, to control processes and help with quality checks. This is achieved by extracting and processing the acquired data. When there are multiple sensors, this information must be matched with the data coming from the other sensors so that the sensed object or scene can be fully characterised. This is done by detecting features in one image, describing them with a specific set of metrics, and comparing these metrics with the ones found in the data acquired by another sensor. Features from different sensors with similar metrics are potential matches. Once the features are successfully matched, it is possible to compute a transformation matrix which relates the two images. The key to a good match is choosing the most appropriate similarity measures, or metrics.
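As an illustration of the comparison step described above, the following minimal sketch matches feature descriptors from two images by nearest Euclidean distance. It is not the matching method used in this project: the function name `match_descriptors`, the `max_distance` threshold and the toy two-dimensional descriptors are assumptions made only for the example.

```python
import numpy as np


def match_descriptors(desc_a, desc_b, max_distance):
    """Pair each descriptor in desc_a with its nearest neighbour in desc_b.

    desc_a, desc_b: arrays of shape (n, d) and (m, d), one descriptor per row.
    max_distance: hypothetical acceptance threshold on the Euclidean distance.
    Returns a list of (index_a, index_b, distance) for the accepted pairs.
    """
    # Pairwise Euclidean distances, shape (n, m).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    nearest = dist.argmin(axis=1)
    matches = []
    for i, j in enumerate(nearest):
        # Keep the pair only if the two descriptors are similar enough.
        if dist[i, j] <= max_distance:
            matches.append((i, int(j), float(dist[i, j])))
    return matches


# Toy descriptors (illustrative values only).
desc_a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
desc_b = np.array([[1.1, 0.9], [0.1, -0.1], [9.0, 9.0]])
matches = match_descriptors(desc_a, desc_b, max_distance=0.5)
```

In this toy case the first two descriptors of `desc_a` find a close counterpart in `desc_b`, while the third is rejected because its nearest neighbour is too far away. The accepted pairs are the potential matches from which a transformation between the two images could then be estimated.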

2.2.1 Feature detection

The first step is to detect interest points in an image I. An interest point is a point which can be used for 2D matching, usually associated with brightness discontinuities. Choosing such points runs into the aperture problem (Figure 2.1), which occurs when a moving scene is observed through a window that is not big enough to estimate its motion unambiguously. It was first mentioned by Horn and Schunck [12] and studied more intensively by Anandan [13]. The problem arises because motion estimation requires references that vary along both the x and y axes: through a small window, motion can only be estimated along the normal to a feature's border, hence the motivation for using corners or blobs as interest points. The most common methods for detecting interest points analyse the derivatives of I. Edges correspond to regions with a high derivative in only one direction, corners have high derivative values in both directions and blobs correspond to regions of low derivatives delimited by an edge. Canny [14] proposed three criteria which should be satisfied by an edge detector and may be applied to any interest point detector:

• Good detection - The detector should only occasionally incorrectly assign edge pixels, either by

failing to mark true edge points or by incorrectly marking non-edge points.

• Good localisation - Points marked by the detector as edge points should be as close as possible to the centre of the true edge.

• Single response - The detector should only produce a single response to a given edge.

Figure 2.1: Classical example of the aperture problem (panels A, B and C). The striped patterns have different motion directions. However, the apparent motion direction is the same when seen through the circular path.

Gaussian filtering is widely used to filter out noisy data as well as selecting specific frequencies. This

served as the base for the first approaches on automatic scale detection. An image I is filtered by

performing a convolution with a gaussian filter g, Equations 2.1 and 2.2.

$$ g(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{2.1} $$

$$ I_\sigma(x, y) = I(x, y) * g(x, y, \sigma) = \sum_{i=x-3\sigma}^{x+3\sigma} \sum_{j=y-3\sigma}^{y+3\sigma} I(i, j)\, g(x - i, y - j, \sigma) \tag{2.2} $$

Where x and y correspond to pixel coordinates, i and j are the summation indices over the filter window and σ is the standard deviation of the Gaussian distribution. The filter is convolved over a length of six sigma in both coordinates, since the interval of ±3σ contains 99.73% of the area under a one-dimensional Gaussian and therefore comprises the most significant portion of the filter. Figure 2.2 shows an example of an image filtered with Gaussian filters of different sizes and standard deviations.
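As a rough illustration of the six sigma rule, the following sketch samples Equation 2.1 on a square window and compares how much of the filter mass is retained for two window sizes. It assumes NumPy is available; the function name `gaussian_kernel` and the parameter `half_width_in_sigmas` are illustrative choices and are not taken from the project's implementation.

```python
import numpy as np

def gaussian_kernel(sigma, half_width_in_sigmas=3.0):
    """Sample Equation 2.1 on a square window of +/- half_width_in_sigmas * sigma pixels."""
    half = int(np.ceil(half_width_in_sigmas * sigma))
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

six_sigma = gaussian_kernel(4.0, 3.0)     # window spanning six sigma
three_sigma = gaussian_kernel(4.0, 1.5)   # window spanning only three sigma

# The sum of the samples approximates the fraction of the filter that is kept.
print(six_sigma.sum(), three_sigma.sum())
```

The narrower window discards a noticeable part of the filter, which is the effect described for the last column of Figure 2.2. In practice a truncated kernel is usually renormalised so that its samples sum to one.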

Figure 2.2: Image filtered with gaussian filters of different sizes and standard deviations. Larger

values of sigma result in increased blurring of the image. Notice how the size of the filter should

be adjusted according to the standard deviation in order to utilise the most significant portion of

the filter. The last column shows a filter with a size of three sigma instead of six sigma, which

leads to significant loss of filter information.

Edge detectors

The first approaches, the Roberts [15], Prewitt [16] and Sobel [17] operators, Equations 2.3, 2.4 and 2.5, consisted in calculating the derivatives of the image by applying a set of discrete differentiation masks in different directions, Equations 2.6 and 2.7, highlighting the high frequency information. Canny [14] developed a multi-stage algorithm where the gradients are obtained using the previously mentioned masks, followed by non-maximum suppression to exclude false detections and by thresholding, which evaluates potential edges and eliminates all the low score edges which are not linked to potential strong edges.

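$$ M_{Roberts} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \tag{2.3} $$

$$ M_{Prewitt} = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} \tag{2.4} $$

$$ M_{Sobel} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \tag{2.5} $$

$$ I_x = f(x, y) * M_{Method} \tag{2.6} $$

$$ I_y = f(x, y) * M_{Method}^{T} \tag{2.7} $$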
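The use of the masks can be sketched as follows for the Sobel operator. This is a minimal NumPy illustration and not the MATLAB toolbox routine used for the figures; the helper names `filter2d` and `gradients` are hypothetical. The mask is slid over the image without flipping (cross-correlation), which differs from a strict convolution only in the sign of the response for these antisymmetric masks.

```python
import numpy as np

SOBEL = np.array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]], dtype=float)

def filter2d(image, mask):
    """Slide `mask` over `image` and return the response on the 'valid' region."""
    windows = np.lib.stride_tricks.sliding_window_view(image, mask.shape)
    return np.einsum("ijkl,kl->ij", windows, mask)

def gradients(image, mask=SOBEL):
    ix = filter2d(image, mask)      # horizontal derivative, Equation 2.6
    iy = filter2d(image, mask.T)    # vertical derivative, Equation 2.7
    return ix, iy

# Test image with a single vertical edge between columns 2 and 3.
image = np.zeros((5, 6))
image[:, 3:] = 1.0
ix, iy = gradients(image)
```

For this test image only `ix` responds, since the brightness changes along x alone.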

There is also the method of the Laplacian of the Gaussian (LoG), Equation 2.8. The Laplacian operator is

applied to a previously gaussian filtered image. This method is analogous to the Difference of Gaussians

(DoG), where two images filtered with different strength gaussian filters are subtracted, resulting in a

discrete version of the differentiation operator. Figure 2.3 shows the results of applying the different detectors to a test image.

$$ \nabla^2 I_\sigma = \frac{\partial^2 I_\sigma}{\partial x^2} + \frac{\partial^2 I_\sigma}{\partial y^2} \tag{2.8} $$

Figure 2.3: Edges highlighted using the Roberts, Prewitt, Sobel, Canny and Laplacian of Gaus-

sian methods. The results were obtained using the computer vision toolbox from MATLAB.

Corner detectors

Figure 2.3 shows that edge detectors enhance both edges and corners. In a way, edge detectors

highlight all discontinuities, including corners. Corner detectors take the result of an edge enhancing

method and apply a scoring procedure to evaluate the presence of a corner feature. State-of-the-

art detectors include the Harris and the Shi-Tomasi operators, which analyse the structure tensor of

a previously gaussian filtered image, derived from its gradients, Equation 2.9, alternatively called the

second moment matrix. The difference between the methods lies in the coefficients used as metrics for the detection. Harris [18] implements the corner score with Equation 2.10, where k is a tunable sensitivity factor. This avoids the computation of eigenvalues, which is computationally more expensive. Shi and Tomasi [19] propose calculating the eigenvalues of the matrix and taking the minimum value as the score, Equation 2.11. Both methods select corners by applying a threshold to the score.

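$$ S = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \tag{2.9} $$

$$ d_H = \det(S) - k\, \mathrm{trace}^2(S) \tag{2.10} $$

$$ d_{ST} = \min(\lambda_1, \lambda_2) \tag{2.11} $$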

Where λi corresponds to the ith eigenvalue of S and Ix and Iy are the derivatives of the image over axis

x and y, which can be determined using one of the methods of image differentiation presented in the

previous sub-section.
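Both scores can be sketched from the same structure tensor. The example below is an illustration under stated assumptions and not the MATLAB implementation used for Figure 2.4: the tensor entries are accumulated over a 3x3 box window instead of a Gaussian window, the derivatives come from central differences, and the value k = 0.04 as well as the names `box_sum` and `corner_scores` are illustrative.

```python
import numpy as np

def box_sum(a, r=1):
    """Sum `a` over a (2r+1)x(2r+1) neighbourhood, zero padded at the borders."""
    padded = np.pad(a, r)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (2 * r + 1, 2 * r + 1))
    return windows.sum(axis=(2, 3))

def corner_scores(image, k=0.04):
    iy, ix = np.gradient(image.astype(float))   # derivatives along rows (y) and columns (x)
    sxx, syy, sxy = box_sum(ix * ix), box_sum(iy * iy), box_sum(ix * iy)
    det = sxx * syy - sxy**2
    trace = sxx + syy
    harris = det - k * trace**2                  # Equation 2.10
    # Smallest eigenvalue of the symmetric 2x2 tensor in closed form, Equation 2.11.
    shi_tomasi = trace / 2 - np.sqrt((sxx - syy)**2 / 4 + sxy**2)
    return harris, shi_tomasi

# Bright square whose top-left corner sits at pixel (4, 4).
image = np.zeros((12, 12))
image[4:, 4:] = 1.0
harris, shi_tomasi = corner_scores(image)
corner = np.unravel_index(harris.argmax(), harris.shape)
```

Along the straight edges of the square one eigenvalue vanishes, so both scores stay low there and peak only at the corner.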

Figure 2.4: Corners detected using the Harris and the Shi-Tomasi methods, from left to right. The

results were obtained using the computer vision toolbox from MATLAB.

Blob detectors

Blobs correspond to areas without brightness discontinuities. However, a blob may be described using

its centre of mass, making it an interest point. Lindeberg [20] experimented using the determinant of

the Hessian, Equation 2.12, or the Laplacian, corresponding to the trace of the Hessian. Blobs were

detected by searching for the maximum of the normalized Laplacian of Gaussian (LoG) in scale-space,

where the scale corresponds to the amount of filtering applied to the image. The Laplacian of Gaussian

is normalized with $\sigma^2$, as seen in Equation 2.13. Lowe approximates the Laplacian with the Difference of Gaussians (DoG), Equation 2.14, and searches for local extrema of the scale-space. Matas et al. [21] developed the Maximally Stable Extremal Regions (MSER) detector, which analyses a grey scale image to find connected regions of similar pixel intensities, where the regions are surrounded by pixels with either higher or lower intensity than all the pixels contained in the stable region. Figure 2.5 shows an example of the application of the different blob detection methods.

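$$ H(f(q)) = \frac{\partial^2 f(q)}{\partial q_i \partial q_j} = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix} \tag{2.12} $$

$$ LoG_{Normalized} = \sigma^2 \cdot LoG(x, y) = \frac{1}{\pi\sigma^2} \left( \frac{x^2 + y^2}{2\sigma^2} - 1 \right) e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{2.13} $$

$$ DoG(x, y) = I_\sigma - I_{\sigma^*} \tag{2.14} $$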
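The relation between the two operators can be checked numerically on the filter kernels themselves. The sketch below evaluates Equation 2.13 and a difference of two Gaussian kernels whose standard deviations differ by a factor k; the function names and the value k = 1.01 are assumptions made for the illustration. For k close to 1, the difference divided by (k - 1) approaches the scale-normalised LoG.

```python
import numpy as np

def log_normalized(x, y, sigma):
    """Scale-normalised Laplacian of Gaussian, Equation 2.13."""
    q = (x**2 + y**2) / (2 * sigma**2)
    return (q - 1) * np.exp(-q) / (np.pi * sigma**2)

def gaussian(x, y, sigma):
    """Gaussian kernel, Equation 2.1."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def dog_kernel(x, y, sigma, k):
    """Difference of two Gaussian kernels with standard deviations k*sigma and sigma."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

sigma, k = 2.0, 1.01
centre_log = log_normalized(0.0, 0.0, sigma)
centre_dog = dog_kernel(0.0, 0.0, sigma, k) / (k - 1)
```

The normalised LoG changes sign on the circle x² + y² = 2σ², which is why the strongest response is obtained for blobs whose size is of the order of σ.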

Figure 2.5: Blobs detected using different detection methods. The results were obtained using

the computer vision toolbox from MATLAB.

Scale-Space Theory and Pyramids

The scale-space was developed from the necessity to detect features at different scales. The most com-

mon scale-space is the Gaussian scale-space, which is generated by consecutively filtering an image

with increasingly strong filters to progressively average out the highest frequencies in the image. Linde-

berg [20] introduced the notion of automatic scale selection, using the LoG to generate the scale space

levels. To make this process more computationally efficient, Burt and Adelson propose sub-sampling the blurred images by a factor of two every time the filter’s standard deviation increases by one octave, i.e. doubles, thus creating an image pyramid [22]. Both the LoG and DoG result in a pyramid where each level is a frequency band in which features of the corresponding scale can be detected. Gaussian filtering corresponds to a low-pass filter, while the derivatives and the subtraction of Gaussians correspond to the continuous and discrete versions of a high-pass filter. Combining the two creates the referred band-pass filter. Figure 2.6 shows an example of a Gaussian and a Laplacian pyramid built with Burt and Adelson’s method. Anandan [13] proposes a coarse-to-fine search by decomposing an image into frequency bands using Burt and Adelson’s algorithm, searching for correspondences at a coarse scale, projecting into a finer scale and searching in the neighbourhood of uncertainty of the projection.

Figure 2.6: Example of Gaussian and Laplacian pyramids built using Burt and Adelson’s method.

2.2.2 Feature Description

The second step is to extract and describe the detected features. The simplest possible metric is to use

pixel intensity to describe features. These measures can be used accurately when both frames differ

from each other only by a translation vector u [23].

However, more complex problems involve camera rotations, scale, point of view and luminance changes,

motivating the development of invariant descriptors. A good example is using the colour distribution,

or histogram. This measure describes the window of search, not the pixels, meaning that the same

feature will have the same description even if it is rotated. However, the histogram will not be the

same under scale or point of view changes. To address these issues, more advanced descriptors were created, such as the Scale Invariant Feature Transform (SIFT) [24], Gradient Location and Orientation Histogram (GLOH) [25], Shape Context [26] and Speeded Up Robust Features (SURF) [27], among others. The SIFT method uses the scale-space theory to detect interest points at different scales. After detecting a potential interest point at a defined scale, a 16x16 patch around the point is extracted and normalized using a Gaussian filter where the standard deviation depends on the detection scale. The patch gradients are computed using finite differences and are grouped over 4x4 windows, quantizing the information into 8 orientations. The 16x16 patch contains 16 such windows, each contributing an 8-bin histogram, which results in descriptors of dimension 16 x 8 = 128. Figures 2.7(a) and 2.7(b) show the computation of the gradient field and the histogram over 4x4 patches using a simplified feature example.
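The construction of the 128 element vector can be sketched as below. This is a deliberately simplified illustration and not the SIFT implementation of [24] or of the VLFeat toolbox: it omits the Gaussian weighting of the patch, the interpolation between neighbouring bins and the rotation of the histograms to a dominant orientation. The function name `sift_like_descriptor` is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Gradient orientation histograms over 4x4 windows of a 16x16 patch."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))          # finite differences
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    bins = np.floor(angle / (np.pi / 4)).astype(int) % 8   # 8 orientations of pi/4
    descriptor = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            descriptor[r // 4, c // 4, bins[r, c]] += magnitude[r, c]
    descriptor = descriptor.ravel()                     # 4 * 4 * 8 = 128 values
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
d = sift_like_descriptor(patch)
d_brighter = sift_like_descriptor(2.0 * patch)   # same patch with a gain applied
```

Because a gain scales all gradient magnitudes equally and leaves the orientations untouched, the normalised vectors `d` and `d_brighter` coincide.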

Figure 2.7: Graphical explanation of the theory behind the SIFT descriptor. (a) Gradient magni-

tude and direction and (b) Histogram of 4x4 windows of the extracted feature.

Differences in colour gains, saturation and contrast affect magnitude but not orientation. Therefore, this

descriptor is robust to different lighting conditions. Moreover, the histograms can be rotated over the

maximum magnitude to provide some robustness to rotation changes. Gradient directions are quantized in $\frac{\pi}{4}$ intervals, which means the descriptor is robust to rotations up to approximately 45 degrees.

GLOH is an extension of the SIFT descriptor, designed to increase its robustness and efficiency. The SIFT descriptor is calculated for a log-polar location grid, resulting in 17 location bins with gradient orientation quantized in 16 bins. This generates a descriptor of dimension 17 x 16 = 272, which is reduced to 128 elements using Principal Component Analysis (PCA). Shape Context is a descriptor similar to SIFT but uses edge information instead of gradient information. The edges are detected using Canny’s method [14]. The edge locations are described in a log-polar coordinate system and quantized in 9 bins, and the edge orientations in 4 bins, leading to a descriptor of size 36.

2.2.3 Feature matching

The third and final step is to match the features found in different images. The most common method is

to take the L2 norm between the descriptors, Equation 2.15.

$$ \|X - Y\|_2 = \sqrt{(X - Y) \cdot (X - Y)} = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_N - y_N)^2} \tag{2.15} $$

The matches are used to compute a transformation matrix, also known as a homography matrix, which relates the two frames and consists of a rotation and a translation matrix stacked together, Equation 2.16. The homography is computed using the location of a feature in both images, hence the need to find matching features in both images. This is usually done by taking the Euclidean distance between descriptors, where the best match corresponds to the pair with the lowest distance. Lowe [24] states that this method is not robust enough and that it would be useful to have a way of discarding features that do not have any good match in the database, proposing an additional condition. If the condition $\frac{d_{\text{second smallest distance}}}{d_{\text{smallest distance}}} \geq 1.5$ is true, the features are matched. This eliminates features that do not have any good match, or features which are not unique and had multiple good matches, making them unsuitable for calculating the homography matrix.

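$$ H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}, \quad R = \begin{bmatrix} h_1 & h_2 \\ h_4 & h_5 \end{bmatrix}, \quad T = \begin{bmatrix} h_3 \\ h_6 \end{bmatrix} \tag{2.16} $$

Where $h_i$ are elements of the homography matrix. The R and T matrices are the rotation and translation matrices implicit in the homography matrix. The last row $\begin{bmatrix} h_7 & h_8 & h_9 \end{bmatrix}$ corresponds to additional scaling terms in the homogeneous coordinate space. For an affine homography, this row is set to $\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$.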
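The matching of descriptors by Euclidean distance, together with the distance ratio condition described above, can be sketched as follows. The descriptors are toy two-dimensional vectors chosen for readability, and the function name `match_descriptors` is illustrative rather than part of any toolbox.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=1.5):
    """Match rows of desc_a to rows of desc_b using the L2 norm and the ratio condition.

    desc_b must contain at least two descriptors.
    """
    # Pairwise Euclidean distances, Equation 2.15.
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    matches = []
    for i in range(len(desc_a)):
        best, second = order[i, 0], order[i, 1]
        # Keep the match only if the second best candidate is clearly worse.
        if dist[i, second] >= ratio * dist[i, best]:
            matches.append((i, int(best)))
    return matches

desc_a = np.array([[0.0, 0.0], [10.0, 10.0]])
desc_b = np.array([[0.0, 0.1], [5.0, 5.0], [5.2, 5.2]])
matches = match_descriptors(desc_a, desc_b)
```

The first descriptor has one clearly closest candidate and is kept. The second has two candidates at almost the same distance, so it is discarded as ambiguous.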

Estimating an homography

The homography matrix maps features from one image to another. There are various methods for

computing the homography. Most often the method used is the Direct Linear Transformation (DLT),

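[28]. Consider a pair of matched features, one in the left image and one in the right image, with the homogeneous coordinates $x = \{ u_{left} \;\; v_{left} \;\; 1 \}^T$ and $x' = \{ u_{right} \;\; v_{right} \;\; 1 \}^T$. Since the coordinate space is homogeneous, the relation between these points can be written as:

$$ x' w = H x \tag{2.17} $$

The relation between these can be computed as follows:

$$ \begin{bmatrix} u_{right} w \\ v_{right} w \\ w \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \begin{bmatrix} u_{left} \\ v_{left} \\ 1 \end{bmatrix} \tag{2.18} $$

Re-writing into a system of equations leads to:

$$ w = h_7 u_{left} + h_8 v_{left} + h_9 $$

$$ u_{right} = \frac{h_1 u_{left} + h_2 v_{left} + h_3}{h_7 u_{left} + h_8 v_{left} + h_9} $$

$$ v_{right} = \frac{h_4 u_{left} + h_5 v_{left} + h_6}{h_7 u_{left} + h_8 v_{left} + h_9} $$

Using $h = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 & h_5 & h_6 & h_7 & h_8 & h_9 \end{bmatrix}^T$,

$$ B = \begin{bmatrix} u_{left} & v_{left} & 1 & 0 & 0 & 0 & -u_{right} u_{left} & -u_{right} v_{left} & -u_{right} \\ 0 & 0 & 0 & u_{left} & v_{left} & 1 & -v_{right} u_{left} & -v_{right} v_{left} & -v_{right} \end{bmatrix} \tag{2.19} $$

Such that

$$ Bh = 0 \tag{2.20} $$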

If the homography is normalized, i.e. $H_{33}$ is 1, the problem has 8 variables. Since each pair of matched features contributes two equations, 4 pairs are needed in order to estimate h. The problem is solved by stacking the matrices B resulting from different pairs in a matrix A and using singular value decomposition to get the least squares solution. This is equivalent to taking the eigenvector corresponding to the smallest eigenvalue of the matrix $A^T A$. With the homography matrix, it is possible to project points from the left image to the right and assess the error between this projection and the real position of the feature. Although thresholding eliminates most of the unfit matches, false matches are still a possibility. They introduce an error in the homography estimate and can be considered outliers of the positive match data set. This motivated the usage of a non-linear state-of-the-art outlier rejection procedure like the Random Sample Consensus (RANSAC) [29].
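The DLT steps can be sketched with NumPy as below. Each pair of matched features contributes the two rows that follow from the expressions for u_right and v_right derived above, and the solution is the right singular vector associated with the smallest singular value. The function names and the synthetic homography `h_true` are assumptions made for the example, and the coordinate normalisation that is often applied before the decomposition is omitted for brevity.

```python
import numpy as np

def dlt_homography(left, right):
    """Estimate H from at least 4 matched points given as (u, v) pairs."""
    rows = []
    for (ul, vl), (ur, vr) in zip(left, right):
        rows.append([ul, vl, 1, 0, 0, 0, -ur * ul, -ur * vl, -ur])
        rows.append([0, 0, 0, ul, vl, 1, -vr * ul, -vr * vl, -vr])
    a = np.asarray(rows, dtype=float)        # stacked matrices B
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)                 # singular vector of the smallest singular value
    return h / h[2, 2]                       # normalise so that H33 = 1

def project(h, points):
    """Apply H to (u, v) points and divide by the homogeneous scale w."""
    homogeneous = np.column_stack([points, np.ones(len(points))])
    out = homogeneous @ h.T
    return out[:, :2] / out[:, 2:3]

h_true = np.array([[1.1, 0.02, 5.0],
                   [-0.03, 0.95, -3.0],
                   [1e-4, 2e-4, 1.0]])
left = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [40, 30]], dtype=float)
right = project(h_true, left)
h_est = dlt_homography(left, right)
reprojection_error = np.abs(project(h_est, left) - right).max()
```

With noise-free matches the estimate reproduces `h_true` up to numerical precision. With real matches the reprojection error is the quantity an outlier rejection step would threshold.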

RANSAC

The RANSAC works by taking random samples using the necessary points to fit the function it is trying to

estimate and enlarging the set with data that produces coherent results, [29]. Applied to the homography

estimation, the algorithm would take 4 random pairs of matched features, compute the homography

matrix using a method like the DLT, and find the pairs which produce a coherent result through a number

of iterations. Knowing the probability of picking up a false match from the set, it is possible to calculate

the number of iterations k necessary to get a certain level of confidence z that at least one error free

selection of points was made, i.e. the algorithm succeeds, using Equation 2.21.

$$ k = \frac{\log(1 - z)}{\log(1 - w^n)} \tag{2.21} $$

Where w is the probability of picking a true match and n is the number of points needed for fitting the model. In the case of estimating the homography, every pair of points is associated with a probability of a good match and 4 pairs are needed, leading to n = 4. As an example, if the probability of picking a true match is 80% and the required probability of success is 99.9%, then the number of iterations needed is calculated by plugging the variables into Equation 2.21, $k = \frac{\log(1 - 0.999)}{\log(1 - 0.8^4)} \approx 13.1$, which is rounded up to 14.
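The worked example can be reproduced with a few lines of Python. The function name `ransac_iterations` is illustrative; the result is rounded up because the number of iterations must be an integer.

```python
import math

def ransac_iterations(z, w, n):
    """Iterations needed for confidence z, inlier probability w and sample size n (Equation 2.21)."""
    return math.ceil(math.log(1 - z) / math.log(1 - w**n))

k = ransac_iterations(z=0.999, w=0.8, n=4)
print(k)  # 14
```

The count grows quickly as the share of true matches drops: with w = 0.5 and the same confidence, the same formula asks for more than one hundred iterations.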

2.3 Image Fusion

Image fusion refers to the process of transitioning between two overlapping images. The classical

methods consist in applying a membership function over a fixed region of the overlapping regions. This

function spans from 0 to 1 and can be linear or non-linear. Figures 2.8(a) and 2.8(b) show examples of applying a sharp transition and of using a transition based on a Gaussian Cumulative Distribution Function (GCDF), whose width is a function of the standard deviation.
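A GCDF membership function and its use for feathering two overlapping strips can be sketched as follows. The sketch assumes single-channel strips of equal size that overlap completely, with the transition centred on the strip; the names `gcdf_weights` and `feather` are hypothetical.

```python
import math
import numpy as np

def gcdf_weights(width, centre, sigma):
    """Gaussian cumulative distribution evaluated at each column of the overlap."""
    return np.array([0.5 * (1 + math.erf((x - centre) / (sigma * math.sqrt(2))))
                     for x in range(width)])

def feather(left, right, sigma):
    """Blend two equally sized overlap strips column by column."""
    width = left.shape[1]
    w = gcdf_weights(width, (width - 1) / 2, sigma)
    return left * (1 - w) + right * w     # weights broadcast over the rows

left = np.zeros((2, 9))
right = np.ones((2, 9))
blended = feather(left, right, sigma=1.5)
```

A small standard deviation approaches the sharp transition of Figure 2.8(a), while a larger one spreads the transition over a wider window.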

Figure 2.8: Plots showing different feathering window types. (a) Sharp transition. (b) Fixed window

feathering

Using a GCDF is preferred over using a linear interpolation as it allows choosing the ratio at which the images are merged. Merging two images could result in two artefacts, ghosts and seams. A ghost appears when, due to a misalignment of the overlapping images, a faded version of the misaligned feature remains visible. A seam occurs mostly when using a sharp transition: due to slight colour differences or misalignment between images, a visible transition shows on the final output. Assuming Figures 2.9(a) and 2.9(b) correspond to overlapping regions of two images, Figures 2.10(a) and 2.10(b) show the result of applying a sharp transition or a GCDF transition. As mentioned before, the presence of a seam is noticeable in Figure 2.10(a), where the transition from one image to another is clearly visible. In addition, ghosts are present in Figure 2.10(b), where features of one image (red stripes) are visible due to a misalignment both in position and size. The images used as examples are intentionally misaligned and have different colour distributions to clearly expose each method’s advantages and disadvantages.

Figure 2.9: Images representing overlapping regions. (a) Left camera and (b) right camera.

Figure 2.10: Results of merging the images. (a) sharp transition and (b) a wider window using a

CDF function

These misalignments occur in the process of acquisition using multiple cameras due to errors in the assembly process or hardware differences. To avoid seams, the area over which the mosaics are interpolated should be equal to the largest feature in the image. Moreover, to avoid ghosts, the interpolation window should be smaller than twice the size of the smallest feature in the image. In the sample images, the largest feature corresponds to the area below the red stripes and the smallest feature corresponds to the red stripes. These two conditions conflict whenever the largest feature is more than twice the size of the smallest one, as is the case here, so it is impossible to achieve the optimum feathering window. This is the case for most images and imaging applications. To address this issue, Burt and Adelson propose a method named multi-resolution spline. The term image splining is used to refer to the procedure of merging two images avoiding seams. "A good image spline will make the seam perfectly smooth, yet will preserve as much of the original image information as possible." [30]. The proposal is to decompose the image into frequency bands and join each band with increasingly large feathering windows. Thus, the high frequency features will be blended using sharper weights while lower frequency features are merged over a wider window. The decomposition is done by building Laplacian pyramids for both images as well as a Gaussian pyramid for the weighting function. Figure 2.11 shows a simplified example of the evolution of the weighting function over the pyramid levels.

Figure 2.11: Graphical representation of the feathering windows used by the different methods for

image blending.

The final step consists in reconstructing the splined image, which is done by summing each level of the

combined pyramid.

Summarising, the steps to achieve multi-resolution splining consist of:

1. Build Laplacian pyramids for images A and B, denoted by LA and LB.

2. Build a Gaussian pyramid for the weighting function, GM. The weighting function can be converted to an image by attributing the function value to a pixel.

3. Combine the pyramid levels by doing $LS_{level} = LA_{level} \cdot GM_{level} + LB_{level} \cdot (1 - GM_{level})$.

4. Obtain the splined image by expanding and summing the levels of the pyramid.

The theory behind Gaussian and Laplacian pyramids is explained in detail in Section 2.2.1. Figure 2.12 shows the result of using this method to merge Figures 2.9(a) and 2.9(b).
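The four steps listed above can be sketched for single-channel images as follows. This is a minimal illustration in the spirit of Burt and Adelson's method and not the code used to produce Figure 2.12: it assumes a 5-tap binomial kernel as the Gaussian approximation, edge replication at the borders and four pyramid levels, and all function names are illustrative.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1]) / 16.0   # 5-tap approximation of a Gaussian

def blur(img):
    """Separable 5-tap blur with edge replication."""
    p = np.pad(img, 2, mode="edge")
    p = sum(k * p[:, i:i + img.shape[1]] for i, k in enumerate(KERNEL))    # horizontal pass
    return sum(k * p[i:i + img.shape[0], :] for i, k in enumerate(KERNEL))  # vertical pass

def reduce_level(img):
    return blur(img)[::2, ::2]               # low-pass, then keep every second pixel

def expand_level(img, shape):
    up = np.zeros(shape)
    up[::2, ::2] = img                       # insert zeros between samples
    return 4 * blur(up)                      # interpolate and restore the mean level

def gaussian_pyramid(img, levels):
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    bands = [g[i] - expand_level(g[i + 1], g[i].shape) for i in range(levels - 1)]
    return bands + [g[-1]]                   # the coarsest level keeps the low-pass residue

def multires_spline(a, b, mask, levels=4):
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)   # step 1
    gm = gaussian_pyramid(mask, levels)                                  # step 2
    ls = [m * x + (1 - m) * y for x, y, m in zip(la, lb, gm)]            # step 3
    out = ls[-1]
    for band in reversed(ls[:-1]):                                       # step 4
        out = expand_level(out, band.shape) + band
    return out

rng = np.random.default_rng(0)
a, b = rng.random((32, 32)), rng.random((32, 32))
mask = np.zeros((32, 32))
mask[:, :16] = 1.0                           # take the left half from image a
blended = multires_spline(a, b, mask)
```

Since each Laplacian level stores exactly what the expansion of the next level misses, collapsing the pyramid of a single image returns that image unchanged, which is a convenient sanity check for an implementation.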

Figure 2.12: Image resulting from splining the two sample images using the multi-resolution

method proposed by Burt and Adelson.

This method is not perfect, as one artefact can be seen in the space between the middle red stripes. However, there is a significant improvement over the results shown in Figures 2.10(a) and 2.10(b), as the interpolation function is chosen according to the frequency content of the images to spline.

2.4 Colour Balancing

Colour balancing is the process of balancing the colours between frames on a stitched mosaic. Colours

may differ due to different exposure levels, colour gains or view point changes. These techniques focus

on transforming the colours from a source image to a target value and can be divided in parametric

and non-parametric. Parametric approaches assume the colours can be transformed linearly using a

3x3 transformation matrix M such that $I_s M = I_t$, where $I_s$ and $I_t$ are the source and target images. Early approaches include the brightness compensation, where M is diagonal and with equal diagonal values. The transformation matrix M is found using the colour information of two overlapping areas. The simplest model is the one where M is diagonal. It assumes that colour channels are independent. M takes the form of Equation 2.22, where $\alpha = \frac{mean(R_2)}{mean(R_1)}$, and β and γ are found similarly, using the intensity values from the green and blue channels. The advantage of this model is that it does not need two strictly overlapping areas, since the area is being averaged.

$$ M_{diagonal} = \begin{bmatrix} \alpha & & \\ & \beta & \\ & & \gamma \end{bmatrix} \tag{2.22} $$

Dependence between colour channels can be added, resulting in a linear model, Equation 2.23, where M can be estimated using two overlapping regions $I_1$ and $I_2$ by doing $M = (I_1^T I_1)^{-1} I_1^T I_2$, where I is an (n,3) matrix and n is the number of pixels in the images.

$$ M_{linear} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \tag{2.23} $$

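These models can be extended to an affine transformation by adding an offset, resulting in an improvement of the mapping accuracy. In this case, the matrix containing pixel intensity information should be extended, taking the form of $I_i = \begin{bmatrix} r_i & g_i & b_i & 1 \end{bmatrix}$. Equations 2.24 and 2.25 show the extended forms of the diagonal and linear models.

$$ M_{diagonal-affine} = \begin{bmatrix} \alpha & & \\ & \beta & \\ & & \gamma \\ \alpha_1 & \beta_1 & \gamma_1 \end{bmatrix} \tag{2.24} $$

$$ M_{linear-affine} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \\ a_1 & e_1 & i_1 \end{bmatrix} \tag{2.25} $$

Where the offset can be found by

$$ \begin{bmatrix} a_1 \\ e_1 \\ i_1 \end{bmatrix}^T = \begin{bmatrix} mean(R_2) \\ mean(G_2) \\ mean(B_2) \end{bmatrix}^T - \begin{bmatrix} mean(R_1) \\ mean(G_1) \\ mean(B_1) \end{bmatrix}^T \cdot \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \tag{2.26} $$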
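The three parametric models can be sketched with NumPy on synthetic data. The pixel matrices are (n,3) arrays as in the text, `numpy.linalg.lstsq` is used as a numerically safer equivalent of the normal equations, and the function names as well as the synthetic gains and offsets are assumptions made for the example.

```python
import numpy as np

def diagonal_gain(src, tgt):
    """Diagonal model: ratio of channel means, Equation 2.22."""
    return np.diag(tgt.mean(axis=0) / src.mean(axis=0))

def linear_gain(src, tgt):
    """Linear model: least squares solution of src @ M = tgt, Equation 2.23."""
    m, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return m

def affine_gain(src, tgt):
    """Linear model with offset: pixels extended to [r g b 1], Equation 2.25."""
    extended = np.column_stack([src, np.ones(len(src))])
    m, *_ = np.linalg.lstsq(extended, tgt, rcond=None)
    return m                                   # 4x3, the last row is the offset

rng = np.random.default_rng(1)
src = rng.uniform(0.1, 1.0, (50, 3))           # pixels of the source region
m_true = np.array([[0.9, 0.05, 0.0],
                   [0.1, 1.1, 0.02],
                   [0.0, 0.03, 0.95]])
offset = np.array([0.02, -0.01, 0.03])

m_lin = linear_gain(src, src @ m_true)
m_aff = affine_gain(src, src @ m_true + offset)
m_diag = diagonal_gain(src, src * np.array([1.2, 0.9, 1.1]))
```

For the least squares affine fit, the offset row equals the mean of the target pixels minus the mean of the source pixels multiplied by the 3x3 block.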

Although the diagonal model with affine transformation and the linear model with and without affine

transformation provide more accurate mappings, they require that the gain estimation is made using

exact pixel correspondence. In problems involving different cameras and points of views, it is rare that

an exact correspondence is found. Tian et al. [31] propose a solution for this problem. Consider two pictures $I_1$ and $I_2$. Firstly, the maximum overlapping area is found. Taking the histograms of the two regions, region 1 is transformed so that it matches the histogram in image 2. Performing this transformation results in a transformed region 1 with direct pixel correspondences with the original image 1 and allows the application of any of the previously mentioned models to estimate the matrix M.
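The histogram transformation step can be sketched for a single colour channel by mapping intensities through the cumulative distributions of the two regions. This is a generic histogram matching sketch and not necessarily the exact procedure of [31]; the function name `match_histogram` is illustrative.

```python
import numpy as np

def match_histogram(source, reference):
    """Map the values of `source` so that its cumulative histogram follows `reference`."""
    s_vals, s_idx, s_counts = np.unique(source.ravel(),
                                        return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    # For each source level, pick the reference level with the same cumulative share.
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    return mapped[s_idx].reshape(source.shape)

rng = np.random.default_rng(2)
region_1 = rng.permutation(64).astype(float).reshape(8, 8)
region_2 = 2.0 * region_1 + 10.0        # same content seen with a different gain and offset
transformed = match_histogram(region_1, region_2)
```

The transformed region keeps the pixel layout of region 1 while taking the colour distribution of region 2, so the pair (region 1, transformed region 1) provides the exact pixel correspondences that the linear and affine models require.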

Non-parametric approaches rely on feature detection and balance the colours locally by finding a relation

between the matched features and applying the correction to neighbouring regions. Yamamoto et al. [32] propose a method where SIFT features are extracted and the matches are used to generate a look-up table using an energy minimization approach. These methods are not commonly used, as the results often do not compensate for the increased complexity and computational load associated with their implementation. Different methods are suitable for different applications. Xu and Mulligan [33] showed that parametric methods, despite being less complex than non-parametric ones, yield stable and effective results while being computationally faster.

2.5 Colour Correction

Similarly to the previous section, colour correction is a form of colour warping. Colour correction is pre-

sented in a different section with the purpose of differentiating its use. While colour balancing focuses on

balancing the colours between adjacent mosaics to avoid seams, colour correction focuses on mapping

the overall colour of the resultant image to the intended Red, Green and Blue (RGB) values. These

values are commonly taken from a colour checker, consisting of a set of coloured patches with known RGB values. The colour checker used was the SG X-Rite, shown in Figure 2.13.

Figure 2.13: Model of the colour checker used in the project.

The relation between source and target colour values is highly non-linear, hence the methods presented

in the colour balancing section are not suitable. Menesatti et al. [34] propose a non-linear mapping using a Radial Basis Function (RBF) with a thin plate spline weighting function. The following section explains

the theory behind RBF warping. The theory is presented considering the application to the RGB colour

space.

Radial Basis Function

Given a data set $X = \{x_i\}_{i=1}^{N} \subset \mathbb{R}^3$ and correspondent function values $\{f_i\}_{i=1}^{N} \subset \mathbb{R}$, find the interpolant $s : \mathbb{R}^3 \to \mathbb{R}$ such that

$$ s(x_i) = f_i, \quad i = 1, \dots, N \tag{2.27} $$

where $x = (r, g, b)$ are data points from the colour space, with r, g and b being the red, green and blue intensity values, and $f_i = (r^*, g^*, b^*)$ is the correspondent correct colour. The interpolant is chosen from the Beppo-Levi space of distributions on $\mathbb{R}^3$ with square integrable second derivatives, which contains a set:

$$ S = \{ s \in BL^{(2)}(\mathbb{R}^3) : s(x_i) = f_i, \; i = 1, \dots, N \} \tag{2.28} $$

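of solutions for the problem. Taking the rotation invariant semi-norm inherent to this space,

$$ \|s\|^2 = \int_{\mathbb{R}^3} \left( \frac{\partial^2 s(x)}{\partial r^2} \right)^2 + \left( \frac{\partial^2 s(x)}{\partial g^2} \right)^2 + \left( \frac{\partial^2 s(x)}{\partial b^2} \right)^2 + 2 \left( \frac{\partial^2 s(x)}{\partial r \partial g} \right)^2 + 2 \left( \frac{\partial^2 s(x)}{\partial r \partial b} \right)^2 + 2 \left( \frac{\partial^2 s(x)}{\partial g \partial b} \right)^2 dx \tag{2.29} $$

as a measure of energy or smoothness, the functions with the lowest energy, i.e.,

$$ s^* = \arg\min_{s \in S} \|s\| \tag{2.30} $$

are the smoothest and proven by Duchon [35] to have the form of:

$$ s(x) = p(x) + \sum_{i=1}^{N} \lambda_i \|x - x_i\| \tag{2.31} $$

where p(x) is a linear polynomial, $\lambda_i$ are coefficients with real values and $\|\cdot\|$ is the Euclidean norm. This is a particular example of an RBF, where the data points $x_i$ are called centres of the function. A general formulation of an RBF is

$$ s(x) = p(x) + \sum_{i=1}^{N} \lambda_i \Phi(\|x - x_i\|) \tag{2.32} $$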

where p(x) is a low degree polynomial and Φ is a form function which is chosen according to the problem. For fitting functions of three variables, the case of this problem, the bi-harmonic ($\Phi(r) = r$, the case of Equation 2.31) and tri-harmonic ($\Phi(r) = r^3$) functions are the advised choices. In the specific case of this 3D problem, the interpolant s(x) is defined by a polynomial of the form $p(x) = c_0 + c_1 r + c_2 g + c_3 b$, where the variables (r, g, b) correspond to the red, green and blue colour channels, and by the coefficients $\lambda_i$. To ensure the interpolant is contained in the Beppo-Levi space of distributions on $\mathbb{R}^3$, the coefficients $\lambda_i$ are required to fulfil the orthogonality conditions:

$$ \sum_{i=1}^{N} \lambda_i = \sum_{i=1}^{N} \lambda_i r_i = \sum_{i=1}^{N} \lambda_i g_i = \sum_{i=1}^{N} \lambda_i b_i = 0 \tag{2.33} $$

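The interpolation and orthogonality conditions may be combined in a linear system to solve for the coefficients which define the RBF. Thus Equations 2.32 and 2.33 may be written as

$$ \begin{bmatrix} A & P \\ P^T & 0 \end{bmatrix} \begin{pmatrix} \lambda \\ c \end{pmatrix} = B \begin{pmatrix} \lambda \\ c \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix} \tag{2.34} $$

where

$$ A_{i,j} = \Phi(\|x_i - x_j\|), \quad i, j = 1, \dots, N \tag{2.35} $$

$$ P = \begin{bmatrix} 1 & r_1 & g_1 & b_1 \\ 1 & r_2 & g_2 & b_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & r_N & g_N & b_N \end{bmatrix}, \quad \lambda = \{ \lambda_1 \; \cdots \; \lambda_N \}^T, \quad c = \{ c_0 \; c_1 \; c_2 \; c_3 \}^T \tag{2.36} $$

RBFs have the particularity of having an associated linear system which is invertible, provided the centres are distinct and not all coplanar, hence the solution can be found by

$$ \begin{pmatrix} \lambda \\ c \end{pmatrix} = B^{-1} \begin{pmatrix} f \\ 0 \end{pmatrix} \tag{2.37} $$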

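The function value is in fact $\{f_i\}_{i=1}^{N} \subset \mathbb{R}^3$, which results in an RBF being fitted to each colour channel. The coefficients found are then

$$ \begin{pmatrix} \lambda \\ c \end{pmatrix} = \begin{bmatrix} \begin{pmatrix} \lambda \\ c \end{pmatrix}_r & \begin{pmatrix} \lambda \\ c \end{pmatrix}_g & \begin{pmatrix} \lambda \\ c \end{pmatrix}_b \end{bmatrix} \tag{2.38} $$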

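A set of data $Y = \{y_i\}_{i=1}^{M} \subset \mathbb{R}^3$, where $y_i = \{ r \;\; g \;\; b \}$, with r, g and b being the measured pixel intensity values for the colours red, green and blue, can now be mapped into the real values using the RBF coefficients calculated with the calibration data set X. This can be done by evaluating the interpolant at the new data, i.e. by using the first block row of Equation 2.34 and plugging the data set in the equations, which gives

$$ A_{i,j} = \Phi(\|y_i - x_j\|), \quad i = 1, \dots, M, \quad j = 1, \dots, N \tag{2.39} $$

$$ P = \begin{bmatrix} 1 & y_1 \\ 1 & y_2 \\ \vdots & \vdots \\ 1 & y_M \end{bmatrix} = \begin{bmatrix} 1 & r_1 & g_1 & b_1 \\ 1 & r_2 & g_2 & b_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & r_M & g_M & b_M \end{bmatrix} \tag{2.40} $$

resulting in the linear operation

$$ f = \begin{bmatrix} r_1^* & g_1^* & b_1^* \\ r_2^* & g_2^* & b_2^* \\ \vdots & \vdots & \vdots \\ r_M^* & g_M^* & b_M^* \end{bmatrix} = \begin{bmatrix} A & P \end{bmatrix} \begin{bmatrix} \begin{pmatrix} \lambda \\ c \end{pmatrix}_r & \begin{pmatrix} \lambda \\ c \end{pmatrix}_g & \begin{pmatrix} \lambda \\ c \end{pmatrix}_b \end{bmatrix} \tag{2.41} $$

Where f corresponds to the corrected pixel RGB values. Radial basis functions are calculated for a calibration data set and later used to correct any data set acquired under the same calibration conditions. The computation effort grows with the size of the calibration set, making the choice of said set important.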

Chapter 6

Results

This chapter presents results from the implemented methods as well as the results of alternative meth-

ods which were taken as validation for the procedures used. The first section, Single Image Recon-

struction presents the results from the modular phase processing. The second section, Feature Match-

ing presents matching results obtained using several state-of-the-art methods and provides a compari-

son with a ground truth reference made manually and the method proposed in the document. The third

section, Image Fusion presents results of different image blending methods, supporting the method

proposed for this project. Finally, the fourth section, Computation Time and Efficiency provides a re-

view of the efforts taken to maximize efficiency of the processes and harvest the full processing potential

of the system.

6.1 Single Image Reconstruction

The solution implemented for video reconstruction relies on a correct measurement of the conveyor belt’s velocity and on a correctly performed calibration process. It is arguable whether a more autonomous stitching procedure could be implemented. To test this possibility, an algorithm similar to the fine search presented in section ?? was used, assuming that the slab will only have a vertical translation.

The results show a classical example of the aperture problem, presented in section 2.2.1. In areas where

there are features like corners, the displacement vector was estimated correctly. However, in areas like

the ruler, features were not sufficient to estimate vertical translation, resulting in the shortening of the

ruler height, visible in Figure 6.1(a). Note that the images were rotated for a better space usage. The

images shown in the figure were reconstructed from left to right, corresponding to the rotated vertical

axis.

Figure 6.1: Image reconstruction using different approaches. (a) Stitching by search. (b) Stitching

using the system’s parameters. The images were reconstructed from left to right.

In the examples of the full reconstruction, the regions highlighted in red represent zones where the

automatic search is prone to fail due to insufficient features. Area 1 is visibly distorted in Figure 6.2(a)

and area 2, although harder to notice, is expanded by approximately 100 pixels, which corresponds to roughly 10 mm. The error resulting from the search method would break the process since the

reconstructed images would miss information or have information that would not match with the other

cameras.

Figure 6.2: Close-ups of Figure 6.1. (a) Ruler in Figure 6.1(a). (b) Ruler in Figure 6.1(b).
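As a minimal sketch of why the parameter-based stitching does not depend on image content, the number of image rows contributed by each frame follows directly from the belt velocity, the frame rate and the calibrated resolution. The function names, the 50 mm/s velocity and the 30 frames per second are illustrative assumptions and are not values reported for this project; only the 13.5 px/mm resolution is quoted in Section 6.4.

```python
def rows_per_frame(belt_velocity_mm_s, frame_rate_hz, resolution_px_mm):
    """Slab displacement between consecutive frames, in pixel rows."""
    return belt_velocity_mm_s / frame_rate_hz * resolution_px_mm


def strip_bounds(n_frames, step_px):
    """Start and end rows of the strip each frame contributes to the mosaic.

    The exact (fractional) offset is accumulated before rounding, so the
    rounding error does not build up from frame to frame.
    """
    bounds = []
    for i in range(n_frames):
        start = round(i * step_px)
        end = round((i + 1) * step_px)
        bounds.append((start, end))
    return bounds


# Hypothetical values: 50 mm/s belt, 30 fps video, 13.5 px/mm resolution.
step = rows_per_frame(50.0, 30.0, 13.5)
bounds = strip_bounds(4, step)
print(step, bounds)
```

Because the strip positions depend only on measured system parameters, featureless regions such as the ruler are placed as reliably as textured ones, provided the velocity measurement and calibration are accurate.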

6.2 Feature Matching

The algorithm implemented in this project for feature matching was tested against the SIFT method, using the VLFeat toolbox [36], and against the SURF, MSER and Harris implementations of the MATLAB® Computer Vision Toolbox [37]. Table 6.1 compares the methods and supports the one proposed in this project. Additionally, one matching was made manually, to be taken as the ground truth, i.e. the "perfect" match. For each method, a matrix P was generated, as described in Equation ??, and used as the measure for comparison between methods. The row relative to vertical alignment contains cumulative information, which must be eliminated to leave the relative displacement between consecutive frames. After eliminating this information, P can be compared with the reference P_ideal by taking the absolute value of the difference, Equation 6.1.

$$\mathrm{AD}(P_{ideal}, P_{method}) = \left| P_{ideal} - P_{method} \right| \qquad (6.1)$$
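The comparison can be sketched as follows. The exact layout of P is defined in an equation that is not reproduced here, so the two-row layout below (horizontal offsets in the first row, cumulative vertical offsets in the second), the numeric values and the function names are assumptions made only for illustration.

```python
def remove_cumulative(row):
    """Turn cumulative offsets into relative displacements between neighbours."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]


def absolute_difference(p_ideal, p_method):
    """Element-wise |P_ideal - P_method| for two matrices of equal shape."""
    return [[abs(a - b) for a, b in zip(r_ideal, r_method)]
            for r_ideal, r_method in zip(p_ideal, p_method)]


# Assumed layout: row 0 = horizontal offsets, row 1 = cumulative vertical offsets.
p_ideal = [[100, 102, 98], [5, 12, 20]]
p_method = [[101, 102, 95], [5, 13, 20]]

rel_ideal = [p_ideal[0], remove_cumulative(p_ideal[1])]
rel_method = [p_method[0], remove_cumulative(p_method[1])]
ad = absolute_difference(rel_ideal, rel_method)
print(ad)
```

Differencing the cumulative row first prevents a single early error from being counted again in every later entry.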

Figure 6.3: Matches found with the SIFT algorithm. False matches highlighted in red.

Local methods rely on feature extraction and matching, leading to a translation vector for each pair of matched features. Figure 6.3 shows an example of two overlapping images matched using the SIFT method. Although the methods try to maximize true positives, the results still contain false positives, which were excluded using a RANSAC routine, explained in Section 2.2.3. Block Matching with Initial Estimation (BMIE) and SIFT performed similarly and gave the best results. The other methods gave similar results for the vertical alignment but performed poorly in the horizontal alignment. This can be explained by the presence of very similar patterns and small overlapping areas, which can lead to too many false positive matches and too few true matches to estimate the displacement correctly. The matching results are presented in Figure 6.4. Concerning run time, the BMIE method returns results 3.67 times faster than the SIFT method (1.55 s against 5.7 s in Table 6.1). Consequently, the method proposed for matching the images is the one that offers the best combination of accuracy and run time. Table 6.2 displays the slab width measured for the reference, BMIE and SIFT matches. The physical measurement of the slab's width was 600 mm. This confirms that the manual matching yields the result closest to the real width, and that the two automatic methods return results which are very close to it.
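The outlier rejection step mentioned above can be illustrated with a minimal RANSAC sketch for a pure-translation model. This is not the routine used in the project: the single-sample hypothesis, the 2-pixel tolerance, the iteration count and the synthetic matches are all assumptions chosen to keep the example short.

```python
import random


def ransac_translation(vectors, tolerance=2.0, iterations=200, seed=0):
    """Estimate one (dx, dy) translation from noisy per-feature displacements.

    Each iteration takes the displacement of one random match as the
    hypothesis, counts the matches agreeing within `tolerance` pixels, and
    keeps the largest consensus set. The estimate is the mean of that set.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        cx, cy = rng.choice(vectors)
        inliers = [(dx, dy) for dx, dy in vectors
                   if abs(dx - cx) <= tolerance and abs(dy - cy) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    n = len(best_inliers)
    mean_dx = sum(dx for dx, _ in best_inliers) / n
    mean_dy = sum(dy for _, dy in best_inliers) / n
    return (mean_dx, mean_dy), best_inliers


# Synthetic matches: six agree on roughly (40, 3); three are false positives.
matches = [(40, 3), (41, 3), (39, 2), (40, 4), (41, 2), (40, 3),
           (-75, 60), (12, -30), (130, 5)]
translation, inliers = ransac_translation(matches)
print(translation, len(inliers))
```

With repetitive stone patterns and a small overlap, the false matches can outnumber the true ones, in which case no consensus of this kind is reliable. This is consistent with the poor horizontal results of SURF, MSER and Harris in Table 6.1.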

Method   Vertical error (px)   σ vertical (px)   Horizontal error (px)   σ horizontal (px)   Run time (s)
BMIE     0.5                   0.54              2.17                    2.32                1.55
SIFT     0.6                   1.03              2.17                    2.14                5.7
SURF     1.7                   1.21              76.17                   59.35               1.4
MSER     0.6                   0.52              108.50                  90.61               3
Harris   1.5                   1.37              77.67                   60.60               1.4

Table 6.1: Matching methods comparison. Errors are relative to the reference match, made by hand.

Figure 6.4: Results from different matching methods. (a) By hand. (b) BMIE. (c) SIFT. (d) SURF. (e) Harris. (f) MSER.

Method         Slab width (mm)
Ground truth   600.7
BMIE           602.7
SIFT           596.8

Table 6.2: Final output dimensions.

6.3 Image Fusion

Panoramas were obtained by merging image contributions using two naive solutions: no feathering and fixed-window feathering. These were compared with the implemented multi-resolution feathering to assess its performance. The latter is fundamentally a mix of the first two, applying narrower feathering windows to higher frequencies and wider ones to lower frequencies. Graphical representations of the weights used by each method are presented in Section ??, Figure 2.8 and Figure 2.11, for the multi-resolution, GCDF and sharp feathering, respectively. To illustrate the benefits of multi-resolution blending, Figures 6.5 and 6.6 show examples of images containing high and low frequency content, blended using the different techniques. In the high frequency results of Figure 6.5, multi-resolution blending performs similarly to the no-feathering solution, avoiding the ghosted area created by the wide fixed-window feathering. In the low frequency results of Figure 6.6, multi-resolution blending performs similarly to the wide fixed-window feathering, avoiding the visible seam created by the no-feathering solution.

Figure 6.5: Images containing high frequency information blended using the tested methods. (a) No feathering. (b) Fixed window feathering. (c) Multi-resolution feathering.

Figure 6.6: Images containing low frequency information blended using the tested methods. (a) No feathering. (b) Fixed window feathering. (c) Multi-resolution feathering.
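The behaviour described in this section can be reproduced with a two-band simplification of multi-resolution blending, applied to a single one-dimensional scanline. This is only a sketch of the idea and not the implemented method: the box filter, the two bands (instead of a full pyramid), the window sizes and all function names are assumptions.

```python
def box_blur(signal, radius):
    """Moving-average low-pass filter; samples beyond the ends are replicated."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def ramp_weights(n, seam, half_width):
    """Weight of the first image: 1 left of the window, 0 right of it."""
    if half_width == 0:
        return [1.0 if i < seam else 0.0 for i in range(n)]  # sharp cut
    return [min(1.0, max(0.0, 0.5 - (i - seam) / (2.0 * half_width)))
            for i in range(n)]


def two_band_blend(a, b, seam, blur_radius=8, wide_half_width=16):
    """Blend low frequencies over a wide window and high frequencies sharply."""
    low_a, low_b = box_blur(a, blur_radius), box_blur(b, blur_radius)
    high_a = [x - l for x, l in zip(a, low_a)]
    high_b = [x - l for x, l in zip(b, low_b)]
    w_low = ramp_weights(len(a), seam, wide_half_width)
    w_high = ramp_weights(len(a), seam, 0)
    return [wl * la + (1 - wl) * lb + wh * ha + (1 - wh) * hb
            for wl, la, lb, wh, ha, hb
            in zip(w_low, low_a, low_b, w_high, high_a, high_b)]


# Two flat scanlines with an exposure offset: a low-frequency mismatch.
flat_a = [100.0] * 64
flat_b = [110.0] * 64
blended = two_band_blend(flat_a, flat_b, seam=32)
largest_step = max(abs(blended[i + 1] - blended[i]) for i in range(63))
print(largest_step)
```

The exposure offset is spread over the wide window, so no seam appears, while fine texture is switched at the seam and is therefore not ghosted.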

6.4 Computation Time and Efficiency

In a first approach, all the procedures ran on a single computer, in one MATLAB® instance. This setup was not optimal, since all procedures before the feature matching can be done separately and simultaneously, using the controller connected to each camera. Therefore, the image reconstruction, affine transformation, lens distortion correction and resolution homogenization were implemented in Python, to be executed in the controller modules. Doing so raises a second problem: reconstructing a high definition video into an image requires keeping the reconstruction in Random Access Memory (RAM), in uncompressed form, during the processing phase, which limits the size of the image that can be created. The maximum height of the image can be calculated with some simple operations, knowing that:

1. 800 MB of RAM are available.

2. The image is converted from 8-bit unsigned integer to double precision floating point format for processing, taking 24 bytes per pixel (three colour channels of 8 bytes each).

3. The image width is fixed at 1920 pixels.

4. The camera resolution is approximately 13.5 px/mm, at 18 cm from the object.

Then,

$$H_{max} = \frac{800 \cdot 10^{6}}{1920 \cdot 24 \cdot 13.5} \approx 1286\ \mathrm{mm} = 1.286\ \mathrm{m} \qquad (6.2)$$

This means that the memory is depleted before the project's requirements are even fulfilled. The procedure implemented to solve this issue downsizes the image as it is being reconstructed, reducing the memory needed to hold the reconstruction by the square of the image sub-sampling factor. Transferring the processing to the controllers also makes it possible to scale the number of cameras without increasing the computation time prior to image matching.
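The memory budget can be checked with a few lines of Python. The function name and the sub-sampling factor of 2 are illustrative assumptions; the other numbers are the ones listed above.

```python
def max_height_mm(ram_bytes, width_px, bytes_per_px, resolution_px_mm, subsampling=1):
    """Maximum reconstructable slab length for a given memory budget.

    Sub-sampling by a factor s divides both image dimensions by s, so the
    memory needed per millimetre of slab drops by s**2.
    """
    bytes_per_row = (width_px / subsampling) * bytes_per_px
    rows_per_mm = resolution_px_mm / subsampling
    return ram_bytes / (bytes_per_row * rows_per_mm)


full = max_height_mm(800e6, 1920, 24, 13.5)
half = max_height_mm(800e6, 1920, 24, 13.5, subsampling=2)
print(round(full), round(half))
```

Under these assumptions, halving the resolution in each direction quadruples the slab length that fits in the same memory.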

In addition, the size of the image was already a limitation during the first approach, where all processing was done on a single computer: the scan of a 60 cm by 60 cm slab resulted in an image with 92.5 million pixels, occupying 2220 MB of RAM and making the computations extremely slow. This issue would only worsen for larger slabs. In conclusion, it would always be necessary to reduce the size of the images before the feature matching procedure, regardless of where the processing takes place. The resize procedure is also helpful for storing the final image. For example, in a factory with continuous production, storing full resolution images would rapidly deplete the available storage. Although sub-sampling does lead to a decrease in resolution, which can be seen as a disadvantage, it is not equivalent to acquiring the images at a lower resolution. Acquiring the images with a lower resolution setting would mean that some features are not sufficiently described, whereas down-scaling from high resolution interpolates well-described features, reducing the number of pixels while retaining as much information as possible. In summary, resizing the images makes all operations faster and reduces both the RAM needed to carry out the procedures and the storage needed for the final output. There is a trade-off between algorithm acceleration and final resolution, and the sub-sampling factor must be chosen so that both lie in an acceptable range. The final output should be ready before the next stone is scanned, and should have enough detail and quality to show all its characteristics to the client. Therefore, this setting may vary among factories, since image quality is partly subjective and the time interval between slabs in the production line may differ.

Chapter 7

Conclusions

This document proposes an innovative system for scanning stone slabs. The system consists of an array of cameras with their respective controllers and a Personal Computer (PC), connected through a network that enables device communication and capture synchronization. The system can scan slabs up to 5 meters in length and, since it was built for modularity, can be adapted to fit any conveyor belt. It was inspired by the existing stone scanner Iris StoneScan, by D2 Technology in partnership with Frontwave, S.A., and its development and design were driven towards an increase in resolution and a decrease in price and volume. Indeed, the resolution achieved was 14.9 pixel/mm, a nearly ten times increase compared to the 1.55 pixel/mm achieved by the current version. The price is around 880 €/m, hence the cost of a 2.4 m array would be 2112 €, representing 30% savings in the imaging equipment employed in the Iris StoneScan. Regarding size, the cameras stand approximately 20 cm from the slab, while the Iris StoneScan has its cameras placed about 90 cm from the slab, a reduction of about 80%. The processing time from the end of acquisition to the final output is around 10 seconds, 5 of which are spent in the acquisition modules, hence a new slab can be scanned every five seconds. This is sufficient for a regular stone processing factory. These achievements were made at the cost of the assumption that the slab's velocity is constant. This assumption would be more easily satisfied if the cameras translated over an immobile slab, and not the other way around, because the reduced weight and inertia of the cameras make their motion easier to control. One of the versions of the Iris StoneScan works in the vertical position, where the slab is placed on a support while the cameras move over it at a constant velocity; the proposed solution would therefore achieve its best results when employed in this kind of system.

7.1 Future Work

This project combined scientific and industrial aspects, hence the future work can be divided according to its main interest, scientific or industrial, although both are related. The first and second sub-sections, Automatic Calibration and Real-time validation, have immediate industrial value: the calibration process would become easier and more accurate, and the real-time validation would prevent any unsuccessful scan from entering the system, eliminating the effort of tracking and removing an improper image from the company's database. The third sub-section, Implementing GANs for up-scaling, is of scientific interest, since it exploits a relatively new method which is not yet fully developed and studied.

7.1.1 Automatic Calibration

As stated in Section ??, proper camera alignment is crucial for the quality of the output. The alignment

process would benefit both in time and accuracy if done automatically, similarly to the colour gains

calibration presented in Section ??. This could be done by following the proposed steps:

1. Develop and implement detection techniques to extract interest points from the calibration ruler. The interest points correspond to the horizontal line and the spaced marks.

2. Analyse the information given by the interest points. The horizontal line provides information on the rotation about the S axis, and the spaced marks provide information on the rotation about the A1 and A2 axes, as seen in Section ??.

3. Couple motors to control the camera's rotation about each axis and implement a feedback loop to drive the current state of the camera to its optimum position.
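The steps above can be sketched for a single axis as follows. Everything in this example is hypothetical: the camera and motor are simulated, the ruler line is synthesised rather than detected, and the proportional gain and tolerance are arbitrary. It only illustrates how a line fit could feed a feedback loop.

```python
import math


def line_angle_deg(points):
    """Angle of the least-squares line through (x, y) points, in degrees."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return math.degrees(math.atan2(sxy, sxx))


def simulate_alignment(initial_angle_deg, gain=0.5, tolerance_deg=0.01, max_steps=50):
    """Proportional feedback loop driving the measured line angle to zero."""
    angle = initial_angle_deg
    for step in range(max_steps):
        # Stand-in for capturing an image and detecting the ruler's line.
        slope = math.tan(math.radians(angle))
        points = [(x, slope * x) for x in range(0, 1920, 160)]
        measured = line_angle_deg(points)
        if abs(measured) < tolerance_deg:
            return angle, step
        angle -= gain * measured  # stand-in for the motor command
    return angle, max_steps


final_angle, steps = simulate_alignment(3.0)
print(final_angle, steps)
```

A real implementation would also need the interest-point detection of step 1 and would have to handle the A1 and A2 rotations from the spaced marks, which this sketch leaves out.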

7.1.2 Real-time validation

The system was designed on the assumption that there is no slippage between the slab and the conveyor belt. Developing a solution which reconstructs images even in the event of slab rotation and translation during image acquisition appears to be computationally very expensive and, even if implemented, would degrade the quality of the final output. Instead, an additional camera dedicated to detecting rotation or horizontal translation would make it possible to validate that the slab did not slip on the conveyor belt during acquisition. If slippage is detected, the process would be interrupted, protecting the final output from unwanted effects. Since the images from the validation camera are not used for reconstruction, this camera may capture at a lower resolution, making real-time estimation possible.
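One possible form of this check is sketched below for horizontal translation only. It assumes that the column-wise mean intensity of a low-resolution frame is dominated by the slab's side edges, which stay in the same columns while the slab moves along the belt without slipping. The function names, the search range and the 1-pixel tolerance are assumptions, and rotation detection is not covered.

```python
def column_profile(frame):
    """Mean intensity of each column of a 2-D frame (list of rows)."""
    n_rows = len(frame)
    return [sum(row[c] for row in frame) / n_rows for c in range(len(frame[0]))]


def horizontal_shift(prev_frame, next_frame, max_shift=5):
    """Integer shift (px) that best aligns the column profiles of two frames."""
    p, q = column_profile(prev_frame), column_profile(next_frame)
    n = len(p)
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        # Compare p[i] with q[i + s] over the columns where both exist.
        pairs = [(p[i], q[i + s]) for i in range(n) if 0 <= i + s < n]
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def slipped(prev_frame, next_frame, tolerance_px=1):
    """True when the estimated horizontal shift exceeds the tolerance."""
    return abs(horizontal_shift(prev_frame, next_frame)) > tolerance_px


def make_frame(slab_start, width=40, height=8, slab_width=20):
    """Synthetic frame: bright slab (200) on a dark belt (0)."""
    row = [200 if slab_start <= c < slab_start + slab_width else 0
           for c in range(width)]
    return [row[:] for _ in range(height)]


reference = make_frame(10)
shifted = make_frame(13)
print(horizontal_shift(reference, shifted), slipped(reference, shifted))
```

Working on a one-dimensional profile of a small frame keeps the cost per frame low, which is the property that makes a real-time check plausible.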

7.1.3 Implementing GANs for up-scaling

Both the existing and the proposed new version of the stone scanning machine produce similar images, the difference being that the proposed version produces images with a resolution about 10 times higher. It would be interesting to use images of the same stone slabs to train Generative Adversarial Networks (GANs) to perform image up-scaling. GANs are a class of artificial intelligence algorithms which implement two adversarial neural networks competing in a zero-sum game. The algorithm works by having one of the neural networks try to mimic the training set while the other judges the success of the first. In this case, one of the neural networks would try to recreate an up-scaled image of a stone slab, and the judging network would decide whether it was an image from the up-scaled training set or an attempt by the other network. The ideal case is the one where the first neural network produces an output which looks to the judge as if it was taken from the training set. While its interest is primarily scientific, this method could be used in the future to generate higher resolution images to showcase the products to potential customers on a larger display, for example.

References

[1] R. Drath and A. Horch, “Industrie 4.0: Hit or hype? [industry forum],” IEEE Industrial Electronics

Magazine, vol. 8, pp. 56–58, June 2014.

[2] M. J. Sobreiro, “Produção nacional e comércio externo (1992 a 2002),” vol. 1, pp. 173–198, Feb

2002.

[3] E. S. Research, “Produção de rochas ornamentais. análise setorial,” vol. 1, 2014.

[4] J. Carvalho, J. Lisboa, A. Casal Moura, C. Carvalho, L. Sousa, and M. M. Leite, “Evaluation of the

portuguese ornamental stone resources,” vol. 548, pp. 3–9, Feb 2013.

[5] CEVALOR, “Estudo estratégico prospectivo 2004 – 2006,” p. 88, 2006.

[6] F. Technology, “Stonescan.” http://frontwave.pt/technology/project/stonescan/. Accessed: 2017-03-20.

[7] BStone, “Bstone scanner - the world’s first handy stone scanner.” http://www.bstone.com/. Accessed: 2017-03-20.

[8] T. S. House, “Scanner - marble and stone scanning.” http://www.taglio.it/en/stone/scanner-2/. Accessed: 2017-03-20.

[9] M. Scan, “Mapascan, the 1st scanner in the world of stone.” http://www.mapastone.com/. Accessed: 2017-03-20.

[10] D. Technology, “Stonescan iris.” http://www.d2technology.com/visao.html. Accessed: 2017-03-20.

[11] L. Rossol, “Computer vision in industry,” in Robot Vision, pp. 11–18, Springer Berlin Heidelberg,

1983.

[12] B. K. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, vol. 17, no. 1,

pp. 185 – 203, 1981.

[13] P. Anandan, “A computational framework and an algorithm for the measurement of visual motion,”

International Journal of Computer Vision, vol. 2, pp. 283–310, Jan 1989.

[14] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis

and Machine Intelligence, vol. PAMI-8, pp. 679–698, Nov 1986.

[15] L. Roberts, Machine Perception of Three-Dimensional Solids. Jan 1963.

[16] J. Prewitt, “Object enhancement and extraction,” pp. 75–149, Feb 1970.

[17] I. Sobel, “An isotropic 3x3 image gradient operator,” Feb 2014.

[18] C. Harris and M. Stephens, “A combined corner and edge detector,” in Proc. of Fourth Alvey Vision Conference, pp. 147–151, 1988.

[19] J. Shi and C. Tomasi, “Good features to track,” in IEEE CVPR, pp. 593–600, 1994.

[20] T. Lindeberg, “Feature detection with automatic scale selection,” Int. J. Comput. Vision, vol. 30,

pp. 79–116, Nov. 1998.

[21] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable

extremal regions,” Image and Vision Computing, vol. 22, no. 10, pp. 761 – 767, 2004. British

Machine Vision Computing 2002.

[22] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Transactions on Communications, vol. 31, pp. 532–540, 1983.

[23] R. Szeliski, Computer Vision: Algorithms and Applications. New York, NY, USA: Springer-Verlag

New York, Inc., 1st ed., 2010.

[24] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of

Computer Vision, vol. 60, pp. 91–110, 2004.

[25] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Transactions

on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1615–1630, Oct 2005.

[26] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 509–522, Apr 2002.

[27] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “Speeded-up robust features (surf),” Computer Vision

and Image Understanding, vol. 110, no. 3, pp. 346 – 359, 2008. Similarity Matching in Computer

Vision and Multimedia.

[28] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University

Press, ISBN: 0521540518, second ed., 2004.

[29] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with

applications to image analysis and automated cartography,” Commun. ACM, vol. 24, pp. 381–395,

June 1981.

[30] P. J. Burt and E. H. Adelson, “A multiresolution spline with application to image mosaics,” ACM

Trans. Graph., vol. 2, pp. 217–236, Oct. 1983.

[31] G. Y. Tian, D. Gledhill, D. Taylor, and D. Clarke, “Colour correction for panoramic imaging,” in

Proceedings Sixth International Conference on Information Visualisation, pp. 483–488, 2002.

[32] K. Yamamoto and R. Oi, “Color correction for multi-view video using energy minimization of view

networks,” International Journal of Automation and Computing, vol. 5, pp. 234–245, Jul 2008.

[33] W. Xu and J. Mulligan, “Performance evaluation of color correction approaches for automatic multi-view image and video stitching,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 263–270, June 2010.

[34] P. Menesatti, C. Angelini, F. Pallottino, F. Antonucci, J. Aguzzi, and C. Costa, “RGB color calibration

for quantitative image analysis: The “3d thin-plate spline” warping approach,” Sensors, vol. 12,

pp. 7063–7079, May 2012.

[35] J. Duchon, “Splines minimizing rotation-invariant semi-norms in sobolev spaces,” in Constructive

Theory of Functions of Several Variables, pp. 85–100, Springer Berlin Heidelberg, 1977.

[36] A. Vedaldi and B. Fulkerson, “Vlfeat: An open and portable library of computer vision algorithms,”

in Proceedings of the 18th ACM International Conference on Multimedia, MM ’10, (New York, NY,

USA), pp. 1469–1472, ACM, 2010.

[37] “MATLAB and Computer Vision System Toolbox,” Release 2017a. The MathWorks, Inc., Natick, Massachusetts, United States.
