
Faculdade de Engenharia da Universidade do Porto

Manufacturing Equipment Data Collection Framework

Daniel José Barbudo Aguilar

Report of Project
Master in Informatics and Computing Engineering

Supervisor: Maria Teresa Galvão Dias (PhD)

July 2008

© Daniel Aguilar, 2008

Manufacturing Equipment Data Collection Framework

Daniel José Barbudo Aguilar

Report of Project
Master in Informatics and Computing Engineering

Approved in oral examination by the committee:

Chair: António Augusto de Sousa (PhD)
External Examiner: César Analide (PhD)
Internal Examiner: Maria Teresa Galvão Dias (PhD)

31st July, 2008

Abstract

The Bee Framework project is essentially a data collection framework used to collect the data generated by equipment, and it intends to help improve the equipment integration process. Even considering different equipment types, there are several common data collection functionalities that can be easily reused with only light changes, or even no changes at all. Some of these functionalities are, for example, database access, folder monitoring actions, communication and data collection, among others.

The main goal of the project described in this report is the elaboration of a detailed specification for a data collection framework used to collect the data generated by several pieces of equipment in an assembly line. Additionally, the core requirements of the framework should also be implemented, and an equipment integration should be carried out as a proof of concept. This framework intends to control and monitor this equipment so that the data it generates can be collected. This data will be fed into data analysis solutions that provide the tools required to follow up on continuous process evolution, optimization and process control.

However, the wide diversity of the equipment used in such assembly lines is a well-known problem that makes it hard to adopt a common integration solution. Moreover, a large number of these machines do not follow some of the international standards for data collection and data control used in the semiconductor industry. These strong difficulties in adopting a common solution lead to the development of a specific data collection solution for each new equipment type.

The Bee Framework has been designed with a modular architecture in order to provide easier and more general methods capable of integrating equipment and its related systems. By using such a framework, it will not be necessary to develop a specific data collection solution for each new equipment type's integration, since the core functionalities will already be available and ready for use. It will just be necessary to configure the framework according to the equipment being integrated and adapt it to meet the specific equipment requirements.

Beyond the aspects related to the framework architecture and specification described in this document, this project has also considered the development of a proof of concept. The main goal of the proof of concept is to show the resulting advantages of using the framework in the equipment integration process. The equipment type considered already had a specific solution to collect the generated data and is integrated in the Qimonda assembly line using a different approach. So, this proof of concept intends to establish a comparison regarding both the effort and time needed to integrate the same equipment type using the framework approach instead of the previous one.


Resumo

The project described in this report, named Bee Framework, is essentially a data collection framework used for equipment integration, which intends to assist and improve the integration process. Even across different equipment types, there is a large set of common data collection functionalities, so these common functionalities can be reused with only slight changes, or even none at all. Some of these functionalities are, for example, database access, folder monitoring, communication and data collection.

The main goal of the described project is the elaboration of a detailed specification for a framework for collecting the data generated by equipment used in production lines. Additionally, the core requirements of the framework should also be implemented, and the integration of one piece of equipment should be carried out as a proof of concept. The aim is thus to control and monitor this equipment in order to collect the data it produces and store it in a form that allows its later analysis and interpretation, enabling continuous process evolution and optimization.

However, due to the great diversity of equipment existing in a production line, and because some of this equipment does not follow the existing standards used in the semiconductor industry for data collection and control, it becomes necessary to develop specific data collection solutions for each new piece of equipment to be integrated, in order to collect the data it generates.

The Bee Framework has a modular architecture, so as to provide easy and general mechanisms capable of integrating equipment and its related systems. Using a framework of this kind, it will not be necessary to develop a specific solution for each new equipment integration, since the core functionalities will already be available and ready to use. The framework need only be configured according to the specific requirements of the equipment being integrated.

Beyond the aspects related to the framework architecture and specification described in this report, this project also included the development of a proof of concept. Its main goal is to demonstrate the advantages of using the framework in the integration of a piece of equipment, that is, collecting the data produced by the equipment while it operates in a Qimonda production line. A comparison can thus be made regarding the amount of effort and time required, since the equipment considered had already been integrated previously using another approach.


Acknowledgments

I would like to thank Qimonda Portugal for giving me the conditions I needed to complete my project. I would like to thank all my work colleagues, especially my project supervisor Nuno Soares, but also Rui Alves and all the other colleagues belonging to the Equipment Control team, for being so supportive and helping me better understand my project domain.

A special thank you, too, to my project supervisor at Faculdade de Engenharia da Universidade do Porto, professor Teresa Galvão, for giving me all the encouragement and support I needed, not only in this project but also during the course of my studies, always with honest interest and friendship.

I would also like to thank all the teachers who in some way contributed not only to my education but also to my personal life.

Additionally, I extend my thanks to the English professor Sónia Nogueira for the help she provided in the reviewing of this document.

Finally, my biggest gratitude goes to my family, especially my parents and sister, and to all my friends, for all the support they give me, both in good and bad times, always helping me face my problems and giving me the encouragement and boldness I needed to keep going and achieve my goals. My sincere and special word of thanks to all of you.

    Daniel Aguilar


To my family and friends


Contents

Abstract
Resumo
Acknowledgments
Contents
List of Figures
List of Tables
Glossary

1 Introduction
1.1 Project Introduction
1.2 Project Motivation
1.3 Project Goals
1.4 Approach Methodology and Constraints
1.5 Report Structure

2 Data Collection Problem Analysis
2.1 Data Collection Overview
2.2 Data Collection in Semiconductor Industry
2.3 Data Collection at Qimonda

3 State of the Art
3.1 Technology Review
3.1.1 Programming Languages and Tools
3.1.2 Modeling Languages and Tools
3.1.3 RDBMS and Database Tools
3.1.4 Data Persistence
3.1.5 Communication Technologies
3.1.6 Markup Languages
3.2 Previous Work
3.2.1 Collecting Data from Files
3.2.2 Database Concurrency
3.2.3 Handling Messages and Communication
3.3 Summary

4 Framework Specification and Architecture
4.1 Framework Black-Box Overview
4.1.1 Collecting Data
4.1.2 Saving Data
4.2 Framework White-Box Overview
4.3 Framework Architecture
4.3.1 Folder Monitor Module
4.3.2 Backup Module
4.3.3 Equipment Modules
4.3.4 Message Handling
4.4 Framework Services
4.4.1 YODA Service
4.4.2 Database Service
4.4.3 Email Service
4.4.4 Logging Service
4.4.5 Framework Messages Service
4.4.6 Timer Service
4.5 Summary

5 Prototype Development
5.1 Prototype Goals
5.2 AOI — Automatic Optical Inspection — Equipment Overview
5.3 Collecting Data From AOI Equipments
5.4 AOI Integration Use Cases
5.4.1 Complementary Functions
5.5 AOI Use Cases Implementation
5.5.1 Process XML Files
5.5.2 Notify XML Generation Down
5.5.3 Backup AOI Log Files
5.5.4 Log Equipment Breakdown Reason
5.6 AOI Integration Architecture
5.6.1 Global Logical View
5.6.2 Bee Framework Logical View
5.7 AOI Integration Test Cases
5.7.1 Process XML Files
5.8 Summary

6 Findings and Discussion
6.1 Event-based Framework
6.1.1 Detecting Changes in Files
6.1.2 Notifications
6.2 Parsing XML Files
6.3 Database Access and Saving Data
6.4 Time Required for Integration
6.5 Summary

7 Conclusions
7.1 Project Applicability
7.2 Final Recommendations and Perspectives of Future Work
7.3 Final Conclusions

References
Index

A Bee Framework Configurations

B Services Configurations
B.1 YODA and Message Services
B.2 Database Service
B.3 Email Service
B.4 Logging Service
B.5 Timer Service

C AOI Integration Test Cases
C.1 Process XML Files
C.2 Notify XML File Generation Down
C.3 Backup AOI Log Files
C.4 Log Equipment Breakdown Reason

D AOI Database Schema
D.1 AOI Control Table
D.2 AOI Raw Data Tables
D.3 AOI Summary Tables
D.4 AOI Target Table

E AOI — SQL*Loader Usage
E.1 SQL*Loader Header Files
E.1.1 Board Inspection Header
E.1.2 Board Rework Header
E.1.3 Location Inspection Header
E.1.4 Location Rework Header

F AOI Configurations
F.1 Bee Framework Configuration File
F.2 Folder Monitor Module Configuration File
F.2.1 CopyFile Message
F.2.2 MoveFile Message
F.2.3 LoadWatchers Message
F.2.4 StartMonitoring Message
F.2.5 ListFilesDirectory Message
F.3 AOI Equipment Module Configurations
F.3.1 CommandLot Message
F.3.2 Created.AOI Watcher and Created.AOI Log Watcher Messages


List of Figures

1.1 Bee Framework logo

3.1 Visual Studio logo
3.2 Resharper logo
3.3 NUnit logo
3.4 UML logo
3.5 Oracle Corporation logo
3.6 Interdependencies of the Enterprise Library application blocks
3.7 Technology review

4.1 Black-box overview
4.2 White-box overview
4.3 Framework architecture overview
4.4 Singleton pattern
4.5 Framework modules hierarchy
4.6 Gang of Four — Factory Method pattern
4.7 Bee Framework — Factory Method pattern
4.8 Relationship between Observer pattern actors
4.9 Implemented architecture of the Observer pattern
4.10 FolderMonitor sequence diagram
4.11 Automatic updates of external assemblies
4.12 Equipment modules hierarchy
4.13 Start data collection flow
4.14 Strategy pattern
4.15 Implemented architecture of the Strategy pattern
4.16 Template Method pattern
4.17 Architecture that supports message handling
4.18 Flow of actions performed when instantiating modules
4.19 Messages hierarchy and parameters
4.20 Chain of Responsibility pattern
4.21 Sequence followed by a request in the chain
4.22 Implemented architecture of the Chain of Responsibility pattern
4.23 Example of broadcasting a message in the chain

5.1 AOI equipment
5.2 AOI integration overview
5.3 FolderMonitorModule use cases
5.4 AOIEquipmentModule use cases
5.5 Use case: Process XML files
5.6 Use case: Notify XML generation down
5.7 Use case: Backup AOI log files
5.8 Use case: Log equipment breakdown reason
5.9 AOI main flow
5.10 AOI integration entities
5.11 Bee Framework logical view

D.1 AOI database schema
D.2 AOI Control table
D.3 AOI raw data tables
D.4 AOI summary tables
D.5 AOI Target table

List of Tables

5.1 Process XML files — Test case description
5.2 Process XML files — Test case details

C.1 Process XML files — Test case description
C.2 Process XML files — Test case details
C.3 Notify XML file generation down — Test case description
C.4 Notify XML file generation down — Test case details
C.5 Backup AOI log files — Test case description
C.6 Backup AOI log files — Test case details
C.7 Log equipment breakdown reason — Test case description
C.8 Log equipment breakdown reason — Test case details

Glossary

AOI Automatic Optical Inspection

API Application Programming Interface

Application block Software component designed to be as agnostic as possible to the application architecture, so that it can be easily reused by different software applications.

Assembly line Manufacturing process in which parts are added to a product in a sequential manner, using optimized logistics and operations plans in order to create a finished product faster.

CFGmgr Configuration Manager

CLR Common Language Runtime — Virtual machine component of Microsoft's .NET initiative.

CPU Central Processing Unit

DAB Data Access Block — Application block provided by the Microsoft Enterprise Library related to database access architecture.

Daemon Computer program that runs as a background process.

Data collection Process of preparing, collecting and saving the data generated by some type of source.

Data mining Process of sorting through large amounts of data and picking out relevant information.

Data warehouse Electronic repository of an organization's stored data.

DDL Data Definition Language — Computer language for defining data structures.

Deadlock Situation that occurs when two or more competing actions are each waiting for the other to finish, and thus neither ever does.

Design pattern General reusable solution to a commonly occurring problem in software design.


DLL Dynamic Link Library

DMS Decision Making System — Computer-based information system, including knowledge-based systems, that supports decision-making activities. Also known as Decision Support System (DSS).

ECMA European Computer Manufacturers Association — International and private (membership-based) standards organization for information and communication systems.

EDA Equipment Data Acquisition — Collection of SEMI standards for the semiconductor industry to improve and facilitate communication between data collection software applications and factory equipment.

EDC Engineering Data Collection — Engineering processes and tools used in data collection.

Event-based programming Programming paradigm in which the flow of the program is determined by sensor outputs, user actions, or messages from other programs or threads.

Folder monitoring Process of observing the contents of a folder and detecting changes in its files or subfolders.

Framework Reusable design of a software system described by a set of abstract classes and by the way instances of these classes collaborate, allowing the reuse of both code and design architecture, which considerably reduces the development effort needed.

GNU Computer operating system composed entirely of free software.

GoF Gang of Four — The group of authors formed by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides.

GUI Graphical User Interface


HTML HyperText Markup Language — Markup language that provides a means to describe the structure of text-based information in a document.

IDE Integrated Development Environment — Software application that provides comprehensive facilities for software development to computer programmers.

Integration tests Tests in which individual software modules are combined and tested as a group.

Interface A Alternative name for EDA (Equipment Data Acquisition).

Inversion of Control Abstract principle describing an aspect of some software architecture designs in which the flow of a system is inverted in comparison to the traditional architecture of software libraries.

Locking (database) Mechanism used to prevent data from being corrupted or invalidated when multiple users need to access a database concurrently.

Lot (semiconductor) Set of semiconductor modules acting as a single unit.

LotEquipParamsHistSrv Lot Equipment Parameters History Server

Manufacturing equipment Semiconductor manufacturing equipment consists of the equipment used in a clean room for the fabrication of semiconductor chips, the test equipment used in manufacturing and in research and development environments to test semiconductor manufacturing equipment, and the fixtures in place to support a semiconductor fabrication facility.

Message center Global access point of a software application used to monitor the flow of messages inside the application and to guarantee the attending and dispatching of received messages.

Message handling Comprises the concepts of sending, receiving and correctly routing messages in a software application.


MSMQ Microsoft Message Queuing — Messaging protocol that allows applications running on disparate servers to communicate in a failsafe manner.

OCL Object Constraint Language — Declarative language for describing rules that apply to UML models.

OEDV Online Equipment Data Visualization — Tools used for online visualization of the data generated by equipment.

OMG Object Management Group — Consortium focused on modeling and model-based standards.

OOP Object-Oriented Programming — Programming paradigm based on the use and interaction of different software units known as "objects".

ORM Object-Relational Mapping — Programming technique for converting data between incompatible type systems in relational databases and object-oriented programming languages.

PCB Printed Circuit Board — Board used to mechanically support and electrically connect electronic components using conductive pathways etched from copper sheets laminated onto a non-conductive substrate.

PL/SQL Procedural Language / Structured Query Language — Procedural extension to the SQL database language.

Raw data Term used to refer to unprocessed data (also known as primary data).

RDBMS Relational Database Management System

Refactoring Rewriting of some pieces of software code to improve its performance or to increase its understandability, without changing its initial meaning and behavior.

Reflection (programming) Mechanism of discovering class information solely at runtime.


Rolling (log files) Combination of rotation and translation operations used when adding a new log entry. The oldest entries are translated (or deleted if necessary) in favor of new entries, keeping a log file always updated with the most recent entries.
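The rolling behavior described in this entry can be pictured with a small sketch. The Python fragment below is purely illustrative (the thesis's logging service itself is a .NET component) and uses a fixed-capacity in-memory log as a stand-in for a log file.

```python
from collections import deque

# A rolling log with room for three entries: adding a fourth entry
# pushes out the oldest one, so the log always holds the most recent entries.
log = deque(maxlen=3)
for entry in ["boot", "load", "run", "stop"]:
    log.append(entry)   # "boot" is discarded when "stop" arrives

print(list(log))        # prints ['load', 'run', 'stop']
```

A file-based implementation would apply the same idea, rewriting or rotating the file so that only the newest entries are kept.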

RS-232 Recommended Standard 232 — Standard for serial binary data signals commonly used in computer serial ports.

SAM Statistical Appearance Model — Technique that detaches users from the parameter adjustment of complex algorithms and that makes a more systematic use of training images during teaching steps.

SECS/GEM SEMI Equipment Communications Standard / Generic Equipment Module — Standard interface used in the semiconductor industry for equipment communications.

SEMI Semiconductor Equipment and Materials International — International organization whose main focus is the promotion of the semiconductor industry and of associated manufacturers of equipment and materials used in the fabrication of semiconductor devices.

Semiconductor industry Collection of business firms engaged in the design and fabrication of semiconductor devices.

SGML Standard Generalized Markup Language — ISO standard metalanguage used to define markup languages for documents.

SMT Surface Mounting Technology — Method for constructing electronic circuits in which the components are mounted directly onto the surface of printed circuit boards.

SMTP Simple Mail Transfer Protocol — Standard for email transmission across the Internet.

SQL Structured Query Language — Database computer language designed for the retrieval and management of data in an RDBMS.


SVN Subversion — Version control system used to maintain current and historical versions of files such as source code, web pages, and documentation.

TCP/IP Transmission Control Protocol / Internet Protocol — Set of communications protocols that implement the protocol stack on which the Internet and most commercial networks run. Also known as the Internet Protocol Suite.

TDD Test Driven Development — Software development technique consisting of short iterations where new test cases covering the desired improvement or new functionality are written first, then the production code necessary to pass the tests is implemented, and finally the software is refactored to accommodate the changes.

Test automation Use of software tools to control the execution of tests automatically.

TNS Transparent Network Substrate — Transparent layer that enables a heterogeneous network consisting of different protocols to function as a homogeneous network.

UML Unified Modeling Language — Object modeling and specification language used in software engineering.

Unit test Automated test that validates whether individual units of source code are working properly.

W3C World Wide Web Consortium — Main international standards organization for the World Wide Web.

Wrapper (software) Type of packaging that hides implementation code from end users and just provides the required interfaces that allow the execution of a wrapped functionality.

XML Extensible Markup Language — General-purpose specification for creating custom markup languages.


YODA Your Own Data Adapter — Middleware cross-platform application that allows different applications to exchange messages on the same network.


Chapter 1

    Introduction

This first chapter introduces the relevant themes needed for a full understanding of both the current document and the project context. It comprises five sections, covering the project summary, the project's main motivations and expected goals, the methodology used, and finally the report structure.

    1.1 Project Introduction

The project Manufacturing Equipment Data Collection Framework (Bee Framework) is essentially a framework specification for collecting the data generated by a wide variety of equipment in an assembly line.

Its subtitle, Bee Framework, is a pure analogy with the world of bees. Bees focus either on gathering nectar or on gathering pollen, depending on demand, from a wide variety of sources. Bees also play an important role in pollinating flowering plants, and are the major pollinator in ecosystems that contain flowering plants. It is estimated that one third of the human food supply depends on insect pollination, most of which is accomplished by bees [1]. Moreover, pollen and nectar can then be used to produce a large number of products, such as honey, candles and beeswax, among others, and their usage comprises areas like medicine, food, cosmetics, beebread, pollination and even pollution monitoring [2, 3].

    Similarly, the Bee Framework project is intended to provide the tools required to perform data collection from a wide range of manufacturing equipment. The collected data then needs to be prepared and finally fed into the data analysis solutions that support and help improve the manufacturing process. In summary,

    Figure 1.1: Bee Framework logo

    data may be collected from many sources and may as well have many target destinations, just like the pollen and nectar collected by bees, which can be used in many areas.

    Most of the generated data is available through equipment local databases or file systems. This means that the data needs to be collected by querying a database or by reading, parsing and interpreting files, respectively. Since there is a large number of different equipment types, integrating equipment and collecting the data it generates becomes an unpleasant and difficult task. The Bee Framework must then provide, above all, the common methods to collect the generated data and the crucial services generally used while integrating equipment, such as database access, folder monitoring, communication, or logging, among others.
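    As an illustration, a folder-monitoring service of the kind mentioned above could be sketched in C# 2.0 on top of the standard FileSystemWatcher class. The class and member names below are hypothetical, not the Bee Framework's actual API:

```csharp
using System.IO;

// Hypothetical sketch of a reusable folder-monitoring service; the Bee
// Framework's real API may differ.
public class FolderMonitorService
{
    private readonly FileSystemWatcher watcher;

    // Raised whenever a new file appears in the monitored folder.
    public event FileSystemEventHandler FileCreated;

    public FolderMonitorService(string path, string filter)
    {
        watcher = new FileSystemWatcher(path, filter);
        watcher.Created += delegate(object sender, FileSystemEventArgs e)
        {
            if (FileCreated != null)
                FileCreated(sender, e);
        };
    }

    public bool IsMonitoring
    {
        get { return watcher.EnableRaisingEvents; }
    }

    public void Start() { watcher.EnableRaisingEvents = true; }
    public void Stop()  { watcher.EnableRaisingEvents = false; }
}
```

    An equipment-specific integration would subscribe to FileCreated and plug in its own file parser, while the monitoring logic itself stays unchanged across integrations.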

    The impact of such a framework on the equipment integration process would be considerable, mostly because common requirements are general enough to be easily reused, decreasing the amount of effort and time needed to integrate equipment and collect the data it generates.

    1.2 Project Motivation

    With the continuous development and improvement of existing and new data analysis and data mining software solutions, there is an increasing need to collect data. However, these software tools only provide valid and reliable results if large amounts of data are considered; otherwise, the correctness of the achieved results can be questionable and controversial. Given this need, it becomes obvious that data collection plays a critical role in the early phase of the process.

    Nowadays, a growing variety of problem domains require this kind of work and decision tool in order to follow up on continuous process optimization and control. To meet these demands from data analysis solutions, effective and powerful data collection tools are also needed.


    A data collection framework encompassing the common functionalities generally used to collect data from a wide scope of sources and to save the collected data into numerous possible targets can therefore be particularly useful. A framework meeting these characteristics could be used in a large set of problem domains with a small amount of effort and in a very short time.

    Moreover, since the problem domain considered is data collection from manufacturing equipment, there are still several interface constraints and limitations in following international standards regarding the way equipment generates data and how this data can be collected. Such restrictions generally imply the development of new, specific data collection solutions, which does not promote reusability and decreases developers' productivity by keeping them repeating similar approaches.

    1.3 Project Goals

    The main goal of this project is to produce a detailed system specification of a framework to be used for data collection from manufacturing equipment in an assembly line. Additionally, the core requirements of the framework should also be implemented and an equipment integration should be done as a proof of concept. The proposed solution must take into consideration some of the existing data collection solutions and a wide range of possible equipment, so that it can achieve a high level of abstraction. This level of abstraction will then make it possible to collect data from different equipment types without much integration effort.

    The data collection framework must consider the existence of common data collection functionalities and different approaches for integrating different equipment types. These common functionalities are considered the central part of the framework. They provide the necessary tools to promote code and component reusability while integrating equipment and must be general enough that no changes need to be made when their usage is required.

    Finally, a prototype should be built as a proof of concept of the proposed solution. This prototype should consider the core functionalities of data collection for manufacturing equipment. Since the data collection problem is closely related to the equipment integration topic, the prototype should consider the integration of an equipment type used in the Qimonda1 assembly line in terms of collecting and saving the data it generates.

    1 http://www.qimonda.com


    1.4 Approach Methodology and Constraints

    This section first describes the approach methodology used during the project. Afterwards, some constraints that affected the normal progress of the project or that imposed restrictions on project decisions are also referred to.

    The collection and analysis of equipment data collection requirements was the first task. Usual data collection flows and the main requirements related to data collection were analyzed, and some previous equipment integration solutions regarding data collection were studied, so that the common functionalities could be identified.

    At the same time, research on state-of-the-art technologies that could possibly be used in the prototype development was also started, so that decisions about the choice of appropriate technologies and applications could be made.

    After collecting the major requirements for a data collection framework and after studying both the previous work on this theme and the technologies used, the design of the solution started almost in parallel with the development of the framework core functionalities.

    When the development of these core functionalities was successfully completed, work on the prototype started: integrating an equipment type and collecting the data it generates.

    Throughout the prototype development, a Test Driven Development (TDD) methodology was followed whenever possible, and the SVN (Subversion) version control system was adopted. In addition, some notes about architecture design and a high-level design document about data collection for the AOI — Automatic Optical Inspection — equipment have been written. This equipment type and its data collection process will be detailed in Chapter 5.

    The constraints that affected the normal progress of the project are mostly related to technological characteristics and to decisions about the applications and technologies used in the project. For example, both the C# programming language and UML modeling can be considered project norms.

    These constraints are essentially related to software applications that could possibly be used and the lack of the required licenses for using them. This situation has led to the usage of other software alternatives or, in most cases, of older releases of the desired software. These technological constraints will be detailed in the Technology Review section (Section 3.1) of the State of the Art chapter (Chapter 3).


    1.5 Report Structure

    This report has been written and structured to help readers understand the document and the project through a top-down analysis. This writing approach starts by giving readers a high-level overview of the project contents so that they can immediately understand the global concepts of the project and what it is intended to do. After this high-level overview, the document gradually discloses all the details concerning the problem analysis and design, as well as the current implementation of the proposed proof-of-concept solution.

    Following this approach, this initial chapter gives readers an overview of the project context, its main goals and motivation, and the approach methodology that led to the project's conclusion.

    Chapter 2 will present a more detailed analysis of the data collection problem. It will also refer to the main motivations that led to the origin of the problem and the expected results that may have a considerable impact on improving data collection. The global problem of data collection will then be divided into smaller problems, detailing each one and presenting a review of the previous approaches.

    Chapter 3 will focus on the State of the Art. The first section of this chapter will present a technology review, focusing on the technologies and applications considered. An individual description of each of these technologies and applications will be given, as well as a comparative analysis relating them and the main reasons that led to their adoption or abandonment. The following section of this chapter will cover the previous work done regarding data collection from manufacturing equipment used in the assembly line.

    Chapter 4 will refer to the proposed Framework Architecture and Specification. Initially, this chapter will focus on the main requirements identified for the presented solution, especially regarding the data collection problem related to manufacturing equipment. The following section of this chapter will then detail the proposed architecture for fulfilling the identified requirements and explain the usage of design patterns to solve some of those requirements and needs.

    Chapter 5 will describe the project development and some of the technical decisions taken at the implementation level. This chapter will mainly focus on the characteristics of each requirement previously specified in the fourth chapter. Moreover, the chapter will explain the overall framework development, the required configuration settings and, finally, how the application works. Since the project is about data collection, specifically data collection from manufacturing equipment, this chapter will also cover the development of the proof of concept regarding the integration of one equipment type, as an example, and the consequent data collection.


    In Chapter 6, the main findings will be discussed. Additionally, a comparison between the proposed solution and the previous approaches will be made. This comparison will mostly cover the integration of the same type of equipment using both approaches, in terms of the effort and time needed, as well as a performance evaluation.

    Finally, the main conclusions reached after the project's completion will be presented in Chapter 7. Furthermore, the conclusions will also include some final recommendations and perspectives on future work to improve and expand the proposed solution.


  • Chapter 2

    Data Collection Problem Analysis

    This chapter presents a detailed description of the data collection problem, relating it to manufacturing equipment. The needs that led to the origin of the problem and the expected results that may have a considerable impact on tasks related to data collection will also be considered. The chapter starts by giving an overview of the data collection problem in general, then refers to the data collection problem in the semiconductor industry and finally details the problem taking into consideration the specific case of data collection at Qimonda.

    2.1 Data Collection Overview

    People live in an information society in which the creation, distribution, diffusion, use, and manipulation of information is significant for almost every activity. Now more than ever, governments, industry and society need reliable information to make better decisions in tackling their problems [4]. Consequently, the importance of data collection mechanisms has been, and still is, growing considerably, so that the collected data can be used to provide the adequate information required by society.

    A data collection system collects data from the outside world; its main goal is to feed other systems with this data, such as decision making systems (DMS), usually for the purpose of controlling a system [5]. The concept behind data collection means collecting data from a wide range of possible data sources and turning it into useful information that can be further used. This information can then be critical not only to control a process or a system, but also to provide a better understanding of the domain considered in the data collection process [6, 7].
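    The collect-and-feed flow described above can be sketched as a minimal pair of interfaces; the names are purely illustrative, not taken from the literature:

```csharp
using System.Collections.Generic;

// Illustrative sketch: a data collection system pulls records from a source
// in the outside world and feeds them to a consumer such as a DMS.
public interface IDataSource
{
    // Collect raw records from a file, database, equipment port, etc.
    IList<string> Collect();
}

public interface IDataTarget
{
    // Deliver the prepared records to the consuming system.
    void Feed(IList<string> records);
}

public static class CollectionRun
{
    public static void Execute(IDataSource source, IDataTarget target)
    {
        target.Feed(source.Collect());
    }
}
```

    The point of the split is that sources and targets can be swapped independently: the same controlling system can consume data regardless of where it was collected from.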


    2.2 Data Collection in Semiconductor Industry

    The existing competitiveness in the semiconductor industry and the demands for high-quality, very reliable memory products lead companies in this sector to adopt manufacturing processes based on a set of international standards, rules and conventions, in order to guarantee final products that meet customer quality demands. The adoption of common standards is a sign of a mature industry. Worldwide semiconductor companies, though fierce competitors in the marketplace, have shown remarkable willingness to cooperate in creating and adopting common standards for factory automation [8].

    Manufacturing processes and equipment used in the semiconductor industry have been continually improved. Along with the technical advances of the semiconductor manufacturing processes themselves, factory productivity and efficient manufacturing control are key to a fab's success. In fact, that success increasingly relies on the collection and analysis of growing amounts of detailed process, measurement and operational data from the equipment to improve yield, efficiency, productivity and more. As processes become more complex, it becomes more important to use the data to reduce process variation, minimize the impact of excursions, and improve overall equipment effectiveness [9].

    During all the phases of the manufacturing process, products are exhaustively tested to ensure a high quality level. The equipment used both when the memory products are being mounted and when they are being tested generates large amounts of data. This data is related to measurements, physical or electrical failures detected, temperature or humidity values, tensions and voltages, or optical inspections, for example.

    The data generated by these different equipment types is extremely important and plays a critical role in the improvement of manufacturing processes. Collection of real data is a vital step in managing a modern manufacturing organization. This data can be very useful since it provides process engineers with the values that can help them better understand all the manufacturing process stages and help detect which steps can be improved and how these improvements should be achieved. Furthermore, information is available immediately, so problems can be identified and corrected when they occur, not when they are noticed after a full day of incorrect production [10].

    To address the issues of the semiconductor industry, an organization was founded in the 1970s: Semiconductor Equipment and Materials International — SEMI for short. Its initial main focus was to promote the semiconductor industry and to associate manufacturers of equipment and materials used in the fabrication of semiconductor


    devices such as integrated circuits, transistors, diodes, and thyristors [11]. Among

    other activities, SEMI acts as a “clearinghouse” for the generation of standards

    specific to the industry and the generation of long-range plans for the industry.

    The best-known standard developed by this organization is SECS/GEM (SEMI Equipment Communications Standard / Generic Equipment Model). The SECS/GEM interface is the semiconductor industry's standard for equipment-to-host communications. In an automated fab, the interface can start and stop equipment processing, collect measurement data, change variables and select recipes for products. The SECS/GEM standards do all this in a defined way, defining a common set of equipment behavior and capabilities [12].

    With the purpose of continuously improving performance and productivity for semiconductor fabs and the equipment used, a new standard named Equipment Data Acquisition (EDA) Interface, also known as Interface A, is available and ready to be deployed in manufacturing organizations. Whereas the SECS/GEM standards were created to improve tool control and to facilitate and support high levels of factory automation, the EDA standards focus on improving process monitoring and control, given the advancing technology and increasing complexity of semiconductor manufacturing processes [13].

    Although Interface A offers improved data ports over SECS/GEM, it does not replace the SECS/GEM standards, which pertain to equipment control and configuration. It is also distinct from Interface B, which facilitates data sharing between applications, and Interface C, which provides remote access to equipment data. Industry adoption of Interface A has been gaining momentum, but more needs to be done to fully implement the standard across the industry [14].

    2.3 Data Collection at Qimonda

    However, even considering the adoption of international standards by the industry, some of the equipment used in Qimonda assembly lines is generic and not exclusive to the semiconductor industry. Consequently, some of this equipment does not follow the SECS/GEM standards defined by the semiconductor industry, making equipment integration tasks harder to perform and introducing many difficulties into the data collection process.

    The integration of such equipment leads to the development of specific data collection solutions for each equipment type considered. Since all this equipment follows different conventions and rules, not only in the way it is used but also in how it generates data and how this data can be collected, it becomes very hard to promote consistency.


    This situation led the Qimonda equipment integration team to develop multiple, different approaches to data collection from these manufacturing equipment types. These approaches are designed and planned almost exclusively around a specific type of equipment, resulting in a single, unique architecture design each time. Consequently, the amount of time and effort required from equipment integration team members increases, not only because they have to design and implement a new architecture solution each time a new equipment type needs to be integrated, but also because they have to document it. Obviously, the productivity of the team members is then negatively affected.

    The existing lack of consistency among different integration architecture approaches not only led developers to focus on specific equipment instead of a global architecture, but also led them to adapt software components used in previously achieved integration approaches. This reuse can be considered a positive aspect, but it also has some negative consequences and drawbacks. Since integration solutions are developed focusing on a single, specific equipment type, similar components cannot usually be reused directly without being changed and adapted to the new equipment requirements.

    Consequently, these components have to be continuously adapted and developers recurrently face the same problems. Additionally, the reuse of such components also introduces some constraints related to technological issues. Old versions of these software components are commonly used, and this fact limits some choices related to the technologies and implementation approaches used, due to compatibility requirements. This way, even when considering the development of new integration solutions, these solutions are limited from their beginning by out-of-date technologies, which can considerably affect both the performance of solutions and the maintenance effort needed.

    A project to evaluate Qimonda EDA has looked at the problem of factory-wide

    deployment of Interface A from a number of perspectives and has tried to incorporate the goals of the EDC — Engineering Data Collection — refactoring into a

    comprehensive vision [15]. It is difficult to conclude with certainty how rapid and

    pervasive the adoption of the nascent EDA standards will be.

    Moreover, even considering this new standard, the data collection problem is still not solved, because there is still equipment that does not follow the standards of the semiconductor industry and for which a data collection approach is needed.


  • Chapter 3

    State of the Art

    This chapter introduces both the technologies used in the project and the previous work done to help solve the data collection problem described in the second chapter. Both introductions focus on the semiconductor industry, since some of the technologies used to create the framework are related to this sector.

    Additionally, some alternative technologies that could be used for data collection will also be considered in this chapter. However, once again, some decisions about technological choices have been made with the semiconductor industry in mind, so any decision about replacing one technology with another should always take this factor into consideration.

    3.1 Technology Review

    This section presents the main applications and technologies studied during the initial phase of the project and later used in its development. Additionally, some alternative technologies and applications that were considered but not used in the project will also be described. For these technologies, the main focus is the pros and cons of using them for the data collection problem related to manufacturing equipment, and the main reasons that led to their abandonment.

    3.1.1 Programming Languages and Tools

    3.1.1.1 C# — C Sharp

    C Sharp is a widely adopted object-oriented programming language developed by Microsoft as part of the .NET initiative and later approved as a standard by ECMA [16]. C# 3.0 is the current version of the language and was released on 19 November 2007 as part of .NET Framework 3.5, but due to some technical and


    licensing aspects, version 2.0 of the language and .NET Framework 2.0 have been used in this project instead. This programming language has a procedural, object-oriented syntax initially based on the C family of languages, but also includes very strong influences from several other programming languages (C++, Python and most notably Java), with a particular emphasis on code simplification.

    C# is intended to be a simple, modern, type-safe, general-purpose programming language which allows the development of robust and durable applications. The language includes strong type checking, array bounds checking, detection of attempts to use uninitialized variables, source code portability, exception handling and automatic garbage collection. This way, the language not only promotes software robustness and durability, but also helps programmers increase their productivity.

    C# is an object-oriented language, but it further includes support for component-oriented programming. Currently, software design relies more and more on software components [17]. Key to such components is that they present a programming model with properties, methods, and events; they also have attributes that provide declarative information about the component and incorporate their own documentation. One of the biggest advantages of C# is that the language directly supports all these concepts, making it a very natural language for creating and using software components.
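    For illustration, the hypothetical component below shows these concepts side by side: a property, a method, an event and a declarative attribute. None of this is framework code; it is only a sketch of the component model:

```csharp
using System;
using System.ComponentModel;

// Hypothetical component illustrating C#'s first-class support for
// properties, methods, events and declarative attributes.
public class TemperatureProbe
{
    private double lastReading;

    [Description("Most recent temperature read from the equipment, in Celsius.")]
    public double LastReading
    {
        get { return lastReading; }
    }

    // Raised after every successful reading.
    public event EventHandler ReadingTaken;

    public void Read(double value)
    {
        lastReading = value;
        if (ReadingTaken != null)
            ReadingTaken(this, EventArgs.Empty);
    }
}
```

    A design tool or host application can read the [Description] attribute through reflection, while client code consumes the component through its property and event without knowing its internals.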

    C# can be considered a very high-level programming language when compared to languages such as C or Assembly. Although C# applications are intended to be economical in terms of memory and processing power requirements, the language cannot compete on performance with those low-level languages. C# applications, like all programs written for the .NET Framework, tend to require more system resources than functionally similar applications that access machine resources more directly.

    Microsoft Visual Studio1 has been the development environment chosen to develop the framework and the proof of concept. However, two different versions of this Microsoft product have been considered: Microsoft Visual Studio 2005 and Microsoft Visual Studio 2008.

    Figure 3.1: Visual Studio logo

    1 http://www.microsoft.com


    Microsoft Visual Studio 2008

    Visual Studio 2008 is the most recent version of this IDE — Integrated Development Environment — from Microsoft, released in November 2007. It was the version initially chosen to develop the C# components of the framework. However, due to licensing difficulties, this choice was abandoned, which also determined the choice of the .NET Framework and C# language versions [18].

    Microsoft Visual Studio 2005

    Visual Studio 2005 is the predecessor of the Microsoft IDE referred to in the previous section, and it was the one adopted as the development environment [19]. This product supports the features introduced by version 2.0 of the .NET Framework.

    Resharper

    Furthermore, a trial version of Resharper 2.0 has been used. Resharper, a product from JetBrains2, is a refactoring add-in for Visual Studio which helps programmers increase their productivity while developing. Resharper provides code completion, easy refactoring, code analysis and assistance, code formatting, code generation and templates, and easier code navigation [20].

    This tool proved very useful, allowing easier and quicker development of the framework components.

    Figure 3.2: Resharper logo

    NUnit

    NUnit3 is an open source unit-testing framework for all .NET languages. Initially ported from JUnit (used for the same purpose in Java), it is written entirely in C# and has been completely redesigned to take advantage of many .NET language features, for example custom attributes and other reflection-related capabilities [21]. This testing framework discovers test methods using reflection and provides test automation to control the execution of unit tests, the comparison of actual outcomes to predicted outcomes, the setting up of test preconditions, and other test control and reporting functions [22].

    2 http://www.jetbrains.com
    3 http://www.nunit.org


    Since a Test Driven Development [23] agile methodology was adopted, this unit-testing framework was not only very useful but also fundamental in the development process.
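    As an illustration of what such tests look like, the sketch below shows a typical NUnit 2.x fixture; the LotParser class under test is purely hypothetical and not part of the framework:

```csharp
using NUnit.Framework;

// Hypothetical class under test: extracts a lot identifier from an
// equipment result file name such as "LOT1234_results.txt".
public static class LotParser
{
    public static string ExtractLotId(string fileName)
    {
        return fileName.Split('_')[0];
    }
}

[TestFixture]
public class LotParserTests
{
    [SetUp]
    public void SetUp()
    {
        // Runs before each test: establish preconditions here.
    }

    [Test]
    public void ParsesLotIdFromFileName()
    {
        // NUnit discovers this method through the [Test] attribute
        // using reflection, then compares actual and expected outcomes.
        Assert.AreEqual("LOT1234", LotParser.ExtractLotId("LOT1234_results.txt"));
    }
}
```

    Under TDD, a fixture like this would be written before the parsing code itself, driving the design of the component being integrated.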

    Figure 3.3: NUnit logo

    3.1.2 Modeling Languages and Tools

    3.1.2.1 UML — Unified Modeling Language

    UML4 is a standardized visual specification language for object modeling used in the software engineering field. It is a general-purpose modeling language that includes a graphical notation used to create an abstract model of a system, referred to as a UML model. UML is the OMG's — Object Management Group — most used specification, and the way the world models not only application structure, behavior, and architecture, but also business processes and data structures [24].

    Figure 3.4: UML logo

    Modeling is the design of software applications before coding. The resulting models are very helpful since they let us work at a higher level of abstraction, supporting the specification, visualization and documentation of software systems, including their structure and design [25].

    Its origin dates from 1994, when the large abundance of modeling languages was slowing down the adoption of object technology. A unified method was needed, and a consortium of several organizations, named UML Partners, was established in 1996 with the purpose of producing a specification for a unified modeling language. As a result of this collaboration, a strong UML 1.0 definition has been

    14

    http://www.uml.org

  • State of the Art

    achieved. This modeling language was already well defined, expressive, powerful, and generally applicable. It was submitted to the OMG in January 1997 as an initial Request for Proposal response [26].

    UML has matured significantly since its early versions. Several minor revisions fixed shortcomings and bugs, and the UML 2.0 major revision was adopted by the OMG in 2003. There are actually four parts to the UML 2.x specification: the Superstructure (defines the notation and semantics for diagrams and their model elements), the Infrastructure (defines the core metamodel on which the Superstructure is based), the OCL — Object Constraint Language — (defines rules for model elements) and finally the UML Diagram Interchange (defines how UML 2 diagram layouts are exchanged). The current versions of these standards are: UML Superstructure version 2.1.2, UML Infrastructure version 2.1.2, OCL version 2.0, and UML Diagram Interchange version 1.0.

    Visio

    Visio5 is diagramming software originally developed by Visio Corporation, a company bought by Microsoft in 2000. It uses vector graphics to create diagrams and follows the recent standards of the UML modeling language.

    Visio provides a wide range of templates - business process flowcharts, network

    diagrams, workflow diagrams, database models, and software diagrams - that can

    be used to visualize and streamline business processes, track projects and resources,

    chart organizations, map networks, diagram building sites, and optimize systems

    [27].

    Visio 2007 is the most recent version of this software, and it has been used to model all UML diagrams concerning the framework specification and architecture.

    3.1.3 RDBMS and Database Tools

    3.1.3.1 Oracle

Oracle Database6 is a relational database management system (RDBMS), commonly referred to simply as Oracle, which has become a major presence in database computing. Oracle Corporation, the company that produces and markets this database software, was founded in 1977, and since then many widespread computing platforms have come to use the Oracle database extensively, making the company the market leader [28].

The latest version of Oracle Database, 11g, has been recently released. Once again, due to licensing and technical matters,

5 http://office.microsoft.com
6 http://www.oracle.com




    Figure 3.5: Oracle Corporation logo

the version considered while developing this proof of concept is 9i. Moreover, data collection generally involves large volumes of data, which increases the risks of migrating existing databases. However, the database operations described in this report and used in the proof of concept should behave as expected in recent releases of this software, even considering the grid-computing technology introduced with the 10g and later versions.

Unlike the C# programming language referred to before, a powerful database development environment was not needed. Some database applications, such as SQL Navigator or PL/SQL Developer, have been considered, but SQL Developer, which already comes with the Oracle Database Client, met the necessary requirements.

    SQL Developer

    Oracle SQL Developer7 is a free graphical tool for database development. SQL

    Developer can be used to browse, create and modify database objects, run SQL

    statements and SQL scripts, and edit, run and debug PL/SQL statements. It can

    also run any number of provided reports, as well as create and save new types of

    reports. Additionally, SQL Developer allows to export / import data and DDL and

    supports version control. SQL Developer is a tool that enhances productivity and

    simplifies database development tasks [29].

    SQL*Loader

    SQL*Loader8 is a bulk loader utility used for moving data from external files into

    the Oracle database. It comes with some configurable loading options and supports

various load formats, selective loading, and multi-table loads [30]. Its usage is particularly recommended for loading large volumes of data into Oracle Database because it consumes fewer resources, especially time, memory, and CPU.
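As an illustration, a minimal SQL*Loader control file might look as follows; the file, table, and column names here are hypothetical and are not taken from the framework:

```
-- load.ctl: hypothetical control file for loading semicolon-separated
-- equipment measurements into an EQUIPMENT_DATA table.
LOAD DATA
INFILE 'measurements.dat'
APPEND
INTO TABLE equipment_data
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
(equipment_id,
 lot_id,
 measured_at DATE "YYYY-MM-DD HH24:MI:SS",
 measured_value)
```

The load would then be started with something like `sqlldr control=load.ctl`, with the database credentials supplied separately.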

    3.1.3.2 SQL Server, Access and PostgreSQL

    Microsoft SQL Server9, Microsoft Access10 and PostgreSQL11 have been considered

    as possible alternative database management systems. However, since data collec-

7 http://www.oracle.com/technology/products/database/sql_developer/
8 http://www.orafaq.com/wiki/SQL*Loader_FAQ
9 http://www.microsoft.com/SQL

10 http://office.microsoft.com/access
11 http://www.postgresql.org




    tion can be very resource consuming, usually involving large volumes of data and

    strongly related to mechanisms of data warehousing and data mining, Oracle has

    been the natural database choice. The main reason for this choice is the well known

    robustness and efficiency of Oracle databases in such situations. Nevertheless, the

    data collection framework presented not only considers Oracle but also these al-

    ternative databases. This topic will be detailed and explained in Chapter 5.

    3.1.4 Data Persistence

    3.1.4.1 Enterprise Library

    Microsoft Enterprise Library12 or simply Enterprise Library is a collection of reusable

    software components for the Microsoft .NET Framework. These components are

    application blocks designed to assist software developers with common enterprise

development challenges and problems that commonly recur from one project to the next [31].

    Application blocks are designed to encapsulate the Microsoft recommended best

    practices for .NET applications. In addition, they can be added to .NET applica-

    tions quickly and easily [32]. Application blocks are a type of guidance, provided

    not only as source code but also as documentation that can be directly used, ex-

    tended, or modified by developers to use on complex, enterprise-level line-of-business

    development projects.

    This guidance is based on real-world experience and goes far beyond typical

    white-papers and sample applications. They provide proven architectures, produc-

    tion quality code, and recommended engineering best practices. The technical guid-

    ance is created, reviewed, and approved by a wide diversity of experienced people

    including Microsoft architects, partners and customers, engineering teams, consul-

    tants, and product support engineers. The result is a thoroughly engineered and

    tested set of recommendations that can be followed with confidence when building

    applications based on this guidance [33].

    Enterprise Library and its applications blocks provide an API to facilitate best

    practices in core areas of programming such as logging, validation, data access,

    exception handling, and many others. However, application blocks are designed to

    be as “agnostic” as possible to the application architecture, so that they can be

    easily reused in different contexts.

Figure 3.6 shows the application blocks available in the Enterprise Library 3.1 release and illustrates their interdependencies [32]. Both the Data Access and Logging

    Application Blocks have been used in the Bee Framework. Their usage will be

12 http://msdn.microsoft.com/entlib




    Figure 3.6: Interdependencies of the Enterprise Library application blocks

    explained later in the Framework Specification and Architecture chapter (Chapter

4) because they have a significant impact on the proposed architecture.

    Amongst the main benefits of using Enterprise Library we can identify the pro-

    ductivity and testability enhancements: each of the application blocks provides

    several interfaces meant to satisfy common concerns and a level of isolation that

    allows individual testing of each block. Additionally, extensibility (developers can

customize the application blocks and extend their functionality to suit their own needs),

    consistency (design patterns are applied in similar fashion in all the blocks), ease of

    use and integration (application blocks can be used as pluggable components) are

    also strong advantages of using Enterprise Library [34].

Enterprise Library 4.0 is the most recent version and was released in May 2008. This release includes bug corrections, new application blocks, some performance improvements, and already supports Microsoft



    Visual Studio 2008 and the .NET Framework 3.5 [35]. However, since this last

    version has been released after the beginning of the project, the previous release

    of Enterprise Library, version 3.1 - May 2007, has been used instead. Moreover,

    as described before in the C# Technology Review section, Visual Studio 2005 and

    .NET Framework 2.0 have been used, so using the last release of Enterprise Library

    would be impossible due to software requirements.

    3.1.4.2 NHibernate

    NHibernate13 is a port of the famous Hibernate Core for Java to the .NET Frame-

    work [36]. It is an Object-Relational Mapping (ORM) solution that provides an

easy-to-use framework for mapping an object-oriented domain model to a traditional relational database, handling the persistence of plain .NET objects to and from the underlying database.

With this support for transparent persistence, object classes do not have to follow a restrictive programming model. These persistent classes do not need to implement any interface or inherit from a special base class. Just by giving an XML

    description of the entities and relationships, NHibernate automatically generates

    the necessary SQL for loading and storing the objects. This characteristic makes

    it possible to design the business logic using plain .NET (CLR — Common Lan-

    guage Runtime) objects and object-oriented idiom. This object-oriented approach

    relieves the developer from a significant amount of relational data persistence-related

    programming tasks.
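For illustration, such a mapping description for NHibernate 1.2 might look as follows; the Measurement class and MEASUREMENT table below are hypothetical examples, not entities of the framework:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- Hypothetical mapping: a Measurement class persisted to a MEASUREMENT table. -->
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
  <class name="Demo.Measurement, Demo" table="MEASUREMENT">
    <id name="Id" column="ID">
      <generator class="native" />
    </id>
    <property name="EquipmentId" column="EQUIPMENT_ID" />
    <property name="MeasuredValue" column="MEASURED_VALUE" />
  </class>
</hibernate-mapping>
```

From a description of this kind alone, NHibernate generates the SQL needed to load and store the corresponding objects.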

NHibernate is free, open-source software distributed under the GNU Lesser General Public License, and its most recent version is 1.2.1. NHibernate 2.0 is currently under development [37].

However, despite these apparent advantages and interesting features, NHibernate has been abandoned in favor of Microsoft Enterprise Library, described in the previous section. The reason for this choice is inspired by expressions such as “no solution is the final solution” and “all have pros and cons”: the decision depends on the data access layer architecture and on how this layer is implemented. Each case must be individually considered in order to determine whether one technology or application is better or worse than another.

    The main reason for this decision is the type of application considered: a data

    collection framework. When referring to a data collection framework, it is expected

    that all database accesses can be abstract, not depending on which type of database

is used, which tables exist, which columns each table contains, and so on. Such a

13 http://www.hibernate.org




    framework is expected to have a high level of abstraction, providing a wrapper that

can execute any kind of query, stored procedure, or transaction on different databases. Of course, each query, stored procedure, or transaction must be defined when using the framework services, but the way of using database services remains exactly the same, whatever the database is.

By choosing NHibernate, it would not be possible to achieve the required universality, because it would be necessary to generate the mapping XML files, then generate the SQL used for database access, and finally generate all object-oriented class

    files. Even considering the usage of automatic code generation tools for NHibernate,

such as MyGeneration14 or CodeSmith15, some adjustments in the XML files and object classes must usually be done, especially when dealing with complex entity relationships. This situation increases the complexity and difficulty of maintenance tasks

    because it is easy to introduce an error and very hard to detect its origin.

    Unlike NHibernate, the Data Access Block (DAB) from Enterprise Library makes

    calling stored procedures very easy and uniform. The DAB manages the state of

    existing database connections and also provides the required uniform way for data

    access operations, making all the code look and behave similarly [38]. Moreover,

    Enterprise Library supports multiple database types (Oracle, SQL Server, DB2, or

Access, for example): due to its abstraction level regarding database types and due to the usage of the abstract factory design pattern, there is no need to change a single line of code if the database type changes.

    Additionally, Enterprise Library supports applications using multiple databases

    and provides a simple way of choosing and alternating between all the configured

    databases and connection strings.
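By way of analogy only, this configuration-driven, factory-based approach can be sketched in Python; the class and provider names below are illustrative and are not the Enterprise Library API:

```python
# Sketch of a provider-agnostic data access factory, analogous in spirit
# to the Enterprise Library Data Access Block; all names are illustrative.

class Database:
    """Common interface: calling code never sees the concrete provider."""
    def execute_scalar(self, query):
        raise NotImplementedError

class OracleDatabase(Database):
    def execute_scalar(self, query):
        return "oracle:" + query      # placeholder for a real Oracle call

class SqlServerDatabase(Database):
    def execute_scalar(self, query):
        return "sqlserver:" + query   # placeholder for a real SQL Server call

# In practice this mapping would be read from a configuration file.
PROVIDERS = {"Oracle": OracleDatabase, "SqlServer": SqlServerDatabase}

def create_database(config):
    """Pick the provider named in configuration; calling code is
    unchanged when the configured database type changes."""
    return PROVIDERS[config["provider"]]()

db = create_database({"provider": "Oracle"})
print(db.execute_scalar("SELECT 1 FROM dual"))  # oracle:SELECT 1 FROM dual
```

Switching the configured provider from "Oracle" to "SqlServer" changes which concrete class is instantiated without touching any calling code, which is the property the text attributes to the Data Access Block.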

    3.1.5 Communication Technologies

    3.1.5.1 TIBCO RendezVous

TIBCO Rendezvous16 is a software product from TIBCO Software that allows message interchange between different applications. It is a very efficient, robust, reliable, and scalable product and is the leading low-latency messaging product for real-time, high-throughput data distribution applications. It is a widely deployed, supported, and proven low-latency messaging solution on the market today [39].

    TIBCO Rendezvous can be integrated with external components and provides

different Application Programming Interfaces (APIs) to support the development of

    applications in different programming languages.

14 http://www.mygenerationsoftware.com
15 http://www.codesmithtools.com
16 http://www.tibco.com




    The basic message passing is conceptually simple. A message has a single subject

    composed of elements separated by periods and has some message parameters, each

    one following the name-value-type paradigm [40]. The message is then sent to a

    single Rendezvous Daemon and a listener announces its subjects of interest to a

Daemon (with a wildcard facility). Messages with matching subjects are then delivered to that listener [41].

    The main components for an application using TIBCO Rendezvous are the fol-

    lowing:

    • the messages and their content parameters;

    • the events related to the subscription, sending and receiving of messages;

• finally, the transport and the logical connection between different applications, which includes the connection settings.
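Rendezvous subjects support wildcards such as “*” (matching a single subject element) and “>” (matching one or more trailing elements). The matching semantics can be sketched in Python; this is an illustration of the semantics only, not the TIBCO implementation:

```python
def subject_matches(pattern, subject):
    """Rendezvous-style subject matching sketch: '*' matches exactly one
    element, '>' matches one or more trailing elements."""
    p, s = pattern.split("."), subject.split(".")
    for i, elem in enumerate(p):
        if elem == ">":
            return len(s) > i          # '>' needs at least one more element
        if i >= len(s):
            return False               # subject ran out of elements
        if elem != "*" and elem != s[i]:
            return False               # literal element mismatch
    return len(p) == len(s)            # no wildcard: lengths must agree

# Hypothetical equipment subjects, for illustration only.
assert subject_matches("EQUIP.*.DATA", "EQUIP.E01.DATA")
assert subject_matches("EQUIP.>", "EQUIP.E01.DATA")
assert not subject_matches("EQUIP.*.DATA", "EQUIP.E01.STATUS")
```

A listener announcing the subject "EQUIP.>" would thus receive every message whose subject starts with the "EQUIP" element.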

    3.1.5.2 YODA

    YODA is a middleware software solution developed by Infineon Technologies17.

    YODA stands for Your Own Data Adapter and is a set of components and libraries

which are application, platform, and technology independent. These components and libraries follow a set of well-defined rules and conventions [42].

YODA's main goal is to achieve an efficient and complete integration between distributed applications, using a reliable and quick method to exchange messages between them. YODA is essentially an internal network-based protocol that allows different applications running on different platforms to intercommunicate and exchange information via YODA messages.

    It is a high-level layer based on TIBCO Rendezvous software that allows appli-

    cations to communicate through a network. YODA provides a uniform and easy

    way to send and receive messages. The main advantage of using YODA is that

the communication between different applications is done by sending and receiving messages over a network, and neither the sender nor the receiver application needs to know the other's location on the network.

An application subscribes to messages by creating a transport, IfxTransport, and specifying the subjects it wants to receive. When a message is available on the network, it is delivered to the applications that have subscribed to that message subject. These applications only have to install an event handler, so that they can receive the delivered messages and process them.

17 http://www.infineon.com




    3.1.5.3 Microsoft Message Queuing

    Microsoft Message Queuing (MSMQ) is a technology provided by Microsoft that

    enables applications running at different times to communicate across different net-

works and systems. The main advantages of this technology are that it guarantees message delivery and provides efficient routing, security, and priority-based messaging. Additionally, it can also be used for implementing solutions for both synchronous and

    asynchronous messaging scenarios, which means it also supports systems that may

    be temporarily offline [43].

    MSMQ is a middleware tool, not responsible for passing the messages themselves

    bit by bit; this middleware leaves that low level work to already existing standards

    and only provides a friendly interface API to help developers. Each computer par-

    ticipating in the distributed application needs a message queue, which allows the

    application to send asynchronous messages to a disconnected computer [44].

    However, there is no need to use an advanced technology like MSMQ to support

    communication, since most of its features are not required by the framework. Addi-

    tionally, all the communications between different applications are already supported

by YODA, which is widely used in the Qimonda universe.

    3.1.6 Markup Languages

    3.1.6.1 XML

    XML stands for Extensible Markup Language and it is a W3C18 recommendation.

    XML was developed by an XML Working Group (originally known as the SGML19

    Editorial Review Board) formed under the auspices of the W3C in 1996 [45].

XML is a simple and very flexible text format originally designed to meet the challenges of large-scale electronic publishing. XML is also playing an increasingly

    important role in the exchange of a wide variety of data on the Web and elsewhere

    [46]. XML is a markup language much like HTML, but XML is not a replacement for

    HTML since they were designed with different goals. XML was designed to transport

    and store data, with focus on what data is; HTML was designed to display data,

    with focus on how data looks [47].

    XML documents are made up of storage units called entities, which contain either

    parsed or unparsed data. Parsed data is made up of characters, some of which form

    character data, and some of which form markup. Markup encodes a description of

    the document’s storage layout and logical structure. XML provides a mechanism

to impose constraints on the storage layout and logical structure. Unparsed data consists of content that may or may not be text and, if text, may be in a notation other than XML.

18 World Wide Web Consortium
19 Standard Generalized Markup Language



The XML language can thus be used to describe any kind of data, because tags are not predefined; tags are defined as needed, which makes XML an extensible language. This characteristic makes XML documents self-descriptive: such documents are easy and intuitive to understand, since they are relatively human-legible and reasonably clear [48].

Amongst its main purposes are facilitating the sharing and transport of structured data across different information systems (interoperability), the encoding of documents, and the serialization of data. This data is stored in plain text format, which

    provides a software and hardware independent way of storing data. This makes XML

    straightforwardly usable because it is much easier to create data that different ap-

    plications can share. Moreover, XML documents should not only be easy to create,

but the design of XML documents should also be formal, concise, and quickly prepared. These characteristics also help reduce the complexity of exchanging data

    between incompatible systems, since the data can be read by different incompatible

    applications [49].
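As a small illustration of such a self-descriptive document, the following fragment (with tag names invented for this example) can be created and read with standard tooling, here Python's xml.etree.ElementTree:

```python
# A small self-descriptive XML document; the tag and attribute names
# (measurement, equipment, lot, value) are invented for illustration.
import xml.etree.ElementTree as ET

doc = """
<measurement equipment="E01">
  <lot>L123</lot>
  <value unit="mm">4.2</value>
</measurement>
"""

root = ET.fromstring(doc)
print(root.get("equipment"))           # E01
print(root.find("value").get("unit"))  # mm
print(root.find("value").text)         # 4.2
```

Because the tags describe the data they contain, a reader (human or program) needs no external schema to make basic sense of the document.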

    3.1.6.2 XPath

XML Path Language, or simply XPath20, is a language for finding information in an XML

    document. XPath is used to navigate through elements and attributes in an XML

    document [50]. In addition, XPath may be used to compute values (strings, num-

    bers, or boolean values) from the content of an XML document. XPath became a

    W3C Recommendation in November 1999 and the current version of the language

    is XPath 2.0 [51].

    XPath operates on the abstract, logical structure of an XML document, rather

    than its surface syntax. The XPath language is based on a tree representation of

    the XML document, and provides the ability to navigate around the tree, selecting

    nodes by a variety of criteria. XPath has a natural subset that can be used for

    matching (testing whether or not a node matches a pattern) [52].

    XPath gets its name from its use of a path notation for navigating through the

    hierarchical structure of an XML document. XPath uses path expressions to select

    nodes or node-sets in an XML document. These path expressions look very much

    like the expressions you see when you work with a traditional computer file system.
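A brief illustration of such path expressions, using Python's xml.etree.ElementTree, which implements a subset of XPath 1.0 (the element and attribute names are invented):

```python
# XPath-style path expressions over the tree representation of an XML
# document; ElementTree supports a subset of XPath 1.0.
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<lots>"
    "<lot id='L1'><value>1.0</value></lot>"
    "<lot id='L2'><value>2.5</value></lot>"
    "</lots>"
)

# Select every <value> element anywhere below the root.
print([v.text for v in root.findall(".//value")])   # ['1.0', '2.5']

# Select the <value> of the <lot> whose id attribute is 'L2'.
print(root.find(".//lot[@id='L2']/value").text)     # 2.5
```

The slash-separated steps mirror file-system paths, which is the analogy the text above draws.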

    3.2 Previous Work

    This section concerns some of the previous work done regarding the data collection

    in manufacturing equipments used in an assembly line. It presents the state of the

20 http://www.w3.org/TR/xpath




art in terms of how data is collected from files generated by equipments, how the issues related to concurrent database accesses are resolved, and how communication is handled both between equipments and applications and between different components of the same application.

    3.2.1 Collecting Data from Files

    Some manufacturing equipments generate data and save it into files. These files are

    usually saved in a folder configured in the data collection application settings. The

    folder containing these generated files is usually accessed via a network mapping

using the TCP/IP protocol, which allows these remote files to be accessed just as if they were on the same computer as the data collection application.

    This way, the problem related to accessing these files is solved. However, data

    collection approaches used at Qimonda for collecting files have some limitations,

especially because applications know neither the exact moment a new file is created nor when the file becomes available for use and unlocked by the equipment software. Consequently, a periodic approach must be used. Depending on the frequency at which the equipment generates data, the interval between two consecutive folder inspections is adjusted. Because of these periodic inspections, it is impossible to know a priori which files have been generated between two consecutive inspections, so each inspection needs to check all the files and folders existing in the mapped network folder. This way, a list containing the files and folders existing

    inside a directory must be kept in memory, so that comparisons between two con-

    secutive inspections can be made. Additionally, the list should always be updated

    at the end of each inspection, so that it can be used in the next inspection.

Another important point related to collecting data from files concerns the file contents. File contents need to be parsed, and specific parsers must be configured to match the requirements of each specific equipment. The parsing approaches commonly used implement a sequential parsing of files, which leads to parsers that are less tolerant when errors occur and also makes it harder to find the desired information inside the file contents.
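A more tolerant, line-oriented alternative can be sketched in Python: each line is matched independently, so a single malformed line does not abort the whole file (the key-value field format here is invented for illustration):

```python
# Sketch of a tolerant line-oriented parser: bad lines are recorded and
# skipped instead of stopping the sequential parse.
import re

LINE = re.compile(r"^(?P<key>\w+)\s*=\s*(?P<value>.+?)\s*$")

def parse_report(text):
    """Return (records, error_line_numbers) for a key=value report file."""
    records, errors = {}, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue                   # ignore blank lines
        match = LINE.match(line)
        if match:
            records[match.group("key")] = match.group("value")
        else:
            errors.append(lineno)      # remember the bad line, keep going
    return records, errors

data, bad = parse_report("LOT = L123\n???\nVALUE = 4.2\n")
print(data)  # {'LOT': 'L123', 'VALUE': '4.2'}
print(bad)   # [2]
```

A strictly sequential parser would have stopped at the malformed second line; here the remaining fields are still recovered and the error is reported.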

    3.2.2 Database Concurrency

Databases play a critical role in the data collection process: some of the data may be collected from equipment local databases and, above all, the main target for the collected data is usually a database.

    However, database accesses should be handled carefully because there is the

possibility of having many read and write operations using the same data rows at the same time. This may be potentially dangerous due to concurrency problems



    related to concurrent accesses. These concurrent acce