thesis haskell


  • Charles University in Prague

    Faculty of Mathematics and Physics

    DOCTORAL THESIS

    Milan Straka

    Functional Data Structures

    and Algorithms

    Computer Science Institute of Charles University

    Supervisor of the thesis: doc. Mgr. Zdeněk Dvořák, Ph.D.

    Study programme: Computer Science

    Specialization: Discrete Models and Algorithms (4I4)

    Prague 2013

  • I am grateful to Zdeněk Dvořák for his support. He was very accommodating

    during my studies. He quickly discovered any errors in my early conjectures and

    suggested areas of interest that proved rewarding.

    I would also like to express my sincere gratitude to Simon Peyton Jones for

    his supervision and guidance during my internship at Microsoft Research Labs,

    and also for helping me with one of my papers. Our discussions were always very

    intriguing and motivating.

    Johan Tibell supported my work on data structures in Haskell. His enthusiasm

    encouraged me to overcome initial hardships and continues to make the work

    really enjoyable.

    Furthermore, I would like to thank Michal Koucký for comments and

    discussions that improved the thesis presentation considerably.

    Finally, all this would not be possible without my beloved wife and my parents.

    You make me very happy, Jana.


  • I declare that I carried out this doctoral thesis independently, and only with the

    cited sources, literature and other professional sources.

    I understand that my work relates to the rights and obligations under the Act

    No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that

    the Charles University in Prague has the right to conclude a license agreement

    on the use of this work as a school work pursuant to Section 60 paragraph 1 of

    the Copyright Act.

    In Prague, 12th August 2013

    signature of the author


  • Název práce: Funkcionální datové struktury a algoritmy

    Autor: Milan Straka

    Ústav: Informatický ústav Univerzity Karlovy

    Vedoucí doktorské práce: doc. Mgr. Zdeněk Dvořák, Ph.D, Informatický ústav

    Univerzity Karlovy

    Abstrakt: Funkcionální programování je rozšířené a stále více oblíbené programovací

    paradigma, které nachází své uplatnění i v průmyslových aplikacích. Datové

    struktury používané ve funkcionálních jazycích jsou převážně perzistentní, což

    znamená, že pokud jsou změněny, zachovávají své předchozí verze. Cílem této

    práce je rozšířit teorii perzistentních datových struktur a navrhnout efektivní

    implementace těchto datových struktur pro funkcionální jazyky.

    Bezpochyby nejpoužívanější datovou strukturou je pole. Ačkoli se jedná

    o velmi jednoduchou strukturu, neexistuje jeho perzistentní protějšek s konstantní

    složitostí přístupu k prvku. V této práci popíšeme zjednodušenou implementaci

    perzistentního pole s asymptoticky optimální amortizovanou časovou složitostí

    Θ(log log n) a především téměř optimální implementaci se složitostí v nejhorším

    případě. Také ukážeme, jak efektivně rozpoznat a uvolnit nepoužívané verze

    perzistentního pole.

    Nejvýkonnější datové struktury nemusí být vždy ty, které jsou založeny na

    asymptoticky nejlepších strukturách. Z toho důvodu se také zaměříme na implementaci

    datových struktur v čistě funkcionálním programovacím jazyku Haskell

    a podstatně zlepšíme standardní knihovnu datových struktur jazyka Haskell.

    Klíčová slova: perzistentní datové struktury, perzistentní pole, algoritmy se složitostí

    v nejhorším případě, čistě funkcionální datové struktury, Haskell


  • Title: Functional Data Structures and Algorithms

    Author: Milan Straka

    Institute: Computer Science Institute of Charles University

    Supervisor of the doctoral thesis: doc. Mgr. Zdeněk Dvořák, Ph.D, Computer

    Science Institute of Charles University

    Abstract: Functional programming is a well established programming paradigm

    and is becoming increasingly popular, even in industrial and commercial applications.

    Data structures used in functional languages are principally persistent,

    that is, they preserve previous versions of themselves when modified. The goal

    of this work is to broaden the theory of persistent data structures and devise

    efficient implementations of data structures to be used in functional languages.

    Arrays are without any question the most frequently used data structure.

    Despite being conceptually very simple, no persistent array with constant time

    access operation exists. We describe a simplified implementation of a fully persistent

    array with asymptotically optimal amortized complexity Θ(log log n) and

    especially a nearly optimal worst-case implementation. Additionally, we show

    how to effectively perform a garbage collection on a persistent array.

    The most efficient data structures are not necessarily based on asymptotically

    best structures. On that account, we also focus on data structure implementations

    in the purely functional language Haskell and improve the standard Haskell data

    structure library considerably.

    Keywords: persistent data structures, persistent arrays, worst-case algorithms,

    purely functional data structures, Haskell


  • Contents

    1 Introduction

    1.1 Functional programming

    1.2 Persistent Data Structures

    1.3 Structure of the Thesis

    I Persistent Data Structures

    2 Making Data Structures Persistent

    2.1 Path Copying Method

    2.2 Making Linked Structures Persistent

    2.3 Making Linked Structures Persistent in the Worst Case

    2.4 Making Amortized Structures Persistent

    3 Navigating the Version Tree

    3.1 Linearizing the Version Tree

    3.2 List Labelling

    3.3 List Order Problem

    4 Dynamic Integer Sets

    4.1 Van Emde Boas Trees

    4.2 Exponential Trees

    5 Persistent Arrays

    5.1 Related Work

    5.2 Lower Bound on Persistent Array Lookup

    5.3 Amortized Persistent Array

    5.4 Worst-Case Persistent Array

    5.5 Improving Complexity of Persistent Array Operations

    5.6 Garbage Collection of a Persistent Array


    II Purely Functional Data Structures

    6 Persistent Array Implementation

    6.1 Fully Persistent Array Implementation

    6.2 Choosing the Best Branching Factor

    7 BB-ω Trees

    7.1 BB-ω Trees

    7.2 Rebalancing BB-ω Trees

    7.3 Choosing the Parameters ω, α and δ

    7.4 BB-ω Trees Height

    7.5 Performance of BB-ω Trees

    7.6 Reducing Memory by Utilizing Additional Data Constructor

    8 The Haskell containers Package

    8.1 The containers Package

    8.2 Benchmarks

    8.3 Improving the containers Performance

    8.4 New Hashing-Based Container

    9 Conclusion

    Bibliography

    List of Terms and Abbreviations

    List of Figures

    Attachments

    A.1 Generating Figure 7.3

    A.2 Packages Used in Chapter 8

  • Chapter 1

    Introduction

    Computer programming has been developing enormously ever since the first

    high-level languages were created,1 and several fundamental approaches to

    computer programming, i.e., several programming paradigms, have been designed.

    The prevalent approach is the imperative programming paradigm, represented for

    example by the widespread C language.

    The imperative paradigm considers a computer program to be a sequence of

    statements that change the program state. In other words, serial orders

    (imperatives) are given to the computer.

    Declarative programming is a contrasting paradigm to imperative programming.

    The fundamental principle of the declarative approach is

    describing the problem instead of defining the solution, allowing the program to

    express what should be accomplished instead of how it should be accomplished.

    The logic of the computation is described without dependence on control flow, as

    opposed to imperative programming, where the control flow is a fundamental

    part of any program.
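
The contrast between the two paradigms can be illustrated with a small sketch. It is not taken from the thesis: the function names `sumSquaresImperative` and `sumSquaresDeclarative` are hypothetical, and the task (summing squares of a list) is chosen only for brevity. The first version issues a sequence of state-changing statements; the second only describes the result.

```haskell
import Data.Foldable (for_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Imperative style (hypothetical example): a sequence of statements
-- that repeatedly change mutable program state.
sumSquaresImperative :: [Int] -> IO Int
sumSquaresImperative xs = do
  total <- newIORef 0                           -- the mutable state
  for_ xs $ \x -> modifyIORef' total (+ x * x)  -- one update per element
  readIORef total

-- Declarative (functional) style: state what the result is,
-- with no explicit control flow or mutation.
sumSquaresDeclarative :: [Int] -> Int
sumSquaresDeclarative xs = sum (map (\x -> x * x) xs)
```

Both functions compute the same value for any input list; only the second can be used as an ordinary expression, because it does not depend on the order in which statements are executed.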

    One well-established way of realizing the declarative paradigm is

    functional programming.

    1.1 Functional programming

    Functional programming treats computations as evaluations of mathematic func-

    t