
  • Learned Index Structures

Bigtable Research Review Meeting
    Presented by Deniz Altinbuken

    January 29, 2018

    paper by Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis

    go/learned-index-structures-presentation


  • Objectives

    1. Show that all index structures can be replaced with deep learning models: learned indexes.

    2. Analyze under which conditions learned indexes outperform traditional index structures, and describe the main challenges in designing learned index structures.

    3. Show that the idea of replacing core components of a data management system with learned models can be very powerful.

  • Claims

    ● Traditional indexes assume a worst-case data distribution so that they can be general purpose.
      ○ They do not take advantage of patterns in the data.

    ● Knowing the exact data distribution makes it possible to highly optimize any index the database system uses.

    ● ML opens up the opportunity to learn a model that reflects the patterns and correlations in the data, and thus enables the automatic synthesis of specialized index structures: learned indexes.

  • Main Idea

    A model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records.

  • Background

    Learned Index Structures

    Results

    Conclusion

  • Background

  • Neural Networks: An Example

    Recognizing handwriting:

    ● Very difficult to express our intuitions, such as "9 has a loop at the top, and a vertical stroke in the bottom right".
    ● Very difficult to create precise rules and solve this algorithmically.
      ○ Too many exceptions and special cases.


  • Neural Networks: An Example

    Neural networks approach the problem in a different way.

    ● Take a large number of handwritten digits: training data.
    ● Develop a system which can learn from the training data.


  • Neural Networks: An Example

    Neural networks approach the problem in a different way: automatically infer rules for recognizing handwritten digits by going through examples!


  • Neural Networks: An Example

    Create a network of neurons that can learn! :)


  • Neurons: Perceptron

    A perceptron takes several binary inputs, x1, x2, …, and produces a single binary output.

    The output is computed as a function of the inputs, where weights w1, w2, … express the importance of the inputs to the output.

    [Figure: a perceptron with inputs x1, x2, x3, weights w1, w2, w3, and a single output.]


  • Neurons: Perceptron

    The output is determined by whether the weighted sum ∑j wj·xj is less than or greater than some threshold value.

    Just like the weights, the threshold is a number which is a parameter of the neuron. If the threshold is reached, the neuron fires.

    [Figure: the same perceptron with a threshold t inside the neuron.]


  • Neurons: Perceptron

    The output is determined by whether the weighted sum ∑j wj·xj is less than or greater than some threshold value. Just like the weights, the threshold is a number which is a parameter of the neuron. If the threshold is reached, the neuron fires.

    output = 0 if ∑j wj·xj ≤ threshold
             1 if ∑j wj·xj > threshold


  • Neurons: Perceptron

    A more common way to describe a perceptron writes the weighted sum ∑j wj·xj as the dot product w⋅x and replaces the threshold with a bias, where bias = −threshold:

    output = 0 if w⋅x + bias ≤ 0
             1 if w⋅x + bias > 0

    The bias describes how easy it is to get the neuron to fire.

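    To make the decision rule concrete, a minimal Python sketch of a perceptron (the weights, bias, and input values are illustrative, not from the slides):

        import numpy as np

        def perceptron(x, w, bias):
            # Fire (output 1) iff the weighted sum plus the bias is positive.
            return 1 if np.dot(w, x) + bias > 0 else 0

        x = np.array([1, 0, 1])             # three binary inputs
        w = np.array([0.6, 0.2, 0.3])       # importance of each input
        print(perceptron(x, w, bias=-0.5))  # 1, since 0.6 + 0.3 - 0.5 > 0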

  • Neurons: Perceptron

    ● By varying the weights and the threshold, we get different models of decision-making.
    ● A complex network of perceptrons that uses layers can make quite subtle decisions.

    [Figure: inputs feeding a multi-layer network of perceptrons with a single output; the first layer is the input layer, the last is the output layer, and the layers in between are hidden layers.]

  • Neurons: Perceptron

    Perceptrons are great for decision making.


  • Neurons: Perceptron

    How about learning?


  • Neurons: Perceptron

    Earlier: automatically infer rules for recognizing handwritten digits by going through examples!


  • Learning

    ● A neural network goes through examples to learn weights and biases so that the output from the network correctly classifies a given digit.
    ● If a small change in some weight or bias in the network causes only a small corresponding change in the output, the network can learn.

    We are trying to create the right mapping for all cases.

  • Learning

    The neural network is "trained" by adjusting weights and biases to find the perfect model that would generate the expected output for the "training data".


  • Learning

    Through training you minimize the prediction error.

    (But having perfect output is difficult.)


  • Neurons: Sigmoid

    ● Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output.

    [Figure: a network where changing a weight by Δw changes the output by Δoutput.]

    Small Δ in any weight or bias causes a small Δ in the output!


  • Neurons: Sigmoid

    ● A sigmoid neuron takes several inputs, x1, x2, …, which can be any real number between 0 and 1 (e.g. 0.256), and produces a single output, which can also be any real number between 0 and 1.

    output = σ(w⋅x + bias)

    where σ is the sigmoid function:

    σ(z) = 1 / (1 + e^(−z))

    Great for representing probabilities!

  • Neurons: ReLU (Rectified Linear Unit)

    ● Better for deep learning because it preserves the information from earlier layers better as it goes through hidden layers.

    output = 0 if x ≤ 0
             x if x > 0


  • Activation Functions (Transfer Functions)

    To get an intuition about the neurons, it helps to see the shape of the activation function.

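    As a quick companion to the shapes discussed above, a minimal sketch of the three activation functions in Python (NumPy), vectorized so they can be plotted directly:

        import numpy as np

        def step(z):
            # Perceptron activation: a hard threshold at zero.
            return np.where(z > 0, 1, 0)

        def sigmoid(z):
            # Smoothed step: sigma(z) = 1 / (1 + e^(-z)), output in (0, 1).
            return 1.0 / (1.0 + np.exp(-z))

        def relu(z):
            # Rectified linear unit: passes positive values through, else 0.
            return np.maximum(z, 0)

        z = np.linspace(-5, 5, 11)
        print(step(z), sigmoid(z), relu(z), sep="\n")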

  • Learned Index Structures

  • Index Structures as Neural Network Models

    ● Indexes are already, to a large extent, learned models like neural networks.
    ● Indexes predict the location of a value given a key.
      ○ A B-tree is a model that takes a key as an input and predicts the position of a data record.
      ○ A Bloom filter is a binary classifier which, given a key, predicts if the key exists in a set or not.


  • B-tree

    The B-tree provides a mapping from a lookup key to a position inside the sorted array of records.

    ● For efficiency, it indexes at page granularity.
    ● It maps a key to a position with a min and max error.

  • Replace B-trees with ML Models!

    ● We can replace the index with ML models that provide similar strong guarantees about the min and max error.
    ● The B-tree only provides this guarantee over the stored data, not for all possible data.
      ○ The min and max error is the maximum error of the model over the training data.
      ○ Execute the model for every key and remember the worst over- and under-prediction of a position. (A sketch follows below.)

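    A minimal sketch of how these bounds could be computed (the model is any trained regressor mapping a key to a predicted position; the names and interface are illustrative, not the paper's code):

        def error_bounds(model, sorted_keys):
            # Execute the model for every stored key and remember the worst
            # over- and under-prediction of a position. Like a B-tree, the
            # guarantee holds only for the stored (training) data.
            min_err, max_err = 0, 0
            for true_pos, key in enumerate(sorted_keys):
                err = model.predict(key) - true_pos
                min_err = min(min_err, err)   # worst under-prediction
                max_err = max(max_err, err)   # worst over-prediction
            return min_err, max_err

    At lookup time the record is then guaranteed to lie in [pred + min_err, pred + max_err], which bounds the local search.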

  • Challenges

    ● B-trees have a bounded cost for inserts and lookups and are good at taking advantage of the cache.
    ● B-trees can map keys to pages which are not continuously mapped to memory or disk.
    ● If a lookup key does not exist in the set, models that are not monotonically increasing might return positions outside the min/max error range.


  • Advantages

    ● Using ML models has the potential to transform the log n cost of a B-tree lookup into a constant-time operation (in the best case).
    ● Neural networks are able to learn a wide variety of data distributions, mixtures, and other data peculiarities and patterns, and make use of them.
      ○ We have to balance the complexity of the model with its accuracy.


  • A First, Naïve Learned Index

    ● Use 200M web-server log records to build a secondary index over the timestamps using Tensorflow.
      ○ Two-layer fully-connected NN with 32 neurons per layer using ReLU activation functions; the timestamps are the inputs and the positions are the outputs.
      ○ Lookup time ≈ 80,000 ns (model execution only).


  • A First, Naïve Learned Index

    ● It is CPU- and space-efficient to narrow down the position for an item from the entire data set to a region of thousands, but inefficient for the "last mile". (A sketch of the model follows below.)

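    The slides do not include the TensorFlow code; the following is a minimal Keras sketch of such a two-layer, 32-neuron ReLU network (the layer sizes come from the slide; the toy data, optimizer, and epoch count are assumptions for illustration):

        import numpy as np
        import tensorflow as tf

        # Toy stand-in for the timestamps: sorted keys -> positions 0..n-1.
        keys = np.sort(np.random.lognormal(size=10_000)).astype(np.float32).reshape(-1, 1)
        positions = np.arange(len(keys), dtype=np.float32)

        model = tf.keras.Sequential([
            tf.keras.layers.Dense(32, activation="relu", input_shape=(1,)),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(1),  # predicted position in the sorted array
        ])
        model.compile(optimizer="adam", loss="mse")
        model.fit(keys, positions, epochs=5, batch_size=1024, verbose=0)

        pred = model.predict(keys[:1])  # approximate position of the first key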

  • A First, Naïve Learned Index

    For every key in 100M keys, we want to map it to a position in a sorted array. With one single model, it has to be "complex enough" to figure out an accurate mapping for every key.


  • The Recursive Model Index

    It is much easier to have a model that can say that a given key from 100M keys maps to the first 10k, second 10k, etc. positions!


  • The Learning Index Framework (LIF)

    ● The LIF can be regarded as an index synthesis system; given an index specification, LIF generates different index configurations, optimizes them, and tests them automatically.
    ● Given a trained Tensorflow model, LIF automatically extracts all weights from the model and generates efficient index structures in C++ based on the model specification.


  • The Recursive Model Index

    ● Improve last-mile accuracy.
      ○ Reducing the min/max error to 100 over 100M records using a single model is very hard.
      ○ Reducing the error from 100M to 10k is much easier to achieve, even with simple models.
      ○ Reducing the error from 10k to 100 is simpler still, as the model can then focus on only a subset of the data.


  • The Recursive Model Index

    💡 Use a hierarchical approach where models can focus on smaller subsets of the data.


  • The Recursive Model Index

    Take a layered approach and have each model focus on a limited subset of the data:

    Reduce from 100M to 1M
    Reduce from 1M to 10k
    Reduce from 10k to 100
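    A minimal sketch of a two-stage RMI lookup under these assumptions (stage1 and each stage2[i] are trained predictors, err[i] holds the i-th model's recorded (min, max) error; illustrative, not the paper's code):

        import bisect

        def rmi_lookup(key, stage1, stage2, err, data):
            # Stage 1 predicts a rough position, which picks the stage-2
            # model that "owns" this key.
            i = int(stage1.predict(key) * len(stage2) / len(data))
            i = min(max(i, 0), len(stage2) - 1)
            # Stage 2 predicts the position; search only inside that model's
            # guaranteed min/max error range (the "last mile").
            pos = int(stage2[i].predict(key))
            lo = max(pos + err[i][0], 0)
            hi = min(pos + err[i][1] + 1, len(data))
            j = bisect.bisect_left(data, key, lo, hi)
            return j if j < len(data) and data[j] == key else None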

  • The Recursive Model Index

    Check out the math in the paper if you're interested in the details! :)


  • Hybrid End-to-End Training

    With a layered approach we can build mixtures of models!

    [Figure: a three-stage RMI. A small ReLU NN reduces from 100M to 1M, a stage of linear regression models reduces from 1M to 10k, and a final stage of linear regression models and B-trees reduces from 10k to 100.]


  • Hybrid End-to-End Training

    Starting from the entire dataset (line 3), the algorithm first trains the top-node model. Based on the prediction of this model, it then picks the model from the next stage (lines 9 and 10) and adds all keys which fall into that model (line 10). Finally, in the case of hybrid indexes, the index is optimized by replacing NN models with B-trees if the absolute min-/max-error is above a predefined threshold (lines 11-14). (A Python sketch of this procedure follows below.)

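    A hedged Python sketch of that procedure; make_nn and make_btree are assumed model factories, and the real algorithm is Algorithm 1 in the paper:

        def train_hybrid_rmi(keys, positions, n_stage2, make_nn, make_btree,
                             err_threshold):
            # 1. Train the top-node model on the entire dataset.
            top = make_nn().fit(keys, positions)
            # 2. Partition the keys according to the top model's predictions.
            buckets = [[] for _ in range(n_stage2)]
            for k, p in zip(keys, positions):
                i = int(top.predict(k) * n_stage2 / len(keys))
                buckets[min(max(i, 0), n_stage2 - 1)].append((k, p))
            # 3. Train one stage-2 model per partition; replace any model whose
            #    absolute error exceeds the threshold with a B-tree.
            stage2 = []
            for bucket in buckets:
                if not bucket:
                    stage2.append(None)
                    continue
                ks, ps = zip(*bucket)
                m = make_nn().fit(ks, ps)
                if max(abs(m.predict(k) - p) for k, p in bucket) > err_threshold:
                    m = make_btree(ks, ps)  # hybrid fallback: worst case a B-tree
                stage2.append(m)
            return top, stage2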

  • Hybrid End-to-End Training

    Worst case is a B-tree!


  • Search Strategies

    To find the record, either binary search or scanning is used. Models might generate more information than just the page location.

    ● Model Binary Search
      ○ Set the first middle point to the position pos predicted by the model.
    ● Biased Search
      ○ Use the standard deviation σ of the last-stage model to set the middle point.
    ● Biased Quaternary Search
      ○ Pick three middle points: pos − σ, pos, pos + σ. (A sketch follows below.)

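    A sketch of the biased quaternary variant (pos is the model's predicted position, sigma its standard error, and lo/hi bound the search; illustrative, not the paper's code):

        import bisect

        def biased_quaternary_search(data, key, pos, sigma, lo, hi):
            # Seed the search with three pivots at pos - sigma, pos, pos + sigma,
            # narrow [lo, hi) to the bracketing interval, then finish with
            # ordinary binary search.
            pivots = sorted({max(lo, min(hi - 1, int(p)))
                             for p in (pos - sigma, pos, pos + sigma)})
            for p in pivots:
                if data[p] == key:
                    return p
                if data[p] < key:
                    lo = p + 1
                else:
                    hi = p
                    break
            i = bisect.bisect_left(data, key, lo, hi)
            return i if i < len(data) and data[i] == key else None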

  • Indexing Strings

    ● Turn strings into inputs the NN model can use.
      ○ Represent the string as a vector, where each element is the decimal ASCII value of a character. (A sketch follows below.)
      ○ Limit the size of the vector to N to have equally-sized inputs.
    ● Vector inputs slow the model down significantly.
    ● Further research is needed to speed this case up :)

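    A minimal sketch of that encoding (the length N = 16 and the zero padding are assumptions):

        import numpy as np

        def string_to_vector(s, n=16):
            # Fixed-length vector of decimal ASCII codes, truncated or
            # zero-padded to n so all inputs have equal size.
            codes = [ord(c) for c in s[:n]]
            return np.array(codes + [0] * (n - len(codes)), dtype=np.float32)

        print(string_to_vector("key"))  # [107. 101. 121. 0. ... 0.]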

  • Inserts and Updates

    ● Appends
      ○ No need to relearn if the model can learn the key trend for the new items.
    ● Inserts in the middle
      ○ If inserts follow roughly a similar pattern as the learned CDF, retraining is not needed, since the index "generalizes" over the new items and inserts become an O(1) operation.


  • Inserts and Updates

    If we have a model that is more general, it is cheaper to insert new values, since they will follow the trend.


  • Hashmap

    Hashmaps use a hash function to deterministically map keys to random positions inside an array.


  • Hashmap

    The main challenge is to reduce conflicts.

    ● Use a linked list to handle the "overflow".
    ● Use linear or quadratic probing.
    ● Most solutions allocate significantly more memory than records and combine it with additional data structures.
      ○ Dense hashmap: typical overhead of 78% memory.
      ○ Sparse hashmap: only 4 bits of overhead, but up to 3-7 times slower because of its search and data placement strategy.


  • Hashmap

    ● If we could learn a model which uniquely maps every key to a unique position inside the array, we could avoid conflicts.
    ● Learned models are capable of reaching higher utilization of the hashmap, depending on the data distribution.
    ● Scale the distribution by the targeted size M of the hashmap and use h(K) = F(K) ∗ M as the hash function, where F is the learned CDF of the keys. (A sketch follows below.)
    ● If the model F perfectly learned the distribution, no conflicts would exist.

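    A sketch of that construction, using an empirical CDF as a stand-in for the learned model F (illustrative; any trained CDF model would take its place):

        import numpy as np

        def learned_hash(key, F, M):
            # h(K) = F(K) * M: scaling the learned CDF by the table size
            # spreads keys evenly if F matches the true distribution.
            return int(F(key) * (M - 1))

        sample = np.sort(np.random.lognormal(size=100_000))
        F = lambda k: np.searchsorted(sample, k) / len(sample)  # empirical CDF
        slot = learned_hash(3.7, F, M=1_000_000)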

  • Bloom filter

    Bloom filters are probabilistic data structures used to test whether an element is a member of a set.

    [Figure: Bloom filter insertion vs. learned Bloom filter insertion.]


  • Bloom filter

    ● A Bloom filter index needs to learn a function that separates keys from everything else.
      ○ A good hash function for a Bloom filter should have lots of collisions among keys and lots of collisions among non-keys, but few collisions of keys and non-keys.
    ● As a classification problem: learn a model f that can predict if an input x is a key or a non-key.


  • Bloom filter

    ● As a classification problem: learn a model f that can predict if an input x is a key or a non-key. (A sketch follows below.)
      ○ Use sigmoid neurons to produce a probability between 0 and 1.
      ○ The output of the NN is the probability that input x is a key in our database.
      ○ Choose a threshold t above which we assume the key exists in our database.
      ○ Tune the threshold t to achieve the desired false positive rate.
      ○ To prevent false negatives, use an overflow Bloom filter.

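    A sketch of that two-part structure; model, tau, and make_bloom are assumed stand-ins, and the overflow filter is what rules out false negatives:

        class LearnedBloomFilter:
            def __init__(self, model, tau, keys, make_bloom):
                self.model, self.tau = model, tau
                # Keys the classifier misses go into a (small) overflow
                # Bloom filter, so a stored key can never be reported absent.
                self.overflow = make_bloom([k for k in keys
                                            if model.predict(k) <= tau])

            def might_contain(self, x):
                # Accept if the model is confident, else fall back to the
                # overflow filter; false positives remain possible.
                return self.model.predict(x) > self.tau or x in self.overflow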

  • Results

  • B-tree Results

    ● 4 datasets to compare the performance of learned index structures with B-trees.
      ○ Compare lookup time (model execution time + local search time).
      ○ Compare index structure size.
      ○ Compare model error and error variance.
    ● These results focus on read performance only; loading and insertion time are not included.
      ○ A model without hidden layers can be trained on over 200M records in just a few seconds.


  • Web Log Dataset

    200M log entries for requests to a major university website. Index over all unique timestamps.


  • Web Log Dataset

    The model error is the averaged standard error over all models on the last stage, whereas the error variance indicates how much this standard error varies between the models.


  • Web Log Dataset

    The model is 3× faster and up to an order of magnitude smaller.


  • Web Log Dataset

    Quaternary search only helps a little bit.


  • Web Log Dataset

    The error is high, which influences the search time.


  • Maps Dataset

    Index of the longitude of ≈ 200M user-maintained features across the world. Relatively linear.


  • Maps Dataset


The model is 3× faster and up to an order of magnitude smaller.

  • Maps Dataset

    Quaternary search does not help.


  • Lognormal Dataset

    Synthetic dataset of 190M unique values to test how the index works on heavy-tailed distributions. Highly non-linear, making the distribution more difficult to learn.


  • Lognormal Dataset


    The error is high, which influences the search time.

  • Important Observations

    ● The learned index is 3× faster and up to an order of magnitude smaller.
    ● Quaternary search only helps for some datasets.
    ● The model accuracy varies widely; most noticeably, for the synthetic dataset and the weblog data the error is much higher.
    ● The second-stage size has a significant impact on the index size and lookup performance.
      ○ This is not surprising, as the second stage determines how many models have to be stored. Worth noting is that the second stage uses 10,000 or more models.


  • Web Document Dataset

    The web-document dataset consists of the 10M non-continuous document-ids of a large web index used as part of a real product at a large internet company.


  • Web Document Dataset

    Speedups for learned indexes are not as prominent here, so hybrid indexes, which replace badly performing models with B-trees, actually help to improve performance.


  • Web Document Dataset

    Because the cost of searching is higher, the different search strategies make a bigger difference. Biased search and quaternary search perform better because they can take the standard error into account.


  • Hashmap Results

    ● Use 3 integer datasets.
    ● The model hash has similar performance and utilizes the memory better.
    ● When there are extra slots, the improvement disappears.


  • Bloom filter Results

    ● Blacklisted phishing URLs dataset: 1.7M unique URLs.
    ● The more accurate the model is, the better the savings in Bloom filter size.


  • Bloom filter Results

    ● A normal Bloom filter with a desired 1% false positive rate requires 2.04MB.
    ● For a 16-dimensional GRU with a 32-dimensional embedding for each character, the model is 0.0259MB; with the spillover Bloom filter, the total is 1.07MB.


  • Conclusion

  • Conclusion and Future Work

    ● Multi-Dimensional Indexes: Extend learned indexes to multi-dimensional index structures. Models, especially neural nets, are extremely good at capturing complex high-dimensional relationships.
    ● Learned Algorithms: A model can also speed up sorting and joins, not just indexes.
    ● GPUs/TPUs: GPUs/TPUs will make the idea of learned indexes even more viable.


  • Next time

    ● Is this a good idea?
    ● Related work
    ● "Some Notes on Learned Bloom Filters"
    ● "Don't Throw Out Your Algorithms Book Just Yet"


    https://mybiasedcoin.blogspot.com/2018/01/some-notes-on-learned-bloom-filters.html?m=1
    http://dawn.cs.stanford.edu/2018/01/11/index-baselines/