-
Learned Index Structures
Bigtable Research Review Meeting
Presented by Deniz Altinbuken
January 29, 2018
paper by Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis
go/learned-index-structures-presentation
-
Objectives
1. Show that all index structures can be replaced with deep learning models: learned indexes.
2. Analyze under which conditions learned indexes outperform traditional index structures, and describe the main challenges in designing learned index structures.
3. Show that the idea of replacing core components of a data management system with learned models can be very powerful.
-
Claims
● Traditional indexes assume a worst-case data distribution so that they can be general purpose.
  ○ They do not take advantage of patterns in the data.
● Knowing the exact data distribution makes it possible to highly optimize any index the database system uses.
● ML opens up the opportunity to learn a model that reflects the patterns and correlations in the data, and thus enables the automatic synthesis of specialized index structures: learned indexes.
-
Main Idea
A model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records.
-
Background
Learned Index Structures
Results
Conclusion
-
Background
-
Neural Networks: An Example
Recognizing handwriting
● Very difficult to express our intuitions, such as "a 9 has a loop at the top, and a vertical stroke in the bottom right".
● Very difficult to create precise rules and solve this algorithmically.
  ○ Too many exceptions and special cases.
-
Neural Networks: An Example
Neural networks approach the problem in a different way.
● Take a large number of handwritten digits: training data.
● Develop a system which can learn from the training data.
-
Neural Networks: An Example
Neural networks approach the problem in a different way.
Automatically infer rules for recognizing handwritten digits by going through
examples!
-
Neural Networks: An Example
Neural networks approach the problem in a different way.
Create a network of neurons that can learn! :)
-
Neurons: Perceptron
A perceptron takes several binary inputs, x1, x2, … and produces a single binary output:
The output is computed as a function of the inputs, where weights w1,w2,… express the importance of inputs to the output.
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 feeding a single output]
-
The output is determined by whether the weighted sum ∑jwjxj is less than or greater than some threshold value.
Just like the weights, the threshold is a number which is a parameter of the neuron. If the threshold is reached, the neuron fires.
Neurons: Perceptron
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 and threshold t feeding a single output]
-
Neurons: Perceptron
The output is determined by whether the weighted sum ∑j wjxj is less than or greater than some threshold value.
Just like the weights, the threshold is a number which is a parameter of the neuron. If the threshold is reached, the neuron fires.

output = 0 if ∑j wjxj ≤ threshold
output = 1 if ∑j wjxj > threshold
-
Neurons: Perceptron
A more common way to describe a perceptron is to replace ∑j wjxj with the dot product w⋅x and to move the threshold to the other side as a bias (bias = -threshold):

output = 0 if w⋅x + bias ≤ 0
output = 1 if w⋅x + bias > 0

Bias describes how easy it is to get the neuron to fire.
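The firing rule above can be sketched in a few lines. This is a minimal illustration, not the paper's code; the weights, bias, and inputs are made-up values.

```python
# A minimal perceptron sketch: fire (output 1) when the weighted
# sum of the inputs plus the bias is positive, otherwise output 0.

def perceptron(x, w, bias):
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum + bias > 0 else 0

# Example: a neuron that fires only when at least two of three
# binary inputs are on (w = [1, 1, 1], bias = -1.5).
print(perceptron([1, 1, 0], [1, 1, 1], -1.5))  # 1
print(perceptron([1, 0, 0], [1, 1, 1], -1.5))  # 0
```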
-
Neurons: Perceptron
● By varying the weights and the threshold, we get different models of decision-making.
● A complex network of perceptrons that uses layers can make quite subtle decisions.

[Diagram: inputs feeding a network of perceptrons that produces an output]
-
Neurons: Perceptron
● By varying the weights and the threshold, we get different models of decision-making.
● A complex network of perceptrons that uses layers can make quite subtle decisions.

[Diagram: the network with its 1st and 2nd layers labeled]
-
Neurons: Perceptron
● By varying the weights and the threshold, we get different models of decision-making.
● A complex network of perceptrons that uses layers can make quite subtle decisions.

[Diagram: the network with input layer, hidden layers, and output layer labeled]
-
Neurons: Perceptron
Perceptrons are great for decision making.
-
Neurons: Perceptron
How about learning?
-
Neurons: Perceptron
Earlier:
Automatically infer rules for recognizing handwritten digits by going through
examples!
-
Learning
● A neural network goes through examples to learn weights and biases so that the output from the network correctly classifies a given digit.
● If a small change in some weight or bias in the network causes only a small corresponding change in the output from the network, the network can learn.
-
Learning
● A neural network goes through examples to learn weights and biases so that the output from the network correctly classifies a given digit.
● If a small change in some weight or bias in the network causes only a small corresponding change in the output from the network, the network can learn.

Trying to create the right mapping for all cases.
-
Learning
The neural network is “trained” by adjusting weights and biases to find the perfect model that would generate the
expected output for the “training data”.
-
Learning
Through training you minimize the prediction error.
(But having perfect output is difficult.)
-
Neurons: Sigmoid
● Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output.

[Diagram: inputs with weight w + Δw producing output + Δoutput]
Small Δ in any weight or bias causes a small Δ in the output!
-
Neurons: Sigmoid
● A sigmoid takes several inputs, x1, x2, … which can be any real number between 0 and 1 (e.g. 0.256), and produces a single output, which can also be any real number between 0 and 1.
output = σ(w⋅x + bias)
-
Neurons: Sigmoid
● A sigmoid takes several inputs, x1, x2, … which can be any real number between 0 and 1 (e.g. 0.256), and produces a single output, which can also be any real number between 0 and 1.

output = σ(w⋅x + bias)

σ(z) = 1 / (1 + e^(-z))    (the sigmoid function)
-
Neurons: Sigmoid
● A sigmoid takes several inputs, x1, x2, … which can be any real number between 0 and 1 (e.g. 0.256), and produces a single output, which can also be any real number between 0 and 1.

output = σ(w⋅x + bias)
Great for representing probabilities!
-
Neurons: ReLU (Rectified Linear Unit)
● Better for deep learning because it preserves the information from earlier layers better as it goes through hidden layers.
-
Neurons: ReLU (Rectified Linear Unit)
● Better for deep learning because it preserves the information from earlier layers better as it goes through hidden layers.

output = 0 if x ≤ 0
output = x if x > 0
-
Activation Functions (Transfer Functions)
To get an intuition about the neurons, it helps to see the shape of the activation function.
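The three activation functions discussed above can be written side by side; comparing them directly gives a feel for their shapes. A minimal sketch, not tied to any library:

```python
import math

def step(z):
    # Perceptron: hard threshold, output jumps from 0 to 1.
    return 1 if z > 0 else 0

def sigmoid(z):
    # Smooth squashing function, output always in (0, 1).
    return 1 / (1 + math.exp(-z))

def relu(z):
    # Rectified linear unit: zero for negatives, identity for positives.
    return max(0.0, z)

# The sigmoid approaches the step function for large |z|,
# but changes smoothly near z = 0:
print(round(sigmoid(0), 2))    # 0.5
print(relu(-3.0), relu(2.5))   # 0.0 2.5
```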
-
Learned Index Structures
-
Index Structures as Neural Network Models
● Indexes are already, to a large extent, learned models like neural networks.
● Indexes predict the location of a value given a key.
  ○ A B-tree is a model that takes a key as an input and predicts the position of a data record.
  ○ A bloom filter is a binary classifier, which given a key predicts if the key exists in a set or not.
-
B-treeThe B-tree provides a mapping from a lookup key into a position inside the sorted array of records.
-
B-treeThe B-tree provides a mapping from a lookup key into a position inside the sorted array of records.
For efficiency, index to page granularity.
-
B-treeThe B-tree provides a mapping from a lookup key into a position inside the sorted array of records.
Map a key to a position with a min and max error.
-
Replace B-trees with ML Models!
● We can replace the index with ML models that provide similarly strong guarantees about the min and max error.
● The B-tree only provides this guarantee over the stored data, not for all possible data.
  ○ The min and max error is the maximum error of the model over the training data.
  ○ Execute the model for every key and remember the worst over- and under-prediction of a position.
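The last bullet can be sketched directly: run the model over every stored key and remember the worst over- and under-prediction. An illustrative sketch where `predict` is a stand-in for any trained model:

```python
# Compute the guaranteed error bounds of a learned index over the
# stored (training) data: the worst under- and over-prediction seen
# when executing the model on every key.

def error_bounds(keys, true_positions, predict):
    min_err, max_err = 0, 0
    for key, pos in zip(keys, true_positions):
        diff = predict(key) - pos
        min_err = min(min_err, diff)  # worst under-prediction
        max_err = max(max_err, diff)  # worst over-prediction
    return min_err, max_err

# Toy example: a linear model over sorted keys 10, 20, ..., 100,
# which happens to predict every position exactly.
keys = list(range(10, 101, 10))
positions = list(range(len(keys)))
predict = lambda k: round(k / 10 - 1)
print(error_bounds(keys, positions, predict))  # (0, 0)
```

At lookup time the index only has to search within [pos + min_err, pos + max_err], which is what makes the guarantee usable.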
-
Challenges
● B-trees have a bounded cost for inserts and lookups and are good at taking advantage of the cache.
● B-trees can map keys to pages which are not continuously mapped to memory or disk.
● If a lookup key does not exist in the set, models that are not monotonically increasing might return positions outside the min/max error range.
-
Advantages
● Using ML models has the potential to transform the O(log n) cost of a B-tree lookup into a constant-time operation (in the best case).
● Neural networks are able to learn a wide variety of data distributions, mixtures, and other data peculiarities and patterns, and make use of them.
  ○ Have to balance the complexity of the model with its accuracy.
-
A First, Naïve Learned Index
● Use 200M web-server log records to build a secondary index over the timestamps using TensorFlow.
  ○ Two-layer fully-connected NN with 32 neurons per layer using ReLU activation functions; the timestamps are the inputs and the positions are the outputs.
  ○ Lookup time ≈ 80,000 ns (model execution only).
-
A First, Naïve Learned Index
● Use 200M web-server log records to build a secondary index over the timestamps using TensorFlow.
  ○ Two-layer fully-connected NN with 32 neurons per layer using ReLU activation functions; the timestamps are the inputs and the positions are the outputs.
  ○ Lookup time ≈ 80,000 ns (model execution only).
● CPU- and space-efficient for narrowing down the position for an item from the entire data set to a region of thousands, but inefficient for the "last mile".
-
A First, Naïve Learned Index
For every key in 100M keys, we want to map it to a position in a sorted array.
When we have one model, it has to be “complex enough” to figure out an
accurate mapping for every key.
-
The Recursive Model Index
It is much easier to have a model that can say that a given key from 100M keys
maps to the first 10k, second 10k, etc. positions!
-
The Learning Index Framework (LIF)● The LIF can be regarded as an index synthesis system;
given an index specification, LIF generates different index configurations, optimizes them, and tests them automatically.
● Given a trained TensorFlow model, LIF automatically extracts all weights from the model and generates efficient index structures in C++ based on the model specification.
-
The Recursive Model Index
● Improve last-mile accuracy.
  ○ Reducing the min/max error to 100 from 100M records using a single model is very hard.
  ○ Reducing the error from 100M to 10k is much easier to achieve even with simple models.
  ○ Reducing the error from 10k to 100 is simpler as the model can focus only on a subset of the data.
-
The Recursive Model Index
● Improve last-mile accuracy.
  ○ Reducing the min/max error to 100 from 100M records using a single model is very hard.
  ○ Reducing the error from 100M to 10k is much easier to achieve even with simple models.
  ○ Reducing the error from 10k to 100 is simpler as the model can focus only on a subset of the data.
💡 Use a hierarchical approach where models can focus on smaller subsets of the data.
-
The Recursive Model Index
Take a layered approach and have each layer's models focus on a smaller subset of the data:
-
The Recursive Model Index
Take a layered approach and have each layer's models focus on a smaller subset of the data:
Reduce from 100M to 1M
Reduce from 1M to 10k
Reduce from 10k to 100
-
The Recursive Model Index
Take a layered approach and have each layer's models focus on a smaller subset of the data:
Reduce from 100M to 1M
Reduce from 1M to 10k
Reduce from 10k to 100
(Check out the math in the paper if you're interested in the details! :))
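The layered idea can be sketched as a tiny two-stage recursive model index: a routing function picks one of several stage-2 models, and each stage-2 model is a linear fit over only the keys routed to it. This is an illustrative sketch (the routing function, fanout, and toy data are made up), not the paper's implementation:

```python
# Two-stage RMI sketch over a sorted key array. Stage 1 routes a key
# to a stage-2 model; each stage-2 model is a least-squares line fit
# to its own subset of keys, so it can be simple yet locally accurate.

def fit_linear(keys, positions):
    n = len(keys)
    if n == 0:
        return (0.0, 0.0)
    mk, mp = sum(keys) / n, sum(positions) / n
    var = sum((k - mk) ** 2 for k in keys)
    a = sum((k - mk) * (p - mp) for k, p in zip(keys, positions)) / var if var else 0.0
    return (a, mp - a * mk)

def build_rmi(keys, fanout):
    lo, hi = keys[0], keys[-1]
    # Stage 1: a trivial linear router over the key range.
    route = lambda k: min(fanout - 1, int((k - lo) * fanout / (hi - lo + 1)))
    buckets = [([], []) for _ in range(fanout)]
    for pos, k in enumerate(keys):
        ks, ps = buckets[route(k)]
        ks.append(k); ps.append(pos)
    models = [fit_linear(ks, ps) for ks, ps in buckets]
    return route, models

def predict(route, models, key):
    a, b = models[route(key)]
    return int(round(a * key + b))

keys = list(range(0, 1000, 10))        # 100 evenly spaced keys
route, models = build_rmi(keys, 10)
print(predict(route, models, 500))     # 50
```

A real RMI would follow the model prediction with a bounded local search; the sketch only shows the staged prediction.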
-
Hybrid End-to-End Training
With a layered approach we can build mixtures of models!

[Diagram: stage 1 is a small ReLU NN (reduce from 100M to 1M); stage 2 is linear regression models (reduce from 1M to 10k); stage 3 mixes linear regression models and B-trees (reduce from 10k to 100)]
-
Hybrid End-to-End Training
Starting from the entire dataset (line 3), the training algorithm first trains the top-node model. Based on the prediction of this model, it then picks the model from the next stage (lines 9 and 10) and adds all keys which fall into that model (line 10). Finally, in the case of hybrid indexes, the index is optimized by replacing NN models with B-trees if the absolute min/max error is above a predefined threshold (lines 11-14). (Line numbers refer to the algorithm listing in the paper.)
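The final hybrid step can be sketched as follows: after a stage is trained, any model whose worst absolute error exceeds a threshold is swapped for a classic structure. This is an illustrative sketch (the linear-fit models, the toy data, and the sorted list standing in for a B-tree are assumptions, not the paper's code):

```python
# Hybrid step sketch: replace badly performing linear models with a
# fallback structure (a plain sorted list stands in for a B-tree).

def hybridize(models, buckets, threshold):
    """models: list of (a, b) line fits; buckets: per-model (keys, positions)."""
    hybrid = []
    for (a, b), (keys, positions) in zip(models, buckets):
        worst = max((abs(round(a * k + b) - p)
                     for k, p in zip(keys, positions)), default=0)
        if worst > threshold:
            hybrid.append(("btree", sorted(keys)))   # fall back to exact search
        else:
            hybrid.append(("model", (a, b)))
    return hybrid

models = [(0.1, 0.0), (0.1, 0.0)]
buckets = [([0, 10, 20], [0, 1, 2]),   # well fit by pos = 0.1 * key
           ([0, 1, 100], [0, 1, 2])]   # badly fit by the same line
hybrid = hybridize(models, buckets, threshold=2)
print([kind for kind, _ in hybrid])    # ['model', 'btree']
```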
-
Hybrid End-to-End Training
Worst case is a B-tree!
-
Search Strategies
To find the record, either binary search or scanning is used. Models might generate more information than just the page location.
● Model Binary Search
  ○ Set the first middle point to the pos predicted by the model.
● Biased Search
  ○ Use the standard deviation σ of the last-stage model to set the middle point.
● Biased Quaternary Search
  ○ Pick three middle points: pos − σ, pos, pos + σ.
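Model binary search, the first strategy above, can be sketched by simply starting the probe at the model's prediction instead of the array's midpoint. A minimal illustration (the key array and predictions are made up):

```python
# Model-biased binary search: first probe at the predicted position,
# then fall back to ordinary halving if the prediction misses.

def model_binary_search(keys, target, predicted_pos):
    lo, hi = 0, len(keys) - 1
    mid = max(lo, min(hi, predicted_pos))  # clamp prediction into range
    while lo <= hi:
        if keys[mid] == target:
            return mid
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
        mid = (lo + hi) // 2
    return -1

keys = [2, 3, 5, 7, 11, 13, 17, 19]
print(model_binary_search(keys, 13, 5))  # 5 (found on the first probe)
print(model_binary_search(keys, 7, 6))   # 3
```

With an accurate model the first probe usually hits, which is where the speedup over plain binary search comes from.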
-
Indexing Strings
● Turn strings into inputs the NN model can use.
  ○ Represent a string as a vector, where each element is the decimal ASCII value of a char.
  ○ Limit the size of the vector to N to have equally-sized inputs.
● Vector inputs slow the model down significantly.
● Further research is needed to speed this case up :)
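The encoding described above is straightforward to sketch: truncate or pad every string to a fixed length N of ASCII codes. The length 8 and pad value 0 here are illustrative choices, not from the paper:

```python
# Encode a string as a fixed-length vector of decimal ASCII values,
# truncating long strings and zero-padding short ones so every model
# input has the same size N.

def encode_string(s, n=8, pad=0):
    codes = [ord(c) for c in s[:n]]          # decimal ASCII per char
    return codes + [pad] * (n - len(codes))  # pad short strings to N

print(encode_string("key"))            # [107, 101, 121, 0, 0, 0, 0, 0]
print(encode_string("longerkeyname"))  # first 8 chars only
```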
-
Inserts and Updates
● Appends
  ○ No need to relearn if the model is able to learn the key trend for the new items.
● Inserts in the middle
  ○ If inserts follow roughly a similar pattern as the learned CDF, retraining is not needed since the index "generalizes" over the new items, and inserts become an O(1) operation.
-
Inserts and Updates
If we have a model that is more general, it is cheaper to insert new values, since they will follow the trend.
-
Hashmap
Hashmaps use a hash function to deterministically map keys to random positions inside an array.
-
Hashmap
The main challenge is to reduce conflicts.
● Use a linked list to handle the "overflow".
● Use linear or quadratic probing.
● Most solutions allocate significantly more memory than records and combine it with additional data structures.
  ○ Dense hashmap: typical overhead of 78% memory.
  ○ Sparse hashmap: only 4 bits overhead, but up to 3-7 times slower because of its search and data placement strategy.
-
Hashmap
● If we could learn a model which uniquely maps every key into a unique position inside the array, we could avoid conflicts.
● Learned models are capable of reaching higher utilization of the hashmap, depending on the data distribution.
● Scale the distribution by the targeted size M of the hashmap and use h(K) = F(K) ∗ M, where F is the model of the CDF over keys K.
● If the model F perfectly learned the distribution, no conflicts would exist.
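The h(K) = F(K) ∗ M idea can be sketched with the empirical CDF standing in for the learned model F. With an exactly learned CDF and M equal to the number of keys, every key lands in its own slot. Illustrative toy keys, not from the paper:

```python
import bisect

# Learned-hash sketch: h(K) = F(K) * M, where F is (here) the
# empirical CDF of the sorted keys and M is the table size.

def make_learned_hash(sorted_keys, m):
    n = len(sorted_keys)
    def h(key):
        rank = bisect.bisect_left(sorted_keys, key)  # ≈ F(key) * n
        return min(m - 1, rank * m // n)
    return h

keys = sorted([3, 17, 42, 99, 250, 251, 1000, 4096])
h = make_learned_hash(keys, 8)
print([h(k) for k in keys])  # [0, 1, 2, 3, 4, 5, 6, 7] -- no conflicts
```

With a perfectly learned CDF the slots are distinct, matching the last bullet; an imperfect model F would reintroduce some conflicts.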
-
Bloom filter
Bloom filters are probabilistic data structures used to test whether an element is a member of a set.
[Diagram: Bloom filter insertion vs. learned Bloom filter insertion]
-
Bloom filter
● A bloom filter index needs to learn a function that separates keys from everything else.
  ○ A good hash function for a bloom filter should have lots of collisions among keys and lots of collisions among non-keys, but few collisions of keys and non-keys.
● As a classification problem: learn a model f that can predict if an input x is a key or a non-key.
-
Bloom filter
● As a classification problem: learn a model f that can predict if an input x is a key or a non-key.
  ○ Use sigmoid neurons to produce a probability between 0 and 1.
  ○ The output of the NN is the probability that input x is a key in our database.
  ○ Choose a threshold t above which we will assume the key exists in our database.
  ○ Tune the threshold t to achieve the desired false positive rate.
  ○ To prevent false negatives, use an overflow bloom filter.
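The bullets above fit in a few lines: a model scores each input, a threshold t decides "key", and the keys the model misses go into an overflow structure so there are never false negatives. The scoring function below is a toy stand-in for a trained NN, and a plain set stands in for the overflow bloom filter:

```python
# Learned Bloom filter sketch: model score + threshold t, with an
# overflow set catching the model's false negatives.

def build_learned_filter(keys, score, t):
    overflow = {k for k in keys if score(k) <= t}  # model's false negatives
    return lambda x: score(x) > t or x in overflow

keys = {"alpha", "beta", "gamma"}
score = lambda s: 0.9 if s.startswith(("a", "b")) else 0.1  # toy "model"
contains = build_learned_filter(keys, score, t=0.5)

print(all(contains(k) for k in keys))  # True: no false negatives
print(contains("delta"))               # False
```

Note the asymmetry: non-keys the model scores above t are false positives (allowed, tunable via t), while false negatives are impossible by construction.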
-
Results
-
B-tree Results
● 4 datasets to compare the performance of learned index structures with B-trees.
  ○ Compare lookup time (model execution time + local search time).
  ○ Compare index structure size.
  ○ Compare model error and error variance.
● These results focus on read performance only; loading and insertion time are not included.
  ○ A model without hidden layers can be trained on over 200M records in just a few seconds.
-
Web Log Dataset
200M log entries for requests to a major university website. Index over all unique timestamps.
-
Web Log Dataset
The model error is the averaged standard error over all models on the last stage, whereas the error variance indicates how much this standard error varies between the models.
-
Web Log Dataset
Model is 3× faster and up to an order of magnitude smaller.
-
Web Log Dataset
Quaternary search only helps a little bit.
-
Web Log Dataset
The error is high, which influences the search time.
-
Maps Dataset
Index of the longitude of ≈ 200M user-maintained features across the world. Relatively linear.
-
Maps Dataset
Model is 3× faster and up to an order-of-magnitude smaller.
-
Maps Dataset
Quaternary search does not help.
-
Lognormal Dataset
Synthetic dataset of 190M unique values to test how the index works on heavy-tail distributions. Highly non-linear, making the distribution more difficult to learn.
-
Lognormal Dataset
The error is high, which influences the search time.
-
Important Observations
● Learned indexes are 3× faster and up to an order of magnitude smaller.
● Quaternary search only helps for some datasets.
● The model accuracy varies widely; most noticeably for the synthetic dataset and the weblog data, the error is much higher.
● Second-stage size has a significant impact on the index size and lookup performance.
  ○ This is not surprising, as the second stage determines how many models have to be stored. Worth noting is that the second stage uses 10,000 or more models.
-
Web Document Dataset
The web-document dataset consists of the 10M non-continuous document-ids of a large web index used as part of a real product at a large internet company.
-
Web Document Dataset
Speedups for learned indexes are not as prominent, so hybrid indexes, which replace badly performing models with B-trees, actually help to improve performance.
-
Web Document Dataset
Because the cost of searching is higher, the different search strategies make a bigger difference. The reason why biased search and quaternary search perform better is that they can take the standard error into account.
-
Hashmap Results
● Use 3 integer datasets.
● The model hash has similar performance and utilizes the memory better.
● When there are extra slots, the improvement disappears.
-
Bloom filter Results
● Blacklisted phishing URLs dataset: 1.7M unique URLs.
● The more accurate the model is, the better the savings in bloom filter size.
-
Bloom filter Results
● A normal Bloom filter with a desired 1% false positive rate requires 2.04MB.
● For a 16-dim GRU with a 32-dim embedding for each character, the model is 0.0259MB; with the spillover it is 1.07MB.
-
Conclusion
-
Conclusion and Future Work
● Multi-Dimensional Indexes: Extend learned indexes to multi-dimensional index structures. Models, especially neural nets, are extremely good at capturing complex high-dimensional relationships.
● Learned Algorithms: A model can also speed up sorting and joins, not just indexes.
● GPUs/TPUs: GPUs/TPUs will make the idea of learned indexes even more viable.
-
Next time
● Is this a good idea?
● Related work
● Some Notes on "Learned Bloom Filters"
● Don't Throw Out Your Algorithms Book Just Yet
https://mybiasedcoin.blogspot.com/2018/01/some-notes-on-learned-bloom-filters.html?m=1
http://dawn.cs.stanford.edu/2018/01/11/index-baselines/