faster persistent data structures through hashingnodes behaving in a good karmic manner.the graph is...
TRANSCRIPT
![Page 2: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/2.jpg)
Motivating problem: Twitter data analysis
I'm computing a communication graph from Twitterdata and then scan it daily to allocate social capital tonodes behaving in a good karmic manner. The graphis culled from 100 million tweets and has about 3million nodes.
We need a data structure that isI fast when used with string keys, andI doesn't use too much memory.
![Page 3: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/3.jpg)
Persistent maps in Haskell
I Data.Map is the most commonly used map type.I It's implemented using size balanced trees and is
representative of the performance of other binary treeimplementations.
I Keys can be of any type, as long as values of the type canbe ordered.
![Page 4: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/4.jpg)
Real world performance of Data.Map
I Good in theory: no more than O(log n) comparisons.I Not great in practice: up to O(log n) comparisons!I Many common types are expensive to compare e.gString, ByteString, and Text.
I Given a string of length k, we need O(k ∗ log n)comparisons to look up an entry.
![Page 5: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/5.jpg)
Hash tables
I Hash tables perform well with string keys: O(k) amortizedlookup time for strings of length k.
I However, we want persistent maps, not mutable hashtables.
![Page 6: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/6.jpg)
Milan Straka's idea: IntMaps as arrays
I We can use hashing without using hash tables!I Data.IntMap implements a persistent array and is much
faster than Data.Map.I Use hashing to derive an Int from an arbitrary key.
class Hashable a wherehash :: a -> Int
![Page 7: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/7.jpg)
Collisions are easy to deal with
I IntMap implements a sparse, persistent array of size 232
(or 264).I Hashing using this many buckets makes collisions rare: for
224 entries we expect about 32,000 single collisions.I Implication: We can use any old collision handling strategy
(e.g. chaining using linked lists).
![Page 8: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/8.jpg)
HashMap implemented using an IntMap
Naive implementation:newtype HashMap k v = HashMap (IntMap [(k, v)])
By inlining (``unpacking'') the list and pair constructors we cansave 2 words of memory per key/value pair.
![Page 9: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/9.jpg)
Benchmark: Map vs HashMap
Keys: 212 random 8-byte ByteStrings
Runtime (μs) RuntimeMap HashMap % increase
lookup 1956 916 -53%insert 3543 1855 -48%
delete 3791 1838 -52%
![Page 10: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/10.jpg)
Can we do better?
I Imperative hash tables still perform better, perhaps there'sroom for improvement.
I We still need to perform O(min(W, log n)) Intcomparisons, where W is the number of bits in a word.
I The memory overhead per key/value pair is still high, about9 words per key/value pair.
![Page 11: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/11.jpg)
Borrowing from our neighbours
I Clojure uses a hash-array mapped trie (HAMT) datastructure to implement persistent maps.
I Described in the paper ``Ideal Hash Trees'' by Bagwell(2001).
I Originally a mutable data structure implemented in C++.I Clojure's persistent version was created by Rich Hickey.
![Page 12: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/12.jpg)
Hash-array mapped tries
I Shallow tree with high branching factor.I Each node, except the leaf nodes, contains an array of up
to 32 elements.I 5 bits of the hash are used to index the array at each level.I A clever trick, using bit population count, is used to
represent sparse arrays.
![Page 13: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/13.jpg)
HAMT
BitmapIndexed
Leaf
Hash
![Page 14: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/14.jpg)
The Haskell definition of a HAMT
data HashMap k v= Empty| BitmapIndexed !Bitmap !(Array (HashMap k v))| Leaf !Hash !k v| Full !(Array (HashMap k v))| Collision !Hash !(Array (Leaf k v))
type Bitmap = Wordtype Hash = Intdata Array a = Array (Array# a)
![Page 15: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/15.jpg)
High performance Haskell programming
Optimized implementation using standard techniques:I constructor unpacking,I GHC's new INLINABLE pragma, andI paying careful attention to strictness.
insert performance still bad (e.g compare to hash tables).
![Page 16: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/16.jpg)
Optimizing insertion
I Most time in insert is spent copying small arrays.I Array copying is implemented in Haskell and GHC doesn't
apply enough loop optimizations to make it run fast.I When allocating arrays GHC fills the array with dummy
elements, which are immediately overwritten.
![Page 17: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/17.jpg)
Optimizing insertion: copy less
I Bagwell's original formulation used a fanout of 32.I A fanout of 16 seems to provide a better trade-off betweenlookup and insert performance in our setting.
I Improved performance by 14%
![Page 18: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/18.jpg)
Optimizing insertion: copy faster
I Daniel Peebles and I have implemented a set of newprimops for copying arrays in GHC.
I The implementation generates straight-line code for copiesof statically known small size, and uses a fast memcpyotherwise.
I Improved performance by 20%
![Page 19: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/19.jpg)
Optimizing insertion: common patterns
I In many cases maps are created in one go from asequence of key/value pairs.
I We can optimize for this case by repeatedly mutating theHAMT and freezing it when we're done.
Keys: 212 random 8-byte ByteStrings
Runtime (%)fromList/pure 100
fromList/mutating 50
![Page 20: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/20.jpg)
Optimizing lookup: Faster population count
I Tried several bit population count implementations.I Best speed/memory-use trade-off is a lookup table based
approach.I Using the POPCNT SSE 4.2 instructions improves the
performance of lookup by 12%.
![Page 21: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/21.jpg)
Benchmark: IntMap-based vs HAMT
Keys: 212 random 8-byte ByteStrings
Runtime (μs) RuntimeIntMap HAMT % increase
lookup 916 477 -48%insert 1855 1998 8%
delete 1838 2303 25%
The benchmarks don't include the POPCNT optimization, due toit not being available on many architectures.
![Page 22: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/22.jpg)
Memory usage: IntMap-basedTotal: 96 MB, tree: 66MB (220 Int entries)
mem 43,678,549 bytes x seconds Thu Sep 22 13:48 2011
seconds0.0 0.2 0.4 0.6
byte
s
0M
20M
40M
60M
80M
main:Data.HashMap.Common.Bin
ghc-prim:GHC.Types.I#
main:Data.HashMap.Common.Tip
![Page 23: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/23.jpg)
Memory usage: HAMTTotal: 71MB, tree: 41MB (220 Int entries)
mem 25,805,095 bytes x seconds Thu Sep 22 13:49 2011
seconds0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5
byte
s
0M
20M
40M
60M
main:Data.HashMap.Base.BitmapIndexed
MUT_ARR_PTRS_FROZEN
main:Data.HashMap.Base.Leaf
ghc-prim:GHC.Types.I#
![Page 24: Faster persistent data structures through hashingnodes behaving in a good karmic manner.The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure](https://reader033.vdocuments.net/reader033/viewer/2022041910/5e67211724ca2369fa4fe201/html5/thumbnails/24.jpg)
Summary
Keys: 212 random 8-byte ByteStrings
Runtime (μs) RuntimeMap HAMT % increase
lookup 1956 477 -76%insert 3543 1998 -44%
delete 3791 2303 -39%