hopsfs & epipe - github pages · 2018-10-03 · hopsfs & epipe 1 distributed computing and...
TRANSCRIPT
Mahmoud Ismail, KTH
HopsFS & ePipe
Distributed Computing and Analytics Workshop, September 26th 2018
Problem: Data layer to store millions of these images and their annotations
At Scale
Open Images Dataset (many copies), Dataset X
Requirements
• Reading/Writing millions of images with high throughput
• Attaching annotations to each image, and then searching using these annotations
HDFS
Hadoop Software Stack
HDFS Architecture
HDFS Client (File1): Where can I save the file?
NameNode: DataNodes Addresses
NameNode: File System Metadata, File Blocks Mappings
File1 Blk1 → DN1, Blk2 → DN5, Blk3 → DN3
File2 Blk1 → DN1, Blk2 → DN4
File3 Blk1 → DN1, Blk2 → DN2, Blk3 → DN3
File4 Blk1 → DN100
File5 Blk1 → DN4, Blk2 → DN2, Blk3 → DN9
…
FileN Blk1 → DN2, Blk2 → DN8
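The file-to-block mapping on this slide can be pictured as a single in-memory table. The following is a minimal sketch, not HDFS code: the `NameNode` class and its `add_file` and `locate` methods are hypothetical names chosen for illustration, and only the block placements come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NameNode:
    """Toy stand-in for a NameNode: every mapping lives in one process's memory."""

    # file name -> ordered list of (block id, DataNode id); names are illustrative
    block_map: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def add_file(self, name: str, placements: List[Tuple[str, str]]) -> None:
        """Record which DataNode holds each block of a file."""
        self.block_map[name] = list(placements)

    def locate(self, name: str) -> List[Tuple[str, str]]:
        """Answer a client asking where the blocks of `name` are stored."""
        return self.block_map[name]


namenode = NameNode()
namenode.add_file("File1", [("Blk1", "DN1"), ("Blk2", "DN5"), ("Blk3", "DN3")])
namenode.add_file("File4", [("Blk1", "DN100")])

print(namenode.locate("File1"))
```

Because the whole table sits in one process, both its size and its request rate are bounded by that one machine.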
HDFS Performance at Scale
DataNode
NameNode
HDFS Limitations
• Namespace size upper bound: ~ 500 million files
• At most 70-80 thousand file system operations/sec
HopsFS
HopsFS Architecture
NameNode
File System Metadata, File Blocks Mappings:
File1 Blk1 → DN1, Blk2 → DN4, Blk3 → DN5
File2 Blk1 → DN1, Blk2 → DN4
File3 Blk1 → DN1, Blk2 → DN2, Blk3 → DN3
File4 Blk1 → DN100
File5 Blk1 → DN4, Blk2 → DN2, Blk3 → DN9
FileN Blk1 → DN2, Blk2 → DN8
Distributed Database
File Blocks Mappings: File1 Metadata, File2 Metadata
File Blocks Mappings: File3 Metadata, File4 Metadata
File Blocks Mappings: File5 Metadata, File6 Metadata
File Blocks Mappings: File7 Metadata, File8 Metadata
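The move from one in-memory table to a distributed database can be sketched as splitting the mapping into partitions. The slides do not say how the metadata is partitioned, so the hash-by-name scheme below is purely an assumption, as are the `ShardedMetadataStore` class and the `partition_for` helper; the four partitions mirror the four groups drawn on the slide.

```python
import zlib

NUM_PARTITIONS = 4  # assumption: the slide draws four groups of metadata


def partition_for(name: str) -> int:
    """Pick a partition with a stable hash (crc32); the scheme is illustrative only."""
    return zlib.crc32(name.encode("utf-8")) % NUM_PARTITIONS


class ShardedMetadataStore:
    """Toy distributed database: one dict per partition instead of one big map."""

    def __init__(self) -> None:
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def put(self, name, placements):
        self.partitions[partition_for(name)][name] = list(placements)

    def get(self, name):
        return self.partitions[partition_for(name)][name]


store = ShardedMetadataStore()
for i in range(1, 9):
    store.put(f"File{i}", [("Blk1", f"DN{i}")])

print(store.get("File3"))
print([len(p) for p in store.partitions])  # how the eight entries spread out
```

Each lookup touches only the partition that owns the file, so capacity and throughput are no longer tied to a single process.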
HopsFS Scalability
• 16X-37X the throughput of HDFS
• 37 times more files than HDFS
• 10 times lower latency
Scale Challenge Winner (2017)
Hops
Integration with NVMe
https://cloud.google.com/compute/docs/disks/performance
HDFS (and S3) are designed around large blocks (optimized to overcome slow random I/O on disks), while new NVMe hardware supports fast random disk I/O (and potentially small block sizes).
Small files
a. File Size Distribution (CDF vs. file size, 1 KB to 128 GB): Yahoo HDFS File Distribution, Spotify HDFS File Distribution, LC HopsFS File Distribution
At Yahoo! and Spotify, ≈20% of the files are smaller than 4 KB. In Logical Clocks' HopsFS cluster, ≈68% of the files are smaller than 4 KB.
b. File Operations Distribution (CDF vs. file size, 1 KB to 128 GB): Spotify HDFS File Ops Distribution, LC HopsFS File Ops Distribution
At Spotify and Logical Clocks, ≈42% and ≈18% of all file system operations, respectively, are performed on files smaller than 16 KB.
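A figure such as "≈20% of the files are smaller than 4 KB" is a point on an empirical CDF. Here is a minimal sketch of that computation; the sample sizes are made up for illustration and are not taken from any of the clusters in the talk, and `fraction_smaller_than` is a hypothetical helper.

```python
from bisect import bisect_left

KB = 1024


def fraction_smaller_than(sizes_bytes, threshold_bytes):
    """Share of files strictly smaller than the threshold (one point on the CDF)."""
    ordered = sorted(sizes_bytes)
    return bisect_left(ordered, threshold_bytes) / len(ordered)


# Made-up file sizes in bytes, for illustration only.
sample_sizes = [512, 1 * KB, 2 * KB, 3 * KB, 10 * KB, 64 * KB, 5 * 1024 * KB, 300 * 1024 * KB]

print(fraction_smaller_than(sample_sizes, 4 * KB))   # 0.5
print(fraction_smaller_than(sample_sizes, 16 * KB))  # 0.625
```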
Size Matters
Small Files performance in HopsFS
Open Images dataset
83.5% of the files in the dataset are ⩽ 64 KB.
Open Images Dataset
Requirements
• Reading/Writing millions of images with high throughput
• Attaching annotations to each image, and then searching using these annotations
Attaching Extended Metadata
attach /images/1.jpeg '1 cat and 1 guitar'
Foreign key
Free text search?
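The foreign-key idea can be sketched with two tables: one for files and one for annotations that reference them. This uses SQLite from the Python standard library as a stand-in; the table names, columns, and the `attach` helper are assumptions made for illustration, not the HopsFS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE inodes (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)")
conn.execute(
    """CREATE TABLE extended_metadata (
           inode_id INTEGER NOT NULL REFERENCES inodes(id) ON DELETE CASCADE,
           annotation TEXT NOT NULL
       )"""
)


def attach(path, annotation):
    """Attach an annotation to an existing file, mirroring `attach <path> '<text>'`."""
    row = conn.execute("SELECT id FROM inodes WHERE path = ?", (path,)).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    conn.execute("INSERT INTO extended_metadata VALUES (?, ?)", (row[0], annotation))


conn.execute("INSERT INTO inodes (path) VALUES ('/images/1.jpeg')")
attach("/images/1.jpeg", "1 cat and 1 guitar")
attached = conn.execute("SELECT COUNT(*) FROM extended_metadata").fetchone()[0]

# Deleting the file removes its annotations through the foreign key.
conn.execute("DELETE FROM inodes WHERE path = '/images/1.jpeg'")
remaining = conn.execute("SELECT COUNT(*) FROM extended_metadata").fetchone()[0]
print(attached, remaining)
```

The foreign key keeps annotations consistent with the files they describe, but a plain relational lookup like this does not by itself answer free-text queries.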
HopsFS | ElasticSearch
HopsFS
1.jpeg: 1 Dog; 1 Cat and 1 Guitar
ElasticSearch
dog → [1.jpeg, ……]
Get all images that have a dog
?
Store X
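The question mark on this slide can be read as a consistency problem between the file system and the search index. The sketch below follows one possible reading, in which the annotation on 1.jpeg changes after it was indexed; that reading, the `InvertedIndex` class, and the `tokenize` helper are assumptions, and no real ElasticSearch API is used.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphabetic words only, so '1 Dog' yields ['dog']."""
    return re.findall(r"[a-z]+", text.lower())


class InvertedIndex:
    """Toy stand-in for a search index: term -> set of file names."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, name, annotation):
        for term in tokenize(annotation):
            self.postings[term].add(name)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


annotations = {"1.jpeg": "1 Dog"}  # what the file system holds
index = InvertedIndex()
index.index("1.jpeg", annotations["1.jpeg"])
before = index.search("dog")

# The annotation changes in the file system, but nobody tells the index.
annotations["1.jpeg"] = "1 Cat and 1 Guitar"
stale = index.search("dog")    # still lists 1.jpeg
missing = index.search("cat")  # the new annotation is not searchable
print(before, stale, missing)
```

Without a mechanism that forwards file system changes to the index, every external store drifts in this way.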
ePipe
HopsFS
NDB
Log fs changes
ChangeStream
ePipe
ElasticSearch
Store X
Store Y
App A
App B
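The diagram suggests a single stream of logged changes being delivered to several consumers. A minimal sketch of that fan-out follows, assuming each consumer simply receives every change in stream order; `ListSink` and `replicate` are hypothetical names, and real delivery to ElasticSearch or other stores is not modeled.

```python
class ListSink:
    """Hypothetical sink that just records what it receives."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def apply(self, change):
        self.received.append(change)


def replicate(change_stream, sinks):
    """Deliver every change, in stream order, to every registered sink."""
    for change in change_stream:
        for sink in sinks:
            sink.apply(change)


changes = [("create", "f1"), ("append", "f1"), ("create", "f2")]
sinks = [ListSink("ElasticSearch"), ListSink("Store X"), ListSink("App A")]
replicate(iter(changes), sinks)

print(all(sink.received == changes for sink in sinks))
```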
ePipe
HopsFS, NDB, ePipe
HopsFS operations: Create f1, Append f1, Create f2, Delete f1, Delete f2
Epoch1: Create f1, Append f1 (Order?)
Epoch2: Create f2, Delete f1 (Order?)
Epoch3: Delete f2
NDB Ordering Properties
• Property 1: Epochs are totally ordered.
• Property 2: Changes within the same transaction happen in the same epoch.
• Property 3: Changes on files are ordered only if they are in different epochs; that is, no ordering is guaranteed within the same epoch.
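To see what these properties leave open, one can enumerate every delivery order they permit for the epochs in the running example. This is a small sketch, not part of NDB; `possible_delivery_orders` is a hypothetical helper.

```python
from itertools import permutations, product

# Epoch contents from the running example; epochs are listed in their total order.
epochs = [
    ["Create f1", "Append f1"],  # Epoch1
    ["Create f2", "Delete f1"],  # Epoch2
    ["Delete f2"],               # Epoch3
]


def possible_delivery_orders(epochs):
    """All sequences that respect epoch order but allow any order inside an epoch."""
    per_epoch = [list(permutations(events)) for events in epochs]
    # Concatenate one permutation per epoch into a single flat tuple.
    return [sum(choice, ()) for choice in product(*per_epoch)]


orders = possible_delivery_orders(epochs)
print(len(orders))  # 4
for order in orders:
    print(order)
```

Two of the four orders place "Append f1" before "Create f1", which is why epochs alone are not enough to replay changes on a single file correctly.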
Adding version numbers
HopsFS, NDB, ePipe
HopsFS operations with version numbers: (Create f1, 1), (Append f1, 2), (Create f2, 1), (Delete f1, 3), (Delete f2, 2)
Epoch1: (Create f1, 1), (Append f1, 2)
Epoch2: (Create f2, 1), (Delete f1, 3)
Epoch3: (Delete f2, 2)
ePipe Ordering Properties
• Property 4 & 5: Version numbers ensure serializability of changes on the same file/directory within epochs.
• Property 6: The order of changes for different files/directories within the same epoch doesn't matter.
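With a version number on every change, the per-file order can be recovered even when changes inside an epoch arrive shuffled. The following is a minimal sketch under that assumption; the tuple layout and the `per_file_history` helper are hypothetical, and the events are the ones from the running example.

```python
from collections import defaultdict

# (epoch, file, operation, version) as delivered; order inside an epoch is arbitrary.
delivered = [
    (1, "f1", "Append", 2),
    (1, "f1", "Create", 1),
    (2, "f1", "Delete", 3),
    (2, "f2", "Create", 1),
    (3, "f2", "Delete", 2),
]


def per_file_history(events):
    """Sort by (epoch, version), then split by file to recover each file's change order."""
    history = defaultdict(list)
    for epoch, name, op, version in sorted(events, key=lambda e: (e[0], e[3])):
        history[name].append(op)
    return dict(history)


print(per_file_history(delivered))
```

Changes to different files may still interleave differently from their original order within an epoch, which Property 6 says is acceptable.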
ePipe
• Low replication lag (~100 ms)
• High throughput
Requirements
• Reading/Writing millions of images with high throughput
• Attaching annotations to each image, and then searching using these annotations
Questions?