Google File System


Upload: anurag-gautam

Post on 15-Jul-2015


High-performance, scalable, distributed file system.

Batch-oriented, data-intensive apps.

Fault-tolerant.

Inexpensive commodity hardware.

Google uses GFS to organize huge files and to give application developers the research and development resources they require.

GFS is unique to Google and isn't for sale.

Inexpensive commodity hardware.

Modest number of large files.

Large streaming reads, small random reads (MapReduce).

Mostly appends.

Consistent concurrent execution is important.

High sustained throughput matters more than low latency.

Need a large, distributed, highly fault-tolerant file system.

GFS CLIENT

GFS MASTER SERVER

GFS CHUNKSERVER

GFS CLIENT

Sends control (metadata) requests to the master server.

Sends data requests directly to chunkservers.

Caches metadata.

Does not cache file data.


GFS MASTER SERVER

Manages metadata.

Manages chunk creation, replication, and placement.

Performs checkpointing and logs changes to metadata.

Performs garbage collection.

Periodically communicates with chunkservers (HeartBeat messages).
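The heartbeat exchange can be pictured with a small sketch. This is only an illustration, not Google's implementation: the `HeartbeatMonitor` class, its method names, and the 60-second timeout are all assumptions made for the example.

```python
import time


class HeartbeatMonitor:
    """Hypothetical master-side helper: remembers when each chunkserver last reported."""

    def __init__(self, timeout_s=60.0):
        # The timeout value is an assumption; the slides do not give one.
        self.timeout_s = timeout_s
        self.last_seen = {}  # server id -> time of the last heartbeat

    def record(self, server_id, now=None):
        """Called whenever a heartbeat message arrives from a chunkserver."""
        self.last_seen[server_id] = time.monotonic() if now is None else now

    def dead_servers(self, now=None):
        """Return the servers whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            server for server, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

A master built along these lines could re-replicate the chunks held by any server that `dead_servers()` reports.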


GFS CHUNKSERVER

• Files are divided into fixed-size chunks.

• Chunkservers store chunks on local disk as Linux files.

• Each chunk has a unique 64-bit chunk handle.

• Chunks are replicated for reliability.

• Chunk size: 64 MB, much larger than typical file system block sizes.
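Because chunks have a fixed 64 MB size, turning a byte offset in a file into a chunk index is plain integer arithmetic. The sketch below shows that calculation; the function names `locate` and `chunks_for_range` are made up for this example.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in the slides


def locate(offset):
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset, length):
    """List the chunk indexes touched by reading `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

For example, a 2-byte read that starts on the last byte of chunk 0 touches chunks 0 and 1, so a client would need the locations of both.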


CREATE

READ

WRITE

CREATE

(Diagram: GFS client application, GFS master, and GFS chunkservers in rack 1 and rack 2.)

• The application, through the GFS client, sends "Create /home/user/filename" to the GFS master.

• The master updates the operation log and updates the metadata.

• The master chooses locations for the chunks: across multiple racks, across multiple networks, on machines with low contention, and on machines with low disk use.

• The master returns the chunk handle and the chunk locations.
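The placement step of a create can be sketched as a simple ranking. This is a toy policy written for illustration, not the master's actual algorithm: the `ChunkServer` record, the `choose_replicas` function, and the replica count of 3 are assumptions, and network topology and contention are left out.

```python
from dataclasses import dataclass


@dataclass
class ChunkServer:
    name: str
    rack: str
    disk_used: float  # fraction of disk in use, 0.0 to 1.0


def choose_replicas(servers, count=3):
    """Pick `count` servers, spreading across racks and preferring low disk use."""
    ranked = sorted(servers, key=lambda s: (s.disk_used, s.name))
    chosen, used_racks = [], set()
    # First pass: take the least-loaded server from each rack.
    for server in ranked:
        if len(chosen) == count:
            break
        if server.rack not in used_racks:
            chosen.append(server)
            used_racks.add(server.rack)
    # Second pass: if there are fewer racks than replicas, fill up by disk use.
    for server in ranked:
        if len(chosen) == count:
            break
        if server not in chosen:
            chosen.append(server)
    return [server.name for server in chosen]
```

Spreading replicas across racks first means that losing a whole rack still leaves a copy elsewhere; disk use only breaks ties within that constraint.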

READ

• The client sends the filename and chunk index to the master.

• The master returns the chunk handle and the server locations.

• The client sends the chunk handle and byte range to a chunkserver.

• The chunkserver returns the data.
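The read exchange can be mimicked with in-memory objects. The sketch below is a toy model under stated assumptions: `ToyMaster`, `ToyChunkServer`, and `ToyClient` are hypothetical names, network calls are replaced by method calls, and a read is assumed to stay inside one chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


class ToyMaster:
    """Holds metadata only: (filename, chunk index) -> (chunk handle, locations)."""

    def __init__(self):
        self.files = {}
        self.lookups = 0  # counts metadata requests, to show the effect of caching

    def lookup(self, filename, chunk_index):
        self.lookups += 1
        return self.files[(filename, chunk_index)]


class ToyChunkServer:
    """Holds chunk data keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]


class ToyClient:
    def __init__(self, master, servers):
        self.master = master
        self.servers = servers  # server name -> ToyChunkServer
        self.cache = {}         # cached metadata; file data is never cached

    def read(self, filename, offset, length):
        index, start = divmod(offset, CHUNK_SIZE)
        key = (filename, index)
        if key not in self.cache:  # ask the master only on a cache miss
            self.cache[key] = self.master.lookup(filename, index)
        handle, locations = self.cache[key]
        # Data comes straight from a chunkserver, not through the master.
        return self.servers[locations[0]].read(handle, start, length)
```

A second read of the same chunk is served from the client's metadata cache, so the master sees only one lookup.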

WRITE

• The client sends the chunk id and chunk offset to the master.

• The master returns the chunkserver locations; the client caches this.

• The client sends the data to a chunkserver, and each chunkserver passes the data along to the nearest replica.

• The client sends the write operation to the primary chunkserver, which serializes all concurrent writes.

• The primary sends the serialized order of writes to the other replicas.

• The replicas acknowledge (ack, ack, ack).

• The client receives an ack and the chunk index.
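The separation between pushing data and ordering writes can be shown with a small in-memory model. This is a sketch under assumptions, not the real protocol: `Replica` and `Primary` are hypothetical classes, data is pushed to each replica directly rather than passed along a chain, and failures are ignored.

```python
class Replica:
    """A chunkserver replica: buffers pushed data, then applies it when told to."""

    def __init__(self):
        self.buffered = {}        # data id -> bytes pushed but not yet written
        self.chunk = bytearray()  # contents of the single chunk in this example

    def push(self, data_id, data):
        self.buffered[data_id] = data

    def apply(self, data_id, offset):
        data = self.buffered.pop(data_id)
        end = offset + len(data)
        if len(self.chunk) < end:  # grow the chunk with zero bytes if needed
            self.chunk.extend(b"\0" * (end - len(self.chunk)))
        self.chunk[offset:end] = data
        return "ack"


class Primary(Replica):
    """The replica that picks one order for all concurrent writes."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id, offset):
        serial = self.next_serial  # serial number fixes the order of this write
        self.next_serial += 1
        acks = [self.apply(data_id, offset)]
        # Secondaries apply the write in the order the primary chose.
        acks += [s.apply(data_id, offset) for s in self.secondaries]
        return serial, all(a == "ack" for a in acks)
```

Because every replica applies writes in the primary's order, all copies of the chunk end up with the same bytes even when writes overlap.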

Extremely cheap hardware, so a high failure rate.

Highly concurrent reads and writes.

Highly scalable.

Supports undelete (for a configurable time).
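One way to picture undelete is a namespace that hides deleted files for a retention period before reclaiming them. The sketch below is an assumption-laden illustration: the `Namespace` class, its method names, and the use of plain numbers for timestamps are invented for the example.

```python
class Namespace:
    """Hypothetical namespace with delayed deletion."""

    def __init__(self, retention_s):
        self.retention_s = retention_s  # the configurable undelete window
        self.files = {}                 # path -> file contents (placeholder)
        self.trash = {}                 # path -> (deletion time, contents)

    def delete(self, path, now):
        """Hide the file instead of reclaiming its storage right away."""
        self.trash[path] = (now, self.files.pop(path))

    def undelete(self, path):
        """Restore a file that has not been garbage collected yet."""
        _, contents = self.trash.pop(path)
        self.files[path] = contents

    def collect(self, now):
        """Garbage collection pass: drop files deleted longer ago than the window."""
        expired = [
            path for path, (deleted_at, _) in self.trash.items()
            if now - deleted_at > self.retention_s
        ]
        for path in expired:
            del self.trash[path]
        return sorted(expired)
```

Until `collect` runs past the retention window, `undelete` can bring a file back; afterwards the file is gone for good.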