Introduction to Distributed Computing
TRANSCRIPT
Distributed Computing – Case Study
Outline
• What is distributed computing
• Case study
– Hadoop – HDFS and MapReduce
– Gluster File System
What is Distributed Computing/System?
• Distributed computing
– A field of computer science that studies distributed systems.
– The use of distributed systems to solve computational problems.
• Distributed system
– Wikipedia:
• There are several autonomous computational entities, each of which has its own local memory.
• The entities communicate with each other by message passing.
– Operating System Concepts:
• The processors communicate with one another through various communication lines, such as high-speed buses or telephone lines.
• Each processor has its own local memory.
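The message-passing model above can be sketched in Python, with threads standing in for autonomous entities. This is only an illustration on one machine: real distributed entities run on separate nodes, but the discipline is the same, each "entity" keeps its own local state and communicates only through messages.

```python
import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue):
    """Stands in for an autonomous entity: it keeps its own local
    memory and communicates only by receiving and sending messages."""
    local_sum = 0                  # local memory of this entity
    while True:
        msg = inbox.get()
        if msg is None:            # sentinel message: shut down
            break
        local_sum += msg
    outbox.put(local_sum)          # reply by message, not shared state

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()
for n in [1, 2, 3]:
    inbox.put(n)                   # message passing to the entity
inbox.put(None)
print(outbox.get())                # 6
t.join()
```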
What is Distributed Computing/System?
• Distributed program
– A computer program that runs in a distributed system
• Distributed programming
– The process of writing distributed programs
What is Distributed Computing/System?
• Common properties
– Fault tolerance
• When one or more nodes fail, the whole system can still work, though performance may degrade.
• The system needs to check the status of each node.
– Each node plays a partial role
• Each computer has only a limited, incomplete view of the system; each computer may know only one part of the input.
– Resource sharing
• Each user can share the computing power and storage resources in the system with other users.
– Load sharing
• Dispatching tasks to several nodes helps spread the load across the whole system.
– Easy to expand
• Adding nodes should take as little time as possible, ideally none.
CASE STUDY - HADOOP
Quick overview
Paramount Q1 2008 - 7
• Features
• HDFS
• Map-Reduce Framework
Features
• Large files
– Gigabytes, Terabytes
• Write once, read many
• Commodity Hardware
HDFS
• NameNode:
– manages the file system namespace and regulates access to files by clients.
– determines the mapping of blocks to DataNodes.
– maintains the fsImage and editLog.
• DataNode:
– manages the storage attached to the node it runs on.
– saves CRC checksums of the stored data.
– sends heartbeats to the NameNode.
– Each file is split into chunks (blocks), and each chunk is stored on several DataNodes.
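The block-splitting idea can be sketched as follows. The 64 MB default mirrors an early HDFS block size; the function name is illustrative, not Hadoop's API:

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, an early HDFS default block size

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks, HDFS-style.
    The last block may be smaller than block_size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# A 150-byte "file" with a 64-byte block size yields three blocks;
# in HDFS, each of these would then be replicated on several DataNodes.
blocks = split_into_blocks(b"x" * 150, block_size=64)
print([len(b) for b in blocks])  # [64, 64, 22]
```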
HDFS
• Secondary NameNode
– responsible for merging the fsImage and editLog
– despite the name, it is not a backup NameNode
HDFS architecture
Secondary namenode
• Edit log
– Transaction log
• The transaction log is updated before the content in memory is updated.
• This file is updated on every request sent to the NameNode.
• fsImage
– Persistent checkpoint of the namespace
• Secondary NameNode
– Responsible for merging the editLog and fsImage.
Secondary namenode
From Hadoop - The Definitive Guide
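The editLog/fsImage relationship can be sketched as a write-ahead log plus a periodic checkpoint. The class and method names below are illustrative, not Hadoop's actual implementation:

```python
class TinyNamenode:
    """Illustrative write-ahead-log + checkpoint scheme, loosely
    modeled on HDFS's editLog and fsImage (not Hadoop's real code)."""
    def __init__(self):
        self.fsimage = {}   # persistent checkpoint: path -> block list
        self.editlog = []   # edits made since the last checkpoint

    def apply(self, op, path, blocks=None):
        # Log the edit first (write-ahead), as the slide describes.
        self.editlog.append((op, path, blocks))

    def checkpoint(self):
        # What the secondary namenode does: replay the editLog
        # into the fsImage, then truncate the log.
        for op, path, blocks in self.editlog:
            if op == "create":
                self.fsimage[path] = blocks
            elif op == "delete":
                self.fsimage.pop(path, None)
        self.editlog = []

nn = TinyNamenode()
nn.apply("create", "/a.txt", ["blk_1"])
nn.apply("create", "/b.txt", ["blk_2"])
nn.apply("delete", "/a.txt")
nn.checkpoint()
print(nn.fsimage)  # {'/b.txt': ['blk_2']}
```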
Map-Reduce Framework
• JobTracker
– Responsible for dispatching jobs to the TaskTrackers
– Handles job management, such as scheduling and removing jobs.
• TaskTracker
– Responsible for executing tasks. A TaskTracker usually
launches a separate JVM to execute each task.
Map-Reduce Framework
From Hadoop - The Definitive Guide
Summary - Hadoop
• Hadoop provides a distributed file system (HDFS) that
stores data on the compute nodes, providing very high
aggregate bandwidth across the cluster.
• Hadoop implements a computational paradigm named
Map/Reduce, where the application is divided into
many small fragments of work, each of which may be
executed or reexecuted on any node in the cluster.
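The Map/Reduce paradigm can be sketched with the classic word-count example. This is a plain-Python sketch of the programming model, not Hadoop's actual API: in a real cluster the map calls would run on different nodes and the framework would shuffle the intermediate pairs to the reducers.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in a document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
# Each document could be mapped on a different node in the cluster.
pairs = chain.from_iterable(map_phase(d) for d in documents)
print(reduce_phase(pairs))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```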
CASE STUDY – GLUSTER FILESYSTEM
Quick overview
• Introduction
• Gluster File system design
• Example : 4 nodes GlusterFS
GlusterFS Cluster File System
Introduction
• GlusterFS is an open source clustered file system. It runs on
industry-standard hardware from any vendor and delivers multiple
times the scalability and performance of conventional storage at
a fraction of the cost.
GlusterFS Overview
[Diagram: aggregating commodity storage servers yields N x performance and capacity]
From GlusterFS Datasheet
GlusterFS Design
[Diagram: GlusterFS clustered filesystem on the x86-64 platform. A cluster of clients (supercomputer, data center) runs GLFS clients, each with a clustered volume manager and a clustered I/O scheduler. The clients talk to storage bricks 1-4, each exporting a GlusterFS volume, over InfiniBand RDMA or TCP/IP. Storage gateways re-export the filesystem over NFS/Samba on TCP/IP for compatibility with MS Windows and other Unices.]
From http://www.gluster.org/
Key Design Considerations
• Capacity Scaling
– Scalable beyond petabytes
• I/O Throughput Scaling
– Pluggable clustered I/O schedulers
– Takes advantage of RDMA transport
• Reliability
– Non-stop storage
• Ease of Manageability
– Self-heal
– NFS-like disk layout
• Elegance in Design
– Stackable modules
– Not tied to I/O profiles, hardware, or OS
Translators
• Performance translators
1. Read Ahead
2. Write Behind
3. Threaded I/O
4. IO-Cache
• Clustering translators
1. Automatic File Replication (AFR)
2. Stripe
3. Unify
• Scheduling translators
1. Adaptive Least Usage (ALU)
2. Non-Uniform Filesystem Architecture (NUFA)
3. Random
4. Round-Robin
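A scheduling translator decides which brick receives each newly created file. A round-robin scheduler, for instance, can be sketched as follows (an illustration of the idea, not GlusterFS's actual code):

```python
import itertools

class RoundRobinScheduler:
    """Cycle through the bricks so new files are spread evenly
    (illustrative sketch of a round-robin scheduling translator)."""
    def __init__(self, bricks):
        self._cycle = itertools.cycle(bricks)

    def pick_brick(self):
        # Return the brick that should store the next new file.
        return next(self._cycle)

sched = RoundRobinScheduler(["brick1", "brick2", "brick3"])
placements = [sched.pick_brick() for _ in range(5)]
print(placements)  # ['brick1', 'brick2', 'brick3', 'brick1', 'brick2']
```

An ALU scheduler would instead weigh factors such as free disk space and recent load on each brick before choosing.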
FUSE
• What’s FUSE?
• Stands for “Filesystem in USErspace”
• Makes it easy to write new filesystems:
1. without knowing how the kernel works
2. without breaking unrelated things
3. more quickly and easily than traditional filesystems
built as kernel modules
FUSE structure
From http://fuse.sourceforge.net/
How FUSE Works
• Application makes a file-related syscall
• Kernel figures out that the file is in a mounted
FUSE filesystem
• The FUSE kernel module forwards the
request to your userspace FUSE app
• Your app tells FUSE how to reply
Example : 4 nodes GlusterFS
[Diagram: storage virtualization with GlusterFS (AFR + Unify), ~1.8 TB in total. Users reach virtual machines (Xen + KVM), web applications, and MySQL through a global namespace mounted at /mnt/glusterfs. Four servers (vlab01-vlab04) each run a GlusterFS server on top of a local POSIX filesystem (ext4, ext3, XFS, and ext3 respectively), connected over GigE TCP/IP.]
The view of GlusterFS client
• $ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 901G 115G 740G 14% /
tmpfs 4.0G 0 4.0G 0% /dev/shm
/etc/glusterfs/glusterfs.vol 1.8T 243G 1.6T 13% /mnt/glusterfs
Unify Volume – the client sees one namespace containing all files:
benchmark.pdf, test.ogg, initcore.c, mylogo.xcf, driver.c, ether.c, test.m4a, work.ods, corporate.odp
The view of GlusterFS server
Mirror Volume – BRICK1, BRICK2, and BRICK3 each hold a full copy of every file:
accounts-2007.db, backup.db.zip, accounts-2006.db
Stripe Volume – each file is split into stripes across BRICK1, BRICK2, and BRICK3:
north-pole-map, dvd-1.iso, xen-image
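An AFR + Unify layout like the 4-node example could be expressed in a client-side volume specification roughly like the following. This is a sketch in the legacy GlusterFS vol-file style; the translator types match the list earlier, but the option values and volume names here are illustrative rather than taken from a tested configuration:

```
# glusterfs.vol sketch (legacy GlusterFS syntax, illustrative only)
volume vlab01
  type protocol/client
  option transport-type tcp
  option remote-host vlab01
  option remote-subvolume brick
end-volume

volume vlab02
  type protocol/client
  option transport-type tcp
  option remote-host vlab02
  option remote-subvolume brick
end-volume

# Mirror vlab01 and vlab02 with the AFR translator
volume afr0
  type cluster/afr
  subvolumes vlab01 vlab02
end-volume

# ... afr1 would mirror vlab03 and vlab04 the same way ...

# Unify presents the mirrored pairs as one global namespace,
# using a scheduling translator (here round-robin) for new files
volume unify0
  type cluster/unify
  option scheduler rr
  subvolumes afr0 afr1
end-volume
```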
Summary - GlusterFS
• GlusterFS clusters together storage building blocks,
aggregating disk and memory resources and
managing your data in a single global namespace.
• GlusterFS is based on a stackable architecture that
can be optimized for specific application profiles with
simple plug-in modules, optimizing performance for a
wide range of workloads.
Reference
• http://en.wikipedia.org/wiki/Message_passing
• http://en.wikipedia.org/wiki/Distributed_computing
• http://en.wikipedia.org/wiki/Filesystem_in_Userspace
• http://en.wikipedia.org/wiki/Distributed_file_system
• http://hadoop.apache.org/
• Tom White - Hadoop - The Definitive Guide
• Silberschatz Galvin - Operating System Concepts
• http://www.gluster.org/
• http://www.zresearch.com/
• http://fuse.sourceforge.net/