
EDUCATION

Advanced Data Sharing - Survey of Networked File Systems & File Servers

Jonathan Goldick, ONStor
Philippe Nicolas, Brocade


Survey of Networked File Systems and File Servers © 2007 Storage Networking Industry Association. All Rights Reserved.


SNIA Legal Notice

• The material contained in this tutorial is copyrighted by the SNIA.

• Member companies and individuals may use this material in presentations and literature under the following conditions:
  – Any slide or slides used must be reproduced without modification
  – The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations.

• This presentation is a project of the SNIA Education Committee.


Abstract

Survey of Networked File Systems and File Servers

With all of the new advances in file system and file server technology, how do you know which ones are best for you? This presentation provides a framework for evaluating file system approaches and a look at how each approach is evolving. Topics include: a survey of local, SAN, clustered, NAS, global, and wide area file systems; how application characteristics should affect your choice of file system; and performance, scalability, ease of use, data management, deployment and maintenance, and cost considerations.


Agenda

• File Services
• How to Evaluate File Systems
• Comparison of File System Types
• Conclusion


What This Session Will Not Cover

• Volume Managers
• Databases
• Storage Layer – Block Services
• Security Models

Check out SNIA Tutorial: Object-Based Storage Device (OSD) – Architecture and System

Check out SNIA Tutorial: NAS & iSCSI Technology Overview


The SNIA Shared Storage Model

[Figure: the SNIA Shared Storage Model as a layered stack. From top to bottom: Application; File/record layer, containing Database (DBMS) and File system (FS); Block layer, with block aggregation at the Host, Network, and Device; Storage devices (disks, …). The file/record and block layers are bracketed as the Storage domain. A Services column runs alongside the stack: discovery and monitoring; resource management and configuration; security and billing; redundancy management (backup, …); high availability (fail-over, …); capacity planning.]


File Services

• Accommodate Data Types
  – Structured: Database
  – Semi-Structured: email
  – Unstructured: Text, Excel, image files, etc.
• Provide Interface to Storage
• Manage Files
  – Backup
  – Provisioning
  – Availability
• Allow Data Sharing


What is a File System?

• A management system that exports a hierarchy of files and directories with a simple and constrained set of access methods.

• Access methods follow one of a very few semantic models, POSIX, NTFS, etc.

• Coordinates access to data and state information between multiple requests.

• Manages disk utilization on behalf of requests.

• Maintains metadata integrity and restores it in the event of a failure.
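The "simple and constrained set of access methods" can be made concrete with a short sketch. The example below is illustrative only: it uses Python's standard `os` module on a temporary directory, and the directory and file names (`projects/survey`, `notes.txt`) are made up for the demonstration. It walks through the handful of operations most POSIX-style file systems expose: create, write, read, stat, rename, list, and delete.

```python
import os
import tempfile


def demo_access_methods(root):
    """Exercise a handful of POSIX-style access methods under `root`."""
    project = os.path.join(root, "projects", "survey")
    os.makedirs(project)                      # create directories in the hierarchy

    path = os.path.join(project, "notes.txt")
    with open(path, "w") as f:                # create and write a file
        f.write("hello")

    with open(path) as f:                     # open and read it back
        content = f.read()

    size = os.stat(path).st_size              # metadata lookup (size in bytes)

    renamed = os.path.join(project, "notes-v2.txt")
    os.rename(path, renamed)                  # rename within the namespace
    listing = sorted(os.listdir(project))     # enumerate a directory

    os.remove(renamed)                        # delete the file
    return content, size, listing, os.listdir(project)


with tempfile.TemporaryDirectory() as root:
    print(demo_access_methods(root))
```

Everything an application does to a file system is some combination of calls like these; the file system behind them is responsible for coordinating concurrent requests, allocating disk space, and keeping metadata consistent.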


File System Components

[Figure: block diagram of file system components. Labeled blocks: Interface to External Applications; Application Access Methods; Access Mediator; Metadata Methods; Data Methods; Metadata Cache; Data Cache; Transaction Manager; Recovery Logic; Block Allocator; ILM; Auditing; Snapshots; Volume Access Mediator; Interface to Storage; Volume Management. Annotations in the figure: Lock Management, Access Control; Inodes, Directories, etc.; File Data; Fast Failure Recovery Logic; Data Placement Strategy; Cluster/SAN FS Access Management; SCSI, FC, etc.; LUN(s), Logical Blocks, RAID, Mirroring, Striping, etc.]


File System Types

1. Local File System
   a) Host-based, single operating system
   b) Co-located with application server
   c) Many types with unique formats, feature mix

2. Distributed File System
   a) Remote, network access
   b) Semantics are a limited subset of local file systems
   c) Cooperating file servers
   d) May include integrated replication

3. Shared (SAN and Clustered) File Systems
   a) Host-based file systems
   b) Hosts access all data
   c) Co-located with application server for performance

4. Clustered Distributed File System
   a) Each file server runs a SAN/Clustered file system
   b) Global name space enables access to all data

5. Wide Area File System
   a) Distributed file system
   b) Improved performance over long latency networks
   c) Deployed as appliances in a hub-spoke model


Evaluating File Systems

• Does it fit the Application Characteristics?
  – Does the application even support the file system?
  – Is it optimized for the type of operations that are important to the application?

• Performance & Scalability
  – Does the file system meet the latency and throughput requirements?
  – Can it scale up to the expected workload and deal with growth?
  – Can it support the number of files and total storage needed?

• Data Management
  – What kind of features does it include? Backup, Replication, Snapshots, ILM, …


Evaluating File Systems

• Security
  – Does it conform to the security requirements of your company?
  – Does it integrate with your security services?
  – Does it have Auditing and Access Control, and at what granularity?

• Ease of Use
  – Does it require training the end users or changing applications to perform well?
  – Can it be easily administered in small and large deployments?
  – Does it have centralized monitoring and reporting?
  – How hard is it to recover from a software or hardware failure, and how long does it take?
  – How hard is it to upgrade or downgrade the software, and can it be done live?


Application Characteristics

Workload Profiles            (A)              (B)        (C)   (D)        (E)
1. Latency Sensitive         High             Med        Low   Low        High
2. Throughput                High read/write  High read  Low   High read  High write
3. Concurrent sharing        High             High       Low   High read  Low
4. Caching (re-read rate)    High             High       High  Low        Low

Typical Applications:
(A) OLTP
(B) Small Data Mart
(C) Home Directory
(D) Large Scale Streaming (Web Farm)
(E) High Frequency Meta Data Update (small file create/delete)
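One way to put the workload table to use is to encode it as data and query it when shortlisting file systems. The sketch below is only an illustration: the dictionary keys (`latency_sensitive`, `throughput`, `concurrent_sharing`, `caching`) and the helper `profiles_matching` are names invented here, while the values are copied from the table above.

```python
# Workload profiles (A) to (E), transcribed from the table above.
PROFILES = {
    "A": {"example": "OLTP", "latency_sensitive": "High",
          "throughput": "High read/write", "concurrent_sharing": "High", "caching": "High"},
    "B": {"example": "Small Data Mart", "latency_sensitive": "Med",
          "throughput": "High read", "concurrent_sharing": "High", "caching": "High"},
    "C": {"example": "Home Directory", "latency_sensitive": "Low",
          "throughput": "Low", "concurrent_sharing": "Low", "caching": "High"},
    "D": {"example": "Large Scale Streaming (Web Farm)", "latency_sensitive": "Low",
          "throughput": "High read", "concurrent_sharing": "High read", "caching": "Low"},
    "E": {"example": "High Frequency Meta Data Update", "latency_sensitive": "High",
          "throughput": "High write", "concurrent_sharing": "Low", "caching": "Low"},
}


def profiles_matching(**traits):
    """Return the profile letters whose table entries equal every given trait."""
    return sorted(
        letter
        for letter, profile in PROFILES.items()
        if all(profile[name] == value for name, value in traits.items())
    )


# Workloads where caching helps little because data is rarely re-read.
print(profiles_matching(caching="Low"))
# Latency-sensitive workloads.
print(profiles_matching(latency_sensitive="High"))
```

Profiles with a low re-read rate, for example, gain little from large caches, so cache size should carry less weight when evaluating systems for them.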


Performance & Scalability

• Performance
  – Throughput
  – Read / write access patterns
  – Impact of data protection mechanisms, operations

• Scalability
  – Number of files, directories, file systems
  – Performance, recovery time
  – Simultaneous and active users


Data Management

• Backup
  – Performance
  – Backup vendors; native agent vs. network-based
  – Data de-duplication – backup once

• Replication
  – Multiple read-only copies
  – Optimization for performance over network
  – Data de-duplication – transfer once


Data Management

• Quotas
  – Granularity
    • User quotas
    • Group quotas
    • Directory tree quotas
    • Nested directory tree quotas
  – Extended quota features
  – Ease of set up
  – Native vs. external servers
  – Scalability with increasing number of files
  – Quota per user vs. quota per file system per user
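To illustrate what directory tree and nested directory tree quotas mean in practice, here is a minimal user-space sketch. It is not how any particular file server implements quotas (real systems account for usage incrementally rather than rescanning); the function names `tree_usage` and `over_quota` and the example paths are assumptions made for this illustration.

```python
import os
import tempfile


def tree_usage(root):
    """Total size in bytes of regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def over_quota(quotas):
    """Return the directories whose usage exceeds their byte limit.

    `quotas` maps a directory path to a limit. Nested entries are allowed:
    a file counts against every quota-bearing directory above it.
    """
    return sorted(path for path, limit in quotas.items() if tree_usage(path) > limit)


with tempfile.TemporaryDirectory() as root:
    team = os.path.join(root, "team")
    alice = os.path.join(team, "alice")
    os.makedirs(alice)
    with open(os.path.join(team, "shared.bin"), "wb") as f:
        f.write(b"x" * 30)
    with open(os.path.join(alice, "data.bin"), "wb") as f:
        f.write(b"x" * 20)

    # The team tree holds 50 bytes (limit 100); the nested alice tree holds 20 (limit 10).
    violations = over_quota({team: 100, alice: 10})
    print([os.path.relpath(p, root) for p in violations])
```

The full rescan in `tree_usage` also hints at why "scalability with increasing number of files" appears on the list: quota accounting that is cheap for thousands of files can become a bottleneck for millions.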


Data Management

• Information Lifecycle Management (ILM)
  – Lots of features, differing definitions
  – Can enforce compliance and auditing rules
  – Cost & performance vs. impact of lost/altered data

Check out SNIA Tutorial: The Secret Sauce of ILM – The Professional ILM

Check out SNIA Tutorial: ILM: Tiered Services and the Need for Classification


Security Considerations

• Authentication
  – Support and to what degree

• Authorization
  – Granularity by access types
  – Need for client-side software
  – Performance impact of large scale ACL changes

• Auditing
  – Controls
  – Audit log full condition
  – Login vs. login attempt vs. data access
  – Digitally signed audit trails


Security Considerations

• Virus Scanning
  – Preferred vendor supported?
  – Performance & scalability
  – External vs. file server-side virus scanning

• Vulnerabilities
  – Security & data integrity vulnerabilities vs. performance & cost
  – Compromised file system
    • One client
    • One file server
  – Detection
  – Packet sniffing


Ease of Use

• End-User
  – Local file systems vs. any other

• Deployment & Maintenance
  – Implementation
  – Scalability of management
  – File system migration between servers
  – Automatic provisioning
  – Centralized monitoring, reporting, phone-home
  – Hardware failure recovery
  – Single points of failure
  – Performance monitoring


How Do They Stack Up?

• Local File System

• Distributed File System

• Shared File System

• Clustered Distributed File System

• Wide Area File System


Local File System

• Key Characteristics
  – Easy to use
  – Scale up via processor performance
  – Cannot scale file services independently of application services
  – Most feature-rich data management tools, but you cannot offload data management
    • Quota Management for Windows
    • HSM/ILM
    • Data Classification
  – Islands of storage
    • Low utilization rates
    • Non-scalable management


Local File System

• Target Applications (or Best Suited for Applications)
  – Productivity software applications such as MS Excel, Word, text editors
  – Personal file services for Unix, Windows & Linux applications
  – Small and medium size databases used for semi-structured applications (e.g. Email systems – MS Exchange, Lotus Notes)
  – Document Management Systems
  – Tightly integrated Web applications (web servers, application servers etc.)

• Interesting Developments
  – Content aware file systems
  – Transactional semantics on top of file systems (WinFS)
  – Encryption and steganography
  – Data de-duplication


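Distributed File System

[Figure: distributed file system. Four clients connect to a file server using NAS protocols (NFS, CIFS, AFS, WebDAV, HTTP). The file server attaches through a storage network to shared disks. The client-to-server path is labeled Data & Control Access.]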


Distributed File System

• Key Characteristics
  – Often purpose-built file servers
  – No real standardization for file sharing across Unix (NFS) and Windows (CIFS)
  – Scales independently of application services
  – Performance limited to that of a single file server
  – Reduces (but does not eliminate) islands of storage
  – Replication sometimes built in
  – Global name space through external service
  – Less featured 3rd party data management tools
  – Strong network security supported
  – NAS historically suffered a variety of security vulnerabilities
  – Implementations evolved to leverage block storage technology


Distributed File System

• Target Applications (or Best Suited for Applications)
  – NFS, CIFS, static HTTP file serving
  – Productivity applications that can store files over a networked share (e.g., MS Excel, Word, text editors)
  – Generic home directory files
  – Large scale software development with file sharing
  – Reference data archiving
  – Image file sharing – PACS


Distributed File System

• Complementary Products
  – Network compression products
  – NDMP for server-less backup

• Interesting Developments
  – NFSv4 (v4.1 soon)
  – NFS RDMA
  – Parallel NFS
  – Native ILM
  – Content Addressable Storage
  – Virtualized file servers


Shared File System

• SAN File System

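[Figure: SAN file system. Labeled elements: applications (App.) on hosts running client software (Client sw), an NFS/CIFS server, a metadata server, a data network (LAN), a storage network, and shared disks. Other labels in the figure: File Request, Block list, Data Access.]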


Shared File System

• Cluster File System

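[Figure: cluster file system. Three web servers form a cluster; the first host is called out. Each runs the cluster file system on top of a cluster volume manager, with heartbeat and lock management between hosts. The hosts attach through a storage network to shared disks.]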


Shared File System

• Key Characteristics
  – Tremendous scalability
  – Highest data throughput
  – Applications must be cluster-ready
  – Less featured/unsupported 3rd party data management tools
  – Offload data management tasks
  – Secure all application hosts
  – Downtime to upgrade cluster of hosts
  – Limited host operating system versions
  – Lead time for certification of new operating systems/versions
  – Single master server can be a single point of failure


Shared File System

• Target Applications (or Best Suited for Applications)
  – Applications that need large files to be shared by multiple processes in a workflow (e.g., scientific computations, video post-production rendering, vector analysis, seismic data analysis)
  – Database applications for OLTP (e.g., Oracle 9i RAC)
  – High performance computing applications (e.g., rendering, grid computing, financial analysis, computer aided design)
  – Highly scalable Web serving


Clustered Distributed File System

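[Figure: clustered distributed file system. Four clients connect over NAS protocols to three file servers that form a cluster; the first host is called out. The file servers run a cluster file system over a cluster volume manager (marked as an optional layer), with heartbeat and lock management between them, and attach through a storage network to shared disks.]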


Clustered Distributed File System

• Key Characteristics
  – Advantages of distributed file systems + performance and scalability of a Shared File System
  – Span a number of file servers
  – File servers may not have full read/write access to all files
  – Usually includes integrated global name space feature
  – No host operating system compatibility issues
  – Upgrade complexity less than with shared file systems
  – Mostly an NFS solution today

• Target Applications (or Best Suited for Applications)
  – Scalable NFS-based file services
  – High throughput when reading & writing large files
  – Require concurrent data access to files and data
    • Seismic data analysis, CAD & E-CAD design simulations, digital image rendering, pre & post-production video


NAS Aggregation – WAFS (Core + Edge)

[Figure: a data center and remote offices connected over a WAN. Data center labels: file server, NFS/CIFS server, storage network, shared disks, data network (LAN), NFS/CIFS clients, WAFS core appliance. Remote office labels: data network (LAN), NFS/CIFS clients, WAFS edge appliance (one per office). Clients reach the appliances and servers over NAS protocols. Other labels in the figure: Data and Control Access, Recent Method.]

• Private protocol

• Excellent in Read caching mode

• Write-through is preferable
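The read-caching and write-through points can be sketched in a few lines. This is a toy model, not any vendor's WAFS protocol: a plain dictionary stands in for the data-center file server, the class name `WriteThroughCache` and the counter `core_reads` are invented for the example, and cache invalidation when another site changes a file is deliberately left out.

```python
class WriteThroughCache:
    """Edge-side cache in front of a slow 'core' store.

    `core` is any dict-like object standing in for the data-center file server.
    """

    def __init__(self, core):
        self.core = core
        self.cache = {}
        self.core_reads = 0  # counts round trips to the core (WAN crossings)

    def read(self, path):
        # Serve repeat reads locally; only a miss crosses the WAN.
        if path not in self.cache:
            self.core_reads += 1
            self.cache[path] = self.core[path]
        return self.cache[path]

    def write(self, path, data):
        # Write-through: the core is updated before the write returns,
        # so the data center always holds the authoritative copy.
        self.core[path] = data
        self.cache[path] = data


core = {"/docs/plan.txt": b"v1"}
edge = WriteThroughCache(core)

edge.read("/docs/plan.txt")
edge.read("/docs/plan.txt")          # second read is served from the edge cache
edge.write("/docs/plan.txt", b"v2")  # lands on the core immediately

print(edge.core_reads, core["/docs/plan.txt"], edge.read("/docs/plan.txt"))
```

Write-through keeps the edge appliance from holding the only copy of new data, which is one reason it is the safer default; the cost is that every write pays the WAN latency.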


Wide Area File System

• Key Characteristics
  – LAN-like performance over the WAN
    • Optimized TCP/IP file sharing protocols
    • Storage and data caching for reducing file access latencies
  – Cached data enables high read performance
  – Hub-spoke model of remote office data services
  – Consolidation of data management and file services
  – Little concurrent read/write sharing of files across sites
  – Scalability dependent on remote file servers' native file system
  – Application aware data caching beyond simple unstructured files
    • E.g. Microsoft Exchange
  – Remote NAS file servers to provide data management capabilities
  – Security
    • Secure communications (encryption) at the network layer
  – Remote NAS file servers provide authentication and authorization mechanisms
  – Complexity at data center vs. managing systems at all remote sites


Wide Area File System

• Target Applications (or Best Suited for Applications)
  – Distributed software development for a variety of applications (e.g., Computer Aided Design)
  – File sharing applications (e.g., home directory, document management)
  – Email messaging systems (e.g., MS Exchange)
  – Web applications
  – Distributed print services


Conclusion

• There are a number of things to consider when choosing a file system or server.
  – Will the application work as desired?
  – Will it perform and scale?
  – Does it have the required data management services?
  – Is it secure enough?
  – Is it easy to use and manage?

• There is no single solution that is superior in all cases.


Q&A / Feedback

• Please send any questions or comments on this presentation to SNIA: [email protected]

Many thanks to the following individuals for their contributions to this tutorial.

SNIA Education Committee
David Black, EMC
Philippe Nicolas, Brocade
Narayan Venkat, ONStor
Elaine Silber, Firefly Comm.
Jonathan Goldick, ONStor


Appendix


Performance & Scalability Notes

• Performance
  – Throughput is basically how fast you can stream data in and out of the file system/server. More file servers and better disk striping will give better results. Be mindful of the difference between out-of-cache and sustained numbers; vendors commonly quote the former.
  – If your applications are largely random in their data access patterns, as is common with many small files, caching won't help much and read-ahead will hurt. Look for systems that are optimized for metadata writes and can stripe across many disks.
  – How does performance change when doing data management operations like snapshot and backup? These are regular and recurring tasks, so take them into account when evaluating the performance of the system.
  – Does the system do read after write? This hurts performance but ensures that the data made it to disk as intended. If your storage is not highly reliable, and your application is risk averse, this may be worth it.

• Scalability
  – How many files and directories can really fit in a single (logical) file system? This is not just a matter of addressable storage. Consider how long it would take to recover from a disk subsystem outage/failure.
  – Most file systems have a performance cliff on directory size. Hundreds of thousands or millions of files in a single directory work poorly on most systems.
  – How many simultaneous, and active, users can the system really support? Watch for quotes of supported users that don't actually say "active".
  – How many file systems are supported, and is performance the same when spread across many of them as with just a few? If you are planning on consolidating your infrastructure, how many file systems would that be?
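The directory-size cliff mentioned above is easy to check for on an existing tree. The following sketch uses only Python's standard library; the function name `large_directories` and the threshold are assumptions chosen for illustration, and the right threshold depends on the file system in question.

```python
import os
import tempfile


def large_directories(root, max_entries):
    """Yield (path, entry_count) for each directory under `root`
    that directly contains more than `max_entries` files and subdirectories."""
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count > max_entries:
            yield dirpath, count


with tempfile.TemporaryDirectory() as root:
    big = os.path.join(root, "big")
    small = os.path.join(root, "small")
    os.makedirs(big)
    os.makedirs(small)
    for i in range(5):
        open(os.path.join(big, f"file{i}.dat"), "w").close()
    open(os.path.join(small, "only.dat"), "w").close()

    # With a threshold of 3 entries, only the "big" directory is reported.
    for path, count in large_directories(root, max_entries=3):
        print(os.path.relpath(path, root), count)
```

Running a scan like this before a migration or consolidation gives an early warning about directories that may behave very differently on the target system.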


Data Management Notes

• Backup

– How does performance change when doing backup? If it halves performance, this is a critical deciding factor. Does the cache get wiped out? Does the network bandwidth get used up?

– Can you offload backup to other servers with some shared access to data?

– Does it support a disk-to-disk-to-tape model?

– Does your preferred backup vendor support it?

– Note that native agent backup applications often have far more features and capabilities than network-based ones.

– Data de-duplication is a real plus here. This is where multiple blocks with the same data only get backed up once.

• Replication

– Some file systems have native support for making read-only copies of all or part of a directory tree. These replicas are generally mountable on other file servers and are used for disaster protection and content distribution. This is a simple way of getting very high read throughput without the complexity of SAN, cluster, or global file systems. If your application can meaningfully use data that is, say, at most a few minutes out of date, this is a very scalable alternative. Most content delivery systems fall into this category. As a rule this scales linearly with the number of file servers since there is little overhead.

– Be mindful of how many file servers can mount the same replica.

– Look for systems that work with network optimizers and compression.

– Look for systems that send the least amount of data over the network. Some send any file that changed, some send any changed blocks, and some send only the changed bytes.

– Data de-duplication is a real plus here. This is where multiple blocks with the same data only get transferred once.
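The two ideas above, storing or sending identical blocks only once and sending only the blocks that changed, can be sketched in a few lines. This is a minimal illustration under stated assumptions: fixed-size blocks, SHA-256 digests as block identities, and an in-memory dictionary standing in for the backup or replica store. The names `deduplicate`, `restore`, and `changed_blocks` are hypothetical; real products differ in block size, hashing, and alignment handling.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(data, store, block_size=BLOCK_SIZE):
    """Store each distinct block once; return (recipe of digests, number of new blocks)."""
    recipe = []
    new_blocks = 0
    for block in split_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block  # only previously unseen content is stored
            new_blocks += 1
        recipe.append(digest)
    return recipe, new_blocks


def restore(recipe, store):
    """Rebuild the original bytes from a recipe of block digests."""
    return b"".join(store[digest] for digest in recipe)


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return indices of blocks in `new` that differ from the same position in `old`."""
    old_blocks = split_blocks(old, block_size)
    changed = []
    for index, block in enumerate(split_blocks(new, block_size)):
        if index >= len(old_blocks) or old_blocks[index] != block:
            changed.append(index)
    return changed
```

A replication scheme that sends "any changed blocks" would transfer only the blocks listed by `changed_blocks`; one that sends "any file that changed" would transfer the whole file.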


Data Management Notes

• Quotas

– Make sure the quota features support the level of granularity you require.

  • User quotas
  • Group quotas
  • Directory tree quotas
  • Nested directory tree quotas

– Does the file system directly, or through a 3rd party add-on, support extended quota features like a file-type quota? Think of a quota on all .MP3 files. There are many quota-like features out there with a variety of supported policies, reporting mechanisms, and automation.

– What kind of policy infrastructure is offered and how hard is it to set up?

– Does it run natively on the file server or does it require external servers? External servers may not offer the same level of high-availability, security, and scalability capabilities as the file servers they manage.

– Just as in backup, native agents often have more capabilities than remote ones. Deciding whether to block the creation of an .MP3 file by calling out to an external server is slow. Because this would increase latencies a great deal, it often isn’t done.

– Does the software create a separate shadow copy of the entire directory tree? Most non-native systems do this and have scalability problems when the number of files gets large. Basically the software is keeping a duplicate of the directory tree, often in an external relational database.

– In NAS environments where CIFS and NFS are both present, is there one quota for a user or do CIFS usage and NFS usage count separately? Note that when there is a single quota there is some agent doing mappings between the Windows and Unix domains. This gives a good user experience, but quotas can be down for a long time when a rebuild needs to be done and you can get a load spike on your domain controllers. As a rule, directory tree quotas have lower overhead and don’t suffer long rebuild periods.
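A file-type quota such as the .MP3 example above can be approximated with an after-the-fact scan of a directory tree. The sketch below is a reporting tool only, not inline enforcement: it does not block file creation, which is the expensive part discussed above. The function names and the `limits` mapping (extension to byte limit) are assumptions made for illustration.

```python
import os
from collections import defaultdict


def usage_by_extension(root):
    """Return total bytes used under root, keyed by lower-cased file extension."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            totals[extension] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)


def over_quota(root, limits):
    """Return {extension: bytes_used} for every extension exceeding its limit."""
    usage = usage_by_extension(root)
    return {
        extension: usage.get(extension, 0)
        for extension, limit in limits.items()
        if usage.get(extension, 0) > limit
    }
```

Because it walks the whole tree, a scan like this has the same scaling concern as the shadow-copy approach noted above when the number of files gets large.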


Data Management Notes

• Information Lifecycle Management (ILM)

– This can encompass a lot of very different features.

– Often used to describe traditional Hierarchical Storage Management (HSM) but with a disk-to-disk instead of a disk-to-tape model. In reality there are many views as to what constitutes ILM. SNIA to the rescue ☺

– File systems play a key role in ILM. They mediate access to data so are a clear point at which to enforce compliance and auditing rules. They control block allocation so can decide initial placement strategies. They own the name space so can efficiently implement retention policies.

– New file system models have recently emerged to tackle some of the core ILM problems, content addressable storage (CAS) being a prime example. In CAS a signature based on the contents of a file is used instead of a file name; if the contents change, so does the signature. These systems are generally much slower than traditional NAS because they are doing a great deal of extra data integrity checks, including read after write. The cost and performance penalty must be weighed against the regulatory hit you will take if data is lost or altered.
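The content-addressing idea can be sketched as follows. This is a toy, in-memory illustration, assuming SHA-256 as the signature function and a dictionary as the object store; the class name `ContentAddressableStore` is hypothetical and no particular CAS product works exactly this way. The re-hash inside `put` stands in for the read-after-write check mentioned above.

```python
import hashlib


class ContentAddressableStore:
    """Toy content-addressable store: objects are named by a digest of their contents."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        """Store data and return its signature (used instead of a file name)."""
        signature = hashlib.sha256(data).hexdigest()
        self._objects[signature] = bytes(data)
        # Read after write: re-read what was stored and confirm it still matches.
        if hashlib.sha256(self._objects[signature]).hexdigest() != signature:
            raise IOError("read-after-write verification failed")
        return signature

    def get(self, signature):
        """Return the stored data, verifying integrity on every read."""
        data = self._objects[signature]
        if hashlib.sha256(data).hexdigest() != signature:
            raise IOError("stored object does not match its signature")
        return data
```

The extra hashing on every write and read is a small-scale picture of why such systems trade performance for integrity.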

Check out SNIA Tutorial: The Secret Sauce of ILM – The Professional ILM

Check out SNIA Tutorial: ILM: Tiered Services and the Need for Classification


Security Notes

• Authentication

– All network file servers need to authenticate users, or trust clients not to lie to them (most NFS deployments).

– Most environments have domain authentication services, so it is an important consideration whether a file system supports them and to what degree. Note that most NAS authentication schemes support multiple levels; watch for servers that only support the weaker choices.

• Authorization

– File systems all have mechanisms to control access to a file. These can range from the simple Unix rwx mode bits to complex ACLs and compliance-related filters.

– Make sure the authorization model has the required granularity in terms of access types.

– If end users need to be able to get/set the authorization fields, watch for systems that require special client-side software to function.

– If you regularly change ACLs over large numbers of files, you can hit performance problems over NAS. NAS protocols are not optimized for recursive security changes. Compare a tree ACL update on a local NTFS file system to the same operation over CIFS. Companies that have strict security compliance requirements on ACLs often change them regularly when employees come and go.

• Auditing

– Auditing of data accesses has been commonly available in Windows for many years.

– Often included in ILM offerings.

– Watch for controls on who can control/access the audit facility.

– What happens if the audit log is full? Does the system stop serving data?

– Are logins and login attempts audited as well or only data accesses?

– Are audit trails digitally signed?
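To make the question about signed audit trails concrete, the sketch below chains audit entries together with a keyed hash so that altering or removing an earlier entry is detectable. It is an assumption-laden illustration: it uses an HMAC with a shared secret rather than a public-key digital signature, the entry layout is invented for the example, and `append_entry` and `verify_log` are hypothetical names.

```python
import hashlib
import hmac
import json


def _entry_mac(key, event, previous_mac):
    """Compute the keyed hash over an event and the MAC of the entry before it."""
    payload = json.dumps({"event": event, "prev": previous_mac}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log, key, event):
    """Append an event to the log, linking it to the previous entry's MAC."""
    previous_mac = log[-1]["mac"] if log else ""
    log.append({"event": event, "prev": previous_mac, "mac": _entry_mac(key, event, previous_mac)})


def verify_log(log, key):
    """Return True only if every entry is intact and correctly chained."""
    previous_mac = ""
    for entry in log:
        if entry["prev"] != previous_mac:
            return False
        expected = _entry_mac(key, entry["event"], entry["prev"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        previous_mac = entry["mac"]
    return True
```

Whoever holds the key can still rewrite the log, which is why the controls on who can access the audit facility matter as much as the signing itself.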


Security Notes

• Virus Scanning

– Does the file system support your preferred anti-virus vendor?

– What kind of performance penalty is incurred? A 50% loss of read throughput is not uncommon.

– Can external virus scan servers be used, and if so, how many? External servers are regularly used in the NAS world and allow you to add more virus scan resources for scalability. The downside is that not all vendors support this model of operation. With the rise of automated updates of virus definition files on clients, does file server-side virus scanning matter as much as it used to?

• Vulnerabilities

– All file systems have some security and data integrity vulnerabilities. Removing them all has a huge performance and complexity cost. The important thing is to have your eyes open when looking at the tradeoffs you are making.

– Is the file system compromised when any client on the network is? What if one file server is compromised, how far does that extend?

– Is there any way to know it has happened?

– Can someone sniffing network packets easily read the data?


Ease of Use Notes

• End-User

– Local file systems that come bundled with the host operating system are the least likely to have application problems or require end user training. Virtually any other choice has some issue that can affect the end user.

– When possible, choose something that is standards-based. While this is no guarantee of success, it increases your odds.

• Deployment & Maintenance

– How long does the initial implementation take?

– Does management scale with the number of servers?

– How hard is it to upgrade/downgrade the software? Is it a live upgrade?

– Can file system load be migrated between servers to avoid performance bottlenecks? Is this live? Is it policy-driven?

– Does the system have any automatic provisioning? Being able to set some policies and have the system provide some level of automation is a serious labor-saving device. File systems only run out of space at 2am ☺

– Does it have centralized monitoring, reporting, phone-home support? This is a major part of scaling management with the number of servers.

– When there is a hardware failure how long does it take to recover? Can there be partial data access while a disk system is down?

– Are there any single points of failure?

– Can you find out where your performance is going? Hot files, hot clients, etc.
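As a small illustration of the policy-driven provisioning point above, a monitoring job might compare free space against a threshold and trigger whatever provisioning action the site uses. The sketch below only performs the check; the function name `needs_provisioning` and the `min_free_fraction` policy parameter are assumptions, and what happens next (growing a volume, raising an alert) is left out.

```python
import shutil


def needs_provisioning(path, min_free_fraction):
    """Return (below_threshold, free_fraction) for the file system holding path."""
    usage = shutil.disk_usage(path)
    # Guard against a file system that reports zero total capacity.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return free_fraction < min_free_fraction, free_fraction
```

For example, `needs_provisioning("/export/home", 0.10)` would flag the (hypothetical) file system under `/export/home` once less than 10% of its capacity remains free.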


Distributed File System

• Key Characteristics

– These are often purpose-built file servers with a focus on consolidation and data management.

– Standards-based network file systems work with most, but definitely not all, applications. There is no real standardization when it comes to file sharing across Unix (NFS) and Windows (CIFS). Not all distributed file systems provide the same sharing semantics.

– Can scale up file services independently of application services.

– The maximum performance on a file system does not scale beyond what a single file server can provide.

– Consolidates data management tasks onto fewer servers. Reduces, but does not eliminate, islands of storage.

– Replication sometimes built in.

– Usually must be coupled with an external service to get a global name space. NAS aggregation not generally available by default.

– 3rd party data management tools generally have fewer features when they work at all. There is rarely support for native agents.

– File servers can be secured in a physical and networking sense.

– Strong network security can be put in place, but can also be run wide open.

– NAS as typically deployed has historically suffered a variety of security vulnerabilities. CIFS has been open to “man-in-the-middle” attacks where passwords are compromised. NFS without Kerberos is only as secure as the least secure client on the network.

– Implementations have evolved from the file server implementing block data protection on proprietary storage to file servers leveraging sophisticated disk array and block storage virtualization technology.


Shared File System

• Key Characteristics

– Can scale up enormously, often to hundreds or even thousands of nodes.

– Provides the highest throughput to data.

– Often performs poorly on small files and latency-sensitive applications due to locking overhead. Really targeted at very large files and streaming applications.

– Applications must be cluster-ready. They must be aware that the same data is accessible from multiple hosts at the same time. They must handle locking and synchronization using either an out-of-band protocol or file system primitives. They can also avoid sharing conflicts by construction.

– Not all systems guarantee cache consistency on the cluster members. Can be a source of application compatibility problems.

– 3rd party data management tools may work but are often unsupported because the file systems are so new and are not widely deployed. The focus here is performance and scalability, not data management.

– You can offload data management tasks to dedicated members of the cluster.

– Since all application hosts have full access to the data all of them must be secured.

– Upgrading a cluster of hosts is often painful and has significant downtime.

– Generally only runs on a small set of host operating system versions.

– Certification of new operating systems, and new operating system releases, can lag by months.

– Watch for systems that have a single master server as this can be a single point of failure.
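The requirement above that cluster-ready applications handle locking with file system primitives can be illustrated with POSIX advisory locks. The sketch below assumes a Unix host and uses Python's standard `fcntl.lockf`; the helper name `append_record` is hypothetical. Whether such a lock is actually honored across cluster nodes depends on the shared file system in use, so treat this as a single-host sketch of the pattern rather than a guarantee.

```python
import fcntl
import os


def append_record(path, record):
    """Append bytes to a shared file while holding an exclusive advisory lock."""
    with open(path, "ab") as handle:
        # Block until no other cooperating process holds the lock.
        fcntl.lockf(handle, fcntl.LOCK_EX)
        try:
            handle.write(record)
            handle.flush()
            # Push the data to stable storage before releasing the lock.
            os.fsync(handle.fileno())
        finally:
            fcntl.lockf(handle, fcntl.LOCK_UN)
```

Advisory locks only protect against other processes that also take the lock, which is one reason unmodified applications can misbehave on a shared file system.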


Clustered Distributed File System

• Key Characteristics

– Combines the advantages of distributed file systems with the performance and scalability advantages of a Shared File System.

– Allows a single NAS file system, and even a single directory, to span a number of file servers.

– Not all solutions allow all file servers to have full read/write access to all files; some use a proxy model wherein all updates route through a single host.

– Usually comes with an integrated global name space feature. This allows any file to be uniquely named no matter what file server is contacted.

– Does not suffer from host operating system compatibility issues.

– Upgrade can still be complex but is much more manageable than with shared file systems.

– Mostly an NFS solution today.

• Target Applications (or Best Suited for Applications)

– Scalable NFS-based file services for applications that require high throughput when reading & writing large files.

– High performance computing applications (seismic data analysis, CAD & E-CAD design simulations, digital image rendering, pre & post-production video) that require concurrent data access to files and data.


Wide Area File System

• Unique Characteristics

– Goal is to give LAN-like performance over the WAN using optimized TCP/IP file sharing protocols and storage and data caching for reducing file access latencies.

– Read performance tends to be quite high because data is cached in remote locations.

– Works in conjunction with existing file servers to provide file consistency and coherency across a WAN.

– Targets a hub-spoke model of remote office data services.

– Enables consolidation of data management services to a remote data center.

– The ultimate goal is to remove the need for a file server or backup administrator from the remote offices.

– As a rule the expectation is that there will be little or no concurrent read/write sharing of files across sites.

– Scalability can be excellent, but is dependent on the scalability of the native file system on the remote file servers.

– Some products include application-aware data caching beyond simple unstructured files, Microsoft Exchange being a major example.

– All WAFS implementations rely on the remote NAS file servers to provide data management capabilities such as backup (e.g., snapshot, rapid restores), replication, and quota management.

– WAFS implementations provide capabilities for secure communications (encryption) at the network layer.

– They leverage the authentication and authorization mechanisms available on the remote NAS file servers.

– In general, deployment and maintenance of WAFS solutions add complexity to administering file services at the data center but make up for it in benefits at the remote offices.

• Target Applications (or Best Suited for Applications)

– Distributed software development for a variety of applications (e.g., Computer Aided Design)

– File sharing applications (e.g., home directory, document management)

– Email messaging systems (e.g., MS Exchange)

– Web applications

– Distributed print services
