
DTC Intelligent Storage Consortium Research Activities

Supported by

StorageTek, Veritas, Engenio, Sun Micro, ETRI/Korea

DOE, ONR

David H.C. Du
Department of Computer Science and Engineering
University of Minnesota

5/10/2005 DISC - Digital Technology Center Intelligent Storage Consortium

Disk = Node (Intelligent Storage)

• has magnetic storage (1TB?)
• has processor & DRAM
• has SAN/IP attachment
• has an execution environment
• Semantic-aware
• Application-aware
• Object storage device

[Figure labels: OS Kernel (SAN driver, Disk driver); File System, RPC, ... Services, DBMS; Applications]


Example: Adding storage

Expand a file system to allocate/extend a database table

[Workflow diagram. Roles: DBA or App Installer, Storage Admin, System Admin. Boxes are listed in extraction order; "-or-" connectors mark alternative starting points.]

• DBA: Notice tablespace full
• Contact Sys Admin
• Check for space on filesystem; see more needed
• Contact Storage Admin
• Find LUN(s) with right characteristics
• Make sure LUN(s) are bound to right port
• Ensure HBA, port are in same zone
• Mask LUN to allow HBA access
• Contact Sys Admin
• Log into host; run commands to see new volumes
• Add primary volume to volume group, expand group
• Mount primary volume to filesystem; extend
• Log into DB; create datafile
• Add datafile to tablespace and expand
• Contact DBA or Installer
• Repeat previous 4 steps for volume at remote DR site
• Set up mirroring with remote volume using volume mgr
• Installer: Determine space needed
• Contact Storage Admin
• Log into system; install application

Source: IBM


Our Approach

• Explore the intelligence to be included in storage devices
• Develop and extend the OSD (Object Storage Device) Standards
• Investigate essential technologies and design new storage architectures
• Investigate applications and environments that can benefit directly


Background: Intelligent storage

Blocks

Files

Object

Information

Knowledge

• Captured in the attributes of an object
• Exploited to store data more efficiently

[ INTELLIGENCE ]

Extended attributes augmented view: high-level semantics associated.

Traditional storage device view: raw bits, no associated semantics.


Focused Research

• Essential Technical Components
– Extended Attributes
– Security
• Applications
– Micro-Array Data (Functional Genomics)
– Supercomputing (Tape Backup and Archive)
• Exploiting the Attributes for Intelligence
– QoS Support
– Search and Indexing
– Data Provenance
• Tape Backup and Archive
– Parallel Archive
– Parallel Data Placement and File System for Tape Library


Extended Attributes (EAs) in OSD

• Motivations
– Attributes are an important extension that objects add to traditional files
– EAs can be used to support semantic-aware and application-aware storage
– Only preliminary solutions exist in several file systems: Ext2, Ext3, XFS, JFS, ReiserFS, etc.
• Objectives
• How to use EAs, and how to store and access them efficiently?
– Access control of EAs
– Fast retrieval of any EA by name
– Fast bulk copy of EAs
– Efficient space utilization for variable-sized EAs
– Help with search and indexing


Access Control of EAs

• Permission bits and file ACL are NOT enough to protect EAs
– Example: an “audit trace” EA should not be accessible to the file owner
• Categorization of EAs
– Search and Indexing Group
– Storage Management Group
– Monitoring Group
– Enhanced Application-aware Group
• Default access control rules are defined for members of each group when created
• A new access control entry can be inserted into the ACL of each EA

[Figure: i-node holding extended attribute headers (xattr 1 to xattr 7) with reserved ACEs for default access rules; external pages (<page 0>, <page 1>) hold long ACLs, e.g. the ACL of xattr 4 and the ACL of xattr 7]
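The per-EA access control described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the principal names (`owner`, `auditor`, `storage_admin`, and so on), the operation names, and the contents of `DEFAULT_ACES` are hypothetical and do not come from the slides. Only the four EA groups, the default rules applied at creation, and the ability to insert new ACEs are taken from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class EAGroup(Enum):
    SEARCH_INDEXING = "search and indexing"
    STORAGE_MANAGEMENT = "storage management"
    MONITORING = "monitoring"
    APPLICATION_AWARE = "enhanced application-aware"


# Hypothetical default rules per group: principal -> allowed operations.
DEFAULT_ACES = {
    EAGroup.SEARCH_INDEXING: {"owner": {"read", "write"}, "indexer": {"read"}},
    EAGroup.STORAGE_MANAGEMENT: {"storage_admin": {"read", "write"}},
    EAGroup.MONITORING: {"auditor": {"read"}, "system": {"read", "write"}},
    EAGroup.APPLICATION_AWARE: {"owner": {"read", "write"}},
}


@dataclass
class ExtendedAttribute:
    name: str
    value: bytes
    group: EAGroup
    acl: dict = field(default_factory=dict)

    def __post_init__(self):
        # Copy the group's default ACEs into this EA's own ACL at creation.
        for principal, ops in DEFAULT_ACES[self.group].items():
            self.acl.setdefault(principal, set()).update(ops)

    def add_ace(self, principal, ops):
        """Insert a new access control entry into this EA's ACL."""
        self.acl.setdefault(principal, set()).update(ops)

    def allows(self, principal, op):
        return op in self.acl.get(principal, set())


# The "audit trace" example: the file owner gets no access by default.
audit = ExtendedAttribute("audit_trace", b"log", EAGroup.MONITORING)
owner_can_read = audit.allows("owner", "read")
auditor_can_read = audit.allows("auditor", "read")
```

Because each EA carries its own ACL, the file's permission bits never need to be consulted for the attribute, which is the point the slide makes about the audit trace.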


Storing EA on Storage Media

• Inline (within i-node) space for small EAs
• External pages for more EAs
• Differentiate per-instance EAs (IEAs) and per-object EAs
– Example: when copying, the “audit trace” EA should not be copied to the new object
• Separated EA headers and EA values
– Headers are organized into an index for fast lookup
– Headers have inline space for storing small values
– EA value pages support efficient variable-sized EA values

[Figure: i-node layout with Mode, Owner info, Size, Timestamps, EA Header Block, Direct Blocks, Indirect Blocks, Double Indirect Blocks, Triple Indirect Blocks (pointing to Data), and reserved space for inline EA; the EA Header Block leads to EA headers, ACL Block 1 and ACL Block 2 (ACLs), an EA Value Block (EA values), and IEA Value Blocks (IEA headers, IEA values)]


EAs in External Pages

• EA headers are built into a B-Tree with the EA name as the search key
• EA headers have reserved space for small values
• External EA values are referenced by (logical EA page number, slot number)
– References to external EA values don’t need to change when copying to a new object

[Figure: a row of EA headers; one header points to (page i, slot 1) in external EA value page PAGE i, which contains the value space, free space, and a slot directory with the # of slots]
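A minimal sketch of the layout above, assuming a 4096-byte page and a 16-byte inline limit (both hypothetical). A plain dictionary stands in for the B-Tree of EA headers, and the class and method names are illustrative rather than taken from the prototype.

```python
from copy import deepcopy


class EAValuePage:
    """One external EA value page: values fill the value space from the
    front, and the slot directory records (offset, length) per slot."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.free_offset = 0
        self.slots = []  # slot number -> (offset, length)

    def insert(self, value):
        if self.free_offset + len(value) > len(self.data):
            raise ValueError("page full")
        offset = self.free_offset
        self.data[offset:offset + len(value)] = value
        self.free_offset += len(value)
        self.slots.append((offset, len(value)))
        return len(self.slots) - 1

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])


class EAStore:
    INLINE_LIMIT = 16  # hypothetical size of a header's reserved inline space

    def __init__(self):
        # name -> ("inline", value) or ("external", logical page no, slot no)
        self.headers = {}
        self.pages = [EAValuePage()]

    def set(self, name, value):
        if len(value) <= self.INLINE_LIMIT:
            self.headers[name] = ("inline", bytes(value))
            return
        try:
            slot = self.pages[-1].insert(value)
        except ValueError:
            self.pages.append(EAValuePage())  # start a new logical page
            slot = self.pages[-1].insert(value)
        self.headers[name] = ("external", len(self.pages) - 1, slot)

    def get(self, name):
        header = self.headers[name]
        if header[0] == "inline":
            return header[1]
        _, page_no, slot = header
        return self.pages[page_no].read(slot)

    def copy(self):
        """Copy to a new object: pages are duplicated, headers reused as-is."""
        clone = EAStore()
        clone.headers = dict(self.headers)
        clone.pages = [deepcopy(page) for page in self.pages]
        return clone
```

Since headers refer to logical page numbers rather than physical addresses, `copy()` can duplicate the pages in bulk without rewriting any reference. The sketch does not reclaim the old slot when an external value is overwritten.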


Selected Medical Application

Data Explosion: Can intelligent storage help?

Access to Diverse Heterogeneous Distributed Data

Expression Arrays (various tissues)

Personal genomics

X-rays, MRI, mammograms, etc.

Clinical Record

Analysis lab notes

Hospital events: admission, surgery, recovery, discharge

1. Patient Information Challenges

Volume and complexity of data

Integrating massive volumes of disparate data

Need for sophisticated analytics

Growing collaboration across ecosystem

Slide from Dr. Khaja Zafarullah’s presentation


IBM Healthcare and Life Sciences Clinical Genomics Solution Conceptual Architecture

Medical Research

Clinical Care

HIS, RIS, CIS, Pathology, Rx, Patient Charts

Expression, SNPs, Clinical Studies & Trials, Proteomic

Medical Information Gateway

Deidentification of Patient Data & Anonymous Global Patient Identifier Assigned

Adherence to Standards

HL7, BSML/HapMap, CDISC/ODM, MAGE-ML, CDA, etc.

Medical Information Broker

Medical Information Repository

DB2 Information Integrator

WebSphere

Source scientific data & unstructured text files

e.g. MS Access, MS Excel, EST/GenBank, XML, Medline, dbSNP

Data Mining/Statistical Analysis/Visualization

Source: IBM


Focus of Work

• Explore the scope and advantages of intelligent storage in a practical setting.
• Collaborator: Mayo Clinic

High-level goal

• Data explosion: the volume and complexity of data are increasing by the day in the bioinformatics and health care sectors
• Store and organize large volumes of disparate data in an efficient manner.
• How can intelligent storage make things better?


Data pieces generated at various phases of the experiment

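[Figure: phases of the experiment (Experiment Setup, Quantization, Consolidation, Analysis, with External knowledge bases) and the data pieces they generate; the gene expression matrix is labeled with geneID, sampleID, and gene expression level]

1. MIAME data
2. Chip data
3. Scanned image file(s)
4. Gene expression matrix
5. Annotation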


Pieces of data generated and their characteristics:

Data piece                                          Characteristics
Experiment description, gene exp. matrix: MAGE-ML   structured
Chip data: vendor provided                          structured
Annotation and findings (XML)                       semi-structured, frequently accessed
Image files                                         unstructured, less frequently accessed


Research Road Map

• MicroArray Experiment
• MicroArray Data
– MIAME
– MAGE-ML
• Field Data
• Starting the MicroArray intelligent storage mapping


Prototype Implementation: A rough plan

• Hardware:
– storage brick
– dual processor RAID controller
• Embedding intelligence
• Operating System:
– Linux clone
– ext3 filesystem, modified to infuse intelligence


Integrated QoS Provisioning in A Remote Accessible Environment

• Problem
– How to provide QoS support to clients that access remote storage?
• Solution
– A full integration of network QoS, SAN QoS and storage QoS
• Approach
– Take both network (TCP/IP and SAN) and storage conditions into consideration
– Combine the feedback mechanism with storage and network scheduler


QoS Support for Remote Data Accesses

[Figure labels: Computer, Laptop, Server (two), Multilayer Switch, Router, SAN Switch, Data Center, FC-AL storage]


Approaches and Results

• Working on an OSD-standard-based QoS specification (iSCSI QoS Reference Implementation)
• Propose a QoS framework for OSD-based data access
– Support of multiple QoS classes
– Extension of the OSD specification to incorporate the QoS specification
• Intelligent storage scheduling and resource allocation to support QoS
• Investigate SAN QoS
• Integrate network QoS and SAN QoS with storage QoS


What is data provenance?

• Provenance is a relationship between data objects that explains how a particular object has been derived.
• A workflow of data processes usually explains this relationship.
• Using provenance, a user can trace the “workflow” that led to the aggregation of processes producing a particular object.
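Tracing the workflow behind an object amounts to walking a derivation graph backwards. The sketch below assumes provenance is recorded as a mapping from each derived object to the process and inputs that produced it. The object and process names in `DERIVED_FROM` are made up for illustration, and the slides do not say how provenance is actually stored.

```python
from collections import deque

# Hypothetical provenance records: derived object -> (process, input objects).
DERIVED_FROM = {
    "expression_matrix": ("consolidate", ["quantified_1", "quantified_2"]),
    "quantified_1": ("quantify", ["scan_1"]),
    "quantified_2": ("quantify", ["scan_2"]),
    "annotation": ("analyze", ["expression_matrix"]),
}


def trace(obj, derived_from=DERIVED_FROM):
    """Return the (process, inputs, output) steps that led to obj,
    walking backwards breadth-first from obj towards the raw inputs."""
    steps, seen, queue = [], {obj}, deque([obj])
    while queue:
        current = queue.popleft()
        if current not in derived_from:
            continue  # a raw input with no recorded derivation
        process, inputs = derived_from[current]
        steps.append((process, inputs, current))
        for parent in inputs:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return steps


workflow = trace("annotation")
```

Here `workflow` lists four steps, starting with the `analyze` step that produced the annotation and ending with the two `quantify` steps applied to the raw scans.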


How to solve data provenance in bioinformatics?

• Workflow of Functional Genomics
• Data-Dependent Relationships Between Data Objects
• Analysis Tools: take several input data with a set of parameter values to produce a version of the output data object
• Results and generated knowledge are presented as annotations and fed back to the system


A Novel Update Propagation Module for the Data Provenance Problem

• Objective: Preventing the following scenarios:
– Propagation of erroneous outcomes
– Unnecessary rerunning of time-consuming, heavy computations
• Approach: Integrating three major decision factors as a sequential hypothesis testing problem to form a unique decision module
– Sensitivity analysis (Variance-based Approach)
  – Independent inputs
  – Correlated inputs
– Uncertainty analysis (Root-Sum-of-the-Squares Method)
– Complexity & Dependency
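For the uncertainty factor, the standard root-sum-of-the-squares formula for independent inputs combines per-input uncertainties u_i, weighted by sensitivity coefficients c_i, as sqrt(sum((c_i * u_i)^2)). The sketch below computes that quantity. The `should_rerun` threshold rule is a deliberately simplified, hypothetical stand-in for the sequential hypothesis test named on the slide, which also weighs sensitivity and complexity.

```python
import math


def rss_uncertainty(sensitivities, uncertainties):
    """Combined uncertainty sqrt(sum((c_i * u_i)**2)) for independent inputs."""
    if len(sensitivities) != len(uncertainties):
        raise ValueError("need one sensitivity per input uncertainty")
    return math.sqrt(
        sum((c * u) ** 2 for c, u in zip(sensitivities, uncertainties))
    )


def should_rerun(sensitivities, uncertainties, tolerance):
    """Hypothetical rule: rerun the downstream computation only if the
    input change could move the output by more than the tolerance."""
    return rss_uncertainty(sensitivities, uncertainties) > tolerance


combined = rss_uncertainty([1.0, 1.0], [3.0, 4.0])  # sqrt(9 + 16) = 5.0
```

With a tolerance of 6.0 this update would not be propagated, which avoids an unnecessary rerun; with a tolerance of 4.0 it would be.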


Search and Indexing: Many Types of Data

• Homeland Security
– Fingerprints
– Facial photographs
– Satellite imagery
– Intelligence reports
– Criminal records
• Medical
– X-rays
– Micro arrays
– Lab notes
– Publications
– Patient records
• Business/Personal
– Office files (.doc, .ppt, .pdf)
– Multimedia (images, video, and audio)
– Emails
• Special Cases
– Multiple types of data in one item
  – Patient or criminal record has both text and images
– Some data has additional semantic information and relationships


Data Formats

• Structured
– A strict format (schema) is known in advance
– Databases use structured data
– Good query performance on indexed fields
• Semi-structured
– No fixed or predefined schema
– Data and schema information are mixed together as “self-describing data” (e.g., XML)
• Unstructured
– Various text, HTML, images, video, audio, and other files arranged in no particular way
– Slow query performance due to exhaustive searches


Challenges

• The system needs to scale to massive levels
• There are some limitations to existing database technology
– If the data structure changes often, databases need to deal with schema evolution and migration of all old data
– Dealing with multi-format data sometimes resorts to pointers to a separate file system
• Locating unstructured data often requires an expensive exhaustive search
– Even with pre-indexed data, an unexpected query can trigger an exhaustive search


Automatic Indexing by Query History

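[Figure: clients, a metadata server, and intelligent storage devices connected by a high speed network; numbers give the order of events]

• Clients
1. Find attr3 = X
2. Find attr3 = Y
3. Find attr3 = Z
• Intelligent Storage Devices
1. Full scan
2. Full scan and build index on attr3
3. Index scan
• Metadata Server
4. attr3 index here
• Objects
– Extended Attributes: attr1, attr2, attr3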
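The three device-side steps can be sketched as a storage device that counts queries per attribute and builds an index once an attribute has been queried twice. The class name, the `BUILD_AFTER` threshold, and the returned access-path strings are assumptions for illustration. Step 4, telling the metadata server where the index lives, is not modeled.

```python
from collections import defaultdict


class IntelligentStorageDevice:
    BUILD_AFTER = 2  # hypothetical: index on the 2nd query of an attribute

    def __init__(self, objects):
        self.objects = objects              # object id -> {attribute: value}
        self.query_counts = defaultdict(int)
        self.indexes = {}                   # attribute -> {value: [object ids]}

    def find(self, attr, value):
        """Return (matching object ids, access path used)."""
        if attr in self.indexes:
            return list(self.indexes[attr].get(value, [])), "index scan"
        self.query_counts[attr] += 1
        build = self.query_counts[attr] >= self.BUILD_AFTER
        index = defaultdict(list)
        matches = []
        for oid, attrs in self.objects.items():  # full scan over all objects
            if attr in attrs:
                index[attrs[attr]].append(oid)
                if attrs[attr] == value:
                    matches.append(oid)
        if build:
            self.indexes[attr] = dict(index)
            return matches, "full scan and build index"
        return matches, "full scan"


device = IntelligentStorageDevice({
    1: {"attr3": "X"},
    2: {"attr3": "Y"},
    3: {"attr1": "a", "attr3": "Z"},
})
paths = [device.find("attr3", v)[1] for v in ("X", "Y", "Z")]
```

The three queries take the paths "full scan", "full scan and build index", and "index scan", mirroring the numbered steps in the figure. The sketch does not keep the index up to date when objects change.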


Example Semantics: What is MeSH?

• MeSH stands for Medical Subject Headings
• A controlled thesaurus of medical terms from the National Library of Medicine
• Contains over 22,000 descriptors arranged hierarchically into 15 main categories
• http://www.nlm.nih.gov/mesh/2005/MeSHtree.html


MeSH-based Grouped Allocation

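[Figure: clients, a metadata server, and intelligent storage devices connected by a high speed network]

• Clients
– Generate data
– Assign MeSH terms
• Metadata Server
– MeSH-aware
• Intelligent Storage Devices
– Allocate somewhat intelligently… by grouping related objects
• Objects
– Related objects inherit attributes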


Proposed Approaches

• Take advantage of the hierarchical structure and semantics of MeSH (Medical Subject Headings) terms to assist the search for similar or required data
• Design an adaptive data allocation on intelligent storage devices for fast retrieval based on past query results
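One way to exploit the hierarchy is to treat each object's MeSH tree number (a dotted path such as `C04.557`) as a placement key, so that objects in the same subtree land on the same device and a subtree query becomes a prefix match. The sketch below is illustrative only: the tree numbers, the grouping depth, and the round-robin assignment of groups to devices are assumptions, not the allocation scheme proposed in the slides.

```python
from collections import defaultdict


def group_key(tree_number, depth=1):
    """Truncate a dotted tree number such as 'C04.557.337' to `depth` levels."""
    return ".".join(tree_number.split(".")[:depth])


def allocate(objects, num_devices, depth=1):
    """Place objects whose terms share a subtree on the same device.

    objects: {object id: tree number}. Returns {device number: [object ids]}.
    """
    groups = defaultdict(list)
    for oid, tree_number in objects.items():
        groups[group_key(tree_number, depth)].append(oid)
    placement = defaultdict(list)
    # Hypothetical policy: whole groups go round-robin, in sorted key order.
    for i, key in enumerate(sorted(groups)):
        placement[i % num_devices].extend(groups[key])
    return dict(placement)


def related(objects, tree_number):
    """Objects tagged at or below the given term in the hierarchy."""
    return [oid for oid, t in objects.items()
            if t == tree_number or t.startswith(tree_number + ".")]


# Illustrative tree numbers only.
objects = {"o1": "C04.557", "o2": "C04.588", "o3": "C14.280"}
placement = allocate(objects, num_devices=2)
```

With two devices, `o1` and `o2` share the `C04` subtree and are placed together, while `o3` goes to the other device. An adaptive scheme would additionally move groups based on past query results, which this sketch does not attempt.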


Creating OSD-Enabled Tape Library

[Figure labels: Tape Controller (several), I/O Controller, Data Transfer]


Research Goals on Parallel Tape Libraries

• For a high performance cluster environment, how to place objects across multiple tape drives to increase the aggregate object retrieval rate from tape archives

• For thousands of distributed clients with various connection rates and no knowledge about tapes, how to improve backup/archive performance and provide an easy-to-use interface

• Determine the policies required for this HSM software to allow efficient storage of objects on tapes


High Performance Tape File System for Data Backup/Archive


Motivation

• Users’ laptops and desktops usually hold invaluable business data, but users are not experts at data backup
• To guard against natural disasters and terror attacks, critical data must be archived to tape and sent off-site
• Reducing the backup window to tape is critical for the safety of the valuable data
• A single data repository is easy to manage and avoids human errors.


System Desired Features

• Simplicity, convenience, low cost and high performance for data backup/archive
• Tape should be treated as a normal storage device and provide various data I/O interfaces to users
– cp, scp, sftp, http put, osd, etc.
• Direct backup/archive to the final destination (tapes) without involving disk caching
• Provides “infinite” storage and concurrent writing to users


The Big Picture

[Figure: clients 1 to n reach the server over the internet via scp, sftp, and http; user space holds glibc and libfuse, kernel space holds VFS, ext3, nfs, and fuse; the tape file system is mounted at /mnt/tape]

Note: Data streams are interleaved if necessary


Block Level Data Interleaving

• In general, data interleaving is good for writes and bad for reads
• Restoring from interleaved data is discussed on the next slide

[Figure: client streams a, b, c with rates labeled µ/2, µ/4, µ/4 and µ/3; a control point; streams marked sending or waiting; blocks on tape at rate µ in the order a11 a12 b11 c11 a13 a13 b12 c12 a21 a22 b21 c21]
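The block order in the figure is consistent with a weighted round-robin in which a stream arriving at µ/2 contributes two blocks per round and streams at µ/4 contribute one each. The sketch below assumes that reading. The function names and the per-round weights are illustrative, and the actual scheduler behind the slides may differ.

```python
def interleave(streams, weights):
    """Weighted round-robin: in each round, take weights[name] blocks
    from every stream that still has data.

    streams: {name: [blocks]}, weights: {name: blocks per round (>= 1)}.
    """
    if any(weights[name] < 1 for name in streams):
        raise ValueError("every stream needs a weight of at least 1")
    queues = {name: list(blocks) for name, blocks in streams.items()}
    tape = []
    while any(queues.values()):
        for name in streams:
            take = weights[name]
            tape.extend(queues[name][:take])
            queues[name] = queues[name][take:]
    return tape


def restore_span(tape, name):
    """Blocks the drive passes over to read back every block of one stream
    (assumes block labels start with the stream name)."""
    positions = [i for i, block in enumerate(tape) if block.startswith(name)]
    return positions[-1] - positions[0] + 1


streams = {
    "a": ["a11", "a12", "a13", "a14"],
    "b": ["b11", "b12"],
    "c": ["c11", "c12"],
}
tape = interleave(streams, {"a": 2, "b": 1, "c": 1})
```

Writing keeps the drive streaming at full rate, but restoring stream `b` alone means passing over five blocks to recover two, which is the sense in which interleaving is good for writes and bad for reads.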


Interleaved Data Restore

• Effective restore bandwidth for interleaved data objects depends on how the objects are interleaved
• Requests for archived data usually come in batches, which leaves room for scheduling optimization
• Scheduling issues for restore performance
– Data backup scheduling issue: optimize data interleaving for future restore based on object access probabilities and relationships
– Data restore scheduling issue: for a given batch of data requests, find a near-optimal solution to minimize the object restore window


Prototyping

• Put a PC in front of a physical tape library to provide necessary processing power and memory buffer
– Write data to disk and tape simultaneously
  – Interleave data objects for tape writing
– Read data object from disk or tape

[Figure labels: Client A, Client B, Client C, network, PC, SCSI, Virtual Tape Library]


Data Transfer Model for d2d2t

[Figure: Disk Array Cache with central control; Tape Drives, each labeled with rate µ; Data path and Control path; Tape cartridge; Tape Library Robot]


Object Allocation Scheme for d2d2t by considering the object relationship

[Figure: objects placed on tape in batches (1st batch: n(d-1), 2nd batch, 3rd batch, 4th batch) across n tape libraries, with near-line tapes]


Parallel Archiving in OSD Environment

• Supported by DOE contract
• OSDs have internal hierarchies of storage managed by independent commercial HSM
• Files can consist of multiple objects stored in parallel OSDs
– Synchronize the storage level of related objects, e.g., migrations of object 3 and object 4 should be coordinated
• Objective
– Leverage object-based parallel file system technology and commercial (non-parallel) HSM products
– Parallel archive integrated tightly into a globally shared scalable parallel file system

[Figure: Clients, a Metadata Cluster, and Object Storage Devices OBSD1, OBSD2, OBSD3 on a Network. Metadata is hashed across multiple machines; File /foo = OBSD1 object 4, OBSD3 object 3, with owner = me, date = today, etc.]


Classical Hierarchical Storage Management Approach

• The application calls for IO and the DMAPI layer intercepts the IO call; if data has been migrated to tape, the Migration Daemon is notified to recall data
• If the file system is too full, the Migration Daemon is notified to migrate data out

[Figure: File System Client machine split into User and Kernel; labels: Application, Migration Daemon, VFS, DMAPI, File System X]


Coordinating Parallel HSMs

• Multiple instances of commercial HSM solutions used in parallel means parallel migration and recall
• Migration coordinator: query Metadata cluster, instruct HSMs on migration, update Metadata cluster about new location
• Migration Agents w/ partial DMAPI functions

[Figure: Clients, a Metadata Cluster, a Migration coordinator cluster, and Object Storage Devices 1, 2, 3 on a Network. Metadata is hashed across multiple machines; File /foo = OBSD1 object 4, OBSD3 object 3, with owner = me, date = today, archive attributes, etc.]


OSD Internal Architecture

[Figure labels: OSD Interface, Migration Agent, HSM, Disk File System, Tape File System/DB, Disk Driver, Tape Library Driver, Networking driver, NIC]

Potential POSIX standard supported by all HSM vendors


Other Projects

• Active Data Object
• ONR Storage and Networking Planning Tool


Active Objects

• Problem: How can the OSD environment be extended to be even more flexible?
• Approach: Allow objects to include executable methods in addition to the data, attributes and metadata. These methods can be invoked when a pre-set condition is met.
• Uniqueness: Data objects are truly autonomic. Intelligent storage devices have to be designed to provide such a capability.
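As a rough illustration of the approach, an active object can be modeled as data plus attributes plus a list of (condition, method) pairs that the storage device evaluates after each write. The class, the `size` attribute, the 8-byte threshold, and the `archive` flag below are all hypothetical. The method API in the Lustre-based prototype is not described in enough detail here to reproduce.

```python
class ActiveObject:
    def __init__(self, data, attributes=None):
        self.data = data
        self.attributes = dict(attributes or {})
        self.methods = []  # (condition, method) pairs

    def attach(self, condition, method):
        """Register a method to run after any write for which
        condition(self) holds."""
        self.methods.append((condition, method))

    def write(self, data):
        self.data = data
        self.attributes["size"] = len(data)
        self._fire()

    def _fire(self):
        for condition, method in self.methods:
            if condition(self):
                method(self)


def too_large(obj):
    # Pre-set condition: the object has grown past a hypothetical limit.
    return obj.attributes.get("size", 0) > 8


def flag_for_archive(obj):
    # Method carried by the object and executed on the storage device.
    obj.attributes["archive"] = True


obj = ActiveObject(b"")
obj.attach(too_large, flag_for_archive)
obj.write(b"small")
flagged_after_small_write = "archive" in obj.attributes
obj.write(b"a much longer payload")
```

The first write leaves the object untouched, and the second triggers the attached method, so the object reacts to its own state without the client issuing a separate request.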


Our Implementation

[Figure: Lustre-based implementation with three kinds of node, each containing Networking and Recovery modules]

• File Client: Application (distributed grep), Method API, Lustre client filesystem, OS bypass file I/O, Meta-data WB cache, OBD Client, MDS Client, Lock Client, Run-time
• Object Storage Target (two shown): Object-Based Disk (OBD) server, Lock server, Object-Based Disk (OBD), Methods Run-time
• Meta-Data Server: MDS Server, Lock Client, MDS Backend, Lock Server, Load Balancing

System Configuration

- 1 file client: Blade server node (Linux)
- 2 or more OST: Blade node (Linux)
- 1 MDS: Blade node (Linux)

- Integrates method API into Lustre file system
- Develops new distributed grep program


Conclusions

• Intelligent Storage has a long way to go
• Many interesting and promising issues
• Industrial collaboration is extremely important
• Definitely need to demonstrate the advantages of OSD with real applications

Thanks! Questions?