EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
Data Grid Services/SRB/SRM & Practical
Hai-Ning Wu
Academia Sinica Grid Computing

Outline
• Introduction
• Characteristics of data grid
• Storage Resource Management (SRM)
• Storage Resource Broker (SRB)
  – SRB Practical
• Summary

Data Storage at Large Scale
• Historically, data has been STORED rather than MANAGED
• The amount of data grows so rapidly that traditional storage architectures are no longer suitable
• Data are distributed across many types of sources, which makes integration hard and raises the barriers between users and storage systems

Challenges of Data Storage
• Large scales of data
• Distributed storage via network
• Heterogeneous data resources
• Managing data efficiently and safely
• Long-term preservation

The Solution: Data Grid
• Data virtualization
  – Manipulates data at a high level
  – Hides low-level details
• Provides a uniform interface to access the distributed data storage systems

Virtualization
• Data virtualization
• Trust virtualization
• Data grids are used to manage shared collections that are distributed across multiple sites and multiple storage systems

Data Grid - The Idea
[Figure: client users send a "Request for Data" to the data grid and receive "Data found".]

Data Grid - The Idea
[Figure: client users send a "Request for Data" to the Data Grid System and receive "Data found".]
Details are hidden. The data grid system finds out where the data are located.

Data Grid Transparencies
• Find data without knowing the identifier
  – Descriptive attributes
• Access data without knowing the location
  – Logical name space
• Access data without knowing the type of storage
  – Storage repository abstraction
• Retrieve data using your preferred API
  – Access abstraction
• Provide transformations for any data collection
  – Data behavior abstraction

Data Grid Components
• Federated client-server architecture
  – Servers can talk to each other independently of the client
• Infrastructure-independent naming
  – Logical names for users, resources, files, applications
• Collective ownership of data
  – Collection-owned data, with infrastructure-independent access control lists
• Context management
  – Record state information in a metadata catalog from data grid services such as replication
• Abstractions for dealing with heterogeneity

Data Grid Architecture
[Figure: layered architecture diagram; the components recovered from it are listed below.]
• Application
• Access interfaces: C, C++, Java Libraries; Unix Shell; Linux I/O; DLL / Python, Perl; Java, NT Browsers; OAI, WSDL, OGSA; HTTP
• Federation Management
• Consistency & Metadata Management / Authorization-Authentication Audit
• Logical Name Space Management; Latency Management; Digital Component Transport; Metadata Transport
• Standard Database Interface
  – Databases: DB2, Oracle, Sybase, Postgres, mySQL, Informix
• Standard Storage System Operations Interface
  – Archives: Tape, HPSS, ADSM, UniTree, DMF, CASTOR, ADS
  – ORB
  – File Systems: Unix, NT, Mac OS X
  – Databases: DB2, Oracle, Sybase, SQLserver, Postgres, mySQL, Informix

SRM
The Data Grid Interface for EGEE Grid Middleware

gLite Services

SE
• Storage Element
  – The Storage Element is the service which allows a user or an application to store data
  – Data Channel Protocols: File Transfer and File I/O

SRM (Storage Resource Management)
• What is SRM?
  – SRM is a protocol to manage storage resources (it is NOT a file access protocol!)
  – Provides a uniform interface for computing applications and client users to heterogeneous storage elements
  – Does not transfer files itself
  – Provides space management
  – Manages the lifetime of files

SRM & Grid

Grid files
• Grid Files
  – Files in the Grid can be referred to by different names:
    Logical File Name (LFN): An alias created by a user to refer to some item of data. For example, /grid/gilda/gridcamp/testFile.txt
    Grid Unique IDentifier (GUID): A non-human-readable unique identifier for an item of data. For example, 37afd0cc-c53b-4795-a873-6a9dde35a9cc
    Site URL (SURL): The location of an actual piece of data on a storage system. For example, srm://dpm01.grid.sinica.edu.tw/dpm/grid.sinica.edu.tw/home/twgrid/generated/2007-09-18/file4c4a5a6f-878d-4ef3-a73d-941ae6275383
    Transport URL (TURL): A temporary locator of a replica plus an access protocol, understood by an SE. For example, gsiftp://dpm01.grid.sinica.edu.tw/dpm01.grid.sinica.edu.tw:/path1/twgrid/2007-09-18/file4c4a5a6f-878d-4ef3-a73d-941ae6275383.168233.0
  – While the GUIDs and LFNs identify a file irrespective of its location, the SURLs and TURLs contain information about where a physical replica is located and how it can be accessed.
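
To make the difference between location-free names and locators concrete, the following is a minimal sketch that splits the example SURL and TURL from this slide into protocol, host, and path using Python's standard `urllib.parse`. The helper name `describe_locator` is hypothetical and is not part of any grid middleware.

```python
from urllib.parse import urlparse

# Example names copied from the slide.
LFN = "/grid/gilda/gridcamp/testFile.txt"
GUID = "37afd0cc-c53b-4795-a873-6a9dde35a9cc"
SURL = ("srm://dpm01.grid.sinica.edu.tw/dpm/grid.sinica.edu.tw/home/twgrid/"
        "generated/2007-09-18/file4c4a5a6f-878d-4ef3-a73d-941ae6275383")
TURL = ("gsiftp://dpm01.grid.sinica.edu.tw/dpm01.grid.sinica.edu.tw:/path1/twgrid/"
        "2007-09-18/file4c4a5a6f-878d-4ef3-a73d-941ae6275383.168233.0")


def describe_locator(url: str) -> dict:
    """Split a SURL or TURL into protocol, host, and path (hypothetical helper)."""
    parts = urlparse(url)
    return {"protocol": parts.scheme, "host": parts.hostname, "path": parts.path}


surl_info = describe_locator(SURL)
turl_info = describe_locator(TURL)

# The SURL and TURL both name a storage host; the LFN and GUID carry no host,
# so they stay valid wherever the replicas happen to live.
print(surl_info["protocol"], surl_info["host"])
print(turl_info["protocol"], turl_info["host"])
```

The SURL carries the management protocol (`srm`), while the TURL carries the protocol actually used to move the bytes (`gsiftp` in this example).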

LFC
• File Catalogue (LFC)
  – The mappings between LFNs, GUIDs and SURLs are kept in a File Catalogue service, while the files themselves are stored in Storage Elements.
  – The only file catalogue officially supported in WLCG/EGEE is the LCG File Catalogue (LFC).
[Figure: mapping by the "LFC" catalogue server]
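
The mapping role of a file catalogue can be illustrated with a small in-memory model. This is only a sketch: `ToyCatalogue` and its methods are hypothetical names and do not reflect the real LFC API, the `example.org` SURLs are made up, and the model assumes one GUID per LFN with possibly several replicas (SURLs) per GUID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyCatalogue:
    """In-memory stand-in for a file catalogue (not the LFC API)."""
    lfn_to_guid: dict = field(default_factory=dict)
    guid_to_surls: dict = field(default_factory=dict)

    def register(self, lfn: str, surl: str) -> str:
        """Create an LFN entry, assign a GUID, and record the first replica."""
        if lfn in self.lfn_to_guid:
            raise ValueError(f"LFN already registered: {lfn}")
        guid = str(uuid.uuid4())
        self.lfn_to_guid[lfn] = guid
        self.guid_to_surls[guid] = [surl]
        return guid

    def add_replica(self, lfn: str, surl: str) -> None:
        """Record one more physical copy of an already registered file."""
        self.guid_to_surls[self.lfn_to_guid[lfn]].append(surl)

    def list_replicas(self, lfn: str) -> list:
        """Resolve LFN -> GUID -> SURLs."""
        return list(self.guid_to_surls[self.lfn_to_guid[lfn]])


catalogue = ToyCatalogue()
lfn = "/grid/gilda/gridcamp/testFile.txt"
guid = catalogue.register(lfn, "srm://se-a.example.org/data/testFile.txt")
catalogue.add_replica(lfn, "srm://se-b.example.org/data/testFile.txt")
replicas = catalogue.list_replicas(lfn)
```

The catalogue stores only names and locations; the file contents stay on the Storage Elements, which matches the split described on the slide.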

Upload a file to a SE
CASE 1: User needs to store data in an SE (from a UI)
1. Create a new LFN entry in LFC, return a SURL.
2. srmPrepareToPut (SURL)
3. Transfer the file
4. srmPutDone (SURL)
[Figure: Site A with CE, WNs, and an SE (SRM, GridFTP, I/O server), plus the LFC and the User interface; the numbers 1 to 4 mark the steps above.]

Upload a file to a SE
CASE 2: Application needs to store data in an SE (from a WN)
1. Create a new LFN entry in LFC, return a SURL.
2. srmPrepareToPut (SURL)
3. Transfer the file
4. srmPutDone (SURL)
[Figure: Site A with CE, WNs, and an SE (SRM, GridFTP, I/O server), plus the LFC; the numbers 1 to 4 mark the steps above.]
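
The four upload steps in CASE 1 and CASE 2 must happen in order: SRM prepares and finalizes the upload, while a separate transfer protocol moves the data. The sketch below models that ordering as a small state machine. It makes no network calls; `MockUpload`, `PutState`, and the `example.org` SURL are hypothetical names used only for illustration, not a real SRM client.

```python
from enum import Enum


class PutState(Enum):
    NEW = "new"
    PREPARED = "prepared"
    TRANSFERRED = "transferred"
    DONE = "done"


class MockUpload:
    """Tracks one upload through the four steps; no network calls are made."""

    def __init__(self, lfn: str, surl: str):
        # Step 1: the catalogue entry exists and a SURL has been returned.
        self.lfn = lfn
        self.surl = surl
        self.state = PutState.NEW
        self.log = [f"LFC: registered {lfn} -> {surl}"]

    def _advance(self, expected: PutState, new: PutState, message: str) -> None:
        # Refuse any step that is attempted out of order.
        if self.state is not expected:
            raise RuntimeError(f"{message} is not allowed in state {self.state.value}")
        self.state = new
        self.log.append(message)

    def prepare_to_put(self) -> None:
        """Step 2: ask SRM to prepare space for the SURL."""
        self._advance(PutState.NEW, PutState.PREPARED, "srmPrepareToPut")

    def transfer(self) -> None:
        """Step 3: move the data with a transfer protocol such as GridFTP."""
        self._advance(PutState.PREPARED, PutState.TRANSFERRED, "transfer")

    def put_done(self) -> None:
        """Step 4: tell SRM the upload is complete."""
        self._advance(PutState.TRANSFERRED, PutState.DONE, "srmPutDone")


upload = MockUpload("/grid/gilda/gridcamp/testFile.txt",
                    "srm://se.example.org/data/testFile.txt")
upload.prepare_to_put()
upload.transfer()
upload.put_done()
```

Calling `put_done()` before `transfer()` raises `RuntimeError`, which mirrors the point that SRM manages the upload but does not move the file itself.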

Download files from a SE
CASE 3: User needs to retrieve (onto the UI) data stored in an SE
1. Query the file catalog to retrieve the SURL from the LFN.
2. srmPrepareToGet (SURL)
3. Transfer the file (read)
4. srmReleaseFile (SURL)
[Figure: Site A with CE, WNs, and an SE (SRM, GridFTP, I/O server), plus the LFC and the User interface; the numbers 1 to 4 mark the steps above.]

Download files from a SE
CASE 4: Application needs to copy data locally (onto the WN) and use them.
1. Query the file catalog to retrieve the SURL from the LFN.
2. srmPrepareToGet (SURL)
3. Transfer the file (read)
4. srmReleaseFile (SURL)
[Figure: Site A with CE, WNs, and an SE (SRM, GridFTP, I/O server), plus the LFC; the numbers 1 to 4 mark the steps above.]
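
The download cases follow the same pattern in reverse. The sketch below only builds the ordered plan for fetching one file and executes nothing. It assumes that the SE answers srmPrepareToGet with a TURL for the requested protocol, which is imitated here by swapping the URL scheme. `plan_download` and the `example.org` SURL are hypothetical.

```python
def plan_download(lfn: str, catalogue: dict, protocol: str = "gsiftp") -> list:
    """Return the ordered steps for fetching `lfn` (a sketch; nothing is executed)."""
    # Step 1: look up the SURL for the logical name.
    surl = catalogue[lfn]
    # Assumption: srmPrepareToGet yields a TURL for the chosen protocol.
    # A real SE decides host and path; here only the scheme is swapped.
    turl = protocol + "://" + surl.split("://", 1)[1]
    return [
        ("query catalogue", lfn, surl),
        ("srmPrepareToGet", surl, turl),
        ("transfer (read)", turl, "local copy"),
        ("srmReleaseFile", surl, None),
    ]


toy_catalogue = {
    "/grid/gilda/gridcamp/testFile.txt": "srm://se.example.org/data/testFile.txt",
}
steps = plan_download("/grid/gilda/gridcamp/testFile.txt", toy_catalogue)
for name, source, result in steps:
    print(name, source, result)
```

Only step 3 touches file data. Steps 2 and 4 are SRM bookkeeping around the transfer.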

SRB
Storage Resource Broker

Storage Resource Broker
• Developed at San Diego Supercomputer Center
• A distributed file management system (Data Grid), based on a client-server architecture
• A uniform interface to heterogeneous data storage resources
• Accesses data based upon their attributes rather than just their names or physical locations
• Supports many data storage systems
• Provides various types of client interfaces on different platforms

SRB Physical Structure
[Figure: components shown are a user at location X; an MCAT-enabled server (SRB server with Oracle client, backed by an Oracle RDBMS); and SRB vaults at locations A, B, and D, each with an SRB server, a storage driver, and storage space.]

SRB Practical - inQ
• Download inQ 3.5.0 from http://www.sdsc.edu/srb/tarfiles/inQ350.zip
• Unzip inQ350.zip
• Execute inQ.exe

inQ – Login
• Name: srbusr+your number
• Host: tap07.grid.sinica.edu.tw
• Domain: ASGC
• Port: 6833
• Authorization: ENCRYPT1
• Password: The same as your user name

SRB Client Tool - inQ

SRB Demonstrations
• Use inQ to upload, download, and remove files.
• Use Scommands to upload, download, and remove files.
  – Sinit: log in to the SRB system. Syntax: Sinit
  – Sls: list directory content. Syntax: Sls
  – Sput: upload a file to the SRB server. Syntax: Sput filename
  – Sget: download a file from the SRB server. Syntax: Sget filename
  – Srm: remove a file stored on the SRB server. Syntax: Srm filename
  – Sreplicate: replicate data to another resource. Syntax: Sreplicate filename
  – Sexit: log out of the SRB system. Syntax: Sexit
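
As a small aid for the practical, the sketch below assembles the command lines for one complete session using only the syntax listed on this slide. It does not run the Scommands; it only returns the lines to type. The function name `srb_round_trip` and the file name `notes.txt` are hypothetical.

```python
import shlex


def srb_round_trip(filename: str) -> list:
    """Return the Scommand lines for an upload, list, download, remove session."""
    # Quote the file name so that names with spaces survive the shell.
    quoted = shlex.quote(filename)
    return [
        "Sinit",            # log in
        f"Sput {quoted}",   # upload
        "Sls",              # confirm the file is listed
        f"Sget {quoted}",   # download it again
        f"Srm {quoted}",    # remove it from the server
        "Sexit",            # log out
    ]


commands = srb_round_trip("notes.txt")
for line in commands:
    print(line)
```

Every session is bracketed by Sinit and Sexit; the file operations in between can be reordered or repeated as the exercise requires.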

Data Grid Applications
• Digital Archiving
  – Long-term preservation
  – Heterogeneous backup
• Digital Library
  – Data sharing
• Scientific Computing

SRB Use Case
• Build Data Grid Management System
  – Data Grid services in Academia Sinica
  – NDAP cross-organization data backup project

SRB Data Grid Services in Academia Sinica (1)
• Objective
  – To provide Grid services for long-term preservation and unified data access
• Data Collection Status
  – File size: ~ 60 TB
  – File count: ~ 3.5 Million

SRB Data Grid Services in Academia Sinica (2)
[Figure: storage sites connected by the Campus Backbone Network]
• ASCC (20 TB)
• IIS (8 TB)
• IOE (8 TB)
• ITH (8 TB)
• IHP (8 TB)
• IMH (8 TB)
• IZAS (8 TB)
• Tape Library (500 TB)

NDAP Partners For Long-term Data Preservation
• Academia Sinica (AS)
• National Palace Museum (NPM)
• National Taiwan University (NTU)
• National Museum of History (NMH)
• Academia Historica (DRNH)
• National Central Library (NCL)
• National Museum of Natural Science (NMNS)
• Taiwan Historica (TH)

Data grid for NDAP LTP service

Summary
• Data grids provide a new solution for large-scale storage with the following features:
  – Distributed data storage
  – Efficient and safe management of data
  – A uniform interface to heterogeneous systems
  – Flexibility to adopt new storage technology

SRM & SRB
• SRM
  – Used in gLite middleware
  – A uniform interface between different SEs and grid middleware
• SRB
  – Developed by SDSC
  – Supports many backend storage systems
  – Widely used data grid software

SRM & SRB
• SRM and SRB cannot interoperate unless they share a standard for communication
• A bridge between SRM and SRB is being constructed in order to
  – Integrate SRB into the gLite environment
  – Bind resources from the two important data grid systems
• This project is currently being developed by ASGC

SRM & SRB

iRODS
• The next-generation data grid system after SRB, developed by SDSC
• A rule-oriented data grid system
• More flexibility for data management
• Current version: iRODS 1.0

iRODS Workshop
• Time – Tue 8 April 2008
• Location – 2nd Conference Room, 3F
• For more information, please check the ISGC 2008 website

Thanks for your attention!