
  • Establishment of National Agricultural Bioinformatics Grid in ICAR

    I.A.S.R.I/T.B. 05/2014

    Centre for Agricultural Bioinformatics

    Indian Agricultural Statistics Research Institute

    Library Avenue, Pusa, New Delhi 110012, India

    2014

  • POLICY AND RESOURCE ALLOCATION DOCUMENT

    Anil Rai

    K. K. Chaturvedi

    S. B. Lal

    Anu Sharma

    IASRI, New Delhi

    Sanjay Wandhekar

    Ashish Ranjan

    Gourav Chaudhari

    Abhishek Sharma

    Tarun Singh

    C-DAC, Pune

  • Table of Contents

    1 Data Center Security Policy
    1.1 Introduction
    2 Management Responsibilities
    3 Policy
    3.1 Physical Access Control
    3.1.1 Access to Network and communication devices
    3.1.2 Access to HPC Datacenter facilities
    3.1.3 BMS room
    3.1.4 UPS room
    3.1.5 Visitors
    3.1.6 Facilities maintenance personnel
    3.1.7 Food, Drink, Tobacco and Inflammable Products
    3.1.8 Photography/Videography
    3.2 Logical Access Control
    3.2.1 User Creation
    3.2.2 Procedure to get a new account
    3.2.3 Modification of user rights
    3.2.4 Account locking
    3.2.5 Unlocking or resetting of password
    3.2.6 User deletion
    3.3 Application Access and Installation
    4 Physical Access to Data Centre Facility
    5 Administrator rights
    5.1 User ID Reviews
    6 Storage Allocation
    6.1 Storage Summary
    6.2 Type of users
    6.3 Storage quota
    6.3.1 Limits for home storage 250T
    6.3.2 Limits for scratch (Parallel) storage 200T
    6.3.3 /scratch Policy
    6.3.4 Limits for archive storage 200T
    6.4 Automatic File Deletion Policy
    7 Cluster Resource Allocation
    7.1 Types of cluster
    7.2 Available Cluster Resources
    8 Backup and Restore
    8.1 Procedure
    8.2 Backup Content
    8.3 Backup Approaches
    8.4 Database Back-up
    8.5 Data Backup policy
    8.6 Restore Procedure
    8.7 Deleted User Account
    8.8 Best Practices for User

    Tables

    Table 1: Management Responsibilities
    Table 2: Storage Summary
    Table 3: Limits for Home Storage
    Table 4: Limits for Scratch Storage
    Table 5: Limits for Archive Storage
    Table 6: Automatic File Deletion Policy
    Table 7: Available Cluster Resources
    Table 8: Allocation of Resources

    Abbreviations

    HPC: High Performance Computing, a system with high processing and computing capability
    for carrying out complex calculations.

    Lead Centre: IASRI, New Delhi

    Domain Centres: NBPGR, New Delhi; NBAGR, Karnal; NBFGR, Lucknow; NBAIM, Mau; and NBAII,
    Bangalore.

    NFS: Network File System, a distributed file system protocol that allows a user on a client
    computer to access files over a network in much the same way as local storage is accessed.

    SMP: Symmetrical Multiprocessing, a multiprocessor hardware and software architecture in
    which two or more identical processors are connected to a single shared main memory, have
    full access to all I/O devices and are controlled by a single OS instance. All processors
    are treated equally, with none reserved for special purposes.

    CABin: Centre for Agricultural Bioinformatics

    IASRI: Indian Agricultural Statistics Research Institute

    PFS: Parallel File System

    1 DATA CENTER SECURITY POLICY

    1.1 INTRODUCTION

    This document covers data center security, allocation of storage, allocation of cluster
    resources and data backup requirements. Data center security concerns the physical and
    logical security of data center resources. Allocation of storage and cluster resources
    concerns the distribution of a limited amount of storage, cores and memory among the
    different user categories. The data backup requirements describe how to protect data, which
    approach to use for backing up important data, how often to take backups and how to restore
    data.

    Implementing the policy set out in this document will increase the security and efficient
    use of the HPC system and help to safeguard HPC resources.

    Physical and logical access to data and information processing resources is covered in this
    document.

    2 MANAGEMENT RESPONSIBILITIES

    Two levels of user authentication are implemented to prevent unauthorized access to the
    resources. The level of authentication will initially be approved by the Centre Head. The
    login credentials will be created by the System Administrator. The Centre Head is
    responsible for granting physical and logical access to the resources. The role of each
    manager type is shown in Table 1. Two manager positions are proposed; they are responsible
    for authenticating the intended users of the available resources.

    Table 1: Management Responsibilities

    Manager Type            Role
    Centre Head             Identifying physical and logical access rights to be granted
                            to the users.
    System Administrator    1. Creation / deletion of user IDs.
                            2. Granting / revoking of access rights.
                            3. Review and report.

    3 POLICY

    Access control is a mechanism to ensure that authorized personnel have access to the
    information and information processing resources that are assigned to them. It also helps
    to establish accountability. There are two main types of access control: physical and
    logical. Physical access control ensures that only authorized personnel have access to the
    physical assets assigned to them, including physical access to the HPC Data center. Logical
    access control ensures that only authorized personnel have access to information or data in
    electronic form, including access to the operating system, applications and associated
    information.

    3.1 Physical Access Control

    3.1.1 Access to Network and communication devices

    The network devices on all the floors should be housed in secure cabinets that can be locked

    and access should be restricted to network administrators or authorized personnel only.

    3.1.2 Access to HPC Datacenter facilities

    A valid ID card is mandatory for system administrators and maintenance personnel who need
    to service co-located equipment. These cards must be checked at entry to and exit from the
    facility. IASRI staff will escort them to their equipment in the HPC Data center.

    As HPC systems are to be operational 24x7, the support provided by a vendor should be
    identified in advance so that vendor representatives can get the necessary access to the
    system. The system administrator should notify staff as soon as possible after learning
    that a vendor support visit is planned. Vendor representatives will be escorted within the
    facility.

    A biometric device is installed to control access to the HPC Data center. CCTV cameras are
    installed in the HPC Data center to monitor activities in the server room. No person is
    allowed to enter the HPC Data center unless accompanied by an authorized person. Entry and
    exit times of visitors must be logged in the system or in the entry log book.

    3.1.3 BMS room

    A biometric device is installed to control access to the BMS room. Only authorized staff,
    or persons who monitor the alerts generated by security devices, are allowed to enter the
    BMS room.

    3.1.4 UPS room

    A biometric device is installed to control access to the UPS room. Only authorized persons
    are allowed to enter the UPS room.

    3.1.5 Visitors

    All IASRI staff, students and third parties visiting the facility are required to present
    their ID cards or valid government-issued identification, which will be checked on entering
    and leaving the facility. General visitors to the HPC Data center must be escorted by IASRI
    staff during their visit.

    3.1.6 Facilities maintenance personnel

    Maintenance of the equipment and the facility by IASRI staff and third parties is
    essential. Maintenance may include, but is not limited to, general cleaning, raised floor
    space cleaning, and maintenance of electrical and mechanical systems. Maintenance visits by
    non-IASRI staff must be scheduled in advance and notified to the System administrator.
    Maintenance staff will be escorted and/or kept under surveillance at all times. All
    maintenance personnel must carry an approved identification credential and adhere to IASRI
    policies and procedures.

    3.1.7 Food, Drink, Tobacco and Inflammable Products

    Food, drinks, tobacco and inflammable products shall not be allowed in the HPC Data center

    area. Smoking shall be strictly prohibited in the HPC Data center facility.

    3.1.8 Photography/Videography

    Taking of pictures and/or video, including by cell phones equipped with cameras, is

    prohibited unless a valid approval from the competent authority is presented.

    3.2 Logical Access Control

    The login credentials can be created by the system administrator to access the computing

    resources or information/data.

    3.2.1 User Creation

    A unique Identifier (User ID) should be created for every individual who is given access to

    the HPC facilities at IASRI. The ID creation naming convention would be decided by the

    system administrator. The system administrator will create a new user-id and provide the

    access rights as recommended by the CABin head.

    3.2.2 Procedure to get a new account:

    1. Open the registration/login form available on the webapp.cabgrid.res.in/biocomp portal.

    2. Fill in the form online.

    3. The credentials will be verified.

    4. After verification, the approval of the Centre Head is obtained.

    5. The request is submitted to the HPC system administrator, who creates the account.

    6. An email is sent to the concerned user.

    3.2.3 Modification of user rights:

    Any request to modify a user's access rights should be submitted to the CABin Head for
    approval. The approved request should then be submitted to the system administrator, who
    will make the necessary changes. If rights are often modified on a temporary basis, it is
    very important to review user rights regularly so that misuse of elevated rights can be
    avoided.

    3.2.4 Account locking

    An account will be locked after a number of wrong password attempts. An account will also
    be locked if the user is not going to use it for a long period of time.
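    The locking rule above can be sketched as follows. This is only an illustration: the policy
    does not state how many wrong attempts trigger a lock or how long a "long period" is, so
    MAX_FAILED_ATTEMPTS and MAX_INACTIVE_DAYS are hypothetical values, and the Account class and
    function names are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the policy gives no concrete numbers.
MAX_FAILED_ATTEMPTS = 5
MAX_INACTIVE_DAYS = 180


@dataclass
class Account:
    user_id: str
    failed_attempts: int = 0
    days_since_last_login: int = 0
    locked: bool = False


def record_failed_login(account: Account) -> bool:
    """Count a wrong password attempt and lock the account at the threshold."""
    account.failed_attempts += 1
    if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
        account.locked = True
    return account.locked


def lock_if_inactive(account: Account) -> bool:
    """Lock an account that has not been used for longer than the allowed window."""
    if account.days_since_last_login > MAX_INACTIVE_DAYS:
        account.locked = True
    return account.locked


def unlock(account: Account) -> None:
    """Unlock on request to the system administrator and reset the counter."""
    account.locked = False
    account.failed_attempts = 0
```

    In this sketch an unlock also resets the failed-attempt counter, so a user starts with a
    clean slate after the administrator acts on the request.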

    3.2.5 Unlocking or resetting of password

    If an account is locked or a user has forgotten his/her password, he/she needs to send a
    request by email to the system administrator, who will then unlock the account or reset
    the password.

    3.2.6 User deletion

    Information regarding a user deletion should be sent to the CABin Head, who will notify the
    system administrator to disable or delete the user-id. This is required when the user has
    resigned, been suspended or terminated from service, or has left the institute for any
    other reason.

    On receipt of the notification, the system administrator will carry out the requested
    action within the specified period of time. User-ids of retired personnel must be deleted
    within 15 days, unless the CABin Head explicitly advises otherwise. It is better to retain
    deleted users' data in archive storage for a certain period of time.

    3.3 Application Access and Installation

    Users can access all the available applications through the web portal. Users are not
    supposed to install any application in their home directory without prior permission from
    the system administrator / CABin Head. If a required application is not installed, the user
    may send a request to the system administrator. The application will be installed if the
    CABin Head approves the request.

    4 PHYSICAL ACCESS TO DATA CENTRE FACILITY

    1. If a person wants physical access to the data center, he would need to submit a written

    request to system administrator/ CABin head for permission.

    2. Once the permission is granted, the user needs to contact the system administrator

    who will then create an account in the biometric software.

    3. After creation of the account, the user can get entry in the data center through finger

    scan device.

    5 ADMINISTRATOR RIGHTS

    Administrator logins and privileged access rights allow users to override HPC system
    controls. Users must not be allowed to work with administrator credentials or with
    privileged rights. Where this is unavoidable, the work must be done in the presence of the
    system administrator.

    5.1 User ID Reviews:

    Access requests must be renewed annually to maintain approved access. Access permissions
    should be reviewed periodically. Users shall notify the system administrator immediately if
    access is no longer required owing to an employee's termination or a change in
    responsibilities.

    6 STORAGE ALLOCATION

    Each individual user/researcher is assigned a standard storage allocation, or quota, on
    each type of storage, namely /home, /scratch and /archive. Users may use home storage
    subject to a soft limit, hard limit and grace period, and scratch storage subject to a
    fixed space and a fixed time period. The table below gives a general view of the types of
    storage that will be provided to users.

    6.1 Storage Summary:

    Table 2 shows the different types of storage with their purpose, total size, backup status
    and file system used.

    Table 2: Storage Summary

    Storage    Purpose                                Size      Back up             File System
    /home      Space where users have their home      250 TB    Yes, but for        NFS
               directories; users can keep their                particular user
               files as long as they want but                   account
               must be kept under soft limit.
    /scratch   Computational work space               200 TB    No                  PFS
    /archive   Long-term storage                      200 TB    No                  NFS

    Important: Of all the space, only /scratch should be used for computational purposes.

    6.2 Type of users:

    The users are grouped based on the resources and application usage. There are three types of

    users in the HPC system.

    1. Registered User (RU)

    2. Centre Normal User (CNU)

    3. Centre Main User (CMU)

    A Registered User (RU) profile is created through the web-based registration process and is
    available to any user whose request is approved by the Centre Head. The Centre Normal User
    (CNU) category is for users who may use a fair amount of cluster resources. Centre Main
    Users (CMU) are those who will frequently use a large amount of cluster resources.

    There will initially be approximately 1200 users across all categories: 100 in CMU, 100 in
    CNU and 1000 in RU.

    6.3 Storage quota

    The storage quota is assigned based on the types of users. There are three types of storage

    namely home, scratch and archive. The storage is allocated based on soft limit, hard limit and

    grace period.

    Soft Limit: Soft limit in quota is defined as the limit which can be exceeded for a particular

    time i.e. grace period.

    Hard Limit: Hard limit in quota is defined as the limit which cannot be exceeded.

    Grace Period: Time period for which a user can keep its space usage above the soft limit.
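    The interplay of the three terms can be illustrated with a small check. This is a sketch
    under assumptions, not the quota mechanism actually deployed: the function name and
    parameters are hypothetical, and the behaviour after the grace period expires (the soft
    limit is then enforced like a hard limit) follows conventional disk-quota systems rather
    than anything stated in this policy.

```python
def write_allowed(usage_gb, write_gb, soft_gb, hard_gb, days_over_soft, grace_days):
    """Return True if a write of write_gb may proceed under soft/hard/grace rules.

    days_over_soft is how long the user has already been above the soft limit.
    """
    new_usage = usage_gb + write_gb
    # The hard limit can never be exceeded.
    if new_usage > hard_gb:
        return False
    # Above the soft limit, writes are tolerated only within the grace period
    # (assumed behaviour, as in conventional quota systems).
    if new_usage > soft_gb and days_over_soft > grace_days:
        return False
    return True
```

    For example, with a 20G soft limit, 25G hard limit and 60-day grace period, a user at 22G
    may still write 2G on day 10, but not on day 61, and can never go past 25G.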

    6.3.1 Limits for home storage 250T:

    Main purpose of home storage is keep home directory of users where users will keep their

    files as long as they want. Table 2 shows the assigned soft limit, hard limit and grace period

    to each category with total number of users in each category.

    Table 3: Limits for Home Storage

    USER TYPE                              SOFT LIMIT   HARD LIMIT   GRACE PERIOD   Total No. of users
    Registered User                        20G          25G          60 Days        1000
    CNU                                    500G         600G         30 Days        100
    CMU                                    750G         900G         40 Days        100
    Maximum home space used by all users   60T          175T         -              -

    Above mentioned home storage limit can be varied according to number of users in future.
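    As a quick consistency check, the aggregate hard-limit figure in the last row of the home
    storage table can be recomputed from the per-user limits and user counts. The sketch below
    assumes decimal units (1T = 1000G); the dictionary layout and function name are
    illustrative only.

```python
# Per-user home limits (in G) and user counts, copied from the home storage table.
HOME_LIMITS = {
    "RU":  {"soft_gb": 20,  "hard_gb": 25,  "users": 1000},
    "CNU": {"soft_gb": 500, "hard_gb": 600, "users": 100},
    "CMU": {"soft_gb": 750, "hard_gb": 900, "users": 100},
}


def total_tb(field):
    """Sum limit * users over all categories, assuming 1T = 1000G."""
    total_gb = sum(row[field] * row["users"] for row in HOME_LIMITS.values())
    return total_gb / 1000


max_home_hard_tb = total_tb("hard_gb")  # 25T + 60T + 90T
```

    Under this assumption the hard-limit total comes to 175T, matching the table, and it stays
    within the 250T of home storage.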

    6.3.2 Limits for scratch (Parallel) storage 200T:

    Scratch storage is used here to keep the data which is either required as input or produced as

    an output during the execution of parallel application. Table 3 shows the assigned soft limit,

    hard limit and grace period to each category with total number of users in each category.

    Table 4: Limits for Scratch Storage

    USER TYPE                                 HARD LIMIT   TIME PERIOD   Total no. of users
    Registered User                           25G          25 Days       1000
    CNU                                       600G         40 Days       100
    CMU                                       900G         50 Days       100
    Maximum scratch space used by all users   175T         -             -

    Above mentioned scratch storage limits can be varied according to the number of users in

    future. If a user wants more space (greater than hard limit), he/she can make a request to

    exceed the hard limit for some time.

    6.3.3 /scratch Policy

    The /scratch storage system is a shared resource that needs to run as efficiently as
    possible for the benefit of all users. There is no system backup for data in /scratch; it
    is the user's responsibility to back up their data frequently.

    - All files older than the time period allowed to a particular user will be removed
      regularly as part of the cleaning process.

    - Users are strongly advised to clean their data in /scratch regularly, reducing /scratch
      usage by backing up the files they need to retain either on /archive or elsewhere.

    - The administrator reserves the right to clean up files on /scratch at any time if this
      is needed to improve the performance of the system.

    Some precautions for the users:

    - Do not put important source code, scripts, libraries or executables in /scratch. These
      important files should be stored in /home.

    - Do not create soft links in /home to folders in /scratch as a means of /scratch access.
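    The age-based cleaning rule can be sketched as a scan that lists the files past their
    allowed time period. This is a minimal illustration, not the cleaning job actually used: it
    assumes file age is measured by modification time (the policy does not say which timestamp
    applies), and the function and constant names are hypothetical. The day counts are those of
    the scratch storage table.

```python
import os
import time

# Allowed time period on /scratch per user type, in days (from the scratch table).
SCRATCH_RETENTION_DAYS = {"RU": 25, "CNU": 40, "CMU": 50}


def expired_files(root, user_type, now=None):
    """List files under root whose modification time is older than the allowed period.

    Assumption: age is judged by st_mtime. Nothing is deleted here; the caller
    decides what to do with the returned paths.
    """
    now = time.time() if now is None else now
    cutoff = now - SCRATCH_RETENTION_DAYS[user_type] * 86400
    expired = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime < cutoff:
                expired.append(path)
    return sorted(expired)
```

    Returning the list instead of deleting in place lets a user preview what would be removed
    and copy anything still needed to /archive or elsewhere first.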

    6.3.4 Limits for archive storage 200T:

    Archive storage is a low cost storage which is used to store data for longer period of time and

    it can only be used by CMU. 40% i.e. .8TB will be kept of database and application

    archiving. Table 4 shows the assigned limit of archive storage to users.

    Table 5: Limits for Archive Storage

    USER TYPE         HARD LIMIT   TIME PERIOD
    Registered User   0            0
    CNU               0            0
    CMU               1.2T         None

    Above mentioned archive storage limit can be varied according to number of users in future.
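    The archive split can be checked with simple arithmetic. The sketch below assumes decimal
    units (1T = 1000G) and takes the figure of 100 CMU users from the section on types of
    users; the variable names are illustrative.

```python
ARCHIVE_TOTAL_GB = 200_000      # 200T of archive storage
CMU_USERS = 100                 # initial number of CMU users
CMU_ARCHIVE_LIMIT_GB = 1_200    # 1.2T hard limit per CMU user

user_share_gb = CMU_USERS * CMU_ARCHIVE_LIMIT_GB        # space usable by CMU users
reserved_gb = ARCHIVE_TOTAL_GB - user_share_gb          # left for database/application archiving
user_fraction = user_share_gb / ARCHIVE_TOTAL_GB
reserved_fraction = reserved_gb / ARCHIVE_TOTAL_GB
```

    With these inputs the CMU allocation accounts for 60% of the archive, leaving 40% for
    database and application archiving.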

    6.4 Automatic File Deletion Policy

    The table below describes the policy concerning the automatic deletion of files from home,

    scratch and archive storage.

    Table 6: Automatic File Deletion Policy

    Space      Automatic File Deletion Policy
    /home      None
    /archive   None
    /scratch   Files will be deleted after the expiry of the allowed time period. Files may
               be deleted as needed without warning if required for system productivity.
    ALL        All /home and /archive files associated with expired accounts will be
               automatically deleted after 90 days. The /scratch files will automatically
               be deleted according to the time period assigned to each user.

    7 CLUSTER RESOURCE ALLOCATION

    Resource allocation is an important process to ensure efficient and fair use of the
    cluster. This section describes the different types of cluster available and the resources
    allocated to each user type on each cluster.

    7.1 Types of cluster:

    1. Linux cluster having 256 nodes

    2. Windows cluster having 16 nodes

    3. Linux based GPU cluster having 16 nodes

    4. Linux cluster at each of the five domain centres

    5. SMP

    7.2 Available Cluster Resources:

    The following table gives the cores and RAM of each cluster and of its individual nodes.
    The Cores columns give the number of cores per node and in the whole cluster; the Memory
    column gives the amount of memory per node.

    Table 7: Available Cluster Resources

    Cluster Type                        Cores per node   Total cores    Memory per node
    256 Nodes Linux Cluster             12               12*256=3072    96G
    16 Nodes Windows Cluster            12               12*16=192      96G
    16 Node GPU based Linux Cluster*    12               12*16=192      96G
    SMP                                 64               64             1.5T

    *Each GPU node contains two GPU cards and the memory of each GPU card is 6 GB.
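    The totals in the table follow directly from the node counts. A short sketch of that
    arithmetic is given below; the dictionary and function names are illustrative, and the
    aggregate memory figure is derived here rather than stated in the table. The SMP system is
    left out because it is a single 64-core machine.

```python
# Node count, cores per node and memory per node (in G) for the node-based clusters.
CLUSTERS = {
    "linux":   {"nodes": 256, "cores_per_node": 12, "mem_per_node_gb": 96},
    "windows": {"nodes": 16,  "cores_per_node": 12, "mem_per_node_gb": 96},
    "gpu":     {"nodes": 16,  "cores_per_node": 12, "mem_per_node_gb": 96},
}


def total_cores(name):
    """Cores per node multiplied by the number of nodes."""
    c = CLUSTERS[name]
    return c["nodes"] * c["cores_per_node"]


def total_memory_gb(name):
    """Aggregate RAM across all nodes (derived, not listed in the table)."""
    c = CLUSTERS[name]
    return c["nodes"] * c["mem_per_node_gb"]
```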

    Allocation of Resources:

    The following table shows the allocation of cores to the CMU (Centre Main User), CNU
    (Centre Normal User) and RU (Registered User) categories from the available cores.

    Table 8: Allocation of Resources

    Cluster Type                       Max Cores/User       Total Cores Assigned     Total Cores
                                                            to User Category         Under Use
                                       CMU    CNU    RU     CMU     CNU    RU
    256 Nodes Linux Cluster            40     20     4      1600    800    400       2800
    16 Nodes Windows Cluster           8      4      NA     176     80     NA        256
    16 Node GPU based Linux Cluster*   8      4      NA     176     80     NA        256
    SMP                                NL     NL     NA     NL      NL     NA        64

    *32 GB RAM is set as a limit to the registered user

    *NA stands for Not Allowed and NL stands for No Limit
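    The per-user core caps can be read as a simple admission rule for a job request. The sketch
    below is illustrative only: the function name and the way NA and NL are encoded are
    assumptions, and a real scheduler would enforce these limits through its own
    configuration.

```python
NA = "NA"  # Not Allowed
NL = "NL"  # No Limit

# Maximum cores per user, by cluster and user category (from the allocation table).
MAX_CORES_PER_USER = {
    "linux":   {"CMU": 40, "CNU": 20, "RU": 4},
    "windows": {"CMU": 8,  "CNU": 4,  "RU": NA},
    "gpu":     {"CMU": 8,  "CNU": 4,  "RU": NA},
    "smp":     {"CMU": NL, "CNU": NL, "RU": NA},
}


def request_allowed(cluster, user_type, cores):
    """Return True if a user of this type may request this many cores on the cluster."""
    limit = MAX_CORES_PER_USER[cluster][user_type]
    if limit == NA:
        return False
    if limit == NL:
        return cores > 0
    return 0 < cores <= limit
```

    For instance, a Registered User may request up to 4 cores on the Linux cluster but is
    refused on the Windows, GPU and SMP systems.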

    8 BACKUP AND RESTORE

    The importance of data and the unprecedented growth in data volumes have necessitated an
    efficient approach to data backup and recovery. This section provides details of data
    backup and retrieval operations.

    The purpose of the backup and restore policy is as follows:

    - To safeguard the information assets of IASRI.

    - To prevent the loss of data in the case of accidental deletion or corruption of data,
      system failure, or disaster.

    - To permit timely restoration of information if an unwanted event occurs.

    - To manage and secure the backup and restoration processes and the media employed in
      them.

    8.1 Procedure

    The Archive storage currently deployed for backup has 200 TB of disk-based storage. 60% of

    storage is reserved for user data and rest is for database, application data and other data.

    The backup software used to control the backup processes is HP ibrix. The Systems Support

    team ensures that all backups are completed successfully and reviews the backup process

    daily. Logs are maintained to verify the amount of data backed up and the unsuccessful

    backup occurrences.

    8.2 Backup Content

    The primary data to be backed up are: data files of Centre Main Users, database files,
    applications installed on the cluster and common application data required by users. Data
    to be backed up will be listed by location and specified data sources.

    8.3 Backup Approaches

    1. Data accessed 24x7 should be backed up with full backups most of the time, as restoring
    from a full backup takes less time to bring the data back online. It is important to
    decide how many days to keep each full backup copy, as it will consume a lot of backup
    storage space. If backup space is constrained, a cycle of one full backup followed by
    several daily differential backups should be carried out.

    2. User data backup should be carried out as differential or incremental backup.
    Differential backup should be considered first if there is enough space available;
    otherwise use incremental backup.

    3. Installed applications and the data used by these applications do not change very often,
    so incremental backup is the best option.

    8.4 Database Back-up

    1. The database backup process needs the online database to be offline or locked, so that
    the backup copy contains a consistent state of the database.

    2. If taking the database offline is not an option, then specialized software should be
    used to carry out periodic backups.

    3. Time interval for backup

    8.5 Data Backup policy:

    1. Full backups are performed weekly. Full backups are retained for 3 months before being
    overwritten.

    2. Incremental backups are performed daily. Incremental backups are retained for 1 month
    before being overwritten.

    3. Backups are carried out overnight.

    4. Once the backup process is finished, the backup copy will be copied to a remote site for
    disaster recovery.

    5. Backups are stored securely and only authorized persons have access to them.

    6. The IT department monitors backup operations, and the status of backup jobs is checked
    daily during the working week.

    7. A failed backup will be re-run the next day.
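    The schedule and retention rules above can be sketched as follows. This is an illustration
    under assumptions, not the configuration of the backup software: the policy says only that
    full backups are weekly, so the choice of Sunday is hypothetical, and "3 months" and
    "1 month" are approximated as 90 and 30 days. All names are invented for the example.

```python
from datetime import date, timedelta

FULL_BACKUP_WEEKDAY = 6                      # hypothetical: Sunday (Monday is 0)
FULL_RETENTION = timedelta(days=90)          # "3 months", approximated
INCREMENTAL_RETENTION = timedelta(days=30)   # "1 month", approximated


def backup_type(day: date) -> str:
    """Weekly full backup on the chosen weekday, incremental on all other days."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def expires_on(backup_day: date) -> date:
    """Date from which the backup taken on backup_day may be overwritten."""
    if backup_type(backup_day) == "full":
        return backup_day + FULL_RETENTION
    return backup_day + INCREMENTAL_RETENTION


def restorable(backup_day: date, today: date) -> bool:
    """A backup can be restored only while it is inside its retention period."""
    return backup_day <= today < expires_on(backup_day)
```

    Under these assumptions, an incremental backup taken on a Monday can be restored for the
    following 30 days, while the preceding Sunday's full backup remains available for 90 days.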

    8.6 Restore Procedure

    1. Data will be available for restoration once the ongoing backup is complete, or
    immediately if the required backup copy already exists (i.e. an older backup).

    2. Backup data will only be available for restoration during the retention period.

    3. Requests for data restoration/recovery must be sent to the backup/IT administrator or
    the Head of the IASRI datacenter.

    8.7 Deleted User Account

    If a user account is deleted, its backup will be kept for a limited period of time. Users
    must be made aware, however, that IASRI is not responsible for the data once the account is
    deleted, since the user's data might be deleted if backup space runs short.

    8.8 Best Practices for User

    1. Always keep a backup of your data on your personal device.

    2. For better use of backup storage, please remove backed-up data that you will never need
    in the future.

    3. Do not allow multimedia files that are unrelated to your HPC work to be backed up.