
MapReduce Service

User Guide

Issue 01

Date 2017-02-20


Contents

1 Overview......1
1.1 Introduction......1
1.2 Application Scenarios......1
1.3 Functions......2
1.3.1 Cluster Management Function......2
1.3.2 Hadoop......3
1.3.3 Spark......3
1.3.4 Spark SQL......4
1.3.5 HBase......4
1.3.6 Hive......5
1.4 Relationships with Other Services......6
1.5 Required Permission for Using MRS......7
1.6 Limitations......8

2 MRS Quick Start......10
2.1 Introduction to the Operation Process......10
2.2 Quick Start......11
2.2.1 Creating a Cluster......11
2.2.2 Using Hadoop from Scratch......12
2.2.3 Using Spark from Scratch......17
2.2.4 Using Spark SQL from Scratch......22
2.2.5 Using HBase from Scratch......24

3 Cluster Operation Guide......28
3.1 Overview......28
3.2 Cluster List......29
3.3 Creating a Cluster......31
3.4 Viewing Cluster Information......37
3.4.1 Viewing Basic Information About a Cluster......37
3.4.2 Viewing Patch Information About a Cluster......39
3.4.3 Entering the Cluster Management Page......39
3.5 Expanding a Cluster......40
3.6 Terminating a Cluster......40
3.7 Deleting a Failed Task......41


3.8 Managing Data Files......41
3.9 Managing Jobs......44
3.9.1 Introduction to Jobs......44
3.9.2 Adding a Jar or Script Job......47
3.9.3 Submitting a Spark SQL Statement......49
3.9.4 Viewing Job Configurations and Logs......50
3.9.5 Stopping Jobs......51
3.9.6 Replicating Jobs......51
3.9.7 Deleting Jobs......53
3.10 Querying Operation Logs......54
3.11 Viewing the Alarm List......55

4 Remote Operation Guide......57
4.1 Overview......57
4.2 Logging In to a Master Node......58
4.2.1 Logging In to an ECS Using VNC......58
4.2.2 Logging In to a Linux ECS Using a Key Pair (SSH)......59
4.2.3 Logging In to a Linux ECS Using a Password (SSH)......59
4.3 Viewing Active and Standby Nodes......59
4.4 Client Management......60
4.4.1 Updating the Client......60
4.4.2 Using the Client on a Cluster Node......61
4.4.3 Using the Client on Another Node of a VPC......62

5 MRS Manager Operation Guide......66
5.1 MRS Manager Introduction......66
5.2 Accessing MRS Manager......68
5.3 Viewing Running Tasks in a Cluster......68
5.4 Monitoring Management......68
5.4.1 Viewing the System Overview......68
5.4.2 Configuring a Monitoring History Report......69
5.4.3 Managing Service and Host Monitoring......70
5.4.4 Managing Resource Distribution......75
5.4.5 Configuring Monitoring Indicator Dumping......76
5.5 Alarm Management......77
5.5.1 Viewing and Manually Clearing an Alarm......77
5.5.2 Configuring an Alarm Threshold......78
5.5.3 Configuring Syslog Northbound Interface......80
5.5.4 Configuring SNMP Northbound Interface......82
5.6 Alarm Reference......83
5.6.1 ALM-12001 Audit Log Dump Failure......84
5.6.2 ALM-12002 HA Resource Is Abnormal......85
5.6.3 ALM-12004 OLdap Resource Is Abnormal......87
5.6.4 ALM-12005 OKerberos Resource Is Abnormal......89


5.6.5 ALM-12006 Node Fault......90
5.6.6 ALM-12007 Process Fault......92
5.6.7 ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes......94
5.6.8 ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes......95
5.6.9 ALM-12012 NTP Service Is Abnormal......97
5.6.10 ALM-12016 CPU Usage Exceeds the Threshold......99
5.6.11 ALM-12017 Insufficient Disk Capacity......101
5.6.12 ALM-12018 Memory Usage Exceeds the Threshold......103
5.6.13 ALM-12027 Host PID Usage Exceeds the Threshold......105
5.6.14 ALM-12028 Number of Processes in the D State on the Host Exceeds the Threshold......106
5.6.15 ALM-12031 User omm or Password Is About to Expire......108
5.6.16 ALM-12032 User ommdba or Password Is About to Expire......110
5.6.17 ALM-12033 Slow Disk Fault......111
5.6.18 ALM-12034 Periodic Backup Failure......112
5.6.19 ALM-12035 Unknown Data Status After Recovery Task Failure......114
5.6.20 ALM-12037 NTP Server Is Abnormal......115
5.6.21 ALM-12038 Monitoring Indicator Dump Failure......117
5.6.22 ALM-12039 GaussDB Data Is Not Synchronized......119
5.6.23 ALM-12040 Insufficient System Entropy......121
5.6.24 ALM-13000 ZooKeeper Service Unavailable......123
5.6.25 ALM-13001 Available ZooKeeper Connections Are Insufficient......126
5.6.26 ALM-13002 ZooKeeper Memory Usage Exceeds the Threshold......128
5.6.27 ALM-14000 HDFS Service Unavailable......130
5.6.28 ALM-14001 HDFS Disk Usage Exceeds the Threshold......132
5.6.29 ALM-14002 DataNode Disk Usage Exceeds the Threshold......134
5.6.30 ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold......136
5.6.31 ALM-14004 Number of Damaged HDFS Blocks Exceeds the Threshold......137
5.6.32 ALM-14006 Number of HDFS Files Exceeds the Threshold......139
5.6.33 ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold......140
5.6.34 ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold......142
5.6.35 ALM-14009 Number of Faulty DataNodes Exceeds the Threshold......143
5.6.36 ALM-14010 NameService Service Is Abnormal......146
5.6.37 ALM-14011 HDFS DataNode Data Directory Is Not Configured Properly......148
5.6.38 ALM-14012 HDFS JournalNode Data Is Not Synchronized......151
5.6.39 ALM-16000 Percentage of Sessions Connected to the HiveServer to the Maximum Number Allowed Exceeds the Threshold......153
5.6.40 ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold......155
5.6.41 ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold......157
5.6.42 ALM-16004 Hive Service Unavailable......159
5.6.43 ALM-18000 Yarn Service Unavailable......163
5.6.44 ALM-18002 NodeManager Heartbeat Lost......165
5.6.45 ALM-18003 NodeManager Unhealthy......166
5.6.46 ALM-18006 MapReduce Job Execution Timeout......167


5.6.47 ALM-19000 HBase Service Unavailable......169
5.6.48 ALM-19006 HBase Replication Sync Failed......171
5.6.49 ALM-25000 LdapServer Service Unavailable......173
5.6.50 ALM-25004 Abnormal LdapServer Data Synchronization......175
5.6.51 ALM-25500 KrbServer Service Unavailable......178
5.6.52 ALM-27001 DBService Unavailable......179
5.6.53 ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes......182
5.6.54 ALM-27004 Data Inconsistency Between Active and Standby DBServices......183
5.6.55 ALM-28001 Spark Service Unavailable......186
5.7 Object Management......187
5.7.1 Introduction......187
5.7.2 Querying Configurations......188
5.7.3 Managing Services......189
5.7.4 Configuring Service Parameters......189
5.7.5 Configuring Customized Service Parameters......191
5.7.6 Synchronizing Service Configurations......192
5.7.7 Managing Role Instances......193
5.7.8 Configuring Role Instance Parameters......193
5.7.9 Synchronizing Role Instance Configuration......194
5.7.10 Decommissioning and Recommissioning Role Instances......195
5.7.11 Managing a Host......195
5.7.12 Isolating a Host......196
5.7.13 Canceling Isolation of a Host......196
5.7.14 Starting and Stopping a Cluster......197
5.7.15 Synchronizing Cluster Configurations......197
5.7.16 Exporting Configuration Data of a Cluster......198
5.8 Log Management......198
5.8.1 Viewing and Exporting Audit Logs......198
5.8.2 Exporting Services Logs......199
5.8.3 Configuring Audit Log Dumping Parameters......200
5.9 Health Check Management......202
5.9.1 Performing a Health Check......202
5.9.2 Viewing and Exporting a Check Report......203
5.9.3 Configuring the Number of Health Check Reports to Be Reserved......204
5.9.4 Managing Health Check Reports......205
5.10 Static Service Pool Management......205
5.10.1 Viewing the Status of a Static Service Pool......205
5.10.2 Configuring a Static Service Pool......206
5.11 Tenant Management......209
5.11.1 Introduction......209
5.11.2 Creating a Tenant......210
5.11.3 Creating a Sub-tenant......213


5.11.4 Deleting a Tenant......215
5.11.5 Managing a Tenant Directory......216
5.11.6 Recovering Tenant Data......217
5.11.7 Creating a Resource Pool......218
5.11.8 Modifying a Resource Pool......219
5.11.9 Deleting a Resource Pool......219
5.11.10 Configuring a Queue......220
5.11.11 Configuring the Queue Capacity Policy of a Resource Pool......221
5.11.12 Clearing the Configuration of a Queue......222
5.12 Backup and Restoration......222
5.12.1 Introduction......222
5.12.2 Enabling Cross-Cluster Replication......226
5.12.3 Backing Up Metadata......227
5.12.4 Backing Up Service Data......229
5.12.5 Recovering Metadata......232
5.12.6 Recovering Service Data......235
5.12.7 Managing Local Quick Recovery Tasks......237
5.12.8 Modifying a Backup Task......238
5.12.9 Viewing Backup and Recovery Tasks......239
5.13 Security Management......240
5.13.1 List of Default Users......240
5.13.2 Changing the Password for User admin......244
5.13.3 Changing the Password for the Kerberos Administrator......245
5.13.4 Changing the Password for the OMS Kerberos Administrator......245
5.13.5 Changing the Password for the LDAP (including OMS LDAP) Administrator......246
5.13.6 Changing the Password for a Component Running User......247
5.13.7 Changing the Password for the OMS Database Administrator......248
5.13.8 Changing the Password for the Data Access User of the OMS Database......249
5.13.9 Changing the Password for a Component Database User......249
5.13.10 Replacing the CA Certificate......250
5.13.11 Replacing HA Certificates......252
5.13.12 Updating a Key for a Cluster......253

6 FAQs.....255
6.1 What Is MRS?.....255
6.2 What Are the Highlights of MRS?.....255
6.3 What Is MRS Used For?.....256
6.4 How Do I Use MRS?.....256
6.5 How Do I Ensure Data and Service Running Security?.....257
6.6 How Do I Prepare a Data Source for MRS?.....257
6.7 What Is the Difference Between Data in OBS and That in HDFS?.....258
6.8 How Do I View All Clusters?.....259
6.9 How Do I View Log Information?.....259

MapReduce Service User Guide Contents

Issue 01 (2017-02-20) vi


6.10 What Types of Jobs Are Supported by MRS?.....259
6.11 How Do I Submit Developed Programs to MRS?.....260
6.12 How Do I View Cluster Configurations?.....261
6.13 What Types of Host Specifications Are Supported by MRS?.....261
6.14 What Components Are Supported by MRS?.....263
6.15 What Is the Relationship Between Spark and Hadoop?.....263
6.16 What Types of Spark Jobs Are Supported by an MRS Cluster?.....263
6.17 Can a Spark Cluster Access Data in OBS?.....263
6.18 What Is the Relationship Between Hive and Other Components?.....263
6.19 What Types of Distributed Storage Are Supported by MRS?.....264
6.20 Can MRS Cluster Nodes Be Changed on the MRS Management Console?.....264

A Change History......................................................................................................................... 265

B Glossary...................................................................................................................................... 266


1 Overview

1.1 Introduction

MapReduce Service (MRS) is a data processing and analysis service based on a cloud computing platform. It is stable, reliable, scalable, and easy to manage. You can use MRS immediately after applying for it.

MRS provides a reliable, secure, and easy-to-use operation and maintenance (O&M) platform and offers storage and analysis services for massive data, helping enterprises meet their storage and processing demands. You can apply for a cluster independently and use the hosted Hadoop, Spark, HBase, and Hive services. After the data storage and computing tasks are complete, the cluster can be terminated.

1.2 Application Scenarios

MRS can be applied across various industries for the processing, analysis, and storage of massive data.

- Analyzing and processing massive sets of data
  Usage: analysis and processing of massive sets of data, online and offline analysis, and business intelligence
  Characteristics: processing of massive data sets, heavy computing workloads, long-term analysis, and data analysis and processing on a large number of computers
  Application scenarios: log analysis, online and offline analysis, simulation calculations in scientific research, biometric analysis, and spatial-temporal data analysis

- Storing massive sets of data
  Usage: storage and retrieval of massive sets of data, and data warehousing
  Characteristics: storage, retrieval, backup, and disaster recovery of massive sets of data with zero data loss
  Application scenarios: log storage, file storage, simulation data storage in scientific research, biological characteristic information storage, genetic engineering data storage, and spatial-temporal data storage


1.3 Functions

MRS, capable of processing and storing massive sets of data, supports the following features:

- Enhanced open-source Hadoop software
- Spark in-memory computing engine
- HBase distributed storage database
- Hive data warehouse

It also supports cluster management. To meet service requirements, specify the node quantity and data disk space when applying for MRS. You can then focus solely on data analysis.

1.3.1 Cluster Management Function

This section describes the Web interface functions of MRS clusters.

MRS provides a Web interface, the functions of which are described as follows:

- Creating a cluster:
  Users can create a cluster on MRS. The application scenarios of a cluster are as follows:
  - Data storage and computing are performed separately. Data is stored in the Object Storage Service (OBS), which features low cost and unlimited storage capacity, and clusters can be terminated at any time. The computing performance is determined by the OBS access performance and is lower than that of the Hadoop Distributed File System (HDFS). OBS is recommended when data computing is infrequent.
  - Data storage and computing are performed together. Data is stored in HDFS, which features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. HDFS is recommended when data computing is frequent.
- Expanding clusters:
  To expand clusters and handle peak service loads, add core nodes.
- Managing clusters:
  After completing data processing and analysis, you can manage and terminate clusters.
  - Querying alarms:
    If either the system or a cluster is faulty, Elastic BigData collects fault information and reports it to the network management system so that maintenance personnel can locate the faults.
  - Querying logs:
    Operation information is recorded to help locate faults in the case of faulty clusters.
  - Managing files:
    MRS supports importing data from the OBS system to HDFS and also exporting data that has already been processed and analyzed. You can store data in HDFS.
- Adding a job:
  A job is an executable program provided by MRS to process and analyze user data. Currently, MRS supports MapReduce (MR) jobs, Spark jobs, and Hive jobs, and allows users to submit Spark SQL statements online to query and analyze data.


- Managing jobs:
  Jobs can be managed, stopped, or deleted. You can also view the details and detailed configurations of completed jobs. Spark SQL jobs, however, cannot be stopped.
- Providing management interfaces:
  MRS Manager functions as a unified management platform for MRS clusters.
  - Cluster monitoring enables you to quickly see the health status of hosts and services.
  - Graphical indicator monitoring and customization enable you to quickly obtain key information about the system.
  - Service property configurations help meet service performance requirements.
  - Cluster, service, and role instance operations enable you to start or stop services and clusters in one-click mode.

1.3.2 Hadoop

MRS deploys and hosts Apache Hadoop clusters in the cloud to provide services featuring high availability and enhanced reliability for big data processing and analysis.

Hadoop is a distributed system architecture that consists of HDFS, MapReduce, and Yarn. The following describes the functions of each component:

- HDFS:
  HDFS provides high-throughput data access and is applicable to the processing of large data sets. MRS cluster data is stored in HDFS.
- MapReduce:
  As a programming model that simplifies parallel computing, MapReduce gets its name from its two key operations: Map and Reduce. Map divides one task into multiple tasks, and Reduce summarizes their processing results and produces the final analysis result. MRS clusters allow users to submit self-developed MapReduce programs, execute the programs, and obtain the results.
- Yarn:
  As the resource management system of Hadoop, Yarn manages and schedules resources for applications. MRS uses Yarn to schedule and manage cluster resources.

For details about the Hadoop architecture and principles, see http://hadoop.apache.org/docs/stable/index.html.
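The Map and Reduce operations described above can be sketched in a few lines of plain Python. This is an illustrative model of the wordcount flow only, not Hadoop or MRS API code, and the function names are our own:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: split each input line and emit a (word, 1) pair per word.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the Map and Reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: summarize the grouped values into the final result.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["to be or not to be"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

On a real MRS cluster, the same logic runs distributed: Map and Reduce tasks execute on many nodes, Yarn schedules them, and HDFS holds the input and output.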

1.3.3 Spark

Spark is a distributed and parallel data processing framework. MRS deploys and hosts Apache Spark clusters in the cloud.

Spark is a fault-tolerant, in-memory distributed computing framework that ensures data can be quickly restored and recalculated. It is more efficient than MapReduce for iterative data computing.

In the Hadoop ecosystem, Spark and Hadoop are seamlessly interconnected. By using HDFS for data storage and Yarn for resource management and scheduling, users can switch from MapReduce to Spark quickly.

Spark applies to the following scenarios:


- Data processing and ETL (extract, transform, and load)
- Machine learning
- Interactive analysis
- Iterative computing and data reuse. Users benefit more from Spark when they perform operations frequently and the volume of the required data is large.
- On-demand capacity expansion, owing to Spark's ease of use and low cost in the cloud.

For details about the Spark architecture and principles, see http://spark.apache.org/docs/1.5.1/quick-start.html.

1.3.4 Spark SQL

Spark SQL is an important component of Apache Spark and subsumes Shark. It helps engineers unfamiliar with MapReduce get started quickly. Users can enter SQL statements directly to analyze, process, and query data.

Spark SQL has the following highlights:

- Is compatible with most Hive syntax, which enables seamless switchovers.
- Is compatible with standard SQL syntax.
- Resolves data skew problems.
  Spark SQL can join and convert skewed data. It evenly distributes data that does not contain skewed keys to different tasks for processing. For data that contains skewed keys, Spark SQL broadcasts the smaller data set and uses a map-side join to evenly distribute the data to different tasks for processing. This fully utilizes CPU resources and improves performance.
- Optimizes small files.
  Spark SQL employs the coalesce operator to process small files and combines partitions generated by small files in tables. This reduces the number of hash buckets during a shuffle operation and improves performance.

For details about the Spark SQL architecture and principles, see http://spark.apache.org/docs/1.5.1/sql-programming-guide.html.
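The map-side join mentioned above can be modeled without Spark: the smaller table is broadcast to every task as an in-memory dictionary, so the large, skewed table is joined locally and never shuffled by its skewed key. A minimal Python sketch follows; the data and the function name are made up for illustration:

```python
# Small dimension table, and a large fact table in which key 1 is skewed.
small_table = [(1, "alice"), (2, "bob")]
large_table = [(1, "click"), (1, "view"), (2, "click"), (1, "click")]

# "Broadcast": ship the small side to every task as a lookup dict.
broadcast = dict(small_table)

def map_side_join(partition):
    # Each task joins its own slice of the large table locally,
    # so the skewed keys never need to be shuffled.
    return [(key, broadcast[key], event)
            for key, event in partition
            if key in broadcast]

joined = map_side_join(large_table)
print(joined[0])  # (1, 'alice', 'click')
```

Spark SQL applies this kind of broadcast join when one side of the join is small enough to be shipped to every task.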

1.3.5 HBase

HBase is a column-oriented distributed cloud storage system. It features enhanced reliability, excellent performance, and elastic scalability.

It is applicable to distributed computing and the storage of massive data. With HBase, users can filter and analyze data with ease, get responses in milliseconds, and thereby mine data rapidly.

HBase applies to the following scenarios:

- Massive data storage
  Users can use HBase to build a storage system capable of storing terabytes or petabytes of data. It also provides dynamic scaling capabilities so that users can adjust cluster resources to meet specific performance or capacity requirements.
- Real-time query


The columnar and key-value storage models apply to the ad-hoc querying of enterprise user details. The low-latency point query, based on the master key, reduces the response latency to seconds or milliseconds, facilitating real-time data analysis.

HBase has the following highlights:

- The HBase secondary index enables HBase to query data based on specific column values. HBase locates the desired data quickly, improving data retrieval efficiency.
- HBase supports multi-point division (also called dynamic division), which divides an empty region into multiple regions, thereby preventing the performance deterioration caused by insufficient space.
- HBase stores medium-sized object (MOB) data. In Apache HBase, if stored data exceeds 100 KB or even reaches 10 MB, inserting the same number of data files generates a large amount of data, causing frequent compaction and split operations that occupy massive CPU resources and decrease performance. HBase stores MOB data (between 100 KB and 10 MB) in a file system (such as HDFS) in the HFile format. Files are centrally managed using the expiredMobFileCleaner and Sweeper tools, and the addresses and sizes of the files are stored in the HBase store as values. This substantially reduces the compaction and split frequency, improving performance.

For details about the HBase architecture and principles, see http://hbase.apache.org/book.html.

1.3.6 Hive

Hive is a data warehouse framework built on Hadoop. It stores structured data using the Hive Query Language (HiveQL), a language similar to SQL.

Hive converts HiveQL statements to MapReduce or HDFS tasks to query and analyze massive data stored in Hadoop clusters. The console provides an interface for entering Hive Script and supports the online submission of HiveQL statements.

Hive supports the HDFS Colocation, column encryption, HBase deletion, row delimiter, and CSV SerDe functions, as detailed below.

HDFS Colocation

HDFS Colocation is the data location control function provided by HDFS. The HDFS Colocation interface stores associated data, or data on which associated operations are performed, on the same storage node.

Hive supports the HDFS Colocation function. When Hive tables are created and locator information is set for the table files, the data files of related tables are stored on the same storage node. This ensures convenient and efficient data computing among associated tables.

Column Encryption

Hive supports encryption of one or more columns. The columns to be encrypted and the encryption algorithm can be specified when a Hive table is created. When data is inserted into the table using the insert statement, the related columns are encrypted.

The Hive column encryption mechanism supports two encryption algorithms that can be selected to meet site requirements during table creation:

- AES (the encryption class is org.apache.hadoop.hive.serde2.AESRewriter)
- SMS4 (the encryption class is org.apache.hadoop.hive.serde2.SMS4Rewriter)


HBase Deletion

Due to the limitations of underlying storage systems, Hive does not support deleting a single piece of table data. In Hive on HBase, however, MRS Hive supports deleting a single piece of HBase table data. Using specific syntax, Hive can delete one or multiple pieces of data from an HBase table.

Row Delimiter

In most cases, a carriage return character is used as the row delimiter in Hive tables stored in text files; that is, the carriage return character is used as the terminator of a row during searches. However, some data files are delimited by special characters rather than a carriage return character.

MRS Hive allows users to use different characters or character combinations to delimit rows of Hive text data. When creating a table, set inputformat to SpecifiedDelimiterInputFormat, and set the following parameter before each search:

set hive.textinput.record.delimiter='';

The table data is then queried by the specified delimiter.
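Conceptually, a custom record delimiter only changes how the raw text is split into rows before parsing. The following Python sketch mimics that behavior; it is illustrative only (the "|!|" delimiter and the function name are made up), not the SpecifiedDelimiterInputFormat implementation:

```python
def split_records(text, delimiter):
    # Split raw text into records on a custom delimiter instead of
    # the default carriage return.
    records = text.split(delimiter)
    # Drop the empty trailing record left by a terminating delimiter.
    if records and records[-1] == "":
        records.pop()
    return records

raw = "row1|!|row2|!|row3|!|"
print(split_records(raw, "|!|"))  # ['row1', 'row2', 'row3']
```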

CSV SerDe

Comma-separated values (CSV) is a common text file format. CSV stores table data (digits and text) as text and uses a comma (,) as the text delimiter.

CSV files are universal. Many applications allow users to view and edit CSV files in Windows Office or conventional databases.

MRS Hive supports CSV files. Users can import CSV files into Hive tables, or export Hive table data as CSV files for use in other applications.
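Independently of Hive, the CSV round trip described above can be shown with Python's standard csv module; this is a generic example, not an MRS interface:

```python
import csv
import io

rows = [["id", "name"], ["1", "alice"], ["2", "bob"]]

# Write table data as CSV text, using a comma as the delimiter.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()

# Read the CSV text back into lists of fields.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed == rows)  # True
```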

1.4 Relationships with Other Services

This section describes the relationships between MRS and other services.

- Virtual Private Cloud (VPC)
  MRS clusters are created in the subnets of a VPC. VPCs provide secure, isolated, logical network environments for MRS clusters.
- Object Storage Service (OBS)
  OBS stores the following user data:
  - MRS job input data, such as user programs and data files
  - MRS job output data, such as result files and log files of jobs
- Relational Database Service (RDS)
  RDS stores MRS operation data, such as MRS cluster metadata and user accounting information.
- Elastic Cloud Server (ECS)
  Each node in an MRS cluster is an ECS.
- Identity and Access Management (IAM)
  IAM provides authentication for MRS.


1.5 Required Permission for Using MRS

This section describes the permission required for using MRS.

Permission

User operation permission varies with the user groups to which the users belong.

Permission required for creating a user and creating or modifying a user group must be set on the Identity and Access Management (IAM) console. For details, see How Do I Create User Groups and Assign Rights? and Permission Description in the IAM User Guide. For operation permission of other services such as VPC and OBS, add the permission of the service to the user group to which the user belongs.

Table 1-1 lists MRS permission.

Table 1-1 Permission list

Permission: MRS operation permission
Description: Users with this permission have full operation rights on MRS resources.
Setting: Two methods are available:
- Add the Tenant Administrator permission to the user group.
- Add the MRS Administrator, Server Administrator, and Tenant Guest permissions to the user group.

Permission: MRS query permission
Description: Users with this permission can:
- View overview information about MRS.
- Query MRS operation logs.
- Query MRS cluster lists, including existing clusters, historical clusters, and task lists.
- View cluster basic information and patch information.
- View job lists and job details.
- Query the HDFS file list and file operation records.
- Query the alarm list.
- Access the MRS Manager portal.
Setting: Add the MRS Administrator or Tenant Guest permission to the user group.
NOTE: For MRS, the Tenant Guest permission and the MRS Administrator permission are the same. The difference is that Tenant Guest also has the rights to query other cloud services, while MRS Administrator has the rights to query only MRS resources.

1.6 Limitations

Before using MRS, ensure that you have read and understood the following limitations.

- MRS clusters must be created in VPC subnets.
- Only applications that reside in the same subnet as an MRS cluster can access the MRS cluster.
- You are advised to use any of the following browsers to access MRS:
  - Google Chrome 36.0 or later
  - Mozilla Firefox 35.0 or later
  - Internet Explorer 9.0 or later
  If you use Internet Explorer 9.0, you may fail to log in to the MRS management console because user Administrator is disabled by default in some Windows systems, such as Windows 7 Ultimate. Internet Explorer automatically selects a system user for installation, so it cannot access the management console. You are advised to reinstall Internet Explorer 9.0 or later as the administrator (recommended), or alternatively run Internet Explorer 9.0 as the administrator.
- To prevent illegal access, assign access permission for security groups used by MRS only where necessary.


- Do not perform the following operations, because they will cause cluster exceptions:
  - Deleting or modifying the default security group that is created when you create an MRS cluster.
  - Deleting, closing, or restarting the cluster nodes displayed in ECS when you use MRS.
  - Deleting the processes or files that already exist on a cluster node.
  - Deleting MRS cluster nodes. Deleted nodes will still be charged.
- If a cluster exception occurs when no incorrect operations have been performed, contact technical support engineers. They will ask you for your key and then perform troubleshooting after you provide the password.
- If a cluster exception occurs, a daemon will start. If the exception is due to incorrect manual operations, the daemon may not start. If this happens, stop using the cluster and create a new one.
- Changing the password may make services unavailable. For this reason, change the password on the ECS console.
- MRS clusters are still charged during exceptions. Contact technical support engineers to handle cluster exceptions.
- You are not advised to perform the following operations on the ECS: powering off, restarting, or deleting an MRS cluster or its nodes, changing or reinstalling their OS, or modifying their specifications. Otherwise, the MRS cluster may experience an exception.


2 MRS Quick Start

2.1 Introduction to the Operation Process

MRS is easy to use and provides a user-friendly user interface (UI). By using computers connected in a cluster, you can run various tasks and process or store petabytes of data.

A typical procedure for using MRS is as follows:

1. Prepare data.
   Upload the local programs and data files to be computed to the Object Storage Service (OBS).
2. Create a cluster.
   Create a cluster before you use MRS. The cluster quantity is subject to the Elastic Cloud Server (ECS) quantity. Configure basic cluster information to complete cluster creation. You can submit a job at the same time you create a cluster.
   NOTE
   When you create a cluster, only one new job can be added. If you need to add more jobs, perform Step 4.
3. Import data.
   After an MRS cluster is successfully created, use the import function of the cluster to import OBS data to HDFS. An MRS cluster can process both OBS data and HDFS data.
4. Add a job.
   After a cluster is created, you can analyze and process data by adding jobs. Note that MRS provides a platform for executing programs developed by users. You can submit, execute, and monitor such programs using MRS. After a job is added, the job is in the Running state by default.
5. View the execution result.
   The job operation takes a while. After job running is complete, go to the Job Management page and refresh the job list to view the execution results on the Job tab page. You cannot re-execute a successful or failed job, but you can add or copy it. After setting the job parameters, you can submit the job again.
6. Terminate a cluster.


If you want to terminate a cluster after jobs are complete, click Terminate in Cluster. The cluster status changes from Running to Terminating. After the cluster is terminated, the cluster status changes to Terminated and the cluster is displayed in Historical Cluster.

2.2 Quick Start

2.2.1 Creating a Cluster

This section describes how to create a cluster using MRS.

Procedure

Step 1 Log in to the MRS management console.

Step 2 Click Create Cluster and open the Create Cluster page.

NOTE

Note the usage of quotas when you create a cluster. If the resource quotas are insufficient, apply for new quotas based on the prompted information and create new clusters.

The following is a cluster configuration example:

- Cluster Name: This parameter can be set to the default system name. For ease of distinguishing and memorizing, it is recommended that the cluster name be set to a value consisting of the employee ID, the short spelling of the user's name, or the date, for example, mrs_20160907.
- AZ: Use the default value. If a cluster already exists in the region, you are advised to use a different region to create a cluster.
- VPC: Use the default value. If no virtual private cloud (VPC) exists, click View VPC to enter VPC and create a VPC.
- Subnet: Use the default value. If no subnet is created in the VPC, click Create Subnet to create a subnet in the corresponding VPC.
- Instance Specifications: Select MRS_16U_32G for both the Master and Core nodes.
- Quantity: Retain the default number 2 for the Master nodes and set the number of Core nodes to 3.
- Data Disk: Indicates the Core node data disk storage space. Select Common I/O. The size is 100 GB.
- Key Pair: Select the key pair from the drop-down list. If you have obtained the private key file, select I acknowledge that I have obtained private key file SSHkey-bba1.pem and that without this file I will not be able to log in to my ECS. If no key pair is created, click View Key Pair and create or import a key pair. Then obtain the private key file.
- Version: Select MRS 1.2.
- Component: Select the Spark, HBase, and Hive components.
- Create Job: Do not add a job here, and do not select Terminate the cluster after jobs are completed.

Step 3 Click Create Now.


Step 4 Confirm cluster specifications and click Submit.

Cluster creation takes a while. The initial state of the created cluster is Starting. After the cluster is created successfully, the status is updated to Running. Please be patient.

----End

2.2.2 Using Hadoop from Scratch

This section describes how to use Hadoop to submit a wordcount job. Wordcount, a typical Hadoop job, is used to count the words in texts.

Procedure

Step 1 Prepare the wordcount program.

The open-source Hadoop example program contains the wordcount program. You can download the Hadoop example program at http://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.2/.

Download hadoop-2.7.2.tar.gz, decompress it, and obtain hadoop-mapreduce-examples-2.7.2.jar from the hadoop-2.7.2\share\hadoop\mapreduce directory. The hadoop-mapreduce-examples-2.7.2.jar example program contains the wordcount program.

Step 2 Prepare data files.

There is no format requirement for data files. Prepare one or multiple TXT files. The following is an example of a TXT file:

qwsdfhoedfrffrofhuncckgktpmhutopmmajjpsffjfjorgjgtyiuyjmhombmbogohoyhmjhheyeombdhuaqqiquyebchdhmamdhdemmjdoeyhjwedcrfvtgbmojiyhhqssddddddfkfkjhhjkehdeiyrudjhfhfhffooqweopuyyyy

Step 3 Upload data to OBS.

1. Log in to the OBS console.
2. Click Create Bucket to create a bucket and name it. The name must be unique; otherwise, the bucket cannot be created. The name wordcount is used as an example here.
3. In the wordcount bucket, click Create Folder to create the program, input, output, and log folders, as shown in Figure 2-1.

Figure 2-1 Folder list

- program: stores user programs.


- input: stores user data files.
- output: stores job output files.
- log: stores job output log files.

4. Go to the program folder, click to select the program package downloaded in Step 1, and click Upload, as shown in Figure 2-2.

Figure 2-2 Program list

5. Go to the input folder and upload the data file to be computed, prepared in Step 2, as shown in Figure 2-3.

Figure 2-3 Data file list

Step 4 Log in to the MRS management console. In the Cluster drop-down list box, select the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.

Step 5 Submit a wordcount job.

Select Job Management. On the Job tab page, click Create to go to the Create Job page, as shown in Figure 2-4.

Jobs can be submitted only when the mrs_20160907 cluster is in the Running state.


Figure 2-4 Creating a MapReduce job

Table 2-1 describes the parameters of the job configuration. The following is a job configuration example:

- Type: Select MapReduce.

- Name: For example, mr_01.

- Program Path:
  Set the path to the address that stores the program on OBS. Replace the bucket name and program name with the names of the bucket and program that you created in Step 3.3, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.

- Parameters:

Indicate the main class of the program to be executed, for example, wordcount.

- Import From:

Set the path to the address that stores the input data files on OBS. Replace the bucket name and input name with the names of the bucket and folder that you created in Step 3.3. For example, s3a://wordcount/input.

- Export To:

Set the path to the address that stores the job output files on OBS. Replace the bucket name and output name with the names of the bucket and folder that you created in Step 3.3. For example, s3a://wordcount/output.

- Log path:


Set the path to the address that stores the job log files on OBS. Replace the bucket name and log name with the names of the bucket and folder that you created in Step 3.3. For example, s3a://wordcount/log.

A job will be executed immediately after being created successfully.

Table 2-1 Job configuration information

Type
  Job type. Possible types include:
  - MapReduce
  - Spark Jar
  - Spark Script
  - Hive Script
  NOTE: To add jobs of the Spark and Hive types, you need to select the Spark and Hive components when creating the cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only; Spark Jar supports Spark Core and Spark SQL.

Name
  Job name. This parameter consists of 1 to 64 characters, including letters, digits, and underscores (_).
  NOTE: Identical job names are allowed but not recommended.

Program Path
  Address of the JAR file of the program that executes the job.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  This parameter must meet the following requirements:
  - A maximum of 1023 characters are allowed; the special characters *?<">|\ are not allowed. The address cannot be empty or consist only of spaces.
  - The path varies depending on the file system:
    – OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.
    – HDFS: The path must start with /user.
  - A Spark Script program must end with .sql; MapReduce and Spark Jar programs must end with .jar; programs developed in Python must end with .py. The extensions sql, jar, and py are case-insensitive.

Parameters
  Key parameter for executing the job. This parameter is assigned by an internal function; MRS is only responsible for passing it in.
  Format: package name.class name
  A maximum of 255 characters are allowed; the special characters ;|&>',<$ are not allowed. This parameter can be empty.


Import From
  Address for inputting data.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The address must start with s3a:// (OBS) or /user (HDFS); a correct OBS path is required. A maximum of 1023 characters are allowed; the special characters *?<">|\ are not allowed. This parameter can be empty.

Export To
  Address for outputting data.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The address must start with s3a:// (OBS) or /user (HDFS); a correct OBS path is required. If no such path exists, an OBS path is created automatically. A maximum of 1023 characters are allowed; the special characters *?<">|\ are not allowed. This parameter can be empty.

Log path
  Address for storing job logs that record the job running status.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The address must start with s3a:// (OBS) or /user (HDFS); a correct OBS path is required. A maximum of 1023 characters are allowed; the special characters *?<">|\ are not allowed. This parameter can be empty.
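The naming and path constraints stated in Table 2-1 can be checked on the client side before a job is submitted. The sketch below encodes the rules exactly as the table states them; the function names are illustrative and are not part of any MRS API.

```python
import re

def valid_job_name(name):
    # Per Table 2-1: 1 to 64 characters, letters, digits, and underscores only.
    return re.fullmatch(r"[A-Za-z0-9_]{1,64}", name) is not None

def valid_program_path(path):
    # Per Table 2-1: at most 1023 characters, none of *?<">|\ ,
    # not empty or all spaces, and the path must start with
    # s3a:// (OBS) or /user (HDFS).
    if not path.strip() or len(path) > 1023:
        return False
    if any(ch in path for ch in '*?<">|\\'):
        return False
    return path.startswith("s3a://") or path.startswith("/user")

print(valid_job_name("mr_01"))  # True
print(valid_program_path("s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar"))  # True
print(valid_program_path("s3a://bad*path"))  # False
```

A check like this only mirrors the documented rules; the console performs its own validation when the job is created.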

Step 6 View the job execution results.

1. Go to the Job Management page. On the Job tab page, check whether the jobs are complete.

The job operation takes a while. After the jobs are complete, refresh the job list, as shown in Figure 2-5.

Figure 2-5 Job list

You cannot re-execute a successful or failed job, but you can add or copy the job. After setting the job parameters, you can submit the job again.

2. Log in to the OBS console. Go to the OBS directory and query job output information.

In the wordcount > output directory of OBS, you can query and download the job output files, as shown in Figure 2-6.


Figure 2-6 Output file list

3. Log in to the OBS console. Go to the OBS directory and check the detailed job execution results. In the wordcount > log directory of OBS, you can query and download the job execution logs by job ID, as shown in Figure 2-7.

Figure 2-7 Log list

Step 7 Terminate a cluster.

For details, see Terminating a Cluster in the User Guide.

----End

2.2.3 Using Spark from Scratch

This section describes how to use Spark to submit a SparkPi job. SparkPi, a typical Spark job, is used to calculate the value of pi (π).

Procedure

Step 1 Prepare the sparkPi program.

The open-source Spark example package contains the SparkPi program. You can download the Spark example package at http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz.

Decompress the package to obtain the spark-examples-1.5.1-hadoop2.6.0.jar file in the spark-1.5.1-bin-hadoop2.6/lib directory. The spark-examples-1.5.1-hadoop2.6.0.jar example package contains the SparkPi program.
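SparkPi estimates the value of π by Monte Carlo sampling: it draws random points in the unit square and counts the fraction that fall inside the quarter circle. A minimal single-machine sketch of the same idea (plain Python, not the distributed Spark implementation):

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo estimate of pi: 4 times the fraction of random points
    in the unit square that land inside the unit quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100000))  # typically close to 3.14; accuracy grows with samples
```

Spark parallelizes exactly this sampling loop across the cluster, which is why the job takes a number of partitions (the "10" in the Parameters field below) as its argument.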

Step 2 Upload data to OBS.

1. Log in to the OBS console.
2. Click Create Bucket to create a bucket and name it. The name must be unique; otherwise, the bucket cannot be created. Here the name sparkpi will be used as an example.

3. In the sparkpi bucket, click Create Folder to create the program, output, and log folders, as shown in Figure 2-8.


Figure 2-8 Folder list

4. Go to the program folder, click to select the program package downloaded in Step 1, and click Upload, as shown in Figure 2-9.

Figure 2-9 Program list

Step 3 Log in to the MRS management console. In the Cluster drop-down list box, select the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.

Step 4 Submit a sparkPi job.

Select Job Management. On the Job tab page, click Create to go to the Create Job page, as shown in Figure 2-10.

Only when the mrs_20160907 cluster is in the running state can jobs be submitted.


Figure 2-10 Creating a Spark job

Table 2-2 describes the parameters of the job configuration. The following is a job configuration example:

- Type: Select Spark Jar.

- Name: For example, job_spark.

- Program Path:
  Set the path to the address that stores the program on OBS. Replace the bucket name and program name with the names of the bucket and program that you created in Step 2.3. For example, s3a://sparkpi/program/spark-examples-1.5.1-hadoop2.6.0.jar.

- Parameters:
  Indicate the main class of the program to be executed, for example, org.apache.spark.examples.SparkPi 10.

- Export To:
  Set the path to the address that stores the job output files on OBS. Replace the bucket name and output name with the names of the bucket and folder that you created in Step 2.3. For example, s3a://sparkpi/output.

- Log path:
  Set the path to the address that stores the job log files on OBS. Replace the bucket name and log name with the names of the bucket and folder that you created in Step 2.3. For example, s3a://sparkpi/log.


A job will be executed immediately after being created successfully.

Table 2-2 Job configuration information

Type
  Job type. Possible types include:
  - MapReduce
  - Spark Jar
  - Spark Script
  - Hive Script
  NOTE: To add jobs of the Spark and Hive types, you need to select the Spark and Hive components when creating the cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only; Spark Jar supports Spark Core and Spark SQL.

Name
  Job name. This parameter consists of 1 to 64 characters, including letters, digits, and underscores (_).
  NOTE: Identical job names are allowed but not recommended.

Program Path
  Address of the JAR file of the program that executes the job.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  This parameter must meet the following requirements:
  - A maximum of 1023 characters are allowed; the special characters *?<">|\ are not allowed. The address cannot be empty or consist only of spaces.
  - The path varies depending on the file system:
    – OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.
    – HDFS: The path must start with /user.
  - A Spark Script program must end with .sql; MapReduce and Spark Jar programs must end with .jar; programs developed in Python must end with .py. The extensions sql, jar, and py are case-insensitive.

Parameters
  Key parameter for executing the job. This parameter is assigned by an internal function; MRS is only responsible for passing it in.
  Format: package name.class name
  A maximum of 255 characters are allowed; the special characters ;|&>',<$ are not allowed. This parameter can be empty.


Import From
  Address for inputting data.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The address must start with s3a:// (OBS) or /user (HDFS); a correct OBS path is required. A maximum of 1023 characters are allowed; the special characters *?<">|\ are not allowed. This parameter can be empty.

Export To
  Address for outputting data.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The address must start with s3a:// (OBS) or /user (HDFS); a correct OBS path is required. If no such path exists, an OBS path is created automatically. A maximum of 1023 characters are allowed; the special characters *?<">|\ are not allowed. This parameter can be empty.

Log path
  Address for storing job logs that record the job running status.
  NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
  The address must start with s3a:// (OBS) or /user (HDFS); a correct OBS path is required. A maximum of 1023 characters are allowed; the special characters *?<">|\ are not allowed. This parameter can be empty.

Step 5 View the job execution results.

1. Go to the Job Management page. On the Job tab page, check whether the jobs are complete. The job operation takes a while. After the jobs are complete, refresh the job list, as shown in Figure 2-11.

Figure 2-11 Job list

You cannot re-execute a successful or failed job, but you can add or copy the job. After setting the job parameters, you can submit the job again.

2. Go to the OBS directory and query job output information. In the sparkpi > output directory of OBS, you can query and download the job output files.

3. Go to the OBS directory and check the detailed job execution results.


In the sparkpi > log directory of OBS, you can query and download the job execution logs by job ID, as shown in Figure 2-12.

Figure 2-12 Log list

Step 6 Terminate a cluster.

For details, see Terminating a Cluster in the User Guide.

----End

2.2.4 Using Spark SQL from Scratch

To process structured data, Spark provides Spark SQL, a language similar to SQL.

You can create a table named src_data, write a data entry into each row of the src_data table, and store the data in the mrs_20160907 cluster. You can then use SQL statements to query data in the src_data table. Afterwards, you can delete the src_data table.

Prerequisites

You have obtained the AK/SK for writing data from the OBS data source to the Spark SQL table. The method for obtaining the AK/SK is as follows:

1. Log in to the management console.
2. Click the username and choose My Credential from the drop-down list.
3. Click Access Credentials.
4. Click Add Access Key to switch to the Add Access Key page.
5. Enter the short message verification code and click OK to download the access key.

Keep the access key secure.

Procedure

Step 1 Prepare data sources for Spark SQL analysis.

The following is an example of a text file:

abcd3ghjiefgh658ko1234jjyu97h8kodfg1kk99icxz3

Step 2 Upload data to OBS.

1. Log in to the OBS management console.
2. Click Create Bucket to create a bucket and name it. The name must be unique; otherwise, the bucket cannot be created. Here the name sparksql will be used as an example.


3. In the sparksql bucket, click Create Folder to create the input folder.

4. Go to the input folder, click to select a local text file, and click Upload, as shown in Figure 2-13.

Figure 2-13 Input file list

Step 3 Import the text file from OBS to HDFS.

1. Log in to the MRS management console. In the Cluster drop-down list box, select the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.

2. In the navigation tree of the MRS management console, choose File Management.
3. Click Create Folder and create a userinput file folder.
4. Go to the userinput file folder, and click Import Data.
5. Select the OBS and HDFS paths and click OK.

OBS path: s3a://sparksql/input/sparksql-test.txt
HDFS path: /user/userinput

Step 4 Submit the Spark SQL statement.

1. On the Job Management page, select Spark SQL. The Spark SQL job page is displayed. Jobs can be submitted only when the mrs_20160907 cluster is in the running state.

2. Enter the Spark SQL statement to create a table.

When entering Spark SQL statements, ensure that they contain no more than 10,000 characters.

The syntax is as follows:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path];

You can use either of the following methods to create a table:

– Method 1: Create table src_data and write data in every row.

If the data source is stored in the /user/userinput folder of HDFS:
create external table src_data(line string) row format delimited fields terminated by '\\n' stored as textfile location '/user/userinput';

If the data source is stored in the /sparksql/input folder of OBS:
create external table src_data(line string) row format delimited fields terminated by '\\n' stored as textfile location 's3a://AK:SK@sparksql/input';


For the method of obtaining the AK/SK, see the description in Prerequisites.

– Method 2: Create table src_data1 and load the data into the src_data1 table in batches.

create table src_data1 (line string) row format delimited fields terminated by ',';
load data inpath '/user/userinput/sparksql-test.txt' into table src_data1;

NOTE

When method 2 is used, the data from OBS cannot be loaded to the created tables directly.

3. Enter the Spark SQL statement to query a table.

The syntax is as follows:
SELECT col_name FROM table_name;

To query data in the src_data table, for example, enter the following statement:
select * from src_data;

4. Enter the Spark SQL statement to delete a table.

The syntax is as follows:
DROP TABLE [IF EXISTS] table_name;

For example:
drop table src_data;
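The create/query/drop sequence in steps 2 to 4 is ordinary SQL, so the same lifecycle can be tried locally with Python's built-in sqlite3 module. This is illustrative only: sqlite3 does not support Spark SQL extensions such as EXTERNAL tables, ROW FORMAT, or LOCATION, and the rows below are placeholder strings rather than the sample data file.

```python
import sqlite3

def sql_lifecycle():
    """Create a one-column table, load rows, query it, then drop it,
    mirroring steps 2 to 4 above (placeholder data)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Step 2 equivalent: create the table and load data.
    cur.execute("CREATE TABLE src_data (line TEXT)")
    cur.executemany("INSERT INTO src_data VALUES (?)",
                    [("line1",), ("line2",), ("line3",)])
    # Step 3 equivalent: query the table.
    cur.execute("SELECT * FROM src_data")
    rows = cur.fetchall()
    # Step 4 equivalent: drop the table.
    cur.execute("DROP TABLE IF EXISTS src_data")
    conn.close()
    return rows

print(sql_lifecycle())  # [('line1',), ('line2',), ('line3',)]
```

In MRS the same statements are executed remotely by the Spark SQL job engine against data in HDFS or OBS.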

5. Click Check to check whether the statements are correct.
6. Click Submit.

After submitting Spark SQL statements, you can check whether the execution was successful in Last Execution Result, and view detailed execution results in Last Query Result Set.

Step 5 Terminate a cluster.

For details, see Terminating a Cluster in the User Guide.

----End

2.2.5 Using HBase from Scratch

HBase is a scalable column-based distributed storage system. It features high reliability and high performance.

You can update the client on a Master node in the mrs_20160907 cluster and then use the client to create a table; insert, read, and delete data from the table; and modify and delete the table.

Background

After an MRS cluster has been successfully created, the original client is stored by default in the /opt/client directory on all nodes in the cluster. Before using the client, download the client file, update the client, and locate the active management node of MRS Manager.

For example, if a user develops an application to manage information about users of service A in an enterprise, the operation process of service A using the HBase client is as follows:

- Create a user information table.


- Add diplomas and titles of users to the table.
- Query usernames and addresses by user ID.
- Query information by username.
- Deregister users and delete user data.
- Delete the user information table after service A ends.

Table 2-3 User information

ID Name Gender Age Address

12005000201 A Male 19 City A

12005000202 B Female 23 City B

12005000203 C Male 26 City C

12005000204 D Male 18 City D

12005000205 E Female 21 City E

12005000206 F Male 32 City F

12005000207 G Female 29 City G

12005000208 H Female 30 City H

12005000209 I Male 26 City I

12005000210 J Male 25 City J

Procedure

Step 1 Download the client configuration file.

1. Log in to the MRS management console. In the Cluster drop-down list box, select the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.

2. In the navigation tree, choose Basic Information, and click Cluster Manager to open MRS Manager.

3. Click Service, and click Download Client. Set Client Type to Only configuration files, set Download Path to Server, and click OK to generate the client configuration file. The generated file is saved in the /tmp/MRS-client directory on the active management node by default. You can modify the save path as required.

Step 2 Log in to the active management node.

1. Choose Basic Information > Cluster Configuration to view the Active Master Node IP Address parameter. Active Master Node IP Address is the IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager.

The active and standby management nodes of MRS Manager are installed on Master nodes by default. Because Master1 and Master2 switch over between active and standby modes, Master1 is not always the active management node of MRS Manager.


2. Log in to the Master1 node using a key pair as user linux. For details, see Logging In to an ECS Using VNC in the User Guide.

The Master node supports Cloud-init. The preset username and password for Cloud-init are linux and cloud.1234. If you have changed the password, log in to the node using the new password. See "How Do I Log In to an ECS Once All Images Support Cloud-Init?" in the Elastic Cloud Server User Guide (FAQs > Login FAQs > How Do I Log In to an ECS Once All Images Support Cloud-Init?).

3. Log in to the active management node as user root.

Step 3 Run the following command to go to the client directory:

After an MRS cluster is successfully created, the client is installed in the /opt/client directory by default.

cd /opt/client

Step 4 Run the following command to update the client configuration for the active management node.

sh refreshConfig.sh /opt/client <full path of the client configuration file package>

For example, run the following command:

sh refreshConfig.sh /opt/client /tmp/MRS-client/MRS_Services_Client.tar

If the following information is displayed, the configuration is updated successfully.

ReFresh components client config is complete.
Succeed to refresh components client config.

Step 5 Use the client on a Master node.

1. On the active management node where the client was updated, for example node-master2-LJXDj, run the following command to go to the client directory:

cd /opt/client

2. Run the following command to configure the environment variables:

source bigdata_env

3. Run an HBase component client command directly:

hbase shell

Step 6 Run commands on the HBase client to implement service A.

1. Create a user information table according to Table 2-3.

create 'user_info',{NAME => 'i'}

2. Add degree and title information about a user to the table. For example, to add degree and title information about user 12005000201, run the following commands:

put 'user_info','12005000201','i:degree','master'
put 'user_info','12005000201','i:pose','manager'

3. Query usernames and addresses by user ID. For example, to query the username and address of user 12005000201, run the following command:

scan 'user_info',{STARTROW=>'12005000201',STOPROW=>'12005000201',COLUMNS=>['i:name','i:address']}


4. Query information by username. For example, to query information about user A, run the following command:

scan 'user_info',{FILTER=>"SingleColumnValueFilter('i','name',=,'binary:A')"}

5. Delete user data from the user information table. All user data needs to be deleted. For example, to delete the data of user 12005000201, run the following command:

delete 'user_info','12005000201','i'

6. Run the following commands to delete the user information table:

disable 'user_info'
drop 'user_info'
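The HBase shell commands above manipulate rows keyed by user ID, with every column in the single column family i. As a toy illustration (not an HBase client), the storage model can be sketched as nested dictionaries:

```python
# Toy model of an HBase table: row key -> {"family:qualifier": value}.
table = {}

def put(row, column, value):
    # Analogous to: put 'user_info', row, column, value
    table.setdefault(row, {})[column] = value

def get_columns(row, columns):
    # Rough analogue of scanning a single row for selected columns;
    # columns missing from the row are simply absent from the result.
    stored = table.get(row, {})
    return {c: stored[c] for c in columns if c in stored}

def delete_row(row):
    # Analogous to deleting all data for a row key.
    table.pop(row, None)

put("12005000201", "i:degree", "master")
put("12005000201", "i:pose", "manager")
put("12005000201", "i:name", "A")
print(get_columns("12005000201", ["i:name", "i:address"]))  # {'i:name': 'A'}
delete_row("12005000201")
print("12005000201" in table)  # False
```

Real HBase adds timestamps, versioning, and distribution on top of this row/column-family layout, but the addressing scheme the shell commands use is the same.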

Step 7 Terminate a cluster.

For details, see Terminating a Cluster in the User Guide.

----End


3 Cluster Operation Guide

3.1 Overview

You can view the overall cluster status on the Dashboard > Overview page, and obtain relevant MRS documents by clicking the document names under Helpful Links.

MRS helps you manage and analyze massive data. MRS is easy to use and allows you to create a cluster in about 20 minutes. You can add MapReduce, Spark, and Hive jobs to clusters to process and analyze user data. Additionally, processed data can be encrypted using Secure Sockets Layer (SSL) and transmitted to OBS, ensuring data security and integrity.

Cluster Status

Table 3-1 describes the possible statuses of each cluster on the MRS management console.

Table 3-1 Cluster status

Status Description

Starting A cluster is being created.

Running A cluster has been created successfully and all components in the cluster are running properly.

Expanding Core nodes are being added to a cluster.

Shrinking The Shrinking state is displayed when a node is being deleted during any of the following operations: shutting down the node, deleting the node, changing the OS of the node, reinstalling the OS of the node, or modifying the specifications of the node.

Abnormal Some components in a cluster are abnormal, and the cluster is in an abnormal state.

Terminating A cluster is being terminated.

Failed Cluster creation, termination, or capacity expansion fails.

Terminated A cluster has been terminated.


Job Status

Table 3-2 describes the statuses of jobs that you can add after logging in to the MRS management console.

Table 3-2 Job status

Status Description

Running A job is being executed.

Completed Job execution is complete and successful.

Terminated A job is stopped during execution.

Abnormal An error occurs during job execution or job execution fails.

3.2 Cluster List

The cluster list contains all clusters in MRS. You can view clusters in various states. If a massive number of clusters are involved, navigate through multiple pages to view all of them.

MRS, as a platform for managing and analyzing massive data, provides a PB-level data processing capability. Multiple clusters can be created; the cluster quantity is subject to the ECS quantity.

Clusters are listed in chronological order by default in the cluster list, with the most recent cluster displayed at the top. Table 3-3 describes the parameters of the cluster list.

- Active Cluster: contains all clusters except those in the Terminated state.
- Historical Cluster: contains only the clusters in the Terminated state. Only clusters terminated within the last six months are displayed. If you want to view clusters terminated more than six months ago, contact technical support engineers.
- Failed Task: contains only the tasks in the Failed state. You can also delete failed tasks on this page. Task failures include:
  – Cluster creation failure
  – Cluster termination failure
  – Cluster capacity expansion failure

Table 3-3 Parameters in the cluster list

Parameter Description

Name Cluster name, which is set when a cluster is created.

ID Unique identifier of a cluster, which is automatically assigned when a cluster is created. This parameter is displayed in Active Cluster only.


Nodes Number of nodes that can be deployed in a cluster. This parameter is set when a cluster is created.
NOTE: A small value may cause slow cluster running. Set a proper value based on the data to be processed.

Status Status of a cluster.

Created Time when MRS starts charging the customer's MRS cluster.

Terminated Termination time of a cluster, that is, the time when charging for the cluster stops. This parameter is displayed in Historical Cluster only.

AZ An available zone of the working zone in the cluster, which is set when acluster is created.

Operation Terminate: If you want to terminate a cluster after jobs are complete, click Terminate. The cluster status changes from Running to Terminating. After the cluster is terminated, its status changes to Terminated and the cluster is displayed in Historical Cluster. If the MRS cluster fails to be deployed, the cluster is automatically terminated. This parameter is displayed in Active Cluster only.
NOTE: If a cluster is terminated before data processing and analysis are completed, data loss may occur. Therefore, exercise caution when terminating a cluster.

Table 3-4 Button description

Button Description

In the drop-down list, select a state to filter clusters:
- Active Cluster
  – All (Num): displays all existing clusters.
  – Starting (Num): displays existing clusters in the Starting state.
  – Running (Num): displays existing clusters in the Running state.
  – Expanding (Num): displays existing clusters in the Expanding state.
  – Shrinking (Num): displays existing clusters in the Shrinking state.
  – Abnormal (Num): displays existing clusters in the Abnormal state.
  – Terminating (Num): displays existing clusters in the Terminating state.
  – Failed (Num): displays the failed tasks in the Failed state.


Click to open the page for managing failed tasks.

Num: displays the failed tasks in the Failed state.

Enter a cluster name in the search bar and click to search for a cluster.

Click to manually refresh the cluster list.

3.3 Creating a Cluster

This section describes how to create a cluster using MRS.

Procedure

Step 1 Log in to the MRS management console.

Step 2 In the navigation tree of the MRS management console, choose Cluster.

Step 3 Click Create Cluster and open the Create Cluster page.

NOTE

Note the usage of quotas when you create a cluster. If the resource quotas are insufficient, apply for new quotas as prompted and then create the cluster.

Step 4 Table 3-5, Table 3-6, Table 3-7, Table 3-8, and Table 3-9 describe the basic configuration information, node configuration information, login information, component information, and job configuration information for a cluster, respectively.

Table 3-5 Basic cluster information

Parameter Description

Cluster Name Cluster name, which is globally unique. A cluster name can contain only 1 to 64 characters, including letters, digits, and underscores (_). The default name is mrs_xxxx, where xxxx is a random combination of four letters and digits.
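The default naming scheme described above (mrs_ followed by four random letters and digits) can be sketched as follows; the helper is illustrative only and is not how MRS itself generates names:

```python
import random
import string

def default_cluster_name(rng=None):
    # "mrs_xxxx", where xxxx is a random combination of four lowercase
    # letters and digits (illustrative sketch of the documented format).
    rng = rng or random.Random()
    chars = string.ascii_lowercase + string.digits
    return "mrs_" + "".join(rng.choice(chars) for _ in range(4))

print(default_cluster_name())  # a name such as mrs_ followed by four characters
```

Any name you type yourself must still satisfy the 1-to-64-character rule above.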

AZ An available zone (AZ) is a physical area that uses independent power and network resources. Applications in different AZs are interconnected using internal networks but are physically isolated, improving application availability. It is recommended that you create clusters in different AZs. Select an AZ of the working zone in the cluster. Currently, only the eu-de working zone is supported.


Parameter Description

VPC: A VPC is a secure, isolated, and logical network environment. Select the VPC in which you want to create the cluster and click View VPC to view its name and ID. If no VPC is available, create one.

Subnet: A subnet provides dedicated network resources that are isolated from other networks, improving network security. Select the subnet in which you want to create the cluster to enter the VPC and view the name and ID of the subnet. If no subnet has been created under the VPC, click Create Subnet to create one.

Table 3-6 Cluster node information

Parameter Description

Node Type: MRS provides two types of nodes:
- Master: a Master node in an MRS cluster manages the cluster, assigns cluster executable files to Core nodes, tracks the execution status of each job, and monitors DataNode running status.
- Core: a Core node in a cluster processes data and stores processed data in HDFS.


Instance Specifications: Instance specifications of a node. MRS supports nine host specifications, determined by the CPU, memory, and disks used. Currently, Master nodes support the C2.4xlarge, S1.4xlarge, and S1.8xlarge specifications, and Core nodes support all of the following:
- MRS_4U_16G (S1.xlarge): CPU: 4-core; Memory: 16 GB; System Disk: 40 GB
- MRS_4U_32G_HDD (D1.xlarge): CPU: 4-core; Memory: 32 GB; System Disk: 40 GB; Data Disk: 1.8 TB x 3 HDDs
- MRS_8U_16G (C2.2xlarge): CPU: 8-core; Memory: 16 GB; System Disk: 40 GB
- MRS_8U_64G_HDD (D1.2xlarge): CPU: 8-core; Memory: 64 GB; System Disk: 40 GB; Data Disk: 1.8 TB x 6 HDDs
- MRS_16U_32G (C2.4xlarge): CPU: 16-core; Memory: 32 GB; System Disk: 40 GB
- MRS_16U_64G (S1.4xlarge): CPU: 16-core; Memory: 64 GB; System Disk: 40 GB
- MRS_16U_128G_HDD (D1.4xlarge): CPU: 16-core; Memory: 128 GB; System Disk: 40 GB; Data Disk: 1.8 TB x 12 HDDs
- MRS_32U_128G (S1.8xlarge): CPU: 32-core; Memory: 128 GB; System Disk: 40 GB
- MRS_36U_256G_HDD (D1.8xlarge): CPU: 36-core; Memory: 256 GB; System Disk: 40 GB; Data Disk: 1.8 TB x 24 HDDs

NOTE
- Higher instance specifications allow better data processing performance.
- When the specifications of Core nodes are D1.xlarge, D1.2xlarge, D1.4xlarge, or D1.8xlarge, Data Disk is not displayed.
- After you select HDD for Core nodes, there is no separate charging information for data disks; the fees are charged with the ECSs.
- After you select HDD for Core nodes, the system disks (40 GB) of Master and Core nodes, as well as the data disks (200 GB) of Master nodes, are SATA disks.
- If you select non-HDD disks for Core nodes, the disk types of Master and Core nodes are determined by Data Disk.

Quantity: Number of Master and Core nodes.
Master: currently fixed at 2.
Core: 3 to 100.
NOTE
- By default, a maximum of 100 Core nodes are supported. If more than 100 Core nodes are required, contact technical support engineers or invoke a background interface to modify the database.
- A small number of nodes may cause clusters to run slowly. Set an appropriate value based on the data to be processed.


Data Disk: Disk space of Core nodes. Users can add disks to increase storage capacity when creating a cluster. Storage and computing can be configured in two different ways:
- Data storage and computing are performed separately: data is stored in OBS, which features low cost and unlimited storage capacity, and clusters can be terminated at any time because the data remains in OBS. Computing performance is determined by OBS access performance and is lower than that of HDFS. This configuration is recommended if data computing is infrequent.
- Data storage and computing are performed together: data is stored in HDFS, which features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. This configuration is recommended if data computing is frequent.
The following disk types are supported:
- SATA: Common I/O
- SAS: High I/O
- SSD: Ultra-high I/O
Disk sizes range from 100 GB to 32,000 GB.
NOTE
- The Master node adds data disk storage space for MRS Manager. The disk type must be the same as the data disk type of Core nodes. The default disk space is 200 GB and cannot be changed.
- When the specifications of Core nodes are D1.xlarge, D1.2xlarge, D1.4xlarge, or D1.8xlarge, Data Disk is not displayed.


Table 3-7 Login information

Parameter Description

Key Pair: Keys are used to log in to Master1 of the cluster.
A key pair, also called an SSH key, consists of a public key and a private key. You can create an SSH key and download the private key for authenticating remote login. For security, a private key can be downloaded only once. Keep it secure.
Select the key pair from the drop-down list. If you have obtained the private key file, select I acknowledge that I have obtained private key file SSHkey-bba1.pem and that without this file I will not be able to log in to my ECS. If no key pair is created, click View Key Pair to create or import keys, and then obtain the private key file.
Configure an SSH key using either of the following methods:
- Create an SSH key: after you create an SSH key, a public key and a private key are generated. The public key is stored in the system, and the private key is stored locally. When you log in to an ECS, the public and private keys are used for authentication.
- Import an SSH key: if you have already obtained the public and private keys, import the public key into the system. When you log in to an ECS, the public and private keys are used for authentication.

Table 3-8 Component information

Parameter Description

Version Currently, MRS 1.2 is supported.

Component: MRS 1.2 supports the following components:
- Hadoop 2.7.2: distributed system architecture
- Spark 1.5.1: in-memory distributed computing framework
- HBase 1.0.2: distributed column store database
- Hive 1.3.0: data warehouse framework built on Hadoop
Hadoop is mandatory, and Spark and Hive must be used together. Select components based on your services.

Table 3-9 Job configuration information

Parameter Description

Show You can click Show to display job configuration information.

Hide You can click Hide to hide job configuration information.

MapReduce ServiceUser Guide 3 Cluster Operation Guide

Issue 01 (2017-02-20) 36

Page 44: User Guide - Deutsche Telekom · 5.12.3 Backing Up Metadata.....227 5.12.4 Backing Up Service Data ... User Guide Contents Issue 01 (2017-02-20) vii. 1 Overview 1.1 Introduction MapReduce

Parameter Description

Create Job: You can click Create to submit a job at the same time you create a cluster. Only one job can be added, and its status is Running after the cluster is created. For details, see Adding a Jar or Script Job.

Name: Name of a job.

Type: Type of a job.

Parameter: Key parameters for executing an application.

Operation:
- Edit: modifies job configurations.
- Delete: deletes a job.

Step 5 Select Terminate the cluster after jobs are completed based on site requirements.

Step 6 Click Create Now.

Step 7 Confirm cluster specifications and click Submit.

Cluster creation takes some time. While the cluster is being created, its status is Starting. After the cluster is created successfully, the cluster status becomes Running.

----End

3.4 Viewing Cluster Information

After an MRS cluster is created, you can view basic information and patch information about the cluster and access the cluster management page.

3.4.1 Viewing Basic Information About a Cluster

After clusters are created, you can monitor and manage them. On the Basic Information page, you can view information about a cluster such as its configurations, deployed nodes, and other basic details.

Table 3-10 and Table 3-11 describe the cluster configuration information and node information, respectively.

Table 3-10 Cluster configuration information

Parameter Description

Cluster ID: Unique identifier of a cluster. This parameter is automatically assigned when a cluster is created.

Cluster Name: Cluster name. This parameter is set when a cluster is created.

Cluster Version: MRS version. Currently, MRS 1.2 is supported.


Key Pair: Key pair name. This parameter is set when a cluster is created.

Created: Time when MRS starts charging for the customer's MRS cluster.

AZ: Availability zone of the working zone in the cluster. This parameter is set when a cluster is created.

Master Node: Information about the Master nodes. Format: [instance specification, node quantity]

Core Node: Information about the Core nodes. Format: [instance specification, node quantity]

Active Master Node IP Address: IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager.

VPC: VPC information. This parameter is set when a cluster is created. A VPC is a secure, isolated, and logical network environment.

Subnet: Subnet information. This parameter is set when a cluster is created. A subnet provides dedicated network resources that are isolated from other networks, improving network security.

Hadoop Version: Hadoop version.

Spark Version: Spark version. Only a Spark cluster displays this version. Because Spark and Hive must be used together, the Spark and Hive versions are displayed at the same time.

HBase Version: HBase version. Only an HBase cluster displays this version.

Hive Version: Hive version. Only a Hive cluster displays this version.

Table 3-11 Node information

Parameter Description

Add Node: For details about adding a Core node to a cluster, see Expanding a Cluster.

Name: Name of a cluster node.


Type: Node type.
- Master: a Master node in an MRS cluster manages the cluster, assigns MapReduce executable files to Core nodes, tracks the execution status of each job, and monitors DataNode running status.
- Core: a Core node in a cluster processes data and stores processed data in HDFS.

IP Address IP address of a cluster node.

Specifications: Instance specifications of a node. This parameter is determined by the CPU, memory, and disks used.
NOTE
Higher instance specifications allow better data processing performance.

Default Security Group: Security group names for Master and Core nodes, which are automatically assigned when a cluster is created. These are the default security groups. Do not modify or delete them, because doing so will cause a cluster exception.

3.4.2 Viewing Patch Information About a Cluster

You can view patch information about cluster components. If a cluster component, such as Hadoop or Spark, is abnormal, download the patch. Then choose System > Cluster Management > Manage Patch on MRS Manager to upgrade the component and resolve the problem.

Patch Information is displayed on the Basic Information page only when patch information exists in the database. Patch information contains the following parameters:
- Patch Name: patch name set when the patch is uploaded to OBS.
- Patch Path: location where the patch is stored in OBS.
- Patch Description: patch description.

3.4.3 Entering the Cluster Management Page

Choose Basic Information > Cluster Configuration and click Cluster Manager to go to the cluster management page. On this page, you can view and handle alarms, modify cluster configurations, and upgrade cluster patches.

You can enter the cluster management page only for clusters in the Abnormal, Running, Expanding, or Shrinking state. For details about how to use the cluster management page, see MRS Manager Operation Guide.


3.5 Expanding a Cluster

The storage and computing capabilities of MRS can be improved simply by adding nodes, without modifying the system architecture, which reduces O&M costs. Core nodes can both compute and store data. You can add Core nodes to expand node capacity and handle peak loads.

Background

An MRS cluster supports a maximum of 102 nodes. A maximum of 100 Core nodes are supported by default. If more than 100 Core nodes are required, contact technical support engineers or invoke a background interface to modify the database.

Only Core nodes can be expanded; Master nodes cannot. The maximum number of nodes that can be added equals 100 minus the number of existing Core nodes. For example, if the number of existing Core nodes is 3, a maximum of 97 nodes can be added. If a cluster fails to be expanded, you can perform capacity expansion for the cluster again.
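As a quick illustration, the headroom formula above is plain arithmetic (the node count used here is just an example):

```shell
# Maximum Core nodes that can still be added = 100 - existing Core nodes.
# A cluster holds at most 102 nodes in total: 2 Master + up to 100 Core.
EXISTING_CORE=3
MAX_ADDABLE=$((100 - EXISTING_CORE))
echo "$MAX_ADDABLE"   # prints 97
```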

Procedure

Step 1 Log in to the MRS management console. In the navigation tree, choose Basic Information. Then click Add Node.

The expand operation can only be performed on running clusters.

Step 2 Set Node Quantity and click OK.

Cluster expansion is explained as follows:

- During expansion: the cluster status is Expanding. Submitted jobs will continue to be executed, and you can submit new jobs. You are not allowed to further expand, restart, modify, or terminate the cluster.
- Successful expansion: the cluster status is Running. The resources used by both the old and new nodes are charged.
- Failed expansion: the cluster status is Failed. You are allowed to execute jobs and expand the cluster again.

----End

3.6 Terminating a Cluster

If you do not need an MRS cluster after job execution is complete, you can terminate the MRS cluster.

Background

If a cluster is terminated before data processing and analysis are completed, data loss may occur. Therefore, exercise caution when terminating a cluster. If MRS cluster deployment fails, the cluster is automatically terminated.


Procedure

Step 1 Log in to the MRS management console.

Step 2 In the navigation tree of the MRS management console, choose Cluster.

Step 3 In the Operation column of the cluster that you want to terminate, click Terminate.

The cluster status changes from Running to Terminating. After the cluster is terminated, the cluster status changes to Terminated and the cluster is displayed in Historical Cluster.

----End

3.7 Deleting a Failed Task

This section describes how to delete an MRS task.

Background

If cluster creation, termination, or capacity expansion fails, the failed task is first displayed in Active Cluster and then transferred to the Manage Failed Task page within 1 hour. If you do not need the failed task, you can delete it on the Manage Failed Task page.

Procedure

Step 1 Log in to the MRS management console.

Step 2 In the navigation tree of the MRS management console, choose Cluster.

Step 3 Click Failed Task.

The Manage Failed Task page is displayed.

Step 4 In the Operation column of the task that you want to delete, click Delete.

This operation deletes only a single failed task.

Step 5 You can click Delete All on the upper left of the task list to delete all tasks.

----End

3.8 Managing Data Files

You can create directories, delete directories, and import, export, or delete files on the File Management page.

Background

Data to be processed by MRS is stored in either OBS or HDFS. OBS provides you with massive, highly reliable, and secure data storage capabilities at a low cost. You can view, manage, and use data through OBS Console or OBS Browser. In addition, you can use the REST APIs to manage or access data; the REST APIs can be used alone or integrated with service programs.

Before creating jobs, upload the local data to OBS for computing and analysis. MRS also allows data to be imported from OBS to HDFS for computing and analysis. After the analysis and computing are complete, you can either store the data in HDFS or export it to OBS. HDFS and OBS can store compressed data in bz2 or gz format.
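For instance, a local file can be compressed with the standard gzip tool before it is uploaded (the file name below is illustrative):

```shell
# Create a small sample file and compress it to .gz; bzip2 would produce .bz2 the same way.
printf 'id,value\n1,foo\n' > sample.csv
gzip -f sample.csv            # replaces sample.csv with sample.csv.gz
ls sample.csv.gz              # prints sample.csv.gz
```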

Importing Data

MRS supports data import from the OBS system to HDFS. This function is recommended if the data size is small, because the upload speed decreases as the file size increases.

Both files and folders containing files can be imported. The operations are as follows:

1. Log in to the MRS management console and go to the File Management page.
2. Select HDFS File List.
3. Click the data storage directory, for example, bd_app1.
   bd_app1 is just an example; the storage directory can be any directory on the page. You can create a directory by clicking Create Folder. The name of the created directory must meet the following requirements:
   - Contains a maximum of 255 characters, and the full path contains a maximum of 1023 characters.
   - Cannot be empty.
   - Must be different from the names of the directories at the same level.
   - Cannot contain the special characters /:*?"<|>\.
   - Cannot start or end with a period (.).
4. Click Import Data to configure the paths for HDFS and OBS.

   NOTE
   When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.
   - The path for OBS:
     - Must start with s3a://.
     - Files and programs encrypted by the KMS cannot be imported.
     - Empty folders cannot be imported.
     - Directories and file names can contain letters, Chinese characters, digits, and underscores (_), but cannot contain the special characters ;|&><'$*?\.
     - Directories and file names cannot start or end with spaces, but can have spaces between other characters.
     - The full path of OBS contains a maximum of 1023 characters.
   - The path for HDFS:
     - Must start with /user.
     - Directories and file names can contain letters, Chinese characters, digits, and underscores (_), but cannot contain the special characters ;|&><'$*?\.
     - Directories and file names cannot start or end with spaces, but can have spaces between other characters.
     - The full path of HDFS contains a maximum of 1023 characters.
     - The parent HDFS directory in HDFS File List is displayed in the textbox for the HDFS path by default when data is imported.

5. Click OK.


View the upload progress in File Operation Record. The data import operation is run as a Distcp job by MRS. You can check whether the Distcp job executed successfully under Job Management > Job.
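The copy MRS runs corresponds to a standard Hadoop DistCp invocation. The sketch below only assembles and prints the equivalent command line; the bucket name and paths are illustrative, and on a real cluster node with a configured Hadoop client you would run the printed command itself:

```shell
# Build the DistCp command equivalent to an OBS-to-HDFS import (paths are examples only).
SRC="s3a://example-bucket/input"
DST="/user/example/input"
echo "hadoop distcp $SRC $DST"   # prints: hadoop distcp s3a://example-bucket/input /user/example/input
```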

Exporting Data

After data is processed and analyzed, you can either store the data in HDFS or export it to the OBS system.

Both files and folders containing files can be exported. The operations are as follows:

1. Log in to the MRS management console and go to the File Management page.
2. Select HDFS File List.
3. Click the data storage directory, for example, bd_app1.
4. Click Export Data and configure the paths for HDFS and OBS.

   NOTE
   When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.
   - The path for OBS:
     - Must start with s3a://.
     - Directories and file names can contain letters, Chinese characters, digits, and underscores (_), but cannot contain the special characters ;|&><'$*?\.
     - Directories and file names cannot start or end with spaces, but can have spaces between other characters.
     - The full path of OBS contains a maximum of 1023 characters.
   - The path for HDFS:
     - Must start with /user.
     - Directories and file names can contain letters, Chinese characters, digits, and underscores (_), but cannot contain the special characters ;|&><'$*?\.
     - Directories and file names cannot start or end with spaces, but can have spaces between other characters.
     - The full path of HDFS contains a maximum of 1023 characters.
     - The parent HDFS directory in HDFS File List is displayed in the textbox for the HDFS path by default when data is exported.

NOTE

Ensure that the exported folder is not empty. If an empty folder is exported to the OBS system, the folder is exported as a file. After the folder is exported, its name is changed, for example, from test to test-$folder$, and its type is file.

5. Click OK.
   View the export progress in File Operation Record. The data export operation is run as a Distcp job by MRS. You can check whether the Distcp job executed successfully under Job Management > Job.

Viewing File Operation Records

When importing or exporting data on the MRS management console, you can choose File Management > File Operation Record to view the import or export progress.

Table 3-12 lists the parameters in file operation records.


Table 3-12 Parameters in file operation records

Parameter Description

Created: Time when the data import or export started.

Source Path: Source path of the data.
- In data import, Source Path is the OBS path.
- In data export, Source Path is the HDFS path.

Target Path: Target path of the data.
- In data import, Target Path is the HDFS path.
- In data export, Target Path is the OBS path.

Status: Status of the data import or export operation.
- Running
- Completed
- Terminated
- Abnormal

Duration (min): Total time used by the data import or export. Unit: minute.

Result: Data import or export result.
- Successful
- Failed

Operation: View Log: you can click View Log to view log information of a job. For details, see Viewing Job Configurations and Logs.

3.9 Managing Jobs

You can query, add, and delete MRS jobs on the Job Management page.

3.9.1 Introduction to Jobs

A job is an executable program provided by MRS to process and analyze user data. All added jobs are displayed in Job Management, where you can add, query, and manage jobs.

Job Types

An MRS cluster allows you to create and manage the following jobs:

- MapReduce: provides the capability to process massive amounts of data quickly and in parallel. It is a distributed data processing mode and execution environment. MRS supports the submission of MapReduce JAR programs.
- Spark: a distributed in-memory computing framework. MRS supports the submission of Spark Jar, Spark Script, and Spark SQL jobs.
  - Spark Jar: submits a Spark JAR program, executes the Spark application, and computes and processes user data.
  - Spark Script: submits a Spark Script and batch-executes Spark SQL statements.
  - Spark SQL: uses Spark SQL statements (similar to SQL statements) to query and analyze user data in real time.
- Hive: an open-source data warehouse constructed on Hadoop. MRS supports the submission of Hive Scripts and batch-executes HiveQL statements.

If you fail to create a job in a Running cluster, check the component health status on the cluster management page. For details, see Viewing the System Overview.

Job List

Jobs are listed in chronological order by default in the job list, with the most recent jobs displayed at the top. Table 3-13 describes the parameters of the job list.

Table 3-13 Parameters of the job list

Parameter Description

Name: Job name. This parameter is set when a job is added.

ID: Unique identifier of a job. This parameter is automatically assigned when a job is added.

Type: Job type. Possible types include:
- Distcp (data import and export)
- MapReduce
- Spark Jar
- Spark Script
- Spark SQL
- Hive Script
NOTE
After you import or export data on the File Management page, you can view the Distcp job on the Job Management page.

Status: Job status.
- Running
- Completed
- Terminated
- Abnormal


Result: Execution result of a job.
- Successful
- Failed
NOTE
You cannot re-execute a successful or failed job, but you can add or copy the job. After setting job parameters, you can submit the job again.

Created: Time when a job starts.

Duration (min): Duration of executing a job, from the time the job is started to the time it is completed or stopped. Unit: minute.

Operation:
- View Log: you can click View Log to view log information of a job. For details, see Viewing Job Configurations and Logs.
- View: you can click View to view job details. For details, see Viewing Job Configurations and Logs.
- More:
  - Stop: you can click Stop to stop a running job. For details, see Stopping Jobs.
  - Copy: you can click Copy to copy and add a job. For details, see Replicating Jobs.
  - Delete: you can click Delete to delete a job. For details, see Deleting Jobs.
NOTE
- Spark SQL jobs cannot be stopped.
- Deleted jobs cannot be recovered. Therefore, exercise caution when deleting a job.
- If you configure the system to save job logs to an HDFS or OBS path, the system compresses the logs and saves them to the specified path after job execution is complete. In this case, the job remains in the Running state after execution is complete and changes to the Completed state only after the logs are successfully saved. The time required for saving the logs depends on the log size; the process generally takes a few minutes.

Table 3-14 Button description

Button Description

In the drop-down list, select a job state to filter jobs:
- All (Num): displays all jobs.
- Completed (Num): displays jobs in the Completed state.
- Running (Num): displays jobs in the Running state.
- Terminated (Num): displays jobs in the Terminated state.
- Abnormal (Num): displays jobs in the Abnormal state.


Enter a job name in the search bar and click the search icon to search for a job.

Click the refresh icon to manually refresh the job list.

3.9.2 Adding a Jar or Script Job

You can submit developed programs to MRS, execute them, and obtain the execution results. This section describes how to add a job.

Prerequisites

You have completed the procedure described in Background.

Procedure

Step 1 Log in to the MRS management console and go to the Job Management page.

Step 2 On the Job tab page, click Create and go to the Create Job page.

Table 3-15 describes the job configuration information.

Table 3-15 Job configuration information

Parameter Description

Type: Job type. Possible types include:
- MapReduce
- Spark Jar
- Spark Script
- Hive Script
NOTE
To add jobs of the Spark and Hive types, you need to select the Spark and Hive components when creating the cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only, while Spark Jar supports both Spark Core and Spark SQL.

Name: Job name. This parameter consists of 1 to 64 characters, including letters, digits, and underscores (_).
NOTE
Identical job names are allowed but not recommended.


Program Path: Address of the JAR file of the program for executing jobs.
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This parameter must meet the following requirements:
- A maximum of 1023 characters are allowed, but the special characters *?<">|\ are not allowed. The address cannot be empty or consist only of spaces.
- The path varies depending on the file system:
  - OBS: the path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.
  - HDFS: the path must start with /user.
- Spark Script programs must end with .sql; MapReduce and Spark Jar programs must end with .jar; programs developed using Python must end with .py. The suffixes sql, jar, and py are case-insensitive.

Parameters: Key parameters for executing jobs. This parameter is assigned by an internal function; MRS is only responsible for passing it in. Format: package name.class name. A maximum of 255 characters are allowed, but the special characters ;|&>',<$ are not allowed. This parameter can be empty.

Import From: Address for inputting data.
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This address must start with /user or s3a://, and a correct OBS path is required. A maximum of 1023 characters are allowed, but the special characters *?<">|\ are not allowed. This parameter can be empty.

Export To: Address for outputting data.
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This address must start with /user or s3a://, and a correct OBS path is required; if no such path exists, an OBS path will be created automatically. A maximum of 1023 characters are allowed, but the special characters *?<">|\ are not allowed. This parameter can be empty.


Log path Address for storing job logs that record job running statusNOTE

When configuring this parameter, click OBS or HDFS, specify the file path, andclick OK.

This address must start with /user, s3a://. A correct OBS path isrequired.A maximum of 1023 characters are allowed, but special characters (*?<">|\) are not allowed. This parameter can be empty.

NOTE
- The OBS path supports s3a://.
- Files and programs encrypted by KMS cannot be imported if an OBS path is used.
- The full path of an HDFS or OBS file contains a maximum of 1023 characters.
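The path rules above can be checked before a job is submitted. The helper below is a minimal illustrative sketch, not part of MRS (the function name valid_job_path is invented for this example); it encodes only the documented prefix and 1023-character constraints and deliberately omits the special-character checks, which the console performs itself.

```shell
# Illustrative pre-check mirroring the documented path rules:
# OBS paths must start with s3a://, HDFS paths with /user,
# and the full path may contain at most 1023 characters.
valid_job_path() {
  p="$1"
  [ ${#p} -le 1023 ] || return 1   # length limit from the NOTE above
  case "$p" in
    s3a://*|/user*) return 0 ;;    # accepted prefixes
    *) return 1 ;;
  esac
}
```

For example, `valid_job_path s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar` succeeds, while a path with any other prefix fails.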

Step 3 Confirm job configuration information and click OK.

After jobs are added, you can manage them.

----End

3.9.3 Submitting a Spark SQL Statement

This section describes how to use Spark SQL. You can submit a Spark SQL statement to query and analyze data on the MRS management console. To submit multiple statements, separate them from each other using semicolons (;).

Procedure

Step 1 Log in to the MRS management console and go to the Job Management page.

Step 2 Select Spark SQL. The Spark SQL job page is displayed.

Step 3 Enter the Spark SQL statement for table creation.

When entering Spark SQL statements, ensure that they have no more than 10,000 characters.

Syntax:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION hdfs_path];

Use either of the following methods to create a table:

- Method 1: Create an src_data table in which data is written row by row. The data files are stored in the /user/guest/input directory.
create external table src_data(line string) row format delimited fields terminated by '\\n' stored as textfile location '/user/guest/input/';


- Method 2: Create an src_data1 table and load data into it.
create table src_data1 (eid int, name String, salary String, destination String) row format delimited fields terminated by ',';
load data inpath '/tttt/test.txt' into table src_data1;

NOTE

Data stored in OBS cannot be loaded into tables created using method 2.

Step 4 Enter the Spark SQL statement for table query.

Syntax:

SELECT col_name FROM table_name;

Example:

select * from src_data;

Step 5 Enter the Spark SQL statement for table deletion.

Syntax:

DROP TABLE [IF EXISTS] table_name;

Example:

drop table src_data;

Step 6 Click Check to verify that the statements are correct.

Step 7 Click Submit.

After submitting Spark SQL statements, you can check whether the execution is successful in Last Execution Result and view detailed execution results in Last Query Result Set.
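As noted at the start of this section, multiple statements can be submitted at once when separated by semicolons. The sketch below only assembles and inspects such a submission string; shell is used purely for illustration, and the table name src_data1 and path /tttt/test.txt follow the examples above.

```shell
# Build the multi-statement Spark SQL submission described in this section.
# Statements are separated by semicolons.
SQL="create table src_data1 (eid int, name String, salary String, destination String) row format delimited fields terminated by ',';
load data inpath '/tttt/test.txt' into table src_data1;
select * from src_data1;
drop table src_data1"

# Count the individual statements (three separators means four statements).
count=$(printf '%s' "$SQL" | awk -F';' '{n += NF - 1} END {print n + 1}')
echo "$count statements"   # prints "4 statements"
```

Pasting the whole string into the Spark SQL box submits all four statements in order.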

----End

3.9.4 Viewing Job Configurations and Logs

This section describes how to view job configurations and logs.

Background
- You can view configurations of all jobs.
- For clusters created on an MRS version earlier than 1.0.7, logs of completed jobs in the clusters cannot be viewed. For clusters created on MRS 1.0.7 or later, logs of all jobs can be viewed.

Procedure

Step 1 Log in to the MRS management console and select a running cluster.

Step 2 In the navigation tree on the left, click Job Management.

Step 3 In the Operation column corresponding to the selected job, click View.

In the View Job Information window that is displayed, the configuration of the selected job is shown.


Step 4 Select a MapReduce job, and click View Log in the Operation column corresponding to the selected job.

In the page that is displayed, log information of the selected job is displayed.

The MapReduce job is only an example. You can view log information about MapReduce, Spark Jar, Spark Script, and Hive Script jobs regardless of their status.

----End

3.9.5 Stopping Jobs

This section describes how to stop running MRS jobs.

Background

Spark SQL jobs cannot be stopped. After a job is stopped, its status changes to Terminated, and it cannot be executed again.

Procedure

Step 1 Log in to the MRS management console and select a running cluster.

Step 2 In the navigation tree on the left, click Job Management.

Step 3 Select a running job and choose More > Stop in the Operation column corresponding to the selected job.

The job status changes from Running to Terminated.

NOTE

When you submit a job on the Spark SQL page, you can click Cancel to stop the job.

----End

3.9.6 Replicating Jobs

This section describes how to replicate MRS jobs.

Background

Currently, all types of jobs except for Spark SQL and Distcp jobs can be replicated.

Procedure

Step 1 Log in to the MRS management console and select a running cluster.

Step 2 In the navigation tree on the left, click Job Management.

Step 3 In the Operation column corresponding to the to-be-replicated job, choose More > Copy.

The Copy Job dialog box is displayed.

Step 4 Set job parameters, and click OK.

Table 3-16 describes job configuration information.


After being successfully submitted, a job changes to the Running state by default; you do not need to execute the job manually.

Table 3-16 Job configuration information

Parameter Description

Type: Job type. Possible types include:
- MapReduce
- Spark Jar
- Spark Script
- Hive Script
NOTE
To add jobs of the Spark and Hive types, you must select the Spark and Hive components when creating the cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only; Spark Jar supports Spark Core and Spark SQL.

Name: Job name. This parameter consists of 1 to 64 characters, including letters, digits, and underscores (_).
NOTE
Identical job names are allowed but not recommended.

Program Path: Address of the JAR file of the program used to execute jobs.
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This parameter must meet the following requirements:
- A maximum of 1023 characters are allowed; special characters (*?<">|\) are not allowed. The address cannot be empty or consist only of spaces.
- The path varies depending on the file system:
  - OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.2.jar.
  - HDFS: The path must start with /user.
- Spark Script files must end with .sql; MapReduce and Spark Jar programs must end with .jar; programs developed in Python must end with .py. The sql, jar, and py extensions are case-insensitive.

Parameters: Key parameters for executing jobs. This parameter is assigned by an internal function; MRS is only responsible for inputting the parameter. Format: package name.class name. A maximum of 255 characters are allowed; special characters (;|&>',<$) are not allowed. This parameter can be left empty.


Parameter Description

Import From: Address for inputting data.
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This address must start with /user or s3a://. A correct OBS path is required. A maximum of 1023 characters are allowed; special characters (*?<">|\) are not allowed. This parameter can be left empty.

Export To: Address for outputting data.
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This address must start with /user or s3a://. A correct OBS path is required. If no such path exists, an OBS path is created automatically. A maximum of 1023 characters are allowed; special characters (*?<">|\) are not allowed. This parameter can be left empty.

Log path: Address for storing job logs that record the job running status.
NOTE
When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.
This address must start with /user or s3a://. A correct OBS path is required. A maximum of 1023 characters are allowed; special characters (*?<">|\) are not allowed. This parameter can be left empty.

----End

3.9.7 Deleting Jobs

This section describes how to delete MRS jobs.

Background

Jobs can be deleted one at a time or in a batch. The deletion operation is irreversible. Exercise caution when performing this operation.

Procedure

Step 1 Log in to the MRS management console and select a running cluster.

Step 2 In the navigation tree on the left, click Job Management.

Step 3 In the Operation column corresponding to the selected job, choose More > Delete.

This operation deletes only a single job.

Step 4 You can select multiple jobs and click Delete on the upper left of the job list.


This operation deletes multiple jobs at a time.

----End

3.10 Querying Operation Logs

The Operation Log page records cluster and job operations. Logs are typically used to quickly locate faults in case of cluster exceptions, helping you resolve problems.

Operation Types

Currently, two types of operations are recorded in the logs. You can filter and search for the desired type of operations.

- Cluster: Creating, terminating, shrinking, and expanding a cluster
- Job: Creating, stopping, and deleting a job

Log Parameters

Logs are listed in chronological order by default in the log list, with the most recent logs displayed at the top.

Table 3-17 describes parameters in logs.

Table 3-17 Description of parameters in logs

Parameter Description

Operation Type: Operation type. Possible types include:
- Cluster
- Job

IP Address: IP address where an operation is executed.
NOTE
If MRS cluster deployment fails, the cluster is automatically terminated, and the operation log of the terminated cluster does not contain the user's IP Address information.

Operation Details: Operation content. The content can contain a maximum of 2048 characters.

Operation Time: Operation time. For terminated clusters, only those terminated within the last six months are displayed. To view clusters terminated more than six months ago, contact technical support engineers.


Table 3-18 Button description

Button Description

In the drop-down list, select an operation type to filter logs.
- All: displays all logs.
- Cluster: displays cluster logs.
- Job: displays job logs.

Filter logs by time.
1. Click the button.
2. Specify the date and time.
3. Click OK.
Enter the query start time in the left box and the end time in the right box. The end time must be later than the start time; otherwise, logs cannot be filtered by time.

Enter keywords in Operation Details and click the search button to search for logs.

Click the refresh button to manually refresh the log list.

3.11 Viewing the Alarm List

The alarm list provides information about all alarms in the MRS cluster. Examples of alarms include host faults, disk usage exceeding the threshold, and component abnormalities.

In Alarm on the MRS management console, you can only view basic information about alarms that are not cleared in MRS Manager. If you want to view alarm details or manage alarms, log in to MRS Manager. For details, see Alarm Management.

Alarms are listed in chronological order by default in the alarm list, with the most recent alarms displayed at the top.

Table 3-19 describes alarm parameters.

Table 3-19 Alarm parameters

Parameter Description

Severity: Alarm severity. Possible values include:
- Critical
- Major
- Warning
- Minor


Parameter Description

Service Name of the service that reports the alarm

Description Alarm description

Generated Alarm generation time

Table 3-20 Button description

Button Description

In the drop-down list, select an alarm severity to filter alarms.
- All: displays all alarms.
- Critical: displays Critical alarms.
- Major: displays Major alarms.
- Warning: displays Warning alarms.
- Minor: displays Minor alarms.

Click the refresh button to manually refresh the alarm list.


4 Remote Operation Guide

4.1 Overview

This section describes remote login, MRS cluster node types, and node functions.

MRS cluster nodes support remote login. The following remote login methods are available:
- GUI login: Use the remote login function provided by the ECS management console to log in to the Linux interface of a Master node.
- SSH login: Applies to Linux ECSs only. You can use a remote login tool (such as PuTTY) to log in to an ECS. To use this method, you must assign an elastic IP address (EIP) to the ECS. For details about applying for and binding an EIP to a Master node, see "Assigning an EIP and Binding It to an ECS" under the "Management" section in the VPC User Guide. You can log in to a Linux ECS using either a key pair or a password. To use a key pair, log in to the Linux ECS as user linux. For the login procedure, see Logging In to a Linux ECS Using a Key Pair (SSH). If you use a password, see Logging In to a Linux ECS Using a Password (SSH).

In an MRS cluster, a node is an ECS. Table 4-1 describes node types and functions.


Table 4-1 Cluster node types

Node Type Function

Master node: Management node of an MRS cluster. It manages and monitors the cluster. In the navigation tree of the MRS management console, click Basic Information. In Node information, view Name. The node whose name contains master1 is the Master1 node, and the node whose name contains master2 is the Master2 node. You can log in to a Master node either using VNC on the ECS management console or using SSH. After logging in to the Master node, you can access Core nodes without entering passwords. The system automatically deploys the Master nodes in active/standby mode and supports the high availability (HA) feature for MRS cluster management. If the active management node fails, the standby management node switches to the active state and takes over services. To determine whether the Master1 node is the active management node, see Viewing Active and Standby Nodes.

Core node: Working node of an MRS cluster. It processes and analyzes data and stores the processed data on HDFS.

4.2 Logging In to a Master Node

This section describes how to log in to the Master nodes of a cluster using the GUI and SSH.

4.2.1 Logging In to an ECS Using VNC

This section describes how to log in to an ECS using VNC on the ECS management console. This login method is mainly used for emergency O&M. In other scenarios, it is recommended that you log in to the ECS using SSH.

Login Notes

If no default image password is set when Cloud-Init is installed, you must log in to the ECS by following the instructions provided in Logging In to a Linux ECS Using a Key Pair (SSH). After logging in using SSH, you can set the ECS login password. For details about other login notes, see "Logging In to an ECS Using VNC" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to an ECS Using VNC).

Logging In to an ECS

Step 1 Log in to the MRS management console and select a running cluster.

Step 2 On the Basic Information page, query the IP addresses of Master1 and Master2 nodes.


Step 3 Log in to the ECS management console. Choose IP Address from the drop-down list of the search box on the right.

Step 4 Enter the IP address of Master1 or Master2 and click the search button.

Step 5 In the searched ECS, click Remote Login in Operation.

For details about remote login to an ECS, see "Logging In to an ECS Using VNC" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to an ECS Using VNC).

----End

Changing the OS Keyboard Language

All nodes in the MRS cluster use the Linux OS. For details about changing the OS keyboard language, see "Logging In to an ECS Using VNC" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to an ECS Using VNC).

4.2.2 Logging In to a Linux ECS Using a Key Pair (SSH)

This section describes how to log in to a Linux ECS using a key pair.

For details about logging in to a Linux ECS using a key pair, see "Logging In to a Linux ECS Using a Key Pair (SSH)" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using a Key Pair (SSH)).

4.2.3 Logging In to a Linux ECS Using a Password (SSH)

Logging in to a Linux ECS in SSH password authentication mode is disabled by default. If you require the password authentication mode, configure it after logging in to the ECS. To ensure system security, reset the common user password for logging in to the Linux ECS after configuring the SSH password authentication mode.

All nodes in the MRS cluster use the Linux OS. For details about logging in to a Linux ECS using a password, see "Logging In to a Linux ECS Using a Password (SSH)" in the ECS User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using a Password (SSH)).

4.3 Viewing Active and Standby Nodes

This section describes how to confirm the active and standby management nodes of MRS Manager.

Background

You can log in to other nodes in a cluster from the Master node. After logging in to the Master node, you can confirm the active and standby management nodes of MRS Manager and run commands on the corresponding management nodes.

Procedure

Step 1 Log in to the MRS management console and view Basic Information of the specified cluster.


Step 2 In Cluster Configuration, view the Active Master Node IP Address.

Active Master Node IP Address is the IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager.

----End

4.4 Client Management

4.4.1 Updating the Client

Scenario

The MRS cluster provides a client that users can use for server connection, task result query, and data management. When users need to use the MRS client, or when service configuration parameters are modified on MRS Manager and the service is restarted, the client configuration file must be prepared and the client must be updated.

During cluster creation, the original client is installed and saved in the /opt/client directory on all nodes in the cluster by default. After the cluster is created, only the clients on Master nodes can be used directly; the client on a Core node must be updated before being used.

Procedure

Step 1 Log in to MRS Manager.

Step 2 Click Service, and click Download Client.

Set Client Type to Only configuration files, set Download Path to Server, and click OK to generate the client configuration file. The generated file is saved in the /tmp/MRS-client directory on the active management node by default. You can modify the file save path as required.

Step 3 On the MRS management console, view Basic Information of the specified cluster.

Step 4 In Cluster Configuration, view the Active Master Node IP Address.

Active Master Node IP Address is the IP address of the active Master node in a cluster, which is also the IP address of the active management node of MRS Manager.

Step 5 Locate the active management node based on the IP address and log in to the active management node as user linux using VNC. For details, see Logging In to an ECS Using VNC.

The Master node supports Cloud-Init. The preset username and password for Cloud-Init are linux and cloud.1234. If you have changed the password, log in to the node using the new password.

Step 6 Run the following command to switch the user:

sudo su - omm

Step 7 Run the following command to go to the client directory:

cd /opt/client


Step 8 Run the following command to update the client configuration:

sh refreshConfig.sh Client installation directory Full path of the client configuration file package

For example, run the following command:

sh refreshConfig.sh /opt/client /tmp/MRS-client/MRS_Services_Client.tar

If the following information is displayed, the configuration is updated successfully.

ReFresh components client config is complete.
Succeed to refresh components client config.
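Steps 6 to 8 above can be collected into one helper run on the active management node after switching to user omm. This is a sketch only: the function name update_mrs_client is invented for this example, and the default paths are the ones used in this section (adjust them if you changed the save path in Step 2).

```shell
# Illustrative wrapper for Steps 6-8 of updating the client, run as user omm
# on the active management node. Not an MRS command.
update_mrs_client() {
  client_dir="${1:-/opt/client}"                                 # Step 7 directory
  config_pkg="${2:-/tmp/MRS-client/MRS_Services_Client.tar}"     # Step 2 output
  cd "$client_dir" || return 1
  sh refreshConfig.sh "$client_dir" "$config_pkg"                # Step 8
}
```

Calling `update_mrs_client` with no arguments reproduces the example command `sh refreshConfig.sh /opt/client /tmp/MRS-client/MRS_Services_Client.tar`.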

----End

4.4.2 Using the Client on a Cluster Node

Scenario

After the client is updated, users can use the client on a Master node or a Core node in the cluster.

Prerequisites

The client has been updated on the active management node.

Procedure
- Use the client on a Master node.
  a. On the active management node where the client is updated, that is, a Master node, run the sudo su - root command to switch the user. Then run the following command to go to the client directory:
     cd /opt/client
  b. Run the following command to configure the environment variables:
     source bigdata_env
  c. Run a component client command directly. For example, to view files in the HDFS root directory using an HDFS client command, run hdfs dfs -ls /.

- Use the client on a Core node.
  a. Update the client on the active management node.
  b. Locate the active management node based on the IP address and log in to it as user linux using VNC. For details, see Logging In to an ECS Using VNC.
  c. On the active management node, run the following command to switch the user:
     sudo su - omm
  d. On the MRS management console, view IP Address in Cluster Configuration of the specified cluster.
  e. On the active management node, run the following command to copy the package to the Core node:


scp -p /tmp/MRS-client/MRS_Services_Client.tar IP address of the Core node:/opt/client

  f. Log in to the Core node as user linux. For details, see Logging In to an ECS Using VNC. The Core node supports Cloud-Init. The preset username and password for Cloud-Init are linux and cloud.1234. If you have changed the password, log in to the node using the new password.
  g. On the Core node, run the following command to switch the user:
     sudo su - omm
  h. Run the following command to update the client configuration:
     sh /opt/client/refreshConfig.sh Client installation directory Full path of the client configuration file package
     For example, run the following command:
     sh refreshConfig.sh /opt/client /opt/client/MRS_Services_Client.tar
  i. Run the following commands to go to the client directory and configure the environment variables:
     cd /opt/client
     source bigdata_env
  j. Run a component client command directly. For example, to view files in the HDFS root directory using an HDFS client command, run hdfs dfs -ls /.

4.4.3 Using the Client on Another Node of a VPC

Scenario

After the client is prepared, users can use the client on a node outside the MRS cluster.

Prerequisites
- An ECS has been prepared. For details about the supported OS versions, see Table 4-2.

Table 4-2 Reference list

OS Supported Version

SUSE
- Recommended: SUSE Linux Enterprise Server 11 SP4 (SUSE 11.4)
- Available: SUSE Linux Enterprise Server 11 SP3 (SUSE 11.3)
- Available: SUSE Linux Enterprise Server 11 SP1 (SUSE 11.1)
- Available: SUSE Linux Enterprise Server 11 SP2 (SUSE 11.2)

Red Hat
- Recommended: Red Hat-6.6-x86_64 (Red Hat 6.6)
- Available: Red Hat-6.4-x86_64 (Red Hat 6.4)
- Available: Red Hat-6.5-x86_64 (Red Hat 6.5)
- Available: Red Hat-6.7-x86_64 (Red Hat 6.7)


OS Supported Version

CentOS
- Available: CentOS-6.4 (CentOS 6.4)
- Available: CentOS-6.5 (CentOS 6.5)
- Available: CentOS-6.6 (CentOS 6.6)
- Available: CentOS-6.7 (CentOS 6.7)
- Available: CentOS-7.2 (CentOS 7.2)

For example, a user can select the enterprise image Enterprise_SLES11_SP4_latest(4GB) or the standard image Standard_CentOS_7.2_latest(4GB) to prepare the OS for an ECS. In addition, allocate sufficient disk space for the ECS, for example, 40 GB.

- The ECS and the MRS cluster are in the same VPC.
- The NICs of the ECS and the MRS cluster are in the same network segment.
- The security group of the ECS is the same as that of the Master node of the MRS cluster. If the preceding requirements are not met, modify the ECS security group or configure the inbound and outbound rules of the ECS security group to allow access from all security groups of the MRS cluster nodes. For details about how to create an ECS that meets these requirements, see "Creating an ECS" in the Elastic Cloud Server User Guide (Getting Started > Creating an ECS).
- To enable users to log in to the Linux ECS using a password (SSH), see "Logging In to a Linux ECS Using a Password (SSH)" in the Elastic Cloud Server User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using a Password (SSH)).

Procedure

Step 1 Create an ECS that meets the requirements in the prerequisites.

Step 2 Log in to the MRS management console, and click Cluster Manager to open MRS Manager.

Step 3 Click Service, and click Download Client.

Step 4 In Client Type, select All client files.

Step 5 In Download Path, select Remote host.

Step 6 Set Host IP Address to the IP address of the ECS, set Host Port to 22, and set Save Path to /home/linux.
- If the default port 22 for logging in to an ECS using SSH has been changed, set Host Port to the new port.
- Save Path contains a maximum of 256 characters.

Step 7 Set Login User to linux.

If other users are used, ensure that the users have read, write, and execute permissions on the save path.


Step 8 In SSH Private Key, select and upload the private key used for creating the ECS.

Step 9 Click OK to start downloading the client to the ECS.

If the following information is displayed, the client package is successfully saved. Click Close.

Client file downloaded to the remote host successfully.

Step 10 Log in to the ECS using VNC. See "Logging In to a Linux ECS Using VNC" in the Elastic Cloud Server User Guide (Getting Started > Logging In to an ECS > Logging In to a Linux ECS Using VNC).

All standard (Standard_xxx) and enterprise (Enterprise_xxx) images support Cloud-Init. The preset username and password for Cloud-Init are linux and cloud.1234. If you have changed the password, log in to the ECS using the new password. See "How Do I Log In to an ECS Once All Images Support Cloud-Init?" in the Elastic Cloud Server User Guide (FAQs > Login FAQs > How Do I Log In to an ECS Once All Images Support Cloud-Init?).

Step 11 On the ECS, switch to user root and copy the installation package to the /opt directory.

sudo su - root

cp /home/linux/MRS_Services_Client.tar /opt

Step 12 Run the following command in the /opt directory to decompress the package to obtain the verification file and the configuration file package of the client:

tar -xvf MRS_Services_Client.tar

Step 13 Run the following command to verify the configuration file package of the client:

sha256sum -c MRS_Services_ClientConfig.tar.sha256

The command output is as follows:

MRS_Services_ClientConfig.tar: OK
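The sha256sum -c pattern used in Step 13 works for any file: the .sha256 file simply pairs a recorded digest with a filename, and the check recomputes and compares it. A generic illustration (sample.tar is a placeholder file standing in for the MRS package, not the package itself):

```shell
# Generic demonstration of the checksum verification used in Step 13.
cd "$(mktemp -d)"                          # work in a scratch directory
printf 'demo payload' > sample.tar         # placeholder for the real package
sha256sum sample.tar > sample.tar.sha256   # record the digest
sha256sum -c sample.tar.sha256             # verify: prints "sample.tar: OK"
```

If the file were modified after the digest was recorded, the check would report FAILED instead of OK, which is exactly what the verification in Step 13 guards against.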

Step 14 Run the following command to decompress MRS_Services_ClientConfig.tar:

tar -xvf MRS_Services_ClientConfig.tar

Step 15 Run the following command to install the client to a new directory, for example, /opt/hadoopclient. A directory is automatically generated during the client installation.

sh /opt/MRS_Services_ClientConfig/install.sh /opt/hadoopclient

If the following information is displayed, the client is successfully installed:

Components client installation is complete.

Step 16 Check whether the IP address of the ECS node is connected to the IP address of the cluster Master node.

For example, run the following command: ping Master node IP address.

- If yes, go to Step 17.
- If no, check whether the VPC and security group are correct and whether the ECS and the MRS cluster are in the same VPC and security group. Then go to Step 17.

Step 17 Run the following command to configure the environment variable:

source /opt/hadoopclient/bigdata_env


Step 18 Run the client command of the component.

For example, run the following command to query the HDFS directory.

hdfs dfs -ls /

----End


5 MRS Manager Operation Guide

5.1 MRS Manager Introduction

Overview

MRS manages and analyzes massive amounts of data and helps users rapidly obtain desired data from structured and unstructured data. However, open source components have complicated structures, and their installation, configuration, and management are time- and labor-consuming. MRS Manager provides a unified enterprise-level platform for managing big data clusters. MRS Manager provides the following functions:

- Cluster monitoring: enables you to rapidly check the health status of hosts and services.
- Graphical indicator monitoring and customization: enable you to obtain key system information in time.
- Service property configuration: meets your service performance requirements.
- Cluster, service, and role instance operations: enable you to start or stop services and clusters with one click.

Browser Requirements

- Internet Explorer 9.0 (standard mode) or later is recommended.
- Google Chrome 36.0 or later is recommended.
- Mozilla Firefox 35.0 or later is recommended.

MRS Manager Interface

MRS Manager provides a unified cluster management platform to help users rapidly run and maintain clusters.

Table 5-1 describes the functions of operation entries.


Table 5-1 Function description of MRS Manager operation entries

Operation Entry Function Description

Dashboard Shows the status and key monitoring indicators of all services, as well as host status, in histograms, line charts, and tables. Users can customize a dashboard for key monitoring indicators and drag it to any position on the interface. Dashboard data can be updated automatically.

Service Provides service monitoring, operation guidance, and configuration, helping users manage services in a unified manner.

Host Provides host monitoring and operation guidance to help users manage hosts in a unified manner.

Alarm Provides alarm queries and guidance for clearing alarms, enabling users to identify product faults and potential risks in time and ensure proper system running.

Audit Queries and exports audit logs to help users learn about user activities and operations.

Tenant Provides a unified tenant management platform.

System Enables users to manage monitoring and alarm configurations as well as backups.

Reference Information

MapReduce Service (MRS) is a data analysis service on the public cloud. This service is used for the management and analysis of massive data.

MRS uses the MRS Manager portal to manage big data components, for example, components in the Hadoop ecosystem. Therefore, some concepts of MRS on the public cloud and the MRS Manager portal must be differentiated. Table 5-2 provides the details.

Table 5-2 Differences

Concept MRS on the Public Cloud MRS Manager

MapReduce Service Indicates the data analysis cloud service on the public cloud. MRS is short for MapReduce Service. This service includes components such as Hive, Spark, Yarn, HDFS, and ZooKeeper.

Indicates the MapReduce component in the Hadoop ecosystem.


5.2 Accessing MRS Manager

Scenario

MRS Manager supports MRS cluster monitoring, configuration, and management. You can open the MRS Manager page from the MRS console.

Procedure

Step 1 Log in to the Management Console of the public cloud, and click MapReduce Service.

Step 2 Click Cluster and view clusters whose Status is Running in Active Clusters. Select the specified cluster in Cluster and click Basic Information.

Step 3 In Cluster Configuration, click Cluster Manager to open MRS Manager.

If you access MRS Manager after successfully logging in to the MRS console, you do not need to enter the password again because user admin is used for login by default.

----End

5.3 Viewing Running Tasks in a Cluster

Scenario

After you trigger a task on MRS Manager, the task running process and progress are displayed. After the task window is closed, you can reopen it using the task management function.

By default, MRS Manager keeps the records of the latest 10 running tasks, such as restarting services, synchronizing service configurations, and performing health checks.

Procedure

Step 1 On the MRS Manager portal, click and open Task Lists.

You can view the following information under Task Lists: Name, Status, Progress, Start Time, and End Time.

Step 2 Click the name of a specified task and view details about the task execution process.

----End

5.4 Monitoring Management

5.4.1 Viewing the System Overview

Scenario

View basic statistics about services and clusters on the MRS Manager portal.


Procedure

Step 1 On the MRS Manager portal, choose Dashboard > Real-Time Monitoring.

- The Health Status and Roles of each service are displayed in Service Summary.
- The following statistics about some host indicators are displayed:
  – Cluster Host Health Status
  – Host Network Read Speed Distribution
  – Host Network Write Speed Distribution
  – Cluster Disk Information
  – Host Disk Usage Distribution
  – Cluster Memory Usage
  – Host Memory Usage Distribution
  – Host CPU Usage Distribution
  – Average Cluster CPU Usage

Click Customize to display customized statistics.

Step 2 Set an interval for automatic page refreshing or click to refresh now.

The following parameters are supported:

- Refresh every 30 seconds: refreshes the page once every 30 seconds.
- Refresh every 60 seconds: refreshes the page once every 60 seconds.
- Stop refreshing: stops page refreshing.

NOTE

Selecting Full screen maximizes the Real-time Monitoring window.

----End

5.4.2 Configuring a Monitoring History Report

Scenario

On MRS Manager, the nodes where roles are deployed in a cluster can be classified into management nodes, control nodes, and data nodes. Change trends of key host monitoring indicators on each type of node can be calculated and displayed as curve charts in reports based on user-defined periods. If a host belongs to multiple types of nodes, the indicator statistics will be collected several times.

View, customize, and export node monitoring indicator reports on MRS Manager.

Procedure

Step 1 View a monitoring indicator report.

1. On MRS Manager, click Dashboard.
2. Click Historical Report to view the report.

By default, the report displays the monitoring indicator statistics of the previous day.

Step 2 Customize a monitoring indicator report.


1. Click Customize and select the monitoring indicators to be displayed on MRS Manager.

The following indicators are supported, and the page displays a maximum of six customized indicators:

– Cluster CPU Usage Statistics
– Cluster Memory Usage Statistics
– Cluster Disk Write Speed Statistics
– Cluster Disk Read Speed Statistics
– Cluster Disk Usage Statistics
– Cluster Network Read Speed Statistics
– Cluster Network Write Speed Statistics
– Cluster Disk Information

2. Click OK to save the settings and view the selected indicators.

NOTE

Click Clear to deselect all the indicators.

Step 3 Export a monitoring indicator report.

1. Select a period.

The options are Last day, Last week, Last month, Last quarter, and Last half year.

You can define the start time and end time in From and to.

2. Click Export to generate a report file of selected cluster monitoring indicators in the specified period, and select a storage location to save the file.

NOTE

To view the curve charts of monitoring indicators in a specified period, click View. The page will display the distribution curve of selected indicators in the specified period.

----End

5.4.3 Managing Service and Host Monitoring

Scenario

On MRS Manager, manage status and indicator information about all services (including role instances) and hosts.

- Status information, including operation, health, configuration, and role instance status.

- Information about key monitoring indicators of services.

- Monitoring indicator export.

NOTE

Set an interval for automatic page refreshing or click to refresh the page now.

The following parameters are supported:

- Refresh every 30 seconds: refreshes the page once every 30 seconds.

- Refresh every 60 seconds: refreshes the page once every 60 seconds.

- Stop refreshing: stops page refreshing.


Managing Service Monitoring

Step 1 On MRS Manager, click Service to view status of all services.

The service list includes Service, Operating Status, Health Status, Configuration Status, Roles, and Operation.

- Table 5-3 describes service operating status.

Table 5-3 Service operating status

Status Description

Started Indicates that the service is started.

Stopped Indicates that the service is stopped.

Failed to start Indicates that the service fails to start.

Failed to stop Indicates that the service fails to stop.

Unknown Indicates initial service status after the background system restarts.

- Table 5-4 describes service health status.

Table 5-4 Service health status

Status Description

Good Indicates that all role instances in the service are running properly.

Bad Indicates that at least one role instance in the service is in Bad state or the dependent service is abnormal.

Unknown Indicates that all role instances in the service are in Unknown state.

Concerning Indicates that the background system is restarting the service.

Partially Healthy Indicates that the service that this service depends on is abnormal, and the interfaces of the abnormal service cannot be invoked externally.

- Table 5-5 describes service configuration status.

Table 5-5 Service configuration status

Status Description

Synchronized Indicates that the latest configuration takes effect.


Expired Indicates that the latest configuration does not take effect after the parameter modification. You need to restart related services.

Failed Indicates that the communication is abnormal or data cannot be read or written during the parameter configuration. You can try to click Synchronize Configuration to recover the previous configuration.

Configuring Indicates that the parameter is being configured.

Unknown Indicates that the current configuration status cannot be obtained.

By default, services are displayed in ascending order by Service. You can click Service, Operating Status, Health Status, or Configuration Status to change the display mode.

Step 2 Click the target service in the service list to view its status and indicator information.

Step 3 Customize monitoring indicators and export customized monitoring information.

1. In the Real-Time Statistics area, click Customize to customize key monitoring indicators.

2. Click History to display the historical monitoring information query page.
3. Select a time period, and click View to display monitoring data in the specified time period.
4. Click Export to export the displayed indicator information.

----End

Managing Role Instance Monitoring

Step 1 On MRS Manager, click Service, and click the target service in the service list.

Step 2 Click Instance to view role instance status.

The role instance list includes Role, Host Name, OM IP Address, Business IP Address, Rack, Operating Status, Health Status, and Configuration Status.

- Table 5-6 describes role instance operating status.

Table 5-6 Role instance operating status

Status Description

Started Indicates that the role instance is started.

Stopped Indicates that the role instance is stopped.

Failed to start Indicates that the role instance fails to start.

Failed to stop Indicates that the role instance fails to stop.


Decommissioning Indicates that the role instance is decommissioning.

Decommissioned Indicates that the role instance has decommissioned.

Recommissioning Indicates that the role instance is recommissioning.

Unknown Indicates initial role instance status after the background system restarts.

- Table 5-7 describes role instance health status.

Table 5-7 Role instance health status

Status Description

Good Indicates that the role instance is running properly.

Bad Indicates that the role instance is running abnormally. For example, a port cannot be accessed because the PID does not exist.

Unknown Indicates that the host on which the role instance is running does not connect to the background system.

Concerning Indicates that the background system is restarting the role instance.

- Table 5-8 describes role instance configuration status.

Table 5-8 Role instance configuration status

Status Description

Synchronized Indicates that the latest configuration takes effect.

Expired Indicates that the latest configuration does not take effect after the parameter modification. You need to restart related services.

Failed Indicates that the communication is abnormal or data cannot be read or written during the parameter configuration. You can try to click Synchronize Configuration to recover the previous configuration.

Configuring Indicates that the parameter is being configured.

Unknown Indicates that configuration status cannot be obtained.

By default, roles are displayed in ascending order by Role. You can click Role, Host Name, OM IP Address, Business IP Address, Rack, Operating Status, Health Status, or Configuration Status to change the display mode.

You can filter out all instances of the same role in Role.


Click Advanced Search, set search criteria in the role search area, and click Search to view specified role information. You can click Reset to reset the search criteria.

Step 3 Customize monitoring indicators and export customized monitoring information. The operation process is the same as that of exporting service monitoring indicators.

----End

Managing Host Monitoring

Step 1 On MRS Manager, click Host.

The host list includes Host Name, OM IP Address, Business IP Address, Rack, Network Speed, Operating Status, Health Status, Disk Usage, Memory Usage, and CPU Usage.

- Table 5-9 describes host operational status.

Table 5-9 Host operational status

Status Description

Normal The host and service roles on the host are running properly.

Isolated The host is isolated by the user, and service roles on the host are stopped.

- Table 5-10 describes host health status.

Table 5-10 Host health status

Status Description

Good Indicates that the host can properly send heartbeats.

Bad Indicates that the host fails to send heartbeats due to timeout.

Unknown Indicates the initial status of the host while it is being added.

By default, hosts are displayed in ascending order by Host Name. You can click Host Name, OM IP Address, Business IP Address, Rack, Network Speed, Operating Status, Health Status, Disk Usage, Memory Usage, or CPU Usage to change the display mode.

Click Advanced Search, set search criteria in the host search area, and click Search to view specified host information. You can click Reset to reset the search criteria.

Step 2 Click the target host in the host list to view its status and indicator information.

Step 3 Customize monitoring indicators and export customized monitoring information.

1. In the Real-Time Statistics area, click Customize to customize key monitoring indicators.

2. Click History to display the historical monitoring information query page.


3. Select a time period, and click View to display monitoring data in the specified time period.

4. Click Export to export the displayed indicator information.

----End

5.4.4 Managing Resource Distribution

Scenario

Learn the top value curves, bottom value curves, or average data curves of the key monitoring indicators of services and hosts, that is, the resource distribution status. MRS Manager allows users to view monitoring data of the last hour.

Users can also configure resource distribution on MRS Manager so that a user-defined number of top value curves and bottom value curves are displayed in the service and host resource distribution figures.

Resource distribution of some monitoring indicators is not recorded.

Procedure

- View the resource distribution of service monitoring indicators.

a. On MRS Manager, click Service.
b. Select the specific service in the service list.
c. Click Resource Distribution.

Select key service indicators from Metric. MRS Manager displays the resource distribution data of the selected service indicators in the last hour.

- View the resource distribution of host monitoring indicators.

a. On MRS Manager, click Host.
b. Click the specific host in the host list.
c. Click Resource Distribution.

Select key host indicators from Metric. MRS Manager displays the resource distribution data of the selected indicators in the last hour.

- Configure resource distribution.

a. On MRS Manager, click System.
b. In Configuration, click Configure Resource Contribution Ranking under Monitoring and Alarm. Modify the displayed resource distribution quantity.
   – Set Number of Top Resources to the number of top values.
   – Set Number of Bottom Resources to the number of bottom values.

NOTE

The sum of the number of top values and the number of bottom values cannot be greater than five.

c. Click OK to save the settings. Click Close when Number of top and bottom resources saved successfully is displayed.


5.4.5 Configuring Monitoring Indicator Dumping

Scenario

Configure parameters for monitoring indicator data interconnection on MRS Manager so that the monitoring indicator data can be saved to a specified FTP server using the FTP or SFTP protocol, and MRS Manager can interconnect with third-party systems. The FTP protocol does not encrypt data, bringing potential security risks. Therefore, the SFTP protocol is recommended.

MRS Manager supports the collection of all the monitoring indicator data in the managed clusters. The collection period is 30 seconds, 60 seconds, or 300 seconds. The monitoring indicator data is stored to different monitoring files on the FTP server by collection period. The monitoring file naming rule is Cluster name_metric_Monitoring indicator data collection period_File saving time.log.
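As an illustration of the naming rule (not MRS code), the shell sketch below assembles a hypothetical dump-file name. The cluster name and the timestamp value are made-up examples, and the yyyymmddHHMMSS timestamp format is an assumption; the guide does not specify the exact format of the file saving time.

```shell
# Sketch of the dump-file naming rule: Cluster name, "_metric_",
# collection period, "_", file saving time, ".log".
CLUSTER_NAME="mrs_demo"            # hypothetical cluster name
COLLECTION_PERIOD=60               # collection period in seconds: 30, 60, or 300
FILE_SAVING_TIME="20170220120000"  # assumed yyyymmddHHMMSS format

DUMP_FILE="${CLUSTER_NAME}_metric_${COLLECTION_PERIOD}_${FILE_SAVING_TIME}.log"
echo "$DUMP_FILE"   # prints "mrs_demo_metric_60_20170220120000.log"
```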

Prerequisites

The ECS corresponding to the dump server and the Master node of the MRS cluster are deployed on the same VPC, and the Master node can access the IP address and specific ports of the dump server. The FTP service of the dump server is running properly.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In Configuration, click Configure Monitoring Metric Dump under Monitoring and Alarm.

The default value of Dump Monitoring Metric is Off. Set Dump Monitoring Metric to On to enable the monitoring indicator data interconnection function.

Step 3 Set the dump parameters. Table 5-11 describes the parameters.

Table 5-11 Dump parameters

Parameter Description

FTP IP Address Mandatory. Specifies the FTP server for storing monitoring files after the monitoring indicator data interconnection is enabled.

FTP Port Mandatory. Specifies the port for connecting to the FTP server.

FTP Username Mandatory. Specifies the username for logging in to the FTP server.

FTP Password Mandatory. Specifies the password for logging in to the FTP server.

Save Path Mandatory. Specifies the save path of monitoring files on the FTP server.


Dump Interval (s) Mandatory. Specifies the interval for saving monitoring files to the FTP server, in seconds.

Dump Mode Mandatory. Specifies the protocol used to send monitoring files. The options include FTP and SFTP.

SFTP Public Key Optional. Specifies the public key of the FTP server. This parameter is valid when Dump Mode is set to SFTP. You are advised to set this parameter; otherwise, security risks may exist.

Step 4 Click OK. The parameters are set.

----End

5.5 Alarm Management

5.5.1 Viewing and Manually Clearing an Alarm

Scenario

Users can view and manually clear an alarm on MRS Manager.

Generally, the system automatically clears an alarm when the fault for which the alarm is generated is rectified. If the alarm cannot be automatically cleared after the fault is rectified, and the alarm has no impact on the system, you can manually clear the alarm.

On the MRS Manager portal, you can view the latest 100,000 alarms, including alarms that are manually cleared, automatically cleared, or not cleared. If the number of cleared alarms exceeds 100,000 and is about to reach 110,000, the system automatically dumps the earliest 10,000 cleared alarms to the dump path ${BIGDATA_HOME}/OMSV100R001C00x8664/workspace/data on the active management node. The directory will be automatically generated when alarms are dumped for the first time.

NOTE

Set an interval for automatic page refreshing or click to refresh the page now.

The following parameters are supported:

- Refresh every 30 seconds: refreshes the page once every 30 seconds.

- Refresh every 60 seconds: refreshes the page once every 60 seconds.

- Stop refreshing: stops page refreshing.

Procedure

Step 1 On MRS Manager, click Alarm and view the information about alarms in the alarm list page.


- By default, alarms are displayed in descending order by Generated On. You can click Severity or Generated On to change the display mode.

- You can filter out all alarms of the same severity in Severity, including cleared alarms and uncleared alarms.

- You can click , , , or to filter out alarms whose severity is Critical, Major, Minor, or Warning.

Step 2 Click Advanced Search. In the displayed alarm search area, set search criteria and click Search to view the information about specified alarms. Click Reset to reset the search criteria.

Rectify the fault by referring to the help information. If the alarms in some scenarios are generated due to other cloud services that MRS depends on, you need to contact maintenance personnel of the corresponding cloud services.

Step 3 If the alarm needs to be manually cleared, click .

NOTE

To handle multiple alarms, select one or more alarms to be cleared and click Clear Alarm to clear the alarms in batches. A maximum of 300 alarms can be cleared in batches each time.

----End

5.5.2 Configuring an Alarm Threshold

Scenario

Configure an alarm threshold so that you can learn the indicator health status. After Send Alarm is selected, the system sends an alarm message when the monitored data reaches the alarm threshold. You can view the alarm information in Alarm.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In Configuration, click Configure Alarm Threshold under Monitoring and Alarm.

Step 3 Click an indicator, for example, CPU Usage, and click Create Rule.

Step 4 Set monitoring indicator rule parameters.

Table 5-12 Description of monitoring indicator rule parameters

Parameter Value Description

Rule Name CPU_MAX (example value) Specifies the rule name.

Reference Date 2016-11-06 (example value) Specifies the date on which the reference indicator history is generated.


Threshold Type
- Max. value
- Min. value

Specifies whether to use the maximum or minimum value of the indicator for setting the threshold. When this parameter is set to Max. value, an alarm will be generated when the actual value of the indicator is greater than the threshold. When this parameter is set to Min. value, an alarm will be generated when the actual value of the indicator is smaller than the threshold.

Alarm Severity
- Critical
- Major
- Minor
- Warning

Specifies the alarm severity.

Time Range From 00:00 to 23:59 (example value)

Specifies the period in which the rule takes effect.

Threshold Set value 80 (example value)

Specifies the threshold of the rule monitoring indicator.

Date
- Workday
- Weekend
- Other

Specifies the type of date when the rule takes effect.

Step 5 Click OK. The Information dialog box is displayed, indicating Template saved successfully. Click Close.

Send Alarm is selected by default. MRS Manager checks whether the values of monitoring indicators meet the threshold requirements. If the number of consecutive checks in which the values do not meet the threshold requirements exceeds the value of Trigger Count, an alarm will be sent. The value of Trigger Count can be customized. Check Period (s) specifies the interval at which MRS Manager checks monitoring indicators.
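One plausible reading of the Trigger Count semantics described above can be sketched in shell (illustrative only, not MRS code): with Threshold Type set to Max. value, an alarm fires once the indicator has violated the threshold for a streak of consecutive checks, and a passing check resets the streak. The function name and the sample values are hypothetical, and whether the streak must reach or strictly exceed Trigger Count is an assumption here.

```shell
# threshold_alarm THRESHOLD TRIGGER_COUNT value...
# Each value stands for one check-period sample; fires when the
# consecutive-violation streak reaches TRIGGER_COUNT (assumed boundary).
threshold_alarm() {
  threshold=$1
  trigger_count=$2
  shift 2
  count=0
  for value in "$@"; do
    if [ "$value" -gt "$threshold" ]; then
      count=$((count + 1))
      if [ "$count" -ge "$trigger_count" ]; then
        echo "ALARM"
        return 0
      fi
    else
      count=0                      # a passing check resets the streak
    fi
  done
  echo "OK"
}

threshold_alarm 80 3 75 85 86 87   # CPU usage samples vs. an 80% threshold; prints "ALARM"
```

In the example, the first sample (75) passes, and the next three samples all exceed 80, so the third consecutive violation triggers the alarm.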

Step 6 In the row that contains the newly added rule, click in the Operation column. A dialog box is displayed indicating that the rule has been applied successfully. Click Close. The icon becomes green, and the operation is complete. To cancel a rule, click in the Operation column. A dialog box is displayed indicating that the rule has been canceled successfully. Then click Close.

----End


5.5.3 Configuring Syslog Northbound Interface

Scenario

Configure the Syslog northbound interface so that alarms generated on MRS Manager can be reported to your monitoring operation and maintenance system using Syslog.

NOTICE

The Syslog protocol is not encrypted. Therefore, data can be easily stolen during transmission, bringing security risks.

Prerequisites

The ECS corresponding to the interconnected server and the Master node of the MRS cluster are deployed on the same VPC, and the Master node can access the IP address and specific ports of the interconnected server.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In Configuration, click Configure Syslog under Monitoring and Alarm.

The default value of Syslog Service is Off. If you set Syslog Service to On, the Syslog service is started.

Step 3 On the displayed page, set Syslog parameters listed in Table 5-13:

Table 5-13 Description of Syslog parameters

Area Parameter Description

Syslog Protocol Server IP Address Specifies the IP address of the interconnected server.

Server Port Specifies the port number for interconnection.

Protocol Specifies the protocol type, which can be:
- TCP
- UDP


Severity Specifies the message severity, which can be:
- Informational
- Emergency
- Alert
- Critical
- Error
- Warning
- Notice
- Debug

Facility Specifies the module where the log is generated.

Identifier Specifies the product. The default value is MRS Manager.

Report Message Report Format Specifies the message format of alarms. For details about the format requirements, see the help information on the WebUI.

Report Alarm Type Specifies the type of alarms to be reported.

Report Alarm Severity Specifies the severity of alarms to be reported.

Uncleared Alarm Reporting Periodic Uncleared Alarm Reporting Specifies whether uncleared alarms are reported periodically. On indicates that the function is enabled. Off indicates that the function is disabled. The default value is Off.

Report Interval (min) Specifies the interval for periodical alarm reporting. When Periodic Uncleared Alarm Reporting is set to On, the periodical alarm reporting function is enabled. The unit of the interval is minute, and the default value is 15. The value range is 5 to 1440.
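The Report Interval constraint stated above (an integer number of minutes from 5 to 1440, defaulting to 15) can be expressed as a small validation sketch. This is illustrative only, not MRS code; the function name is hypothetical.

```shell
# valid_report_interval MINUTES
# Prints "valid" if MINUTES is an integer in the documented 5-1440 range.
valid_report_interval() {
  case $1 in
    ''|*[!0-9]*) echo "invalid"; return ;;   # reject empty or non-numeric input
  esac
  if [ "$1" -ge 5 ] && [ "$1" -le 1440 ]; then
    echo "valid"
  else
    echo "invalid"
  fi
}

valid_report_interval 15    # the default value; prints "valid"
```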


Step 4 Click OK to complete the settings.

----End

5.5.4 Configuring SNMP Northbound Interface

Scenario

Integrate alarm and monitoring data of MRS Manager into the network management system (NMS) using the Simple Network Management Protocol (SNMP).

Prerequisites

The ECS corresponding to the interconnected server and the Master node of the MRS cluster are deployed on the same VPC, and the Master node can access the IP address and specific ports of the interconnected server.

Procedure

Step 1 On MRS Manager, click System.

Step 2 In Configuration, click Configure SNMP under Monitoring and Alarm.

The default value of SNMP Service is Off. If you set SNMP Service to On, the SNMP service is started.

Step 3 On the displayed page, set SNMP parameters listed in Table 5-14:

Table 5-14 Description of SNMP parameters

Parameter Description

Version Specifies the version of the SNMP protocol, which can be:
- v2c: an earlier version of SNMP with low security
- v3: the latest version of SNMP with higher security than SNMPv2c

SNMPv3 is recommended.

Local Port Specifies the local port number. The default value is 20000. The value ranges from 1025 to 65535.

Read-Only Community Specifies the read-only community name. This parameter is valid when Version is set to v2c.

Read-Write Community Specifies the read-write community name. This parameter is valid when Version is set to v2c.

Security Username Specifies the SNMP security username. This parameter is valid when Version is set to v3.

Authentication Protocol: Specifies the authentication protocol. You are advised to set this parameter to SHA. This parameter is valid when Version is set to v3.


Authentication Password: Specifies the authentication key. This parameter is valid when Version is set to v3.

Confirm Password: Used to confirm the authentication key. This parameter is valid when Version is set to v3.

Encryption Protocol: Specifies the encryption protocol. You are advised to set this parameter to AES256. This parameter is valid when Version is set to v3.

Encryption Password: Specifies the encryption key. This parameter is valid when Version is set to v3.

Confirm Password: Used to confirm the encryption key. This parameter is valid when Version is set to v3.

NOTE

- The values of Authentication Password and Encryption Password must be 8 to 16 characters long and contain at least three of the following character types: uppercase letters, lowercase letters, digits, and special characters. The two passwords must be different, and neither may be the same as the security username or the reversed security username.

- To ensure security, periodically change the values of the Authentication Password and Encryption Password parameters when the SNMP protocol is used.

- When SNMPv3 is used, a security user is locked after five consecutive authentication failures within 5 minutes and is unlocked 5 minutes later.

Step 4 Click Create Trap Target under Trap Target, and set the following parameters in the Create Trap Target dialog box that is displayed:

- Target Symbol: Specifies the ID of the trap target, generally the ID of the network management system or host that receives the trap. The value consists of 1 to 255 characters, including letters or digits.

- Target IP Address: Specifies the target IP address. The value can be a class A, B, or C IP address and must be able to communicate with the IP address of the management plane on the management node.

- Target Port: Specifies the port that receives the trap. The value must be the same as that on the peer end and ranges from 0 to 65535.

- Trap Community: Specifies the trap community name. This parameter is valid when Version is set to v2c.

Click OK to finish the settings and exit the Create Trap Target dialog box.

Step 5 Click OK.

----End
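The password rules in the note above can be checked before the form is submitted. The following sketch is a local helper (not an MRS Manager tool) that validates a candidate Authentication Password or Encryption Password against the documented rules: 8 to 16 characters and at least three of the four character classes.

```shell
#!/bin/bash
# Hypothetical helper: validate an SNMPv3 password against the rules in this
# guide (8-16 characters, at least three of: uppercase, lowercase, digits,
# special characters). Not part of MRS Manager.
check_snmp_password() {
    local pw=$1 classes=0
    local len=${#pw}
    if [ "$len" -lt 8 ] || [ "$len" -gt 16 ]; then
        echo "BAD: length must be 8-16"
        return 1
    fi
    [[ $pw =~ [A-Z] ]] && classes=$((classes + 1))
    [[ $pw =~ [a-z] ]] && classes=$((classes + 1))
    [[ $pw =~ [0-9] ]] && classes=$((classes + 1))
    [[ $pw =~ [^A-Za-z0-9] ]] && classes=$((classes + 1))
    if [ "$classes" -lt 3 ]; then
        echo "BAD: need at least three character classes"
        return 1
    fi
    echo "OK"
}
```

Checking that the two passwords differ, and that neither equals the (reversed) security username, is left to the operator.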

5.6 Alarm Reference


5.6.1 ALM-12001 Audit Log Dump Failure

Description

Cluster audit logs need to be dumped to a third-party server because of the local historical data backup policy. Audit logs can be successfully dumped if the dump server meets the configuration conditions. This alarm is generated when the audit log dump fails because the disk space of the dump directory on the third-party server is insufficient or a user has changed the username, password, or dump directory of the dump server.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12001 Medium Yes

Parameters

Parameter Description

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

The system can only store a maximum of 50 dump files locally. If the fault persists on the dump server, the local audit logs may be lost.

Possible Causes

- The network connection is abnormal.
- The username, password, or dump directory of the dump server does not meet the configuration conditions.
- The disk space of the dump directory is insufficient.

Procedure

Step 1 Check whether the username, password, and dump directory are correct.

1. On the dump configuration page of MRS Manager, check whether the username, password, and dump directory of the third-party server are correct.
– If yes, go to Step 3.


– If no, go to Step 1.2.

2. Change the username, password, or dump directory, and click OK.

3. Wait 2 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Reset the dump rule.

1. On the MRS Manager portal, choose System > Dump Audit Log.

2. Reset dump rules, set the parameters properly, and click OK.

3. Wait 2 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 3.

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End
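One of the possible causes above is insufficient disk space in the dump directory. On the third-party dump server, a quick check such as the following can rule that out; the directory path and minimum-space threshold are illustrative assumptions, not values mandated by MRS.

```shell
#!/bin/bash
# Sketch: warn when a dump directory is low on free space. Run this on the
# third-party dump server; the threshold is a placeholder, not an MRS value.
check_dump_space() {
    local dir=$1 min_kb=$2
    # df -Pk gives POSIX-format output; field 4 of line 2 is available KB.
    local avail_kb
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    if [ "$avail_kb" -lt "$min_kb" ]; then
        echo "LOW: ${avail_kb} KB free in $dir"
        return 1
    fi
    echo "OK: ${avail_kb} KB free in $dir"
}

# Example: require at least 1 KB free in /tmp (both values illustrative).
check_dump_space /tmp 1
```

In practice you would point this at the configured dump directory and pick a threshold comfortably above the size of one audit-log dump.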

Related Information

N/A

5.6.2 ALM-12002 HA Resource Is Abnormal

Description

The high availability (HA) software periodically checks the WebService floating IP addresses and databases of Manager. This alarm is generated when the HA software detects that the WebService floating IP addresses or databases are abnormal.

This alarm is cleared when the HA software detects that the floating IP addresses or databases are in the normal state.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12002 Major Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

RESName: Specifies the resource for which the alarm is generated.

Impact on the System

If the WebService floating IP addresses of Manager are abnormal, users cannot log in to and use MRS Manager. If the databases are abnormal, all core services and related service processes, such as the alarm and monitoring functions, are affected.

Possible Causes

- The floating IP address is abnormal.
- An exception occurs in the database.

Procedure

Step 1 Check the floating IP address status of the active management node.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the host address and resource name of the alarm in the alarm details.

2. Log in to the active management node. Run the following commands to switch the user:
sudo su - root
su - omm

3. Go to the ${BIGDATA_HOME}/om-0.0.1/sbin/ directory and run the status-oms.sh script to check whether the floating IP address of the active Manager is normal. In the command output, locate the row where ResName is floatip and check whether the following information is displayed.
For example:
10-10-10-160 floatip Normal Normal Single_active
– If yes, go to Step 2.
– If no, go to Step 1.4.

4. Contact the O&M personnel of the public cloud to check whether the floating IP NIC exists.
– If yes, go to Step 2.
– If no, go to Step 1.5.

5. Contact the O&M personnel of the public cloud to rectify NIC faults.


Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Check the database status of the active and standby management nodes.

1. Log in to the active and standby management nodes respectively, run the sudo su - root and su - ommdba commands to switch to user ommdba, and then run the gs_ctl query command to check whether the following information is displayed in the command output.

Command output of the active management node:
Ha state:
LOCAL_ROLE: Primary
STATIC_CONNECTIONS: 1
DB_STATE: Normal
DETAIL_INFORMATION: user/password invalid
Senders info: No information
Receiver info: No information

Command output of the standby management node:
Ha state:
LOCAL_ROLE: Standby
STATIC_CONNECTIONS: 1
DB_STATE: Normal
DETAIL_INFORMATION: user/password invalid
Senders info: No information
Receiver info: No information

– If yes, go to Step 2.3.

– If no, go to Step 2.2.

2. Contact the O&M personnel of the public cloud to check whether a network fault occursand rectify the fault.

– If yes, go to Step 2.3.

– If no, go to Step 3.

3. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 3.

Step 3 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End
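The floatip check in Step 1.3 can be scripted. The sketch below parses output in the format shown in that step; since a live cluster is needed for the real script, it is fed a sample line here, and the real invocation is shown only in a comment.

```shell
#!/bin/bash
# Sketch: extract the floatip row from status-oms.sh-style output and report
# its status. Against a live cluster you would pipe the real output in:
#   sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh | floatip_status
floatip_status() {
    awk '$2 == "floatip" {
        print ($3 == "Normal" ? "floatip OK" : "floatip ABNORMAL")
    }'
}

# Sample line matching the example in the procedure above.
echo '10-10-10-160 floatip Normal Normal Single_active' | floatip_status
# prints "floatip OK"
```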

Related Information

N/A

5.6.3 ALM-12004 OLdap Resource Is Abnormal

Description

This alarm is generated when the Ldap resource in Manager is abnormal.

This alarm is cleared when the Ldap resource in Manager recovers and the alarm handling is complete.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12004 Major Yes

Parameters

Parameter Description

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

The OLdap resources are abnormal and the Manager authentication service is unavailable. As a result, security authentication and user management functions cannot be provided for upper-layer web services. Users may be unable to log in to Manager.

Possible Causes

The LdapServer process in Manager is abnormal.

Procedure

Step 1 Check whether the LdapServer process in Manager is in normal state.

1. Log in to the active management node.

2. Run ps -ef | grep slapd to check whether the LdapServer resource process in the ${BIGDATA_HOME}/om-0.0.1/ directory is running properly. You can determine that the resource is normal by checking the following information:

a. Run sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh. You can view that ResHAStatus of the OLdap process is Normal.

b. Run ps -ef | grep slapd. You can view the slapd process occupying port 21750.

– If yes, go to Step 2.– If no, go to Step 3.

Step 2 Run kill -2 PID of the LdapServer process and wait 20 seconds. The HA starts the OLdap process automatically. Check whether the OLdap resource is in normal state.
- If yes, no further action is required.
- If no, go to Step 3.


Step 3 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End
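Step 2 sends SIGINT (kill -2) to the LdapServer process and relies on the HA framework to restart it. The sequence can be sketched as below; the demonstration uses a disposable background process instead of a real slapd, and enables job control so the child does not ignore SIGINT.

```shell
#!/bin/bash
# Sketch of the Step 2 sequence: send SIGINT (kill -2) to a PID, then poll
# until the old process exits. On a real node the PID would come from
# `ps -ef | grep slapd`, and the HA framework would restart OLdap afterwards.
sigint_and_wait() {
    local pid=$1 tries=0
    kill -2 "$pid" 2>/dev/null
    # Poll for up to 20 seconds, matching the wait in the procedure.
    while kill -0 "$pid" 2>/dev/null && [ "$tries" -lt 20 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "still running"
        return 1
    fi
    echo "stopped"
}

# Demonstration against a disposable process instead of a real slapd.
set -m            # job control, so the background child does not ignore SIGINT
sleep 300 &
sigint_and_wait $!
```

After the old process exits, the OLdap status check from Step 1 (status-oms.sh) tells you whether HA brought the resource back.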

Related Information

N/A

5.6.4 ALM-12005 OKerberos Resource Is Abnormal

Description

The alarm module monitors the status of the Kerberos resource in Manager. This alarm is generated when the Kerberos resource is abnormal.

This alarm is cleared when the Kerberos resource recovers and the alarm handling is complete.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12005 Major Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

The Kerberos resources are abnormal and the Manager authentication service is unavailable. As a result, the security authentication function cannot be provided for upper-layer web services. Users may be unable to log in to Manager.

Possible Causes

The OLdap resource on which OKerberos depends is abnormal.


Procedure

Step 1 Check whether the OLdap resource on which OKerberos depends is abnormal in Manager.

1. Log in to the active management node.

2. Run the following command to check whether the OLdap resource managed by HA is in normal state:

sh ${BIGDATA_HOME}/OMSV100R001C00x8664/workspace0/ha/module/hacom/script/status_ha.sh

The OLdap resource is in the normal state when it is in the Active_normal state on the active node and in the Standby_normal state on the standby node.

– If yes, go to Step 3.

– If no, go to Step 2.

Step 2 See ALM-12004 OLdap Resource Is Abnormal to resolve the problem. After the OLdap resource status recovers, check whether the OKerberos resource is in normal state.

l If yes, no further action is required.

l If no, go to Step 3.

Step 3 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End
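The expected states in Step 1.2 can be checked with a small filter. The sketch below runs against a fabricated two-line sample of the form "node resource state", because the real input comes from status_ha.sh on a live cluster and its exact output format may differ.

```shell
#!/bin/bash
# Sketch: decide whether an OLdap HA pair is healthy given lines of the form
# "<node> <resource> <state>". The sample is fabricated; real input would come
# from the status_ha.sh script, whose exact format may differ.
oldap_ha_ok() {
    awk '$2 == "oldap" { states[$3] = 1 }
         END { exit !(("Active_normal" in states) && ("Standby_normal" in states)) }'
}

sample='node1 oldap Active_normal
node2 oldap Standby_normal'
if echo "$sample" | oldap_ha_ok; then
    echo "OLdap HA normal"
else
    echo "OLdap HA abnormal"
fi
```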

Related Information

N/A

5.6.5 ALM-12006 Node Fault

Description

Controller checks the NodeAgent status every 30 seconds. This alarm is generated when Controller fails to receive the status report of a NodeAgent three consecutive times.

This alarm is cleared when Controller can properly receive the status report of the NodeAgent.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12006 Critical Yes


Parameters

Parameter Description

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

Services on the node are unavailable.

Possible Causes

The network is disconnected, or the hardware is faulty.

Procedure

Step 1 Check whether the network is disconnected or the hardware is faulty.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the alarm host address in the alarm details.

2. Log in to the active management node.

3. Run the following command to check whether the faulty node is reachable:
ping IP address of the faulty host
– If yes, go to Step 2.
– If no, go to Step 1.4.

4. Contact the O&M personnel of the public cloud to check whether a network fault occurs and rectify the fault.
– If yes, go to Step 2.
– If no, go to Step 1.6.

5. Rectify the network fault and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 1.6.

6. Contact the O&M personnel of the public cloud to check whether a hardware fault (for example, a CPU or memory fault) occurs on the node.
– If yes, go to Step 1.7.
– If no, go to Step 2.

7. Repair the faulty components and restart the node. Check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.


– If no, go to Step 2.

Step 2 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End
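Step 1.3 reduces to a scripted reachability probe. A minimal sketch, assuming the Linux iputils ping flags:

```shell
#!/bin/bash
# Sketch: report whether a host answers one ICMP echo request.
# -c 1: single probe; -W 2: wait at most 2 seconds (Linux iputils flags).
node_reachable() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo "$1 reachable"
    else
        echo "$1 unreachable"
    fi
}

node_reachable 127.0.0.1
```

For a multi-node cluster you would loop this over the host list and follow the unreachable hosts into Step 1.4.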

Related Information

N/A

5.6.6 ALM-12007 Process Fault

Description

The process health check module checks the process status every 5 seconds. This alarm is generated when the process health check module detects that the process connection status is Bad three consecutive times.

This alarm is cleared when the process can be connected.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12007 Major Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

The service provided by the process is unavailable.

Possible Causes

- The instance process is abnormal.


- The disk space is insufficient.

Procedure

Step 1 Check whether the instance process is abnormal.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the alarm host name and service name in the alarm details.

2. On the Alarm page, check whether the alarm ALM-12006 Node Fault is generated.

– If yes, go to Step 1.3.

– If no, go to Step 1.4.

3. See the procedure in ALM-12006 Node Fault to handle the alarm.

4. Check whether the installation directory user, user group, and permission of the alarm role are correct. The user, user group, and permission must be omm:ficommon 750.

– If yes, go to Step 1.6.

– If no, go to Step 1.5.

5. Run the following commands to set the permission to 750 and User:Group to omm:ficommon:
chmod 750 <folder_name>
chown omm:ficommon <folder_name>

6. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.1.

Step 2 Check whether the disk space is insufficient.

1. On MRS Manager, check whether the alarm list contains ALM-12017 Insufficient Disk Capacity.

– If yes, go to Step 2.2.

– If no, go to Step 3.

2. See the procedure in ALM-12017 Insufficient Disk Capacity to handle the alarm.

3. Wait 5 minutes and check whether the alarm is cleared.

– If yes, go to Step 2.4.

– If no, go to Step 3.

4. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 3.

Step 3 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End
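The ownership and mode check in Step 1.4 can be automated. The sketch below uses GNU coreutils stat; the expected owner and mode are parameters, and the demonstration uses a scratch directory with the current user, because omm:ficommon only exists on cluster nodes.

```shell
#!/bin/bash
# Sketch: verify that a directory has the expected mode and owner:group
# (omm:ficommon 750 on an MRS node). Uses GNU coreutils stat.
check_dir_perms() {
    local dir=$1 want_mode=$2 want_owner=$3
    local got_mode got_owner
    got_mode=$(stat -c '%a' "$dir")
    got_owner=$(stat -c '%U:%G' "$dir")
    if [ "$got_mode" = "$want_mode" ] && [ "$got_owner" = "$want_owner" ]; then
        echo "OK: $dir is $got_owner $got_mode"
    else
        echo "FIX: $dir is $got_owner $got_mode, want $want_owner $want_mode"
    fi
}

# Demonstration with a scratch directory and the current user; on a node you
# would call: check_dir_perms <install_dir> 750 omm:ficommon
demo=$(mktemp -d)
chmod 750 "$demo"
check_dir_perms "$demo" 750 "$(stat -c '%U:%G' "$demo")"
rmdir "$demo"
```

A FIX result corresponds to Step 1.5: run the chmod and chown commands shown in the procedure.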


Related Information

N/A

5.6.7 ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes

Description

This alarm is generated when the active Manager does not receive a heartbeat signal from the standby Manager within 7 seconds.

This alarm is cleared when the active Manager receives a heartbeat signal from the standby Manager.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12010 Major Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Local Manager HA Name: Specifies a local Manager HA.

Peer Manager HA Name: Specifies a peer Manager HA.

Impact on the System

When the active Manager process is abnormal, an active/standby failover cannot be performed, and services are affected.

Possible Causes

The link between the active and standby Managers is abnormal.

Procedure

Step 1 Check whether the network between the active Manager server and the standby Manager server is in normal state.


1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the standby Manager server in the alarm details.

2. Log in to the active management node.

3. Run the following command to check whether the standby Manager is reachable:
ping heartbeat IP address of the standby Manager
– If yes, go to Step 2.
– If no, go to Step 1.4.

4. Contact the O&M personnel of the public cloud to check whether the network is faulty.
– If yes, go to Step 1.5.
– If no, go to Step 2.

5. Rectify the network fault and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.8 ALM-12011 Manager Data Synchronization Exception Between the Active and Standby Nodes

Description

This alarm is generated when the standby Manager fails to synchronize files with the active Manager.

This alarm is cleared when the standby Manager synchronizes files with the active Manager.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12011 Critical Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Local Manager HA Name: Specifies a local Manager HA.

Peer Manager HA Name: Specifies a peer Manager HA.

Impact on the System

Because the configuration files on the standby Manager are not updated, some configurations will be lost after an active/standby switchover. Manager and some components may not run properly.

Possible Causes

The link between the active and standby Managers is interrupted.

Procedure

Step 1 Check whether the network between the active Manager server and the standby Manager server is in normal state.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the standby Manager in the alarm details.

2. Log in to the active management node. Run the following command to check whether the standby Manager is reachable:
ping IP address of the standby Manager
– If yes, go to Step 2.
– If no, go to Step 1.3.

3. Contact the O&M personnel of the public cloud to check whether the network is faulty.
– If yes, go to Step 1.4.
– If no, go to Step 2.

4. Rectify the network fault and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44


International: +800 44556600

----End

Related Information

N/A

5.6.9 ALM-12012 NTP Service Is Abnormal

Description

This alarm is generated when the NTP service on the current node fails to synchronize time with the NTP service on the active OMS node.

This alarm is cleared when the NTP service on the current node synchronizes time properly with the NTP service on the active OMS node.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12012 Major Yes

Parameters

Parameter Description

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

The time on the node is inconsistent with the time on other nodes in the cluster. Therefore, some MRS applications on the node may not run properly.

Possible Causes

- The NTP service on the current node cannot start properly.
- The current node fails to synchronize time with the NTP service on the active OMS node.
- The key value authenticated by the NTP service on the current node is inconsistent with that on the active OMS node.


- The time offset between the node and the NTP service on the active OMS node is large.

Procedure

Step 1 Check the NTP service on the current node.

1. Check whether the ntpd process is running on the node. Log in to the node and run the sudo su - root command to switch the user. Then run the following command to check whether the command output contains the ntpd process:
ps -ef | grep ntpd | grep -v grep
– If yes, go to Step 2.1.
– If no, go to Step 1.2.

2. Run service ntp start to start the NTP service.

3. Wait 10 minutes and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check whether the current node can synchronize time properly with the NTP service on the active OMS node.

1. Check whether the node can synchronize time with the NTP service on the active OMS node based on Additional Info of the alarm.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. Check whether the synchronization with the NTP service on the active OMS node is faulty.
Log in to the alarmed node, run the sudo su - root command to switch the user, and then run the ntpq -np command.
If an asterisk (*) exists before the IP address of the NTP service on the active OMS node in the command output, the synchronization is in normal state. The command output is as follows:
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.10.10.162 .LOCL. 1 u 1 16 377 0.270 -1.562 0.014

If no asterisk (*) exists before the IP address of the NTP service on the active OMS node and the value of refid is .INIT., the synchronization is abnormal. The command output is as follows:
remote refid st t when poll reach delay offset jitter
==============================================================================
10.10.10.162 .INIT. 1 u 1 16 377 0.270 -1.562 0.014

– If yes, go to Step 2.3.
– If no, go to Step 3.

3. Rectify the fault, wait 10 minutes, and then check whether the alarm is cleared.
An NTP synchronization failure is usually related to the system firewall. If the firewall can be disabled, disable it and then check whether the fault is rectified. If the firewall cannot be disabled, check the firewall configuration policies and ensure that UDP port 123 is enabled (follow the specific firewall configuration policies of each system).
– If yes, no further action is required.


– If no, go to Step 3.

Step 3 Check whether the key value authenticated by the NTP service on the current node is consistent with that on the active OMS node.

Run cat /etc/ntp/ntpkeys to check whether the authentication code whose key value index is 1 is the same as the value of the NTP service on the active OMS node.

- If yes, go to Step 4.1.
- If no, go to Step 5.

Step 4 Check whether the time offset between the node and the NTP service on the active OMS node is large.

1. Check whether the time offset is large in Additional Info of the alarm.
– If yes, go to Step 4.2.
– If no, go to Step 5.

2. On the Host page, select the host of the node, and choose More > Stop All Roles to stop all services on the node.
If the time on the alarm node is later than that on the NTP service of the active OMS node, adjust the time of the alarm node, and then choose More > Start All Roles to start the services on the node.
If the time on the alarm node is earlier than that on the NTP service of the active OMS node, wait until the time offset has elapsed, adjust the time of the alarm node, and then choose More > Start All Roles to start the services on the node.

NOTE

If you do not wait until the time offset is due, data loss may occur.

3. Wait 10 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 5.

Step 5 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End
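The asterisk test in Step 2.2 can be expressed as a filter over ntpq -np output. The sketch below runs on a sample line matching the normal case shown above; against a live node you would pipe `ntpq -np` in instead.

```shell
#!/bin/bash
# Sketch: given ntpq -np output, check whether a server is the selected sync
# peer. ntpq marks the selected peer with '*' in front of the remote address.
ntp_synced_to() {
    awk -v ip="$1" 'substr($1, 1, 1) == "*" && substr($1, 2) == ip { found = 1 }
                    END { exit !found }'
}

# Sample output matching the normal case shown in the procedure above.
sample='*10.10.10.162 .LOCL. 1 u 1 16 377 0.270 -1.562 0.014'
if echo "$sample" | ntp_synced_to 10.10.10.162; then
    echo "synced to 10.10.10.162"
else
    echo "not synced"
fi
```

A line without the asterisk (for example, a refid of .INIT.) makes the filter return nonzero, which corresponds to the abnormal case in Step 2.2.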

Related Information

N/A

5.6.10 ALM-12016 CPU Usage Exceeds the Threshold

Description

The system checks the CPU usage every 30 seconds and compares the actual CPU usage with the threshold. The CPU usage has a default threshold. This alarm is generated when the CPU usage exceeds the threshold several times (configurable, 10 times by default) consecutively.


To change the threshold, choose System > Configure Alarm Threshold.

This alarm is cleared when the average CPU usage is less than or equal to 90% of the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12016 Major Yes

Parameters

Parameter Description

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Service processes respond slowly or become unavailable.

Possible Causes

- The alarm threshold or Trigger Count is configured inappropriately.
- The CPU configuration cannot meet service requirements, and the CPU usage reaches the upper limit.

Procedure

Step 1 Check whether the alarm threshold or Trigger Count is appropriate.

1. Log in to MRS Manager and change the alarm threshold and Trigger Count based on the actual CPU usage.

2. Choose System > Configure Alarm Threshold and change the alarm threshold based on the actual CPU usage.

3. Choose System > Configure Alarm Threshold and change Trigger Count based on the actual CPU usage.


NOTE

This option defines the alarm check phase. Interval indicates the alarm check period, and Trigger Count indicates the number of times the CPU usage must exceed the threshold. An alarm is generated when the CPU usage exceeds the threshold that many times consecutively.

4. Wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Expand the system.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the alarm node in the alarm details.

2. Log in to the node for which the alarm is generated.

3. Run the cat /proc/stat | awk 'NR==1'|awk '{for(i=2;i<=NF;i++)j+=$i;print "" 100 - ($5+$6) * 100 / j;}' command to check the system CPU usage.

4. If the CPU usage exceeds the threshold, expand the CPU capacity.

5. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.
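The awk computation in step 3 can be exercised on a canned first line of /proc/stat; the jiffy values below are illustrative, not from a real node:

```shell
# Fields after "cpu" are cumulative jiffies: user nice system idle iowait irq softirq steal.
# Usage = 100 - (idle + iowait) * 100 / total, exactly as in the documented one-liner.
stat_line='cpu 100 0 100 700 100 0 0 0'
usage=$(echo "$stat_line" | awk '{for(i=2;i<=NF;i++)j+=$i; print 100 - ($5+$6) * 100 / j}')
echo "CPU usage: ${usage}%"
```

With this sample, idle (700) plus iowait (100) is 80% of the 1000 total jiffies, so the reported usage is 20%. On a live node, a second reading taken a moment later gives a more meaningful interval average, since /proc/stat counters are cumulative since boot.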

Step 3 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.11 ALM-12017 Insufficient Disk Capacity

Description

The system checks the host disk usage every 30 seconds and compares the actual disk usage with the threshold. The disk usage has a default threshold. This alarm is generated when the host disk usage exceeds the specified threshold.

To change the threshold, choose System > Configure Alarm Threshold.

This alarm is cleared when the host disk usage is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12017 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

PartitionName Specifies the disk partition for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Service processes become unavailable.

Possible Causes

The disk configuration cannot meet service requirements. The disk usage reaches the upper limit.

Procedure

Step 1 Log in to MRS Manager and check whether the threshold is appropriate.

1. By default, 90% is appropriate, but the threshold can be configured on demand.
– If yes, go to Step 1.2.
– If no, go to Step 2.

2. Choose System > Configure Alarm Threshold and change the alarm threshold based on the actual disk usage.

3. Wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.
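Comparing partition usage against the threshold can be sketched with awk over df output; the devices, sizes, and mount points below are made up for the demo:

```shell
# Print every partition whose Use% exceeds the threshold, from sample df -h output.
threshold=90
df_output='Filesystem Size Used Avail Use% Mounted_on
/dev/xvda1 50G 47G 3G 94% /
/dev/xvdb1 200G 80G 120G 40% /srv/BigData'
full=$(echo "$df_output" | awk -v t="$threshold" 'NR>1 {gsub(/%/,"",$5); if ($5+0 > t) print $6}')
echo "Partitions over ${threshold}%: $full"
```

On a real node, replace the canned text with the live command: df -h | awk -v t=90 'NR>1 {gsub(/%/,"",$5); if ($5+0 > t) print $6}'.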

Step 2 Check whether the disk is a system disk.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the alarm host name and disk partition information in the alarm details.

2. Log in to the node for which the alarm is generated.

3. Run the df -h command to check the system disk partition usage. Check whether the disk is mounted to any of the following directories based on the disk partition name obtained in Step 2.1: /, /boot, /home, /opt, /tmp, /var, /var/log, and /srv/BigData.
– If yes, the disk is a system disk. Then go to Step 3.1.


– If no, the disk is not a system disk. Then go to Step 2.4.

4. Run the df -h command to check the system disk partition usage. Determine the role of the disk based on the disk partition name obtained in Step 2.1.

5. Check whether the disk is used by HDFS or Yarn.
– If yes, expand the disk capacity for the Core node. Then go to Step 2.6.
– If no, go to Step 4.

6. Wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Check whether a large file is written to the disk.

1. Run the find / -xdev -size +500M -exec ls -l {} \; command to view files larger than 500 MB on the node. Check whether such files are written to the disk.
– If yes, go to Step 3.2.
– If no, go to Step 4.

2. Process the large file and check whether the alarm is cleared after 2 minutes.
– If yes, no further action is required.
– If no, go to Step 4.

3. Expand the disk capacity.

4. Wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.
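The large-file search in step 3.1 can be tried safely on a scratch directory, scaled down from +500M to +1M so the demo is cheap; the file names are made up:

```shell
# Create one file over the demo threshold and one under it, then search.
demo_dir=$(mktemp -d)
truncate -s 2M "$demo_dir/big.dat"     # 2 MB sparse file, should match
truncate -s 100K "$demo_dir/small.dat" # under 1 MB, should not match
found=$(find "$demo_dir" -xdev -size +1M | wc -l)
echo "files larger than 1 MB: $found"
rm -r "$demo_dir"
```

-xdev keeps find from crossing mount points, so on a real host the scan stays on the partition that raised the alarm.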

Step 4 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.12 ALM-12018 Memory Usage Exceeds the Threshold

Description

The system checks the memory usage every 30 seconds and compares the actual memory usage with the threshold. The memory usage has a default threshold. This alarm is generated when the host memory usage exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold.

This alarm is cleared when the host memory usage is less than or equal to 90% of the threshold.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12018 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Service processes respond slowly or become unavailable.

Possible Causes

Memory configuration cannot meet service requirements. The memory usage reaches the upper limit.

Procedure

Step 1 Expand the system.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the alarm host in the alarm details.

2. Log in to the node for which the alarm is generated.

3. Run the free -m | grep Mem\: | awk '{printf("%s,", ($3-$6-$7) * 100 / $2)}' command to check the system memory usage.

4. If the memory usage exceeds the threshold, expand the memory capacity.

5. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.
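The free -m computation in step 3 can be replayed on a canned "Mem:" line. Note that it assumes the older free output format (total used free shared buffers cached); the numbers below are illustrative:

```shell
# Usage = (used - buffers - cached) * 100 / total, as in the documented command.
mem_line='Mem: 8000 3000 2000 100 500 400'
usage=$(echo "$mem_line" | awk '{printf("%s", ($3-$6-$7) * 100 / $2)}')
echo "memory usage: ${usage}%"
```

Here (3000 - 500 - 400) * 100 / 8000 gives 26.25%. On distributions shipping a newer procps, the column layout of free differs, so verify the fields before relying on the positional awk expression.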

Step 2 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44


International: +800 44556600

----End

Related Information

N/A

5.6.13 ALM-12027 Host PID Usage Exceeds the Threshold

Description

The system checks the PID usage every 30 seconds and compares the actual PID usage with the default threshold. This alarm is generated when the PID usage exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold.

This alarm is cleared when the host PID usage is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12027 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

No PID is available for new processes and service processes are unavailable.

Possible Causes

l Too many processes are running on the node, and the value of pid_max needs to be increased.
l The system is abnormal.


Procedure

Step 1 Increase the value of pid_max.

1. On MRS Manager, click the alarm in the real-time alarm list. In the Alarm Details area, obtain the IP address of the host for which the alarm is generated.

2. Log in to the node for which the alarm is generated.

3. Run the cat /proc/sys/kernel/pid_max command to check the value of pid_max.

4. If the PID usage exceeds the threshold, run the following command to double the value of pid_max:
echo new pid_max value > /proc/sys/kernel/pid_max
For example:
echo 65536 > /proc/sys/kernel/pid_max

5. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.
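The decision in step 4 can be sketched with fixed sample numbers standing in for the live values of pid_max and the current process count; both values and the 90% cutoff are assumptions for the demo:

```shell
# Compare the current PID count with pid_max to decide whether to raise it.
pid_max=32768      # e.g. from: cat /proc/sys/kernel/pid_max
pid_count=30000    # e.g. from: ls /proc | grep -c '^[0-9]'
usage=$(( pid_count * 100 / pid_max ))
echo "PID usage: ${usage}%"
if [ "$usage" -gt 90 ]; then
  new_max=$(( pid_max * 2 ))
  echo "would raise pid_max to $new_max"   # on a live node: echo $new_max > /proc/sys/kernel/pid_max
fi
```

Writing to /proc/sys/kernel/pid_max takes effect immediately but does not survive a reboot; persisting it would require a sysctl configuration entry.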

Step 2 Check whether the system environment is abnormal.

1. Contact the O&M personnel of the public cloud to check whether the operating system is abnormal.
– If yes, go to Step 2 to rectify the fault.
– If no, go to Step 3.

2. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.14 ALM-12028 Number of Processes in the D State on the Host Exceeds the Threshold

Description

The system checks the number of processes in the D state of user omm on the host every 30 seconds and compares the number with the threshold. The number of processes in the D state on the host has a default threshold. This alarm is generated when the number of processes in the D state exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold.


This alarm is cleared when the number is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12028 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Excessive system resources are used and service processes respond slowly.

Possible Causes

The host responds slowly to I/O (disk I/O and network I/O) requests and a process is in the D state.

Procedure

Step 1 Check the process that is in the D state.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the alarm host in the alarm details.

2. Log in to the node for which the alarm is generated.

3. Run the following command to switch the user:
sudo su - root
su - omm

4. Run the following command to view the PID of the process of user omm that is in the D state:
ps -elf | grep -v "\[thread_checkio\]" | awk 'NR!=1 {print $2, $3, $4}' | grep omm | awk -F' ' '{print $1, $3}' | grep D | awk '{print $2}'


5. Check whether no command output is displayed.
– If yes, the service process is running properly. Then go to Step 1.7.
– If no, go to Step 1.6.

6. Switch to user root and run the reboot command to restart the alarm host. Restarting the host brings certain risks. Ensure that the service process runs properly after the restart.

7. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.
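The pipeline in step 1.4 can be replayed against canned ps -elf output (the thread_checkio filter is dropped here since the sample contains no such line; users and PIDs are made up):

```shell
# Columns in ps -elf: F S UID PID PPID ... ; $2 is the state, $3 the user, $4 the PID.
ps_sample='F S UID PID PPID C PRI NI ADDR SZ WCHAN TIME CMD
4 S omm 1234 1 0 80 0 - 1000 - 00:00:01 java
4 D omm 5678 1 0 80 0 - 2000 - 00:00:09 java
4 S root 9012 1 0 80 0 - 500 - 00:00:00 sshd'
d_pids=$(echo "$ps_sample" | awk 'NR!=1 {print $2, $3, $4}' | grep omm | awk -F' ' '{print $1, $3}' | grep D | awk '{print $2}')
echo "omm processes in D state: $d_pids"
```

Only the second java process is in state D and owned by omm, so only its PID survives the filter chain.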

Step 2 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.15 ALM-12031 User omm or Password Is About to Expire

Description

Starting at 00:00 every day, the system checks every eight hours whether user omm or its password is about to expire. This alarm is generated if the user or password is about to expire in 15 days.

The alarm is cleared when the validity period of user omm is changed or the password is reset and the alarm handling is complete.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12031 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.


HostName Specifies the host for which the alarm is generated.

Impact on the System

The node trust relationship is unavailable and Manager cannot manage the services.

Possible Causes

User omm or password is about to expire.

Procedure

Step 1 Check whether user omm and password in the system are valid.

1. Log in to the faulty node.

2. Run the following command to view the information about user omm and password:
chage -l omm

3. Check whether the user has expired based on the system message.

a. View the value of Password expires to check whether the password is about to expire.

b. View the value of Account expires to check whether the user is about to expire.

NOTE

If the parameter value is never, the user and password are valid permanently; if the value is a date, check whether the user and password are about to expire within 15 days.

– If yes, go to Step 1.4.
– If no, go to Step 2.

4. Run the following command to modify the validity period configuration:

Run the following command to set a validity period for user omm:
chage -E 'specified date' omm
Run the following command to set the number of validity days for user omm:
chage -M 'number of days' omm

5. Check whether the alarm is cleared automatically in the next periodic check.
– If yes, no further action is required.
– If no, go to Step 2.
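Deciding whether a chage expiry date falls inside the 15-day alarm window is simple date arithmetic; the dates below are illustrative, and the snippet assumes GNU date (available on the Linux nodes this guide targets):

```shell
# Days remaining until the "Password expires" date reported by chage -l.
expires='2017-03-01'
today='2017-02-20'   # in practice: today=$(date -u +%F)
days_left=$(( ( $(date -u -d "$expires" +%s) - $(date -u -d "$today" +%s) ) / 86400 ))
echo "days until expiry: $days_left"
[ "$days_left" -le 15 ] && echo "within the 15-day alarm window"
```

With nine days remaining, the account is inside the window, matching the condition under which this alarm is generated.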

Step 2 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End


Related Information

N/A

5.6.16 ALM-12032 User ommdba or Password Is About to Expire

Description

Starting at 00:00 every day, the system checks every eight hours whether user ommdba or its password is about to expire. This alarm is generated if the user or password is about to expire in 15 days.

The alarm is cleared when the validity period of user ommdba is changed or the password is reset and the alarm handling is complete.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12032 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The OMS database cannot be managed and data cannot be accessed.

Possible Causes

User ommdba or password is about to expire.

Procedure

Step 1 Check whether user ommdba and password in the system are valid.

1. Log in to the faulty node.

2. Run the following command to view the information about user ommdba and password:
chage -l ommdba


3. Check whether the user has expired based on the system message.

a. View the value of Password expires to check whether the password is about to expire.

b. View the value of Account expires to check whether the user is about to expire.

NOTE

If the parameter value is never, the user and password are valid permanently; if the value is a date, check whether the user and password are about to expire within 15 days.

– If yes, go to Step 1.4.
– If no, go to Step 2.

4. Run the following command to modify the validity period configuration:
Run the following command to set a validity period for user ommdba:
chage -E 'specified date' ommdba
Run the following command to set the number of validity days for user ommdba:
chage -M 'number of days' ommdba

5. Check whether the alarm is cleared automatically in the next periodic check.
– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.17 ALM-12033 Slow Disk Fault

Description

The system runs the iostat command every second to monitor the disk I/O indicator. This alarm is generated when the svctm value is larger than 100 ms more than 30 times in 60 seconds, indicating the disk is faulty.

This alarm is automatically cleared after the disk is replaced.
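The svctm check described above can be sketched over canned per-second iostat readings. Column positions vary across sysstat versions, so this demo simply places svctm as the last field; the device name and readings are made up:

```shell
# Count samples in which svctm exceeds 100 ms for one disk.
samples='sda 0.5 1.2 3.4 110.2
sda 0.4 1.1 3.1 95.0
sda 0.6 1.3 3.6 130.7'
over=$(echo "$samples" | awk '$NF > 100 {n++} END {print n+0}')
echo "samples with svctm > 100 ms: $over"
```

In the real check, 60 such samples are collected and the alarm fires when the count exceeds 30.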

Attribute

Alarm ID Alarm Severity Automatically Cleared

12033 Critical Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

DiskName Specifies the disk for which the alarm is generated.

Impact on the System

Service performance deteriorates and service processing capabilities become poor. For example, DBService active/standby synchronization is affected and even the service is unavailable.

Possible Causes

The disk is aged or has bad sectors.

Procedure

Step 1 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.18 ALM-12034 Periodic Backup Failure

Description

This alarm is generated when a periodic backup task fails to be executed. This alarm is cleared when the next backup task is executed successfully.


Attribute

Alarm ID Alarm Severity Automatically Cleared

12034 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

TaskName Specifies the task.

Impact on the System

No backup package is available for a long time, so the system cannot be restored in case of exceptions.

Possible Causes

The alarm cause depends on the task details. Handle the alarm according to the logs and alarm details.

Procedure

Step 1 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A


5.6.19 ALM-12035 Unknown Data Status After Recovery Task Failure

Description

After the recovery task fails, the system automatically rolls back. If the rollback fails, data may be lost. If this occurs, an alarm is generated. This alarm is cleared when the next recovery task is executed successfully.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12035 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

TaskName Specifies the task.

Impact on the System

After the recovery task fails, the system automatically rolls back. If the rollback fails, data may be lost or the data status may be unknown, which may affect services.

Possible Causes

The alarm cause depends on the task details. Handle the alarm according to the logs and alarm details.

Procedure

Step 1 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End


Related Information

N/A

5.6.20 ALM-12037 NTP Server Is Abnormal

Description

This alarm is generated when the NTP server is abnormal.

This alarm is cleared when the NTP server recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12037 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the IP address of the NTP serverfor which the alarm is generated.

Impact on the System

The NTP server configured on the active OMS node is abnormal. In this case, the active OMS node fails to synchronize time with the NTP server and a time offset may be generated in the cluster.

Possible Causes

l The NTP server network is faulty.
l The NTP server authentication fails.
l The NTP server time cannot be obtained.
l The time obtained from the NTP server is not continuously updated.

Procedure

Step 1 Check the NTP server network.

1. On the MRS Manager portal, view the real-time alarm list and locate the target alarm.


2. In the Alarm Details area, view the additional information to check whether the NTP server fails to be pinged.

– If yes, go to Step 1.3.

– If no, go to Step 2.

3. Contact the O&M personnel of the public cloud to check the network configuration and ensure that the network between the NTP server and the active OMS node is in normal state. Then, check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Check whether the NTP server authentication fails.

1. Log in to the active management node.

2. Run the ntpq -np command to check whether the NTP server authentication fails. If refid of the NTP server is .AUTH., the authentication fails.

– If yes, go to Step 5.

– If no, go to Step 3.
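The refid check in Step 2 can be scripted against canned ntpq -np output; the server address below is made up:

```shell
# ntpq -np prints two header lines, then one line per peer; refid is the 2nd field.
ntpq_output='     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.10.10.162    .AUTH.          16 u    - 1024    0    0.000    0.000   0.000'
if echo "$ntpq_output" | awk 'NR>2 {print $2}' | grep -q '^\.AUTH\.$'; then
  auth_failed=yes
else
  auth_failed=no
fi
echo "NTP authentication failed: $auth_failed"
```

On a live node, pipe the real command instead: ntpq -np | awk 'NR>2 {print $2}'.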

Step 3 Check whether the time can be obtained from the NTP server.

1. View the alarm additional information to check whether the time cannot be obtained from the NTP server.

– If yes, go to Step 3.2.

– If no, go to Step 4.

2. Contact the O&M personnel of the public cloud to rectify the NTP server fault. After the NTP server is in normal state, check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 4.

Step 4 Check whether the time obtained from the NTP server is not continuously updated.

1. View the alarm additional information to check whether the time obtained from the NTP server is not continuously updated.

– If yes, go to Step 4.2.

– If no, go to Step 5.

2. Contact the provider of the NTP server to rectify the NTP server fault. After the NTP server is in normal state, check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 5.

Step 5 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End


Related Information

N/A

5.6.21 ALM-12038 Monitoring Indicator Dump Failure

Description

This alarm is generated when dump fails after monitoring indicator dump is configured on MRS Manager.

This alarm is cleared when dump is successful.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12038 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The upper-layer management system fails to obtain monitoring indicators from the MRS Manager system.

Possible Causes

l The server cannot be connected.
l The save path on the server cannot be accessed.
l The monitoring indicator file fails to be uploaded.

Procedure

Check whether the server connection is in normal state.

Step 1 Contact the O&M personnel of the public cloud to check whether the network connection between the MRS Manager system and the server is in normal state.


l If yes, go to Step 3.
l If no, go to Step 2.

Step 2 Contact the O&M personnel of the public cloud to restore the network and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 3.

Step 3 Choose System > Configure Monitoring Index Dump and check whether the FTP username, password, port, dump mode, and public key configured on the monitoring indicator dumping configuration page are consistent with those on the server.
l If yes, go to Step 5.
l If no, go to Step 4.

Step 4 Enter the correct configuration information, click OK, and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 5.

Check whether the permission of the save path on the server is correct.

Step 5 Choose System > Configure Monitoring Index Dump and check the configuration items, including FTP Username, Save Path, and Dump Mode.
l If the dump mode is FTP, go to Step 6.
l If the dump mode is SFTP, go to Step 7.

Step 6 Log in to the server in FTP mode. In the default path, check whether FTP Username has the read and write permissions on the relative path Save Path.
l If yes, go to Step 9.
l If no, go to Step 8.

Step 7 Log in to the server in SFTP mode. Check whether FTP Username has the read and write permissions on the absolute path Save Path.
l If yes, go to Step 9.
l If no, go to Step 8.

Step 8 Add the read and write permissions and check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 9.

Check whether the save path on the server has sufficient disk space.

Step 9 Log in to the server and check whether the save path has sufficient disk space.
l If yes, go to Step 11.
l If no, go to Step 10.

Step 10 Delete unnecessary files or go to the monitoring indicator dumping configuration page to change the save path. Check whether the alarm is cleared.
l If yes, no further action is required.
l If no, go to Step 11.


Step 11 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.22 ALM-12039 GaussDB Data Is Not Synchronized

Description

The system checks the data synchronization status between the active and standby GaussDBs every 10 seconds. This alarm is generated when the synchronization status cannot be queried six consecutive times or when the synchronization status is abnormal.

This alarm is cleared when the data synchronization is in normal state.

Attribute

Alarm ID Alarm Severity Automatically Cleared

12039 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm isgenerated.

RoleName Specifies the role for which the alarm isgenerated.

HostName Specifies the host for which the alarm isgenerated.

Local GaussDB HA IP Specifies the HA IP address of the localGaussDB.

Peer GaussDB HA IP Specifies the HA IP address of the peerGaussDB.

SYNC_PERSENT Specifies the synchronization percentage.


Impact on the System

When data is not synchronized between the active and standby GaussDBs, the data may be lost or abnormal if the active instance becomes abnormal.

Possible Causes

l The network between the active and standby nodes is unstable.
l The standby GaussDB is abnormal.
l The disk space of the standby node is full.

Procedure

Check whether the network between the active and standby nodes is in normal state.

Step 1 Log in to MRS Manager, click Alarm, click the row where the alarm is located in the alarm list, and view the IP address of the standby GaussDB in the alarm details.

Step 2 Log in to the active management node.

Step 3 Run the following command to check whether the standby GaussDB is reachable:

ping heartbeat IP address of the standby GaussDB

If yes, go to Step 6.

If no, go to Step 4.

Step 4 Contact the O&M personnel of the public cloud to check whether the network is faulty.
l If yes, go to Step 5.
l If no, go to Step 6.

Step 5 Rectify the network fault and check whether the alarm is cleared from the alarm list.
l If yes, no further action is required.
l If no, go to Step 6.

Check whether the standby GaussDB is in normal state.

Step 6 Log in to the standby GaussDB node.

Step 7 Run the following command to switch the user:

sudo su - root

su - omm

Step 8 Go to the ${BIGDATA_HOME}/om-0.0.1/sbin/ directory.

Run the following command to check whether the resource status of the standby GaussDB is normal:

sh status-oms.sh

In the command output, check whether the following information is displayed in the row where ResName is gaussDB:
10_10_10_231 gaussDB Standby_normal Normal Active_standby

l If yes, go to Step 9.


- If no, go to Step 15.
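The gaussDB row check in Step 8 can be scripted. The sketch below assumes the status-oms.sh output columns match the example line above (ResName in column 2, resource status in column 3, health in column 4); verify the layout against your installation before relying on it.

```shell
# Sketch: decide whether the standby GaussDB resource is normal, given one
# line of "sh status-oms.sh" output whose ResName column is gaussDB.
# Healthy example: "10_10_10_231 gaussDB Standby_normal Normal Active_standby"
gaussdb_standby_ok() {
  # $1: the gaussDB status line
  echo "$1" | awk '$2 == "gaussDB" && $3 == "Standby_normal" && $4 == "Normal" {found=1}
                   END {exit !found}'
}

if gaussdb_standby_ok "10_10_10_231 gaussDB Standby_normal Normal Active_standby"; then
  echo "standby normal: go to Step 9"
else
  echo "standby abnormal: go to Step 15"
fi
```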

Check whether the disk space of the standby node is insufficient.

Step 9 Log in to the standby GaussDB node.

Step 10 Run the following command to switch the user:

sudo su - root

su - omm

Step 11 Run the echo ${BIGDATA_DATA_HOME}/dbdata_om command to obtain the GaussDB data directory.

Step 12 Run the df -h command to check the system disk partition usage.

Step 13 Check whether the disk where the GaussDB data directory is mounted is full.

- If yes, go to Step 14.

- If no, go to Step 15.

Step 14 Contact the O&M personnel of the public cloud to expand the disk capacity. After capacity expansion, wait 2 minutes and check whether the alarm is cleared.

- If yes, no further action is required.

- If no, go to Step 15.
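Steps 11 to 13 amount to checking the usage of the file system that holds the GaussDB data directory. A minimal sketch, assuming a POSIX-compliant df is available (the directory you pass in is your own choice; this guide does not prescribe one):

```shell
# Sketch: print the usage percentage of the file system that holds a
# directory, e.g. the GaussDB data directory ${BIGDATA_DATA_HOME}/dbdata_om.
disk_usage_pct() {
  # $1: directory to inspect; prints the usage percentage as a bare number
  df -P "$1" | awk 'NR == 2 {gsub(/%/, "", $5); print $5}'
}

pct=$(disk_usage_pct /)   # example: check the root file system
echo "usage: ${pct}%"
```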

Step 15 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.23 ALM-12040 Insufficient System Entropy

Description

The system checks the entropy at 00:00:00 every day and performs five consecutive checks each time. First, the system checks whether the rng-tools or haveged tool is enabled and correctly configured. If not, the system checks the current entropy. This alarm is generated if the entropy is less than 500 in all five checks.

This alarm is cleared if the true random number mode is configured, random numbers are configured in the pseudo-random number mode, or neither mode is configured but the entropy is greater than or equal to 500 in at least one of the five checks.


Attribute

Alarm ID: 12040
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

Decryption failures occur and functions related to decryption are affected, for example, DBService installation.

Possible Causes

The haveged service or rngd service is abnormal.

Procedure

Check and manually configure the system entropy.

Step 1 On the MRS Manager portal, click Alarm.

Step 2 View the detailed alarm information to obtain the value of the HostName field in the location information.

Step 3 Log in to the node for which the alarm is generated. Run the sudo su - root command to switch the user.

Step 4 Run the /bin/rpm -qa | grep -w "haveged" command. If the command is executed successfully, run the /sbin/service haveged status | grep "running" command and view the command output.

- If the command is executed successfully, the haveged service is installed, correctly configured, and running properly. Go to Step 8.

- If the command is not executed successfully, the haveged service is not running properly. Go to Step 5.

Step 5 Run the /bin/rpm -qa | grep -w "rng-tools" command. If the command is executed successfully, run the ps -ef | grep -v "grep" | grep rngd | tr -d " " | grep "\-o/dev/random" | grep "\-r/dev/urandom" command and view the command output.


- If the command is executed successfully, the rngd service is installed, correctly configured, and running properly. Go to Step 8.

- If the command is not executed successfully, the rngd service is not running properly. Go to Step 6.

Step 6 Manually configure the system entropy. For details, see Related Information.

Step 7 Wait until 00:00:00 when the system checks the entropy again, and check whether the alarm is cleared automatically.

- If yes, no further action is required.

- If no, go to Step 8.

Step 8 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

Manually check the system entropy.

Log in to the node and run the sudo su - root command to switch the user. Run the cat /proc/sys/kernel/random/entropy_avail command to check whether the system entropy meets the requirement (the entropy must be greater than or equal to 500). If the system entropy is less than 500, you can reset it by using one of the following methods:

- Using the haveged tool (true random number mode): Contact the O&M personnel of the public cloud to install the tool and then start it.

- Using the rng-tools tool (pseudo-random number mode): Contact the O&M personnel of the public cloud to install the tool.
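The manual check above can be wrapped in a small script. This is a sketch of the 500-entropy comparison only; it does not install or start haveged or rng-tools.

```shell
# Sketch: compare an entropy reading against the 500-bit requirement stated
# above. The reading normally comes from /proc/sys/kernel/random/entropy_avail.
entropy_ok() {
  # $1: current entropy value
  [ "$1" -ge 500 ]
}

avail=$(cat /proc/sys/kernel/random/entropy_avail 2>/dev/null || echo 0)
if entropy_ok "$avail"; then
  echo "entropy sufficient ($avail)"
else
  echo "entropy low ($avail): configure haveged or rng-tools"
fi
```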

5.6.24 ALM-13000 ZooKeeper Service Unavailable

Description

The system checks the ZooKeeper service status every 60 seconds. This alarm is generated when the ZooKeeper service is unavailable.

This alarm is cleared when the ZooKeeper service recovers.

Attribute

Alarm ID: 13000
Alarm Severity: Critical
Automatically Cleared: Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

ZooKeeper fails to provide coordination services for upper-layer components, and the components depending on ZooKeeper may not run properly.

Possible Causes

- The ZooKeeper instance is abnormal.
- The disk capacity is insufficient.
- The network is faulty.
- The DNS is installed on the ZooKeeper node.

Procedure

Check the ZooKeeper service instance status.

Step 1 On MRS Manager, choose Service > ZooKeeper > quorumpeer.

Step 2 Check whether the ZooKeeper instances are normal.

- If yes, go to Step 6.
- If no, go to Step 3.

Step 3 Select instances whose status is not good and choose More > Restart Instance.

Step 4 Check whether the instance status is good after restart.

- If yes, go to Step 5.
- If no, go to Step 19.

Step 5 On the Alarm tab, check whether the alarm is cleared.

- If yes, no further action is required.
- If no, go to Step 6.

Check the disk status.

Step 6 On MRS Manager, choose Service > ZooKeeper > quorumpeer, and check the host information of the ZooKeeper instance on each node.

Step 7 On MRS Manager, click Host.


Step 8 In the Disk Usage column, check whether the disk space of each node where ZooKeeper instances are located is insufficient (disk usage exceeds 80%).

- If yes, go to Step 9.
- If no, go to Step 11.

Step 9 Expand disk capacity. For details, see ALM-12017 Insufficient Disk Capacity.

Step 10 On the Alarm tab, check whether the alarm is cleared.

- If yes, no further action is required.
- If no, go to Step 11.

Check the network status.

Step 11 On the Linux node where the ZooKeeper instance is located, run the ping command to check whether the host names of other nodes where the ZooKeeper instance is located can be pinged successfully.

- If yes, go to Step 15.
- If no, go to Step 12.

Step 12 Modify the IP addresses in /etc/hosts and add the host name and IP address mapping.

Step 13 Run the ping command again to check whether the host names of other nodes where the ZooKeeper instance is located can be pinged successfully.

- If yes, go to Step 14.
- If no, go to Step 19.

Step 14 On the Alarm tab, check whether the alarm is cleared.

- If yes, no further action is required.
- If no, go to Step 15.

Check the DNS.

Step 15 Check whether the DNS is installed on the node where the ZooKeeper instance is located. On the Linux node where the ZooKeeper instance is located, run the cat /etc/resolv.conf command to check whether the file is empty.

- If yes, go to Step 16.
- If no, go to Step 19.

Step 16 Run the service named status command to check whether the DNS is started.

- If yes, go to Step 17.
- If no, go to Step 19.

Step 17 Run the service named stop command to stop the DNS service. If "Shutting down name server BIND waiting for named to shut down (28s)" is displayed, the DNS service is stopped successfully. Comment out the content (if any) in /etc/resolv.conf.

Step 18 On the Alarm tab, check whether the alarm is cleared.

- If yes, no further action is required.
- If no, go to Step 20.

Collect fault information.

Step 19 On the MRS Manager portal, choose System > Export Log.


Step 20 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.25 ALM-13001 Available ZooKeeper Connections Are Insufficient

Description

The system checks ZooKeeper connections every 60 seconds. This alarm is generated when the system detects that the number of used ZooKeeper instance connections exceeds the threshold (80% of the maximum number of connections).

This alarm is cleared when the number of used ZooKeeper instance connections is less than the threshold.

Attribute

Alarm ID: 13001
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.


Impact on the System

Available ZooKeeper connections are insufficient. When the connection usage reaches 100%, external connections cannot be handled.

Possible Causes

The number of connections to the ZooKeeper node exceeds the threshold. Connection leakage occurs on some connection processes, or the maximum number of connections does not meet the requirement of the actual scenario.

Procedure

Step 1 Check the connection status.

1. On the MRS Manager portal, choose Alarm > ALM-13001 Available ZooKeeper Connections Are Insufficient > Location. Check the IP address of the node for which the alarm is generated.

2. Obtain the PID of the ZooKeeper process. Log in to the node for which this alarm is generated and run the pgrep -f proc_zookeeper command.

3. Check whether the PID can be successfully obtained.

– If yes, go to Step 1.4.
– If no, go to Step 2.

4. Obtain all the IP addresses connected to the ZooKeeper instance and the number of connections, and check the 10 IP addresses with the most connections. Run the following command based on the obtained PID and IP address: lsof -i | grep $pid | awk '{print $9}' | cut -d : -f 2 | cut -d \> -f 2 | awk '{a[$1]++} END {for(i in a){print i,a[i] | "sort -r -g -k 2"}}' | head -10. ($pid is the PID obtained in the preceding step.)

5. Check whether the node IP addresses and the number of connections are successfully obtained.

– If yes, go to Step 1.6.
– If no, go to Step 2.

6. Obtain the ID of the port connected to the process. Run the following command based on the obtained PID and IP address: lsof -i | grep $pid | awk '{print $9}' | cut -d \> -f 2 | grep $IP | cut -d : -f 2. ($pid and $IP are the PID and IP address obtained in the preceding step.)

7. Check whether the port ID is successfully obtained.

– If yes, go to Step 1.8.
– If no, go to Step 2.

8. Obtain the ID of the connected process. Log in to each IP address and run the following command based on the obtained port ID: lsof -i | grep $port. ($port is the port ID obtained in the preceding step.)

9. Check whether the process ID is successfully obtained.

– If yes, go to Step 1.10.
– If no, go to Step 2.

10. Check whether connection leakage occurs on the process based on the obtained process ID.

– If yes, go to Step 1.11.


– If no, go to Step 2.

11. Close the process where connection leakage occurs and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 2.

12. On the MRS Manager portal, choose Service > ZooKeeper > Service Configuration > All > quorumpeer > Performance and change the value of maxCnxns to 20000 or more.

13. Check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 2.
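The lsof pipeline in step 1.4 is hard to read inline. The sketch below reimplements its counting stage on sample lsof NAME fields, assuming the local ZooKeeper endpoint appears before the -> and the client endpoint after it, as in lsof's usual local->remote format; the sample addresses are illustrative.

```shell
# Sketch: count connections per remote IP from lsof NAME fields such as
# "zk1:2181->10.1.1.2:45678" and print the busiest peers first, which is
# what the pipeline in step 1.4 computes.
top_peers() {
  # reads "local->remote:port" fields on stdin; prints "ip count" lines
  awk -F'->' '{split($2, a, ":"); count[a[1]]++}
              END {for (ip in count) print ip, count[ip]}' | sort -k2 -rn | head -10
}

printf '%s\n' \
  'zk1:2181->10.1.1.2:45678' \
  'zk1:2181->10.1.1.2:45679' \
  'zk1:2181->10.1.1.4:52000' | top_peers
```

On the sample input, the peer with two connections is listed first.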

Step 2 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.26 ALM-13002 ZooKeeper Memory Usage Exceeds the Threshold

Description

The system checks the memory usage of the ZooKeeper service every 30 seconds. This alarm is generated when the memory usage of a ZooKeeper instance exceeds the threshold (80% of the maximum memory).

The alarm is cleared when the memory usage is less than the threshold.

Attribute

Alarm ID: 13002
Alarm Severity: Major
Automatically Cleared: Yes


Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

If the available memory for the ZooKeeper service is insufficient, a memory overflow occurs and the service breaks down.

Possible Causes

The ZooKeeper instance on the node uses too much memory, or the memory is inappropriately allocated.

Procedure

Step 1 Check the memory usage.

1. On the MRS Manager portal, choose Alarm > ALM-13002 ZooKeeper Memory Usage Exceeds the Threshold > Location. Check the IP address of the instance for which the alarm is generated.

2. On the MRS Manager portal, choose Service > ZooKeeper > Instance > quorumpeer (the IP address checked) > Customize > Heap and Direct Memory of ZooKeeper. Check the heap usage.

3. Check whether the used heap memory of ZooKeeper reaches 80% of the maximum heap memory specified for ZooKeeper.

– If yes, go to Step 1.4.
– If no, go to Step 1.6.

4. On the MRS Manager portal, choose Service > ZooKeeper > Service Configuration > All > quorumpeer > System. Increase the value of -Xmx in GC_OPTS as required.

5. Check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 1.6.

6. On the MRS Manager portal, choose Service > ZooKeeper > Instances > quorumpeer (the IP address checked) > Customize > Heap and Direct Memory of ZooKeeper. Check the direct buffer memory usage.


7. Check whether the used direct buffer memory of ZooKeeper reaches 80% of the maximum direct buffer memory specified for ZooKeeper.

– If yes, go to Step 1.8.
– If no, go to Step 2.

8. On the MRS Manager portal, choose Service > ZooKeeper > Service Configuration > All > quorumpeer > System. Increase the value of -XX:MaxDirectMemorySize in GC_OPTS as required.

9. Check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 2.
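The 80% comparisons in steps 1.3 and 1.7 are plain arithmetic. A sketch with illustrative numbers (not real cluster values):

```shell
# Sketch: the 80% memory check as integer arithmetic, avoiding floating
# point. Values are in MB and are illustrative only.
heap_above_threshold() {
  # $1: used heap, $2: maximum heap, $3: threshold percentage
  [ $(( $1 * 100 )) -ge $(( $2 * $3 )) ]
}

if heap_above_threshold 3300 4096 80; then
  echo "heap above threshold: increase -Xmx in GC_OPTS"
fi
```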

Step 2 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.27 ALM-14000 HDFS Service Unavailable

Description

The system checks the service status of NameService every 60 seconds. This alarm is generated when the HDFS service is unavailable because all NameService services are abnormal.

This alarm is cleared when the HDFS service recovers because at least one NameService service is in normal state.

Attribute

Alarm ID: 14000
Alarm Severity: Critical
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.


RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Impact on the System

HDFS fails to provide services for HDFS service-based upper-layer components, such as HBase and MapReduce. As a result, users cannot read or write files.

Possible Causes

- The ZooKeeper service is abnormal.
- All NameService services are abnormal.

Procedure

Step 1 Check the ZooKeeper service status.

1. Log in to MRS Manager, choose Service, and check whether the health status of the ZooKeeper service is Good.

– If yes, go to Step 1.2.
– If no, go to Step 2.1.

2. Rectify the health status of the ZooKeeper service. For details, see ALM-13000 ZooKeeper Service Unavailable. Then check whether the health status of the ZooKeeper service is Good.

– If yes, go to Step 1.3.
– If no, go to Step 3.

3. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Handle the NameService service exception alarm.

1. Log in to MRS Manager. On the Alarm page, check whether all NameService services have abnormal alarms.

– If yes, go to Step 2.2.
– If no, go to Step 3.

2. See ALM-14010 NameService Service Is Abnormal to handle the abnormal NameService services and check whether each NameService service exception alarm is cleared.

– If yes, go to Step 2.3.
– If no, go to Step 3.

3. Wait 5 minutes and check whether the alarm is cleared.


– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.28 ALM-14001 HDFS Disk Usage Exceeds the Threshold

Description

The system checks the disk usage of the HDFS cluster every 30 seconds and compares the actual disk usage with the threshold. The HDFS cluster disk usage indicator has a default threshold. This alarm is generated when the HDFS disk usage exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > HDFS.

This alarm is cleared when the disk usage of the HDFS cluster is less than or equal to the threshold.

Attribute

Alarm ID: 14001
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

NSName: Specifies the NameService service for which the alarm is generated.


Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The performance of writing data to HDFS is affected.

Possible Causes

The disk space configured for the HDFS cluster is insufficient.

Procedure

Step 1 Check the disk capacity and delete unnecessary files.

1. On the MRS Manager portal, choose Service > HDFS. The Service Status page is displayed.

2. In the Real-Time Statistics area, view the value of the monitoring indicator Percentage of HDFS Capacity to check whether the HDFS disk usage exceeds the threshold (80% by default).

– If yes, go to Step 1.3.
– If no, go to Step 3.

3. Use the client on the cluster node and run the hdfs dfsadmin -report command to check whether the value of DFS Used% is less than 100% minus the threshold.

– If yes, go to Step 1.5.
– If no, go to Step 3.

4. Use the client on the cluster node and run the hdfs dfs -rm -r file or directory command to delete unnecessary files.

5. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 2.1.
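The DFS Used% figure from step 1.3 can be extracted mechanically. The sample report text below only mimics hdfs dfsadmin -report output; verify the exact line format against your Hadoop version.

```shell
# Sketch: extract the cluster-wide "DFS Used%" value from hdfs dfsadmin
# -report style output. The field layout is an assumption, not a guarantee.
dfs_used_pct() {
  # reads report text on stdin, prints the first DFS Used% value
  awk -F': ' '/^DFS Used%/ {gsub(/%/, "", $2); print $2; exit}'
}

report='Configured Capacity: 1000000000 (953.67 MB)
DFS Used: 850000000 (810.62 MB)
DFS Used%: 85.00%'
echo "$report" | dfs_used_pct
```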

Step 2 Expand the system.

1. Expand the disk capacity.
2. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44


International: +800 44556600

----End

Related Information

N/A

5.6.29 ALM-14002 DataNode Disk Usage Exceeds the Threshold

Description

The system checks the DataNode disk usage every 30 seconds and compares the actual disk usage with the threshold. The Percentage of DataNode Capacity indicator has a default threshold. This alarm is generated when the value of the Percentage of DataNode Capacity indicator exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > HDFS.

This alarm is cleared when the value of the Percentage of DataNode Capacity indicator is less than or equal to the threshold.

Attribute

Alarm ID: 14002
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The performance of writing data to HDFS is affected.


Possible Causes

- The disk space configured for the HDFS cluster is insufficient.
- Data skew occurs among DataNodes.

Procedure

Step 1 Check the cluster disk capacity.

1. Log in to MRS Manager. On the Alarm page, check whether the ALM-14001 HDFS Disk Usage Exceeds the Threshold alarm exists.

– If yes, go to Step 1.2.
– If no, go to Step 2.1.

2. Follow the procedure in ALM-14001 HDFS Disk Usage Exceeds the Threshold to handle the alarm and check whether the alarm is cleared.

– If yes, go to Step 1.3.
– If no, go to Step 3.

3. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check the balance status of DataNodes.

1. Use the client on the cluster node, run the hdfs dfsadmin -report command to view the value of DFS Used% on the DataNode for which the alarm is generated, and compare the value with those on other DataNodes. Check whether the difference between the values is greater than 10.

– If yes, go to Step 2.2.
– If no, go to Step 3.

2. If data skew occurs, use the client on the cluster node and run the hdfs balancer -threshold 10 command.

3. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.
– If no, go to Step 3.
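Step 2.1's skew test (a spread of more than 10 between per-DataNode DFS Used% values) can be sketched as follows; the percentages are illustrative, not from a real cluster.

```shell
# Sketch: flag data skew when the spread of per-DataNode "DFS Used%" values
# exceeds 10, which is the condition for running hdfs balancer -threshold 10.
skew_detected() {
  # args: per-DataNode usage percentages (integers)
  echo "$@" | tr ' ' '\n' | sort -n | awk 'NR == 1 {min = $1} {max = $1}
    END {exit !(max - min > 10)}'
}

if skew_detected 62 71 84; then
  echo "skew detected: run hdfs balancer -threshold 10"
fi
```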

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A


5.6.30 ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold

Description

The system checks the number of lost blocks every 30 seconds and compares the number of lost blocks with the threshold. The lost blocks indicator has a default threshold. This alarm is generated when the number of lost blocks exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > HDFS.

This alarm is cleared when the number of lost blocks is less than or equal to the threshold.

Attribute

Alarm ID: 14003
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

NSName: Specifies the NameService service for which the alarm is generated.

Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Data stored in HDFS is lost. HDFS may enter the safe mode and cannot provide write services. Lost block data cannot be restored.

Possible Causes

- The DataNode instance is abnormal.
- Data is deleted.


Procedure

Step 1 Check the DataNode instance.

1. On the MRS Manager portal, choose Service > HDFS > Instance.

2. Check whether the status of all DataNode instances is Good.

– If yes, go to Step 3.

– If no, go to Step 1.3.

3. Restart the DataNode instance and check whether the DataNode instance restarts successfully.

– If yes, go to Step 2.2.

– If no, go to Step 2.1.

Step 2 Delete the damaged file.

1. Use the client on the cluster node. Run the hdfs fsck / -delete command to delete the lost file. Then rewrite the file and recover the data.

2. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 3.
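Before deleting anything with hdfs fsck / -delete, the fsck summary can be parsed for the lost-block count. The summary lines below only mimic fsck's report format; check them against your Hadoop version's actual output.

```shell
# Sketch: pull the "Missing blocks" count out of an "hdfs fsck /" summary.
# The sample summary text is illustrative; the exact labels and spacing of
# real fsck output are assumptions to verify.
fsck_missing_blocks() {
  # reads fsck summary text on stdin, prints the Missing blocks count
  awk -F': *' '/Missing blocks:/ {print $2; exit}'
}

summary='Total blocks (validated): 1024
Missing blocks: 3
Corrupt blocks: 0'
echo "$summary" | fsck_missing_blocks
```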

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.31 ALM-14004 Number of Damaged HDFS Blocks Exceeds the Threshold

Description

The system checks the number of damaged blocks every 30 seconds and compares the number of damaged blocks with the threshold. The damaged blocks indicator has a default threshold. This alarm is generated when the number of damaged blocks exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > HDFS.

This alarm is cleared when the number of damaged blocks is less than or equal to the threshold.


Attribute

Alarm ID: 14004
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

NSName: Specifies the NameService service for which the alarm is generated.

Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Data is damaged and HDFS fails to read files.

Possible Causes

- The DataNode instance is abnormal.
- Data verification information is damaged.

Procedure

Step 1 Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A


5.6.32 ALM-14006 Number of HDFS Files Exceeds the Threshold

Description

The system checks the number of HDFS files every 30 seconds and compares the number of HDFS files with the threshold. This alarm is generated when the system detects that the number of HDFS files exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > HDFS.

This alarm is cleared when the number of HDFS files is less than or equal to the threshold.

Attribute

Alarm ID: 14006
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

NSName: Specifies the NameService service for which the alarm is generated.

Trigger Condition: Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Disk storage space is insufficient, which may result in data import failures. The performance of the HDFS system is affected.

Possible Causes

The number of HDFS files exceeds the threshold.

Procedure

Step 1 Check whether unnecessary files exist in the system.


1. Use the client on the cluster node and run the hdfs dfs -ls file or directory command to check whether the files in the directory can be deleted.

– If yes, go to Step 1.2.
– If no, go to Step 2.1.

2. Run the hdfs dfs -rm -r file or directory command to delete unnecessary files, wait 5 minutes, and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.1.
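For the check above, the current file count can also be read from the client instead of the monitoring page. A minimal sketch, assuming the standard four-column output of hdfs dfs -count (directory count, file count, content size, path); verify the format against your Hadoop version:

```shell
# Sketch: read the current HDFS file count for comparison with the alarm
# threshold. Assumes the usual four-column "hdfs dfs -count" output:
# DIR_COUNT FILE_COUNT CONTENT_SIZE PATH
count_hdfs_files() {
  awk '{print $2}'
}

# On a cluster node you would run:
#   hdfs dfs -count / | count_hdfs_files
```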

Step 2 Check the number of files in the system.

1. On the MRS Manager portal, choose System > Configure Alarm Threshold.
2. In the navigation tree on the left, choose Service > HDFS > HDFS File > Total Number of Files.
3. In the right pane, modify the threshold in the rule based on the number of current HDFS files.
   To check the number of HDFS files, choose Service > HDFS, click Customize in the Real-Time Statistics area on the right, and select the HDFS File monitoring item.
4. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.33 ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold

Description

The system checks the HDFS NameNode memory usage every 30 seconds and compares the actual memory usage with the threshold. The HDFS NameNode memory usage has a default threshold. This alarm is generated when the HDFS NameNode memory usage exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > HDFS.

This alarm is cleared when the HDFS NameNode memory usage is less than or equal to the threshold.


Attribute

Alarm ID Alarm Severity Automatically Cleared

14007 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The HDFS NameNode memory usage is too high, which affects the data read/write performance of HDFS.

Possible Causes

The HDFS NameNode memory is insufficient.

Procedure

Step 1 Delete unnecessary files.

1. Use the client on the cluster node and run the hdfs dfs -rm -r file or directory command to delete unnecessary files.

2. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44


International: +800 44556600

----End

Related Information

N/A

5.6.34 ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold

Description

The system checks the HDFS DataNode memory usage every 30 seconds and compares the actual memory usage with the threshold. The HDFS DataNode memory usage has a default threshold. This alarm is generated when the HDFS DataNode memory usage exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > HDFS.

This alarm is cleared when the HDFS DataNode memory usage is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14008 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The HDFS DataNode memory usage is too high, which affects the data read/write performance of HDFS.


Possible Causes

The HDFS DataNode memory is insufficient.

Procedure

Step 1 Delete unnecessary files.

1. Use the client on the cluster node and run the hdfs dfs -rm -r file or directory command to delete unnecessary files.

2. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.35 ALM-14009 Number of Faulty DataNodes Exceeds the Threshold

Description

The system checks the number of faulty DataNodes in the HDFS cluster every 30 seconds and compares the number with the threshold. The number of faulty DataNodes has a default threshold. This alarm is generated when the number of faulty DataNodes in the HDFS cluster exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > HDFS.

This alarm is cleared when the number of faulty DataNodes in the HDFS cluster is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14009 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Faulty DataNodes cannot provide HDFS services.

Possible Causes

• DataNodes are faulty or overloaded.
• The network between the NameNode and the DataNode is disconnected or busy.
• NameNodes are overloaded.

Procedure

Step 1 Check whether DataNodes are faulty.

1. Use the client on the cluster node and run the hdfs dfsadmin -report command to check whether DataNodes are faulty.
   – If yes, go to Step 1.2.
   – If no, go to Step 2.1.

2. On the MRS Manager portal, choose Service > HDFS > Instance to check whether the DataNode is stopped.
   – If yes, go to Step 1.3.
   – If no, go to Step 2.1.

3. Select the DataNode instance, and choose More > Restart Instance to restart it. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.1.
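Step 1.1 above can be scripted. A sketch that pulls the dead-DataNode count out of the hdfs dfsadmin -report summary; the exact wording of that summary line varies between Hadoop releases, so treat the pattern as an assumption to verify against your cluster's output:

```shell
# Sketch: extract the number of dead DataNodes from "hdfs dfsadmin -report".
# Assumes a summary line of the form "... (N total, M dead)"; check the
# wording on your Hadoop version before relying on it.
dead_datanode_count() {
  sed -n 's/.*(\([0-9][0-9]*\) total, \([0-9][0-9]*\) dead).*/\2/p'
}

# On a cluster node:
#   hdfs dfsadmin -report | dead_datanode_count
```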

Step 2 Check the status of the network between the NameNode and the DataNode.

1. Log in to the faulty DataNode by using the service IP address of the node, and run the ping IP address of the NameNode command to check whether the network between the DataNode and the NameNode is abnormal.
   – If yes, go to Step 2.2.
   – If no, go to Step 3.1.
2. Rectify the network fault. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.1.

Step 3 Check whether the DataNode is overloaded.

1. On the MRS Manager portal, click Alarm and check whether the alarm ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold exists.

– If yes, go to Step 3.2.

– If no, go to Step 4.1.

2. Follow the procedures in ALM-14008 HDFS DataNode Memory Usage Exceeds the Threshold to handle the alarm and check whether the alarm is cleared.

– If yes, go to Step 3.3.

– If no, go to Step 4.1.

3. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 4.1.

Step 4 Check whether the NameNode is overloaded.

1. On the MRS Manager portal, click Alarm and check whether the alarm ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold exists.

– If yes, go to Step 4.2.

– If no, go to Step 5.

2. Follow the procedures in ALM-14007 HDFS NameNode Memory Usage Exceeds the Threshold to handle the alarm and check whether the alarm is cleared.

– If yes, go to Step 4.3.

– If no, go to Step 5.

3. Wait 5 minutes and check whether the alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 5.

Step 5 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A


5.6.36 ALM-14010 NameService Service Is Abnormal

Description

The system checks the NameService service status every 180 seconds. This alarm is generated when the NameService service is unavailable.

This alarm is cleared when the NameService service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14010 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

NSName Specifies the name of the NameService for which the alarm is generated.

Impact on the System

HDFS fails to provide services for upper-layer components based on the NameService service, such as HBase and MapReduce. As a result, users cannot read or write files.

Possible Causes

• The JournalNode is faulty.
• The DataNode is faulty.
• The disk capacity is insufficient.
• The NameNode enters safe mode.

Procedure

Step 1 Check JournalNode instance status.

1. On the MRS Manager portal, click Service.
2. Click HDFS.


3. Click Instance.
4. Check whether the Health Status of the JournalNode is Good.
   – If yes, go to Step 2.1.
   – If no, go to Step 1.5.
5. Select the faulty JournalNode, and choose More > Restart Instance. Check whether the JournalNode successfully restarts.
   – If yes, go to Step 1.6.
   – If no, go to Step 5.
6. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.1.

Step 2 Check DataNode instance status.

1. On the MRS Manager portal, click Service.
2. Click HDFS.
3. In Operation and Health Summary, check whether the Health Status of all DataNodes is Good.
   – If yes, go to Step 3.1.
   – If no, go to Step 2.4.
4. Click Instance. On the DataNode management page, select the faulty DataNode, and choose More > Restart Instance. Check whether the DataNode successfully restarts.
   – If yes, go to Step 2.5.
   – If no, go to Step 3.1.
5. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 4.1.

Step 3 Check disk status.

1. On the MRS Manager portal, click Host.
2. In the Disk Usage column, check whether disk space is insufficient.
   – If yes, go to Step 3.3.
   – If no, go to Step 4.1.
3. Expand the disk capacity.
4. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 4.1.

Step 4 Check whether NameNode is in safe mode.

1. Use the client on the cluster node, and run the hdfs dfsadmin -safemode get command to check whether Safe mode is ON is displayed.
   The information that follows Safe mode is ON is alarm information and is displayed based on actual conditions.
   – If yes, go to Step 4.2.
   – If no, go to Step 5.
2. Use the client on the cluster node and run the hdfs dfsadmin -safemode leave command.
3. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 5.
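The safe-mode check and exit in Step 4 can be combined into one guarded command. A sketch; it only assumes the "Safe mode is ON" phrase quoted above, and ignores any block counts that follow it on the same line:

```shell
# Sketch: decide whether the NameNode reports safe mode as ON, based on
# the output of "hdfs dfsadmin -safemode get" described in Step 4.
safemode_is_on() {
  grep -q 'Safe mode is ON'
}

# On a cluster node:
#   if hdfs dfsadmin -safemode get | safemode_is_on; then
#     hdfs dfsadmin -safemode leave
#   fi
```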

Step 5 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.37 ALM-14011 HDFS DataNode Data Directory Is Not Configured Properly

Description

The DataNode parameter dfs.datanode.data.dir specifies DataNode data directories. This alarm is generated when a configured data directory cannot be created, a data directory uses the same disk as other critical directories in the system, or multiple directories use the same disk.

This alarm is cleared when the DataNode data directory is configured properly and this DataNode is restarted.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14011 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.


HostName Specifies the host for which the alarm is generated.

Impact on the System

If the DataNode data directory is mounted to a critical directory such as the root directory, the disk space of the root directory will be used up after the system runs for a long time, causing a system fault.

If the DataNode data directory is not configured properly, HDFS performance will deteriorate.

Possible Causes

• The DataNode data directory fails to be created.
• The DataNode data directory uses the same disk as critical directories, such as / or /boot.
• Multiple directories in the DataNode data directory use the same disk.

Procedure

Step 1 Check the alarm cause and information about the DataNode for which the alarm is generated.

1. On the MRS Manager portal, choose Alarm. In the alarm list, click the alarm.
2. In the Alarm Details area, view Alarm Cause. In HostName of Location, obtain the host name of the DataNode for which the alarm is generated.

Step 2 Delete directories that do not comply with the disk plan from the DataNode data directory.

1. Choose Service > HDFS > Instance. In the instance list, click the DataNode instance on the node for which the alarm is generated.
2. Click Instance Configuration and view the value of the DataNode parameter dfs.datanode.data.dir.
3. Check whether all DataNode data directories are consistent with the disk plan.
   – If yes, go to Step 2.4.
   – If no, go to Step 2.7.
4. Modify the DataNode parameter dfs.datanode.data.dir and delete the incorrect directories.
5. Choose Service > HDFS > Instance and restart the DataNode instance.
6. Check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.7.
7. Log in to the DataNode for which the alarm is generated.
   – If the alarm cause is "The DataNode data directory fails to be created", go to Step 3.1.
   – If the alarm cause is "The DataNode data directory uses the same disk as critical directories, such as / or /boot", go to Step 4.1.
   – If the alarm cause is "Multiple directories in the DataNode data directory use the same disk", go to Step 5.1.

Step 3 Check whether the DataNode data directory fails to be created.

1. Run the following commands to switch the user:
   sudo su - root
   su - omm
2. Run the ls command to check whether the directories exist in the DataNode data directory.
   – If yes, go to Step 7.
   – If no, go to Step 3.3.
3. Run the mkdir data directory command to create the directory and check whether the directory can be successfully created.
   – If yes, go to Step 6.1.
   – If no, go to Step 3.4.
4. On the MRS Manager portal, click Alarm to check whether the alarm ALM-12017 Insufficient Disk Capacity exists.
   – If yes, go to Step 3.5.
   – If no, go to Step 3.6.
5. Adjust the disk capacity and check whether the alarm ALM-12017 Insufficient Disk Capacity is cleared. For details, see ALM-12017 Insufficient Disk Capacity.
   – If yes, go to Step 3.3.
   – If no, go to Step 7.
6. Check whether user omm has the rwx or x permission on all the upper-layer directories of the directory. (For example, for /tmp/abc/, user omm has the x permission on the tmp directory and the rwx permission on the abc directory.)
   – If yes, go to Step 6.1.
   – If no, go to Step 3.7.
7. Run the chmod u+rwx path or chmod u+x path command as user root to assign the rwx or x permission of these directories to user omm. Go to Step 3.3.
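The permission check in Step 3.6 can be automated. A sketch that walks the ancestors of a data directory and reports the first missing permission; run it after switching to user omm as in Step 3.1 (the example path is illustrative):

```shell
# Sketch of the Step 3.6 check: every ancestor of a DataNode data directory
# must be traversable (x) and the directory itself must be readable,
# writable, and traversable (rwx) for the current user.
check_path_access() {
  dir="$1"
  parent=$(dirname "$dir")
  while [ "$parent" != "/" ] && [ "$parent" != "." ]; do
    [ -x "$parent" ] || { echo "missing x permission on $parent"; return 1; }
    parent=$(dirname "$parent")
  done
  if [ -r "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
    echo "ok"
  else
    echo "missing rwx permission on $dir"
    return 1
  fi
}

# Example: check_path_access /srv/BigData/hadoop/data1   (path illustrative)
# Fix reported failures as user root: chmod u+x <ancestor> or chmod u+rwx <dir>
```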

Step 4 Check whether the DataNode data directory uses the same disk as other critical directories in the system.

1. Run the df command to obtain the disk mounting information of each directory in the DataNode data directory.
2. Check whether the directories mounted to the disk are critical directories, such as / or /boot.
   – If yes, go to Step 4.3.
   – If no, go to Step 6.1.
3. Change the value of the DataNode parameter dfs.datanode.data.dir and delete the directories that use the same disk as critical directories.
4. Go to Step 6.1.
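Steps 4.1 and 4.2 can be condensed into a single lookup of the mount point backing each data directory. A sketch using the POSIX df output format (mount point in the sixth field of the second line); the example path is illustrative:

```shell
# Sketch for Step 4: print the filesystem mount point backing a directory,
# taken from the same df output the procedure inspects (df -P: the mount
# point is the sixth field of the second output line).
mount_point() {
  df -P "$1" | awk 'NR==2 {print $6}'
}

# A data directory is misplaced if its mount point is a critical one:
#   mp=$(mount_point /srv/BigData/hadoop/data1)   # path illustrative
#   case "$mp" in /|/boot) echo "shares a disk with a critical directory";; esac
```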

Step 5 Check whether multiple directories in the DataNode data directory use the same disk.

1. Run the df command to obtain the disk mounting information of each directory in the DataNode data directory. Record the mounted directory in the command output.
2. Modify the DataNode parameter dfs.datanode.data.dir to reserve only one directory among the directories that are mounted to the same disk.
3. Go to Step 6.1.
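Step 5.1 can be automated: resolve each configured directory to its mount point and report the mount points that back more than one directory. A sketch (the example paths are illustrative, not your actual dfs.datanode.data.dir value):

```shell
# Sketch for Step 5: given one configured DataNode data directory per line
# on stdin, print every mount point that backs more than one of them, i.e.
# the duplicates dfs.datanode.data.dir should be trimmed down to one of.
dup_mounts() {
  while IFS= read -r d; do
    df -P "$d" | awk 'NR==2 {print $6}'
  done | sort | uniq -d
}

# Example (paths illustrative):
#   printf '%s\n' /srv/BigData/hadoop/data1 /srv/BigData/hadoop/data2 | dup_mounts
```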

Step 6 Restart the DataNode and check whether the alarm is cleared.

1. On the MRS Manager portal, choose Service > HDFS > Instance and restart the DataNode instance.
2. Check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 7.

Step 7 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.38 ALM-14012 HDFS JournalNode Data Is Not Synchronized

Description

On the active NameNode, the system checks data synchronization on all JournalNodes in the cluster every 5 minutes. This alarm is generated when the data on a JournalNode is not synchronized with the data on the other JournalNodes.

This alarm is cleared within 5 minutes after the data on the JournalNodes is synchronized.

Attribute

Alarm ID Alarm Severity Automatically Cleared

14012 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.


IP Specifies the service IP address of the JournalNode instance for which the alarm is generated.

Impact on the System

When a JournalNode is working incorrectly, the data on the node is not synchronized with that on the other JournalNodes. If data on more than half of the JournalNodes is not synchronized, the NameNode cannot work correctly, making the HDFS service unavailable.

Possible Causes

• The JournalNode instance has not been started or has been stopped.
• The JournalNode instance is working incorrectly.
• The network of the JournalNode is unreachable.

Procedure

Step 1 Check whether the JournalNode instance has been started up.

1. Log in to MRS Manager, and click Alarm. In the alarm list, click the alarm.
2. In the Alarm Details area, check Location and obtain the IP address of the JournalNode for which the alarm is generated.
3. Choose Service > HDFS > Instance. In the instance list, click the JournalNode for which the alarm is generated and check whether Operating Status of the node is Started.
   – If yes, go to Step 2.1.
   – If no, go to Step 1.4.
4. Select the JournalNode instance and choose More > Start Instance to start the instance.
5. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 4.

Step 2 Check whether the JournalNode instance is working correctly.

1. Check whether Health Status of the JournalNode instance is Good.
   – If yes, go to Step 3.1.
   – If no, go to Step 2.2.
2. Select the JournalNode instance and choose More > Restart Instance to restart the instance.
3. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 4.

Step 3 Check whether the network of the JournalNode is reachable.

1. On the MRS Manager portal, choose Service > HDFS > Instance to check the service IP address of the active NameNode.
2. Log in to the active NameNode.
3. Run the ping command to check whether a timeout occurs or the network is unreachable between the active NameNode and the JournalNode.
   ping Service IP address of the JournalNode
   – If yes, go to Step 3.4.
   – If no, go to Step 4.
4. Contact O&M personnel of the public cloud to rectify the network fault. Wait 5 minutes and check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 4.

Step 4 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.39 ALM-16000 Percentage of Sessions Connected to the HiveServer to the Maximum Number Allowed Exceeds the Threshold

Description

The system checks the percentage of sessions connected to the HiveServer to the maximum number allowed every 30 seconds. This indicator can be viewed on the Hive service monitoring page. This alarm is generated when the percentage of sessions connected to the HiveServer to the maximum number allowed exceeds the specified threshold (90% by default).

To change the threshold, choose System > Configure Alarm Threshold > Service > Hive > Percentage of Sessions Connected to the HiveServer to Maximum Number of Sessions Allowed by the HiveServer.

This alarm can be automatically cleared when the percentage is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

16000 Major Yes


Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

If a connection alarm is generated, too many sessions are connected to the HiveServer and new connections cannot be created.

Possible Causes

Too many clients are connected to the HiveServer.

Procedure

Step 1 Increase the maximum number of connections to Hive.

1. Log in to the MRS Manager portal.
2. Choose Service > Hive > Service Configuration, and set Type to All.
3. Increase the value of the hive.server.session.control.maxconnections configuration item. Suppose the value of the configuration item is A, the threshold is B, and the number of sessions connected to the HiveServer is C. Adjust the value of the configuration item so that A x B > C. The number of sessions connected to the HiveServer can be viewed on the Hive service monitoring page.
4. Check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.
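The sizing rule A x B > C in Step 1.3 translates into a concrete minimum for the configuration item. A sketch of the arithmetic, with the threshold expressed as a percentage; the numbers in the example are illustrative, not taken from a real cluster:

```shell
# Sketch: smallest hive.server.session.control.maxconnections value (A)
# satisfying A x B > C, where B is the alarm threshold as a percentage
# (90 for the default 90%) and C is the observed HiveServer session count.
# floor(100*C/B) + 1 is the least integer strictly greater than 100*C/B.
min_maxconnections() {
  sessions="$1"; threshold_pct="$2"
  echo $(( sessions * 100 / threshold_pct + 1 ))
}

# Example: 45 sessions at the default 90% threshold
#   min_maxconnections 45 90   # prints 51, and 51 x 0.9 = 45.9 > 45
```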

Step 2 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End


Related Information

N/A

5.6.40 ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold

Description

The system checks the Hive data warehouse space usage every 30 seconds. The indicator "Percentage of HDFS Space Used by Hive to the Available Space" can be viewed on the Hive service monitoring page. This alarm is generated when the Hive warehouse space usage exceeds the specified threshold (85% by default).

To change the threshold, choose System > Configure Alarm Threshold > Service > Hive > Percentage of HDFS Space Used by Hive to the Available Space.

This alarm is cleared when the Hive warehouse space usage is less than or equal to the threshold. You can reduce the warehouse space usage by expanding the warehouse capacity or releasing the used space.

Attribute

Alarm ID Alarm Severity Automatically Cleared

16001 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The system fails to write data, which causes data loss.


Possible Causes

• The upper limit of the HDFS capacity available for Hive is too small.
• The system disk space is insufficient.
• Some data nodes break down.

Procedure

Step 1 Expand the system configuration.

1. Analyze the cluster HDFS capacity usage and increase the upper limit of the HDFS capacity available for Hive.
   Log in to MRS Manager, choose Service > Hive > Service Configuration, and set Type to All. Increase the value of the hive.metastore.warehouse.size.percent configuration item. Suppose the value of the configuration item is A, the total HDFS storage space is B, the threshold is C, and the HDFS space used by Hive is D. Adjust the value of the configuration item so that A x B x C > D. The total HDFS storage space can be viewed on the HDFS monitoring page, and the HDFS space used by Hive can be viewed on the Hive service monitoring page.
2. Check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 2.1.
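The rule A x B x C > D in Step 1.1 can likewise be turned into a concrete minimum. A sketch; it assumes the configuration item and the threshold are both expressed as percentages (an assumption to verify against your cluster), and the example figures are illustrative:

```shell
# Sketch: smallest hive.metastore.warehouse.size.percent value (A)
# satisfying A x B x C > D, where B is the total HDFS storage space,
# C the alarm threshold, and D the HDFS space already used by Hive.
# A and C are treated as percentages; B and D just need a shared unit
# (e.g. GB).
min_warehouse_percent() {
  total="$1"; used="$2"; threshold_pct="$3"
  echo $(( used * 10000 / (total * threshold_pct) + 1 ))
}

# Example: 1000 GB total HDFS space, 600 GB used by Hive, 85% threshold
#   min_warehouse_percent 1000 600 85   # prints 71; 0.71 x 1000 x 0.85 > 600
```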

Step 2 Expand the system capacity.

1. Add nodes.
2. Check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 3.1.

Step 3 Check whether data nodes are in normal state.

1. Log in to MRS Manager and click Alarm.
2. Check whether the alarm ALM-12006 Node Fault, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold exists.
   – If yes, go to Step 3.3.
   – If no, go to Step 4.
3. Follow the procedures in ALM-12006 Node Fault, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold to handle the alarm.
4. Check whether the alarm is cleared.
   – If yes, no further action is required.
   – If no, go to Step 4.

Step 4 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44


International: +800 44556600

----End

Related Information

N/A

5.6.41 ALM-16002 Hive SQL Execution Success Rate Is Lower Than the Threshold

Description

The system checks the percentage of HiveQL statements that are executed successfully every 30 seconds. Percentage of HiveQL statements that are executed successfully = Number of HiveQL statements that are executed successfully by Hive in a specified period/Total number of HiveQL statements that are executed by Hive. This indicator can be viewed on the Hive service monitoring page. This alarm is generated when the percentage of HiveQL statements that are executed successfully is lower than the specified threshold (90% by default). The name of the host where the alarm is generated can be obtained from the location information of the alarm. The host IP address is the IP address of the HiveServer node.

To change the threshold, choose System > Configure Alarm Threshold > Service > Hive > Percentage of HQL Statements Successfully Executed by Hive.

This alarm is cleared when the percentage of HiveQL statements that are executed successfully in a test period is greater than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

16002 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger Condition Generates an alarm when the actual indicator value exceeds the specified threshold.


Impact on the System

The system configuration and performance cannot meet service processing requirements.

Possible Causes

• A syntax error occurs in HiveQL commands.
• The HBase service is abnormal when a Hive on HBase task is being performed.
• The Spark service is abnormal when a Hive on Spark task is being performed.
• Basic services that Hive depends on are abnormal, such as HDFS, Yarn, and ZooKeeper.

Procedure

Step 1 Check whether the HiveQL commands comply with syntax.

1. Use the Hive client to log in to the HiveServer node where the alarm is generated. Query the HiveQL syntax standard provided by Apache, and check whether the HiveQL commands are correct. For details, see https://cwiki.apache.org/confluence/display/hive/languagemanual.
– If yes, go to Step 2.1.
– If no, go to Step 1.2.

NOTE

To view the user who runs an incorrect statement, download the HiveServerAudit logs of the HiveServer node where this alarm is generated. Set Start Time and End Time to 10 minutes before and after the alarm generation time respectively. Open the log file and search for the Result=FAIL keyword to filter the log information about the incorrect statement, and then view the user who runs the incorrect statement according to UserName in the log information.

2. Enter correct HiveQL statements, and check whether the command can be properly executed.
– If yes, go to Step 4.5.
– If no, go to Step 2.1.
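The NOTE above can be sketched with a sample log. The file name and line layout below are hypothetical; only the Result=FAIL keyword and the UserName field come from the text:

```shell
# Create a stand-in for a downloaded HiveServerAudit log (layout is invented).
cat > hiveserver-audit.log <<'EOF'
2017-02-20 10:01:05 UserName=admin Statement="SELECT 1" Result=SUCCESS
2017-02-20 10:02:11 UserName=tester Statement="SELEC 1" Result=FAIL
EOF

# Keep only failed statements, then pull out the user who ran each one.
grep 'Result=FAIL' hiveserver-audit.log | grep -o 'UserName=[^ ]*'
```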

Step 2 Check whether the HBase service is abnormal.

1. Check whether a Hive on HBase task is performed.
– If yes, go to Step 2.2.
– If no, go to Step 3.1.

2. Check whether the HBase service is in normal state in the service list.
– If yes, go to Step 3.1.
– If no, go to Step 2.3.

3. Check the alarms displayed on the alarm page and clear them according to Alarm Help.
4. Enter correct HiveQL statements, and check whether the command can be properly executed.
– If yes, go to Step 4.5.
– If no, go to Step 3.1.

Step 3 Check whether the Spark service is abnormal.

1. Check whether a Hive on Spark task is performed.

– If yes, go to Step 3.2.
– If no, go to Step 4.1.

2. Check whether the Spark service is in normal state in the service list.
– If yes, go to Step 4.1.
– If no, go to Step 3.3.

3. Check the alarms displayed on the alarm page and clear them according to Alarm Help.
4. Enter correct HiveQL statements, and check whether the command can be properly executed.
– If yes, go to Step 4.5.
– If no, go to Step 4.1.

Step 4 Check whether HDFS, Yarn, and ZooKeeper are in normal state.

1. On the MRS Manager portal, click Services.
2. In the service list, check whether services such as HDFS, Yarn, and ZooKeeper are in normal state.
– If yes, go to Step 4.5.
– If no, go to Step 4.3.

3. Check the alarms displayed on the alarm page and clear them according to Alarm Help.
4. Enter correct HiveQL statements, and check whether the command can be properly executed.
– If yes, go to Step 4.5.
– If no, go to Step 5.

5. Wait 5 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 5.

Step 5 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.42 ALM-16004 Hive Service Unavailable

Description

The system checks the Hive service status every 60 seconds. This alarm is generated when the Hive service is unavailable.

This alarm is cleared when the Hive service is in normal state.

Attribute

Alarm ID Alarm Severity Automatically Cleared

16004 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The system cannot provide data loading, query, and extraction services.

Possible Causes

- The Hive service unavailability may be related to basic services, such as ZooKeeper, HDFS, Yarn, and DBService, or caused by faults of the Hive processes.
– The ZooKeeper service is abnormal.
– The HDFS service is abnormal.
– The Yarn service is abnormal.
– The DBService service is abnormal.
– The Hive service process is faulty. If the alarm is caused by a Hive process fault, the alarm report has a delay of about 5 minutes.
- The network communication between the Hive service and basic services is interrupted.

Procedure

Step 1 Check the HiveServer/MetaStore process status.

1. On the MRS Manager portal, click Service > Hive > Instance. In the Hive instance list, check whether all HiveServer/MetaStore instances are in the Unknown state.
– If yes, go to Step 1.2.
– If no, go to Step 2.

2. Above the Hive instance list, choose More > Restart Instance to restart the HiveServer/MetaStore process.

3. In the alarm list, check whether the alarm ALM-16004 Hive Service Unavailable is cleared.

– If yes, no further action is required.
– If no, go to Step 2.

Step 2 Check the ZooKeeper service status.

1. In the alarm list on MRS Manager, check whether the alarm ALM-12007 Process Fault is generated.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. In the Alarm Details area of ALM-12007 Process Fault, check whether ServiceName is ZooKeeper.
– If yes, go to Step 2.3.
– If no, go to Step 3.

3. Rectify the fault by following the steps provided in ALM-12007 Process Fault.
4. In the alarm list, check whether the alarm ALM-16004 Hive Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Check the HDFS service status.

1. In the alarm list on MRS Manager, check whether the alarm ALM-14000 HDFS Service Unavailable is generated.
– If yes, go to Step 3.2.
– If no, go to Step 4.

2. Rectify the fault by following the steps provided in ALM-14000 HDFS Service Unavailable.

3. In the alarm list, check whether the alarm ALM-16004 Hive Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 4.

Step 4 Check the Yarn service status.

1. In the alarm list on MRS Manager, check whether the alarm ALM-18000 Yarn Service Unavailable is generated.
– If yes, go to Step 4.2.
– If no, go to Step 5.

2. Rectify the fault by following the steps provided in ALM-18000 Yarn Service Unavailable.

3. In the alarm list, check whether the alarm ALM-16004 Hive Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 5.

Step 5 Check the DBService service status.

1. In the alarm list on MRS Manager, check whether the alarm ALM-27001 DBService Unavailable is generated.

– If yes, go to Step 5.2.

– If no, go to Step 6.

2. Rectify the fault by following the steps provided in ALM-27001 DBService Unavailable.

3. In the alarm list, check whether the alarm ALM-16004 Hive Service Unavailable is cleared.

– If yes, no further action is required.

– If no, go to Step 6.

Step 6 Check the network connection between Hive and ZooKeeper, HDFS, Yarn, and DBService.

1. On the MRS Manager portal, choose Service > Hive.

2. Click Instance.

The HiveServer instance list is displayed.

3. Click Host Name in the row of HiveServer.

The HiveServer host status page is displayed.

4. Record the IP address under Summary.

5. Use the IP address obtained in Step 6.4 to log in to the host that runs HiveServer.

6. Run the ping command to check whether the network connection between the host that runs HiveServer and the hosts that run the ZooKeeper, HDFS, Yarn, and DBService services is in normal state. The IP addresses of the hosts that run the ZooKeeper, HDFS, Yarn, and DBService services can be obtained in the same way as the HiveServer IP address.

– If yes, go to Step 7.

– If no, go to Step 6.7.

7. Contact O&M personnel of the public cloud to recover the network.

8. In the alarm list, check whether the alarm ALM-16004 Hive Service Unavailable is cleared.

– If yes, no further action is required.

– If no, go to Step 7.
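The connectivity check in Step 6.6 can be scripted as below. The host list is a placeholder (127.0.0.1 here); substitute the recorded ZooKeeper, HDFS, Yarn, and DBService IP addresses:

```shell
# Ping each service host once from the HiveServer node and report reachability.
for host in 127.0.0.1; do   # placeholder; substitute the real service IPs
  if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
    echo "$host reachable"
  else
    echo "$host UNREACHABLE"
  fi
done
```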

Step 7 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.43 ALM-18000 Yarn Service Unavailable

Description

The alarm module checks the Yarn service status every 60 seconds. This alarm is generated when the Yarn service is unavailable.

This alarm is cleared when the Yarn service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

18000 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The cluster cannot provide the Yarn service. Users cannot run new applications. Submitted applications cannot be run.

Possible Causes

- The ZooKeeper service is abnormal.
- The HDFS service is abnormal.
- There is no active ResourceManager node in the Yarn cluster.
- All NodeManager nodes in the Yarn cluster are abnormal.

Procedure

Step 1 Check the ZooKeeper service status.

1. In the alarm list on MRS Manager, check whether the alarm ALM-13000 ZooKeeper Service Unavailable is generated.
– If yes, go to Step 1.2.
– If no, go to Step 2.1.

2. Rectify the fault by following the steps provided in ALM-13000 ZooKeeper Service Unavailable, and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check the HDFS service status.

1. In the alarm list on MRS Manager, check whether an alarm related to HDFS is generated.
– If yes, go to Step 2.2.
– If no, go to Step 3.1.

2. Click Alarm, and handle HDFS alarms according to Alarm Help. Check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.1.

Step 3 Check the ResourceManager node status in the Yarn cluster.

1. On the MRS Manager portal, choose Service > Yarn.
2. In Yarn Summary, check whether there is an active ResourceManager node in the Yarn cluster.
– If yes, go to Step 4.1.
– If no, go to Step 5.

Step 4 Check the NodeManager node status in the Yarn cluster.

1. On the MRS Manager portal, choose Service > Yarn > Instance.
2. Check the Health Status of NodeManager, and check whether there are unhealthy nodes.
– If yes, go to Step 4.3.
– If no, go to Step 5.

3. Rectify the fault by following the steps provided in ALM-18002 NodeManager Heartbeat Lost or ALM-18003 NodeManager Unhealthy. After the fault is rectified, check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 5.

Step 5 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.44 ALM-18002 NodeManager Heartbeat Lost

Description

The system checks the number of lost NodeManager nodes every 30 seconds, and compares the number of lost nodes with the threshold. The Lost Nodes indicator has a default threshold. This alarm is generated when the value of the Lost Nodes indicator exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > Yarn.

This alarm is cleared when the value of Lost Nodes is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

18002 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

- The lost NodeManager node cannot provide the Yarn service.
- The number of containers decreases, so the cluster performance deteriorates.

Possible Causes

- NodeManager is forcibly deleted without decommission.
- All NodeManager instances are stopped or the NodeManager process is faulty.
- The host where the NodeManager node resides is faulty.
- The network between the NodeManager and ResourceManager is disconnected or busy.

Procedure

Step 1 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.45 ALM-18003 NodeManager Unhealthy

Description

The system checks the number of abnormal NodeManager nodes every 30 seconds, and compares the number of abnormal nodes with the threshold. The Unhealthy Nodes indicator has a default threshold. This alarm is generated when the value of the Unhealthy Nodes indicator exceeds the threshold.

To change the threshold, choose System > Configure Alarm Threshold > Service > Yarn.

This alarm is cleared when the value of Unhealthy Nodes is less than or equal to the threshold.

Attribute

Alarm ID Alarm Severity Automatically Cleared

18003 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

- The faulty NodeManager node cannot provide the Yarn service.
- The number of containers decreases, so the cluster performance deteriorates.

Possible Causes

- The hard disk space of the host where the NodeManager node resides is insufficient.
- User omm does not have the permission to access a local directory on the NodeManager node.

Procedure

Step 1 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.46 ALM-18006 MapReduce Job Execution Timeout

Description

The alarm module checks the MapReduce job execution every 60 seconds. This alarm is generated when the execution of a submitted MapReduce job times out.

To change the threshold, choose System > Configure Alarm Threshold > Service > Yarn.

This alarm must be manually cleared.

Attribute

Alarm ID Alarm Severity Automatically Cleared

18006 Major No

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Trigger condition Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Execution of the submitted MapReduce job times out, so no execution result can be obtained. Execute the job again after rectifying the fault.

Possible Causes

MapReduce job execution requires a long time, but the specified time period is shorter than the execution time.

Procedure

Step 1 Check whether time is improperly set.

On the client, execute the MapReduce job again. Set -Dapplication.timeout.interval to a larger value, or do not set the parameter. Check whether the MapReduce job can be executed.
- If yes, go to Step 2.4.
- If no, go to Step 2.1.
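Step 1 can be scripted roughly as follows. The example jar, job name, HDFS paths, and the millisecond unit of the value are hypothetical; only the -Dapplication.timeout.interval parameter name is taken from the text:

```shell
# Re-run the job with a larger timeout (1800000 assumed to mean 30 minutes in ms).
run_job() {
  yarn jar hadoop-mapreduce-examples.jar wordcount \
    -Dapplication.timeout.interval=1800000 /tmp/input /tmp/output
}

# Only submit when a yarn client is actually available on this node.
if command -v yarn > /dev/null 2>&1; then
  run_job
else
  echo "yarn client not on PATH; run run_job from an MRS client node"
fi
```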

Step 2 Check the Yarn service status.

1. In the alarm list on MRS Manager, check whether the alarm ALM-18000 Yarn Service Unavailable is generated.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. Rectify the fault by following the steps provided in ALM-18000 Yarn Service Unavailable.

3. Run the MapReduce job command again to check whether the MapReduce job can be executed.
– If yes, go to Step 2.4.
– If no, go to Step 4.

4. In the alarm list, manually clear the alarm in the Operation column. No further action is required.

Step 3 Adjust the timeout threshold.

On the MRS Manager portal, choose System > Configure Alarm Threshold > Service > Yarn > Timed Out Tasks, and increase the maximum number of timeout tasks allowed by the current threshold rule. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 4.

Step 4 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.47 ALM-19000 HBase Service Unavailable

Description

The alarm module checks the HBase service status every 60 seconds. This alarm is generated when the HBase service is unavailable.

This alarm is cleared when the HBase service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

19000 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

Operations such as reading or writing data and creating tables cannot be performed.

Possible Causes

- The ZooKeeper service is abnormal.
- The HDFS service is abnormal.
- The HBase service is abnormal.
- The network is abnormal.

Procedure

Step 1 Check the ZooKeeper service status.

1. In the service list on MRS Manager, check whether the health status of ZooKeeper is Good.
– If yes, go to Step 2.1.
– If no, go to Step 1.2.

2. In the alarm list, check whether the alarm ALM-13000 ZooKeeper Service Unavailable exists.
– If yes, go to Step 1.3.
– If no, go to Step 2.1.

3. Rectify the fault by following the steps provided in ALM-13000 ZooKeeper Service Unavailable.

4. Wait several minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check the HDFS service status.

1. In the alarm list, check whether the alarm ALM-14000 HDFS Service Unavailable exists.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. Rectify the fault by following the steps provided in ALM-14000 HDFS Service Unavailable.

3. Wait several minutes and check whether the alarm is cleared.

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.48 ALM-19006 HBase Replication Sync Failed

Description

This alarm is generated when disaster recovery (DR) data fails to be synchronized to a standby cluster.

This alarm is cleared when DR data synchronization succeeds.

Attribute

Alarm ID Alarm Severity Automatically Cleared

19006 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

HBase data in a cluster fails to be synchronized to the standby cluster, causing data inconsistency between the active and standby clusters.

Possible Causes

- The HBase service on the standby cluster is abnormal.
- The network is abnormal.

Procedure

Step 1 Check whether the alarm is automatically cleared.

1. Log in to the MRS Manager portal of the active cluster, and click Alarm.
2. In the alarm list, click the alarm to obtain the alarm generation time from Generated On in Alarm Details. Check whether the alarm has existed for over 5 minutes.

– If yes, go to Step 2.1.
– If no, go to Step 1.3.

3. Wait 5 minutes and check whether the alarm is automatically cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check the HBase service status of the standby cluster.

1. Log in to the MRS Manager portal of the active cluster, and click Alarm.
2. In the alarm list, click the alarm and obtain HostName from Location in Alarm Details.
3. Log in to the node where the HBase client resides in the active cluster. Run the following commands to switch the user:
sudo su - root
su - omm

4. Run the status 'replication', 'source' command to check the replication synchronization status of the faulty node. The replication synchronization status of a node is as follows:
10-10-10-153:
SOURCE: PeerID=abc, SizeOfLogQueue=0, ShippedBatches=2, ShippedOps=2, ShippedBytes=320, LogReadInBytes=1636, LogEditsRead=5, LogEditsFiltered=3, SizeOfLogToReplicate=0, TimeForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Mon Jul 18 09:53:28 CST 2016, Replication Lag=0, FailedReplicationAttempts=0
SOURCE: PeerID=abc1, SizeOfLogQueue=0, ShippedBatches=1, ShippedOps=1, ShippedBytes=160, LogReadInBytes=1636, LogEditsRead=5, LogEditsFiltered=3, SizeOfLogToReplicate=0, TimeForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=16788, TimeStampsOfLastShippedOp=Sat Jul 16 13:19:00 CST 2016, Replication Lag=16788, FailedReplicationAttempts=5

5. Obtain the PeerID corresponding to a record whose FailedReplicationAttempts value is greater than 0.
In the preceding step, data on the faulty node 10-10-10-153 fails to be synchronized to a standby cluster whose PeerID is abc1.

6. Run the list_peers command to find the cluster and the HBase instance corresponding to the PeerID.
PEER_ID CLUSTER_KEY STATE TABLE_CFS
abc1 10.10.10.110,10.10.10.119,10.10.10.133:24002:/hbase2 ENABLED
abc 10.10.10.110,10.10.10.119,10.10.10.133:24002:/hbase ENABLED

In the preceding information, /hbase2 indicates that data is synchronized to the HBase2 instance of the standby cluster.

7. In the service list on MRS Manager of the standby cluster, check whether the health status of the HBase instance obtained in Step 2.6 is Good.
– If yes, go to Step 3.1.
– If no, go to Step 2.8.

8. In the alarm list, check whether the alarm ALM-19000 HBase Service Unavailable exists.
– If yes, go to Step 2.9.
– If no, go to Step 3.1.

9. Rectify the fault by following the steps provided in ALM-19000 HBase Service Unavailable.

10. Wait several minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 3.1.
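Steps 2.4 and 2.5 above can be automated with a small filter. The sample lines below are abbreviated from the output shown in Step 2.4, and the parsing script is a sketch, not part of the product:

```shell
# Stand-in for 'status "replication", "source"' output (fields from Step 2.4, abbreviated).
cat > replication-status.txt <<'EOF'
SOURCE: PeerID=abc, AgeOfLastShippedOp=0, Replication Lag=0, FailedReplicationAttempts=0
SOURCE: PeerID=abc1, AgeOfLastShippedOp=16788, Replication Lag=16788, FailedReplicationAttempts=5
EOF

# Print the PeerID of every source whose FailedReplicationAttempts is greater than 0.
awk '/FailedReplicationAttempts=[1-9]/ {
       match($0, /PeerID=[^,]*/)
       print substr($0, RSTART + 7, RLENGTH - 7)
     }' replication-status.txt
```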

Step 3 Check network connections between RegionServers on active and standby clusters.

1. Log in to the MRS Manager portal of the active cluster, and click Alarm.
2. In the alarm list, click the alarm and obtain HostName from Location in Alarm Details.
3. Log in to the faulty RegionServer node.
4. Run the ping command to check whether the network connection between the faulty RegionServer node and the host where the RegionServer of the standby cluster resides is in normal state.
– If yes, go to Step 4.
– If no, go to Step 3.5.

5. Contact O&M personnel of the public cloud to recover the network.
6. After the network recovers, check whether the alarm is cleared in the alarm list.
– If yes, no further action is required.
– If no, go to Step 4.

Step 4 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.49 ALM-25000 LdapServer Service Unavailable

Description

The system checks the LdapServer service status every 30 seconds. This alarm is generated when the system detects that both the active and standby LdapServer services are abnormal.

This alarm is cleared when the system detects that at least one LdapServer service is normal.

Attribute

Alarm ID Alarm Severity Automatically Cleared

25000 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

When this alarm is generated, no operation can be performed for the KrbServer users and LdapServer users in the cluster. For example, users, user groups, or roles cannot be added, deleted, or modified, and user passwords cannot be changed on the MRS Manager portal. The authentication for existing users in the cluster is not affected.

Possible Causes

- The node where the LdapServer service is located is faulty.
- The LdapServer process is abnormal.

Procedure

Step 1 Check whether the nodes where the two SlapdServer instances of the LdapServer service are located are faulty.

1. On the MRS Manager portal, choose Service > LdapServer > Instance to go to the LdapServer instance page and obtain the host names of the nodes where the two SlapdServer instances are located.

2. On the Alarm page of MRS Manager, check whether the alarm ALM-12006 Node Fault is generated.
– If yes, go to Step 1.3.
– If no, go to Step 2.1.

3. Check whether the host name in the alarm is consistent with the host name in Step 1.1.
– If yes, go to Step 1.4.
– If no, go to Step 2.1.

4. Rectify the fault by following the steps provided in ALM-12006 Node Fault.
5. In the alarm list, check whether the alarm ALM-25000 LdapServer Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 2 Check whether the LdapServer process is in normal state.

1. On the Alarm page of MRS Manager, check whether the alarm ALM-12007 Process Fault is generated.

– If yes, go to Step 2.2.
– If no, go to Step 3.

2. Check whether the service name and host name in the alarm are consistent with the LdapServer service and host names.
– If yes, go to Step 2.3.
– If no, go to Step 3.

3. Rectify the fault by following the steps provided in ALM-12007 Process Fault.
4. In the alarm list, check whether the alarm ALM-25000 LdapServer Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.

Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.50 ALM-25004 Abnormal LdapServer Data Synchronization

Description

This alarm is generated when LdapServer data in the cluster is inconsistent with LdapServer data on Manager. This alarm is cleared when the data becomes consistent.

Attribute

Alarm ID Alarm Severity Automatically Cleared

25004 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

LdapServer data inconsistency occurs because LdapServer data on Manager or in the cluster is damaged. The LdapServer process with damaged data cannot provide services externally, and the authentication functions of Manager and the cluster are affected.

Possible Causes

- The network of the node where the LdapServer process is located is faulty.
- The LdapServer process is abnormal.
- The OS restart damages data on LdapServer.

Procedure

Step 1 Check whether the network where the LdapServer nodes reside is faulty.

1. On the MRS Manager portal, click Alarm. Record the IP address of HostName in Location of the alarm as IP1 (if multiple alarms exist, record the IP addresses as IP1, IP2, and IP3 respectively).

2. Contact O&M personnel and log in to the node corresponding to IP1. Run the ping command on the node to check whether the IP address of the management plane of the active OMS node can be pinged.
– If yes, go to Step 1.3.
– If no, go to Step 2.1.

3. Contact O&M personnel of the public cloud to recover the network and check whether the alarm ALM-25004 Abnormal LdapServer Data Synchronization is cleared.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check whether the LdapServer process is in normal state.

1. On the Alarm page of MRS Manager, check whether the alarm ALM-12004 OLdap Resource Is Abnormal is generated.
– If yes, go to Step 2.2.
– If no, go to Step 2.4.

2. Rectify the fault by following the steps provided in ALM-12004 OLdap Resource Is Abnormal.

3. Check whether the alarm ALM-25004 Abnormal LdapServer Data Synchronization is cleared.
– If yes, no further action is required.
– If no, go to Step 2.4.
4. On the Alarm page of MRS Manager, check whether the alarm ALM-12007 Process Fault of LdapServer is generated.
– If yes, go to Step 2.5.
– If no, go to Step 3.1.

5. Rectify the fault by following the steps provided in ALM-12007 Process Fault.
6. Check whether the alarm ALM-25004 Abnormal LdapServer Data Synchronization is cleared.
– If yes, no further action is required.
– If no, go to Step 3.1.

Step 3 Check whether the OS restart damages data on LdapServer.

1. On the MRS Manager portal, click Alarm. Record the IP address of HostName in Location of the alarm as IP1 (if multiple alarms exist, record the IP addresses as IP1, IP2, and IP3 respectively). Choose Service > LdapServer > Service Configuration, and record the LdapServer port number as PORT. If the IP address in the alarm location information is the IP address of the standby OMS node, the port number is the default port number 21750.

2. Log in to the node corresponding to IP1 as user omm and run the ldapsearch -H ldaps://IP1:PORT -x -LLL -b dc=hadoop,dc=com command to check whether errors are displayed in the queried information. If the IP address is the IP address of the standby OMS node, run export LDAPCONF=${CONTROLLER_HOME}/ldapserver/ldapserver/local/conf/ldap.conf before running the command.
– If yes, go to Step 3.3.
– If no, go to Step 4.

3. Recover the LdapServer and OMS nodes using the backup data created before the alarm is generated. For details, see section "Recovering Manager Data" in the Administrator Guide.

NOTE

Use the OMS data and LdapServer data backed up at the same time to restore data. Otherwise, the service and operation may fail. To recover data when services run properly, you are advised to manually back up the latest management data and then recover the data. Otherwise, Manager data produced between the backup point in time and the recovery point in time will be lost.

4. Check whether the alarm ALM-25004 Abnormal LdapServer Data Synchronization is cleared.
– If yes, no further action is required.
– If no, go to Step 4.
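The data check in Step 3.2 can be reduced to a simple verdict by wrapping the query command. The sketch below is illustrative, not part of the product: the check_ldap_data helper name is ours, and the ldapsearch invocation in the comment reuses the IP1 and PORT placeholders from the procedure.

```shell
# Hedged sketch: reduce the Step 3.2 data query to an OK/ERROR verdict.
# A non-zero exit status or empty result is treated as damaged data.
check_ldap_data() {
    out=$("$@" 2>/dev/null)
    if [ $? -ne 0 ] || [ -z "$out" ]; then
        echo "ERROR"
    else
        echo "OK"
    fi
}

# On the node itself (IP1 and PORT come from the alarm details; for the
# standby OMS node, export LDAPCONF first as described in Step 3.2):
# check_ldap_data ldapsearch -H "ldaps://$IP1:$PORT" -x -LLL -b dc=hadoop,dc=com
```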

Step 4 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.
Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End


Related Information

N/A

5.6.51 ALM-25500 KrbServer Service Unavailable

Description

The system checks the KrbServer service status every 30 seconds. This alarm is generated when the KrbServer service is abnormal.

This alarm is cleared when the KrbServer service is in normal state.

Attribute

Alarm ID Alarm Severity Automatically Cleared

25500 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

When this alarm is generated, no operation can be performed for the KrbServer component in the cluster. The authentication of KrbServer in other components will be affected. The health status of components that depend on KrbServer in the cluster is Bad.

Possible Causes

– The node where the KrbServer service is located is faulty.
– The OLdap service is unavailable.

Procedure

Step 1 Check whether the node where the KrbServer service is located is faulty.

1. On the MRS Manager portal, choose Service > KrbServer > Instance to go to the KrbServer instance page and obtain the host name of the node where the KrbServer service is located.


2. On the Alarm page of MRS Manager, check whether the alarm ALM-12006 Node Fault is generated.
– If yes, go to Step 1.3.
– If no, go to Step 2.1.

3. Check whether the host name in the alarm is consistent with the host name in Step 1.1.
– If yes, go to Step 1.4.
– If no, go to Step 2.1.

4. Rectify the fault by following the steps provided in ALM-12006 Node Fault.
5. In the alarm list, check whether the alarm ALM-25500 KrbServer Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 2 Check whether the OLdap service is unavailable.

1. On the Alarm page of MRS Manager, check whether the alarm ALM-12004 OLdap Resource Is Abnormal is generated.
– If yes, go to Step 2.2.
– If no, go to Step 3.

2. Rectify the fault by following the steps provided in ALM-12004 OLdap Resource Is Abnormal.

3. In the alarm list, check whether the alarm ALM-25500 KrbServer Service Unavailable is cleared.
– If yes, no further action is required.
– If no, go to Step 3.

Step 3 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.
Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End

Related Information

N/A

5.6.52 ALM-27001 DBService Unavailable

Description

The alarm module checks the DBService status every 30 seconds. This alarm is generated when the system detects that DBService is unavailable.

This alarm is cleared when DBService recovers.


Attribute

Alarm ID Alarm Severity Automatically Cleared

27001 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The database service is unavailable and cannot provide data import and query functions for upper-layer services, which results in service exceptions.

Possible Causes

– The floating IP address does not exist.
– There is no active DBServer instance.
– The active and standby DBServer processes are abnormal.

Procedure

Step 1 Check whether the floating IP address exists in the cluster environment.

1. On the MRS Manager portal, choose Service > DBService > Instance.
2. Check whether the active instance exists.
– If yes, go to Step 1.3.
– If no, go to Step 2.1.

3. Select the active DBServer instance and record the IP address.
4. Log in to the host that corresponds to the preceding IP address, and run the ifconfig command to check whether the DBService floating IP address exists on the node.
– If yes, go to Step 1.5.
– If no, go to Step 2.1.

5. Run the ping floating IP address command to check whether the DBService floating IP address can be pinged.
– If yes, go to Step 1.6.
– If no, go to Step 2.1.


6. Log in to the host that corresponds to the DBService floating IP address, and run the ifconfig interface down command to delete the floating IP address.

7. On the MRS Manager portal, choose Service > DBService > More > Restart Service to restart DBService, and check whether DBService is restarted successfully.
– If yes, go to Step 1.8.
– If no, go to Step 2.1.

8. Wait about 2 minutes and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 3.1.
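The presence test in Step 1.4 amounts to looking for the floating IP address in the interface listing. A minimal sketch, with the listing command passed in so it can be ifconfig or ip addr; FLOAT_IP and the helper name are illustrative, not product tooling:

```shell
# Hedged sketch of Step 1.4: report whether a floating IP appears in the
# output of an interface-listing command (ifconfig or "ip addr").
has_floating_ip() {
    ip="$1"; shift
    case "$("$@" 2>/dev/null)" in
        *"$ip"*) echo "present" ;;
        *)       echo "absent"  ;;
    esac
}

# On the active DBServer node: has_floating_ip "$FLOAT_IP" ifconfig -a
```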

Step 2 Check the status of the active DBServer instance.

1. Select the DBServer instance whose role status is abnormal and record the IP address.
2. On the Alarm page, check whether the alarm ALM-12007 Process Fault occurs in the DBServer instance on the host that corresponds to the IP address.
– If yes, go to Step 2.3.
– If no, go to Step 4.

3. Follow procedures in ALM-12007 Process Fault to handle the alarm.
4. Wait about 5 minutes and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 4.

Step 3 Check the status of the active and standby DBServers.

1. Log in to the host that corresponds to the DBService floating IP address, and run the sudo su - root and su - omm commands to switch to user omm. Run the cd ${BIGDATA_HOME}/FusionInsight/dbservice/ command to go to the installation directory of DBService.

2. Run the sh sbin/status-dbserver.sh command to view the status of the active and standby HA processes of DBService. Check whether the status can be viewed successfully.
– If yes, go to Step 3.3.
– If no, go to Step 4.

3. Check whether the active and standby HA processes are abnormal.
– If yes, go to Step 3.4.
– If no, go to Step 4.

4. On the MRS Manager portal, choose Service > DBService > More > Restart Service to restart DBService, and check whether DBService is restarted successfully.
– If yes, go to Step 3.5.
– If no, go to Step 4.

5. Wait about 2 minutes and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 4.
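Several steps above follow the same "wait about 2 minutes and re-check" pattern. It can be sketched as a generic poll-until-clear loop; the attempt count, interval, and check command are placeholders to fill in, and the helper is ours, not an MRS tool:

```shell
# Hedged sketch: poll a check command until it succeeds (alarm cleared)
# or the attempts are exhausted. E.g. 12 attempts x 10 s roughly covers
# the "wait about 2 minutes" guidance in Steps 1.8 and 3.5.
wait_for_clear() {
    attempts=$1; interval=$2; shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && { echo "cleared"; return 0; }
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep "$interval"
    done
    echo "not cleared"
    return 1
}
```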

Step 4 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.


2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.53 ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes

Description

This alarm is generated when the active or standby DBService node does not receive heartbeat messages from the peer node.

This alarm is cleared when the heartbeat recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

27003 Major Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Local DBService HA Name Specifies a local DBService HA.

Peer DBService HA Name Specifies a peer DBService HA.

Impact on the System

During the DBService heartbeat interruption, only one node can provide the service. If this node is faulty, no standby node is available for failover and the service is unavailable.


Possible Causes

The link between the active and standby DBService nodes is abnormal.

Procedure

Step 1 Check whether the network between the active and standby DBService servers is in normal state.

1. In the alarm list on MRS Manager, locate the row that contains the alarm, and view the IP address of the standby DBService server in the alarm details.

2. Log in to the active DBService server.

3. Run the ping heartbeat IP address of the standby DBService command to check whether the standby DBService server is reachable.

– If yes, go to Step 2.

– If no, go to Step 1.4.

4. Contact the network administrator to check whether the network is faulty.

– If yes, go to Step 1.5.

– If no, go to Step 2.

5. Rectify the network fault and check whether the alarm is cleared from the alarm list.

– If yes, no further action is required.

– If no, go to Step 2.

Step 2 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.6.54 ALM-27004 Data Inconsistency Between Active and Standby DBServices

Description

The system checks the data synchronization status between the active and standby DBServices every 10 seconds. This alarm is generated when the synchronization status cannot be queried for six consecutive times or when the synchronization status is abnormal.

This alarm is cleared when the synchronization is in normal state.
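The trigger condition above (a 10-second check that alarms only after six consecutive failed queries) can be sketched as a simple counter. The function name and result strings below are illustrative, not part of the product:

```shell
# Hedged sketch of the trigger logic: six consecutive failed status
# queries raise the alarm; any successful query resets the counter.
sync_status_verdict() {
    count=0
    for result in $1; do          # $1: space-separated ok/fail history
        if [ "$result" = "fail" ]; then
            count=$((count + 1))
            [ "$count" -ge 6 ] && { echo "ALARM"; return; }
        else
            count=0
        fi
    done
    echo "CLEAR"
}
```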


Attribute

Alarm ID Alarm Severity Automatically Cleared

27004 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Local DBService HA Name Specifies a local DBService HA.

Peer DBService HA Name Specifies a peer DBService HA.

SYNC_PERSENT Specifies the synchronization percentage.

Impact on the System

When data is not synchronized between the active and standby DBServices, the data may be lost or abnormal if the active instance becomes abnormal.

Possible Causes

– The network between the active and standby nodes is unstable.
– The standby DBService is abnormal.
– The disk space of the standby node is full.

Procedure

Step 1 Check whether the network between the active and standby nodes is in normal state.

1. Log in to MRS Manager, click Alarm, click the row where the alarm is located in the alarm list, and view the IP address of the standby DBService in the alarm details.

2. Log in to the active DBService node.
3. Run the ping heartbeat IP address of the standby DBService command to check whether the standby DBService node is reachable.
– If yes, go to Step 2.1.
– If no, go to Step 1.4.

4. Contact the O&M personnel of the public cloud to check whether the network is faulty.
– If yes, go to Step 1.5.
– If no, go to Step 2.1.
5. Rectify the network fault and check whether the alarm is cleared from the alarm list.
– If yes, no further action is required.
– If no, go to Step 2.1.

Step 2 Check whether the standby DBService is in normal state.

1. Log in to the standby DBService node.
2. Run the following commands to switch the user:
sudo su - root
su - omm

3. Go to the ${DBSERVER_HOME}/sbin directory and run the ./status-dbserver.sh command to check whether the GaussDB resource status of the standby DBService is in normal state. In the command output, check whether the following information is displayed in the row where ResName is gaussDB:
For example:
10_10_10_231 gaussDB Standby_normal Normal Active_standby
– If yes, go to Step 3.1.
– If no, go to Step 4.
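The expected row can be checked mechanically. This sketch assumes the column layout of the example line above (host, resource name, role state, health, HA mode); the helper name is ours, not an official tool:

```shell
# Hedged sketch: pull the role state (third column) of the gaussDB row
# out of the status-dbserver.sh output, e.g. "Standby_normal".
gaussdb_state() {
    echo "$1" | awk '$2 == "gaussDB" { print $3 }'
}
```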

Step 3 Check whether the disk space of the standby node is insufficient.

1. Use PuTTY to log in to the standby DBService node as user root.
2. Run the su - omm command to switch to user omm.
3. Go to the ${DBSERVER_HOME} directory, and run the following commands to obtain the DBService data directory:
cd ${DBSERVER_HOME}
source .dbservice_profile
echo ${DBSERVICE_DATA_DIR}

4. Run the df -h command to check the system disk partition usage.
5. Check whether the DBService data directory space is full.
– If yes, go to Step 3.6.
– If no, go to Step 4.

6. Perform an upgrade to expand the capacity.
7. After capacity expansion, wait 2 minutes and check whether the alarm is cleared.
– If yes, no further action is required.
– If no, go to Step 4.
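Steps 3.4 and 3.5 can be combined into one check of the partition that holds the data directory. A sketch under stated assumptions: the helper names are ours, and "full" is read strictly as 100% usage to mirror the wording above (adjust the threshold as needed):

```shell
# Hedged sketch: usage percentage of the partition holding a directory.
# df -P guarantees one line per filesystem; column 5 is "Use%".
partition_usage() {
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# "Full" here means 100% used, mirroring Step 3.5.
is_full() {
    if [ "$(partition_usage "$1")" -ge 100 ]; then echo "yes"; else echo "no"; fi
}

# Example: is_full "${DBSERVICE_DATA_DIR}"
```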

Step 4 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.
2. Call the OTC Customer Hotline for support.
Telephone:
Germany: 0800 330 44 44
International: +800 44556600

----End


Related Information

N/A

5.6.55 ALM-28001 Spark Service Unavailable

Description

The system checks the Spark service status every 60 seconds. This alarm is generated when the Spark service is unavailable.

This alarm is cleared when the Spark service recovers.

Attribute

Alarm ID Alarm Severity Automatically Cleared

28001 Critical Yes

Parameters

Parameter Description

ServiceName Specifies the service for which the alarm is generated.

RoleName Specifies the role for which the alarm is generated.

HostName Specifies the host for which the alarm is generated.

Impact on the System

The Spark tasks submitted by users fail to be executed.

Possible Causes

– The KrbServer service is abnormal.
– The LdapServer service is abnormal.
– The ZooKeeper service is abnormal.
– The HDFS service is abnormal.
– The Yarn service is abnormal.
– The corresponding Hive service is abnormal.

Procedure

Step 1 Check whether service unavailability alarms exist for the services that Spark depends on.


1. On the MRS Manager portal, click Alarm.

2. Check whether the following alarms exist in the alarm list:

a. ALM-25500 KrbServer Service Unavailable

b. ALM-25000 LdapServer Service Unavailable

c. ALM-13000 ZooKeeper Service Unavailable

d. ALM-14000 HDFS Service Unavailable

e. ALM-18000 Yarn Service Unavailable

f. ALM-16004 Hive Service Unavailable

– If yes, go to Step 1.3.

– If no, go to Step 2.

3. Handle the alarms based on the troubleshooting methods provided in the alarm help.

After all the alarms are cleared, wait a few minutes and check whether this alarm is cleared.

– If yes, no further action is required.

– If no, go to Step 2.
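Step 1.2 is essentially a membership test against the six dependency alarms listed in items a to f. A sketch; the ID list mirrors those items, and the helper name is ours:

```shell
# Hedged sketch of Step 1.2: given the current alarm IDs (one per line),
# print those that match Spark's dependency alarms a-f above.
SPARK_DEP_ALARMS="25500 25000 13000 14000 18000 16004"

blocking_alarms() {
    for id in $SPARK_DEP_ALARMS; do
        echo "$1" | grep -q "^${id}$" && echo "$id"
    done
}
```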

Step 2 Collect fault information.

1. On the MRS Manager portal, choose System > Export Log.

2. Call the OTC Customer Hotline for support.

Telephone:

Germany: 0800 330 44 44

International: +800 44556600

----End

Related Information

N/A

5.7 Object Management

5.7.1 Introduction

An MRS cluster contains different types of basic objects. Table 5-15 describes these objects.

Table 5-15 MRS basic objects overview

Object Description Example

Service - Function set that can complete specific business. Example: KrbServer service and LdapServer service.

Service instance - Specific instance of a service, usually called service. Example: KrbServer service.

Service role - Function entity that forms a complete service, usually called role. Example: KrbServer consists of the KerberosAdmin role and KerberosServer role.

Role instance - Specific instance of a service role running on a host. Example: KerberosAdmin that is running on Host2 and KerberosServer that is running on Host3.

Host - Elastic Cloud Server (ECS) that runs a Linux OS. Example: Host1 to Host5.

Rack - Physical entity that contains multiple hosts connecting to the same switch. Example: Rack1 contains Host1 to Host5.

Cluster - Logical entity that consists of multiple hosts and provides various services. Example: Cluster1 consists of five hosts (Host1 to Host5) and provides services such as KrbServer and LdapServer.

5.7.2 Querying Configurations

Scenario

On MRS Manager, users can query the configurations of services (including roles) and role instances.

Procedure

– Query service configurations.

a. On MRS Manager, click Service.
b. Select the target service from the service list.
c. Click Service Configuration.
d. Set Type to All. All configuration parameters of the service are displayed in the navigation tree. The root nodes in the navigation tree represent the service names and role names.

e. In the navigation tree, choose a parameter and change its value. You can also enter the parameter name in Search to search for the parameter and view the result.
The parameters under the service nodes and role nodes are service configuration parameters and role configuration parameters respectively.

– Query role instance configurations.

a. On MRS Manager, click Service.
b. Select the target service from the service list.
c. Click the Instance tab.
d. Click the target role instance from the role instance list.


e. Click Instance Configuration.

f. Set Type to All. All configuration parameters of the service are displayed in the navigation tree. The root nodes in the navigation tree represent the service names and role names.

g. In the navigation tree, choose a parameter and change its value. You can also enter the parameter name in Search to search for the parameter and view the result.

5.7.3 Managing Services

Scenario

On MRS Manager, users can perform the following operations:

– Start a service that is in the Stopped, Stopped_Failed, or Start_Failed state.
– Stop unused or abnormal services.
– Restart abnormal services or services whose configurations have expired to restore or enable them.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Locate the row that contains the target service, and click the start, stop, or restart button to start, stop, or restart the service.

Services are interrelated. If a service is started, stopped, or restarted, the services dependent on it will be affected.

The services will be affected in the following ways:

– If a service is to be started, the lower-layer services it depends on must be started first.
– If a service is stopped, the upper-layer services dependent on it are unavailable.
– If a service is restarted, the running upper-layer services dependent on it must be restarted.

----End

5.7.4 Configuring Service Parameters

Scenario

On MRS Manager, users can view and modify default service configurations based on site requirements. Configurations can be imported and exported.

Impact on the System

– After the attributes of HBase, HDFS, Hive, Spark, Yarn, and MapReduce are configured, the client configurations need to be downloaded to update the files.
– The parameters of DBService cannot be modified when only one DBService role instance exists in the cluster.


Procedure

– Modify a service.

a. Click Service.

b. Select the target service from the service list.

c. Click the Service Configuration tab.

d. Set Type to All. All configuration parameters of the service are displayed in the navigation tree. The root nodes in the navigation tree represent the service names and role names.

e. In the navigation tree, choose a parameter and change its value. You can also enter the parameter name in Search to search for the parameter and view the result.

You can click the restore button to restore a parameter value.

NOTE

You can also use host groups to change role instance configurations in batches. Choose a role name in Role, and then choose <select hosts> in Host. Enter a name in Host Group Name, select the target host from All Hosts, and add it to Selected Hosts. Click OK to add it to the host group. The added host group can be selected from Host and is valid only on the current page; it is not retained after the page is refreshed.

f. Click Save Configuration, select Restart the affected services or instances, and click OK to restart the service.
Click Finish when the system displays Operation succeeded. The service is successfully started.

NOTE

After you update the Yarn service queue configuration but do not restart the service, choose More > Refresh the queue for the configuration to take effect.

– Export service configuration parameters.

a. Click Service.

b. Select a service.

c. Click Service Configuration.

d. Click Export Service Configuration. Select a path for saving the configuration files.

– Import service configuration parameters.

a. Click Service.

b. Select a service.

c. Click Service Configuration.

d. Click Import Service Configuration.

e. Select the target configuration file.

f. Click Save Configuration, and choose Restart the affected services or instances. Click OK.

In the displayed window, enter the password of the current user and click OK. Operation succeeded will be displayed. Click Finish. The service is started successfully.


5.7.5 Configuring Customized Service Parameters

Scenario

Each component of MRS supports all open source parameters. MRS Manager supports the modification of some parameters for key application scenarios. Some component clients may not include all parameters with open source features. To modify the component parameters that are not directly supported by MRS Manager, users can add new parameters for components by using the configuration customization function on MRS Manager. Newly added parameters are saved in component configuration files and take effect after restart.

Impact on the System

– After the service attributes are configured, the service needs to be restarted. The service cannot be accessed during restart.
– After the attributes of HBase, HDFS, Hive, Spark, Yarn, and MapReduce are configured, the client configurations need to be downloaded to update the files.

Prerequisites

You have learned the meanings of the parameters to be added, the configuration files in which they take effect, and the impact on components.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Select the target service from the service list.

Step 3 Click Service Configuration.

Step 4 Set Type to All.

Step 5 In the navigation tree, select Customization. The customized parameters of the current component are displayed on MRS Manager.

The configuration files that save the newly added customized parameters are displayed in Parameter File. Different configuration files may support open source parameters with the same names. After the parameters in different files are set to different values, the configuration effect depends on the sequence of the configuration files that are loaded by components. Service-level and role-level customized parameters are supported. Perform configuration based on the actual service requirements. Customized parameters for a single role instance are not supported.

Step 6 Based on the configuration files and parameter functions, enter the parameter names supported by components in Name and enter the parameter values in the Value column of the row where the parameters are located.

– You can click the add or delete button to add or delete a customized parameter. A customized parameter can be deleted only after it has been added.
– You can click the restore button to restore a parameter value.

Step 7 Click Save Configuration, select Restart the affected services or instances, and click OK to restart the service.


When Operation succeeded is displayed, click Finish. The service is started successfully.

----End

Task Example

Configuring Customized Hive Parameters

Hive depends on HDFS. By default, Hive accesses the HDFS client, and the configuration parameters that take effect are controlled by HDFS in a unified manner. For example, the HDFS parameter ipc.client.rpc.timeout affects the RPC timeout period for all clients that connect to the HDFS server. If you need to modify the timeout period for Hive to connect to HDFS, you can use the configuration customization function. After this parameter is added to the core-site.xml file of Hive, it can be identified by the Hive service and replaces the HDFS configuration.

Step 1 On MRS Manager, choose Service > Hive > Service Configuration.

Step 2 Set Type to All.

Step 3 In the navigation tree, select Customization of the Hive service level. The service-level customized parameters supported by Hive are displayed on MRS Manager.

Step 4 In the Name column of the core.site.customized.configs parameter in core-site.xml, enter ipc.client.rpc.timeout, and enter the new parameter value, such as 150000, in Value. The unit is millisecond.

Step 5 Click Save Configuration, select Restart the affected services or instances, and click OK to restart the service.

Click Finish when the system displays Operation succeeded. The service is successfully started.

----End
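For reference, the resulting entry in Hive's core-site.xml would look roughly like the fragment below. This is a hedged sketch of the generated file, not something to edit by hand: MRS Manager writes the entry when the customized parameter is saved.

```xml
<!-- Written by MRS Manager after the customized parameter is saved -->
<property>
  <name>ipc.client.rpc.timeout</name>
  <!-- milliseconds; overrides the HDFS-side value for the Hive service -->
  <value>150000</value>
</property>
```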

5.7.6 Synchronizing Service Configurations

Scenario

If Configuration Status of all services or some services is Expired or Failed, users can synchronize configurations for the cluster or service to recover its configuration status. If the configuration status of all services in the cluster is Failed, synchronize the cluster configurations with the background configurations.

Impact on the System

After synchronizing the service configuration, users need to restart the service whose configuration has expired. The service is unavailable during restart.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Select the target service from the service list.

Step 3 Click More in the upper pane, and select Synchronize Configuration from the drop-down list.

MapReduce Service User Guide — 5 MRS Manager Operation Guide (Issue 01, 2017-02-20)


Step 4 In the dialog box that is displayed, select Restart services or instances whose configurations have expired, and click OK.

When Operation succeeded is displayed, click Finish. The service is started successfully.

----End

5.7.7 Managing Role Instances

Scenario

Users can start a role instance that is in the Stopped, Stopped_Failed, or Start_Failed state, stop an unused or abnormal role instance, or restart an abnormal role instance to recover its functions.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Select the target service from the service list.

Step 3 Click the Instance tab.

Step 4 Select the check box on the left of the target role instance.

Step 5 Choose More > Start Instance, Stop Instance, or Restart Instance to perform the corresponding operations.

----End

5.7.8 Configuring Role Instance Parameters

Scenario

View and modify default role instance configurations on MRS Manager. Parameters must be configured based on site requirements, and configurations can be imported and exported.

Impact on the System

After the attributes of HBase, HDFS, Hive, Spark, Yarn, and MapReduce are configured, the client configurations need to be downloaded to update the files.

Procedure

- Modify role instance configurations.

  a. On MRS Manager, click Service.
  b. Select the target service from the service list.
  c. Click the Instance tab.
  d. Click the target role instance from the role instance list.
  e. Click the Instance Configuration tab.
  f. Set Type to All. All configuration parameters of the role instances are displayed in the navigation tree.


  g. In the navigation tree, choose a parameter and change its value. You can also enter the parameter name in Search to search for the parameter and view the result. You can click the restore button to restore a parameter value.
  h. Click Save Configuration, select Restart the role instance, and click OK to restart the role instance. When the system displays Operation succeeded, click Finish. The service is started successfully.

- Export configuration data of a role instance.

  a. On MRS Manager, click Service.
  b. Select a service.
  c. Select a role instance or click Instance.
  d. Select a role instance on a specified host.
  e. Click Instance Configuration.
  f. Click Export Instance Configuration to export the configuration data of a specified role instance, and choose a path for saving the configuration file.

- Import configuration data of a role instance.

  a. Click Service.
  b. Select a service.
  c. Select a role instance or click Instance.
  d. Select a role instance on a specified host.
  e. Click Instance Configuration.
  f. Click Import Instance Configuration to import configuration data of a specified role instance.
  g. Click Save Configuration and select Restart Role Instance. Click OK. When the system displays Operation succeeded, the operation is successful. Click Finish. The service is started successfully.

5.7.9 Synchronizing Role Instance Configuration

Scenario

When the Configuration Status of a role instance is Expired or Failed, users can synchronize the configuration data of the role instance with the background configuration.

Impact on the System

After synchronizing a role instance configuration, you need to restart the role instance whose configuration has expired. The role instance is unavailable during restart.

Procedure

Step 1 On MRS Manager, click Service and choose a service name.

Step 2 Click the Instance tab.

Step 3 Click the target role instance from the role instance list.


Step 4 Click More in the upper pane, and select Synchronize Configuration from the drop-down list.

Step 5 In the dialog box that is displayed, select Restart services or instances whose configurations have expired, and click OK to restart a role instance.

When the system displays Operation succeeded, the operation is successful. Click Finish. The service is started successfully.

----End

5.7.10 Decommissioning and Recommissioning Role Instances

Scenario

When a Core node becomes faulty, the cluster status may be abnormal. In an MRS cluster, data can be stored on different Core nodes, and users can decommission the specified DataNode role instance of HDFS or the NodeManager role instance of Yarn on MRS Manager to stop the role instance from providing services. After fault rectification, users can recommission the DataNode or NodeManager role instance.

- When the number of DataNodes is less than or equal to the number of HDFS copies, decommissioning cannot be performed. For example, if the number of HDFS copies is three and the number of DataNodes in the system is less than four, decommissioning cannot be performed. If decommissioning is still running after 30 minutes, an error is reported and MRS Manager is forced to exit the decommissioning.

- After a role instance is decommissioned, users must recommission and restart it for use.
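The DataNode constraint above reduces to a simple precondition: a DataNode can only be decommissioned while the number of DataNodes still exceeds the number of HDFS copies. The following Python sketch is purely illustrative; the function name and arguments are hypothetical, not part of MRS:

```python
def can_decommission_datanode(num_datanodes: int, num_copies: int = 3) -> bool:
    """Decommissioning one DataNode is only allowed while the number of
    DataNodes is strictly greater than the number of HDFS copies."""
    return num_datanodes > num_copies

# With the default of three HDFS copies, a four-node cluster may
# decommission a DataNode, but a three-node cluster may not.
print(can_decommission_datanode(4))  # True
print(can_decommission_datanode(3))  # False
```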

Procedure

Step 1 On MRS Manager, click Service.

Step 2 In the service list, click HDFS or Yarn.

Step 3 Click the Instance tab.

Step 4 Select the check box before the specified DataNode or NodeManager role instance name.

Step 5 Click More, and select Decommission or Recommission from the drop-down list.

NOTE

If HDFS is restarted in another browser or window while the instance decommissioning operation is in progress, MRS Manager displays a message indicating that the decommissioning is suspended, and Operating Status is Started. However, the instance decommissioning is actually complete in the background. You need to decommission the instance again to synchronize the status.

----End

5.7.11 Managing a Host

Scenario

When a host is abnormal or faulty, users need to stop all roles of the host on MRS Manager to check the host. After the host fault is rectified, start all roles running on the host to recover host services.


Procedure

Step 1 On MRS Manager, click Host.

Step 2 Select the check box of the target host.

Step 3 Choose More > Start All Roles or More > Stop All Roles to perform the corresponding operations.

----End

5.7.12 Isolating a Host

Scenario

When detecting that a host is abnormal or faulty and cannot provide services or affects cluster performance, users can temporarily exclude the host from the available nodes in the cluster so that the client can access other available nodes. In scenarios where patches are to be installed in a cluster, users can also exclude a specified node from patch installation.

Users can isolate a host manually on MRS Manager based on the actual service requirements or O&M plan. Only non-management nodes can be isolated.

Impact on the System

- After a host is isolated, all role instances on the host will be stopped, and you cannot start, stop, or configure the host or any instances on it.
- After a host is isolated, statistics about the monitoring status and indicator data of the host hardware and instances on the host cannot be collected or displayed.

Procedure

Step 1 On MRS Manager, click Host.

Step 2 Select the check box of the host to be isolated.

Step 3 Choose More > Isolate Host.

Step 4 In Isolate Host, click OK.

Operation succeeded is displayed. Click Finish. The host is isolated successfully, and the value of Operating Status becomes Isolated.

NOTE

The isolation of a host can be canceled and the host can be added to the cluster again. For details, see Canceling Isolation of a Host.

----End

5.7.13 Canceling Isolation of a Host

Scenario

After the exception or fault of a host is handled, users must cancel the isolation of the host so that the host can be used properly.


Users can cancel the isolation of a host on MRS Manager.

Prerequisites

- The host status is Isolated.
- The exception or fault of the host has been rectified.

Procedure

Step 1 On MRS Manager, click Host.

Step 2 Select the check box of the host whose isolation is to be canceled.

Step 3 Choose More > Cancel Host Isolation.

Step 4 In Cancel Host Isolation, click OK.

Operation succeeded is displayed. Click Finish. The host isolation is canceled successfully, and the value of Operating Status becomes Normal.

Step 5 Click the name of the host whose isolation has been canceled. The status of the host is displayed. Click Start All Roles.

----End

5.7.14 Starting and Stopping a Cluster

Scenarios

A cluster is a collection of service components. Users can perform the following operations on a cluster:

- Start all services in the cluster.
- Stop all services in the cluster.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Click More above the service list and select Start Cluster or Stop Cluster from the drop-down list.

----End

5.7.15 Synchronizing Cluster Configurations

Scenarios

If Configuration Status of all services or some services is Expired or Failed, users can synchronize configurations to recover the configuration status.

- If the configuration status of all services in the cluster is Failed, synchronize the cluster configurations with the background configurations.
- If the configuration status of some services in the cluster is Failed, synchronize the specified service configurations with the background configurations.


Impact on the System

After synchronizing cluster configurations, users need to restart the service whose configuration has expired. The service is unavailable during restart.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Click More above the service list, and select Synchronize Configuration from the drop-down list.

Step 3 In the dialog box that is displayed, select Restart services or instances whose configurations have expired, and click OK.

After the system displays Operation succeeded, click Finish. The cluster is started successfully.

----End

5.7.16 Exporting Configuration Data of a Cluster

Scenarios

Users can export all configuration data of a cluster from MRS Manager to meet actual service requirements. The exported file can be used to rapidly update service configurations.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Click More above the service list, and select Export Cluster Configuration from the drop-down list.

The exported file is used to update service configurations. For details, see Import service configuration parameters in Configuring Service Parameters.

----End

5.8 Log Management

5.8.1 Viewing and Exporting Audit Logs

Scenario

On MRS Manager, view and export audit logs for post-event tracing, fault cause locating, and responsibility classification of security events.

The system records the following log information:

- User activity information, such as user login and logout, system user information modification, and system user group information modification
- User operation instruction information, such as cluster startup, shutdown, and software upgrade


Procedure

- View audit logs.

  a. Click Audit to view default audit logs. If audit log content contains more than 256 characters, click log file under Audit Details to download the log file and view the complete log information.
     - By default, audit logs are displayed in descending order by Occurred On. You can click Operation Type, Severity, Occurred On, User, Host, Service, Instance, or Operation Result to change the display mode.
     - You can filter out all alarms of the same severity in Severity, including cleared alarms and uncleared alarms.
     - Exported audit logs contain the following information:
       - Sno: indicates the number of audit logs generated by MRS Manager. The number increases by 1 when a new audit log is generated.
       - Operation Type: indicates the type of user operations. User operations are classified into the following scenarios: Alarm, Auditlog, Backup And Recovery, Cluster, Collect Log, Host, Service, and Tenant. Each scenario contains different operation types. For example, Alarm contains Export alarms, Cluster contains Start Cluster, and Tenant contains Add Tenant.
       - Severity: indicates the security level of each audit log, including Critical, Major, Minor, and Notice.
       - Start Time: indicates the CET time when a user operation starts.
       - End Time: indicates the CET time when a user operation ends.
       - User IP: indicates the IP address used by a user.
       - User: indicates user admin who performs the operations.
       - Host: indicates the node where a user operation is performed. The information is not saved if the operation does not involve a node.
       - Service: indicates the service on which a user operation is performed. The information is not saved if the operation does not involve a service.
       - Instance: indicates the role instance on which a user operation is performed. The information is not saved if the operation does not involve a role instance.
       - Operation Result: indicates the user operation result, including Success and Failed.
       - Content: indicates execution information of the user operation.
  b. Click Advanced Search. In the audit log search area, set search criteria and click Search to view audit logs of the specified type. Click Reset to reset search criteria.
  c. Export audit logs.
     i. In the audit log list, click Export All to export all the logs.
     ii. In the audit log list, select the check box of a log and click Export to export the log.

5.8.2 Exporting Service Logs

Scenario

Export logs of each service role from MRS Manager.


Prerequisites

- You have obtained the Access Key ID (AK) and Secret Access Key (SK) for the corresponding account. For details, see the My Credential User Guide. (My Credential > How Do I Manage User Access Keys (AK/SK)?)
- You have created a bucket in the Object Storage Service (OBS) system for the corresponding account. For details, see the Object Storage Service User Guide. (Object Storage Service > Quick Start > Common Operations Using OBS Console > Creating a Bucket)

Procedure

Step 1 On MRS Manager, click System.

Step 2 Click Export Log under Maintenance.

Step 3 Set Service, set Host to the IP address of the host where the service is deployed, and set Start Time and End Time.

Step 4 In OBS Path, specify the path where service logs are stored in the OBS system.

You must specify the full path that starts with /. You do not need to create the path in advance; the system automatically creates it.

Step 5 In Bucket Name, enter the name of the created OBS bucket. In AK and SK, enter the Access Key ID and Secret Access Key for the account.

Step 6 Click OK to export logs.

----End

5.8.3 Configuring Audit Log Dumping Parameters

Scenario

If audit logs on MRS Manager are stored in the database for a long time, the disk space for the data directory may be insufficient. Therefore, set dump parameters to automatically dump audit logs to a specified directory on a server.

If you do not configure the audit log dumping function, the system automatically saves 100,000 audit logs to a file when the number of audit logs reaches 100,000. The save path is ${BIGDATA_HOME}/OMSV100R001C00x8664/workspace/conf/data/operatelog on the active management node, and the file name format is OperateLog_store_YY_MM_DD_HH_MM_SS.csv. A maximum of 50 historical audit log files can be saved. The directory is automatically generated when audit logs are dumped for the first time.
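The file name format described above (OperateLog_store_YY_MM_DD_HH_MM_SS.csv) can be reproduced with a strftime pattern when post-processing dump files. The following Python sketch only illustrates the naming scheme; it is not MRS code, and the two-digit year interpretation of "YY" is an assumption:

```python
from datetime import datetime

def audit_dump_filename(ts: datetime) -> str:
    # OperateLog_store_YY_MM_DD_HH_MM_SS.csv as described above;
    # %y (two-digit year) is an assumption about the "YY" placeholder.
    return ts.strftime("OperateLog_store_%y_%m_%d_%H_%M_%S.csv")

print(audit_dump_filename(datetime(2017, 2, 20, 9, 30, 5)))
# OperateLog_store_17_02_20_09_30_05.csv
```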

Prerequisites

The corresponding ECS of the dump server and the Master node of the MRS cluster are deployed on the same VPC, and the Master node can access the IP address and specific ports of the dump server. The SFTP service of the dump server is running properly.


Procedure

Step 1 On MRS Manager, click System.

Step 2 Click Dump Audit Log under Maintenance.

Table 5-16 Description of audit log dumping parameters

- Dump Audit Log (On or Off): Mandatory. Specifies whether to enable audit log dumping. On enables audit log dumping; Off disables it.
- Dump Mode (By quantity or By time): Mandatory. Specifies the dump mode. By quantity: the dump starts when the number of audit logs reaches the upper limit (100,000 by default). By time: the dump starts on a specified date.
- SFTP IP Address (example value: 192.168.10.51): Mandatory. Specifies the SFTP server on which the dumped audit logs are stored.
- SFTP Port (example value: 22): Mandatory. Specifies the connection port of the SFTP server on which the dumped audit logs are stored.
- Save Path (example value: /opt/omm/oms/auditLog): Mandatory. Specifies the path for storing audit logs on the SFTP server.
- SFTP Username (example value: root): Mandatory. Specifies the username for logging in to the SFTP server.
- SFTP Password (example value: Root_123): Mandatory. Specifies the password for logging in to the SFTP server.
- SFTP Public Key: Optional. Specifies the public key of the SFTP server. You are advised to set this parameter; otherwise, security risks may exist.
- Dump Date (example value: Nov 06): Mandatory. Specifies the date when the system starts dumping audit logs. This parameter is valid when Dump Mode is set to By time. The logs to be dumped include all the audit logs generated before January 1 00:00 of the current year.

NOTE

Key fields in audit log dumping files are described as follows:

- USERTYPE specifies the user type. The value 0 indicates a Human-Machine user, and 1 indicates a Machine-Machine user.
- LOGLEVEL specifies the security level. The value 0 indicates critical, 1 indicates major, 2 indicates minor, and 3 indicates notice.
- OPERATERESULT specifies the operation result. The value 0 indicates that the operation succeeded, and 1 indicates that it failed.
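When post-processing a dumped CSV, the numeric codes above can be decoded with simple lookup tables. This Python sketch is illustrative only; the function name is hypothetical, and the CSV column layout itself is not specified by this guide:

```python
# Lookup tables for the documented numeric codes in dump files.
USERTYPE = {0: "Human-Machine", 1: "Machine-Machine"}
LOGLEVEL = {0: "Critical", 1: "Major", 2: "Minor", 3: "Notice"}
OPERATERESULT = {0: "Success", 1: "Failed"}

def decode_audit_fields(usertype: int, loglevel: int, result: int) -> dict:
    """Map the numeric audit-log codes to their documented meanings."""
    return {
        "user_type": USERTYPE[usertype],
        "severity": LOGLEVEL[loglevel],
        "operation_result": OPERATERESULT[result],
    }

print(decode_audit_fields(0, 2, 1))
# {'user_type': 'Human-Machine', 'severity': 'Minor', 'operation_result': 'Failed'}
```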

----End

5.9 Health Check Management

5.9.1 Performing a Health Check

Scenario

Perform a health check for the cluster during routine maintenance to ensure that cluster parameters, configuration, and monitoring are correct and the cluster can run stably for a long time.


NOTE

The system health checks include MRS Manager, service-level, and host-level health checks:

- MRS Manager health checks focus on whether the unified management platform can provide management functions.
- Service-level health checks focus on whether components can provide services properly.
- Host-level health checks focus on whether host indicators are normal.

The system health check has three types of check items: Health Status, related alarms, and customized monitoring indicators of each check object. The health check results are not always equal to the Health Status on the page.

Procedure

- Manually perform the health check for all services.

  a. On MRS Manager, click Service.
  b. Choose More > Start Cluster Health Check to start the health check for all services.

  NOTE

  - The cluster health checks include MRS Manager, service, and host status checks.
  - To perform cluster health checks, you can also choose System > Check Health Status > Start Cluster Health Check on the MRS Manager portal.
  - To export the health check result, click Export Report in the upper left corner.

- Manually perform the health check for a service.

  a. On MRS Manager, click Service, and click the target service in the service list.
  b. Choose More > Start Service Health Check to start the health check for the specified service.

- Manually perform the health check for a host.

  a. On MRS Manager, click Host.
  b. Select the check box of the target host.
  c. Choose More > Start Host Health Check to start the health check for the host.

- Perform an automatic health check.

  a. On MRS Manager, click System.
  b. Under Maintenance, click Check Health Status.
  c. Click Configure Health Check to configure automatic health check items. Periodic Health Check indicates whether to enable the automatic health check function. The value Enable indicates that the automatic health check function is enabled. Select Daily, Weekly, or Monthly as required. The default value is Disable.
  d. Click OK to save the configuration. In the dialog box showing Health check configuration is saved successfully, click Close.

5.9.2 Viewing and Exporting a Check Report

Scenario

View the health check result on MRS Manager and export it for further analysis.


NOTE

The system health checks include MRS Manager, service-level, and host-level health checks:

- MRS Manager health checks focus on whether the unified management platform can provide management functions.
- Service-level health checks focus on whether components can provide services properly.
- Host-level health checks focus on whether host indicators are normal.

The system health check has three types of check items: Health Status, related alarms, and customized monitoring indicators of each check object. The health check results are not always equal to the Health Status on the page.

Prerequisites

You have performed a health check.

Procedure

Step 1 On MRS Manager, click Service.

Step 2 Choose More > View Cluster Health Check Report to view the health check report of the cluster.

Step 3 Click Export Report on the pane of the health check report to export the report and view detailed information about check items.

----End

5.9.3 Configuring the Number of Health Check Reports to Be Reserved

Scenario

The health check reports of the MRS cluster, services, and hosts at different times and in different scenarios are not completely the same. To reserve more reports for comparison, you can modify the number of health check reports to be reserved on MRS Manager.

This setting is valid for health check reports of clusters, services, and hosts. After health checks, report files are saved by default in ${BIGDATA_HOME}/apache-tomcat-7.0.72/webapps/web/WEB-INF/bak (the software version varies according to the actual software used) on the active management node and are automatically synchronized to the standby management node.

Prerequisites

Users have specified service requirements and planned the save time and health check frequency, and the disk space of the active and standby management nodes is sufficient.

Procedure

Step 1 On MRS Manager, choose System > Check Health Status > Configure Health Check.

Step 2 Set Max Number of Health Check Reports to the number of health check reports to be reserved. The default value is 50. The value ranges from 1 to 100.


Step 3 Click OK to save the configuration. In the dialog box showing Health check configuration is saved successfully, click Close.

----End

5.9.4 Managing Health Check Reports

Scenario

On MRS Manager, you can manage historical health check reports, for example, viewing, downloading, and deleting them.

Procedure

- Download a specified health check report.

  a. Choose System > Check Health Status.
  b. Locate the row that contains the target health check report and click the download button to download the report file.

- Download specified health check reports in batches.

  a. Choose System > Check Health Status.
  b. Select multiple health check reports and click Download File to download them.

- Delete a specified health check report.

  a. Choose System > Check Health Status.
  b. Locate the row that contains the target health check report and click the delete button to delete the report file.

- Delete specified health check reports in batches.

  a. Choose System > Check Health Status.
  b. Select multiple health check reports and click Delete File to delete them.

5.10 Static Service Pool Management

5.10.1 Viewing the Status of a Static Service Pool

Scenario

The big data management platform supports management and isolation of service resources that are not running on Yarn using static service resource pools. The platform supports dynamic management of the CPU, I/O, and memory capacity that can be used by HBase, HDFS, and Yarn on the deployment nodes. The system supports time-based automatic policy adjustment of static service resource pools. This enables the cluster to automatically adjust the parameters at different periods to ensure more efficient resource utilization.

Users can view the monitoring indicators of the resources used by each service in static service pools on MRS Manager. The following indicators are included:

- Overall CPU usage of a service


- Overall disk I/O read rate of a service
- Overall disk I/O write rate of a service
- Overall memory used by a service

Procedure

Step 1 On MRS Manager, click System. In the Resource area, click Configure Static Service Pool.

Step 2 Click Status.

Step 3 View the system resource adjustment base.

- System Resource Adjustment Base specifies the maximum amount of resources that can be used by services on each node in the cluster. If the node has only one service, this service exclusively uses the available resources on the node. If the node has multiple services, these services share the available resources on the node.
- CPU specifies the maximum number of CPUs that can be used by services on the node.
- Memory specifies the maximum memory that can be used by services on the node.

Step 4 View the usage status of cluster service resources.

Click the title row of the Service Pool Configuration table. The resource usage status of all services in the service pool will be displayed in Real-Time Statistics.

NOTE

Effective Configuration Group specifies the resource control configuration group currently used by cluster services. By default, the default configuration group is used at all periods in a day. This configuration group specifies that all CPUs and 70% of the memory of a node can be used by cluster services.

Step 5 View the resource usage status of a single service.

Click the row where a specified service is located in Service Pool Configuration. The resource usage status of the service in the service pool will be displayed in Real-Time Statistics.

----End

5.10.2 Configuring a Static Service Pool

Scenario

Users can adjust the resource base on MRS Manager and customize a resource configuration group to control the node resources that can be used by cluster services or specify different node CPUs for cluster services at different periods.

Prerequisites

- After a static service pool is configured, the HDFS and Yarn services need to be restarted. The services are unavailable during restart.
- After a static service pool is configured, the maximum amount of resources used by the services and their role instances cannot exceed the threshold.


Procedure

Step 1 Modify the resource adjustment base.

1. On MRS Manager, click System. In the Resource area, click Configure Static Service Pool.

2. Click Configuration. The service pool configuration group management page is displayed.

3. In System Resource Adjustment Base, modify parameters CPU and Memory.

By modifying System Resource Adjustment Base, you can restrict the maximum number of physical CPUs and the memory resource percentage that can be used by the HBase, HDFS, and Yarn services. If multiple services are deployed on the same node, the maximum physical resource percentage used by all services cannot exceed the value of this parameter.

4. Click OK to complete the editing.

   If parameters need to be modified again, you can click the edit button on the right of System Resource Adjustment Base.

Step 2 Modify the default configuration group of the service pool.

1. Click default, and set CPU LIMIT(%), CPU SHARE(%), I/O(%), and Memory(%) for the HBase, HDFS, and Yarn services in the Service Pool Configuration table.

NOTE

- The sum of CPU LIMIT(%) used by all services can exceed 100%.
- The sum of CPU SHARE(%) and the sum of I/O(%) used by all services must each be 100%. For example, if CPU resources are allocated to the HDFS, HBase, and Yarn services, the total percentage of the CPU resources allocated to the services must be 100%.
- The sum of Memory(%) used by all services must be less than or equal to 100%.
- Memory(%) cannot take effect dynamically. This parameter can only be modified in the default configuration group.
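The constraints in the note above (CPU SHARE and I/O percentages each summing to exactly 100, memory summing to at most 100, while CPU LIMIT may exceed 100) can be checked mechanically before saving. The following Python sketch is illustrative only; the function and field names are hypothetical, not part of MRS Manager:

```python
def validate_service_pool(configs: dict) -> list:
    """configs maps a service name to its percentages, e.g.
    {"HDFS": {"cpu_share": 30, "io": 40, "memory": 30}, ...}.
    Returns a list of violated constraints (empty means valid)."""
    errors = []
    if sum(c["cpu_share"] for c in configs.values()) != 100:
        errors.append("CPU SHARE(%) of all services must sum to 100")
    if sum(c["io"] for c in configs.values()) != 100:
        errors.append("I/O(%) of all services must sum to 100")
    if sum(c["memory"] for c in configs.values()) > 100:
        errors.append("Memory(%) of all services must not exceed 100")
    return errors

pool = {
    "HDFS":  {"cpu_share": 30, "io": 40, "memory": 30},
    "HBase": {"cpu_share": 30, "io": 30, "memory": 30},
    "Yarn":  {"cpu_share": 40, "io": 30, "memory": 30},
}
print(validate_service_pool(pool))  # [] -> all constraints satisfied
```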

2. Click OK to complete the editing. The correct configuration values of service poolparameters are generated in Detailed Configuration: by MRS Manager based on thecluster hardware resources and distribution.

If parameters need to be modified again, you can click the icon on the right of Service Pool Configuration.

3. Click the icon on the right of Detailed Configuration to change the parameter values of the service pool based on service requirements.

After you click the name of a specified service in Service Pool Configuration, parameters of only this service are displayed in Detailed Configuration. The displayed resource usage is not updated when you change the parameter values manually. For parameters that can take effect dynamically, their names in a newly added configuration group contain the ID of the configuration group, such as HBase : RegionServer : dynamic-config1.RES_CPUSET_PERCENTAGE, and these parameters function in the same way as those in the default configuration group.

Table 5-17 Static service pool parameters

- RES_CPUSET_PERCENTAGE, dynamic-configX.RES_CPUSET_PERCENTAGE: Specifies the CPU percentage used by a service.

- RES_CPU_SHARE, dynamic-configX.RES_CPU_SHARE: Specifies the CPU share used by a service.

- RES_BLKIO_WEIGHT, dynamic-configX.RES_BLKIO_WEIGHT: Specifies the I/O weight used by a service.

- HBASE_HEAPSIZE: Specifies the maximum JVM memory of RegionServer.

- HADOOP_HEAPSIZE: Specifies the maximum JVM memory of DataNode.

- dfs.datanode.max.locked.memory: Specifies the size of the cached memory block replica of DataNode in memory.

- yarn.nodemanager.resource.memory-mb: Specifies the memory that can be used by NodeManager on the current node.

Step 3 Add a customized resource configuration group.

1. Determine whether to implement time-based automatic resource configuration adjustment.
   If yes, go to Step 3.2.
   If no, go to Step 4.

2. Click the icon to add a resource configuration group. In Scheduling Time, click the icon to open the time policy configuration page. Modify the following parameters based on service requirements, and click OK to save the modification.

   - Repeat: When Repeat is selected, the resource configuration group runs repeatedly by scheduling period. When Repeat is not selected, you need to set a date and time point for the resource configuration group to take effect.

   - Repeat On: Daily, Weekly, and Monthly are supported. This parameter takes effect only in Repeat mode.

   - Between: This parameter specifies the start time and end time for the resource configuration group to take effect. You need to set this parameter to a unique time segment. If the value is the same as the time segment set for an existing configuration group, the settings cannot be saved.

NOTE

- The default configuration group takes effect in all undefined periods.

- A newly added configuration group is a configuration item set that takes effect dynamically in a specified time range.

- A newly added configuration group can be deleted. A maximum of four configuration groups that take effect dynamically can be added.

- Select any type of Repeat On. If the end time is earlier than the start time, the end time on the second day is used by default. For example, 22:00 to 6:00 indicates that the scheduling time range is from 22:00 on the current day to 06:00 on the next day.

- If the types of Repeat On for multiple configuration groups are different, the time segments can overlap, and the priorities of the policies that take effect are, from low to high: Daily, Weekly, and Monthly. For example, if two scheduling configuration groups exist, one of the Monthly type with a time segment from 04:00 to 07:00 and the other of the Daily type with a time segment from 06:00 to 08:00, the configuration group of the Monthly type prevails.

- If the types of Repeat On for multiple configuration groups are the same, the time segments can overlap when the dates are different. For example, if two scheduling configuration groups of the Weekly type exist, their time segments can be set to 04:00 to 07:00 on Monday and 04:00 to 07:00 on Wednesday.
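The precedence and midnight-wrapping rules above can be sketched as a small selection function. This is an illustrative model, not MRS Manager code; the group names and time segments are made up.

```python
# Illustrative model of the scheduling rules described above:
# Monthly beats Weekly beats Daily, and a segment whose end time is earlier
# than its start time wraps past midnight into the next day.
from datetime import time

PRIORITY = {"Daily": 0, "Weekly": 1, "Monthly": 2}

def in_range(now, start, end):
    if start <= end:
        return start <= now < end
    # End earlier than start: the segment wraps past midnight.
    return now >= start or now < end

def active_group(now, groups):
    """groups: list of (name, repeat_type, start, end). Returns the
    highest-priority matching group, or 'default' if none matches."""
    matches = [g for g in groups if in_range(now, g[2], g[3])]
    if not matches:
        return "default"  # the default group covers all undefined periods
    return max(matches, key=lambda g: PRIORITY[g[1]])[0]

groups = [
    ("monthly-batch", "Monthly", time(4, 0), time(7, 0)),
    ("daily-batch", "Daily", time(6, 0), time(8, 0)),
    ("night-shift", "Daily", time(22, 0), time(6, 0)),  # wraps to the next day
]
print(active_group(time(6, 30), groups))  # the Monthly group wins in the 06:00-07:00 overlap
```
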

3. Modify the resource configuration of each service in Service Pool Configuration, click OK, and go to Step 4.

You can click the icon to modify the parameters again. You can click the icon in Detailed Configuration to manually update the parameter values generated by the system based on service requirements.

Step 4 Save the configuration.

Click Save, select Restart the affected services or instances in the Save Configuration window, and click OK.

When the system displays Operation succeeded, click Finish.

----End

5.11 Tenant Management

5.11.1 Introduction

Definition

The MRS cluster provides various resources and services for multiple organizations, departments, or applications to share. The cluster provides the tenant as a logical entity that uses these resources and services. A mode involving multiple tenants is called multi-tenant mode.

Principles

The MRS cluster provides the multi-tenant function. It supports a layered tenant model and allows tenants to be added or deleted dynamically to implement resource isolation, and it can dynamically manage and configure the computing and storage resources of tenants.

The computing resources indicate tenants' Yarn task queue resources. The task queue quota can be modified, and the task queue usage status and statistics can be viewed.

Storage resources support the HDFS storage. Tenants' HDFS storage directories can be added or deleted, and the file quantity quota and storage space quota of directories can be configured.

As the unified tenant management platform of the MRS cluster, MRS Manager provides a mature multi-tenant management model for enterprises, enabling centralized tenant and service management. Users can create and manage tenants in the cluster based on service requirements.

- Roles, computing resources, and storage resources are automatically created when tenants are created. By default, all rights on the new computing and storage resources are assigned to the tenant roles.

- By default, the permission to view tenant resources, create sub-tenants, and manage sub-tenant resources is assigned to the tenant roles.

- After the computing or storage resources of tenants are modified, the related role rights are updated automatically.

MRS Manager supports a maximum of 512 tenants. The tenants created by default in the system include the default tenant. Tenants at the same, topmost layer as the default tenant are called level-1 tenants.

Resource Pool

Yarn task queues support only one scheduling policy: label-based scheduling. This policy enables Yarn task queues to be associated with NodeManagers that have specific node labels. In this way, Yarn tasks run on specified nodes so that tasks are scheduled onto, and utilize, specific hardware resources. For example, Yarn tasks requiring a large memory capacity can run on nodes with large memory by means of label association, preventing service impact caused by poor performance.

On the MRS cluster, users can logically divide Yarn cluster nodes to combine multiple NodeManagers into a resource pool. Yarn task queues can be associated with specified resource pools by configuring queue capacity policies, ensuring efficient and independent resource utilization in the resource pools.

MRS Manager supports a maximum of 50 resource pools. The Default resource pool is included in the system by default.

5.11.2 Creating a Tenant

Scenario

Create a tenant on MRS Manager when resource usage needs to be specified based on service requirements.

Prerequisites

- A tenant name has been planned based on service requirements. The name cannot be the same as that of a role or Yarn queue that exists in the current cluster.

- If the tenant requires storage resources, a storage directory has been planned in advance based on service requirements, and the planned directory does not exist under the HDFS directory.

- The resources that can be allocated to the current tenant have been planned, and the sum of the resource percentages of the direct sub-tenants under the parent tenant at each level does not exceed 100%.
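The last prerequisite amounts to a simple sum check. A minimal illustrative sketch (the tenant names and percentages are made-up examples, not values from a real cluster):

```python
# Illustrative check: the resource percentages of the direct sub-tenants
# under any parent tenant must not sum to more than 100%.
def sub_tenant_allocation_ok(direct_sub_tenants):
    """direct_sub_tenants: dict of sub-tenant name -> resource percentage."""
    return sum(direct_sub_tenants.values()) <= 100

print(sub_tenant_allocation_ok({"ta1": 40, "ta2": 35, "ta3": 25}))  # True: exactly 100%
print(sub_tenant_allocation_ok({"ta1": 60, "ta2": 50}))             # False: 110% overcommits
```
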

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click Create Tenant. On the displayed page, configure tenant attributes according to the following table.

Table 5-18 Tenant parameters

Name: Specifies the name of the current tenant. The value consists of 3 to 20 characters, which can contain letters, digits, and underscores (_).

Tenant Type: The options include Leaf and Non-leaf. When Leaf is selected, the current tenant is a leaf tenant and no sub-tenant can be added. When Non-leaf is selected, sub-tenants can be added to the current tenant.

Dynamic Resource: Specifies the dynamic computing resources for the current tenant. The system automatically creates a task queue in Yarn, named the same as the tenant. If dynamic resources are not Yarn resources, the system does not automatically create a task queue.

Default Resource Pool Capacity: Specifies the percentage of the computing resources in the default resource pool used by the current tenant.

Default Resource Pool Max. Capacity: Specifies the maximum percentage of the computing resources in the default resource pool used by the current tenant.

Storage Resource: Specifies the storage resources for the current tenant. The system automatically creates a folder in the /tenant directory by default, named the same as the tenant. Upon initial tenant creation, the system automatically creates the /tenant directory under the HDFS root directory. If storage resources are not HDFS, the system does not create a storage directory under the HDFS root directory.

Space Quota: Specifies the HDFS storage space quota used by the current tenant. The value ranges from 500 to 8796093022208, in MB. This parameter indicates the maximum HDFS storage space that can be used by the tenant, not the actual space used. If the value is greater than the size of the HDFS physical disk, the maximum space that can be used is the entire HDFS physical disk space.
NOTE: To ensure data reliability, two extra copies of a file are automatically generated when the file is stored in HDFS; that is, three copies of the same file are stored by default. The HDFS storage space indicates the total disk space occupied by all these copies. For example, if Space Quota is set to 500, the actual space for storing files is about 166 MB (500/3 = 166).

Storage Path: Specifies the HDFS storage directory of the tenant. The system automatically creates a folder in the /tenant directory by default, named the same as the tenant. For example, the default HDFS storage directory for tenant ta1 is tenant/ta1. Upon initial tenant creation, the system automatically creates the /tenant directory under the HDFS root directory. The storage path is customizable.

Service: Specifies other service resources associated with the current tenant. HBase is supported. Click Associate Services. In the displayed dialog box, set Service to HBase. If Association Type is set to exclusive, service resources are occupied exclusively; if share is selected, service resources are shared.

Description: Specifies the description of the current tenant.
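Because the space quota counts every replica and HDFS stores three copies of each file by default, the usable file capacity is roughly the quota divided by the replication factor. A small sketch of that arithmetic, using the 500 MB example from the quota description:

```python
# Effective file capacity under an HDFS space quota: the quota counts every
# replica, and HDFS stores 3 copies of each file by default.
def usable_capacity_mb(space_quota_mb, replication_factor=3):
    return space_quota_mb // replication_factor

print(usable_capacity_mb(500))  # 166: a 500 MB quota stores about 166 MB of files
```
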

Step 3 Click OK to save the settings.

It takes a few minutes to save the settings. Click OK when The tenant is successfully created is displayed. The tenant is added successfully.

NOTE

Roles, computing resources, and storage resources are automatically created when tenants are created.

----End

Related Tasks

Viewing an added tenant

Step 1 On MRS Manager, click Tenant.

Step 2 In the tenant list on the left, click the name of an added tenant.

The Summary tab is displayed on the right by default.

Step 3 View Basic Information, Resource Quota, and Statistics of the tenant.

If HDFS is in the Stopped state, Space Quota in Basic Information and Available and Usage of Space in Resource Quota are unknown.

----End

5.11.3 Creating a Sub-tenant

Scenario

Add a sub-tenant on MRS Manager when the resources of the current tenant need to be further allocated based on service requirements.

Prerequisites

- A parent tenant has been added.

- A tenant name has been planned based on service requirements. The name cannot be the same as that of a role or Yarn queue that exists in the current cluster.

- If the sub-tenant requires storage resources, a storage directory has been planned in advance based on service requirements, and the planned directory does not exist under the storage directory of the parent tenant.

- The resources that can be allocated to the current tenant have been planned, and the sum of the resource percentages of the direct sub-tenants under the parent tenant at each level does not exceed 100%.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 In the tenant list on the left, move the cursor to the tenant node to which a sub-tenant is to be added. Click the icon on the right, and choose Create sub-tenant from the drop-down list. On the displayed page, configure the sub-tenant attributes according to the following table.

Table 5-19 Sub-tenant parameters

Parent tenant: Specifies the name of the parent tenant.

Name: Specifies the name of the current tenant. The value consists of 3 to 20 characters, which can contain letters, digits, and underscores (_).

Tenant Type: The options include Leaf and Non-leaf. When Leaf is selected, the current tenant is a leaf tenant and no sub-tenant can be added. When Non-leaf is selected, sub-tenants can be added to the current tenant.

Dynamic Resource: Specifies the dynamic computing resources for the current tenant. The system automatically creates a task queue in the Yarn queue of the parent tenant, named the same as the sub-tenant. If dynamic resources are not Yarn resources, the system does not automatically create a task queue. If the parent tenant does not have dynamic resources, the sub-tenant cannot use dynamic resources.

Default Resource Pool Capacity: Specifies the percentage of the computing resources used by the current tenant. The base value is the total resources of the parent tenant.

Default Resource Pool Max. Capacity: Specifies the maximum percentage of the computing resources used by the current tenant. The base value is the total resources of the parent tenant.

Storage Resource: Specifies the storage resources for the current tenant. The system automatically creates a folder in the HDFS directory of the parent tenant, named the same as the sub-tenant. If storage resources are not HDFS, the system does not create a storage directory under the HDFS directory. If the parent tenant does not have storage resources, the sub-tenant cannot use storage resources.

Space Quota: Specifies the HDFS storage space quota used by the current tenant. The value ranges from 500 to 8796093022208, in MB. This parameter indicates the maximum HDFS storage space that can be used by the tenant, not the actual space used. If the value is greater than the size of the HDFS physical disk, the maximum space that can be used is the entire HDFS physical disk space. If this quota is greater than the quota of the parent tenant, the actual storage space is also limited by the quota of the parent tenant.
NOTE: To ensure data reliability, two extra copies of a file are automatically generated when the file is stored in HDFS; that is, three copies of the same file are stored by default. The HDFS storage space indicates the total disk space occupied by all these copies. For example, if Space Quota is set to 500, the actual space for storing files is about 166 MB (500/3 = 166).

Storage Path: Specifies the HDFS storage directory of the tenant. The system automatically creates a folder in the parent tenant directory by default, named the same as the sub-tenant. For example, if the sub-tenant is ta1s and the parent directory is tenant/ta1, the system sets this parameter for the sub-tenant to tenant/ta1/ta1s by default. The storage path is customizable within the parent directory; the parent directory of the storage path must be the storage directory of the parent tenant.

Service: Specifies other service resources associated with the current tenant. HBase is supported. Click Associate Services. In the displayed dialog box, set Service to HBase. If Association Type is set to exclusive, service resources are occupied exclusively; if share is selected, service resources are shared.

Description: Specifies the description of the current tenant.

Step 3 Click OK to save the settings.

It takes a few minutes to save the settings. Click OK when The tenant is successfully created is displayed. The tenant is added successfully.

NOTE

Roles, computing resources, and storage resources are automatically created when tenants are created.

----End

5.11.4 Deleting a Tenant

Scenario

On MRS Manager, delete a tenant that is not required based on service requirements.

Prerequisites

- A tenant has been added.

- You have checked whether the tenant to be deleted has sub-tenants. If it has sub-tenants, delete all of them first; otherwise, the tenant cannot be deleted.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 In the tenant list on the left, move the cursor to the tenant node to be deleted. Click the icon on the right, and choose Delete from the drop-down list.

The Delete Tenant dialog box is displayed. Based on service requirements, select Reserve the data of this tenant to save the tenant data. Otherwise, the tenant storage space will be deleted.

Step 3 Click OK.

It takes a few minutes to save the configuration. The tenant is deleted successfully. The role and storage space of the tenant are deleted.

NOTE

- After the tenant is deleted, the task queue of the tenant in Yarn still exists.

- If you choose not to reserve data when deleting the parent tenant, the data of sub-tenants is also deleted if the sub-tenants use storage resources.

----End

5.11.5 Managing a Tenant Directory

Scenario

Manage the HDFS storage directories used by a specific tenant on MRS Manager based on service requirements. The management operations include adding a tenant directory, modifying the file quantity quota and storage space quota of a directory, and deleting a directory.

Prerequisites

A tenant with HDFS storage resources has been added.

Procedure

- View a tenant directory.

  a. On MRS Manager, click Tenant.
  b. In the tenant list on the left, click the target tenant.
  c. Click the Resource tab.
  d. View the HDFS Storage table.
     - The Quota column indicates the file and directory quantity quota of the tenant directory.
     - The Space Quota column indicates the storage space size of the tenant directory.

- Add a tenant directory.

  a. On MRS Manager, click Tenant.
  b. In the tenant list on the left, click the tenant whose HDFS storage directory needs to be added.
  c. Click the Resource tab.
  d. In the HDFS Storage table, click Create Directory.
     - Set Path to a tenant directory path. The new path is created in the HDFS root directory. A complete HDFS storage path contains a maximum of 1023 characters. An HDFS directory name can contain digits, letters, spaces, and underscores (_). The name cannot start or end with a space.
     - Set Quota to the file and directory quantity quota. Quota is optional. Its value ranges from 1 to 9223372036854775806.
     - Set Space Quota to the storage space size of the tenant directory. The value of Space Quota ranges from 500 to 8796093022208.

NOTE
To ensure data reliability, two extra copies of a file are automatically generated when the file is stored in HDFS; that is, three copies of the same file are stored by default. The HDFS storage space indicates the total disk space occupied by all these copies. For example, if Space Quota is set to 500, the actual space for storing files is about 166 MB (500/3 = 166).

e. Click OK. The system creates the tenant directory in the HDFS root directory.

- Modify the attributes of a tenant directory.

  a. On MRS Manager, click Tenant.
  b. In the tenant list on the left, click the tenant whose HDFS storage directory needs to be modified.
  c. Click the Resource tab.
  d. In the HDFS Storage table, click the edit icon in the Operation column of the specified tenant directory.
     - Set Quota to the file and directory quantity quota. Quota is optional. Its value ranges from 1 to 9223372036854775806.
     - Set Space Quota to the storage space size of the tenant directory. The value of Space Quota ranges from 500 to 8796093022208.

NOTE
To ensure data reliability, two extra copies of a file are automatically generated when the file is stored in HDFS; that is, three copies of the same file are stored by default. The HDFS storage space indicates the total disk space occupied by all these copies. For example, if Space Quota is set to 500, the actual space for storing files is about 166 MB (500/3 = 166).

e. Click OK.

- Delete a tenant directory.

  a. On MRS Manager, click Tenant.
  b. In the tenant list on the left, click the tenant whose HDFS storage directory needs to be deleted.
  c. Click the Resource tab.
  d. In the HDFS Storage table, click the delete icon in the Operation column of the specified tenant directory.
     The default HDFS storage directory configured during tenant creation cannot be deleted. Only newly added HDFS storage directories can be deleted.

e. Click OK. The tenant directory is deleted.
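The naming and quota rules listed above for tenant directories can be collected into one validation sketch. This is illustrative only; MRS Manager performs its own checks, and the helper name is hypothetical.

```python
import re

# Illustrative validation of the tenant directory rules described above:
# a complete path has at most 1023 characters; the directory name may contain
# digits, letters, spaces, and underscores, and may not start or end with a
# space; Quota is in 1..9223372036854775806; Space Quota is in
# 500..8796093022208 (MB).
NAME_RE = re.compile(r"^[0-9A-Za-z_ ]+$")

def validate_tenant_directory(path, quota=None, space_quota=None):
    errors = []
    if len(path) > 1023:
        errors.append("path exceeds 1023 characters")
    name = path.rstrip("/").rsplit("/", 1)[-1]
    if not NAME_RE.match(name) or name != name.strip():
        errors.append("invalid directory name")
    if quota is not None and not 1 <= quota <= 9223372036854775806:
        errors.append("Quota out of range")
    if space_quota is not None and not 500 <= space_quota <= 8796093022208:
        errors.append("Space Quota out of range")
    return errors

print(validate_tenant_directory("/tenant/ta1/new_dir", quota=1000, space_quota=500))  # [] means valid
```
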

5.11.6 Recovering Tenant Data

Scenario

Tenant data is stored on MRS Manager and in cluster components by default. When components are recovered from faults or reinstalled, some tenant configuration data may be in an abnormal state. In this case, manually recover the tenant data.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 In the tenant list on the left, click a tenant node.

Step 3 Check the status of the tenant data.

1. In Summary, check the color of the circle on the left of Basic Information. Green indicates that the tenant is available and gray indicates that the tenant is unavailable.

2. Click Resource, and check the color of the circle on the left of Yarn or HDFS Storage. Green indicates that the resource is available and gray indicates that the resource is unavailable.

3. Click Service Association, and check the State column of the associated service table. Good indicates that the component can provide services for the associated tenant. Bad indicates that the component cannot provide services for the tenant.

4. If any of the preceding check items is abnormal, go to Step 4 to recover tenant data.

Step 4 Click Restore Tenant Data.

Step 5 In the Restore Tenant Data window, select one or multiple components whose data needs to be recovered, and click OK. The system automatically recovers the tenant data.

----End

5.11.7 Creating a Resource Pool

Scenario

On the MRS cluster, users can logically divide Yarn cluster nodes to combine multiple NodeManagers into a Yarn resource pool. Each NodeManager belongs to one resource pool only. The system contains a Default resource pool by default. All NodeManagers that are not added to customized resource pools belong to this resource pool.

Create a customized resource pool on MRS Manager and add hosts that are not added to customized resource pools to the newly created resource pool.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Resource Pool tab.

Step 3 Click Create Resource Pool.

Step 4 In Create Resource Pool, set the attributes of the resource pool.

- Name: Enter the name of the resource pool. The name of a newly created resource pool cannot be Default. The name contains 3 to 20 characters and can consist of digits, letters, and underscores (_), but must not start with an underscore.

- Available Hosts: In the host list on the left, select the name of a specified host and click the button to add the selected host to the resource pool. Only hosts in the cluster can be selected. The host list of a resource pool can be left blank.
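The name rule above (3 to 20 characters; digits, letters, and underscores; no leading underscore; the reserved name Default not allowed) can be expressed as a single regular expression. An illustrative sketch, not MRS Manager's own validation:

```python
import re

# Illustrative check of the resource pool name rule described above:
# 3-20 characters, digits/letters/underscores only, no leading underscore,
# and the reserved name "Default" is not allowed.
POOL_NAME_RE = re.compile(r"^[0-9A-Za-z][0-9A-Za-z_]{2,19}$")

def valid_pool_name(name):
    return name != "Default" and bool(POOL_NAME_RE.match(name))

print(valid_pool_name("pool_01"))  # True
print(valid_pool_name("_pool"))    # False: starts with an underscore
print(valid_pool_name("Default"))  # False: reserved name
```
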

Step 5 Click OK to save the settings.

Step 6 After the resource pool is created, users can view the Name, Members, Association Mode, vCore, and Memory in the resource pool list. Hosts that are added to the customized resource pool are no longer members of the Default resource pool.

----End

5.11.8 Modifying a Resource Pool

Scenario

Modify members of an existing resource pool on MRS Manager.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Resource Pool tab.

Step 3 Locate the row that contains the specified resource pool, and click the edit icon in the Operation column.

Step 4 In Modify Resource Pool, modify Added Hosts.

- Adding a host: Select the name of a specified host in the host list on the left and click the button to add the selected host to the resource pool.

- Deleting a host: Select the name of a specified host in the host list on the right and click the button to delete the selected host from the resource pool. The host list of a resource pool can be left blank.

Step 5 Click OK to save the settings.

----End

5.11.9 Deleting a Resource Pool

Scenario

Delete an existing resource pool on MRS Manager.

Prerequisites

- Before deleting a resource pool, you have ensured that no queue in the cluster uses the resource pool to be deleted as its default resource pool. For details, see Configuring a Queue.

- Before deleting a resource pool, you have cleared the resource distribution policies configured for the resource pool in all queues in the cluster. For details, see Clearing the Configuration of a Queue.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Resource Pool tab.

Step 3 Locate the row that contains the specified resource pool, and click the delete icon in the Operation column.

In the dialog box that is displayed, select I have read the information and understand the impact, and click OK.

----End

5.11.10 Configuring a Queue

Scenario

On MRS Manager, modify the queue configuration for a specific tenant based on service requirements.

Prerequisites

A tenant associated with Yarn and allocated with dynamic resources has been added.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Dynamic Resource Plan tab.

Step 3 Click the Queue Configuration tab.

Step 4 In the tenant queue table, click the edit icon in the Operation column of the specified tenant queue.

NOTE

In the tenant list on the left of the Tenant Management tab, click the target tenant. In the displayed window, choose Resource. On the displayed page, click the icon to open the queue configuration modification page.

Table 5-20 Queue configuration parameters

Max. Applications: Specifies the maximum number of applications. The value ranges from 1 to 2147483647.

Max. AM Resources: Specifies the maximum percentage of resources that can be used to run ApplicationMaster in a cluster. The value ranges from 0 to 1.

Min. User Resources Percent: Specifies the minimum user resource usage percentage. The value ranges from 0 to 100.

User Resource Upper Limit Factor: Specifies the limit factor of the maximum user resource usage. The maximum user resource usage percentage is obtained by multiplying this factor by the actual resource usage percentage of the current tenant in the cluster. The minimum value is 0.

Management Status: Specifies the current status of a resource plan. Running indicates that the resource plan is running. Stopped indicates that the resource plan is stopped.

Default Resource Pool: Specifies the resource pool used by a queue. The default value is Default. If you want to change the resource pool, configure the queue capacity first. For details, see Configuring the Queue Capacity Policy of a Resource Pool.
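The User Resource Upper Limit Factor works as a multiplier on the tenant's actual resource usage percentage. A small illustrative calculation (the numbers are made-up examples; the 100% cap is an assumption for the sketch, since a usage percentage cannot exceed the whole cluster):

```python
# Per the parameter description above, the maximum user resource usage
# percentage is the tenant's actual resource usage percentage multiplied by
# the limit factor. The 100% cap is an illustrative assumption.
def max_user_resource_percent(tenant_usage_percent, limit_factor):
    return min(tenant_usage_percent * limit_factor, 100.0)

print(max_user_resource_percent(20.0, 1.0))  # 20.0: one user may take the tenant's whole share
print(max_user_resource_percent(20.0, 2.0))  # 40.0: a user may exceed the tenant's current share
```
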

----End

5.11.11 Configuring the Queue Capacity Policy of a Resource Pool

Scenario

After a resource pool is added, capacity policies of available resources need to be configured for Yarn task queues to ensure the proper running of tasks in the resource pool. Each queue can be configured with the queue capacity policy of only one resource pool. Users can view the queues in any resource pool and configure queue capacity policies. After the queue policies are configured, Yarn task queues are associated with resource pools.

Configure the queue policy on MRS Manager.

Prerequisites

- You have added a resource pool.
- The task queues are not associated with other resource pools. By default, all task queues are associated with the default resource pool.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Dynamic Resource Plan tab.

Step 3 In Resource Pool, select a specified resource pool.

Available Resource Quota: indicates that all resources in each resource pool are available for queues by default.

Step 4 Locate the specified queue in the Resource Allocation table, and click in the Operation column.

Step 5 In Modify Resource Allocation, configure the resource capacity policy of the task queue in the resource pool.


- Capacity: Specifies the computing resource usage percentage of the current tenant.
- Max. Capacity: Specifies the maximum computing resource usage percentage of the current tenant.

Step 6 Click OK to save the settings.

----End

5.11.12 Clearing the Configuration of a Queue

Scenario

Users can clear the configuration of a queue on MRS Manager when the queue no longer needs resources from a resource pool or when a resource pool needs to be disassociated from the queue. Clearing the configuration of a queue cancels the resource capacity policy of the queue in the resource pool.

Prerequisites

If a queue needs to be unbound from a resource pool, this resource pool cannot serve as the default resource pool of the queue. Therefore, you need to change the default resource pool of the queue to another one first. For details, see Configuring a Queue.

Procedure

Step 1 On MRS Manager, click Tenant.

Step 2 Click the Dynamic Resource Plan tab.

Step 3 In Resource Pool, select a specified resource pool.

Step 4 Locate the specified queue in the Resource Allocation table, and click in the Operation column.

In Clear Queue Configuration, click OK to clear the queue configuration in the current resource pool.

NOTE

If no resource capacity policy is configured for a queue, the clearance function is unavailable for the queue by default.

----End

5.12 Backup and Restoration

5.12.1 Introduction

Overview

MRS Manager provides backup and recovery for user data and system data. The backup function is provided based on components to back up Manager data (including OMS data and LdapServer data), Hive user data, component metadata saved in DBService, and HDFS metadata.


Backup and recovery tasks are performed in the following scenarios:

- Routine backup is performed to ensure the data security of the system and components.
- When the system is faulty, backup data can be used to restore the system.
- When the active cluster is completely faulty, an image cluster that is the same as the active cluster needs to be created, and backup data can be used to perform restoration operations.

Table 5-21 Backing up metadata based on service requirements

Backup Type Backup Content

OMS Back up database data (excluding alarm data) and configuration data in the cluster management system by default.

LdapServer Back up user information, including the username, password, key, password policy, and group information.

DBService Back up metadata of the component (Hive) managed by DBService.

NameNode Back up HDFS metadata.

Table 5-22 Backing up service data of specific components based on service requirements

Backup Type Backup Content

HBase Back up table-level user data.

HDFS Back up the directories or files that correspond to user services.

Hive Back up table-level user data.

Note that some components do not provide the data backup and recovery functions:

- ZooKeeper data is backed up mutually between ZooKeeper nodes.
- MapReduce and Yarn data is stored in HDFS. Therefore, MapReduce and Yarn depend on HDFS to provide the backup and recovery functions.

Principles

Task

Before backup or recovery, you need to create a backup or recovery task and set task parameters, such as the task name, backup data source, and type of directories for saving backup files. Data backup and recovery can be performed by executing backup and recovery tasks. When Manager is used to recover the data of HDFS, Hive, and NameNode, the cluster cannot be accessed.

Each backup task can back up different data sources and generate an independent backup file for each data source. All the backup files generated in each backup task form a backup file


set, which can be used in recovery tasks. Backup files can be stored on Linux local disks, in the local cluster HDFS, or in the standby cluster HDFS. Backup tasks support full backup and incremental backup policies. HDFS and Hive backup tasks support the incremental backup policy, while OMS, LdapServer, DBService, and NameNode backup tasks support only the full backup policy.

NOTE

Task execution rules:

- If a task is being executed, it cannot be executed repeatedly and other tasks cannot be started.

- The interval at which a periodic task is automatically executed must be greater than 120s; otherwise, the task is postponed and executed in the next period. Manual tasks can be executed at any interval.

- When a periodic task is due to be automatically executed, the current time cannot be more than 120s later than the task start time; otherwise, the task is postponed and executed in the next period.

- When a periodic task is locked, it cannot be automatically executed and needs to be manually unlocked.

- Before an OMS, LdapServer, DBService, or NameNode backup task starts, ensure that the LocalBackup partition on the active management node has more than 20 GB of available space; otherwise, the backup task cannot be started.

- When planning backup and recovery tasks, select the data to be backed up or recovered strictly based on the service logic, data storage structure, and database or table associations. By default, the system creates a periodic backup task named default, whose execution interval is 24 hours, to perform a full backup of OMS, LdapServer, DBService, and NameNode data to the Linux local disk.
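The two 120-second rules above can be sketched as a small scheduling check. This is an illustrative model, not MRS source code; the function name and the use of Unix timestamps are assumptions.

```python
# Illustrative sketch of the postponement rules in the note above;
# not MRS source code. Times are Unix timestamps in seconds.
TOLERANCE_S = 120  # the 120s threshold from the note

def should_postpone(scheduled_start_s, now_s, interval_s):
    """Return True if a periodic run is skipped until the next period."""
    if interval_s <= TOLERANCE_S:
        # The execution interval must be greater than 120s.
        return True
    if now_s - scheduled_start_s > TOLERANCE_S:
        # The current time is more than 120s past the scheduled start.
        return True
    return False
```

For example, a run scheduled at t=1000 that the scheduler only reaches at t=1200 is postponed, while one reached at t=1050 runs.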

Snapshot

The system adopts the snapshot technology to quickly back up data. Snapshots include HDFS snapshots.

HDFS snapshot

An HDFS snapshot is a read-only backup copy of the HDFS file system at a specified time point. The snapshot is used in data backup, misoperation protection, and disaster recovery scenarios.

The snapshot function can be enabled for any HDFS directory to create the related snapshot file. Before creating a snapshot for a directory, the system automatically enables the snapshot function for the directory. Snapshot creation does not affect any HDFS operation. A maximum of 65,536 snapshots can be created for each HDFS directory.

When a snapshot has been created for an HDFS directory, the directory cannot be deleted and the directory name cannot be modified until the snapshot is deleted. Snapshots cannot be created for the upper-layer directories or subdirectories of that directory.
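The nesting restriction above (no snapshots on ancestors or descendants of a snapshottable directory) can be modeled as a simple path check. This is a rough illustration of the rule, not HDFS code.

```python
# Rough model of the HDFS rule that snapshottable directories cannot be
# nested; not actual HDFS code.
def can_enable_snapshots(new_dir, snapshottable_dirs):
    """True if new_dir has no snapshottable ancestor or descendant."""
    def covers(ancestor, path):
        # covers("/a", "/a/b") and covers("/a", "/a") are both True.
        return path == ancestor or path.startswith(ancestor.rstrip("/") + "/")
    return not any(covers(d, new_dir) or covers(new_dir, d)
                   for d in snapshottable_dirs)
```

For example, with /data/app2 already snapshottable, snapshots can be enabled for /data/app1 but not for /data or /data/app2/sub.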

DistCp

Distributed copy (DistCp) is a tool used to replicate large amounts of data within the HDFS of a cluster or between the HDFS file systems of different clusters. In an HBase, HDFS, or Hive backup or recovery task, if the data is backed up in the HDFS of the standby cluster, the system invokes DistCp to perform the operation. The active and standby clusters must run the same version of the MRS system.

DistCp uses MapReduce to implement data distribution, troubleshooting, recovery, and reporting. DistCp assigns different Map jobs to the source files and directories in the specified list. Each Map job copies the data in the partition that corresponds to the specified file in the list.


To use DistCp to replicate data between the HDFS file systems of two clusters, configure the cross-cluster trust relationship and enable the cross-cluster replication function for both clusters.
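Once both clusters are configured, a manual cross-cluster copy is a plain DistCp invocation. The following is an illustrative sketch only; the addresses and paths are placeholders, and MRS backup tasks normally invoke DistCp for you.

```shell
# Placeholders: replace the addresses and paths with your clusters' values.
hadoop distcp -update \
  hdfs://10.1.1.1:25000/user/example/src \
  hdfs://10.2.1.1:25000/user/example/dst
```

The -update option skips files that are already identical at the destination, which is what makes repeated runs incremental.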

Local quick recovery

After using DistCp to back up the HDFS and Hive data of the local cluster to the HDFS of the standby cluster, the HDFS of the local cluster retains the backup data snapshots. Users can create local quick recovery tasks to recover data by using the snapshot files in the HDFS of the local cluster.

Specifications

Table 5-23 Backup and recovery feature specifications

Maximum number of backup or recovery tasks: 100
Number of concurrently running tasks: 1
Maximum number of waiting tasks: 199
Maximum size of backup files on a Linux local disk: 600 GB

Table 5-24 Specifications of the default task

Item                              OMS      LdapServer  DBService  NameNode
Backup period                     1 hour for all components
Maximum number of backup copies   2 for all components
Maximum size of a backup file     10 MB    20 MB       100 MB     1.5 GB
Maximum size of disk space used   20 MB    40 MB       200 MB     3 GB
Save path of backup data          Data save path/LocalBackup/ on the active and standby management nodes

NOTE

- The administrator must regularly transfer the backup data of the default task to an external cluster based on the enterprise's O&M requirements.

- The administrator can create a DistCp backup task to store OMS, LdapServer, DBService, and NameNode data in an external cluster.


5.12.2 Enabling Cross-Cluster Replication

Scenario

DistCp is used to copy the data stored in HDFS from the current cluster to another cluster. DistCp depends on the cross-cluster replication function, which is disabled by default. This function needs to be enabled in both clusters.

Modify parameters on MRS Manager to enable the cross-cluster replication function.

Impact on the System

Yarn needs to be restarted to enable the cross-cluster replication function and cannot be accessed during the restart.

Prerequisites

- The hadoop.rpc.protection parameter of the two HDFS clusters must be set to the same data transmission mode. The default value is privacy, indicating that channels are encrypted; authentication indicates that channels are not encrypted.

- The inbound rules of the two security groups on the peer cluster are added to the two security groups of each cluster to allow all access requests of all protocols and ports from all ECSs in the security groups.

Procedure

Step 1 On MRS Manager of a cluster, choose Service > Yarn > Service Configuration, and set Type to All.

Step 2 In the navigation tree, choose Yarn > Distcp.

Step 3 Set dfs.namenode.rpc-address.haclusterX.remotenn1 to the service IP address and RPC port number of one NameNode instance of the peer cluster, and set dfs.namenode.rpc-address.haclusterX.remotenn2 to the service IP address and RPC port number of the other NameNode instance of the peer cluster.

dfs.namenode.rpc-address.haclusterX.remotenn1 and dfs.namenode.rpc-address.haclusterX.remotenn2 do not distinguish between active and standby NameNode instances. The default NameNode RPC port number is 25000 and cannot be modified on MRS Manager.

Examples of modified parameter values: 10.1.1.1:25000 and 10.1.1.2:25000.
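Expressed as HDFS configuration properties, the Step 3 settings amount to entries like the following sketch. haclusterX stands for the actual remote NameService ID of your deployment, and the values are the example addresses above.

```xml
<!-- Sketch only: haclusterX and the addresses are placeholders. -->
<property>
  <name>dfs.namenode.rpc-address.haclusterX.remotenn1</name>
  <value>10.1.1.1:25000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.haclusterX.remotenn2</name>
  <value>10.1.1.2:25000</value>
</property>
```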

Step 4 Click Save Configuration, select Restart the affected services or instances, and click OK to restart the Yarn service.

After the system displays Operation succeeded, click Finish. The Yarn service is successfully started.

Step 5 Log in to MRS Manager of the other cluster and repeat the preceding operations.

----End


5.12.3 Backing Up Metadata

Scenario

Metadata needs to be backed up routinely, or before and after a critical operation (such as capacity expansion or reduction, patch installation, upgrade, or migration) is performed on metadata functions, to ensure metadata security. The backup data can be used to recover the system in time if an exception occurs or an operation has not achieved the expected result, minimizing the adverse impact on services. Metadata includes OMS data, LdapServer data, DBService data, and NameNode data. Manager data to be backed up includes OMS data and LdapServer data.

By default, metadata backup is supported by the default task. Users can create a backup task on MRS Manager to back up metadata. Both automatic and manual backup tasks are supported.

Prerequisites

- A standby cluster for backing up data has been created and the network is connected. The inbound rules of the two security groups on the peer cluster are added to the two security groups of each cluster to allow all access requests of all protocols and ports from all ECSs in the security groups.

- No cross-cluster trust relationship is configured between the two MRS clusters used for data backup.

- Cross-cluster replication has been configured for the active and standby clusters. For details, see Enabling Cross-Cluster Replication.

- The backup type, period, policy, and other specifications have been planned based on service requirements, and you have checked whether the Data save path/LocalBackup/ directory on the active and standby management nodes has sufficient space.

Procedure

Step 1 Create a backup task.

1. On MRS Manager, choose System > Back Up Data.
2. Click Create Backup Task.

Step 2 Set backup policies.

1. Set Name to the name of the backup task.
2. Set Mode to the type of the backup task. Periodic indicates that the backup task is periodically executed, and Manual indicates that the backup task is manually executed.

To create a periodic backup task, set the following parameters in addition to the preceding ones:
– Start Time: indicates the time when the task is started for the first time.
– Period: indicates the task execution interval. The options include By hour and By day.
– Backup Policy: indicates the volume of data to be backed up when each task is started. The options include Full backup at the first time and subsequent incremental backup, Full backup every time, and Full backup once every n times. When the parameter is set to Full backup once every n times, n must be specified.


Step 3 Select backup sources.

Set Configuration to OMS and LdapServer.

Step 4 Set backup parameters.

1. Set Path Type of OMS and LdapServer to a backup directory type.

The following backup directory types are supported:

– LocalDir: indicates that backup files are stored on the local disk of the active management node, and the standby management node automatically synchronizes the backup files. The default save path is Data save path/LocalBackup/. If you select this value, you need to set Max. Number of Backup Copies to specify the number of backup files that can be retained in the backup directory.

– LocalHDFS: indicates that backup files are stored in the HDFS directory of the current cluster. If you select this value, you need to set the following parameters:
  - Target Path: indicates the save path of the backup files in HDFS. The save path cannot be an HDFS hidden directory, such as a snapshot or recycle bin directory, or a default system directory.
  - Max. Number of Backup Copies: indicates the number of backup file sets that can be retained in the backup directory.
  - Target Instance Name: indicates the name of the NameService instance that corresponds to the backup directory. The default value is hacluster.

– RemoteHDFS(DistCp): indicates that backup files are stored in the HDFS directory of the standby cluster. If you select this value, you need to set the following parameters:
  - Target NameNode IP Address: indicates the NameNode service plane IP address of the standby cluster. Either the active or the standby node can be specified.
  - Target Path: indicates the HDFS directory for storing standby cluster backup data. The save path cannot be an HDFS hidden directory, such as a snapshot or recycle bin directory, or a default system directory, such as /hbase or /user/hbase/backup.
  - Max. Number of Backup Copies: indicates the number of backup file sets that can be retained in the backup directory.
  - Queue Name: indicates the name of the Yarn queue used for backup task execution. The name must be the same as that of a queue that is running properly in the cluster.

2. Click OK to save the settings.

Step 5 Execute the backup task.

In the Operation column of the created task in the backup task list, click to execute the backup task.

After the backup task is executed, the system automatically creates a subdirectory for each backup task in the backup directory. The format of the subdirectory name is Backup task name_Task creation time, and the subdirectory is used to save data source backup files. The format of a backup file name is Version_Data source_Task execution time.tar.gz.
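A backup file name in this format can be split back into its fields. The parser below and the sample values in the test are illustrative assumptions, not part of MRS, and it assumes none of the fields themselves contain an underscore.

```python
# Illustrative parser for the "Version_Data source_Task execution
# time.tar.gz" naming format described above; not MRS code.
# Assumes the individual fields contain no underscores.
def parse_backup_file(name):
    stem = name[: -len(".tar.gz")]
    version, source, executed = stem.split("_", 2)
    return {"version": version, "source": source, "executed": executed}
```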

----End


5.12.4 Backing Up Service Data

Scenario

Service data needs to be backed up routinely, or before and after a critical operation (such as an upgrade or migration) is performed on service components, to ensure service data security. The backup data can be used to recover the system in time if an exception occurs or an operation has not achieved the expected result, minimizing the adverse impact on services.

Users can create a backup task on MRS Manager to back up service data. Both automatic and manual backup tasks are supported.

The following scenarios may occur when HBase backs up data:

- When a user creates an HBase table, KEEP_DELETED_CELLS is set to false by default. When the user backs up this HBase table, deleted data will be backed up and junk data may exist after data restoration. Based on service requirements, this parameter needs to be set to true manually when an HBase table is created.

- If a user manually specifies a timestamp when writing data into an HBase table and the specified time is earlier than the last backup time of the HBase table, the new data may not be backed up by incremental backup tasks.

- The HBase backup function cannot back up the access control lists (ACLs) of read, write, create, execute, and management permissions set on the HBase global scope or namespaces. After the HBase data is restored, the administrator needs to set new permissions for roles on MRS Manager.

- If an HBase backup task has been created and the current backup data in the standby cluster is lost, the next incremental task will fail, and a new HBase backup task needs to be created. The next full backup task will run normally.

Note the following information when backing up Hive service data:

- The Hive backup and recovery functions cannot identify the service and structure relationships of objects such as Hive tables, indexes, and views. When executing backup and recovery tasks, users need to manage a unified recovery point based on the service scenario to ensure proper service running.

- The Hive backup and recovery functions do not support Hive on RDB data tables. The original data tables need to be backed up and recovered independently in the external database.

Prerequisites

- A standby cluster for backing up data has been created and the network is connected. The inbound rules of the two security groups on the peer cluster are added to the two security groups of each cluster to allow all access requests of all protocols and ports from all ECSs in the security groups.

- No cross-cluster trust relationship is configured between the two MRS clusters used for data backup.

- Cross-cluster replication has been configured for the active and standby clusters. For details, see Enabling Cross-Cluster Replication.

- Backup policies, such as the backup task type, period, backup object, backup directory, and the Yarn queue required by the backup task, have been planned based on service requirements.

- Check whether the HDFS of the standby cluster has sufficient space. It is recommended that the directory for storing backup files be a user-defined directory.


- On the HDFS client, run hdfs lsSnapshottableDir to check the list of directories for which HDFS snapshots have been created in the current cluster. Ensure that the HDFS parent directory or subdirectories in which the data files to be backed up are stored have no HDFS snapshots; otherwise, the backup task cannot be created.

Procedure

Step 1 Create a backup task.

1. On MRS Manager, choose System > Back Up Data.
2. Click Create Backup Task.

Step 2 Set backup policies.

1. Set Name to the name of the backup task.
2. Set Mode to the type of the backup task. Periodic indicates that the backup task is periodically executed, and Manual indicates that the backup task is manually executed.

To create a periodic backup task, set the following parameters in addition to the preceding ones:
– Start Time: indicates the time when the task is started for the first time.
– Period: indicates the task execution interval. The options include By hour and By day.
– Backup Policy: indicates the volume of data to be backed up in each task execution. The options include Full backup at the first time and subsequent incremental backup, Full backup every time, and Full backup once every n times. When the parameter is set to Full backup once every n times, n must be specified.

Step 3 Select backup sources.

Set Configuration to HDFS.

Step 4 Set backup directory parameters.

1. Set Path Type to RemoteHDFS(DistCp).
2. Set Target NameNode IP Address to the NameNode service plane IP address of the standby cluster. Either the active or the standby node can be specified.
3. Set Target Path to the HDFS directory for storing standby cluster backup data. The save path cannot be an HDFS hidden directory, such as a snapshot or recycle bin directory, or a default system directory.
4. Set Max. Number of Backup Copies to the number of backup file sets that can be retained in the backup directory. When the number of backup file sets exceeds this value, the latest file sets are retained by default.
5. Set Queue Name to the name of the Yarn queue used for backup task execution. The name must be the same as that of a queue that is running properly in the cluster.
6. Set Instance Name to the name of the NameService instance used for backup task execution. The default value is hacluster.
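The retention behavior behind Max. Number of Backup Copies (keep only the newest file sets) can be sketched as follows. The data layout is illustrative, not the MRS implementation.

```python
# Illustrative retention sketch: keep only the newest max_copies backup
# file sets, as described for Max. Number of Backup Copies. Not MRS code.
def prune_backup_sets(backup_sets, max_copies):
    """backup_sets: list of (name, created_ts). Returns the kept sets,
    oldest first."""
    newest = sorted(backup_sets, key=lambda s: s[1], reverse=True)[:max_copies]
    return sorted(newest, key=lambda s: s[1])
```

For example, with three sets and max_copies=2, only the two most recently created sets survive pruning.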

Step 5 Set backup data parameters.

1. Set Max. Number of Restoration Points to the number of snapshots that can be retained in the cluster.


2. Set Backup Object to one or multiple HDFS directories to be backed up based on service requirements. The following methods are supported for selecting backup data:
– Select directly: Click the name of a directory in the navigation tree to show all the subdirectories in the directory, and select the specified directories.
– Select using regular expressions:
  i. Click Query Regular Expression.
  ii. Enter the full path of the parent directory in the first text box as prompted. The directory must be the same as an existing directory, for example, /tmp.
  iii. Enter a regular expression in the second text box. Standard regular expressions are supported. For example, to filter all files or subdirectories in the parent directory, enter ([\s\S]*?). To filter files whose names consist of letters and digits, such as file1, enter file\d*.
  iv. Click Refresh to view the selected directories in Directory Name.
  v. Click Synchronize to save the result.

NOTE

○ When entering regular expressions, you can click or to add or delete an expression.

○ If the selected table or directory is incorrect, you can click Clear Selected Node to deselect it.

3. Click Verify to check whether the backup task is configured correctly.
– The possible causes of an HBase verification failure are as follows:
  - The target NameNode IP address is incorrect.
  - The queue name is incorrect.
  - The HDFS parent directory or subdirectory where the HBase table data files to be backed up are stored has HDFS snapshots.
  - The directory or table to be backed up does not exist.
– The possible causes of an HDFS verification failure are as follows:
  - The target NameNode IP address is incorrect.
  - The queue name is incorrect.
  - The HDFS parent directory or subdirectory where the data files to be backed up are stored has HDFS snapshots.
  - The directory or table to be backed up does not exist.
  - The name of the NameService instance is incorrect.
– The possible causes of a Hive verification failure are as follows:
  - The target NameNode IP address is incorrect.
  - The queue name is incorrect.
  - The HDFS parent directory or subdirectory where the data files to be backed up are stored has HDFS snapshots.
  - The directory or table to be backed up does not exist.

4. Click OK to save the settings.
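The two example patterns from step iii behave as follows in standard regular-expression terms. The file names are made up for illustration, and anchoring the pattern to the full name is an assumption about how the filter is applied.

```python
import re

# Hypothetical directory listing; the names are made-up examples.
entries = ["file1", "file22", "logs", "data.csv"]

# ([\s\S]*?) matches anything, so it selects every entry.
match_all = [e for e in entries if re.fullmatch(r"([\s\S]*?)", e)]

# file\d* selects names consisting of "file" plus digits, such as file1.
digit_files = [e for e in entries if re.fullmatch(r"file\d*", e)]
```

Here match_all is the whole listing, while digit_files contains only file1 and file22.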


Step 6 Execute the backup task.

In the Operation column of the created task in the backup task list, click to execute the backup task.

After the backup task is executed, the system automatically creates a subdirectory for each backup task in the backup directory. The format of the subdirectory name is Backup task name_Data source_Task creation time, and the subdirectory is used to save the latest data source backup files. All the backup file sets are saved to the related snapshot directories.

----End

5.12.5 Recovering Metadata

Scenario

Metadata needs to be recovered in the following scenarios:

- Data is modified or deleted unexpectedly and needs to be restored.
- After a critical operation (such as an upgrade or critical data adjustment) is performed on metadata components, an exception occurs or the operation has not achieved the expected result, and all modules are faulty and become unavailable.
- Data is migrated to a new cluster.

Users can create a recovery task on MRS Manager to recover metadata. Only manual recovery tasks are supported.

NOTICE

- Data recovery can be performed only when the system version is consistent with that used for data backup.

- To recover data when the service is running properly, you are advised to manually back up the latest management data before recovering data. Otherwise, the metadata that is generated after the data backup and before the data recovery will be lost.

- Use the OMS data and LdapServer data that are backed up at the same time point to recover the data. Otherwise, services and operations may fail.

- The MRS cluster uses DBService to save Hive metadata by default.

Impact on the System

- After the data is recovered, the data generated between the backup time and the restoration time is lost.
- After the data is recovered, the configuration of the components that depend on DBService may expire, and these components need to be restarted.

Prerequisites

- No cross-cluster trust relationship is configured between the two MRS clusters.
- Cross-cluster replication has been configured for the active and standby clusters. For details, see Enabling Cross-Cluster Replication.


- The data in the OMS and LdapServer backup files was backed up at the same time.

- The status of the OMS resources and the LdapServer instances is normal. If the status is abnormal, data recovery cannot be performed.

- The status of the cluster hosts and services is normal. If the status is abnormal, data recovery cannot be performed.

- The cluster host topologies during data recovery and data backup are the same. If the topologies are different, data recovery cannot be performed and you need to back up data again.

- The services added to the cluster during data recovery and data backup are the same. If the services are different, data recovery cannot be performed and you need to back up data again.

- The status of the active and standby DBService instances is normal. If the status is abnormal, data recovery cannot be performed.

- The upper-layer applications that depend on the MRS cluster are stopped.

- On MRS Manager, all the NameNode role instances whose data is to be recovered are stopped, while other HDFS role instances keep running. After the data is recovered, the NameNode role instances need to be restarted and cannot be accessed before they are restarted.

- You have checked that the NameNode backup files are saved in the Data save path/LocalBackup/ directory on the active management node.

Procedure

Step 1 Check the location of backup data.

1. On MRS Manager, choose System > Back Up Data.

2. In the Operation column of a specified task in the task list, click to view historical backup task execution records. In the window that is displayed, locate a specified success record and click View in the Backup Path column to view the backup path information of the task and find the following information:
– Backup Object: indicates the data source of the backup data.
– Backup Path: indicates the full path where the backup files are saved.

3. Select the correct item, and manually copy the full path of the backup files in Backup Path.

Step 2 Create a recovery task.

1. On MRS Manager, choose System > Restore Data.
2. Click Create Restoration Task.
3. Set Name to the name of the recovery task.

Step 3 Select recovery sources.

Select components whose metadata is to be recovered in Configuration.

Step 4 Set recovery parameters.

1. Set Path Type to a backup directory type.
2. The settings vary according to the backup directory type:

– LocalDir: indicates that backup files are stored on the local disk of the active management node. If you select this value, you need to set Source Path to the full path of the backup file, for example, Data path/LocalBackup/backup task name_task creation time/data source_task execution time/version_data source_task execution time.tar.gz.

– LocalHDFS: indicates that backup files are stored in the HDFS directory of the current cluster. If you select this value, you need to set the following parameters:
n Source Path: indicates the full path of the backup file in HDFS, for example, backup path/backup task name_task creation time/version_data source_task execution time.tar.gz.

n Source Instance Name: indicates the name of the NameService instance that corresponds to the backup directory when the recovery task is executed. The default value is hacluster.

– RemoteHDFS(DistCp): indicates that backup files are stored in the HDFS directory of the standby cluster. If you select this value, you need to set the following parameters:
n Source NameNode IP Address: indicates the NameNode service plane IP address of the standby cluster; either the active or the standby node is supported.
n Source Path: indicates the full path of the HDFS directory for storing standby cluster backup data, for example, backup path/backup task name_data source_task creation time/version_data source_task execution time.tar.gz.

n Queue Name: indicates the name of the Yarn queue used for backup task execution. The name must be the same as the name of a queue that is running properly in the cluster.

3. Click OK to save the settings.
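The backup archive naming convention described above can be illustrated with a small parser. This sketch is hypothetical and not part of MRS; the helper name, the sample path, and the field labels are assumptions based on the layout backup task name_task creation time/version_data source_task execution time.tar.gz given in this section.

```python
# Hypothetical sketch: split an MRS backup archive path into the fields
# named above. The sample path in the usage note is illustrative, not
# taken from a real cluster.
import os

def parse_backup_path(path):
    directory, archive = os.path.split(path)
    # Directory component: <backup task name>_<task creation time>
    task_name, _, creation_time = os.path.basename(directory).rpartition("_")
    # Archive component: <version>_<data source>_<task execution time>.tar.gz
    stem = archive[:-len(".tar.gz")] if archive.endswith(".tar.gz") else archive
    version, data_source, execution_time = stem.split("_", 2)
    return {"task_name": task_name, "creation_time": creation_time,
            "version": version, "data_source": data_source,
            "execution_time": execution_time}
```

For example, parse_backup_path("/srv/LocalBackup/default_20170220/V100R002C30_DBService_20170221.tar.gz") yields task name default, version V100R002C30, and data source DBService.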

Step 5 Execute the recovery task.

In the Operation column of the created task in the recovery task list, click the corresponding icon to execute the recovery task.

l After the recovery succeeds, the progress bar is green.
l After the recovery succeeds, the recovery task cannot be executed again.

l If the recovery task fails during the first execution, rectify the fault and click the corresponding icon to execute the task again.

Step 6 Determine what metadata is recovered.
l If OMS and LdapServer metadata is recovered, go to Step 7.
l If DBService data is recovered, the task is completed.
l If NameNode data is recovered, choose Service > HDFS > More > Restart Service on MRS Manager. The task is completed.

Step 7 Restart Manager to make the recovered data take effect.

1. On MRS Manager, choose LdapServer > More > Restart Service, click OK, and wait for the LdapServer service to be restarted.

2. Log in to the active management node. For details, see Viewing Active and Standby Nodes.

3. Run the following command to restart OMS:
sh ${BIGDATA_HOME}/om-0.0.1/sbin/restart-oms.sh
The command is executed successfully if the following information is displayed:
start HA successfully.


4. On MRS Manager, choose KrbServer > More > Synchronize Configuration, deselect Restart services or instances whose configurations have expired, click OK, and wait for the KrbServer service configuration to be synchronized and the service to be restarted.

5. On MRS Manager, choose Service > More > Synchronize Configuration, deselect Restart services or instances whose configurations have expired, click OK, and wait for the cluster configuration to be synchronized.

6. Choose Service > More > Stop Cluster. After the cluster is stopped, choose Service > More > Start Cluster, and wait for the cluster to be started.

----End

5.12.6 Recovering Service Data

Scenario

Data needs to be recovered on service components in the following scenarios:

l Data is modified or deleted unexpectedly and needs to be restored.

l After a critical operation (such as an upgrade or a critical data adjustment) is performed on service components, an exception occurs or the operation has not achieved the expected result, and all modules are faulty and become unavailable.

l Data is migrated to a new cluster.

Users can create a recovery task on MRS Manager to recover service data. Only manual recovery tasks are supported.

NOTICE

l Data recovery can be performed only when the system version is consistent with that of the data backup.

l To recover data when the service is running properly, you are advised to manually back up the latest management data before recovering data. Otherwise, the service data generated after the data backup and before the data recovery will be lost.

l The Hive backup and recovery functions cannot identify the service and structure relationships of objects such as Hive tables, indexes, and views. When executing backup and recovery tasks, users need to manage a unified recovery point based on the service scenario to ensure proper service running.

Impact on the System

l During data recovery, user authentication stops and users cannot create new connections.

l After the data is recovered, the data generated between the backup time and the restoration time is lost.

l After the data is recovered, upper-layer applications of HBase, HDFS, or Hive need to be restarted.


Prerequisites

l No cross-cluster trust relationship is configured between the two MRS clusters.
l Cross-cluster replication has been configured for the active and standby clusters. For details, see Enabling Cross-Cluster Replication.
l The directory for saving HDFS or Hive backup files has been checked.
l Upper-layer applications of HDFS or Hive are stopped.

Procedure

Step 1 Check the location of backup data.

1. On MRS Manager, choose System > Back Up Data.

2. In the Operation column of a specified task in the task list, click the corresponding icon to view historical backup task execution records. In the displayed window, locate a specified success record and click View in the Backup Path column to view the backup path information of the task and find the following information:
– Backup Object: indicates the data source of the backup data.
– Backup Path: indicates the full path where the backup files are saved.

3. Select the correct item, and manually copy the full path of backup files in Backup Path.

Step 2 Create a recovery task.

1. On MRS Manager, choose System > Restore Data.
2. Click Create Restoration Task.
3. Set Name to the name of the recovery task.

Step 3 Select recovery sources.

Select service components whose data is to be recovered, such as HDFS, in Configuration.

Step 4 Set recovery parameters.

1. Set Path Type to RemoteHDFS(DistCp).
2. Set Source NameNode IP Address to the service plane IP address of the active NameNode in the standby cluster.
3. Set Source Path to the HDFS directory for storing standby cluster backup data.
4. Set Queue Name to the name of the Yarn queue used for backup task execution.
5. Click Refresh and set Recovery Point List to an HDFS directory that has been backed up in the standby cluster.
6. Set Instance Name to the name of the specified NameService instance. The default value is hacluster.
7. Select one or multiple backup data items to be recovered in Backup Data based on service requirements.
– To recover HBase service data, select one or multiple backup data items to be recovered in Backup Data based on service requirements, and specify the namespace for backup data recovery in the Target Namespace column. You are advised to set Target Namespace to a namespace different from the backup namespace.


– To recover HDFS service data, specify the location where the backup data is to be recovered in the Target Path column. You are advised to set Target Path to a new path different from the backup path.

– To recover Hive service data, specify the database and the file save path for backup data recovery in the Target Database and Target Path columns.
Configuration restrictions:
n Data can be recovered to the original database, but data tables are saved in a new path different from the original path.
n To recover Hive index tables, you need to select the Hive data tables that correspond to the Hive index tables to be recovered.
n If you select a new recovery directory to prevent current data from being affected, you need to manually grant HDFS permission on the new directory to users who need to access it. This enables users who have permission on the backup tables to access the directory.
n Data can be recovered to another database. In that case, you need to manually grant HDFS permission on the directory in HDFS that corresponds to the database to users who need to access it. This enables users who have permission on the backup tables to access the directory.

8. Set Force recovery to YES, which forcibly recovers all backup data when a file with the same name already exists. If the directory contains new data added after the backup, that new data will be lost after the recovery. If you set the parameter to NO, the recovery task is not executed when a file with the same name exists.

9. Click Verify to check whether the recovery task is configured correctly.
– If the queue name is incorrect, the verification fails.
– If the specified recovery directory does not exist, the verification fails.
– If the forcible replacement conditions are not met, the verification fails.

10. Click OK to save the settings.
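The Force recovery behavior described in step 8 can be sketched as follows. This helper is illustrative only, not an MRS API: with force set, an existing file at the target is overwritten (so data added after the backup is lost); without it, a name conflict aborts the operation.

```python
# Illustrative sketch of the "Force recovery" semantics, not an MRS tool.
import os
import shutil

def restore_file(backup_src, target_dst, force):
    if os.path.exists(target_dst) and not force:
        # Force recovery = NO: a name conflict stops the recovery.
        raise FileExistsError(f"{target_dst} already exists")
    # Force recovery = YES: the backup copy replaces the current file,
    # discarding any data written after the backup was taken.
    shutil.copy2(backup_src, target_dst)
```

This mirrors why the guide warns that new data added after the backup is lost under YES: the overwrite is unconditional once forced.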

Step 5 Execute the recovery task.

In the Operation column of the created task in the recovery task list, click the corresponding icon to execute the recovery task.

l After the recovery succeeds, the progress bar is green.
l After the recovery succeeds, the recovery task cannot be executed again.

l If the recovery task fails during the first execution, rectify the fault and click the corresponding icon to execute the task again.

----End

5.12.7 Managing Local Quick Recovery Tasks

Scenario

When DistCp is used to back up data, the backup snapshot information is saved to HDFS of the active cluster. On MRS Manager, local snapshots can be used for quick data recovery, which requires less time than recovering data from the standby cluster.


Use MRS Manager and the snapshots in HDFS of the active cluster to create a local quick recovery task and execute the task.
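HDFS exposes snapshots of a snapshottable directory under a read-only .snapshot subdirectory, which is what a local quick recovery draws on. As a rough illustration, snapshot names can be pulled out of hdfs dfs -ls listing text like this; the helper and the sample listing in the test are assumptions, not MRS tooling.

```python
# Hypothetical sketch: extract snapshot names from the text output of
# "hdfs dfs -ls <backup dir>/.snapshot". Listing lines start with a
# permission string ("drwxr-xr-x" or "-rw-r--r--"); the path is the
# last whitespace-separated field.
def snapshot_names(ls_output):
    names = []
    for line in ls_output.splitlines():
        parts = line.split()
        if parts and parts[0][0] in "d-":
            names.append(parts[-1].rsplit("/", 1)[-1])
    return names
```

Header lines such as "Found 2 items" are skipped because they do not begin with a permission string.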

Procedure

Step 1 On MRS Manager, choose System > Back Up Data.

Step 2 In the backup task list, locate a created task and click the corresponding icon in the Operation column.

Step 3 Check whether the system displays the message "No data is available for restoration. Create a task on the restoration management page to restore data."
l If yes, click Close to close the dialog box. No backup data snapshot has been created in the active cluster, and no further action is required.
l If no, go to Step 4 to create a local quick recovery task.

Step 4 Set Name to the name of the local quick recovery task.

Step 5 Set Configuration to a data source.

Step 6 Set Recovery point list to a recovery point that contains the backup data.

Step 7 Set Queue Name to the name of the Yarn queue used for task execution. The name must be the same as the name of a queue that is running properly in the cluster.

Step 8 Set Backup Data to the object to be recovered.

Step 9 Click Verify. After Verify succeeded is displayed, click OK.

Step 10 Click OK.

Step 11 Choose System > Restore Data.

Step 12 In the Operation column of the created task in the recovery task list, click the corresponding icon to execute the recovery task.

----End

5.12.8 Modifying a Backup Task

Scenario

Modify the parameters of a created backup task on MRS Manager to meet changing service requirements. The parameters of recovery tasks can only be viewed but cannot be modified.

Impact on the System

After a backup task is modified, the new parameters take effect the next time the task is executed.

Prerequisites

l A backup task has been created.
l A new backup task policy has been planned based on the actual situation.


Procedure

Step 1 On MRS Manager, choose System > Back Up Data.

Step 2 In the task list, locate a specified task, and click the corresponding icon in the Operation column to go to the configuration modification page.

Step 3 On the page that is displayed, modify the following parameters:
l Start Time
l Period
l Target NameNode IP Address
l Target Path
l Max. Number of Backup Copies
l Queue Name
l Max. Number of Restoration Points

NOTE

l Target NameNode IP Address and Queue Name are valid only in HBase, HDFS, Hive, and NameNode backup tasks. Max. Number of Restoration Points is valid only in HBase, HDFS, and Hive backup tasks.

l After the Target Path parameter of a backup task is modified, the task is performed as a full backup the first time it runs after the change by default.

Step 4 Click OK to save the settings.

----End

5.12.9 Viewing Backup and Recovery Tasks

Scenario

On MRS Manager, view created backup and recovery tasks and check their running status.

Procedure

Step 1 On MRS Manager, click System.

Step 2 Click Back Up Data or Restore Data.

Step 3 In the task list, check the previous task execution result in the Task Progress column. Green indicates that the task was executed successfully, and red indicates that the execution failed.

Step 4 In the Operation column of a specified task in the task list, click the corresponding icon to view the task execution records.

In the displayed window, click View in the Details column of a specified record to display log information about the execution.

----End

Related Tasks

l Modifying a backup task

See Modifying a Backup Task.


l Viewing a recovery task

In the task list, locate a specified task and click the corresponding icon in the Operation column to view a recovery task. The parameters of recovery tasks can only be viewed but cannot be modified.

l Executing a backup or recovery task

In the task list, locate a specified task and click the corresponding icon in the Operation column to start a backup or recovery task that is ready or failed to be executed. Executed recovery tasks cannot be executed again.

l Stopping a backup or recovery task

In the task list, locate a specified task and click the corresponding icon in the Operation column to stop a backup or recovery task that is running.

l Deleting a backup or recovery task

In the task list, locate a specified task and click the corresponding icon in the Operation column to delete a backup or recovery task. Backup data is retained by default after a task is deleted.

l Suspending a backup task

In the task list, locate a specified task and click the corresponding icon in the Operation column to suspend a backup task. Only periodic backup tasks can be suspended. Suspended backup tasks are no longer executed automatically. If you suspend a backup task that is being executed, the task execution stops. To cancel the suspension of a task, click the corresponding icon again.

5.13 Security Management

5.13.1 List of Default Users

User Classification

The MRS cluster provides the following three types of users. Users are required to change their passwords periodically. Using the default passwords is not recommended.

l System user: A user used to run OMS system processes.
l Internal system user: An internal user provided by the MRS cluster, used to implement communication between processes, save user group information, and associate user rights.
l Database user:
– A user used to manage the OMS database and access data.
– A user used to run the databases of service components (Hive and DBService).


System Users

NOTE

l User ldap of the OS is required in the MRS cluster. The account cannot be deleted because this operation may interrupt cluster running. Password management policies are maintained by the users.
l Reset the password when you change the passwords of user ommdba and user omm for the first time. Change the passwords regularly after you have retrieved them.

l MRS cluster system user: admin. Initial password: Admin12!. The default user of MRS Manager, used to record the cluster audit logs.
l MRS cluster node OS user: ommdba. Initial password: randomly generated by the system. User who creates the MRS cluster system database. This user is an OS user generated on the management nodes and does not require a unified password.
l MRS cluster node OS user: omm. Initial password: randomly generated by the system. Internal running user of the MRS cluster system. This user is an OS user generated on all nodes and does not require a unified password.
l User for running MRS cluster jobs: yarn_user. Initial password: randomly generated by the system. Internal user used to run MRS cluster jobs. This user is generated on Core nodes.

Internal System Users

l Kerberos administrator: kadmin/admin. Initial password: Admin@123. Account used to add, delete, modify, and query users on Kerberos.
l OMS Kerberos administrator: kadmin/admin. Initial password: Admin@123. Account used to add, delete, modify, and query users on OMS Kerberos.
l LDAP administrator: cn=root,dc=hadoop,dc=com. Initial password: LdapChangeMe@123. Account used to add, delete, modify, and query user information on LDAP.
l OMS LDAP administrator: cn=root,dc=hadoop,dc=com. Initial password: LdapChangeMe@123. Account used to add, delete, modify, and query user information on OMS LDAP.
l Component running user: oms/manager. Initial password: randomly generated by the system. User used for communication between Master nodes and Core nodes.
l Component running users: check_ker_M, K/M, kadmin/changepw, kadmin/history, and krbtgt/HADOOP.COM. Initial passwords: randomly generated by the system. Kerberos internal functional users. These users cannot be deleted, and their passwords cannot be changed. These internal accounts cannot be used on nodes where the Kerberos service is not installed.

User Group Information

l supergroup: Primary group of user admin. The primary group does not have additional permissions in a cluster where Kerberos authentication is disabled.
l check_sec_ldap: Used to test whether the active LDAP works properly. This user group is generated randomly in a test and automatically deleted after the test is complete. Internal system user group, used only between components.
l Manager_tenant_187: Tenant system user group. Internal system user group, used only between components and in clusters where Kerberos authentication is enabled.
l System_administrator_186: MRS cluster system administrator group. Internal system user group, used only between components and in clusters where Kerberos authentication is enabled.
l Manager_viewer_183: MRS Manager system viewer group. Internal system user group, used only between components and in clusters where Kerberos authentication is enabled.


l Manager_operator_182: MRS Manager system operator group. Internal system user group, used only between components and in clusters where Kerberos authentication is enabled.
l Manager_auditor_181: MRS Manager system auditor group. Internal system user group, used only between components and in clusters where Kerberos authentication is enabled.
l Manager_administrator_180: MRS Manager system administrator group. Internal system user group, used only between components and in clusters where Kerberos authentication is enabled.
l compcommon: MRS cluster internal group for accessing public cluster resources. All system users and system running users are added to this user group by default.
l default_1000: This group is created for tenants. Internal system user group, used only between components.

OS user groups:

l wheel: Primary group of the MRS internal running user omm.
l ficommon: MRS cluster common group that corresponds to compcommon, for accessing public cluster resource files stored in the OS.

Database Users

MRS cluster system database users include OMS database users and DBService database users.

l OMS database user ommdba. Initial password: dbChangeMe@123456. OMS database administrator who performs maintenance operations, such as creating, starting, and stopping applications.
l OMS database user omm. Initial password: ChangeMe@123456. User for accessing OMS database data.


l DBService database user omm. Initial password: dbserverAdmin@123. Administrator of the GaussDB database in the DBService component.
l DBService database user hive. Initial password: HiveUser@. User for Hive to connect to the DBService database.

5.13.2 Changing the Password for User admin

Scenario

Periodically change the password for user admin to improve the system O&M security.

Prerequisites

The client has been updated on the active management node.

Procedure

Step 1 Log in to the active management node.

Step 2 Run the following command to go to the client directory:

cd /opt/client

Step 3 Run the following command to configure the environment variable:

source bigdata_env

Step 4 Run the following command to change the password for user admin. This operation takes effect in the entire cluster.

kpasswd admin

Enter the old password and then enter a new password twice. The password complexity requirements are as follows by default:

l The password must contain 8 to 32 characters.

l The password must contain at least four of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=

l The password cannot be the same as the username or reverse username.

l The password cannot be the same as the previous password.
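The default complexity rules above can be pre-checked before running kpasswd. This checker is a hypothetical convenience, not an MRS tool; the special-character set is copied from the list above, and the rule that the new password differ from the previous one requires the password history, so it is not checked here.

```python
# Hypothetical pre-check of the default MRS password complexity rules
# listed above; not an official MRS utility.
SPECIALS = set("~`!?,.:;-_'(){}[]/<>@#$%^&*+|\\=")

def meets_mrs_policy(password, username):
    if not 8 <= len(password) <= 32:
        return False  # must contain 8 to 32 characters
    classes = sum([
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c == " " or c in SPECIALS for c in password),
    ])
    if classes < 4:
        return False  # must mix at least four character types
    # must not equal the username or the reversed username
    return password not in (username, username[::-1])
```

For example, meets_mrs_policy("Admin12!", "admin") passes all four character-type checks, while a password with only letters and digits does not.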

----End


5.13.3 Changing the Password for the Kerberos Administrator

Scenario

Periodically change the password for the Kerberos administrator kadmin of the MRS cluster to improve system O&M security.

If the user password is changed, the OMS Kerberos administrator password is changed as well.

Prerequisites

A client has been prepared on the Master1 node.

Procedure

Step 1 Log in to the Master1 node.

Step 2 Run the following command to go to the client directory /opt/client.

cd /opt/client

Step 3 Run the following command to configure environment variables:

source bigdata_env

Step 4 Run the following command to change the password for kadmin/admin. The password change takes effect on all servers.

kpasswd kadmin/admin

The password complexity requirements are as follows by default:
l The password must contain 8 to 32 characters.
l The password must contain at least four of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
l The password cannot be the same as the username or reverse username.
l The password cannot be the same as the previous password.

----End

5.13.4 Changing the Password for the OMS Kerberos Administrator

Scenario

Periodically change the password for the OMS Kerberos administrator kadmin of the MRS cluster to improve system O&M security.

If the user password is changed, the Kerberos administrator password is changed as well.

Prerequisites

A client has been prepared on the Master1 node.

MapReduce ServiceUser Guide 5 MRS Manager Operation Guide

Issue 01 (2017-02-20) 245

Page 253: User Guide - Deutsche Telekom · 5.12.3 Backing Up Metadata.....227 5.12.4 Backing Up Service Data ... User Guide Contents Issue 01 (2017-02-20) vii. 1 Overview 1.1 Introduction MapReduce

Procedure

Step 1 Log in to the Master1 node.

Step 2 Run the following command to go to the related directory:

cd ${BIGDATA_HOME}/om-0.0.1/meta-0.0.1-SNAPSHOT/kerberos/scripts

Step 3 Run the following command to configure environment variables:

source component_env

Step 4 Run the following command to change the password for kadmin/admin. The password change takes effect on all servers.

kpasswd kadmin/admin

The password complexity requirements are as follows by default:

l The password must contain 8 to 32 characters.
l The password must contain at least four of the following character types: lowercase letters, uppercase letters, digits, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
l The password cannot be the same as the username or reverse username.
l The password cannot be the same as the previous password.

----End

5.13.5 Changing the Password for the LDAP (including OMS LDAP) Administrator

Scenario

Periodically change the password for the LDAP administrator cn=root,dc=hadoop,dc=com of the MRS cluster to improve system O&M security.

If the LDAP administrator password is changed, the OMS LDAP administrator password is changed as well.

Impact on the System

All services need to be restarted for the new password to take effect. The services are unavailable during the restart.

Procedure

Step 1 Choose Service > LdapServer > More on MRS Manager.

Step 2 Click Change Password.

Step 3 In the Change Password dialog box, enter the old password in Old Password and the new password in New Password and Confirm Password.

The password complexity requirements are as follows:

l The password must contain 16 to 32 characters.


l The password must contain at least three of the following character types: lowercase letters, uppercase letters, digits, and special characters, which can only be `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
l The password cannot be the same as the username or reverse username.
l The password cannot be the same as the previous password.

Step 4 Select I have read the information and understood the impact, and click OK to confirm the password change and restart the service.

----End

5.13.6 Changing the Password for a Component Running User

Scenario

Periodically change the password for each component running user of the MRS cluster to improve system O&M security.

If the initial password is randomly generated by the system, reset the initial password.

Impact on the System

A component running user whose initial password is randomly generated by the system needs to change the initial password. After the password is changed, the MRS cluster needs to be restarted, during which services are temporarily interrupted.

Prerequisites

A client has been prepared on the Master1 node.

Procedure

Step 1 Log in to the Master1 node.

Step 2 Run the following command to go to the client directory, such as /opt/client.

cd /opt/client

Step 3 Run the following command to configure environment variables:

source bigdata_env

Step 4 Run the following command to log in to the console using kadmin/admin:

kadmin -p kadmin/admin

Step 5 Run the following command to change the password of an internal system user. The password change takes effect on all servers.

cpw component running user

For example: cpw oms/manager

The password complexity requirements are as follows by default:

l The password must contain 8 to 32 characters.


l The password must contain at least four of the following character types: lowercase letters, uppercase letters, digits, spaces, and special characters, which can only be ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=
l The password cannot be the same as the username or reverse username.
l The password cannot be the same as the previous password.

----End

5.13.7 Changing the Password for the OMS Database Administrator

Scenario

Periodically change the password for the OMS database administrator to ensure system O&M security.

Procedure

Step 1 Log in to the active management node.

NOTE

The password of user ommdba cannot be changed on the standby management node; otherwise, the cluster cannot work properly. Change the password of user ommdba on the active management node only.

Step 2 Run the following command to switch the user:

sudo su - root

su - omm

Step 3 Run the following command to go to the related directory:

cd $OMS_RUN_PATH/tools

Step 4 Run the following command to change the password for user ommdba:

mod_db_passwd ommdba

Step 5 Enter the old password of user ommdba and enter a new password twice. The password change takes effect on all servers.

The password complexity requirements are as follows:

l The password must contain 16 to 32 characters.
l The password must contain at least three of the following character types: lowercase letters, uppercase letters, digits, and special characters, which can only be ~`!@#$%^&*()-+_=\|[{}];:",<.>/?
l The password cannot be the same as the username or reverse username.
l The password cannot be the same as the last 20 historical passwords.

If the following information is displayed, the password is changed successfully.

Congratulations, update [ommdba] password successfully.

----End


5.13.8 Changing the Password for the Data Access User of the OMS Database

Scenario

Periodically change the password for the OMS data access user to ensure system O&M security.

Impact on the System

The OMS service needs to be restarted for the new password to take effect. The service is unavailable during the restart.

Procedure

Step 1 Click System on MRS Manager.

Step 2 In the Permission area, click Change OMS Database Password.

Step 3 Locate the row that contains user omm and click the password change icon in the Operation column to change the password for the OMS database user.

The password complexity requirements are as follows:

- The password must contain 8 to 32 characters.
- The password must contain at least three types of the following: lowercase letters, uppercase letters, digits, and special characters, which can only be ~`!@#$%^&*()-+_=\|[{}];:",<.>/?
- The password cannot be the same as the username or the username spelled in reverse.
- The password cannot be the same as any of the last 20 passwords.

Step 4 Click OK. After the system displays Operation succeeded, click Finish.

Step 5 Locate the row that contains user omm and click the restart icon in the Operation column to restart the OMS database.

Step 6 In the dialog box that is displayed, select I have read the information and understood the impact, click OK, and then restart the OMS service.

----End

5.13.9 Changing the Password for a Component Database User

Scenario

Periodically change the password for each component database user to improve system O&M security.

Impact on the System

The services need to be restarted for the new password to take effect. The services are unavailable during the restart.


Procedure

Step 1 Click Service on MRS Manager and click the name of the service whose database user password is to be changed.

Step 2 Determine the component database user whose password is to be changed.

- To change the password for the DBService database user, go to Step 3.
- To change the password for the Hive database user, stop the service first by clicking Stop Service, and then go to Step 3.

Step 3 Choose More > Change Password.

Step 4 In the displayed window, enter the old and new passwords as prompted.

The password complexity requirements are as follows:

- The password for a DBService database user must contain 16 to 32 characters; the password for a Hive database user must contain 8 to 32 characters.
- The password must contain at least three types of the following: lowercase letters, uppercase letters, digits, and special characters, which can only be ~`!@#$%^&*()-+_=\|[{}];:",<.>/?
- The password cannot be the same as the username or the username spelled in reverse.
- The password cannot be the same as any of the last 20 passwords.

Step 5 Select I have read the information and understood the impact and click OK. The system automatically restarts the service. After Operation succeeded is displayed, click Finish.

----End

5.13.10 Replacing the CA Certificate

Scenario

The CA certificate of the MRS cluster encrypts communication data between a component client and a server to ensure communication security. Replace the CA certificate on MRS Manager to ensure product security.

The certificate file and key file can be generated by the users.

Impact on the System

The MRS cluster must be restarted during the replacement and cannot be accessed or provide services.

Prerequisites

- You have obtained the files to be imported to the MRS cluster, including the CA certificate file (such as *.crt), the key file (*.key), and the file (password.property) that saves the key file password. The certificate name and key name can contain uppercase letters, lowercase letters, and digits.
- You have prepared a password, for example, Userpwd@123, for accessing the key file. The password must meet the following complexity requirements; otherwise, potential security risks may exist:


– The password must contain at least eight characters.
– The password must contain at least four types of the following: uppercase letters, lowercase letters, digits, and special characters ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=.

Procedure

Step 1 Log in to the active management node.

Step 2 Run the following command to switch the user:

sudo su - root

su - omm

Step 3 Run the following command to generate the certificate file and key file in the omm user directory on the management node:

Generate the key file:

openssl genrsa -out key_name.key -passout pass:password -aes256 2048

For example, run the following command to generate the key file ca.key: openssl genrsa -out ca.key -passout pass:Userpwd@123 -aes256 2048

Generate the certificate file:

openssl req -new -x509 -days 36135 -key key_name.key -out certificate_name.crt -passin pass:password -subj "/C=de/ST=ber/L=eur/O=dt/OU=dt/CN=dt" -sha256

For example, run the following command to generate the certificate file ca.crt: openssl req -new -x509 -days 36135 -key ca.key -out ca.crt -passin pass:Userpwd@123 -subj "/C=de/ST=ber/L=eur/O=dt/OU=dt/CN=dt" -sha256

Step 4 Run the following command in the omm user directory on the management node to save the password for accessing the key file:

sh ${BIGDATA_HOME}/om-0.0.1/sbin/genPwFile.sh

Enter the password twice as prompted, and press Enter. After being encrypted, the password is saved in password.property.

Please input key password:
Please Confirm password:

NOTE

The password.property file that is generated on a node is applicable only to the cluster to which the current node belongs.

Step 5 Compress the three files into a .tar package and save it to the local computer.

tar -cvf package_name.tar certificate_name.crt key_name.key password.property

For example: tar -cvf test.tar ca.crt ca.key password.property
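The key generation, certificate generation, and packaging steps above can be combined into one script. This is a minimal sketch under stated assumptions, not part of the MRS tooling: the file names (ca.key, ca.crt, test.tar) and the password Userpwd@123 are the example values from the text, and password.property is created here as an empty placeholder, whereas on a real cluster node it must be produced by genPwFile.sh in Step 4.

```shell
# Hedged sketch of Steps 3 and 5: generate a self-signed CA key and
# certificate, then bundle them for import on MRS Manager.
set -e
workdir=$(mktemp -d)
cd "$workdir"
PASSWORD='Userpwd@123'   # example value only; use your own strong password

# Step 3: AES-256-encrypted 2048-bit RSA key
openssl genrsa -out ca.key -passout pass:"$PASSWORD" -aes256 2048

# Step 3: self-signed certificate derived from the key
openssl req -new -x509 -days 36135 -key ca.key -out ca.crt \
  -passin pass:"$PASSWORD" \
  -subj "/C=de/ST=ber/L=eur/O=dt/OU=dt/CN=dt" -sha256

# Placeholder only: on a real node, generate this file with genPwFile.sh (Step 4)
touch password.property

# Step 5: package the three files for upload
tar -cvf test.tar ca.crt ca.key password.property
tar -tf test.tar
```

The final `tar -tf` lists the package contents, which is a quick sanity check before uploading the package in Step 8.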

Step 6 Log in to the MRS Manager system and click System.

Step 7 In the Certificate area, click Manage Certificate to go to Manage Certificate.

Step 8 In the Certificate Package area, click the file selection button. In the window for selecting files, select the obtained .tar certificate package and open it. The system automatically imports the certificate.


Step 9 After the certificate is imported, the system displays a message indicating that the MRS cluster must be restarted for the certificate to take effect. Click OK, and the OMSServer is restarted.

Step 10 After the restart is complete, access the MRS Manager to verify the replacement.

Step 11 On the MRS Manager, choose Service > More > Stop Cluster.

On the displayed page, click OK to stop the cluster.

Step 12 Choose Service > More > Start Cluster to restart the cluster.

----End

5.13.11 Replacing HA Certificates

Scenario

HA certificates are used to encrypt the communication between active/standby processes and high availability processes to ensure security. Replace the HA certificates on the active and standby management nodes on MRS Manager to ensure product security.

The certificate file and key file can be generated by the users.

Impact on the System

The MRS Manager system must be restarted during the replacement and cannot be accessed or provide services.

Prerequisites

- You have obtained the root-ca.crt root certificate file and the root-ca.pem key file of the certificate to be replaced.
- You have prepared a password, for example, Userpwd@123, for accessing the key file. The password must meet the following complexity requirements; otherwise, security risks may be incurred:
  – The password must contain at least eight characters.
  – The password must contain at least four types of the following: uppercase letters, lowercase letters, digits, and special characters ~`!?,.:;-_'(){}[]/<>@#$%^&*+|\=.

Procedure

Step 1 Log in to the active management node.

Step 2 Run the following command to switch the user:

sudo su - root

su - omm

Step 3 Run the following command to generate root-ca.crt and root-ca.pem in the ${OMS_RUN_PATH}/workspace0/ha/local/cert directory:

sh ${OMS_RUN_PATH}/workspace/ha/module/hacom/script/gen-cert.sh --root-ca --country=country --state=state --city=city --company=company --organize=organize --common-name=common_name --email=administrator_email_address --password=password


For example, run the following command to generate the files: sh ${OMS_RUN_PATH}/workspace/ha/module/hacom/script/gen-cert.sh --root-ca --country=DE --state=eur --city=ber --company=dt --organize=IT --common-name=HADOOP.COM [email protected] --password=Userpwd@123

The command is run successfully if the following information is displayed:

Generate root-ca pair success.

Step 4 On the active management node, run the following command as user omm to copy root-ca.crt and root-ca.pem to the ${BIGDATA_HOME}/om-0.0.1/security/certHA directory:

cp -arp ${OMS_RUN_PATH}/workspace0/ha/local/cert/root-ca.* ${BIGDATA_HOME}/om-0.0.1/security/certHA

Step 5 Copy root-ca.crt and root-ca.pem generated on the active management node to ${BIGDATA_HOME}/om-0.0.1/security/certHA on the standby management node as user omm.

Step 6 Run the following command to generate an HA certificate and perform automatic replacement:

sh ${BIGDATA_HOME}/om-0.0.1/sbin/replacehaSSLCert.sh

Enter password as prompted and press Enter.

Please input ha ssl cert password:

The HA certificate is replaced successfully if the following information is displayed:

[INFO] Succeed to replace ha ssl cert.

Step 7 Run the following command to restart the OMS.

sh ${BIGDATA_HOME}/om-0.0.1/sbin/restart-oms.sh

The following information is displayed:

start HA successfully.

Step 8 Log in to the standby management node and switch to user omm. Repeat Step 6 to Step 7.

Run the sh ${BIGDATA_HOME}/om-0.0.1/sbin/status-oms.sh command to check whether HAAllResOK of the management node is Normal. Access MRS Manager again. If MRS Manager can be accessed, the operation is successful.

----End

5.13.12 Updating a Key for a Cluster

Scenario

When a cluster is created, the system automatically generates an encryption key to store the security information in the cluster (such as all database user passwords and key file access passwords) in encrypted form. After a cluster is successfully installed, you are advised to regularly update the encryption key by following this procedure.

Impact on the System

- After a cluster key is updated, a new key is generated randomly in the cluster. This key is used to encrypt and decrypt newly stored data. The old key is not deleted; it is used to decrypt old encrypted data. After security information is modified, for example, a database user password is changed, the new password is encrypted using the new key.
- When a key is updated for a cluster, the cluster must be stopped and cannot be accessed.

Prerequisites

You have stopped the upper-layer service applications that depend on the cluster.

Procedure

Step 1 On MRS Manager, choose Service > More > Stop Cluster.

Select I have read the information and understand the impact in the displayed window, and click OK. After Operation succeeded is displayed, click Finish. The cluster is stopped.

Step 2 Log in to the active management node.

Step 3 Run the following command to switch the user:

sudo su - root

su - omm

Step 4 Run the following command to prevent you from being forcibly logged out when a timeout occurs:

TMOUT=0

Step 5 Run the following command to switch the directory:

cd ${BIGDATA_HOME}/om-0.0.1/tools

Step 6 Run the following command to update the cluster key:

sh updateRootkey.sh

Enter y as prompted.

The root key update is a critical operation.Do you want to continue?(y/n):

If the following information is displayed, the key is updated successfully.

Step 4-1: The key save path is obtained successfully.
...
Step 4-4: The root key is sent successfully.

Step 7 On MRS Manager, choose Service > More > Start Cluster.

In the confirmation dialog box, click Yes to start the cluster. After Operation succeeded is displayed, click Finish. The cluster is started.

----End


6 FAQs

6.1 What Is MRS?

MapReduce Service (MRS for short), one of the basic services on the public cloud, is used for managing and analyzing massive data.

MRS builds a reliable, secure, and easy-to-use operation and maintenance (O&M) platform. The platform provides analysis and computing capabilities for massive data and can address enterprises' demands on data storage and processing. Users can independently apply for and use the hosted Hadoop, Spark, HBase, and Hive components to quickly create clusters on a host, which provides batch data analysis and computing capabilities for massive data that does not have demanding requirements on real-time processing.

6.2 What Are the Highlights of MRS?

The highlights of MRS are as follows:

- Easy to use: MRS provides not only the capabilities supported by Hadoop, Spark, Spark SQL, HBase, and Hive, but also unified SQL interaction interfaces throughout the entire process, which simplifies big data application development.
- Low cost: MRS is free of O&M and separates computing from storage. The computing cluster can be created as required and released after a job is complete.
- Stability: MRS lets you spend less time commissioning and monitoring clusters. Service usability reaches 99.9% and data reliability reaches 99.9999%.
- High openness: MRS is based on open source, is compatible with other services, and provides REST APIs and JDBC interfaces.


6.3 What Is MRS Used For?

Based on the Hadoop open-source software, the Spark in-memory computing engine, the HBase distributed storage database, and the Hive data warehouse framework, MRS provides a unified platform for storing, querying, and analyzing enterprise-level big data to help enterprises quickly establish a massive data processing system. This platform features:

- Analyzing and computing massive data

- Storing massive data

6.4 How Do I Use MRS?

MRS is a basic service on the public cloud and is easy to use. By using computers connected in a cluster, you can run various tasks, and process or store PB-level data. A typical procedure for using MRS is as follows:

1. Prepare data.

Upload the local programs and data files to Object Storage Service (OBS).

2. Create a cluster.

Create clusters before you use MRS. The number of clusters you can create is restricted by the number of Elastic Cloud Servers (ECSs). Configure basic cluster information to complete cluster creation. You can submit a job when you are creating a cluster.

NOTE

Only one new job can be created when you are creating a cluster. If you want to create more than one job, perform Step 4.

3. Import data.

After an MRS cluster is successfully created, use the import function of the cluster to import OBS data to HDFS of the cluster. An MRS cluster can process both OBS data and HDFS data.

4. Create jobs.

Data can be analyzed and processed after being uploaded to OBS. MRS provides a platform for executing programs developed by users. You can submit, execute, and monitor such programs by using MRS. After a job is created, the job is in the Running state by default.

5. View the execution result.

Job running takes a while. After job running is complete, go to the Job Management page, and refresh the job list to view the execution results on the Job tab page.

You cannot re-execute a job that has succeeded or failed, but you can add or copy the job. After setting job parameters, you can submit the job again.

6. Terminate a cluster.

If you want to terminate a cluster after job execution is complete, click Terminate in Cluster. The cluster status changes from Running to Terminating. After the cluster is terminated, the cluster status changes to Terminated and the cluster is displayed in Historical Cluster.


6.5 How Do I Ensure Data and Service Running Security?

MRS is a platform for massive data management and analysis and features high security. It ensures user data and service running security from the following aspects:

- Network isolation
  The public cloud divides the entire network into two planes: the service plane and the management plane. The two planes are physically isolated to ensure the security of the service and management networks.
  – Service plane: the network plane where cluster components are running. It provides service channels for users and delivers data access, task submission, and computing functions.
  – Management plane: the public cloud console, which is used to apply for and manage MRS.
- Host security
  Users can deploy third-party antivirus software based on their service requirements. For the operating system (OS) and interfaces, MRS provides the following security protection measures:
  – Hardening OS kernel security
  – Installing the latest OS patches
  – Controlling OS rights
  – Managing OS interfaces
  – Protecting OS protocols and interfaces from attacks

- Data security: MRS stores data on the OBS platform, ensuring data security.

- Data integrity: After processing data, MRS encrypts the data and transmits it to the OBS system through SSL, ensuring data integrity.

6.6 How Do I Prepare a Data Source for MRS?

MRS can process data in both OBS and HDFS. Before using MRS to analyze data, you are required to prepare the data.

1. Upload local data to OBS.

a. Log in to the OBS management console.
b. Create a userdata bucket, and then create the program, input, output, and log folders in the userdata bucket.
   i. Click Create Bucket to create a userdata bucket.
   ii. In the userdata bucket, click Create Folder to create the program, input, output, and log folders.
c. Upload local data to the userdata bucket.
   i. Go to the program folder, and click the file selection button to select a user program.


ii. Click Upload.
iii. Repeat the preceding steps to upload the data files to the input folder.

2. Import OBS data to HDFS.

a. Log in to the MRS management console.
b. Go to the File Management page and select HDFS File List.
c. Click the data storage directory, for example, bd-app1.
   bd-app1 is just an example; the storage directory can be any directory on the page. You can create a directory by clicking Create Folder.
d. Click Import Data, and click Browse to configure the paths of HDFS and OBS, as shown in Figure 6-1.

Figure 6-1 Importing files

e. Click OK. You can view the file upload progress in File Operation Record.

6.7 What Is the Difference Between Data in OBS and That in HDFS?

The data source to be processed by MRS is from OBS or HDFS. OBS provides you with massive, highly reliable, and secure data storage capabilities at a low cost. MRS can process the data in OBS. You can view, manage, and use data by using OBS Console or an OBS client. In addition, you can use the REST APIs to manage or access data, either alone or integrated with service programs.

- OBS data storage: Data storage and computing are performed separately. OBS data storage features low cost and unlimited storage capacity, and clusters can be terminated at any time. The computing performance is determined by OBS access performance and is lower than that of HDFS. OBS is recommended when data computing is not frequent.
- HDFS data storage: Data storage and computing are performed together. HDFS data storage features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. HDFS is recommended when data computing is frequent.


6.8 How Do I View All Clusters?

On the Cluster page, you can view clusters in various states. If there are many clusters, you can turn pages to view clusters in any state.

- Active Cluster: contains all clusters except the clusters in the Terminated state.

- Historical Cluster: contains only the clusters in the Terminated state. Only clusters terminated within the last six months are displayed. If you want to view clusters terminated more than six months ago, contact technical support engineers.

- Failed Task: contains only the tasks in the Failed state. You can also delete failed tasks on this page. Task failures include:

– Cluster creation failure

– Cluster termination failure

– Cluster capacity expansion failure

6.9 How Do I View Log Information?

On the Operation Log page, you can view log information about users' operations on clusters and jobs. Currently, MRS has two types of logs:

- Cluster: creating, terminating, shrinking, and expanding a cluster

- Job: creating, stopping, and deleting a job

Figure 6-2 shows log information about users' operations.

Figure 6-2 Log information

6.10 What Types of Jobs Are Supported by MRS?

A job functions as a program execution platform provided by MRS. Currently, MRS supports MapReduce jobs, Spark jobs, and Hive jobs. Table 6-1 describes job characteristics.


Table 6-1 Job types

- MapReduce: MapReduce is a programming model that simplifies parallel computing and is used for parallel computing of big data sets (over 1 TB). Map divides one task into multiple tasks, and Reduce summarizes the processing results of these tasks and produces the final analysis result. After you complete code development, pack the code into a JAR file in IDEA or Eclipse, upload the file to the MRS cluster for execution, and obtain the execution result.

- Spark: Spark is a batch data processing engine with a high processing speed. Spark has demanding memory requirements because it performs in-memory computing. A Spark job includes:
  – Spark Jar: ends with .jar, which is case-insensitive.
  – Spark Script: ends with .sql, which is case-insensitive.
  – Spark SQL: specifies standard Spark SQL statements, for example, show tables;.
  A program developed in Python must end with .py, which is case-insensitive.

- Hive: Hive is a data warehouse framework built on Hadoop. Hive provides the Hive query language (HiveQL), similar to the structured query language (SQL), to process structured data. Hive automatically converts HiveQL in a Hive Script into a MapReduce task to query and analyze massive data stored in the Hadoop cluster. An example of a standard HiveQL statement is as follows: create table page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User');
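The Map/Reduce split described in Table 6-1 can be illustrated locally with standard Unix tools. This word-count analogy is illustrative only and is not an MRS job; the file names input.txt and output.txt are arbitrary, and each pipeline stage stands in for one phase of the model.

```shell
# Local analogy of the MapReduce model: word count over a small sample,
# with Unix tools standing in for the map, shuffle, and reduce phases.
printf 'spark hive spark\nhbase hive spark\n' > input.txt
tr -s ' ' '\n' < input.txt |  # map: emit one word per line
  sort |                      # shuffle: group identical keys together
  uniq -c |                   # reduce: count the records in each group
  sort -rn > output.txt       # order the final result by count
cat output.txt
```

The output counts spark three times, hive twice, and hbase once; a real MapReduce job runs the same three phases, but distributed across the cluster nodes and packaged as a JAR as described above.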

6.11 How Do I Submit Developed Programs to MRS?

MRS provides a platform for executing programs developed by users. You can submit, execute, and monitor such programs by using MRS. To submit developed programs to MRS, set Program Path to the actual path for storing such programs, as shown in Figure 6-3.


Figure 6-3 Creating a job

6.12 How Do I View Cluster Configurations?

- After a cluster is created, you can choose Basic Information > Cluster Configuration to view basic configuration information about the cluster. The instance specifications and capacities of nodes determine the data analysis and processing capability of a cluster. More advanced instance specifications and larger capacity allow faster cluster running and better data processing, and accordingly require higher cluster costs.
- Choose Basic Information > Cluster Configuration > Cluster Manager. On the MRS cluster management page that is displayed, you can view and process alarm information, modify cluster configurations, and upgrade cluster patches.

6.13 What Types of Host Specifications Are Supported by MRS?

MRS provides optimal specifications based on extensive experience in big data product optimization. Host specifications are determined by CPUs, memory, and disks. Currently, the following specifications are supported:

- MRS_4U_16G (S1.xlarge)
  – CPU: 4-core
  – Memory: 16 GB
  – System Disk: 40 GB
- MRS_4U_32G_HDD (D1.xlarge)
  – CPU: 4-core
  – Memory: 32 GB
  – System Disk: 40 GB
  – Data Disk: 1.8 TB x 3 HDDs
- MRS_8U_16G (C2.2xlarge)
  – CPU: 8-core
  – Memory: 16 GB
  – System Disk: 40 GB
- MRS_8U_64G_HDD (D1.2xlarge)
  – CPU: 8-core
  – Memory: 64 GB
  – System Disk: 40 GB
  – Data Disk: 1.8 TB x 6 HDDs
- MRS_16U_32G (C2.4xlarge)
  – CPU: 16-core
  – Memory: 32 GB
  – System Disk: 40 GB
- MRS_16U_64G (S1.4xlarge)
  – CPU: 16-core
  – Memory: 64 GB
  – System Disk: 40 GB
- MRS_16U_128G_HDD (D1.4xlarge)
  – CPU: 16-core
  – Memory: 128 GB
  – System Disk: 40 GB
  – Data Disk: 1.8 TB x 12 HDDs
- MRS_32U_128G (S1.8xlarge)
  – CPU: 32-core
  – Memory: 128 GB
  – System Disk: 40 GB
- MRS_36U_256G_HDD (D1.8xlarge)
  – CPU: 36-core
  – Memory: 256 GB
  – System Disk: 40 GB
  – Data Disk: 1.8 TB x 24 HDDs

More advanced host specifications enable better data processing, and accordingly require higher cluster costs. You can choose host specifications based on site requirements.


6.14 What Components Are Supported by MRS?

MRS supports components such as Hadoop 2.7.2, Spark 1.5.1, HBase 1.0.2, and Hive 1.3.0. More versions and components will be supported in later releases. A component is also known as a service in MRS Manager.

6.15 What Is the Relationship Between Spark and Hadoop?

Spark is a fast, general-purpose computing engine that is compatible with Hadoop data. Spark can operate in a Hadoop cluster by using Yarn and process data of any type in HDFS, HBase, and Hive, as well as other Hadoop data sources.

6.16 What Types of Spark Jobs Are Supported by an MRS Cluster?

On the MRS page, an MRS cluster supports Spark jobs submitted in Spark Jar, Spark Script, or Spark SQL mode.

6.17 Can a Spark Cluster Access Data in OBS?

Similar to a Hadoop cluster, a Spark cluster can access data stored in the OBS system. You need only to set Import From and Export To to the path of the OBS system when submitting jobs.

6.18 What Is the Relationship Between Hive and Other Components?

- Relationship between Hive and HDFS
  Hive is a subproject of Apache Hadoop. Hive uses HDFS as its file storage system. Hive parses and processes structured data, and HDFS provides highly reliable underlying storage support for Hive. All data files in the Hive database are stored in HDFS, and all data operations on Hive are also performed using HDFS APIs.

- Relationship between Hive and MapReduce
  Hive data computing depends on MapReduce. MapReduce is a subproject of Apache Hadoop and a parallel computing framework based on HDFS. During data analysis, Hive translates HiveQL statements submitted by users into MapReduce jobs and submits the jobs for MapReduce to execute.

- Relationship between Hive and DBService
  MetaStore (the metadata service) of Hive processes the structure and attribute information about Hive databases, tables, and partitions. This information needs to be stored in a relational database and is maintained and processed by MetaStore. In MRS, the relational database is maintained by the DBService component.


- Relationship between Hive and Spark
  Hive data computing can also be implemented on Spark. Spark is an Apache project and a distributed in-memory computing framework. During data analysis, Hive translates HiveQL statements submitted by users into Spark jobs and submits the jobs for Spark to execute.

6.19 What Types of Distributed Storage Are Supported by MRS?

MRS supports Hadoop 2.7.2 now and will support other mainstream Hadoop versions released by the community.

6.20 Can MRS Cluster Nodes Be Changed on the MRS Management Console?

MRS cluster nodes cannot be changed on the MRS management console. You are also not advised to change MRS cluster nodes on the ECS management console. If you manually stop or delete the ECS, modify or reinstall the ECS OS, or modify the ECS specifications for a cluster node on the ECS management console, the cluster may work incorrectly.

If you have performed any of the preceding operations, MRS automatically identifies and deletes the involved cluster node. You can substitute the deleted node by expanding the capacity of the cluster on the MRS management console. Do not perform any operation on a node during capacity expansion.


A Change History

Release Date: 2017-02-20
What's New: This issue is the first official release.


B Glossary

For details about the terms involved in this document, see Glossary.
