Upgrading from HDP 2.1 to HDP 2.2

2014/12/18 @tagomoris

HadoopSCR #hadoopreading

Satoshi Tagomori (@tagomoris)
LINE Corp.

Analysis2 (CDH4)

Hadoop Cluster Switching

Long running CDH4 cluster

Switching to new cluster

w/ Fast network, Large HDD, Many CPU cores

changing Hive table schema/file formats

No downtime!

MRv1/HDFS

Distribution Options

Options as of Oct 2014

HDP2.1

Apache Hadoop Release

Hive 0.13, Tez -> HDP2.1 !

input data: fluent-plugin-webhdfs

executing queries: over hiveserver1/2

Analysis2 (CDH4)

MRv1/HDFS

double write

Analysis2 (CDH4)

MRv1/HDFS

Analysis3 (HDP2.1)

MRv2/HDFS

Hive

distcp

Nov-Dec 2014

HDP 2.1.5.0

Install over Ansible, w/o Ambari

for configuration versioning

Hadoop 2.4.0

YARN RM-HA + Namenode HA

Hive 0.13

Analysis2 (CDH4)

MRv1/HDFS

Analysis3 (HDP2.1)

MRv2/HDFS

A few days later (not yet)

HDP 2.2!

Hadoop 2.6.0

Datanode hot swap drive

YARN ResourceManager REST API

Hive 0.14.0 (...)

Latest Tez

diff HDP2.1 HDP2.2

hadoop-yarn-2.4.0.2.1.5.0-695.el6

-> hadoop-yarn-2.6.0.2.2.0.0-2041.el6

+ hadoop_2_2_0_0_2041-yarn-2.6.0.2.2.0.0-2041.el6

/usr/lib/hadoop/....

-> /usr/hdp/current/hadoop-*

diff HDP2.1 HDP2.2

Toooooooooooooo many diff lines

Companion files of HDP (2.1 -> 2.2)

in hive-site.xml: 353 -> 1207 lines

in tez-site.xml: 126 -> 261 lines

How to edit/control?

IDE? Editor? KIAI? Excel?

hadoop_xml_diff.rb

http://d.hatena.ne.jp/tagomoris/20141215/1418631988
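
The kind of property-level comparison that makes a 353 -> 1207 line change reviewable can be sketched roughly as follows. This is a minimal Python sketch, not the linked hadoop_xml_diff.rb: the function names and the `example.*` sample properties are hypothetical, and it assumes the usual Hadoop layout of a `<configuration>` root holding `<property>` elements with `<name>` and `<value>` children.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries without a <name>
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(old, new):
    """Return (added, removed, changed) between two property dicts."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


# Hypothetical sample input, NOT taken from the real HDP companion files.
OLD_XML = """<configuration>
  <property><name>example.engine</name><value>mr</value></property>
  <property><name>example.legacy.flag</name><value>false</value></property>
</configuration>"""

NEW_XML = """<configuration>
  <property><name>example.engine</name><value>tez</value></property>
  <property><name>example.new.flag</name><value>true</value></property>
</configuration>"""

added, removed, changed = diff_properties(
    load_properties(OLD_XML), load_properties(NEW_XML)
)
for name in sorted(added):
    print("+", name, "=", added[name])
for name in sorted(removed):
    print("-", name, "=", removed[name])
for name in sorted(changed):
    print("~", name, ":", changed[name][0], "->", changed[name][1])
```

Comparing by property name rather than by line ignores reordering, comments, and whitespace, which is most of the noise in a line-based diff of these files.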

Upgrade test in test cluster

Automated Upgrade by Ansible playbook

stop hiveserver2

stop cluster

  -safemode enter, -saveNamespace

  make backup (hdfs metadata, hive metastore)

  -finalizeUpgrade

  nm, rm, dn, nn, zkfc, jn, zk

  check edits stopped

Upgrade yum repo/packages/configurations

execute hdp-select

Start cluster

  zk, jn

  “hdfs namenode -upgrade”
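
The ordering of the pre-upgrade steps above can be written down as a dry-run checklist. This is only a sketch under assumptions: the slides name the options `-safemode enter`, `-saveNamespace` and `-finalizeUpgrade` without the command prefix, and `hdfs dfsadmin` is assumed here; steps for which the slides give no concrete command are left as manual items. The actual automation was an Ansible playbook, which is not reproduced.

```python
import subprocess

# Each step is (description, argv). argv is None where the slides name a step
# without giving a concrete command. The "hdfs dfsadmin" prefix is an assumption.
PRE_UPGRADE_STEPS = [
    ("stop hiveserver2", None),
    ("enter safemode", ["hdfs", "dfsadmin", "-safemode", "enter"]),
    ("save namespace", ["hdfs", "dfsadmin", "-saveNamespace"]),
    ("back up HDFS metadata and the Hive metastore", None),
    ("finalize any earlier upgrade", ["hdfs", "dfsadmin", "-finalizeUpgrade"]),
    ("stop daemons in order: nm, rm, dn, nn, zkfc, jn, zk", None),
    ("check that edits have stopped", None),
]


def run_steps(steps, dry_run=True):
    """Walk the steps in order and return one log line per step."""
    log = []
    for description, argv in steps:
        if argv is None:
            log.append(f"MANUAL: {description}")
        elif dry_run:
            log.append(f"DRY-RUN: {' '.join(argv)}  # {description}")
        else:
            # check=True aborts the whole run on the first failing command
            subprocess.run(argv, check=True)
            log.append(f"DONE: {' '.join(argv)}")
    return log


for line in run_steps(PRE_UPGRADE_STEPS):
    print(line)
```

Keeping the default at `dry_run=True` means nothing touches the cluster until the printed plan has been reviewed.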

Upgrade in test cluster

Automated Upgrade by Ansible playbook

zk, jn

“hdfs namenode -upgrade” ... ever lasting ...

“Ah, I might have made a mistake somewhere...”

double write

Analysis2 (CDH4)

MRv1/HDFS

Analysis3 (HDP2.2)

MRv2/HDFS

Upgrade HDP 2.1->2.2

Dec 16 2014

Upgrade in analysis3

Manual Procedure!!!

zk, jn

/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh \
  start namenode -upgrade

2014-12-16 14:53:28,919 INFO namenode.NNUpgradeUtil (NNUpgradeUtil.java:doUpgrade(139)) - Performing upgrade of storage directory /var/hadoop/hdfs/nn
2014-12-16 14:53:28,939 INFO namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1029)) - Need to save fs image? false (staleImage=false, haEnabled=true, isRollingUpgrade=false)
2014-12-16 14:53:28,941 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1173)) - Starting log segment at 262795139
2014-12-16 14:53:29,224 INFO namenode.NameCache (NameCache.java:initialized(143)) - initialized with 23408 entries 1740524 lookups
2014-12-16 14:53:29,227 INFO namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(748)) - Finished loading FSImage in 15695 msecs
2014-12-16 14:53:29,346 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(329)) - RPC server is binding to master1.local:8020
2014-12-16 14:53:29,348 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8020: starting
2014-12-16 14:53:29,393 INFO namenode.NameNode (NameNode.java:startCommonServices(646)) - NameNode RPC up at: master1.local/10.0.0.0:8020
2014-12-16 14:53:29,393 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1142)) - Starting services required for active state
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(160)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 13919829439 milliseconds
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(247)) - The configured checkpoint interval is 0 minutes. Using an interval of 360 minutes that is used for deletion instead
2014-12-16 14:53:29,584 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 189 millisecond(s).
2014-12-16 14:53:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:53:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:54:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:54:29,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 2 millisecond(s).
2014-12-16 14:54:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:54:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:55:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:55:29,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:55:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30001 milliseconds
2014-12-16 14:55:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:56:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:56:29,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:56:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:56:59,398 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:57:29,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:57:29,398 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 2 millisecond(s).
2014-12-16 14:57:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:57:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
(ever lasting...)

https://gist.github.com/tagomoris/ed7aa8ccb3d6003a29f9

ever lasting!!!!!!!!

${dfs.namenode.name.dir}/current and .../previous have not been modified for 60 minutes...
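
The "nothing has been modified for 60 minutes" observation can be turned into a small check instead of watching the directory by hand. This is a hedged Python sketch: the function names and the one-hour default are assumptions for illustration, and it only looks at file modification times under the `current` and `previous` subdirectories, so it says nothing about what the NameNode is doing internally.

```python
import os
import time


def newest_mtime(directory):
    """Return the most recent mtime of a directory or anything below it."""
    newest = os.path.getmtime(directory)
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            newest = max(newest, os.path.getmtime(os.path.join(root, name)))
    return newest


def seconds_since_last_change(name_dir, now=None):
    """Seconds since anything under name_dir/current or name_dir/previous changed."""
    now = time.time() if now is None else now
    stamps = [
        newest_mtime(os.path.join(name_dir, sub))
        for sub in ("current", "previous")
        if os.path.isdir(os.path.join(name_dir, sub))
    ]
    if not stamps:
        raise FileNotFoundError(f"no current/ or previous/ under {name_dir}")
    return now - max(stamps)


def looks_stalled(name_dir, threshold_sec=3600, now=None):
    """True when nothing has been modified for at least threshold_sec seconds."""
    return seconds_since_last_change(name_dir, now) >= threshold_sec
```

A call such as `looks_stalled("/var/hadoop/hdfs/nn")` would use the storage directory that appears in the upgrade log; any other path should be taken from `dfs.namenode.name.dir` in the cluster's own configuration.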

rollback

stop all daemons

replace all packages w/ HDP2.1

replace configurations for HDP2.1

/usr/lib/hadoop/sbin/hadoop-daemon.sh --config ... start namenode -rollback

$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode -rollback
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-4c3bf0834.livedoor.out
"rollBack" will remove the current state of the file system,
returning you to the state prior to initiating your recent.
upgrade. This action is permanent and cannot be undone. If you
are performing a rollback in an HA environment, you should be
certain that no NameNode process is running on any host.
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
Roll back file system state? (Y or N) Invalid input:
$

impossible

I cannot input any “Y”s...

Recovery

replace namenode metadata w/ backup

execute NameNode (HDP 2.1) & DataNode

cluster recovered!!!!

Recovery

Replication numbers of all blocks are ZERO!!!!!!!1!!!!1!

Recovery

replication numbers of all blocks are ZERO!!!!!!!1!!!!1!

hadoop fsck / -> all blocks become fine!

Conclusion

I will wait for anyone who uses HDP 2.2...
