Upgrading from HDP 2.1 to HDP 2.2

2014/12/18 @tagomoris HadoopSCR #hadoopreading

Upload: satoshi-tagomori

Post on 12-Jul-2015

TRANSCRIPT

Page 1: Upgrading from HDP 2.1 to HDP 2.2

Upgrading from HDP 2.1 to HDP 2.2

2014/12/18 @tagomoris

HadoopSCR #hadoopreading

Page 2: Upgrading from HDP 2.1 to HDP 2.2

Satoshi Tagomori (@tagomoris), LINE Corp.

Page 3: Upgrading from HDP 2.1 to HDP 2.2

Hadoop Cluster Switching

Long-running CDH4 cluster: Analysis2 (CDH4), MRv1/HDFS, Hive

Switching to new cluster

w/ fast network, large HDDs, many CPU cores

changing Hive table schema/file formats

No downtime!

Page 4: Upgrading from HDP 2.1 to HDP 2.2

Distribution Options

Options as of Oct 2014

CDH5

HDP2.1

Apache Hadoop Release

Hive 0.13, Tez -> HDP 2.1!

Page 5: Upgrading from HDP 2.1 to HDP 2.2

input data: fluent-plugin-webhdfs

Shib

executing queries over hiveserver1/2

Analysis2 (CDH4)

MRv1/HDFS

Hive

Page 6: Upgrading from HDP 2.1 to HDP 2.2

double write

Shib

Analysis2 (CDH4)

MRv1/HDFS

Hive

Analysis3 (HDP2.1)

MRv2/HDFS

Hive

distcp

Nov-Dec 2014

Page 7: Upgrading from HDP 2.1 to HDP 2.2

HDP 2.1.5.0

Installed with Ansible, w/o Ambari

for configuration versioning

Hadoop 2.4.0

YARN RM-HA + Namenode HA

Hive 0.13

Tez?

Page 8: Upgrading from HDP 2.1 to HDP 2.2

Shib

Analysis2 (CDH4)

MRv1/HDFS

Hive

Analysis3 (HDP2.1)

MRv2/HDFS

Hive

A few days later (not yet)

Page 9: Upgrading from HDP 2.1 to HDP 2.2

HDP 2.2!

Hadoop 2.6.0

Datanode hot swap drive

YARN ResourceManager REST API

Hive 0.14.0 (...)

Latest Tez

Page 10: Upgrading from HDP 2.1 to HDP 2.2

diff HDP2.1 HDP2.2

hadoop-yarn-2.4.0.2.1.5.0-695.el6

-> hadoop-yarn-2.6.0.2.2.0.0-2041.el6

+ hadoop_2_2_0_0_2041-yarn-2.6.0.2.2.0.0-2041.el6

/usr/lib/hadoop/....

-> /usr/hdp/current/hadoop-*
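
The package rename above can be sketched in a few lines. This is only an illustration: the naming rule is inferred from the single example on this slide, and the function names `versioned_prefix` and `versioned_package` are hypothetical, not part of any HDP tooling.

```python
def versioned_prefix(component: str, hdp_build: str) -> str:
    """Build the version-embedded package prefix seen on the slide.

    Assumption: dots and dashes in the HDP build string become underscores.
    """
    return component + "_" + hdp_build.replace(".", "_").replace("-", "_")


def versioned_package(component: str, subpackage: str,
                      upstream_version: str, hdp_build: str,
                      dist: str = "el6") -> str:
    """Assemble the full HDP 2.2 style package name from its parts."""
    prefix = versioned_prefix(component, hdp_build)
    return f"{prefix}-{subpackage}-{upstream_version}.{hdp_build}.{dist}"


if __name__ == "__main__":
    # Reproduces the package name shown on the slide.
    print(versioned_package("hadoop", "yarn", "2.6.0", "2.2.0.0-2041"))
```

Because the build number is part of the package name itself, packages from different HDP builds no longer share one name, which fits the move from /usr/lib/hadoop to /usr/hdp/current.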

Page 11: Upgrading from HDP 2.1 to HDP 2.2

diff HDP2.1 HDP2.2

Toooooooooooooo many diff lines

Companion files of HDP (2.1 -> 2.2)

in hive-site.xml: 353 -> 1207 lines

in tez-site.xml: 126 -> 261 lines

How to edit/control?

IDE? Editor? KIAI? Excel?

Page 12: Upgrading from HDP 2.1 to HDP 2.2

hadoop_xml_diff.rb

http://d.hatena.ne.jp/tagomoris/20141215/1418631988
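
One way to tame a diff of that size is to compare the files property by property instead of line by line. The sketch below shows that idea in Python; it is not the hadoop_xml_diff.rb script linked above, and the sample property values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text: str) -> dict:
    """Parse a Hadoop-style *-site.xml string into {name: value}."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries without a <name>
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(old: dict, new: dict) -> dict:
    """Report added, removed and changed properties between two configs."""
    return {
        "added": {k: new[k] for k in sorted(new.keys() - old.keys())},
        "removed": {k: old[k] for k in sorted(old.keys() - new.keys())},
        "changed": {k: (old[k], new[k])
                    for k in sorted(old.keys() & new.keys())
                    if old[k] != new[k]},
    }


# Illustrative inputs only; the values are not taken from the companion files.
OLD_XML = """<configuration>
  <property><name>hive.execution.engine</name><value>mr</value></property>
  <property><name>hive.exec.parallel</name><value>false</value></property>
</configuration>"""

NEW_XML = """<configuration>
  <property><name>hive.execution.engine</name><value>tez</value></property>
  <property><name>hive.vectorized.execution.enabled</name><value>true</value></property>
</configuration>"""

if __name__ == "__main__":
    result = diff_properties(load_properties(OLD_XML), load_properties(NEW_XML))
    for kind, entries in result.items():
        print(kind, entries)
```

Comparing by property name ignores reordering, comments and whitespace, which are a large part of the noise in a plain line diff.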

Page 13: Upgrading from HDP 2.1 to HDP 2.2

Upgrade test in test cluster

Automated Upgrade by Ansible playbook

stop hiveserver2

stop cluster

-safemode enter, -saveNamespace

make backup (hdfs metadata, hive metastore)

-finalizeUpgrade

nm, rm, dn, nn, zkfc, jn, zk

check edits stopped

Upgrade yum repo/packages/configurations

execute hdp-select

Start cluster

zk, jn

“hdfs namenode -upgrade”
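
The ordering of the steps above can be written down as a dry-run plan. This is a minimal sketch, not the Ansible playbook from the talk: it only prints the steps, the `hdfs dfsadmin` and `hdfs namenode` commands are the standard ones matching the flags on the slide, and steps whose exact command the slide does not give are left without one (including the arguments to hdp-select).

```python
from typing import List, Tuple

# (description, command) pairs; an empty command means the slide gives
# no concrete command line for that step.
Step = Tuple[str, str]


def build_upgrade_plan() -> List[Step]:
    """Return the upgrade steps in the order listed on the slide."""
    return [
        ("stop hiveserver2", ""),
        ("enter safe mode", "hdfs dfsadmin -safemode enter"),
        ("save namespace", "hdfs dfsadmin -saveNamespace"),
        ("back up hdfs metadata and hive metastore", ""),
        ("finalize any previous upgrade", "hdfs dfsadmin -finalizeUpgrade"),
        ("stop nm, rm, dn, nn, zkfc, jn, zk", ""),
        ("check that edits stopped", ""),
        ("upgrade yum repo/packages/configurations", ""),
        ("execute hdp-select (arguments not given on the slide)", "hdp-select"),
        ("start zk, jn", ""),
        ("start namenode in upgrade mode", "hdfs namenode -upgrade"),
    ]


def format_plan(plan: List[Step]) -> List[str]:
    """Render the plan as numbered lines for review before running anything."""
    lines = []
    for number, (description, command) in enumerate(plan, start=1):
        suffix = f"  ->  {command}" if command else ""
        lines.append(f"{number:2d}. {description}{suffix}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_plan(build_upgrade_plan())))
```

Keeping the plan as data makes it easy to review the order (backup before -finalizeUpgrade, daemons stopped before packages change) before anything touches the cluster.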

Page 14: Upgrading from HDP 2.1 to HDP 2.2

Upgrade in test cluster

Automated Upgrade by Ansible playbook

stop hiveserver2

stop cluster

-safemode enter, -saveNamespace

make backup (hdfs metadata, hive metastore)

-finalizeUpgrade

nm, rm, dn, nn, zkfc, jn, zk

check edits stopped

Upgrade yum repo/packages/configurations

execute hdp-select

Start cluster

zk, jn

“hdfs namenode -upgrade” ... ever lasting ...

Page 15: Upgrading from HDP 2.1 to HDP 2.2

“Ah, I might have made some mistakes...”

Page 16: Upgrading from HDP 2.1 to HDP 2.2

double write

Shib

Analysis2 (CDH4)

MRv1/HDFS

Hive

Analysis3 (HDP2.2)

MRv2/HDFS

Hive

Upgrade HDP 2.1->2.2

Dec 16 2014

Page 17: Upgrading from HDP 2.1 to HDP 2.2

Upgrade in analysis3

Manual Procedure!!!

stop hiveserver2

stop cluster

-safemode enter, -saveNamespace

make backup (hdfs metadata, hive metastore)

-finalizeUpgrade

nm, rm, dn, nn, zkfc, jn, zk

check edits stopped

Upgrade yum repo/packages/configurations

execute hdp-select

Start cluster

zk, jn

/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh \
  start namenode -upgrade

Page 18: Upgrading from HDP 2.1 to HDP 2.2

2014-12-16 14:53:28,919 INFO namenode.NNUpgradeUtil (NNUpgradeUtil.java:doUpgrade(139)) - Performing upgrade of storage directory /var/hadoop/hdfs/nn
2014-12-16 14:53:28,939 INFO namenode.FSNamesystem (FSNamesystem.java:loadFSImage(1029)) - Need to save fs image? false (staleImage=false, haEnabled=true, isRollingUpgrade=false)
2014-12-16 14:53:28,941 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1173)) - Starting log segment at 262795139
2014-12-16 14:53:29,224 INFO namenode.NameCache (NameCache.java:initialized(143)) - initialized with 23408 entries 1740524 lookups
2014-12-16 14:53:29,227 INFO namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(748)) - Finished loading FSImage in 15695 msecs
2014-12-16 14:53:29,346 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(329)) - RPC server is binding to master1.local:8020
2014-12-16 14:53:29,348 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
2014-12-16 14:53:29,390 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8020: starting
2014-12-16 14:53:29,393 INFO namenode.NameNode (NameNode.java:startCommonServices(646)) - NameNode RPC up at: master1.local/10.0.0.0:8020
2014-12-16 14:53:29,393 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1142)) - Starting services required for active state
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(160)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2014-12-16 14:53:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 13919829439 milliseconds
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
2014-12-16 14:53:29,576 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(247)) - The configured checkpoint interval is 0 minutes. Using an interval of 360 minutes that is used for deletion instead
2014-12-16 14:53:29,584 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 189 millisecond(s).
2014-12-16 14:53:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:53:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:54:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:54:29,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 2 millisecond(s).
2014-12-16 14:54:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:54:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:55:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:55:29,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:55:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30001 milliseconds
2014-12-16 14:55:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:56:29,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:56:29,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:56:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:56:59,398 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-16 14:57:29,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:57:29,398 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 2 millisecond(s).
2014-12-16 14:57:59,396 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(178)) - Rescanning after 30000 milliseconds
2014-12-16 14:57:59,397 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(201)) - Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
(ever lasting...)

https://gist.github.com/tagomoris/ed7aa8ccb3d6003a29f9

Page 19: Upgrading from HDP 2.1 to HDP 2.2

ever lasting!!!!!!!!

${dfs.namenode.name.dir}/current and .../previous were not modified at all for 60 minutes...
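
Whether a metadata directory is still being written can be checked mechanically instead of by watching it. The sketch below walks a directory and reports whether anything under it changed within a threshold. The function names are hypothetical, and the path in the usage comment simply reuses the storage directory that appears in the log.

```python
import os
import time
from typing import Optional


def latest_mtime(root: str) -> float:
    """Return the newest modification time of root or anything below it."""
    latest = os.path.getmtime(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                latest = max(latest, os.path.getmtime(os.path.join(dirpath, name)))
            except OSError:
                continue  # entry vanished between listing and stat
    return latest


def is_stalled(root: str, threshold_seconds: float,
               now: Optional[float] = None) -> bool:
    """True if nothing under root was modified within threshold_seconds."""
    current = time.time() if now is None else now
    return current - latest_mtime(root) >= threshold_seconds


if __name__ == "__main__":
    # Example: has /var/hadoop/hdfs/nn been idle for 60 minutes?
    path = "/var/hadoop/hdfs/nn"
    if os.path.isdir(path):
        print(path, "stalled:", is_stalled(path, 60 * 60))
```

A quiet directory alone does not prove the upgrade is stuck, but together with a log that only repeats CacheReplicationMonitor lines it is a reasonable signal to stop waiting.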

Page 20: Upgrading from HDP 2.1 to HDP 2.2

rollback

stop all daemons

replace all packages w/ HDP2.1

replace configurations for HDP2.1

/usr/lib/hadoop/sbin/hadoop-daemon.sh --config ... start namenode -rollback

$ /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode -rollback
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-4c3bf0834.livedoor.out
"rollBack" will remove the current state of the file system,
returning you to the state prior to initiating your recent.
upgrade. This action is permanent and cannot be undone. If you
are performing a rollback in an HA environment, you should be
certain that no NameNode process is running on any host.
Roll back file system state? (Y or N) Invalid input: 
Roll back file system state? (Y or N) Invalid input: 
Roll back file system state? (Y or N) Invalid input: 
Roll back file system state? (Y or N) Invalid input: 
Roll back file system state? (Y or N) Invalid input: 
Roll back file system state? (Y or N) Invalid input: 
$

Page 21: Upgrading from HDP 2.1 to HDP 2.2

impossible

I cannot input any “Y”s...
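
The slides do not say why the prompt rejects every answer. One plausible cause, stated here as an assumption rather than something the talk confirms, is that the daemon launcher starts the process in the background with standard input detached, so the prompt only ever reads end-of-file. The sketch below reproduces that behaviour with a stand-in child process; the child is a hypothetical imitation of the prompt, not Hadoop code.

```python
import subprocess
import sys

# Stand-in for an interactive confirmation prompt (not Hadoop code).
CHILD = r'''
import sys
for _ in range(3):
    sys.stdout.write("Roll back file system state? (Y or N) ")
    answer = sys.stdin.readline()  # returns "" at end-of-file
    if answer.strip().upper() in ("Y", "N"):
        print("got", answer.strip())
        break
    print("Invalid input: ")
'''


def run_prompt(detached: bool, typed: str = "Y\n") -> str:
    """Run the stand-in prompt with stdin detached or with typed input."""
    if detached:
        kwargs = {"stdin": subprocess.DEVNULL}  # like a background daemon
    else:
        kwargs = {"input": typed}               # like typing at a terminal
    result = subprocess.run([sys.executable, "-c", CHILD],
                            capture_output=True, text=True, **kwargs)
    return result.stdout


if __name__ == "__main__":
    print(run_prompt(detached=True))
    print(run_prompt(detached=False))
```

If that assumption holds, the answer never reaches the process no matter what is typed, and running the same NameNode command in the foreground from a terminal would be the thing to try.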

Page 22: Upgrading from HDP 2.1 to HDP 2.2

Recovery

replace namenode metadata w/ backup

execute NameNode (HDP 2.1) & DataNode

cluster recovered!!!!

Page 23: Upgrading from HDP 2.1 to HDP 2.2

Recovery

replace namenode metadata w/ backup

execute NameNode (HDP 2.1) & DataNode

cluster recovered!!!!

Replication numbers of all blocks are ZERO!!!!!!!1!!!!1!

Page 24: Upgrading from HDP 2.1 to HDP 2.2

Recovery

replace namenode metadata w/ backup

execute NameNode (HDP 2.1) & DataNode

cluster recovered!!!!

replication numbers of all blocks are ZERO!!!!!!!1!!!!1!

hadoop fsck / -> all blocks became fine!

Page 25: Upgrading from HDP 2.1 to HDP 2.2

Conclusion

I will wait for anyone who uses HDP 2.2...