OpenStack Summit Tokyo - Know-how of Challenging Deploy/Operation NTT DOCOMO's Mail Cloud...

Post on 15-Apr-2017


Copyright © 2015 NTT DATA Corporation

2015/10/27 NTT DATA Corporation

Know-how of Challenging Deploy/Operation NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift

Abstract

Docomo mail is a 24/7 cloud mail system used by over 20 million people. The system stores users' mail archives in a petabyte-scale OpenStack Swift cluster deployed by NTT DATA. We have been operating this service since Sep 2014 without any downtime. In this session, we present the actual issues and challenges we faced and overcame.

Today’s contents and presenters

○Project Overview

Changes in the Japanese mobile market and an overview of this project

– Project Manager : Sosuke Kakehi

○Migration process

Process of migrating Swift into the existing docomo mail system

– OpenStack Swift Engineer : Masaaki Nakagawa

○Technical challenges

Swift technical challenges on this project

– OpenStack Engineer : Ryosei Kasai

○Operating session

Large scale swift operation

– OpenStack Swift Engineer : Masaaki Nakagawa

Project Overview

1 NTT Docomo's Cloud Mail System

2 Project Background

3 Customer Requirements


NTT Docomo's Cloud Mail System - System Summary

• Docomo Mail - NTT Docomo’s Cloud Mail Service

• Over 20 million users

• Powered by OpenStack Swift

[Figure: smartphones and tablet PCs access the mail service. Later mail is kept on high-performance storage; archived mail is stored to object storage (OpenStack Swift).]

NTT Docomo's Cloud Mail System - System Scale

• Geographically Distributed Swift Cluster

• Over 6.4 petabytes of logical capacity

• Hundreds of servers

[Figure: four sites (Site1 to Site4) hosting proxy nodes and the storage nodes of Region1, Region2, and Region3.]

Project Background

Shift from “Feature phone” to “Smart phone”

[Figure: smartphones and tablet PCs connect to many services and handle documents, text, photos, music, movies, and applications.]

E-mail data size has increased

Project Background

[Figure: high-end storage units are added one after another, and the cost grows with each addition.]

Extend the high-end storage, extend, extend

= expensive: cost, cost, cost

Customer Requirements

• High availability

• Low cost

• High scalability

• Disaster recovery

• etc.

→ OSS (software storage) + IA server

→ Adopt OpenStack Swift

Migration session

Overview of migration session

NTT DOCOMO launched the docomo mail service in Oct 2013, and Swift was introduced into the docomo mail system in Jan 2015. The migration to Swift was carried out without stopping the user service.

In this section, I would like to introduce the overall docomo mail system and the migration process.

Timeline (older to later):

• Oct 2013: docomo mail service in

• May 2014: test users start to use Swift

• Jan 2015: Swift service in

• Oct 2015: general users start test use of Swift

Swift migration session: System construction overview

[Figure: requests from the Internet reach the docomo mail frontend server, which acts as a proxy for both block storage and Swift. High-speed block storage holds later user mail; Swift (a proxy and storage nodes) holds archived user mail.]

Swift migration session: Mail access flow

[Figure: an access device connects over the Internet to the docomo mail frontend server (proxy for block storage and Swift). User mail is on block storage; archived user mail is on the Swift storage nodes behind the Swift proxy.]

User mail will be archived and stored to Swift

Swift migration session: System construction (before Swift was installed)

[Figure: Internet, docomo mail frontend server, and block storage. Block storage holds both user mail and archived user mail.]

Swift migration session: Migration 1st step – deploy Swift and test

• Deploy Swift

• Trouble test

• Tuning

[Figure: a Swift proxy and storage nodes are added behind the docomo mail frontend server. User mail and archived user mail remain on block storage.]

Swift migration session: Migration 2nd step – copy test users’ archived mail

• Copy test users’ archived mail to Swift

• General users’ mail is not copied

[Figure: archived mail of test users is copied from block storage to the Swift storage nodes; all other mail stays on block storage.]

Swift migration session: Migration 3rd step – copy general users’ archived mail

• Move general users’ archived mail to Swift

• Keep the whole mail archive on block storage in case of Swift trouble

[Figure: archived mail of general users is copied from block storage to the Swift storage nodes, and the archive on block storage is retained.]

Swift migration session: Migration 4th step – launch service

[Figure: service is launched with the docomo mail frontend server in front of block storage and the Swift cluster (proxy and storage nodes), which holds archived user mail.]

Conclusion of migration session

• Initially, docomo mail had only block storage

• We needed to deploy Swift and migrate to it with no downtime

• To achieve this, we divided the migration into 4 steps

– Deploy

– Copy test users' mail to Swift

– Copy general users' mail to Swift while keeping the block storage

– System durability check

• We achieved a migration with no service downtime
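The third step keeps the archive on block storage as a safeguard while Swift comes into use. A minimal sketch of how a frontend could use such a safeguard is shown below. It assumes a per-request fallback, which the slides do not describe; the `DictStore` class, the `read_archived_mail` function, and the `migrated_users` set are all hypothetical names used only for illustration.

```python
class ObjectNotAvailable(Exception):
    """Raised by a store when it cannot return the requested mail."""


class DictStore:
    """In-memory stand-in for a storage backend (hypothetical, for illustration)."""

    def __init__(self, mails):
        self._mails = dict(mails)

    def get(self, user_id, mail_id):
        try:
            return self._mails[(user_id, mail_id)]
        except KeyError:
            raise ObjectNotAvailable((user_id, mail_id)) from None


def read_archived_mail(user_id, mail_id, swift_store, block_store, migrated_users):
    """Read from Swift for migrated users, falling back to block storage."""
    if user_id in migrated_users:
        try:
            return swift_store.get(user_id, mail_id)
        except ObjectNotAvailable:
            # Assumption: the block storage copy kept during migration is
            # still readable, so the request can be served from there.
            pass
    return block_store.get(user_id, mail_id)


block = DictStore({("u1", "m1"): b"hello", ("u2", "m1"): b"hi"})
swift = DictStore({("u1", "m1"): b"hello"})
print(read_archived_mail("u2", "m1", swift, block, migrated_users={"u1", "u2"}))
```

In this sketch a user whose mail has not yet reached Swift is still served from block storage, so the copy can proceed without a service stop.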

As I said, we overcame some technical challenges during the migration. In the next session, Mr. Kasai will introduce them.

Technical session

Our Technical Challenges

1 Durability assurance

2 Geographically distributed cluster

3 Quality

Challenge 1: Durability assurance

• Quality requirement in Japan

• This system needs very high quality.

• Everything should be under control

• System design for normal situations

• System design for failure situations

Even on a distributed system

• Analyze every behavior before building the system

Recovery tests across a variety of failure patterns

• Variety of failure patterns

(1) The point of failure: disk, NIC, process, node, …

(2) The number of failures: 1, 2, 3, 4, …

(3) The range of failures: 1 node, multiple nodes/zones/regions, …

100s of test cases!!
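Combining the three axes above is what produces so many cases. The sketch below enumerates such a matrix; the axis values are only the examples named on the slide, and the filtering rule and case numbering are assumptions for illustration, not the actual test plan.

```python
from itertools import product

# Hypothetical axis values, taken from the examples on the slide.
FAILURE_POINTS = ["disk", "nic", "process", "node"]
FAILURE_COUNTS = [1, 2, 3, 4]
FAILURE_RANGES = ["one node", "multiple nodes", "multiple zones", "multiple regions"]


def build_test_cases():
    """Enumerate every meaningful (point, count, range) combination."""
    combos = [
        (point, count, scope)
        for point, count, scope in product(
            FAILURE_POINTS, FAILURE_COUNTS, FAILURE_RANGES
        )
        # A single failure cannot span several nodes, zones, or regions.
        if not (count == 1 and scope != "one node")
    ]
    return [
        {"id": number, "point": point, "count": count, "range": scope}
        for number, (point, count, scope) in enumerate(combos, start=1)
    ]


cases = build_test_cases()
print(len(cases), "cases; first:", cases[0])
```

Even with only four values per axis the matrix is already large, and each added value on any axis multiplies it further.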

[Figure: example test cases (Case #001, #101, #201, #301, #501). Each shows a proxy and storage nodes; some cases group the storage nodes into Zone1 and Zone2 within Region 1.]

Result of recovery test

• Extreme durability and recoverability of Swift

• Swift rarely loses data. Only a precisely targeted set of failures or a major disaster can cause data loss.

Challenge 2: Geographically distributed cluster

• Geographically distributed Swift cluster to realize disaster recovery

• Important points to evaluate for global distribution

1. Client request

2. Durability

[Figure: Site 1 (proxy) and Sites 2, 3, and 4 (storage) connected over a private network; the sites are 300 km or more apart.]

Pseudo-global cluster

• Pseudo-global cluster with simulated network latency

• Proxy and 3 Storage regions placed in different locations

• 10~200msec latency between locations simulated by tc

• TL msec latency for one way, 2*TL msec latency for round trip
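The slide names only `tc` as the simulation tool. A minimal sketch of one common way to do this is shown below: it builds `tc` commands that use the `netem` queueing discipline to add a fixed delay on an interface. The use of `netem`, the interface name `eth0`, and the helper names are assumptions; the commands are only printed, not executed.

```python
from typing import List


def netem_delay_command(device: str, one_way_ms: int) -> List[str]:
    """Build a `tc` command that adds a fixed egress delay on one interface."""
    if one_way_ms <= 0:
        raise ValueError("delay must be positive")
    return [
        "tc", "qdisc", "add", "dev", device,
        "root", "netem", "delay", f"{one_way_ms}ms",
    ]


def round_trip_ms(one_way_ms: int) -> int:
    """With TL msec applied in each direction, a round trip costs 2 * TL."""
    return 2 * one_way_ms


# Sweep part of the 10 to 200 msec range mentioned on the slide.
for tl in (10, 50, 100, 200):
    command = " ".join(netem_delay_command("eth0", tl))
    print(command, "-> round trip", round_trip_ms(tl), "ms")
```

Applying the same delay on the egress side of both endpoints gives TL msec in each direction, which matches the 2*TL round trip stated above.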

[Figure: the proxy and storage regions 1, 2, and 3 are placed in separate locations, with 10~200 msec of simulated latency on every link between them. A second diagram shows TL msec in each direction between the proxy and storage region 1.]

2 points of Pseudo-global cluster testing

1. Client request

• Object PUT/GET/DELETE from client

• Error rate

• Turnaround time for 1 request

• Throughput

• Latency between proxy and storage

2. Durability

• Auto recovery by object-replicator

• Error rate

• Turnaround time of 1 sync process

• Throughput

• Latency between storages
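The three metrics listed for both tests (error rate, turnaround time, throughput) can be collected with a very small harness. The sketch below is an illustration only: `measure` is a hypothetical helper, and `request` stands for any callable that performs one operation, such as one object PUT.

```python
import time


def measure(request, count):
    """Call `request()` `count` times in sequence and summarize the run."""
    if count <= 0:
        raise ValueError("count must be positive")
    errors = 0
    durations = []
    started = time.perf_counter()
    for _ in range(count):
        t0 = time.perf_counter()
        try:
            request()
        except Exception:
            # Any exception counts as one failed request.
            errors += 1
        durations.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    return {
        "error_rate": errors / count,
        "mean_turnaround_s": sum(durations) / count,
        "throughput_rps": count / elapsed if elapsed > 0 else float("inf"),
    }


# Stand-in request that just waits one millisecond.
print(measure(lambda: time.sleep(0.001), 5))
```

Repeating the same run at each simulated latency value gives one curve per metric.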

[Figure: for the client request test, a client sends PUT and GET requests through the proxy to storage regions 1, 2, and 3; for the durability test, storage regions 1, 2, and 3 synchronize with each other.]

Test1: Client request

Object PUT/GET/DELETE from client

• No error caused by latency

• Degradation of turnaround time

• No throughput degradation for concurrent requests
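One way to read these results: latency slows each request, but with enough requests in flight the total throughput is limited by network bandwidth rather than by latency. The sketch below models this with Little's law (throughput = concurrency / turnaround). The base time, the number of round trips per request, and the bandwidth cap are hypothetical values, not measurements from the presentation.

```python
def turnaround_s(base_s, one_way_latency_s, round_trips):
    """Turnaround of one request: processing time plus network round trips."""
    return base_s + 2 * one_way_latency_s * round_trips


def throughput_rps(concurrency, turnaround, bandwidth_cap_rps):
    """Little's law, capped by what the network bandwidth can carry."""
    if turnaround <= 0:
        raise ValueError("turnaround must be positive")
    return min(concurrency / turnaround, bandwidth_cap_rps)


# Hypothetical numbers: 50 ms of processing, 2 round trips per request,
# and a bandwidth limit equivalent to 100 requests per second.
for latency_ms in (10, 100, 200):
    t = turnaround_s(0.05, latency_ms / 1000, round_trips=2)
    low = throughput_rps(4, t, 100)
    high = throughput_rps(400, t, 100)
    print(f"{latency_ms} ms: turnaround {t:.2f} s, "
          f"4 clients {low:.1f} rps, 400 clients {high:.1f} rps")
```

In this model higher latency always lengthens the turnaround time, while a sufficiently concurrent workload reaches the same bandwidth-limited throughput at every latency.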

[Figure: turnaround time versus latency for PUT/GET and DELETE; throughput versus concurrency, bounded by latency and by the network bandwidth limit.]

Test2: Durability

Auto recovery by object-replicator

• No error caused by latency

• Performance degradation of one process

• No throughput degradation for concurrent processes

[Figure: recovery performance (from failure to recovery) versus latency; throughput versus concurrency, bounded by latency and by the network bandwidth limit.]

Challenge 3: Quality

1. Software Quality

• Do all processes work well?

• Account / Container / Object

• server / replicator / updater / reaper

2. System Quality

• Is our system working well?

• All nodes

• All APIs

Software quality

Source code analysis and customization

• Official patches (below)

• Original patches

Strict testing of all processes

Our official patches

1 Add process name checking into swift-init

2 Prevent redundant commenting by drive-audit

3 Remove invalid connection checking in db_replicator

4 Add timestamp checking in AccountBroker.is_status_deleted

5 Fix error log of proxy-server when cache middleware is disabled

and more …

System quality

• Automated testing tools for

1. APIs : All Swift APIs, including error cases

2. Nodes : All Swift nodes

• Extended Tempest and checking tool

[Figure: extended Tempest and the checking tool run against the proxy servers and storage servers to test all APIs and all nodes.]

Our solutions

Challenge → Solution

1 Durability assurance → Recovery tests across a variety of failure patterns

2 Geographically distributed cluster → Performance tests of frontend/backend with a pseudo-global Swift cluster

3 Quality → Source code analysis and customization; automated testing

Operating session

Overview of operating session

The operation scheme of Docomo mail is highly confidential.

Instead, we would like to introduce the operation of NTT DATA's Swift solution.

The Docomo mail system uses a customized version of NTT DATA's Swift solution.

Operating session: Large-scale systems make operation costly

[Figure: a large-scale Swift cluster requires scale-out, management, repair, and tuning work.]

Operating session: Reduce the operating workload

Parallel access (pssh / pscp)

Automatic deploy (kickstart)

Tuning (svn / puppet)

Master repository
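The parallel access item refers to pssh / pscp, which run one command or copy on many nodes at once. As an illustration of the same idea, not of the actual tooling, the sketch below fans a command out over a thread pool. The host names are hypothetical, and the demonstration uses a fake runner so that no real `ssh` connection is made.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_over_ssh(host, command):
    """Run one command on one host through the local `ssh` client."""
    result = subprocess.run(
        ["ssh", host, command], capture_output=True, text=True, timeout=60
    )
    return result.returncode, result.stdout


def run_everywhere(hosts, command, runner=run_over_ssh, max_workers=32):
    """Run `command` on every host concurrently; return {host: (code, output)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}


def fake_runner(host, command):
    """Stand-in for ssh so that the example runs anywhere."""
    return 0, f"{host}: {command}"


# Hypothetical host names, used only for the demonstration.
print(run_everywhere(["storage-01", "storage-02"], "uptime", runner=fake_runner))
```

With hundreds of nodes, running such a command in parallel takes roughly as long as the slowest node instead of the sum over all nodes.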

Operating session: Reduce operation frequency

• Disk failure

• Node down

• Server process down

• Backend process down (e.g. the auditor process)

[Figure: these failure types compared by their effect on the service.]

Operating session: Stop monitoring low-priority items

[Figure: low-priority monitoring alerts are replaced by a periodic performance check.]

Conclusion of operating session

• Swift consists of many nodes

• Operating a Swift system tends to be costly

• NTT DATA has the know-how to reduce Swift operation costs

– Using tools that parallelize operations

– Customizing monitoring by priority

– Changing monitoring items to periodic checks

Conclusion of this presentation

We introduced the usage, challenges, and operation of OpenStack Swift in the docomo mail service system

• System migration with no service downtime

• Three technical achievements

• Reduced operating cost

Docomo mail has been in service with no downtime.

If you have any questions, please come to the NTT booth.

○Attention: All company names, product names, and service names mentioned are trademarks or registered trademarks of the respective companies.