NTT DOCOMO case study: OpenStack Summit 2016 Barcelona talk "Expanding and Deepening NTT...
TRANSCRIPT
![Page 1: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/1.jpg)
Copyright©2016 NTT DOCOMO, INC. All rights reserved.
Expanding and Deepening
NTT DOCOMO’s private cloud
NTT DOCOMO Inc. Jun Ishii
Kojiro Amano
VirtualTech Japan Hiromichi Ito
![Page 2: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/2.jpg)
About us
Jun Ishii
o Research Engineer, NTT DOCOMO
o Developer, operator and technical consultant for the NTT DOCOMO private cloud
Hiromichi Ito
o CTO, VirtualTech Japan
o One of the first members to propose OpenStack Bare Metal Provisioning (now called "Ironic")
Kojiro Amano
o Research Engineer, NTT DOCOMO
o Security consultant for the NTT DOCOMO private cloud
![Page 3: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/3.jpg)
Expansion strategy of
our private cloud
Scale-up strategy
![Page 4: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/4.jpg)
o One year after launching our private cloud,
it keeps getting larger and larger!

| | Jun. 2015 | Oct. 2016 | Mar. 2017 |
|---|---|---|---|
| Number of DCs | 1 | 2 | 4 |
| Number of HWs | 50 | 300 | 900 |
| Cores | 1500 | 10000 | Over 35000 |
![Page 5: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/5.jpg)
o Why could we expand our cloud so fast?
o Main strategy: Forest and Tree
Make a forest
Fill in trees
![Page 6: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/6.jpg)
o Three important details
Decision to migrate a large-scale in-house system
Obtained the budget for making a forest
Fast deployment methods through normalization
Enabled us to create the forest quickly
Novel challenges for various users
Enrich the forest so that various trees can be planted
• L2GW
• GPU instance
• Reference model
• Security update
![Page 7: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/7.jpg)
![Page 8: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/8.jpg)
o Decision to migrate a large-scale in-house system
The whole system had to be updated because its HW reached EOL.
o An OpenStack-based cloud has many strengths.
Three-year TCO is better than on-premises
22% reduction in TCO (CAPEX & OPEX)
A distributed architecture is compatible with the cloud.
REST interfaces are suitable for maintaining systems.
Migration/replication over long distances is feasible
with L2GW (details are covered later).
![Page 9: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/9.jpg)
![Page 10: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/10.jpg)
o Fast deployment methods through normalization
DRY: Don't repeat yourself
o Normalized procedures for how to
deploy
add more nodes [compute nodes, Swift storage nodes]
deal with hardware trouble
Ansible, KB (see our Tokyo summit presentation)
o It takes only one month to deploy a new DC
From the end of racking and cabling until the first QA test is finished
For over 300 nodes, HW configuration and OpenStack installation were
finished in just 10 days by 5 operators.
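The DRY idea behind normalization can be illustrated with a small sketch: node types are described once as roles, and every per-node entry is generated from them rather than written by hand. This is only an illustration under assumptions; the `Role` fields, the host naming scheme and the `hw_profile` names are hypothetical and are not taken from the talk or from Ansible itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """One normalized node type; every node of a role is configured identically."""
    name: str          # e.g. "compute" or "swift-storage"
    count: int         # how many nodes of this role the DC gets
    hw_profile: str    # hypothetical name of a shared HW settings profile


def expand(dc: str, roles: list[Role]) -> dict[str, dict[str, str]]:
    """Expand the role definitions into one inventory entry per node."""
    hosts: dict[str, dict[str, str]] = {}
    for role in roles:
        for index in range(1, role.count + 1):
            hostname = f"{dc}-{role.name}-{index:03d}"
            hosts[hostname] = {"role": role.name, "hw_profile": role.hw_profile}
    return hosts


# Hypothetical small DC: three compute nodes and two Swift storage nodes.
inventory = expand("dc2", [Role("compute", 3, "virt"), Role("swift-storage", 2, "storage")])
for hostname in sorted(inventory):
    print(hostname, inventory[hostname])
```

Adding a rack then means changing one `count`, not editing dozens of per-host files, which is the property that makes a short deployment window plausible.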
![Page 11: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/11.jpg)
![Page 12: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/12.jpg)
o Novel challenges: satisfying the needs of various users brought many
difficulties and much know-how.
Add functions
L2GW
GPU instance
Reduce the time to construct and manage users' systems
Reference model
Security update
![Page 13: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/13.jpg)
These three reasons are what made our private cloud so FaT !!
![Page 14: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/14.jpg)
How we are deepening our private cloud
Enrich
for users
![Page 15: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/15.jpg)
L2 Gateway for
connecting existing large-scale networks
and inter-cloud networking.
I'm good at
connecting.
![Page 16: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/16.jpg)
○ Overview
Our user has a large-scale existing network and proprietary computer
systems.
– The network system is very capable: it provides Layer 2 connectivity
nationwide.
– The proprietary computer system side is not flexible enough.
REST API
Service mobility
The user decided to migrate to OpenStack at the time of this renewal.
Changes on the network system side must be minimal.
Our user requested two new network services.
– Connect tenant networks between the two data centers
– Connect instances and existing equipment at Layer 2
![Page 17: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/17.jpg)
○ Before
[Diagram: DC 1 and DC 2 connected over a nationwide WAN (RTT 20-40 ms). Each DC contains dedicated equipment and a proprietary computer system.]
![Page 18: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/18.jpg)
○ After
[Diagram: DC 1 and DC 2 connected over a nationwide WAN (RTT 20-40 ms). Each DC contains dedicated equipment, an L2 GW and an RT.]
![Page 19: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/19.jpg)
○ Our user's requirements
High Availability
– Do not share control services between data centers.
– Avoid Single-Point-Of-Failure (SPOF).
Technical limitations
– Do not change the IP addressing and routing architecture.
– Do not use Network Address Translation (NAT).
– Instances and existing equipment must be connected at Layer 2.
Performance
– The total throughput requirement is several dozen Gbps.
– The average packet size is smaller than in general private clouds.
– Several hundred VLANs must be connected.
![Page 20: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/20.jpg)
We chose the "Region" zoning model so that control services are not shared between data centers.
In the "Region" model, all services are separated per region.
![Page 21: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/21.jpg)
Our base OpenStack deployment model already avoids SPOF.
![Page 22: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/22.jpg)
The existing system's IP addressing and routing architecture can
be deployed on the overlay network.
![Page 23: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/23.jpg)
NAT is a must for floating IPs and for connecting to the external network.
However, this system does not require the floating IP address function.
![Page 24: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/24.jpg)
Instances and existing equipment must be connected at Layer 2, and our base OpenStack deployment model uses an L3 ECMP fabric and VXLAN.
So we chose a VXLAN Layer 2 Gateway (L2 Gateway).
![Page 25: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/25.jpg)
A software-based VXLAN L2 Gateway is not suited to a short-packet workload.
So we chose a hardware-based VXLAN L2 Gateway.
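Why short packets matter can be seen with simple arithmetic: at a fixed throughput, the number of packets the gateway must forward per second grows as packets shrink. The sketch below uses assumed figures (40 Gbps, 1500-byte versus 128-byte packets); the talk only says "several dozen Gbps" and "smaller than general private clouds", and Ethernet framing overhead is ignored here.

```python
def packets_per_second(throughput_gbps: float, packet_bytes: int) -> float:
    """Packets per second needed to carry the given throughput at one packet size."""
    bits_per_packet = packet_bytes * 8
    return throughput_gbps * 1e9 / bits_per_packet


# Assumed values for illustration only; they are not figures from the talk.
ASSUMED_THROUGHPUT_GBPS = 40
for packet_bytes in (1500, 128):
    rate = packets_per_second(ASSUMED_THROUGHPUT_GBPS, packet_bytes)
    print(f"{packet_bytes:>5} bytes/packet -> {rate / 1e6:.1f} Mpps")
```

Per-packet work, not per-byte work, tends to limit software forwarding, so an order-of-magnitude jump in packet rate is the kind of load that pushes the choice toward switch hardware.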
![Page 26: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/26.jpg)
○ Equipment selection
Hardware VTEP
– Modern L3 switch chipsets have hardware VTEP gateway functions.
Intel FM6000, Broadcom Trident II, Trident II+, Tomahawk
– We examined L3 network switches based on both the Intel FM6000 and the
Broadcom Trident II.
– Finally, we compared three vendors' L3 switches. (A, D, J)
Comparison result
– Vendor A's L3 switch supports VXLAN within a Multi-Chassis LAG
(MLAG) deployment. The other vendors' switches do not. (as of June 2016)
– All vendors' L3 switches met the performance criteria.
– Every vendor's OVSDB protocol support has some issues.
We chose vendor A's L3 switch because it supports MLAG.
![Page 27: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/27.jpg)
○ Software test and proof-of-concept
Test target
– Neutron networking-l2gw
APIs and implementations to support L2 Gateways in Neutron.
– Networking-l2gw provides the "L2GW Service Plugin" and the "L2GW Agent".
The "L2GW Service Plugin" provides L2GW API services.
The "L2GW Agent" controls the L3 switch via the OVSDB protocol.
Test results
– Several minor bugs (already fixed by the community).
– Missing features that are required for a production environment.
SSL support (already implemented by the community).
Handling of the Mcast_Macs_Remote table (we created a modified patch for
vendor A based on a community patch; not merged yet).
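To make the role of the L2GW API services more concrete, here is a minimal sketch of the two request bodies involved: one that declares a gateway (switch, port, VLANs) and one that connects it to a Neutron network. The field layout follows the networking-l2gw API documentation as best recalled and should be checked against the version in use; the device name, interface name, VLAN IDs and UUID placeholders are all hypothetical.

```python
import json


def l2_gateway_body(name, device_name, interface_name, vlan_ids):
    """Body for creating an L2 gateway that maps a switch port and its VLANs."""
    return {
        "l2_gateway": {
            "name": name,
            "devices": [
                {
                    "device_name": device_name,
                    "interfaces": [
                        {"name": interface_name, "segmentation_id": list(vlan_ids)}
                    ],
                }
            ],
        }
    }


def l2_gateway_connection_body(l2_gateway_id, network_id):
    """Body for connecting an existing L2 gateway to a Neutron network."""
    return {
        "l2_gateway_connection": {
            "l2_gateway_id": l2_gateway_id,
            "network_id": network_id,
        }
    }


# Hypothetical names and placeholder IDs, for illustration only.
gateway = l2_gateway_body("gw-dc1", "hw-vtep-1", "Ethernet1", [100, 101])
connection = l2_gateway_connection_body("<l2-gateway-uuid>", "<network-uuid>")
print(json.dumps(gateway, indent=2))
print(json.dumps(connection, indent=2))
```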
![Page 28: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/28.jpg)
○ [Diagram: software architecture of the L2GW deployment. Controller Node: Neutron Server (API, ML2 with L2POP, L3, L2GW Service Plugin), Nova, Keystone, Glance, Cinder, Horizon. Compute Node: ML2 OVS Agent (ML2 L2POP), Open vSwitch VTEP. Network Node: L2GW Agent, ML2 OVS Agent (ML2 L2POP), L3 Agent, virtual router, Open vSwitch VTEP. Vendor A's management virtual appliance: OVSDB server, virtual switch. Vendor A's hardware VTEPs: OVSDB server, VLANs, WAN. Links are labeled "OVSDB protocol" and "Control".]
![Page 29: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/29.jpg)
○ Results of pilot tests at the same scale as the production environment
Hardware L2GW side
– OVSDB server crash issues
When a large number of records were inserted at one time, the OVSDB server crashed. (This issue has already been fixed by the vendor.)
Networking-L2GW side
– We encountered several critical bugs.
They are hard to reproduce.
When we hit these bugs, the L2GW agent stopped.
– The L2GW agent recovers very poorly from a crashed state.
The L2 gateway agent continuously syncs state between the Neutron database and OVSDB.
Unfortunately, when the L2GW agent crashed or stalled, these two databases sometimes got out of sync.
So we had to re-register L2GW connections manually whenever we hit these bugs.
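The out-of-sync condition described above amounts to a difference between two sets of bindings. A minimal sketch of detecting that drift follows, assuming each binding can be represented as a (logical switch, physical port, VLAN) tuple; that representation and the sample values are assumptions for illustration, not the actual networking-l2gw schema.

```python
# A binding is modeled as (logical switch, physical port, VLAN ID); assumed shape.
Binding = tuple[str, str, int]


def diff_bindings(neutron: set[Binding], ovsdb: set[Binding]) -> dict[str, list[Binding]]:
    """Report bindings present on only one side of the Neutron/OVSDB pair."""
    return {
        "missing_in_ovsdb": sorted(neutron - ovsdb),  # need to be re-registered
        "stale_in_ovsdb": sorted(ovsdb - neutron),    # candidates for cleanup
    }


# Hypothetical state after an agent crash: one binding never reached OVSDB,
# and one old binding was never removed from it.
neutron_state = {("ls-a", "Ethernet1", 100), ("ls-b", "Ethernet1", 101)}
ovsdb_state = {("ls-a", "Ethernet1", 100), ("ls-old", "Ethernet2", 200)}
drift = diff_bindings(neutron_state, ovsdb_state)
print(drift)
```

A report like this at least turns manual re-registration into a targeted list of entries instead of a full rebuild.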
![Page 30: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/30.jpg)
o Network trouble occurred every single week.
The L2GW agent is unstable.
The L2GW agent stops working correctly after a few days when users run a long
test.
• The test includes continuous instance creation and deletion.
• The test includes continuous CRUD testing of Neutron virtual network
ports.
Instances could not communicate with instances in the other region or with existing
equipment.
![Page 31: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/31.jpg)
o We decided not to use networking-l2gw in the production
environment for the time being.
We could not reproduce the connection troubles between OVSDB and the
L2GW agent that occurred weekly.
We could not fix all the critical bugs that we encountered.
In our environment, a port status issue occurred.
That issue causes L2GW agent problems.
We would rather not use l2population.
l2population does not have enough scalability yet.
We had to keep the delivery date. A delivery delay would mean the death of our project.
![Page 32: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/32.jpg)
o Our solution
We created a manual procedure to manage the L2GW.
Manage OVSDB by CLI
• "vtep-ctl" command
Manage the OVS flow table by CLI
• "ovs-ofctl" command
We created an automation system based on the manual procedure.
The upper-level management system calls this automation system.
This system is working correctly now.
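As a rough illustration of automating the manual procedure, the sketch below only builds `vtep-ctl` command lines for one VLAN-to-VXLAN binding and prints them; it does not run anything. The subcommands used (`add-ls`, `set Logical_Switch ... tunnel_key=`, `bind-ls`, `add-mcast-remote` with `unknown-dst`) are standard `vtep-ctl` ones, but the sequence is an assumption about what such a procedure might contain, not the authors' actual procedure, and every name, address and ID is hypothetical. The `ovs-ofctl` side is omitted.

```python
import shlex


def vtep_commands(db, physical_switch, port, vlan, logical_switch, vni, remote_vtep_ip):
    """Build vtep-ctl argument lists that bind one port/VLAN to a VXLAN logical switch."""
    base = ["vtep-ctl", f"--db={db}"]
    return [
        # Create the logical switch and give it a VXLAN network identifier.
        base + ["add-ls", logical_switch],
        base + ["set", "Logical_Switch", logical_switch, f"tunnel_key={vni}"],
        # Map the physical port and VLAN onto the logical switch.
        base + ["bind-ls", physical_switch, port, str(vlan), logical_switch],
        # Send unknown-destination traffic to the remote VTEP.
        base + ["add-mcast-remote", logical_switch, "unknown-dst", remote_vtep_ip],
    ]


# Hypothetical values; 192.0.2.0/24 is a documentation address range.
commands = vtep_commands(
    "tcp:192.0.2.10:6640", "sw1", "Ethernet1", 100, "ls-tenant-a", 5000, "192.0.2.20"
)
for argv in commands:
    print(shlex.join(argv))
```

Keeping the commands as argument lists makes it easy for an upper-level system to log them, dry-run them, or hand them to a runner later.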
![Page 33: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/33.jpg)
o Results
We provide a stable L2GW that connects instances in the two regions
with existing equipment.
We passed all test criteria provided by the client.
We gained excellent service flexibility through OpenStack.
The existing network configuration was kept, as our user requested.
o Next challenges of our L2GW project
Fix known issues of networking-l2gw.
We would like to provide an OpenStack API for managing the L2GW.
Fix the scalability issue of l2population.
Investigate EVPN for expanding L2GW services.
![Page 34: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/34.jpg)
GPU instance
Machine learning
on OpenStack
![Page 35: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/35.jpg)
Nvidia Tesla M40
for CUDA/cuDNN
![Page 36: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/36.jpg)
o Deploy
To deploy GPU nodes, not only PCI pass-through but also
IOMMU must be enabled.
Enable PCI pass-through
Whitelist, alias, flavor
See the OpenStack wiki*
Enable IOMMU
Add the following setting to grub.conf:
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT intel_iommu=on"
Check that IOMMU is enabled:
dmesg | grep -e DMAR -e IOMMU
* https://wiki.openstack.org/wiki/Pci_passthrough
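The "whitelist, alias, flavor" triple can be sketched as follows. The option names `pci_passthrough_whitelist` and `pci_alias` and the flavor key `pci_passthrough:alias` are the ones used by Nova releases of that period as described on the wiki page; newer releases may name them differently. The vendor ID `10de` is NVIDIA's PCI vendor ID, while the product ID and alias name passed in below are assumptions that should be confirmed on the actual host (for example with `lspci -nn`).

```python
import json

NVIDIA_VENDOR_ID = "10de"  # NVIDIA's PCI vendor ID


def pci_settings(product_id: str, alias_name: str, gpus_per_instance: int = 1) -> dict[str, str]:
    """Return the whitelist, alias and flavor settings for one GPU model."""
    device = {"vendor_id": NVIDIA_VENDOR_ID, "product_id": product_id}
    return {
        # Compute node: which PCI devices Nova may hand to instances.
        "pci_passthrough_whitelist": json.dumps(device),
        # Controller: a name that flavors can use to request the device.
        "pci_alias": json.dumps({**device, "name": alias_name}),
        # Flavor extra spec: "<alias>:<count>".
        "pci_passthrough:alias": f"{alias_name}:{gpus_per_instance}",
    }


# "17fd" and "tesla-m40" are assumed example values, not taken from the talk.
settings = pci_settings("17fd", "tesla-m40")
for key, value in settings.items():
    print(f"{key} = {value}")
```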
![Page 37: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/37.jpg)
o Operate: take care with the flavor memory size
In our verification, IOMMU allocates all of an instance's memory
when the instance is launched.
If you set a large flavor memory size and launch the maximum number
of instances, the OOM killer might kill the qemu process.
Swapping doesn't help much because IOMMU allocates memory too fast.
[Diagram: memory space of a normal compute node and of an IOMMU-enabled compute node, each showing Host OS, Instance A and Instance B; label: "Allocated after ballooning".]
![Page 38: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/38.jpg)
o The workarounds for this problem all leave enough memory margin for the
host OS.
Reduce the flavor memory size
Sometimes too uncomfortable for GPU users
Set reserved_host_memory_mb in nova.conf to a large value
Also affects other flavors
Decrease the maximum number of instances per host
→ Any other solutions? Help us!
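The trade-off between these workarounds can be checked with a small calculation. The sketch assumes that a pass-through instance pins its full flavor memory at launch (so no overcommit), and the host size, flavor size and reserved values are hypothetical numbers chosen for illustration.

```python
def max_instances(host_memory_mb: int, reserved_host_memory_mb: int, flavor_memory_mb: int) -> int:
    """Instances that fit when each one pins its full flavor memory at launch."""
    usable_mb = host_memory_mb - reserved_host_memory_mb
    return max(usable_mb // flavor_memory_mb, 0)


# Hypothetical host with 256 GiB of RAM and a 60 GiB GPU flavor.
HOST_MB = 262144
FLAVOR_MB = 61440
for reserved_mb in (512, 32768):
    count = max_instances(HOST_MB, reserved_mb, FLAVOR_MB)
    left_for_host_mb = HOST_MB - count * FLAVOR_MB
    print(f"reserved={reserved_mb} MB -> {count} instances, {left_for_host_mb} MB left for the host OS")
```

With the small reserve the host keeps only 16384 MB once four instances are pinned; raising the reserve drops the count to three and leaves 77824 MB, which is the margin the workarounds aim for.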
![Page 39: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/39.jpg)
o How should we offer GPU flavors to in-house users?
o GPU with OpenStack, pros/cons

| Aspect | Pros | Cons |
|---|---|---|
| Virtualization | More stable than containers | Some GPU card trouble requires a host reboot |
| Immutability | Fast deployment, fast PDCA cycle | Difficult to share GPU resources fairly |
| Preparation before running machine learning | Can provide an image file with the device driver and CUDA pre-installed | Need to keep up with new versions and new combinations of driver, guest OS, CUDA version, library version… |

Cooperation with GPU instance users is important for private cloud providers
![Page 40: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/40.jpg)
Reference Model
![Page 41: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/41.jpg)
Aim: migrate some in-house applications to our cloud
Problem
Strict security policies: over 100 guidelines for system architecture
When users rebuild an application on our cloud, a lot of effort is required to meet the policies.
It would be nice if predefined models were provided :)
[Figure: security policy categories grouped under Server, Network, Storage and Operation, including monitoring, security vulnerability, certificate, remote access, identification and authentication, IDS/IPS, log, encryption, firewall, redundancy, backup, configuration management, …]
![Page 42: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/42.jpg)
“Reference Model” on our cloud
System architecture based on many of the security policies
Sets of OSS stacks that have been heavily tested in our project
Mechanism
Input a template file into Heat
Heat templates
• Web basic model
• Web three-tier model
[Diagram: Heat builds each model on our cloud from a template and images (Jump, WEB/LB, DB Server, …). Components shown include virtual router, virtual network, jump server, proxy server, LB/Web server, AP server and DB server.]
![Page 43: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/43.jpg)
System architecture of Reference Model
[Diagram: end users reach a redundant pair of WEB/LB servers (SSL termination, LB VIP) from the Internet over HTTPS. Behind them are redundant AP servers, redundant DB servers (DB VIP), storage and backup. Operators connect over VPN (SSL-VPN). Proxy/NTP and Monitoring servers are also shown.]
Networks
Public-NW 192.168.10.0/24
Private-NW 192.168.20.0/24
Management-NW 192.168.30.0/24
LB-HA-NW, DB-HA-NW, DB-repl-NW
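The three addressed networks in this architecture could be declared in a Heat template along the following lines. This is a minimal sketch, not the authors' template: it generates only the network and subnet resources, the resource names are hypothetical, and the result is printed as JSON (which is also valid YAML) rather than submitted to Heat.

```python
import json

# CIDRs as shown in the architecture; the short keys are hypothetical names.
NETWORKS = {
    "public": "192.168.10.0/24",
    "private": "192.168.20.0/24",
    "management": "192.168.30.0/24",
}


def build_template(networks: dict[str, str]) -> dict:
    """Build a HOT-style template with one network and one subnet per entry."""
    resources = {}
    for name, cidr in networks.items():
        resources[f"{name}_net"] = {
            "type": "OS::Neutron::Net",
            "properties": {"name": f"{name}-nw"},
        }
        resources[f"{name}_subnet"] = {
            "type": "OS::Neutron::Subnet",
            "properties": {"network": {"get_resource": f"{name}_net"}, "cidr": cidr},
        }
    return {
        "heat_template_version": "2015-10-15",
        "description": "Sketch of the reference model networks (illustrative only)",
        "resources": resources,
    }


template = build_template(NETWORKS)
print(json.dumps(template, indent=2))
```

Servers, routers and security settings would be further resources in the same template, which is what lets a user instantiate a whole model from one input file.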
![Page 44: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/44.jpg)
WEB (Apache) / LB (LVS w/ Ultramonkey) server
HTTPS, with dummy certificates installed by default
WAF for IDS/IPS
The key point is not only to install the software, but also to complete the
default security settings.
Why didn't we use LBaaS_v1?
LBaaS_v1 (Juno) doesn't satisfy our users' use cases.
Required to
• set a security group on the LB (LBaaS_v2: not yet)
• terminate SSL at the LB (LBaaS_v2: done)
• provide a sorry page (LBaaS_v2: not yet)
![Page 45: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/45.jpg)
VPN (OpenVPN) server
SSL-VPN for secure remote access
Tools for operating the SSL-VPN, such as creating and revoking certificates
Why didn't we use VPNaaS_v1?
The authentication algorithm it accepted for IKE phase 1 was sha1,
whose safety can no longer be assured.
VPNaaS in the recent “Newton” release accepts sha256.
![Page 46: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/46.jpg)
The areas covered by the “Reference Model”
60% of the security policies
Future
Update this model by adding the parts of the security policies that are still missing
Aim to cover 100%
[Figure: the same grid of security policy categories under Server, Network, Storage and Operation.]
![Page 47: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/47.jpg)
Security Update
![Page 48: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/48.jpg)
Current daily operation for vulnerabilities
Most of the operation is manual.
Check vulnerabilities
Risk assessment of vulnerabilities
Management of TODO list
Update our cloud
This causes
Human error
• Forgetting to check vulnerabilities
Time: 1 hour/day
Security operations become more important as our cloud expands
It would be nice if these operations could be automated :)
![Page 49: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/49.jpg)
Current operation and proposal
We proposed how to automate each step of the process.
We are testing whether it lets us reduce human error.
Process: Check vulnerabilities → Risk assessment → Management of TODO list → Update our cloud
Current operation: by checking vendor sites, by Excel
Proposed operation
Check vulnerabilities: semi-automatic, by a script that checks package versions related to vulnerabilities
Update our cloud: semi-automatic, by making an Ansible playbook
![Page 50: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/50.jpg)
Check vulnerabilities
CVE & CVSS
CVE: an ID attached to each vulnerability
CVSS: a score given to each vulnerability
The “CVE-search” API is used for the check
GitHub: https://github.com/cve-search/cve-search
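The version-matching step of such a check can be sketched without any network access: compare each installed package version with the version in which an advisory was fixed. Everything here is illustrative; the advisory records, the placeholder CVE IDs and the package versions are made up, real distribution version strings need a proper comparison routine, and the CVE-search API itself is not called.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '2.4.18' (simplified)."""
    return tuple(int(part) for part in version.split("."))


def find_vulnerable(installed: dict[str, str], advisories: list[dict]) -> list[tuple[str, str, float]]:
    """Return (CVE ID, package, CVSS) for advisories that affect an installed version."""
    hits = []
    for advisory in advisories:
        version = installed.get(advisory["package"])
        if version is not None and parse_version(version) < parse_version(advisory["fixed_in"]):
            hits.append((advisory["cve"], advisory["package"], advisory["cvss"]))
    return hits


# Made-up inventory and advisories; the CVE IDs are placeholders, not real entries.
installed_packages = {"examplelib": "1.2.3", "otherpkg": "4.0.0"}
advisories = [
    {"cve": "CVE-0000-0001", "package": "examplelib", "fixed_in": "1.2.10", "cvss": 7.5},
    {"cve": "CVE-0000-0002", "package": "otherpkg", "fixed_in": "3.9.0", "cvss": 9.8},
    {"cve": "CVE-0000-0003", "package": "notinstalled", "fixed_in": "1.0.0", "cvss": 5.0},
]
hits = find_vulnerable(installed_packages, advisories)
print(hits)
```

Note that `1.2.3` compares lower than `1.2.10` because versions are compared as integer tuples, not as strings.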
![Page 51: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/51.jpg)
Risk assessment
Key point
The CVSS risk assessment does not always match our environment.
Important
The usage and version of the package (→ script)
Whether or not the host is on the internal NW
Vulnerabilities that let a guest OS invade the host OS
The CVSS score needs to be re-evaluated for each host according to its
environment
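One way to picture per-host re-evaluation is a function that takes the base CVSS score and the three factors listed above. The adjustment rules and weights below are purely hypothetical and are not the authors' criteria or part of the CVSS standard; they only show how the same CVE can end up with different priorities on different hosts.

```python
def adjusted_score(base_cvss: float, package_in_use: bool, internal_only: bool,
                   guest_to_host_escape: bool) -> float:
    """Re-evaluate a CVSS base score for one host (hypothetical rules)."""
    if not package_in_use:
        return 0.0  # the vulnerable package is not used on this host
    score = base_cvss
    if internal_only:
        score -= 2.0  # assumed discount for hosts reachable only from the internal NW
    if guest_to_host_escape:
        score = max(score, 9.0)  # assumed floor: guests can attack the host directly
    return round(min(max(score, 0.0), 10.0), 1)


# The same base score of 6.5 on three hypothetical hosts.
print(adjusted_score(6.5, package_in_use=True, internal_only=True, guest_to_host_escape=False))
print(adjusted_score(6.5, package_in_use=True, internal_only=True, guest_to_host_escape=True))
print(adjusted_score(6.5, package_in_use=False, internal_only=False, guest_to_host_escape=False))
```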
![Page 52: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/52.jpg)
Management of TODO list
Do not forget high-risk vulnerabilities until the patch for the
vulnerability is applied.
Important
Even if the CVSS score is low, it sometimes becomes a high score
in our environment.
The same vulnerabilities need to be checked continuously
![Page 53: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/53.jpg)
Update our cloud
Semi-automatic procedure
Manual intervention is required only at check points
Consider the influence on users' instances
Future
Apply our proposed method in the test environment first
Extend our tools for users' self-check
Steps: live migration of users' instances → security update → return of users' instances
Check point ①: the normality of the user's APP
Check point ②: the normality of OpenStack functions
Check point ③: the normality of the user's APP
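The semi-automatic procedure can be pictured as a list of stages in which automated steps and check points are run in order and the run halts at the first check that fails. The sketch below is an assumption about how such a runner might look; the interleaving of the three check points with the steps is assumed, and the stage functions are stand-ins that would be replaced by real playbook runs and operator confirmations.

```python
from typing import Callable

# A stage is a label plus a callable that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_procedure(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first one that does not succeed."""
    log = []
    for name, action in stages:
        succeeded = action()
        log.append(f"{name}: {'ok' if succeeded else 'stopped'}")
        if not succeeded:
            break
    return log


def always_ok() -> bool:
    """Stand-in for an automated step or a check the operator approves."""
    return True


def failing_check() -> bool:
    """Stand-in for a check point at which the operator finds a problem."""
    return False


# Assumed ordering; check point 2 fails in this example run.
stages = [
    ("check point 1: user's APP is normal", always_ok),
    ("live-migrate users' instances", always_ok),
    ("apply security update", always_ok),
    ("check point 2: OpenStack functions are normal", failing_check),
    ("return users' instances", always_ok),
    ("check point 3: user's APP is normal", always_ok),
]
log = run_procedure(stages)
print("\n".join(log))
```

Stopping at a failed check point keeps users' instances on the migration target until an operator decides how to proceed.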
![Page 54: NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT DOCOMOs Private Cloud」 - OpenStack最新情報セミナー(2016年12月)](https://reader033.vdocuments.net/reader033/viewer/2022050614/5876d4e91a28ab1d238b5521/html5/thumbnails/54.jpg)
Thank you for listening!