iThome Cloud Summit: The Next Generation of Data Center: Machine Intelligent Cluster
TRANSCRIPT
Machine Intelligent Cluster: The next generation of data center
Evan Lin @ Linker Networks
About me
Cloud Architect @ Linker Networks
Golang User Group - Co-Organizer
Top 5 Taiwan Golang open source contributor (githubaward)
Developer, Curator, Blogger
Recap Cloud Summit 2016
Agenda
• Problems on data center
• How machine learning helps
• Machine Intelligent Cluster
• Applications
• Q&A
Data center
Efficiency
• Power consumption
• Low usage
• Unpredictable peak
• Noisy neighbors
Risk
• Physical damage
• Networking problem
• Anomaly
• Attack
Real data center
Power consumption
Low usage and Unpredictable peak
Noisy neighbor
Use machine learning to improve DC power consumption
None of your business?
Modern Data center: Machine Cluster
Before machine cluster
DB Master:IP: 192.168.1.222
DB Slave:IP: 192.168.1.223
Web Server 1:IP: 192.168.1.101
Web Server 2:IP: 192.168.1.102
Web Server 3:IP: 192.168.1.103
Load Balancer:IP: 1.2.3.4
Container orchestration
Resource arrangement
Scalability
Portability
Automation migration
Resource management
3 Web App Servers
2 DB Servers
1 Load Balancer
Scalability
Automation migration
But... we need something better...
No prediction
How to define scale out threshold?
50 %?
75 %?
25 %?
Machine Intelligent Cluster
Efficiency
Maximize Utilization
Operation Optimization
Accident
Risk Mitigation
Serviceability Management
Machine Intelligence
Cluster
How MIC helps
Operation Optimization
1. Reinforcement learning
2. Adjust thermostat
3. Check the reward (CPU performance).
[1]: Refer from https://goo.gl/ly3zyX
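The loop above (learn, adjust the thermostat, check the reward) can be sketched as a small trial-and-error routine. This is a minimal illustration using an epsilon-greedy bandit, one of the simplest reinforcement learning strategies, and not necessarily the method used in the referenced work. The reward function, the candidate setpoints, and every name in the code are assumptions made for illustration.

```python
import random


def simulated_reward(setpoint_c):
    """Hypothetical stand-in for measured CPU performance at a thermostat setpoint.

    Assumption: performance peaks at 22 C and falls off on either side.
    """
    return 100.0 - 2.0 * (setpoint_c - 22.0) ** 2


def tune_thermostat(setpoints, reward_fn, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy search for the setpoint with the best average reward."""
    rng = random.Random(seed)
    totals = {s: 0.0 for s in setpoints}
    counts = {s: 0 for s in setpoints}

    def mean_reward(s):
        return totals[s] / counts[s] if counts[s] else float("-inf")

    for _ in range(steps):
        untried = [s for s in setpoints if counts[s] == 0]
        if untried:
            choice = untried[0]              # try every setpoint once first
        elif rng.random() < epsilon:
            choice = rng.choice(setpoints)   # explore a random setpoint
        else:
            choice = max(setpoints, key=mean_reward)  # exploit the best so far

        # "Adjust thermostat", then "check the reward" (with measurement noise).
        reward = reward_fn(choice) + rng.gauss(0.0, 1.0)
        totals[choice] += reward
        counts[choice] += 1

    return max(setpoints, key=mean_reward)


best_setpoint = tune_thermostat([18, 20, 22, 24, 26], simulated_reward)
```

In a real data center the reward would come from sensors rather than a formula, and a fuller reinforcement learning setup would also take the current state (load, outside temperature) into account.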
Maximize Utilization
Analyze utilization and reduce the number of working machines to save our customers' budget
- Predict utilization trend
- Provide auto-scaling threshold adjustment
Prediction and dynamic threshold
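One way to picture prediction combined with a dynamic threshold: fit a trend to recent utilization, then lower the scale-out threshold when utilization is expected to rise. The sketch below is an assumption-laden illustration, not the MIC implementation. It uses a plain least-squares line as the predictor, and the function names and the `base`, `floor`, and `sensitivity` parameters are hypothetical.

```python
def forecast_next(utilization, window=5):
    """Extrapolate one step ahead with a least-squares line over recent samples.

    `utilization` is a non-empty list of percentages, oldest first.
    """
    recent = utilization[-window:]
    n = len(recent)
    if n < 2:
        return recent[-1]
    mean_x = (n - 1) / 2
    mean_y = sum(recent) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(recent)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    return mean_y + slope * (n - mean_x)


def dynamic_threshold(utilization, base=75.0, floor=25.0, sensitivity=2.0, window=5):
    """Lower the scale-out threshold in proportion to the predicted rise."""
    predicted = forecast_next(utilization, window)
    expected_rise = max(0.0, predicted - utilization[-1])
    return max(floor, base - sensitivity * expected_rise)


def should_scale_out(utilization, **options):
    """Scale out when current utilization reaches the adjusted threshold."""
    return utilization[-1] >= dynamic_threshold(utilization, **options)


flat = [40, 40, 40, 40, 40]
rising = [30, 40, 50, 60, 70]
print(dynamic_threshold(flat), should_scale_out(flat))      # 75.0 False
print(dynamic_threshold(rising), should_scale_out(rising))  # 55.0 True
```

With a fixed 75% threshold the rising series would not trigger scaling at 70%. The adjusted threshold reacts earlier because the trend predicts 80% at the next sample.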
Optimized Scheduler
[Figure: the same eight workloads placed on Node 1, Node 2, and Node 3 in two different layouts]
First layout, in listed order: Nginx (CPU 30%), DB - MySQL (IO 25%), DB - Mongo (IO 30%), Apache (CPU 30%), Backend Process (CPU 35%), DB - Oracle (IO 35%), NodeJS (CPU 7%), Go backend (CPU 8%)
Second layout, in listed order: Nginx (CPU 30%), DB - MySQL (IO 25%), NodeJS (CPU 7%), Go backend (CPU 8%), Apache (CPU 30%), Backend Process (CPU 35%), DB - Mongo (IO 30%), DB - Oracle (IO 35%)
Maximize Utilization
P.S. We do not rearrange processes; we change the scheduler to prevent this from happening.
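A scheduler that avoids bad placements up front can be sketched as a greedy rule: put each new workload on the node that is currently least loaded on that workload's dominant resource. The code below is a minimal sketch under that assumption, not the MIC scheduler. The workload names and percentages come from the slide, while the `Node` class, the `place` function, and the `"cpu"`/`"io"` keys are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: int = 0   # percent of CPU already committed
    io: int = 0    # percent of IO already committed
    workloads: list = field(default_factory=list)


def place(nodes, workload, resource, percent):
    """Place a workload on the node with the lowest load for `resource`.

    `resource` is "cpu" or "io"; ties go to the earliest node in the list.
    """
    target = min(nodes, key=lambda node: getattr(node, resource))
    setattr(target, resource, getattr(target, resource) + percent)
    target.workloads.append(workload)
    return target


nodes = [Node("Node 1"), Node("Node 2"), Node("Node 3")]
incoming = [
    ("Nginx", "cpu", 30),
    ("DB - MySQL", "io", 25),
    ("DB - Mongo", "io", 30),
    ("Apache", "cpu", 30),
    ("Backend Process", "cpu", 35),
    ("DB - Oracle", "io", 35),
    ("NodeJS", "cpu", 7),
    ("Go backend", "cpu", 8),
]
for workload, resource, percent in incoming:
    place(nodes, workload, resource, percent)

for node in nodes:
    print(node.name, f"CPU {node.cpu}%", f"IO {node.io}%", node.workloads)
```

Because the decision is made at placement time, CPU-heavy and IO-heavy workloads end up mixed on each node without migrating anything later. A production scheduler would also weigh memory, affinity, and predicted rather than current load.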
Model 1
Serial Number Prediction
S.M.A.R.T. RNN Prediction
Serviceability Management (cont.)
Model 2
Dummy VM Detection Outlier Attack Detection
Mitigate risk
Storage SDN
Zombie Tagging system
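As a rough illustration of outlier detection for risk mitigation, the sketch below flags machines whose metric deviates strongly from the rest, using the median absolute deviation (a common robust statistic). The slides do not say which model MIC uses, so the technique, the function name, the sample VM names, and the traffic numbers are all assumptions.

```python
from statistics import median


def find_outliers(samples, threshold=3.5):
    """Return names whose value is an outlier by the modified z-score.

    `samples` maps a machine name to one numeric metric (for example
    requests per second). Scores above `threshold` count as outliers.
    """
    values = list(samples.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    return [
        name
        for name, value in samples.items()
        if 0.6745 * abs(value - center) / mad > threshold
    ]


# Hypothetical per-VM request rates; one VM behaves very differently.
traffic = {"vm-a": 120, "vm-b": 130, "vm-c": 125, "vm-d": 118, "vm-e": 2400}
suspects = find_outliers(traffic)
```

The same check run on very low values could hint at idle or dummy VMs, although a real detector would look at several metrics over time.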
Architecture
Cloud Native Architecture
HPC (with GPU) Server
Storage SDN
Storage SDN
Data Collect Probe & Sensor & Smart GW
Visualization
Data Process
Data Analysis & Machine Learning
DCOS / Kubernetes, Spark ML, TensorFlow
DCOS / Kubernetes
Cassandra (Storage)
Kafka (Queueing)
Go/Akka (Connector)
Spark (ETL/Streaming)
D3.js
Scikit-learn, R
Interactive Dashboard
Jupyter Notebook
Zeppelin
ML Job Scheduler Chronos
MIC System Architecture
Data Agent
Kafka
Spark Streaming
Cassandra
Spark ML (Classification, Clustering)
TensorFlow (Deep Learning)
Backend Server
API
Portal
TensorFlow Predict
SparkML Predict
MIC Data Flow
Applications on MIC
Machine Intelligent Cluster
IoT, Gaming, 5G, NFV, E-Commerce
Machine Intelligent Cluster Summary
• Machine cluster with intelligence
• Features
- Self-Optimization
- Self-Learning
- Self-Recovery
• Green, secure, and predictive machine cluster
Subscribe to CodeTengu (碼天狗)
http://weekly.codetengu.com/
Thank You