~ilities testing
~ilities Test Automation
Xi Chen
Aldo Suwandi
Delivery and Quality Solution Group
Ecosystem Service Department
Background Story
Rakuten EcoSystem
Global Start Up and Expansion
Enterprise in Japan
Reliability
Recoverability
Scalability
Operability
Current Eco-system review
Planned Scale Out / In
Monolithic Architecture
No Standard OPS Automation
Requirements for modern platform: ZED
Microservice Architecture
High Reliability / Recoverability
Easy Scaling / Operation
Standardization
https://jenkins.io/
Ecosystem Service Operation
[Diagram labels: User, SRE, Service A, Service B, Service C; qualities shown: Reliability, Operability, Recoverability / Scalability.]
~ility Test for modern platform
• Reliability Test
• Operability Test
• Scalability Test
• Recoverability Test
~ility Test Problems
Definition
Reliability: the capability of the system to maintain its service provision under defined conditions for defined periods of time.
Operability: ability of the software to be easily operated by a given user in a given environment.
(ISO 9126 Software Quality Characteristics)
Reliability
[Diagram labels: User (three shown), requests, Pod - A, Pod - B, Pod - C, service / application.]
Monitoring Operability
[Diagram labels: SRE, app, pod (1..X), fluentd, datadog agent, elastic-search, kibana, kubernetes; data flows: application utilization, application log, kubernetes event, new relic event; actions: alert, operate.]
User Story
1. As an SRE, I want to be notified by the monitoring / alert system within 5 minutes of an incident.
2. As an SRE, when I scale out the application, no error alert should be triggered by the monitoring system.
3. As QA, I want to verify that a certain percentage of requests succeed when there is an incident.
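The three user stories can be read as pass/fail checks. The following is a minimal sketch of that reading, not the framework shown in the talk. The 5-minute limit comes from story 1; the 90% success threshold and all function names are assumptions made for illustration, since the slides give no concrete percentage.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

# The 5-minute limit is from user story 1.
ALERT_DEADLINE = timedelta(minutes=5)
# Assumed placeholder: the slides only say "certain percentage".
MIN_SUCCESS_RATIO = 0.90


def notified_in_time(incident_at: datetime, alert_at: Optional[datetime]) -> bool:
    """Story 1: an alert must arrive within ALERT_DEADLINE of the incident."""
    if alert_at is None:
        return False  # no alert was ever received
    return timedelta(0) <= alert_at - incident_at <= ALERT_DEADLINE


def no_error_alerts(alerts_during_scale_out: Sequence[str]) -> bool:
    """Story 2: scaling out must not trigger any error alert."""
    return len(alerts_during_scale_out) == 0


def enough_requests_succeeded(total: int, failed: int) -> bool:
    """Story 3: the share of successful requests must meet the threshold."""
    if total <= 0:
        return False  # nothing was sent, so nothing can be verified
    return (total - failed) / total >= MIN_SUCCESS_RATIO
```

Keeping each story as a separate predicate means a test report can state which story failed rather than giving a single overall verdict.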
Current Problem & Situation
It requires at least 10 days to complete operability and
reliability testing
• Manual execution of manifest configuration settings
• Manual checking of alert system / configuration
• Environment preparation
Solution
Main Features
1. Operability Test
2. Reliability Test + Performance Test
3. Reliability Test + Functional Test
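For the operability test, the manual step of checking whether an alert fired could be replaced by polling with a deadline. The sketch below assumes a hypothetical `fetch_alerts` callable that returns the names of currently firing alerts; the slides do not describe how alerts are actually read from the monitoring system, so that part is left to the caller. The 300-second default mirrors the 5-minute user story, and the 10-second poll interval is an arbitrary choice.

```python
import time
from typing import Callable, Optional, Sequence


def wait_for_alert(
    fetch_alerts: Callable[[], Sequence[str]],  # hypothetical alert source
    expected: str,
    timeout_s: float = 300.0,       # 5 minutes, as in the user story
    poll_interval_s: float = 10.0,  # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until `expected` is firing.

    Returns the elapsed seconds when the alert is first seen,
    or None if the timeout passes without seeing it.
    """
    start = clock()
    while True:
        if expected in fetch_alerts():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

The `clock` and `sleep` parameters are injectable so the check itself can be unit tested without waiting in real time.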
Demo (1)
Demo (2)
Reliability + Performance Test
[Diagram labels: QA, Reliability Test Framework, Jenkins (https://jenkins.io/); flow: execute, trigger, test triggered, system failure, result.]
[Chart: Number of Requests per Second, 0:00:00 to 0:01:30; series: All Requests, Failed Requests, Successful Requests. Failed Requests data labels as extracted: 0, 0, 0, 0, 10, 8, 3, 2, 4.]
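The chart on this slide plots all, failed, and successful requests over time while a system failure is triggered. One way such series could be derived from raw load-test samples is sketched below. The sample format `(elapsed_seconds, succeeded)`, the 10-second bucket size, and both function names are assumptions for illustration; the slides do not show how the framework collects its results. The rows hold counts per bucket, not per-second rates.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[int, int, int, int]  # (bucket_start_s, all, failed, successful)


def requests_per_bucket(
    samples: Iterable[Tuple[float, bool]], bucket_s: int = 10
) -> List[Row]:
    """Group (elapsed_seconds, succeeded) samples into fixed time buckets."""
    total: Counter = Counter()
    failed: Counter = Counter()
    for elapsed_s, ok in samples:
        bucket = int(elapsed_s // bucket_s) * bucket_s
        total[bucket] += 1
        if not ok:
            failed[bucket] += 1
    # A Counter returns 0 for buckets that had no failures.
    return [(b, total[b], failed[b], total[b] - failed[b]) for b in sorted(total)]


def lowest_success_ratio(rows: List[Row]) -> float:
    """Worst bucket-level success ratio; raises ValueError if rows is empty."""
    return min(successful / all_requests for _, all_requests, _, successful in rows)
```

The lowest bucket-level ratio is the number a reliability criterion such as "a certain percentage of requests succeed during an incident" would be compared against.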
Demo
Reliability + Functional Test
[Diagram labels: QA, Functional Test Framework, API, Reliability Test Framework, dependency, trigger system failure, functional test case.]
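The combination on this slide, a functional test case run while a system failure is triggered, could be structured as a context manager that injects the failure on entry and always recovers on exit. This is a sketch under stated assumptions: `inject` and `recover` are hypothetical callables supplied by the caller, and nothing here reflects the actual interface between the two frameworks in the talk.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def system_failure(
    inject: Callable[[], None],   # hypothetical: starts the failure
    recover: Callable[[], None],  # hypothetical: restores the system
) -> Iterator[None]:
    """Inject a failure on entry and always recover on exit."""
    inject()
    try:
        yield
    finally:
        recover()  # runs even if the test case raises


def run_under_failure(
    test_case: Callable[[], None],
    inject: Callable[[], None],
    recover: Callable[[], None],
) -> bool:
    """Run one functional test case while the failure is active.

    Returns True if the test case passes, False if it raises AssertionError.
    """
    with system_failure(inject, recover):
        try:
            test_case()
            return True
        except AssertionError:
            return False
```

Putting recovery in a `finally` clause keeps a failing functional test from leaving the environment in a broken state for the next scenario.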
Demo
Conclusion
Results
Before
It requires at least 10 days to complete, due to:
• Manual execution of manifest configuration settings
• Manual checking of triggered alerts
• Environment preparation
After
It takes only approximately 2 days to finish all the tests, since all of the test setup and scenarios are automated.
Summary
1. This test framework can reduce lead time by giving the SRE team confidence in their system configurations.
2. It provides transparency among all stakeholders about operational activities.
3. It allows QA / test engineers to test from a reliability perspective.
We are hiring Senior QA Engineers!