FROM SNOWFLAKES TO A COMMON AUTOMATED PLATFORM
Ricard Clau, John Paul Newman
WHAT DO WE DO?
• Automation group function at Wonga
• Small team servicing X engineers in 5 locations
• CI / CD pipelines, Logging / Monitoring, infrastructure provisioning, config management, …
• Most of the team is quite new to the company
A BIT OF WONGA HISTORY
• Started in 2007, DevOps was not even a thing!
• Regional expansion, acquisitions, …
• Massive growth, engineers did their best to keep up
• Regulations happened, FCA approval needed
• Massive turnover, knowledge lost, lack of docs…
CURRENT PROBLEMS
• Snowflake servers, many attempts in the past failed
• No unified processes in the group
• Not great monitoring dashboards
• No DevOps culture, we are seen as a service team
• Sometimes, all these are great excuses
WANT TO ACHIEVE
• Build / Provision servers & infrastructure from code
• Needs to work for both Windows and Linux
• Hybrid cloud (AWS) / datacenter (vSphere)
• Simple, pragmatic and efficient tools
• Progressive introduction of new tooling
INITIAL ROADMAP
• Pick tools to build / automate everything
• Rationalise CI / CD tooling
• Plan a progressive migration to the AWS cloud
• Rationalise logging / monitoring infrastructure
• Build platform capabilities that can be shared
CI
Jenkins, TeamCity, ThoughtWorks GoCD
JENKINS
• Hundreds of plugins and plenty of documentation
• Job configuration from code via Jenkins Job Builder (JJB) or Wonga's own JJB Ruby DSL*
• Free! Allowing each team to have their own self-managed server and agents
* https://github.com/wongatech/wongatech.github.io/blob/master/_drafts/jenkins-job-builder-ruby-dsl.md
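Job configuration from code can be sketched roughly as follows. This is a minimal illustration, not Wonga's actual setup: the service names and the `make test` command are hypothetical, and the output is JSON arranged in the `- job:` list shape that Jenkins Job Builder reads, on the assumption that its YAML parser accepts JSON as well.

```python
import json

# Hypothetical services; each one gets an identically shaped CI job.
SERVICES = ["payments-api", "customer-portal"]


def build_job(service):
    """Return one job definition in the list-of-{'job': ...} shape JJB reads."""
    return {
        "job": {
            "name": f"{service}-build",
            "description": f"Build and test {service} (generated, do not edit by hand)",
            "builders": [
                {"shell": "make test"},  # assumed build command
            ],
        }
    }


def render_jobs(services):
    """Serialise every job definition as JSON, which YAML parsers generally accept."""
    return json.dumps([build_job(s) for s in services], indent=2)


if __name__ == "__main__":
    print(render_jobs(SERVICES))
```

Keeping the job shape in one function means a change to the build steps is made once and regenerated for every team's server.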
SCM
Gerrit, GitLab, GitHub
GERRIT
• Review and CI validation processes
• Supports replication for DR
• LDAP-backed authentication
• Integrates with internal tools, like JIRA
• Detailed ACLs and nice project structure
GITHUB
• Nice UI and developer familiarity
• Hooks integration
• Debatable Pull Requests model
• Delegate DR, HA, etc… to GitHub
• Has a source-code-based wiki
CONFIG MANAGEMENT
Puppet Labs, Opscode Chef, Ansible
ANSIBLE STRENGTHS
• Easy learning curve
• Agentless but you can also do ansible-pull
• Plays nicely with running Windows servers
• Decent community roles in Ansible Galaxy
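Because Ansible is agentless, the control machine only needs to know which hosts exist and how to reach them. A minimal sketch of a dynamic inventory script follows, assuming hypothetical host names and the JSON shape Ansible expects from an inventory script called with `--list`; Windows hosts are marked with `ansible_connection: winrm` since they are managed over WinRM rather than SSH.

```python
import json

# Hypothetical hosts covering both platforms and both environments.
HOSTS = [
    {"name": "web01.example.internal", "os": "linux", "site": "aws"},
    {"name": "app01.example.internal", "os": "windows", "site": "vsphere"},
]


def build_inventory(hosts):
    """Group hosts by OS and by site in Ansible's dynamic-inventory JSON shape."""
    inventory = {"_meta": {"hostvars": {}}}
    for host in hosts:
        for group in (host["os"], host["site"]):
            inventory.setdefault(group, {"hosts": []})["hosts"].append(host["name"])
        if host["os"] == "windows":
            # Windows hosts are reached over WinRM instead of SSH.
            inventory["_meta"]["hostvars"][host["name"]] = {
                "ansible_connection": "winrm",
            }
    return inventory


if __name__ == "__main__":
    # Ansible runs the script with --list and reads the JSON from stdout.
    print(json.dumps(build_inventory(HOSTS), indent=2))
```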
ANSIBLE ISSUES
• Ansible 2.0 is still a bit buggy
• You always need a Linux control machine
• Less flexible than Chef or Puppet (unless you write your own modules…)
• Variable quality in Ansible Galaxy
MONITORING
ELK STACK VS SPLUNK
• Decent in-house Splunk experience
• Splunk dashboards still a bit better than Kibana
• Logstash needs grok patterns configured for each log format, Splunk can mostly guess the fields itself
• Still experimenting with ELK for our own stuff
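To give a feel for the grok configuration work mentioned above: a grok pattern is essentially a regular expression with named captures that has to be written per log format. The sketch below uses plain Python `re` to approximate what a pattern such as `%{IP:client} %{WORD:method} %{URIPATH:path} %{NUMBER:status}` does; the log format and field names are made up for illustration.

```python
import re

# Hand-written approximation of the grok pattern
# "%{IP:client} %{WORD:method} %{URIPATH:path} %{NUMBER:status}".
LINE_PATTERN = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>[A-Z]+) "
    r"(?P<path>/\S*) "
    r"(?P<status>\d{3})"
)


def parse_line(line):
    """Return the named fields of a log line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_line("10.0.0.7 GET /loans/42 200"))
```

Every new log format needs another pattern like this one, which is the maintenance cost the comparison with Splunk refers to.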
INFLUXDATA
• A platform for collecting time-series data
• Model system metrics and business metrics
• We use the Telegraf agent to send metrics, InfluxDB to store data and Grafana for dashboards
• Need to explore Kapacitor for monitoring
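System and business metrics end up in InfluxDB as points with a measurement name, tags, fields and a timestamp, written in InfluxDB's line protocol. A minimal sketch of formatting such points follows; the metric names are hypothetical, and only the common escaping cases are handled.

```python
def _escape(value, chars=", ="):
    """Backslash-escape characters that are significant in keys and tag values."""
    text = str(value)
    for char in chars:
        text = text.replace(char, "\\" + char)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp."""
    tag_part = "".join(
        f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k)}={_format_field(v)}" for k, v in sorted(fields.items())
    )
    # Measurement names only need commas and spaces escaped.
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # A system metric and a (hypothetical) business metric share the same model.
    print(to_line("cpu", {"host": "web01"}, {"usage_idle": 97.5}, 1465839830100400200))
    print(to_line("loan_applications", {"country": "uk"}, {"approved": 12}, 1465839830100400200))
```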
INFLUXDATA CONCERNS
• Experimental support for Windows
• Still at version 0.12 at the moment, with breaking API changes
• Many people get confused about time-series data
• InfluxDB cluster not free anymore
INFRASTRUCTURE
PACKER STRENGTHS
• Works nicely with both Windows and Linux
• Plays nicely with AWS and VMware
• Easy to share provisioning scripts
• Easier to understand than Config Management tools (Chef, Puppet or Ansible)
PACKER CAVEATS
• Need to be very prescriptive or the number of templates can get out of hand quickly
• Hard to go with a DRY approach
• Often not much benefit on Linux systems compared with running provisioning tools on startup
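One way to keep the number of templates under control is to generate them from shared pieces. The sketch below is only an illustration under stated assumptions: the script path, AMI id and ISO URL are placeholders, the builder settings are not a complete or validated Packer template, and field names follow Packer's JSON template format, which may differ between versions.

```python
import json

# Provisioning steps shared by every image (hypothetical script path).
COMMON_PROVISIONERS = [
    {"type": "shell", "script": "scripts/base.sh"},
]

# Per-target builder settings; every value here is a placeholder.
BUILDERS = {
    "aws": {
        "type": "amazon-ebs",
        "region": "eu-west-1",
        "source_ami": "ami-00000000",
        "instance_type": "t2.micro",
        "ssh_username": "ubuntu",
        "ami_name": "base-{{timestamp}}",
    },
    "vsphere": {
        "type": "vmware-iso",
        "iso_url": "http://example.internal/isos/base.iso",
        "iso_checksum": "0000",
        "ssh_username": "ubuntu",
    },
}


def render_template(target):
    """Combine one target's builder with the shared provisioners."""
    template = {
        "builders": [BUILDERS[target]],
        "provisioners": COMMON_PROVISIONERS,
    }
    return json.dumps(template, indent=2)


if __name__ == "__main__":
    for target in BUILDERS:
        print(render_template(target))
```

The provisioning scripts are defined once and reused, so adding a target means adding one builder entry rather than another hand-maintained template.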
TERRAFORM STRENGTHS
• Plays nicely with AWS and has some initial support for vSphere (actively developed)
• Has a nice pluggable providers system to automate virtually everything… if you know Go
• No real cloud agnostic competitor
TERRAFORM ISSUES
• Not great documentation and error messages
• Some providers don't have nice update support
• Tricky to store state files
• Terraform modules are still a bit hacky
• Relatively immature overall
SOME SUCCESS!
• Tools decided, good engagement in the team
• Building Packer AMIs and VMware templates
• Some services already fully managed by Ansible
• Many servers rebuilt from config management
• Small Terraform setups working
THE (NEAR) FUTURE
• Consul for Service Discovery and Config storage
• Better secrets / keys management (Vault)
• Start the Prod migration to AWS (some components already running in PreProd)
• Improve the current successes and think platform
BABY STEPS
A LONG JOURNEY AHEAD
QUESTIONS?
• BTW… incidentally… we are hiring!
• Come talk to us!
• Thank you for listening!