gitlab infrastructure 20160621

GitLab Infrastructure Status Report

Upload: sytses

Post on 07-Jul-2016

Page 1: GitLab Infrastructure 20160621

GitLab Infrastructure Status Report

Page 2: GitLab Infrastructure 20160621

We have HTTP queue time in monitoring, so we ran an experiment

What if we add more memory to workers?
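For readers unfamiliar with the metric, HTTP queue time is commonly derived from a timestamp that the front-end proxy attaches to each request, compared with the moment a worker picks the request up. The slides do not say how GitLab measures it; the sketch below assumes a hypothetical header value of the form `t=<unix seconds>`, and the function name is made up for illustration.

```python
import time


def queue_time_ms(request_start_header, now=None):
    """Return the milliseconds a request waited before a worker picked it up.

    Assumption: the proxy sends a value such as "t=1466500000.250",
    meaning unix time in seconds. Other setups use other units.
    """
    if now is None:
        now = time.time()
    value = request_start_header.strip()
    if value.startswith("t="):
        value = value[2:]
    started = float(value)
    # Clamp at zero so clock skew between hosts never yields a negative wait.
    return max(0.0, (now - started) * 1000.0)


if __name__ == "__main__":
    print(queue_time_ms("t=1466500000.250", now=1466500000.500))
```

Tracking this number before and after a change (such as adding memory to workers) shows whether requests spend less time waiting for a free worker.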

Page 3: GitLab Infrastructure 20160621

This had a good impact across the board - less load in general

Page 4: GitLab Infrastructure 20160621

Especially on API timings (authorized_keys lookup timings)

This is why git-ssh is going faster, but there is still a long way to go.

Page 5: GitLab Infrastructure 20160621

Some things did not go well with the change

Redis leaves connections behind - GitLab reached the maximum number of open connections -> outage

Page 6: GitLab Infrastructure 20160621

Deploys - RC3 blew up in production

On a Friday at 1AM my time.

So we built staging the next Monday. staging.gitlab.com is way smaller and less powerful than GitLab.com, but it has all the data.

Thanks @Jeroen

Done is better than perfect

Page 7: GitLab Infrastructure 20160621

Deploys - RC4 blew up in staging that very same Monday

Staging < Production

Page 8: GitLab Infrastructure 20160621

Postgres is still dying on us, or was it?

Query count monitoring allowed us to corner the problem and get it fixed in RC5

Thanks @marat!

Page 9: GitLab Infrastructure 20160621

Monitoring - improvements on how methods are measured

We are actually showing where the time is going now.

Thanks @Yorick!

Page 10: GitLab Infrastructure 20160621

Performance - no progress besides the API

Page 11: GitLab Infrastructure 20160621

Storage

● Cephfs - dev.gitlab.org has been running on Cephfs for the last month
  ○ Did you notice? No? That’s good! :)
  ○ Pushing the linux kernel to it takes 27 minutes (~1.5GB)
  ○ Pushing the linux kernel to GitLab.com takes between 1.5 hours and forever
● Our measurements were wrong, Cephfs gives 500/150 IOPS
● But it scaled without a hiccup up to 98 worker nodes (clients).
● We are testing behaviour when we add more nodes/OSDs, etc.
● We have a plan to move to Cephfs without downtime
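To put the two push timings side by side, they can be turned into an effective throughput. This is only back-of-the-envelope arithmetic, assuming the repository is about 1.5 gigabytes (decimal units) and taking the best GitLab.com case as 90 minutes; the function name is illustrative.

```python
def throughput_mb_per_s(size_gb, minutes):
    """Effective transfer rate in MB/s for a push of size_gb taking `minutes`."""
    return (size_gb * 1000.0) / (minutes * 60.0)


# Figures from the slide; the repository size is approximate.
KERNEL_SIZE_GB = 1.5
cephfs_rate = throughput_mb_per_s(KERNEL_SIZE_GB, 27)      # dev.gitlab.org on Cephfs
gitlab_com_rate = throughput_mb_per_s(KERNEL_SIZE_GB, 90)  # GitLab.com, best case

if __name__ == "__main__":
    print(f"Cephfs:     {cephfs_rate:.2f} MB/s")
    print(f"GitLab.com: {gitlab_com_rate:.2f} MB/s")
    print(f"Speedup:    {cephfs_rate / gitlab_com_rate:.1f}x")
```

Under these assumptions the Cephfs push runs at roughly 0.93 MB/s against about 0.28 MB/s, a little over three times faster than the best GitLab.com case.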

Page 12: GitLab Infrastructure 20160621

Storage - capacity

Git data - 28TB out of 49TB

Shared data - 3TB out of 4TB (we can grow this one easily-ish)
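The capacity figures above translate into utilization percentages as follows. The snippet is plain arithmetic on the numbers from the slide; the dictionary layout and function name are only for illustration.

```python
def utilization(used_tb, total_tb):
    """Return (percent used, TB free) for a volume."""
    return 100.0 * used_tb / total_tb, total_tb - used_tb


# (used TB, total TB) as reported on the slide.
volumes = {
    "Git data": (28, 49),
    "Shared data": (3, 4),
}

if __name__ == "__main__":
    for name, (used, total) in volumes.items():
        pct, free = utilization(used, total)
        print(f"{name}: {pct:.1f}% used, {free} TB free")
```

Git data sits at about 57% with 21 TB free, while shared data is at 75% with 1 TB free, which is why being able to grow the shared volume matters.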

Page 13: GitLab Infrastructure 20160621

Other news

● What’s coming soon
  ○ Multiple mount points/shards - Thanks @Alejandro!
  ○ 2 new hires
    ■ Alex as a Production Engineer
    ■ Ahmad as a Performance Specialist
● We are talking with CI to transfer knowledge into Infrastructure.
● We are going to take over GitHost.io
● We are starting to build infrastructure monitoring that can be shipped with GitLab
● We are hiring!

That’s all folks!