Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Postgres Open

Post on 30-May-2015

Page 1: Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Postgres Open

Sane Schema Management with Alembic and SQLAlchemy

Selena Deckelmann
Mozilla

@selenamarie
chesnok.com

Page 2:

I work on Socorro.

http://github.com/mozilla/socorro

http://crash-stats.mozilla.com

Page 3:

Thanks and apologies to Mike Bayer

Page 4:

What's sane schema management?

Executing schema change in a controlled, repeatable way while working with developers and operations.

Page 5:

What's alembic?

Alembic is a schema migration tool that integrates with SQLAlchemy.

Page 6:

My assumptions:

● Schema migrations are frequent.
● Automated schema migration is a goal.
● Stage environment is enough like production for testing.
● Writing a small amount of code is ok.

Page 7:

No tool is perfect.

DBAs should drive migration tool choice.

Choose a tool that your developers like. Or, don't hate.

Page 8:

Part 0: #dbaproblems

Part 1: Why we should work with developers on migrations

Part 2: Picking the right migration tool

Part 3: Using Alembic

Part 4: Lessons Learned

Part 5: Things Alembic could learn

Page 9:

Part 0: #dbaproblems

Page 10:

Migrations are hard.

And messy.

And necessary.

Page 12:

Changing a CHECK constraint on 1000+ partitions.

http://tinyurl.com/q5cjh45

Page 13:

What sucked about this:

● Wasn't the first time (see 2012 bugs)
● Change snuck into partitioning UDF Jan-April 2013
● No useful audit trail
● Some partitions affected, not others
● Error dated back to 2010
● Wake up call to examine process!

Page 15:

Process before Alembic:

Page 17:

What was awesome:

● Used Alembic to manage the change
● Tested in stage
● Experimentation revealed which partitions could be modified without deadlocking
● Rolled out change with a regular release during normal business hours

Page 18:

Process with Alembic:
1. Make changes to model.py or raw_sql files
2. Run: alembic revision --autogenerate
3. Edit revision file
4. Commit changes
5. Run migration on stage after auto-deploy of a release

Page 20:

Problems Alembic solved:
● Easy-to-deploy migrations including UDFs for dev and stage
● Can embed raw SQL, issue multi-commit changes
● Includes downgrades

Page 21:

Problems Alembic solved:
● Enables database change discipline
● Enables code review discipline
● Revisions are decoupled from release versions and branch commit order

Page 22:

Problems Alembic solved (continued):
● 100k+ lines of code removed
● No more post-deploy schema checkins
● Enabling a tested, automated stage deployment
● Separated schema definition from version-specific configuration

Page 23:

Photo courtesy of secure.flickr.com/photos/lambj

HAPPY

AS A CAT IN A BOX

Page 24:

Part 1: Why we should work with developers on migrations

Page 25:

Credit: flickr.com/photos/chrisyarzab/

Page 26:

Schemas change.

Page 27:

Developers find this process really frustrating.

Page 28:

Schemas, what are they good for?

Signal intent
Communicate ideal state of data
Highly customizable in Postgres

Page 29:

Schemas, what are they not so good for?

Rapid iteration
Documenting evolution
Major changes on big data
Data experimentation

Page 30:

Database systems resist change.

Page 31:

Database systems resist change because:

Exist at the center of multiple systems

Stability is a core competency

Schema often is the only API between components

Page 32:

How do we make changes to schemas?

Page 33:

Because of resistance, we treat schema change as a one-off.

Page 34:

Evolution of schema change process

Page 35:

We're in charge of picking up the pieces when a poorly-executed schema change plan fails.

Page 36:

Trick question:

When is the right time to work with developers on a schema change?

Page 37:

How do we safely make changes to schemas?

Page 38:

How do we safely make changes to schemas?

Process and tooling.

Preferably, that we choose and implement.

Page 39:

Migration tools are really configuration management tools.

Page 40:

Migrations are for:
● Communicating change
● Communicating process
● Executing change in a controlled, repeatable way with developers and operations

Page 41:

Part 2: Picking the right migration tool

Page 43:

Questions to ask:
● How often does your schema change?
● Can the migrations be run without you?
● Can you test a migration before you run it in production?

Page 44:

Questions to ask:
● Can developers create a new schema without your help?
● How hard is it to get from an old schema to a new one using the tool?
● Are change rollbacks a standard use of the tool?

Page 45:

What does our system need to do?
● Communicate change
● Apply changes in the correct order
● Apply a change only once
● Use raw SQL where needed
● Provide a single interface for change
● Roll back gracefully
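
The ordering, apply-once and rollback requirements above can be sketched with a toy runner. This is not how Alembic is implemented: it uses Python's built-in sqlite3 module instead of Postgres, and the schema_version table, the MIGRATIONS list and the reports table are all invented for illustration.

```python
import sqlite3

# Hypothetical migrations, oldest first: (revision id, upgrade SQL, downgrade SQL).
MIGRATIONS = [
    ("001",
     "CREATE TABLE reports (id INTEGER PRIMARY KEY, signature TEXT)",
     "DROP TABLE reports"),
    ("002",
     "CREATE INDEX reports_signature_idx ON reports (signature)",
     "DROP INDEX reports_signature_idx"),
]
REVISION_IDS = [rev for rev, _, _ in MIGRATIONS]


def current_revision(conn):
    """Return the revision recorded in the database, or None if unversioned."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (revision TEXT)")
    rows = conn.execute("SELECT revision FROM schema_version").fetchall()
    return rows[0][0] if rows else None


def _record(conn, revision):
    """Store the given revision as the single row of schema_version."""
    conn.execute("DELETE FROM schema_version")
    if revision is not None:
        conn.execute("INSERT INTO schema_version VALUES (?)", (revision,))


def upgrade(conn):
    """Apply every migration newer than the recorded revision, in order."""
    current = current_revision(conn)
    start = 0 if current is None else REVISION_IDS.index(current) + 1
    applied = []
    for rev, up_sql, _ in MIGRATIONS[start:]:
        conn.execute(up_sql)
        _record(conn, rev)
        applied.append(rev)
    conn.commit()
    return applied


def downgrade_one(conn):
    """Undo the most recently applied migration and return its id (or None)."""
    current = current_revision(conn)
    if current is None:
        return None
    pos = REVISION_IDS.index(current)
    conn.execute(MIGRATIONS[pos][2])
    _record(conn, REVISION_IDS[pos - 1] if pos > 0 else None)
    conn.commit()
    return current


conn = sqlite3.connect(":memory:")
print(upgrade(conn))        # ['001', '002']
print(upgrade(conn))        # [] because nothing is applied twice
print(downgrade_one(conn))  # 002
print(upgrade(conn))        # ['002']
```

Recording the current revision inside the database itself is what lets the same command be run safely on dev, stage and production regardless of where each one currently stands.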

Page 46:

How you are going to feel about the next slide:

Page 47:

Use an ORM with the migration tool.

Page 48:

Shameful admission:

We had three different ways of defining schema in our code and tests.

Page 49:

A good ORM provides:

● One source of truth about the schema
● Reusable components
● Database version independence
● Ability to use raw SQL

Page 50:

And good ORM stewardship:

● Fits with existing tooling and developer workflows
● Enables partnership with developers
● Integrates with a testing framework

Page 51:

And:

● Gives you a new way to think about schemas

● Develops compassion for how horrible ORMs can be

● Gives you developer-friendly vocabulary for discussing why ORM-generated code is often terrible

Page 52:

Part 3: Using Alembic

Page 53:

Practical Guide to using Alembic

http://tinyurl.com/po4mal6

Page 54:

https://alembic.readthedocs.org

revision: a single migration
down_revision: previous migration
upgrade: apply 'upgrade' change
downgrade: apply 'downgrade' change
offline mode: emit raw SQL for a change
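
These terms map onto a revision file roughly as follows. This is a hand-written sketch rather than autogenerated output: the reports table, the notes column and both revision ids are invented for illustration. Alembic loads files like this from its versions directory; they are not meant to be run directly.

```python
"""Add a hypothetical 'notes' column to a hypothetical 'reports' table."""
import sqlalchemy as sa
from alembic import op

# Identifiers Alembic uses to order migrations (both values are made up).
revision = "3a1b2c4d5e6f"       # this migration
down_revision = "1f2e3d4c5b6a"  # the migration that must already be applied


def upgrade():
    # Apply the change.
    op.add_column("reports", sa.Column("notes", sa.Text(), nullable=True))


def downgrade():
    # Undo the change so the schema matches down_revision again.
    op.drop_column("reports", "notes")
```

With a file like this in place, `alembic upgrade head` calls upgrade(), `alembic downgrade -1` calls downgrade(), and adding `--sql` to the upgrade command is the offline mode: the SQL is printed instead of executed.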

Page 55:

Installing and using:

virtualenv venv-alembic
. venv-alembic/bin/activate
pip install alembic
alembic init alembic    # "alembic" is the directory to create; init requires one
vi alembic.ini
alembic revision -m "new"
alembic upgrade head
alembic downgrade -1

Page 56:

Defining a schema?

vi env.py

Add: import myproj.model

Page 57:

Helper functions?

Put your helper functions in a custom library and add this to env.py:

import myproj.migrations

Page 58:

Ignore certain schemas or partitions?

In env.py:

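import re

def include_symbol(tablename, schema):
    # Skip other schemas and any table whose name ends in _ plus eight digits (dated partitions).
    return schema in (None, "bixie") and re.search(r'_\d{8}$', tablename) is None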
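
A quick way to see what this filter keeps is to run it over a few names. The table names and the "other_schema" schema below are hypothetical; only "bixie" and the regular expression come from the slide. How the function is handed to Alembic inside env.py is not shown on the slide and is left out here.

```python
import re


def include_symbol(tablename, schema):
    # Keep tables in the default schema or "bixie", and skip any table whose
    # name ends in an underscore plus eight digits (a dated partition).
    return schema in (None, "bixie") and re.search(r"_\d{8}$", tablename) is None


# Hypothetical (table name, schema) pairs to try the filter on.
candidates = [
    ("reports", None),
    ("reports_20130408", None),
    ("crashes", "bixie"),
    ("crashes", "other_schema"),
]
kept = [(t, s) for t, s in candidates if include_symbol(t, s)]
print(kept)  # [('reports', None), ('crashes', 'bixie')]
```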

Page 59:

Manage User Defined Functions?

Chose to use raw SQL files
3 directories, 128 files:
procs/ types/ views/

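import os

codepath = '/socorro/external/pg/raw_sql/procs'

def load_stored_proc(op, filelist):
    # Run each SQL file in filelist, in order, through Alembic's op.execute().
    app_path = os.getcwd() + codepath
    for filename in filelist:
        # Join with a separator so a bare file name resolves inside procs/.
        sqlfile = os.path.join(app_path, filename)
        with open(sqlfile, 'r') as stored_proc:
            op.execute(stored_proc.read())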
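
To try a helper like this without a database, one option is to pass in a stand-in for op that only records what it is asked to execute. The sketch below makes two assumptions that are not on the slide: the directory is passed explicitly as sql_dir instead of being derived from os.getcwd(), and RecordingOp is a hypothetical test double, not part of Alembic.

```python
import os
import tempfile


def load_stored_proc(op, filelist, sql_dir):
    """Execute each SQL file in filelist, in order, through op.execute()."""
    for filename in filelist:
        sqlfile = os.path.join(sql_dir, filename)
        with open(sqlfile, "r") as stored_proc:
            op.execute(stored_proc.read())


class RecordingOp:
    """Hypothetical stand-in for alembic.op that only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def demo():
    # Write two throwaway SQL files, then load them in a chosen order.
    with tempfile.TemporaryDirectory() as sql_dir:
        for name, body in [("a.sql", "SELECT 1;"), ("b.sql", "SELECT 2;")]:
            with open(os.path.join(sql_dir, name), "w") as f:
                f.write(body)
        op = RecordingOp()
        load_stored_proc(op, ["b.sql", "a.sql"], sql_dir)
        return op.statements


print(demo())  # ['SELECT 2;', 'SELECT 1;']
```

The order of filelist is the order of execution, which matters when one function or view depends on another.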

Page 60:

Stamping database revision?

from alembic.config import Config
from alembic import command

alembic_cfg = Config("/path/to/yourapp/alembic.ini")
command.stamp(alembic_cfg, "head")

Page 61:

Part 4: Lessons Learned

Page 62:

Always roll forward.

1. Put migrations in a separate commit from schema changes.

2. Revert commits for schema change, leave migration commit in-place for downgrade support.

Page 63:

Store schema objects in the smallest, reasonable, composable unit.

1. Use an ORM for core schema.
2. Put types, UDFs and views in separate files.
3. Consider storing the schema in a separate repo from the application.

Page 64:

Write tests. Run them every time.

1. Write a simple tool to create a new schema from scratch.

2. Write a simple tool to generate fake data.

3. Write tests for these tools.

4. When anything fails, add a test.
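
The first three steps might look like this in miniature. It is only a sketch under stated assumptions: in-memory sqlite3 stands in for Postgres, and the two tables, the fake values and the function names are invented for illustration.

```python
import random
import sqlite3
import unittest

# Hypothetical schema, small enough to build from scratch in every test.
SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE reports (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products (id),
    signature TEXT NOT NULL
);
"""


def create_schema(conn):
    """Tool 1: create a new schema from scratch."""
    conn.executescript(SCHEMA)


def generate_fake_data(conn, n_reports, seed=0):
    """Tool 2: fill the schema with repeatable fake rows."""
    rng = random.Random(seed)
    conn.executemany(
        "INSERT INTO products (id, name) VALUES (?, ?)",
        [(1, "alpha"), (2, "beta")],
    )
    rows = [
        (rng.choice([1, 2]), "sig-%04d" % rng.randrange(10000))
        for _ in range(n_reports)
    ]
    conn.executemany(
        "INSERT INTO reports (product_id, signature) VALUES (?, ?)", rows
    )
    conn.commit()


class SchemaToolsTest(unittest.TestCase):
    """Step 3: tests for the two tools above."""

    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        create_schema(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_schema_has_expected_tables(self):
        names = {
            row[0]
            for row in self.conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        self.assertEqual(names, {"products", "reports"})

    def test_fake_data_row_count(self):
        generate_fake_data(self.conn, 25)
        (count,) = self.conn.execute("SELECT COUNT(*) FROM reports").fetchone()
        self.assertEqual(count, 25)
```

Saved as a module, the tests run with `python -m unittest`. Step 4 then amounts to adding another test method whenever a migration or tool breaks.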

Page 65:

Part 5: What Alembic could learn

Page 66:

1. Understand partitions

2. Never apply a DEFAULT to a new column

3. Help us manage UDFs better

4. INDEX CONCURRENTLY

5. Prettier syntax for multi-commit sequences
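
The slide does not spell out the reasoning behind item 2. One common motivation is that on PostgreSQL releases before 11, ADD COLUMN with a DEFAULT rewrote the whole table. A hedged sketch of the usual workaround follows, as a plain function that only builds the SQL statements; the function name, table, column and default value are hypothetical.

```python
def add_column_steps(table, column, sql_type, default_sql):
    """Return SQL statements, in order, for adding a column without an
    inline DEFAULT. Names and default_sql are interpolated as-is, so they
    must come from trusted code, never from user input."""
    return [
        # 1. Add the column with no DEFAULT; existing rows simply read as NULL.
        f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}",
        # 2. Set the default separately; this only affects future inserts.
        f"ALTER TABLE {table} ALTER COLUMN {column} SET DEFAULT {default_sql}",
        # 3. Backfill existing rows (on a large table, do this in batches).
        f"UPDATE {table} SET {column} = {default_sql} WHERE {column} IS NULL",
    ]


for statement in add_column_steps("reports", "processed", "boolean", "false"):
    print(statement)
```

In a migration, each statement could be passed to op.execute(), possibly committing between steps, which is where item 5 on the slide comes in.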

Page 68:

Epilogue

Page 69:

No tool is perfect.

DBAs should drive migration tool choice.

Choose a tool that your developers like. Or, don't hate.

Page 70:

Other tools:

Sqitch
http://sqitch.org/
Written by PostgreSQL contributor

Erwin
http://erwin.com/
Commercial, popular with Oracle

South
http://south.aeracode.org/
Django-specific, well-supported

Page 71:

Alembic resources:

bitbucket.org/zzzeek/alembic

alembic.readthedocs.org

groups.google.com/group/sqlalchemy-alembic

Page 72:

Sane Schema Management with Alembic and SQLAlchemy

Selena Deckelmann
Mozilla

@selenamarie
chesnok.com