Continuous Integration for Spark Apps

Uploaded by: spark-summit
Posted on: 21-Apr-2017
Category: Data & Analytics

TRANSCRIPT

Page 1: Continuous Integration for Spark Apps by Sean McIntyre

Continuous Integration for Spark Apps

Page 2

Hi, I’m Sean!

© 2015 Uncharted Software Inc.

Page 3

It’s hard to test Spark Apps :(

Page 4

Case Study: Uncharted Spark Pipeline

Page 5

Case Study: Uncharted Spark Pipeline

Some key issues:

● Ensure reliability
● Prevent regressions
● Maintain compatibility with multiple versions of Spark
● Open source: need a quick and easy way to evaluate PRs

Page 6

What is Continuous Integration?

Page 7

“Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.”

-- ThoughtWorks

Page 8

“Continuous Integration (CI) is a development practice that is pretty damned important for writing quality software.”

-- Me

Page 9

So, What is Continuous Integration?

Pages 10-19

Best Practices, courtesy of Wikipedia:

1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Practices 1-4: duh.
Practices 5-8: ...less duh.

Page 20

Why are these difficult with Apache Spark?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Page 21

What is a Spark App?

Page 22

What is a Spark app?

This thing: Source → JAR → Spark → ?
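To make “this thing” concrete, here is a minimal sketch of a Spark app: a Scala object with a `main` method that is compiled into a JAR and launched by `spark-submit`. The object name `WordCountApp`, the word-count job, and the input/output paths read from `args` are hypothetical illustrations, not code from the talk. Keeping the transformation in its own method is one way to give tests something to call without going through `main`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example app; not part of the Uncharted pipeline.
object WordCountApp {
  // Pure RDD transformation, kept separate from main() so tests can call it directly.
  def countWords(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // No master is set here: spark-submit supplies it (e.g. --master local[2]).
    val sc = new SparkContext(new SparkConf().setAppName("word-count"))
    try {
      // args(0) = input path, args(1) = output directory (hypothetical layout).
      countWords(sc.textFile(args(0))).saveAsTextFile(args(1))
    } finally {
      sc.stop()
    }
  }
}
```

It would be launched with something like `spark-submit --class WordCountApp --master local[2] app.jar input.txt output/`, where `app.jar` and the paths are placeholders.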

Page 23

And...

We need to test this: Source → JAR → Spark → ?

Page 24

But...

By default, we have this: Source → JAR → ScalaTest → Scala RE

(boom)

Page 25

v1: Squish Spark inside ScalaTest

So, we try this: Source → JAR → ScalaTest with SparkContext

it works! (sort of)
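A minimal sketch of the v1 approach, assuming ScalaTest 3.1 or later (for `AnyFunSuite`) and Spark on the test classpath: a mixin starts an in-process `local[2]` SparkContext before a suite's tests and stops it afterwards. `LocalSparkContext` and `WordCountSuite` are hypothetical names. Because the tests run in the build's own JVM against whatever Spark JARs the build resolved, rather than on the Spark installation the app is deployed to, this only works “sort of”, which is presumably why the next slide points back at practice 6.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical mixin: one local SparkContext shared by every test in a suite.
trait LocalSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient private var _sc: SparkContext = _
  def sc: SparkContext = _sc

  override protected def beforeAll(): Unit = {
    super.beforeAll()
    // "local[2]" runs Spark inside the test JVM with two worker threads.
    _sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName(suiteName))
  }

  override protected def afterAll(): Unit = {
    try {
      if (_sc != null) _sc.stop()
    } finally {
      super.afterAll()
    }
  }
}

// Hypothetical suite using the mixin.
class WordCountSuite extends AnyFunSuite with LocalSparkContext {
  test("counts each word") {
    val counts = sc.parallelize(Seq("a b", "b c"))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts == Map("a" -> 1, "b" -> 2, "c" -> 1))
  }
}
```

Spark allows only one active SparkContext per JVM, so suites that mix in this trait should run sequentially rather than in parallel.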

Page 26

it works! (sort of)

Page 27

6. Test in a clone of the production environment

Page 28

v2: Squish ScalaTest into Spark

Source (Tests, Main.scala) → JAR + TestJAR → Spark → Test Output

Page 29

Main.scala
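The contents of the Main.scala slide are not in the transcript. As a hedged sketch of the v2 idea, assuming ScalaTest 3.x: a test entry point, packaged into the TestJAR and launched with `spark-submit`, that creates the SparkContext from whatever configuration `spark-submit` supplies, runs ScalaTest's `Runner` over the suite classes named on the command line, and converts the result into an exit code a CI job can check. `TestMain` and `SharedSpark` are hypothetical names; suites would use `SharedSpark.sc` instead of building their own local context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.tools.Runner

// Hypothetical helper: suites call SharedSpark.sc instead of creating their own context,
// so every test runs on the Spark runtime that spark-submit launched.
object SharedSpark {
  // new SparkConf() picks up the spark.* settings (including the master) set by spark-submit.
  lazy val sc: SparkContext =
    SparkContext.getOrCreate(new SparkConf().setAppName("spark-app-tests"))
}

// Hypothetical test entry point, run as the --class of the TestJAR.
object TestMain {
  def main(args: Array[String]): Unit = {
    // -o reports to stdout; each "-s <class>" selects one suite to run.
    val runnerArgs = Array("-o") ++ args.flatMap(suite => Array("-s", suite))
    // Create the context up front so configuration errors surface before any test runs.
    val sc = SharedSpark.sc
    val passed =
      try Runner.run(runnerArgs)
      finally sc.stop()
    // A non-zero exit code lets the CI job mark the build as failed.
    sys.exit(if (passed) 0 else 1)
  }
}
```

A run might then look like `spark-submit --class TestMain --jars app.jar test.jar com.example.WordCountSuite`, where the JAR names and suite class are placeholders.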

Page 30

6. Test in a clone of the production environment

Page 31

Progress?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Page 32

What now?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Page 33

v3: Squish Spark and Test JAR into Docker

Source (Tests, Main.scala) → JAR + TestJAR → Spark → Test Output, with Spark running inside a Docker container (uncharted/sparklet)

Page 34

test.sh

Page 35

build.gradle (excerpt)

Page 36

Progress?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Page 37

v4: Squish Docker into Travis CI

Source (Tests, Main.scala) → JAR + TestJAR → Spark → Test Output, with Spark running inside a Docker container, inside a Travis CI VM

Page 38

.travis.yml

Page 39

Voilà!

Page 40

Progress?

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Page 41

Page 42

Page 43

All done!

5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds

Page 44

Next Steps?

● Alpine Linux
● docker-compose
● Windows (dev environment) support
● Python

Page 45

Questions?

https://github.com/unchartedsoftware/sparkpipe-core

https://github.com/Ghnuberath

@Ghnuberath

https://hub.docker.com/r/uncharted/sparklet/
