version control and open- · 2018. 1. 25. · • syncing work within the team. what version...

Post on 02-Oct-2020

Version control and open-source methodology in data science

Mislav Marohnić, software developer at GitHub

Topics

• What is version control,

• How has open source influenced software,

• How can this be relevant to researchers in data science.

Version control

Version control outside of the software world

Version control terminology

• repository: project directory

• “checking in”: adding files

• commit: saving changes

• push/pull: syncing

• remote: this project elsewhere

• branch: isolated changes

• merge: combining changes

• fork: copying a project

• pull request: contributing changes
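
To make the terms above concrete, here is a minimal toy model in Python. It is only a sketch: the class and method names (`Repository`, `Commit`, `add`, `checkout`) are made up for illustration, the merge is deliberately naive (the other branch wins on any conflict), and none of this reflects how git actually stores data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Commit:
    """A saved snapshot of the project: file name -> contents."""
    files: Dict[str, str]
    message: str
    parent: Optional["Commit"] = None


@dataclass
class Repository:
    """A project directory plus its history (hypothetical toy model)."""
    branches: Dict[str, Optional[Commit]] = field(
        default_factory=lambda: {"main": None}
    )
    current: str = "main"
    staged: Dict[str, str] = field(default_factory=dict)

    def add(self, name: str, contents: str) -> None:
        # "Checking in": mark a file for inclusion in the next commit.
        self.staged[name] = contents

    def commit(self, message: str) -> Commit:
        # Saving changes: new snapshot on top of the current branch tip.
        tip = self.branches[self.current]
        files = dict(tip.files) if tip else {}
        files.update(self.staged)
        new = Commit(files=files, message=message, parent=tip)
        self.branches[self.current] = new
        self.staged = {}
        return new

    def branch(self, name: str) -> None:
        # Isolated changes: a new name pointing at the current commit.
        self.branches[name] = self.branches[self.current]
        self.current = name

    def checkout(self, name: str) -> None:
        # Switch to an existing branch.
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.current = name

    def merge(self, other: str) -> Commit:
        # Combining changes: naive merge, the other branch wins on conflict.
        ours = self.branches[self.current]
        theirs = self.branches[other]
        files = dict(ours.files) if ours else {}
        files.update(theirs.files if theirs else {})
        merged = Commit(files, f"Merge {other} into {self.current}", parent=ours)
        self.branches[self.current] = merged
        return merged


repo = Repository()
repo.add("analysis.R", "summary(data)")
repo.commit("Add analysis script")

repo.branch("experiment")  # experiment without touching main
repo.add("analysis.R", "summary(data); plot(data)")
repo.commit("Try adding a plot")

repo.checkout("main")
merged = repo.merge("experiment")
print(merged.files["analysis.R"])  # prints: summary(data); plot(data)
```

Push/pull, remotes, forks and pull requests are left out: they concern syncing such a history between copies of the repository rather than the history itself.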

What version control facilitates

• code storage & backups

• isolated environment (branches) to experiment with changes

• syncing work within the team

• project history

• tracking down software bugs

• release management

• continuous integration (CI)
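
As an illustration of how project history helps with tracking down bugs, the sketch below does a binary search over a list of commits to find the first one where a check fails. This is the idea behind `git bisect`, not its implementation. The function name `first_bad_commit`, the commit labels and the `is_bad` check are all hypothetical. The sketch assumes the oldest commit is good, the newest is bad, and that every commit after the faulty one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # bug was introduced at mid or earlier
        else:
            lo = mid  # bug was introduced after mid
    return commits[hi]


# Hypothetical history in which the bug first appears in commit "e5".
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"]
checked = []


def is_bad(commit: str) -> bool:
    checked.append(commit)  # record which commits we had to test
    return history.index(commit) >= history.index("e5")


culprit = first_bad_commit(history, is_bad)
print(culprit, "found after testing", len(checked), "commits")
```

With eight commits only three need to be tested, which is why a complete, fine-grained history makes this kind of search practical.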

What version control looks like

Open-source

FOSS (free and open-source software): Anyone is freely licensed to use, copy, study, and change the software in any way, and the source code is openly shared so that people are encouraged to voluntarily improve the design of the software.

Examples of open source

• Python / R

• the Web & most browsers

• Linux

• parts of Apple's macOS

• Android OS

• Microsoft .NET

Benefits of open source

• transparency → trust

• fosters learning

• fosters collaboration

• more resilient software

• longer-lasting software

What GitHub provides

• web interface for git

• storage & backups

• issue tracking

• pull requests

• code search

• collaboration features

• project management

• downloadable releases

• web site publishing

• API for integrations

Pull Requests

a small “unit” of collaboration

The GitHub Flow

• new branch

• changes (commits)

• create pull request

• collaboration

• peer approval

• merge

Continuous integration (CI)

The killer feature of pull requests

Version control in data science

Similarities to software development

• syncing materials & data

• writing actual code (e.g. R)

• collaboration within a team

• peer review process

• publishing

Writing formats

• LaTeX

• Markdown

• R Markdown

• Jupyter (IPython) Notebook

• AsciiDoc

github.com/mislav/utrecht

Potential problems

• git can be tricky to learn for non-developers

• large datasets can be inconvenient to add to version control

• transition paths from other tools aren't always clear
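
The slides do not name a workaround for large datasets. One common idea, sketched here under that caveat, is to keep the data file itself outside version control and commit only a small "pointer" file that records its checksum and size. Tools such as Git LFS work along similar lines. The helper names, the pointer file layout and the file names below are assumptions made for this example, not the format of any real tool.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path, pointer_path: Path) -> dict:
    """Write a small JSON pointer (hypothetical layout) for a data file."""
    pointer = {
        "file": data_path.name,
        "sha256": file_sha256(data_path),
        "size_bytes": data_path.stat().st_size,
    }
    pointer_path.write_text(json.dumps(pointer, indent=2) + "\n")
    return pointer


def matches_pointer(data_path: Path, pointer_path: Path) -> bool:
    """Check that the local data file is the one the pointer describes."""
    pointer = json.loads(pointer_path.read_text())
    return file_sha256(data_path) == pointer["sha256"]


# Demo with a tiny stand-in for a large dataset.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "measurements.csv"
    data.write_bytes(b"id,value\n1,3.14\n")
    pointer_file = Path(tmp) / "measurements.csv.pointer.json"

    pointer = write_pointer(data, pointer_file)   # commit this small file
    unchanged = matches_pointer(data, pointer_file)

    data.write_bytes(b"id,value\n1,2.71\n")       # dataset silently changes
    changed = not matches_pointer(data, pointer_file)

print(pointer["size_bytes"], unchanged, changed)  # prints: 16 True True
```

Only the pointer file goes into the repository, so the history stays small while collaborators can still verify that they are analysing exactly the same data.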
