hg for bioinformatics, second part
DESCRIPTION
The second part of a talk about hg and version control that I gave to my colleagues in a group of bioinformaticians. First part here: http://www.slideshare.net/giovanni/hg-version-control-bioinformaticians
TRANSCRIPT
![Page 1: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/1.jpg)
Hg and version control for bioinformatics, part 2
![Page 2: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/2.jpg)
What you will learn from this talk
● Graphical interfaces to hg repos
● Working with a remote copy of the repo on Bitbucket
● Working together with other people
![Page 3: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/3.jpg)
Graphical interfaces to hg
![Page 4: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/4.jpg)
Graphical interfaces to hg
● In the last talk we saw hg as a command line tool
● However, there are many graphical interfaces to it
● Learning all the hg commands may be silly
● Complex repositories may be difficult to navigate without the help of a graphical interface
![Page 5: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/5.jpg)
TortoiseHG
● TortoiseHG is a multi-platform graphical interface that integrates with your file manager
● Once installed, it:
● adds a few entries to the right-click menu of a file or folder
● installs a tool called the repository explorer
![Page 6: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/6.jpg)
TortoiseHG on your desktop
● This directory contains a hg repository
● Green and red symbols mark files tracked by hg
![Page 7: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/7.jpg)
TortoiseHG right-click menu
● Right-click on the folder and look at the new entries in the menu
![Page 8: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/8.jpg)
Right-click on a file
● Right-clicking on a file gives you more options
● Commit changes if the file differs from the last saved version
● Check the history of the file
● Revert it to a previous version
● Etc...
![Page 9: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/9.jpg)
The TortoiseHG repository explorer
● The TortoiseHG repository explorer is a graphical tool to manage a hg repository:
● Browse the history
● Commit changes
● Manage branches
● Upload to a remote server
![Page 10: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/10.jpg)
The repository explorer
1. History of changes
2. Files changed in the selected commit
3. Changes made to selected files in selected commit
![Page 11: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/11.jpg)
Making a commit from the Repository explorer
Tools menu → Commit
![Page 12: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/12.jpg)
Setting up a repository on bitbucket
![Page 13: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/13.jpg)
Having a copy of your repository on a remote location
● In the real world, people usually keep a copy of their repository on a remote server
● Advantages:
● Backups
● You can access the code from anywhere
● The smartest thing is to use a free code hosting service (GitHub, Bitbucket, etc.)
![Page 14: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/14.jpg)
Code hosting services
● There are many ~free code hosting services:
● Bitbucket (hg)
● GitHub, Gitorious (git)
● Launchpad (bzr)
● SourceForge (svn, various)
● Bitbucket has fairly good conditions for our case:
● Unlimited private and public repositories
● Unlimited disk space
● The only limit is a maximum of 5 collaborators per account
![Page 16: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/16.jpg)
Recommended: set up an SSH key
● After registering on Bitbucket, the first thing you should do is set up an SSH key
● Go to 'Account' → Add SSH Keys
● Safer transfers over the Internet
● You don't have to type your password every time
![Page 17: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/17.jpg)
Creating a Repo on bitbucket
● Just click on 'Repositories' → create new repo
![Page 18: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/18.jpg)
Creating a Repo on bitbucket
● Just keep following the instructions
● ssh key is recommended
![Page 19: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/19.jpg)
Cloning a repo
● After you create a repository on Bitbucket, it will give you a URL that you can use to download the repo to your computer.
● Example:
https://bitbucket.org/dalloliogm/secret-repo
ssh://bitbucket.org/dalloliogm/secret-repo
● Just use the hg clone command:
hg clone ssh://bitbucket.org/dalloliogm/secret-repo
● You can also clone a repository created by someone else
● (or clone your repository on another computer/directory)
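A minimal shell sketch of the cloning step, assuming hg is installed and on your PATH. The helper name clone_and_check and the destination directory are hypothetical, chosen only for illustration. After a clone, hg paths should list the clone source under the name "default", which is the location that later pull and push commands use.

```shell
# Hypothetical helper: clone a repository and show which remote it points to.
# Usage: clone_and_check URL DESTINATION
clone_and_check() {
    url="$1"
    dest="$2"
    # hg clone copies the whole history into DESTINATION (run once per repo)
    hg clone "$url" "$dest" || return 1
    # hg paths lists the remote aliases; a fresh clone has "default" set to URL
    hg -R "$dest" paths
}
```

With the example URL from the slide, the call would be: clone_and_check ssh://bitbucket.org/dalloliogm/secret-repo secret-repo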
![Page 20: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/20.jpg)
Synchronizing an existing repo with bitbucket
● What happens if you created your repository locally before creating it on Bitbucket?
● No problem, follow the instructions and you can synchronize them
![Page 21: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/21.jpg)
Setting up remote repo (tortoise)
● Go to Tools → Settings → Synchronize
![Page 22: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/22.jpg)
Setting up remote repo (manually)
● Open the .hg/hgrc file inside the repo main directory
● Add the following:
[paths]
default = ssh://bitbucket.org/dalloliogm/secret-repo
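The manual edit can also be scripted. This is only a sketch: the helper name set_default_path is hypothetical, and it deliberately refuses to touch an hgrc that already has a [paths] section, so that an existing configuration is never overwritten.

```shell
# Hypothetical helper: add a [paths] section to REPO_DIR/.hg/hgrc.
# Usage: set_default_path REPO_DIR URL
set_default_path() {
    repo_dir="$1"
    url="$2"
    hgrc="$repo_dir/.hg/hgrc"
    # Do not overwrite an existing [paths] section; leave that to a human
    if [ -f "$hgrc" ] && grep -q '^\[paths\]' "$hgrc"; then
        echo "[paths] section already present in $hgrc; edit it by hand" >&2
        return 1
    fi
    # Append the section; "default" is the remote used by plain hg pull/push
    printf '[paths]\ndefault = %s\n' "$url" >> "$hgrc"
}
```

Running hg paths inside the repository afterwards should show the new default location.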
![Page 23: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/23.jpg)
Working with remote repos
![Page 24: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/24.jpg)
Now, let's get serious!
● You have successfully set up a remote copy of your code on bitbucket
● Let's see how it works
![Page 25: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/25.jpg)
Hg – working with remote repos
● hg clone → get a copy of an existing repo (only once)
● hg pull → download the new changesets from the remote repository into your local repository (your working files are not modified)
● hg update → apply the changes from the latest pulled version to the current working directory
● hg merge → merge two diverging lines of development (for example, your commits and a colleague's)
● hg push → push the local changes to the remote repository
![Page 26: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/26.jpg)
Hg clone
● This command creates a copy of a repository on your computer
● For example, a copy of a repository on Bitbucket
● Run it only once per repository
![Page 27: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/27.jpg)
Hg pull & update
● hg pull downloads the changes made to the remote repository since the last time you cloned/pulled it
● It checks whether one of your colleagues has pushed a new version to the remote repo
● These changes are not applied automatically to the current working directory
● You have to do a hg update after a hg pull to update your local files
● hg pull -u → pulls & updates
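A small sketch of the pull & update step. It uses hg incoming, which only previews what a pull would bring in; its exit status is 0 when there are new changesets and non-zero otherwise. The helper name update_from_remote is hypothetical, and note that a network error is reported the same way as "nothing new" in this simplified version.

```shell
# Hypothetical helper: pull and update only when the remote has something new.
update_from_remote() {
    # hg incoming exits with 0 if the remote has changesets we do not have yet
    if hg incoming --quiet >/dev/null 2>&1; then
        # -u updates the working directory right after pulling
        hg pull -u
    else
        echo "Nothing new on the remote repository"
    fi
}
```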
![Page 28: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/28.jpg)
Hg push
● The hg push command sends the changes you have committed locally to the remote server
● The command fails if other people have pushed other changes before you
● You always have to do a pull&update (and merge) before doing a push
● More on this later
![Page 29: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/29.jpg)
Exercise
● Try to use bitbucket as a repository for your own script
● Commit your versions locally, and push them to Bitbucket as a backup copy
● You can clone (and later pull&update) the repo on your computer at home
![Page 30: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/30.jpg)
Hg for our pipeline
![Page 31: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/31.jpg)
Applying hg to our pipeline
● Someone should initialize a repo in the root directory (only once)
● Add, commit, document
● Push a copy of the repo to Bitbucket
● Everybody will clone the repo from there, and pull/push changes from there
![Page 32: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/32.jpg)
What to include in the repo
● Code, documentation
● We may create another repository for results and parameters
● For each set of results, we should be able to know which version of the scripts and which parameters have been used
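One possible way to link results to code versions, as a sketch only: write the output of hg id -i (the short changeset hash, with a trailing "+" if there are uncommitted changes) next to each set of results. The helper name record_provenance and the file name PROVENANCE.txt are assumptions, not part of the talk. The function must be run from inside the code repository.

```shell
# Hypothetical helper: store code version and parameters with the results.
# Usage: record_provenance RESULTS_DIR [PARAMETERS...]
record_provenance() {
    results_dir="$1"
    shift
    mkdir -p "$results_dir" || return 1
    {
        # hg id -i prints the changeset hash of the working directory
        echo "code_version: $(hg id -i)"
        echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        # Remaining arguments are recorded verbatim as the run parameters
        echo "parameters: $*"
    } > "$results_dir/PROVENANCE.txt"
}
```

With the recorded hash, hg update -r followed by that hash brings back the exact scripts used for a given result.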
![Page 33: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/33.jpg)
Executing the pipeline on the cluster
● Connect to the cluster
● hg pull & update from Bitbucket (to get the latest code)
● Test to verify whether it works correctly on the cluster?
● Execute the pipeline
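The cluster steps could be wrapped in a helper like the one below. This is a sketch under assumptions: the name run_on_cluster is hypothetical, the pipeline command is whatever you pass as arguments, and the check for a trailing "+" in hg id -i is just one way to refuse running code that has uncommitted changes.

```shell
# Hypothetical helper: get the latest code, then run the given command.
# Usage: run_on_cluster COMMAND [ARGS...]
run_on_cluster() {
    # Get the latest code from the default remote and update the working copy
    hg pull -u || return 1
    version=$(hg id -i)
    # A trailing "+" means the working copy has uncommitted local changes
    case "$version" in
        *+)
            echo "Uncommitted changes ($version); commit or revert first" >&2
            return 1
            ;;
    esac
    echo "Running pipeline at version $version"
    "$@"
}
```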
![Page 34: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/34.jpg)
Proposal: code reviews
● One person may be in charge of writing the core pipeline
● Other people can clone the repository and improve it (code review)
● So we will work on the same code, and hopefully make it better
![Page 35: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/35.jpg)
Collective code ownership
● In the perfect group, nobody is 'the only author' of a script
● Code is just a medium :-)
● A single script written by two people is much better than two redundant scripts
![Page 36: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/36.jpg)
The daily pull
● Every day, the first thing you should do is a hg pull & update to get the latest version of the code
● Make your changes locally and commit them.
● When you are ready, pull&update again to align your code with the remote copy, then push to Bitbucket
● Beware of conflicting changes...
![Page 37: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/37.jpg)
Merging and conflicts
● What happens when two people work on the same code on different computers?
● Two different versions of the code will exist
● How to merge them?
● Ask me :-)
● Never force the push (hg push -f): you will create extra heads on the remote repository and leave everybody else to sort out the merge
● Always do a hg pull&update before a push; if needed, use hg merge to integrate other people's changes
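The pull, merge, push sequence can be sketched as a single helper. Assumptions: the name push_with_merge and the commit message are hypothetical, all your changes are already committed, and you work on a single branch. hg heads . lists the heads of the current branch; more than one head after a pull means a colleague's work has to be merged before pushing. If hg merge reports conflicts, the function stops and you have to resolve them by hand.

```shell
# Hypothetical helper: integrate remote changes, then push without forcing.
push_with_merge() {
    # Download new changesets without touching the working directory
    hg pull || return 1
    # Count the heads on the current branch (one line per head)
    heads=$(hg heads --template '{node}\n' . | wc -l | tr -d ' ')
    if [ "$heads" -gt 1 ]; then
        # Two lines of development: merge them and commit the result
        hg merge && hg commit -m "Merge with remote changes" || return 1
    else
        # Nothing to merge: just move to the latest version
        hg update || return 1
    fi
    # No -f here: a plain push is enough once the heads are merged
    hg push
}
```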
![Page 38: Hg for bioinformatics, second part](https://reader034.vdocuments.net/reader034/viewer/2022051609/547d3f185906b56b378b4622/html5/thumbnails/38.jpg)
Making changes to the pipeline
● Get the latest copy of the pipeline from bitbucket (pull&update)
● Make changes, commit
● pull&update, push to Bitbucket
● Connect to cluster, pull&update, execute pipeline