In my new position at Plemi, I want to optimize the continuous integration flow. Last week, I had a pretty interesting conversation with Mathieu Nebra: we discussed our shared quest, the best ROI throughout the development flow. I also had an awesome experience with Antoine Guiral (the man behind twoolr): we shared a “lean startup” Friday night around a very interesting project.
In my previous article, I introduced a pragmatic Git flow, a first step focused on the developer’s local environment. I’d like to start this post with a personal, yet simple consideration: “the trust in your development team is reflected by your CI flow”.
It’s very reductive, but the GitHub flow is a perfect example. It’s composed of only one stable master branch and several feature branches, so the production flow is dead simple. However, GitHub uses GitHub (dogfooding for the win) and this flow strategically relies on its own awesome feature called pull request. Sadly, I can’t afford to use GitHub everywhere for many reasons, good and bad. So I tried to set up a flow with a self-hosted git repository coupled to our continuous integration server.
Overview of the development flow
- A developer works on their local git repository in a dedicated feature branch
- He pushes as often as he can (at least twice a day) to the remote feature branch
- The CI builds on every push, merging the trunk and his feature branch
- When a feature is finished, the developer merges it back into the trunk
The context and the tools
For newcomers, here is a quick summary of the tools used in this flow:
- Source code management with git
- Branch management is based on the git flow template
- Continuous integration relies on Jenkins
- Code quality relies on Sonar
- I work with PHP, yet the flow can be used with other languages
The flow starts on the developer side
I already wrote on that topic in my previous article. In short, the developer works in a feature branch named after this pattern: "feature.ticketNumber_description". He writes code without worrying about the SCM. Then, he dedicates some time to nicely split his work into several independent commits. If needed, he can rebase his local branch. Finally, he pushes the pack of commits.
On the continuous integration platform side
We have two jobs for each repository: one for the develop branch and another one for all feature branches. The feature job is configured to build every branch with the pattern “feature.*”. Jenkins is notified after each push by a git post-receive hook.
Tip: this hook doesn’t start a build, it only triggers an SCM poll in Jenkins. This way, Jenkins starts a build only if there are relevant changes in the repository. Read the official documentation: “Build by source changes”.
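For the record, here is a minimal sketch of what such a post-receive hook could look like. It assumes the notifyCommit endpoint exposed by the Jenkins Git plugin. The server address and the repository URL are hypothetical placeholders, and recent plugin versions may also require an access token, so check it against your own setup.

```shell
#!/bin/sh
# Sketch of hooks/post-receive on the self-hosted repository.
# Hypothetical values: replace them with your own Jenkins server and repository.
JENKINS_URL="${JENKINS_URL:-http://jenkins.example.com}"
REPO_URL="${REPO_URL:-ssh://git@git.example.com/project.git}"

# Build the polling URL (notifyCommit endpoint of the Jenkins Git plugin).
notify_url() {
    printf '%s/git/notifyCommit?url=%s\n' "$1" "$2"
}

# Ask Jenkins to poll the repository. Jenkins decides by itself
# whether the changes are worth a build.
notify_jenkins() {
    curl --silent --show-error --max-time 10 \
        "$(notify_url "$JENKINS_URL" "$REPO_URL")" || true
}

# Only call Jenkins when this file really runs as the post-receive hook.
case "${0##*/}" in
    post-receive) notify_jenkins ;;
esac
```

The "|| true" keeps a Jenkins outage from making the push look like a failure on the developer side.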
I configured Jenkins to locally merge the latest pushed branch with the develop branch. This allows early and cheap detection of merge conflicts.
If Jenkins can’t merge automatically, it fails the build and the developer is already aware that he needs to manually merge back the develop branch into his feature branch. He can’t delay this conflict resolution because it blocks the CI process: checkstyle, unit tests and coverage won’t be available to him until he resolves the merge.
This continuous merge process has the huge advantage of avoiding the nightmare of a messy merge while the developer responsible for a shitty conflict is unreachable for an undefined period.
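To make the idea concrete, here is a rough sketch of the same check done by hand, outside Jenkins. It is not the Jenkins configuration itself: the try_merge function, the throwaway repository and the branch names are made up for the illustration. It attempts the merge without committing, records whether git reported a conflict, then cancels the attempt.

```shell
#!/bin/sh
# try_merge TARGET BRANCH: returns 0 if BRANCH merges cleanly into TARGET,
# 1 on conflict. The trial merge is always cancelled afterwards.
try_merge() {
    git checkout -q "$1" || return 2
    if git merge -q --no-commit --no-ff "$2" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    # Undo the trial merge so that TARGET is left untouched.
    git merge --abort >/dev/null 2>&1 || git reset -q --hard HEAD
    return "$status"
}

# Throwaway repository with one clean and one conflicting feature branch.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "A"

git checkout -q -b feature.1_clean
echo "new" > other.txt
git add other.txt
git commit -q -m "Adds other file"

git checkout -q develop
git checkout -q -b feature.2_conflict
echo "feature" > file.txt
git commit -q -am "Edits file on the feature branch"

git checkout -q develop
echo "develop" > file.txt
git commit -q -am "Edits file on develop"

try_merge develop feature.1_clean
clean_status=$?
try_merge develop feature.2_conflict
conflict_status=$?
echo "clean=$clean_status conflict=$conflict_status"
```

A non-zero status for a feature branch is the signal that the developer has to merge develop back into his branch before anything else.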
Don’t trust Jenkins too much
I don’t use the "push after a successful build" option provided by Jenkins. Basically, it lets you automate the integration process: if a build is stable, Jenkins pushes the new modifications to a remote branch. Yet it doesn’t fit our needs, since I ask the developers to push every day. Many features need more than one day to be completed and I won’t let Jenkins push some kind of work in progress.
Don’t trust a single developer
I don’t rely too much on Jenkins, and this is also related to my first quote: I work with a pretty nice team, but “I don’t trust a single developer” (not even myself!) to build the perfect git log (i.e. the perfect code) in one shot. We are all new in the company and we need a little more practice with the processes and the new coding standards, for example.
Trust your team
Moreover, when I say “I don’t trust a single developer”, it’s a way to highlight the code review phase. When a feature is closed, it must be fully reviewed and discussed by a technical leader before being merged into the develop branch. This has 2 major consequences: first, I can’t let Jenkins automatically merge into the develop branch; second, the technical leader is free to make a final rebase of the feature branch before the merge. This ensures a stable and clean develop branch.
Finally, we have to distinguish the 2 usages of develop. Jenkins uses this branch locally as an integration branch for each feature, while the remote branch reflects the latest stable version, ready to be checked by QA and then deployed.
What’s next?
Depending on your needs, you’re free to choose:
- Everything goes straight to production.
- You launch some stress tests on a staging server and then deploy to prod.
- You notify your QA and wait for a final feedback.
At our office, the QA check is “asynchronous”: it can be delayed by 1 to 3 days. That’s why I’m still using the release branch from the git flow template. I freeze develop into release and wait for the QA feedback. If it’s a green light, I deploy the release branch.
I customized the git flow for my needs: we don’t use the “release” concept because we deploy each feature ASAP. This flow fits agile methods perfectly. You collect technical metrics sooner and it’s simpler to detect and fix potential problems on a small iteration. Your project manager/product owner collects user feedback sooner, easing the business decisions for the next sprints/features.
I mentioned 2 jobs for each repository: one job is limited to the develop branch and serves the code quality purpose. Metrics from this job are exported to Sonar.
I’ve been using git for more than 2 years, yet I was just using it naively during the first months. Then I needed a reliable workflow, and git-flow was a huge revelation. I invite you to study this workflow: it’s a perfect starting point that you can customize for your needs.
First, a little reminder about the git-flow branch model:
- develop: latest stable features
- release: stable snapshot from develop to be released
- master: production state
- hotfixes: emergency branch for master
Second, a short overview of the git-flow workflow:
- Every feature is developed in a dedicated feature branch.
- When the feature is over and stable, the related branch is merged back into develop.
- Then, develop is “frozen” into the release branch, allowing a deeper review/integration before production while other developers can continue merging new features into develop.
- Finally, release is merged into master, which is the branch exposed in production.
I won’t detail git-flow because documentation is abundant on the internet. Be aware that there is also a lighter, feature-oriented flow (via @DavidGuyon). Scott Chacon wrote a nice article about this alternative. I find this flow really efficient but, imho, it relies on the excellent pull request feature from GitHub. So if you’re on a self-hosted repository, it doesn’t fit medium-sized projects, unless you work with a highly senior team.
A clean history
In my previous job at Simple IT, I tried to use the git flow with a simple objective: a clear git history (to ease code review, code management and SCM actions like revert).
So I started asking the developers to focus on git and to split their commits by task/concern. It was an awesome success and also a failure. Success: git flow was a perfect flow for our agile needs. Failure: the git history was still filled with wrong files or messages and, with git-flow, merge commits were new parasites.
The reasons were obvious: it relied only on the developers’ goodwill and asked them for a real-time effort on the SCM. Furthermore, developers had unequal skills: from the exceptional lead developer involved in open source projects to the kid who has never seen the words “best” and “practices” together. We didn’t change anything, because the flow was cool and we were busy with real business-valued tasks.
I strongly believe that an SCM must not distract or disturb developers from their main tasks. This means that when your team works on a feature, it shouldn’t be surrounded by questions like “Do I need to commit now? Should I have committed 2 files ago?”. But I still believe that an SCM history should be clean to be useful.
Rebase before pushing
One command solved almost all my problems: git rebase. Basically, it rewrites the git history. Wait a minute? Isn’t it bad practice to rewrite an SCM history? When you’re working with a team, the answer is definitely yes (this is punishable by death at our office).
Yet git is a distributed SCM, so developers do not own a mere local copy; instead, they manage a local repository (usually plugged to a remote one). I don’t care about their local repository: a developer needs to vent his rage in local commit messages like “Adds model”, “Fix model”, “Fix model 2”, “Fix BULLSHIT”…
More seriously, a developer will try to keep a clean history on his side, yet I don’t want to lower his productivity for the sake of the SCM. He’s allowed to make mistakes in his local environment; I consider it a “draft history”.
The push is the key moment in the flow. Before pushing, developers have to rewrite their local history to clean up their mess. How? Thanks to rebase.
Here’s a basic example: I have worked hard and I’m ready to push. After a "git log", my history looks like this:
Since “MyClass” is a new file, I don’t really care about the minor refactor or the checkstyle fix. In an ideal world, I would have made only one commit. Let’s correct that.
Git is based on graph theory. Basically, my local repository looks like this:
This means that the branch “feature.alpha” starts from commit B. As @ubermeda said: a branch is plugged to a commit, not to another branch! So when you say that branch “alpha” comes from “develop”, it’s only half true.
We will use git rebase to rewrite the feature branch history. Note: rebase can be used in many different ways; google it to find more usages.
Here’s the command:
git rebase -i SHA1
This can be translated to “I want to rewrite all the history of my current branch between now and the commit identified by this SHA1” (now = HEAD = the latest commit of the branch).
Careful: it’s safer to have a clean working tree (stash your changes) before rebasing, and you must not rebase a commit that has already been pushed, or your teammates will be in big trouble.
In our example, we need to group the 3 commits (D-E-F) into one. I will type:
git rebase -i [the SHA1 of the commit B]
The -i argument means interactive; an editor will open:
Important: notice that the commit order is reversed compared to git log (oldest commit first). Also, a lot of information is available at the bottom of the screen (read it!). To simply regroup the 3 commits into a single commit, I will change some values:
This means: start with commit D, meld commit E into it, then meld commit F into it. I used the keyword “fixup”, which melds the commit into the previous one and discards its commit message (a perfect command for our needs). Again, read the documentation at the bottom of the rebase screen, where each keyword is described. After saving this file, git will start the rebase.
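If you want to replay this on a sandbox before touching a real branch, here is a small sketch that rebuilds a similar situation in a throwaway repository. The commit messages and the file content are made up for the illustration. Instead of editing the rebase screen by hand, it uses the GIT_SEQUENCE_EDITOR variable with sed to turn every "pick" after the first one into "fixup", which is the same edit I described above.

```shell
#!/bin/sh
# Throwaway repository: A and B on develop, then D, E, F on feature.alpha.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
base="$(git rev-parse HEAD)"          # SHA1 of commit B

git checkout -q -b feature.alpha
echo "<?php class MyClass {}" > MyClass.php
git add MyClass.php
git commit -q -m "Adds MyClass"       # D (hypothetical message)
echo "// minor refactor" >> MyClass.php
git commit -q -am "Minor refactor"    # E (hypothetical message)
echo "// checkstyle fix" >> MyClass.php
git commit -q -am "Fix checkstyle"    # F (hypothetical message)

# Same edit as in the interactive screen, done by sed:
# keep "pick" on the first line, replace it with "fixup" on the others.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" \
    git rebase -q -i "$base"

git log --oneline
```

The scripted editor is only there to make the example reproducible; in daily work I edit the rebase screen by hand.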
Let’s check the git log
And my local repository looks like this:
Exactly what we expected. We notice 2 things: I didn’t change the commit message (I could have done that during the rebase) and the “grouped” commit is a new commit with a new SHA1. That’s why you must not rebase a pushed commit, or it will totally break the history.
My local history is now clean; it’s time to push to the remote. It may trigger a post-receive hook for a Jenkins CI which will merge branches, test your code… No? You should read my introduction about Continuous Integration.
I didn’t mention it before, but I force the dev team to push every day. This brings a lot of advantages and I’ll probably talk about it in my next post about continuous integration. This way, they don’t have to rebase hundreds of commits.
In the end, productivity isn’t affected by SCM concerns: the developer can work without worrying about the SCM stuff. He just needs to dedicate some time to clean up and arrange his repository, and the whole team benefits from a clean history.
In my example I just grouped 3 commits, but you could do far more: reordering commits just by switching lines, for example… In fact, you are totally free to rewrite your whole history.
Sometimes you can’t easily remember the common ancestor of your feature branch and your develop branch. In my example, the latest common ancestor was commit B. Here’s an example:
Note: if you merge “feature.alpha” and “develop”, the last common ancestor of both branches will be updated to G.
To find the last common ancestor, you just have to type this command:
git merge-base [branch1] [branch2]
It will output the SHA1 of the last common commit; you can use this identifier for your rebase.
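Here is a short sketch of that lookup on a throwaway repository, assuming a history where develop moved on after the feature branch was created. The commit names are only labels for the illustration.

```shell
#!/bin/sh
# Throwaway repository: A and B on develop, D and E on feature.alpha,
# then C on develop after the branch was created.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
commit_b="$(git rev-parse HEAD)"

git checkout -q -b feature.alpha
git commit -q --allow-empty -m "D"
git commit -q --allow-empty -m "E"

git checkout -q develop
git commit -q --allow-empty -m "C"

# Last common ancestor of both branches: this is commit B.
ancestor="$(git merge-base develop feature.alpha)"
echo "common ancestor: $ancestor"

# Commits that an interactive rebase on this ancestor would let you rewrite.
git log --oneline "$ancestor"..feature.alpha
```

From the feature branch, the two steps can then be chained as: git rebase -i "$(git merge-base develop feature.alpha)"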