
2012-04-23

DVCS Branching - Why Mercurial Sucks and What Mercurial Advocates Are Missing

Introduction

Version control software is basically used to track changes to files (usually plain-text files) over time. It is primarily used by software developers to track source code changes. It used to be that a centralized server hosted a single repository of changes, and clients would connect to it to check out or check in changes.

Then came a revolution known as distributed version control. This model basically eliminates the centralized repository and gives every user (developer) their own complete clone of the repository.

There are basically two popular distributed version control systems (DVCS) available right now, and both are free/libre/open source software: Git and Mercurial. Both were developed to replace BitKeeper for use in maintaining the Linux kernel, but only one was written by Linus himself, so the other one sucks. ;) There are other free ones and also commercial ones, but I don't know of any good reason to use them. Linus Torvalds did at one point say that if for whatever reason you needed a commercial DVCS, BitKeeper was the one to use.

Git and Mercurial work relatively similarly. As far as I know, they both represent history internally using a directed acyclic graph (DAG). Combining this with the distributed design means that branching and merging happen to work quite well in both (they have to). Effectively, whenever you work in a separate clone of the repository, you create a new physical branch, even if you don't mean to. As a result, you often have to merge when you synchronize your local repository. That only tells part of the story of branching, though.
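To make the "every clone is a branch" point concrete, here is a minimal Python sketch of a history DAG. It assumes a drastically simplified model (the `Commit' and `Repo' names are hypothetical and say nothing about how Git or Mercurial actually store history): two clones that each commit on top of the same parent end up with two heads once one pulls from the other, and a merge commit with two parents joins them again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    id: str
    parents: tuple = ()


class Repo:
    """Hypothetical toy repository: a map of commit id -> Commit."""

    def __init__(self, commits=None):
        self.commits = dict(commits or {})

    def clone(self):
        # A clone gets its own complete copy of the history.
        return Repo(self.commits)

    def commit(self, cid, *parents):
        self.commits[cid] = Commit(cid, tuple(parents))
        return cid

    def pull(self, other):
        # Pulling only adds commits we do not have yet; nothing is rewritten.
        for cid, c in other.commits.items():
            self.commits.setdefault(cid, c)

    def heads(self):
        # A head is a commit that no other commit names as a parent.
        referenced = {p for c in self.commits.values() for p in c.parents}
        return sorted(cid for cid in self.commits if cid not in referenced)


upstream = Repo()
upstream.commit("a")
alice, bob = upstream.clone(), upstream.clone()
alice.commit("b", "a")       # Alice works in her clone...
bob.commit("c", "a")         # ...while Bob works in his.
alice.pull(bob)
print(alice.heads())         # ['b', 'c']: two heads, i.e. an implicit branch
alice.commit("m", "b", "c")  # a merge commit has two parents
print(alice.heads())         # ['m']
```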

Branching In Mercurial

The Mercurial community has basically come up with three branching strategies that you can use with Mercurial. The reason there are so many of them is that none of them works very well. Proponents look for ways to justify these misfeatures, but when you actually put the strategies into practice you see that it's nonsense.

Named Branches

The core branching feature is called named branches. This is what the `hg branch' and `hg branches' commands manage. The implementation is a little bit finicky. It's effectively a name attribute stored with each changeset. That's all there really is to it. I imagine it must have been inspired by Subversion properties or something silly like that. The way it works is that you are basically free to set the working branch name anytime you want with `hg branch <name>'. The attribute is remembered, and when you commit, it is stored in the changeset as a permanent part of history. The head of a named branch is basically the most recent commit with that branch name. (Strictly speaking, any commit on the branch with no descendants on the same branch is a head, so a single named branch can have several heads.)
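Here is a minimal Python sketch of a named branch as nothing more than a field on each changeset. The `ToyHgRepo' class is hypothetical and ignores everything else Mercurial records; the head computation assumes Mercurial's definition of a branch head (a changeset with no descendants on the same named branch), which is why one named branch can end up with two heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    rev: int
    branch: str          # the named-branch attribute, fixed at commit time
    parents: tuple = ()


class ToyHgRepo:
    """Hypothetical model: history plus a 'working branch name' setting."""

    def __init__(self):
        self.changesets = []
        self.working_branch = "default"

    def set_branch(self, name):
        # Like `hg branch <name>`: nothing is recorded until the next commit.
        self.working_branch = name

    def commit(self, *parents):
        cs = Changeset(len(self.changesets), self.working_branch, tuple(parents))
        self.changesets.append(cs)
        return cs.rev

    def ancestors(self, rev):
        seen, stack = set(), list(self.changesets[rev].parents)
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(self.changesets[p].parents)
        return seen

    def branch_heads(self, name):
        on_branch = {cs.rev for cs in self.changesets if cs.branch == name}
        # Anything that is an ancestor of another changeset on the same
        # branch has a same-branch descendant, so it is not a head.
        not_heads = set()
        for rev in on_branch:
            not_heads |= self.ancestors(rev) & on_branch
        return sorted(on_branch - not_heads)


repo = ToyHgRepo()
base = repo.commit()          # rev 0 on "default"
repo.set_branch("feature")
repo.commit(base)             # rev 1 on "feature", permanently labelled
repo.set_branch("default")
repo.commit(base)             # rev 2 on "default"
repo.commit(base)             # rev 3 on "default", also a child of rev 0
print(repo.branch_heads("default"))   # [2, 3]
print(repo.branch_heads("feature"))   # [1]
```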

Branches in Mercurial are shared by default. If you want to synchronize only selected branches, then you need to explicitly name the ones you want with every relevant command (e.g., `hg pull -b <name>' or `hg push -b <name>'). I find it extremely tedious. It basically makes it difficult for anyone to work under the radar without distracting others. In a big project you can expect that many people are not going to be interested in the work that others are doing. You're only really interested in the parts of the project that impact you in some way and don't want to be bothered with all of the other noise. Mercurial makes it difficult to do that.

Another problem is that in this distributed environment there is only one namespace for branches, and branches are permanent parts of history. This means that if two developers independently choose the same name for unrelated branches, their work ends up on what Mercurial treats as a single branch (with multiple heads). It also means that if you just want to create an experimental branch and throw it away later then you are forced to share it with the world anyway (short of history editing, which the Mercurial community also discourages). A workaround that was added later is `hg commit --close-branch', which basically adds a new changeset with a new special attribute to indicate that the named branch is "closed" (i.e., finished with). This basically just tells Mercurial to hide it from view by default. The branch is automatically reopened, though, if you accidentally commit onto it again. In my experience, it's very difficult to keep things straight when the entire universe's branches are always available to the `hg update' command, which is used to update your working copy to some specific version (e.g., the latest) as well as change branches.
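A tiny Python sketch of the close-and-reopen behavior, assuming a branch with purely linear history (the `ToyBranch' class is hypothetical; real Mercurial also has to handle branches with several heads). Closing adds a marker changeset rather than removing anything, and the next ordinary commit makes the branch open again.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBranch:
    """Hypothetical model of one named branch with purely linear history."""

    name: str
    changesets: list = field(default_factory=list)  # (message, closes_branch) pairs

    def commit(self, message, close_branch=False):
        # Closing is itself a new changeset; earlier history is never removed.
        self.changesets.append((message, close_branch))

    @property
    def closed(self):
        # With linear history the last changeset is the only head, so the
        # branch is closed exactly when that changeset carries the marker.
        return bool(self.changesets) and self.changesets[-1][1]


b = ToyBranch("experiment")
b.commit("try an idea")
b.commit("abandon it", close_branch=True)   # like `hg commit --close-branch`
print(b.closed, len(b.changesets))          # True 2
b.commit("accidental follow-up")            # committing again reopens it
print(b.closed)                             # False
```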

Repository Clone

What the Mercurial community would have you do for experimental changes that you may wish to throw away is clone the repository again and work in a throw-away repository instead. Due to the nature of the distributed DAG history you are actually automatically branching whenever you work on a cloned repository. This solution does work, but it's very Subversion-like in that you have to manually manage "branches" within your file system. Mercurial will use hard links on supported file systems to save space if you clone locally (basically each clone will share whatever physical files on disk that it can), but that's little comfort. In my experience, this strategy doesn't work overly well, and it clutters your project namespace (e.g., I generally keep all source code projects in ~/src or "%USERPROFILE%\src").

You also can't directly share these experimental changes with other people or with a shared remote repository without pushing a new anonymous head and screwing everybody up. At best you could tell your collaborator to explicitly clone anew, and keep the history separate from the main history, just as you have done, but that's a very error-prone approach.

Bookmarks

A lot of Git users complain about the branching options in Mercurial, and the Mercurial community's solution was to offer an alternative branching mechanism that is similar to how branches work in Git. The solution is called "bookmarks". Bookmarks began as a Mercurial extension, but have since been merged into the core.

A bookmark is basically an association between a name and a commit identifier. These associations are stored in .hg/bookmarks. When you commit on a bookmark the bookmark is automatically updated to point to the new head. Since bookmarks are stored outside of history they can be easily created, renamed, and deleted at any time. They are very lightweight compared to the other branching options. Indeed, they are similar to a Git branch. The problem with bookmarks is that they aren't shared by default. If you choose to use bookmarks to manage branches in your project then you either have to tediously communicate each bookmark with all collaborators or risk them getting quite confused because there will be multiple seemingly anonymous branch heads when they pull your branches in.
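As a rough illustration, a bookmark can be modeled in Python as an entry in a name-to-commit map that sits beside history rather than in it. The `ToyBookmarks' class is hypothetical, and the rule that only the active bookmark moves on commit is an assumption of this sketch rather than a full description of Mercurial's behavior.

```python
class ToyBookmarks:
    """Hypothetical model: bookmarks are a name -> commit map kept beside history."""

    def __init__(self, root):
        self.parents = {root: ()}  # toy history: commit id -> parent ids
        self.marks = {}            # bookmark name -> commit id (cf. .hg/bookmarks)
        self.active = None         # the bookmark that moves on commit
        self.current = root        # commit the working copy is based on

    def bookmark(self, name):
        self.marks[name] = self.current
        self.active = name

    def commit(self, cid):
        self.parents[cid] = (self.current,)
        self.current = cid
        if self.active is not None:
            self.marks[self.active] = cid  # only the active bookmark advances

    def rename(self, old, new):
        self.marks[new] = self.marks.pop(old)
        if self.active == old:
            self.active = new

    def delete(self, name):
        # Deleting a bookmark removes a name only; its commits stay in history.
        del self.marks[name]
        if self.active == name:
            self.active = None


repo = ToyBookmarks("a")
repo.bookmark("feature")
repo.commit("b")
print(repo.marks)                         # {'feature': 'b'}
repo.rename("feature", "topic")
repo.delete("topic")
print(repo.marks, sorted(repo.parents))   # {} ['a', 'b']
```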

Basically, you have to explicitly push bookmarks with `hg push -B <name>', and explicitly pull them with `hg pull -B <name>'. You can check for new bookmarks in the remote repository with `hg incoming -B'. Unfortunately, this puts a lot of responsibility on the collaborators to manually keep bookmarks up to date. The documentation says that once both sides of the connection have a bookmark it will be updated automatically, but that's not much help. Until both sides have the bookmarks, at least, Mercurial is quite happy to jump branches on you without really raising any alarms (I'm not sure it would stop you even with the bookmarks, to be honest, though I've been told it will). The current implementation is just not sufficient for real world usage. Most of my collaborators are even in the same room; I can't imagine how this would work with people spread out across a building or the world.

Branching In Git

Git is quite smart about most of what it can do. A branch in Git is a much more logical concept. You get to think of Git branches as tangible things. There is basically only one strategy for branching and it works quite well. Perhaps most importantly it's distribution friendly.

Git manages your branches for you. A branch is actually represented by a very light file that basically just references the head commit identifier of the branch. This allows you to create, rename, and delete branches at will, very efficiently, similar to Mercurial bookmarks. Git takes care of efficiently storing the actual history for you so you can be sure that it doesn't duplicate anything needlessly. The operation of switching branches is also separate from the operation of updating your local branches (`git checkout' and `git pull'[1], respectively).
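In a real repository a branch is typically a small file under .git/refs/heads/ holding a commit's SHA-1 (refs may also be packed into .git/packed-refs). The following Python sketch models only that idea; the `ToyGitRepo' class and its method names are hypothetical. Creating, renaming, and deleting branches touch only the ref map, checking out only repoints HEAD, and committing advances just the branch that HEAD names.

```python
class ToyGitRepo:
    """Hypothetical model of Git refs: a branch is just a name -> commit id entry."""

    def __init__(self, root):
        self.objects = {root: ()}                 # commit id -> parent ids
        self.refs = {"refs/heads/master": root}   # one tiny ref per branch
        self.head = "refs/heads/master"           # HEAD names the current branch

    def branch(self, name):
        # Creating a branch writes one ref; no history is copied.
        self.refs["refs/heads/" + name] = self.refs[self.head]

    def checkout(self, name):
        # Switching branches only repoints HEAD (the working tree is ignored here).
        ref = "refs/heads/" + name
        if ref not in self.refs:
            raise KeyError(name)
        self.head = ref

    def commit(self, cid):
        self.objects[cid] = (self.refs[self.head],)
        self.refs[self.head] = cid                # only the current branch moves

    def rename(self, old, new):
        self.refs["refs/heads/" + new] = self.refs.pop("refs/heads/" + old)
        if self.head == "refs/heads/" + old:
            self.head = "refs/heads/" + new

    def delete(self, name):
        # Git refuses to delete the checked-out branch; mirror that here.
        if self.head == "refs/heads/" + name:
            raise ValueError("cannot delete the current branch")
        del self.refs["refs/heads/" + name]


repo = ToyGitRepo("c0")
repo.branch("topic")
repo.checkout("topic")
repo.commit("c1")
print(repo.refs)   # {'refs/heads/master': 'c0', 'refs/heads/topic': 'c1'}
```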

Git branches are not all implicitly synchronized with a remote repository. Depending on the `push.default' setting, `git push' with no arguments typically pushes either just the current branch or only those branches that already have a counterpart with the same name on the remote. Similarly, `git pull' fetches from the remote but only merges into the current branch. Branches are namespaced by repository. Local ones are just in the global namespace, but remote branches are namespaced by the remote name. For example, origin/master refers to the 'master' branch of the remote 'origin'. This puts the user in complete control of their repository. The remote might refer to a branch as 'foo', but you can use a local branch named 'bar' if you want to. Git doesn't implicitly synchronize any branches with remote branches. You have to explicitly associate them in configuration before they will implicitly be synced (`git clone' and checking out a remote branch by name write this configuration for you in the common cases). Otherwise, you can explicitly choose which local and remote branches you want to synchronize.
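The namespacing and the explicit association can be sketched in Python as follows. The `fetch' and `upstream_of' helpers are hypothetical; they assume the default fetch refspec (`+refs/heads/*:refs/remotes/<remote>/*') and read the real `branch.<name>.remote' and `branch.<name>.merge' configuration keys, which commands such as `git push -u' or `git branch --set-upstream-to' write for you. A fetch only updates the remote's namespace, and a local branch named 'bar' can track the remote's 'foo'.

```python
def fetch(refs, remote, remote_branches):
    """Copy a remote's branch heads into the refs/remotes/<remote>/ namespace.

    Mirrors the default refspec +refs/heads/*:refs/remotes/<remote>/*;
    local branches under refs/heads/ are never modified by a fetch.
    """
    for name, sha in remote_branches.items():
        refs[f"refs/remotes/{remote}/{name}"] = sha


def upstream_of(config, branch):
    """Return the remote-tracking ref a local branch is associated with, if any."""
    remote = config.get(f"branch.{branch}.remote")
    merge = config.get(f"branch.{branch}.merge")
    if remote is None or merge is None:
        return None  # no configured association, so nothing syncs implicitly
    return f"refs/remotes/{remote}/{merge.removeprefix('refs/heads/')}"


refs = {"refs/heads/bar": "1111"}                 # local branch 'bar'
config = {"branch.bar.remote": "origin",          # 'bar' tracks the remote's 'foo'
          "branch.bar.merge": "refs/heads/foo"}
fetch(refs, "origin", {"foo": "2222", "master": "3333"})
print(upstream_of(config, "bar"))     # refs/remotes/origin/foo
print(upstream_of(config, "master"))  # None
print(refs["refs/heads/bar"])         # 1111: untouched until you merge
```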

Conclusion

Mercurial proponents typically try to defend Mercurial's branching madness with excuses that branches should be permanent history and that it's a feature, not a bug. The reality is that Mercurial has a very non-distributed nature to its branching policies. Instead of letting each user be in control of his view of the world, the Mercurial community seems to be in favor of encouraging everybody to always share the entire world with the entire world.

In short, Git is a much better DVCS for branching (and in my experience, pretty much everything else too, but I digress). Mercurial is a good alternative for people that need extra hand holding and don't plan to make extensive use of the tools. Branching in Mercurial is so painful that we basically don't do it where I work, just as we didn't really do it with Subversion. The little bit of branching that we do do ends up being ugly and distracting and it's very error-prone in my experience.

References

[1] - `git pull' is roughly equivalent to `git fetch' followed by `git merge' of the fetched upstream branch. Note that if you're wise to it then you can use `git rebase' instead of `git merge' (or simply `git pull --rebase') if you want to linearize [local] history.
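The difference between the two can be sketched in Python on a toy model where history is a plain list of commit names, linear up to the point where the two sides diverged. The helper names are hypothetical and real Git operates on the commit graph, not on lists: merging keeps both lines of work and adds a merge commit, while rebasing replays the local commits as new commits on top of upstream.

```python
def split_at_fork(local, upstream):
    """Split two linear histories at the last commit they share."""
    n = 0
    while n < min(len(local), len(upstream)) and local[n] == upstream[n]:
        n += 1
    return local[:n], local[n:], upstream[n:]


def pull_merge(local, upstream):
    base, mine, theirs = split_at_fork(local, upstream)
    if not theirs:
        return list(local)            # already up to date
    if not mine:
        return base + theirs          # fast-forward, no merge commit needed
    # First-parent view: upstream's commits hang off the merge's second parent.
    return base + mine + [f"merge({mine[-1]},{theirs[-1]})"]


def pull_rebase(local, upstream):
    base, mine, theirs = split_at_fork(local, upstream)
    if not theirs:
        return list(local)
    # Local commits are replayed on top as new commits (marked with ').
    return base + theirs + [c + "'" for c in mine]


local = ["a", "b", "L1", "L2"]
upstream = ["a", "b", "U1"]
print(pull_merge(local, upstream))   # ['a', 'b', 'L1', 'L2', 'merge(L2,U1)']
print(pull_rebase(local, upstream))  # ['a', 'b', 'U1', "L1'", "L2'"]
```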

8 comments:

  1. I get it, but because of the needs of my co-workers, I have to wait for Git to have a better Windows / User Friendly story.

  2. I like the precise description. However, the conclusion is drawn from a very limited view. Only a certain workflow/use case scenario is considered. In a corporate environment (especially in a software development environment regulated by external standards) branches stored permanently in history are very useful. Having this in mind, one might call Git's branching model useless. The bookmark feature is implemented as it is because many people prefer exactly that behavior. You can find many good arguments on the Mercurial mailing list. You shouldn't call a tool useless in general - you could call it useless for your particular use case.

  3. @Anonymous: Sorry for the late reply. I feel your pain. I'd argue that Git for Windows is actually very stable and mature, but I can only speak on behalf of the command line tools. I don't use any SCM GUIs so I don't know what the status is for that in Windows (we all know most Windows users rely on the GUI).

    @Peter: I would argue that Git can easily adapt to other use cases (e.g., you could use hooks in the upstream repository to prevent deleting or rebasing branches), whereas in my experience Mercurial is not easily adapted (you generally end up having to work around its behaviors instead at your own expense). I'll leave it at that and agree to disagree. :)

    Thank you both for the comments. :)

  4. I read your post, and while it sounded pretty convincing at first, I see some nagging problems:

    - You compare Mercurial without any activated extensions to Git, which naturally is an unfair comparison, because Mercurial intentionally hides some of its functionality from new users to make it hard to shoot yourself in the foot. There are enough jokes about the starting investment for using git, that I see this as fully justified.

    - If you wanted to share your experimental changes from a clone, you’d just rewrite your local history to put them on a branch (i.e. via mq or histedit - or with mutable, which allows for distributed history rewriting and history conflict resolution but is still in heavy development: http://hg-lab.logilab.org/doc/mutable-history/html/ ).

    - You get rid of a branch by closing it. But reusing the name later does not hurt you, because Mercurial does not consider the closed branches as update or merge targets. So you only have to avoid using branch names you can see right now. I tend to start new branches when I experiment with something and that does not hurt at all (aside from requiring me to use the -f flag for hg branch at times).

    - Having anonymous heads does not hurt that much - but fear of them is a misconception I often see with git users. In Mercurial multiple heads simply work. You only avoid them when publishing changes, because you want to ensure that others know the canonical versions of changes. In git the missing support for multiple heads on a branch leads to using all sorts of broken branch names (like “tmp” and “foo”) instead of just using anonymous branching. On the other hand, Mercurial even has divergent bookmark marking, so you can do divergent work on shared bookmarks without having to do some kind of auto-merge as in git.

    - Have you tried the branch-by-clone approach with git? I tried, and it complained that I cannot push into a checked-out branch… (because it cannot handle multiple heads per branch).

    - A global namespace for branches makes it easier for new people to come in, because they see the same branches which every experienced developer sees. Turned the other way round it means that the developers actually use the branch layout which they provide to new users (dogfooding your project layout). It makes it harder to fit the repo to every possible local workflow, though (i.e. “the main branch always has to be `trunk`”).

    - If you are not interested in all the changes all other people do in the project, you’d normally just pull from the repositories you are interested in. With a single shared repository, you’d just pull certain branches.

    - Git branches: “You have to explicitly associate them in configuration” — that’s part of the complexity which horrifies new users. And rightly so: For most workflows it is not necessary, because if you can do local work with multiple heads on the same branch, most of the usefulness of temporary or local branch names evaporates.

    Overall you have a matter of philosophy, though: Mercurial is “project and new user first”, Git is “single experienced developer first”. You capture that to some degree when you say “sharing the whole world”. Mercurial gives you a shared project whose state gets updated in a distributed way. Git gives you many different projects which can get changes from each other - which makes the whole process of working *together* much more complex.

    PS: I just stumbled over your post while searching for other stuff and it sounded interesting enough to read it :)

  5. Your arguments are highly subjective. You mention Mercurial's public-by-default treatment of named branches--like other history-preserving features, this is useful if you are working with people who ever make mistakes (which is the default in my experience) that you have to troubleshoot. Further, "don't mess with history" is just a guideline--if you're experimenting, feel free to `hg strip` when you're done (such things wouldn't exist if "don't mess with history" was a hard-and-fast rule).

    Honestly, if these are your top gripes about Mercurial, you should take a hard look at Git. I've used it at least as long as Mercurial (and probably more often), and I still get myself into detached head state with every other operation (and I usually have a difficult time correcting it). Further, every time I ask for help on something, I'm either told to go read the entire Pro Git book or enter in some obscure command with a warning like, "Be careful, if you enter a wrong parameter you'll irreversibly delete your entire repository". Call me old fashioned, but I prefer the tool that lets me do everything I need without overcomplicating everything.

  6. @Craig Weber: Thanks for the comment.

    Public branches don't really make troubleshooting easier. They might make it more difficult for incompetent people to lose history, but the problem there is the incompetence, not the software, and no amount of software can fix that. The Mercurial community has somewhat folded in the past few years and accepted that history editing is a routine operation. A few years ago they were religiously against it and you had to basically learn to do it yourself because they wouldn't help.

    It's absolutely brain-dead simple to get out of a detached head state in Git. Just assign your head to a new name (i.e., branch) or checkout an existing branch. For example, git checkout -b <name> or git checkout <name>. The only way that you should ever get into that state is if you checkout a <name> that isn't a branch (i.e., usually to go back in history), or if a rebase gets interrupted. In the latter case, git rebase --abort will take you back to sanity, or you can work through it with git rebase --continue and eventually you will also be returned to sanity.

    Git makes it very difficult to lose work. Much more difficult than Mercurial. You might not realize that it's so easy, but trust me it is. git reflog is your friend. On the other hand, I have irreversibly lost changes with Mercurial on several occasions. Basically Git is much simpler than Mercurial, but the core command set is quite a bit bigger, and the terminology is a bit more technical. The learning curve is about the same, but Mercurial seems easier at first because they start you off in "basic mode" with a minimal set of features. When you actually start to use it to its full potential Mercurial ends up being way more complicated, error prone, and buggy compared to Git.

    It is a source code management tool for programmers/hackers. It assumes that you are comfortable with code. That's all a command line is. Code. If that scares you then you're in the wrong line of work. The commands aren't even obscure when you understand how the tool works. And you're encouraged to understand the inner workings of Git.

    As for community assistance, the Git community is much more friendly and helpful than the Mercurial one. I have spent years in both IRC channels, both helping people and asking for help, and #mercurial tends to be very snobby. Either you "shouldn't" do something so they won't help you, or you clearly did something wrong so they won't help you, or Mercurial just can't do something so they won't help you. They're great help for the basics though! Conversely, there are several people in #git (myself included) that will do their best to help you figure out what you did wrong and how to recover from it. We will also help you to figure out just about anything you want, regardless of how destructive it is (though we try to give you ample warnings when this is the case).

    They certainly seem to suit a different type of person. There's nothing wrong with that. I can get by with Mercurial at great pains, but I much prefer to work with Git. It is simple, stable, and reliable. I'm not even sure how to do history editing in Mercurial anymore because they've gone and changed the API again. Allegedly the new system will be far superior, but too little too late. Last time I used it, at the recommendation of the devs themselves, I permanently lost work. Fortunately for me, I was in a situation where I was able to switch over to Git at work and haven't looked back. I haven't touched Mercurial in almost a year now.

    Replies
    1. Sorry, apparently the comment section tries to process HTML tags... Insert <name> when it looks like there's something missing. <:)

  7. One of the worst version control systems I have ever used... Mercurial sucks so much
