Github's fork queue is dangerous

defunkt · on May 16, 2009

The Fork Queue is a replacement for email patch management, not a merge tool. It is for cherry picking changes from contributors, not merging branches.

This is why the author information is retained and --signoff is used.

As Hongli Lai says in the comments: "Most of the time, people who fork my projects make a bunch of changes, some which I want and some which I do not want. Cherry-picking instead of pulling totally makes sense in these situations."

This is the situation it was designed for: reviewing and signing off on individual commits, not merging branches.

I think the takeaway from this article is that we need to do a better job explaining the feature and what it's for. I understand now how someone new to the site would have no concept of email patch management or git cherry-pick and assume this is a nice interface for `git merge`.

As a first step, I've improved the text in the help dialogue to be clearer and to link to the blog post and intro video more prominently. Maybe a next step would be making the actual help link itself (it's a small question mark next to Your <project> Fork Queue) more visible.

Also we could detect if it was your first time using the Fork Queue and offer help - that seems to be a popular idiom for web apps these days.

Perhaps in the future we can even offer a first-class merge tool as the author suggests, but for now I've been extremely happy with how the Fork Queue complements (but doesn't replace) my offline workflow.

rjbs · on May 16, 2009

That all sounds like good stuff. A lot of people -- and in this case, I do mean highly technical users -- think that the Fork Queue is a fantastic place to see what's going on and merge it as if it was a "real" merge. I always thought this seemed nuts, and now I'm glad to see that it's not the intended use.

Personally, I never visit the fork queue, and instead use the network diagram -- which really is the bee's knees. Can you imagine clicking on a head and saying, "merge this?" I can. That would be awesome.

swombat · on May 16, 2009

Seems to me that the title of this article is a bit exaggerated.

The author should have titled it "I think Github should use pulling/merging instead of cherry-picking when dealing with forks". Of course, that would probably not have been voted to #1 on HN.

FooBarWidget · on May 16, 2009

I think the choice of cherry-picking over pulling/merging is the right one. Often I do not want to pull in all changes but only specific changes that I approve. Cherry-picking is perfect for this.

jrockway · on May 16, 2009

The problem is not that you are cherry-picking, but rather that github adds a "Signed-off-by" to the commit log. That is what breaks fast-forwarding.
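Why a changed commit blocks fast-forwarding can be shown with a toy commit graph (a sketch, not git's code): a merge can only fast-forward when your current head is already an ancestor of the commit you are merging. The signed-off copy is a distinct commit with the same parent, so the contributor's original is not in its ancestry. The commit names and the helpers `ancestors` and `can_fast_forward` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Toy commit graph: each commit id maps to a tuple of its parent ids.
Graph = Dict[str, Tuple[str, ...]]

def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Return the commit plus every commit reachable through parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen

def can_fast_forward(graph: Graph, local_head: str, other_head: str) -> bool:
    """Fast-forward is possible only if local_head is an ancestor of other_head."""
    return local_head in ancestors(graph, other_head)

graph: Graph = {
    "base": (),
    "fix": ("base",),         # contributor's commit on their fork
    "fix_signed": ("base",),  # same change, re-committed with a Signed-off-by line
}

# Before the cherry-pick, the maintainer (at base) could fast-forward to fix.
print(can_fast_forward(graph, "base", "fix"))        # True
# Afterwards, the contributor (at fix) cannot fast-forward to upstream's fix_signed.
print(can_fast_forward(graph, "fix", "fix_signed"))  # False
```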

However, "git pull" is smart enough to fix this, so it is not actually a problem.

See: http://blog.woobling.org/2009/05/githubs-fork-queue.html (notably the comments).

davidw · on May 16, 2009

I'm new to git. Can you/someone explain a bit further the advantages and disadvantages of both, and what they mean in terms of how things are actually set up in git? Articles like this make me wary of a tool that looks like it involves far too much fiddling and not enough "just working".

jemmons · on May 16, 2009

You mean the difference between a pull and a cherry-pick? I've been a user of git for a year now and this is my understanding:

git pull is essentially shorthand for two commands: git fetch and git merge. git fetch gets a copy of the remote branch you specify; git merge merges that branch into your local one.

For the sake of this conversation we're mostly interested in git merge. The primary function of git merge (in this context) is to create a new commit with your local head and the head of the remote branch you just fetched as its parents.

git cherry-pick is different. Instead of creating a commit with two other commits as parents, it creates a commit with just your local head as its parent. So a cherry-pick doesn't bring two branches together; it just extends your local branch by one more node.
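The parent structure described above can be sketched with a toy repository (plain Python objects, not git's data model). The hypothetical `merge` creates one commit whose parents are both heads, so everything on the remote branch joins the history; the hypothetical `cherry_pick` re-applies a single commit's change as a new commit with only the local head as parent. The `change` field is a stand-in for a real diff.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class Commit:
    id: str
    parents: Tuple[str, ...]
    change: str  # toy stand-in for the diff this commit introduces

Repo = Dict[str, Commit]

def merge(repo: Repo, local_head: str, remote_head: str, new_id: str) -> Commit:
    """Like a non-fast-forward `git merge`: one new commit with BOTH heads as parents."""
    commit = Commit(new_id, (local_head, remote_head), "merge")
    repo[new_id] = commit
    return commit

def cherry_pick(repo: Repo, local_head: str, picked: str, new_id: str) -> Commit:
    """Like `git cherry-pick`: re-apply one commit's change with only local_head as parent."""
    commit = Commit(new_id, (local_head,), repo[picked].change)
    repo[new_id] = commit
    return commit

repo: Repo = {
    "L1": Commit("L1", (), "initial import"),
    "L2": Commit("L2", ("L1",), "local work"),
    "R1": Commit("R1", ("L1",), "good fix"),        # contributor's fork
    "R2": Commit("R2", ("R1",), "unwanted change"),  # contributor's fork
}

m = merge(repo, "L2", "R2", "M")
p = cherry_pick(repo, "L2", "R1", "P")
print(m.parents)  # ('L2', 'R2'): R1 and R2 both become part of history
print(p.parents)  # ('L2',): only R1's change comes along, as a new commit
print(p.change)   # good fix
```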

"Well," I suppose the OP is arguing, "If the point is to take a remote's code and merge it with mine, why would I ever want to pop those changes on top of my own branch, extending it, instead of making a nice tidy commit that merges both branches?"

The answer is that sometimes (especially when you're working on public projects with anonymous contributors), you don't want all of the changes in a remote branch. You only want some of them: just the good ones.

In those cases, you can't create a commit with the remote head as one of the parents, because that would tell git that all of that branch's changes should be merged in, and that's not what you want. What you really want is a way to rebase just one or two commits from the remote branch onto your local branch.

And that is exactly what cherry-pick does. It lets you specify a single commit, then rebases that commit on top of your local branch, extending it.

I think the author's problem is he was using this tool thinking it would create commits with both remote and local parents -- that it would merge his branch and the remote one. Instead it created commits that had only his branch as a parent -- that is, it extended his branch.

When a tool does something unexpected, it can be scary. But in this case there's no cause for concern. If what he really wanted was to merge the remote and local branches, he can still do that. Git usually copes well with changes that have already been applied to the local branch via cherry-pick (or any other means): a three-way merge sees the same change on both sides and takes it once, and `git rebase` skips commits whose changes are already upstream.
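Why an earlier cherry-pick rarely gets in the way of a later merge can be sketched with a toy line-by-line three-way merge. This is a deliberate simplification, not git's merge algorithm: it assumes all three versions have the same number of lines (no insertions or deletions). When both sides made the same edit to a line, as happens when one side cherry-picked the other's commit, there is nothing to reconcile.

```python
from typing import List, Optional

def three_way_merge(base: List[str], ours: List[str], theirs: List[str]) -> Optional[List[str]]:
    """Merge two edited versions of base line by line; return None on conflict.

    Simplifying assumption: every version has the same number of lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("toy merge only handles same-length versions")
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:    # identical on both sides (e.g. the change was cherry-picked)
            merged.append(o)
        elif o == b:  # only their side changed this line
            merged.append(t)
        elif t == b:  # only our side changed this line
            merged.append(o)
        else:         # both sides changed it differently: a real conflict
            return None
    return merged

base   = ["a = 1", "b = 2", "c = 3"]
ours   = ["a = 1", "b = 20", "c = 3"]   # we cherry-picked the "b" fix earlier
theirs = ["a = 10", "b = 20", "c = 3"]  # remote has the "b" fix plus another change
print(three_way_merge(base, ours, theirs))  # ['a = 10', 'b = 20', 'c = 3']
```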

rjbs · on May 16, 2009

I agree with all the technical things you said. The issue isn't that I don't know what the Fork Queue does -- I do. It's that many users do not, and it is unclear. Before I did know what it did, I thought it was a good way to let some of the less-technical members of my team deal with simple cases of collaboration without having to think about merging too much. Because it's cherry-picking whole trees (in a manner of speaking), it led to confusion all around, and would continue to do that.

The two solutions that defunkt and I both seem to like are (a) make it clear that the Fork Queue is not a general-purpose merge tool and (b) add a merge tool, someday.

chibea · on May 16, 2009

Merging: join two (or more) branches of history. The result is a new commit, which has all the merged commits as parents.

    A1
    | \
    A2  B1
    |   |
    A3  B2
    |   |
    |   B3
    A4  |
    |  /
    A5 (parents: A4, B3)

Cherry-picking: pick one specific commit from another branch and apply it to the current branch. On your branch the commit gets a new hash (a new identity), since the commit identity is based on, among other things, a) the parents of the commit and b) the file contents at that commit. There is no information left about where the commit came from (aside from author and date).

    A1
    | \
    A2  B1
    |   |
    A3  B2
    |   |
    |   B3
    A4
    |
    A5 (resulting from cherry-picking B2)
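The point that a cherry-picked commit gets a new identity can be sketched by hashing a simplified commit object. The layout (tree line, one parent line per parent, author and committer lines, blank line, message, all hashed with SHA-1 behind a `commit <size>` header and a NUL byte) follows git's commit objects, but the tree and parent ids below are made-up placeholders. In a real cherry-pick the tree and committer usually differ as well; here they are held fixed so that only the parent changes.

```python
import hashlib
from typing import List

def commit_id(tree: str, parents: List[str], ident: str, message: str) -> str:
    """SHA-1 of a simplified git-style commit object (tree, parents, author,
    committer, message) behind a 'commit <size>' + NUL header."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {ident}")
    lines.append(f"committer {ident}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# Hypothetical 40-character ids standing in for real tree/commit hashes.
TREE = "a" * 40
B1 = "b" * 40  # B2's parent on the other branch
A4 = "c" * 40  # tip of the branch B2 is cherry-picked onto
WHO = "Jane Doe <jane@example.com> 1242432000 +0000"

original = commit_id(TREE, [B1], WHO, "Fix typo in README")
picked = commit_id(TREE, [A4], WHO, "Fix typo in README")
print(original == picked)  # False: only the parent differs, yet the id changes
```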

So both methods serve a different purpose: merging brings all of the changes of a branch into another branch; cherry-picking just brings in the changes from one commit into a branch. If you want to bring just one particular change into your branch there's no way around cherry-picking.
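The difference in what each method brings in can be checked by walking parent links in the two diagrams above (a toy model; the helper `history` is hypothetical). After the merge, B1, B2 and B3 are all part of A5's history; after the cherry-pick, A5 carries B2's change, but no B commit is reachable from it.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Tuple[str, ...]]

def history(graph: Graph, head: str) -> Set[str]:
    """Every commit reachable from head by following parent links."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen

# Parent links shared by both diagrams.
shared: Graph = {
    "A1": (), "A2": ("A1",), "A3": ("A2",), "A4": ("A3",),
    "B1": ("A1",), "B2": ("B1",), "B3": ("B2",),
}
merged: Graph = {**shared, "A5": ("A4", "B3")}  # merge commit: two parents
picked: Graph = {**shared, "A5": ("A4",)}       # new commit carrying only B2's change

print(sorted(history(merged, "A5")))  # ['A1', 'A2', 'A3', 'A4', 'A5', 'B1', 'B2', 'B3']
print(sorted(history(picked, "A5")))  # ['A1', 'A2', 'A3', 'A4', 'A5']
```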

Another approach is to develop particular things in a topic branch from the start. If you like all of the changes in this branch you can merge it later.

The aversion to cherry-picking and its bigger brother, rebasing (which basically cherry-picks all the commits of another branch), is based on the principle that each commit represents a somewhat tested state of a program in time. Every VCS wants to ensure you can go back to these tested states later. If a bug occurs later and you want to find the commit where it was introduced, you can use git's bisect functionality to find that specific commit. There are two types of bugs: original bugs and integration bugs. In the cherry-picking/rebase case you have only one commit for a feature, so you cannot easily decide whether it was an original bug or one introduced by integrating a specific feature. With merging, you end up with one commit for the feature and one for the merge (integration), and this can help in particular workflows.
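The bisect idea above is, at its core, a binary search over history for the first commit where a test fails. The sketch below assumes a linear history (oldest first), a known-good first commit, a known-bad last commit, and that once a commit is bad every later one is too; real `git bisect` works on the full commit graph and takes good/bad marks from you or from a script. The commit names and the `first_bad` helper are hypothetical.

```python
from typing import Callable, List

def first_bad(commits: List[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear history (oldest first) for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and badness never goes away.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # bug already present: look earlier
        else:
            good = mid  # still fine: look later
    return commits[bad]

log = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]
broken_since = log.index("A5")  # hypothetical: A5 introduced the bug
print(first_bad(log, lambda c: log.index(c) >= broken_since))  # A5
```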

You might want to see discussions with and by Linus Torvalds on the Linux kernel mailing list about this topic...

DannoHung · on May 16, 2009

It'd be kind of neat if the cherry-picked patch retained some idea of its original identity, allowing you to traverse forward and backward in the branch of origin, although I'm not sure how that would work as a patch traveled further and further abroad.
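Git has a partial version of this built in: `git cherry-pick -x` appends a line of the form `(cherry picked from commit <sha>)` to the new commit's message. The sketch below mimics recording and reading back that note; the helper names are hypothetical, and the note is only a textual pointer, so traversing it requires having the original commit available locally.

```python
import re
from typing import Optional

# Wording of the note that `git cherry-pick -x` adds to the new commit's message.
NOTE = "(cherry picked from commit {})"
NOTE_RE = re.compile(r"\(cherry picked from commit ([0-9a-f]{40})\)")

def record_origin(message: str, original_id: str) -> str:
    """Return a cherry-picked commit message that names the commit it came from."""
    return f"{message.rstrip()}\n\n{NOTE.format(original_id)}\n"

def origin_of(message: str) -> Optional[str]:
    """Recover the original commit id from such a message, if one is recorded."""
    match = NOTE_RE.search(message)
    return match.group(1) if match else None

orig_id = "3f786850e387550fdab836ed7e6dc881de23001b"  # hypothetical commit id
msg = record_origin("Fix typo in README\n", orig_id)
print(origin_of(msg) == orig_id)        # True
print(origin_of("Fix typo in README"))  # None
```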

swombat · on May 16, 2009

Git tends to "just work". Don't worry.

davidw · on May 16, 2009

They're talking about 'rewriting history'. In a version control system. That's more than a bit scary...

... but apparently I'm the only one who thinks so?! Come on. Git has some neat stuff, but it's not that hard to mess up your repository. And that's a bad thing for a tool that should be either helping or staying out of the way.

FooBarWidget · on May 17, 2009

I don't see how it's rewriting history. Neither cherry-pick nor Fork Queue changes history that has already been published.

That said, rewriting the history of changes before they've been published is a really useful feature. I use it all the time for merging local branches before I publish them.

rjbs · on May 16, 2009

Hyperbolizing in blog titles is an ancient and time-honored tradition. I probably should've stuck with "considered harmful" to make my tongue's placement a bit clearer.