I haven't used Fossil, but just a comment on some of that page, in the order the...

kazinator · on April 11, 2018

The "index" is a silly dongle in Git. One way to get rid of it would simply be to make it visible as a special "work in progress" top commit, visible in the history as a commit. "git add -p" would just hoard changes directly into the WIP commit, if it exists, otherwise create it first. Some sort of publish command would flip the WIP commit to retained status; then a "git add -p" would start a new WIP commit. "git push" would have some safeguard against pushing out a WIP commit.

The "--cached" option would go away; if you have a WIP commit on top, then "git diff" does "git diff --cached", and if you want the diff to the previous non-WIP commit, you just say so: "git diff HEAD^".

Stashing wouldn't have the duality of saving the index and tree. It would just save the changes in the tree. Anything added is in the WIP commit; if you want to stash that, you just "git reset HEAD^" past it, and later look for it in your reflog, or via a temporary tag.

ajross · on April 11, 2018

> The "index" is a silly dongle in Git.

Everyone thinks that until they need to use it for something. If all you do is a bunch of linear small changes with obvious implications and two-line commit messages, then the index is nothing but an extra step.

But at some point you're going to want to drop a thousand-line change (from some crazy source like a contractor or whatnot) on top of a giant source tree and split it up into cleanly separable and bisectable patches that your own team can live with. And then you'll realize what the index is for.

kazinator · on April 11, 2018

What I described supports staging small changes and turning them into individual commits. Just the staging area is a commit object rather than a gratuitously different non-commit object with different commands and command options to deal with it.

ajross · on April 11, 2018

You'd still need separate commands, though. Commit vs. commit --amend. Add vs. add -p. Diff-against-grandparent vs. diff --cached. You just want different separate commands to achieve your goal, which is isomorphic to the index.

So sure: if you want a not-index which works like the index and has a bunch of 1:1 operations that map to the existing ones, then... great. I still don't see how that's much of an argument for getting rid of the index.

kazinator · on April 11, 2018

Well; yes; the tool can't read your mind whether you'd like to batch a new change with an existing one, or make a separate commit.

Remember that git, initially, didn't hide the index in the add + commit workflow! You had to "git add" and then "git commit". So the fact there is only "git commit" to do everything is because they realized that the index visibility is an anti-pattern and suddenly wanted to hide it from view.

Since the index is already hidden from view (and largely from the user's model of the system) in the add + commit workflow, we are not going to optimize the command set by turning the index into some other representation. That's not what this is about.

The aim is consistency elsewhere.

For instance, if the index is an actual commit, then if we abandon it somehow, like with some "git reset", it will be recorded in the reflog.

Currently, the index is outside of the commit object model, so it gets destroyed.

It's possible for a git index to have content which doesn't match the working tree; in that case when the index is lost with a git reset, that content is gone.

If the index is a commit, it can have a commit message. It can be tagged, etc.
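
The difference described above can be seen in a throwaway repository. This is only a sketch: the file name, the commit messages, and the temporary directory are made up for illustration. It stages content that exists nowhere else and resets it away, then does the same with a real commit and reads it back through the reflog.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "base"

# Case 1: content lives only in the index, then the index is reset.
echo "staged only" > notes.txt
git add notes.txt
echo "base" > notes.txt             # working tree no longer matches the index
git reset -q                        # index goes back to HEAD
index_case=$(git show :notes.txt)   # what the index holds now

# Case 2: the content is committed before being abandoned.
echo "committed wip" > notes.txt
git commit -q -am "WIP"
git reset -q --hard HEAD^           # abandon the commit
recovered=$(git show 'HEAD@{1}:notes.txt')   # still reachable via the reflog

echo "index after reset: $index_case"
echo "from the reflog:   $recovered"
```

In the first case the staged blob may still sit in the object store as an unreachable object until garbage collection, but no reflog entry points at it, so finding it again takes some digging.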

krupan · on April 11, 2018

Did you read parent comment? How is dealing with the thousand-line change not possible with what they described (hint, it's totally possible, no index needed)?

WhyNotHugo · on April 11, 2018

HOW is it possible?

Imagine I have a file with lines 10 and 150 changed. How do you commit just one without some form of index (or alike)?

kazinator · on April 11, 2018

> or alike

Well, that's the weasel word. In my grandparent comment, I proposed the "alike", didn't I?

Nowhere did I say, just remove the index from Git, but don't replace its functionality with any other representation or mechanism.

In git, we can do that today in such a way that the index is only temporarily involved:

  $ git commit --patch
  ... interactively pick out the change you want ...
  $ # now you have a commit with just that change

It is not some Law of Computer Science that the above scenario requires something called an "index", which is a big archive holding all of the files in the repo, where these changes are first "staged" before migrating into a commit.

carussell · on April 11, 2018

The problem is not that Git supports staging partial changes. The problem is that Git has shoehorned a tool that "at some point you're going to want"—to help you deal with a rare occurrence—into the default workflow, forcing you to deal with the overhead of staging every time.

Izkata · on April 11, 2018

> to help you deal with a rare occurrence

..I actually use it with almost every commit, so I don't add reminder comments and debugging statements.

YZF · on April 11, 2018

It's basically the overhead of typing -a ... so git commit -a rather than git commit. It's not such a big deal. It does take a while to get used to the git "pipeline" tbh, but when the rare occurrence happens you have this option; on source control systems without this option you just don't have it.

carussell · on April 11, 2018

You're minimizing the overhead. `git commit -a` also won't help you with new or renamed files. So when you write about "the overhead of typing -a", what you really mean is the overhead of

1. typing `git commit`, checking the output, then typing `git commit -a`, or

2. typing `git commit`, then moving on with your life, and realizing minutes, hours, or days later that the changes you meant to include were not actually included, so you have to go back and add them if you're lucky, untangle them from whatever subsequent changes you were trying to make and/or do an interactive rebase if you're unlucky, and maybe face the prospect of doing a `git push --force` if you're really unlucky

Scale that up to several days or weeks to match the learning period and repeat for every developer who has to sit down and interact with it. That's the overhead we're talking about.

The article got it right; this is a monumental waste of human effort.

> Every developer has a finite number of brain-cycles

kazinator · on April 11, 2018

I've never used a version control system that didn't have to be notified about which files you would like to add. "vc commit" cannot simply pick up all files and put them under version control, because you have junk files all over the place: object files, editor backups, throwaway scratch and test files and so on.

But even when we use "git add", we are not aware of the index. The user can easily maintain a mental model that "git add" just puts files into some list of files to be pulled into version control when the next commit takes place. That is, until that silly user makes changes to the file after the git add, and those changes do not make it in because they forgot the "-a" on the commit.
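
The surprise described above is easy to reproduce. A minimal sketch, with a made-up file name, showing that `git add` records a snapshot and a later edit is left out unless it is added again or `-a` is used:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "first draft" > report.txt
git add report.txt                 # the snapshot is taken here
echo "second draft" > report.txt   # edited after the add
git commit -q -m "add report"      # no -a, so the snapshot is what gets committed

committed=$(git show HEAD:report.txt)
pending=$(git diff --name-only)    # files whose working copy differs from the index
echo "committed:         $committed"
echo "still uncommitted: $pending"
```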

YZF · on April 12, 2018

I use an IDE with git integration. So I really never worry about most of this but I do also interact with the command line. When I create a new file in my IDE it asks me if I want to add it ...

I won't argue there is a relatively long learning period with git ... It helps if you have some experienced mentors in this area. But you get a lot of power for this...

kazinator · on April 12, 2018

> asks me if I want to add it.

Where "it" is just the snapshot of that file as it is now, not as it will be at commit time; then you have to add it again!

YZF · on April 16, 2018

right, but at that point it is only git commit -a

sigjuice · on April 11, 2018

At what point do you test these "cleanly separable" and "bisectable" patches? Do you do a second pass where you check out and build/test each of these commits?

ajross · on April 11, 2018

It's pretty routine for a CI integration to test every patch, yeah. Not all do. (e.g. Gerrit-based systems generally do because the unit of review is a single patch, github likes to do integration on whole pull requests). It's certainly possible. I don't really understand your point. Are you arguing that it's preferable to dump a giant blob into your source control rather than trying to clean it up for maintainability?

sigjuice · on April 11, 2018

No, I prefer small meaningful commits. I am not for or against the index. I have no problems switching my brain between git add -A or -p as necessary. Like you said, it happens too often that someone sends you a huge pile of code (C code, in my case). My first impulse is to build and run it immediately. For me, just compiling the code can take up to an hour sometimes. Running my full test suite takes even more hours.

At some point I am ready to craft this code into multiple commits. After my first git add -p and git commit, I don't know if HEAD is in a state where it even compiles. It takes further work and discipline to produce a whole series of good commits.

joesb · on April 11, 2018

I think he is arguing that you should work on one feature at a time.

ajross · on April 11, 2018

And I was saying that as a practical matter, you don't always get that option. Individual developers working on their own code don't need the index. But then quite frankly they don't need much more than RCS either (does anyone remember RCS? Pretend I said subversion if you don't).

Situations like integrating a big blob of messy changes happen all the time in real software engineering, and that's the use case for the git index.

kazinator · on April 12, 2018

I split work consisting of multiple changes in the same file just fine under CVS and Quilt.

I would convert the change to a unified diff, remove the change, and then apply the selected hunks out of that diff with patch. ("Selected" means making a copy of the diff, in which I take out the hunks I don't want to apply. Often I'd just have it loaded in Vim, and use undo to roll back to the original diff and remove something else.)

Using reversed diffs (diff -uR), I also used to selectively remove unwanted changes, similarly to "git checkout --patch".

This is basically what git is doing; it doesn't require the index. The index is just the destination where these selective hunks are being applied.
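
A rough reconstruction of that diff-and-patch procedure, run inside a git repository so it is easy to check. Several things here are assumptions: `git diff` and `git apply` stand in for `diff -u` and `patch`, an `awk` one-liner stands in for editing the diff by hand in Vim, and the file names are invented.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
work=$(mktemp -d)                  # scratch space for the diff files
git init -q
git config user.name "Example"
git config user.email "example@example.com"

seq 1 200 > data.txt
git add data.txt
git commit -q -m "baseline"

# Two unrelated edits in the same file.
sed -e '10s/.*/ten changed/' -e '150s/.*/one-fifty changed/' data.txt > data.tmp
mv data.tmp data.txt

git diff > "$work/all.diff"        # convert the change to a unified diff
git checkout -q -- data.txt        # remove the change from the working tree

# Keep the file header plus the first hunk, then the header plus the rest.
awk '/^@@/ { n++ } n <= 1' "$work/all.diff" > "$work/first.diff"
awk '/^@@/ { n++ } n != 1' "$work/all.diff" > "$work/rest.diff"

git apply "$work/first.diff"       # touches only the working tree
git commit -q -am "change line 10"
git apply "$work/rest.diff"
git commit -q -am "change line 150"

n_commits=$(git rev-list --count HEAD)
mid150=$(git show HEAD~1:data.txt | sed -n '150p')
top150=$(git show HEAD:data.txt | sed -n '150p')
echo "commits: $n_commits, line 150 before/after: $mid150 / $top150"
```

`git commit -a` still passes through the index internally, but the hunk selection itself happens entirely in the diff files.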

chris_wot · on April 11, 2018

Is it really that many developers who don't split up their code into separate commits?

klodolph · on April 11, 2018

Personally, yes. Often I discover a commit won’t build so I do a little bit of interactive rebasing to move some dependent change into the same commit or an earlier one.

Pyxl101 · on April 11, 2018

You would compose a series of local commits and, if you wanted, test them individually before pushing them. With a bunch of changes made to your files locally, you'd use tools like `git add -i` or `git add -p` to stage subsets of your changes to make those commits. As you finish building these commits, you would be left with a series of commits to your local branch, and no additional unstaged or uncommitted changes. You're "draining" the uncommitted changes into commits, part by part. Commands that manipulate the index are how you describe what you want to do to Git.

"Index" is probably not a helpful term. I think of them as simply "staged changes", that is, changes that will be committed to the repository when I run `git commit`, as distinct from local changes that will not be committed when I run `git commit`. With a Git repository checked out, just editing a file locally will not cause it to be included in a commit made with `git commit`. Rather, `git add` is how you describe that you want a change to be included in the "staged changes" that will be committed. You can add some files and not others, or even parts of a file and not other parts.

The need for this doesn't come up especially often, but it's really helpful when it does. One common case where this can come up is when you've been developing for a while locally, and you realize that your changes pertain to two different logical tasks or commits, and you want to break them up. Maybe one commit is "upgrade these dependencies" and the other is "add feature X". You started upgrading dependencies while building feature X, but the changes are logically unrelated, and now the dependency change is bigger than you expected and deserves to be reviewed on its own.

So with all of these changes in your workspace, you'll stage just the changes for "feature X" or "upgrade dependencies" and then run `git commit`. At this point, maybe you'll move this commit into its own branch or pull request in order to code review and ship it separately. (You might use `git stash` to save the remaining uncommitted changes while you do this.) Then you'll return to the remaining changes, which you will stage and commit as well. You've just gone from a bunch of unstructured, conflated changes to two or more separate commits on different branches/PRs (if that's what you want), that can be reviewed and shipped independently. You've gone from a massive change that's too big to review, to multiple bite-sized pieces.

These tools are also especially helpful if you, for any reason, need to manipulate source control history, such as breaking up one already-made commit into several commits, or simply modifying an existing commit. To do this, you would take that commit, apply it to the local workspace as if it's an unstaged change, and then, starting from the point in history before that commit was made, stage parts of the changes again and check them in. At this point, you can push the changes as a new branch, or even rewrite history by replacing your existing branch.

To give a use-case for this last capability, imagine that a developer accidentally checks in sensitive data as part of a big commit. Before (or even after) shipping the change, you realize this, so you want to go back and edit that commit, to remove the part of the change that checked in data while leaving the rest of the changes. You would describe these manipulations with the index as described in the previous paragraph.

krupan · on April 11, 2018

You seem to miss the point that instead of staging changes out of your working directory and then committing them, you could just commit those changes out of your working directory. The extra step of staging is not needed.

Everywhere you said stage you could say commit (or amend) and then not need the extra step of committing afterwards.

Pyxl101 · on April 11, 2018

> You seem to miss the point that instead of staging changes out of your working directory and then committing them, you could just commit those changes out of your working directory. The extra step of staging is not needed.

How would you describe this? One massive `git commit` with a ton of parameters? I don't see how it could work.

How would you describe "commit the first hunk of fileA (but not the second), and the second hunk of fileB (but not the first), and all of file C?". How do you "just commit those changes"? I believe you are missing how to actually describe this on the command line or with an API.

The index is absolutely needed. It's what allows you to build up a commit through a series of small, mutating commands like `git add fileC`, `git add -p fileA`. The value of the index is that you can build up your pending commit incrementally, while displaying what you've got with `git status`, then adding to it or removing from it.

fyi1183 · on April 11, 2018

What krupan and kazinator are saying is that you would use exactly the same command as today, i.e. git add -p etc.

However, unlike today, those commands would automatically also do the equivalent of:

    if HEAD is marked as WIP
    then git commit --amend
    else git commit --special-wip-flag

As somebody who understands and uses the git index, I would wholly approve of this change.
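
The pseudocode above can be approximated with stock git today. This is only a sketch of the idea, not how git works: the "WIP" subject line as the marker, and the `wip_add` and `wip_publish` helpers, are hypothetical conventions invented here, and plain paths are used instead of `-p` so it runs unattended.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical convention: a top commit whose subject is exactly "WIP"
# plays the role of the staging area.
wip_add() {
    git add -- "$@"
    if [ "$(git log -1 --format=%s 2>/dev/null)" = "WIP" ]; then
        git commit -q --amend --no-edit    # fold into the existing WIP commit
    else
        git commit -q -m "WIP"             # start a new WIP commit
    fi
}

# Hypothetical "publish" step: give the WIP commit its real message.
wip_publish() {
    git commit -q --amend -m "$1"
}

echo "base" > a.txt
git add a.txt
git commit -q -m "base"

echo "one" > a.txt
wip_add a.txt
echo "two" > b.txt
wip_add b.txt
count_before=$(git rev-list --count HEAD)   # base + one WIP commit

wip_publish "Add a and b"
subject=$(git log -1 --format=%s)
echo "commits: $count_before, published as: $subject"
```

Because every intermediate state is a commit, each `wip_add` leaves a reflog entry. The push safeguard from the original proposal is not modelled here.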

kazinator · on April 11, 2018

Do you know how to use Git at all?

You can build a commit using multiple small "commit --amend --patch" commands. These use the index, but only in a fleeting, ephemeral way; changes go into the index and then immediately into a commit. They go into a new commit, or if you use --amend, into the existing top-most commit.

The index is a varnish onion.

Git has too many kinds of objects in its model which are all bags of files.

I do not require a staging area that is neither commit, nor tree.

Look at GIMP. In GIMP, some layer operations get staged: you get a temporary "floating layer". This gets committed with an "anchor" operation, IIRC. But it's the same kind of thing: a layer. It's not a "staging frame" or whatever, with its own toolbox menu of operations.

jsolson · on April 11, 2018

How do you build up the incremental changes? By staging them into your working directory from the one giant commit?

kazinator · on April 11, 2018

Huh? In the obvious way:

  $ git commit --patch 
  ... pick out changes: commit 1 ...

  $ git commit --patch
  ... pick out changes: commit 2 ...
  
  $ git commit --amend --patch
  ... pick out more changes into commit 2 ...

  $ git commit --patch
  ... pick out changes: commit 3 ...

There, now we have three commits with different changes and were never aware of any "index"; it was just used temporarily within the commit operation.

Oops, the last two should have been one! --> git rebase -i HEAD~2, then squash them.

The index is too fragile. You could spend 20 minutes carving out very specific changes to stage into the index. And then you do something wrong and that staging work is suddenly gone. Because it's not a commit, it's not in the reflog.

You want changes in proper commit objects as early as possible, then massage with interactive rebase, and ship.

Suppose HEAD points to a huge commit we would like to break up. One simple way:

   $ git reset HEAD^

Now the commit is gone and the changes are local. Then just do the above procedure: commit --patch, etc.
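
A small end-to-end sketch of that split, under some assumptions: the file names are invented, and the pieces are selected by path (`git commit -- <path>`) rather than with `--patch` so the example runs without prompts.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > parser.c
echo "v1" > docs.txt
git add parser.c docs.txt
git commit -q -m "base"

# One commit that mixes two unrelated changes.
echo "v2" > parser.c
echo "v2" > docs.txt
git commit -q -am "huge commit"
big=$(git rev-parse HEAD)

git reset -q HEAD^                           # changes are back in the working tree
git commit -q -m "update parser" -- parser.c # commit only this path
git commit -q -m "update docs" -- docs.txt

n_commits=$(git rev-list --count HEAD)
leftover=$(git status --porcelain)
echo "commits now: $n_commits"
git diff --quiet "$big" HEAD && echo "final tree matches the original commit"
```

The original commit stays reachable through the reflog (and `ORIG_HEAD` right after the reset), so a botched split can be undone.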

sigjuice · on April 11, 2018

Thanks for 'you want changes in proper commit objects as early as possible'. I can't tell you how much time I have wasted with a botched index.

kazinator · on April 11, 2018

>Rather, `git add` is how you describe that you want a change to be included in the "staged changes" that will be committed. You can add some files and not others, or even parts of a file and not other parts.

That's largely outdated. For years now, git's commit command has been able to stage changes and squirrel them into the commit in (apparently to the user) one operation.

Only people who learned git ten years ago (and then stopped) still do "git add -p" and then a separate "git commit" instead of just "git commit --patch" and "git commit --amend --patch" which achieve the same thing.

xenomachina · on April 11, 2018

I've had the same feeling about the index, except I want the opposite setup. Instead of the index becoming a commit, I want to always commit my working tree, and for there to be a special sort of stash for the stuff I don't want to commit just yet. I want to commit what's in my working tree so I can test before committing.

The way it would work is that when I realize I want to do a partial commit I'd stash, pare down the changes in my working tree (probably using an editor with visual diff), test what's in my working tree, commit, and then pop the stash.

I had hoped that this would already be doable with git, but it isn't, at least not in a straightforward way. The problem shows up when you try to apply that stash. You get loads of merge conflicts, because git considers the parent of the stash to be the parent of HEAD, and HEAD contains a bunch of the same changes.

I'm sure there's some workaround for this, but every time I've asked people always tell me to not bother testing before committing!

tome · on April 11, 2018

Hmm, I'm not sure exactly what you're after, but how about this

    git commit --patch # Just commit the bits you want
    git stash # Stash the rest
    <do your tests>
    git stash pop
    <continue developing>

xenomachina · on April 11, 2018

Yes, that works, but it means I'm committing with a dirty work tree, so I can't really test before I commit.

You're probably going to say I shouldn't test before every commit, but I rarely work in a branch where the bar is as low as "absolutely no testing required". I generally at least want my build to pass, or some smoke tests to pass, and I can't reliably verify either of those with a dirty work tree. And actually, the fact that all of the commits on my branch are effectively going to end up in master (unless I squash) makes me want to have even my feature branches fully tested.

tome · on April 12, 2018

> Yes, that works, but it means I'm committing with a dirty work tree, so I can't really test before I commit.

Ah, I see what you want to do now.

> You're probably going to say I shouldn't test before every commit

I have no business telling you what you should do. If you want to test before committing, your wish is my command

    git stash --patch # Stash the bits you don't want to test
    <do your tests>
    git commit <options> # Commit the rest when the tests pass
    git stash pop
    <continue developing>

xenomachina · on April 12, 2018

Interesting! I guess that works because the stash doesn't contain the changes I'm going to commit now, and so it doesn't conflict (at least in simple cases).

I never use `--patch` (even with `git add`). I prefer to use vim-fugitive, which lets me edit a diff of the index and my working tree. It looks like being able to do something similar with stashes is a requested, but not yet implemented, feature for vim-fugitive: https://github.com/tpope/vim-fugitive/issues/236
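
The stash-first, test, then commit sequence from tome's reply can be tried without `--patch` by stashing whole paths. A sketch under stated assumptions: the file names are made up, `run_tests` is a hypothetical placeholder for a real test suite, and `git stash push -- <path>` needs a reasonably recent git.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > feature.txt
echo "v1" > debug.txt
git add feature.txt debug.txt
git commit -q -m "base"

echo "v2" > feature.txt            # ready to commit
echo "v2" > debug.txt              # not ready yet

# Hypothetical placeholder for the real test suite: it checks that the
# tree under test has the feature change but not the debug change.
run_tests() {
    grep -q "v2" feature.txt && grep -q "v1" debug.txt
}

git stash push -q -- debug.txt     # stash only the part that is not ready
run_tests                          # the working tree is exactly what will be committed
git commit -q -am "feature change"
git stash pop -q                   # bring the unfinished work back

committed_feature=$(git show HEAD:feature.txt)
committed_debug=$(git show HEAD:debug.txt)
working_debug=$(cat debug.txt)
echo "HEAD: feature=$committed_feature debug=$committed_debug; working debug=$working_debug"
```

The pop applies cleanly here because the stash holds only the change that was left out of the commit, which matches the explanation above.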

tome · on April 13, 2018

All I can say is I use

    git commit --patch

(and therefore `add --patch` is never even required!)

    git checkout --patch

to selectively revert hunks, and

    git stash --patch

to selectively stash. I couldn't be happier with this workflow!

carussell · on April 11, 2018

This workflow can be yours, in 10 easy* payments:

  $ git stash
  $ git checkout stash@{0} -- ./
  $ $EDITOR # pare your changes down
  $ make runtests # let's assume they pass
  $ git add ./
  $ git commit
  $ git checkout stash@{0} -- ./
  $ make runtests # let's assume they pass
  $ git add ./
  $ git commit

*(In case it appears otherwise, this isn't actually supposed to be a defense of Git. Originally, this was 15 steps, but I edited it into something briefer and more straightforward.)

andreareina · on April 11, 2018

What you describe changes the name from "index" to "WIP commit", keeping the same semantics. Along the way, you now have a "commit" that doesn't behave like a commit, further adding to the potential for confusion. I strongly believe that things that behave differently should be named differently.

kazinator · on April 11, 2018

> What you describe changes the name from "index" to "WIP commit", keeping the same semantics.

I.e. you get it.

Importantly, the semantics is available through a common interface rather than a different design and implementation of the semantics for the index versus commits.

> you now have a "commit" that doesn't behave like a commit

Well, now; literally now you have a commit that doesn't behave like a commit: the index.

If a real commit is used for staging, it behaves much more like a commit. It's just attributed as do-not-publish so it doesn't get pushed out. Under this model, all commits have this attribute; it's just false for most of them. Thus, it isn't a different kind of commit.

> I strongly believe that things that behave differently should be named differently.

Things that do not behave completely differently can use qualified names, in situations when it matters:

"work-in-progress commit; tentative commit; ...."

For instance we use "socket" for both TCP and UDP communication handles, or both Internet and Unix local ones.

Hello71 · on April 11, 2018

1. I actually ran it, and ironically, `git log --oneline --reverse` runs faster than `fossil descendants` on the sqlite repository for new commits, and just as fast on old commits. Perhaps fossil would do better on a large repository, but I doubt it. git has many flaws, but "insufficient optimization (compared to alternatives)" is not one of them.

_ugfj · on April 10, 2018

As branch names are not recorded in commits, it is pretty much impossible to say what commit was next in a branch.

ajross · on April 11, 2018

That's because commits are not owned by branches. The same commit can be present in two or more separate branches with different successor commits in each. And this is a feature, not a bug! It's what allows you to ask the question "Where did these two branches fork?".
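
A small illustration of a commit shared by two branches and of the "where did they fork?" question, answered with `git merge-base`. Branch and file names are invented for the example.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo 1 > shared.txt
git add shared.txt
git commit -q -m "shared commit"
fork_point=$(git rev-parse HEAD)
trunk=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b feature
echo 2 > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q "$trunk"
echo 3 > trunk.txt
git add trunk.txt
git commit -q -m "trunk work"

# The shared commit has a different successor on each branch.
base=$(git merge-base "$trunk" feature)
n_branches=$(git branch --contains "$fork_point" | wc -l)
echo "fork point: $base"
echo "branches containing it: $n_branches"
```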

Now, I don't know how Fossil answers that question. Maybe it's got some clever trick (like a separate "universal ID" vs. "per-branch ID" for each commit, maybe). Maybe SQLite doesn't need that and doesn't care. But it's not like this is a simple feature request. Git was designed the way it was for a reason, and some of us like it that way.

cookiecaper · on April 11, 2018

> Git was designed the way it was for a reason, and some of us like it that way.

For the record, the initial version of git was developed in haste as a replacement for bitkeeper, after Larry McVoy (who shows up here on HN once in a while, and always has great posts) got a little aggressive about the licensing.

BitKeeper was open-sourced a year or two back: https://github.com/bitkeeper-scm . I haven't had a chance to use it extensively yet, but I hear it still has a good selection of features that git either chose not to implement or didn't properly understand.

Don't get me wrong, I think git is pretty great and have been one of its primary advocates over SVN, etc., at most companies I've worked with (I think 2013 was the first time I came on board a team that was already using it), but its history is informative.

EDIT: upon scrolling, I see that Larry has already dropped by. Read his posts! https://news.ycombinator.com/item?id=16806588

_ugfj · on April 11, 2018

For the record, in my opinion git is the worst SCM among the distributed ones and I truly believe it was only hype that carried it. The user interface is genuinely user hostile with so many commands and yet many commands have switches that change the behavior so much it very well might be a different command. And so forth.

gmueckl · on April 11, 2018

Git is just a patch database management system that tries too hard to become a version control system. It is proof that this approach is fundamentally misguided.

Macha · on April 11, 2018

However, mercurial offers both named branches (hg branch) and pointer branches (hg bookmark) and when I last used it, it seemed the consensus was shifting to one of two positions, (a) just use hg bookmark or (b) use named branches for long lived branches only (master/default, develop, release branches etc) and bookmarks for feature branches/dev use.

vlovich123 · on April 10, 2018

Have you looked at the --branches option for git log? It annotates which branch(es) a commit is part of unless you're talking about something else. There's obviously some cost but all this talk of "slow" is ignoring the fact that in practice on small repos (& yes - sqlite is small for git) you're not going to notice it. Also, you can limit your branches to those you're interested in to speed things up.

avar · on April 11, 2018

No, it's not impossible. It's quite easy. What it isn't is immutable. I.e. you can have a repo now with just "master", and I can push a new branch one for each commit in it, and make any such tool output useless garbage.

I.e. in some DAG implementations the branch would be a fundamental property at write time, in git it's just a small bit of info on the side.

What I'm referring to is that for the common case of something like the SQLite repository that uses branches consistently it's easy to extract info from git saying "this commit is on master, and on the LHS of any merge to it", or "this commit is on master, but also the branch-off-point for the xyz branch".

The branch shown in the article is a perfect example of this. In that case all the same info exists to show the same sort of graph output in git (and there's even options to do that), it's just not being done by the web UI the author is using.
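
One way to pull out the "on the LHS of any merge" view is `git log --first-parent`. The sketch below uses invented branch names and commit subjects and assumes the project merges with `--no-ff`, so the mainline is the first-parent chain.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo 1 > f.txt
git add f.txt
git commit -q -m "trunk 1"
trunk=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo 2 > g.txt
git add g.txt
git commit -q -m "feature 1"

git checkout -q "$trunk"
echo 3 > h.txt
git add h.txt
git commit -q -m "trunk 2"
git merge -q --no-ff -m "merge feature" feature

# Follow only first parents: commits made on the trunk, plus merges into it.
mainline=$(git log --first-parent --format=%s | tr '\n' ',')
everything=$(git rev-list --count HEAD)
echo "mainline: $mainline"
echo "all reachable commits: $everything"
```

As noted above, this relies on convention rather than on anything recorded in the commits, so a repository with fast-forward merges or reshuffled branches will give a less meaningful answer.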

petre · on April 11, 2018

I can view all the branches simultaneously in fossil by opening a web page or by running fossil timeline.

https://www.fossil-scm.org/index.html/timeline

WhyNotHugo · on April 11, 2018

The lack of an index (or something alike) was what made hg/svn unusable for me once I'd moved over to git. The ability to closely review what I'm committing (and not commit "every single change in the repo") is a must.