AFAIK the only "crypto" in git is GPG used to sign tags. The content addressable...

haberman · on Dec 25, 2013

Git assumes that a matching SHA1 means that the content is equal to the original content. Is that not crypto? For example, if you sign a tag, it appears to sign the SHA1 of the associated content.

This is definitely outside of my expertise, so I'm sure that my understanding is incomplete. The larger questions for me are:

- if git's SHA1 content-addressable design is not crypto, how do you distinguish crypto from software like git that uses cryptographic primitives for useful purposes?

- is a project like git a safe/sane thing for a non-cryptographer to design and implement? If so, why do all the warnings in this article not apply?

Nursie · on Dec 25, 2013

SHA1 is a hash. That alone does not make it 'crypto'. It's just a hash.

It happens to be a hash with fairly good collision resistance. Maybe not the best we have for modern crypto purposes, but that's not what it's being used for. It's being used to check and record unique tags for patches (as far as I know, my git knowledge is far from complete).

Cryptography does concern itself with data integrity, but the reverse does not have to be true: checking data integrity isn't necessarily cryptography if it happens in a part of the system you don't care about mitigating attacks on.

That's my take anyway.

--Edit-- for completeness I should add that SHA1 by itself is absolutely not a MAC
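
To make the hash-versus-MAC distinction concrete, here is a minimal sketch using Python's standard `hashlib` and `hmac` modules. The key and messages are made up for illustration. The point is only that anyone can recompute a bare SHA1 over altered data, while a MAC tag depends on a secret key.

```python
import hashlib
import hmac

# Hypothetical values, for illustration only.
key = b"shared-secret-key"
original = b"pay alice 10"
tampered = b"pay mallory 1000"

# Plain SHA1: whoever alters the message can simply recompute the digest,
# so a (message, digest) pair says nothing about who produced it.
attacker_digest = hashlib.sha1(tampered).hexdigest()
plain_check_passes = hashlib.sha1(tampered).hexdigest() == attacker_digest

# HMAC-SHA1: the tag depends on a key the attacker does not hold.
genuine_tag = hmac.new(key, original, hashlib.sha1).digest()
forged_tag = hmac.new(b"attacker-guess", tampered, hashlib.sha1).digest()
expected_for_tampered = hmac.new(key, tampered, hashlib.sha1).digest()
mac_check_passes = hmac.compare_digest(forged_tag, expected_for_tampered)

print(plain_check_passes)  # True: the bare hash "verifies" the tampered message
print(mac_check_passes)    # False: the forged tag fails without the real key
```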

alinajaf · on Dec 25, 2013

I'm not a cryptography expert either, but I'll give this a crack...

Inasmuch as SHA1 is a "cryptographic hash function", Linus isn't taking advantage of a few of its cryptographic properties in his usage of it in git. It would, for example, make no difference to the workings of git if you could reverse-engineer the contents of an object from its SHA1. In the same way, it doesn't matter much to the operation of git that you can generate collisions for SHA1, though if you were running into collisions all the time, it would make everyday usage difficult.

> if git's SHA1 content-addressable design is not crypto, how do you distinguish crypto from software like git that uses cryptographic primitives for useful purposes?

If I'm understanding you correctly, software that "uses cryptographic primitives for useful purposes" is usually trying to guarantee one or more of the following:

* Confidentiality - Keeping data secret

* Integrity - Making sure data hasn't been tampered with

* Authentication - Making sure that the person you think sent the data is in fact the person who sent the data.

* Non-repudiation - Ensuring that the person who sent the data can't deny that they in fact sent the data.

Git makes guarantees about none of the above in its usage of SHA1. You could argue that it makes a guarantee of integrity in its content addressable file store, but it doesn't. If you can modify the files in the .git/ directory, you can screw up the repository to your heart's content. There's no way to do so remotely, i.e. by creating and pushing a Git commit with an existing SHA1. You typically protect Git from local tampering by only allowing access via SSH, which has plenty of crypto in it.

When it does make guarantees about authentication (signing tags) it uses GPG more or less off the shelf. In that case the SHA1 is a reference to a commit object, and you're saying that "I, Najaf Ali, sign off on the commit with this SHA1". It doesn't guarantee anything about the contents of that commit if the repository has been tampered with.

> is a project like git a safe/sane thing for a non-cryptographer to design and implement? If so, why do all the warnings in this article not apply?

See above on git not making the guarantees that software that "uses cryptographic primitives for useful purposes" tends to make. Since Git makes none of those guarantees, it's (I think) a safe/sane thing for a non-cryptographer to design and implement. In practice, what Linus has done is let other off-the-shelf crypto (SSH and GPG) make the required guarantees for him.

haberman · on Dec 25, 2013

Thanks for this, you answered my questions thoroughly. I'm not entirely convinced by this though:

> [A signed commit] doesn't guarantee anything about the contents of that commit if the repository has been tampered with.

I think most people would intuitively expect the signed commit to guarantee the contents of the tree being signed. The idea that you could "git pull" a repo from a compromised machine, verify the signed commit, but not actually have a guarantee about the tree matching the one that was signed would run counter to most people's expectations, I suspect.

In other words, this to me seems like a "technically, we don't guarantee" statement about something that is de facto thought to be guaranteed.

alinajaf · on Dec 25, 2013

I did a bit of quick reading on this and at first glance my description of how git tagging works appears to be on point, i.e. all it guarantees is that a particular user asserts that tag X points to commit with SHA1 Y.

I'm not sure that it says anywhere in the documentation that it guarantees anything more than that, but I agree that a significant proportion of developers would intuitively expect the entire content of the tree to be signed rather than just the SHA1.

haberman · on Jan 3, 2014

> I'm not sure that it says anywhere in the documentation that it guarantees anything more than that, but I agree that a significant proportion of developers would intuitively expect the entire content of the tree to be signed rather than just the SHA1.

Further evidence that they do assume that: https://news.ycombinator.com/item?id=7003900

alinajaf · on Jan 7, 2014

I agree that they do assume that, but fail to see what connection that has to the actual workings of git. AFAIK the behaviour of software doesn't change in accordance with how developers think it works.

tsahyt · on Dec 25, 2013

I'm no expert but as far as I understand, the signature would be valid as long as the hash stays the same. If the commit has been tampered with in such a way that the hash does not change, the signature would still appear valid.
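
A toy sketch of that point, under loud assumptions: `weak_id` truncates SHA1 to 2 bytes so that a collision can be found in a fraction of a second, and an HMAC stands in for a GPG signature. Neither is what git actually does. It only shows that a signature over an ID keeps verifying for any content that produces the same ID.

```python
import hashlib
import hmac
from itertools import count

def weak_id(data: bytes) -> bytes:
    """Toy object ID: SHA1 truncated to 2 bytes so collisions are cheap to find."""
    return hashlib.sha1(data).digest()[:2]

def sign(key: bytes, object_id: bytes) -> bytes:
    """Stand-in for a GPG signature: a keyed tag over the ID only, not the content."""
    return hmac.new(key, object_id, hashlib.sha256).digest()

def verify(key: bytes, object_id: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(key, object_id), signature)

def find_collision(target: bytes) -> bytes:
    """Brute-force different content with the same weak ID as `target`."""
    wanted = weak_id(target)
    for i in count():
        candidate = b"tampered content %d" % i
        if weak_id(candidate) == wanted:
            return candidate

key = b"hypothetical-signing-key"
original = b"original content"
signature = sign(key, weak_id(original))

forged = find_collision(original)

# The signature was made over the ID, so it still verifies for the forged content.
print(forged != original)                         # True
print(verify(key, weak_id(forged), signature))    # True
```

With full 160-bit SHA1 the same brute-force search is not practical, which is why the scheme holds up only as long as the hash does.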

tsahyt · on Dec 25, 2013

I thought the hash was in fact used for ensuring data integrity. That's pretty much what Linus stated when he said you have a guarantee that the data you put into your repository is exactly the data you get out of it.
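
As a rough sketch of how that check works for a single file: git names a blob by the SHA1 of a short header (`blob`, the size in bytes, a NUL byte) followed by the content, so re-hashing what you read back and comparing it to the name detects any change. The helper names below are my own, not git's.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA1 object ID git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def blob_is_intact(expected_id: str, content: bytes) -> bool:
    """Re-hash the content and compare it with the ID it was stored under."""
    return git_blob_id(content) == expected_id

stored_id = git_blob_id(b"hello world\n")
print(stored_id)
print(blob_is_intact(stored_id, b"hello world\n"))  # True
print(blob_is_intact(stored_id, b"hello w0rld\n"))  # False
```

The printed ID should match what `git hash-object` reports for a file with the same bytes.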

rcxdude · on Dec 25, 2013

git's SHA1 is useful in one sense cryptographically: if you have the SHA1 hash of a commit, it represents the entire state of the repository and its history up to that commit, so if you have a copy of the hash you can verify that any given copy of the repo has not been tampered with (but a way to generate collisions would subvert this). This has been used in a few cases when repository servers have been broken into (and a similar feature in BitKeeper allowed the detection of an attempted backdoor insertion in Linux).
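
A minimal sketch of why one hash covers everything: a commit object names its tree and its parent commits by hash, and a tree names its files by hash, so changing any file anywhere in the history changes every commit ID after it. The code below handles only a flat directory of regular files, and the author line is a made-up fixed value so the output is deterministic.

```python
import hashlib

def object_id(kind: bytes, body: bytes) -> bytes:
    """Raw 20-byte SHA1 of a git object: '<kind> <size>' + NUL + body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()

def tree_id(files: dict) -> bytes:
    """ID of a flat tree of regular files (name -> content), sorted by name."""
    body = b""
    for name, content in sorted((n.encode(), c) for n, c in files.items()):
        body += b"100644 " + name + b"\0" + object_id(b"blob", content)
    return object_id(b"tree", body)

def commit_id(tree: bytes, parents: list, message: str) -> bytes:
    # Hypothetical fixed author/committer, for illustration only.
    who = "Example Author <author@example.com> 0 +0000"
    lines = ["tree " + tree.hex()]
    lines += ["parent " + p.hex() for p in parents]
    lines += ["author " + who, "committer " + who, "", message, ""]
    return object_id(b"commit", "\n".join(lines).encode())

second_tree = tree_id({"main.c": b"int main(void) { return 0; }\n", "README": b"hi\n"})

# Two histories that differ only in the content of the first commit.
good = commit_id(tree_id({"main.c": b"int main(void) { return 0; }\n"}), [], "initial")
evil = commit_id(tree_id({"main.c": b"int main(void) { return 1; }\n"}), [], "initial")

# The second commit has the same tree and message in both histories.
child_of_good = commit_id(second_tree, [good], "add README")
child_of_evil = commit_id(second_tree, [evil], "add README")

print(good != evil)                    # True
print(child_of_good != child_of_evil)  # True: the change propagates to descendants
```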