
I think the current Microsoft GUID is just UUIDv7.

https://learn.microsoft.com/en-us/dotnet/api/system.guid?vie...

I don't think there's a "Microsoft standard"; they just use different versions of UUID in different products over time. No idea why they call it GUID instead of UUID, but it's easier to say out loud, so I'm not against it.

v7 has a timestamp indeed, but isn't the time making it more collision resistant? You'd have to generate tons of UUIDv7s in the same millisecond, while v4 is more likely to collide due to not being time-constrained and the birthday paradox.

I think both have their uses though. You might need pure random if you want your UUID not to convey any time information and you're not generating tons of them (e.g. a random user id).
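For anyone wondering what "time information" means concretely, here is a minimal sketch of the v7 layout as I understand it from RFC 9562: 48 bits of Unix milliseconds, 4 version bits, 12 random bits, 2 variant bits, 62 random bits. The helper names `make_uuid7` and `uuid7_timestamp_ms` are made up for illustration, and the sketch ignores the optional monotonic counter some generators use.

```python
import os
import time
import uuid


def make_uuid7(unix_ms=None):
    """Hypothetical helper: build a UUIDv7-shaped value (RFC 9562 layout)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    # 80 random bits; 6 of them get overwritten by version/variant below,
    # leaving 74 bits of actual randomness.
    rand = int.from_bytes(os.urandom(10), "big")
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    # Version nibble (bits 79..76) = 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Variant (bits 63..62) = binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


def uuid7_timestamp_ms(u):
    """The top 48 bits of a v7 UUID are the creation time in Unix ms."""
    return u.int >> 80


u = make_uuid7()
print(u, uuid7_timestamp_ms(u))
```

Anyone holding the ID can read the creation time back out, which is the reason to prefer v4 for something like a public user id.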

What do you mean "model"? Are you referring to UUIDv1 which has time and MAC address?



Years ago, Microsoft took the same algorithm that was being used to generate these things for Remote Procedure Calls in the Open Software Foundation's Distributed Computing Environment and used that algorithm to generate IDs for its Component Object Model. This was all happening in the late 1980s, and at a point where none of it was hard and fast.

If you were doing RPC in OSF DCE your IDs were UUIDs, and if you were doing COM in Wintel your IDs were GUIDs; and that was basically the difference, a different name for the same thing when used in a different domain.

Plus the difference in endianness because one was a network-byte-order network thing and the other was an Intel Architecture byte order thing, and only some parts of these IDs were technically multiple-byte integers with byte orders to have.
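A quick way to see that byte-order difference is Python's standard `uuid` module, which exposes both layouts: `bytes` is the network-order form and `bytes_le` swaps the first three fields (the ones that are multi-byte integers) into little-endian order, leaving the last eight bytes alone. The example value below is arbitrary.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

network_order = u.bytes    # big-endian fields, as in the RFC / DCE form
guid_order = u.bytes_le    # first three fields little-endian, as in COM GUIDs

print(network_order.hex())
print(guid_order.hex())

# Round-tripping through the little-endian form gives back the same ID.
assert uuid.UUID(bytes_le=guid_order) == u
```

Only the 4-byte, 2-byte, and 2-byte leading fields are reversed; the trailing eight bytes are plain byte arrays with no byte order to flip.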

But by the late 1990s this had already become lost to history, with a sea of people who had made all sorts of inferences and promoted them as gospel truth, from the fact that Microsoft had two programs named GUIDGEN.EXE and UUIDGEN.EXE, from the fact that many generators sprang up and the whole idea spread to Java and databases and this new-fangled WorldWeb thing and all sorts of stuff, from the fact that there sprang up multiple different versions of these IDs and what version an ID was depended on tooling and libraries, and from the fact that at the time Microsoft was less likely to go through formal standards processes and more likely to just write and ship things and sponsor a book and a CD-ROM of doco, so if your world was RFCs and the IETF you had one worldview and if your world was Microsoft Press and the MSDN you had another worldview.


> isn't the time making it more collision resistant?

That seems to depend a whole lot on the pattern your application generates UUIDs in. If you're generating a consistent distribution over time, sure. If you generate a whole lot in bursts, collision seems to be way more likely.


You have to generate on the order of 2^37 (137,438,953,472) UUIDv7s in the exact same millisecond to have a roughly 50% chance of collision.
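For reference, a small sketch of the birthday math behind that figure, assuming all 74 non-timestamp, non-version bits of a v7 UUID are uniformly random (no monotonic counter). The function names are mine. Under the usual approximation p = 1 - exp(-n(n-1)/(2N)), exactly 2^37 IDs in one millisecond comes out nearer 39%, and 50% needs about 1.18 times that, so 2^37 is the right order of magnitude.

```python
import math


def collision_probability(n, bits):
    """Birthday approximation: chance of at least one collision among n IDs."""
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))


def n_for_probability(p, bits):
    """Approximate number of IDs needed to reach collision probability p."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


print(collision_probability(2 ** 37, 74))
print(n_for_probability(0.5, 74))
```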

(Not disagreeing with you, just adding perspective.)


The math is interesting here as you'll probably want to run your system for several years, not just a single millisecond. So it's a repeated trials problem. I spent some time trying to figure out the ID generation rate that would be a "break even point" between UUIDv4 vs UUIDv7, but I didn't trust the answer I got.

(Agreeing with both parents)


Good observation. Could you share the math even if you don't trust it? I don't have pen and paper here and I'm curious.

After thinking about it more, I have the feeling (against my initial intuition) that v4 might dominate either way unless you consistently generate tons of UUIDs for an impractical number of years.


I ran some numbers by GPT-5[0], and for the scenario of generating 10k UUIDs in one ms every 10ms, over three years, it came up with a 0.0025% chance of collision for UUIDv7, and a 0.000000084% chance for collision with UUIDv4.
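Those figures can be sanity-checked with a few lines of Python. This is a sketch under simplifying assumptions that are mine, not from the linked chat: v7 IDs only collide within the same millisecond and have 74 random bits each, v4 IDs have 122 random bits and can collide with any other ID ever generated, and a year is 365.25 days.

```python
import math

V7_RANDOM_BITS = 74    # assumption: no counter, all non-timestamp bits random
V4_RANDOM_BITS = 122


def birthday(n, bits):
    """Approximate chance of at least one collision among n random IDs."""
    return -math.expm1(-n * (n - 1) / (2 * 2 ** bits))


burst_size = 10_000                                      # IDs per busy millisecond
bursts = int(3 * 365.25 * 24 * 3600 * 1000 / 10)         # one burst every 10 ms for 3 years

# v7: each burst is an independent trial, so combine them.
p_burst = birthday(burst_size, V7_RANDOM_BITS)
p_v7 = -math.expm1(bursts * math.log1p(-p_burst))

# v4: one big pool of every ID generated over the whole period.
p_v4 = birthday(burst_size * bursts, V4_RANDOM_BITS)

print(f"v7: {p_v7 * 100:.4f}%")
print(f"v4: {p_v4 * 100:.9f}%")
```

With these assumptions the results land on roughly 0.0025% for v7 and 0.000000084% for v4, in line with the numbers above.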

[0] https://kagi.com/assistant/dd7d8c48-44e4-499b-9f2f-33663d125...


I checked against my notes, and I see about the same numbers using the `n**2` Taylor series approximation. I missed that the probability of `>=1` collision is about the same as that of exactly one collision, but I suspect that's quite reasonable at this scale.



