
What is the risk? It seems very small to me.


The risk is that some divisions will be way off. If it is part of a chain of calculations, e.g. a stress analysis, or a spreadsheet behind a financial report, the error propagates and the end result could be bad.
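
To make that concrete, here is a sketch (hypothetical numbers; the 6.1e-5 factor is roughly the worst-case relative error the FDIV bug produced) of one bad division tainting everything downstream:

    #include <stdio.h>

    int main(void) {
        double load = 4195835.0, area = 3145727.0;    /* hypothetical inputs */
        double stress      = load / area;             /* correct quotient */
        double stress_fdiv = stress * (1.0 - 6.1e-5); /* simulate worst-case FDIV error */

        /* Everything computed from the bad quotient inherits the error. */
        double margin      = 2.0 / stress;
        double margin_fdiv = 2.0 / stress_fdiv;
        printf("safety margin: %.9f vs %.9f\n", margin, margin_fdiv);
        return 0;
    }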


Divisions being off isn't the end of the world. Even without the bug, a division can be off simply due to the use of fixed-precision floats.

Stress analysis and financial reports are more likely to be wrong due to other sources of error than a division being slightly off. If you really wanted exact numbers you wouldn't be using fixed precision floats anyways.
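
For what it's worth, even a bug-free IEEE 754 divide rounds. A minimal sketch:

    #include <stdio.h>

    int main(void) {
        double x = 1.0 / 3.0;        /* rounded to 53 significant bits */
        printf("%.20f\n", x);        /* 0.33333333333333331483, not 1/3 */
        printf("%.20f\n", x * 3.0);  /* 1.0 -- but only by a lucky rounding tie */
        return 0;
    }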


From the Wikipedia article:

> Abrash spent hours tracking down the exact conditions needed to produce the bug, which would result in parts of a game level appearing unexpectedly when viewed from certain camera angles.


Yet, they thought the one-frame flash was insignificant enough to ship the game with it, instead of spending time to work around the bad division. But thank you for providing an example.


Alright, then quantum chemistry simulations. It's very common in the field to have algorithms with known error bounds given a certain floating point size and to choose a size amenable to the scale of simulation you intend to attempt. If some of your computations are at half precision, the results are hosed.
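
A toy illustration of why the precision size matters (not an actual chemistry kernel, just a long accumulation):

    #include <stdio.h>

    int main(void) {
        float  s32 = 0.0f;
        double s64 = 0.0;
        for (int i = 0; i < 10000000; i++) {  /* expect ~1,000,000 */
            s32 += 0.1f;
            s64 += 0.1;
        }
        printf("float : %f\n", s32);  /* visibly wrong, around 1,087,937 */
        printf("double: %f\n", s64);  /* ~1,000,000 with a tiny residual error */
        return 0;
    }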


Most consumers are not doing quantum chemistry simulations.


This is a perfect example of "normalization of deviance".


aka six sigma?


You don't need precise numbers to figure out whether your bridge will stand. What you need is a calculation designed to be robust to the errors incurred in measurement and computation.

The standard for floats guarantees you specific and precise error bounds that you can use to do an error analysis for your whole calculation. Most likely whatever engineering software you use to check your bridge design will already have this error analysis baked in.

If you introduce some arbitrary other errors, you'd have to redo your error analysis from scratch. And it might not even be tractable, depending on the errors introduced. (The standard floating-point error guarantees are designed to behave reasonably well and predictably when combined into a larger calculation.)
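
To sketch what those guarantees buy you (assuming round-to-nearest doubles; the fma trick relies on the classical result that the division residual a - q*b is exactly representable):

    #include <assert.h>
    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double a = 355.0, b = 113.0;
        double q = a / b;              /* IEEE 754: q = (a/b)(1 + d), |d| <= u */
        double r = fma(q, b, -a);      /* exact residual q*b - a = a*d */
        double d = fabs(r / a);        /* the actual relative error of q */
        printf("relative error %g, bound %g\n", d, DBL_EPSILON / 2);
        assert(d <= DBL_EPSILON / 2);  /* guaranteed on a conforming FPU */
        return 0;
    }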


You just have no idea what you're talking about. People get killed when things go wrong, and this "oh well other problems are probably worse" attitude is dangerous.

There's no such thing as exact numbers, but there is such a thing as reliable models. The errors introduced by calculating with numerical methods are studied and well understood; a processor not following exactly the rules it's supposed to is an enormous problem.

Here's a little introduction to condition numbers and how they're used to understand floating point error introduced in calculations:

https://www.cs.cornell.edu/~bindel/class/cs6210-f12/notes/le...
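
The gist, as a toy example (subtracting nearly equal numbers has a condition number of roughly |x|/|x-y|, so tiny input errors get hugely amplified):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = 1.000000001, y = 1.0;
        double x_meas = x * (1.0 + 1e-10);  /* input with 1e-10 relative error */
        double exact  = x - y;              /* ~1e-9 */
        double approx = x_meas - y;
        /* Prints ~0.1: a 1e-10 input error became a ~10% output error,
           matching the condition number |x| / |x - y| ~= 1e9. */
        printf("relative output error: %g\n", fabs(approx - exact) / exact);
        return 0;
    }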


The FDIV bug is not theoretical. It existed and no one died from it. People love to come up with theoretical ways the bug could have caused terrible things to happen, but in practice it didn't. The next run of the processor had the fix and the world moved on.
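
For reference, this is the check that circulated in 1994; it is safe to run anywhere and only misbehaves on a flawed Pentium:

    #include <stdio.h>

    int main(void) {
        double x = 4195835.0, y = 3145727.0;
        /* A correct FPU prints 0. A flawed Pentium famously printed 256,
           because x/y came back as 1.333739... instead of 1.333820... */
        printf("%g\n", x - (x / y) * y);
        return 0;
    }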


1. Intel wasn’t very popular for scientific computing in 1994

2. No one was stupid enough to make life critical calculations on Intel after it was discovered and widely publicized

You, on the other hand, are suggesting it was no big deal and acting like people doing important work should have just ignored the bug. The reason bugs like this didn’t kill people in a large disaster is that folks with your disposition weren’t in charge of making decisions that would have led to that.

They did a recall that cost Intel about a billion dollars in today's money. It wasn't just ignored.


> and acting like people doing important work should have just ignored the bug.

No, I am acting like the average consumer could have ignored the bug. There wasn't a need to do a mass recall of every chip, as the chips were still fine for most users. Yes, there was a recall for people who needed it to work correctly, but in practice not everyone did.


Yeah, but when the bug triggered, you only got like eight digits' worth of floating point precision.

The article says IBM expected normal users to hit it every few days.


Hitting the bug doesn't mean that it would cause a practical issue for the user.


And another note:

> Locked reads must be paired with locked writes, and the CPU's bus interface enforces this by forbidding other memory accesses until the corresponding writes occur. As none are forthcoming, after performing these bus cycles all CPU activity stops, and the CPU must be reset to recover.
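
(That note appears to describe the later "F00F" erratum, named after its byte sequence, shown here purely as data; assuming that's the bug being quoted, don't execute it on an unpatched original Pentium:)

    /* F0        lock prefix
       0F C7 C8  cmpxchg8b %eax -- invalid, since cmpxchg8b requires a
                 memory operand. The locked read happens, the #UD fault
                 means the paired locked write never does, and the bus
                 stays locked until the machine is reset. */
    static const unsigned char f00f[] = { 0xF0, 0x0F, 0xC7, 0xC8 };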


> If you really wanted exact numbers you wouldn't be using fixed precision floats anyways.

Let the adults play with things that need to work exactly as documented (such as IEEE 754 floating point representations) and therefore can be relied upon when required. You can go back to building your little unreliable toys that nobody uses.


There is no need to belittle me while not providing a practical example where the average consumer can be harmed by this bug.


> not providing a practical example where the average consumer can be harmed by this bug.

Why is a practical example necessary in this case? Why are you not able to recognize the very serious harms that were already described by people 30 years ago and during the intervening time? Why are you demanding that I spend my time finding that information for you instead of looking it up yourself? I am not your personal tutor.


Look at my original comment. I was asking for clarification on why the other person believed that Intel was wrong and that a risk was actually present. Instead of backing up the claim, people swarmed me with hypothetical scenarios, without showing that those scenarios were common enough to actually cause a problem. I am not demanding your time. You were the one who joined the conversation, smugly asserting you knew better than me. You could have just ignored me if you didn't want to answer my question.


Humans are potentially harmed when developers use an unsigned int as a counter and it rolls over to zero. Or a byte, in the case of medical radiation machines.
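
A minimal sketch with a hypothetical byte-sized counter:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t count = 255;   /* byte-sized counter at its maximum */
        count++;               /* wraps silently around to 0 */
        printf("%u\n", count); /* prints 0, not 256 */
        return 0;
    }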

I guarantee that if you had access to a full NNTP text dump from this era, you'd find some "harm".

Intel is dead, long live Intel.


> when developers use an unsigned int as a counter and it rolls over to zero

Yet, people wouldn't expect to return their CPU if this happened. The entire technology stack of a computer is filled with bugs, yet people are able to use them to great utility every day.


Therac-25?


The average consumer with a Pentium didn't have it in a radiation therapy machine.


But those who used a Therac-25 wouldn't be happy.


The Therac-25 didn't use a Pentium, nor floating point, and even if it had, a 0.0001% increase in the radiation dose would be unnoticeable.


It didn't, but the joke is still funny.



