As it is written in the parent article, the verification of the code generated b...

zimmerfrei · on Feb 22, 2024

Your argument boils down to "all hardware implementations so far in history never optimized word boolean operations so future implementations will keep doing so".

I think that is just an assumption and I would not take that risk for high security applications, unless for specific CPU models where the behavior is measured in practice (certainly not by just auditing the assembly, which is by itself already too high-level). After all, never in history we had such a level of sophistication.

Right now, all zero register values can lead to speed gains, so they could be obversable. But both ARM and Intel latest ISA have introduced flags that permit the future CPUs to perform operations with data-dependent timing. Boolean operations are officially marked as potentially affected by that flag.

adrian_b · on Feb 22, 2024

No, my argument is that it is impossible to optimize the binary Boolean operations in a way in which they will have variable execution time.

Moreover, they are the only commonly encountered operations that are implemented in hardware and for which this is unconditionally true.

Therefore they are the first choice for the implementation of any constant-time algorithm.

Most modern CPUs have many other operations that are executed in constant time, like additions and subtractions, but for those, variable-time implementations are also possible.

As I have already said, for any bitwise binary boolean operation, the elementary operations having a pair of bits as operands are independent, so there exists no way of performing the complete operation without doing all the sub-operations, unlike for other simple operations, like additions, shifts or rotations, where it is possible to detect conditions that allow an earlier completion of the operation.

The hardware implementation can do the sub-operations sequentially or concurrently, in any order, but it always must execute all of them, regardless of the values of the input operands, so they will take the same time.

Only in an asynchronous CPU and only for certain kinds of logic gate implementations it is possible to have a completion time for a binary Boolean operation that varies with the values of the input bits, but for the most common asynchronous circuit synthesis methods, which use dual rail encoding for bits, i.e. each bit is encoded as a pair of complementary bits, the binary boolean operations remain constant-time, like in the normal synchronous CPUs.

zimmerfrei · on Feb 22, 2024

Let's assume that you have a simple XOR between two registers.

If the CPU can pre-label a register as having no bits set (and they can or speculate on it), during scheduling, it could theoretically simply drop the XOR, transfer or rename the relevant register which may lead to a tiny but different timing that can be measured and exploited.

That is just one simple counter-example that shows how the assumptions you present are not necessarily valid on modern complex CPUs. Many more are examples are possible, which is of course not to say that they are implemented today. But they could be implemented in the future (without us knowing, and again, both ARM and Intel are explicit on that), therefore the security of a security-sensitive piece of code should not solely rely on that.