Re: [PATCH] x86/AMD: Fix ASM constraints in amd_clear_divider()
From: Andrew Cooper
Date: Wed Aug 09 2023 - 19:12:03 EST
On 09/08/2023 10:33 pm, Linus Torvalds wrote:
> On Wed, 9 Aug 2023 at 13:24, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> DIV writes its results into %eax and %edx, meaning that they need to be output
>> constraints too. It happens to be benign in this case as the registers don't
>> change value, but the compiler should still know.
>>
>> Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
> As mentioned earlier (html, not on list), I think it was intentional
> and this "fix" doesn't really fix anything.
>
> A comment might be good, of course, if this really bothers somebody.
>
> That said, if the code wanted to be *really* clever, it could have done
>
> asm volatile(ALTERNATIVE("", "div %0", X86_BUG_DIV0)
> :: "a" (1), "d" (0));
>
> instead. One less register used, and the same "no change to register
> state" approach.
Yeah, I spotted that as an option, and it does save one whole zeroing
idiom...
But IMO, the risk of someone copy&pasting this as if it were a good
example, and the debugging thereafter ought to be enough of a reason to
avoid klever tricks to save 1 line of C.
> Of course, who knows what early-out algorithm the divider uses, and
> maybe it's cheaper to do 0/1 than it is to do 1/1. Not that I think we
> should care. The main reason to be cute here would be just to be cute.
AMD said "any non-faulting divide". Which still isn't as helpful as it
could be, because according to Agner Fogh:
Uops Latency
DIV r8/m8 1 13-16
DIV r16/m16 2 14-21
DIV r32/m32 2 14-30
DIV r64/m64 2 14-46
IDIV r8/m8 1 13-16
IDIV r16/m16 2 13-21
IDIV r32/m32 2 14-30
IDIV r64/m64 2 14-47
DIV %al looks to be the firm favourite choice.
Assuming the one extra cycle is just for the double-pumped uop, then the
best latency for a divide is 13 cycles across the board.
It doesn't make sense to optimise this as a fastpath. After all, what
fool would put a real divide-by-1 in their code...
> That said, if you were to use this pattern in *real* code (as opposed
> to in that function that will never be called in reality because
> nobody divides by zero in real code), the register clobbers might be
> useful just to make sure the compiler doesn't re-use a zero register
> content that is then behind the latency of the dummy divide. But
> again, this particular code really doesn't _matter_ in that sense.
Well - that's a different question.
An attacker skilled in the art can easily hide #DE in the transient
shadow of something else, and plenty of people got very skilled in this
particular art trying to make better Meltdown exploits.
So I don't think putting any scrubbing in the #DE handler is going to
stop a real attack. But I'm just speculating.
~Andrew
P.S. https://www.amd.com/system/files/documents/security-whitepaper.pdf
currently says
"The divide by zero (#DE) fault is signaled[sic] on the integer divide
instructions. No data is forwarded to younger, dependent operations for
speculative execution on this fault."
which needs to be revisited. Zen1 was the latest-and-greatest when that
whitepaper was written.