Re: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
From: Andy Lutomirski
Date: Fri Aug 16 2019 - 11:20:15 EST
On 8/14/19 2:17 PM, Lendacky, Thomas wrote:
> From: Tom Lendacky <thomas.lendacky@xxxxxxx>
> There have been reports of RDRAND issues after resuming from suspend on
> some AMD family 15h and family 16h systems. This issue stems from BIOS
> not performing the proper steps during resume to ensure RDRAND continues
> to function properly.
Can you or someone from AMD document *precisely* what goes wrong here?
The APM is crystal clear:
Hardware modifies the CF flag to indicate whether the value returned in
the destination register is valid. If CF = 1, the value is valid. If CF
= 0, the value is invalid.
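A minimal C sketch of how software is expected to consume RDRAND under the APM semantics quoted above: retry while CF = 0, and use the destination value only when CF = 1. The `rdrand_step_fn` abstraction, `get_random_u64`, the retry budget of 10 and the simulated sources are illustrative assumptions, not kernel or AMD code. On real hardware the step function could wrap the `_rdrand64_step()` compiler intrinsic; it is a function pointer here so the logic runs without RDRAND-capable hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical abstraction of one RDRAND attempt: returns the CF flag and
 * stores the destination register in *out. */
typedef bool (*rdrand_step_fn)(uint64_t *out);

/* Assumed retry budget; the APM text above does not specify one. */
#define RDRAND_RETRIES 10

/* Returns true and stores a value only when an attempt reports CF = 1.
 * Per the APM, CF = 0 means the destination value is invalid, so such
 * values are discarded rather than used. */
static bool get_random_u64(rdrand_step_fn step, uint64_t *out)
{
    for (int i = 0; i < RDRAND_RETRIES; i++) {
        uint64_t v;
        if (step(&v)) {
            *out = v;       /* CF = 1: documented as a valid value */
            return true;
        }
        /* CF = 0: value is invalid; retry */
    }
    return false;           /* caller must fall back to another source */
}

/* Simulated sources, for illustration only. */
static int flaky_calls;
static bool flaky_step(uint64_t *out)
{
    /* Reports CF = 0 twice, then CF = 1 with a fixed value. */
    if (flaky_calls++ < 2) {
        *out = 0;
        return false;
    }
    *out = 0x123456789abcdef0ULL;
    return true;
}

static bool always_fail_step(uint64_t *out)
{
    *out = 0;               /* CF = 0 on every attempt */
    return false;
}
```

Note that this pattern only protects against the failure mode the APM documents (CF = 0). If the CPU reports CF = 1 with a bad value, `get_random_u64` has no way to notice.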
If BIOS screws up and somehow RDRAND starts failing and returning CF =
0, then I think it's legitimate to call it a BIOS bug. Some degree of
documentation would be nice, as would a way for BIOS to indicate to the
OS that it does not have this bug.
But, from the reports, it sounds like RDRAND starts failing, setting CF
= 1, and returning all ones (0xFFFF...) in the destination register. If true,
then this is, in my book, a severe CPU bug. Software is supposed to be
able to trust that, if RDRAND sets CF = 1, the result is a
cryptographically secure random number, even if everything else in the
system is actively malicious. On a SEV-ES system, this should be
considered a security hole -- even if the hypervisor and BIOS collude,
RDRAND in the guest should work as defined by the manual.
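To illustrate why software cannot paper over this failure mode by itself, here is a hedged C sketch of a heuristic sanity check that an OS might run. The check, its name `rdrand_looks_sane`, the sample counts and the simulated sources are hypothetical and were not proposed in this thread. It collects several values that RDRAND reported as valid (CF = 1) and rejects the source if they are all identical, which catches a stuck all-ones output. It cannot show that the output is cryptographically random.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical abstraction of one RDRAND attempt: returns the CF flag and
 * stores the destination register in *out. */
typedef bool (*rdrand_step_fn)(uint64_t *out);

#define SANITY_SAMPLES   8      /* assumed number of CF = 1 samples */
#define SANITY_MAX_TRIES 80     /* assumed bound on total attempts */

/* Heuristic check: reject the source if it cannot produce enough CF = 1
 * results, or if every CF = 1 result is the same value. */
static bool rdrand_looks_sane(rdrand_step_fn step)
{
    uint64_t samples[SANITY_SAMPLES];
    int got = 0;

    for (int tries = 0; tries < SANITY_MAX_TRIES && got < SANITY_SAMPLES; tries++) {
        uint64_t v;
        if (step(&v))
            samples[got++] = v;    /* CF = 1: keep the value */
        /* CF = 0 is allowed by the APM; just try again */
    }
    if (got < SANITY_SAMPLES)
        return false;              /* persistent CF = 0: unusable */

    for (int i = 1; i < got; i++)
        if (samples[i] != samples[0])
            return true;           /* at least two distinct values seen */
    return false;                  /* CF = 1, but the output never changes */
}

/* Simulated sources, for illustration only. */
static bool stuck_ones_step(uint64_t *out)
{
    *out = UINT64_MAX;             /* the reported failure: CF = 1, all ones */
    return true;
}

static uint64_t counter;
static bool counting_step(uint64_t *out)
{
    *out = counter++;              /* changing but NOT random */
    return true;
}
```

`counting_step` passes the check even though its output is entirely predictable, which is the point: a check like this only catches gross failures, so software (and especially an SEV-ES guest) has to rely on CF = 1 meaning what the APM says.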
So, can you clarify what is actually going on? And, if there is an
issue where the CPU does not behave as documented in the APM, will AMD
issue an erratum? And ideally also fix it in microcode or in a stepping,
and give an indication that the issue is fixed?