Re: [Patch v3 Part2 3/9] x86/microcode/intel: Fix collect_cpu_info() to reflect current microcode
From: Borislav Petkov
Date: Tue Jan 31 2023 - 16:09:04 EST
On Tue, Jan 31, 2023 at 08:49:52PM +0000, Luck, Tony wrote:
> What happens here if the update on the first hyperthread failed (sure, it shouldn't,
> but stuff happens at large scale). In this case the current rev is still older than the
> cached version ... so there is no "goto out", and this hyperthread will now write
> the MSR to initiate a microcode update here, while the first thread is off executing
> arbitrary code (the situation that we want to avoid).
Lemme see if I can follow: we sync all threads in __reload_late() and
once they all arrive, we send them down into ->apply_microcode.
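For anyone who hasn't stared at the late-load path, the rendezvous can be pictured with a small user-space analogue. This is a sketch under loose assumptions, not the kernel code: POSIX threads stand in for CPUs, and `late_arrivals`, `NR_THREADS`, `fake_apply_microcode()`, `reload_late_sim()` and `run_late_load_sim()` are all hypothetical names. Every thread bumps a shared arrival counter and spins until all of them have checked in; only then does any of them call the apply step.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_THREADS 4 /* hypothetical number of "CPUs" taking part */

static atomic_int late_arrivals;      /* threads that reached the rendezvous */
static atomic_int applied_after_sync; /* apply calls made after full arrival */

/* Hypothetical stand-in for ->apply_microcode(): it only checks and records
 * that it runs after every thread has arrived. */
static void fake_apply_microcode(int cpu)
{
	(void)cpu; /* a real apply step would act on this CPU */
	assert(atomic_load(&late_arrivals) == NR_THREADS);
	atomic_fetch_add(&applied_after_sync, 1);
}

static void *reload_late_sim(void *arg)
{
	int cpu = *(int *)arg;

	/* Check in, then wait until every thread has checked in. */
	atomic_fetch_add(&late_arrivals, 1);
	while (atomic_load(&late_arrivals) < NR_THREADS)
		; /* busy-wait; a real rendezvous should also bound this with a timeout */

	fake_apply_microcode(cpu);
	return NULL;
}

/* Runs one simulated late load and returns how many threads applied only
 * after the full rendezvous (NR_THREADS if everything worked).
 * Error handling for pthread_create() is omitted for brevity. */
static int run_late_load_sim(void)
{
	pthread_t tid[NR_THREADS];
	int ids[NR_THREADS];

	atomic_store(&late_arrivals, 0);
	atomic_store(&applied_after_sync, 0);

	for (int i = 0; i < NR_THREADS; i++) {
		ids[i] = i;
		pthread_create(&tid[i], NULL, reload_late_sim, &ids[i]);
	}
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(tid[i], NULL);

	return atomic_load(&applied_after_sync);
}
```

A plain counter spin is used here rather than a pthread barrier so the sketch mirrors the "count arrivals, then go" idea; in user space a barrier would work just as well.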
T0 arrives and fails the update. That's this piece:

	/* write microcode via MSR 0x79 */
	wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);

	rev = intel_get_microcode_revision();

	if (rev != mc->hdr.rev) {
		pr_err("CPU%d update to revision 0x%x failed\n",
		       cpu, mc->hdr.rev);
		return UCODE_ERROR;
	}

We return here without updating cpu_sig.rev, as we should.
T1 arrives, applies the update successfully and sets its cpu_sig.rev.
T0's patch level has been updated as well, because the microcode engine
is shared between the sibling threads. T0's cpu_sig.rev, however, has
not, since the update happened "behind its back", so to speak.
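The state divergence in this scenario can be modeled in a few lines of single-threaded C. Assumptions, all hypothetical: `struct core` represents the microcode engine the two SMT siblings share, `struct hw_thread` carries each thread's cached revision (standing in for `cpu_sig.rev`), and the `fail` flag of `model_apply()` simulates a write that did not take. The revision numbers in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the patch level lives in the core and is shared by
 * both SMT siblings; each hardware thread keeps its own cached copy. */
struct core {
	unsigned int patch_level;
};

struct hw_thread {
	struct core *core;
	unsigned int cached_rev; /* plays the role of cpu_sig.rev */
};

enum ucode_state { UCODE_UPDATED, UCODE_ERROR };

static enum ucode_state model_apply(struct hw_thread *t, unsigned int new_rev,
				    bool fail)
{
	assert(t && t->core);

	/* Simulated MSR write: on success the new level lands in the shared core. */
	if (!fail)
		t->core->patch_level = new_rev;

	/* Simulated read-back, mirroring the rev != mc->hdr.rev check. */
	if (t->core->patch_level != new_rev)
		return UCODE_ERROR; /* cached_rev deliberately left untouched */

	t->cached_rev = new_rev;
	return UCODE_UPDATED;
}
```

Calling `model_apply(&t0, 0x20, true)` and then `model_apply(&t1, 0x20, false)` on two threads sharing a core that starts at 0x10 leaves the core at 0x20 and T1's `cached_rev` at 0x20, while T0's `cached_rev` stays at 0x10 even though its real patch level is now 0x20.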
Is that the scenario you're talking about?
If so, look at __reload_late(): it'll say

	pr_warn("Error reloading microcode on CPU %d\n", cpu);

and the large-scale operator will know.
And well, the easy fix is to do the reload again. :-)
That'll update the cached values too.
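Why a second reload also repairs the cached values can be sketched as a hypothetical single-threaded model (`struct core`, `struct hw_thread`, `model_reload()` and the cut-down stand-in for the kernel's `enum ucode_state` are illustrative, not kernel code). The assumption is that each pass first re-reads the live revision from the shared core, in the spirit of the subject line's "reflect current microcode", so a thread whose cache went stale picks up the level its sibling already loaded and skips the write.

```c
#include <assert.h>

struct core {
	unsigned int patch_level; /* shared by both SMT siblings */
};

struct hw_thread {
	struct core *core;
	unsigned int cached_rev; /* plays the role of cpu_sig.rev */
};

/* Hypothetical, cut-down set of result codes. */
enum ucode_state { UCODE_OK, UCODE_UPDATED, UCODE_ERROR };

static enum ucode_state model_reload(struct hw_thread *t, unsigned int new_rev)
{
	assert(t && t->core);

	/* Re-read the live revision first so a stale cache gets corrected. */
	t->cached_rev = t->core->patch_level;

	/* Sibling already loaded this (or something newer): no MSR write. */
	if (t->cached_rev >= new_rev)
		return UCODE_OK;

	/* Otherwise simulate a successful write and refresh the cache. */
	t->core->patch_level = new_rev;
	t->cached_rev = t->core->patch_level;
	return UCODE_UPDATED;
}
```

With the core already at 0x20 and T0's cache stuck at 0x10, `model_reload(&t0, 0x20)` returns `UCODE_OK` and leaves `t0.cached_rev` at 0x20 without touching the core.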
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette