Re: [BUGFIX][PATCH] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures
From: Borislav Petkov
Date: Wed Oct 05 2011 - 03:21:11 EST
On Tue, Oct 04, 2011 at 04:57:10PM -0400, Srivatsa S. Bhat wrote:
> 1. Since we never invalidate the microcode once we get it from userspace, it
> also means that we will never be able to update the microcode for that cpu
> ever again! (since we will continue to reuse the same old microcode over and
> over again on every cpu online operation for that cpu).
> This restriction introduced by my patch seems bad, doesn't it?
Well, if you have a new microcode image, you are supposed to place it
under /lib/firmware/.. or wherever the kernel has been configured to
find it, and then reload the microcode module.
> 2. Suppose we have a 16 cpu machine and we boot it with only 8 cpus (i.e., we online
> only 8 of the 16 cpus while booting). So it means that the kernel gets a copy
> of the microcode for each of these 8 cpus, but not for the ones that were not
> onlined while booting.
> [Let us assume that cpu number 10 was one among the 8 cpus that were not onlined
> while booting].
> Later on, let's say we start our cpu hotplug + suspend/resume tests simultaneously.
> Now consider this possible scenario:
> * Userspace is not frozen
> * We initiate a cpu online operation on cpu 10. At the same time, since suspend
> is in progress, let's say the freezing begins.
> * Just before cpu 10 could be brought up online, userspace gets frozen.
> * Now while bringing up cpu 10, due to the CPU_ONLINE_FROZEN notification, the
> microcode core tries to apply the microcode to the cpu. But unfortunately, it
> doesn't have the microcode! (because this cpu is coming up for the first time
> and hence we never got its microcode from userspace...)
> Now, again the same problem ensues: microcode core calls request_firmware and
> depends on the (frozen) userspace to get the microcode.
Ok, but is this a real-life scenario you expect to happen somewhere, or
is it something that happens only during testing? IOW, if you have root,
there are many ways to shoot yourself in the foot, right?
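The quoted scenario can be replayed as a toy model. This is a minimal Python sketch, not kernel code: `MicrocodeModel`, `FirmwareUnavailable` and `replay_scenario` are hypothetical names, the `request_firmware` method only stands in for the kernel function of that name, and picking CPUs 0-7 as the boot-time set is an assumption (the thread only says CPU 10 was not among them).

```python
class FirmwareUnavailable(Exception):
    """Raised when a firmware request needs userspace but userspace is frozen."""


class MicrocodeModel:
    """Toy stand-in for the microcode core; not the kernel's data structures."""

    def __init__(self):
        self.userspace_frozen = False
        self.cache = {}  # cpu number -> microcode image obtained earlier

    def request_firmware(self, cpu):
        # Stand-in only: the real call depends on userspace to supply the
        # image. The model raises where a real system would wait instead.
        if self.userspace_frozen:
            raise FirmwareUnavailable(f"cpu {cpu}: userspace is frozen")
        return f"ucode-for-cpu{cpu}"

    def cpu_online(self, cpu):
        # Reuse the image from an earlier online operation if there is one,
        # otherwise ask userspace for it.
        if cpu not in self.cache:
            self.cache[cpu] = self.request_firmware(cpu)
        return self.cache[cpu]


def replay_scenario():
    model = MicrocodeModel()
    for cpu in range(8):            # assumption: CPUs 0-7 were onlined at boot
        model.cpu_online(cpu)
    model.userspace_frozen = True   # suspend has frozen userspace
    results = {}
    for cpu in (3, 10):             # 3 was seen at boot, 10 never was
        try:
            model.cpu_online(cpu)
            results[cpu] = "ok"
        except FirmwareUnavailable:
            results[cpu] = "stuck"
    return results
```

In this model only the CPU that was never onlined before depends on the frozen userspace, which is why the case needs a hotplug operation racing with suspend on a previously offline CPU.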
> I am still wondering if the approach I proposed earlier (the one in
> which we defer applying microcode and queue up a callback function
> etc) could solve all these issues. I am also playing around with the
> idea of coupling that with mutual exclusion between cpu hotplug and
> freezer to handle any problematic scenarios.
Well, all those solutions seem like they're not worth the trouble and
complexity if those cases are only conjecture. If you still trigger
them during your testing, then mutually excluding the freezer and CPU
hotplug is what I would lean towards, but I could be wrong.
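As a rough illustration of what mutual exclusion between the freezer and CPU hotplug would mean, here is a single-threaded Python sketch. The class `HotplugFreezerExclusion` and its lock are hypothetical and do not correspond to any existing kernel interface; the non-blocking acquire is only there to keep the demo deterministic, where a real implementation would more likely wait.

```python
import threading


class HotplugFreezerExclusion:
    """Toy model: one lock shared by the freezer and CPU hotplug."""

    def __init__(self):
        self._lock = threading.Lock()   # hypothetical shared lock
        self.userspace_frozen = False

    def freeze(self):
        # The freezer keeps the lock for as long as userspace stays frozen.
        if not self._lock.acquire(blocking=False):
            return False                # a hotplug operation is in flight
        self.userspace_frozen = True
        return True

    def thaw(self):
        self.userspace_frozen = False
        self._lock.release()

    def cpu_online(self, cpu):
        # Hotplug may only proceed while the freezer does not hold the lock,
        # so any firmware request it makes sees a running userspace.
        if not self._lock.acquire(blocking=False):
            return "deferred"
        try:
            return "online"
        finally:
            self._lock.release()
```

With this arrangement an online request that arrives while userspace is frozen is put off until after the thaw, instead of reaching a firmware request that cannot be served.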
There's of course a much better fix which has been on the table for a
while now: loading the ucode from the bootloader, applying it much
earlier than we do now, and keeping the ucode image in memory. This
would solve the CPU hotplug problem completely. Maybe it's time I
looked into it :-).
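The effect of keeping the image in memory can be sketched the same way. This is a toy Python model under stated assumptions: `EarlyLoadedMicrocode` is a hypothetical name, and a single image is assumed to fit every CPU, which the thread does not say.

```python
class EarlyLoadedMicrocode:
    """Toy model: the image is handed over once at boot and kept in memory."""

    def __init__(self, image):
        self.image = image              # assumption: one image fits all CPUs
        self.userspace_frozen = False
        self.applied = {}               # cpu number -> image applied to it

    def cpu_online(self, cpu):
        # No round trip to userspace, so the freezer state does not matter.
        self.applied[cpu] = self.image
        return self.applied[cpu]
```

Here bringing a CPU online never consults `userspace_frozen`, so a first-time online during suspend is no different from any other.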