Re: [BUGFIX][PATCH] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures

From: Srivatsa S. Bhat
Date: Wed Oct 05 2011 - 04:51:10 EST

On 10/05/2011 12:51 PM, Borislav Petkov wrote:
> On Tue, Oct 04, 2011 at 04:57:10PM -0400, Srivatsa S. Bhat wrote:
>> 1. Since we never invalidate the microcode once we get it from userspace, it
>> also means that we will never be able to update the microcode for that cpu
>> ever again! (since we will continue to reuse the same old microcode over and
>> over again on every cpu online operation for that cpu).
>> This restriction introduced by my patch seems bad, doesn't it?
> Well, if you have a new microcode image, you are supposed to place it
> under /lib/firmware/.. or where the kernel has been configured to find
> it and then reload the microcode module.
Oh well, then we can update the microcode after all...
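
As an illustration of the point above, here is a minimal toy model in Python
(not kernel code). It assumes that reloading the microcode module drops the
cached images, so the next cpu online fetches whatever image is then
installed. The class name, the dict standing in for /lib/firmware/.. and the
reload() method are all hypothetical.

```python
class ToyMicrocodeCache:
    """Toy model of a per-cpu microcode cache that is never invalidated
    on its own. Hypothetical names throughout; this is not the kernel API."""

    def __init__(self, firmware_dir):
        # firmware_dir is a dict standing in for the firmware directory.
        self.firmware_dir = firmware_dir
        self.cache = {}  # cpu number -> cached microcode image

    def cpu_online(self, cpu):
        # Fetch from "userspace" only the first time this cpu comes online;
        # afterwards keep reusing the cached copy.
        if cpu not in self.cache:
            self.cache[cpu] = self.firmware_dir["image"]
        return self.cache[cpu]

    def reload(self):
        # Assumption: reloading the module forgets every cached image.
        self.cache.clear()
```

With this model, replacing the image alone changes nothing for a cpu that
already has a cached copy; only after reload() does the next online pick up
the new image.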

>> 2. Suppose we have a 16 cpu machine and we boot it with only 8 cpus (i.e., we online
>> only 8 of the 16 cpus while booting). So it means that the kernel gets a copy
>> of the microcode for each of these 8 cpus, but not for the ones that were not
>> onlined while booting.
>> [Let us assume that cpu number 10 was one among the 8 cpus that were not onlined
>> while booting].
>> Later on, let's say we start our cpu hotplug + suspend/resume tests simultaneously.
>> Now consider this possible scenario:
>> * Userspace is not frozen
>> * We initiate a cpu online operation on cpu 10. At the same time, since suspend
>> is in progress, let's say the freezing begins.
>> * Just before cpu 10 could be brought up online, userspace gets frozen.
>> * Now while bringing up cpu 10, due to the CPU_ONLINE_FROZEN notification, the
>> microcode core tries to apply the microcode to the cpu. But unfortunately, it
>> doesn't have the microcode! (because this cpu is coming up for the first time
>> and hence we never got its microcode from userspace...)
>> Now, again the same problem ensues: microcode core calls request_firmware and
>> depends on the (frozen) userspace to get the microcode.
> Ok, but is this a real-life scenario you expect to happen somewhere or
> is it something that happens only during test? IOW, if you have root
> there are many ways to shoot yourself in the foot, right?

Well, honestly I was just trying to see in which scenarios the patch
might not work well... In real life I don't expect to hit such
a corner case!
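
The corner case in scenario 2 can be sketched with a small toy model in
Python (not kernel code). All names here are hypothetical. One deliberate
simplification: the model raises an exception when firmware is requested
while userspace is frozen, so that the failure is easy to observe in a
test; it does not try to reproduce what the real request_firmware call
does in that situation.

```python
class FrozenUserspaceError(RuntimeError):
    """Raised in this toy model when firmware is requested while
    userspace is frozen."""


class ToyMachine:
    def __init__(self, boot_cpus):
        self.userspace_frozen = False
        # Only cpus onlined at boot have a cached microcode image.
        self.cached = {cpu: f"ucode-for-cpu{cpu}" for cpu in boot_cpus}

    def request_firmware(self, cpu):
        # Stand-in for the firmware request: it needs a running userspace.
        if self.userspace_frozen:
            raise FrozenUserspaceError(f"cpu {cpu}: userspace is frozen")
        return f"ucode-for-cpu{cpu}"

    def cpu_online(self, cpu):
        # Reuse the cached image if there is one; otherwise ask userspace.
        if cpu not in self.cached:
            self.cached[cpu] = self.request_firmware(cpu)
        return self.cached[cpu]
```

Onlining a cpu that was up at boot works even while userspace is frozen,
because its image is cached. Onlining cpu 10 for the first time in the same
state fails, because the model has to go to the frozen userspace.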

> [..]
>> I am still wondering if the approach I proposed earlier (the one in
>> which we defer applying microcode and queue up a callback function
>> etc) could solve all these issues. I am also playing around with the
>> idea of coupling that with mutual exclusion between cpu hotplug and
>> freezer to handle any problematic scenarios.
> Well, all those solutions seem like they're not worth the trouble and
> complexity if those cases are only conjecture - if you still trigger
> them during your testing then probably mutually excluding freezer and
> CPU hotplug is something I would lean towards but I could be wrong.

I felt the same way (moreover, that complex solution was not foolproof
either!). Please see my other mail, which talks about how just mutually
excluding freezer and cpu hotplugging would solve everything.
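
The general idea of mutual exclusion can be sketched as follows in Python
(a toy userspace model, not kernel code, and not the mechanism from the
other mail, which is not reproduced here). It assumes a single lock, given
the hypothetical name pm_lock, that both the cpu online path and the
freeze/thaw path must hold. A cpu online operation can then never observe
a frozen userspace.

```python
import threading


class ToyKernel:
    def __init__(self):
        self.pm_lock = threading.Lock()  # hypothetical shared lock
        self.userspace_frozen = False
        self.seen = []  # (cpu, was userspace frozen during online?)

    def cpu_online(self, cpu):
        # The freezer cannot run while we hold the lock, so userspace
        # stays thawed for the whole online operation.
        with self.pm_lock:
            self.seen.append((cpu, self.userspace_frozen))

    def suspend_resume(self):
        # Freeze and thaw happen entirely under the same lock, so no
        # cpu online can slip in between them.
        with self.pm_lock:
            self.userspace_frozen = True
            self.userspace_frozen = False


def run_stress(rounds=200):
    """Start cpu online and suspend/resume operations concurrently and
    return what each online operation observed."""
    kernel = ToyKernel()
    threads = []
    for i in range(rounds):
        threads.append(threading.Thread(target=kernel.cpu_online, args=(i % 16,)))
        threads.append(threading.Thread(target=kernel.suspend_resume))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return kernel.seen
```

Calling run_stress() interleaves the two kinds of operations in an
arbitrary order, yet every recorded online operation sees userspace
thawed, since the frozen window is never visible outside the lock.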

> There's of course a much better fix which has been on the table for a
> while now involving loading the ucode from the bootloader and applying
> it much earlier than what we have now and keeping the ucode image in
> memory. This would solve the CPU hotplug deal completely. Maybe it's
> time I looked into it :-).

Assuming I understood this correctly, I can see some issues in this
approach as well (since it is quite similar to the approach used in my
one-line patch), but yeah, definitely they are all very much corner
cases.

Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
Linux Technology Center,
IBM India Systems and Technology Lab