Re: [PATCH v2 0/3] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures

From: Borislav Petkov
Date: Mon Oct 10 2011 - 14:34:55 EST


Hi Tejun,

On Mon, Oct 10, 2011 at 02:08:48PM -0400, tj@xxxxxxxxxx wrote:
> Maybe I'm confused but is that patch correct for actual CPU hotplug
> case? If not, what's the point in doing that? What are we gonna do
> after six month some people come up with "CPU hotplug fails to load
> new microcode for the new CPU"?

Ok, first of all, we will still load ucode on the onlining path - we're
simply not going to reload it when a CPU has gone offline and come back
online. For that case, people should simply reload the module so that
ucode on _all_ CPUs is updated at pretty much the same time.

Now, actually, the ucode driver shouldn't even exist in its current form
- ucode should be loaded much, much earlier, when the cores are being
trampolined. Normally, ucode is loaded by the BIOS, and I imagine this
facility exists for the sole purpose of applying ucode when there's no
new BIOS available.

So, here's what ucode handling should actually be, IMHO:

* load ucode very early during booting of each CPU

* keep ucode in memory in case not all cores have been onlined from the
get-go to apply when they're onlined later

* do NOT reapply it when cores are off-/onlined because there's no need
(cores retain ucode when we offline them (basically, they enter C1,
probably C3 on Intel or similar so no problem)).

* when you have a new ucode image, trigger a replacement of the image in
memory and subsequently trigger a re-application of the new ucode (sysfs
write, whatever).
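
To make the list above a bit more concrete, here is a rough user-space
model of that policy. It is only a sketch and not kernel code: every
name in it (struct ucode_model, model_cpu_online(), and so on) is made
up for illustration, and revision 0 is assumed to mean "nothing applied
yet". One case the list does not spell out is a core that is offline
while the in-memory image is replaced; the sketch assumes such a core
catches up when it is onlined next.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4	/* arbitrary core count for the model */

/* Hypothetical model: one cached image, one applied revision per core. */
struct ucode_model {
	unsigned int cached_rev;		/* image kept in memory */
	unsigned int cpu_rev[MODEL_NR_CPUS];	/* 0 = nothing applied yet */
	bool online[MODEL_NR_CPUS];
};

/* Early boot: remember the image, no core has been patched yet. */
void model_init(struct ucode_model *m, unsigned int boot_rev)
{
	m->cached_rev = boot_rev;
	for (size_t i = 0; i < MODEL_NR_CPUS; i++) {
		m->cpu_rev[i] = 0;
		m->online[i] = false;
	}
}

/*
 * Onlining: apply the cached image only if the core does not already
 * run it. For a plain offline/online cycle this is a no-op, because
 * the core retained its ucode while it was offline.
 */
void model_cpu_online(struct ucode_model *m, size_t cpu)
{
	m->online[cpu] = true;
	if (m->cpu_rev[cpu] != m->cached_rev)
		m->cpu_rev[cpu] = m->cached_rev;
}

/* Offlining: nothing is invalidated, the core keeps its revision. */
void model_cpu_offline(struct ucode_model *m, size_t cpu)
{
	m->online[cpu] = false;
}

/*
 * Explicit trigger (sysfs write, whatever): replace the cached image
 * and re-apply it on all online cores in one go.
 */
void model_update(struct ucode_model *m, unsigned int new_rev)
{
	m->cached_rev = new_rev;
	for (size_t i = 0; i < MODEL_NR_CPUS; i++)
		if (m->online[i])
			m->cpu_rev[i] = new_rev;
}
```

The point of the model is that the cached image only ever changes
through model_update(), so an offline/online cycle can never introduce
a revision the other cores are not running.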

> The invalidation code is there for a reason.

... and that reason being?

> The CPU is going away and the microcode tied to the CPU should go away
> too.

This is what I don't understand - as I said earlier, ucode is not
something you get on a monthly basis: in the majority of cases you get
only a very few updates and that's it. So users going to the trouble of
reloading the microcode.ko module a very few times (if ever!) during a
system's lifetime shouldn't be an issue.

> If somebody is sure that microcode don't need to be changed once
> loaded, then all's good and dandy but that's not the case here, right?

Well, basically, the current code doesn't change the ucode - it simply
reloads the same image the core had before going offline.

See, there's another problem with what we have right now: imagine
you've just updated the ucode image on disk and you offline only a
subset of the cores. Then you online them again and they now get the
newer ucode image while the others still run the old ucode. This might
explode or it might not, but one thing's for sure: all bets are off. If
we don't reload it on hotplug, we're fine - only a module reload triggers
the ucode update, in a fairly synchronized manner.
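
The scenario can be played through with a small user-space model. Again
this is only a sketch under stated assumptions, not the driver's code:
the struct and the hp_*() helpers are hypothetical, and "reload on
online" is reduced to copying whatever revision is currently on disk.

```c
#include <stdbool.h>
#include <stddef.h>

#define HP_NR_CPUS 4	/* arbitrary core count for the model */

/* Hypothetical model: image revision on disk and on each core. */
struct hp_model {
	unsigned int disk_rev;
	unsigned int cpu_rev[HP_NR_CPUS];
};

/* Reload-on-hotplug behaviour: onlining picks up the on-disk image. */
void hp_online_with_reload(struct hp_model *m, size_t cpu)
{
	m->cpu_rev[cpu] = m->disk_rev;
}

/* No reload on hotplug: the core keeps the revision it already runs. */
void hp_online_without_reload(struct hp_model *m, size_t cpu)
{
	(void)m;
	(void)cpu;
}

/* Module reload: every core gets the on-disk image in one pass. */
void hp_module_reload(struct hp_model *m)
{
	for (size_t i = 0; i < HP_NR_CPUS; i++)
		m->cpu_rev[i] = m->disk_rev;
}

/* True if all cores run the same ucode revision. */
bool hp_revisions_consistent(const struct hp_model *m)
{
	for (size_t i = 1; i < HP_NR_CPUS; i++)
		if (m->cpu_rev[i] != m->cpu_rev[0])
			return false;
	return true;
}
```

Start with all cores at revision 1, bump disk_rev to 2 and cycle a
single core: with hp_online_with_reload() the check fails because that
core alone runs revision 2, while with hp_online_without_reload() the
cores stay consistent until hp_module_reload() moves all of them.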

> If you want to optimize away microcode unloading during
> suspend/resume, the RTTD is doing revalidation / reload during
> CPU_ONLINE as necessary.

See above.

> If this use case doesn't really matter too much to anyone, just
> leaving it alone would be better than adding band aid which can lead
> to very obscure issues down the road (oooh, that microcode shouldn't
> have been loaded to that cpu).

I'd like to actually hear someone justify such a requirement.

I hope I'm making some sense here.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551