Re: [PATCH v2 0/3] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures

From: tj@xxxxxxxxxx
Date: Mon Oct 10 2011 - 14:53:15 EST


Hello,

On Mon, Oct 10, 2011 at 08:34:43PM +0200, Borislav Petkov wrote:
> On Mon, Oct 10, 2011 at 02:08:48PM -0400, tj@xxxxxxxxxx wrote:
> > Maybe I'm confused but is that patch correct for the actual CPU
> > hotplug case? If not, what's the point in doing that? What are we
> > gonna do after six months when some people come up with "CPU hotplug
> > fails to load new microcode for the new CPU"?
>
> Ok, first of all, we still will load ucode on the onlining path - we're
> simply not going to reload it when the CPU has gone offline and onlined
> again. For that case people should simply reload the module so that
> ucode on _all_ CPUs is updated pretty much at the same time.

I was thinking about hot-swap. It might be pretty unlikely at this
point but I don't think excluding it is a good idea. x86 is used in
pretty high-end systems too these days. Again, I don't know much about
how ucode is supposed to be managed, and maybe it's true that we don't
need a new one at all even after hot-swap. If that's the case, state
it clearly and it's all fine.

> > The invalidation code is there for a reason.
>
> ... and that reason being?

Again, because the CPU the microcode was loaded for is going away?
The cached microcode is tied to a device and that device is going
away. It's a basic correctness issue. The cached image at least needs
to be revalidated.

> > If somebody is sure that microcode doesn't need to be changed once
> > loaded, then all's good and dandy but that's not the case here, right?
>
> Well, basically the current situation didn't change the ucode - it
> simply reloaded the same image from before going offline.
>
> See, there's another problem with what we have right now: imagine
> you've just updated the ucode image on disk and offline only a subset of
> the cores. Then you online them again and they now get the newer ucode
> image while the others still run the old ucode. This could explode or
> could not, one thing's for sure: all bets are off. If we don't reload it
> on hotplug, we're fine - only module reload triggers the ucode update in
> a fairly synchronized manner.

Yeah, loading different ucodes to different cores sounds pretty scary.
I suppose we'll need to distinguish physical hotplugs from logical
ones.

Hmm... is it possible to tell whether the core coming online is the
same one as the last time? If that's possible, the problem becomes
pretty simple and we can simply tell people who are mixing
suspend/hibernate with physical hotplug that they're crazy.
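
To make the question concrete, here is a rough userspace sketch of the
check I have in mind. Everything in it is hypothetical: struct
cpu_ident, struct ucode_cache and both functions are made up for
illustration and are not the kernel's actual microcode interfaces. It
also assumes that comparing a signature and platform flags is good
enough, which can only tell us that the CPU coming online is a
compatible part, not that it is physically the same core.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity snapshot of a CPU; not a real kernel structure. */
struct cpu_ident {
	uint32_t sig;	/* family/model/stepping signature (assumed) */
	uint32_t pf;	/* platform flags (assumed) */
	uint32_t rev;	/* microcode revision seen at snapshot time */
};

/* Hypothetical cache entry: which CPU the kept ucode image belongs to. */
struct ucode_cache {
	bool valid;
	struct cpu_ident owner;
};

enum online_action {
	UCODE_REUSE_CACHED,	/* same part came back: keep the old image */
	UCODE_REVALIDATE,	/* unknown or different part: look it up again */
};

/* Remember which CPU the cached image was validated for. */
void ucode_cache_store(struct ucode_cache *c, const struct cpu_ident *id)
{
	c->owner = *id;
	c->valid = true;
}

/*
 * Decide what to do when a CPU comes online.  The cached image is
 * reused only if the identity matches the one recorded before the CPU
 * went away; otherwise the cache is invalidated so that a fresh lookup
 * is forced.
 */
enum online_action ucode_on_cpu_online(struct ucode_cache *c,
				       const struct cpu_ident *now)
{
	if (c->valid && c->owner.sig == now->sig && c->owner.pf == now->pf)
		return UCODE_REUSE_CACHED;

	c->valid = false;
	return UCODE_REVALIDATE;
}
```

With something like this, a logical offline/online cycle (suspend,
hibernate) would hit the UCODE_REUSE_CACHED path and keep all cores on
the same image, while a swapped-in part with a different signature
would be forced through revalidation.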

Thanks.

--
tejun