Re: [PATCH v2 0/3] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures

From: Srivatsa S. Bhat
Date: Mon Oct 10 2011 - 15:00:38 EST


On 10/11/2011 12:23 AM, tj@xxxxxxxxxx wrote:
> Hello,
>
> On Mon, Oct 10, 2011 at 08:34:43PM +0200, Borislav Petkov wrote:
>> On Mon, Oct 10, 2011 at 02:08:48PM -0400, tj@xxxxxxxxxx wrote:
>>> Maybe I'm confused, but is that patch correct for the actual CPU hotplug
>>> case? If not, what's the point in doing that? What are we gonna do
>>> in six months when people come up with "CPU hotplug fails to load
>>> new microcode for the new CPU"?
>>
>> Ok, first of all, we still will load ucode on the onlining path - we're
>> simply not going to reload it when the CPU has gone offline and onlined
>> again. For that case people should simply reload the module so that
>> ucode on _all_ CPUs is updated at pretty much the same time.
>
> I was thinking about hot-swap. It might be pretty unlikely at this
> point, but I don't think excluding it is a good idea. x86 is used in
> high-end systems too these days. Again, I don't know much about how
> ucodes are supposed to be managed, and maybe it's true that we don't
> need a new one at all, even after hot-swap. If that's the case, state it
> clearly and it's all fine.
>
>>> The invalidation code is there for a reason.
>>
>> ... and that reason being?
>
> Again, the CPU for the microcode is going away? It's something tied
> to a device and the device is going away. It's a basic correctness
> issue. It at least needs to be revalidated.
>
>>> If somebody is sure that microcode doesn't need to be changed once
>>> loaded, then all's good and dandy, but that's not the case here, right?
>>
>> Well, basically the current situation didn't change the ucode - it
>> simply reloaded the same image from before going offline.
>>
>> See, there's another problem with what we have right now: imagine
>> you've just updated the ucode image on disk and offline only a subset of
>> the cores. Then you online them again and they now get the newer ucode
>> image while the others still run the old ucode. This may or may not
>> explode; one thing's for sure: all bets are off. If we don't reload it
>> on hotplug, we're fine: only a module reload triggers the ucode update, in
>> a fairly synchronized manner.
>
> Yeah, loading different ucodes to different cores sounds pretty scary.
> I suppose we'll need to distinguish physical hotplugs from logical
> ones.
>
> Hmm... is it possible to tell whether the core coming online is the
> same one as the last time? If that's possible, the problem becomes
> pretty simple and we can simply tell people who are mixing
> suspend/hibernate with physical hotplug that they're crazy.
>

I think that is pretty easy, at least from a microcode revision standpoint:
the collect_cpu_info() function (defined in arch/x86/kernel/microcode_core.c
and arch/x86/kernel/microcode_intel.c or ..._amd.c) can be used for that
purpose. Am I right, Boris?

--
Regards,
Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
Linux Technology Center,
IBM India Systems and Technology Lab
