Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

From: Thomas Gleixner
Date: Mon Feb 24 2020 - 11:02:03 EST


Borislav Petkov <bp@xxxxxxxxx> writes:

> On Sat, Feb 22, 2020 at 04:25:59PM -0800, Srinivas Pandruvada wrote:
>> If the condition is false, will it prevent the CPU from going offline
>> before the next statement executes and the work from being rescheduled
>> on another CPU? It will not cause any error or crash, but in rare
>> circumstances it may print a premature warning/normal message based on
>> the current CPU's state.
>
> Why? An offline CPU is an offline CPU.
>
> Btw, I'm asking whether you can do the simpler thing *instead* of your
> patch. You basically don't run the workqueue callback on offlined CPUs:
>
> 	get_online_cpus();
>
> 	if (cpu_is_offline(smp_processor_id()))
> 		goto out;
>
> 	...
>
> out:
> 	put_online_cpus();

Which is wrong as well. Trying to "fix" it in the work queue callback is
papering over the root cause.

Why is any work scheduled on an outgoing CPU after this CPU executed
thermal_throttle_offline()?

When thermal_throttle_offline() is invoked, the CPU-bound workqueues are
still functional, and thermal_throttle_offline() cancels any outstanding
work.
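To make the ordering argument above concrete, here is a toy user-space model in C. It is not kernel code: model_cpu, model_queue_work() and model_offline() are hypothetical names, and clearing a flag merely stands in for the real work-cancel call (for example cancel_delayed_work_sync()). What it illustrates is that once the offline callback has run, any later attempt to queue work for that CPU is itself the bug to track down, so the model reports the late caller instead of hiding the problem inside the work callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model (hypothetical names): per-CPU state during hotplug teardown.
 * The offline callback runs while the CPU-bound workqueue still works.
 */
struct model_cpu {
	bool work_pending;     /* work item currently queued on this CPU */
	bool offline_done;     /* offline callback has already run */
	int  late_queue_bugs;  /* queue attempts after the offline callback */
};

/*
 * Queue the per-CPU work. Queuing after the offline callback is the
 * root cause to fix at its source, so report and count it loudly
 * rather than letting the work callback silently skip it.
 */
static bool model_queue_work(struct model_cpu *c)
{
	if (c->offline_done) {
		c->late_queue_bugs++;
		fprintf(stderr, "BUG: work queued after offline callback\n");
		return false;
	}
	c->work_pending = true;
	return true;
}

/*
 * Offline callback: the workqueue is still functional here, so pending
 * work is cancelled (the flag stands in for a real cancel call).
 */
static void model_offline(struct model_cpu *c)
{
	c->work_pending = false;
	c->offline_done = true;
}
```

Queue once, call model_offline(), then try to queue again: the second attempt fails and is counted in late_queue_bugs, which points at whoever still schedules work after teardown.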

So no, please fix the root cause, not the symptom.

Thanks,

tglx