Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

From: Srinivas Pandruvada
Date: Sat Feb 22 2020 - 19:26:46 EST


On Sat, 2020-02-22 at 18:51 +0100, Borislav Petkov wrote:
> On Sat, Feb 22, 2020 at 08:24:32AM -0800, Srinivas Pandruvada wrote:
> > During cpu-hotplug test with CONFIG_PREEMPTION and
> > CONFIG_DEBUG_PREEMPT
> > enabled, Chris reported error:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code:
> > kworker/1:0/17
> > caller is throttle_active_work+0x12/0x280
> >
> > Here throttle_active_work() is a work queue callback scheduled with
> > schedule_delayed_work_on(). This will not cause this error for the
> > use
> > of smp_processor_id() under normal conditions as there is a check
> > for
> > "current->nr_cpus_allowed == 1".
> > But when the target CPU is offline the workqueue becomes unbound.
> > Then the work queue callback can be scheduled on another CPU and
> > the
> > error is printed for the use of smp_processor_id() in preemptible
> > context.
>
> So what's wrong with simply doing:
>
> if (cpu_is_offline(this_cpu))
> return;
>
> ?
>
If the condition is false, will it prevent offline CPU before executing
next statement and reschedule on another CPU? Although It will not
cause any error or crash but in rare circumstance may print premature
warning/normal message based on the current CPU's state.

So I can submit something like this:

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c
b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b541d6..7aa7c9d1df2a 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -239,11 +239,14 @@ static void throttle_active_work(struct
work_struct *work)
{
struct _thermal_state *state =
container_of(to_delayed_work(work),
struct _thermal_state,
therm_work);
- unsigned int i, avg, this_cpu = smp_processor_id();
+ unsigned int i, avg, this_cpu = state->cpu;
u64 now = get_jiffies_64();
bool hot;
u8 temp;

+ if (cpu_is_offline(this_cpu))
+ return;
+
get_therm_status(state->level, &hot, &temp);
/* temperature value is offset from the max so lesser means
hotter */
if (!hot && temp > state->baseline_temp) {

Thanks,
Srinivas

> You don't need to run the callback on an offlined CPU anyway...
>