Re: [PATCH AUTOSEL 4.9 4/4] thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash

From: Rafael J. Wysocki
Date: Wed Oct 12 2022 - 12:58:59 EST


On Tue, Oct 11, 2022 at 3:23 PM Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
>
> Hi Pavel,
> On 2022-10-11 at 13:36:46 +0200, Pavel Machek wrote:
> > Hi!
> >
> > > From: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > >
> > > [ Upstream commit 68b99e94a4a2db6ba9b31fe0485e057b9354a640 ]
> > >
> > > When CPU 0 is offline and intel_powerclamp is used to inject
> > > idle, it generates kernel BUG:
> > >
> > > BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687
> > > caller is debug_smp_processor_id+0x17/0x20
> > > CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl+0x49/0x63
> > > dump_stack+0x10/0x16
> > > check_preemption_disabled+0xdd/0xe0
> > > debug_smp_processor_id+0x17/0x20
> > > powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp]
> > > ...
> > > ...
> > >
> > > Here CPU 0 is the control CPU by default and changed to the current CPU,
> > > if CPU 0 offlined. This check has to be performed under cpus_read_lock(),
> > > hence the above warning.
> > >
> > > Use get_cpu() instead of smp_processor_id() to avoid this BUG.
> >
> > This has exactly the same problem as smp_processor_id(), you just
> > worked around the warning. If it is okay that control_cpu contains
> > stale value, could we have a comment explaining why?
> >
> May I know why does control_cpu have stale value? The control_cpu
> is a random picked online CPU which will be used later to collect statistics.
> As long as the control_cpu is online, it is valid IMO.

So this is confusing, because the code makes the impression that
getting the number of the CPU running the code matters in some way,
which isn't the case.

Something like cpumask_first(cpu_online_mask) should work as well if
I'm not mistaken and it would be less confusing to use this instead
IMO.