Re: [PATCH] cpufreq: intel_pstate: Fix for HWP interrupt before driver is ready

From: Rafael J. Wysocki
Date: Mon Sep 06 2021 - 12:58:41 EST


On Mon, Sep 6, 2021 at 6:55 PM Srinivas Pandruvada
<srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, 2021-09-06 at 10:43 -0600, Jens Axboe wrote:
> > On 9/6/21 10:17 AM, Rafael J. Wysocki wrote:
> > > On Sat, Sep 4, 2021 at 7:37 AM Srinivas Pandruvada
> > > <srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On a Lenovo X1 gen9 laptop, HWP interrupts arrive before the driver
> > > > is ready to handle them on that CPU. At that point the driver has not
> > > > even allocated memory for the per-CPU data structure, nor started the
> > > > interrupt enable process on that CPU, so the interrupt handler
> > > > observes a NULL pointer when it tries to schedule work.
> > > >
> > > > This interrupt was probably meant for SMM, but since it is redirected
> > > > to the OS by the OSC call, the OS receives it before it is ready to
> > > > handle it. That redirection of the interrupt to the OS was done a
> > > > while back to solve an SMM crash on the Yoga 260 related to the HWP
> > > > interrupt.
> > > >
> > > > To solve this, the HWP interrupt handler should ignore such requests
> > > > if the driver is not ready. This requires a flag that stays clear
> > > > until the driver has set up the workqueue to handle the interrupt on
> > > > a given CPU. We can't simply check whether cpudata is NULL and skip
> > > > processing, because it may be non-NULL while the data structure is
> > > > not yet in a consistent state.
> > > >
> > > > So create a cpumask in which the bit for a CPU is set once the
> > > > interrupt has been set up on that CPU. If the bit is not set, simply
> > > > clear the interrupt status and return. Since a similar issue can
> > > > happen during S3 resume, clear the bit during offline.
> > > >
> > > > Since the interrupt may arrive before HWP is enabled, use the safe
> > > > MSR read/write variants, as was done before the HWP interrupt
> > > > change.
> > > >
> > > > Fixes: d0e936adbd22 ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
> > > > Reported-and-tested-by: Jens Axboe <axboe@xxxxxxxxx>
> > > > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > > > ---
> > > > drivers/cpufreq/intel_pstate.c | 23 ++++++++++++++++++++++-
> > > > 1 file changed, 22 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > > > index b4ffe6c8a0d0..5ac86bfa1080 100644
> > > > --- a/drivers/cpufreq/intel_pstate.c
> > > > +++ b/drivers/cpufreq/intel_pstate.c
> > > > @@ -298,6 +298,8 @@ static bool hwp_boost __read_mostly;
> > > >
> > > >  static struct cpufreq_driver *intel_pstate_driver __read_mostly;
> > > >
> > > > +static cpumask_t hwp_intr_enable_mask;
> > > > +
> > > >  #ifdef CONFIG_ACPI
> > > >  static bool acpi_ppc;
> > > >  #endif
> > > > @@ -1067,11 +1069,15 @@ static void intel_pstate_hwp_set(unsigned int cpu)
> > > >  	wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
> > > >  }
> > > >
> > > > +static void intel_pstate_disable_hwp_interrupt(struct cpudata *cpudata);
> > > > +
> > > >  static void intel_pstate_hwp_offline(struct cpudata *cpu)
> > > >  {
> > > >  	u64 value = READ_ONCE(cpu->hwp_req_cached);
> > > >  	int min_perf;
> > > >
> > > > +	intel_pstate_disable_hwp_interrupt(cpu);
> > > > +
> > > >  	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
> > > >  		/*
> > > >  		 * In case the EPP has been set to "performance" by the
> > > > @@ -1645,20 +1651,35 @@ void notify_hwp_interrupt(void)
> > > >  	if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
> > > >  		return;
> > > >
> > > > -	rdmsrl(MSR_HWP_STATUS, value);
> > > > +	rdmsrl_safe(MSR_HWP_STATUS, &value);
> > > >  	if (!(value & 0x01))
> > > >  		return;
> > > >
> > > > +	if (!cpumask_test_cpu(this_cpu, &hwp_intr_enable_mask)) {
> > > > +		wrmsrl_safe(MSR_HWP_STATUS, 0);
> > > > +		return;
> > > > +	}
> > >
> > > Without additional locking, there is a race between this and
> > > intel_pstate_disable_hwp_interrupt().
> > >
> > > 1. notify_hwp_interrupt() checks hwp_intr_enable_mask and the target
> > >    CPU is in there, so it will go on to schedule the delayed work.
> > > 2. intel_pstate_disable_hwp_interrupt() runs between the check and
> > >    the cpudata load below.
> > > 3. hwp_notify_work is scheduled on a CPU that is no longer in the
> > >    mask.
> >
> > I noticed that too; it's not clear to me how much of an issue that is
> > in practice. But there's definitely a race there.
> I'd be glad to see how this is possible for code running in ISR context.

intel_pstate_disable_hwp_interrupt() may very well run on a different
CPU in parallel with the interrupt handler running on this CPU. Or is
this not possible for some reason?
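The interleaving described above (the handler on one CPU tests the mask while the disable path clears the bit from another CPU) can be closed by doing the mask test and the work scheduling under the same lock that the disable path holds while it clears the bit and cancels the work. What follows is a minimal user-space C model of that idea, not the intel_pstate code and not necessarily how the patch will end up: it assumes a pthread mutex as a stand-in for a kernel spinlock taken with interrupts disabled, and every name in it (the model_* functions, notify_lock, work_pending, hwp_status, MODEL_NR_CPUS) is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/*
 * User-space model only (hypothetical names): the mutex stands in for a
 * kernel spinlock taken with IRQs disabled, and the arrays stand in for
 * per-CPU state.
 */
static pthread_mutex_t notify_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t intr_enable_mask;          /* bit N set: interrupt set up on CPU N */
static bool work_pending[MODEL_NR_CPUS];   /* models a queued hwp_notify_work */
static uint64_t hwp_status[MODEL_NR_CPUS]; /* models MSR_HWP_STATUS on each CPU */

static uint64_t cpu_bit(int cpu) { return UINT64_C(1) << cpu; }

void model_enable_interrupt(int cpu)
{
	pthread_mutex_lock(&notify_lock);
	intr_enable_mask |= cpu_bit(cpu);
	pthread_mutex_unlock(&notify_lock);
}

/* Models the interrupt handler running on @cpu. */
void model_notify_interrupt(int cpu)
{
	if (!(hwp_status[cpu] & 0x01))
		return;

	/* Mask test and "schedule" happen within a single lock hold. */
	pthread_mutex_lock(&notify_lock);
	if (intr_enable_mask & cpu_bit(cpu))
		work_pending[cpu] = true; /* the queued work would ack the status */
	else
		hwp_status[cpu] = 0;      /* driver not ready: just acknowledge */
	pthread_mutex_unlock(&notify_lock);
}

/* Models the disable path, which may run on a different CPU. */
void model_disable_interrupt(int cpu)
{
	pthread_mutex_lock(&notify_lock);
	intr_enable_mask &= ~cpu_bit(cpu);
	work_pending[cpu] = false;        /* models cancelling the delayed work */
	pthread_mutex_unlock(&notify_lock);
}

/* The state step 3 above would produce: work queued for a CPU not in the mask. */
bool model_invariant_holds(int cpu)
{
	bool ok;

	pthread_mutex_lock(&notify_lock);
	ok = !work_pending[cpu] || (intr_enable_mask & cpu_bit(cpu));
	pthread_mutex_unlock(&notify_lock);
	return ok;
}

static void *irq_thread(void *arg)
{
	model_notify_interrupt(*(int *)arg);
	return NULL;
}

static void *disable_thread(void *arg)
{
	model_disable_interrupt(*(int *)arg);
	return NULL;
}

/* Race the handler against the disable path once; returns the invariant. */
bool model_race_once(int cpu)
{
	pthread_t a, b;

	model_enable_interrupt(cpu);
	hwp_status[cpu] = 0x01; /* written before either thread starts */
	if (pthread_create(&a, NULL, irq_thread, &cpu) != 0)
		return false;
	if (pthread_create(&b, NULL, disable_thread, &cpu) != 0) {
		pthread_join(a, NULL);
		return false;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return model_invariant_holds(cpu);
}
```

Whichever thread wins the lock, the work ends up either cancelled or never queued once the bit is clear. In kernel code the non-IRQ side would typically take such a lock with spin_lock_irqsave().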