Re: [PATCH 5/7] thermal: intel: hfi: Enable notification interrupt
From: Srinivas Pandruvada
Date: Tue Nov 09 2021 - 07:54:21 EST
On Tue, 2021-11-09 at 09:48 +0100, Peter Zijlstra wrote:
> On Mon, Nov 08, 2021 at 06:26:13PM -0800, Ricardo Neri wrote:
> > On Mon, Nov 08, 2021 at 10:07:40AM +0100, Peter Zijlstra wrote:
> > > On Fri, Nov 05, 2021 at 06:33:10PM -0700, Ricardo Neri wrote:
>
> > > > +static void hfi_update_work_fn(struct work_struct *work)
> > > > +{
> > > > + struct hfi_instance *hfi_instance;
> > > > +
> > > > + hfi_instance = container_of(to_delayed_work(work),
> > > > struct hfi_instance,
> > > > + update_work);
> > > > + if (!hfi_instance)
> > > > + return;
> > > > +
> > > > + /* TODO: Consume update here. */
> > >
> > > // this here uses ->event_lock to serialize against the
> > > // interrupt below changing the data...
> >
> > Anyone reading the HFI table would need to take ->event_lock.
>
> Right.. that implies ->event_lock can be taken while there is no
> interrupt active, which then necessitates the additional lock.
>
Correct.
With the raw_spin_trylock() optimization, we will need additional lock.
So need another lock to protect hfi_instance->table_base.
> > > > +}
> > > > +
> > > > +void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
> > > > +{
> > > > + struct hfi_instance *hfi_instance;
> > > > + int cpu = smp_processor_id();
> > > > + struct hfi_cpu_info *info;
> > > > + unsigned long flags;
> > > > + u64 timestamp;
> > > > +
> > > > + if (!pkg_therm_status_msr_val)
> > > > + return;
> > > > +
> > > > + info = &per_cpu(hfi_cpu_info, cpu);
> > > > + if (!info)
> > > > + return;
> > > >
[...]
> > > > + memcpy(hfi_instance->table_base, hfi_instance-
> > > > >hw_table,
> > > > + hfi_features.nr_table_pages << PAGE_SHIFT);
>
> I think we actually need to release ->interrupt_lock here, *before*
> the
> WRMSR that ACKs the HFI update. Because I think the moment that WRMSR
> goes through we can get another interrupt, and that *must* not find
> ->interrupt_lock taken, otherwise it will not process the update
> etc..
> leading to lost interrupts.
Correct.
Once we use raw_spin_trylock() change suggested above, then we need to
release lock here.
Thanks,
Srinivas
>
> > > > + /*
> > > > + * Let hardware and other CPUs know that we are done
> > > > reading the HFI
> > > > + * table and it is free to update it again.
> > > > + */
> > > > + pkg_therm_status_msr_val &= THERM_STATUS_CLEAR_PKG_MASK
> > > > &
> > > > +
> > > > ~PACKAGE_THERM_STATUS_HFI_UPDATED;
> > > > + wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS,
> > > > pkg_therm_status_msr_val);
> > > > + schedule_delayed_work(&hfi_instance->update_work,
> > > > HFI_UPDATE_INTERVAL);
> > > > +
> > > > +unlock_spinlock:
> > > > + raw_spin_unlock_irqrestore(&hfi_instance->event_lock,
> > > > flags);
> > >
> > > raw_spin_unlock(&hfi_instance->interrupt_lock);
> >
> > ... and here we release both locks.
>
> See above.