Re: [RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remapping table entry for watchdog

From: Ricardo Neri
Date: Tue Jun 18 2019 - 18:50:46 EST

On Mon, Jun 17, 2019 at 10:25:35AM +0200, Thomas Gleixner wrote:
> On Sun, 16 Jun 2019, Thomas Gleixner wrote:
> > On Thu, 23 May 2019, Ricardo Neri wrote:
> > > When the hardlockup detector is enabled, the function
> > > hld_hpet_intremap_activate_irq() activates the recently created entry
> > > in the interrupt remapping table via the modify_irte() function. While
> > > doing this, it specifies which CPU the interrupt must target via its APIC
> > > ID. This function can be called every time the destination ID of the
> > > interrupt needs to be updated; there is no need to allocate or remove
> > > entries in the interrupt remapping table.
> >
> > Brilliant.
> >
> > > +int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata)
> > > +{
> > > + u32 destid = apic->calc_dest_apicid(hdata->handling_cpu);
> > > + struct intel_ir_data *data;
> > > +
> > > + data = (struct intel_ir_data *)hdata->intremap_data;
> > > + data->irte_entry.dest_id = IRTE_DEST(destid);
> > > + return modify_irte(&data->irq_2_iommu, &data->irte_entry);
> >
> > This calls modify_irte() which does at the very beginning:
> >
> > raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
> >
> > How is that supposed to work from NMI context? Not to talk about the
> > other spinlocks which are taken in the subsequent call chain.
> >
> > You cannot call in any of that code from NMI context.
> >
> > The only reason why this never deadlocked in your testing is that nothing
> > else touched that particular iommu where the HPET hangs off concurrently.
> >
> > But that's just pure luck and not design.
> And just for the record. I warned you about that problem during the review
> of an earlier version and told you to talk to IOMMU folks whether there is
> a way to update the entry w/o running into that lock problem.

I think I misunderstood your feedback. You did mention locking issues
between NMI and !NMI contexts. However, that was in the context of using the
generic irq code to do things such as set the affinity of the interrupt and
request an irq. I understood that I should instead program things directly,
and I extrapolated this to the IOMMU driver, where I also added code directly
instead of using the existing layering.

Also, at the time, the question regarding the IOMMU, as I understood it, was
whether it was possible to reserve an IOMMU remapping entry upfront. I believe
my patches achieve that, even if they are hacky and ugly and have locking
issues. I see now that the locking issues are also part of the IOMMU
discussion. Perhaps that was also implicit.
> Can you tell me why I am actually reviewing patches and spending time on
> this when the result is ignored anyway?

Yes, Thomas, I should have checked first with the IOMMU maintainers
on the issues in the paragraph above. It was not my intention to
waste your time; your feedback has been valuable and has helped
improve the code.

> I also tried to figure out why you went away from the IPI broadcast
> design. The only information I found is:
> Changes vs. v1:
> * Brought back the round-robin mechanism proposed in v1 (this time not
> using the interrupt subsystem). This also requires to compute
> expiration times as in v1 (Andi Kleen, Stephane Eranian).
> Great that there is no trace of any mail from Andi or Stephane about this
> on LKML. There is no problem with talking offlist about this stuff, but
> then you should at least provide a rationale for those who were not part of
> the private conversation.

Stephane has already commented on the rationale.

Thanks and BR,