Re: [PATCH 1/6] x86, nmi: Implement delayed irq_work mechanism to handle lost NMIs

From: Peter Zijlstra
Date: Wed May 21 2014 - 15:38:48 EST


On Wed, May 21, 2014 at 03:02:25PM -0400, Don Zickus wrote:
> On Wed, May 21, 2014 at 07:51:49PM +0200, Peter Zijlstra wrote:
> > On Wed, May 21, 2014 at 12:45:25PM -0400, Don Zickus wrote:
> > > > > + /*
> > > > > + * Can't use send_IPI_self here because it will
> > > > > + * send an NMI in IRQ context which is not what
> > > > > + * we want. Create a cpumask for local cpu and
> > > > > + * force an IPI the normal way (not the shortcut).
> > > > > + */
> > > > > + bitmap_zero(nmi_mask, NR_CPUS);
> > > > > + mask = to_cpumask(nmi_mask);
> > > > > + cpu_set(smp_processor_id(), *mask);
> > > > > +
> > > > > + __this_cpu_xchg(nmi_delayed_work_pending, true);
> > > >
> > > > Why is this xchg and not __this_cpu_write() ?
> > > >
> > > > > + apic->send_IPI_mask(to_cpumask(nmi_mask), NMI_VECTOR);
> > > >
> > > > What's wrong with apic->send_IPI_self(NMI_VECTOR); ?
> > >
> > > I tried to explain that in my comment above. IPI_self uses the shortcut
> > > method to send IPIs which means the NMI_VECTOR will be delivered in IRQ
> > > context _not_ NMI context. :-( This is why I do the whole silly dance.
> >
> > I'm still not getting it, probably because I don't know how these APICs
> > really work, but the way I read both the comment and your explanation
> > here is that we get an NMI nested in the IRQ context we called it from,
> > which is pretty much exactly what we want.
>
> Um, ok. I think my concern with that is an NMI nested in IRQ context
> could be interrupted by a real NMI. I believe that would cause nmi_enter()
> to barf among other bad things in the nmi code.

Ohh, you mean the NMI handler will run as a regular interrupt? Yes, that
would be bad.

> > > So both my problems center around what guarantees does irq_work have to
> > > stay on the same cpu?
> >
> > Well, none as you used a global irq_work, so all cpus will now contend
> > on it on every NMI trying to queue it :-(
>
> Yes, I was stuck between using a per-cpu implementation in which every dummy
> NMI grabs the spin lock in the nmi handlers, or a global lock. I tried
> the global lock.
>
> I thought the irq_work lock seemed less contended because it was only read
> once before being acted upon (for a cacheline separate from actual nmi work).
>
> Whereas a spin lock in the nmi handlers seems to keep reading the lock
> until it owns it thus slowing down useful work for the handler that owns
> the lock (because of the cache contention).
>
> I could be wrong though.

Well, pretty much every NMI will call irq_work_queue(), which calls
irq_work_claim(), which does an unconditional cmpxchg (locked rmw) on the
global cacheline.

Which *hurts*.
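
To make the cost concrete, here is a rough userspace model of the claim
step, written with C11 atomics. It is only a sketch: claim_model(),
struct work_model, the WORK_* flags, rmw_count and MODEL_NR_CPUS are all
made-up names for illustration, and the loop is loosely modelled on how
irq_work_claim() behaves rather than copied from it. The point it shows is
that a caller issues a locked rmw on the flags word even when the work is
already pending, so a single global item bounces one cacheline between all
CPUs, whereas per-cpu items keep that traffic local.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only. */
#define WORK_PENDING  1UL
#define WORK_BUSY     2UL
#define WORK_FLAGS    (WORK_PENDING | WORK_BUSY)
#define MODEL_NR_CPUS 4		/* assumed CPU count for the sketch */

/* One cacheline per item (64 bytes assumed) so items never share a line. */
struct work_model {
	_Alignas(64) _Atomic unsigned long flags;
};

/* Counts locked rmw attempts; plain counter, single-threaded demo only. */
static unsigned long rmw_count;

/* Returns true if the caller claimed the item, false if already pending. */
static bool claim_model(struct work_model *w)
{
	unsigned long flags = atomic_load(&w->flags) & ~WORK_PENDING;

	for (;;) {
		unsigned long expected = flags;

		rmw_count++;	/* every pass is a locked rmw on w->flags */
		if (atomic_compare_exchange_strong(&w->flags, &expected,
						   flags | WORK_FLAGS))
			return true;
		if (expected & WORK_PENDING)
			return false;	/* someone else already queued it */
		flags = expected;
	}
}

/* Global variant: every CPU hits the same cacheline. */
static struct work_model global_work;

/* Per-cpu variant: each CPU only touches its own cacheline. */
static struct work_model percpu_work[MODEL_NR_CPUS];

static bool claim_percpu(int cpu)
{
	return claim_model(&percpu_work[cpu]);
}
```

Calling claim_model(&global_work) twice returns true and then false, yet
rmw_count goes up both times; that second, failed rmw is what every NMI on
every CPU would pay against the shared line. With claim_percpu() the same
rmw still happens, but only against the calling CPU's own line.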


I will try to reply to the rest later.