Re: [PATCH v4 2/2] Output stall traces in /proc

From: Peter Zijlstra
Date: Mon Aug 08 2011 - 16:53:16 EST

On Mon, 2011-08-08 at 16:19 -0400, Don Zickus wrote:

> I believe irq_work_queue uses cmpxchg for all its locking and just swaps
> entries on to a linked list?

yeah, cmpxchg to add entries, xchg to splice the whole list out.
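
A minimal userspace sketch of that pattern, using C11 atomics rather than
the kernel primitives. All names here (struct node, list_add_once(),
list_take_all(), list_drain_sum()) are made up for illustration; this is
not the irq_work code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type; a real user would embed this in its own struct. */
struct node {
	struct node *next;
	atomic_bool queued;	/* true while the node sits on the list */
	int payload;
};

static _Atomic(struct node *) list_head;

/* Producer: compare-and-swap the node onto the head of the list.
 * Returns false if the node was already queued (nothing to do). */
static bool list_add_once(struct node *n)
{
	assert(n != NULL);

	if (atomic_exchange(&n->queued, true))
		return false;

	struct node *old = atomic_load(&list_head);
	do {
		n->next = old;	/* 'old' is refreshed when the CAS fails */
	} while (!atomic_compare_exchange_weak(&list_head, &old, n));

	return true;
}

/* Consumer: splice the whole list out with a single exchange. */
static struct node *list_take_all(void)
{
	return atomic_exchange(&list_head, (struct node *)NULL);
}

/* Consumer helper: walk the detached list, mark each node as free
 * again and add up the payloads. */
static int list_drain_sum(void)
{
	int sum = 0;
	struct node *n = list_take_all();

	while (n) {
		struct node *next = n->next;	/* read before releasing n */

		atomic_store(&n->queued, false);
		sum += n->payload;
		n = next;
	}
	return sum;
}
```

The producer never takes a lock, so it can run from a context that must
not block; the consumer gets a private list it can walk at leisure.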

> >
> > Or maybe the intermediate buffer should be dynamically allocated. That
> > would work without a lock, although it seems slightly inefficient.
> Peter,
> How does the irq_work_queue work such that you can save info in the NMI
> context and safely pass it to the irq context for processing?

It doesn't cover that part. It assumes a pre-allocated struct irq_work
exists and can enqueue that on a list; if it's already enqueued, there's
nothing to do (irq_work_queue() returns that state).

Typically irq_work would be embedded in a larger structure.
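
As a rough illustration of the embedding, here is a userspace sketch.
struct work is a simplified stand-in for struct irq_work, and
struct stall_report and stall_report_handler() are hypothetical names;
only the container_of() idiom is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct irq_work (assumption). */
struct work {
	void (*func)(struct work *w);
};

/* Userspace version of the usual container_of() idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical larger structure carrying the data plus the work item. */
struct stall_report {
	int cpu;
	unsigned long duration_ms;
	int handled;
	struct work work;
};

/* The callback only receives the embedded member and recovers the
 * enclosing structure from it. */
static void stall_report_handler(struct work *w)
{
	assert(w != NULL);

	struct stall_report *r = container_of(w, struct stall_report, work);

	r->handled = 1;
}
```

The data to be handed over lives in the outer structure, so it has to be
allocated before the NMI fires; nothing is allocated at enqueue time.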

> > Regarding the lock between the work queue thread and the system call,
> > maybe that should become a mutex instead, since it's all outside of
> > interrupt context at that point?
> No it is still in the irq context.

Right, irq_work is typically run from hardirq context, either through
a self-IPI raised at irq_work_queue() or, in the fallback case, from
the timer interrupt.

> Peter,
> If we want to expose data captured in the NMI context through the procfs,
> I assume we can pass that info along using irq_work_queue. But then when
> reading from procfs do we just lock the data with 'spin_lock_irq' to block
> the irq_work_queue from manipulating the data? (note we are expecting
> data to be overwritten with fresh data, not serialized out like
> trace/perf).

Since it's the NMI context that's generating the data, disabling IRQs
will obviously not do much good.

Also, if you've got a procfs output, what do you need irq_work for?

Depending on the type of data and the kind of data loss you can tolerate,
would a static per-cpu buffer be ok? You can guard it with a single bit: if
it's cleared the NMI can write, if it's set the NMI will skip over and lose
the data.

Then your procfs routine can set the bit, dump out the latest version of
the data and clear the bit again. If you always want the NMI thing to be
able to write, use two buffers; if you 'lock' them one at a time,
there's always one writable.

All you need is atomic bitops :-)
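
A minimal sketch of the single-bit scheme, again in userspace C11 rather
than with the kernel's bitops. struct stall_buf, stall_buf_write() and
stall_buf_read() are hypothetical names. It assumes the writer only ever
interrupts the reader and runs to completion (as an NMI does on its own
CPU); a reader on another CPU racing with a write in progress is not
handled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define TRACE_LEN 64

/* Hypothetical buffer; one instance per CPU is assumed. */
struct stall_buf {
	atomic_bool reader_busy;	/* the single guard bit */
	char data[TRACE_LEN];
};

/* Writer side (the NMI in this thread): if the bit is set, skip the
 * write and lose this sample; otherwise overwrite the old data. */
static bool stall_buf_write(struct stall_buf *b, const char *trace)
{
	if (atomic_load(&b->reader_busy))
		return false;

	strncpy(b->data, trace, TRACE_LEN - 1);
	b->data[TRACE_LEN - 1] = '\0';
	return true;
}

/* Reader side (the procfs routine): set the bit, copy out the latest
 * data, clear the bit again. 'len' must be at least 1. */
static void stall_buf_read(struct stall_buf *b, char *out, size_t len)
{
	assert(len > 0);

	atomic_store(&b->reader_busy, true);
	strncpy(out, b->data, len - 1);
	out[len - 1] = '\0';
	atomic_store(&b->reader_busy, false);
}
```

With two such buffers the writer would fall back to the second one when
the first is held; telling the reader which one is newer then needs extra
state such as a sequence count, which this sketch leaves out.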

> > > The softstall case should be ok though.
> > Why's that? The soft stall traces are not written in a NMI context but
> > just in a regular interrupt context, right? Doesn't that pose similar
> > problems?

You should be able to do stack traces from NMI context, that's what perf
does after all.

> > These are weird rare corner cases anyway, right? Maybe the simplest
> > thing could be to let the interrupts only try_lock(), so they might
> > sometimes fail to record a stall, but it would be a pretty big
> > coincidence.

Sure you can do a trylock. I'm still not quite sure what you want the
irq_work for.. there's no guarantee the interrupt runs immediately after
the NMI; it could be that the NMI is in the middle of an irq-disabled
region, or the whole thing is run on an architecture without