Re: [PATCH v4 2/2] Output stall traces in /proc

From: Don Zickus
Date: Mon Aug 08 2011 - 17:37:37 EST


On Mon, Aug 08, 2011 at 10:52:51PM +0200, Peter Zijlstra wrote:
> Since it's the NMI context that's generating the data, disabling IRQs
> will obviously not do much good.
>
> Also, if you've got a procfs output, what do you need irq_work for?
>
> Depending on the type of data and the kind of data loss you can tolerate,
> would a static per-cpu buffer be ok? You can guard it with a single bit:
> if it is clear, the NMI can write; if it is set, the NMI will skip over
> and lose the data.
>
> Then your procfs routine can set the bit, dump out the latest version of
> the data and clear the bit again. If you always want the NMI thing to be
> able to write, use two buffers; if you 'lock' them one at a time,
> there's always one writable.
>
> All you need is atomic bitops :-)
>
> > > > The softstall case should be ok though.
> > > Why's that? The soft stall traces are not written in an NMI context but
> > > just in a regular interrupt context, right? Doesn't that pose similar
> > > problems?
>
> You should be able to do stack traces from NMI context, that's what perf
> does after all.
>
> > > These are weird, rare corner cases anyway, right? Maybe the simplest
> > > thing would be to let the interrupts only try_lock(), so they might
> > > sometimes fail to record a stall, but that would take a pretty big
> > > coincidence.
>
> Sure, you can do a trylock. I'm still not quite sure what you want the
> irq_work for. There's no guarantee the interrupt runs immediately after
> the NMI: the NMI could have hit in the middle of an IRQ-disabled region,
> or the whole thing could be running on an architecture without
> arch_irq_work_raise().
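The single-bit guard and the trylock idea quoted above come down to the same pattern: whoever finds the bit already set backs off instead of waiting. What follows is a minimal userspace sketch of that pattern using C11 atomics, assuming one buffer per CPU modeled as a plain array. All names here (stall_buf, stall_record(), stall_read(), SKETCH_NR_CPUS, SKETCH_TRACE_LEN) are hypothetical and are not taken from Zak's patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS   4  /* hypothetical CPU count */
#define SKETCH_TRACE_LEN 8  /* hypothetical maximum stack depth */

/* One buffer per CPU, guarded by a single "busy" bit. */
struct stall_buf {
	atomic_bool busy;                      /* set while someone owns the buffer */
	unsigned long trace[SKETCH_TRACE_LEN];
	size_t depth;
	atomic_ulong lost;                     /* records dropped because busy was set */
};

/* Static storage is zero-initialized, so every busy bit starts clear. */
static struct stall_buf stall_bufs[SKETCH_NR_CPUS];

static bool buf_trylock(struct stall_buf *b)
{
	/* Atomically set the bit; we own the buffer only if it was clear. */
	return !atomic_exchange_explicit(&b->busy, true, memory_order_acquire);
}

static void buf_unlock(struct stall_buf *b)
{
	atomic_store_explicit(&b->busy, false, memory_order_release);
}

/* Writer (NMI or interrupt context): never spins, drops the record instead. */
bool stall_record(int cpu, const unsigned long *trace, size_t depth)
{
	struct stall_buf *b;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	b = &stall_bufs[cpu];
	if (!buf_trylock(b)) {
		atomic_fetch_add_explicit(&b->lost, 1, memory_order_relaxed);
		return false;
	}
	if (depth > SKETCH_TRACE_LEN)
		depth = SKETCH_TRACE_LEN;
	for (size_t i = 0; i < depth; i++)
		b->trace[i] = trace[i];
	b->depth = depth;
	buf_unlock(b);
	return true;
}

/* Reader (procfs): copy out the latest record, or report that it is busy. */
bool stall_read(int cpu, unsigned long *out, size_t *depth)
{
	struct stall_buf *b;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	b = &stall_bufs[cpu];
	if (!buf_trylock(b))
		return false;   /* a writer is mid-update; the caller may retry */
	for (size_t i = 0; i < b->depth; i++)
		out[i] = b->trace[i];
	*depth = b->depth;
	buf_unlock(b);
	return true;
}
```

In kernel code the bit would more likely be a per-CPU word driven by test_and_set_bit_lock() and clear_bit_unlock() than <stdatomic.h>. The cost is the loss Peter describes: an NMI that arrives while the procfs reader holds the bit drops its trace, and the lost counter only records that this happened.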

Maybe irq_work isn't what we needed. I just wasn't smart enough to figure
out how to make sure we can write data in an NMI context and read it in a
normal context. I suppose the whole swapping-buffers approach could work,
and it is simpler.
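Peter's two-buffer variant (the "swapping buffers" above) keeps the NMI side from ever dropping data. Below is a sketch under the same assumptions: C11 atomics stand in for kernel bitops, a single buffer pair is shown instead of a per-CPU array, and the names stall_pair, pair_record() and pair_read() are hypothetical. The writer prefers the slot that does not hold the latest record; because the reader locks only one slot at a time, the other slot is always free. It also assumes a single writer per pair, which holds if NMIs do not nest on one CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_TRACE_LEN 8  /* hypothetical maximum stack depth */

struct stall_slot {
	atomic_bool locked;                    /* per-slot guard bit */
	unsigned long trace[SKETCH_TRACE_LEN];
	size_t depth;
};

/* The kernel version would keep one pair per CPU. */
struct stall_pair {
	struct stall_slot slot[2];
	atomic_uint latest;                    /* index of the newest completed slot */
};

static bool slot_trylock(struct stall_slot *s)
{
	return !atomic_exchange_explicit(&s->locked, true, memory_order_acquire);
}

static void slot_unlock(struct stall_slot *s)
{
	atomic_store_explicit(&s->locked, false, memory_order_release);
}

/* Writer (NMI): assumed to be the only writer of this pair. Returns the slot used. */
unsigned int pair_record(struct stall_pair *p, const unsigned long *trace, size_t depth)
{
	/* Prefer the older slot so the newest record stays readable. */
	unsigned int idx = 1u - atomic_load_explicit(&p->latest, memory_order_relaxed);

	if (!slot_trylock(&p->slot[idx])) {
		/* The reader holds that slot; it locks only one, so the other is free. */
		idx = 1u - idx;
		bool ok = slot_trylock(&p->slot[idx]);
		assert(ok);
		(void)ok;
	}
	if (depth > SKETCH_TRACE_LEN)
		depth = SKETCH_TRACE_LEN;
	for (size_t i = 0; i < depth; i++)
		p->slot[idx].trace[i] = trace[i];
	p->slot[idx].depth = depth;
	atomic_store_explicit(&p->latest, idx, memory_order_release);
	slot_unlock(&p->slot[idx]);
	return idx;
}

/* Reader (procfs): lock only the latest slot and copy it out. */
bool pair_read(struct stall_pair *p, unsigned long *out, size_t *depth)
{
	unsigned int idx = atomic_load_explicit(&p->latest, memory_order_acquire);
	struct stall_slot *s = &p->slot[idx];

	if (!slot_trylock(s))
		return false;   /* writer is updating this very slot; retry */
	for (size_t i = 0; i < s->depth; i++)
		out[i] = s->trace[i];
	*depth = s->depth;
	slot_unlock(s);
	return true;
}
```

Only the reader can fail here, when the writer is in the middle of updating the slot that latest points to. Since the reader runs in normal context, it can simply retry.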

Basically, Zak was working on a way to save stack traces of potential
hard/soft lockups (stalls that last for a minimum amount of time but
recover before actually triggering the lockup detector). The idea was to
get visibility into who is spending time doing something wrong before it
is too late.

Cheers,
Don