Re: [PATCH v4 2/2] Output stall traces in /proc

From: Don Zickus
Date: Tue Aug 09 2011 - 17:08:46 EST

On Mon, Aug 08, 2011 at 04:34:33PM -0700, Alex Neronskiy wrote:
> On Mon, Aug 8, 2011 at 2:37 PM, Don Zickus <dzickus@xxxxxxxxxx> wrote:
> > Maybe irq_work isn't what we needed.  I just wasn't smart enough to figure
> > out how to make sure we can write data in an NMI context and read it in a
> > normal context.  I suppose the whole swapping-buffers idea could work and is
> > simpler.
> So each of the buffers has its own lock, right? If a lock protects a
> pair of buffers, then: A reader takes the lock, and a writing NMI
> comes in and writes to the non-readable buffer and swaps the two. The
> reader still has the lock. Another NMI comes in, sees that the lock is
> unavailable, and writes to the "backup" buffer, which is actually the
> one the reader is still reading from. Bad corrupted read results.

Actually the second NMI should just overwrite the non-readable buffer, as
that data is now stale and useless. The reader can atomically set which
buffer is being read. The only problem is once you read it, you lose it.

> Either way, I don't see how to make the idea work safely for one pair
> of buffers shared by multiple CPU's. It works one-pair-per-CPU, but
> that's not how the current design is. I guess it would need to
> add/remove files every time a processor is added/removed, and there
> have to be some other changes too, obviously. What do you think, Don?
> Should this be a per-CPU thing, instead of global worst?

Well, looking at the code again, I think the spin_locks in the NMI handler
will block the other cpus from writing to the page at the same time. So
it gets serialized that way, I think. The next trick is to do something
with procfs like swapping buffers successfully.

I am trying to think how that would work, but I guess if you use the
cmpxchg macros, then procfs could cmpxchg a READ_BIT on the buffer and, if
successful (no WRITE_BIT set), proceed to read the buffer. Otherwise it
would use the other buffer. A lock would have to be used to serialize
access on the procfs side.

The NMI code could do the same with a WRITE_BIT and, if successful (no
READ_BIT set), proceed to write the buffer. Otherwise it would use the
other buffer. Because the NMI path is serialized, only one write could go
on at any one time.
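
Roughly, and only as a userspace sketch of the idea rather than kernel
code: C11 atomic_compare_exchange_strong() stands in for the kernel's
cmpxchg(), and every name below (struct trace_buf, buf_claim,
trace_write, trace_read, BUF_SIZE) is made up for illustration. It
assumes writers are already serialized by the NMI-side spin_lock and
readers by a procfs-side lock, neither of which is shown.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE  256           /* hypothetical buffer size */
#define READ_BIT  (1u << 0)
#define WRITE_BIT (1u << 1)

struct trace_buf {
	atomic_uint flags;      /* 0, READ_BIT or WRITE_BIT */
	size_t len;
	char data[BUF_SIZE];
};

static struct trace_buf bufs[2];

/* Set 'bit' only if no bit is currently set; this is the cmpxchg step. */
static bool buf_claim(struct trace_buf *b, unsigned int bit)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(&b->flags, &expected, bit);
}

static void buf_release(struct trace_buf *b)
{
	atomic_store(&b->flags, 0);
}

/* Writer (NMI) side: returns the buffer index used, or -1 if both are busy. */
static int trace_write(const char *src, size_t len)
{
	if (len > BUF_SIZE)
		len = BUF_SIZE;
	for (int i = 0; i < 2; i++) {
		struct trace_buf *b = &bufs[i];

		if (!buf_claim(b, WRITE_BIT))
			continue;       /* reader holds it, use the other one */
		memcpy(b->data, src, len);
		b->len = len;
		buf_release(b);
		return i;
	}
	return -1;
}

/* Reader (procfs) side: returns the buffer index used, or -1 if both are busy. */
static int trace_read(char *dst, size_t cap, size_t *out_len)
{
	for (int i = 0; i < 2; i++) {
		struct trace_buf *b = &bufs[i];
		size_t n;

		if (!buf_claim(b, READ_BIT))
			continue;       /* writer holds it, use the other one */
		n = b->len < cap ? b->len : cap;
		memcpy(dst, b->data, n);
		*out_len = n;
		buf_release(b);
		return i;
	}
	return -1;
}
```

Note the sketch does not track which buffer holds the newest trace, so
the reader may get the older one. That would still need solving.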

I was reading the kernel/irq_work.c code to generate the above idea.

Not sure if it works. Thoughts?
