Re: [PATCH] make atomic_t volatile on all architectures

From: Chris Snook
Date: Thu Aug 09 2007 - 07:45:18 EST


Herbert Xu wrote:
> On Thu, Aug 09, 2007 at 03:47:57AM -0400, Chris Snook wrote:
> > If they're not doing anything, sure. Plenty of loops actually do some sort of real work while waiting for their halt condition, possibly even work which is necessary for their halt condition to occur, and you definitely don't want to be doing cpu_relax() in this case. On register-rich architectures you can do quite a lot of work without needing to reuse the register containing the result of the atomic_read(). Those are precisely the architectures where barrier() hurts the most.

> I have a problem with this argument. The same loop could be
> using a non-atomic as long as the updaters are serialised. Would
> you suggest that we turn such non-atomics into volatiles too?

No. I'm simply saying that when people call atomic_read(), they expect a read to occur. When this doesn't happen, people get confused. Merely referencing a variable doesn't carry the same connotation.
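To make the confusion concrete, consider a sketch like this (hypothetical code, not a quote from the tree; 'pending' and do_other_work() are invented names):

	static atomic_t pending;		/* set to nonzero by another CPU */

	static void wait_for_pending(void)	/* hypothetical */
	{
		while (atomic_read(&pending) == 0)
			do_other_work();	/* never touches 'pending' */
	}

If atomic_read() is just a plain dereference, nothing in the loop body tells the compiler the value can change, so GCC is free to load the counter once and spin on a stale register. The "read" the caller expected happens at most once.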

Anyway, I'm taking Linus's advice and putting the cast in atomic_read(), rather than the counter declaration itself. Everything else uses __asm__ __volatile__, or calls atomic_read() with interrupts disabled. This ensures that atomic_read() works as expected across all architectures, without the cruft the compiler generates when you declare the variable itself volatile.
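For reference, the shape of the change is roughly this (a sketch of the idea; per-architecture details vary):

	typedef struct {
		int counter;
	} atomic_t;

	/* Force the load at the accessor, not in the declaration: */
	#define atomic_read(v)	(*(volatile int *)&(v)->counter)

Only the access through atomic_read() is volatile-qualified, so everything else that touches an atomic_t can still be optimized normally.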

> Any loop that's waiting for an external halt condition either
> has to schedule away (which is a barrier) or you'd be busy
> waiting in which case you should use cpu_relax.

Not necessarily. Some code uses atomic_t for a sort of lightweight semaphore. If your loop is actually doing real work, perhaps in a softirq handler negotiating shared resources with a hard irq handler, you're not busy-waiting.
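As a sketch of the pattern I mean (names invented for illustration):

	static atomic_t queued;		/* incremented by the hard irq handler */

	static void drain_queue(void)	/* hypothetical softirq-side helper */
	{
		while (atomic_read(&queued) > 0) {
			process_one_item();	/* the real work */
			atomic_dec(&queued);
		}
	}

Each pass through the loop makes forward progress, so cpu_relax() would be wrong, and nothing here schedules away, so there's no implicit barrier either.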

> Do you have an example where this isn't the case?

a) No, and that's sort of the point. We shouldn't have to audit the whole kernel to find every place where a misleadingly-named function does something its users aren't expecting. If we can make it always do the right thing without any substantial drawbacks, we should.

b) Loops are just one case, which came to mind because of the IPVS bug recently discussed. I recall seeing some scheduler code which does atomic_read() twice on the same variable, with a barrier() in between. It's in a very hot path, so if we can remove that barrier, we save a bunch of register reloads. When you're context switching every microsecond in SCHED_RR, that really matters.
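To illustrate the cost (a made-up fragment, not the scheduler code itself; 'count' and 'flag_ptr' are hypothetical):

	int a, b, x, y;

	x = atomic_read(&count);
	a = *flag_ptr;			/* some other memory load, cached in a register */
	barrier();			/* "memory" clobber: GCC forgets ALL cached memory */
	y = atomic_read(&count);	/* the fresh load we actually wanted */
	b = *flag_ptr;			/* reloaded too, purely as collateral damage */

The "memory" clobber in barrier() makes the compiler discard every value it was caching from memory, not just the counter we cared about. A volatile cast inside atomic_read() forces only the second load of the counter and leaves the rest cached.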

-- Chris