Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

From: Kyle Moffett
Date: Mon Sep 10 2007 - 08:22:39 EST


On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote:
On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
On Sun, 9 Sep 2007 19:02:54 +0100
Denys Vlasenko <vda.linux@xxxxxxxxxxxxxx> wrote:

Why is all this fixation on "volatile"? I don't think people want "volatile" keyword per se, they want atomic_read(&x) to _always_ compile into an memory-accessing instruction, not register access.

and ... why is that? is there any valid, non-buggy code sequence that makes that a reasonable requirement?

Well, if you insist on having it again:

Waiting for atomic value to be zero:

while (atomic_read(&x))
continue;

gcc may happily convert it into:

reg = atomic_read(&x);
while (reg)
continue;

Bzzt. Even if you fixed gcc to actually convert it to a busy loop on a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that does the conversion, it may be that the CPU does the caching of the memory value. GCC has no mechanism to do cache-flushes or memory- barriers except through our custom inline assembly. Also, you probably want a cpu_relax() in there somewhere to avoid overheating the CPU. Thirdly, on a large system it may take some arbitrarily large amount of time for cache-propagation to update the value of the variable in your local CPU cache. Finally, if atomics are based on based on spinlock+interrupt-disable then you will sit in a tight busy- loop of spin_lock_irqsave()->spin_unlock_irqrestore(). Depending on your system's internal model this may practically lock up your core because the spin_lock() will take the cacheline for exclusive access and doing that in a loop can prevent any other CPU from doing any operation on it! Since your IRQs are disabled you even have a very small window that an IRQ will come along and free it up long enough for the update to take place.

The earlier code segment of:
while(atomic_read(&x) > 0)
atomic_dec(&x);
is *completely* buggy because you could very easily have 4 CPUs doing this on an atomic variable with a value of 1 and end up with it at negative 3 by the time you are done. Moreover all the alternatives are also buggy, with the sole exception of this rather obvious- seeming one:
atomic_set(&x, 0);

You simply CANNOT use an atomic_t as your sole synchronizing primitive, it doesn't work! You virtually ALWAYS want to use an atomic_t in the following types of situations:

(A) As an object refcount. The value is never read except as part of an atomic_dec_return(). Why aren't you using "struct kref"?

(B) As an atomic value counter (number of processes, for example). Just "reading" the value is racy anyways, if you want to enforce a limit or something then use atomic_inc_return(), check the result, and use atomic_dec() if it's too big. If you just want to return the statistics then you are going to be instantaneous-point-in-time anyways.

(C) As an optimization value (statistics-like, but exact accuracy isn't important).

Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like completions, mutexes, semaphores, spinlocks, krefs, etc. It's not useful for synchronization, only for keeping track of simple integer RMW values. Note that atomic_read() and atomic_set() aren't very useful RMW primitives (read-nomodify-nowrite and read-set-zero- write). Code which assumes anything else is probably buggy in other ways too.

So while I see no real reason for the "volatile" on the atomics, I also see no real reason why it's terribly harmful. Regardless of the "volatile" on the operation the CPU is perfectly happy to cache it anyways so it doesn't buy you any actual "always-access-memory" guarantees. If you are just interested in it as an optimization you could probably just read the properly-aligned integer counter directly, an atomic read on most CPUs.

If you really need it to hit main memory *every* *single* *time* (Why? Are you using it instead of the proper kernel subsystem?) then you probably need a custom inline assembly helper anyways.

Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/