The compiler is within its rights to read a 32-bit quantity 16 bits at
a time, even on a 32-bit machine. I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.
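To make the tearing concrete, here is a minimal single-threaded sketch of what a reader could observe if the compiler split a 32-bit load into two 16-bit loads and a store landed in between. The function name torn_read() and the inline "concurrent" store are assumptions made purely for illustration; a real race would involve another CPU or an interrupt handler.

```c
#include <stdint.h>

/*
 * Simulate a compiler that loads a 32-bit word as two 16-bit halves,
 * with a store to that word landing between the two loads.
 * torn_read() is a hypothetical name, not an existing API.
 */
static uint32_t torn_read(uint32_t *p, uint32_t new_value)
{
    /* First 16-bit load: low half of the old value. */
    uint16_t lo = (uint16_t)(*p & 0xFFFFu);

    /* The "concurrent" store, simulated inline. */
    *p = new_value;

    /* Second 16-bit load: high half of the new value. */
    uint16_t hi = (uint16_t)(*p >> 16);

    /* The reader reassembles a value that was never stored. */
    return ((uint32_t)hi << 16) | lo;
}
```

Starting from 0x0000FFFF and storing 0x00010000, the reader ends up with the low half of the old value glued to the high half of the new one, which matches neither.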
Use of volatile does in fact save you from the compiler pushing stores out
of loops regardless of whether you are also doing reads. The C standard
has the notion of sequence points, which occur at various places including
the ends of statements and the control expressions for "if" and "while"
statements. The compiler is not permitted to move volatile references
across a sequence point. Therefore, the compiler is not allowed to
push a volatile store out of a loop. Now the CPU might well do such a
reordering, but that is a separate issue to be dealt with via memory
barriers. Note that it is the CPU and I/O system, not the compiler,
that is forcing you to use reads to flush writes to MMIO registers.
And you would be amazed at what compiler writers will do in order to
get an additional fraction of a percent out of SPEC CPU...
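As a rough illustration of the store-in-a-loop case, consider the sketch below. The names my_atomic_t, my_atomic_set(), my_atomic_read(), and sum_with_progress() are hypothetical stand-ins, not the kernel's definitions; the only assumption that matters is that the counter is declared volatile.

```c
#include <stddef.h>

/* Hypothetical stand-in for an atomic_t whose counter is declared volatile. */
typedef struct { volatile int counter; } my_atomic_t;

/* Hypothetical stand-in for atomic_set(): a plain store to a volatile object. */
static inline void my_atomic_set(my_atomic_t *v, int i)
{
    v->counter = i;
}

/* Hypothetical stand-in for atomic_read(): a plain load of a volatile object. */
static inline int my_atomic_read(const my_atomic_t *v)
{
    return v->counter;
}

/* Sum an array, publishing how many elements have been consumed so far. */
static long sum_with_progress(const int *a, size_t n, my_atomic_t *progress)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += a[i];
        /*
         * If counter were a plain int, the compiler could sink this
         * store out of the loop and emit a single store of n after it.
         * Because counter is volatile, the store has to be performed
         * on every iteration.
         */
        my_atomic_set(progress, (int)(i + 1));
    }
    return sum;
}
```

This only constrains the compiler. Whether another CPU or a device sees the stores in order is the separate memory-barrier question.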
In short, please retain atomic_set()'s volatility, especially on those
architectures that declared the atomic_t's counter to be volatile.