The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or adding the mmiowb()'s inbetween the two
writel's. On Altix you have to go and read from the PCI brige to
ensure all writes to it have been flushed, which is also what mmiowb()
is doing. If writel() was to guarantee this ordering, it would make
every writel() call extremely expensive :-(
So if a read from the bridge achieves the same effect, can't we just put
one after the writes within the spinlock (an unrelaxed one). That way
this whole sequence will look like a well understood PCI posting flush
rather than have to muck around with little understood (at least by most
driver writers) io barriers?