Re: Opinion on ordering of writel vs. stores to RAM
From: Benjamin Herrenschmidt
Date: Sun Sep 10 2006 - 20:28:38 EST
Ok, so we have two different proposals here...
Maybe we should cast a vote ? :)
* Option A:
- writel/readl are fully synchronous (minus mmiowb for spinlocks)
- we provide __writel/__readl with some barriers with relaxed ordering
between memory and MMIO (though still _precise_ semantics, not arch
specific)
* Option B:
- The driver decides at ioremap time wether it wants a fully ordered
mapping or not using
a "special" version of ioremap (with flags ?)
- writel/readl() behave differently depending on the mapping
- __writel/__readl might exist but are architecture specific (ahem...
still to be debated)
The former seems easier to me to implement. The later might indeed be a
bit easier for drivers writers, I'm not 100% convinced tho. The later
means stuffing special tokens in the returned address from ioremap and
testing for them in writel. However, the later is also what we need for
write combining (at least for PowerPC, maybe for other archs, wether a
mapping can write combine has to be decided by using flags in the page
table, thus has to be done at ioremap time. (*)
Votes ?
(*) Unrelated note about write combine: Due to weird implementation
issues, one cannot guarantee to prevent write combine on a write-combine
enabled mapping using barriers. This doesn't work due to CPUs combining
stores between threads on such mappings. I know of at least one CPU
doing that (Cell), there might be a lot more though. Thus a driver using
write combine might want to create two mappings for it's IOs, one with
combine enabled, one ordered, and use the "right" one depending on the
requirements of a given IO access.
> The tg3 bug actually seems not to be because of the missing wmb()'s,
> [the driver and all net traffic survive just fine in the case of non-
> TSO],
> but just because of a plain-and-simple programming bug in the driver.
> I'll run some tests tomorrow to confirm. If I'm right, this fix should
> go into .18 and into .17-stable at least.
Interesting :) I didn't actually verify the barrier problem theory
(though the driver does indeed seem to lack them, so there _is_ a
problem there too), I trusted Michael Chan who seemed to know about the
bug :) Anyway, that doesn't change anything to the fact that there is a
problem with barriers and we are on a good path to finally do something
sane about it.
> Of course the problems that the PowerPC port currently has with the
> missing wmb()'s are still there, but in the non-TSO case they almost
> always result in 100% garbage sent on the actual ethernet line, and
> that doesn't impede correctness (and it has been there since about
> forever; even the TSO case that _does_ corrupt data was in the released
> 2.6.17, it's very hard to hit). There's no reason to delay 2.6.18
> release or change the PowerPC I/O accessors because of this single
> issue, knowing it was in 2.6.17 already.
We can add the missing barriers to tg3 for 2.6.18 nontheless.
> That "trick" to avoid I/O accesses to "leak out" of protected regions
> is just fine, and should be done no matter what -- if it is decreed that
> spinlocks order I/O accesses at all, which is not a bad idea at all (in
> my opinion anyway).
Or we rely on an explicit barrier for that, which is what mmiowb() has
been defined for. We can use the "trick" to detect the missing ones
though, as a debugging aid.
Ben.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/