Re: TG3 data corruption (TSO ?)

From: Benjamin Herrenschmidt
Date: Sat Sep 09 2006 - 18:33:40 EST


On Sat, 2006-09-09 at 02:22 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> Date: Sat, 09 Sep 2006 07:46:02 +1000
>
> > I don't think that in general, you have ordering guarantees between
> > cacheable and non-cacheable stores unless you use explicit barriers.
>
> In fact, on most systems you absolutely do have ordering between
> MMIO and memory accesses.

Well, at least powerpc and ia64 don't in hw, I don't know about
others... out-of-order in general is getting fashionable in processor
design ...

> So you are making an extremely poor engineering decision
> by trying to fixup all the drivers to match PowerPC's
> semantics. I think a smart engineer would decrease his
> debugging burden, by matching his platform's MMIO accessors
> such that it matches what other platforms do and therefore
> inheriting the testing coverage provided by all platforms.
>
> Otherwise you will be hunting down these kinds of memory
> barrier issues forever.

Well, some of you (Alan, you, etc...) seem to imply that it's always
been the rule to have a memory store followed by an MMIO write be
strongly ordered.

However, if you look at drivers like e1000, USB OHCI, or even sungem
(:-), they all have at least wmb()'s between updating descriptors in
memory and the MMIO that triggers the chip to read them. So it seems
that I'm not the only one to have thought otherwise ;-) More
specifically, I think at least ia64, like PowerPC, assumes no ordering
requirement here. So they would need fixing too.
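
As a rough sketch of the pattern those drivers use (plain userspace C,
not kernel code): fill in the descriptor in memory, issue a write
barrier, then do the MMIO write that kicks the chip. Everything named
*_sketch, the tx_desc layout, the "owned by device" flag and the
doorbell variable are made up for illustration. The C11 release fence
only stands in for wmb(); it is not claimed to order cacheable stores
against real device MMIO.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u

/* Hypothetical descriptor layout, for illustration only. */
struct tx_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
};

static struct tx_desc ring[RING_SIZE];  /* stands in for cacheable DMA memory */
static volatile uint32_t doorbell_reg;  /* stands in for a device MMIO register */

/* Userspace stand-in for the kernel's wmb() (assumption, see lead-in). */
static inline void wmb_sketch(void)
{
    atomic_thread_fence(memory_order_release);
}

/* Userspace stand-in for writel(); a plain volatile store. */
static inline void writel_sketch(uint32_t val, volatile uint32_t *reg)
{
    *reg = val;
}

/* Queue one descriptor at slot idx and ring the doorbell.
 * Returns the next free slot index. */
static uint32_t queue_tx(uint32_t idx, uint64_t addr, uint32_t len)
{
    assert(idx < RING_SIZE);

    /* 1. Update the descriptor in (cacheable) memory. */
    ring[idx].addr = addr;
    ring[idx].len = len;
    ring[idx].flags = 1u;  /* hypothetical "owned by device" bit */

    /* 2. Make the descriptor stores visible before the doorbell write. */
    wmb_sketch();

    /* 3. MMIO write that tells the chip to go and read the descriptor. */
    uint32_t next = (idx + 1u) % RING_SIZE;
    writel_sketch(next, &doorbell_reg);
    return next;
}
```

If writel itself were strongly ordered against earlier memory stores,
step 2 would be redundant, and that barrier is exactly the cost under
discussion.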

My main problem is the cost... it's actually very expensive to do that
sort of synchronisation. I don't know about ia64 or other potentially
out-of-order architectures, but we did introduce a measurable
performance hit by adding the barrier we already have to guard against
spin_unlock.

So if we decide to go the way of making writel synchronous vs. previous
MMIOs, I'd really like to have a clearly defined "relaxed" version as
well.

However, I'm not sure any of our current "relaxed" accessors have clear
semantics. At least what is implemented currently on PowerPC is the
__raw_* versions, which not only have no barriers at all (they don't
even order between MMIOs, for example, readl might cross writel), but
also do no endian swap. Quite a mess of semantics if you ask me... Then
there has been talk about those *_relaxed operations, but those are more
a match to the relaxed PCI-X and PCI-E cycles, that is they relax
ordering vs. requests in a different direction on the bus, they have
nothing to do with storage domains on the CPU.
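
To make the endian half of that mess concrete, here is a small
userspace sketch (again not kernel code, and all *_sketch names are
invented): one accessor stores the value in native byte order the way
__raw_writel does, the other stores it little-endian the way writel
does. Neither has any barrier, so this shows only the byte-order
difference, not the ordering one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for __raw_writel(): native byte order, no barrier. */
static inline void raw_write32_sketch(uint32_t val, volatile void *reg)
{
    assert(reg != NULL);
    *(volatile uint32_t *)reg = val;
}

/* Stand-in for cpu_to_le32(): build the little-endian byte image of val.
 * A no-op on little-endian hosts, a byte swap on big-endian ones. */
static inline uint32_t cpu_to_le32_sketch(uint32_t val)
{
    uint8_t b[4] = {
        (uint8_t)(val & 0xffu),
        (uint8_t)((val >> 8) & 0xffu),
        (uint8_t)((val >> 16) & 0xffu),
        (uint8_t)((val >> 24) & 0xffu),
    };
    uint32_t out;
    memcpy(&out, b, sizeof(out));
    return out;
}

/* Stand-in for the byte-order half of writel(): always little-endian
 * in the register, whatever the host byte order is. */
static inline void le_write32_sketch(uint32_t val, volatile void *reg)
{
    assert(reg != NULL);
    *(volatile uint32_t *)reg = cpu_to_le32_sketch(val);
}
```

On a big-endian machine the two leave different bytes in the register,
so a driver can't just swap one for the other to drop the barrier.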

Ben.

