Re: [rfc][patch 3/3] x86: optimise barriers

From: Jarek Poplawski
Date: Tue Oct 16 2007 - 04:57:21 EST


On Tue, Oct 16, 2007 at 02:50:33AM +0200, Nick Piggin wrote:
> On Mon, Oct 15, 2007 at 11:10:00AM +0200, Jarek Poplawski wrote:
...
> > I'm not performance-words at all, so I can't help you, sorry. But, I
> > understand people who care about this, and think there is a popular
> > conviction barriers and locked instructions are costly, so I'm
> > surprised there is any "problem" now with finding these gains...
>
> It's more expensive than nothing, sure. However in real code, algorithmic
> complexity, cache misses and cacheline bouncing tend to be much bigger
> issues.
>
> I can't think of a place in the kernel where smp_rmb matters _that_ much.
> seqlocks maybe (timers, dcache lookup), vmscan... Obviously removing the
> lfence is not going to hurt. Maybe we even gain 0.01% performance in
> someone's workload.
>
> Also, remember: if loads are already in-order, then lfence is a noop,
> right? (in practice it seems to have to do a _little_ bit of work, but
> it's like a dozen cycles).

You are right: considering current CPUs there could be no performance
problem at all. Removing LOCKs for older ones should probably matter
more, but as a matter of fact, now I wouldn't bet even on this - it
could be mostly noop as well.

> > As a matter of fact it's not natural for me at all. I expected the
> > other direction, and I still doubt programmers' intentions could be
> > "automatically" predicted good enough, so IMHO, it's not for long.
>
> Really? Consider the consequences if, instead of releasing this latest
> document tightening consistency, Intel found that out of order loads
> were worth 5% more performance and implemented them in their next chip.
> The chip could be completely backwards compatible, but all your old code
> would break, because it was broken to begin with (because it was outside
> the spec).

I've different opinion on this: I expect any spec to describe current
implementation. Before issuing new models any changes of
implementation should be made public with proper margin of time. Then
system could be optimally adjusted to a real hardware, instead of
planned only, but possibly never realized (plus doing such not used
things with old means is usually more costly: lock vs. lfence). There
is still problem of specs' completness: there are probably often some
things unspecified which could brake on a new model, so never 100%
guarantee anyway.

> IMO Intel did exactly the right thing from an engineering perspective,
> and so did Linux to always follow the spec.

But, if you follow the spec - you don't follow the spec! Why do you
ignore so much this part of Intel's spec:

"This document contains information which Intel may change at any
time without notice. Do not finalize a design with this information."

Maybe it's a real Intel intention and not for lawyers only? (Btw, it
seems we have an example.)

Regards,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/