Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal

From: Linus Torvalds
Date: Mon Oct 05 2015 - 16:22:50 EST


On Mon, Oct 5, 2015 at 5:23 PM, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
> One thing I've been noticing on Skylake is that barriers (implicit and
> explicit) are showing up more in profiles.

Ahh, you're on Skylake?

It's entirely possible that the issue is that the whole
"stac/mov/clac" sequence is much more expensive because Skylake
actually ends up supporting those AC instructions. That would make
sense.

We could probably do them outside the loop, rather than tightly around
the actual move instructions. Peter (hpa), is there some sane
interface to try to do that?
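
To make the idea concrete, here is a rough user-space sketch, not
kernel code. The real stac/clac instructions are privileged, so
ac_open() and ac_close() below are hypothetical stand-ins that only
count how often they would execute, and copy_tight()/copy_hoisted()
are made-up names for the two shapes of the loop. It shows the
difference in how many AC toggles each shape issues, nothing more.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many stac/clac-style toggles would have been executed. */
static unsigned long ac_toggles;

/* Hypothetical stand-ins: in the kernel these would be "stac" and "clac". */
static void ac_open(void)  { ac_toggles++; }
static void ac_close(void) { ac_toggles++; }

/* Tight shape: every single move is bracketed by open/close. */
static void copy_tight(unsigned long *dst, const unsigned long *src, size_t n)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++) {
        ac_open();
        dst[i] = src[i];
        ac_close();
    }
}

/* Hoisted shape: one open before the loop, one close after it. */
static void copy_hoisted(unsigned long *dst, const unsigned long *src, size_t n)
{
    assert(dst && src);
    ac_open();
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    ac_close();
}
```

For n words the tight version toggles 2*n times and the hoisted one
toggles twice. The catch in the real thing is that everything between
the open and the close runs with user access enabled, so the loop body
has to stay small and must not call anything that could fault in an
unexpected way.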

> What we're seeing here
> probably isn't actually stac/clac overhead, but the cost of finishing
> some other operations that are outstanding before we can proceed through
> here.

I suspect it actually _is_ stac/clac overhead. It might well be that
clac/stac ends up serializing loads in some way. Last I heard, they were
reasonably cheap but certainly not free - and when we're talking about
something that just loops over bringing the line into cache, it might
be relatively expensive.

How did you do the profile? Use "-e cycles:pp" to get the precise
profile information, which should actually attribute the cost to the
instruction that really causes it.

Linus