Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal
From: Dave Hansen
Date: Mon Oct 05 2015 - 16:49:19 EST
On 10/05/2015 01:22 PM, Linus Torvalds wrote:
> On Mon, Oct 5, 2015 at 5:23 PM, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>> One thing I've been noticing on Skylake is that barriers (implicit and
>> explicit) are showing up more in profiles.
>
> Ahh, you're on skylake?
Yup.
> It's entirely possible that the issue is that the whole
> "stac/mov/clac" is much more expensive because skylake actually ends
> up supporting those AC instructions. That would make sense.
>
> We could probably do them outside the loop, rather than tightly around
> the actual move instructions. Peter (hpa), is there some sane
> interface to try to do that?
iov_iter_fault_in_readable() is just going and touching a single word in
the page so that it is faulted in, or a pair of words if it manages to
cross a page boundary (which isn't happening here). I'm not sure
there's a loop to move them out of here (for the prefaulting part).
We could theoretically expand the stac/clac to be around the pair of
__get_user()s in fault_in_pages_readable() but that would only help the
case where we are crossing a page boundary.
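To make the "pair of __get_user()s" point concrete, here is a rough userspace model, not kernel code. It only counts how many simulated stac/clac pairs each approach would issue for one prefault. The names prefault_per_access(), prefault_batched(), struct prefault_stats and MODEL_PAGE_SIZE are hypothetical and exist only for this sketch; the 4 KiB page size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical bookkeeping: counts of simulated operations. */
struct prefault_stats {
	unsigned touches;  /* single accesses, stand-ins for __get_user() */
	unsigned ac_pairs; /* simulated stac/clac pairs */
};

/* Nonzero if [addr, addr + len) spans more than one page. */
static int crosses_page(uintptr_t addr, size_t len)
{
	if (len == 0)
		return 0;
	return (addr / MODEL_PAGE_SIZE) !=
	       ((addr + len - 1) / MODEL_PAGE_SIZE);
}

/* Model of today's behaviour: each access carries its own stac/clac. */
static struct prefault_stats prefault_per_access(uintptr_t addr, size_t len)
{
	struct prefault_stats s = { 0, 0 };

	if (len == 0)
		return s;
	s.touches++;
	s.ac_pairs++;
	if (crosses_page(addr, len)) {
		/* second access, on the page holding the last byte */
		s.touches++;
		s.ac_pairs++;
	}
	return s;
}

/* Model of the idea above: one stac/clac pair around both accesses. */
static struct prefault_stats prefault_batched(uintptr_t addr, size_t len)
{
	struct prefault_stats s = { 0, 0 };

	if (len == 0)
		return s;
	s.ac_pairs = 1;
	s.touches = crosses_page(addr, len) ? 2 : 1;
	return s;
}
```

In this model a buffer that stays inside one page costs one stac/clac pair either way, and only the page-crossing case drops from two pairs to one.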
Although I was probably wrong about the source of the overhead, the
point still remains that the prefaulting is eating cycles for no
practical benefit.
>> What we're seeing here
>> probably isn't actually stac/clac overhead, but the cost of finishing
>> some other operations that are outstanding before we can proceed through
>> here.
>
> I suspect it actually _is_ stac/clac overhead. It might well be that
> clac/stac ends up serializing loads some way. Last I heard, they were
> reasonably cheap but certainly not free - and when we're talking about
> something that just loops over bringing the line into cache, it might
> be relatively expensive.
>
> How did you do the profile? Use "-e cycles:pp" to get the precise
> profile information, which should actually attribute the cost to the
> instruction that really causes it.
It reduced the skid a bit.
Plain (no "-e"):
>        │ stac
>  24.57 │ mov (%rcx),%sil
>  15.70 │ clac
>  28.77 │ test %eax,%eax
>   2.15 │ mov %sil,-0x1(%rbp)
>   8.93 │ jne 66
>   2.31 │ movslq %edx,%rdx
With "-e cycles:pp":
>        │ sub $0x8,%rsp
>  24.57 │ stac
>  15.49 │ mov (%rcx),%sil
>  29.06 │ clac
>   2.24 │ test %eax,%eax
>   8.77 │ mov %sil,-0x1(%rbp)
>   2.22 │ jne 66
>        │ movslq %edx,%rdx