Re: [linus:master] [fortify] 239d87327d: vm-scalability.throughput 17.3% improvement
From: Mateusz Guzik
Date: Fri Jan 10 2025 - 14:15:08 EST
On Fri, Jan 10, 2025 at 5:58 PM Kees Cook <kees@xxxxxxxxxx> wrote:
>
> On Thu, Jan 09, 2025 at 11:01:47PM +0100, Mateusz Guzik wrote:
> > That is to say, contrary to the report above, I believe the change is
> > in fact a regression which just so happened to make things faster for
> > a specific case. The unintended speed up can be achieved without
> > regressing anything else by taming the craziness.
>
> How do we best make sense of the perf report? Even in the iter case
> above, it looks like a perf improvement?
>
The kernel without your change compiled with gcc is leaving
performance on the table in select cases, namely when it elects to use
rep movsq for sizes below a magic threshold (depends on uarch).
Your change has the unintended side effect of changing
copy_page_from_iter_atomic to use plain memcpy, which justhappens to
be the right thing to do for this particular consumer.
However, it also has a side effect forcing of a memcpy call in places
which were optimized just fine -- for example if there is a spot where
there is a variable number of bytes to copy, but the range is small
and the upper limit is also small, gcc will elect to emit few movs and
be done with it, which is faster than calling memcpy. That is to say
for spots like that this is a regression.
In terms of optimizing all of this, the thing to do is to convince gcc
to not emit rep movsq for known problematic cases. But also not mess
with places which are optimized fine.
--
Mateusz Guzik <mjguzik gmail.com>