Re: [RFC PATCH] x86: prevent gcc from emitting rep movsq/stosq for inlined ops
From: Mateusz Guzik
Date: Wed Apr 02 2025 - 12:31:00 EST
On Wed, Apr 2, 2025 at 6:22 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2 Apr 2025 at 06:42, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> >
> >
> > +ifdef CONFIG_CC_IS_GCC
> > +#
> > +# Inline memcpy and memset handling policy for gcc.
> > +#
> > +# For ops of sizes known at compilation time it quickly resorts to issuing rep
> > +# movsq and stosq. On most uarchs rep-prefixed ops have a significant startup
> > +# latency and it is faster to issue regular stores (even if in loops) to handle
> > +# small buffers.
> > +#
> > +# This of course comes at an expense in terms of i-cache footprint. bloat-o-meter
> > +# reported 0.23% increase for enabling these.
> > +#
> > +# We inline up to 256 bytes, which in the best case issues few movs, in the
> > +# worst case creates a 4 * 8 store loop.
> > +#
> > +# The upper limit was chosen semi-arbitrarily -- uarchs wildly differ between a
> > +# threshold past which a rep-prefixed op becomes faster, 256 being the lowest
> > +# common denominator. Someone(tm) should revisit this from time to time.
> > +#
> > +KBUILD_CFLAGS += -mmemcpy-strategy=unrolled_loop:256:noalign,libcall:-1:noalign
> > +KBUILD_CFLAGS += -mmemset-strategy=unrolled_loop:256:noalign,libcall:-1:noalign
> > +endif
>
> Please make this a gcc bug-report instead - I really don't want to
> have random compiler-specific tuning options in the kernel.
>
> Because that whole memcpy-strategy thing is something that gets tuned
> by a lot of other compiler options (ie -march and different versions).
>
Ok.
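
For anyone reading along, here is a minimal sketch of the kind of code the
quoted patch is about: fixed-size copies and clears whose length is known at
compile time and falls under the 256-byte limit. The struct and function names
are hypothetical and not taken from the kernel. Whether gcc actually emits
rep movsq/stosq here depends on the compiler version, optimization level and
-march/-mtune, so treat the comments as something to check in the
disassembly rather than a guarantee.

```c
#include <string.h>

/* Hypothetical 200-byte object: size known at compile time, below 256. */
struct blob {
	unsigned char data[200];
};

/*
 * Constant-size copy. With default tuning gcc may inline this as
 * rep movsq; with the -mmemcpy-strategy setting from the quoted patch
 * it is expected to use an unrolled loop of regular loads and stores.
 */
void copy_blob(struct blob *dst, const struct blob *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/*
 * Constant-size clear. The same consideration applies to rep stosq
 * and the -mmemset-strategy setting.
 */
void clear_blob(struct blob *b)
{
	memset(b, 0, sizeof(*b));
}
```

Building this with and without the two options from the patch (for example
gcc -O2 -S) and diffing the assembly is one way to see what a given
compiler version does.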
--
Mateusz Guzik <mjguzik gmail.com>