Re: [PATCH 1/2] x86/purgatory: add -mno-sse, -mno-mmx, -mno-sse2 to Makefile

From: Nick Desaulniers
Date: Mon Jul 22 2019 - 17:12:22 EST


On Fri, Jul 19, 2019 at 1:17 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jul 18, 2019 at 02:34:44PM -0700, Nick Desaulniers wrote:
> > On Wed, Jul 17, 2019 at 5:02 PM Vaibhav Rustagi
> > <vaibhavrustagi@xxxxxxxxxx> wrote:
> > >
> > > Compiling the purgatory code with clang results in the use of xmm
> > > registers.
> > >
> > > $ objdump -d arch/x86/purgatory/purgatory.ro | grep xmm
> > >
> > > 112: 0f 28 00 movaps (%rax),%xmm0
> > > 115: 0f 11 07 movups %xmm0,(%rdi)
> > > 122: 0f 28 00 movaps (%rax),%xmm0
> > > 125: 0f 11 47 10 movups %xmm0,0x10(%rdi)
> > >
> > > Add -mno-sse, -mno-mmx, -mno-sse2 to avoid generating SSE instructions.
> > >
> > > Signed-off-by: Vaibhav Rustagi <vaibhavrustagi@xxxxxxxxxx>
> > > ---
> > > arch/x86/purgatory/Makefile | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > > index 3cf302b26332..3589ec4a28c7 100644
> > > --- a/arch/x86/purgatory/Makefile
> > > +++ b/arch/x86/purgatory/Makefile
> > > @@ -20,6 +20,7 @@ KCOV_INSTRUMENT := n
> > > # sure how to relocate those. Like kexec-tools, use custom flags.
> > >
> > > KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes -fno-zero-initialized-in-bss -fno-builtin -ffreestanding -c -Os -mcmodel=large
> > > +KBUILD_CFLAGS += -mno-mmx -mno-sse -mno-sse2
> >
> > Yep, this is a commonly recurring bug in the kernel, observed again
> > and again for Clang builds. The top level Makefile carefully sets
> > KBUILD_CFLAGS, then lower subdirs in the kernel wipe them away with
> > `:=` assignment. Inevitably, some important flags don't get re-added.
> > In this case, these flags are set in arch/x86/Makefile but not here,
> > and IMO they should be. Thanks for the patch.
>
> Should we then not fix/remove these := assignments?

Good point; it's actually pretty straightforward to do so. It just
inverts the order of the patches in the series, because once the `:=`
assignment is gone, the memcpy/memset infinite recursion is guaranteed
with CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y unless the other patch in
this series is applied first. Do the x86 maintainers have a preferred
implementation of memset/memcpy for me to use, per the thread on the
other patch in this series? I'll resend with this fix, and we can
discuss there and spin a v3 if needed.
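
To illustrate the difference between resetting and inheriting flags,
here is a minimal standalone GNU make sketch. It is not the purgatory
Makefile: the flag values and the RESET_FLAGS/KEPT_FLAGS names are made
up for illustration, and -pg is only a stand-in for a flag a subdir
might want to drop.

```makefile
# Standalone illustration only; run with `make -f <this file>`.
# Stand-in for the flags a top-level Makefile would have set up.
KBUILD_CFLAGS := -O2 -mno-sse -mno-mmx -pg

# Approach 1: reset with ':='. Nothing inherited survives, so -mno-sse
# and -mno-mmx are lost unless someone remembers to re-add them.
RESET_FLAGS := -Os -ffreestanding

# Approach 2: keep the inherited flags, filter out only the unwanted
# ones, then append the subdir-specific additions.
KEPT_FLAGS := $(filter-out -pg,$(KBUILD_CFLAGS)) -ffreestanding

# Print both results while the Makefile is being parsed.
$(info reset: $(RESET_FLAGS))
$(info kept:  $(KEPT_FLAGS))

# Empty default target so make has something to "build".
all: ;
```

With the second approach, anything added at the top level later is
picked up automatically, and only the flags known to be a problem for
the subdir need to be listed.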

--
Thanks,
~Nick Desaulniers