Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

From: Andy Lutomirski
Date: Wed Aug 05 2020 - 15:11:40 EST


On Wed, Aug 5, 2020 at 10:07 AM Ricardo Neri
<ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
> > On Tue, Aug 04, 2020 at 09:58:25PM -0700, hpa@xxxxxxxxx wrote:
> > > Because why use an alternative to jump over one instruction?
> > >
> > > I personally would prefer to have the IRET put out of line
> >
> > Can't yet - SERIALIZE CPUs are a minority at the moment.
> >
> > > and have the call/jmp replaced by SERIALIZE inline.
> >
> > Well, we could do:
> >
> > alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
> >
> > and avoid all kinds of jumping. Alternatives get padded so there
> > would be a couple of NOPs following when SERIALIZE gets patched in
> > but it shouldn't be a problem. I guess one needs to look at what gcc
> > generates...
>
> But the IRET-TO-SELF code has instruction which modify the stack. This
> would violate stack invariance in alternatives as enforced in commit
> 7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
> gives warnings as follows:
>
> arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
> alternative modifies stack
>
> Perhaps in this specific case it does not matter as the changes in the
> stack will be undone by IRET. However, using alternative_io would require
> adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
> IMHO, it wouldn't look good.
>
> So maybe the best approach is to implement as you suggested using
> static_cpu_has()?

I agree. Let's keep it simple.

Honestly, I think the right solution is to have iret_to_self() in
actual asm and invoke it from C as needed. IRET is *slow* -- trying
to optimize it at all is silly. The big optimization was switching
from CPUID to IRET, since CPUID is slooooooooooooooooooow in virtual
environments, whereas IRET is merely sloooooooow and SERIALIZE is
probably just sloooow.

(I once benchmarked it. IIRC the winning version on my laptop is MOV
to CR2 on bare metal and IRET in a Xen PV guest. This optimization
was not obviously worthwhile.)

>
> Thanks and BR,
> Ricardo