Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware
From: Ingo Molnar
Date: Sat Feb 17 2018 - 05:26:36 EST
* Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
> On 02/16/2018 11:16 AM, David Woodhouse wrote:
> > On Fri, 2018-02-16 at 10:44 -0800, Tim Chen wrote:
> >>
> >> I encountered hang on a machine but not others when using the above
> >> macro. It is probably an alignment thing with ALTERNATIVE as the
> >> problem went away after I made the change below:
> >>
> >> Tim
> >>
> >> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> >> index 8f2ff74..0f65bd2 100644
> >> --- a/arch/x86/include/asm/nospec-branch.h
> >> +++ b/arch/x86/include/asm/nospec-branch.h
> >> @@ -148,6 +148,7 @@ extern char __indirect_thunk_end[];
> >>
> >> #define alternative_msr_write(_msr, _val, _feature) \
> >> asm volatile(ALTERNATIVE("", \
> >> + ".align 16\n\t" \
> >> "movl %[msr], %%ecx\n\t" \
> >> "movl %[val], %%eax\n\t" \
> >> "movl $0, %%edx\n\t" \
> >
> > That's weird. Note that .align in an altinstr section isn't actually
> > going to do what you'd expect; the oldinstr and altinstr sections
> > aren't necessarily aligned the same, so however many NOPs it inserts
> > into the alternative, might be deliberately *misaligning* it in the
> > code that actually gets executed.
> >
> > Are you sure you're not running a kernel where the alternatives code
> > would turn that alternative which *starts* with a NOP, into *all* NOPs?
> >
>
> I rebuilt the kernel without the align, and I'm no longer
> seeing the issue on the machine that had it earlier.
> So let's ignore this for now as I can't reproduce the problem.
>
> It must be some other issue that caused the hang I saw earlier.
Note that PeterZ was struggling with intermittent boot hangs yesterday as well;
the hangs came and went during several (fruitless) bisection attempts. Then at
a certain point the hangs went away altogether.
The symptoms of both his and your hangs are consistent with an
alignment-dependent bug.
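To make David's point about .align concrete, here is a minimal sketch in plain
C (user-space arithmetic only, not kernel code). It assumes the replacement
bytes are assembled at one address and later copied verbatim to the patch site
at a different address. The function names are hypothetical and exist only for
this illustration.

```c
#include <stdint.h>

/*
 * Bytes of padding an assembler would emit for ".align n" when the
 * current location is addr. n must be a power of two.
 */
static unsigned align_padding(uintptr_t addr, unsigned n)
{
    return (unsigned)((n - (addr & (n - 1))) & (n - 1));
}

/*
 * Misalignment (address mod n) of the first real instruction once the
 * replacement, including its padding, has been copied to site_addr.
 * The padding was computed for repl_addr, so it is only "right" when
 * repl_addr and site_addr happen to agree modulo n.
 */
static unsigned patched_misalignment(uintptr_t repl_addr,
                                     uintptr_t site_addr, unsigned n)
{
    unsigned pad = align_padding(repl_addr, n);

    return (unsigned)((site_addr + pad) & (n - 1));
}
```

With made-up addresses: if the replacement is assembled at 0x1005 and the patch
site is at 0x2000, the assembler emits 11 bytes of padding, so the first real
instruction ends up at 0x200b, i.e. misaligned by 11 even though the site
itself was 16-byte aligned. Only when both addresses are equal modulo 16 (for
example a site at 0x2005) does the padding produce an aligned instruction.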
My other guess is that it's perhaps somehow microcode related?
I'm not seeing any hangs myself, on various test systems.
Thanks,
Ingo