Re: frequent lockups in 3.18rc4

From: Borislav Petkov
Date: Mon Dec 15 2014 - 09:00:13 EST

On Sun, Dec 14, 2014 at 09:47:26PM -0800, Linus Torvalds wrote:
> and "save_xstate_sig+0x81" shows up on all stacks, although only on
> CPU1 does it show up as a "guaranteed" part of the stack chain (ie it
> matches frame pointer data too). CPU1 also has that __clear_user show
> up (which is called from save_xstate_sig), but not other CPU's. CPU2
> and CPU3 have "save_xstate_sig+0x98" in addition to that +0x81 thing.
> My guess is that "save_xstate_sig+0x81" is the instruction after the
> __clear_user call, and that CPU1 took the fault in __clear_user(),
> while CPU2 and CPU3 took the fault at "save_xstate_sig+0x98" instead,
> which I'd guess is the
> xsave64 (%rdi)

Err, maybe a wild guess, but could XSAVE be encountering some problems,
like store ordering violations or somesuch?

Quick search shows

"AZ72. Store Ordering Violation When Using XSAVE"

here which
talks about SSE context stores happening out of order. Now, there are a
lot of IFs: does Dave's machine even have the erratum and, even if it
does, would that erratum cause some sort of a livelock leading to the
kernel lockups, and so on and so on...

It might be worth ruling out, though.
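
A first step towards ruling it out would be to read the CPU signature
(family/model/stepping) off the affected box and compare it by hand
against the list of affected steppings in the relevant specification
update. Below is a minimal sketch of that, assuming an x86 Linux machine
whose /proc/cpuinfo has the usual "cpu family", "model", "stepping" and
"flags" fields. The helper name parse_cpuinfo is made up for
illustration, and the script makes no attempt to decide whether the
erratum applies; it only prints the values to look up.

```python
def parse_cpuinfo(text):
    """Return (family, model, stepping, has_xsave) for the first CPU listed.

    Hypothetical helper: expects text in the x86 /proc/cpuinfo layout.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor's block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    family = int(fields["cpu family"])
    model = int(fields["model"])
    stepping = int(fields["stepping"])
    # "xsave" in the flags line means the kernel saw XSAVE support
    has_xsave = "xsave" in fields.get("flags", "").split()
    return family, model, stepping, has_xsave


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            family, model, stepping, has_xsave = parse_cpuinfo(f.read())
        print(f"family {family}, model {model:#x}, "
              f"stepping {stepping}, xsave={has_xsave}")
    except (OSError, KeyError, ValueError) as exc:
        # Non-x86 or non-Linux systems will not have these fields
        print("could not read x86 CPU signature:", exc)
```

If the signature is not in the erratum's affected list, or the flags
line has no xsave, this theory can be dropped right away.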


