Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

From: Linus Torvalds
Date: Fri Feb 26 2016 - 13:05:31 EST


On Fri, Feb 26, 2016 at 9:52 AM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>
> So more analysis would seem to confirm that RSP has been bumped +8
> while in ttwu_stat() so when the epilog executed, register restore
> was off by 1 qword. However, there's nothing in ttwu_stat() that
> results in stack pointer offset by +1 qword from prolog.

I agree.

That's why I'm actually starting to suspect that it's an AMD microcode
bug that we know very little about. There's apparently register
corruption (the guess being from NMI handling, but virtualization was
also involved) under some circumstances.

Of course, if Jiri isn't actually running this on an AMD CPU, that
theory flies right out the window. But we do have a reported oops on
the security list that looks totally different in the big picture, but
shares the exact same "corrupted stack pointer register state
resulting in crazy instruction pointer, resulting in NX fault"
behavior in the end.

In the other case, microcode patchlevel 0x0600081c was fine, and
0x06000832 is the one exhibiting the corruption problem.
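
For anyone who wants to check which patchlevel a given box is running, here is a minimal sketch. It is not part of the original report. It assumes an x86 Linux system whose /proc/cpuinfo carries a "microcode" field per CPU, and the function names are made up for illustration. Treating exactly these two values as "fine" and "suspect" is likewise only an assumption drawn from the one case mentioned above.

```python
import re

# Patchlevels quoted in this thread. Using them as a good/suspect marker
# is an assumption for illustration, not a confirmed erratum list.
KNOWN_GOOD = 0x0600081C
SUSPECT = 0x06000832


def microcode_levels(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo-style text."""
    levels = set()
    for line in cpuinfo_text.splitlines():
        # x86 kernels print one line per CPU, e.g. "microcode : 0x6000832"
        m = re.match(r"microcode\s*:\s*(0x[0-9a-fA-F]+)", line)
        if m:
            levels.add(int(m.group(1), 16))
    return levels


def has_suspect_microcode(cpuinfo_text):
    """True if any CPU reports the patchlevel that showed the corruption."""
    return SUSPECT in microcode_levels(cpuinfo_text)
```

Passing the contents of /proc/cpuinfo to has_suspect_microcode() would say whether any CPU reports 0x06000832. A match only means the machine shares the patchlevel from the other report, not that it is hitting the same bug.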

I've contacted Robert Święcki (who found the microcode problem) in
case he wants to weigh in on this thread. He was talking to some AMD
people, but I don't know exactly who.

Linus