Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

From: Robert Święcki
Date: Fri Feb 26 2016 - 15:00:10 EST


2016-02-26 20:44 GMT+01:00 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>:

>> I've contacted Robert Święcki (who found the microcode problem) in
>> case he wants to weigh in on this thread. He was talking to some AMD
>> people, but I don't know exactly who.
>
> And since it's looking increasingly likely that it's the same issue,
> I'm adding Robert here explicitly to the cc so that he sees the
> thread...

Thx,

Some data I was able to gather:

It happens only with the 0x06000832 ucode, and only on Piledriver-based
CPUs, i.e. newer AMD FX and Opteron x300 series (4300, 6300 etc.).

In ~80% of cases the visible effect is an incorrect RSP, leading to bad
'rets' into kernel data/bss or to stack-protector faults. But there are
also more elusive ones, like registers being cleared before they are
used in indirect memory fetches.

I can trigger it from within a qemu guest (non-root), causing a bad RIP
in the host kernel. When testing, a couple of times (maybe 2) out of
maybe 30 oopses seen, I was able to set it to user-space addresses
mapped in the guest. It depends greatly on timing, but I think that with
some more effort, and by populating the kernel stack with guest
addresses, it'd be possible to create a more reliable qemu-guest to
host ring0 escape.

I CC'd some AMD engineers from this list, and one of them replied with
"We are working on the final testing of a new microcode patch to
replace 0x06000832."
but without specifying any erratum number or an ETA for the new ucode.

For now I can only suggest not using 0x06000832 if possible (i.e. if
it's not embedded in the BIOS). I tested a few versions from
http://www.amd64.org/microcode.html and only this version seemed
vulnerable.
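
To check whether a box is currently running this revision, something
like the following could be used. It is only a minimal sketch: it
assumes a Linux /proc/cpuinfo that exposes the "vendor_id", "cpu family"
and "microcode" fields, and it uses family 21 (15h) as a coarse filter
that also matches non-Piledriver parts. The function names are
illustrative, not taken from any existing tool.

```python
AFFECTED_REVISION = 0x06000832  # microcode patch level named in this thread
AMD_FAMILY_15H = 0x15           # assumption: coarse filter, not Piledriver-only


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict of fields per logical CPU."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def runs_affected_microcode(text):
    """Return True if any logical CPU reports the affected revision."""
    for cpu in parse_cpuinfo(text):
        try:
            family = int(cpu.get("cpu family", ""))
            revision = int(cpu.get("microcode", ""), 16)
        except ValueError:
            continue  # field missing or not numeric, skip this CPU
        if (cpu.get("vendor_id") == "AuthenticAMD"
                and family == AMD_FAMILY_15H
                and revision == AFFECTED_REVISION):
            return True
    return False
```

Feeding it the contents of /proc/cpuinfo, e.g.
runs_affected_microcode(open("/proc/cpuinfo").read()), tells you whether
any core reports 0x06000832. It says nothing about whether that
revision came from the BIOS or was loaded later by the OS.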

PS. There's a KB article on the VMware pages -
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2061211
- which looks very similar to this problem (it affects Opteron 6300,
which is Piledriver-based), and it was "somehow" patched by VMware in
their kernel. It points to AMD erratum #815 -
http://support.amd.com/TechDocs/48063_15h_Mod_00h-0Fh_Rev_Guide.pdf -
but I cannot tell whether it's really the same problem, or whether it
can somehow be worked around on the kernel side.

--
Robert Święcki