Re: Yet another KPTI regression with 4.14.x series in a VM

From: Andy Lutomirski
Date: Sat Jan 13 2018 - 15:02:29 EST


On Fri, Jan 12, 2018 at 10:33 PM, Willy Tarreau <w@xxxxxx> wrote:
> On Fri, Jan 12, 2018 at 10:08:20PM -0800, Andy Lutomirski wrote:
>> In fact, it looks like this code is totally bogus and has never been
>> correct at all. Even in:
>>
>> commit 4b1d5ae3b103eda43f9d0f85c355bb6995b03a30
>> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Date: Mon Dec 4 15:07:59 2017 +0100
>>
>> x86/mm: Use/Fix PCID to optimize user/kernel switches
>>
>> We have:
>>
>> .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
>> ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>> mov %cr3, \scratch_reg
>>
>> ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
>>
>> ...
>>
>> .Lwrcr3_\@:
>> /* Flip the PGD and ASID to the user version */
>> orq $(PTI_SWITCH_MASK), \scratch_reg
>> mov \scratch_reg, %cr3
>> .Lend_\@:
>>
>> That's bogus. PTI_SWITCH_MASK is 0x1800, which has PCID = 0x800.
>>
>> This should probably use an alternative to select between 0x1000 and
>> 0x800 depending on X86_FEATURE_PCID or just use an entirely different
>> label for the !PCID case.
>>
>> FWIW, this bit in SAVE_AND_SWITCH_TO_KERNEL_CR3
>>
>> testq $(PTI_SWITCH_MASK), \scratch_reg
>> jz .Ldone_\@
>>
>> is a bit silly, too. It's *correct* (I think), but shouldn't that
>> just be bt $(PTI_SWITCH_PGTABLES_BIT), \scratch_reg, with the obvious
>> caveat that the headers don't actually define PTI_SWITCH_PGTABLES_BIT?
>
> I wondered the same initially when reading this but thought there was
> surely a good reason that I could not understand due to my lack of
> knowledge and stopped wondering. BTW your PTI_SWITCH_PGTABLES_BIT would
> in fact be PAGE_SHIFT :-)

Trying to inventory this stuff scattered all over the place:

#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT
#define PTI_SWITCH_PGTABLES_MASK (1<<PAGE_SHIFT)
# define X86_CR3_PTI_SWITCH_BIT 11
#define PTI_SWITCH_MASK (PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))

Blech. I wouldn't be terribly surprised if I missed a few as well. How about:

PTI_USER_PGTABLE_BIT = PAGE_SHIFT
PTI_USER_PGTABLE_MASK = 1 << PTI_USER_PGTABLE_BIT
PTI_USER_PCID_BIT = 11
PTI_USER_PCID_MASK = 1 << PTI_USER_PCID_BIT
PTI_USER_PGTABLE_AND_PCID_MASK = PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK

This naming would make the apparently buggy code look fishy, as it
should. I will give this a shot some time soon if no one beats me to
it.
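
A minimal sketch of how the proposed names compose, written as plain C rather than as the real kernel headers or entry assembly. PAGE_SHIFT is assumed to be 12 (4 KiB pages), and pti_user_cr3() is a hypothetical helper that exists only for illustration: it picks the mask based on whether PCID is available, which is the behaviour suggested above for the !PCID case.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the user PGD sits one page above the kernel PGD. */
#define PAGE_SHIFT 12

/* Names as proposed in the mail; the values follow from the existing defines. */
#define PTI_USER_PGTABLE_BIT   PAGE_SHIFT
#define PTI_USER_PGTABLE_MASK  (UINT64_C(1) << PTI_USER_PGTABLE_BIT)
#define PTI_USER_PCID_BIT      11
#define PTI_USER_PCID_MASK     (UINT64_C(1) << PTI_USER_PCID_BIT)
#define PTI_USER_PGTABLE_AND_PCID_MASK \
	(PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)

/* The combined mask is the 0x1800 that PTI_SWITCH_MASK evaluates to today. */
static_assert(PTI_USER_PGTABLE_AND_PCID_MASK == 0x1800,
	      "combined mask should equal the current PTI_SWITCH_MASK");

/*
 * Hypothetical helper, not kernel code: compute the user CR3 value from
 * the kernel one. With PCID, flip both the page-table bit and the PCID
 * bit; without PCID, flip only the page-table bit.
 */
static inline uint64_t pti_user_cr3(uint64_t kernel_cr3, int have_pcid)
{
	uint64_t mask = have_pcid ? PTI_USER_PGTABLE_AND_PCID_MASK
				  : PTI_USER_PGTABLE_MASK;

	return kernel_cr3 | mask;
}
```

With these names, an unconditional OR of PTI_USER_PGTABLE_AND_PCID_MASK on a path that is also reached without PCID stands out at the call site, which is the point of the renaming.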