Re: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions

From: Thomas Gleixner
Date: Thu Jul 23 2020 - 16:15:22 EST


Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:

> Ira Weiny <ira.weiny@xxxxxxxxx> writes:
>> On Fri, Jul 17, 2020 at 12:06:10PM +0200, Peter Zijlstra wrote:
>>> On Fri, Jul 17, 2020 at 12:20:56AM -0700, ira.weiny@xxxxxxxxx wrote:
>> I've been really digging into this today and I'm very concerned that I'm
>> completely missing something WRT idtentry_enter() and idtentry_exit().
>>
>> I've instrumented idt_{save,restore}_pkrs(), and __dev_access_{en,dis}able()
>> with trace_printk()'s.
>>
>> With this debug code, I have found an instance where it seems like
>> idtentry_enter() is called without a corresponding idtentry_exit(). This has
>> left the thread ref counter at 0 which results in very bad things happening
>> when __dev_access_disable() is called and the ref count goes negative.
>>
>> Effectively this seems to be happening:
>>
>> ...
>> // ref == 0
>> dev_access_enable() // ref += 1 ==> disable protection
>> // exception (which one I don't know)
>> idtentry_enter()
>> // ref = 0
>> _handler() // or whatever code...
>> // *_exit() not called [at least there is no trace_printk() output]...
>> // Regardless of trace output, the ref is left at 0
>> dev_access_disable() // ref -= 1 ==> -1 ==> does not enable protection
>> (Bad stuff is bound to happen now...)
>
> Well, if any exception which calls idtentry_enter() would return without
> going through idtentry_exit() then lots of bad stuff would happen even
> without your patches.
>
>> Also is there any chance that the process could be getting scheduled and that
>> is causing an issue?
>
> Only from #PF, but after the fault has been resolved and the task is
> scheduled in again then the task returns through idtentry_exit() to the
> place where it took the fault. That's not guaranteed to be on the same
> CPU. If schedule is not aware of the fact that the exception turned off
> stuff then you surely get into trouble. So you really want to store it
> in the task itself then the context switch code can actually see the
> state and act accordingly.

Actually that's nasty as well, as you need a stack of PKRS values to
handle nested exceptions. But it might still be the most reasonable
thing to do. 7 PKRS values plus an index should be really sufficient;
that's 32 bytes total, not that bad.
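
To make the size estimate concrete, here is a minimal sketch of such a
per-task stack. It is illustration only: struct pkrs_stack, pkrs_push()
and pkrs_pop() are hypothetical names, not existing kernel code, and the
actual MSR read/write on exception entry and exit is left out. It
assumes PKRS values are stored as 32-bit quantities.

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>
#include <stdint.h>

#define PKRS_STACK_DEPTH 7	/* nesting depth suggested above */

/* Hypothetical per-task state, so the context switch code can see it. */
struct pkrs_stack {
	uint32_t saved[PKRS_STACK_DEPTH];	/* values saved on exception entry */
	uint32_t idx;				/* number of valid entries in saved[] */
};

/* 7 * 4 bytes of values + 4 bytes of index = 32 bytes. */
static_assert(sizeof(struct pkrs_stack) == 32, "expected 32 bytes");

/* Exception entry: save the interrupted context's PKRS value.
 * Returns false if nesting exceeds PKRS_STACK_DEPTH. */
static bool pkrs_push(struct pkrs_stack *s, uint32_t cur)
{
	if (s->idx >= PKRS_STACK_DEPTH)
		return false;
	s->saved[s->idx++] = cur;
	return true;
}

/* Exception exit: fetch the value to restore.
 * Returns false if there is no matching entry. */
static bool pkrs_pop(struct pkrs_stack *s, uint32_t *val)
{
	if (s->idx == 0)
		return false;
	*val = s->saved[--s->idx];
	return true;
}
```

A nested exception pushes a second value and pops it before the outer
one returns, so the values come back in reverse order of entry. An
unbalanced pop is reported to the caller instead of silently
underflowing.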

Thanks,

tglx