Re: [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

From: Thomas Gleixner
Date: Fri Dec 18 2020 - 16:31:00 EST


On Fri, Dec 18 2020 at 11:42, Ira Weiny wrote:
> On Fri, Dec 18, 2020 at 02:57:51PM +0100, Thomas Gleixner wrote:
>> 2) Modify kmap() so that it marks the to-be-mapped page as 'globally
>> unprotected' instead of doing this global unprotect PKS dance.
>> kunmap() undoes that. That obviously needs some thought
>> vs. refcounting if there are concurrent users, but that's a
>> solvable problem either as part of struct page itself or
>> stored in some global hash.
>
> How would this globally unprotected flag work? I suppose if kmap created a new
> PTE we could make that PTE non-PKS protected then we don't have to fiddle with
> the register... I think I like that idea.

No. Look at the highmem implementation of kmap(). It's a terrible idea,
really. Don't even think about that.

There is _no_ global flag. The point is that the kmap is strictly bound
to a particular struct page. So you can simply do:

   kmap(page)
       if (page_is_access_protected(page))
           atomic_inc(&page->unprotect);

   kunmap(page)
       if (page_is_access_protected(page))
           atomic_dec(&page->unprotect);

and in the #PF handler:

   if (!atomic_read(&page->unprotect))
       goto die;

The reason why I said: either in struct page itself or in a global hash
is that struct page is already packed and people are not really happy
about increasing its size. But the principle is roughly the same.
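
For illustration only, the counting scheme above can be modelled in plain
userspace C. This is a sketch under assumptions, not kernel code: struct
page_model and the model_*() helpers are hypothetical stand-ins, and C11
atomics take the place of atomic_t.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the real kernel layout. */
struct page_model {
    bool access_protected;  /* page is covered by a protection key */
    atomic_int unprotect;   /* number of outstanding kmap() users */
};

/* kmap(): a protected page gains one user that may touch it from anywhere. */
void model_kmap(struct page_model *page)
{
    if (page->access_protected)
        atomic_fetch_add(&page->unprotect, 1);
}

/* kunmap(): drop that user again. */
void model_kunmap(struct page_model *page)
{
    if (page->access_protected)
        atomic_fetch_sub(&page->unprotect, 1);
}

/* #PF decision: true means the access may proceed, false means "goto die". */
bool model_fault_allowed(struct page_model *page)
{
    return atomic_load(&page->unprotect) != 0;
}
```

With two concurrent users, the page stays accessible until the second
model_kunmap() drops the count back to zero.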

>>
>> 4) Have a smart #PF mechanism which does:
>>
>>     if (error_code & X86_PF_PK) {
>>         page = virt_to_page(address);
>>
>>         if (!page || !page_is_globaly_unprotected(page))
>>             goto die;
>>
>>         if (pks_mode == PKS_MODE_STRICT)
>>             goto die;
>>
>>         WARN_ONCE(pks_mode == PKS_MODE_RELAXED, "Useful info ...");
>>
>>         temporary_unprotect(page, regs);
>>         return;
>>     }
>
> I feel like this is very similar to what I had in the global patch you found in
> my git tree with the exception of the RELAXED mode. I simply had globally
> unprotected or die.

Your stuff depends on that global_pks_state, which is not maintainable,
especially not the teardown side. This depends on per page state, which
is clearly way simpler and more focussed.
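
As a rough userspace sketch of the #PF decision quoted above, reduced to a
pure function over per page state: all names are hypothetical, and the third
mode (MODEL_PKS_SILENT) is an assumption standing in for "neither STRICT nor
RELAXED", which the pseudo code implies but does not name.

```c
#include <stdbool.h>

/* Hypothetical modes; only STRICT and RELAXED are named in the pseudo code. */
enum model_pks_mode {
    MODEL_PKS_STRICT,
    MODEL_PKS_RELAXED,
    MODEL_PKS_SILENT,   /* assumption: a mode that neither dies nor warns */
};

enum model_fault_action {
    MODEL_DIE,              /* goto die */
    MODEL_UNPROTECT_WARN,   /* WARN_ONCE(), then temporary unprotect */
    MODEL_UNPROTECT,        /* temporary unprotect without a warning */
};

/* Decide what a protection key fault does, based on per page state only. */
enum model_fault_action model_pk_fault(bool have_page, bool page_unprotected,
                                       enum model_pks_mode mode)
{
    if (!have_page || !page_unprotected)
        return MODEL_DIE;

    if (mode == MODEL_PKS_STRICT)
        return MODEL_DIE;

    return mode == MODEL_PKS_RELAXED ? MODEL_UNPROTECT_WARN : MODEL_UNPROTECT;
}
```

Nothing in that decision needs global state, so there is no global teardown
to get wrong.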

> Regardless I think unprotecting a global context is the easy part. The code
> you had a problem with (and I see is fully broken) was the restriction of
> access. A failure to update in that direction would only result in a wider
> window of access. I contemplated not doing a global update at all and just
> leave the access open until the next context switch. But the code as it stands
> tries to force an update for a couple of reasons:
>
> 1) kmap_local_page() removes most of the need for global pks. So I was
> thinking that global PKS could be a slow path.
>
> 2) kmap()'s that are handed to other contexts are likely to be 'long term'
> and should not need to be updated 'too' often. I will admit that I don't
> know how often 'too often' is.

Even once in a while is not a justification for stopping the world for N
milliseconds.

>>     temporary_unprotect(page, regs)
>>     {
>>         key = page_to_key(page);
>>
>>         /* Return from #PF will establish this for the faulting context */
>>         extended_state(regs)->pks &= ~PKS_MASK(key);
>>     }
>>
>> This temporary unprotect is undone when the context is left, so
>> depending on the context (thread, interrupt, softirq) the
>> unprotected section might be way wider than actually needed, but
>> that's still orders of magnitude better than having this fully
>> unrestricted global PKS mode which is completely scopeless.
>
> I'm not sure I follow you. How would we know when the context is
> left?

The context goes away on its own. Either context switch or return from
interrupt. As I said there is an extended window where the external
context still might have unprotected access even if the initiating
context has called kunmap() already. It's not pretty, but it's not the
end of the world either.
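
A minimal userspace model of why nothing has to undo the temporary
unprotect explicitly. It assumes two protection bits per key in a 32-bit
value and one saved value per context; struct context_model, MODEL_PKS_MASK()
and the model_*() helpers are hypothetical names, not the kernel's.

```c
#include <stdint.h>

/* Assumption: two protection bits per key, packed into a 32-bit value. */
#define MODEL_PKS_BITS_PER_KEY 2
#define MODEL_PKS_MASK(key) \
    ((uint32_t)0x3 << ((key) * MODEL_PKS_BITS_PER_KEY))

/* Hypothetical per-context saved state; each context owns its own copy. */
struct context_model {
    uint32_t pks;
};

/* Clear the protection bits for one key in the faulting context only. */
void model_temporary_unprotect(struct context_model *faulting, unsigned int key)
{
    faulting->pks &= ~MODEL_PKS_MASK(key);
}

/*
 * Context switch or return from interrupt: the value that becomes active
 * is whatever the next context had saved, so the relaxed value of the
 * previous context simply stops being in effect.
 */
uint32_t model_switch_to(const struct context_model *next)
{
    return next->pks;
}
```

The relaxed bits live only in the saved state of the context which faulted,
so they go away with that context and every other context keeps its own
restrictions.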

That's why I suggested to have that WARN_ONCE() so we can actually see
why and where that happens and think about solutions to make this go
into local context, e.g. by changing the vaddr pointer to a struct page
pointer for these particular use cases and then the other context can do
kmap/unmap_local().

>> 5) The DAX case which you made "work" with dev_access_enable() and
>> dev_access_disable(), i.e. with yet another lazy approach of
>> avoiding to change a handful of usage sites.
>>
>> The use cases are strictly context local which means the global
>> magic is not used at all. Why does it exist in the first place?
>
> I'm not following. What is 'it'?

That global argument to dev_access_enable()/disable().

>> That leaves the question about the refcount. AFAICT, nothing nests
>> in that use case for a given execution context. I'm surely missing
>> something subtle here.
>
> The refcount is needed for non-global pks as well as global. I've not resolved
> if anything needs to be done with the refcount on the global update since the
> following is legal.
>
> kmap()
> kmap_local_page()
> kunmap()
> kunmap_local()
>
> Which would be a problem. But I don't think it is ever actually done.

If it does not exist why would we support it in the first place? We can
have some warning there to catch that case.

> Another problem would be if the kmap and kunmap happened in different
> contexts... :-/ I don't think that is done either but I don't know for
> certain.
>
> Frankly, my main focus before any of this global support has been to
> get rid of as many kmaps as possible.[1] Once that is done I think
> more of these questions can be answered better.

I was expecting that you could answer these questions :)

Thanks,

tglx