Re: [PATCH V3 04/10] x86/pks: Preserve the PKRS MSR on context switch

From: Thomas Gleixner
Date: Fri Dec 18 2020 - 08:58:36 EST


On Thu, Dec 17 2020 at 23:43, Thomas Gleixner wrote:
> The only use case for this in your tree is: kmap() and the possible
> usage of that mapping outside of the thread context which sets it up.
>
> The only hint for doing this at all is:
>
> Some users, such as kmap(), sometimes requires PKS to be global.
>
> 'sometime requires' is really _not_ a technical explanation.
>
> Where is the explanation why kmap() usage 'sometimes' requires this
> global trainwreck in the first place and where is the analysis why this
> can't be solved differently?
>
> Detailed use case analysis please.

A lengthy conversation with Dan and Dave over IRC confirmed what I was
suspecting.

The approach of this whole PKS thing is to make _all_ existing code
magically "work". That means aside from the obvious thread local mappings,
the kmap() part is needed to solve the problem of async handling where
the mapping is handed to some other context which then uses it and
notifies the context which created the mapping when done. That's the
principle which was used to make highmem work a long time ago.

IMO that was a mistake back then. The right thing would have been to
change the code so that it does not rely on a temporary mapping created
by the initiator. Instead let the initiator hand the page over to the
other context which then creates a temporary mapping for fiddling with
it. Water under the bridge...

Glueing PKS on to that kmap() thing is horrible and global PKS is pretty
much the opposite of what PKS wants to achieve. It's disabling
protection systemwide for an unspecified amount of time and for all
contexts.

So instead of trying to make global PKS "work" we really should go and
take a smarter approach.

  1) Many kmap() use cases are strictly thread local and the mapped
     address is never handed to some other context, which means this can
     be replaced with kmap_local() now, which preserves the mapping
     across preemption. PKS just works nicely on top of that.

  2) Modify kmap() so that it marks the to be mapped page as 'globally
     unprotected' instead of doing this global unprotect PKS dance.
     kunmap() undoes that. That obviously needs some thought
     vs. refcounting if there are concurrent users, but that's a
     solvable problem either as part of struct page itself or
     stored in some global hash.
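
To illustrate the refcounting part of 2), here is a minimal userspace
sketch, not kernel code. struct page_model and the three helpers are
hypothetical names made up for this example, and the plain counter
assumes a single thread; a real implementation would need an atomic
counter or a lock, whether the count lives in struct page or in a hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the field needed here. */
struct page_model {
	unsigned int global_unprotect_count;	/* outstanding kmap() users */
};

/* kmap() side: every user takes a reference on the 'unprotected' mark. */
static void mark_globally_unprotected(struct page_model *page)
{
	page->global_unprotect_count++;
}

/* kunmap() side: the mark vanishes when the last user drops out. */
static void clear_globally_unprotected(struct page_model *page)
{
	assert(page->global_unprotect_count > 0);
	page->global_unprotect_count--;
}

/* What a fault handler would ask before granting a pardon. */
static bool page_is_globally_unprotected(const struct page_model *page)
{
	return page != NULL && page->global_unprotect_count > 0;
}
```

With two concurrent users the page stays marked until the second
clear_globally_unprotected() call, which is the property the refcount
has to guarantee.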

  3) Have PKS modes:

     - STRICT:   No pardon

     - RELAXED:  Warn and unprotect temporarily for the current context

     - SILENT:   Like RELAXED, but w/o warning to make sysadmins happy.
                 Default should be RELAXED.

     - OFF:      Disable the whole PKS thing
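
The four modes boil down to a small policy table. A minimal sketch,
assuming an enum along these lines: PKS_MODE_SILENT, PKS_MODE_OFF, the
pks_fault_action enum and pks_mode_to_action() are hypothetical names
chosen for illustration and not part of any existing patch.

```c
#include <assert.h>

enum pks_mode {
	PKS_MODE_STRICT,
	PKS_MODE_RELAXED,	/* proposed default */
	PKS_MODE_SILENT,
	PKS_MODE_OFF,
};

enum pks_fault_action {
	PKS_ACTION_DIE,			/* no pardon */
	PKS_ACTION_WARN_AND_PARDON,	/* warn, then unprotect temporarily */
	PKS_ACTION_PARDON,		/* unprotect temporarily, stay quiet */
};

/* Map the configured mode to the reaction on a PKS violation. */
static enum pks_fault_action pks_mode_to_action(enum pks_mode mode)
{
	/* With PKS switched off nothing is protected, so nothing faults. */
	assert(mode != PKS_MODE_OFF);

	switch (mode) {
	case PKS_MODE_RELAXED:
		return PKS_ACTION_WARN_AND_PARDON;
	case PKS_MODE_SILENT:
		return PKS_ACTION_PARDON;
	default:
		return PKS_ACTION_DIE;
	}
}
```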


  4) Have a smart #PF mechanism which does:

        if (error_code & X86_PF_PK) {
                page = virt_to_page(address);

                /* Only pages which kmap() marked are eligible for a pardon */
                if (!page || !page_is_globally_unprotected(page))
                        goto die;

                if (pks_mode == PKS_MODE_STRICT)
                        goto die;

                /* Warns in RELAXED mode only, SILENT stays quiet */
                WARN_ONCE(pks_mode == PKS_MODE_RELAXED, "Useful info ...");

                temporary_unprotect(page, regs);
                return;
        }

        static void temporary_unprotect(struct page *page, struct pt_regs *regs)
        {
                int key = page_to_key(page);

                /* Return from #PF will establish this for the faulting context */
                extended_state(regs)->pks &= ~PKS_MASK(key);
        }

     This temporary unprotect is undone when the context is left, so
     depending on the context (thread, interrupt, softirq) the
     unprotected section might be way wider than actually needed, but
     that's still orders of magnitude better than having this fully
     unrestricted global PKS mode which is completely scopeless.

     The above is at least restricted to the pages which are in use for
     a particular operation. Stray pointers during that time are
     obviously not caught, but that's not any different from that
     proposed global thingy.

     The warning makes it possible to find the non-obvious places so they
     can be analyzed and worked on.
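
To make the scoping argument of 4) concrete, here is a small userspace
model, not kernel code. It assumes that the PKRS value of an interrupted
context is saved on entry and restored on exit, and that PKRS holds two
bits per key (access disable, write disable). struct context_model,
cpu_pkrs, context_enter(), context_exit() and PKS_DEFAULT_MODEL are
invented for this sketch; PKS_MASK() is assumed to cover both bits of a
key.

```c
#include <assert.h>
#include <stdint.h>

#define PKS_NUM_KEYS		16u
#define PKS_AD			0x1u	/* access disable bit of a key */
#define PKS_WD			0x2u	/* write disable bit of a key */
#define PKS_MASK(key)		((uint32_t)(PKS_AD | PKS_WD) << ((key) * 2u))

/* Assumed default: access disabled for every key except key 0. */
#define PKS_DEFAULT_MODEL	0x55555554u

/* Stand-in for the saved per-context state, i.e. extended_state(regs). */
struct context_model {
	uint32_t pks;
};

/* Models the PKRS value which is live on the CPU. */
static uint32_t cpu_pkrs = PKS_DEFAULT_MODEL;

/* Entering a nested context: save the interrupted value, start protected. */
static struct context_model context_enter(void)
{
	struct context_model saved = { .pks = cpu_pkrs };

	cpu_pkrs = PKS_DEFAULT_MODEL;
	return saved;
}

/* Leaving a context re-establishes whatever was saved for the outer one. */
static void context_exit(const struct context_model *saved)
{
	cpu_pkrs = saved->pks;
}

/* The fault handler edits the saved state of the faulting context. */
static void temporary_unprotect(struct context_model *faulting, unsigned int key)
{
	assert(key < PKS_NUM_KEYS);
	faulting->pks &= ~PKS_MASK(key);
}
```

In this model an interrupt which faults on key 5 runs unprotected for
that key until it returns. The interrupted thread gets its own saved
value back, so the pardon never leaks beyond the faulting context.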

  5) The DAX case which you made "work" with dev_access_enable() and
     dev_access_disable() is yet another lazy approach which avoids
     changing a handful of usage sites.

     The use cases are strictly context local which means the global
     magic is not used at all. Why does it exist in the first place?

     Aside from that this global thing would never work at all because the
     refcounting is per thread and not global.

     So that DAX use case is just a matter of:

        grant/revoke_access(DEV_PKS_KEY, READ/WRITE)

     which is effective for the current execution context and really
     wants to be a distinct READ/WRITE protection and not the magic
     global thing which just has on/off. All usage sites know whether
     they want to read or write.

     That leaves the question about the refcount. AFAICT, nothing nests
     in that use case for a given execution context. I'm surely missing
     something subtle here.
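
A rough userspace sketch of what distinct READ/WRITE grants could look
like on a PKRS-style value, assuming two bits per key (access disable,
write disable). All names below (pks_grant_access(), pks_revoke_access(),
pks_can_read(), pks_can_write(), enum pks_access) are hypothetical and
only model the bit manipulation, not the MSR write or any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKS_NUM_KEYS	16u
#define PKS_AD		0x1u	/* access disable: no read, no write */
#define PKS_WD		0x2u	/* write disable: read only */

enum pks_access { PKS_READ, PKS_WRITE };

static unsigned int pks_shift(unsigned int key)
{
	assert(key < PKS_NUM_KEYS);
	return key * 2u;
}

/* READ clears AD but keeps WD set, WRITE clears both bits. */
static uint32_t pks_grant_access(uint32_t pkrs, unsigned int key,
				 enum pks_access access)
{
	unsigned int shift = pks_shift(key);

	pkrs &= ~((uint32_t)(PKS_AD | PKS_WD) << shift);
	if (access == PKS_READ)
		pkrs |= (uint32_t)PKS_WD << shift;
	return pkrs;
}

/* Revoke sets both bits so neither reads nor writes get through. */
static uint32_t pks_revoke_access(uint32_t pkrs, unsigned int key)
{
	return pkrs | ((uint32_t)(PKS_AD | PKS_WD) << pks_shift(key));
}

static bool pks_can_read(uint32_t pkrs, unsigned int key)
{
	return !(pkrs & ((uint32_t)PKS_AD << pks_shift(key)));
}

static bool pks_can_write(uint32_t pkrs, unsigned int key)
{
	return !(pkrs & ((uint32_t)(PKS_AD | PKS_WD) << pks_shift(key)));
}
```

A usage site which only reads would pair a READ grant with a revoke
around the access, and the value it manipulates would be the one of the
current execution context only.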

Hmm?

Thanks,

tglx