Re: [RFC PATCH v1 5/6] powerpc/mm: Add a framework for Kernel Userspace Access Protection

From: Christophe LEROY
Date: Wed Nov 28 2018 - 05:05:54 EST




Le 22/11/2018 Ã 02:25, Russell Currey a ÃcritÂ:
On Wed, 2018-11-21 at 09:32 +0100, Christophe LEROY wrote:

Le 21/11/2018 Ã 03:26, Russell Currey a Ãcrit :
On Wed, 2018-11-07 at 16:56 +0000, Christophe Leroy wrote:
This patch implements a framework for Kernel Userspace Access
Protection.

Then subarches will have to possibility to provide their own
implementation
by providing setup_kuap(), and lock/unlock_user_rd/wr_access

We separate read and write accesses because some subarches like
book3s32 might only support write access protection.

Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxx>

Separating read and writes does have a performance impact, I'm
doing
some benchmarking to find out exactly how much - but at least for
radix
it means we have to do a RMW instead of just a write. It does add
some
amount of security, though.

The other issue I have is that you're just locking everything here
(like I was), and not doing anything different for just reads or
writes. In theory, wouldn't someone assume that they could (for
example) unlock reads, lock writes, then attempt to read? At which
point the read would fail, because the lock actually locks both.

I would think we either need to bundle read/write locking/unlocking
together, or only implement this on platforms that can do one at a
time, unless there's a cleaner way to handle this. Glancing at the
values you use for 8xx, this doesn't seem possible there, and it's
a
definite performance hit for radix.

At the same time, as you say, it would suck for book3s32 that can
only
do writes, but maybe just doing both at the same time and if
implemented for that platform it could just have a warning that it
only
applies to writes on init?

Well, I see your points. My idea was not to separate read and write
on platform that can lock both. I think it is no problem to also
unlocking writes when we are doing a read, so on platforms that can
do
both I think both should do the same..

The idea was to avoid spending time unlocking writes for doing a read
on
platforms on which reads are not locked. And for platforms able to
independently unlock/lock reads and writes, if only unlocking reads
can
improve performance it can be interesting as well.

For book3s/32, locking/unlocking will be done through Kp/Ks bits in
segment registers, the function won't be trivial as it may involve
more
than one segment at a time. So I just wanted to avoid spending time
doing that for reads as reads won't be protected. And may also be
the
case on older book3s/64, may not it ?
On Book3s/32, the page protection bits are as follows:

Key 0 1
PP
00 RW NA
01 RW RO
10 RW RW
11 RO RO

So the idea is to encode user RW with PP01 (instead of PP10 today)
and
user RO with PP11 (as done today), giving Key0 to user and Key1 to
kernel (today both user and kernel have Key1). Then when kernel needs
to
write, we change Ks to Key0 in segment register for the involved
segments.

I'm not sure there is any risk that someone nests unlocks/locks for
reads and unlocks/locks for writes, because the unlocks/locks are
done
in very limited places.

Yeah I don't think it's a risk since the scope is so limited, it just
needs to be clearly documented that locking/unlocking reads/writes
could have the side effect of covering the other. My concern is less
about a problem in practice as much as functions that only don't
exactly do what the function name says.

Another option is to again have a single lock/unlock function that
takes a bitmask (so read/write/both), which due to being a singular
function might be a bit more obvious that it could lock/unlock
everything, but at this point I'm just bikeshedding.

In order to support book3s/32, I needed to add arguments to the unlock/lock functions, as the address is needed to identify the affected segments.

Therefore, I changed it to single functions as you suggested. These functions have 'to', 'from' and 'size' arguments. When it is a read, 'to' is NULL. When it is a write, 'from' is NULL. When it is a copy, none is NULL.

See RFC v2 for the details.

Christophe


Doing it this way should be fine, I'm just cautious that some future
developer might be caught off guard.

Planning on sending my series based on top of yours for radix today.

- Russell


Christophe


Curious for people's thoughts on this.

- Russell