Re: [PATCH v2] powerpc, pkey: make protection key 0 less special
From: Thiago Jung Bauermann
Date: Fri Apr 06 2018 - 16:45:08 EST
Ram Pai <linuxram@xxxxxxxxxx> writes:
> On Wed, Apr 04, 2018 at 06:41:01PM -0300, Thiago Jung Bauermann wrote:
>>
>> Hello Ram,
>>
>> Ram Pai <linuxram@xxxxxxxxxx> writes:
>>
>> > Applications need the ability to associate an address-range with some
>> > key and latter revert to its initial default key. Pkey-0 comes close to
>> > providing this function but falls short, because the current
>> > implementation disallows applications to explicitly associate pkey-0 to
>> > the address range.
>> >
>> > Lets make pkey-0 less special and treat it almost like any other key.
>> > Thus it can be explicitly associated with any address range, and can be
>> > freed. This gives the application more flexibility and power. The
>> > ability to free pkey-0 must be used responsibily, since pkey-0 is
>> > associated with almost all address-range by default.
>> >
>> > Even with this change pkey-0 continues to be slightly more special
>> > from the following point of view.
>> > (a) it is implicitly allocated.
>> > (b) it is the default key assigned to any address-range.
>>
>> It's also special in more ways (and if intentional, these should be part
>> of the commit message as well):
>>
>> (c) it's not possible to change permissions for key 0
>>
>> This has two causes: this patch explicitly forbids it in
>> arch_set_user_pkey_access(), and also because even if it's allocated,
>> the bits for key 0 in AMOR and UAMOR aren't set.
>
> Yes. will have to capture that one aswell.
>
> we cannot let userspace change permissions on key 0 because
> doing so will hurt the kernel too. Unlike x86 where keys are effective
> only in userspace, powerpc keys are effective even in the kernel.
> So if the kernel tries to access something, it will get stuck forever.
> Almost everything in the kernel is associated with key-0.
>
> I ran a small test program which disabled access on key 0 from
> userspace, and as expected ran into softlockups. It certainly
> can lead to denial-of-service-attack. We can let apps
> shoot-itself-in-its-foot but if the shot hurts someone else, we will
> have to stop it.
Ah, I wasn't aware of that. We definitely shouldn't let userspace change
key 0 permissions then. :-) It would be good to have this information in
a comment in the code somewhere, IMHO.
> The key-semantics discussed with the x86 folks did not
> explicitly say anything about changing permissions on key-0. We will
> have to keep that part of the semantics open-ended.
>
>>
>> (d) it can be freed, but can't be allocated again later.
>>
>> This is because mm_pkey_alloc() only calls __arch_activate_pkey(ret)
>> if ret > 0.
>>
>> It looks like (d) is a bug. Either mm_pkey_free() should fail with key
>> 0, or mm_pkey_alloc() should work with it.
>
> Well, it can be allocated, just that we do not let userspace change the
> permissions on the key. __arch_activate_pkey(ret) does not get called
> for pkey-0.
Yes, now I see how the allocated-but-not-enabled state makes sense.
Considering that the kernel always uses key 0, it's unworkable on
powerpc for an app to free pkey 0. In that case I think we should reject
it in mm_pkey_free().
>> (c) could be a measure to prevent users from shooting themselves in
>> their feet. But if that is the case, then mm_pkey_free() should forbid
>> freeing key 0 too.
>>
>> > Tested on powerpc.
>> >
>> > cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> > cc: Dave Hansen <dave.hansen@xxxxxxxxx>
>> > cc: Michael Ellermen <mpe@xxxxxxxxxxxxxx>
>> > cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> > cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> > Signed-off-by: Ram Pai <linuxram@xxxxxxxxxx>
>> > ---
>> > History:
>> > v2: mm_pkey_is_allocated() continued to treat pkey-0 special.
>> > fixed it.
>> >
>> > arch/powerpc/include/asm/pkeys.h | 20 ++++++++++++++++----
>> > arch/powerpc/mm/pkeys.c | 20 ++++++++++++--------
>> > 2 files changed, 28 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
>> > index 0409c80..b598fa9 100644
>> > --- a/arch/powerpc/include/asm/pkeys.h
>> > +++ b/arch/powerpc/include/asm/pkeys.h
>> > @@ -101,10 +101,14 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
>> >
>> > static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
>> > {
>> > - /* A reserved key is never considered as 'explicitly allocated' */
>> > - return ((pkey < arch_max_pkey()) &&
>> > - !__mm_pkey_is_reserved(pkey) &&
>> > - __mm_pkey_is_allocated(mm, pkey));
>> > + if (pkey < 0 || pkey >= arch_max_pkey())
>> > + return false;
>> > +
>> > + /* Reserved keys are never allocated. */
>> > + if (__mm_pkey_is_reserved(pkey))
>> > + return false;
>> > +
>> > + return __mm_pkey_is_allocated(mm, pkey);
>> > }
>> >
>> > extern void __arch_activate_pkey(int pkey);
>> > @@ -200,6 +204,14 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>> > {
>> > if (static_branch_likely(&pkey_disabled))
>> > return -EINVAL;
>> > +
>> > + /*
>> > + * userspace is discouraged from changing permissions of
>> > + * pkey-0.
>>
>> They're not discouraged. They're not allowed to. :-)
>
> ok :-)
>
>>
>> > + * powerpc hardware does not support it anyway.
>>
>> It doesn't? I don't get that impression from reading the ISA, but
>> perhaps I'm missing something.
>
> Good Catch. I am wrongly blaming it on powerpc hardware.
> Its a semantics enforced by our pkey code to block DOS attacks.
I think this would be a good place to explain why we can't allow
userspace to change permissions on key 0.
>
>>
>> > + */
>> > + if (!pkey)
>> > + return init_val ? -EINVAL : 0;
>> > +
>> > return __arch_set_user_pkey_access(tsk, pkey, init_val);
>> > }
>> >
>> > diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
>> > index ba71c54..e7a9e34 100644
>> > --- a/arch/powerpc/mm/pkeys.c
>> > +++ b/arch/powerpc/mm/pkeys.c
>> > @@ -119,19 +119,21 @@ int pkey_initialize(void)
>> > #else
>> > os_reserved = 0;
>> > #endif
>> > - /*
>> > - * Bits are in LE format. NOTE: 1, 0 are reserved.
>> > - * key 0 is the default key, which allows read/write/execute.
>> > - * key 1 is recommended not to be used. PowerISA(3.0) page 1015,
>> > - * programming note.
>> > - */
>> > + /* Bits are in LE format. */
>> > initial_allocation_mask = ~0x0;
>> >
>> > /* register mask is in BE format */
>> > pkey_amr_uamor_mask = ~0x0ul;
>> > pkey_iamr_mask = ~0x0ul;
>> >
>> > - for (i = 2; i < (pkeys_total - os_reserved); i++) {
>> > + for (i = 0; i < (pkeys_total - os_reserved); i++) {
>> > + /*
>>
>> There's a space between the tabs here.
>
> ok. will fix.
>
>>
>> > + * key 1 is recommended not to be used.
>> > + * PowerISA(3.0) page 1015,
>> > + */
>> > + if (i == 1)
>> > + continue;
>> > +
>> > initial_allocation_mask &= ~(0x1 << i);
>> > pkey_amr_uamor_mask &= ~(0x3ul << pkeyshift(i));
>> > pkey_iamr_mask &= ~(0x1ul << pkeyshift(i));
>> > @@ -145,7 +147,9 @@ void pkey_mm_init(struct mm_struct *mm)
>> > {
>> > if (static_branch_likely(&pkey_disabled))
>> > return;
>> > - mm_pkey_allocation_map(mm) = initial_allocation_mask;
>> > +
>> > + /* allocate key-0 by default */
>> > + mm_pkey_allocation_map(mm) = initial_allocation_mask | 0x1;
>> > /* -1 means unallocated or invalid */
>> > mm->context.execute_only_pkey = -1;
>> > }
>>
>> I think we should also set the AMOR and UAMOR bits for key 0. Otherwise,
>> key 0 will be in allocated-but-not-enabled state which is yet another
>> subtle way in which it will be special.
>
> No. as explained above, it will hurt to let userspace modify
> permissions on key-0.
Indeed. Thanks for explaining and even double-checking that it's indeed
the case.
>>
>> Also, pkey_access_permitted() has a special case for key 0. Should it?
>
> we can delete that check. though it does not hurt to leave it in place.
> Access/Write/Execute on pkey-0 is always permitted.
It doesn't hurt, but it's redundant since the is_pkey_enabled() call
right below will have the same effect.
Actually I just changed my mind because I see now that the early exit
prevents the need to read the UAMOR (which is done by
is_pkey_enabled()). Since this function is called by page table code
it's probably worth avoiding the additional SPR access.
--
Thiago Jung Bauermann
IBM Linux Technology Center