Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks

From: Samuel Holland
Date: Wed Oct 26 2022 - 00:27:48 EST


On 9/21/22 00:17, Christophe Leroy wrote:
> Le 21/09/2022 à 05:33, Samuel Holland a écrit :
>> On 9/19/22 07:37, Michael Ellerman wrote:
>>> Christophe Leroy <christophe.leroy@xxxxxxxxxx> writes:
>>>> Le 16/09/2022 à 07:05, Samuel Holland a écrit :
>>>>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
>>>>> to switch away from a task inside copy_{from,to}_user. This left the CPU
>>>>> with userspace access enabled until after the next IRQ or privilege
>>>>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
>>>>> switching back to the original task, the userspace access would fault:
>>>>
>>>> This is not supposed to happen. You never switch away from a task
>>>> magically. Task switch will always happen in an interrupt, that means
>>>> copy_{from,to}_user() get interrupted.
>>>
>>> Unfortunately this isn't true when CONFIG_PREEMPT=y.
>>>
>>> We can switch away without an interrupt via:
>>> __copy_tofrom_user()
>>> -> __copy_tofrom_user_power7()
>>> -> exit_vmx_usercopy()
>>> -> preempt_enable()
>>> -> __preempt_schedule()
>>> -> preempt_schedule()
>>> -> preempt_schedule_common()
>>> -> __schedule()
>>>
>>> I do some boot tests with CONFIG_PREEMPT=y, but I realise now those are
>>> all on Power8, which is a bit of an oversight on my part.
>>>
>>> And clearly no one else tests it, until now :)
>>>
>>> I think the root of our problem is that our KUAP lock/unlock is at too
>>> high a level, ie. we do it in C around the low-level copy to/from.
>>>
>>> eg:
>>>
>>> static inline unsigned long
>>> raw_copy_to_user(void __user *to, const void *from, unsigned long n)
>>> {
>>> unsigned long ret;
>>>
>>> allow_write_to_user(to, n);
>>> ret = __copy_tofrom_user(to, (__force const void __user *)from, n);
>>> prevent_write_to_user(to, n);
>>> return ret;
>>> }
>>>
>>> There's a reason we did that, which is that we have various different
>>> KUAP methods on different platforms, not a simple instruction like other
>>> arches.
>>>
>>> But that means we have that exit_vmx_usercopy() being called deep in the
>>> guts of __copy_tofrom_user(), with KUAP disabled, and then we call into
>>> the preempt machinery and eventually schedule.
>>>
>>> I don't see an easy way to fix that "properly", it would be a big change
>>> to all platforms to push the KUAP save/restore down into the low level
>>> asm code.
>>>
>>> But I think the patch below does fix it, although it abuses things a
>>> little. Namely it only works because the 64s KUAP code can handle a
>>> double call to prevent, and doesn't need the addresses or size for the
>>> allow.
>>>
>>> Still I think it might be our best option for an easy fix.
>>>
>>> Samuel, can you try this on your system and check it works for you?
>>
>> It looks like your patch works. Thanks for the correct fix!
>
> Instead of the patch from Michael, could you try by replacing
> preempt_enable() by preempt_enable_no_resched() in exit_vmx_usercopy() ?

I finally got a chance to test this, and the simpler fix of using
preempt_enable_no_resched() works as well.

>> I replaced my patch with the one below, and enabled
>> CONFIG_PPC_KUAP_DEBUG=y, and I was able to do several kernel builds
>> without any crashes or splats in dmesg.
>
> Did you try CONFIG_PPC_KUAP_DEBUG without the patch ? Did it detect any
> problem ?

I believe I did at one point, and it did not detect anything.

Regards,
Samuel