Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks

From: Christophe Leroy
Date: Sun Sep 18 2022 - 04:21:59 EST




Le 17/09/2022 à 20:38, Samuel Holland a écrit :
> On 9/17/22 03:16, Christophe Leroy wrote:
>> Le 16/09/2022 à 07:05, Samuel Holland a écrit :
>>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
>>> to switch away from a task inside copy_{from,to}_user. This left the CPU
>>> with userspace access enabled until after the next IRQ or privilege
>>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
>>> switching back to the original task, the userspace access would fault:
>>
>> This is not supposed to happen. You never switch away from a task
>> magically. Task switch will always happen in an interrupt, that means
>> copy_{from,to}_user() get interrupted.
>
> That makes sense, the interrupt handler is responsible for saving the
> KUAP status. It looks like neither DEFINE_INTERRUPT_HANDLER_RAW nor any
> of its users (performance_monitor_exception(), do_slb_fault()) do that.

As far as I can see, that's done earlier, in exceptions-64s.S.
Look for kuap_save_amr_and_lock.

Now, it may be possible that one of the exceptions pathes misses it.

> Yet they still call one of the interrupt_return variants, which restores
> the status from the stack.
>
>> Whenever an interrupt is taken, kuap_save_amr_and_lock() macro is used
>> to save KUAP status into the stack then lock KUAP access. At interrupt
>> exit, kuap_kernel_restore() macro or function is used to restore KUAP
>> access from the stack. At the time the task switch happens, KUAP access
>> is expected to be locked. During task switch, the stack is switched so
>> the KUAP status is taken back from the new task's stack.
>
> What if another task calls schedule() from kernel process context, and
> the scheduler switches to a task that had been preempted inside
> copy_{from,to}_user()? Then there is no interrupt involved, and I don't
> see where kuap_kernel_restore() would get called.

Yes there is interrupt involved. That task, if it has been preempted
inside copy_from_user(), it must have been through an interrupt, likely
a timer interrupt. So switching back to that task means doing an
interrupt return in the context of that task. That's when KUAP status
should be restored.

>
>> Your fix suggests that there is some path where the KUAP status is not
>> properly saved and/or restored. Did you try running with
>> CONFIG_PPC_KUAP_DEBUG ? It should warn whenever a KUAP access is left
>> unlocked.
>>
>>>
>>> Kernel attempted to write user page (3fff7ab68190) - exploit attempt? (uid: 65536)
>>> ------------[ cut here ]------------
>>> Bug: Write fault blocked by KUAP!
>>> WARNING: CPU: 56 PID: 4939 at arch/powerpc/mm/fault.c:228 ___do_page_fault+0x7b4/0xaa0
>>> CPU: 56 PID: 4939 Comm: git Tainted: G W 5.19.8-00005-gba424747260d #1
>>> NIP: c0000000000555e4 LR: c0000000000555e0 CTR: c00000000079d9d0
>>> REGS: c00000008f507370 TRAP: 0700 Tainted: G W (5.19.8-00005-gba424747260d)
>>> MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 28042222 XER: 20040000
>>> CFAR: c000000000123780 IRQMASK: 3
>>> NIP [c0000000000555e4] ___do_page_fault+0x7b4/0xaa0
>>> LR [c0000000000555e0] ___do_page_fault+0x7b0/0xaa0
>>> Call Trace:
>>> [c00000008f507610] [c0000000000555e0] ___do_page_fault+0x7b0/0xaa0 (unreliable)
>>> [c00000008f5076c0] [c000000000055938] do_page_fault+0x68/0x130
>>> [c00000008f5076f0] [c000000000008914] data_access_common_virt+0x194/0x1f0
>>> --- interrupt: 300 at __copy_tofrom_user_base+0x9c/0x5a4
>>
>> ...
>>
>>>
>>> Fix this by saving and restoring the kernel-side AMR/IAMR values when
>>> switching tasks.
>>
>> As explained above, KUAP access should be locked at that time, so saving
>> and restoring it should not have any effect. If it does, it means
>> something goes wrong somewhere else.
>>
>>>
>>> Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU")
>>> Signed-off-by: Samuel Holland <samuel@xxxxxxxxxxxx>
>>> ---
>>> I have no idea if this is the right change to make, and it could be
>>> optimized, but my system has been stable with this patch for 5 days now.
>>>
>>> Without the patch, I hit the bug every few minutes when my load average
>>> is <1, and I hit it immediately if I try to do a parallel kernel build.
>>
>> Great, then can you make a try with CONFIG_PPC_KUAP_DEBUG ?
>
> Yes, I will try this out in the next few days.
>

Thanks
Christophe