Re: [PATCH] riscv: Define TASK_SIZE_MAX for __access_ok()
From: Samuel Holland
Date: Mon Mar 18 2024 - 17:29:44 EST
Hi Alex,
On 2024-03-18 3:50 PM, Alexandre Ghiti wrote:
> On Wed, Mar 13, 2024 at 7:00 PM Samuel Holland
> <samuel.holland@xxxxxxxxxx> wrote:
>>
>> TASK_SIZE_MAX should be set to the largest userspace address under any
>> runtime configuration. This optimizes the check in __access_ok(), which
>> no longer needs to compute the current value of TASK_SIZE. It is still
>> safe because addresses between TASK_SIZE and TASK_SIZE_MAX are invalid
>> at the hardware level.
>>
>> This removes about half of the references to pgtable_l[45]_enabled.
>>
>> Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx>
>> ---
>>
>> arch/riscv/include/asm/pgtable-64.h | 1 +
>> arch/riscv/include/asm/pgtable.h | 1 +
>> 2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
>> index b99bd66107a6..a677ef3c0fe2 100644
>> --- a/arch/riscv/include/asm/pgtable-64.h
>> +++ b/arch/riscv/include/asm/pgtable-64.h
>> @@ -17,6 +17,7 @@ extern bool pgtable_l5_enabled;
>> #define PGDIR_SHIFT_L4 39
>> #define PGDIR_SHIFT_L5 48
>> #define PGDIR_SIZE_L3 (_AC(1, UL) << PGDIR_SHIFT_L3)
>> +#define PGDIR_SIZE_L5 (_AC(1, UL) << PGDIR_SHIFT_L5)
>>
>> #define PGDIR_SHIFT (pgtable_l5_enabled ? PGDIR_SHIFT_L5 : \
>> (pgtable_l4_enabled ? PGDIR_SHIFT_L4 : PGDIR_SHIFT_L3))
>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>> index 6066822e7396..2032f8ac5fc5 100644
>> --- a/arch/riscv/include/asm/pgtable.h
>> +++ b/arch/riscv/include/asm/pgtable.h
>> @@ -867,6 +867,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>> #ifdef CONFIG_64BIT
>> #define TASK_SIZE_64 (PGDIR_SIZE * PTRS_PER_PGD / 2)
>> #define TASK_SIZE_MIN (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
>> +#define TASK_SIZE_MAX (PGDIR_SIZE_L5 * PTRS_PER_PGD / 2)
>>
>> #ifdef CONFIG_COMPAT
>> #define TASK_SIZE_32 (_AC(0x80000000, UL))
>> --
>> 2.43.1
>>
>
> I think you also need to change the check in handle_page_fault() by
> using TASK_SIZE_MAX instead of TASK_SIZE, otherwise the fixup can't
> happen (https://elixir.bootlin.com/linux/latest/source/arch/riscv/mm/fault.c#L273).
It is not necessary to change that check in fault.c unless we expect to handle
exceptions (outside of userspace access routines) for addresses between
TASK_SIZE and TASK_SIZE_MAX. It looks like the call to fixup_exception() [added
in 416721ff05fd ("riscv, mm: Perform BPF exhandler fixup on page fault")] is
only intended to catch null pointer dereferences. So making the change wouldn't
have any functional impact, but it would still be a valid optimization.
> Or I was wondering if it would not be better to do like x86 and use an
> alternative, it would be more correct (even though I believe your
> solution works)
> https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/page_64.h#L82.
What would be the benefit of using an alternative? Any access to an address
between TASK_SIZE and TASK_SIZE_MAX is guaranteed to generate a page fault, so
the only benefit I see is returning -EFAULT slightly faster at the cost of
applying a few hundred alternatives at boot. But it's possible I'm missing
something.
Regards,
Samuel