Re: [PATCH] kfence: Avoid stalling work queue task without allocations
From: Dmitry Vyukov
Date: Tue Nov 10 2020 - 09:25:29 EST
On Tue, Nov 10, 2020 at 2:53 PM Marco Elver <elver@xxxxxxxxxx> wrote:
>
> To toggle the allocation gates, we set up a delayed work that calls
> toggle_allocation_gate(). Here we use wait_event() to await an
> allocation and subsequently disable the static branch again. However, if
> the kernel has stopped doing allocations entirely, we'd wait
> indefinitely, and stall the worker task. This may also result in the
> appropriate warnings if CONFIG_DETECT_HUNG_TASK=y.
>
> Therefore, introduce a 1 second timeout and use wait_event_timeout(). If
> the timeout is reached, the static branch is disabled and a new delayed
> work is scheduled to try setting up an allocation at a later time.
>
> Note that this scenario is very unlikely during normal workloads once
> the kernel has booted and user space tasks are running. It can, however,
> happen during early boot after KFENCE has been enabled, when e.g.
> running tests that do not result in any allocations.
>
> Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@xxxxxxxxxxxxxx
> Reported-by: Anders Roxell <anders.roxell@xxxxxxxxxx>
> Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> ---
> mm/kfence/core.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/kfence/core.c b/mm/kfence/core.c
> index 9358f42a9a9e..933b197b8634 100644
> --- a/mm/kfence/core.c
> +++ b/mm/kfence/core.c
> @@ -592,7 +592,11 @@ static void toggle_allocation_gate(struct work_struct *work)
>  	/* Enable static key, and await allocation to happen. */
>  	atomic_set(&allocation_gate, 0);
>  	static_branch_enable(&kfence_allocation_key);
> -	wait_event(allocation_wait, atomic_read(&allocation_gate) != 0);
> +	/*
> +	 * Await an allocation. Timeout after 1 second, in case the kernel stops
> +	 * doing allocations, to avoid stalling this worker task for too long.
> +	 */
> +	wait_event_timeout(allocation_wait, atomic_read(&allocation_gate) != 0, HZ);

I wonder what happens if we get an allocation right when the timeout fires.
Consider: another task has already gone to the slow path and is about to
wake this task. This task wakes on the timeout and subsequently enables the
static branch again. Now we can have 2 tasks on the slow path that will
both wake this task. How will that be handled? Can it lead to warnings
or other problems?
>  	/* Disable static key and reset timer. */
>  	static_branch_disable(&kfence_allocation_key);
> --
> 2.29.2.222.g5d2a92d10f8-goog