Re: [RFC][PATCH 7/7] livepatch,context_tracking: Avoid disturbing NOHZ_FULL tasks

From: Petr Mladek
Date: Thu Sep 23 2021 - 09:14:52 EST


On Wed 2021-09-22 13:05:13, Peter Zijlstra wrote:
> When a task is stuck in NOHZ_FULL usermode, we can simply mark the
> livepatch state complete.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> kernel/livepatch/transition.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -270,13 +270,24 @@ static int klp_check_task(struct task_st
> {
> int ret;
>
> - if (task_curr(task))
> + if (task_curr(task)) {
> + if (context_tracking_state_cpu(task_cpu(task)) == CONTEXT_USER) {
> + /*
> + * If we observe the CPU being in USER context, they
> + * must issue an smp_mb() before doing much kernel
> + * space and as such will observe the patched state,
> + * mark it clean.
> + */
> + goto complete;

IMHO, this is not safe:

CPU0 CPU1

klp_check_task(A)
if (context_tracking_state_cpu(task_cpu(task)) == CONTEXT_USER)
goto complete;

clear_tsk_thread_flag(task, TIF_PATCH_PENDING);

# task switching to kernel space
klp_update_patch_state(A)
if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
//false

# calling kernel code with old task->patch_state

task->patch_state = klp_target_state;

BANG: CPU0 sets task->patch_state when task A is already running
kernel code on CPU1.

> + }
> return -EBUSY;
> + }
>
> ret = klp_check_stack(task, arg);
> if (ret)
> return ret;
>
> +complete:
> clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
> task->patch_state = klp_target_state;

A solution might be to switch ordering and add a barrier here.

> return 0;


The code might look like:

static int klp_check_task(struct task_struct *task, void *arg)
{
int ret;

if (task_curr(task)) {
if (context_tracking_state_cpu(task_cpu(task)) == CONTEXT_USER) {
/*
* Task running in USER mode might get switched
* immediately. They are switched when entering
* kernel code anyway.
*/
goto complete;
}
return -EBUSY;
}

ret = klp_check_stack(task, arg);
if (ret)
return ret;

complete:
WRITE_ONCE(task->patch_state, klp_target_state);
/*
* We switch also tasks running in USER mode here. They must
* see the new state before clearing the pending flag.
* Otherwise, they might enter kernel mode without switching
* the state in klp_update_patch_state().
*/
smp_wmb();
clear_tsk_thread_flag(task, TIF_PATCH_PENDING);

return 0;
}

The only problem is that the corresponding read barrier is not clear.
It will make more sense if it is paired with some read barrier
in the scheduler after handling TIF flags.

But we should be on the safe side because klp_ftrace_handler() always
does read barrier before reading the state. Though, it is done
there from other reasons.

Best Regards,
Petr