Re: [PATCH v5] livepatch: fix race between fork and KLP transition

From: Petr Mladek
Date: Thu Sep 01 2022 - 09:46:04 EST


On Mon 2022-08-08 15:00:19, Rik van Riel wrote:
> The KLP transition code depends on the TIF_PATCH_PENDING flag and
> task->patch_state staying in sync. On a normal (forward)
> transition, TIF_PATCH_PENDING will be set on every task in
> the system, while on a reverse transition (after a failed
> forward one) first TIF_PATCH_PENDING will be cleared from
> every task, followed by it being set on tasks that need to
> be transitioned back to the original code.
>
> However, the fork code copies over the TIF_PATCH_PENDING flag
> from the parent to the child early on, in dup_task_struct and
> setup_thread_stack. Much later, klp_copy_process will set
> child->patch_state to match that of the parent.
>
> By then, the parent's patch_state may have been changed by KLP loading
> or unloading, after TIF_PATCH_PENDING was copied into the child.
>
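A minimal userspace model of this window may help. It is a sketch under stated assumptions, not kernel code: struct task, the UNPATCHED/PATCHED enum and every helper name are hypothetical, a bool stands in for TIF_PATCH_PENDING, and the "in sync" rule (flag set exactly when patch_state differs from the transition target) is a simplification. It replays the pre-fix order of events for a forward transition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's KLP_UNPATCHED / KLP_PATCHED. */
enum patch_state { UNPATCHED, PATCHED };

/* Hypothetical model of the two per-task fields that must agree. */
struct task {
	bool patch_pending;           /* models TIF_PATCH_PENDING */
	enum patch_state patch_state; /* models task->patch_state */
};

/* Simplified invariant: the flag is set iff the task still has to switch. */
bool task_in_sync(const struct task *t, enum patch_state target)
{
	return t->patch_pending == (t->patch_state != target);
}

/*
 * Replays the pre-fix fork timeline (hypothetical helper):
 * 1. dup_task_struct()/setup_thread_stack() copy the thread flags;
 * 2. the transition code switches the parent while the fork is in flight;
 * 3. klp_copy_process() copies patch_state only.
 */
struct task fork_with_race_old(struct task *parent, enum patch_state target)
{
	struct task child;

	child.patch_pending = parent->patch_pending; /* step 1: early copy */

	parent->patch_state = target;                /* step 2: parent switched */
	parent->patch_pending = false;

	child.patch_state = parent->patch_state;     /* step 3: late copy */
	return child;
}

/* Usage: a forward transition (UNPATCHED -> PATCHED) hitting the window. */
void demo_forward_race(void)
{
	struct task parent = { .patch_pending = true, .patch_state = UNPATCHED };
	struct task child = fork_with_race_old(&parent, PATCHED);

	assert(child.patch_state == PATCHED);
	assert(child.patch_pending); /* stale flag left on the child */
	assert(!task_in_sync(&child, PATCHED));
}
```

In this model the child comes out PATCHED but still flagged as pending, which is the kind of leftover TIF_PATCH_PENDING that the WARN_ON_ONCE() in klp_complete_transition() reports.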
> This results in the KLP code occasionally hitting this warning in
> klp_complete_transition:
>
> for_each_process_thread(g, task) {
> 	WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING));
> 	task->patch_state = KLP_UNDEFINED;
> }
>
> Set, or clear, the TIF_PATCH_PENDING flag in the child task
> depending on whether or not it is needed at the time
> klp_copy_process is called, at a point in copy_process where the
> tasklist_lock is held exclusively, preventing races with the KLP
> code.
>
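The idea of the fix can be sketched with the same kind of userspace model. Again this is an illustration under assumptions, not the patch itself: the types and helpers are hypothetical, tasklist_lock is not modeled (the copy step simply assumes the parent's fields are stable while it runs), and in the kernel the flag would be handled with the thread-flag helpers such as test_tsk_thread_flag()/set_tsk_thread_flag()/clear_tsk_thread_flag() rather than a bool. Both a forward transition and the start of a reverse transition are exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's KLP_UNPATCHED / KLP_PATCHED. */
enum patch_state { UNPATCHED, PATCHED };

/* Hypothetical model of the two per-task fields that must agree. */
struct task {
	bool patch_pending;           /* models TIF_PATCH_PENDING */
	enum patch_state patch_state; /* models task->patch_state */
};

/* Simplified invariant: the flag is set iff the task still has to switch. */
bool task_in_sync(const struct task *t, enum patch_state target)
{
	return t->patch_pending == (t->patch_state != target);
}

/*
 * Models the fixed copy step: it runs where the parent's fields are assumed
 * stable, and sets OR clears the child's flag from the parent instead of
 * trusting the copy made early in fork.
 */
void copy_patch_state_fixed(struct task *child, const struct task *parent)
{
	child->patch_pending = parent->patch_pending;
	child->patch_state = parent->patch_state;
}

/* Hypothetical mid-fork event: forward transition switches the parent. */
void parent_switched_to_patched(struct task *p)
{
	p->patch_state = PATCHED;
	p->patch_pending = false;
}

/* Hypothetical mid-fork event: reverse transition flags a PATCHED parent. */
void reverse_transition_started(struct task *p)
{
	p->patch_pending = true;
}

/* Early flag copy, a mid-fork change to the parent, then the fixed copy. */
struct task fork_with_fix(struct task *parent, void (*event)(struct task *))
{
	struct task child = { .patch_pending = parent->patch_pending };

	event(parent);
	copy_patch_state_fixed(&child, parent);
	return child;
}
```

Because the child's flag is recomputed from the parent at the same point as patch_state, whatever happened to the parent during the fork window is reflected in both fields, for forward and reverse transitions alike.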
> The KLP code does have a few places where the state is changed
> without the tasklist_lock held, but those should not cause
> problems: klp_update_patch_state(current) cannot be called
> while the current task is in the middle of fork;
> klp_check_and_switch_task() is called under the pi_lock, which
> prevents rescheduling; and the patch state of idle tasks is
> also manipulated without the lock, but idle tasks do not fork.
>
> This should prevent this warning from triggering again in the
> future, and close the race for both normal and reverse transitions.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Reported-by: Breno Leitao <leitao@xxxxxxxxxx>
> Reviewed-by: Petr Mladek <pmladek@xxxxxxxx>
> Acked-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> Fixes: d83a7cb375ee ("livepatch: change to a per-task consistency model")
> Cc: stable@xxxxxxxxxx

The patch has been pushed to livepatching/livepatching.git,
branch for-6.1/fixes.

Best Regards,
Petr