Re: [PATCH v1] landlock/tsync: fix null-ptr-deref in cancel_tsync_works()
From: Günther Noack
Date: Fri Mar 06 2026 - 08:45:27 EST
On Fri, Mar 06, 2026 at 05:22:13PM +0800, Jiayuan Chen wrote:
> cancel_tsync_works() iterates over works->works[0..size-1] and calls
> task_work_cancel() on each entry. task_work_cancel() leads to
> task_work_pending(), which dereferences task->task_works. If
> works->works[i]->task is NULL, this causes a null-ptr-deref:
>
> KASAN: null-ptr-deref in range [0x00000000000009a0-0x00000000000009a7]
> RIP: 0010:task_work_pending include/linux/task_work.h:26 [inline]
> RIP: 0010:task_work_cancel_match+0x86/0x250 kernel/task_work.c:124
> RSP: 0018:ffffc90003597ba0 EFLAGS: 00010202
> RAX: 0000000000000134 RBX: 0000000000000000 RCX: ffffc900106b1000
> RDX: 0000000000080000 RSI: ffffffff81d13236 RDI: 0000000000000000
> RBP: 1ffff920006b2f77 R08: 0000000000000007 R09: 0000000000000000
> R10: 0000000000000002 R11: 0000000000000000 R12: ffffffff81d12dd0
> R13: ffff88802c045100 R14: dffffc0000000000 R15: 00000000000009a0
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000110c3ea90c CR3: 0000000037f63000 CR4: 00000000003526f0
> DR0: 0000000000000003 DR1: 00000000000001f8 DR2: 000000000000008e
> DR3: 000000000000057a DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> task_work_cancel+0x23/0x30 kernel/task_work.c:187
> cancel_tsync_works security/landlock/tsync.c:415 [inline]
> landlock_restrict_sibling_threads+0xafe/0x1280 security/landlock/tsync.c:533
> __do_sys_landlock_restrict_self+0x5c9/0x9e0 security/landlock/syscalls.c:574
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f859b39c629
> RSP: 002b:00007f85991b2028 EFLAGS: 00000246 ORIG_RAX: 00000000000001be
> RAX: ffffffffffffffda RBX: 00007f859b616270 RCX: 00007f859b39c629
> RDX: 0000000000000000 RSI: 000000000000000a RDI: 0000000000000003
> RBP: 00007f859b432b39 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f859b616308 R14: 00007f859b616270 R15: 00007ffcff084488
>
> The root cause is a race in schedule_task_work(). tsync_works_provide()
> increments works->size and stores the task reference in ctx->task *before*
> task_work_add() is called. A thread can race to call do_exit() in the
> window between the PF_EXITING check and task_work_add(), causing
> task_work_add() to return -ESRCH. The error path then drops the task
> reference and sets ctx->task = NULL, but works->size remains incremented.
> A subsequent call to cancel_tsync_works() iterates up to the stale size
> and passes the NULL task pointer to task_work_cancel().
>
> Fix this by decrementing works->size in the task_work_add() error path,
> so the failed slot is rolled back and cancel_tsync_works() never iterates
> over it. The slot is naturally reused in subsequent iterations since
> tsync_works_provide() always picks works->works[works->size].
>
> As a defensive measure, also add a WARN_ONCE() guard in cancel_tsync_works()
> to catch any future NULL task pointer before dereferencing it.
>
> Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> Reported-by: syzbot+911d99dc200feac03ea6@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=911d99dc200feac03ea6
> Signed-off-by: Jiayuan Chen <jiayuan.chen@xxxxxxxxx>
> ---
> security/landlock/tsync.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index 0d2b9c646030..e6d742529484 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -381,14 +381,14 @@ static bool schedule_task_work(struct tsync_works *works,
> err = task_work_add(thread, &ctx->work, TWA_SIGNAL);
> if (err) {
> /*
> - * task_work_add() only fails if the task is about to exit. We
> - * checked that earlier, but it can happen as a race. Resume
> - * without setting an error, as the task is probably gone in the
> - * next loop iteration. For consistency, remove the task from ctx
> - * so that it does not look like we handed it a task_work.
> + * task_work_add() only fails if the task is about to exit.
> + * We checked PF_EXITING earlier, but the thread can race to
> + * exit between that check and task_work_add(). Roll back the
> + * slot so cancel_tsync_works() never sees a NULL task pointer.
> */
> put_task_struct(ctx->task);
> ctx->task = NULL;
> + works->size--;
>
> atomic_dec(&shared_ctx->num_preparing);
> atomic_dec(&shared_ctx->num_unfinished);
> @@ -412,6 +412,11 @@ static void cancel_tsync_works(struct tsync_works *works,
> int i;
>
> for (i = 0; i < works->size; i++) {
> + if (WARN_ONCE(!works->works[i]->task,
> + "landlock: unexpected NULL task in tsync slot %d\n",
> + i))
> + continue;
> +
> if (!task_work_cancel(works->works[i]->task,
> &works->works[i]->work))
> continue;
> --
> 2.43.0
>
Thanks for the patch!
This bug is already fixed on Mickaël's "next" branch.
The code review has happened in
https://lore.kernel.org/all/20260217122341.2359582-1-mic@xxxxxxxxxxx/
—Günther