Re: [syzbot] [kernel?] WARNING in task_work_add

From: Waiman Long
Date: Wed Oct 23 2024 - 19:47:02 EST


On 10/23/24 2:47 PM, Waiman Long wrote:


On 10/22/24 3:08 PM, Peter Zijlstra wrote:
On Tue, Oct 22, 2024 at 04:06:47PM +0200, Frederic Weisbecker wrote:
Adding scheduler people in Cc.

Thanks.

On Mon, Oct 21, 2024 at 09:54:38PM -0700, syzbot wrote:
Hello,

syzbot found the following issue on:

HEAD commit: 9ec59cb3edc7 KVM: arm64: Shave a few bytes from the EL2 id..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=17061430580000
kernel config: https://syzkaller.appspot.com/x/.config?x=c154e2d4db830898
dashboard link: https://syzkaller.appspot.com/bug?extid=4abde9163a953b8a0fd0
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/fc9a7d36d46a/disk-9ec59cb3.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/30547ddd681e/vmlinux-9ec59cb3.xz
kernel image: https://storage.googleapis.com/syzbot-assets/5c4e02d0f97a/Image-9ec59cb3.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4abde9163a953b8a0fd0@xxxxxxxxxxxxxxxxxxxxxxxxx

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/stacktrace.c:223 kunwind_next_frame_record_meta arch/arm64/kernel/stacktrace.c:216 [inline]
WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/stacktrace.c:223 kunwind_next_frame_record arch/arm64/kernel/stacktrace.c:248 [inline]
WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/stacktrace.c:223 kunwind_next arch/arm64/kernel/stacktrace.c:278 [inline]
WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/stacktrace.c:223 do_kunwind arch/arm64/kernel/stacktrace.c:309 [inline]
WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/stacktrace.c:223 kunwind_stack_walk arch/arm64/kernel/stacktrace.c:380 [inline]
WARNING: CPU: 1 PID: 1 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x458/0x48c arch/arm64/kernel/stacktrace.c:404
Modules linked in:
CPU: 1 UID: 0 PID: 1 Comm: init Not tainted 6.12.0-rc3-syzkaller-g9ec59cb3edc7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kunwind_next_frame_record_meta arch/arm64/kernel/stacktrace.c:216 [inline]
pc : kunwind_next_frame_record arch/arm64/kernel/stacktrace.c:248 [inline]
pc : kunwind_next arch/arm64/kernel/stacktrace.c:278 [inline]
pc : do_kunwind arch/arm64/kernel/stacktrace.c:309 [inline]
pc : kunwind_stack_walk arch/arm64/kernel/stacktrace.c:380 [inline]
pc : arch_stack_walk+0x458/0x48c arch/arm64/kernel/stacktrace.c:404
lr : 0x0
sp : ffff8000800176a0
x29: ffff800080017750 x28: 1ffff00010002f58 x27: 00000000ffff8d68
x26: dfff800000000000 x25: ffff0000c2c588c0 x24: dfff800000000000
x23: ffff700010002ef0 x22: ffff800080017850 x21: ffff8000800176b8
x20: ffff800080462114 x19: ffff8000800177a0 x18: dfff800000000000
x17: ffff800123f21000 x16: ffff80008b490b1c x15: 0000000000000001
x14: 1fffe000366c806a x13: ffff800097807ff0 x12: ffff800097808000
x11: 0000000000000000 x10: ffff0000c1978000 x9 : ffff800097807e9f
x8 : ffff800097807fd8 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : fffffffffffffff0 x3 : 0000000000000000
x2 : ffff0000c1978000 x1 : ffff800080029c40 x0 : 0000000000000001
Call trace:
kunwind_next_frame_record_meta arch/arm64/kernel/stacktrace.c:216 [inline] (P)
kunwind_next_frame_record arch/arm64/kernel/stacktrace.c:248 [inline] (P)
kunwind_next arch/arm64/kernel/stacktrace.c:278 [inline] (P)
do_kunwind arch/arm64/kernel/stacktrace.c:309 [inline] (P)
kunwind_stack_walk arch/arm64/kernel/stacktrace.c:380 [inline] (P)
arch_stack_walk+0x458/0x48c arch/arm64/kernel/stacktrace.c:404 (P)
0x0 (L)
stack_trace_save+0xfc/0x1a0 kernel/stacktrace.c:122
kasan_save_stack+0x40/0x6c mm/kasan/common.c:47
__kasan_record_aux_stack+0xd0/0xec mm/kasan/generic.c:541
kasan_record_aux_stack+0x14/0x20 mm/kasan/generic.c:546
task_work_add+0xb8/0x464 kernel/task_work.c:66
task_tick_mm_cid kernel/sched/core.c:10468 [inline]
sched_tick+0x2a8/0x404 kernel/sched/core.c:5605

I'm guessing this is very close to what 73ab05aa46b0 ("sched/core:
Disable page allocation in task_tick_mm_cid()") does. Initial version of
that patch was more aggressive and killed off the whole KASAN thing
instead of just the page-alloc.

The stack trace is a bit suspicious. Commit 73ab05aa46b0 modifies task_tick_mm_cid() and task_work_add() so that this call sequence uses kasan_record_aux_stack_noalloc() instead of kasan_record_aux_stack(). So I wonder whether that commit is in this particular git tree at all.

This is a 6.12.0-rc3 kernel, which doesn't have commit 73ab05aa46b0 yet.

Thanks, Longman
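
For anyone who wants to confirm whether a fix is present in a tested tree, here is a minimal sketch using `git merge-base --is-ancestor`. The helper name `commit_in_tree`, the injectable `run` parameter, and the `linux` checkout path are illustrative assumptions; only the two commit hashes come from this thread. The check only detects the exact commit, so a backport or cherry-pick under a different hash would not be found.

```python
import subprocess


def commit_in_tree(commit, head, repo=".", run=subprocess.run):
    """Return True if `commit` is an ancestor of `head` in the repo at `repo`.

    Hypothetical helper: it wraps `git merge-base --is-ancestor`, which
    exits with 0 when `commit` is an ancestor of `head`, 1 when it is not,
    and another status on error (for example, an unknown revision).
    """
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, head],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"git merge-base failed: {result.stderr.strip()}")


if __name__ == "__main__":
    # Hashes are taken from the thread; "linux" is a placeholder path to a
    # local checkout that has both the fix and the tested HEAD fetched.
    print(commit_in_tree("73ab05aa46b0", "9ec59cb3edc7", repo="linux"))
```

The `run` parameter exists only so the helper can be exercised without a real repository; in normal use the default `subprocess.run` is what gets called.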