Re: [PATCH v3] kcov: don't generate a warning on vm_insert_page()'s failure

From: Marco Elver
Date: Fri Apr 15 2022 - 04:25:15 EST


On Thu, Apr 14, 2022 at 02:24PM -0700, Andrew Morton wrote:
> On Fri, 1 Apr 2022 18:25:12 +0000 Aleksandr Nogikh <nogikh@xxxxxxxxxx> wrote:
>
> > vm_insert_page()'s failure is not an unexpected condition, so don't do
> > WARN_ONCE() in such a case.
> >
> > Instead, print a kernel message and just return an error code.
>
> (hm, I thought I asked this before but I can't find it)
>
> Under what circumstances will this failure occur?

It looks like syzbot was able to generate an OOM situation:

| [ 599.515700][T23028] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=syz1,mems_allowed=0-1,oom_memcg=/syz1,task_memcg=/syz1,task=syz-executor.1,pid=23028,uid=0
| [ 599.537757][T23028] Memory cgroup out of memory: Killed process 23028 (syz-executor.1) total-vm:56816kB, anon-rss:436kB, file-rss:8888kB, shmem-rss:48kB, UID:0 pgtables:88kB oom_score_adj:1000
| [ 599.615664][T23028] ------------[ cut here ]------------
| [ 599.652858][T23028] vm_insert_page() failed
| [ 599.662598][T23028] WARNING: CPU: 0 PID: 23028 at kernel/kcov.c:479 kcov_mmap+0xbe/0xe0
| [ 599.900577][T23028] Modules linked in:
| [ 599.904480][T23028] CPU: 1 PID: 23028 Comm: syz-executor.1 Tainted: G W 5.17.0-syzkaller-12964-gccaff3d56acc #0
| [ 599.956099][T23028] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
| [ 600.092674][T23028] RIP: 0010:kcov_mmap+0xbe/0xe0
| [ 600.097559][T23028] Code: 48 81 c3 00 10 00 00 49 39 dc 77 c9 31 c0 5b 5d 41 5c 41 5d 41 5e c3 48 c7 c7 e9 4a 5b 8b c6 05 5a fc 28 0c 01 e8 bd c6 a0 07 <0f> 0b eb d2 4c 89 f7 e8 66 28 e8 07 b8 ea ff ff ff eb d1 66 66 2e
| [ 600.117319][T23028] RSP: 0018:ffffc9000c1cfc30 EFLAGS: 00010282
| [ 600.135794][T23028] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
| [ 600.163986][T23028] RDX: ffff888051f40000 RSI: ffffffff815fce18 RDI: fffff52001839f78
| [ 600.188615][T23028] RBP: ffff88804fc6e210 R08: 0000000000000000 R09: 0000000000000000
| [ 600.196616][T23028] R10: ffffffff815f77ee R11: 0000000000000000 R12: 0000000000200000
| [ 600.214229][T23028] R13: ffff8880646c2500 R14: ffff8880646c2508 R15: ffff88804fc6e260
| [ 600.252864][T23028] FS: 00005555570e4400(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
| [ 600.283249][T23028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| [ 600.335749][T23028] CR2: 0000001b2c436000 CR3: 000000004ef16000 CR4: 00000000003506f0
| [ 600.390781][T23028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| [ 600.430312][T23028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
| [ 600.441698][T23028] Call Trace:
| [ 600.447877][T23028] <TASK>
| [ 600.451890][T23028] mmap_region+0xba5/0x14a0
| [ 600.486043][T23028] do_mmap+0x863/0xfa0
| [ 600.490544][T23028] vm_mmap_pgoff+0x1b7/0x290
| [ 600.505607][T23028] ksys_mmap_pgoff+0x40d/0x5a0
| [ 600.522165][T23028] do_syscall_64+0x35/0x80
| [ 600.526655][T23028] entry_SYSCALL_64_after_hwframe+0x44/0xae
| [ 600.532936][T23028] RIP: 0033:0x7f5be4889092
| [ 600.537407][T23028] Code: 00 00 00 00 00 0f 1f 00 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 c7 c0 b8 ff ff ff 64
| [ 600.560042][T23028] RSP: 002b:00007fffde76b318 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
| [ 600.569079][T23028] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5be4889092
| [ 600.577107][T23028] RDX: 0000000000000003 RSI: 0000000000200000 RDI: 0000000000000000
| [ 600.587064][T23028] RBP: 0000000000000000 R08: 00000000000000db R09: 0000000000000000
| [ 600.596119][T23028] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f5be499c1dc
| [ 600.604977][T23028] R13: 0000000000000003 R14: 00007f5be499c1d0 R15: 0000000000000032
| [ 600.613026][T23028] </TASK>
| [ 600.616066][T23028] Kernel panic - not syncing: panic_on_warn set ...

> Why do we emit a message at all? What action can the user take upon
> seeing the message?

The message is mainly for the benefit of the test log, in this case the
fuzzer's log so that humans inspecting the log can figure out what was
going on. KCOV is a testing tool, so I think being a little more chatty
when KCOV unexpectedly is about to fail will save someone debugging
time.

We don't want the WARN, because it's not a kernel bug that syzbot should
report, and failure can happen if the fuzzer tries hard enough (as
above).

> Do we have a Fixes: for this?

The WARN was moved with b3d7fe86fbd0 ("kcov: properly handle subsequent
mmap calls"), so that'd be the only commit a backport would cleanly
apply to.

> From the info provided thus far I'm unable to determine whether a
> -stable backport is needed. What are your thoughts on this?

The main problem is it only makes fuzzers try to report this as a bug
(which it is not). Backporting to kernels that have b3d7fe86fbd0 would
be reasonable, but wouldn't bother creating backports for older kernels.

Thanks,
-- Marco