overlayfs: WARN_ONCE(Can't encode file handler for inotify: 255)
From: Dmitry Safonov
Date: Tue Dec 17 2024 - 19:23:41 EST
Hi Amir and Miklos, linux-unionfs,
On v6.9.0 kernel we stepped over the WARN_ON() in show_mark_fhandle():
> ------------[ cut here ]------------
> Can't encode file handler for inotify: 255
> WARNING: CPU: 0 PID: 11136 at fs/notify/fdinfo.c:55 show_mark_fhandle+0xfa/0x110
> CPU: 0 PID: 11136 Comm: lsof Kdump: loaded Tainted: P W O 6.9.0 #1
> RIP: 0010:show_mark_fhandle+0xfa/0x110
> Code: 00 00 00 5b 41 5c 5d e9 44 21 97 00 80 3d 0d af 99 01 00 75 d8 89 ce 48 c7 c7 68 ad 4a 82 c6 05 fb ae 99 01 01 e8 f6 98 cc ff <0f> 0b eb bf e8 4d 29 96 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
...
> Call Trace:
> <TASK>
> inotify_show_fdinfo+0x124/0x170
> seq_show+0x188/0x1f0
> seq_read_iter+0x115/0x4a0
> seq_read+0xf9/0x130
> vfs_read+0xb6/0x330
> ksys_read+0x6b/0xf0
> __do_fast_syscall_32+0x80/0x110
> do_fast_syscall_32+0x37/0x80
> entry_SYSCALL_compat_after_hwframe+0x75/0x75
> RIP: 0023:0xf7f93569
it later reproduced on v6.12.0. With some debug, it was narrowed down
to the way overlayfs encodes file handlers in ovl_encode_fh(). It
seems that currently it calculates them with the help of dentries.
Straight away from that, the reproducer becomes an easy drop_caches +
lsof (which parses procfs and finds some pid(s) that utilize inotify,
reading their correspondent fdinfo(s)).
So, my questions are: is a dentry actually needed for
ovl_dentry_to_fid()? Can't it just encode fh based on an inode? It
seems that the only reason it "needs" a dentry is to find the origin
layer in ovl_check_encode_origin(), is it so?
I guess, the potential solution here would be either to populate the
dentry back (likely racy and ugh) or just encode file handles based on
the lower-inode? IIUC, old file handles will become stale anyway after
dropping the caches?
As a rare visitor to fs code, not sure I described the problem or
understood it well enough.
FWIW, reverting commit 16aac5ad1fa9 ("ovl: support encoding
non-decodable file handles") seems to "fix" the warning locally (not
considering it a long-term fix).
Thanks,
Dmitry