Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek

From: Al Viro
Date: Thu Apr 04 2024 - 18:01:29 EST


On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:

> This specifically cannot happen because sysfs is not allowed as an
> upper layer only as a lower layer, so overlayfs itself will not be writing to
> /sys/power/resume.

Then how could you possibly get a deadlock there? What would your minimal
deadlocked set look like?

1. Something is blocked in lookup_bdev() called from resume_store(), called
from sysfs_kf_write(), called from kernfs_write_iter(), which has acquired
->mutex of struct kernfs_open_file that had been allocated by
kernfs_fop_open() back when the file had been opened. Note that each
struct file instance gets a separate struct kernfs_open_file. Since we are
calling ->write_iter(), the file *MUST* have been opened for write.

2. Something is blocked in kernfs_fop_llseek() on the same of->mutex,
i.e. using the same struct file as (1). That something is holding an
overlayfs inode lock, which is what the next thread is blocked on.

+ at least one more thread, to complete the cycle.

Right? How could that possibly happen without overlayfs opening /sys/power/resume
for write? Again, each struct file instance gets a separate of->mutex;
for a deadlock you need a cycle of threads and a cycle of locks, such
that each thread is holding the corresponding lock and is blocked on
attempt to get the lock that comes next in the cyclic order.

If overlayfs never writes to that sucker, it can't participate in that
cycle. Sure, you can get overlayfs llseek grabbing of->mutex of *ANOTHER*
struct file opened for the same sysfs file. Since it's not the same
struct file and since each struct file there gets a separate kernfs_open_file
instance, the mutex won't be the same.
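
To make the cycle argument concrete, here is a rough user-space sketch (a toy model, not kernel code). It assumes every blocked thread holds exactly one lock and waits on exactly one lock; struct waiter, has_deadlock_cycle() and the integer lock ids are all made up for illustration. It reports a deadlock only if following "who holds the lock I want" leads back to the starting thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one blocked thread, one lock held, one lock wanted. */
struct waiter {
	int holds;	/* id of the lock this thread is holding */
	int wants;	/* id of the lock this thread is blocked on */
};

/*
 * Return true if some thread, by following "the holder of the lock I
 * want", eventually gets back to itself.  Locks are mutexes, so each
 * lock id is held by at most one thread.
 */
static bool has_deadlock_cycle(const struct waiter *t, size_t n)
{
	for (size_t start = 0; start < n; start++) {
		size_t cur = start;

		for (size_t steps = 0; steps < n; steps++) {
			size_t next = n;	/* n means "nobody holds it" */

			for (size_t i = 0; i < n; i++) {
				if (t[i].holds == t[cur].wants) {
					next = i;
					break;
				}
			}
			if (next == n)
				break;		/* lock is free, chain ends */
			if (next == start)
				return true;	/* closed the cycle */
			cur = next;
		}
	}
	return false;
}
```

With made-up ids 1 = of->mutex of the writer's struct file, 2 = whatever the writer is blocked on, 3 = the overlayfs inode lock: the set {1,2}, {3,1}, {2,3} is a cycle, since the llseek thread wants the writer's own of->mutex. Give the llseek thread the of->mutex of another struct file instead (id 4, held by nobody), i.e. {1,2}, {3,4}, {2,3}, and the chain simply ends.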

Unless I'm missing something else, that can't deadlock. For a quick and
dirty experiment, try to give of->mutex on r/o opens a class separate from
that on r/w and w/o opens (mutex_init() in kernfs_fop_open()) and see
if lockdep warnings persist.

Something like

	if (has_mmap)
		mutex_init(&of->mutex);
	else if (file->f_mode & FMODE_WRITE)
		mutex_init(&of->mutex);
	else
		mutex_init(&of->mutex);

circa fs/kernfs/file.c:642.
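
In case the three textually identical calls look like a typo: mutex_init() is a macro that declares a static struct lock_class_key at each expansion, so every call site ends up in its own lockdep class. A minimal user-space sketch of that mechanism follows; all the *_model names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct lock_class_key and struct mutex. */
struct lock_class_key_model {
	char dummy;
};

struct mutex_model {
	const struct lock_class_key_model *key;
};

/*
 * Same trick as the kernel macro: a static key per expansion, so each
 * call site hands out a different key address.
 */
#define mutex_init_model(m)					\
do {								\
	static struct lock_class_key_model __key;		\
								\
	(m)->key = &__key;					\
} while (0)

/* Three call sites, hence three distinct classes in this model. */
static void init_of_mutex_model(struct mutex_model *m, bool has_mmap,
				bool opened_for_write)
{
	if (has_mmap)
		mutex_init_model(m);
	else if (opened_for_write)
		mutex_init_model(m);
	else
		mutex_init_model(m);
}
```

Two mutexes initialized through the same branch share a key; mutexes initialized through different branches do not. That is all the experiment relies on: lockdep would stop conflating of->mutex of read-only opens with that of writable opens.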