[PATCH v4 0/5] fsnotify: fix softlockups iterating over d_subdirs

From: Stephen Brennan
Date: Fri Nov 11 2022 - 17:06:42 EST


Hi Jan, Amir, Al,

Here's my v4 patch series that aims to eliminate soft lockups when updating
dentry flags in fsnotify. I've incorporated Jan's suggestion of simply
allowing the flag to be lazily cleared in the fsnotify_parent() function,
via Amir's patch. This allowed me to drop patch #2 from my previous series
(fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem). I
replaced it with "fsnotify: require inode lock held during child flag
update", patch #5 in this series. I also added "dnotify: move
fsnotify_recalc_mask() outside spinlock" to address the sleep-during-atomic
issues with dnotify.

Jan expressed concerns about lock ordering of the inode rwsem with the
fsnotify group mutex. I built this with lockdep enabled (see below for the
lock debugging .config section -- I'm not too familiar with lockdep so I
wanted a sanity check). I ran all the fanotify, inotify, and dnotify tests
I could find in LTP, with no lockdep splats to be found. I don't know that
this can completely satisfy the concerns about lock ordering: I'm reading
through the code to better understand the concern about "the removal of
oneshot mark during modify event generation". But I'm encouraged by the
LTP+lockdep results.

I also went ahead and did my negative dentry oriented testing. Of course
the fsnotify_parent() issue is fully resolved, and when I tested several
processes all using inotifywait on the same directory full of negative
dentries, I was able to use ftrace to confirm that
fsnotify_update_children_dentry_flags() was called exactly once for all
processes. No softlockups occurred!

I originally wrote this series to make the last patch (#5) optional: if for
some reason we didn't think it was necessary to hold the inode rwsem, then
we could omit it -- the main penalty being the race condition described in
the patch description. I tested without the last patch and LTP passed also
with lockdep enabled, but of course when multiple tasks did an inotifywait
on the same directory (with many negative dentries) only the first waited
for the flag updates, the rest of the tasks immediately returned despite
the flags not being ready.

I agree with Amir that as long as the lock ordering is fine, we should keep
patch #5. And if that's the case, I can reorder the series a bit to make it
a bit more logical, and eliminate logic in
fsnotify_update_children_dentry_flags() for handling d_move/cursor races,
which I promptly delete later in the series.

1. fsnotify: clear PARENT_WATCHED flags lazily
2. fsnotify: Use d_find_any_alias to get dentry associated with inode
3. dnotify: move fsnotify_recalc_mask() outside spinlock
4. fsnotify: require inode lock held during child flag update
5. fsnotify: allow sleepable child flag update

Thanks for continuing to read this series, I hope we're making progress
toward a simpler way to fix these scaling issues!

Stephen

Amir Goldstein (1):
fsnotify: clear PARENT_WATCHED flags lazily

Stephen Brennan (4):
fsnotify: Use d_find_any_alias to get dentry associated with inode
dnotify: move fsnotify_recalc_mask() outside spinlock
fsnotify: allow sleepable child flag update
fsnotify: require inode lock held during child flag update

fs/notify/dnotify/dnotify.c | 28 ++++++---
fs/notify/fsnotify.c | 101 ++++++++++++++++++++++---------
fs/notify/fsnotify.h | 3 +-
fs/notify/mark.c | 40 +++++++++++-
include/linux/fsnotify_backend.h | 8 ++-
5 files changed, 136 insertions(+), 44 deletions(-)

--
2.34.1