Re: [RFC] fsnotify: allow sleepable child dentry flag update
From: Stephen Brennan
Date: Mon Oct 17 2022 - 13:00:14 EST
Amir Goldstein <amir73il@xxxxxxxxx> writes:
> On Mon, Oct 17, 2022 at 10:59 AM Stephen Brennan
> <stephen.s.brennan@xxxxxxxxxx> wrote:
>>
>> Amir Goldstein <amir73il@xxxxxxxxx> writes:
[snip]
>> > I think that d_find_any_alias() should be used to obtain
>> > the alias with elevated refcount instead of the awkward
>> > d_u.d_alias iteration loop.
>>
>> D'oh! Much better idea :)
>> Do you think the BUG_ON would still be worthwhile?
>>
>
> Which BUG_ON()?
> In general no, if there are ever more multiple aliases for
> a directory inode, updating dentry flags would be the last
> of our problems.
Sorry, I meant the one in my patch which asserts that the dentry is the
only alias for that inode. I suppose you're right about having bigger
problems in that case -- but the existing code "handles" it by iterating
over the alias list.
>
>> > In the context of __fsnotify_parent(), I think the optimization
>> > should stick with updating the flags for the specific child dentry
>> > that had the false positive parent_watched indication,
>> > leaving the rest of
>>
>> > WOULD that address the performance issues of your workload?
>>
>> I think synchronizing the __fsnotify_update_child_dentry_flags() with a
>> mutex and getting rid of the call from __fsnotify_parent() would go a
>> *huge* way (maybe 80%?) towards resolving the performance issues we've
>> seen. To be clear, I'm representing not one single workload, but a few
>> different customer workloads which center around this area.
>>
>> There are some extreme cases I've seen, where the dentry list is so
>> huge, that even iterating over it once with the parent dentry spinlock
>> held is enough to trigger softlockups - no need for several calls to
>> __fsnotify_update_child_dentry_flags() queueing up as described in the
>> original mail. So ideally, I'd love to try make *something* work with
>> the cursor idea as well. But I think the two ideas can be separated
>> easily, and I can discuss with Al further about if cursors can be
>> salvaged at all.
>>
>
> Assuming that you take the dir inode_lock() in
> __fsnotify_update_child_dentry_flags(), then I *think* that children
> dentries cannot be added to dcache and children dentries cannot
> turn from positive to negative and vice versa.
>
> Probably the only thing that can change d_subdirs is children dentries
> being evicted from dcache(?), so I *think* that once in N children
> if you can dget(child), drop alias->d_lock, cond_resched(),
> and then continue d_subdirs iteration from child->d_child.
This sounds like an excellent idea. I can't think of anything which
would remove a dentry from d_subdirs without the inode lock held.
Cursors wouldn't move without the lock held in read mode. Temporary
dentries from d_alloc_parallel are similar - they need the inode locked
shared in order to be removed from the parent list.
I'll try implementing it (along with the fsnotify changes we've
discussed in this thread). I'll add a BUG_ON after we wake up from
COND_RESCHED() to guarantee that the parent is the same dentry as
expected - just in case the assumption is wrong.
Al - if you've read this far :) - does this approach sound reasonable,
compared to the cursor? I'll send out some concrete patches as soon as
I've implemented and done a few tests on them.
Thanks,
Stephen