Re: [GIT PULL] f2fs for 5.18

From: Jaegeuk Kim
Date: Wed Mar 23 2022 - 17:21:15 EST


On 03/23, Linus Torvalds wrote:
> On Wed, Mar 23, 2022 at 9:26 AM Jaegeuk Kim <jaegeuk@xxxxxxxxxx> wrote:
> >
> > OTOH, I was suspecting the major contetion would be
> > f2fs_lock_op -> f2fs_down_read(&sbi->cp_rwsem);
> > , which was used for most of filesystem operations.
>
> Very possible, I was just looking at a random one in f2fs/file.c
> obviously with no actual numbers in hand.
>
> In general, I really hate seeing specialized locks, but this f2fs use
> case is in some ways worse than other ad-hoc locks I've seen - simply
> because it's been one whole-sale conversion of "down_read/write()" to
> "f2fs_down_read/write()" - regardless of _which_ lock is being locked.
>
> (Now, it's not all bad news - in other respects it's much better than
> some ad-hoc locking: at least you still will participate in lockdep,
> and you use the actual low-level locking primitives instead of making
> up your own and getting memory ordering wrong).
>
> But basically I think it would have been much nicer if you would have
> done this for just the _one_ lock that mattered, whichever lock that
> might be. Partly as documentation, and partly so that maybe some day
> you can split that lock up (or maybe notice cases where you can avoid
> it entirely).
>
> For example, if it's really just f2fs_lock_op() that needs this, the
> special "wait_event(trylock)" hack could have been entirely local to
> just *that*, rather than affecting all the other locks too.
>
> And the very first f2fs_lock_op() case I find, I see that the lock is
> pointless. Again, that's unlikely to be the *cause* of any of these
> problems, but the fact that I've now looked at two of the f2fs locks,
> and gone "the locking seems to be pointlessly badly done" does imply
> that the problem isn't "down_read()", it's the use.
>
> That other lock I reacted to was the f2fs_lock_op(sbi) at the top of
> f2fs_new_inode().
>
> Look, you have a new inode that you just allocated, that nobody else
> can yet access.
>
> And the only thing that that f2fs_lock_op(sbi) -> f2fs_unlock_op(sbi)
> sequence protects is the f2fs_alloc_nid() for that new inode.
>
> Ok, so maybe f2fs_alloc_nid() needs that lock?
>
> No it doesn't. It already has
>
> - &nm_i->nid_list_lock spinlock for its own in-memory internal NID caches
>
> *and* when that fails
>
> - &NM_I(sbi)->build_lock for protecting all of f2fs_build_free_nids()
>
> *and* inside of that lock
>
> - f2fs_down_read(&nm_i->nat_tree_lock) for protecting the NAT tree structures.
>
> So I see two major issues in the very first user of that
> f2fs_lock_op() that I look at:
>
> (a) it seems to be entirely unnecessary

Actually, when I took a look at the above path, indeed, f2fs_lock_op in
f2fs_new_inode may be unnecessary at all aligned to your points. Even, that
might hurt performance since we get f2fs_lock_op twice before dealing with
dentries like f2fs_add_link. Let me test a bit whether there's any regression
if I remove f2fs_lock_op in f2fs_new_inode.

>
> (b) it is a classic case of "multiple nested locks".
>
> Now, it's possible that I'm wrong on (a) and there's some odd reason
> that lock is needed (maybe there is a lock ordering problem for one of
> the other locks between readers and writers, and the op-lock acts as a
> mutual exclusion for that).
>
> But (b) really is a classic problem case for locking: nested locks are
> *much* more likely to cause horrible contention, because not any
> contention in any of the locks will end up affecting the others (and
> you easily get "bunching up" of different processes when they get
> synchronized with each other thanks to the inner lock).
>
> Nested locking is often required, but it's one of those things where
> you just need to be aware that they can be horribly bad for
> performance, _particularly_ if an inner lock sees contention and
> essentially "transfers" that contention to an outer lock.
>
> Maybe I've been unlucky. Maybe the two cases I happened to look at
> were just completely harmless, and very unusual. But the fact that I'm
> two-for-two and go "that locking looks like a prime candidate to be
> fixed" makes me suspect there's a lot of low-hanging fruit in there.

Thank you so much for taking the time to write this great advise. Let me dig
more whether there's anything that I can relax the lock use-cases further.
(tbh, I haven't reviewed them for a long time due to focusing on stability
issues mostly.)

>
> And that whole "wait_event(trylock)" thing is a symptom of problematic
> f2fs locking, rather than a solution to it.

Understood. If I can avoid lock contention upfront, definitely it wouldn't
need to apply rwsem change at all. Let me take some time to think about how to
move forward.

>
> Linus