Re: [PATCH RFC 7/7] libfs: Re-arrange locking in offset_iterate_dir()

From: Matthew Wilcox
Date: Fri Feb 16 2024 - 10:59:34 EST


On Fri, Feb 16, 2024 at 11:15:46AM +0100, Jan Kara wrote:
> On Thu 15-02-24 16:07:42, Liam R. Howlett wrote:
> > The limitations are outlined in the documentation as to how and when to
> > lock. I'm not familiar with the xarray users, but it does check for
> > locking with lockdep, but the way this is written bypasses the lockdep
> > checking as the locks are taken and dropped without the proper scope.
> >
> > If you feel like this is a trap, then maybe we need to figure out a new
> > plan to detect incorrect use?
>
> OK, I was a bit imprecise. What I wanted to say is that this is a shift in
> the paradigm in the sense that previously, we mostly had (and still have)
> data structure APIs (lists, rb-trees, radix-tree, now xarray) that were
> guaranteeing that unless you call into the function to mutate the data
> structure it stays intact.

hm, no. The radix tree never guaranteed that to you; I just documented
that it wasn't guaranteed for the XArray.

> Now maple trees are shifting more in a direction
> of black-box API where you cannot assume what happens inside. Which is fine
> but then we have e.g. these iterators which do not quite follow this
> black-box design and you have to remember subtle details like calling
> "mas_pause()" before unlocking which is IMHO error-prone. Ideally, users of
> the black-box API shouldn't be exposed to the details of the internal
> locking at all (but then the performance suffers so I understand why you do
> things this way). Second to this ideal variant would be if we could detect
> we unlocked the lock without calling xas_pause() and warn on that. Or maybe
> xas_unlock*() should be calling xas_pause() automagically and we'd have
> similar helpers for RCU to do the magic for you?

If you're unlocking the lock that protects a data structure while still
using that data structure, you should always be aware that you're doing
something very dangerous! It's no different from calling inode_unlock()
inside a filesystem. Sure, you can do it, but you'd better be ready to
deal with the consequences.

The question is, do we want to be able to defragment slabs or not?
My thinking is "yes", for objects where we can ensure there are no
current users (at least after an RCU grace period), we want to be able
to move them. That does impose certain costs (and subtleties), but just
like fast-GUP and lockless page-cache, I think it's worth doing.

Of course, we don't have slab defragmentation yet, so we're not getting
any benefit from this. The most recent attempt was in 2019:
https://lore.kernel.org/linux-mm/20190603042637.2018-1-tobin@xxxxxxxxxx/
but there were earlier attepts in 2017:
https://lore.kernel.org/linux-mm/20171227220636.361857279@xxxxxxxxx/
and 2008:
https://lore.kernel.org/linux-mm/20080216004526.763643520@xxxxxxx/

so I have to believe it's something we want, just haven't been able to
push across the "merge this now" line.