Re: [RFC 4/9] gfs2: Fix mmap + page fault deadlocks (part 1)

From: Andreas Gruenbacher
Date: Wed Jun 02 2021 - 07:16:47 EST


On Tue, Jun 1, 2021 at 8:00 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, May 31, 2021 at 7:01 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
> >
> > Fix that by recognizing the self-recursion case.
>
> Hmm. I get the feeling that the self-recursion case should never have
> been allowed to happen in the first place.
>
> IOW, is there some reason why you can't make the user accesses always
> be done with page faults disabled (ie using the "atomic" user space
> access model), and then if you get a partial read (or write) to user
> space, at that point you drop the locks in read/write, do the "try to
> make readable/writable" and try again.
>
> IOW, none of this "detect recursion" thing. Just "no recursion in the
> first place".
>
> That way you'd not have these odd rules at fault time at all, because
> a fault while holding a lock would never get to the filesystem at all,
> it would be aborted early. And you'd not have any odd "inner/outer"
> locks, or lock compatibility rules or anything like that. You'd
> literally have just "oh, I didn't get everything at RW time while I
> held locks, so let's drop the locks, try to access user space, and
> retry".

Well, iomap_file_buffered_write() does that by using
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic() as
in iomap_write_actor(), but the read and direct I/O side doesn't seem
to have equivalents. I suspect we can't just wrap
generic_file_read_iter() and iomap_dio_rw() calls in
pagefault_disable().

> Wouldn't that be a lot simpler and more robust?

Sure, with vfs primitives that support atomic user-space access and
with a iov_iter_fault_in_writeable() like operation, we could do that.

> Because what if the mmap is something a bit more complex, like
> overlayfs or userfaultfd, and completing the fault isn't about gfs2
> handling it as a "fault", but about some *other* entity calling back
> to gfs2 and doing a read/write instead? Now all your "inner/outer"
> lock logic ends up being entirely pointless, as far as I can tell, and
> you end up deadlocking on the lock you are holding over the user space
> access _anyway_.

Yes, those kinds of deadlocks would still be possible.

Until we have a better solution, wouldn't it make sense to at least
prevent those self-recursion deadlocks? I'll send a separate pull
request in case you find that acceptable.

Thanks,
Andreas