Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

From: Linus Torvalds
Date: Sat Jan 09 2021 - 14:53:08 EST


On Sat, Jan 9, 2021 at 11:33 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jan 07, 2021 at 01:05:19PM -0800, Linus Torvalds wrote:
> > Side note, and not really related to UFFD, but the mmap_sem in
> > general: I was at one point actually hoping that we could make the
> > mmap_sem a spinlock, or at least make the rule be that we never do any
> > IO under it. At which point a write lock hopefully really shouldn't be
> > such a huge deal.
>
> There's a (small) group of us working towards that. It has some
> prerequisites, but where we're hoping to go currently:
>
> - Replace the vma rbtree with a b-tree protected with a spinlock
> - Page faults walk the b-tree under RCU, like peterz/laurent's SPF patchset
> - If we need to do I/O, take a refcount on the VMA
>
> After that, we can gradually move things out from mmap_sem protection
> to just the vma tree spinlock, or whatever makes sense for them. In a
> very real way the mmap_sem is the MM layer's BKL.

Well, we could do the "no IO" part first, and keep the semaphore part.

Some people actually prefer a semaphore to a spinlock, because it
doesn't end up causing preemption issues.

As long as you don't do IO (or memory allocations) under a semaphore
(ok, in this case it's a rwsem, same difference), it might even be
preferable to keep it as a semaphore rather than as a spinlock.

So it doesn't necessarily have to go all the way - we _could_ just try
something like "when taking the mmap_sem, set a thread flag" and then
have a "warn if doing allocations or IO under that flag".

And since this is about performance, not some hard requirement, it
might not even matter if we catch all cases. If we fix it so that any
regular load on most normal filesystems never sees the warning, we'd
already be golden.

Of course, I think we've had issues with rw_sems for _other_ reasons.
Waiman actually removed the reader optimistic spinning because it
caused bad interactions with mixed reader-writer loads.

So rwsemaphores may end up not working as well as spinlocks if the
common situation is "just wait a bit, you'll get it".

Linus