Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic

From: Jason Gunthorpe
Date: Wed Apr 24 2019 - 07:10:32 EST


On Tue, Apr 23, 2019 at 07:15:44PM -0700, Davidlohr Bueso wrote:
> On Wed, 03 Apr 2019, Daniel Jordan wrote:
>
> > On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
> > > Le 02/04/2019 Ã 22:41, Daniel Jordan a ÃcritÂ:
> > > > With locked_vm now an atomic, there is no need to take mmap_sem as
> > > > writer. Delete and refactor accordingly.
> > >
> > > Could you please detail the change ?
> >
> > Ok, I'll be more specific in the next version, using some of your language in
> > fact. :)
> >
> > > It looks like this is not the only
> > > change. I'm wondering what the consequences are.
> > >
> > > Before we did:
> > > - lock
> > > - calculate future value
> > > - check the future value is acceptable
> > > - update value if future value acceptable
> > > - return error if future value non acceptable
> > > - unlock
> > >
> > > Now we do:
> > > - atomic update with future (possibly too high) value
> > > - check the new value is acceptable
> > > - atomic update back with older value if new value not acceptable and return
> > > error
> > >
> > > So if a concurrent action wants to increase locked_vm with an acceptable
> > > step while another one has temporarily set it too high, it will now fail.
> > >
> > > I think we should keep the previous approach and do a cmpxchg after
> > > validating the new value.
>
> Wouldn't the cmpxchg alternative also be exposed the locked_vm changing between
> validating the new value and the cmpxchg() and we'd bogusly fail even when there
> is still just because the value changed (I'm assuming we don't hold any locks,
> otherwise all this is pointless).
>
> current_locked = atomic_read(&mm->locked_vm);
> new_locked = current_locked + npages;
> if (new_locked < lock_limit)
> if (cmpxchg(&mm->locked_vm, current_locked, new_locked) == current_locked)
> /* ENOMEM */

Well it needs a loop..

again:
current_locked = atomic_read(&mm->locked_vm);
new_locked = current_locked + npages;
if (new_locked < lock_limit)
if (cmpxchg(&mm->locked_vm, current_locked, new_locked) != current_locked)
goto again;

So it won't have bogus failures as there is no unwind after
error. Basically this is a load locked/store conditional style of
locking pattern.

> > That's a good idea, and especially worth doing considering that an arbitrary
> > number of threads that charge a low amount of locked_vm can fail just because
> > one thread charges lots of it.
>
> Yeah but the window for this is quite small, I doubt it would be a real issue.
>
> What if before doing the atomic_add_return(), we first did the racy new_locked
> check for ENOMEM, then do the speculative add and cleanup, if necessary. This
> would further reduce the scope of the window where false ENOMEM can occur.
>
> > pinned_vm appears to be broken the same way, so I can fix it too unless someone
> > beats me to it.
>
> This should not be a surprise for the rdma folks. Cc'ing Jason nonetheless.

I think we accepted this tiny race as a side effect of removing the
lock, which was very beneficial. Really the time window between the
atomic failing and unwind is very small, and there are enough other
ways a hostile user could DOS locked_vm that I don't think it really
matters in practice..

However, the cmpxchg seems better, so a helper to implement that would
probably be the best thing to do.

Jason