Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

From: Dave Chinner
Date: Wed Jan 16 2019 - 17:51:39 EST


On Wed, Jan 16, 2019 at 09:50:16AM -0500, Jerome Glisse wrote:
> On Wed, Jan 16, 2019 at 03:34:55PM +1100, Dave Chinner wrote:
> > On Tue, Jan 15, 2019 at 09:23:12PM -0500, Jerome Glisse wrote:
> > > On Tue, Jan 15, 2019 at 06:01:09PM -0800, Dan Williams wrote:
> > > > On Tue, Jan 15, 2019 at 5:56 PM Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> > > > > On Tue, Jan 15, 2019 at 04:44:41PM -0800, John Hubbard wrote:
> > > > [..]
> > > > > To make it clear.
> > > > >
> > > > > Lock code:
> > > > > GUP()
> > > > > ...
> > > > > lock_page(page);
> > > > > if (PageWriteback(page)) {
> > > > > unlock_page(page);
> > > > > wait_stable_page(page);
> > > > > goto retry;
> > > > > }
> > > > > atomic_add(page->refcount, PAGE_PIN_BIAS);
> > > > > unlock_page(page);
> > > > >
> > > > > test_set_page_writeback()
> > > > > bool pinned = false;
> > > > > ...
> > > > > pinned = page_is_pin(page); // could be after TestSetPageWriteback
> > > > > TestSetPageWriteback(page);
> > > > > ...
> > > > > return pinned;
> > > > >
> > > > > Memory barrier:
> > > > > GUP()
> > > > > ...
> > > > > atomic_add(page->refcount, PAGE_PIN_BIAS);
> > > > > smp_mb();
> > > > > if (PageWriteback(page)) {
> > > > > atomic_add(page->refcount, -PAGE_PIN_BIAS);
> > > > > wait_stable_page(page);
> > > > > goto retry;
> > > > > }
> > > > >
> > > > > test_set_page_writeback()
> > > > > bool pinned = false;
> > > > > ...
> > > > > TestSetPageWriteback(page);
> > > > > smp_wmb();
> > > > > pinned = page_is_pin(page);
> > > > > ...
> > > > > return pinned;
> > > > >
> > > > >
> > > > > One is not more complex than the other. One can contend, the other
> > > > > will _never_ contend.
> > > >
> > > > The complexity is in the validation of lockless algorithms. It's
> > > > easier to reason about locks than barriers for the long term
> > > > maintainability of this code. I'm with Jan and John on wanting to
> > > > explore lock_page() before a barrier-based scheme.
> > >
> > > How is the above hard to validate ?
> >
> > Well, if you think it's so easy, then please write the test cases so
> > we can add them to fstests and make sure that we don't break it in
> > future.
> >
> > If you can't write filesystem test cases that exercise these race
> > conditions reliably, then the answer to your question is "it is
> > extremely hard to validate" and the correct thing to do is to start
> > with the simple lock_page() based algorithm.
> >
> > Premature optimisation in code this complex is something we really,
> > really need to avoid.
>
> Litmus tests show that this never happens. I am attaching 2 litmus
> tests, one with the barrier and one without. Without the barrier we
> can see the double negative !PageWriteback in GUP and !page_pinned()
> in test_set_page_writeback() (0:EAX = 0; 1:EAX = 0; below)

That's not a regression test, nor does it actually test the code
that the kernel runs. It's just an extremely simplified model of
a small part of the algorithm. Sure, that specific interaction is
fine, but that in no way reflects the complexity of the code or how
it interacts with other code that touches that state. And
it's not something we can use to detect that some future change has
broken gup vs writeback synchronisation.

Memory barriers might be fast, but they are hell for anyone but the
person who wrote the algorithm to understand. If a simple page lock
is good enough and doesn't cause performance problems, then use the
simple page lock mechanism and stop trying to be clever.
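
For illustration only, here is a rough userspace model of the
lock_page() variant quoted above. Everything in it is an assumption made
for the sketch: struct page_model, model_gup_pin(), model_set_writeback()
and the PAGE_PIN_BIAS value are hypothetical names and numbers, and a
pthread mutex stands in for the page lock. It is not the kernel code and
says nothing about the real call paths. It only shows why the locked
version is easy to reason about: both sides inspect and update the state
under the same lock, so either GUP sees writeback or writeback sees the
pin.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Assumed bias value, chosen only for this model. */
#define PAGE_PIN_BIAS 1024

/* Hypothetical stand-in for struct page; not the kernel structure. */
struct page_model {
	pthread_mutex_t lock;	/* models the page lock */
	bool writeback;		/* models PG_writeback */
	int refcount;		/* models the page reference count */
};

static void model_page_init(struct page_model *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->writeback = false;
	p->refcount = 1;
}

/* A page counts as pinned once at least one bias has been added. */
static bool model_page_is_pinned(const struct page_model *p)
{
	return p->refcount >= PAGE_PIN_BIAS;
}

/*
 * GUP side. Returns true if the pin was taken. Returns false if the
 * page is under writeback, in which case the caller would wait for
 * writeback to finish and retry.
 */
static bool model_gup_pin(struct page_model *p)
{
	bool pinned = false;

	pthread_mutex_lock(&p->lock);
	if (!p->writeback) {
		p->refcount += PAGE_PIN_BIAS;
		pinned = true;
	}
	pthread_mutex_unlock(&p->lock);
	return pinned;
}

/*
 * Writeback side. Marks the page as under writeback and reports
 * whether it was already pinned when the flag was set.
 */
static bool model_set_writeback(struct page_model *p)
{
	bool pinned;

	pthread_mutex_lock(&p->lock);
	pinned = model_page_is_pinned(p);
	p->writeback = true;
	pthread_mutex_unlock(&p->lock);
	return pinned;
}
```

Whichever side takes the lock first, the other observes its update, so
the "GUP did not see writeback and writeback did not see the pin" outcome
cannot occur in this model. That says nothing about contention cost,
which is the part that would have to be measured.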

Other people who will have to understand and debug issues in this
code are much less accepting of such clever algorithms. We've
been badly burnt on repeated occasions by broken memory barriers in
code heavily optimised for performance (*cough* rwsems *cough*), so
I'm extremely wary of using memory ordering dependent algorithms in
places where they are not necessary.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx