Re: [PATCH 1/5] mm: Introduce mm_struct.has_pinned
From: Peter Xu
Date: Tue Sep 22 2020 - 11:17:46 EST
On Mon, Sep 21, 2020 at 04:53:38PM -0700, John Hubbard wrote:
> On 9/21/20 2:17 PM, Peter Xu wrote:
> > (Commit message collected from Jason Gunthorpe)
> >
> > Reduce the chance of false positives from page_maybe_dma_pinned() by keeping
>
> Not yet, it doesn't. :) More:
>
> > track if the mm_struct has ever been used with pin_user_pages(). mm_structs
> > that have never been passed to pin_user_pages() cannot have a positive
> > page_maybe_dma_pinned() by definition. This allows cases that might drive up
> > the page ref_count to avoid any penalty from handling dma_pinned pages.
> >
> > Due to complexities with unpinning, this trivial version is a permanent sticky
> > bit; future work will be needed to make this a counter.
>
> How about this instead:
>
> Subsequent patches intend to reduce the chance of false positives from
> page_maybe_dma_pinned(), by also considering whether or not a page has
> ever been part of an mm struct that has ever had pin_user_pages*()
> applied to any of its pages.
>
> In order to allow that, provide a boolean value (even though it's not
> implemented exactly as a boolean type) within the mm struct, that is
> simply set once and never cleared. This will suffice for an early, rough
> implementation that fixes a few problems.
>
> Future work is planned to provide a more sophisticated solution, likely
> involving a counter, and *not* involving something that is set and never
> cleared.
This looks good, thanks. Though I think Jason's version is good too (as long
as we remove the confusing sentence, i.e. the one starting with "mm_structs
that have never been passed..."). Before I drop Jason's version, I'd better
figure out what major point we missed, so that maybe we can add another
paragraph. E.g., "future work will be needed to make this a counter" already
means "involving a counter, and *not* involving something that is set and
never cleared" to me; otherwise it wouldn't be called a counter.
>
> >
> > Suggested-by: Jason Gunthorpe <jgg@xxxxxxxx>
> > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> > ---
> > include/linux/mm_types.h | 10 ++++++++++
> > kernel/fork.c | 1 +
> > mm/gup.c | 6 ++++++
> > 3 files changed, 17 insertions(+)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 496c3ff97cce..6f291f8b74c6 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -441,6 +441,16 @@ struct mm_struct {
> > #endif
> > int map_count; /* number of VMAs */
> > + /**
> > + * @has_pinned: Whether this mm has pinned any pages. This can
> > + * either be replaced in the future by @pinned_vm when it
> > + * becomes stable, or grow into a counter on its own. We're
> > + * aggressive on this bit now - even if the pinned pages were
> > + * unpinned later on, we'll still keep this bit set for the
> > + * lifecycle of this mm just for simplicity.
> > + */
> > + int has_pinned;
>
> I think this would be elegant as an atomic_t, using atomic_set() and
> atomic_read(), which seem even more self-documenting than what you have here.
>
> But it's admittedly a cosmetic point, combined with my perennial fear that
> I'm missing something when I look at a READ_ONCE()/WRITE_ONCE() pair. :)
Yeah, but I hope I'm using them right. :) I used READ_ONCE()/WRITE_ONCE()
explicitly because I think they're cheaper than atomic operations (which,
iiuc, lock the bus).
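A rough userspace sketch of why a set-once flag is safe with plain marked
accesses: two CPUs are simulated by interleaving their steps by hand, and
C11 relaxed loads/stores stand in (as an assumption, not the kernel
primitives) for READ_ONCE()/WRITE_ONCE(); sticky_flag_after_race() is a
hypothetical helper. Even if both CPUs see 0 and both store 1, no update is
lost, because the store is idempotent. One caveat: the locking cost applies
to read-modify-write atomics such as atomic_inc(); in the kernel,
atomic_read() and atomic_set() are generally implemented with
READ_ONCE()/WRITE_ONCE() themselves.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Two hypothetical CPUs race to set a sticky flag; their steps are
 * interleaved by hand. Relaxed C11 accesses are only an analogy for
 * READ_ONCE()/WRITE_ONCE().
 */
static int sticky_flag_after_race(void)
{
	atomic_int has_pinned = 0;

	/* Worst case: both CPUs check the flag before either one sets it. */
	int seen0 = atomic_load_explicit(&has_pinned, memory_order_relaxed);
	int seen1 = atomic_load_explicit(&has_pinned, memory_order_relaxed);
	assert(seen0 == 0 && seen1 == 0);

	/* Check-then-set on each CPU: both store 1, which is harmless. */
	if (!seen0)
		atomic_store_explicit(&has_pinned, 1, memory_order_relaxed);
	if (!seen1)
		atomic_store_explicit(&has_pinned, 1, memory_order_relaxed);

	/* The flag ends up set, exactly as either CPU intended. */
	return atomic_load_explicit(&has_pinned, memory_order_relaxed);
}
```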
>
> It's completely OK to just ignore this comment, but I didn't want to completely
> miss the opportunity to make it a tiny bit cleaner to the reader.
This can always become an atomic in the future, or am I wrong? Actually, if
we go the counter way, I feel it's a must.
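Why a counter would make real atomics a must can be sketched the same way.
The helper names are hypothetical and C11 atomic_long stands in for the
kernel's atomic_long_t: an increment built from a separate load and store
can lose a pin when two CPUs interleave, while fetch-add/fetch-sub are single
indivisible read-modify-writes. Unlike the sticky bit, such a counter can
also drop back to zero after the last unpin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Lost update: each CPU's increment is split into a load and a store. */
static long interleaved_load_store_pins(void)
{
	atomic_long pinned = 0;

	long v0 = atomic_load_explicit(&pinned, memory_order_relaxed); /* CPU0 reads 0 */
	long v1 = atomic_load_explicit(&pinned, memory_order_relaxed); /* CPU1 reads 0 */
	atomic_store_explicit(&pinned, v0 + 1, memory_order_relaxed);  /* CPU0 stores 1 */
	atomic_store_explicit(&pinned, v1 + 1, memory_order_relaxed);  /* CPU1 stores 1 */

	/* Two pins happened, but the counter only shows one. */
	return atomic_load_explicit(&pinned, memory_order_relaxed);
}

/* Pin: one indivisible read-modify-write, so no window to interleave into. */
static void counter_pin(atomic_long *pinned)
{
	atomic_fetch_add_explicit(pinned, 1, memory_order_relaxed);
}

/* Unpin: RMW decrement; an unbalanced unpin would be a bug. */
static void counter_unpin(atomic_long *pinned)
{
	long old = atomic_fetch_sub_explicit(pinned, 1, memory_order_relaxed);

	assert(old > 0);
}

/* Unlike the sticky bit, this goes false again once everything is unpinned. */
static bool counter_has_pinned(atomic_long *pinned)
{
	return atomic_load_explicit(pinned, memory_order_relaxed) != 0;
}
```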
Thanks,
--
Peter Xu