Re: [PATCH v10 08/33] mm: Add folio_try_get_rcu

From: Matthew Wilcox
Date: Sat Jun 05 2021 - 00:27:42 EST


On Thu, May 27, 2021 at 09:16:42AM +0100, Christoph Hellwig wrote:
> On Tue, May 11, 2021 at 10:47:10PM +0100, Matthew Wilcox (Oracle) wrote:
> > +static inline bool folio_ref_try_add_rcu(struct folio *folio, int count)
>
> Should this have a __ prefix and/or a don't use direct comment?

I think it will get used directly ... its page counterpart is:

mm/gup.c: if (unlikely(!page_cache_add_speculative(head, refs)))

I deliberately left kernel-doc off this function so it's not described,
but described folio_try_get_rcu() in excruciating detail. I hope that's
enough. There's no comment on page_cache_add_speculative() today, so
again, we're status quo.

> > +{
> > +#ifdef CONFIG_TINY_RCU
> > + /*
> > + * The caller guarantees the folio will not be freed from interrupt
> > + * context, so (on !SMP) we only need preemption to be disabled
> > + * and TINY_RCU does that for us.
> > + */
> > +# ifdef CONFIG_PREEMPT_COUNT
> > + VM_BUG_ON(!in_atomic() && !irqs_disabled());
> > +# endif
>
> VM_BUG_ON(IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
> !in_atomic() && !irqs_disabled());
>
> ?

I'm just moving it over, and honestly, I think it's slightly clearer
this way. We can't check it if PREEMPT_COUNT isn't enabled, and I
think that's expressed better by the ifdef than the IS_ENABLED().

> > + VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio);
> > + folio_ref_add(folio, count);
> > +#else
> > + if (unlikely(!folio_ref_add_unless(folio, count, 0))) {
> > + /* Either the folio has been freed, or will be freed. */
> > + return false;
> > + }
> > +#endif
> > + return true;
>
> but is this tiny rcu optimization really worth it? I guess we're just
> preserving it from the existing code and don't rock the boat..

I wondered that myself. It's been there since Nick introduced it in
2008 with commit e286781d5f2e. We certainly cared about small systems
more then, but apparently we still care about UP enough to maintain
CONFIG_TINY_RCU, so maybe this optimisation is still relevant.

> > @@ -1746,6 +1746,26 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
> > }
> > EXPORT_SYMBOL(page_cache_prev_miss);
> >
> > +/*
> > + * Lockless page cache protocol:
> > + * On the lookup side:
> > + * 1. Load the folio from i_pages
> > + * 2. Increment the refcount if it's not zero
> > + * 3. If the folio is not found by xas_reload(), put the refcount and retry
> > + *
> > + * On the removal side:
> > + * A. Freeze the page (by zeroing the refcount if nobody else has a reference)
> > + * B. Remove the page from i_pages
> > + * C. Return the page to the page allocator
> > + *
> > + * This means that any page may have its reference count temporarily
> > + * increased by a speculative page cache (or fast GUP) lookup as it can
> > + * be allocated by another user before the RCU grace period expires.
> > + * Because the refcount temporarily acquired here may end up being the
> > + * last refcount on the page, any page allocation must be freeable by
> > + * put_folio().
> > + */
> > +
> > /*
> > * mapping_get_entry - Get a page cache entry.
> > * @mapping: the address_space to search
>
> Is this really a good place for the comment? I'd expect it either near
> a relevant function or at the top of a file.

It's right before mapping_get_entry() which is the main lookup function
for the page cache, so I think it meets your first criteria?