Re: [PATCH 09/15] mm: Allow large pages to be added to the page cache

From: Kirill A. Shutemov
Date: Thu Sep 26 2019 - 10:23:01 EST


On Tue, Sep 24, 2019 at 05:52:08PM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
>
> We return -EEXIST if there are any non-shadow entries in the page
> cache in the range covered by the large page. If there are multiple
> shadow entries in the range, we set *shadowp to one of them (currently
> the one at the highest index). If that turns out to be the wrong
> answer, we can implement something more complex. This is mostly
> modelled after the equivalent function in the shmem code.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> ---
> mm/filemap.c | 37 ++++++++++++++++++++++++++-----------
> 1 file changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index bab97addbb1d..afe8f5d95810 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -855,6 +855,7 @@ static int __add_to_page_cache_locked(struct page *page,
> int huge = PageHuge(page);
> struct mem_cgroup *memcg;
> int error;
> + unsigned int nr = 1;
> void *old;
>
> VM_BUG_ON_PAGE(!PageLocked(page), page);
> @@ -866,31 +867,45 @@ static int __add_to_page_cache_locked(struct page *page,
> gfp_mask, &memcg, false);
> if (error)
> return error;
> + xas_set_order(&xas, offset, compound_order(page));
> + nr = compound_nr(page);
> }
>
> - get_page(page);
> + page_ref_add(page, nr);
> page->mapping = mapping;
> page->index = offset;
>
> do {
> + unsigned long exceptional = 0;
> + unsigned int i = 0;
> +
> xas_lock_irq(&xas);
> - old = xas_load(&xas);
> - if (old && !xa_is_value(old))
> + xas_for_each_conflict(&xas, old) {
> + if (!xa_is_value(old))
> + break;
> + exceptional++;
> + if (shadowp)
> + *shadowp = old;
> + }
> + if (old)
> xas_set_err(&xas, -EEXIST);

This made me confused.

Do we rely on 'old' to be NULL if the loop has completed without 'break'?
It's not very obvious.

Can we have a comment or call xas_set_err() within the loop next to the
'break'?

> - xas_store(&xas, page);
> + xas_create_range(&xas);
> if (xas_error(&xas))
> goto unlock;
>
> - if (xa_is_value(old)) {
> - mapping->nrexceptional--;
> - if (shadowp)
> - *shadowp = old;
> +next:
> + xas_store(&xas, page);
> + if (++i < nr) {
> + xas_next(&xas);
> + goto next;
> }
> - mapping->nrpages++;
> + mapping->nrexceptional -= exceptional;
> + mapping->nrpages += nr;
>
> /* hugetlb pages do not participate in page cache accounting */
> if (!huge)
> - __inc_node_page_state(page, NR_FILE_PAGES);
> + __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES,
> + nr);

We also need to bump NR_FILE_THPS here.

> unlock:
> xas_unlock_irq(&xas);
> } while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
> @@ -907,7 +922,7 @@ static int __add_to_page_cache_locked(struct page *page,
> /* Leave page->index set: truncation relies upon it */
> if (!huge)
> mem_cgroup_cancel_charge(page, memcg, false);
> - put_page(page);
> + page_ref_sub(page, nr);
> return xas_error(&xas);
> }
> ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
> --
> 2.23.0
>
>

--
Kirill A. Shutemov