Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly

From: Andrew Morton
Date: Mon Jul 07 2014 - 15:39:30 EST


On Wed, 2 Jul 2014 00:30:57 -0400 Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> wrote:

> Subject: [PATCH v2] rmap: fix pgoff calculation to handle hugepage correctly
>
> I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous
> hugepage with mbind() in kernel v3.16-rc3. This is because the pgoff
> calculation in rmap_walk_anon() fails to take compound_order() into account
> and so ends up with an incorrect value.
>
> This patch introduces page_to_pgoff(), which returns the page's offset in
> units of PAGE_CACHE_SIZE. Kirill pointed out that the page cache tree should
> handle hugepages natively, and that for hugetlbfs to fit in, the page->index
> of a hugetlbfs page should be in units of PAGE_CACHE_SIZE. That is beyond the
> scope of this patch, but page_to_pgoff() confines the point to be fixed to a
> single function.
>
> ...
>
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -399,6 +399,18 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
> }
>
> /*
> + * Get the offset in PAGE_SIZE.
> + * (TODO: hugepage should have ->index in PAGE_SIZE)
> + */
> +static inline pgoff_t page_to_pgoff(struct page *page)
> +{
> +	if (unlikely(PageHeadHuge(page)))
> +		return page->index << compound_order(page);
> +	else
> +		return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> +}
> +

This is all a bit of a mess.

We have page_offset() which only works for regular pagecache pages and
not for huge pages.

We have page_file_offset() which works for regular pagecache as well
as swapcache but not for huge pages.

We have page_index() and page_file_index() which differ in undocumented
ways which I cannot be bothered working out. The latter calls
__page_file_index() which is grossly misnamed.

Now we get a new page_to_pgoff() which is inconsistently named but has
a similarly crappy level of documentation and which works for hugepages
and regular pagecache pages but not for swapcache pages.
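
To make the unit mismatch concrete, here is a rough user-space model of
the new helper. It is only a sketch: struct model_page,
model_page_to_pgoff() and HPAGE_ORDER are made-up names for illustration,
and the 4 KiB base page / 2 MiB hugepage sizes are assumptions (typical
for x86-64), not something the patch states.

```c
#include <assert.h>

/* Assumed for illustration: 4 KiB base pages. */
#define PAGE_SHIFT        12
/* In kernels of this era PAGE_CACHE_SHIFT equals PAGE_SHIFT. */
#define PAGE_CACHE_SHIFT  PAGE_SHIFT
/* Assumed 2 MiB hugepage: 2 MiB / 4 KiB = 512 = 1 << 9. */
#define HPAGE_ORDER       9

/* Hypothetical stand-in for struct page; only what the helper reads. */
struct model_page {
	unsigned long index;	/* unit depends on the page type */
	unsigned int order;	/* models compound_order(page) */
	int head_huge;		/* models PageHeadHuge(page) */
};

/* Mirrors the logic of page_to_pgoff() from the quoted patch. */
unsigned long model_page_to_pgoff(const struct model_page *page)
{
	if (page->head_huge)
		/* hugetlbfs index counts hugepages: convert to base pages */
		return page->index << page->order;
	/* regular pagecache index: the shift is 0 with the values above */
	return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* Small self-check: the same ->index means different offsets. */
void model_selfcheck(void)
{
	struct model_page huge = { .index = 3, .order = HPAGE_ORDER, .head_huge = 1 };
	struct model_page small = { .index = 3, .order = 0, .head_huge = 0 };

	assert(model_page_to_pgoff(&huge) == 3UL << HPAGE_ORDER);
	assert(model_page_to_pgoff(&small) == 3);
}
```

With these assumed sizes, an ->index of 3 on a hugetlbfs head page
corresponds to base-page offset 1536, while the same ->index on a regular
pagecache page is just 3. A caller that skips the conversion hands
vma_address() an offset in the wrong unit, which is the failure the
changelog describes.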


Sigh.

I'll merge this patch because it's a bugfix but could someone please
drive a truck through all this stuff and see if we can come up with
something tasteful and sane?