Re: [PATCH v3 1/3] mm: introduce fincore()

From: Dave Hansen
Date: Tue Jul 08 2014 - 15:43:08 EST


On 07/08/2014 12:03 PM, Naoya Horiguchi wrote:
>> > The biggest question for me, though, is whether we want to start
>> > designing these per-page interfaces to consider different page sizes, or
>> > whether we're going to just continue to pretend that the entire world is
>> > 4k pages. Using FINCORE_BMAP on 1GB hugetlbfs files would be a bit
>> > silly, for instance.
> I didn't answer this question, sorry.
>
> In my option, hugetlbfs pages should be handled as one hugepage (not as
> many 4kB pages) to avoid lots of meaningless data transfer, as you pointed
> out. And the current patch already works like that.

Just reading the code, I don't see any way that pc_shift gets passed
down in to the do_fincore() loop. I don't see it getting reflected in
to 'nr' or 'nr_pages' in there, and I can't see how:

jump = iter.index - fc->pgstart - nr;

can possibly be right since iter.index is being kept against the offset
in the userspace buffer (4k pages) and 'nr' and fc->pgstart are
essentially done in the huge page size.

If you had a 2-page 1GB-hpage_size() hugetlbfs file, you would only have
two pages in the radix tree, and only two iterations of
radix_tree_for_each_slot(). It would only set the first two bytes of a
256k BMAP buffer since only two pages were encountered in the radix tree.

Or am I reading your code wrong again?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/