Re: [PATCH v3 1/3] mm: introduce fincore()

From: Dave Hansen
Date: Tue Jul 08 2014 - 18:32:49 EST

On 07/08/2014 01:41 PM, Naoya Horiguchi wrote:
>> > It would only set the first two bytes of a
>> > 256k BMAP buffer since only two pages were encountered in the radix tree.
> Hmm, this example shows me a problem, thanks.
> If the user knows the fd is for 1GB hugetlbfs file, it just prepares
> the 2 bytes buffer, so no problem.
> But if the user doesn't know whether the fd is from hugetlbfs file,
> the user must prepare the large buffer, though only first few bytes
> are used. And the more problematic is that the user could interpret
> the data in buffer differently:
> 1. only the first two 4kB-pages are loaded in the 2GB range,
> 2. two 1GB-pages are loaded.
> So for such callers, fincore() must notify the relevant page size
> in some way on return.
> Returning it via fincore_extra is my first thought but I'm not sure
> if it's elegant enough.

That does limit the interface to being used on a single page size per
call, which doesn't sound too bad since we don't mix page sizes in a
single file. But, you mentioned using this interface along with
/proc/$pid/mem. How would this deal with a process which had two sizes
of pages mapped?

Another option would be to have userspace pass in its desired
granularity. Such an interface could be used to find holes in a file
fairly easily. But it introduces a whole new set of issues, like what
BMAP means if only a part of the granule is in-core, and whether you
need a new option to differentiate BMAP_AND vs. BMAP_OR operations.
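
To make the AND vs. OR question concrete, here is a rough userspace
sketch, not anything from the posted patches. It assumes a residency
map with one byte per base page (nonzero meaning in-core) and collapses
it to one byte per caller-chosen granule. The names collapse_residency,
GRANULE_AND and GRANULE_OR are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical modes, mirroring the BMAP_AND / BMAP_OR idea. */
enum granule_mode { GRANULE_AND, GRANULE_OR };

/*
 * Collapse a per-base-page residency map (one byte per page, nonzero
 * means in-core) into one byte per granule of pages_per_granule pages.
 *
 * GRANULE_AND: a granule is reported in-core only if every page is.
 * GRANULE_OR:  a granule is reported in-core if any page is.
 *
 * A trailing partial granule is evaluated over the pages that exist.
 * Returns the number of entries written to out, which must have room
 * for ceil(npages / pages_per_granule) bytes.
 */
size_t collapse_residency(const unsigned char *pages, size_t npages,
                          size_t pages_per_granule,
                          enum granule_mode mode, unsigned char *out)
{
    size_t ngranules = 0;
    size_t start = 0;

    if (pages_per_granule == 0)
        return 0;

    while (start < npages) {
        size_t left = npages - start;
        size_t end = left < pages_per_granule ? npages
                                              : start + pages_per_granule;
        unsigned char all = 1, any = 0;

        for (size_t i = start; i < end; i++) {
            if (pages[i])
                any = 1;
            else
                all = 0;
        }
        out[ngranules++] = (mode == GRANULE_AND) ? all : any;
        start = end;
    }
    return ngranules;
}
```

With eight base pages where the first five are in-core and a granule
of four pages, GRANULE_AND reports the second granule as absent while
GRANULE_OR reports it as present. That is exactly the ambiguity a
caller would have to choose between.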

I honestly think we need to take a step back and enumerate what you're
trying to do here before going any further.