Re: [PATCH v3 27/34] mm: pagewalk: Add 'depth' parameter to pte_hole

From: Steven Price
Date: Thu Feb 28 2019 - 06:28:09 EST


On 27/02/2019 17:38, Dave Hansen wrote:
> On 2/27/19 9:06 AM, Steven Price wrote:
>> #ifdef CONFIG_SHMEM
>> static int smaps_pte_hole(unsigned long addr, unsigned long end,
>> - struct mm_walk *walk)
>> + __always_unused int depth, struct mm_walk *walk)
>> {
>
> I think this 'depth' argument is a mistake. It's synthetic and it's
> surely going to be a source of bugs.
>
> The page table dumpers seem to be using this to dump out the "name" of a
> hole which seems a bit bogus in the first place. I'd much rather teach
> the dumpers about the length of the hole, "the hole is 0x12340000 bytes
> long", rather than "there's a hole at this level".

I originally started by trying to calculate the 'depth' from (end -
addr), e.g. for arm64:

level = 4 - (ilog2(end - addr) - PAGE_SHIFT) / (PAGE_SHIFT - 3)

However there are two issues that I encountered:

* walk_page_range() takes a range of addresses to walk. This means that
holes at the beginning/end of the range are clamped to the address
range. This particularly shows up at the end of the range as I use ~0ULL
as the end which leads to (~0ULL - addr) which is 1 byte short of the
desired size. Obviously that particular corner-case is easy to work
round, but it seemed fragile.

* The above definition for arm64 isn't correct in all cases. You need to
account for things like CONFIG_PGTABLE_LEVELS. Other architectures also
have various quirks in their page tables.

I guess I could try something like:

static int get_level(unsigned long addr, unsigned long end)
{
/* Add 1 to account for ~0ULL */
unsigned long size = (end - addr) + 1;
if (size < PMD_SIZE)
return 4;
else if (size < PUD_SIZE)
return 3;
else if (size < P4D_SIZE)
return 2;
else if (size < PGD_SIZE)
return 1;
return 0;
}

There are two immediate problems with that:

* The "+1" to deal with ~0ULL is fragile

* PGD_SIZE isn't what you might expect, it's not defined for most
architectures and arm64/x86 use it as the size of the PGD table.
Although that's easy enough to fix up.

Do you think a function like above would be preferable?

The other option would of course be to just drop the information from
the debugfs file about at which level the holes are. But it can be
useful information to see whether there are empty levels in the page
table structure. Although this is an area where x86 and arm64 differ
currently (x86 explicitly shows the gaps, arm64 doesn't), so if x86
doesn't mind losing that functionality that would certainly simplify things!

Thanks,

Steve