Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers
From: Qian Cai
Date: Mon Jan 27 2020 - 22:33:25 EST
> On Jan 27, 2020, at 10:06 PM, Anshuman Khandual <anshuman.khandual@xxxxxxx> wrote:
>
>
>
> On 01/28/2020 07:41 AM, Qian Cai wrote:
>>
>>
>>> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual <Anshuman.Khandual@xxxxxxx> wrote:
>>>
>>> This adds tests which will validate architecture page table helpers and
>>> other accessors in their compliance with expected generic MM semantics.
>>> This will help various architectures in validating changes to existing
>>> page table helpers or addition of new ones.
>>>
>>> This test covers basic page table entry transformations including but not
>>> limited to old, young, dirty, clean, write, write protect etc. at various
>>> levels along with populating intermediate entries with next page table page
>>> and validating them.
>>>
>>> Test page table pages are allocated from system memory with required size
>>> and alignments. The mapped pfns at page table levels are derived from a
>>> real pfn representing a valid kernel text symbol. This test gets called
>>> right after page_alloc_init_late().
>>>
>>> This gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>>> CONFIG_VM_DEBUG. Architectures willing to subscribe to this test also need to
>>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>>> arm64. Going forward, other architectures too can enable this after fixing
>>> build or runtime problems (if any) with their page table helpers.
>
> Hello Qian,
>
>>
>> What's the value of this block of new code? It only supports x86 and arm64
>> which are supposed to be good now.
>
> We have been over the usefulness of this code many times before as the patch is
> already in its V12. Currently it is enabled on arm64, x86 (except PAE), arc and
> ppc32. There are build time or runtime problems with other archs which prevent
I am not sure I care too much about arc and ppc32, which are pretty much legacy
platforms.
> enablement of this test (for the moment) but then the goal is to integrate all
> of them going forward. The test not only validates platform's adherence to the
> expected semantics from generic MM but also helps in keeping it that way during
> code changes in future as well.
Another option may be to get some decent arches on board first before merging this
thing, so it has more chances to catch regressions for developers who might run this.
>
>> Did those tests ever find any regression or this is almost only useful for new
>
> The test has already found problems with s390 page table helpers.
Hmm, that is pretty weak given that s390 is not even officially supported in this version.
>
>> architectures which only happened once in a few years?
>
> Again, not only does it validate what exists today but it's also a tool to make
> sure that all platforms continue to adhere to a commonly agreed upon semantics
> as reflected through the tests here.
>
>> The worry is that if not many people use this config and code that much in
>
> Debug features or tests in the kernel are used when required. These are never or
> should not be enabled by default. AFAICT this is true even for entire DEBUG_VM
> packaged tests. Do you have any particular data or precedence to substantiate
> the fact that this test will be used any less often than the other similar ones
> in the tree ? I can only speak for arm64 platform but the very idea for this
> test came from Catalin when we were trying to understand the semantics for THP
> helpers while enabling THP migration without split. Apart from going over the
> commit messages from the past, there were no other way to figure out how any
> particular page table helper is supposed to change a given page table entry. This
> test tries to formalize those semantics.
I am thinking about how we made so many mistakes before by merging too many of
those debugging options, many of which have been broken for many releases,
proving that nobody actually used them regularly. We don't need to repeat the same
mistake again. I am actually thinking about removing things like page_poisoning,
which has almost never found any bug recently and only causes pain when interacting
with other new features, since almost nobody tests them together to begin with.
We even have some SLUB debugging code that has sat there for almost 15 years, that
almost nobody has used, and that the maintainers refused to remove.
>
>> the future because it is inefficient to find bugs, it will simply be rotten
> Could you be more specific here ? What parts of the test are inefficient ? I
> am happy to improve upon the test. Do let me know you if you have suggestions.
>
>> like a few other debugging options out there we have in the mainline that
>> will be a pain to remove later on.
>>
>
> Even though I am not agreeing to your assessment about the usefulness of the
> test without any substantial data backing up the claims, the test case in
> itself is very much compartmentalized, staying clear from generic MM and
> debug_vm_pgtable() is only function executing the test which is getting
> called from kernel_init_freeable() path.
I am thinking exactly the other way around. You are proposing to merge these tests
without proving how useful they will be at finding regressions for future developers,
to make sure they will actually get used.