Re: [PATCH 00/12] mm/debug_vm_pgtable: Enhancements

From: Anshuman Khandual
Date: Sun Jul 18 2021 - 02:35:42 EST

On 7/15/21 10:47 AM, Gavin Shan wrote:
> Hi Anshuman,
> On 7/14/21 3:26 PM, Anshuman Khandual wrote:
>> On 7/13/21 6:50 AM, Gavin Shan wrote:
>>> On 7/12/21 2:14 PM, Anshuman Khandual wrote:
>>>> Though I have not jumped into the details for all individual
>>>> patches here but still have some high level questions below.
>>>> On 7/6/21 11:47 AM, Gavin Shan wrote:
>>>>> There are couple of issues with current implementations and this series
>>>>> tries to resolve the issues:
>>>>>     (a) All needed information are scattered in variables, passed to various
>>>>>         test functions. The code is organized in pretty much relaxed fashion.
>>>> All these variables are first prepared in debug_vm_pgtable(), before
>>>> getting passed into respective individual test functions. Also these
>>>> test functions receive only the required number of variables not all.
>>>> Adding a structure that captures all test parameters at once before
>>>> passing them down will be unnecessary. I am still wondering what will
>>>> be the real benefit of this large code churn ?
>>> Thanks for your review. There are couple of reasons to have "struct vm_pgtable_debug".
>>> (1) With the struct, the old and new implementation can coexist. In this way,
>>>      the patches in this series can be stacked up easily.
>> Makes sense.
>>> (2) I think passing single struct to individual test functions improves the
>>>      code readability. Besides, it also makes the empty stubs simplified.
>> Empty stub simplified - reduced argument set in the empty stubs ?
> Yes.
>>> (3) The code can be extended easily if we need in future.
>> Agreed.
>>>>>     (b) The page isn't allocated from buddy during page table entry modifying
>>>>>         tests. The page can be invalid, conflicting to the implementations
>>>>>         of set_{pud, pmd, pte}_at() on ARM64. The target page is accessed
>>>>>         so that the iCache can be flushed when execution permission is given
>>>>>         on ARM64. Besides, the target page can be unmapped and access to
>>>>>         it causes kernel crash.
>>>> Using 'start_kernel' based method for struct page usage, enabled this
>>>> test to run on platforms which might not have enough memory required
>>>> for various individual test functions. This method is not a problem for
>>>> tests that just need an aligned pfn (which creates a page table entry)
>>>> not a real struct page.
>>>> But not allocating and owning the struct page might be problematic for
>>>> tests that expect a real struct page and transform its state via set_
>>>> {pud, pmd, pte}_at() functions as reported here.
>>> Yeah, I totally agree. The series follows what you explained: Except the
>>> test cases where set_{pud, pmd, pte}_at() is used, the allocated page
>>> is used. For other test cases, 'start_kernel' based PFN is used as before.
>>>>> "struct vm_pgtable_debug" is introduced to address issue (a). For issue
>>>>> (b), the used page is allocated from buddy in page table entry modifying
>>>>> tests. The corresponding tets will be skipped if we fail to allocate the
>>>>> (huge) page. For other test cases, the original page around to kernel
>>>>> symbol (@start_kernel) is still used.
>>>> For all basic pfn requiring tests, existing 'start_kernel' based method
>>>> should continue but allocate a struct page for other tests which change
>>>> the passed struct page. Skipping the tests when allocation fails is the
>>>> right thing to do.
>>> Yes, it's exactly what this series does. Hope you can jump into the details
>>> when you get a chance :)
>> I have already started looking into the series. But still wondering if
>> the huge page memory allocation change and the arm64 specific page fix
>> should be completed first, before getting into the new structure based
>> arguments (in a separate series). Although the end result would still
>> remain the same, the transition there would be better I guess. Do you
>> see any challenges in achieving that ?
> Thanks for your time to review in details. As I can understand, the reason
> to have the fix for easy backporting to stable-kernel and I didn't do that
> because of couple of facts: (1) The changes included in this series only
> affects only one source file, so backporting the whole series isn't hard.
> (2) There will be more redundant code if we include the fix before switching
> to "struct vm_pgtable_debug". It's unnecessary.


> So lets keep the patch layout we had if you agree. Actually, the issues are
> found during the testing with RHEL downstream kernel. Once it's settled down,
> I will backport the whole series to RHEL downstream kernel.

Okay, then lets keep this proposed layout and address the issues here.

- Anshuman