Re: [PATCH v1 2/2] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma

From: Ryan Roberts
Date: Fri Oct 11 2024 - 07:37:12 EST


On 11/10/2024 12:33, David Hildenbrand wrote:
> On 11.10.24 13:29, Ryan Roberts wrote:
>> On 11/10/2024 11:24, David Hildenbrand wrote:
>>> We (or rather, readahead logic :) ) might allocate a THP in the
>>> pagecache and then try to map it into a process that explicitly disabled
>>> THP: we might end up installing PMD mappings.
>>>
>>> This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
>>> THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
>>> starting the VM.
>>>
>>> For example, starting a VM backed by a file system with large folio
>>> support makes the VM crash when the VM tries accessing such a mapping
>>> using KVM.
>>>
>>> Is it also a problem when the HW disabled THP using
>>> TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
>>> without X86_FEATURE_PSE.
>>>
>>> In the future, we might be able to do better on s390x and only disallow
>>> PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
>>> really wants. For now, fix it by essentially performing the same check as
>>> would be done in __thp_vma_allowable_orders() or in shmem code, where this
>>> works as expected, and disallow PMD mappings, making us fall back to PTE
>>> mappings.
>>>
>>> Reported-by: Leo Fu <bfu@xxxxxxxxxx>
>>> Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
>>
>> Will this patch be difficult to backport given it depends on the previous patch
>> and that doesn't have a Fixes tag?
>
> "difficult" -- not really. Andrew might want to tag patch #1  with "Fixes:" as
> well, but I can also send simple stable backports that avoid patch #1.
>
> (Thinking again, I assume we want to Cc:stable)
>
>>
>>> Cc: Thomas Huth <thuth@xxxxxxxxxx>
>>> Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
>>> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
>>> Cc: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>
>>> Cc: Janosch Frank <frankja@xxxxxxxxxxxxx>
>>> Cc: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>>> ---
>>>   mm/memory.c | 9 +++++++++
>>>   1 file changed, 9 insertions(+)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 2366578015ad..a2e501489517 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4925,6 +4925,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>>       pmd_t entry;
>>>       vm_fault_t ret = VM_FAULT_FALLBACK;
>>>
>>> +    /*
>>> +     * It is too late to allocate a small folio, we already have a large
>>> +     * folio in the pagecache: especially s390 KVM cannot tolerate any
>>> +     * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
>>> +     * PMD mappings if THPs are disabled.
>>> +     */
>>> +    if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
>>> +        return ret;
>>
>> Why not just call thp_vma_allowable_orders()?
>
> Why call thp_vma_allowable_orders(), which does a lot more work that doesn't
> really apply here? :)

Yeah fair enough, I was just thinking it makes the code simpler to keep all the
checks in one place. But no strong opinion.

Either way:

Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx>

>
> I'd say, just like shmem, we handle this separately here.
>