Re: [PATCH 0/3] mm/mmu_notifier, drm/amdgpu: block THP for GPU user mappings
From: David Hildenbrand (Arm)
Date: Thu Jun 25 2026 - 07:47:42 EST
On 6/25/26 12:59, Yitao Jiang wrote:
> Hi,
>
> This series fixes a THP policy problem I found while debugging
> frequent ROCm GPU failures on an AMD Radeon 780M system during ML
> training.
>
> Some AMDGPU/KFD user mappings are registered through interval
> notifiers and cannot safely tolerate the backing VMA changing from base
> pages to a transparent huge page after registration. Userspace can
> still apply MADV_HUGEPAGE or MADV_COLLAPSE, and khugepaged can also
> collapse the range, after the GPU mapping has been registered.

Huh, why? As an MMU notifier user, you must be prepared for memory to get
unmapped+remapped at random points in time.

What is the precise problem here? How are you handling THPs at registration time?

Letting arbitrary drivers dictate THP policy sounds like the wrong approach.
--
Cheers,
David