Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

From: Yang Shi
Date: Mon May 06 2019 - 19:38:36 EST




On 4/28/19 12:13 PM, Yang Shi wrote:


On 4/23/19 10:52 AM, Michal Hocko wrote:
On Wed 24-04-19 00:43:01, Yang Shi wrote:
Commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
vma") introduced the THPeligible bit in processes' smaps. However, when
checking the eligibility of a shmem vma, __transparent_hugepage_enabled()
is called and overrides the result from shmem_huge_enabled(). As a
result, the anonymous THP setting may override the shmem one. For example,
when running a simple test that creates THP for shmem with anonymous THP
disabled, reading the process's smaps may show:

7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
Size:               4096 kB
...
[snip]
...
ShmemPmdMapped:     4096 kB
...
[snip]
...
THPeligible:    0

And, /proc/meminfo does show THP allocated and PMD mapped too:

ShmemHugePages:     4096 kB
ShmemPmdMapped:     4096 kB

This doesn't make much sense. The anonymous THP flag should not
interfere with shmem THP. Calling shmem_huge_enabled() together with a
check of MMF_DISABLE_THP sounds good enough. And we can skip the stack
and dax vma checks since we have already checked that the vma is shmem.
Kirill, can we get a confirmation that this is really intended behavior
rather than an omission please? Is this documented? What is a global
knob to simply disable THP system-wide?

Hi Kirill,

Ping. Any comment?

I talked with Kirill at LSFMM; according to him, this is kind of intended behavior. But we all agree it looks inconsistent.

So, we may have two options:
    - Just fix the false negative issue, as the patch does
    - Change the behavior to make it more consistent

I'm not sure whether anyone relies on the behavior, explicitly or implicitly.

If we would like to change the behavior, I may take a step further and refactor the code a little to use huge_fault() to handle THP faults, instead of falling back to handle_pte_fault() as in the current implementation. This may make adding THP support for other filesystems easier.


Thanks,
Yang


I have to say that the THP tuning API is one giant mess :/

Btw. this patch also seems to fix khugepaged behavior, because it previously
ignored both VM_NOHUGEPAGE and MMF_DISABLE_THP.

Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx>
Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
---
v2: Check VM_NOHUGEPAGE per Michal Hocko

 mm/huge_memory.c | 4 ++--
 mm/shmem.c | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 165ea46..5881e82 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
     if (vma_is_anonymous(vma))
         return __transparent_hugepage_enabled(vma);
-    if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
-        return __transparent_hugepage_enabled(vma);
+    if (vma_is_shmem(vma))
+        return shmem_huge_enabled(vma);
 
     return false;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 2275a0f..6f09a31 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
     loff_t i_size;
     pgoff_t off;
 
+    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
+        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+        return false;
     if (shmem_huge == SHMEM_HUGE_FORCE)
         return true;
     if (shmem_huge == SHMEM_HUGE_DENY)
--
1.8.3.1
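
For readers who want to see the false negative in isolation, here is a rough user-space sketch of the decision logic before and after the patch. It is not kernel code: struct vma_model and the model_*() functions are hypothetical names made up for illustration, and the two *_thp_allowed fields simply assume the outcome of the anonymous and shmem THP settings rather than modelling the sysfs knobs, the stack check, or the dax check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model; none of these names exist in the kernel.
 * Each field stands in for a condition the real code derives from the vma,
 * the mm, or the THP settings.
 */
struct vma_model {
    bool is_anonymous;      /* vma_is_anonymous(vma) */
    bool is_shmem;          /* vma_is_shmem(vma) */
    bool vm_nohugepage;     /* VM_NOHUGEPAGE set in vma->vm_flags */
    bool mm_disable_thp;    /* MMF_DISABLE_THP set in vma->vm_mm->flags */
    bool anon_thp_allowed;  /* assumed outcome of the anonymous THP settings */
    bool shmem_thp_allowed; /* assumed outcome of the shmem THP settings */
};

/* Simplified stand-in for __transparent_hugepage_enabled(). */
bool model_anon_enabled(const struct vma_model *v)
{
    assert(v != NULL);
    if (v->vm_nohugepage || v->mm_disable_thp)
        return false;
    return v->anon_thp_allowed;
}

/* Before the patch: a positive shmem result is overridden by the anonymous check. */
bool model_eligible_old(const struct vma_model *v)
{
    assert(v != NULL);
    if (v->is_anonymous)
        return model_anon_enabled(v);
    if (v->is_shmem && v->shmem_thp_allowed)
        return model_anon_enabled(v);
    return false;
}

/* After the patch: shmem vmas are judged by the shmem check plus the two opt-out flags. */
bool model_eligible_new(const struct vma_model *v)
{
    assert(v != NULL);
    if (v->is_anonymous)
        return model_anon_enabled(v);
    if (v->is_shmem) {
        if (v->vm_nohugepage || v->mm_disable_thp)
            return false;
        return v->shmem_thp_allowed;
    }
    return false;
}
```

With a shmem vma where shmem_thp_allowed is true and anon_thp_allowed is false (the case from the commit message), model_eligible_old() reports not eligible while model_eligible_new() reports eligible. Setting vm_nohugepage or mm_disable_thp makes the new check report not eligible again, which corresponds to the v2 change.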