Re: [PATCH] mm/huge_memory: skip huge_zero_pmd in zap_huge_pmd_folio()

From: Lance Yang

Date: Thu Apr 30 2026 - 03:21:22 EST

On 2026/4/30 15:05, Bibo Mao wrote:

On 2026/4/30 3:02 PM, Lance Yang wrote:

On Thu, Apr 30, 2026 at 02:34:20PM +0800, Bibo Mao wrote:

On 2026/4/30 12:28 PM, Lance Yang wrote:

On Thu, Apr 30, 2026 at 12:11:20PM +0800, Bibo Mao wrote:

When running "make check" for QEMU, errors like the following are
reported:
BUG: Bad rss-counter state mm:00000000972846bc type:MM_FILEPAGES val:-4096 Comm:bios-tables-tes Pid:27802
BUG: Bad rss-counter state mm:00000000752180c5 type:MM_FILEPAGES val:-2048 Comm:worker Pid:27815
BUG: Bad rss-counter state mm:000000009c2f6a61 type:MM_FILEPAGES val:-2048 Comm:qom-test Pid:27825

Good catch!

The problem is that when an application exits, the RSS counter is
adjusted for the huge_zero_pmd huge page, which should be skipped instead.

Looks like the same problem[1] we discussed recently.

[1] https://lore.kernel.org/linux-mm/74a75b59-2e13-3985-ee99-d5521f39df2a@xxxxxxxxxx/

Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
---
mm/huge_memory.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..3cbea344d4a2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2423,6 +2423,9 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
{
    const bool is_device_private = folio_is_device_private(folio);

+    if (is_huge_zero_pmd(pmdval))
+        return;
+

The huge zero PMD should not be returned by vm_normal_page_pmd() or
vm_normal_folio_pmd() as a normal folio. If it reaches
zap_huge_pmd_folio(), we already made the wrong normal-vs-special
decision ...

So I don't think we should special-case it in zap_huge_pmd_folio(). That
only avoids this RSS decrement :)
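
To illustrate the point above, here is a toy userspace model of the accounting, not kernel code. Every name in it (toy_mm, toy_normal_folio_pmd(), toy_run(), ...) is hypothetical, and the 512-page folio size is only an illustrative value. It assumes that the huge zero page is never charged to RSS when it is mapped, so if the lookup wrongly returns it as a normal folio, the zap path decrements a counter that was never incremented:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers are the kernel's. */
enum toy_pmd_kind { TOY_PMD_NORMAL_THP, TOY_PMD_HUGE_ZERO };

struct toy_mm {
	long rss_pages;		/* stands in for a per-mm RSS counter */
};

struct toy_folio {
	long nr_pages;
};

/*
 * Stand-in for the normal-vs-special decision. A correct lookup returns
 * NULL for the huge zero PMD; with buggy == true it is returned as if it
 * were a normal folio.
 */
static struct toy_folio *toy_normal_folio_pmd(enum toy_pmd_kind kind,
					      struct toy_folio *folio,
					      bool buggy)
{
	if (kind == TOY_PMD_HUGE_ZERO && !buggy)
		return NULL;
	return folio;
}

/* Mapping: only a normal THP is charged to RSS (assumption of this model). */
static void toy_map(struct toy_mm *mm, enum toy_pmd_kind kind,
		    const struct toy_folio *folio)
{
	if (kind == TOY_PMD_NORMAL_THP)
		mm->rss_pages += folio->nr_pages;
}

/* Zapping: RSS is decremented for whatever the lookup calls "normal". */
static void toy_zap(struct toy_mm *mm, enum toy_pmd_kind kind,
		    struct toy_folio *folio, bool buggy)
{
	struct toy_folio *f = toy_normal_folio_pmd(kind, folio, buggy);

	if (f)
		mm->rss_pages -= f->nr_pages;
}

/* Map and zap one normal THP and one huge zero PMD; return the final RSS. */
long toy_run(bool buggy)
{
	struct toy_mm mm = { 0 };
	struct toy_folio thp = { 512 };		/* illustrative size */
	struct toy_folio zero = { 512 };	/* illustrative size */

	toy_map(&mm, TOY_PMD_NORMAL_THP, &thp);
	toy_map(&mm, TOY_PMD_HUGE_ZERO, &zero);
	toy_zap(&mm, TOY_PMD_NORMAL_THP, &thp, buggy);
	toy_zap(&mm, TOY_PMD_HUGE_ZERO, &zero, buggy);
	return mm.rss_pages;
}
```

In this model toy_run(false) ends balanced at 0, while toy_run(true) ends negative. Skipping the decrement in the zap helper would hide the negative value, but the lookup would still be handing out the huge zero folio as a normal one.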

Could you please check whether the fix[2] also fixes your QEMU test?

[2] https://lore.kernel.org/linux-mm/ea1453a6-14c9-4334-ac7e-2758586393b2@xxxxxxxxxx/

Yes, I think it will solve this problem.

My only concern is that there should still be a TLB flush after
pmdp_huge_get_and_clear_full(), even for the huge_zero_pmd page, so
tlb_remove_page_size() should be called. Is that right?

Calling tlb_remove_page_size() is not necessary there :)

zap_huge_pmd() already marks the PMD range for TLB invalidation right
after clearing the entry:

    orig_pmd = pmdp_huge_get_and_clear_full(...);
    tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
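
As a rough sketch of why that is enough, here is a toy userspace model of the range tracking, not the kernel's mmu_gather. All names (toy_gather, toy_flush_pmd_range(), toy_needs_flush()) are hypothetical. The idea it models is that recording the cleared PMD range is independent of queueing any page for freeing:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a simplified stand-in for the gather structure. */
struct toy_gather {
	unsigned long start;	/* lowest address to invalidate */
	unsigned long end;	/* one past the highest address */
	bool cleared_pmds;	/* a PMD-level entry was cleared */
};

/* Start with an empty range: start above end means nothing to flush. */
void toy_gather_init(struct toy_gather *tlb)
{
	tlb->start = ~0UL;
	tlb->end = 0;
	tlb->cleared_pmds = false;
}

/* Widen the pending range to cover [addr, addr + size). */
void toy_flush_pmd_range(struct toy_gather *tlb, unsigned long addr,
			 unsigned long size)
{
	if (addr < tlb->start)
		tlb->start = addr;
	if (addr + size > tlb->end)
		tlb->end = addr + size;
	tlb->cleared_pmds = true;
}

/* A flush is pending whenever the recorded range is non-empty. */
bool toy_needs_flush(const struct toy_gather *tlb)
{
	return tlb->end > tlb->start;
}
```

After toy_gather_init(), a single toy_flush_pmd_range() call for the cleared PMD makes toy_needs_flush() return true, even though no page was ever handed over for freeing. In this model that is the case that matters for the huge zero PMD.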

Yes, it does. I forgot that tlb_remove_pmd_tlb_entry() calls tlb_flush_pmd_range().

So the fix solves this problem. And thanks for your explanation.

If possible, can you test the fix[1] with your QEMU workload and
provide a Tested-by? That would be very helpful :D

[1] https://lore.kernel.org/linux-mm/4d950326-6944-409b-b108-a4e67256857f@xxxxxxxxxx/