[QUESTION] memcg page_counter seems broken in MADV_DONTNEED with THP enabled
From: Yongqiang Liu
Date: Sat Nov 26 2022 - 08:10:02 EST
Hi,
We use mm_counter to track how much physical memory a process uses.
Meanwhile, the page_counter of a memcg is used to count how much
physical memory a cgroup uses.
If a cgroup contains only one process, the two look almost the same.
But with THP enabled, memory.usage_in_bytes of the memcg can sometimes
be twice or more the Rss in /proc/[pid]/smaps_rollup, as follows:
[root@localhost sda]# cat /sys/fs/cgroup/memory/test/memory.usage_in_bytes
1080930304
[root@localhost sda]# cat /sys/fs/cgroup/memory/test/cgroup.procs
1290
[root@localhost sda]# cat /proc/1290/smaps_rollup
55ba80600000-ffffffffff601000 ---p 00000000 00:00 0
[rollup]
Rss: 500648 kB
Pss: 498337 kB
Shared_Clean: 2732 kB
Shared_Dirty: 0 kB
Private_Clean: 364 kB
Private_Dirty: 497552 kB
Referenced: 500648 kB
Anonymous: 492016 kB
LazyFree: 0 kB
AnonHugePages: 129024 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
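For reference, here is a minimal userspace sketch that should
reproduce the discrepancy (a sketch only: it assumes 2 MiB THP and
that the process has already been moved into the test memcg; the
sizes are illustrative):

/*
 * Fault in THP-backed anonymous memory, then MADV_DONTNEED half of
 * each 2 MiB huge page.  The PMD mappings are split and Rss drops by
 * half, but the compound pages are still referenced by the remaining
 * PTEs, so memory.usage_in_bytes stays at the full size (~2x Rss).
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)  /* assumed THP size: 2 MiB */
#define NR_HPAGES  256UL        /* 512 MiB in total */

int main(void)
{
    size_t len = NR_HPAGES * HPAGE_SIZE;
    void *raw;
    char *p;
    size_t i;

    /* Over-map by one huge page so we can align to HPAGE_SIZE,
     * which is needed for PMD mappings to be used at all. */
    raw = mmap(NULL, len + HPAGE_SIZE, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    p = (char *)(((uintptr_t)raw + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1));

    /* Ask for THP explicitly in case the system default is "madvise". */
    if (madvise(p, len, MADV_HUGEPAGE))
        perror("madvise(MADV_HUGEPAGE)");

    memset(p, 1, len);          /* fault everything in */

    /*
     * Zap the first half of every huge page: __split_huge_pmd splits
     * the PMD and zap_pte_range lowers the mm_counter, but the huge
     * pages themselves are not freed.
     */
    for (i = 0; i < NR_HPAGES; i++)
        if (madvise(p + i * HPAGE_SIZE, HPAGE_SIZE / 2, MADV_DONTNEED))
            perror("madvise(MADV_DONTNEED)");

    printf("pid %d: compare Rss in smaps_rollup with "
           "memory.usage_in_bytes\n", (int)getpid());
    pause();                    /* keep the mapping alive for inspection */
    return 0;
}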
I have found that the difference is because the mm_counter is
decreased after __split_huge_pmd splits the PMD, but the page_counter
in the memcg is not decreased as long as the refcount of the head
page is not zero. Here is the call flow:
do_madvise
  madvise_dontneed_free
    zap_page_range
      unmap_single_vma
        zap_pud_range
          zap_pmd_range
            __split_huge_pmd
              __split_huge_pmd_locked
                __mod_lruvec_page_state
            zap_pte_range
              add_mm_rss_vec
                add_mm_counter            -> decreases the mm_counter
      tlb_finish_mmu
        arch_tlb_finish_mmu
          tlb_flush_mmu_free
            free_pages_and_swap_cache
              release_pages
                folio_put_testzero(page)  -> not zero: skip (continue;)
                __folio_put_large         -> only reached once refcount is zero
                  free_transhuge_page
                    free_compound_page
                      mem_cgroup_uncharge
                        page_counter_uncharge -> decreases the page_counter
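If it helps, here is a toy userspace model of the gating above (not
kernel code, just my reading of release_pages): the memcg uncharge
only happens on the put that drops the refcount to zero, so a head
page pinned by remaining references stays charged.

#include <stdio.h>
#include <stdbool.h>

struct page {
    int refcount;
    long memcg_charge;          /* bytes charged to the memcg */
};

static bool put_testzero(struct page *p)
{
    return --p->refcount == 0;  /* analogue of folio_put_testzero() */
}

static void uncharge(struct page *p)
{
    p->memcg_charge = 0;        /* analogue of page_counter_uncharge() */
}

static void release_page(struct page *p)
{
    if (!put_testzero(p))
        return;                 /* refcount still nonzero: charge stays */
    uncharge(p);                /* only the final put uncharges */
}

int main(void)
{
    /*
     * A compound head page with an extra reference, e.g. the
     * remaining PTE mappings left after the PMD was split.
     */
    struct page head = { .refcount = 2, .memcg_charge = 2L << 20 };

    release_page(&head);        /* first put: refcount -> 1 */
    printf("after first put: charge=%ld\n", head.memcg_charge);

    release_page(&head);        /* last put: refcount -> 0, uncharged */
    printf("after last put:  charge=%ld\n", head.memcg_charge);
    return 0;
}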
The node page stats shown in meminfo were also decreased. So
__split_huge_pmd seems to free no physical memory unless the whole
THP is freed. I am confused about which one is the true physical
memory usage of a process.
Kind regards,
Yongqiang Liu