Re: [PATCH v3 1/1] memcg/hugetlb: Adding hugeTLB counters to memcg
From: Shakeel Butt
Date: Mon Oct 28 2024 - 18:38:52 EST
On Mon, Oct 28, 2024 at 02:05:05PM GMT, Joshua Hahn wrote:
> This patch introduces a new counter to memory.stat that tracks hugeTLB
> usage, only if hugeTLB accounting is done to memory.current. This
> feature is enabled the same way hugeTLB accounting is enabled, via
> the memory_hugetlb_accounting mount flag for cgroupsv2.
>
> 1. Why is this patch necessary?
> Currently, memcg hugeTLB accounting is an opt-in feature [1] that adds
> hugeTLB usage to memory.current. However, the metric is not reported in
> memory.stat. Given that users often interpret memory.stat as a breakdown
> of the value reported in memory.current, the disparity between the two
> reports can be confusing. This patch solves this problem by including
> the metric in memory.stat as well, but only if it is also reported in
> memory.current (it would also be confusing if the value was reported in
> memory.stat, but not in memory.current)
>
> Aside from the consistency between the two files, we also see benefits
> in observability. Userspace might be interested in the hugeTLB footprint
> of cgroups for many reasons. For instance, system admins might want to
> verify that hugeTLB usage is distributed as expected across tasks: i.e.
> memory-intensive tasks are using more hugeTLB pages than tasks that
> don't consume a lot of memory, or are seen to fault frequently. Note that
> this is separate from wanting to inspect the distribution for limiting
> purposes (in which case, hugeTLB controller makes more sense).
>
> 2. We already have a hugeTLB controller. Why not use that?
> It is true that hugeTLB tracks the exact value that we want. In fact, by
> enabling the hugeTLB controller, we get all of the observability
> benefits that I mentioned above, and users can check the total hugeTLB
> usage, verify if it is distributed as expected, etc.
>
> With this said, there are 2 problems:
> (a) They are still not reported in memory.stat, which means the
> disparity between the memcg reports are still there.
> (b) We cannot reasonably expect users to enable the hugeTLB controller
> just for the sake of hugeTLB usage reporting, especially since
> they don't have any use for hugeTLB usage enforcing [2].
>
> [1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@xxxxxxxxx/
> [2] Of course, we can't make a new patch for every feature that can be
> duplicated. However, since the existing solution of enabling the
> hugeTLB controller is an imperfect solution that still leaves a
> discrepancy between memory.stat and memory.curent, I think that it
> is reasonable to isolate the feature in this case.
>
> Suggested-by: Nhat Pham <nphamcs@xxxxxxxxx>
> Suggested-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@xxxxxxxxx>
Acked-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>