> On Wed 29-01-25 14:12:04, Waiman Long wrote:
>> Since commit 0e4b01df8659 ("mm, memcg: throttle allocators when failing
>> reclaim over memory.high"), the amount of allocator throttling has
>> increased substantially. As a result, it could be difficult for a
>> misbehaving application that consumes an increasing amount of memory to
>> be OOM-killed if memory.high is set. Instead, the application may just
>> crawl along holding close to the allowed memory.high memory for the
>> current memory cgroup for a very long time, especially one that does a
>> lot of memcg charging and uncharging operations.
>>
>> This behavior makes the upstream Kubernetes community hesitate to
>> use memory.high. Instead, they use only memory.max for memory control,
>> similar to what is being done for cgroup v1 [1].
>
> Why is this a problem for them?

My understanding is that a misbehaving container will hold up memory.high
worth of memory for a long time instead of getting OOM killed sooner so
that the memory can be put to more productive use elsewhere.
>> To allow better control of the amount of throttling, and hence the
>> speed at which a misbehaving task can be OOM killed, a new single-value
>> memory.high.throttle control file is now added. The allowable range
>> is 0-32. By default, it has a value of 0, which means maximum throttling
>> as before. Any non-zero positive value represents the corresponding
>> power-of-2 reduction of throttling and makes OOM kills easier to happen.
> I do not like the interface, to be honest. It exposes an implementation
> detail and casts it into a user API. If we ever need to change the way
> the throttling is implemented, this will stand in the way because there
> will be applications depending on a behavior they were carefully tuned
> to.
>
> It is also not entirely clear how this is supposed to be used in
> practice. How do people know what kind of value they should use?

Yes, I agree that a user may need to do some trial runs to find a proper
value. Perhaps a simpler binary interface of "off" and "on" may be easier
to understand and use.
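To make the proposed semantics concrete, here is a minimal user-space
sketch of what a power-of-2 reduction of the memory.high sleep penalty
could look like. The helper and variable names (apply_high_throttle,
penalty_jiffies, high_throttle) are assumptions made for this
illustration and are not taken from the actual patch:

/*
 * Illustrative sketch only -- not the patch itself.  Model how a 0-32
 * "throttle" value could weaken the sleep penalty the kernel computes
 * when a cgroup runs above memory.high.
 */
#include <limits.h>
#include <stdio.h>

static unsigned long apply_high_throttle(unsigned long penalty_jiffies,
					 unsigned int high_throttle)
{
	/* 0 keeps today's behavior: full throttling. */
	if (!high_throttle)
		return penalty_jiffies;
	/* Avoid an undefined shift once the value reaches the type width. */
	if (high_throttle >= sizeof(penalty_jiffies) * CHAR_BIT)
		return 0;
	/* Each increment halves the computed penalty. */
	return penalty_jiffies >> high_throttle;
}

int main(void)
{
	unsigned long penalty = 200;	/* pretend reclaim computed 200 jiffies */
	unsigned int t;

	for (t = 0; t <= 5; t++)
		printf("memory.high.throttle=%u -> sleep %lu jiffies\n",
		       t, apply_high_throttle(penalty, t));
	return 0;
}

Under this reading, writing 3 to memory.high.throttle would cut the
computed penalty by a factor of 8; where exactly the shift would be
applied in the kernel's over-high handling is an assumption here, since
the patch body is not quoted in this thread.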
>> System administrators can now use this parameter to determine how easily
>> they want OOM kills to happen for applications that tend to consume a
>> lot of memory, without the need to run a special userspace memory
>> management tool to monitor memory consumption when memory.high is set.
>
> Why can't they achieve the same with the existing events/metrics we
> already provide? Most notably PSI, which is properly accounted when a
> task is throttled due to memory.high throttling.
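For reference, the PSI data referred to above is already exposed per
cgroup through cgroup v2's memory.pressure file, so an orchestrator can
watch it without any new knob. A minimal sketch, with the cgroup path
chosen purely as an example:

/*
 * Minimal sketch: dump a cgroup's memory PSI ("some"/"full" lines).
 * The path below is an example; a real deployment would use the
 * container's own cgroup directory.
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/fs/cgroup/mygroup/memory.pressure";
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);	/* e.g. "some avg10=... total=..." */
	fclose(f);
	return 0;
}

PSI also supports poll()-based triggers, so a management agent could be
notified when memory stall time crosses a threshold instead of polling
the file.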