Re: [PATCH v13 2/3] mm: Fix OOM killer inaccuracy on large many-core systems

In reply to: Mathieu Desnoyers: "[PATCH v13 2/3] mm: Fix OOM killer inaccuracy on large many-core systems"

From: Michal Hocko

Date: Mon Jan 12 2026 - 03:42:15 EST

Hi,
sorry to jump in this late but the timing of previous versions didn't
really work well for me.

On Sun 11-01-26 14:49:57, Mathieu Desnoyers wrote:
[...]
> Here is a (possibly incomplete) list of the prior approaches that were
> used or proposed, along with their downside:
>
> 1) Per-thread rss tracking: large error on many-thread processes.
>
> 2) Per-CPU counters: up to 12% slower for short-lived processes and 9%
> increased system time in make test workloads [1]. Moreover, the
> inaccuracy increases with O(n^2) with the number of CPUs.
>
> 3) Per-NUMA-node counters: requires atomics on fast-path (overhead),
> error is high with systems that have lots of NUMA nodes (32 times
> the number of NUMA nodes).
>
> The approach proposed here is to replace this by the hierarchical
> per-cpu counters, which bounds the inaccuracy based on the system
> topology with O(N*logN).

The concept of a hierarchical pcp counter is interesting and I am
definitely not opposed if there are more users that would benefit.

From the OOM POV, IIUC the primary problem is that get_mm_counter
(percpu_counter_read_positive) is too imprecise on systems where the task
moves across a large number of CPUs. In the list of alternative
solutions I do not see percpu_counter_sum_positive mentioned.
oom_badness() is a really slow path and taking the slow path to
calculate a much more precise value seems acceptable. Have you
considered that option?
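
To illustrate the difference between the two reads, here is a minimal
userspace model. It is a sketch and not kernel code: every model_* name
and both constants are made up for illustration, and the real
percpu_counter adds locking and per-CPU accessors that are left out here.
It only shows why the cheap read can lag behind while a full sum over the
per-CPU deltas does not.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8   /* assumed CPU count, illustration only */
#define MODEL_BATCH   32  /* assumed per-CPU batch size, illustration only */

/* Hypothetical stand-in for a per-CPU counter. */
struct model_counter {
	int64_t count;               /* global, approximate value */
	int32_t pcp[MODEL_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Add on one CPU; fold into the global count once |delta| reaches the batch. */
static void model_add(struct model_counter *c, int cpu, int32_t amount)
{
	int32_t v;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	v = c->pcp[cpu] + amount;
	if (v >= MODEL_BATCH || v <= -MODEL_BATCH) {
		c->count += v;
		c->pcp[cpu] = 0;
	} else {
		c->pcp[cpu] = v;
	}
}

/* Fast read: global count only, clamped at zero. */
static int64_t model_read_positive(const struct model_counter *c)
{
	return c->count > 0 ? c->count : 0;
}

/* Slow read: global count plus every per-CPU delta, clamped at zero. */
static int64_t model_sum_positive(const struct model_counter *c)
{
	int64_t sum = c->count;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->pcp[cpu];
	return sum > 0 ? sum : 0;
}

/*
 * A task visits every CPU and charges just under one batch on each,
 * so nothing is folded. Returns how far the fast read lags the sum.
 */
static int64_t model_worst_case_gap(void)
{
	struct model_counter c = { 0 };
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_add(&c, cpu, MODEL_BATCH - 1);
	return model_sum_positive(&c) - model_read_positive(&c);
}
```

With these assumed constants the fast read reports 0 while the sum
reports 8 * 31 = 248, so model_worst_case_gap() returns 248. The gap in
this model grows with the number of CPUs the task has touched, and the
sum pays for the precision with a loop over all of them.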

--
Michal Hocko
SUSE Labs