Re: [PATCH v13 2/3] mm: Fix OOM killer inaccuracy on large many-core systems
From: Michal Hocko
Date: Mon Jan 12 2026 - 03:42:15 EST
Hi,

sorry to jump in this late, but the timing of the previous versions
didn't work well for me.
On Sun 11-01-26 14:49:57, Mathieu Desnoyers wrote:
[...]
> Here is a (possibly incomplete) list of the prior approaches that were
> used or proposed, along with their downside:
>
> 1) Per-thread rss tracking: large error on many-thread processes.
>
> 2) Per-CPU counters: up to 12% slower for short-lived processes and 9%
> increased system time in make test workloads [1]. Moreover, the
> inaccuracy grows as O(n^2) in the number of CPUs.
>
> 3) Per-NUMA-node counters: requires atomics on fast-path (overhead),
> error is high on systems with many NUMA nodes (32 times
> the number of NUMA nodes).
>
> The approach proposed here is to replace this with hierarchical
> per-cpu counters, which bound the inaccuracy based on the system
> topology with O(N*logN).
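To put the O(n^2) figure from item 2 into numbers: with a flat percpu_counter, each CPU can hold a local delta of up to batch - 1 that has not yet been folded into the shared count, in either direction, so the value seen by percpu_counter_read_positive() can be off by up to nr_cpus * (batch - 1). The sketch below is a back-of-the-envelope calculation, assuming the default batch of max(32, 2 * nr_cpus) as I understand lib/percpu_counter.c computes it, and 4 KiB pages; `default_batch` and `max_drift_pages` are hypothetical helpers, and the result is a per-counter worst case, not a measurement.

```python
# Back-of-the-envelope worst-case drift of a flat per-CPU counter, in pages.
# Assumption: default batch = max(32, 2 * nr_cpus), as in lib/percpu_counter.c.

PAGE_SIZE_KIB = 4  # assumption: 4 KiB base pages


def default_batch(nr_cpus: int) -> int:
    """Hypothetical helper: the assumed default percpu_counter batch."""
    return max(32, 2 * nr_cpus)


def max_drift_pages(nr_cpus: int) -> int:
    """Each CPU may hold up to batch - 1 unflushed pages in its local delta."""
    return nr_cpus * (default_batch(nr_cpus) - 1)


if __name__ == "__main__":
    for cpus in (8, 64, 256, 1024):
        pages = max_drift_pages(cpus)
        mib = pages * PAGE_SIZE_KIB / 1024
        print(f"{cpus:5d} CPUs: batch {default_batch(cpus):5d}, "
              f"drift up to {pages:8d} pages (~{mib:.0f} MiB)")
```

Once 2 * nr_cpus exceeds 32 the bound is nr_cpus * (2 * nr_cpus - 1), which is where the quadratic growth comes from; on 256 CPUs that is 130816 pages per counter.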
The concept of a hierarchical pcp counter is interesting, and I am
definitely not opposed to it if there are more users that would benefit.

From the OOM POV, IIUC, the primary problem is that get_mm_counter()
(percpu_counter_read_positive()) is too imprecise on systems where the
task moves around a large number of CPUs. In the list of alternative
solutions I do not see percpu_counter_sum_positive() mentioned.
oom_badness() is a really slow path, so taking the slow path to
calculate a much more precise value seems acceptable. Have you
considered that option?
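To illustrate the difference being asked about, here is a small userspace model (not kernel code) of a percpu_counter. The shared count is only updated when a CPU's local delta reaches the batch. A cheap read looks at the shared count only, in the spirit of percpu_counter_read_positive(), while a full sum adds every CPU's delta, in the spirit of percpu_counter_sum_positive(). The class and method names are hypothetical, and the model ignores locking, CPU hotplug and the separate per-mm counter types.

```python
class PercpuCounterModel:
    """Hypothetical userspace model of a kernel percpu_counter.

    count: shared value, updated only when a CPU's local delta hits the batch
    local: per-CPU deltas not yet folded into count
    """

    def __init__(self, nr_cpus: int, batch: int) -> None:
        self.count = 0
        self.local = [0] * nr_cpus
        self.batch = batch

    def add(self, cpu: int, amount: int) -> None:
        # Batching idea of percpu_counter_add_batch(): fold the whole local
        # delta into the shared count once |delta| >= batch, else keep it local.
        delta = self.local[cpu] + amount
        if abs(delta) >= self.batch:
            self.count += delta
            self.local[cpu] = 0
        else:
            self.local[cpu] = delta

    def read_positive(self) -> int:
        # Cheap O(1) read of the shared count only (unflushed deltas ignored).
        return max(self.count, 0)

    def sum_positive(self) -> int:
        # Precise O(nr_cpus) read: shared count plus every CPU's local delta.
        return max(self.count + sum(self.local), 0)


if __name__ == "__main__":
    rss = PercpuCounterModel(nr_cpus=64, batch=128)
    for cpu in range(64):
        rss.add(cpu, 100)  # 100 pages faulted in on each CPU, below the batch
    print(rss.read_positive(), rss.sum_positive())  # prints: 0 6400
```

With 64 CPUs and a batch of 128, a task that faults in 100 pages on each CPU never triggers a flush, so the cheap read reports 0 while the full sum reports 6400. The precise sum costs O(nr_cpus) per counter (the in-kernel version also takes the counter's spinlock, as far as I know), which is why it fits a slow path like oom_badness() rather than the fault path.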
--
Michal Hocko
SUSE Labs