Re: [RFC PATCH v2] Introduce Hierarchical Per-CPU Counters

From: Christoph Lameter (Ampere)
Date: Tue Apr 08 2025 - 13:30:01 EST


On Tue, 8 Apr 2025, Mathieu Desnoyers wrote:

> - Minimize contention when incrementing and decrementing counters,
> - Provide fast access to a sum approximation,

In general I like this as an abstraction of the Zoned VM counters in
vmstat.c that will make the scalable counters there useful elsewhere.

> It aims at fixing the per-mm RSS tracking which has become too
> inaccurate for OOM killer purposes on large many-core systems [1].

There are numerous cases where these issues occur. I know of a few where
I could use something like this.

> The hierarchical per-CPU counters propagate a sum approximation through
> a binary tree. When reaching the batch size, the carry is propagated
> through a binary tree which consists of log2(nr_cpu_ids) levels. The
> batch size for each level is twice the batch size of the prior level.

A binary tree? Could we do this N-way? Otherwise the tree will be 9 levels
(log2(512)) on a 512 cpu machine. Given the inflation of the number of cpus
this scheme had better work up to 8K cpus.
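
To put rough numbers on the fan-out question, here is a minimal userspace
sketch. tree_levels() is a hypothetical helper of mine, not something from
the patch, and it only counts the levels above the per-cpu leaves. Going by
the log2(nr_cpu_ids) rule quoted above, it gives 9 levels for 512 cpus and
13 for 8K cpus with a binary tree, versus 3 and 5 with a fan-out of 8.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: number of tree levels
 * needed above the per-cpu leaves so that fanout^levels >= nr_cpus.
 */
static unsigned int tree_levels(unsigned int nr_cpus, unsigned int fanout)
{
	unsigned int levels = 0;
	unsigned long span = 1;	/* cpus covered by one node at this level */

	assert(fanout >= 2);
	while (span < nr_cpus) {
		span *= fanout;
		levels++;
	}
	return levels;
}
```

A wider fan-out shortens the carry path but puts more cpus on each
intermediate node, so the right N is a tradeoff that would have to be
measured.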

> +int percpu_counter_tree_precise_sum(struct percpu_counter_tree *counter);
> +int percpu_counter_tree_precise_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b);
> +int percpu_counter_tree_precise_compare_value(struct percpu_counter_tree *counter, int v);

Precise? Concurrent counter updates can occur while determining the global
value. People may get confused.
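
To illustrate the concern, a minimal userspace sketch with C11 atomics. All
names here (demo_counter, demo_sum, demo_racing_update) are made up for
illustration and are not the patch's API. The "between" hook stands in for
another cpu updating a slot that the summing loop has already read.

```c
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_NR_SLOTS 4

/* Hypothetical stand-in for a set of per-cpu counters. */
struct demo_counter {
	atomic_int slot[DEMO_NR_SLOTS];
};

/* Simulated concurrent update to a slot that was already summed. */
static void demo_racing_update(struct demo_counter *c)
{
	atomic_fetch_add_explicit(&c->slot[0], 5, memory_order_relaxed);
}

/*
 * Sum all slots. If "between" is non-NULL it runs right after slot 0
 * has been read, modelling an update that races with the summation.
 */
static int demo_sum(struct demo_counter *c,
		    void (*between)(struct demo_counter *))
{
	int sum = 0;

	for (int i = 0; i < DEMO_NR_SLOTS; i++) {
		sum += atomic_load_explicit(&c->slot[i], memory_order_relaxed);
		if (i == 0 && between)
			between(c);
	}
	return sum;
}
```

The first sum misses the update that landed behind the loop, so the result
is stale by the time it is returned. Without stopping the updaters, the sum
is a snapshot and not an exact value.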

Also maybe there would be a need for a function to collapse the values into
the global, e.g. if a cpu goes offline or in order to switch off OS
activities on a cpu.
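
Roughly what I have in mind, as a userspace sketch. The struct and the
function name are hypothetical and not taken from the patch. It ignores the
intermediate tree levels, which would also need to be folded, and assumes
the caller guarantees that nothing updates that cpu's counter concurrently.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4

/* Hypothetical, flattened stand-in for the counter tree. */
struct demo_tree {
	long global;			/* approximate global sum */
	long percpu[DEMO_NR_CPUS];	/* per-cpu residue not yet propagated */
};

/*
 * Fold one cpu's residue into the global sum and zero it, so the cpu
 * holds no counter state afterwards. Assumes no concurrent updates on
 * that cpu while this runs.
 */
static void demo_collapse_cpu(struct demo_tree *t, unsigned int cpu)
{
	assert(cpu < DEMO_NR_CPUS);
	t->global += t->percpu[cpu];
	t->percpu[cpu] = 0;
}
```

In the kernel this would presumably be called from a cpu hotplug callback
or when isolating a cpu, but that wiring is not shown here.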