Re: [PATCH 2/2] lib/percpu_counter: fix dying cpu compare race

From: Yury Norov
Date: Mon Apr 03 2023 - 22:50:57 EST


On Tue, Apr 04, 2023 at 09:42:06AM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@xxxxxxxxxx>
>
> In commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race") a race
> condition between a cpu dying and percpu_counter_sum() iterating online CPUs
> was identified.
> Actually, the same race exists between a CPU dying and
> __percpu_counter_compare(), which uses 'num_online_cpus()' for its quick check.
> But 'num_online_cpus()' is decremented before 'percpu_counter_cpu_dead()' is
> called, so the quick check may return an incorrect result.
> To fix this, also count dying CPUs when doing the quick check
> in __percpu_counter_compare().

Not sure I completely understood the race you are describing. All CPU
accounting is protected with percpu_counters_lock. Is it a real race
that you've faced, or hypothetical? If it's real, can you share stack
traces?

> Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> ---
> lib/percpu_counter.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index 5004463c4f9f..399840cb0012 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -227,6 +227,15 @@ static int percpu_counter_cpu_dead(unsigned int cpu)
> return 0;
> }
>
> +static __always_inline unsigned int num_count_cpus(void)

This doesn't look like a good name. Maybe num_offline_cpus?

> +{
> +#ifdef CONFIG_HOTPLUG_CPU
> + return (num_online_cpus() + num_dying_cpus());

'return' is not a function; the parentheses around the expression are
not needed.

Generally speaking, a sequence of atomic operations is not an atomic
operation, so the above doesn't look correct. I don't think that it
would be possible to implement raceless accounting based on 2 separate
counters.
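
To illustrate, here is a toy userspace model, not kernel code. The struct
and function names are made up for the example, and it makes no claim about
the order in which the hotplug path really updates its state. It only shows
that when one CPU moves between two separately maintained counters, a reader
sampling both in the middle of the move sees a sum that is off by one in
either direction, depending on which counter is updated first:

```c
#include <stdbool.h>

/* Hypothetical model: two independently updated counters. */
struct cpu_counts {
	unsigned int online;
	unsigned int dying;
};

/*
 * Move one CPU from "online" to "dying" in two separate steps and return
 * the sum a reader would observe if it sampled both counters between the
 * two steps. 'dec_online_first' selects the update order.
 */
static unsigned int sum_seen_mid_transition(struct cpu_counts *c,
					    bool dec_online_first)
{
	unsigned int seen;

	if (dec_online_first) {
		c->online--;
		seen = c->online + c->dying;	/* one CPU is missing */
		c->dying++;
	} else {
		c->dying++;
		seen = c->online + c->dying;	/* one CPU is counted twice */
		c->online--;
	}

	return seen;
}
```

Starting from 4 online and 0 dying, the reader sees 3 or 5 while the
intended total stays at 4 both before and after the transition.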

Most probably, you'd have to use the same approach as in 8b57b11cca88:

	lock();
	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
		cnt++;
	unlock();

And if so, I'd suggest implementing cpumask_weight_or() for that.
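
For illustration, a userspace sketch of what such a helper would compute.
This is an assumption about its shape, not an existing kernel function:
'struct mask', MASK_WORDS and mask_weight_or() are hypothetical names, and
__builtin_popcountll() is a GCC/Clang builtin standing in for the kernel's
bit-counting helpers. The caller would still need to hold whatever lock
keeps both masks stable while they are read:

```c
#include <stddef.h>
#include <stdint.h>

#define MASK_WORDS 4	/* hypothetical: room for 256 CPUs */

struct mask {
	uint64_t bits[MASK_WORDS];
};

/*
 * Count the bits set in (a | b) word by word, without materializing the
 * union in a temporary mask. A bit set in both masks is counted once.
 */
static unsigned int mask_weight_or(const struct mask *a, const struct mask *b)
{
	unsigned int weight = 0;
	size_t i;

	for (i = 0; i < MASK_WORDS; i++)
		weight += (unsigned int)__builtin_popcountll(a->bits[i] | b->bits[i]);

	return weight;
}
```

Counting the union in one pass means a CPU that is set in both masks cannot
be counted twice, which is what adding two separate weights would do.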

> +#else
> + return num_online_cpus();
> +#endif
> +}
> +
> /*
> * Compare counter against given value.
> * Return 1 if greater, 0 if equal and -1 if less
> @@ -237,7 +246,7 @@ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
>
> count = percpu_counter_read(fbc);
> /* Check to see if rough count will be sufficient for comparison */
> - if (abs(count - rhs) > (batch * num_online_cpus())) {
> + if (abs(count - rhs) > (batch * num_count_cpus())) {
> if (count > rhs)
> return 1;
> else
> --
> 2.31.1