Re: [RFC PATCH] cpumask: Implement "random" version of cpumask_any_but()

From: I Hsin Cheng
Date: Wed Jan 15 2025 - 02:24:44 EST


On Tue, Jan 14, 2025 at 10:43:56AM -0500, Yury Norov wrote:
> > > > > @@ -401,12 +401,18 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta
> > > > > static __always_inline
> > > > > unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> > > > > {
> > > > > - unsigned int i;
> > > > > + unsigned int i, n, weight;
> > > > >
> > > > > cpumask_check(cpu);
> > > > > - for_each_cpu(i, mask)
> > > > > - if (i != cpu)
> > > > > - break;
> > > > > + weight = cpumask_weight(mask);
> > > > > + n = get_random_u32() % weight;
> > > > > +
> > > > > + /* If we accidentally pick "n" equal to "cpu",
> > > > > + * then simply choose "n + 1"th cpu.
> > > > > + */
> > > > > + if (n == cpu)
> > > > > + n = (n + 1) % weight;
> > > > > + i = cpumask_nth(n, mask);
> > >
> > > This is an entirely broken thing, and it works only because your CPU mask
> > > is dense. Imagine cpumask: 0111 1111. Your new cpumask_any_but(mask, 5)
> > > will return 5 exactly, if the get_random_u32() draws 4.
> > >
> >
> > Oh I'm sorry for this part, "cpumask_nth()" return value should be
> > checked instead of checking "get_random_u32()". May I ask why does it
> > only work for dense cpumask?
>
> Because for dense mask cpumask_nth(n) == n, and your
> n = get_random_u32() % weight
> is the same as
> n = cpumask_nth(get_random_u32() % weight)
>
> > I mean clearly original method will be
> > faster when cpumask isn't dense.
> >
> > > It looks broken even for a dense mask. By probability, your code returns:
> > >
> > > P(0-4,7) == 1/8,
> > > P(5) == 0,
> > > P(6) == 1/4.
> > >
> > > Assuming you are trying to implement a random uniform distribution drawing,
> > > the correct probabilities should look like:
> > >
> > > P(0-4,6-7) == 1/7,
> > > P(5) == 0,
> > >
> > > Thanks,
> > > Yury
> > >
> >
> > I did noticed this part, I was thinking that we do not actually need to
> > make the prbability to be uniform distribution, just want to make it
> > scatters more then picking certain number.
>
> On cpumask level you can't speculate whether users need true randomness
> or not. That's why we pay so much attention to proper function naming.
>
> See:
> - cpumask_any() - 'any' requirement is much simpler than random;
> - cpumask_any_distribute() - not random at all, just a better (good
> enough) approximation;
> - cpumask_random() - a true random drawing. Must pass Chi^2,
> Kolmogorov-Smirnov and other fancy statistical tests. As you can see,
> doesn't exist.
>
> If in your function you use expensive true-random get_random_u32(),
> one can expect that the output will not be worse than the original
> uniform distribution, by the statistical properties means.
>

I see, thanks for the explanation, it really helps me learn alot!

> > Thanks for your detailed review and explanation, also thanks to Mark and
> > Kuan-Wei !
> >
> > I now realize "random" here is more of a convention for the caller to
> > states that it doesn't matter which cpu it gets, maybe we should
> > rephrase the comment to make it less confusing? Because I think "random"
> > itself does stands for a particular meaning.
>
> Feel free to send a patch.

No problem, I'll send it ASAP.

Regards,
I Hsin Cheng.