Re: [PATCH 1/4] bitops: Add single_bit_set()

From: Andy Shevchenko
Date: Tue Nov 23 2021 - 04:03:53 EST


On Mon, Nov 22, 2021 at 11:33:46PM -0800, Yury Norov wrote:
> On Tue, Nov 23, 2021 at 05:26:56AM +0000, Vaittinen, Matti wrote:
> > On 11/22/21 19:54, Yury Norov wrote:
> > > On Mon, Nov 22, 2021 at 02:57:27PM +0200, Andy Shevchenko wrote:
> > >> On Mon, Nov 22, 2021 at 12:42:21PM +0000, Vaittinen, Matti wrote:
> > >>> On 11/22/21 13:28, Andy Shevchenko wrote:
> > >>>> On Mon, Nov 22, 2021 at 01:03:25PM +0200, Matti Vaittinen wrote:

...

> > >>>> So, you decided to reinvent hamming weight...
> > >>>> Please, drop this patch and use corresponding hweight() call.
> > >>
> > >>> Thanks Andy.
> > >>>
> > >>> There are few differences to hamming weight here. We scan only given
> > >>> amount of bits - and we will end scanning immediately when we hit second
> > >>> set bit. Oh, and obviously we only return information whether there is
> > >>> exactly one bit set. So no, this is not hamming weight().
> > >>
> > >> What do you mean by this?
> > >>
> > >> hweight() will return you the number of the non-zero elements in the set.
> > >> In application to boolean based arrays it means the number of bits that
> > >> are set. Obviously, the condition `hweight() == 1` is what you are looking
> > >> for.
> > >
> > > Hi Andy,
> > >
> > > I think, Matti means earlier return when part of bitmap counts set
> > > bits to a greater nubmer, and we can skip the rest. Right, Matti?
> >
> > Yes.
> >
> > > But in general, it might be useful for long bitmaps.
> > >
> > > The more complete way of doing this would be adding a new set of
> > > functions: bitmap_weight_{eq,neq,gt,le}
> > >
> > > I'm looking at how bitmap_weight is used in the kernel and see
> > > quite a lot of places where this optimization may take place. For
> > > example otx2_remove_flow() in drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c:
> > >
> > > if (bitmap_weight(&flow_cfg->dmacflt_bmap,
> > > flow_cfg->dmacflt_max_flows) == 1)
> > > otx2_update_rem_pfmac(pfvf, DMAC_ADDR_DEL);
> > >
> > > may be replaced with:
> > >
> > > if (bitmap_weight_eq(&flow_cfg->dmacflt_bmap, flow_cfg->dmacflt_max_flows, 1)
> > > otx2_update_rem_pfmac(pfvf, DMAC_ADDR_DEL);
> > >
> > > Most of that places are in drivers however, and the length of bitmaps
> > > there is typically small, so that there's no chance to get any
> > > measurable performance improvement.
> > >
> > > There is always a chance that we have opencoded bitmap_weight_eq()
> > > et all. If we add these API, it might help people wright better code.
> > >
> > > What do you think?
> >
> > My uneducated opinion (for what it matters :]) is thet the cost of
> > adding such functions is negligible so I am all for adding them if there
> > are even few users who can benefit from those.
>
> I think I changed my opinion. We have enough examples of opencoded
> bitmap_weight_{eq,...} in core code which will definitely benefit
> from this optimization. For example, sched_cpu_activate:
>
> if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> static_branch_inc_cpuslocked(&sched_smt_present);
>
> Considering computers with thousands of CPUs, early return would save a
> lot.
>
> I'll take a look on it at this weekend.

Be sure you get better assembly in these cases. As I have told already,
hweight() is a single assembly instruction, I'm not sure open coded variant
may be better even for long bitmaps. That said, assembly comparison and
some performance tests would be nice to have.

As an API per se it might make sense to have such, but you know that we don't
add it without users.

--
With Best Regards,
Andy Shevchenko