Re: [PATCH v2] lib/memweight.c: open codes bitmap_weight()

From: Matthew Wilcox
Date: Sun Aug 25 2019 - 02:12:19 EST


On Sat, Aug 24, 2019 at 01:01:02PM +0300, Denis Efremov wrote:
> This patch open codes the bitmap_weight() call. The direct
> invocation of hweight_long() allows to remove the BUG_ON and
> excessive "longs to bits, bits to longs" conversion.

Honestly, that's not the problem with this function. Take a look
at https://danluu.com/assembly-intrinsics/ for a _benchmarked_
set of problems with popcnt.

> BUG_ON was required to check that bitmap_weight() will return
> a correct value, i.e. the computed weight will fit the int type
> of the return value.

What? No. Look at the _arguments_ of bitmap_weight():

static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits)

> With this patch memweight() controls the
> computation directly with size_t type everywhere. Thus, the BUG_ON
> becomes unnecessary.

Why are you bothering? How are you allocating half a gigabyte of memory?
Why are you calling memweight() on half a gigabyte of memory?

> if (longs) {
> - BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> - ret += bitmap_weight((unsigned long *)bitmap,
> - longs * BITS_PER_LONG);
> + const unsigned long *bitmap_long =
> + (const unsigned long *)bitmap;
> +
> bytes -= longs * sizeof(long);
> - bitmap += longs * sizeof(long);
> + for (; longs > 0; longs--, bitmap_long++)
> + ret += hweight_long(*bitmap_long);
> + bitmap = (const unsigned char *)bitmap_long;
> }

If you really must change anything, I'd rather see this turned into a
loop:

while (longs) {
unsigned int nbits;

if (longs >= INT_MAX / BITS_PER_LONG)
nbits = INT_MAX + 1;
else
nbits = longs * BITS_PER_LONG;

ret += bitmap_weight((unsigned long *)bitmap, sz);
bytes -= nbits / 8;
bitmap += nbits / 8;
longs -= nbits / BITS_PER_LONG;
}

then we only have to use Dan Luu's optimisation in bitmap_weight()
and not in memweight() as well.

Also, why does the trailer do this:

for (; bytes > 0; bytes--, bitmap++)
ret += hweight8(*bitmap);

instead of calling hweight_long on *bitmap & mask?