Re: [PATCH next] crypto: qat - replace avg_array() with a better function

From: David Laight

Date: Tue Feb 24 2026 - 17:01:53 EST


On Tue, 24 Feb 2026 18:11:37 +0000
Giovanni Cabiddu <giovanni.cabiddu@xxxxxxxxx> wrote:

> On Fri, Feb 06, 2026 at 09:09:40PM +0000, david.laight.linux@xxxxxxxxx wrote:
> > From: David Laight <david.laight.linux@xxxxxxxxx>
> >
> > avg_array() is defined as a 'type independant' #define.
> > However the algorithm is only valid for unsigned types and the
> > implementation is only valid for u64.
> > All the callers pass temporary kmalloc() allocated arrays of u64.
> >
> > Replace with a function that takes a pointer to a u64 array.
> >
> > Change the implementation to sum the low and high 32bits of each
> > value separately and then compute the average.
> Thanks David, this is a great optimization.
>
> I also reviewed the algorithm and confirmed it is functionally equivalent
> to the previous version. I tested it on a platform with QAT and it
> behaves as expected.
>
> Some minor comments below.
>
> > This will be massively faster as it does two divisions rather than
> > one for each element.
> NIT: probably not `massively faster` as the maximum value for len in the
> current implementation is 4.

It is still a lot faster - but probably not significant to system performance.

Actually if the max for len is 65536 you can do better (esp. for 32bit).
Instead of splitting 32/32 split 48/16, then sum_lo needs only be u32.

giving:

{
u64 sum_hi = 0;
u32 sum_lo = 0;
size_t i;

for (i = 0; i < len; i++) {
sum_hi += array[i] >> 16;
sum_lo += array[i] & 0xffff;
}

sum_lo += do_div(sum_hi, len) << 16;

return (sum_hi << 16) + sum_lo / len;
}

OTOH aren't those values performance counts of some kind?
Adding four of them together isn't going to wrap.

>
> > Also removes some very pointless __unqual_scalar_typeof().
> > They could be 'auto _x = 0 ? x + 0 : 0;' even if the types weren't fixed.
> >
> > Only compile tested.
> >
> > Signed-off-by: David Laight <david.laight.linux@xxxxxxxxx>
> > ---
> > .../intel/qat/qat_common/adf_tl_debugfs.c | 38 ++++++++-----------
> > 1 file changed, 15 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
> > index b81f70576683..a084437a2631 100644
> > --- a/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
> > +++ b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
> > @@ -77,32 +77,24 @@ static int tl_collect_values_u64(struct adf_telemetry *telemetry,
> > * @len: Number of elements.
> > *
> > * This algorithm computes average of an array without running into overflow.
> > + * (Provided len is less than 2 << 31.)
> Should this be 2^31 or 1 << 31?
> Alternatively: `Provided len fits in u32`?

Not sure why I wrote 2 << 31 :-)

The condition is that sum_lo must not overflow.
The worst case is all the low bits being 1.
If len is 2^32 then sum_lo is then (2^32 - 1) * 2^32.
The remainder from the sum_hi divide is shifted and added in,
giving (2^32 - 1) * (2^32 + 1) which is what my maths teacher called a
'cow and goat' - (cow + goat) * (cow - goat) = cow squared - goat squared,
so then maximum for sum_lo is 2^64 - 1 which fits.
Which means it should have been 'len <= 2^32'.

David


>
> > *
> > * Return: average of values.
> > */
> > -#define avg_array(array, len) ( \
> > -{ \
> > - typeof(&(array)[0]) _array = (array); \
> > - __unqual_scalar_typeof(_array[0]) _x = 0; \
> > - __unqual_scalar_typeof(_array[0]) _y = 0; \
> > - __unqual_scalar_typeof(_array[0]) _a, _b; \
> > - typeof(len) _len = (len); \
> > - size_t _i; \
> > - \
> > - for (_i = 0; _i < _len; _i++) { \
> > - _a = _array[_i]; \
> > - _b = do_div(_a, _len); \
> > - _x += _a; \
> > - if (_y >= _len - _b) { \
> > - _x++; \
> > - _y -= _len - _b; \
> > - } else { \
> > - _y += _b; \
> > - } \
> > - } \
> > - do_div(_y, _len); \
> > - (_x + _y); \
> > -})
> > +static u64 avg_array(const u64 *array, size_t len)
> Shall size_t len be u32 len?
>
> > +{
> > + u64 sum_hi = 0, sum_lo = 0;
> > + size_t i;
> > +
> > + for (i = 0; i < len; i++) {
> > + sum_hi += array[i] >> 32;
> > + sum_lo += (u32)array[i];
> > + }
> > +
> > + sum_lo += (u64)do_div(sum_hi, len) << 32;
> > +
> > + return (sum_hi << 32) + div_u64(sum_lo, len);
> > +}
> >
> > /* Calculation function for simple counter. */
> > static int tl_calc_count(struct adf_telemetry *telemetry,
>
> Thanks,
>