Re: [PATCH] crypto: ecc - Optimize vli additive operations using compiler builtins

From: Lukas Wunner

Date: Tue Jun 23 2026 - 09:45:22 EST


On Sun, Jun 07, 2026 at 01:24:35PM +0200, Fabian Blatter wrote:
> This patch uses __builtin_addcll, __builtin_subcll when available and
> otherwise __builtin_uaddll_overflow, __builtin_usubll_overflow. The
> latter have existed since ancient gcc versions, so no third fallback
> is needed.

crypto/ecc.c is derived from https://github.com/kmackay/micro-ecc/,
which seeks to be a portable ECC library. I suspect the portability
goal is the reason why it doesn't take advantage of compiler builtins
or other optimizations.

The kernel is much less encumbered: the minimum compiler versions are
listed in Documentation/process/changes.rst. If these compiler
versions support the builtins you're using, then everything should be
alright.

> I have put the add_carry and sub_borrow inline functions with the
> preprocessor logic for builtin selection directly in crypto/ecc.c.
> Please let me know if you would like them to be somewhere else.

Seems reasonable to me.
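
For readers without the patch at hand, the builtin selection described
above might look roughly like the sketch below. It is not the code from
the patch: the signatures of add_carry and sub_borrow, the use of
__has_builtin for detection, the u64 typedef (a userspace stand-in for
the kernel type) and the vli_add_sketch helper are all assumptions made
for illustration.

```c
/* Userspace stand-in for the kernel's u64 (assumption for this sketch). */
typedef unsigned long long u64;

/* Older compilers lack __has_builtin; treat every query as "not available". */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Returns a + b + *carry and stores the outgoing carry (0 or 1) in *carry. */
static inline u64 add_carry(u64 a, u64 b, u64 *carry)
{
#if __has_builtin(__builtin_addcll)
	unsigned long long carry_out;
	u64 sum = __builtin_addcll(a, b, *carry, &carry_out);

	*carry = carry_out;
	return sum;
#else
	unsigned long long tmp, sum;
	/* At most one of the two partial additions can overflow. */
	u64 c1 = __builtin_uaddll_overflow(a, b, &tmp);
	u64 c2 = __builtin_uaddll_overflow(tmp, *carry, &sum);

	*carry = c1 | c2;
	return sum;
#endif
}

/* Returns a - b - *borrow and stores the outgoing borrow (0 or 1) in *borrow. */
static inline u64 sub_borrow(u64 a, u64 b, u64 *borrow)
{
#if __has_builtin(__builtin_subcll)
	unsigned long long borrow_out;
	u64 diff = __builtin_subcll(a, b, *borrow, &borrow_out);

	*borrow = borrow_out;
	return diff;
#else
	unsigned long long tmp, diff;
	/* At most one of the two partial subtractions can underflow. */
	u64 b1 = __builtin_usubll_overflow(a, b, &tmp);
	u64 b2 = __builtin_usubll_overflow(tmp, *borrow, &diff);

	*borrow = b1 | b2;
	return diff;
#endif
}

/* Hypothetical multi-digit add built on add_carry; returns the final carry. */
static u64 vli_add_sketch(u64 *result, const u64 *left, const u64 *right,
			  unsigned int ndigits)
{
	u64 carry = 0;
	unsigned int i;

	for (i = 0; i < ndigits; i++)
		result[i] = add_carry(left[i], right[i], &carry);

	return carry;
}
```

The point of funnelling everything through two small helpers is that
the vli loops stay identical regardless of which builtin the compiler
provides.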

> This is quite interesting, since, as far as I know, the kernel compiles
> with gcc and O2 by default, yet the macro-level benchmarks still show a
> performance increase. The effect seems to be reversed when crypto/ecc.c
> gets compiled. Or maybe the linux kernel uses some additional
> optimization flags, I am unsure.

You can compile the kernel with V=1 to see the full command line.

> However, most of the time, the patched version outperforms the original
> one by a wide margin:
> - On clang -O2 or -O3, vli_add and vli_uadd show a 4.074x and 5.384x
> speedup.
> - On gcc, vli_uadd shows a 74% performance increase at O2,
> and a 2.07x speedup at O3.

There is precedent in the tree for overriding the default -O2 with -O3,
see lib/lz4/Makefile and arch/mips/vdso/Makefile.

It might be worth using that for crypto/ecc.c if it doesn't cause
breakage and yields a significant speedup.

> I am happy to make any changes to this patch if you like.
> I could also look into making `vli_cmp` and `vli_is_zero`,
> or others constant-time in a future patch.

Your patch LGTM and I don't see a need for a v2.
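
As an illustration of the constant-time idea mentioned in the quoted
text, a vli_is_zero variant could OR all digits together instead of
returning at the first nonzero one. The sketch below is hypothetical:
the name vli_is_zero_ct, the int return type and the u64 typedef (a
userspace stand-in for the kernel type) are assumptions, and a compiler
is still free to emit branches, so the generated code would need to be
inspected.

```c
/* Userspace stand-in for the kernel's u64 (assumption for this sketch). */
typedef unsigned long long u64;

/* Returns 1 if all digits are zero, 0 otherwise; always reads every digit. */
static int vli_is_zero_ct(const u64 *vli, unsigned int ndigits)
{
	u64 acc = 0;
	unsigned int i;

	/* Accumulate all bits; no early exit that depends on the data. */
	for (i = 0; i < ndigits; i++)
		acc |= vli[i];

	/*
	 * For nonzero acc, either acc or its two's complement negation
	 * has the top bit set, so the shift yields 1; for zero it yields 0.
	 */
	return (int)(1 ^ ((acc | (0 - acc)) >> 63));
}
```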

Previously we discussed replacing the ECC point multiplication algorithm
used by crypto/ecc.c with a newer constant-time Montgomery ladder.
If you are interested in continuing to work on crypto/ecc.c,
this might be a worthwhile topic:

https://lore.kernel.org/r/aftFAexDFrYbIeBM@xxxxxxxxx/

Thanks,

Lukas