Re: [PATCH v2] crypto: ecc - Fix carry overflow in vli multiplication
From: Lukas Wunner
Date: Wed May 13 2026 - 10:24:22 EST
On Wed, May 13, 2026 at 08:39:48PM +0800, Qingfang Deng wrote:
> On Wed, 13 May 2026 at 13:57:40 +0300, Anastasia Tishchenko wrote:
> > +++ b/crypto/ecc.c
> > @@ -393,14 +393,26 @@ static uint128_t mul_64_64(u64 left, u64 right)
> > return result;
> > }
> >
> > -static uint128_t add_128_128(uint128_t a, uint128_t b)
> > +/* Calculate addition with overflow checking. Returns true on wrap-around,
> > + * false otherwise.
> > + */
> > +static bool check_add_128_128_overflow(uint128_t *result, uint128_t a,
> > + uint128_t b)
> > {
> > - uint128_t result;
> > + bool carry;
> >
> > - result.m_low = a.m_low + b.m_low;
> > - result.m_high = a.m_high + b.m_high + (result.m_low < a.m_low);
> > + result->m_low = a.m_low + b.m_low;
> > + carry = (result->m_low < a.m_low);
> >
> > - return result;
> > + result->m_high = a.m_high + b.m_high + carry;
>
> If CONFIG_ARCH_SUPPORTS_INT128 is defined, you can convert them to
> "unsigned __int128" as done in mul_64_64(), and use check_add_overflow()
> to get the carry.
Okay, but that would be a separate feature on top of this fix.
This fix applies to *all* architectures whereas using native
128-bit integers would only solve the problem on arches which
support such integers.
ARCH_SUPPORTS_INT128 actually depends on CC_HAS_INT128 on all arches
that support it (arm64, loongarch, riscv, s390, x86). Interestingly,
s390 additionally requires CC_IS_CLANG. Commit fbac266f095d ("s390:
select ARCH_SUPPORTS_INT128") explains that "gcc generates inefficient
code, which may lead to stack overflows" when handling 128-bit integers.
It goes on to say: "The gcc generated functions have 6kb stack frames,
compared to only 1kb of the code generated with clang."
That reminds me of the high stack usage issue we're seeing when compiling
crypto/ecc.c on arm 32-bit:
https://lore.kernel.org/r/7e3d64a53efb28740b32d1f934e78c10086208ab.1778073318.git.lukas@xxxxxxxxx/
I'm fine with adding native 128-bit usage to check_add_128_128_overflow()
on top of this fix if somebody wants to submit a patch. But I suspect
it might only actually be useful on s390 with clang.
Thanks,
Lukas