RE: [RFC PATCH] crypto: arc4: Implement a version optimized for memory usage

From: David Laight
Date: Wed May 05 2021 - 06:21:01 EST


From: Christophe JAILLET
> Sent: 04 May 2021 19:00
>
> On 04/05/2021 at 18:57, Eric Biggers wrote:
> > On Sun, May 02, 2021 at 09:29:46PM +0200, Christophe JAILLET wrote:
> >> +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> >> +#define S_type u8
> >> +#else
> >> +#define S_type u32
> >> +#endif
> >> +
> >>  struct arc4_ctx {
> >> -	u32 S[256];
> >> +	S_type S[256];
> >>  	u32 x, y;
> >>  };
> >
> > Is it actually useful to keep both versions? It seems we could just use the u8
> > version everywhere. Note that there aren't actually any unaligned memory
> > accesses, so choosing the version conditionally on
> > CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS seems odd. What are you trying to
> > determine by checking that?
>
> Hi, this is a bad interpretation from me.
...
>
> I wanted to avoid a potential performance cost related to using char
> (i.e. u8) instead of int (i.e. u32).
> On some architectures this could require some shifting or masking or
> whatever to "unpack" the values of S.

The only architecture that Linux ran on where the hardware
did RMW accesses for byte writes was some very old Alpha CPU.
Even the more recent Alpha CPUs supported byte writes to memory.

On many architectures (not x86 or ARM) indexing a byte array
is better because it saves the instruction to multiply the index by 4.
On x86-64 you want to be using 'unsigned int' for array indexes
so the compiler doesn't have to emit the instruction to sign-extend
a 32-bit int to 64 bits (sometimes it knows the extension can't be needed).
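
A rough illustration of the two points above. The helper names are
hypothetical and not from the patch, and what the compiler actually
emits depends on the target and optimisation level, so treat the
comments as the typical case rather than a guarantee:

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of the patch. */

/* Byte table: the element address is base + index, no scaling needed. */
static inline uint8_t lookup_u8(const uint8_t *S, unsigned int i)
{
	return S[i];
}

/*
 * Word table: the element address is base + index * 4. x86 and ARM can
 * fold the scaling into the addressing mode; other targets may need an
 * extra shift instruction.
 */
static inline uint32_t lookup_u32(const uint32_t *S, unsigned int i)
{
	return S[i];
}

/*
 * Signed index: on a 64-bit target the compiler may have to sign-extend
 * 'i' to pointer width before the address calculation, unless it can
 * prove that 'i' is not negative.
 */
static inline uint8_t lookup_u8_signed(const uint8_t *S, int i)
{
	return S[i];
}
```

All three return the same values for in-range indexes; the difference
is only in the address arithmetic, and in the table size (256 bytes
against 1024).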

FWIW with a modern compiler all those temporaries are pointless.
The number of lines of code can be halved.
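
As a sketch only of how short the code can get with a u8 table and
'unsigned int' indexes: this is plain textbook RC4, not the kernel's
arc4 code, and the struct and function names are made up for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context; the kernel's struct arc4_ctx is laid out differently. */
struct arc4_sketch {
	uint8_t S[256];
	unsigned int x, y;
};

/* Textbook RC4 key schedule. The caller must pass keylen > 0. */
static void arc4_sketch_setkey(struct arc4_sketch *ctx,
			       const uint8_t *key, size_t keylen)
{
	unsigned int i, j = 0;

	for (i = 0; i < 256; i++)
		ctx->S[i] = (uint8_t)i;

	for (i = 0; i < 256; i++) {
		uint8_t t = ctx->S[i];

		j = (j + t + key[i % keylen]) & 0xff;
		ctx->S[i] = ctx->S[j];
		ctx->S[j] = t;
	}
	ctx->x = 0;
	ctx->y = 0;
}

/* XOR 'len' bytes of 'in' with the keystream into 'out'. */
static void arc4_sketch_crypt(struct arc4_sketch *ctx, uint8_t *out,
			      const uint8_t *in, size_t len)
{
	uint8_t *S = ctx->S;
	unsigned int x = ctx->x, y = ctx->y;

	while (len--) {
		uint8_t a, b;

		x = (x + 1) & 0xff;
		a = S[x];
		y = (y + a) & 0xff;
		b = S[y];
		S[x] = b;		/* swap S[x] and S[y] */
		S[y] = a;
		*out++ = *in++ ^ S[(a + b) & 0xff];
	}
	ctx->x = x;
	ctx->y = y;
}
```

Encryption and decryption are the same operation, so calling
arc4_sketch_setkey() again with the same key and running the output
back through arc4_sketch_crypt() returns the original bytes.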

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)