Re: [PATCH v16 5/5] x86: vdso: Wire up getrandom() vDSO implementation

From: Jason A. Donenfeld
Date: Fri Jun 07 2024 - 11:28:08 EST


On Thu, May 30, 2024 at 08:38:16PM -0700, Eric Biggers wrote:
> On Tue, May 28, 2024 at 02:19:54PM +0200, Jason A. Donenfeld wrote:
> > diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S
> > new file mode 100644
> > index 000000000000..d79e2bd97598
> > --- /dev/null
> > +++ b/arch/x86/entry/vdso/vgetrandom-chacha.S
> > @@ -0,0 +1,178 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2022 Jason A. Donenfeld <Jason@xxxxxxxxx>. All Rights Reserved.
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/frame.h>
> > +
> > +.section .rodata, "a"
> > +.align 16
> > +CONSTANTS: .octa 0x6b20657479622d323320646e61707865
> > +.text
> > +
> > +/*
> > + * Very basic SSE2 implementation of ChaCha20. Produces a given positive number
> > + * of blocks of output with a nonce of 0, taking an input key and 8-byte
> > + * counter. Importantly does not spill to the stack. Its arguments are:
> > + *
> > + * rdi: output bytes
> > + * rsi: 32-byte key input
> > + * rdx: 8-byte counter input/output
> > + * rcx: number of 64-byte blocks to write to output
> > + */
> > +SYM_FUNC_START(__arch_chacha20_blocks_nostack)
> > +
> > +.set output, %rdi
> > +.set key, %rsi
> > +.set counter, %rdx
> > +.set nblocks, %rcx
> > +.set i, %al
> > +/* xmm registers are *not* callee-save. */
> > +.set state0, %xmm0
> > +.set state1, %xmm1
> > +.set state2, %xmm2
> > +.set state3, %xmm3
> > +.set copy0, %xmm4
> > +.set copy1, %xmm5
> > +.set copy2, %xmm6
> > +.set copy3, %xmm7
> > +.set temp, %xmm8
> > +.set one, %xmm9
>
> An "interesting" x86_64 quirk: in SSE instructions, registers xmm0-xmm7 take
> fewer bytes to encode than xmm8-xmm15.
>
> Since 'temp' is used frequently, moving it into the lower range (and moving one
> of the 'copy' registers, which isn't used as frequently, into the higher range)
> decreases the code size of __arch_chacha20_blocks_nostack() by 5%.

That's a nice trick. Thank you very much for it.
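
For anyone reading along who wonders where the saving comes from, here is a
minimal sketch. It is not taken from the patch: it models only the
register-to-register form of a single SSE2 instruction, paddd (66 0F FE /r),
and the helper name encode_paddd is made up for illustration. Any operand in
xmm8-xmm15 needs a REX prefix to carry the fourth bit of its register number,
and that prefix costs one extra byte per instruction:

```python
def encode_paddd(dst: int, src: int) -> bytes:
    """Encode `paddd %xmm<src>, %xmm<dst>` (AT&T order), register form only."""
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("xmm register numbers must be in 0..15")
    rex = 0x40
    if dst >= 8:
        rex |= 0x04  # REX.R extends ModRM.reg (the destination)
    if src >= 8:
        rex |= 0x01  # REX.B extends ModRM.rm (the source)
    modrm = 0xC0 | ((dst & 7) << 3) | (src & 7)  # mod=11: both operands are registers
    out = bytearray([0x66])  # mandatory 66 prefix selecting the xmm (SSE2) form
    if rex != 0x40:
        out.append(rex)  # REX is emitted only when a high register is involved
    out += bytes([0x0F, 0xFE, modrm])
    return bytes(out)


for dst, src in [(0, 1), (0, 8), (8, 9)]:
    code = encode_paddd(dst, src)
    print(f"paddd %xmm{src}, %xmm{dst}: {code.hex(' ')} ({len(code)} bytes)")
```

With only xmm0-xmm7 involved the instruction is 4 bytes; as soon as either
operand is xmm8 or above it becomes 5. So the total size depends on how often
each register shows up in the instruction stream, which is why swapping a
frequently used register into the low range helps.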

Jason