[RFC/RFT PATCH 0/6] Improve get_random_u8() for use in randomize kstack

From: Ard Biesheuvel

Date: Thu Nov 27 2025 - 04:22:54 EST


From: Ard Biesheuvel <ardb@xxxxxxxxxx>

Ryan reports that get_random_u16() dominates the performance profile of
syscall entry when kstack randomization is enabled [0].

This is the reason many architectures rely on a counter instead, and
that, in turn, is the reason for the convoluted way the (pseudo-)entropy
is gathered and recorded in a per-CPU variable.

Let's try to make the get_random_uXX() fast path faster, and switch to
get_random_u8() so that we'll hit the slow path half as often, given that
each call then consumes one byte of batched entropy rather than two.
Then, wire it up in the syscall entry path, replacing the per-CPU
variable and making the logic at syscall exit redundant.

[0] https://lore.kernel.org/all/dd8c37bc-795f-4c7a-9086-69e584d8ab24@xxxxxxx/
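To illustrate why the narrower type halves the number of slow-path
hits, here is a minimal user-space sketch. It is not the code in
drivers/char/random.c: the batch size, the struct layout, the toy_*
and batch_* names and the xorshift refill are all assumptions made up
for the example, and the generator is not cryptographically secure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BATCH_BYTES 64u	/* hypothetical batch size, for illustration only */

struct batch {
	uint8_t  data[BATCH_BYTES];
	unsigned pos;		/* next unread byte; BATCH_BYTES means empty */
	unsigned refills;	/* number of slow-path hits so far */
	uint32_t state;		/* toy generator state */
};

/* Slow-path stand-in: refill from a toy xorshift32 (NOT a CSPRNG). */
static void batch_refill(struct batch *b)
{
	for (unsigned i = 0; i < BATCH_BYTES; i++) {
		b->state ^= b->state << 13;
		b->state ^= b->state >> 17;
		b->state ^= b->state << 5;
		b->data[i] = (uint8_t)b->state;
	}
	b->pos = 0;
	b->refills++;
}

static void batch_init(struct batch *b)
{
	memset(b, 0, sizeof(*b));
	b->pos = BATCH_BYTES;	/* start empty so the first draw refills */
	b->state = 0x12345678u;
}

/* Fast path: copy n bytes out, falling back to a refill when short. */
static void batch_take(struct batch *b, void *out, unsigned n)
{
	assert(n <= BATCH_BYTES);
	if (b->pos + n > BATCH_BYTES)
		batch_refill(b);
	memcpy(out, &b->data[b->pos], n);
	b->pos += n;
}

static uint8_t toy_random_u8(struct batch *b)
{
	uint8_t v;

	batch_take(b, &v, sizeof(v));
	return v;
}

static uint16_t toy_random_u16(struct batch *b)
{
	uint16_t v;

	batch_take(b, &v, sizeof(v));
	return v;
}

/* Count slow-path hits for 'calls' draws of 'width' (1 or 2) bytes. */
static unsigned refills_for(unsigned calls, unsigned width)
{
	struct batch b;

	batch_init(&b);
	for (unsigned i = 0; i < calls; i++) {
		if (width == 1)
			(void)toy_random_u8(&b);
		else
			(void)toy_random_u16(&b);
	}
	return b.refills;
}
```

With these made-up parameters, refills_for(6400, 2) counts 200 refills
while refills_for(6400, 1) counts 100, i.e., the same number of calls
reaches the slow path half as often.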

Cc: Kees Cook <kees@xxxxxxxxxx>
Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Jeremy Linton <jeremy.linton@xxxxxxx>
Cc: Catalin Marinas <Catalin.Marinas@xxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Jason A. Donenfeld <Jason@xxxxxxxxx>

Ard Biesheuvel (6):
  hexagon: Wire up cmpxchg64_local() to generic implementation
  arc: Wire up cmpxchg64_local() to generic implementation
  random: Use u32 to keep track of batched entropy generation
  random: Use a lockless fast path for get_random_uXX()
  random: Plug race in preceding patch
  randomize_kstack: Use get_random_u8() at entry for entropy
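
For readers unfamiliar with the general technique, the following is a
rough user-space sketch of how a compare-and-exchange can claim the next
byte of a batch without taking a lock. This is only a guess at the
general shape, not the code in the patches: packing a 32-bit generation
and a 32-bit position into one 64-bit word, the toy_* and ctl_* names,
the batch size and the use of C11 atomics in place of the kernel's
cmpxchg64_local() are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BATCH_BYTES 64u	/* hypothetical batch size */

struct toy_batch {
	/* assumed layout: high 32 bits generation, low 32 bits position */
	_Atomic uint64_t ctl;
	uint8_t data[BATCH_BYTES];
};

static inline uint64_t ctl_pack(uint32_t gen, uint32_t pos)
{
	return ((uint64_t)gen << 32) | pos;
}

static inline uint32_t ctl_gen(uint64_t ctl)
{
	return (uint32_t)(ctl >> 32);
}

static inline uint32_t ctl_pos(uint64_t ctl)
{
	return (uint32_t)ctl;
}

/*
 * Try to claim one byte without locking. Returns false when the batch
 * is stale (generation mismatch) or exhausted, in which case the caller
 * would have to fall back to a slow path (not shown here).
 */
static bool toy_try_get_u8(struct toy_batch *b, uint32_t cur_gen,
			   uint8_t *out)
{
	uint64_t old = atomic_load_explicit(&b->ctl, memory_order_relaxed);

	for (;;) {
		uint64_t new;
		uint8_t v;

		if (ctl_gen(old) != cur_gen || ctl_pos(old) >= BATCH_BYTES)
			return false;

		/* Read the candidate byte, then try to claim its slot. */
		v = b->data[ctl_pos(old)];
		new = ctl_pack(cur_gen, ctl_pos(old) + 1);

		/* On failure 'old' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak_explicit(&b->ctl, &old, new,
							  memory_order_relaxed,
							  memory_order_relaxed)) {
			*out = v;
			return true;
		}
	}
}
```

The point of the single 64-bit word in this sketch is that one
compare-and-exchange validates both the generation and the position at
once, so a concurrent update of either makes the claim fail and retry.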

 arch/Kconfig                       |  9 ++--
 arch/arc/include/asm/cmpxchg.h     |  3 ++
 arch/hexagon/include/asm/cmpxchg.h |  4 ++
 drivers/char/random.c              | 49 ++++++++++++++------
 include/linux/randomize_kstack.h   | 36 ++------------
 init/main.c                        |  1 -
 6 files changed, 49 insertions(+), 53 deletions(-)


base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
--
2.52.0.107.ga0afd4fd5b-goog