Re: [PATCH] random: align entropy_timer_state to cache line

From: Eric Biggers
Date: Wed Nov 30 2022 - 14:51:47 EST


On Wed, Nov 30, 2022 at 08:31:33PM +0100, Jason A. Donenfeld wrote:
> On Wed, Nov 30, 2022 at 7:59 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> >
> > On Wed, Nov 30, 2022 at 11:04:23AM +0100, Jason A. Donenfeld wrote:
> > > > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > > > index 67558b95d531..2494e08c76d8 100644
> > > > > --- a/drivers/char/random.c
> > > > > +++ b/drivers/char/random.c
> > > > > @@ -1262,7 +1262,7 @@ static void __cold entropy_timer(struct timer_list *timer)
> > > > >  static void __cold try_to_generate_entropy(void)
> > > > >  {
> > > > >  	enum { NUM_TRIAL_SAMPLES = 8192, MAX_SAMPLES_PER_BIT = HZ / 15 };
> > > > > -	struct entropy_timer_state stack;
> > > > > +	struct entropy_timer_state stack ____cacheline_aligned;
> > > >
> > > > Several years ago, there was a whole thing about how __attribute__((aligned)) to
> > > > more than 8 bytes doesn't actually work on stack variables in the kernel on x86,
> > > > because the kernel only keeps the stack 8-byte aligned but gcc assumes it is
> > > > 16-byte aligned. See
> > > > https://lore.kernel.org/linux-crypto/20170110143340.GA3787@xxxxxxxxxxxxxxxxxxx/T/#t
> > > >
> > > > IIRC, nothing was done about it at the time.
> > > >
> > > > Has that been resolved in the intervening years?
> > >
> > > Maybe things are different for ____cacheline_aligned, which is 64 bytes.
> > > Reading that thread, it looks like it was a case of trying to align the
> > > stack to 16 bytes, but gcc assumed 16 bytes already while the kernel
> > > only gave it 8. So gcc didn't think it needed to emit any code to align
> > > it. Here, though, it's 64, and gcc certainly isn't assuming 64-byte
> > > stack alignment.
> > >
> > > Looking at the codegen, gcc appears to be doing `rsp = (rsp & ~63) - 64`,
> > > which appears correct.
> >
> > Well, if gcc thinks the stack is already 16-byte aligned, then it would be
> > perfectly within its rights to do 'rsp = (rsp & ~47) - 64', right? You probably
> > don't want to be relying on an implementation detail of gcc codegen...
>
> The really pathological one would be ~48, which would just clear those
> two extra bits. I can't imagine gcc or clang ever deciding to do that.
> But I guess they could?
>
> What would you recommend here? kmalloc'ing it instead? Keeping things
> as is with ____cacheline_aligned, since this has always been broken,
> and it's not the end of the world? Something else?

Well, other places in the kernel do the alignment manually:

u8 __stack[sizeof(struct entropy_timer_state) + SMP_CACHE_BYTES - 1];
struct entropy_timer_state *stack = (void *)PTR_ALIGN(__stack, SMP_CACHE_BYTES);

It's silly, but I'm not aware of a better option.
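
For anyone who wants to poke at the pattern outside the kernel, here is a rough
userspace sketch of the same idea. CACHE_BYTES, struct timer_state, ptr_align()
and aligned_on_stack_demo() are all made-up stand-ins for SMP_CACHE_BYTES,
struct entropy_timer_state and PTR_ALIGN(); this is an illustration of the
technique, not the code that should go into random.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64 stands in for the kernel's SMP_CACHE_BYTES. */
#define CACHE_BYTES 64

/* Hypothetical stand-in for struct entropy_timer_state. */
struct timer_state {
	unsigned long entropy;
	unsigned int samples;
	unsigned int samples_per_bit;
};

/* Round p up to the next multiple of align (align must be a power of two). */
static void *ptr_align(void *p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (void *)(((uintptr_t)p + align - 1) & ~(align - 1));
}

/*
 * Over-allocate by CACHE_BYTES - 1 bytes and carve the aligned object out of
 * the buffer, so nothing depends on how the compiler aligns the stack frame.
 * Returns 1 if the object is aligned and lies entirely inside the buffer.
 *
 * Caveat: the kernel builds with -fno-strict-aliasing; strictly conforming
 * ISO C does not bless accessing a char buffer through a struct pointer.
 */
static int aligned_on_stack_demo(void)
{
	unsigned char buf[sizeof(struct timer_state) + CACHE_BYTES - 1];
	struct timer_state *state = ptr_align(buf, CACHE_BYTES);

	memset(state, 0, sizeof(*state));
	state->samples = 1;

	if ((uintptr_t)state % CACHE_BYTES != 0)
		return 0;
	if ((unsigned char *)state < buf)
		return 0;
	if ((unsigned char *)(state + 1) > buf + sizeof(buf))
		return 0;
	return state->samples == 1;
}
```

The extra CACHE_BYTES - 1 bytes cover the worst case, where the buffer starts
one byte past a cache line boundary and the pointer has to be bumped forward by
63 bytes.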

- Eric