Re: [PATCH] random: align entropy_timer_state to cache line
From: Jason A. Donenfeld
Date: Wed Nov 30 2022 - 14:31:53 EST
On Wed, Nov 30, 2022 at 7:59 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> On Wed, Nov 30, 2022 at 11:04:23AM +0100, Jason A. Donenfeld wrote:
> > > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > > index 67558b95d531..2494e08c76d8 100644
> > > > --- a/drivers/char/random.c
> > > > +++ b/drivers/char/random.c
> > > > @@ -1262,7 +1262,7 @@ static void __cold entropy_timer(struct timer_list *timer)
> > > >  static void __cold try_to_generate_entropy(void)
> > > >  {
> > > >  	enum { NUM_TRIAL_SAMPLES = 8192, MAX_SAMPLES_PER_BIT = HZ / 15 };
> > > > -	struct entropy_timer_state stack;
> > > > +	struct entropy_timer_state stack ____cacheline_aligned;
> > >
> > > Several years ago, there was a whole thing about how __attribute__((aligned)) to
> > > more than 8 bytes doesn't actually work on stack variables in the kernel on x86,
> > > because the kernel only keeps the stack 8-byte aligned but gcc assumes it is
> > > 16-byte aligned. See
> > > https://lore.kernel.org/linux-crypto/20170110143340.GA3787@xxxxxxxxxxxxxxxxxxx/T/#t
> > >
> > > IIRC, nothing was done about it at the time.
> > >
> > > Has that been resolved in the intervening years?
> >
> > Maybe things are different for ____cacheline_aligned, which is 64 bytes.
> > Reading that thread, it looks like it was a case of trying to align the
> > stack to 16 bytes, but gcc assumed 16 bytes already while the kernel
> > only gave it 8. So gcc didn't think it needed to emit any code to align
> > it. Here, though, it's 64, and gcc certainly isn't assuming 64-byte
> > stack alignment.
> >
> > Looking at the codegen, gcc appears to be doing `rsp = (rsp & ~63) - 64`,
> > which appears correct.
>
> Well, if gcc thinks the stack is already 16-byte aligned, then it would be
> perfectly within its rights to do 'rsp = (rsp & ~47) - 64', right? You probably
> don't want to be relying on an implementation detail of gcc codegen...

The really pathological one would be ~48, which would clear only bits 4
and 5, the two bits that a 16-byte alignment assumption leaves unknown.
I can't imagine gcc or clang ever deciding to do that, but I guess they
could?
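
To make the mask arithmetic concrete, here is a minimal userspace sketch,
not kernel code and not actual compiler output. The helper names and the
sample rsp values are made up for illustration. It compares the ~63 mask
seen in the codegen with the hypothetical ~48 mask, and also shows that
~47 leaves bit 4 untouched:

```c
#include <assert.h>
#include <stdint.h>

/* Mask seen in the codegen above: clears bits 0-5, valid for any rsp. */
static inline uint64_t align64_full(uint64_t rsp)
{
	return rsp & ~UINT64_C(63);
}

/*
 * Hypothetical mask: clears only bits 4 and 5, trusting that
 * bits 0-3 are already zero (i.e. rsp is 16-byte aligned).
 */
static inline uint64_t align64_assume16(uint64_t rsp)
{
	return rsp & ~UINT64_C(48);
}

/* Returns nonzero if addr is a multiple of align (a power of two). */
static inline int is_aligned(uint64_t addr, uint64_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Illustrative self-check; the rsp values are arbitrary examples. */
static inline void check_masks(void)
{
	uint64_t rsp16 = UINT64_C(0x7ffc12345670); /* 16-byte aligned */
	uint64_t rsp8  = UINT64_C(0x7ffc12345678); /* only 8-byte aligned */

	/* With a truly 16-byte aligned rsp, both masks agree. */
	assert(align64_full(rsp16) == align64_assume16(rsp16));

	/* With an 8-byte aligned rsp, only the full mask still works. */
	assert(is_aligned(align64_full(rsp8), 64));
	assert(!is_aligned(align64_assume16(rsp8), 64));

	/* ~47 keeps bit 4, so it fails even for a 16-byte aligned rsp. */
	assert(!is_aligned(rsp16 & ~UINT64_C(47), 64));
}
```

Calling check_masks() from a small test program passes all four
assertions. So ~48 only breaks if the 16-byte assumption is false, which
is exactly the case on a kernel stack that is only 8-byte aligned.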

What would you recommend here? kmalloc'ing it instead? Keeping things
as is with ____cacheline_aligned, since this has always been broken,
and it's not the end of the world? Something else?

Jason