RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Reshetova, Elena
Date: Thu May 09 2019 - 03:02:34 EST



> * Reshetova, Elena <elena.reshetova@xxxxxxxxx> wrote:
>
> > > * Reshetova, Elena <elena.reshetova@xxxxxxxxx> wrote:
> > >
> > > > CONFIG_PAGE_TABLE_ISOLATION=n:
> > > >
> > > > base: Simple syscall: 0.0510 microseconds
> > > > get_random_bytes(4096 bytes buffer): Simple syscall: 0.0597 microseconds
> > > >
> > > > So, pure speed wise get_random_bytes() with 1 page per-cpu buffer wins.
> > >
> > > It still adds +17% overhead to the system call path, which is sad.
> > > Why is it so expensive?
> >
> > I guess I can experiment further with increasing the buffer size and/or
> > using HW acceleration (so far I have mostly played around with different
> > rdrand paths).
> >
> > What would be an acceptable overhead, approximately (so that I know how
> > much I need to squeeze this thing)?
>
> As much as possible? No idea, I'm sad about anything that is more than
> 0%, and I'd be *really* sad about anything more than say 1-2%.

Ok, understood.

>
> I find it ridiculous that even with 4K blocked get_random_bytes(), which
> gives us 32k bits, which with 5 bits should amortize the RNG call to
> something like "once per 6553 calls", we still see 17% overhead? It's
> either a measurement artifact, or something doesn't compute.

If you check what happens underneath get_random_bytes(), there is a fair
amount of work going on, including reseeding the CRNG if the reseeding
interval has passed (see _extract_crng()). It even attempts to stir in more
entropy from rdrand if available:

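(Paraphrasing _extract_crng() from drivers/char/random.c of roughly that
vintage; the exact code differs between kernel versions.)

static void _extract_crng(struct crng_state *crng,
			  __u8 out[CHACHA_BLOCK_SIZE])
{
	unsigned long v, flags;

	/* Reseed from the input pool if the reseed interval has expired. */
	if (crng_ready() &&
	    (time_after(crng_global_init_time, crng->init_time) ||
	     time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL)))
		crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL);

	spin_lock_irqsave(&crng->lock, flags);
	/* Stir in extra entropy from rdrand when the arch provides it. */
	if (arch_get_random_long(&v))
		crng->state[14] ^= v;
	chacha20_block(&crng->state[0], out);
	if (crng->state[12] == 0)
		crng->state[13]++;
	spin_unlock_irqrestore(&crng->lock, flags);
}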

I will now look into this whole construction more carefully to investigate.
I also didn't optimize anything yet (I take 8 bits at a time for the offset),
but such small optimizations won't take the performance impact from 17% down
to 2%, so they are pointless for now; a more radical change is needed.
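
For reference, a minimal sketch of the per-cpu buffering approach I am
measuring (names and details here are illustrative only, not the actual
patch; preemption/reentrancy handling is omitted):

#include <linux/percpu.h>
#include <linux/random.h>
#include <asm/page.h>

struct rnd_buf {
	u8 bytes[PAGE_SIZE];
	unsigned int pos;
};

/* Start "empty" so the first call triggers a refill. */
static DEFINE_PER_CPU(struct rnd_buf, stack_rnd_buf) = { .pos = PAGE_SIZE };

/* Hand out one random byte per syscall for the stack offset. */
static u8 random_stack_offset_byte(void)
{
	struct rnd_buf *buf = this_cpu_ptr(&stack_rnd_buf);

	if (buf->pos >= PAGE_SIZE) {
		/* The expensive call, amortized over PAGE_SIZE syscalls. */
		get_random_bytes(buf->bytes, PAGE_SIZE);
		buf->pos = 0;
	}
	return buf->bytes[buf->pos++];
}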

Best Regards,
Elena.