Re: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall
From: Theodore Ts'o
Date: Tue Apr 16 2019 - 11:45:24 EST
So a couple of comments; I wasn't able to find the full context for
this patch, and looking over the thread on kernel-hardening from late
February still left me confused about exactly what attacks this would help
us protect against (since this isn't my area and I didn't take the
time to read all of the links to slide decks, etc.)
So I'm not going to comment on the utility of this patch, but just on
the random number generator issues. If you're only going to be using
the low 8 bits of the output of prandom_u32(), then even if two
adjacent calls to prandom_u32() (for which only the low 8 bits are
revealed) can be used to precisely identify which set of 2**120
potential prandom states could have generated that pair of outputs, it's
still going to take a lot of calls before you'd be able to figure out
the prandom's internal state.
It seems, though, that we're assuming the attacker has an
arbitrary ability to get the low bits of the stack, so *if* that's
true, then eventually, you'd be able to get enough samples that you
could reverse engineer the prandom state. This could take long enough
that the process will have gotten rescheduled to another CPU, and
since the prandom state is per-cpu, that adds another wrinkle.
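As a rough illustration of the counting argument above: if each call
leaks 8 bits of output and the generator holds 128 bits of state (which
is what the 2**120 figure suggests), an attacker needs at least
128 / 8 = 16 observations before the state can even in principle be
pinned down. The helper below is hypothetical and only computes that
information-theoretic floor; a practical attack may well need more
samples, and has to collect them all from the same CPU.

```c
/*
 * Lower bound on the number of leaked outputs needed to determine a PRNG
 * state of `state_bits` bits when each call reveals `leaked_bits` bits.
 * Assumes leaked_bits > 0. Illustrative helper only, not a kernel API.
 */
static unsigned int min_samples(unsigned int state_bits,
                                unsigned int leaked_bits)
{
    /* Ceiling division: one sample removes at most leaked_bits of doubt. */
    return (state_bits + leaked_bits - 1) / leaked_bits;
}
```

With the numbers assumed here, min_samples(128, 8) evaluates to 16.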
> > So the argument against using TSC directly was that it might be easy to
> > guess most of the TSC bits in timing attack. But IIRC there is fairly
> > solid evidence that the lowest TSC bits are very hard to guess and might
> > in fact be a very good random source.
> >
> > So what one could do, is for each invocation mix in the low (2?) bits of
> > the TSC into a per-cpu/task PRNG state. By always adding some fresh
> > entropy it would become very hard indeed to predict the outcome, even
> > for otherwise 'trivial' PRNGs.
>
> You could just feed 8 bits of TSC into a CRC. Or even xor the
> entire TSC over a CRC state and then cycle it at least 6 bits.
> Probably doesn't matter which CRC - but you may want one that is
> cheap in software. Even a 16bit CRC might be enough.
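A minimal userspace sketch of the idea quoted above: fold a couple of
low cycle-counter bits into a small per-cpu or per-task PRNG state on
every call, then step the generator. Everything here is an assumption
made for illustration: the struct and function names are hypothetical,
xorshift32 stands in for whatever cheap PRNG (or CRC) would actually be
chosen, and the counter value is passed in by the caller rather than
read with RDTSC so that the example stays portable.

```c
#include <stdint.h>

/* Hypothetical per-cpu (or per-task) state; the layout is illustrative. */
struct offset_rng {
    uint32_t state;
};

/* One xorshift32 step; a stand-in for any cheap PRNG. */
static uint32_t xorshift32(uint32_t x)
{
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/*
 * Mix the low 2 bits of a cycle-counter reading into the state, advance
 * the generator, and return the low 8 bits as the stack offset.
 */
static uint8_t next_stack_offset(struct offset_rng *rng, uint64_t tsc)
{
    uint32_t s = rng->state ^ (uint32_t)(tsc & 0x3);

    if (s == 0)     /* xorshift32 would stay stuck at zero forever */
        s = 1;

    s = xorshift32(s);
    rng->state = s;
    return (uint8_t)(s & 0xff);
}
```

The point of the design is that an attacker who has recovered the state
still has to guess the freshly mixed-in counter bits on every call; how
many bits to mix in, and whether 2 is enough, is left open here.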
Do we only care about x86 in this discussion? Given "x86/entry/64",
I'm guessing the answer is that we're not trying to worry about how to
protect other architectures, like say ARM, that don't have a TSC?
If we do care about architectures w/o a TSC, how much cost are we
willing to pay as far as system call overhead is concerned?
If it's x86 specific, maybe the simplest thing to do is to use RDRAND
if it exists, and fall back to something involving a TSC and maybe
prandom_u32 (depending on how bad you think the stack leak is going to
be) if RDRAND isn't available?
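A hedged sketch of that selection logic, written as portable C so that
it can be compiled anywhere. The hardware source is modelled as a
hypothetical callback (a real one would wrap RDRAND, which can report
failure), and the fallback simply XORs the counter with a PRNG output;
that particular combination is an assumption, not something the patch
or this thread specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical hook for a hardware RNG such as RDRAND. Returns false when
 * the instruction is unavailable or reports failure.
 */
typedef bool (*hw_random_fn)(uint32_t *out);

/* Pick an 8-bit stack offset: hardware RNG first, software mix otherwise. */
static uint8_t pick_stack_offset(hw_random_fn hw, uint64_t tsc,
                                 uint32_t prng_out)
{
    uint32_t v;

    if (hw != NULL && hw(&v))
        return (uint8_t)(v & 0xff);

    /* Fallback (assumed combination): XOR counter bits with PRNG output. */
    return (uint8_t)((tsc ^ prng_out) & 0xff);
}

/* Demonstration stand-ins for the hardware hook, not real RDRAND wrappers. */
static bool hw_fixed(uint32_t *out)
{
    *out = 0xdeadbeef;
    return true;
}

static bool hw_failing(uint32_t *out)
{
    (void)out;
    return false;
}
```

Calling pick_stack_offset(hw_failing, tsc, prng_out) or passing NULL for
the hook exercises the fallback path; hw_fixed exists only to show the
hardware path being taken.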
- Ted