RE: x86/random: Speculation to the rescue

From: Thomas Gleixner
Date: Tue Oct 15 2019 - 17:50:43 EST


David,

On Tue, 1 Oct 2019, David Laight wrote:
> From: Linus Torvalds
> > Sent: 30 September 2019 17:16
> >
> > On Mon, Sep 30, 2019 at 6:16 AM Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> > >
> > > Which is to say, I'm still worried that people with deep access to the
> > > implementation details of a CPU might be able to reverse engineer what
> > > a jitter entropy scheme produces. This is why I'd be curious to see
> > > the results when someone tries to attack a jitter scheme on a fully
> > > open, simple architecture such as RISC-V.
> >
> > Oh, I agree.
> >
> > One of the reasons I didn't like some of the other jitter entropy
> > things was that they seemed to rely _entirely_ on just purely
> > low-level CPU unpredictability. I think that exists, but I think it
> > makes for problems for really simple cores.
> >
> > Timing over a bigger thing and an actual interrupt (even if it's
> > "just" a timer interrupt, which is arguably much closer to the CPU and
> > has a much higher likelihood of having common frequency domains with
> > the cycle counter etc) means that I'm pretty damn convinced that a big
> > complex CPU will absolutely see issues, even if it has big caches.
>
> Agreed, you need something that is actually non-deterministic.
> While 'speculation' is difficult to predict, it is actually fully deterministic.

I certainly agree with Linus that simple architectures could have more or
less predictable, or at least searchable, behaviour. For complex x86 CPUs,
I tend to disagree.

Even the Intel architects cannot explain why the following scenario is not
deterministic at all:

Single CPU
No NMIs
No MCEs
No DMAs in the background, nothing.
CPU frequency is identical to TSC frequency

volatile int foo;

local_irq_disable();
start = rdtscp();
for (i = 0; i < 100000; i++)
	foo++;
end = rdtscp();
local_irq_enable();

Repeat that loop as often as you wish and observe the end - start
delta. You'll see

min <= delta <= N * min

where N is something >= 2. The actual value of N depends on the
microarchitecture, but it is not identical on two systems and not even
identical on the same system after boot and 1e6 iterations of the above.

Aside from the fact that N is insanely big, there is absolutely no pattern
in the delta value, even over a large number of runs.

> Until you get some perturbation from an outside source the cpu state
> (including caches and DRAM) is likely to be the same on every boot.

See above and read Nicholas' paper. It's simply not that likely on anything
halfway modern.

> For a desktop (etc) PC booting from a disk (even SSD) you'll get some variation.
> Boot an embedded system from onboard flash and every boot could
> well be the same (or one of a small number of results).
>
> Synchronising a signal between frequency domains might generate
> some 'randomness', but maybe not if both come from the same PLL.
>
> Even if there are variations, they may not be large enough to give
> a lot of variations in the state.
> The variations between systems could also be a lot larger than the
> variations within a system.

The variations between systems are going to be larger, as even minimal
tolerances in the components have an influence.

But even between two boots on a 'noiseless' embedded system, factors like
temperature, PLL lock times, swing-in times of voltage regulators and other
tiny details create non-deterministic behaviour. In my former life as a
hardware engineer I had to analyze such issues deeply as they create
serious trouble for some application scenarios, but those systems were
based on very trivial and truly deterministic silicon parts. No commodity
hardware vendor will ever go the extra mile to address these things, as the
effort required to get them under control is exponential compared to the
effect.

Whether that's enough to create true entropy is a different question, but I
do not agree with the upfront dismissal of a potentially useful entropy
source.

I'm in no way saying that this should be used as the sole source of
entropy, but I definitely want to explore the inherent non-determinism of
modern OoO machines further.

The results are way too interesting to ignore, and the collection is
amazingly fast, at least with the algorithm I used in my initial post which
started this whole discussion.

I let that thing run on a testbox for the last two weeks while I was on
vacation and gathered 32768 random bits via that debugfs hack every 100ms,
i.e. a total of 1.2e7 samples amounting to ~4e11 bits. The analysis is
still running, but so far it holds up.

Thanks,

tglx