Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails

From: Dr. Greg
Date: Thu Feb 01 2024 - 16:11:20 EST


On Thu, Feb 01, 2024 at 11:08:09AM +0000, Daniel P. Berrangé wrote:

Hi Dan, thanks for the thoughts.

> On Thu, Feb 01, 2024 at 03:54:51AM -0600, Dr. Greg wrote:
> > I suspect that the achievable socket core count cannot effectively
> > overwhelm the 1022x amplification factor inherent in the design of the
> > RDSEED based seeding of RDRAND.

> In testing I could get RDSEED down to < 3% success rate when
> running on 20 cores in parallel on a laptop class i7. If that
> failure rate can be improved by a little more than one order
> of magnitude to 0.1% we're starting to get to the point where
> it might be enough to make RDRAND re-seed fail.
>
> Intel's Sierra Forest CPUs are said to have a variant with 288
> cores per socket, which is an order of magnitude larger. It is
> conceivable this might be large enough to demonstrate RDRAND
> failure in extreme load. Then again who knows what else has
> changed that might alter the equation, maybe the DRBG is also
> better / faster. Only real world testing can say for sure.
> One thing is certain though, core counts per socket keep going
> up, so the potential worst case load on RDSEED will increase...

Indeed, that would seem to be the important and operative question
that Intel could answer, maybe Dave and Elena will be able to provide
some guidance.

Until someone can actually demonstrate a sustained RDRAND depletion
attack we don't have an issue, only a lot of wringing of hands and
other handwaving on what we should do.

The thing that intrigues me is that we have two AMD engineers
following this, do you guys have any comments, reflections? Unless I
misunderstand, SEV-SNP has the same challenges and issues.

As of late you guys have been delivering higher core counts that would
make your platform more susceptible. Does your hardware design not
have a socket common RNG architecture that makes RDSEED vulnerable to
socket adversarial depletion? Is this a complete non-issue in
practice?

Big opportunity here to proclaim: "Just buy AMD"... :-)

> > We will see if Elena can come up with what Intel engineering's
> > definition of 'astronomical' is.. :-)
> >
> > > There's a special case with Confidential Compute VM's, since the
> > > assumption is that you want to protect against even a malicious
> > > hypervisor who could theoretically control all other sources of
> > > timing uncertainty. And so, yes, in that case, the only thing we
> > > can do is Panic if RDRAND fails.
> >
> > Indeed.
> >
> > The bigger question, which I will respond to Elena with, is how much
> > this issue calls the entire premise of confidential computing into
> > question.

> A denial of service (from a panic on RDRAND fail) doesn't undermine
> confidential computing. Guest data confidentiality is maintained by
> panicking on RDRAND failure and DoS protection isn't a threat that CC
> claims to be able to mitigate in general.

Yes, if there is a problem with RDRAND we have a CoCo solution, full
stop.

The issue that I was raising with Elena is more generic, to wit:

Her expressed concern is that a code construct looking something like this,
with rdrand() returning 0 on success:

	for (i = 0; i < 9; ++i) {
		if (!rdrand(&seed))
			break;			/* got a seed, stop retrying */
		sleep(some time);		/* back off before the next attempt */
	}
	if (i == 9)				/* every attempt failed */
		BUG("No entropy");

	do_something_with(seed);

could be sufficiently manipulated by a malicious hypervisor in a TDX
environment so as to compromise its functionality.

If this level of control is indeed possible, given the long history of
timing and side-channel attacks against cryptography, this would seem
to pose significant questions as to whether or not CoCo can deliver on
its stated goals.

> With regards,
> Daniel

Have a good evening.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
https://github.com/Quixote-Project