RE: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails

From: Reshetova, Elena
Date: Wed Jan 31 2024 - 03:17:14 EST


> On Tue, Jan 30, 2024 at 06:49:15PM +0100, Jason A. Donenfeld wrote:
> > On Tue, Jan 30, 2024 at 6:32 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> > >
> > > On 1/30/24 05:45, Reshetova, Elena wrote:
> > > >> You're the Intel employee so you can find out about this with much
> > > >> more assurance than me, but I understand the sentence above to be _way
> > > >> more_ true for RDRAND than for RDSEED. If your informed opinion is,
> > > >> "RDRAND failing can only be due to totally broken hardware"
> > > > No, this is not the case per Intel SDM. I think we can live under a simple
> > > > assumption that both of these instructions can fail not just due to broken
> > > > HW, but also due to enough pressure put into the whole DRBG construction
> > > > that supplies random numbers via RDRAND/RDSEED.
> > >
> > > I don't think the SDM is the right thing to look at for guidance here.
> > >
> > > Despite the SDM allowing it, we (software) need RDRAND/RDSEED failures
> > > to be exceedingly rare by design. If they're not, we're going to get
> > > our trusty torches and pitchforks and go after the folks who built the
> > > broken hardware.
> > >
> > > Repeat after me:
> > >
> > > Regular RDRAND/RDSEED failures only occur on broken hardware
> > >
> > > If it's nice hardware that's gone bad, then we WARN() and try to make
> > > the best of it. If it turns out that WARN() was because of a broken
> > > hardware _design_ then we go sharpen the pitchforks.
> > >
> > > Anybody disagree?
> >
> > Yes, I disagree. I made a trivial test that shows RDSEED breaks easily
> > in a busy loop. So at the very least, your statement holds true only
> > for RDRAND.
> >
> > But, anyway, if the statement "RDRAND failures only occur on broken
> > hardware" is true, then a WARN() in the failure path there presents no
> > DoS potential of any kind, and so that's a straightforward conclusion
> > to this discussion. However, that really hinges on "RDRAND failures
> > only occur on broken hardware" being a true statement.
>
> There's a useful comment here from an Intel engineer
>
> https://web.archive.org/web/20190219074642/https://software.intel.com/en-us/blogs/2012/11/17/the-difference-between-rdrand-and-rdseed
>
> "RDRAND is, indeed, faster than RDSEED because it comes
> from a hardware-based pseudorandom number generator.
> One seed value (effectively, the output from one RDSEED
> command) can provide up to 511 128-bit random values
> before forcing a reseed"
>
> We know we can exhaust RDSEED directly pretty trivially. Making your
> test program run in parallel across 20 cpus, I got a mere 3% success
> rate from RDSEED.
>
> If RDRAND is reseeding every 511 values, RDRAND output would have
> to be consumed significantly faster than RDSEED in order that the
> reseed will happen frequently enough to exhaust the seeds.
>
> This looks pretty hard, but maybe with a large enough CPU count
> this will be possible in extreme load ?
>
> So I'm not convinced we can blindly wave away RDRAND failures as
> guaranteed to mean broken hardware.

This matches both my own understanding (I do have a cryptography background
and understand how cryptographic RNGs work) and the official public
documentation that Intel has published on this matter.
The physical entropy source has limited throughput anyhow, so by putting
enough pressure on the whole construction you should be able to make
RDRAND fail: if the intermediate AES-CBC-MAC extractor/conditioner does
not get its minimum entropy input rate, it won't produce a proper seed
for the AES-CTR DRBG.
Of course, the exact details and numbers can vary between generations
of the Intel DRNG implementation and the platforms it runs on, so be
careful about relying on concrete numbers.

That said, I have taken an AR (action item) to follow up internally on what
can be done to improve our situation with RDRAND/RDSEED. But I would still
like to finish the discussion on what people think should be done in
the meantime, keeping in mind that the problem is not Intel-specific,
even though we Intel people brought it up for public discussion first.
The old saying still applies: "Please don’t shoot the messenger" ))
We are genuinely trying to be open about these things and to start a
public discussion.

Best Regards,
Elena.

>
> With regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|