Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails

From: H. Peter Anvin
Date: Tue Feb 06 2024 - 13:51:57 EST


On February 6, 2024 4:04:45 AM PST, "Dr. Greg" <greg@xxxxxxxxxxxx> wrote:
>On Tue, Feb 06, 2024 at 08:04:57AM +0000, Daniel P. Berrangé wrote:
>
>Good morning to everyone.
>
>> On Mon, Feb 05, 2024 at 07:12:47PM -0600, Dr. Greg wrote:
>> >
>> > Actually, I now believe there is clear evidence that the problem is
>> > indeed Intel specific. In light of our testing, it will be
>> > interesting to see what your 'AR' returns with respect to an official
>> > response from Intel engineering on this issue.
>> >
>> > One of the very bright young engineers collaborating on Quixote, who
>> > has been following this conversation, took it upon himself to do some
>> > very methodical engineering analysis on this issue. I'm the messenger
>> > but this is very much his work product.
>> >
>> > Executive summary is as follows:
>> >
>> > - No RDRAND depletion failures were observable with either the Intel
>> > or AMD hardware that was load tested.
>> >
>> > - RDSEED depletion is an Intel-specific issue; AMD's RDSEED
>> > implementation could not be provoked into failure.
>
>> My colleague ran a multithread parallel stress test program on his
>> 16core/2HT AMD Ryzen (Zen4 uarch) and saw an 80% failure rate in
>> RDSEED.
>
>Interesting datapoint, thanks for forwarding it along, so the issue
>shows up on at least some AMD platforms as well.
>
>On the 18 core/socket Intel Skylake platform, the parallelized
>depletion test forces RDSEED success rates down to around 2%. It
>would appear that your tests suggest that the AMD platform fares
>better than the Intel platform.
>
>So this is turning into even more of a morass, given that RDSEED
>depletion on AMD may be a function of the micro-architecture the
>platform is based on. The other variable is that our AMD test
>platform had a substantially higher core count per socket; one would
>assume that would result in higher depletion rates if the operative
>theory of a socket-common RNG infrastructure is valid.
>
>Unless AMD engineering understands the problem and has taken some type
>of action on higher core count systems to address the issue.
>
>Of course, the other variable may be how the parallelized stress test
>is conducted. If you would like to share your implementation source
>we could give it a twirl on the systems we have access to.
>
>The continuing operative question is whether or not any of this ever
>leads to an RDRAND failure.
>
>We've conducted some additional tests on the Intel platform where
>RDSEED depletion was driven as low as possible, ~1-2% success rates,
>while RDRAND depletion tests were being run simultaneously. No RDRAND
>failures have been noted.
>
>So the operative question remains: why worry about this if RDRAND is
>used as the randomness primitive?
>
>We haven't seen anything out of Intel yet on this; maybe AMD has a
>quantifying definition for 'astronomical' when it comes to RDRAND
>failures.
>
>The silence appears to be deafening out of the respective engineering
>camps... :-)
>
>> > - AMD's RDRAND/RDSEED implementation is significantly slower than
>> > Intel's.
>
>> Yes, we also noticed the AMD impl is horribly slow compared to
>> Intel's; we had to cut test iterations by 100x.
>
>The operative question is the impact of 'slow' in the absence of
>artificial stress tests.
>
>It would seem that a major question is what the engineering thought
>processes are, or were, regarding the throughput of the hardware
>randomness instructions.
>
>Intel documents the following randomness throughput rates:
>
>RDSEED: 3 Gbit/second
>RDRAND: 6.4 Gbit/second
>
>If there is the possibility of over-harvesting randomness, why not
>design the implementations to be clamped at some per-core value, such
>as a megabit/second? In the case of the documented RDSEED generation
>rates, that would allow the servicing of 3222 cores, if my math at
>0530 in the morning is correct.
>
>Would a core need more than 128 kilobytes of randomness, i.e., one
>second of output, to effectively seed a random number generator?
>
>A cynical conclusion would suggest engineering acquiescing to marketing
>demands... :-)
>
>> With regards,
>> Daniel
>
>Have a good day.
>
>As always,
>Dr. Greg
>
>The Quixote Project - Flailing at the Travails of Cybersecurity
> https://github.com/Quixote-Project

You do realize, right, that the "deafening silence" is due to the need for research and discussions on our part, and presumably AMD's.

In addition, quite frankly, your rather abusive language isn't exactly encouraging people to speak publicly based on immediately available, and therefore inherently incomplete and/or dated, information. As a result, we have had to move behind the scenes even those discussions we might otherwise have been able to have in public without IP concerns.

Yes, we work for Intel. No, we don't know every detail about every Intel chip ever created off the top of our heads, nor do we necessarily know the exact person who is *currently* in charge of the architecture of a particular unit, nor is it necessarily true that even *that* person knows all the exact properties of the behavior of their unit when integrated into a particular SoC, as units are modular by design.