Re: [PATCH 1/2] x86/random: Retry on RDSEED failure

From: Daniel P. Berrangé
Date: Tue Jan 30 2024 - 09:43:39 EST


On Tue, Jan 30, 2024 at 03:06:14PM +0100, Jason A. Donenfeld wrote:
> Is that an accurate summary? If it is, then the actual problem is that
> the hardware provided to solve this problem doesn't actually solve it
> that well, so we're caught deciding between guest-guest DoS (some
> other guest on the system uses all RDRAND resources) and cryptographic
> failure because of a malicious host creating a deterministic
> environment.

In a CoCo VM environment, a guest DoS is not a unique threat
scenario, as it is unrelated to confidentiality. Ensuring
fair subdivision of resources between competing guests is
just a general VM threat. There are many easy ways a host
admin can stop a guest making computational progress. Simply
not scheduling the guest vCPU threads is one. CoCo doesn't
try to solve this problem.

Preserving confidentiality is the primary aim of CoCo.

IOW, if the guest boot is stalled because the kernel is spinning
waiting on RDRAND to return data, that's fine. If the kernel
panics after "n" RDRAND failures in a row that's fine too. They
are both just yet another DoS scenario.

If the kernel ignores the RDRAND failure and lets the guest boot
with a degraded RNG state that is susceptible to attacks, that
would not be OK for CoCo.

> But I have two questions:
>
> 1) Is this CoCo VM stuff even real? Is protecting guests from hosts
> actually possible in the end? Is anybody doing this? I assume they
> are, so maybe ignore this question, but I would like to register my
> gut feeling that on the Intel platform this seems like an endless
> whack-a-mole problem like SGX.

It is real, but it is also not perfect. I expect it /will/ be an
endless whack-a-mole problem though.

Nonetheless, it is a significant layer of defence, as compared
to traditional VMs where the guest RAM is nothing more than a
'cat' command away from host admin exposure.

> 2) Can a malicious host *actually* create a fully deterministic
> environment? One that'll produce the same timing for the jitter
> entropy creation, and all the other timers and interrupts and things?
> I imagine the attestation part of CoCo means these VMs need to run on
> real Intel silicon and so it can't be single stepped in TCG or
> something, right? So is this problem actually a real one? And to what
> degree? Any good experimental research on this?
>
> Either way, if you're convinced RDRAND is the *only* way here, adding
> a `WARN_ON(is_in_early_boot)` to the RDRAND (but not RDSEED) failure
> path seems a fairly lightweight bandaid. I just wonder if the hardware
> people could come up with something more reliable that we wouldn't
> have to agonize over in the kernel.

If RDRAND failure is more of a theoretical problem than a practical
real world problem, I'd be inclined to just let the kernel loop on
RDRAND failure until it succeeds, with a WARN after 'n' iterations to
aid diagnosis of the stall in the unlikely event it did hit.
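
To illustrate the shape of that policy, here is a rough userspace
sketch, not kernel code. Everything named in it is hypothetical:
hw_rand_fn stands in for the RDRAND instruction, WARN_AFTER_RETRIES
is the 'n' above with an arbitrarily chosen value, and fprintf()
stands in for WARN. The only point is that the loop never returns
without data, so a failing source becomes a stall (DoS) rather than
a boot with degraded RNG state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for RDRAND: returns true and fills *out on
 * success, false on a (transient) failure. */
typedef bool (*hw_rand_fn)(uint64_t *out, void *ctx);

/* The 'n' from the text above; the value is an assumption. */
#define WARN_AFTER_RETRIES 10

/* Retry until the source succeeds. Never returns without data.
 * Warns once when the failure streak reaches WARN_AFTER_RETRIES.
 * Returns the number of failed attempts that preceded success. */
static unsigned long rand_retry_forever(hw_rand_fn fn, void *ctx,
                                        uint64_t *out)
{
    unsigned long failures = 0;

    assert(fn != NULL && out != NULL);

    while (!fn(out, ctx)) {
        failures++;
        if (failures == WARN_AFTER_RETRIES)
            fprintf(stderr,
                    "warning: random source failed %lu times in a row, "
                    "still retrying\n", failures);
    }
    return failures;
}

/* Test double: fails a fixed number of times, then succeeds. */
struct flaky_source {
    unsigned long failures_left;
    uint64_t value;
};

static bool flaky_rand(uint64_t *out, void *ctx)
{
    struct flaky_source *src = ctx;

    if (src->failures_left > 0) {
        src->failures_left--;
        return false;
    }
    *out = src->value;
    return true;
}
```

Calling rand_retry_forever(flaky_rand, &src, &v) with a source set up
to fail 12 times emits the warning once, at the 10th failure, and
then returns with v filled in. A source that never recovers just
spins, which is the "yet another DoS" case and not a confidentiality
problem.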

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|