Re: Linux guest kernel threat model for Confidential Computing

From: Christophe de Dinechin
Date: Wed Feb 08 2023 - 11:59:51 EST



On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote...
> On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
>>
>> The CC threat model does change the traditional linux trust boundary regardless of
>> what mitigations are used (kernel config vs. runtime filtering). Because for the
>> drivers that CoCo guest happens to need, there is no way to fix this problem by
>> either of these mechanisms (we cannot disable the code that we need), unless somebody
>> writes a totally new set of coco specific drivers (who needs another set of
>> CoCo specific virtio drivers in the kernel?).
>
> It sounds like you want such a set of drivers, why not just write them?
> We have zillions of drivers already, it's not hard to write new ones, as
> it really sounds like that's exactly what you want to have happen here
> in the end as you don't trust the existing set of drivers you are using
> for some reason.

In the CC approach, the hypervisor is considered hostile. The rest of the
system is not changed much. If we pass through some existing NIC, we would
prefer to use the existing driver for that NIC rather than reinvent
it. However, we also need to consider the possibility that someone
maliciously replaced the actual NIC with a cleverly crafted software
emulator designed to cause the driver to leak confidential data.


>> So, if the path is to be able to use existing driver kernel code, then we need:
>
> Wait, again, why? Why not just have your own? That should be the
> simplest thing overall. What's wrong with that?

That would require duplicating the majority of hardware drivers.



>> 1. these selective CoCo guest required drivers (small set) needs to be hardened
>> (or whatever word people prefer to use here), which only means that in
>> the presence of malicious host/hypervisor that can manipulate pci config space,
>> port IO and MMIO, these drivers should not expose CC guest memory
>> confidentiality or integrity (including via privilege escalation into CC guest).
>
> Again, stop it please with the "hardened" nonsense, that means nothing.
> Either the driver has bugs, or it doesn't. I welcome you to prove it
> doesn't :)

In a non-CC scenario, a driver is correct if, among other things, it does
not leak kernel data to user space. However, it assumes that PCI devices are
working correctly and according to spec.

In a CC scenario, an additional condition for correctness is that the driver
must not leak data from the trusted environment to the host. The threat model
assumes that a _virtual_ PCI device can be implemented on the host side
specifically to cause an existing driver to leak secrets to the host.

It is this additional condition that we are talking about.
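
To make the distinction concrete, here is a minimal user-space sketch, not
kernel code. Every name in it (guest_mem, fake_desc, copy_out_trusting,
copy_out_checked) is hypothetical and only serves the illustration. It models
a driver that copies a device-reported number of bytes into a buffer shared
with the host. The first variant never touches memory outside the guest
structure, so it would pass for "correct" with a well-behaved device, yet a
hostile length makes it copy the adjacent secret into host-visible memory.
The second variant rejects any length beyond the packet area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_AREA    64u  /* bytes the device may legitimately ask for */
#define SECRET_AREA 16u  /* unrelated confidential data stored right after */

/* Hypothetical private guest memory: packet area followed by a secret. */
struct guest_mem { uint8_t bytes[PKT_AREA + SECRET_AREA]; };

/* Hypothetical descriptor whose length field is written by the device. */
struct fake_desc { uint32_t len; };

/* Trusts the device: clamps only to the sizes of the buffers involved. */
size_t copy_out_trusting(uint8_t *shared, size_t shared_cap,
                         const struct guest_mem *g, const struct fake_desc *d)
{
    size_t n;

    assert(shared && g && d);
    n = d->len;
    if (n > sizeof(g->bytes))
        n = sizeof(g->bytes);
    if (n > shared_cap)
        n = shared_cap;
    memcpy(shared, g->bytes, n);  /* may include the secret */
    return n;
}

/* Treats the length as hostile input: anything beyond PKT_AREA is refused. */
int copy_out_checked(uint8_t *shared, size_t shared_cap,
                     const struct guest_mem *g, const struct fake_desc *d,
                     size_t *copied)
{
    uint32_t len;

    assert(shared && g && d && copied);
    len = d->len;                 /* read the device value once */
    if (len > PKT_AREA || len > shared_cap)
        return -1;                /* nothing is copied on rejection */
    memcpy(shared, g->bytes, len);
    *copied = len;
    return 0;
}
```

Neither variant crashes. The difference only shows up once the device side
is considered an attacker, which is exactly the additional condition.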

Think of this as somewhat similar to the introduction of IOMMUs, which
brought a new condition impacting _the entire kernel_: you had to make sure
that your DMA operations and the IOMMU configuration were in agreement. The
situation here is comparable: CC forbids some specific operations the same
way an IOMMU does, except that instead of stray DMAs, it is stray accesses
from the host.

Note that, as James Bottomley pointed out, a crash is not seen as a failure
of the CC model, unless it leads to a subsequent leak of confidential data.
Denial of service, through crash or otherwise, is so easy to do from host or
hypervisor side that it is entirely out of scope.


>
>> Please note that this only applies to a small set (in tdx virtio setup we have less
>> than 10 of them) of drivers and does not present invasive changes to the kernel
>> code. There is also an additional core pci/msi code that is involved with discovery
>> and configuration of these drivers, this code also falls into the category we need to
>> make robust.
>
> Again, why wouldn't we all want "robust" drivers? This is not anything
> new here,

What is new is that CC requires drivers to be "robust" against a new kind of
attack "from below" (i.e. from the [virtual] hardware side).
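
One example of such an attack "from below" is a value that changes between
the moment the driver validates it and the moment the driver uses it. The
sketch below is a user-space model under stated assumptions: hostile_reg,
reg_read and the two pick_index_* helpers are hypothetical names, and the
"register" simply replays a scripted sequence of values to stand in for a
host that rewrites shared state at will. In kernel code, the single read
would typically be done with READ_ONCE() into a local variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8u

/* Hypothetical host-controlled register: each read may return a new value. */
struct hostile_reg {
    const uint32_t *script;  /* values the host will present, in order */
    size_t n;                /* number of scripted values, must be > 0 */
    size_t pos;              /* index of the next value to present */
};

uint32_t reg_read(struct hostile_reg *r)
{
    uint32_t v;

    assert(r && r->n > 0 && r->pos < r->n);
    v = r->script[r->pos];
    if (r->pos + 1 < r->n)
        r->pos++;            /* the last value repeats once the script ends */
    return v;
}

/* Flawed: checks one read, then uses a second read that may differ. */
uint32_t pick_index_double_fetch(struct hostile_reg *r)
{
    if (reg_read(r) >= TABLE_SIZE)
        return 0;
    return reg_read(r);
}

/* Safer: reads once into a local, then checks and uses that same copy. */
uint32_t pick_index_single_fetch(struct hostile_reg *r)
{
    uint32_t idx = reg_read(r);

    return idx < TABLE_SIZE ? idx : 0;
}
```

With real hardware following the spec, both helpers behave identically. Only
a device that is actively lying makes the first one return an index outside
the table.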

> all you are somehow saying is that you are changing the thread
> model that the kernel "must" support. And for that, you need to then
> change the driver code to support that.

What is being argued is that CC is not robust unless we block host-side
attacks that can cause the guest to leak data to the host.

>
> So again, why not just have your own drivers and driver subsystem that
> meets your new requirements? Let's see what that looks like and if
> there even is any overlap between that and the existing kernel driver
> subsystems.

Would a "CC-aware PCI" subsystem fit your definition?

>
>> 2. rest of non-needed drivers must be disabled. Here we can argue about what
>> is the correct method of doing this and who should bare the costs of enforcing it.
>
> You bare that cost.

I believe the CC community understands that.

The first step before introducing modifications in the drivers is getting an
understanding of why we think that CC introduces a new condition for
robustness.

We will not magically turn all drivers into CC-safe drivers. It will take a
lot of time, and the patches are likely to come from the CC community. At
this stage, though, the question is: "do you understand the problem we are
trying to solve?". I hope that my IOMMU analogy above helps.


> Or you get a distro to do that.

The best a distro can do is ship a minified kernel tuned for CC use-cases, or
enable a hypothetical CONFIG_COCO_SAFETY configuration option.
A distro cannot decide what work goes behind CONFIG_COCO_SAFETY.


> That's not up to us in the kernel community, sorry, we give you the option
> to do that if you want to, that's all that we can do.

I hope that the explanations above will help you change your mind on that
statement. This cannot be a config-only or custom-drivers-only solution
(or maybe you can convince us it can ;-)

>
>> But from pure security point of view: the method that is simple and clear, that
>> requires as little maintenance as possible usually has the biggest chance of
>> enforcing security.
>
> Again, that's up to your configuration management. Please do it, tell
> us what doesn't work and send changes if you find better ways to do it.
> Again, this is all there for you to do today, nothing for us to have to
> do for you.
>
>> And given that we already have the concept of authorized devices in Linux,
>> does this method really brings so much additional complexity to the kernel?
>
> No idea, you tell us! :)
>
> Again, I recommend you just having your own drivers, that will allow you
> to show us all exactly what you mean by the terms you keep using. Why
> not just submit that for review instead?
>
> good luck!
>
> greg k-h


--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)