Re: Linux guest kernel threat model for Confidential Computing
From: Dr. David Alan Gilbert
Date: Thu Jan 26 2023 - 08:59:18 EST
* Reshetova, Elena (elena.reshetova@xxxxxxxxx) wrote:
> > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > Replying only to the not-so-far addressed points.
> > >
> > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > Hi Greg,
> >
> > <...>
> >
> > > > > 3) All the tools are open-source and everyone can start using them right
> > > > > away, even without any special HW (readme has description of what is needed).
> > > > > Tools and documentation are here:
> > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > >
> > > > Again, as our documentation states, when you submit patches based on
> > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > it?
> > >
> > > Sorry, I didn't know that for every bug found in the Linux kernel, we have
> > > to state how it was found when we submit the fix.
> > > We will fix this in future submissions, but some of the bugs we have were found by
> > > plain code audit, so 'human' is the tool.
> >
> > My problem with that statement is that by applying a different threat
> > model you "invent" bugs which didn't exist in the first place.
> >
> > For example, in this [1] latest submission, the authors labeled correct
> > behaviour as a "bug".
> >
> > [1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@xxxxxxxxxxxxxxx/
>
> Hm.. Does everyone think that when the kernel dies with an unhandled page fault
> (such as in that case) or on detection of a KASAN out-of-bounds violation (as in some
> other cases that we already have fixes for or are investigating), it represents correct
> behavior even if you expect that all your PCI HW devices are trusted? What about an error
> in two consecutive PCI reads? What about just some failure that results in erroneous input?
I'm not sure you'll get general agreement on those answers for all
devices and situations. I think that for most devices in non-CoCo
situations, people are generally OK with a misbehaving PCI device
causing a kernel crash: since most people are running without an IOMMU
anyway, a misbehaving device can cause otherwise undetectable chaos.
I'd say:
a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
   guarantee forward progress or stop the hypervisor doing something
   truly stupid.
b) For CoCo, information disclosure or corruption IS a problem.
c) For non-CoCo, some people might care about robustness of the kernel
   against a failing PCI device, but generally I think they worry about
   a fairly clean failure, even in the unexpected hot-unplug case.
d) It's not clear to me what 'trust' means in terms of CoCo for a PCIe
   device; if it's a device that attests OK and we trust it is the device
   it says it is, do we give it freedom or are we still wary?
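
As an illustration of the difference between (a) and (b), and of the
"two consecutive reads" question above, here is a minimal sketch in C.
It is not taken from the kernel or from the patches under discussion:
`struct fake_dev`, `index_reg`, `TABLE_LEN` and `lookup_checked()` are
hypothetical names, and the struct merely stands in for a register whose
value an untrusted host could change between reads. The idea is to fetch
the value once, validate the local copy, and use only that copy, so that
bad input ends in a clean failure rather than an out-of-bounds access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN 8 /* hypothetical table size, for illustration only */

/* Hypothetical stand-in for a device register that an untrusted host
 * may change at any time; this is not a real kernel structure. */
struct fake_dev {
	volatile uint32_t index_reg;
};

/*
 * Look up table[index] where the index is supplied by the device.
 * The register is read exactly once into a local variable; the check and
 * the use both operate on that copy, so the host cannot change the value
 * between validation and use. Returns 0 on success and -1 on a bad index.
 */
int lookup_checked(const struct fake_dev *dev,
		   const uint8_t table[TABLE_LEN], uint8_t *out)
{
	uint32_t idx;

	assert(dev != NULL && table != NULL && out != NULL);

	idx = dev->index_reg;	/* single fetch of the untrusted value */

	if (idx >= TABLE_LEN)
		return -1;	/* clean failure, no out-of-bounds read */

	*out = table[idx];
	return 0;
}
```

Checking `dev->index_reg` and then reading it again for the array access
would be the pattern to avoid: the second read could return a different,
unchecked value, which falls under (b) rather than (a).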
Dave
> Best Regards,
> Elena.
>
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK