RE: Linux guest kernel threat model for Confidential Computing
From: Reshetova, Elena
Date: Thu Jan 26 2023 - 08:29:05 EST
> On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > Replying only to the not-so-far addressed points.
> > > >
> > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > Hi Greg,
> > >
> > > <...>
> > >
> > > > > > 3) All the tools are open-source and everyone can start using them
> > > > > > right away even without any special HW (readme has description of
> > > > > > what is needed).
> > > > > > Tools and documentation is here:
> > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > >
> > > > > Again, as our documentation states, when you submit patches based on
> > > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > > it?
> > > >
> > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > we are submitting a fix that we have to list the way how it has been found.
> > > > We will fix this in the future submissions, but some bugs we have are found
> > > > by plain code audit, so 'human' is the tool.
> > >
> > > My problem with that statement is that by applying different threat
> > > model you "invent" bugs which didn't exist in a first place.
> > >
> > > For example, in this [1] latest submission, authors labeled correct
> > > behaviour as "bug".
> > >
> > > [1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@xxxxxxxxxxxxxxx/
> >
> > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > (such as in that case) or detection of a KASAN out of bounds violation (as it is
> > in some other cases we already have fixes or investigating) it represents a
> > correct behavior even if you expect that all your pci HW devices are trusted?
>
> This is exactly what I said. You presented me the cases which exist in
> your invented world. Mentioned unhandled page fault doesn't exist in real
> world. If PCI device doesn't work, it needs to be replaced/blocked and not
> left to be operable and accessible from the kernel/user.
Can we really assure correct operation of *all* PCI devices out there?
How would such an audit be performed, given the huge set of devices available?
Isn't it better to instead make a small fix in the kernel behavior that would guard
us against such potentially incorrectly operating devices?
>
> > What about an error in two consequent pci reads? What about just some
> > failure that results in erroneous input?
>
> Yes, some bugs need to be fixed, but they are not related to trust/not-trust
> discussion and PCI spec violations.
Let's forget the trust angle here (it only applies to the Confidential Computing
threat model, and you are clearly implying the existing threat model instead) and stick
just to the incorrectly operating device. What you are proposing is to fix *unknown* bugs
in a multitude of PCI devices that (in the case of this particular MSI bug) can
lead to two different values being read from the config space and the kernel incorrectly
handling this situation. Isn't it better to do a clear fix in one place to ensure such a
situation (two subsequent reads with different values) cannot happen even in theory?
In security we have a saying that fixing the root cause of a problem is the most efficient
way to mitigate it. The root cause here is a double read with different values,
so if it can be replaced with an easy and clear patch that probably even improves
performance, since we do one less PCI read and use a cached value instead, where is the
problem in this particular case? If there are technical issues with the patch, of course we
need to discuss and fix them, but it seems we are arguing here about whether or not we want
to fix kernel code when we notice such cases...
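
To make the double-read pattern concrete, here is a minimal user-space sketch.
It is not the code from the patch and does not use any kernel API: struct
fake_dev, fake_cfg_read16() and TABLE_ENTRIES are hypothetical names made up
for illustration, standing in for a device whose config space may return a
different value on each read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_ENTRIES 8 /* hypothetical size of a table indexed by the value */

/* Hypothetical device model: successive config reads may return new values. */
struct fake_dev {
    const uint16_t *replies; /* values returned by successive reads */
    size_t nreplies;         /* number of entries in replies */
    size_t reads;            /* number of reads issued so far */
};

/* Stand-in for a config-space read; NOT a real kernel function. */
static uint16_t fake_cfg_read16(struct fake_dev *dev)
{
    assert(dev != NULL && dev->nreplies > 0);
    /* Once the scripted replies run out, keep returning the last one. */
    size_t i = dev->reads < dev->nreplies ? dev->reads : dev->nreplies - 1;
    dev->reads++;
    return dev->replies[i];
}

/*
 * Double fetch: one read is validated, a second read is used.
 * Returns the index that would be used, or -1 if the check rejects it.
 */
static int pick_index_double_fetch(struct fake_dev *dev)
{
    if (fake_cfg_read16(dev) >= TABLE_ENTRIES)
        return -1;
    return fake_cfg_read16(dev); /* may differ from the value just checked */
}

/* Single fetch: read once, then validate and use the same cached value. */
static int pick_index_cached(struct fake_dev *dev)
{
    uint16_t idx = fake_cfg_read16(dev);

    if (idx >= TABLE_ENTRIES)
        return -1;
    return idx;
}
```

With a device that answers 3 on the first read and 4096 on the second, the
double-fetch variant passes its bounds check and then hands back the unchecked
4096, while the cached variant issues one read and can only ever use the value
it validated.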
Best Regards,
Elena