Re: Linux guest kernel threat model for Confidential Computing

From: Theodore Ts'o
Date: Wed Feb 08 2023 - 08:44:33 EST


On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> 2. rest of non-needed drivers must be disabled. Here we can argue about what
> is the correct method of doing this and who should bear the costs of enforcing it.
> But from pure security point of view: the method that is simple and clear, that
> requires as little maintenance as possible usually has the biggest chance of
> enforcing security.
> And given that we already have the concept of authorized devices in Linux,
> does this method really bring so much additional complexity to the kernel?
> But hard to argue here without the code: we need to submit the filter proposal first
> (under internal review still).

I think the problem here is that we've had a lot of painful experience
where fuzzing produces a lot of false positives, which security types
then insist all kernel developers must fix so that the "important"
security issues can be told apart from the false positives.

So "as little maintenance as possible" and fuzzing have not
necessarily gone together. It might mean lower maintenance costs for
*you*, but it's not necessarily less maintenance work for *us*. I've
seen Red Hat principal engineers take completely bogus issues and
raise them to CVE "high" priority levels, when it was nothing like
that, thus forcing distro and data center people to do global pushes
to production because it's easier than trying to explain to FedRAMP
auditors why the CVSS score is bogus --- and every single unnecessary
push to production has its own costs and risks.

I've seen the constant load of syzbot false positives that generate
noise in my inbox and in bug tracking issues assigned to me at $WORK.
I've seen the false positives generated by DEPT, which is why I've
pushed back on it. So if you are going to insist on fuzzing all of
the PCI config space, and treat every finding as a "bug", there is
going to be huge pushback.

Even if the "fixes" are minor, and don't have any massive impact on
memory used or cache line misses or code/maintainability bloat, the
fact that we treat them as P3 quality of implementation issues, and
*you* treat them as P1 security bugs that must be fixed Now! Now!
Now! is going to cause friction. (This is especially true since CVSS
scores are unidimensional, and what might be a high-severity issue ---
or at least embarrassing --- for CoCo, might be completely innocuous
QOI bugs for the rest of the world.)

So it might be that a simple, separate kernel config is going to be
the massively simpler way to go, instead of insisting that all PCI
device drivers must be fuzzed and be made CoCo safe, even if they will
never be used in a CoCo context. Again, please be cognizant of the
costs that CoCo may be imposing and pushing onto the rest of the
ecosystem.

Cheers,

- Ted