On Thu, 2023-05-25 at 15:59 +0200, Mickaël Salaün wrote:
[ snip ]
The kernel often creates writable aliases in order to write to
protected data (kernel text, etc). Some of this is done right as text
is being first written out (alternatives for example), and some
happens way later (jump labels, etc). So for verification, I wonder
what stage you would be verifying? If you want to verify the end
state, you would have to maintain knowledge in the verifier of all the
touch-ups the kernel does. I think it would get very tricky.
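To make the "writable alias" idea concrete, here is a minimal userspace
sketch. It is only an analogy, not how text_poke() or any kernel path
is implemented: it maps the same page twice, once read-only (standing
in for the protected view) and once read-write (the alias), and checks
that a write through the alias is visible through the read-only view.
The function name alias_write_demo() and the temporary file path are
made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 0 if data written through a writable alias
 * shows up in a read-only mapping of the same page, -1 on any failure.
 */
int alias_write_demo(void)
{
    char path[] = "/tmp/alias-demo-XXXXXX";
    const char msg[] = "patched";
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    const char *ro;
    char *rw;
    int fd, ret = -1;

    if (page <= 0)
        return -1;
    len = (size_t)page;

    /* One page of backing storage shared by both mappings. */
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;

    /* The "protected" view: writes through this pointer would fault. */
    ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (ro == MAP_FAILED)
        goto out_close;

    /* The writable alias: same page, different address and permissions. */
    rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (rw == MAP_FAILED)
        goto out_unmap_ro;

    /* Modify the page through the alias, then read it via the RO view. */
    memcpy(rw, msg, sizeof(msg));
    ret = memcmp(ro, msg, sizeof(msg)) == 0 ? 0 : -1;

    munmap(rw, len);
out_unmap_ro:
    munmap((void *)ro, len);
out_close:
    close(fd);
    return ret;
}
```

A caller can simply check assert(alias_write_demo() == 0). The point
for the verification question is that the read-only mapping never
changes its permissions, yet its contents do.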
For now, in the static kernel case, all rodata and text GPA is
restricted, so aliasing such memory in a writable way before or after
the KVM enforcement would still restrict write access to this memory,
which could be an issue but not a security one. Do you have such
examples in mind?
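A simplified way to picture the static case is that an access has to
be allowed by both the guest page tables (per virtual alias) and the
hypervisor-enforced GPA permissions. The sketch below is only a toy
model under that assumption; the enum and function names are
hypothetical and do not come from the patch series or from KVM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits for this toy model. */
enum perm {
    PERM_R = 1 << 0,
    PERM_W = 1 << 1,
    PERM_X = 1 << 2,
};

/*
 * Assumed model: the effective rights of a mapping are the intersection
 * of what the guest page tables grant and what the hypervisor allows
 * for the underlying guest physical address.
 */
static inline unsigned int effective_perms(unsigned int guest_pt_perms,
                                           unsigned int gpa_perms)
{
    return guest_pt_perms & gpa_perms;
}

/* True if a write through the given alias would be allowed. */
bool alias_can_write(unsigned int alias_pt_perms, unsigned int gpa_perms)
{
    return (effective_perms(alias_pt_perms, gpa_perms) & PERM_W) != 0;
}
```

In this model, alias_can_write(PERM_R | PERM_W, PERM_R | PERM_X) is
false: a writable alias of kernel text still cannot write once the GPA
itself is restricted, which is why legitimate patching would break
rather than the protection being bypassed.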
On x86, look at all the callers of the text_poke() family, declared in
arch/x86/include/asm/text-patching.h.
It also seems it will be a decent ask for the guest kernel to keep
track of GPA permissions as well as normal virtual memory permissions,
if this thing is not widely used.
This would indeed be required to properly handle the dynamic cases.
So I wonder if you could go in two directions with this:
1. Make this a feature only for super locked down kernels (no modules,
etc). Forbid any configurations that might modify text. But eBPF is
used for seccomp, so you might be turning off some security
protections to get this.
Good idea. For "super locked down kernels" :) , we should disable all
kernel executable changes with the related kernel build configuration
(e.g. eBPF JIT, kernel module, kprobes…) to make sure there is no such
legitimate access. This looks like an acceptable initial feature.
How many users do you think will want this protection but not the
protections that would have to be disabled to get it? The main one
that came to mind for me is cBPF seccomp stuff.
But also, the alternative to JITing cBPF is the eBPF interpreter which
AFAIU is considered a juicy enough target for speculative attacks that
they created an option to compile it out. And leaving an interpreter in
the kernel means any data could be "executed" in the normal
non-speculative scenario, kind of working around the hypervisor
executable protections. Dropping e/cBPF entirely would be an option,
but then I wonder how many users you have left. Hopefully that is all
correct, it's hard to keep track with the pace of BPF development.
I wonder if it might be a good idea to POC the guest side before
settling on the KVM interface. Then you can also look at the whole
thing and judge how much usage it would get for the different options
of restrictions.
2. Loosen the rules so that enabling the protections is not strictly
one-way. You get less security, but it gets used more widely.
This is our goal. I think both static and dynamic cases are legitimate
and have value according to the level of security sought. This should
be a build-time configuration.
Yea, the proper way to do this is probably to move all text handling
stuff into a separate domain of some sort, like you mentioned
elsewhere. It would be quite a job.