Re: [PATCH v3 07/18] nitro_enclaves: Init misc device providing the ioctl interface

From: Alexander Graf
Date: Thu May 28 2020 - 09:01:55 EST




On 27.05.20 00:24, Greg KH wrote:

On Tue, May 26, 2020 at 03:44:30PM +0200, Alexander Graf wrote:


On 26.05.20 15:17, Greg KH wrote:

On Tue, May 26, 2020 at 02:44:18PM +0200, Alexander Graf wrote:


On 26.05.20 14:33, Greg KH wrote:

On Tue, May 26, 2020 at 01:42:41PM +0200, Alexander Graf wrote:


On 26.05.20 08:51, Greg KH wrote:

On Tue, May 26, 2020 at 01:13:23AM +0300, Andra Paraschiv wrote:
+#define NE "nitro_enclaves: "

Again, no need for this.

+#define NE_DEV_NAME "nitro_enclaves"

KBUILD_MODNAME?

+#define NE_IMAGE_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+static char *ne_cpus;
+module_param(ne_cpus, charp, 0644);
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");

Again, please do not do this.

I actually asked her to put this one in specifically.

The concept of this parameter is very similar to isolcpus= and maxcpus= in
that it takes CPUs away from Linux and instead donates them to the
underlying hypervisor, so that it can spawn enclaves using them.

From an admin's point of view, this is a setting I would like to keep
persisted across reboots. How would this work with sysfs?

How about just as the "initial" ioctl command to set things up? Don't
grab any cpu pools until asked to. Otherwise, what happens when you
load this module on a system that can't support it?

That would give any user with access to the enclave device the ability to
remove CPUs from the system. That's clearly a CAP_ADMIN task in my book.

Ok, what's wrong with that?

Would you want random users to get the ability to hot unplug CPUs from your
system? At unlimited quantity? I don't :).

A random user, no, but one with admin rights, why not? They can do that
already today on your system, this isn't new.

Hence this whole split: The admin defines the CPU Pool, users can safely
consume this pool to spawn enclaves from it.

But having the admin define that at module load / boot time, is a major
pain. What tools do they have that allow them to do that easily?

The normal toolbox: editing /etc/default/grub, adding an /etc/modprobe.d/
file.

Editing grub files is horrid, come on...

It's not editing grub files, it's editing template config files that then are used as input for grub config file generation :).

When but at module load / boot time would you define it? I really don't want
to have a device node that in theory "the world" can use which then allows
any user on the system to hot unplug every CPU but 0 from my system.

But you have that already when the PCI device is found, right? What is
the initial interface to the driver? What's wrong with using that?

Or am I really missing something as to how this all fits together with
the different pieces? Seeing the patches as-is doesn't really provide a
good overview, sorry.

Ok, let me walk you through the core donation process.

Imagine you have a parent VM with 8 cores. Every one of those virtual cores is 1:1 mapped to a physical core.

You enumerate the PCI device, you start working with it. None of that changes your topology.

You now create an enclave spanning 2 cores. The hypervisor will remove the 1:1 map for those 2 cores and instead mark them as "free floating" on the remaining 6 cores. It then uses the 2 freed up cores and creates a 1:1 map for the enclave's 2 cores

To ensure that we still see a realistic mapping of core topology, we need to remove those 2 cores from the parent VM's scope of execution. The way this is done today is by going through CPU offlining.

The first and obvious option would be to offline all respective CPUs when an enclave gets created. But unprivileged users should be able to spawn enclaves. So how do I ensure that my unprivileged user doesn't just offline all of my parent VM's CPUs?

The option implemented here is that we fold this into a two-stage approach. The admin reserves a "pool" of cores for enclaves to use. Unprivileged users can then consume cores from that pool, but not more than those.

That way, unprivileged users have no influence over whether a core is enabled or not. They can only consume the pool of cores that was dedicated for enclave use.

It also has the big advantage that you don't have dynamically changing CPU topology in your system. Long living processes that adjust their environment to the topology can still do so, without cores getting pulled out under their feet.


So I really don't think an ioctl would be a great user experience. Same for
a sysfs file - although that's probably slightly better than the ioctl.

You already are using ioctls to control this thing, right? What's wrong
with "one more"? :)

So what we *could* do is add an ioctl to set the pool size which then does a
CAP_ADMIN check. That however means you now are in priority hell:

A user that wants to spawn an enclave as part of an nginx service would need
to create another service to set the pool size and indicate the dependency
in systemd control files.

Is that really better than a module parameter?

module parameters are hard to change, and manage control over who really
is changing them.

What is hard about

$ echo 1-5 > /sys/module/nitro_enclaves/parameters/ne_cpus

I also fail to understand the argument about managing control over who is changing them. Only someone with CAP_ADMIN can change them, no? I feel like I'm missing something obvious in your argumentation :).


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879