Re: [PATCH v12 05/14] x86/sgx: Implement basic EPC misc cgroup functionality

From: Haitao Huang
Date: Thu Apr 18 2024 - 18:41:33 EST


On Tue, 16 Apr 2024 08:22:06 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:

On Mon, 2024-04-15 at 20:20 -0700, Haitao Huang wrote:
From: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>

SGX Enclave Page Cache (EPC) memory allocations are separate from normal
RAM allocations, and are managed solely by the SGX subsystem. The
existing cgroup memory controller cannot be used to limit or account for
SGX EPC memory, which is a desirable feature in some environments. For
instance, within a Kubernetes environment, while a user may specify a
particular EPC quota for a pod, the orchestrator requires a mechanism to
enforce that the pod's actual runtime EPC usage does not exceed the
allocated quota.

Utilize the misc controller [admin-guide/cgroup-v2.rst, 5-9. Misc] to
limit and track EPC allocations per cgroup. Earlier patches have added
the "sgx_epc" resource type in the misc cgroup subsystem. Add basic
support in SGX driver as the "sgx_epc" resource provider:

- Set "capacity" of EPC by calling misc_cg_set_capacity()
- Update EPC usage counter, "current", by calling charge and uncharge
APIs for EPC allocation and deallocation, respectively.
- Setup sgx_epc resource type specific callbacks, which perform
initialization and cleanup during cgroup allocation and deallocation,
respectively.

With these changes, the misc cgroup controller enables users to set a hard
limit for EPC usage in the "misc.max" interface file. It reports current
usage in "misc.current", the total EPC memory available in
"misc.capacity", and the number of times EPC usage reached the max limit
in "misc.events".

For now, the EPC cgroup simply blocks additional EPC allocation in
sgx_alloc_epc_page() when the limit is reached. Reclaimable pages are
still tracked in the global active list, only reclaimed by the global
reclaimer when the total free page count is lower than a threshold.

Later patches will reorganize the tracking and reclamation code in the
global reclaimer and implement per-cgroup tracking and reclaiming.

Co-developed-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
Co-developed-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
Reviewed-by: Tejun Heo <tj@xxxxxxxxxx>
Tested-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>

I don't see any big issue, so feel free to add:

Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>

Thanks

Nitpickings below:

[...]


--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022-2024 Intel Corporation. */
+
+#include <linux/atomic.h>
+#include <linux/kernel.h>

It doesn't seem you need the above two here.

Probably they are needed in later patches, in that case we can move to the
relevant patch(es) that they got used.

However I think it's better to explicitly include <linux/slab.h> since
kzalloc()/kfree() are used.

Btw, I am not sure whether you want to use <linux/kernel.h> because looks
it contains a lot of unrelated staff. Anyway I guess nobody cares.


I'll check and remove as needed.

+#include "epc_cgroup.h"
+
+/* The root SGX EPC cgroup */
+static struct sgx_cgroup sgx_cg_root;

The comment isn't necessary (sorry didn't notice before), because the code
is pretty clear saying that IMHO.


Was requested by Jarkko:
https://lore.kernel.org/lkml/CYU504RLY7QU.QZY9LWC076NX@suppilovahvero/#t

[...]


--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SGX_EPC_CGROUP_H_
+#define _SGX_EPC_CGROUP_H_
+
+#include <asm/sgx.h>

I don't see why you need <asm/sgx.h> here. Also, ...

+#include <linux/cgroup.h>
+#include <linux/misc_cgroup.h>
+
+#include "sgx.h"

... "sgx.h" already includes <asm/sgx.h>

[...]

right


+static inline struct sgx_cgroup *sgx_get_current_cg(void)
+{
+ /* get_current_misc_cg() never returns NULL when Kconfig enabled */
+ return sgx_cgroup_from_misc_cg(get_current_misc_cg());
+}

I spent some time looking into this. And yes if I was reading code
correctly the get_current_misc_cg() should never return NULL when Kconfig
is on.

I typed my analysis below in [*]. And it would be helpful if any cgroup
expert can have a second eye on this.

[...]

Thanks for checking this and I did similar and agree with the conclusion. I think this is confirmed also by Michal's description AFAICT:
"
The current implementation creates root css object (see cgroup_init(),
cgroup_ssid_enabled() check is after cgroup_init_subsys()).
I.e. it will look like all tasks are members of root cgroup wrt given
controller permanently and controller attribute files won't exist."


--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -6,6 +6,7 @@
#include <linux/highmem.h>
#include <linux/kthread.h>
#include <linux/miscdevice.h>
+#include <linux/misc_cgroup.h>

Is this needed? I believe SGX variants in "epc_cgroup.h" should be enough
for sgx/main.c?

[...]


right

[*] IIUC get_current_misc_cg() should never return NULL when Kconfig is on
yes

[...]
Thanks
Haitao