Re: [PATCH v15 05/14] x86/sgx: Implement basic EPC misc cgroup functionality
From: Huang, Kai
Date: Tue Jun 18 2024 - 07:52:01 EST
> @@ -921,7 +956,8 @@ static int __init sgx_init(void)
> if (!sgx_page_cache_init())
> return -ENOMEM;
>
> - if (!sgx_page_reclaimer_init()) {
> + if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
> + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
> ret = -ENOMEM;
> goto err_page_cache;
> }
This code change is wrong due to two reasons:
1) If sgx_page_reclaimer_init() was successful, but sgx_cgroup_init()
failed, you actually need to 'goto err_kthread' because the ksgxd() kernel
thread is already created and is running.
2) There are other cases after here that can also result in sgx_init() to
fail completely, e.g., registering sgx_dev_provision mics device. We need
to reset the capacity to 0 for those cases as well.
AFAICT, you need something like:
diff --git a/arch/x86/kernel/cpu/sgx/main.c
b/arch/x86/kernel/cpu/sgx/main.c
index 27892e57c4ef..46f9c26992a7 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -930,6 +930,10 @@ static int __init sgx_init(void)
if (ret)
goto err_kthread;
+ ret = sgx_cgroup_init();
+ if (ret)
+ goto err_provision;
+
/*
* Always try to initialize the native *and* KVM drivers.
* The KVM driver is less picky than the native one and
@@ -941,10 +945,12 @@ static int __init sgx_init(void)
ret = sgx_drv_init();
if (sgx_vepc_init() && ret)
- goto err_provision;
+ goto err_cgroup;
return 0;
+err_cgroup:
+ /* SGX EPC cgroup cleanup */
err_provision:
misc_deregister(&sgx_dev_provision);
@@ -952,6 +958,8 @@ static int __init sgx_init(void)
kthread_stop(ksgxd_tsk);
err_page_cache:
+ misc_misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
+
for (i = 0; i < sgx_nr_epc_sections; i++) {
vfree(sgx_epc_sections[i].pages);
memunmap(sgx_epc_sections[i].virt_addr);
I put the sgx_cgroup_init() before sgx_drv_init() and sgx_vepc_init(),
otherwise you will need sgx_drv_cleanup() and sgx_vepc_cleanup()
respectively when sgx_cgroup_init() fails.
This looks a little bit weird too, though:
Calling misc_misc_cg_set_capacity() to reset capacity to 0 is done at end
of sgx_init() error path, because the "set capacity" part is done in
sgx_epc_cache_init().
But logically, both "set capacity" and "reset capacity to 0" should be SGX
EPC cgroup operation, so it's more reasonable to do "set capacity" in
sgx_cgroup_init() and do "reset to 0" in the
/* SGX EPC cgroup cleanup */
as shown above.
Eventually, you will need to do EPC cgroup cleanup anyway, e.g., to free
the workqueue, so it's odd to have two places to handle EPC cgroup
cleanup.
I understand the reason "set capacity" part is done in
sgx_page_cache_init() now is because in that function you can easily get
the capacity. But the fact is @sgx_numa_nodes also tracks EPC size for
each node, so you can also get the total EPC size from @sgx_numa_node in
sgx_cgroup_init() and set capacity there.
In this case, you can put "reset capacity to 0" and "free workqueue"
together as the "SGX EPC cgroup cleanup", which is way more clear IMHO.