Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory

From: Kristen Carlson Accardi
Date: Thu Sep 22 2022 - 14:59:26 EST


On Thu, 2022-09-22 at 07:41 -1000, Tejun Heo wrote:
> Hello,
>
> (cc'ing memcg folks)
>
> On Thu, Sep 22, 2022 at 10:10:37AM -0700, Kristen Carlson Accardi
> wrote:
> > Add a new cgroup controller to regulate the distribution of SGX EPC
> > memory,
> > which is a subset of system RAM that is used to provide SGX-enabled
> > applications with protected memory, and is otherwise inaccessible.
> >
> > SGX EPC memory allocations are separate from normal RAM
> > allocations,
> > and is managed solely by the SGX subsystem. The existing cgroup
> > memory
> > controller cannot be used to limit or account for SGX EPC memory.
> >
> > This patchset implements the sgx_epc cgroup controller, which will
> > provide
> > support for stats, events, and the following interface files:
> >
> > sgx_epc.current
> >         A read-only value which represents the total amount of EPC
> >         memory currently being used on by the cgroup and its
> > descendents.
> >
> > sgx_epc.low
> >         A read-write value which is used to set best-effort
> > protection
> >         of EPC usage. If the EPC usage of a cgroup drops below this
> > value,
> >         then the cgroup's EPC memory will not be reclaimed if
> > possible.
> >
> > sgx_epc.high
> >         A read-write value which is used to set a best-effort limit
> >         on the amount of EPC usage a cgroup has. If a cgroup's
> > usage
> >         goes past the high value, the EPC memory of that cgroup
> > will
> >         get reclaimed back under the high limit.
> >
> > sgx_epc.max
> >         A read-write value which is used to set a hard limit for
> >         cgroup EPC usage. If a cgroup's EPC usage reaches this
> > limit,
> >         allocations are blocked until EPC memory can be reclaimed
> > from
> >         the cgroup.
>
> I don't know how SGX uses its memory but you said in the other
> message that
> it's usually a really small portion of the memory and glancing the
> code it
> looks like its own page aging and all. Can you give some concrete
> examples
> on how it's used and why we need cgroup support for it? Also, do you
> really
> need all three control knobs here? e.g. given that .high is only
> really
> useful in conjunction with memory pressure and oom handling from
> userspace,
> I don't see how this would actually be useful for something like
> this.
>
> Thanks.
>

Thanks for your question. SGX EPC memory is a global shared resource
that can be overcommitted. The SGX EPC controller should be used
similarly to the normal memory controller. Normally, when there is
pressure on EPC memory, the reclaimer thread writes pages out from EPC
memory to backing RAM that is allocated per enclave. Currently, even a
single enclave can force all the other enclaves to have their EPC pages
written to backing RAM by allocating all of the available system EPC
memory. This can cause performance issues for those enclaves, because
they then have to fault to load their pages back in.
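
To make the problem concrete, here is a toy model of the situation
described above. It is only a sketch under simplifying assumptions: the
function name, the page counts, and the oldest-first eviction order are
made up for illustration and are not how the SGX reclaimer actually
picks pages.

```python
from collections import OrderedDict


def simulate_global_reclaim(capacity, allocations):
    """Toy model: one shared EPC pool with oldest-first eviction.

    capacity    -- number of EPC pages in the pool (hypothetical value)
    allocations -- list of (enclave_name, n_pages), applied in order
    Returns a dict mapping enclave name to the number of its pages
    that were written out to backing RAM.
    """
    resident = OrderedDict()  # page id -> owning enclave, oldest first
    evicted = {}
    next_id = 0
    for enclave, n_pages in allocations:
        for _ in range(n_pages):
            if len(resident) >= capacity:
                # Pool is full: evict the oldest page, whoever owns it.
                _, victim = resident.popitem(last=False)
                evicted[victim] = evicted.get(victim, 0) + 1
            resident[next_id] = enclave
            next_id += 1
    return evicted


# Two small enclaves, then one enclave that allocates the whole pool.
result = simulate_global_reclaim(
    100, [("a", 30), ("b", 30), ("greedy", 100)]
)
print(result)
```

In this model the two small enclaves lose every one of their pages,
even though neither of them did anything to create the pressure.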

The sgx_epc.high value helps control the EPC usage of the cgroup. The
SGX reclaimer uses this value to keep the total EPC usage of a cgroup
from exceeding it (best effort). This way, a system administrator who
wants to prevent single enclaves, or groups of enclaves, from
allocating all of the EPC memory and causing performance issues for the
other enclaves on the system can set this limit. sgx_epc.max can be
used to set a hard limit; enforcing it causes an enclave to have all of
its used pages zapped, so the enclave is effectively killed until it is
rebuilt by the owning SGX application. sgx_epc.low can be used to try
(best effort) to ensure that some minimum amount of EPC pages is
protected for the enclaves in a particular cgroup. This can be useful
for preventing evictions, and thus the performance issues caused by
faults.
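
As a rough illustration of how an administrator might use the three
files, here is a minimal sketch. The file names come from the cover
letter; everything else is an assumption: the helper name, the cgroup
directory you pass in, and the idea that the files accept a plain
integer (bytes here, by analogy with the memory controller).

```python
import os


def set_epc_limits(cgroup_dir, low=None, high=None, hard_max=None):
    """Write the requested values to the sgx_epc.* files in cgroup_dir.

    Hypothetical helper. Values are written as plain integers; the
    unit (bytes) is an assumption, not something stated in the patch
    set. Knobs left as None are not touched.
    """
    knobs = (
        ("sgx_epc.low", low),        # best-effort protection
        ("sgx_epc.high", high),      # best-effort limit
        ("sgx_epc.max", hard_max),   # hard limit
    )
    written = {}
    for name, value in knobs:
        if value is None:
            continue
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(str(value))
        written[name] = value
    return written
```

For example, set_epc_limits(path, low=8 << 20, high=64 << 20) would ask
for roughly 8 MiB of protection and a 64 MiB best-effort ceiling for
the enclaves in that cgroup, while leaving sgx_epc.max alone.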

I hope this answers your question.

Thanks,
Kristen