Re: [PATCH v12 09/14] x86/sgx: Implement async reclamation for cgroup

From: Huang, Kai
Date: Tue Apr 23 2024 - 22:13:53 EST


On Tue, 2024-04-23 at 19:26 -0500, Haitao Huang wrote:
> On Tue, 23 Apr 2024 17:13:15 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
>
> > On Tue, 2024-04-23 at 10:30 -0500, Haitao Huang wrote:
> > > > > It's a workaround because you use the capacity==0 but it does not
> > > really
> > > > > mean to disable the misc cgroup for specific resource IIUC.
> > > >
> > > > Please read the comment around @misc_res_capacity again:
> > > >
> > > > * Miscellaneous resources capacity for the entire machine. 0
> > > capacity
> > > > * means resource is not initialized or not present in the host.
> > > >
> > >
> > > I mentioned this in earlier email. I think this means no SGX EPC. It
> > > doesnot mean sgx epc cgroup not enabled. That's also consistent with the
> > > behavior try_charge() fails if capacity is zero.
> >
> > OK. To me the "capacity" is purely the concept of cgroup, so it must be
> > interpreted within the scope of "cgroup". If cgroup, in our case, SGX
> > cgroup, is disabled, then whether "leaving the capacity to reflect the
> > presence of hardware resource" doesn't really matter.
> > So what you are saying is that, the kernel must initialize the capacity
> > of
> > some MISC resource once it is added as valid type.
> > And you must initialize the "capacity" even MISC cgroup is disabled
> > entirely by kernel commandline, in which case, IIUC, misc.capacity is not
> > even going to show in the /fs.
> >
> > If this is your point, then your patch:
> >
> > cgroup/misc: Add SGX EPC resource type
> >
> > is already broken, because you added the new type w/o initializing the
> > capacity.
> >
> > Please fix that up.
> >
> > >
> > > > >
> > > > > There is explicit way for user to disable misc without setting
> > > capacity> > to
> > > > > zero.
> > > >
> > > > Which way are you talking about?
> > >
> > > Echo "-misc" to cgroup.subtree_control at root level for example still
> > > shows non-zero sgx_epc capacity.
> >
> > I guess "having to disable all MISC resources just in order to disable
> > SGX
> > EPC cgroup" is a brilliant idea.
> >
> > You can easily disable the entire MISC cgroup by commandline for that
> > purpose if that's acceptable.
> >
> > And I have no idea why "still showing non-zero EPC capacity" is important
> > if SGX cgroup cannot be supported at all.
> >
>
> Okay, all I'm trying to say is we should care about consistency in code
> and don't want SGX do something different. Mixing "disable" with
> "capacity==0" causes inconsistencies AFAICS:
>
> 1) The try_charge() API currently returns error when capacity is zero. So
> it appears not to mean that the cgroup is disabled otherwise it should
> return success.

I agree this isn't ideal. My view is we can fix it in MISC code if
needed.

>
> 2) The current explicit way ("-misc") to disable misc still shows non-zero
> entries in misc.capacity. (At least for v2 cgroup, it does when I tested).
> Maybe this is not important but I just don't feel good about this
> inconsistency.

This belongs to "MISC resource cgroup was initially enabled by the kernel
at boot time, but later was disabled *somewhere in hierarchy* by the
user".

In fact, if you only do "-misc" for "some subtree", it's quite reasonable
to still report the resource in max.capacity. In the case above, the
"some subtree" happens to be the root.

So to me it's reasonable to still show max.capacity in this case. And you
can actually argue that the kernel still supports the cgroup for the
resource. E.g., you can at runtime do "+misc" to re-enable.

However, if the kernel isn't able to support certain MISC resource cgroup
at boot time, it's quite reasonable to just set the "capacity" to 0 so it
isn't visible to userspace.

Note:

My key point is, when userspace sees 0 "capacity", it shouldn't need to
care about whether it is because of "hardware resource is not available",
or "hardware resource is available but kernel cannot support cgroup for
it". The resource cgroup is simply unavailable.

That means the kernel has full right to just hide that resource from the
cgroup at boot time.

But this should be just within "cgroup's scope", i.e., that resource can
still be available if kernel can provide it. If some app wants to
additionally check whether such resource is indeed available but only
cgroup is not available, it should check resource specific interface but
not take advantage of the MISC cgroup interface.

>
> For now I'll just do BUG_ON() unless there are more strong opinions one
> way or the other.
>

Fine to me.