Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

From: Huang, Kai
Date: Mon Oct 09 2023 - 21:18:10 EST


On Mon, 2023-10-09 at 20:04 -0500, Haitao Huang wrote:
> On Mon, 09 Oct 2023 18:45:06 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
>
> > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > >
> > > Introduce the OOM path for killing an enclave with a reclaimer that is
> > > no
> > > longer able to reclaim enough EPC pages. Find a victim enclave, which
> > > will be an enclave with only "unreclaimable" EPC pages left in the
> > > cgroup LRU lists. Once a victim is identified, mark the enclave as OOM
> > > and zap the enclave's entire page range, and drain all mm references in
> > > encl->mm_list. Block allocating any EPC pages in #PF handler, or
> > > reloading any pages in all paths, or creating any new mappings.
> > >
> > > The OOM killing path may race with the reclaimers: in some cases, the
> > > victim enclave is in the process of reclaiming the last EPC pages when
> > > OOM happens, that is, all pages other than SECS and VA pages are in
> > > RECLAIMING_IN_PROGRESS state. The reclaiming process requires access to
> > > the enclave backing, VA pages as well as SECS. So the OOM killer does
> > > not directly release those enclave resources, instead, it lets all
> > > reclaiming in progress to finish, and relies (as currently done) on
> > > kref_put on encl->refcount to trigger sgx_encl_release() to do the
> > > final cleanup.
> > >
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > > Co-developed-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > > Co-developed-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > ---
> > > V5:
> > > - Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY
> > >
> > > V4:
> > > - Updates for patch reordering and typo fixes.
> > >
> > > V3:
> > > - Rebased to use the new VMA_ITERATOR to zap VMAs.
> > > - Fixed the racing cases by blocking new page allocation/mapping and
> > > reloading when enclave is marked for OOM. And do not release any enclave
> > > resources other than draining mm_list entries, and let pages in
> > > RECLAIMING_IN_PROGRESS to be reaped by reclaimers.
> > > - Due to above changes, also removed the no-longer needed encl->lock in
> > > the OOM path which was causing deadlocks reported by the lock prover.
> > >
> >
> > [...]
> >
> > > +
> > > +/**
> > > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> > > + * @lru: LRU that is low
> > > + *
> > > + * Return: %true if a victim was found and kicked.
> > > + */
> > > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> > > +{
> > > + struct sgx_epc_page *victim;
> > > +
> > > + spin_lock(&lru->lock);
> > > + victim = sgx_oom_get_victim(lru);
> > > + spin_unlock(&lru->lock);
> > > +
> > > + if (!victim)
> > > + return false;
> > > +
> > > + if (victim->flags & SGX_EPC_OWNER_PAGE)
> > > + return sgx_oom_encl_page(victim->encl_page);
> > > +
> > > + if (victim->flags & SGX_EPC_OWNER_ENCL)
> > > + return sgx_oom_encl(victim->encl);
> >
> > I hate to bring this up, at least at this stage, but I am wondering why
> > we need
> > to put VA and SECS pages to the unreclaimable list, but cannot keep an
> > "enclave_list" instead?
> >
> > So by looking the patch (" x86/sgx: Limit process EPC usage with misc
> > cgroup
> > controller"), if I am not missing anything, the whole "unreclaimable"
> > list is
> > just used to find the victim enclave when OOM needs to be done. Thus, I
> > don't
> > see why "enclave_list" cannot be used to achieve this.
> >
> > The reason that I am asking is because it seems using "enclave_list" we
> > can
> > simplify the code. At least the patches related to track VA/SECS pages,
> > and the
> > SGX_EPC_OWNER_PAGE/SGX_EPC_OWNER_ENCL thing can be eliminated
> > completely.
> > Using "enclave_list", I guess you just need to put the enclave to the
> > current
> > EPC cgroup when SECS page is allocated.
> >
> Later the hosting process could migrated/reassigned to another cgroup?
> What to do when the new cgroup is OOM?
>

You addressed in the documentation, no?

+Migration
+---------
+
+Once an EPC page is charged to a cgroup (during allocation), it
+remains charged to the original cgroup until the page is released
+or reclaimed. Migrating a process to a different cgroup doesn't
+move the EPC charges that it incurred while in the previous cgroup
+to its new cgroup.