On Mon, 2023-10-09 at 20:04 -0500, Haitao Huang wrote:
On Mon, 09 Oct 2023 18:45:06 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
> On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> >
> > Introduce the OOM path for killing an enclave with a reclaimer that is
> > no
> > longer able to reclaim enough EPC pages. Find a victim enclave, which
> > will be an enclave with only "unreclaimable" EPC pages left in the
> > cgroup LRU lists. Once a victim is identified, mark the enclave as OOM
> > and zap the enclave's entire page range, and drain all mm references in
> > encl->mm_list. Block allocating any EPC pages in #PF handler, or
> > reloading any pages in all paths, or creating any new mappings.
> >
> > The OOM killing path may race with the reclaimers: in some cases, the
> > victim enclave is in the process of reclaiming the last EPC pages when
> > OOM happens, that is, all pages other than SECS and VA pages are in
> > RECLAIMING_IN_PROGRESS state. The reclaiming process requires access to
> > the enclave backing, VA pages as well as SECS. So the OOM killer does
> > not directly release those enclave resources, instead, it lets all
> > reclaiming in progress to finish, and relies (as currently done) on
> > kref_put on encl->refcount to trigger sgx_encl_release() to do the
> > final cleanup.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Co-developed-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > Signed-off-by: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
> > Co-developed-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> > V5:
> > - Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY
> >
> > V4:
> > - Updates for patch reordering and typo fixes.
> >
> > V3:
> > - Rebased to use the new VMA_ITERATOR to zap VMAs.
> > - Fixed the racing cases by blocking new page allocation/mapping and
> > reloading when enclave is marked for OOM. And do not release any enclave
> > resources other than draining mm_list entries, and let pages in
> > RECLAIMING_IN_PROGRESS to be reaped by reclaimers.
> > - Due to above changes, also removed the no-longer needed encl->lock in
> > the OOM path which was causing deadlocks reported by the lock prover.
> >
>
> [...]
>
> > +
> > +/**
> > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> > + * @lru: LRU that is low
> > + *
> > + * Return: %true if a victim was found and kicked.
> > + */
> > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> > +{
> > + struct sgx_epc_page *victim;
> > +
> > + spin_lock(&lru->lock);
> > + victim = sgx_oom_get_victim(lru);
> > + spin_unlock(&lru->lock);
> > +
> > + if (!victim)
> > + return false;
> > +
> > + if (victim->flags & SGX_EPC_OWNER_PAGE)
> > + return sgx_oom_encl_page(victim->encl_page);
> > +
> > + if (victim->flags & SGX_EPC_OWNER_ENCL)
> > + return sgx_oom_encl(victim->encl);
>
> I hate to bring this up, at least at this stage, but I am wondering why
> we need
> to put VA and SECS pages to the unreclaimable list, but cannot keep an
> "enclave_list" instead?
>
> So by looking the patch (" x86/sgx: Limit process EPC usage with misc
> cgroup
> controller"), if I am not missing anything, the whole "unreclaimable"
> list is
> just used to find the victim enclave when OOM needs to be done. Thus, I
> don't
> see why "enclave_list" cannot be used to achieve this.
>
> The reason that I am asking is because it seems using "enclave_list" we
> can
> simplify the code. At least the patches related to track VA/SECS pages,
> and the
> SGX_EPC_OWNER_PAGE/SGX_EPC_OWNER_ENCL thing can be eliminated
> completely.
> Using "enclave_list", I guess you just need to put the enclave to the
> current
> EPC cgroup when SECS page is allocated.
>
Later the hosting process could migrated/reassigned to another cgroup?
What to do when the new cgroup is OOM?
You addressed in the documentation, no?
+Migration
+---------
+
+Once an EPC page is charged to a cgroup (during allocation), it
+remains charged to the original cgroup until the page is released
+or reclaimed. Migrating a process to a different cgroup doesn't
+move the EPC charges that it incurred while in the previous cgroup
+to its new cgroup.