Good catch. This is indeed a problem if pages in a higher-level cgroup are always busy (i.e., 'young'). The reclaim loop starting from this group may get stuck just moving pages from the head to the tail of this group's LRU, and never try to scan and reclaim pages in its descendants.
On 1/05/2024 7:51 am, Haitao Huang wrote:
static void sgx_reclaim_pages_global(struct mm_struct *charge_mm)
{
- sgx_reclaim_pages(&sgx_global_lru, charge_mm);
+ if (IS_ENABLED(CONFIG_CGROUP_MISC))
+ sgx_cgroup_reclaim_pages(misc_cg_root(), charge_mm);
+ else
+ sgx_reclaim_pages(&sgx_global_lru, charge_mm);
}
I think we have a problem here when we do global reclaim starting from the ROOT cgroup:
This function will mostly only try to reclaim from the ROOT cgroup, and won't reclaim from the descendants.
The reason is that sgx_cgroup_reclaim_pages() simply returns after "scanning" SGX_NR_TO_SCAN (16) pages without going into the descendants, and "scanning" here just means "removing the EPC page from the cgroup's LRU list".
So as long as the ROOT cgroup's LRU contains more than SGX_NR_TO_SCAN (16) pages, sgx_cgroup_reclaim_pages() will effectively just scan the ROOT and return without going into the descendants. I suppose having 16 EPC pages on the ROOT's LRU is an "almost always true" case.
When sgx_reclaim_pages_global() is called again, we will start from the ROOT again.
That means this doesn't truly reclaim "globally" at all.
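To make the argument above concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not the kernel code: struct model_cg, model_reclaim_pages(), model_walk() and model_global_reclaim() are hypothetical names; the walk is assumed to be pre-order over the cgroup tree and to stop once SGX_NR_TO_SCAN pages have been scanned in total, as described above; and every page is assumed to be young (the "always busy" case), so scanned pages are rotated back to the tail rather than freed and the LRU lengths never shrink.

```c
#include <assert.h>

#define SGX_NR_TO_SCAN 16 /* batch size quoted in the thread */
#define MAX_CHILDREN 4    /* hypothetical limit, model only */

/* Hypothetical model of a cgroup: only its own LRU length and its
 * children matter for this argument. */
struct model_cg {
	unsigned int lru_pages;     /* EPC pages on this cgroup's own LRU */
	unsigned int times_scanned; /* how often a walk reached this cgroup */
	struct model_cg *children[MAX_CHILDREN];
	unsigned int nr_children;
};

/* Models one per-LRU scan: up to SGX_NR_TO_SCAN pages are taken off the
 * LRU. All pages are assumed young, so they go back to the tail and
 * lru_pages never shrinks. Returns the number of pages scanned. */
static unsigned int model_reclaim_pages(struct model_cg *cg)
{
	cg->times_scanned++;
	return cg->lru_pages < SGX_NR_TO_SCAN ? cg->lru_pages : SGX_NR_TO_SCAN;
}

/* Models the pre-order walk: scan cg itself first, then descend into its
 * children only while the running total is still below SGX_NR_TO_SCAN. */
static unsigned int model_walk(struct model_cg *cg, unsigned int cnt)
{
	assert(cg->nr_children <= MAX_CHILDREN);

	cnt += model_reclaim_pages(cg);
	for (unsigned int i = 0; i < cg->nr_children && cnt < SGX_NR_TO_SCAN; i++)
		cnt = model_walk(cg->children[i], cnt);
	return cnt;
}

/* Models repeated global reclaim: every call restarts the walk at the root. */
static void model_global_reclaim(struct model_cg *root, unsigned int calls)
{
	for (unsigned int i = 0; i < calls; i++)
		model_walk(root, 0);
}
```

With a ROOT holding 100 pages and two children holding 50 each, 1000 modelled global reclaim calls reach the ROOT 1000 times and neither child even once. Only when the ROOT's LRU holds fewer than 16 pages (say 10) does the walk reach the first child, and with those numbers the second child is still never reached.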
IMHO the behaviour of sgx_cgroup_reclaim_pages() is OK for per-cgroup reclaim, because in that case I think our intention is to try our best to reclaim from the cgroup, i.e., whether we also reclaim from its descendants doesn't matter.
But for global reclaim this doesn't work.
Am I missing anything?