On Wed, 2024-02-21 at 00:44 -0600, Haitao Huang wrote:I'll add some comments. Was assuming this is just following the old design as ksgxd.
[...]
>
> Here the @nr_to_scan is reduced by the number of pages that are
> isolated, but
> not actually reclaimed (which is reflected by @cnt).
>
> IIUC, looks you want to make this function do "each cycle" as what you
> mentioned
> in the v8 [1]:
>
> I tested with that approach and found we can only target number of
> pages
> attempted to reclaim not pages actually reclaimed due to the
> uncertainty
> of how long it takes to reclaim pages. Besides targeting number of
> scanned pages for each cycle is also what the ksgxd does.
>
> If we target actual number of pages, sometimes it just takes too long.
> I
> saw more timeouts with the default time limit when running parallel
> selftests.
>
> I am not sure what does "sometimes it just takes too long" mean, but
> what I am
> thinking is you are trying to do some perfect but yet complicated code
> here.
I think what I observed was that the try_charge() would block too long
before getting chance of schedule() to yield, causing more timeouts than
necessary.
I'll do some re-test to be sure.
Looks this is a valid information that can be used to justify whatever you are
implementing in the EPC cgroup reclaiming function(s).
>
> For instance, I don't think selftest reflect the real workload, and I
> believe
> adjusting the limit of a given EPC cgroup shouldn't be a frequent
> operation,
> thus it is acceptable to use some easy-maintain code but less perfect
> code.
>
> Here I still think having @nr_to_scan as a pointer is over-complicated.
> For
> example, we can still let sgx_reclaim_pages() to always scan
> SGX_NR_TO_SCAN
> pages, but give up when there's enough pages reclaimed or when the EPC
> cgroup
> and its descendants have been looped:
>
> unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root)
> {
> unsigned int cnt = 0;
> ...
>
> css_for_each_descendant_pre(pos, css_root) {
> ...
> epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
> cnt += sgx_reclaim_pages(&epc_cg->lru);
>
> if (cnt >= SGX_NR_TO_SCAN)
> break;
> }
>
> ...
> return cnt;
> }
>
> Yeah it may reclaim more than SGX_NR_TO_SCAN when the loop actually
> reaches any
> descendants, but that should be rare and we don't care that much, do we?
>
I assume you meant @cnt here to be number of pages actually reclaimed.
Yes.
This could cause sgx_epc_cgroup_reclaim_pages() block too long as @cnt
may always be zero (all pages are too young) and you have to loop all
descendants.
I am not sure whether this is a valid point.
For example, your change in patch 10 "x86/sgx: Add EPC reclamation in cgroup
try_charge()" already loops all descendants in below code:
+ if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
+ return -ENOMEM;
Anyway, I can see all these can be justification to your design/implementation.Yes, will add clarifying comments.
My point is please put these justification in changelog/comments so that we can
actually understand.