On Tue, 29 Jul 2014, Joonsoo Kim wrote:
> I have a silly question here.
> Why is need_resched() the criterion for stopping async compaction?
> need_resched() becomes true when the time slice runs out, or for other
> reasons. That means async compaction is stopped at an arbitrary point,
> because the process can be anywhere in the compaction code at that
> moment. I don't think that is reasonable, and it doesn't guarantee
> anything. Instead of this approach, how about limiting async compaction
> to a certain number of pageblocks?
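To make the proposal concrete, here is a minimal sketch of a work-based cutoff, as opposed to a time-slice-based one. It assumes a hypothetical `struct async_budget` and a boolean array standing in for "compacting this pageblock would free a block of the requested order"; neither is kernel code, and the real isolate/migrate work is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical control state for this sketch; not struct compact_control. */
struct async_budget {
    unsigned int max_pageblocks; /* fixed amount of work per async attempt */
    unsigned int scanned;        /* pageblocks scanned so far */
};

/* Unlike a need_resched() check, this depends only on the work done. */
static bool async_budget_exhausted(const struct async_budget *b)
{
    return b->scanned >= b->max_pageblocks;
}

/*
 * Scan pageblocks in order. block_frees[i] is a stand-in for "compacting
 * pageblock i would succeed". Returns the index of the successful
 * pageblock, or -1 if the budget or the zone ran out first.
 */
static int compact_with_budget(struct async_budget *b,
                               const bool *block_frees, int nr_blocks)
{
    assert(b->max_pageblocks > 0);
    for (int i = 0; i < nr_blocks; i++) {
        if (async_budget_exhausted(b))
            return -1;
        b->scanned++;
        if (block_frees[i])
            return i;
    }
    return -1;
}
```

With a budget of 2 and only the third pageblock usable, the attempt gives up after exactly two pageblocks regardless of when the scheduler would have flagged a resched; with a budget of 3 it succeeds.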
Not a silly question at all; I had the same feeling in
https://lkml.org/lkml/2014/5/21/730 and proposed making it a tunable that
indicates how much work we are willing to do for thp in the pagefault
path. The difficulty is that a past failure to isolate and/or migrate
memory to free an entire pageblock doesn't indicate that the next
pageblock will fail as well, but there has to be a cutoff at some point or
async compaction becomes unnecessarily expensive. We can always rely on
khugepaged to do the collapse later, assuming we're not faulting memory
and then immediately pinning it.
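The cutoff trade-off can be quantified under one loud assumption: each pageblock succeeds independently with probability p, which matches the point that a past failure says nothing about the next pageblock. With a cutoff of k pageblocks, the success chance is 1 - (1 - p)^k and the expected number of pageblocks scanned is the sum of (1 - p)^i for i = 0..k-1. The helper names below are hypothetical.

```c
#include <assert.h>

/* P(success within k pageblocks), assuming independent success prob p. */
static double success_within(double p, unsigned int k)
{
    double all_fail = 1.0;

    assert(p >= 0.0 && p <= 1.0);
    for (unsigned int i = 0; i < k; i++)
        all_fail *= 1.0 - p;
    return 1.0 - all_fail;
}

/*
 * Expected pageblocks scanned with a cutoff of k: pageblock i+1 is
 * reached only if the first i all failed, with probability (1 - p)^i.
 */
static double expected_scanned(double p, unsigned int k)
{
    double sum = 0.0, reach = 1.0;

    assert(p >= 0.0 && p <= 1.0);
    for (unsigned int i = 0; i < k; i++) {
        sum += reach;
        reach *= 1.0 - p;
    }
    return sum;
}
```

For example, with p = 0.1 a cutoff of 8 pageblocks succeeds about 57% of the time for roughly 5.7 pageblocks of work on average; raising the cutoff buys success only slowly while the cost of each failed fault grows linearly.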
I think there are two ways to go about it:
- allow a single thp fault to be expensive and then rely on deferred
  compaction to avoid subsequent calls in the near future, or
- try to make every thp fault as inexpensive as possible so that the
  cumulative effect of faulting large amounts of memory doesn't end up
  with lengthy stalls.
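The first option leans on deferral. The sketch below is a simplified model, loosely following the exponential backoff of the kernel's defer_compaction()/compaction_deferred() helpers but without their per-order tracking; `struct defer_state` is hypothetical and the cap of 6 is assumed to match COMPACT_MAX_DEFER_SHIFT.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEFER_SHIFT 6 /* assumption: mirrors COMPACT_MAX_DEFER_SHIFT */

/* Hypothetical per-zone deferral state for this sketch. */
struct defer_state {
    unsigned int considered;  /* attempts seen since the last failure */
    unsigned int defer_shift; /* skip up to (1 << defer_shift) - 1 attempts */
};

/* After a failed (expensive) attempt, back off exponentially. */
static void defer_after_failure(struct defer_state *d)
{
    d->considered = 0;
    if (d->defer_shift < MAX_DEFER_SHIFT)
        d->defer_shift++;
}

/* After a success, stop deferring. */
static void reset_after_success(struct defer_state *d)
{
    d->considered = 0;
    d->defer_shift = 0;
}

/* Returns true if this compaction attempt should be skipped. */
static bool should_defer(struct defer_state *d)
{
    unsigned int limit = 1u << d->defer_shift;

    assert(d->defer_shift <= MAX_DEFER_SHIFT);
    if (++d->considered > limit)
        d->considered = limit;
    return d->considered < limit;
}
```

After the first failure one subsequent attempt is skipped, after the second three are skipped, and so on up to 63, so one expensive fault is followed by progressively longer quiet periods rather than a string of equally expensive ones.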
Both of these are complex because of the potential for concurrent calls to
memory compaction when faulting thp on several cpus.
I also think the second point from that email still applies: we should
abort isolating pages within a pageblock for migration once that pageblock
can no longer allow a cc->order allocation to succeed.
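A minimal sketch of that early abort, assuming a hypothetical per-page state array instead of real struct page handling: the scan walks naturally aligned blocks of 1 << order pages (order standing in for cc->order), and as soon as it meets a page that can never be migrated, it abandons the rest of that block because the block can no longer become free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page states for this model; not kernel types. */
enum page_state { PAGE_FREE, PAGE_MOVABLE, PAGE_PINNED };

struct isolate_result {
    size_t isolated; /* movable pages kept for migration */
    size_t scanned;  /* pages actually examined */
};

static struct isolate_result isolate_for_order(const enum page_state *pages,
                                               size_t start, size_t end,
                                               unsigned int order)
{
    const size_t block = (size_t)1 << order;
    struct isolate_result res = { 0, 0 };

    /* The model only handles whole, naturally aligned order-sized blocks. */
    assert(start % block == 0 && (end - start) % block == 0);

    for (size_t base = start; base < end; base += block) {
        size_t candidates = 0;
        bool usable = true;

        for (size_t pfn = base; pfn < base + block; pfn++) {
            res.scanned++;
            if (pages[pfn] == PAGE_PINNED) {
                /* This block can never be fully freed: stop scanning it. */
                usable = false;
                break;
            }
            if (pages[pfn] == PAGE_MOVABLE)
                candidates++;
        }
        /* Keep isolations only from blocks that could still become free;
         * a real implementation would have to put the others back. */
        if (usable)
            res.isolated += candidates;
    }
    return res;
}
```

With order 2 and a pinned page in the second slot of the first block, the scan examines 2 pages there instead of 4 and migrates nothing from that block, so no work is spent moving pages that could not produce the requested allocation anyway.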