Prerequisites
=============
The work items identified as prerequisites are listed on page 3 of [8].
The summary is:
| item | status |
|:------------------------------|:------------------------|
| mlock | In mainline (v6.7) |
| madvise | In mainline (v6.6) |
| compaction | v1 posted [9] |
| NUMA balancing | Investigated: see below |
| user-triggered page migration | In mainline (v6.7) |
| khugepaged collapse | In mainline (NOP) |
On NUMA balancing, which currently ignores any PTE-mapped THPs it encounters,
John Hubbard has investigated and concluded that A) it is not clear at the
moment what a better policy might be for PTE-mapped THP, and B) it is
questionable whether this should really be considered a prerequisite, given
that no regression is caused for the default "small-sized THP disabled" case
and there is no correctness issue when it is enabled; it's just a potential for
non-optimal performance.
(John, please do elaborate if I haven't captured this correctly!)
If there are no disagreements about removing NUMA balancing from the list, then
that just leaves compaction, which is in review on list at the moment.
I really would like to get this series (and its remaining compaction
prerequisite) in for v6.8. I accept that it may be a bit optimistic at this
point, but let's see where we get to with review.
Testing
=======
The series includes patches for mm selftests to enlighten the cow and khugepaged
tests to explicitly test with small-order THP, in the same way that PMD-order
THP is tested. The new tests all pass, and no regressions are observed in the mm
selftest suite. I've also run my usual kernel compilation and JavaScript
benchmarks without any issues.
Refer to my performance numbers posted with v6 [6]. (These are for small-sized
THP only - they do not include the arm64 contpte follow-on series).
John Hubbard at Nvidia has reported dramatic (10x) performance improvements for
some workloads at [10]. (These were observed using v6 of this series together
with the arm64 contpte series.)