On 16/12/2024 16:51, Dev Jain wrote:
One of the testcases triggers a CoW on the 255th page (0-indexing) with
max_ptes_shared = 256. This leads to pages 0-254 (255 in number) being unshared,
and 257 pages shared, exceeding the constraint. Suppose we run the test as
./khugepaged -s 2. In that case, khugepaged starts collapsing the range to order-2
folios, since PMD-collapse will fail due to the constraint.
When the scan reaches the 254-257 PTE range, because at least one PTE in this range
is writable, with the other 3 being read-only, khugepaged collapses this into an
order-2 mTHP, resulting in 3 extra PTEs getting unshared. After this, we encounter
a 4-sized chunk of read-only PTEs, and mTHP collapse stops according to the scaled
constraint, but the number of shared PTEs has now come under the constraint for
PMD-sized THPs. Therefore, the next scan of khugepaged will be able to collapse
this range into a PMD-mapped hugepage, leading to failure of this subtest. Fix
this by reducing the CoW range.

Is this description essentially saying that it's now possible to creep towards
collapsing to a full PMD-size block over successive scans due to rounding errors
in the scaling? Or is this just trying an edge case and the problem doesn't
generalize?

Note: The only objective of this patch is to make the test work for the PMD-case;
no extension has been made for testing for mTHPs.
Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
---
tools/testing/selftests/mm/khugepaged.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 8a4d34cce36b..143c4ad9f6a1 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -981,6 +981,7 @@ static void collapse_fork_compound(struct collapse_context *c, struct mem_ops *o
static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops *ops)
{
int max_ptes_shared = thp_read_num("khugepaged/max_ptes_shared");
+ int fault_nr_pages = is_anon(ops) ? 1 << anon_order : 1;
int wstatus;
void *p;
@@ -997,8 +998,8 @@ static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops
fail("Fail");
printf("Trigger CoW on page %d of %d...",
- hpage_pmd_nr - max_ptes_shared - 1, hpage_pmd_nr);
- ops->fault(p, 0, (hpage_pmd_nr - max_ptes_shared - 1) * page_size);
+ hpage_pmd_nr - max_ptes_shared - fault_nr_pages, hpage_pmd_nr);
+ ops->fault(p, 0, (hpage_pmd_nr - max_ptes_shared - fault_nr_pages) * page_size);
if (ops->check_huge(p, 0))
success("OK");
else