Re: [linus:master] [mm] c0bff412e6: stress-ng.clone.ops_per_sec -2.9% regression

From: David Hildenbrand
Date: Thu Aug 01 2024 - 02:49:45 EST


On 01.08.24 08:39, Yin, Fengwei wrote:
Hi David,

On 7/30/2024 4:11 PM, David Hildenbrand wrote:
On 30.07.24 07:00, kernel test robot wrote:


Hello,

kernel test robot noticed a -2.9% regression of
stress-ng.clone.ops_per_sec on:

Is that test even using hugetlb? Anyhow, this pretty much sounds like
noise and can be ignored.

It's not about hugetlb. It looks related to this change:

Ah, that makes sense!


diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 888353c209c03..7577fe7debafc 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -1095,7 +1095,12 @@ PAGEFLAG(Isolated, isolated, PF_ANY);
static __always_inline int PageAnonExclusive(const struct page *page)
{
VM_BUG_ON_PGFLAGS(!PageAnon(page), page);
- VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
+ /*
+ * HugeTLB stores this information on the head page; THP keeps it per
+ * page
+ */
+ if (PageHuge(page))
+ page = compound_head(page);
return test_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags);


The PageAnonExclusive() function is changed. And the profiling data
showed it:

0.00 +3.9 3.90 perf-profile.calltrace.cycles-pp.folio_try_dup_anon_rmap_ptes.copy_present_ptes.copy_pte_range.copy_p4d_range.copy_page_range

According to
https://download.01.org/0day-ci/archive/20240730/202407301049.5051dc19-oliver.sang@xxxxxxxxx/config-6.9.0-rc4-00197-gc0bff412e67b:
# CONFIG_DEBUG_VM is not set
So maybe this code change could make a difference?

Yes indeed. fork() can be extremely sensitive to each added instruction.
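
To illustrate why the change matters when CONFIG_DEBUG_VM is not set, here is a minimal user-space sketch, not kernel code. All names (model_page, MODEL_BUG_ON, MODEL_DEBUG_VM, model_anon_exclusive_old/new) are hypothetical stand-ins, and the struct is a deliberately simplified assumption. It only mirrors the idea that a debug-only assertion compiles away, while the added hugetlb test is a branch that every caller pays for.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_page {
	struct model_page *head;	/* head of the compound page, or the page itself */
	bool anon;
	bool hugetlb;
	bool anon_exclusive;
};

/* Mirrors the idea of a debug-only check: real code only in debug builds. */
#ifdef MODEL_DEBUG_VM
#define MODEL_BUG_ON(cond) assert(!(cond))
#else
#define MODEL_BUG_ON(cond) ((void)0)
#endif

/* Before: with debugging off, only the final flag read remains. */
static inline bool model_anon_exclusive_old(const struct model_page *page)
{
	MODEL_BUG_ON(!page->anon);
	MODEL_BUG_ON(page->hugetlb && page->head != page);
	return page->anon_exclusive;
}

/* After: the hugetlb test is an unconditional runtime branch. */
static inline bool model_anon_exclusive_new(const struct model_page *page)
{
	MODEL_BUG_ON(!page->anon);
	if (page->hugetlb)
		page = page->head;
	return page->anon_exclusive;
}
```

In this model, building without -DMODEL_DEBUG_VM leaves the old variant as a single load, whereas the new variant always carries the extra test, which is the kind of per-PTE cost a fork()-heavy benchmark can expose.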

I even pointed out to Peter why I didn't add the PageHuge check in there originally [1].

"Well, and I didn't want to have runtime-hugetlb checks in
PageAnonExclusive code called on certainly-not-hugetlb code paths."


We now have to do a page_folio(page) and then test for hugetlb.

return folio_test_hugetlb(page_folio(page));

Nowadays, folio_test_hugetlb() is faster than it was at the time of c0bff412e6, so maybe at least part of the overhead is gone.
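
As a rough user-space sketch of the two steps involved (resolve a possibly-tail page to its head, then test a per-folio hugetlb marker), assuming a simplified layout: sketch_page, sketch_head() and sketch_page_is_hugetlb() are hypothetical names, and the head_is_hugetlb field is an invented stand-in. How the kernel actually records the hugetlb type is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's struct page. */
struct sketch_page {
	uintptr_t compound_head;	/* tail pages: head address with bit 0 set */
	bool head_is_hugetlb;		/* invented marker, meaningful on head pages only */
};

/* Step 1: resolve a possibly-tail page to its head (one load plus a branch). */
static inline const struct sketch_page *sketch_head(const struct sketch_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (const struct sketch_page *)(head - 1);
	return page;
}

/* Step 2: test the marker stored on the head page. */
static inline bool sketch_page_is_hugetlb(const struct sketch_page *page)
{
	return sketch_head(page)->head_is_hugetlb;
}
```

Even in this reduced form, the check costs a dependent load and a branch before the marker can be read, which is why it is undesirable on paths that are known never to see hugetlb pages.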


[1] https://lore.kernel.org/r/all/8b0b24bb-3c38-4f27-a2c9-f7d7adc4a115@xxxxxxxxxx/


--
Cheers,

David / dhildenb