It's complicated. On the one hand, it's "more likely" because there are
more tail pages than there are head pages or order-0 pages. On the
other hand, a _lot_ of the time we call compound_head(), it's done with
a non-tail page because we tend to pass around head pages (eg,
pmd_page() on hugetlbfs, or looking up a folio in the page cache and
passing &folio->page to some function that's not yet converted.
On the third hand, does the compiler really do much with the annotation?
Before your patch:
27d6: a8 01 test $0x1,%al
27d8: 75 02 jne 27dc <clear_refs_pte_range+0x9c>
27da: eb 59 jmp 2835 <clear_refs_pte_range+0xf5>
27dc: 49 8b 44 24 08 mov 0x8(%r12),%rax
27e1: a8 01 test $0x1,%al
27e3: 75 6f jne 2854 <clear_refs_pte_range+0x114>
27e5: eb 73 jmp 285a <clear_refs_pte_range+0x11a>
With your patch:
1ee6: a8 01 test $0x1,%al
1ee8: 75 02 jne 1eec <clear_refs_pte_range+0x9c>
1eea: eb 5f jmp 1f4b <clear_refs_pte_range+0xfb>
1eec: 49 8b 44 24 08 mov 0x8(%r12),%rax
1ef1: a8 01 test $0x1,%al
1ef3: 75 50 jne 1f45 <clear_refs_pte_range+0xf5>
1ef5: eb 6c jmp 1f63 <clear_refs_pte_range+0x113>
Looks pretty much the same. bloat-o-meter says:
$ ./scripts/bloat-o-meter before.o after.o
add/remove: 0/0 grow/shrink: 2/4 up/down: 32/-48 (-16)
Function old new delta
gather_stats.constprop 730 753 +23
smaps_hugetlb_range 635 644 +9
smaps_page_accumulate 342 338 -4
clear_refs_pte_range 339 328 -11
pagemap_hugetlb_range 422 407 -15
smaps_pte_range 1406 1388 -18
Total: Before=20066, After=20050, chg -0.08%
(I was looking at clear_refs_pte_range above). This seems marginal.
The benefits of removing a call to compound_head are much less
ambiguous:
$ ./scripts/bloat-o-meter before.o .build/fs/proc/task_mmu.o
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-101 (-101)
Function old new delta
clear_refs_pte_range 339 238 -101
Total: Before=20066, After=19965, chg -0.50%
I'd describe that as replacing four calls to compound_head() with two:
- page = pmd_page(*pmd);
+ folio = page_folio(pmd_page(*pmd));
/* Clear accessed and referenced bits. */
pmdp_test_and_clear_young(vma, addr, pmd);
- test_and_clear_page_young(page);
- ClearPageReferenced(page);
+ folio_test_clear_young(folio);
+ folio_clear_referenced(folio);
...
- page = vm_normal_page(vma, addr, ptent);
- if (!page)
+ folio = vm_normal_folio(vma, addr, ptent);
+ if (!folio)
continue;
/* Clear accessed and referenced bits. */
ptep_test_and_clear_young(vma, addr, pte);
- test_and_clear_page_young(page);
- ClearPageReferenced(page);
+ folio_test_clear_young(folio);
+ folio_clear_referenced(folio);
I'm not saying this patch is necessarily wrong, I just think it's
"not proven".