Re: [PATCH v1] mm/contpte: Optimize loop to reduce redundant operations

From: Barry Song
Date: Sat Apr 12 2025 - 03:18:38 EST


On Fri, Apr 11, 2025 at 8:03 PM David Laight
<david.laight.linux@xxxxxxxxx> wrote:
>
> On Fri, 11 Apr 2025 09:25:39 +1200
> Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> > On Mon, Apr 7, 2025 at 9:23 PM Xavier <xavier_qy@xxxxxxx> wrote:
> > >
> > > This commit optimizes the contpte_ptep_get function by adding early
> > > termination logic. It checks if the dirty and young bits of orig_pte
> > > are already set and skips redundant bit-setting operations during
> > > the loop. This reduces unnecessary iterations and improves performance.
> > >
> > > Signed-off-by: Xavier <xavier_qy@xxxxxxx>
> > > ---
> > > arch/arm64/mm/contpte.c | 13 +++++++++++--
> > > 1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> > > index bcac4f55f9c1..ca15d8f52d14 100644
> > > --- a/arch/arm64/mm/contpte.c
> > > +++ b/arch/arm64/mm/contpte.c
> > > @@ -163,17 +163,26 @@ pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
> > >
> > > pte_t pte;
> > > int i;
> > > + bool dirty = false;
> > > + bool young = false;
> > >
> > > ptep = contpte_align_down(ptep);
> > >
> > > for (i = 0; i < CONT_PTES; i++, ptep++) {
> > > pte = __ptep_get(ptep);
> > >
> > > - if (pte_dirty(pte))
> > > + if (!dirty && pte_dirty(pte)) {
> > > + dirty = true;
> > > orig_pte = pte_mkdirty(orig_pte);
> > > + }
> > >
> > > - if (pte_young(pte))
> > > + if (!young && pte_young(pte)) {
> > > + young = true;
> > > orig_pte = pte_mkyoung(orig_pte);
> > > + }
> > > +
> > > + if (dirty && young)
> > > + break;
> >
> > This kind of optimization is always tricky. Dev previously tried a similar
> > approach to reduce the loop count, but it ended up causing performance
> > degradation:
> > https://lore.kernel.org/linux-mm/20240913091902.1160520-1-dev.jain@xxxxxxx/
> >
> > So we may need actual data to validate this idea.
>
> You might win with 3 loops.
> The first looks for both 'dirty' and 'young'.
> If it finds only one it jumps to a different loop that continues
> the search but only looks for the other flag.

However, there's no concrete evidence that this loop is a hot path. It might
save a few instructions when the first PTE is both young and dirty, but in
many cases, none of the PTEs are either. Previous profiling indicates that
the actual hot path lies in the rmap walk to locate the page table entries
during folio_reference in the reclamation path.

I suspect the code you're optimizing isn't actually part of a hot path at all.

>
> David
>
> >
> > > }
> > >
> > > return orig_pte;
> > > --
> > > 2.34.1
> > >
> >

Thanks
Barry