Re: mm/DAMON: Profiling enhancements for DAMON

From: Yu Zhao
Date: Sat Dec 16 2023 - 00:42:33 EST


On Fri, Dec 15, 2023 at 3:08 AM Prasad, Aravinda
<aravinda.prasad@xxxxxxxxx> wrote:
>
> > On Fri, Dec 15, 2023 at 12:42 AM Aravinda Prasad
> > <aravinda.prasad@xxxxxxxxx> wrote:
> > ...
> >
> > > This patch proposes profiling different levels of the application’s
> > > page table tree to detect whether a region is accessed or not. This
> > > patch is based on the observation that, when the accessed bit for a
> > > page is set, the accessed bits at the higher levels of the page table
> > > tree (PMD/PUD/PGD) corresponding to the path of the page table walk
> > > are also set. Hence, it is efficient to check the accessed bits at
> > > the higher levels of the page table tree to detect whether a region is
> > > accessed or not.
> >
> > This patch can crash on Xen. See commit 4aaf269c768d ("mm: introduce
> > arch_has_hw_nonleaf_pmd_young()")
>
> Will fix as suggested in the commit.
>
> >
> > MGLRU already does this in the correct way. See mm/vmscan.c.
>
> I don't see access bits at PUD or PGD checked for 4K page size. Can you
> point me to the code where access bits are checked at PUD and PGD level?

There isn't any, because *the system* bottlenecks at the PTE level and
at moving memory between tiers. Optimizing at the PUD/PGD levels has
insignificant ROI for the system.

And food for thought:
1. Can a PUD/PGD cover memory from different tiers?
2. Can the A-bit in non-leaf entries work for EPT?

> > This patch also can cause USER DATA CORRUPTION. See commit
> > c11d34fa139e ("mm/damon/ops-common: atomically test and clear young
> > on ptes and pmds").
>
> Ok. Will atomically test and clear the access bits.
>
> >
> > The quality of your patch makes me very much doubt the quality of your
> > paper, especially your results on Google's kstaled and MGLRU in table 6.2.
>
> The results are very much reproducible. We have not used kstaled/MGLRU for
> the data in Figure 3, but we linearly scan pages similar to kstaled by implementing
> a kernel thread for scanning.

You have not used MGLRU, and yet your results are very much reproducible.

> Our argument for kstaled/MGLRU is that scanning individual pages at 4K
> granularity may not be efficient for large-footprint applications.

Your argument for MGLRU is based on a wrong assumption, as I have
already pointed out.

> Instead,
> access bits at the higher level of the page table tree can be used. In the
> paper we have demonstrated this with DAMON but the concept can be
> applied to kstaled/MGLRU as well.

You got it backward: MGLRU introduced the concept; you fabricated a
comparison table.