Re: DAMON: problems when running DAMON on ARM64 with 'transparent_hugepage' enabled

From: SeongJae Park
Date: Fri Nov 05 2021 - 09:58:22 EST


Hi Xiongfeng,

On Wed, 27 Oct 2021 08:06:36 +0000 SeongJae Park <sj@xxxxxxxxxx> wrote:

> Hello Xiongfeng,
>
> On Wed, 27 Oct 2021 14:14:57 +0800 Xiongfeng Wang <wangxiongfeng2@xxxxxxxxxx> wrote:
>
> > Sorry, I forgot to Cc the maillist. Cc it in this mail.
> >
> > On 2021/10/27 10:19, Xiongfeng Wang wrote:
> > > Hi SeongJae,
> > >
> > > Sorry to disturb you. It's just that I came across some problems when running
> > > DAMON, but still didn't find the solution after several days.
>
> You're not disturbing but helping me! Please don't say so! :)
>
> > >
> > > A short description is that the result of DAMON is not as expected when running
> > > on ARM64 with 'transparent_hugepage' enabled. But the result is correct when
> > > 'transparent_hugepage' is disabled.
> > >
> > > The following are the steps I came across the problems.
> > > 1. Firstly, I use 'damo record' to sample the 'stairs' demo.
> > > damo record "./masim ./configs/stairs.cfg"
> > > 2. Then I use 'damo report' to show the results.
> > > damo report heats --address_range xxx xxx --time_range xxx xxx --heatmap
> > > stdout --stdout_heatmap_color emotion
> > > The result doesn't show like a stair. I wrote a userspace demo to access a
> > > certain address range in loop and use DAMON to sample the demo. I added
> > > trace_print in 'damon_va_check_access()' and found out the pages in the address
> > > range are not always detected as accessed, which is not expected. When I disable
> > > transparent_hugepage by chance, the pages are marked as accessed. Then I test
> > > the 'stairs' demo again, the result is correct. It seems that, only when
> > > 'transparent_hugepage' is disabled, the access check works. I don't know where
> > > the bug is, the software or the hardware ? Appreciate it if you have time to
> > > reply. Thanks !
>
> Thank you for this report! I have a theory, but would like to test first.
> Will check and get back to you soon.

Sorry for the late response. I also confirmed that the issue is reproducible
on my ARM64 test machine. My theory is that enabling THP reduces page table
walks, so the PTE Accessed bits are updated less frequently. To verify this, I
made the experimental change below. With the change applied on my test machine,
I could see the expected access pattern regardless of whether THP is enabled.

--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -429,6 +429,7 @@ void damon_va_prepare_access_checks(struct damon_ctx *ctx)
 			continue;
 		damon_for_each_region(r, t)
 			damon_va_prepare_access_check(ctx, mm, r);
+		flush_tlb_mm(mm);
 		mmput(mm);
 	}
 }

Could you please test this on your machine and let me know the result?

Again, please note that this change is only a proof of the theory, not a
complete fix.
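
To illustrate the theory, here is a toy userspace model (not kernel code; every
name in it is made up for illustration). It assumes that a cached translation
lets an access skip the page table entirely, so an Accessed bit that the
monitor cleared is set again only after the cached translation is dropped. The
model also assumes the cached translation is never evicted on its own, which
is a deliberate simplification: real TLBs do evict entries, which would explain
why the pages are detected sometimes but not always.

```c
#include <stdbool.h>

/* Toy state for a single mapping.  Purely illustrative. */
struct toy_mmu {
	bool pte_accessed;	/* Accessed bit in the page table entry */
	bool tlb_cached;	/* translation currently cached in the TLB */
};

/* A memory access: only a TLB miss walks the page table and sets the bit. */
static void toy_access(struct toy_mmu *mmu)
{
	if (mmu->tlb_cached)
		return;		/* TLB hit: the page table is not touched */
	mmu->pte_accessed = true;
	mmu->tlb_cached = true;
}

/* What the monitor does before each sampling interval: clear the bit,
 * and optionally drop the cached translation as well. */
static void toy_prepare_check(struct toy_mmu *mmu, bool flush_tlb)
{
	mmu->pte_accessed = false;
	if (flush_tlb)
		mmu->tlb_cached = false;
}

/* Count the sampling intervals in which the mapping is seen as accessed,
 * given a workload that touches it in every interval. */
static int toy_count_detected(int nr_intervals, bool flush_tlb)
{
	struct toy_mmu mmu = { .pte_accessed = false, .tlb_cached = false };
	int detected = 0;
	int i;

	toy_access(&mmu);	/* the workload is already running */
	for (i = 0; i < nr_intervals; i++) {
		toy_prepare_check(&mmu, flush_tlb);
		toy_access(&mmu);
		if (mmu.pte_accessed)
			detected++;
	}
	return detected;
}
```

In this model, toy_count_detected(10, false) detects nothing even though the
workload touches the mapping in every interval, while
toy_count_detected(10, true) detects all ten. With THP, one cached translation
covers a whole huge page, so I would expect it to go away on its own less
often, but that part is my guess and is not modeled here.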


Thanks,
SJ

>
>
> Thanks,
> SJ
>
> > >
> > > Thanks,
> > > Xiongfeng