Re: [PATCH 4/5] selftests/damon/damos_tried_regions: handle empty tried regions in early cycles

From: Kunwu Chan

Date: Sun May 31 2026 - 22:35:40 EST


June 1, 2026 at 12:54 AM, "SeongJae Park" <sj@xxxxxxxxxx> wrote:


>
> On Sun, 31 May 2026 17:17:23 +0800 Kunwu Chan <kunwu.chan@xxxxxxxxx> wrote:
>
> >
> > From: Kunwu Chan <kunwu.chan@xxxxxxxxx>
> >
> > The test aborts if the initial aggregation cycles produce zero
> > tried regions. This can happen on slow machines, causing false
> > failures. Skip empty cycles and retry up to 200 times before
> > giving up. Also check that enough samples were collected before
> > computing the 50th percentile.
> >
> I agree this will make the test more reliable. I'm a bit concerned that 200
> retries could make the test run too long, though.
>
> Also, could you further elaborate on why this can fail on slow machines? That is,
> DAMON will check the access of the 'access_memory_even' process every 5ms. Are you
> thinking 5ms is too short for 'access_memory_even' to make the expected
> accesses (accessing the 7 regions of 10 MiB size) within? If so, should we
> increase the sampling interval before retrying?
>
> I also wonder if the unreliable results you have seen are due to the fact that DAMON
> is not flushing TLB, like we discussed before. If that's the case, could we
> increase the working set size of this test, similar to the wss_estimation test?
>
Thanks, SJ.

Good points.

I don't yet have enough evidence to say whether this is primarily due to
scheduling delays, a too-short sampling interval, or effects from not
flushing TLB.

I'll investigate the root cause and see if increasing the working set
size or adjusting the test configuration may be a cleaner solution than
adding retries.

I'll drop this patch from v2 for now and revisit it once I better
understand the root cause.
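
For reference, the retry approach from the patch description could be
sketched roughly as below. This is only an illustration, not the code from
the patch: the names read_count, nr_samples, max_empty_cycles and
percentile_50 are hypothetical, and treating "200 times" as a cap on the
total number of empty cycles is an assumption.

```python
def collect_tried_region_counts(read_count, nr_samples=10,
                                max_empty_cycles=200):
    """Collect per-cycle tried-region counts, skipping empty cycles.

    read_count is a hypothetical callable that returns the number of
    tried regions observed in one aggregation cycle.
    """
    samples = []
    empty_cycles = 0
    while len(samples) < nr_samples:
        count = read_count()
        if count == 0:
            # Empty cycle: do not abort, but bound the total retries.
            empty_cycles += 1
            if empty_cycles > max_empty_cycles:
                break
            continue
        samples.append(count)
    return samples


def percentile_50(samples, min_samples):
    """Return the middle element of the sorted samples.

    Refuses to compute anything if too few samples were collected.
    """
    if len(samples) < min_samples:
        raise ValueError('only %d of %d samples collected' %
                         (len(samples), min_samples))
    ordered = sorted(samples)
    return ordered[len(ordered) // 2]
```

The cap keeps the worst-case run time bounded, which is the concern with a
large retry count: each retry costs at least one aggregation cycle.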

Thanks,
Kunwu

> [1] https://lore.kernel.org/20260525144846.604907-1-kunwu.chan@xxxxxxxxx
>
> Thanks,
> SJ
>
> [...]
>