Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory

From: Filipe Manana
Date: Thu Jul 04 2024 - 05:57:39 EST


On Thu, Jul 4, 2024 at 10:48 AM Filipe Manana <fdmanana@xxxxxxxxxx> wrote:
>
> On Wed, Jul 3, 2024 at 10:07 PM Andrea Gelmini <andrea.gelmini@xxxxxxxxx> wrote:
> >
> > On Wed, Jul 3, 2024 at 1:59 PM Filipe Manana
> > <fdmanana@xxxxxxxxxx> wrote:
> > >
> > > I'm collecting all the patches in this branch:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=em_shrinker_6.10
> > >
> > > They apply cleanly to 6.10-rc.
> >
> > Yep, as I wrote before, same problem here.
> > I tried the branch on top of today's Linus git (master), and nothing changed.
> > But, good news, I can provide a few more details.
> >
> > So, there's no need to use restic. On my laptop (NVMe + SSD, 32GB RAM, Lenovo T480):
> > a) boot up;
> > b) just open Window Maker and two Konsole windows, one running htop (with a few
> > tricks to view PSI and so on);
> > c) in one terminal run: tar cp /home/ | pv > /dev/null
> > d) wait less than a minute, and I see "PSI full memory" rise above
> > 50, memory pressure on swap, and two CPU threads (out of
> > eight) busy at 100%;
>
> I'll try that soon and see if I can reproduce.
>
> In the meanwhile, just curious: are you using swapfiles on btrfs?

I wonder if you have bpftrace installed and can run the following
script while doing the test:

$ cat bpftrace-em-shrinker.sh
#!/usr/bin/bpftrace

tracepoint:btrfs:btrfs_extent_map_shrinker_scan_enter
{
    time("%H:%M:%S ");
    @start_em_scan[tid] = nsecs;
    printf("%s enter shrinker scan %ld nr %ld root %llu ino %llu\n",
           comm, args->nr_to_scan, args->nr, args->last_root_id, args->last_ino);
}

tracepoint:btrfs:btrfs_extent_map_shrinker_scan_exit
/@start_em_scan[tid]/
{
    time("%H:%M:%S ");
    $dur = (nsecs - @start_em_scan[tid]) / 1000;
    delete(@start_em_scan[tid]);
    printf("%s exit shrinker drop %ld nr %ld root %llu ino %llu | %llu us\n",
           comm, args->nr_dropped, args->nr, args->last_root_id,
           args->last_ino, $dur);
}

END
{
    clear(@start_em_scan);
}

Then run it like:

$ ./bpftrace-em-shrinker.sh 2>&1 | tee em_shrinker_log.txt

And provide the log file.
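The script prints one "enter" line and one "exit" line per extent map shrinker scan, with the scan duration in microseconds after the `|`. To get a quick per-process summary of such a log (number of scans, extent maps dropped, total and worst-case scan time), a minimal Python sketch could look like the following. It assumes the exit lines have exactly the layout produced by the `printf` calls in the script above; `summarize()` and `Stats` are hypothetical helper names, not part of any btrfs or bpftrace tooling.

```python
# Minimal sketch: summarize em_shrinker_log.txt produced by the bpftrace script.
# Assumes exit lines look like:
#   10:00:01 kswapd0 exit shrinker drop 100 nr 8900 root 5 ino 300 | 1500 us
# summarize() and Stats are hypothetical helpers, not an existing tool.
import re
import sys
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass
from pathlib import Path

EXIT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) (?P<comm>.+?) exit shrinker drop "
    r"(?P<dropped>-?\d+) nr (?P<nr>-?\d+) root (?P<root>\d+) "
    r"ino (?P<ino>\d+) \| (?P<us>\d+) us$"
)


@dataclass
class Stats:
    scans: int = 0
    dropped: int = 0
    total_us: int = 0
    max_us: int = 0


def summarize(lines: Iterable[str]) -> dict[str, Stats]:
    """Aggregate exit lines per task name (comm)."""
    stats: dict[str, Stats] = defaultdict(Stats)
    for line in lines:
        m = EXIT_RE.match(line.strip())
        if m is None:
            continue  # skips "Attaching N probes...", enter lines, blanks
        us = int(m["us"])
        s = stats[m["comm"]]
        s.scans += 1
        s.dropped += int(m["dropped"])
        s.total_us += us
        s.max_us = max(s.max_us, us)
    return dict(stats)


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "em_shrinker_log.txt")
    if path.exists():
        with path.open() as f:
            for comm, s in sorted(summarize(f).items()):
                print(f"{comm}: {s.scans} scans, {s.dropped} dropped, "
                      f"total {s.total_us} us, max {s.max_us} us")
    else:
        print(f"no such log: {path}")
```

A long `max_us` for kswapd0, or many scans that drop few extent maps, would be the kind of pattern worth looking for in the log.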

Thanks.

>
> Thanks.
>
> > e) the system gets sluggish (in htop I see no process eating CPU);
> > f) if I kill tar, PSI memory keeps going up and down, and so do the
> > busy threads. After many minutes, everything goes back to idle. During
> > those minutes iotop shows no activity on either the SSD or the NVMe.
> > Until the end the system is unresponsive, or rather, really slow.
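Steps d) and f) rely on the "PSI full memory" figure shown by htop, which comes from the kernel's pressure stall information in `/proc/pressure/memory` (presumably htop's 10-second average). The `full` line reports the share of wall time during which all non-idle tasks were stalled on memory at the same time, so a value above 50 means that for more than half of that window no task could make progress. To record that value next to the shrinker trace, a minimal Python sketch, assuming a kernel built with `CONFIG_PSI=y` and using hypothetical helper names, could be:

```python
# Minimal sketch: parse Linux PSI memory pressure (/proc/pressure/memory).
# Assumes CONFIG_PSI=y; parse_psi() and full_memory_pressure_above() are
# hypothetical helper names.
from pathlib import Path

PSI_MEMORY = Path("/proc/pressure/memory")


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Return {"some": {...}, "full": {...}} from PSI text such as:

        some avg10=61.20 avg60=30.05 avg300=9.80 total=51234567
        full avg10=52.31 avg60=20.10 avg300=5.02 total=42345678

    avg10/avg60/avg300 are percentages over 10/60/300-second windows;
    total is the cumulative stall time in microseconds.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        kind, pairs = fields[0], fields[1:]
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def full_memory_pressure_above(text: str, threshold: float = 50.0) -> bool:
    """True if the 10-second 'full' memory stall average exceeds threshold (%)."""
    return parse_psi(text)["full"]["avg10"] > threshold


if __name__ == "__main__":
    try:
        print("full:", parse_psi(PSI_MEMORY.read_text()).get("full"))
    except OSError as err:  # file missing, or PSI disabled at boot
        print(f"cannot read {PSI_MEMORY}: {err}")
```

Sampling the file about once per second while `tar` runs gives a timeline that can be lined up with the timestamps in the shrinker log.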
> >
> > My / is BTRFS, with not many years of aging, usually with daily
> > snapshots and forced compression.
> >
> > Fewer than 4,000,000 files on the system, mostly .git and source code.
> >
> > root@glen:/home/gelma# btrfs filesystem usage /
> > Overall:
> >     Device size:               3.54TiB
> >     Device allocated:          2.14TiB
> >     Device unallocated:        1.40TiB
> >     Device missing:              0.00B
> >     Device slack:                0.00B
> >     Used:                      2.03TiB
> >     Free (estimated):          1.50TiB      (min: 1.50TiB)
> >     Free (statfs, df):         1.50TiB
> >     Data ratio:                   1.00
> >     Metadata ratio:               1.00
> >     Global reserve:          512.00MiB      (used: 0.00B)
> >     Multiple profiles:              no
> >
> > Data,single: Size:2.12TiB, Used:2.02TiB (95.09%)
> >    /dev/mapper/sda6_crypt      2.12TiB
> >
> > Metadata,single: Size:16.00GiB, Used:14.73GiB (92.04%)
> >    /dev/mapper/sda6_crypt     16.00GiB
> >
> > System,single: Size:32.00MiB, Used:320.00KiB (0.98%)
> >    /dev/mapper/sda6_crypt     32.00MiB
> >
> > Unallocated:
> >    /dev/mapper/sda6_crypt      1.40TiB