Re: Abnormal values show up in /proc/allocinfo

From: Suren Baghdasaryan
Date: Thu Nov 28 2024 - 13:47:04 EST


On Thu, Nov 28, 2024 at 12:34 AM David Wang <00107082@xxxxxxx> wrote:
>
> At 2024-11-27 17:44:26, "David Wang" <00107082@xxxxxxx> wrote:
> >
> >
> >
> >At 2024-11-27 01:10:23, "Suren Baghdasaryan" <surenb@xxxxxxxxxx> wrote:
> >
> >>
> >>Hi David,
> >>Thanks for the investigation. I think your suggestion should work fine
> >>and it's simpler than what we do now. It will swap not only counters
> >>but allocation locations as well, however I think we already do that
> >>when we call __alloc_tag_ref_set(). So, instead of clearing the
> >>original tag, decrementing the new tag's counter (to compensate for
> >>its own allocation) and reassigning the old tag to the new counter,
> >>you simply swap the tags. That seems fine to me.
>
> I will send a patch for this.

Yes, please. The more I look into it, the more it looks like the right approach.
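
For illustration only, here is a minimal user-space sketch of the swap idea discussed above. It is not the kernel code: struct tag_model, struct ref_model, and the model_*() helpers are hypothetical names, and the real tags carry per-CPU counters rather than a single long. The point it tries to show is that exchanging the two references leaves every counter untouched, so the later frees stay balanced.

```c
#include <stddef.h>

/* Hypothetical stand-in for an allocation tag: one counter per call site. */
struct tag_model {
	const char *site;	/* allocation location, e.g. a function name */
	long calls;		/* live allocations charged to this site */
};

/* Hypothetical stand-in for the per-object tag reference. */
struct ref_model {
	struct tag_model *tag;	/* NULL means "no tag" in this model */
};

/* Charge a newly allocated object to a call site. */
static void model_alloc(struct ref_model *ref, struct tag_model *tag)
{
	tag->calls++;
	ref->tag = tag;
}

/* Uncharge an object when it is freed. */
static void model_free(struct ref_model *ref)
{
	if (ref->tag) {
		ref->tag->calls--;
		ref->tag = NULL;
	}
}

/* Migration in this model: exchange the references, touch no counter. */
static void model_swap(struct ref_model *a, struct ref_model *b)
{
	struct tag_model *tmp = a->tag;

	a->tag = b->tag;
	b->tag = tmp;
}
```

In this model, after model_swap() the new object is accounted to the original call site and the old object to the migration-time allocation, so freeing both brings both counters back to zero.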

>
> >>However I think there is still a bug where some get_new_folio()
> >>callback does not increment the new folio's counters and that's why we
> >>get an underflow when calling alloc_tag_sub(). I'll try to reproduce
> >>on my side and see what's going on there.
> >
> >Agreed, the reason for the underflow with the current code should be clarified.
> >Here is the updated reproduction procedure:
> >1. fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --runtime=100 --numjobs=4 --time_based=1
> >2. echo 1 > /proc/sys/vm/compact_memory
> >3. echo 1 > /proc/sys/vm/drop_caches
> >(Strangely, on my VM, "echo 3 > /proc/sys/vm/drop_caches" does not trigger it easily.)
> >4. cat /proc/allocinfo | grep __filemap_get_folio
> >
> >
> >FYI
> >David
> >
> I finally found out why those underflow values show up on my system:
> clear_page_tag_ref() -> set_codetag_empty() only works when
> CONFIG_MEM_ALLOC_PROFILING_DEBUG is defined.

Ah, good catch! That's why my attempts to reproduce the issue were
unsuccessful. I always keep CONFIG_MEM_ALLOC_PROFILING_DEBUG enabled.
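
A simplified user-space model of the effect, as a sketch under stated assumptions rather than the kernel implementation. MODEL_PROFILING_DEBUG stands in for CONFIG_MEM_ALLOC_PROFILING_DEBUG, and struct dbg_tag, struct dbg_ref, and the dbg_*() helpers are hypothetical. The assumption being illustrated is that, when the "set empty" helper compiles to a no-op, the old object keeps a stale reference and its free uncharges the same tag a second time.

```c
#include <stddef.h>

/* Define MODEL_PROFILING_DEBUG to mimic a debug-enabled build. */

struct dbg_tag { long calls; };			/* hypothetical counter */
struct dbg_ref { struct dbg_tag *tag; };	/* hypothetical tag reference */

#ifdef MODEL_PROFILING_DEBUG
/* Debug build: the reference really is cleared. */
static void dbg_set_empty(struct dbg_ref *ref) { ref->tag = NULL; }
#else
/* Non-debug build: the helper does nothing, the reference stays valid. */
static void dbg_set_empty(struct dbg_ref *ref) { (void)ref; }
#endif

/* Uncharge an object when it is freed. */
static void dbg_free(struct dbg_ref *ref)
{
	if (ref->tag) {
		ref->tag->calls--;
		ref->tag = NULL;
	}
}

/* Hand a tag over from an old object to a new one, then free both. */
static long dbg_handover(void)
{
	struct dbg_tag tag = { 0 };
	struct dbg_ref old_obj = { NULL }, new_obj = { NULL };

	tag.calls++;			/* original allocation is charged */
	old_obj.tag = &tag;
	new_obj.tag = old_obj.tag;	/* new object takes over the tag */
	dbg_set_empty(&old_obj);	/* meant to drop the old reference */
	dbg_free(&old_obj);		/* uncharges again if the ref is stale */
	dbg_free(&new_obj);
	return tag.calls;		/* balanced accounting returns 0 */
}
```

Built without MODEL_PROFILING_DEBUG, dbg_handover() in this model ends with a negative counter; built with it, the counter returns to zero.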

> I guess you guys would have CONFIG_MEM_ALLOC_PROFILING_DEBUG=y, but I don't
> think it would be the case for end users.

Correct.

>
> There are several references to clear_page_tag_ref()/set_codetag_empty():
>
> ./mm/mm_init.c: clear_page_tag_ref(page);
> ./mm/mm_init.c: clear_page_tag_ref(page);
> ./mm/page_alloc.c: clear_page_tag_ref(page);
> ./mm/page_alloc.c: clear_page_tag_ref(page);
> ./mm/slub.c: set_codetag_empty(&slab_exts[offs].ref);
> ./mm/slub.c: set_codetag_empty(&vec[i].ref);
>
>
> Things may go wrong when CONFIG_MEM_ALLOC_PROFILING_DEBUG is not set.

I'll go over all set_codetag_empty() uses and check if they are valid.
set_codetag_empty() should only be used when we have an object with no
valid tag reference and we mark it as empty to avoid warnings when we
free it. In clear_page_tag_ref(), set_codetag_empty() is used to clear
a valid tag reference, and that's not right. I'll think about how we
can avoid such misuse in the future.
Thanks for the investigation, David. Looking forward to your patch.
Suren.

>
>
> FYI
> David
>