Hi,
At 2024-12-19 02:22:53, "Suren Baghdasaryan" <surenb@xxxxxxxxxx> wrote:
On Wed, Dec 18, 2024 at 4:49 AM David Wang <00107082@xxxxxxx> wrote:
Hi,
I found another usage/benefit for accumulative counters:
On my system, /proc/allocinfo yields about 5065 lines, about 2/3 of which have an accumulative counter of *0*,
meaning no memory activity so far (right?).
It is quite a waste to keep those items which are *not alive yet*.
With the additional change below, /proc/allocinfo has only 1684 lines on my system:
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -95,8 +95,11 @@ static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
 	struct alloc_tag_counters counter = alloc_tag_read(tag);
 	s64 bytes = counter.bytes;
+	if (counter.accu_calls == 0)
+		return;
 	seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls);
I think this is quite an improvement worth pursuing.
(counter.calls could also be used to filter out "inactive" items, but
lines that keep disappearing and reappearing can confuse monitoring systems.)
Please see discussion at
https://lore.kernel.org/all/20241211085616.2471901-1-quic_zhenhuah@xxxxxxxxxxx/
Thanks for the information.
My point is that with this change we lose information which can be
useful. For example if I want to analyze all the places in the kernel
where memory can be potentially allocated, your change would prevent
me from doing that.
Maybe the filter can be disabled when DEBUG is on?
No, I disagree. Allocation that was never invoked is not the same as
no allocation at all. How would we know the difference if we filter
out the empty ones?
Totally agree with this. I think (bytes || counter.calls) does not make a good filter; the accumulative counter is the answer. :)
If you don't want to see all the unused sites, you can filter them in
the userspace. I also suspect that for practical purposes you would
want to filter small ones (below some threshold) as well.
I have set up a monitoring tool that polls /proc/allocinfo every 5 seconds on my system.
Reading out all the content in one round takes ~11ms and ~100 read syscalls in total;
with the (counter.accu_calls == 0) filter, it takes ~4.4ms and 34 read syscalls.
It would be nice to have that ~60% performance improvement.
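For anyone who wants to reproduce numbers of this kind, one round of reading can be measured roughly as in the sketch below. This is not the monitoring tool mentioned above. The struct and function names are hypothetical, and the buffer size is left to the caller because the read() count depends on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type for one full pass over a file. */
struct read_stats {
	size_t bytes;		/* total bytes returned by read() */
	unsigned int reads;	/* read() calls, including the final one that returns 0 */
	double millis;		/* wall-clock time spent in the read loop */
};

/*
 * Read 'path' to EOF using 'buf' and fill in 'st'.
 * Returns 0 on success, -1 if open() or read() fails.
 */
static int measure_read(const char *path, char *buf, size_t buflen,
			struct read_stats *st)
{
	struct timespec t0, t1;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	st->bytes = 0;
	st->reads = 0;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	do {
		n = read(fd, buf, buflen);
		st->reads++;
		if (n > 0)
			st->bytes += (size_t)n;
	} while (n > 0);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	st->millis = (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
		     (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
	return n < 0 ? -1 : 0;
}
```

Calling measure_read("/proc/allocinfo", buf, sizeof(buf), &st) before and after applying the filter gives a comparable pair of numbers, provided the buffer size is the same in both runs.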
Thanks
David