Re: [PATCH] Add accumulated call counter for memory allocation profiling

From: David Wang
Date: Wed Sep 11 2024 - 22:28:53 EST


At 2024-07-02 05:58:50, "Kent Overstreet" <kent.overstreet@xxxxxxxxx> wrote:
>On Mon, Jul 01, 2024 at 10:23:32AM GMT, David Wang wrote:
>> HI Suren,
>>
>> At 2024-07-01 03:33:14, "Suren Baghdasaryan" <surenb@xxxxxxxxxx> wrote:
>> >On Mon, Jun 17, 2024 at 8:33 AM David Wang <00107082@xxxxxxx> wrote:
>> >>
>> >> Accumulated call counter can be used to evaluate rate
>> >> of memory allocation via delta(counters)/delta(time).
>> >> This metrics can help analysis performance behaviours,
>> >> e.g. tuning cache size, etc.
>> >
>> >Sorry for the delay, David.
>> >IIUC with this counter you can identify the number of allocations ever
>> >made from a specific code location. Could you please clarify the usage
>> >a bit more? Is the goal to see which locations are the most active and
>> >the rate at which allocations are made there? How will that
>> >information be used?
>>
>> Cumulative counters can be sampled with timestamp, say at T1, a monitoring tool got a sample value V1,
>> then after sampling interval, at T2, got a sample value V2. Then the average rate of allocation can be evaluated
>> via (V2-V1)/(T2-T1). (The accuracy depends on sampling interval)
>>
>> This information "may" help identify where the memory allocation is unnecessary frequent,
>> and gain some better performance by making less memory allocation .
>> The performance "gain" is just a guess, I do not have a valid example.
>
>Easier to just run perf...

Hi,

To Kent:
It feels a bit odd to reply to this while I am debugging a performance issue in bcachefs :)

Yes, it is true that performance bottlenecks can be identified with perf tools, but perf is normally
not running continuously (well, there are some continuous profiling projects out there).
Also, memory allocation is normally not the biggest bottleneck,
so its impact may not be easily picked up by perf.

Well, in the case of https://lore.kernel.org/lkml/20240906154354.61915-1-00107082@xxxxxxx/,
the memory allocation was picked up by perf tools, though.
But with this patch, it is easier to spot that the memory allocation behavior is quite different:
when performance was bad, the average rate for
"fs/bcachefs/io_write.c:113 func:__bio_alloc_page_pool" was 400k/s,
while when performance was good, the rate was less than 200/s.

(I have a sample tool that collects /proc/allocinfo and stores the data in Prometheus;
the rate is calculated and plotted via the Prometheus query:
irate(mem_profiling_count_total{file=~"fs/bcachefs.*", func="__bio_alloc_page_pool"}[5m]))
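
For anyone not using Prometheus, the same rate can be computed by hand from
two samples of the cumulative counter, i.e. (V2-V1)/(T2-T1). Below is a minimal
sketch in Python. The function name and the sample numbers are made up for
illustration, and it does not parse /proc/allocinfo; it only shows the
arithmetic. As far as I understand, irate() does roughly this with the last
two samples inside the range window.

```python
def average_rate(v1, t1, v2, t2):
    """Average events per second between two samples of a cumulative counter.

    v1, v2: counter values sampled at times t1, t2 (seconds), with t2 > t1.
    Returns None if the counter went backwards, which is treated here as a
    counter reset (an assumption of this sketch, e.g. after a reboot).
    """
    if t2 <= t1:
        raise ValueError("samples must be in increasing time order")
    if v2 < v1:
        return None
    return (v2 - v1) / (t2 - t1)


# Hypothetical samples taken 5 seconds apart.
rate = average_rate(1_000_000, 0.0, 3_000_000, 5.0)
print(f"{rate:.0f} allocations/s")
```

The accuracy depends on the sampling interval: a long interval smooths out
short bursts, a short one gives a noisier but more responsive rate.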

I hope this is a valid example demonstrating the usefulness of accumulated counters
of memory allocation for investigating performance issues.


Thanks
David