Re: [PATCH v2] mm: emit tracepoint when RSS changes by threshold

From: Tim Murray
Date: Thu Sep 05 2019 - 13:43:44 EST


On Thu, Sep 5, 2019 at 9:03 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> I might misunderstand this but is the issue here actually throttling
> of the sheer number of trace records or tracing large enough changes
> to RSS that user might care about? Small changes happen all the time
> but we are likely not interested in those. Surely we could postprocess
> the traces to extract changes large enough to be interesting but why
> capture uninteresting information in the first place? IOW the
> throttling here should be based not on the time between traces but on
> the amount of change of the traced signal. Maybe a generic facility
> like that would be a good idea?

You want two properties from the tracepoint:

- Small fluctuations in the value don't flood the trace buffer. If you
get a new trace event from a process every time kswapd reclaims a
single page from that process, you're going to need an enormous trace
buffer that will have significant side effects on overall system
performance.
- Any spike in memory consumption gets a trace event, regardless of
the duration of that spike. This tracepoint has been incredibly useful
in both understanding the causes of kswapd wakeups and
lowmemorykiller/lmkd kills and evaluating the impact of memory
management changes because it guarantees that any spike appears in the
trace output.

As a result, the RSS tracepoint in particular needs to be throttled
based on the delta of the value, not time. The very first prototype of
the patch emitted a trace event per RSS counter change, and IIRC the
RSS trace events consumed significantly more room in the buffer than
sched_switch (and Android has a lot of sched_switch events). It's not
practical to trace changes in RSS without throttling. If there's a
generic throttling approach that would work here, I'm all for it; like
Dan mentioned, there are many more counters that we would like to
trace in a similar way.