Re: [PATCH v2] mm: emit tracepoint when RSS changes by threshold

From: Joel Fernandes
Date: Thu Sep 05 2019 - 10:23:22 EST


On Thu, Sep 05, 2019 at 04:20:10PM +0200, Michal Hocko wrote:
> On Thu 05-09-19 10:14:52, Joel Fernandes wrote:
> > On Thu, Sep 05, 2019 at 12:54:24PM +0200, Michal Hocko wrote:
> > > On Wed 04-09-19 12:28:08, Joel Fernandes wrote:
> > > > On Wed, Sep 4, 2019 at 11:38 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed 04-09-19 11:32:58, Joel Fernandes wrote:
> > > > > > On Wed, Sep 04, 2019 at 10:45:08AM +0200, Michal Hocko wrote:
> > > > > > > On Tue 03-09-19 16:09:05, Joel Fernandes (Google) wrote:
> > > > > > > > Useful to track how RSS is changing per TGID to detect spikes in RSS and
> > > > > > > > memory hogs. Several Android teams have been using this patch in various
> > > > > > > > kernel trees for half a year now. Many have reported to me that it is really
> > > > > > > > useful, so I'm posting it upstream.
> > > > > > > >
> > > > > > > > Initial patch developed by Tim Murray. Changes I made from the original patch:
> > > > > > > > o Avoid consuming any additional space in mm_struct.
> > > > > > > > o Keep overhead low by checking whether tracing is enabled.
> > > > > > > > o Reduce noise and overhead by emitting only when RSS changes by a
> > > > > > > > threshold.
> > > > > > >
> > > > > > > Does this have any prerequisite? I do not see trace_rss_stat_enabled in
> > > > > > > the Linus tree (nor in linux-next).
> > > > > >
> > > > > > No, this is generated automatically by the tracepoint infrastructure when a
> > > > > > tracepoint is added.
> > > > >
> > > > > OK, I was not aware of that.
> > > > >
> > > > > > > Besides that, why do we need batching in the first place? Does this have a
> > > > > > > measurable overhead? How does it differ from the other tracepoints that we
> > > > > > > have in other hot paths (e.g. the page allocator doesn't do any checks)?
> > > > > >
> > > > > > We do need batching not only for overhead reduction,
> > > > >
> > > > > What is the overhead?
> > > >
> > > > The overhead is occasionally higher without the threshold (that is, if we
> > > > trace every counter change). I would classify the performance as
> > > > almost the same, within the noise.
> > >
> > > OK, so the additional code is not really justified.
> >
> > It is really justified. Did you read the whole of the last email?
>
> Of course I have. The fact that the numbers are within the noise, with some
> outliers (and no details about the underlying reason), simply
> shows that you are optimizing something that is probably not worth it.
>
> I would recommend adding a simple tracepoint. That should be pretty
> uncontroversial. And if you want to add an optimization on top, then
> provide data to justify it.

Did you read the point about trace sizes? We don't want traces flooded, and
you are not really making any good points about why we should not reduce
flooding of traces. I don't want to simplify it and lose that benefit. It is
already simple enough and uncontroversial.

thanks,

- Joel