Re: [PATCH] Add slowpath enter/exit trace events

From: Mel Gorman
Date: Thu Nov 23 2017 - 08:45:53 EST


On Thu, Nov 23, 2017 at 01:25:30PM +0100, Michal Hocko wrote:
> On Thu 23-11-17 11:43:36, peter.enderborg@xxxxxxxx wrote:
> > From: Peter Enderborg <peter.enderborg@xxxxxxxx>
> >
> > The warning of slow allocation has been removed, this is
> > a other way to fetch that information. But you need
> > to enable the trace. The exit function also returns
> > information about the number of retries, how long
> > it was stalled and failure reason if that happened.
>
> I think this is just too excessive. We already have a tracepoint for the
> allocation exit. All we need is an entry to have a base to compare with.
> Another usecase would be to measure allocation latency. Information you
> are adding can be (partially) covered by existing tracepoints.
>

You can gather that by simply adding a probe to __alloc_pages_slowpath
(like what perf probe does) and matching the trigger with the existing
mm_page_alloc points. This is a bit approximate because you would need
to filter out mm_page_alloc hits that do not have a corresponding hit on
__alloc_pages_slowpath, but that is easy.

With that probe, it's trivial to use systemtap to track the latencies between
those points on a per-process basis and then only do a dump_stack from
systemtap for the ones that are above a particular threshold. This can all
be done without introducing state-tracking code into the page allocator
that is active regardless of whether the tracepoint is in use. It also
has the benefit of working with many older kernels.
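
As a rough illustration of the matching logic only (this is not systemtap,
and it is not a tested tool), the pairing can be sketched in Python. It
assumes the probe hits and mm_page_alloc hits have already been collected
as (timestamp, pid, name) records. The Event layout, the "slowpath_entry"
name for the probe and the threshold value are all made up for the example.

```python
from collections import namedtuple

# Hypothetical record layout: timestamp in seconds, pid, event name.
Event = namedtuple("Event", "ts pid name")

ENTRY = "slowpath_entry"  # assumed name given to the __alloc_pages_slowpath probe
EXIT = "mm_page_alloc"    # existing tracepoint


def slow_allocations(events, threshold):
    """Yield (pid, latency) for slowpath allocations slower than threshold."""
    pending = {}  # pid -> timestamp of the outstanding slowpath entry
    for ev in events:
        if ev.name == ENTRY:
            pending[ev.pid] = ev.ts
        elif ev.name == EXIT:
            start = pending.pop(ev.pid, None)
            if start is None:
                # No matching probe hit: the allocation never entered
                # the slowpath, so filter it out.
                continue
            latency = ev.ts - start
            if latency > threshold:
                yield ev.pid, latency
```

Keying on pid alone is part of why the result is approximate. An
allocation from interrupt context on top of a task would be attributed to
that task.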

If systemtap is not an option then use ftrace directly to gather the
information from userspace. It can be done via trace_pipe with some overhead
or on a per-cpu basis like what trace-cmd does. It's important to note
that even *if* the tracepoints were introduced, it would still be necessary
to have something gather the information and report it in a sensible fashion.
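
For the trace_pipe route, the userspace side could start out as something
like the sketch below. It assumes the usual human-readable ftrace line
layout ("comm-PID [CPU] flags TIMESTAMP: event: details"). The regular
expression is my guess at that layout, not a format guarantee, and the
helper names are made up for the example.

```python
import re

# Assumed layout: "  comm-PID  [CPU] flags  TIMESTAMP: event: details".
# The flags column is treated as optional.
LINE_RE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+"
    r"(?:\S+\s+)?"
    r"(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>\w+):"
)


def parse_line(line):
    """Return (timestamp, pid, event) or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # header, comment or unexpected format
    return float(m.group("ts")), int(m.group("pid")), m.group("event")


def read_events(stream):
    """Yield parsed events from any iterable of trace lines."""
    for line in stream:
        parsed = parse_line(line)
        if parsed is not None:
            yield parsed
```

read_events() takes any iterable of lines, so it can be fed an open
trace_pipe file (typically under /sys/kernel/debug/tracing) or a saved
trace for offline analysis.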

That probe+mm_page_alloc can tell you the frequency of allocation
attempts that take a long time but not the why. Compaction and direct
reclaim latencies can be checked via existing tracepoints and in the case
of compaction, detailed information can also be gathered from existing
tracepoints. Detailed information on why direct reclaim stalled can be
harder to get, but the biggest thing to check is whether reclaim stalls
due to congestion and, again, tracepoints already exist for that.

I'm not convinced that a new tracepoint is needed.

--
Mel Gorman
SUSE Labs