Re: [PATCH 2/2] sched/debug: add sched_update_nr_running tracepoint

From: Valentin Schneider
Date: Tue Sep 03 2019 - 12:05:52 EST


On 03/09/2019 16:43, Radim Krčmář wrote:
> The paper "The Linux Scheduler: a Decade of Wasted Cores" used several
> custom data gathering points to better understand what was going on in
> the scheduler.
> Red Hat adapted one of them for the tracepoint framework and created a
> tool to plot a heatmap of nr_running, where the sched_update_nr_running
> tracepoint is being used for fine-grained monitoring of scheduling
> imbalance.
> The tool is available from https://github.com/jirvoz/plot-nr-running.
>
> The best place for the tracepoints is inside the add/sub_nr_running,
> which requires some shenanigans to make it work as they are defined
> inside sched.h.
> The tracepoints have to be included from sched.h, which means that
> CREATE_TRACE_POINTS has to be defined for the whole header and this
> might cause problems if tree-wide headers expose tracepoints in sched.h
> dependencies, but I'd argue it's the other side's misuse of tracepoints.
>
> Moving the import sched.h line lower would require fixes in s390 and ppc
> headers, because they don't include dependencies properly and expect
> sched.h to do it, so it is simpler to keep sched.h there and
> preventively undefine CREATE_TRACE_POINTS right after.
>
> Exports of the pelt tracepoints remain because they don't need to be
> protected by CREATE_TRACE_POINTS and moving them closer would be
> unsightly.
>

Pure trace events are frowned upon in the scheduler world; try going with
tracepoints instead. Qais did something very similar recently:

https://lore.kernel.org/lkml/20190604111459.2862-1-qais.yousef@xxxxxxx/

You'll have to implement the associated trace events in a module, which
lets you define your own event format and doesn't form an ABI :).
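
To illustrate the split, here is a rough userspace model of the idea. It
is only a sketch and not the kernel tracepoint API: the hook type, the
register/unregister helpers and the format string are all made-up names
for illustration. The "core" side only hands out raw arguments, and the
"module" side attaches a probe and decides how to format them.

```c
/*
 * Userspace model only, NOT the kernel tracepoint API.
 * Every name below (nr_running_probe_t, register_nr_running_probe,
 * format_probe, ...) is hypothetical and exists just for this sketch.
 */
#include <stdio.h>
#include <stddef.h>

#define FMT_BUF_LEN 64

struct rq {
	int cpu;
	unsigned int nr_running;
};

/* Probe signature: opaque data pointer first, then the raw arguments. */
typedef void (*nr_running_probe_t)(void *data, struct rq *rq, int change);

static nr_running_probe_t probe_fn;	/* NULL while nothing is attached */
static void *probe_data;

/* "Core" side: the hook passes raw arguments and defines no format. */
static inline void trace_nr_running_hook(struct rq *rq, int change)
{
	if (probe_fn)
		probe_fn(probe_data, rq, change);
}

static void add_nr_running(struct rq *rq, unsigned int count)
{
	rq->nr_running += count;
	trace_nr_running_hook(rq, (int)count);
}

static void sub_nr_running(struct rq *rq, unsigned int count)
{
	rq->nr_running -= count;
	trace_nr_running_hook(rq, -(int)count);
}

/* "Module" side: attach a single probe; returns -1 if one is attached. */
static int register_nr_running_probe(nr_running_probe_t fn, void *data)
{
	if (probe_fn)
		return -1;
	probe_data = data;
	probe_fn = fn;
	return 0;
}

static void unregister_nr_running_probe(void)
{
	probe_fn = NULL;
	probe_data = NULL;
}

/* The module picks the event format; data must hold FMT_BUF_LEN bytes. */
static void format_probe(void *data, struct rq *rq, int change)
{
	snprintf(data, FMT_BUF_LEN, "cpu=%d change=%d nr_running=%u",
		 rq->cpu, change, rq->nr_running);
}
```

Since the format string lives entirely in format_probe(), changing it
never touches the "core" side, which is the property that keeps the
event format from turning into an ABI.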