Re: Tracehooks in scheduler

From: Qais Yousef
Date: Fri Apr 26 2019 - 08:34:50 EST


Hi Quentin

On 04/26/19 11:26, Quentin Perret wrote:
> Hi Qais,
>
> On Monday 15 Apr 2019 at 15:49:45 (+0100), Qais Yousef wrote:
> > Hi Steve, Peter
> >
> > > On 04/07/19 18:52, Qais Yousef wrote:
> > > > Hi Steve, Peter
> > > >
> > > > I know the topic has sprung up in the past but I couldn't find anything that
> > > > points to any conclusion.
> > > >
> > > > As far as I understand new TRACE_EVENTS() in the scheduler (and probably other
> > > > subsystems) isn't desirable as it introduces a sort of ABI that can be painful
> > > > to maintain.
> > > >
> > > > But for us to be able to test various aspects of EAS, we rely on some events
> > > > that track load_avg, util_avg and some other metrics in the scheduler.
> > > > Example of such patches that are in android and we maintain out of tree can be
> > > > found here:
> > > >
> > > > https://android.googlesource.com/kernel/common/+/42903694913697da88a4ac627a92bbfdf44f0a2e
> > > > https://android.googlesource.com/kernel/common/+/6dfaed989ea4ca223f0913dfc11cdafd9664fc1c
> > > >
> > > > Dietmar and Quentin pointed me to a discussion you guys had with Daniel Bristot
> > > > in the last LPC when he had a similar need. So it is something that could
> > > > benefit other users as well.
> > > >
> > > > What is the best way forward to be able to add tracehooks into the scheduler
> > > > and any other subsystem for that matter?
> > > >
> > > > We tried using DECLARE_TRACE() to create a tracepoint which doesn't export
> > > > anything in /sys/kernel/debug/tracing/events and hoped that we can use eBPF or
> > > > a kernel module to attach to this tracepoint and access the args to inject our
> > > > own trace_printks() but this didn't work. The glue logic necessary to attach
> > > > to this tracepoint in a similar manner to how RAW_TRACEPOINT() in eBPF works
> > > > isn't there AFAICT.
> > > >
> > > > I can post the full example if the above doesn't make sense. I am still
> > > > familiarizing myself with the different aspects of this code as well. There
> > > > might be support for what we want but I failed to figure out the magic
> > > > combination to get it to work.
> > > >
> > > > If I got this glue logic done, would this be an acceptable solution? If not, do
> > > > you have any suggestions on how to progress?
> >
> > I have written some patches in hope it'll clarify further what we are trying to
> > achieve here and what would be the best possible approach about it.
> >
> > I have taken two approaches to solve the problem.
> >
> >
> > 1.
> >
> > https://github.com/qais-yousef/linux/commit/e7d0aa7ff1328195f314b0730c4cc744dec4261e
> >
> > In this approach everything we need is already available and we just
> > need to create new tracepoints as described in
> > Documentation/trace/tracepoints.rst and export it with
> > EXPORT_TRACEPOINT_SYMBOL_GPL().
> >
> > A user then can have an out of tree module to probe this tp and
> > manipulate it as they like.
> >
> > Example of such a module is here, the pelt_se tp is to demo the
> > approach:
> >
> > https://github.com/qais-yousef/tracepoints-helpers/blob/master/module-pelt-se/probe_tp_pelt_se.c
> >
> > Googling around I can see that the use of
> > EXPORT_TRACEPOINT_SYMBOL_GPL() is not desired unless the module is
> > in-tree which I doubt will be the case here.
> >
> > https://lore.kernel.org/lkml/20150422130052.4996e231@xxxxxxxxxxxxxxxxxx/
> >
> > 2.
> > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > https://github.com/qais-yousef/linux/commit/edd2498c5bbfca1a26acd151a4e3323e511f3455
> >
> > In this approach I try to allow attaching to a TP using eBPF. Sadly the
> > current infrastructure is lacking so I hacked the above up to create a
> > new DECLARE_TRACE_HOOK() macro which will allow using eBPF but without
> > exporting anything in debugfs that can constitute an ABI.
> >
> > The following eBPF program can be used then to attach and access some
> > info at the TP:
> >
> > https://github.com/qais-yousef/tracepoints-helpers/blob/master/bpf/tp_trace_printk_pelt_se
> >
> >
> > Do any of the above approaches make sense?
>
> For the EAS-testing use-case you mentioned earlier, it's really for
> debugging so we don't actually need the eBPF safety. None of this is

Well, debugging and testing are different, but I get what you mean. Yes, it'd be
running in a special environment. Running on production is not required,
although it would be a plus to have, i.e. running the tests on an Android phone
using the stock kernel.

The focus for us is ensuring mainline tree doesn't regress as the code evolves.

Our test suite lives here if anyone is interested in having a look:

https://github.com/ARM-software/lisa

I guess in your case, Quentin, the tracepoints would help with pure debugging
too if you ever got a bug report in this area.

> supposed to run in production I would say. So I tend to prefer option 1
> if that works for everybody interested in this thing.

I prefer it too since it's the simplest thing to do. The only simpler option
is to add the TRACE_EVENTs themselves :) /me hides behind the curtains

>
> And then what would be the story ? We would carry a module out-of-tree
> in our test suite to extract scheduler data and then post-process it in
> userspace or something ? Since that would be an out-of-tree module,
> upstream doesn't commit to anything to userspace, so perhaps that could
> work.

Exactly. Unless the tracepoint and its args are themselves considered an ABI,
in which case it's a dead end.

But I hope that's not the case, since for us at least, if the tracepoint's
signature changed (which I think will happen rarely), updating the out-of-tree
module to use the right signature based on kernel version is dead easy.
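
A minimal sketch of that version guard follows. It is built for userspace so
it can be compiled standalone, and it rests on assumptions: the extra cpu
argument and the 5.3 cut-off are hypothetical, fire_pelt_se() is a stand-in
for the tracepoint firing, and the default 5.2 version is picked only for the
demo. A real module would take LINUX_VERSION_CODE and KERNEL_VERSION() from
<linux/version.h> instead of defining them by hand.

```c
#include <stdio.h>

/*
 * Defined by hand only so the sketch builds in userspace; a real module
 * gets both from <linux/version.h>. The 5.2 default is an assumption.
 */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(5, 2, 0)
#endif

struct sched_entity;	/* opaque, only passed around by pointer */

static int probe_calls;	/* how many times the probe has run */

/*
 * Hypothetical signature change: assume the pelt_se tracepoint gains a
 * cpu argument in 5.3. Only the probe prototype differs per version.
 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 3, 0)
static void probe_pelt_se(void *data, int cpu, struct sched_entity *se)
{
	(void)data;
	probe_calls++;
	printf("pelt_se: cpu=%d se=%p\n", cpu, (void *)se);
}
/* Stand-in for the tracepoint firing with the newer argument list. */
#define fire_pelt_se(se)	probe_pelt_se(NULL, 0, (se))
#else
static void probe_pelt_se(void *data, struct sched_entity *se)
{
	(void)data;
	probe_calls++;
	printf("pelt_se: se=%p\n", (void *)se);
}
/* Stand-in for the tracepoint firing with the older argument list. */
#define fire_pelt_se(se)	probe_pelt_se(NULL, (se))
#endif
```

The rest of the module (registration, trace_printk formatting) stays the same
across versions; only the probe prototype sits behind the #if.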

The only problem with this approach (and the eBPF one) is if you need to
access non-exported data structures. Hopefully if the right thing is passed
in the args then that would not be necessary.
Also it's easy to work around the problem by compiling the out-of-tree module
in-tree. I have no clue how to re-phrase this in a simpler way ;)
There's no such workaround that I know of in the eBPF case.

By the way, I've seen some discussion about dealing with this problem by
exporting type information in the kernel image. I think it was called BTF:

https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

>
> Another thing, should these sched tracepoints be guarded by sched_debug ?

I prefer not to, so that such testing can be performed on production kernels
that don't have sched_debug. But as I stated earlier, that is not a hard
requirement for us.

Thanks

--
Qais Yousef