Re: [PATCH 0/4] workqueue_tracepoint: Add worklet tracepoints for worklet lifecycle tracing

From: Ingo Molnar
Date: Sun Apr 26 2009 - 06:48:30 EST



* Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, 25 Apr 2009 02:37:03 +0200
> Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>
> > I discovered it with this tracer. Then it brought me to
> > write this patch:
> >
> > http://lkml.org/lkml/2009/1/31/184
> >
> > ...
> >
> > Still with these same observations, I wrote this another one:
> >
> > http://lkml.org/lkml/2009/1/26/363
>
> OK, it's great that you're working to improve the workqueue code.
> But does this justify permanently adding debug code to the core
> workqueue code? [...]

Andrew - but this is not what you asked originally. Here's the
exchange, not cropped:

> > > So this latest patchset provides all these required
> > > informations on the events tracing level.
> > Well.. required by who?
> >
> > I don't recall ever seeing any problems of this nature, nor
> > patches to solve any such problems.

And Frederic replied that there are three recent examples of various
patches and problem reports resulting from the workqueue
tracepoints.

Now you argue 'yes, there might have been an advantage but it's not
permanent' - which appears to be a somewhat shifting position
really. I don't think _our_ position has shifted in any way - please
correct me if i'm wrong ;-)

And i, as the original author of the kernel/workqueue.c code
(heck, i even coined the 'workqueue' term - if that matters), agree
with Frederic here: more transparency in seeing what goes on in a
subsystem brings certain advantages:

- it spurs development
- it helps the fixing of bugs
- and generally it helps people understand the kernel better

weighed against the cost of maintaining (and seeing) those
tracepoints.

In the scheduler we have more than 60 distinct points of
instrumentation.

The patches we are discussing here add 6 new tracepoints to
kernel/workqueue.c - and i'd argue they are pretty much the maximum
we'd ever want to have there.

I've been maintaining the scheduler instrumentation for years, and
its overhead is, in hindsight, rather low - and the advantage is
significant. As long as tracing and statistics instrumentation has a
very standard and low-key "function call" visual form, i don't even
notice it most of the time.

And the thing is, the workqueue code has been pretty problematic
lately - with lockups and other regressions. It's a pretty 'opaque'
facility that _hides_ what goes on in it - so more transparency
might be a good answer just on that basis alone.

> [...] In fact, because you've discovered these problems, the
> reasons for adding the debug code have lessened!
>
> What we need are curious developers looking into how well
> subsystems are performing and how well callers are using them.
> Adding fairly large amounts of permanent debug code into the core
> subsystems is a peculiar way of encouraging such activity.

but this - which you call peculiar - is exactly what happened when
the first set of tracepoints was added.

Secondly, if we discount the (fairly standard) off-site tracepoints,
this is not a "large amount of debug code" - the tracepoints are
completely off site and are not a worry as long as the tracepoint
arguments are kept intact. The bits in kernel/workqueue.c are in the
26-line flux range:

workqueue.c | 26 ++++++++++++++++++++------

Ingo