Re: [PATCH, RFC 0/3] Improvements to the tracing documentation

From: Tom Zanussi
Date: Tue Apr 14 2009 - 01:23:22 EST


On Tue, 2009-04-14 at 00:55 +0200, Ingo Molnar wrote:
> * Theodore Tso <tytso@xxxxxxx> wrote:
>
> > On Mon, Apr 13, 2009 at 11:31:24PM +0200, Ingo Molnar wrote:
> > > Cool. [ And i guess you'll like the per tracepoint filter
> > > expressions too :-) ]
> >
> > I haven't played with them yet, but I was looking over the source
> > code at them (since they aren't documented yet :-). It looks like
> > at the moment only integer matches are allowed, right? That's a
> > bit of an issue for me, since one of the things I'd really like to
> > be able to do is filter based on devname (i.e., sda2). (Most of
> > the time we only want to collect information for a particular
> > block device or filesystem.)
>
> You can already do:
>
> aldebaran:/debug/tracing/events/sched/sched_process_wait> echo "comm == Xorg" > filter
> aldebaran:/debug/tracing/events/sched/sched_process_wait> cat filter
> comm == Xorg
>
> But string matching depends on the type of the format field - so you
> cannot do string matches on integer fields.
>
> For kdev_t matches i think we'll need native support for that type -
> in addition to the integer/string types. It will come up in other
> places as well and user-space knows about devices as well.
>
> > Actually, the fact that I'm having to drop some 32 bytes for each
> > jbd2 and ext4 trace log for the bdevname in the ring buffer is
> > really for the birds. What I really want to do is just to drop in
> > the dev_t, and then for the tracing infrastructure to have an
> > efficient (cached) way of taking the dev_t and turning that back
> > into struct block_device at TP_printk time so we can print the
> > bdevname when it's needed. We definitely don't want to be calling
> > bdget() in fs/block_dev.c each time we print a line in the tracing
> > buffer! I'm guessing that's something the blktrace tracer would
> > find handy as well.
>
> Yeah.
>
> It could be worked around right now by converting it to an integer
> but i think what we want is native support for kdev_t, together with
> all the usual convenience forms of specifying it: sda1 should work
> the same way as 8:1 or 0801. Even /dev/sda1 should be recognized in
> a filter expression.
>

Yeah, I'll make sure the new filter parser can handle all these forms,
but hopefully in a general way that lets any special type such as
kdev_t accept several forms as predicate values. Maybe a simple
expression matcher: when the filter is set up, it would map the matched
value expression to some code that generates the final filter value
used by the run-time filter.
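
To make the idea concrete, here is a rough userspace sketch of such a
matcher, not actual filter code. Every name in it (SK_MKDEV,
value_form, filter_value_from_expr, the parse_* helpers) is made up for
illustration. It assumes the in-kernel dev_t layout of a 12-bit major
above a 20-bit minor, and it assumes the "0801" form is a hex number
decoded the way new_decode_dev() does it. The "sda1" and "/dev/sda1"
forms need a device name lookup and are left out.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: 12-bit major above a 20-bit minor. */
#define SK_MINORBITS 20
#define SK_MKDEV(ma, mi) (((uint32_t)(ma) << SK_MINORBITS) | (uint32_t)(mi))

/* A parser returns 0 and stores the final filter value, or -1 on no match. */
typedef int (*value_parser_fn)(const char *expr, uint32_t *out);

/* "8:1" */
static int parse_major_minor(const char *expr, uint32_t *out)
{
	unsigned int ma, mi;
	char trail;

	/* Exactly two conversions means nothing follows the minor. */
	if (sscanf(expr, "%u:%u%c", &ma, &mi, &trail) != 2)
		return -1;
	if (ma >= (1u << 12) || mi >= (1u << SK_MINORBITS))
		return -1;
	*out = SK_MKDEV(ma, mi);
	return 0;
}

/* "MKDEV(8, 4)" */
static int parse_mkdev_call(const char *expr, uint32_t *out)
{
	unsigned int ma, mi;
	char close, trail;

	if (sscanf(expr, "MKDEV( %u , %u %c%c", &ma, &mi, &close, &trail) != 3)
		return -1;
	if (close != ')' || ma >= (1u << 12) || mi >= (1u << SK_MINORBITS))
		return -1;
	*out = SK_MKDEV(ma, mi);
	return 0;
}

/* "0801": hex digits only, decoded with the assumed new_decode_dev() layout. */
static int parse_hex_devnum(const char *expr, uint32_t *out)
{
	size_t i, len = strlen(expr);
	unsigned long v;

	if (len == 0 || len > 8)
		return -1;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)expr[i]))
			return -1;
	v = strtoul(expr, NULL, 16);
	*out = SK_MKDEV((v & 0xfff00) >> 8, (v & 0xff) | ((v >> 12) & 0xfff00));
	return 0;
}

/* One table entry per accepted form; a new form is one more entry. */
struct value_form {
	const char *name;
	value_parser_fn parse;
};

static const struct value_form dev_value_forms[] = {
	{ "major:minor",         parse_major_minor },
	{ "MKDEV(major, minor)", parse_mkdev_call },
	{ "hex device number",   parse_hex_devnum },
};

/* Run once at filter setup; the run-time filter only compares integers. */
int filter_value_from_expr(const char *expr, uint32_t *out)
{
	size_t i;

	for (i = 0; i < sizeof(dev_value_forms) / sizeof(dev_value_forms[0]); i++)
		if (dev_value_forms[i].parse(expr, out) == 0)
			return 0;
	return -1;
}
```

With this, "8:1" and "0801" both come out as the same value, and an
unrecognized string such as "sda1" is rejected at setup time instead of
being compared at run time. A different special type would get its own
table of forms.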

Tom

> > Of course having more kernel code play with dev_t's directly isn't
> > considered politically correct in some circles, but tough. :-) We
> > can't exactly drop a pointer to a struct block_device in the trace
> > buffer, since there's no guarantee it will still be valid when we
> > read it out. Dropping in a dev_t is exactly what we want. It
> > would be nice though if there was a way to specify a major/minor
> > number as the filter predicate for the dev_t, and not to have the
> > user generate the MAJOR/MINOR encoding. So some way of parsing
> > "MKDEV(8, 4)" as the input to the filter predicate would probably
> > be a really good thing to do.
>
> Yeah, exactly. We already have smarts in init/* to recognize certain
> device string patterns (for rootdev specification) - that could be
> factored out (it already is to a large degree) and reused. We don't
> need full udev enumeration really - we just need the most common
> variants.
>
> Regardless of whether it's considered politically correct or not ;-)
> It's clearly useful.
>
> Ingo
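
On the cached dev_t to name lookup requested above, a minimal userspace
sketch of the shape it could take follows. It is not kernel code and
all names (devname_cache, devname_cache_get, fake_slow_lookup) are
hypothetical. The slow lookup is a stand-in callback; a real version
would also need locking and invalidation when a device goes away.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEVNAME_LEN 32	/* the 32 bytes per record mentioned in the thread */
#define CACHE_SLOTS 16	/* must be a power of two */

/* Stand-in for the expensive dev_t -> name resolution. */
typedef void (*slow_lookup_fn)(uint32_t dev, char *buf, size_t len);

struct devname_slot {
	int valid;
	uint32_t dev;
	char name[DEVNAME_LEN];
};

struct devname_cache {
	struct devname_slot slots[CACHE_SLOTS];
	slow_lookup_fn slow_lookup;
	unsigned long misses;	/* number of times the slow path ran */
};

void devname_cache_init(struct devname_cache *c, slow_lookup_fn fn)
{
	memset(c, 0, sizeof(*c));
	c->slow_lookup = fn;
}

/* Direct-mapped: each device number hashes to exactly one slot. */
const char *devname_cache_get(struct devname_cache *c, uint32_t dev)
{
	struct devname_slot *s =
		&c->slots[(dev ^ (dev >> 20)) & (CACHE_SLOTS - 1)];

	if (!s->valid || s->dev != dev) {
		/* Miss: resolve once and overwrite whatever was in the slot. */
		c->misses++;
		c->slow_lookup(dev, s->name, sizeof(s->name));
		s->name[sizeof(s->name) - 1] = '\0';
		s->dev = dev;
		s->valid = 1;
	}
	return s->name;
}

/* Fake resolver for demonstration: formats "dev-<major>:<minor>". */
void fake_slow_lookup(uint32_t dev, char *buf, size_t len)
{
	snprintf(buf, len, "dev-%u:%u",
		 (unsigned int)(dev >> 20), (unsigned int)(dev & 0xfffff));
}
```

The trace record then only carries the 4-byte device number, and the
name is resolved at print time, with the expensive path taken once per
device rather than once per printed line.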
