Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

From: Thierry Reding
Date: Tue Feb 05 2013 - 04:16:16 EST


On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje BergstrÃm wrote:
> On 04.02.2013 03:03, Thierry Reding wrote:
> > On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
> >> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> > [...]
> >> +struct output {
> >> + void (*fn)(void *ctx, const char *str, size_t len);
> >> + void *ctx;
> >> + char buf[256];
> >> +};
> >
> > Do we really need this kind of abstraction? There really should be only
> > one location where debug information is obtained, so I don't see a need
> > for this.
>
> This is used by debugfs code to direct to debugfs, and
> nvhost_debug_dump() to send via printk.

Yes, that was precisely my point. Why bother providing the same data via
several output methods. debugfs is good for showing large amounts of
data such as register dumps or a tabular representation of syncpoints
for instance.

If, however, you want to interactively show debug information using
printk the same format isn't very useful and something more reduced is
often better.

> >> diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
> >
> >> +static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
> >> +{
> >> + unsigned mask;
> >> + unsigned subop;
> >> +
> >> + switch (val >> 28) {
> >> + case 0x0:
> >
> > These can easily be derived by looking at the debug output, but it may
> > still make sense to assign symbolic names to them.
>
> I have another suggestion. In downstream I removed the decoding part and
> I just print out a string of hex. That removes quite a bit bunch of code
> from kernel. It makes the debug output also more compact.
>
> It's much easier to write a user space program to decode than maintain
> it in kernel.

I don't know. I think if you use in-kernel debugging facilities such as
debugfs or printk, then the output should be self-explanatory. However I
do see the usefulness of having a binary dump that can be decoded in
userspace. But I think if we want to go that way we should make that a
separate interface. USB provides something like that, which can then be
fed to libpcap or wireshark to capture and analyze USB traffic. If done
properly you get replay functionality for free. I don't know what infra-
structure exists to help with implementing something similar.

So I think having debugfs output some data about syncpoints or the state
of channels might be useful to quickly diagnose a certain set of
problems but for anything more involved maybe a complete binary dump may
be better.

I'm not sure whether doing this would be acceptable though. Maybe Dave
or somebody else on the lists can answer that. An alternative way to
achieve the same would be to hook ioctl() from userspace and not do any
of it in kernel space.

> >> +static void show_channel_word(struct output *o, int *state, int *count,
> >> + u32 addr, u32 val, struct host1x_cdma *cdma)
> >> +{
> >> + static int start_count, dont_print;
> >
> > What if two processes read debug information at the same time?
>
> show_channels() acquires cdma.lock, so that shouldn't happen.

Okay. Another solution would be to pass around a debug context which
keeps track of the variables. But if we opt for a more involved dump
interface as discussed above this will no longer be relevant.

> >> +static void do_show_channel_gather(struct output *o,
> >> + phys_addr_t phys_addr,
> >> + u32 words, struct host1x_cdma *cdma,
> >> + phys_addr_t pin_addr, u32 *map_addr)
> >> +{
> >> + /* Map dmaget cursor to corresponding mem handle */
> >> + u32 offset;
> >> + int state, count, i;
> >> +
> >> + offset = phys_addr - pin_addr;
> >> + /*
> >> + * Sometimes we're given different hardware address to the same
> >> + * page - in these cases the offset will get an invalid number and
> >> + * we just have to bail out.
> >> + */
> >
> > Why's that?
>
> Because of a race - memory might've been unpinned and unmapped from
> IOMMU and when we re-map (pin), we are given a new address.
>
> But, I think this comment is a bit stale - we used to dump also old
> gathers. The latest code only dumps jobs in sync queue, so the race
> shouldn't happen.

Okay. In the context of a channel dump interface this may not be
relevant anymore. Can you think of any issue that wouldn't be detectable
or debuggable by analyzing a binary dump of the data within a channel?
I'm asking because at that point we wouldn't be able to access any of
the in-kernel data structures but would have to rely on the data itself
for diagnostics. IOMMU virtual addresses won't be available and so on.

> >> +static void host1x_debug_show_channel_cdma(struct host1x *m,
> >> + struct host1x_channel *ch, struct output *o, int chid)
> >> +{
> > [...]
> >> + switch (cbstat) {
> >> + case 0x00010008:
> >
> > Again, symbolic names would be nice.
>
> I propose I remove the decoding from kernel, and save 200 lines.

I think it could be more than 200 lines. If all we provide in the kernel
is some statistics about syncpoint usage or channel state that should be
a lot less code than we have now.

However that would make it necessary to provide userspace tools that can
provide the same quality of diagnostics, so I'm not sure if it's doable
without access to the in-kernel data structures.

Thierry

Attachment: pgp00000.pgp
Description: PGP signature