Re: Status of tip/x86/apic

From: Steven Rostedt
Date: Mon Dec 15 2014 - 10:52:13 EST

On Fri, 12 Dec 2014 21:35:14 +0100 (CET)
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> 2) Proper trace point support so we can actually track allocation
> and the hardware access at the various domain levels because
> some of these issues cannot be decoded by looking at a state
> snapshot in debugfs. With some of them we even can't access
> debugfs at all.
> Though one issue with that is, that for the early boot process
> there is no way to store that information as the tracer gets
> enabled way after init_IRQ(). But there is no reason why the
> tracer could not be enabled before that. All it needs is a
> working memory allocator. Steven?

And as we found out, we also need working RCU ;-) (but that still
happens before init_IRQ() which is what we want here).

> Now there is another class of problems which might be hard to
> debug. When the machine just boots into a hang, so we dont get a
> ftrace output neither from an oops nor from a console. It would
> be nice if we could have a command line option which prints
> enabled trace points via (early_)printk. That would avoid
> sending out ad hoc printk debug patches which will basically
> provide the same information as the trace_points. That would be
> useful for other hard to debug boot hangs as well. Steven?

Agreed and patches have been sent to Linus.

> I think the above can be solved, so we need to agree on a proper
> set of tracepoints. I came up with the following list:
> - trace_irqdomain_create(domain->id, domain->name, ...)

Is that supposed to take a variable number of args? Tracepoints do not
support a variable number of args being passed in. I guess we could
add that, but it won't be for this merge window.

I've added Mathieu and Frederic to the Cc list here.

If we do support this (and if it is needed) we could make it use the
bprintf() infrastructure. It already supports saving just a format and
its args directly to the buffer, and has a way to print them again later.

tools/lib/traceevent/event-parse.c will need to deal with this, but it
already handles trace_bprintk().
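
To illustrate the "save a format and args, print later" idea, here is a
minimal user-space sketch. It is not the kernel's binary printf code;
struct bin_record, rec_store(), rec_print() and REC_MAX_ARGS are
hypothetical names, and the sketch assumes every argument is passed as
an unsigned long and that only "%lu", "%lx" and "%%" appear in the
format.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define REC_MAX_ARGS 4	/* hypothetical limit for this sketch */

/* One buffer record: the format pointer plus the raw argument words. */
struct bin_record {
	const char *fmt;	/* not copied, must outlive the record */
	int nargs;
	unsigned long args[REC_MAX_ARGS];
};

/* Fast path: store fmt and nargs unsigned long arguments, no formatting. */
static void rec_store(struct bin_record *rec, const char *fmt, int nargs, ...)
{
	va_list ap;
	int i;

	assert(nargs >= 0 && nargs <= REC_MAX_ARGS);
	rec->fmt = fmt;
	rec->nargs = nargs;
	va_start(ap, nargs);
	for (i = 0; i < nargs; i++)
		rec->args[i] = va_arg(ap, unsigned long);
	va_end(ap);
}

/* Slow path: format a stored record into out (always NUL terminated). */
static int rec_print(const struct bin_record *rec, char *out, size_t len)
{
	const char *p = rec->fmt;
	size_t pos = 0;
	int next = 0;

	assert(out && len > 0);
	while (*p && pos + 1 < len) {
		if (p[0] == '%' && p[1] == '%') {
			out[pos++] = '%';
			p += 2;
		} else if (p[0] == '%' && p[1] == 'l' &&
			   (p[2] == 'u' || p[2] == 'x') && next < rec->nargs) {
			unsigned long v = rec->args[next++];
			int n = (p[2] == 'u')
				? snprintf(out + pos, len - pos, "%lu", v)
				: snprintf(out + pos, len - pos, "%lx", v);

			if (n < 0 || (size_t)n >= len - pos) {
				pos = len - 1;	/* output was truncated */
				break;
			}
			pos += (size_t)n;
			p += 3;
		} else {
			out[pos++] = *p++;	/* literal character */
		}
	}
	out[pos] = '\0';
	return (int)pos;
}
```

The point is that the write side only copies a pointer and a few words;
all string formatting cost is paid by whoever reads the buffer.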

> - trace_irqdomain_destroy(domain->id)
> - trace_irqdomain_alloc(irq_data)
> struct irq_data contains all relevant information for
> assigning the tracepoint data.
> __entry->virq = irq_data->virq;
> __entry->domainid = irq_data->domain;
> __entry->hwirq = irq_data->hwirq;
> TP_STORE_DATA(__entry->data, irq_data);
> Where TP_STORE_DATA checks for the above callback and uses it
> if available, otherwise we just clear the data field.
> So this reuses the callback which we want for debugfs
> anyway. The print format is just hexdump. See my above
> rationale for that.

We could also create a plugin in tools/lib/traceevent that can give us
more than just a hexdump. That is, we have the code in the kernel
source tree but not in the kernel binary.
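
A rough user-space sketch of what such a decoder could look like is
below. It does not use the real tools/lib/traceevent plugin API; the
function names, the "msi" domain key, and struct msi_blob (three 32-bit
words in host byte order) are assumptions made only for illustration.
Unknown domains, or blobs of unexpected size, fall back to the plain
hexdump.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout of the opaque data recorded for an MSI domain. */
struct msi_blob {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Fallback: print the blob as space separated hex bytes. */
static int hexdump(const unsigned char *buf, size_t n, char *out, size_t len)
{
	size_t i, pos = 0;

	assert(out && len > 0);
	out[0] = '\0';
	/* Each byte needs at most 3 characters plus the trailing NUL. */
	for (i = 0; i < n && pos + 3 < len; i++)
		pos += (size_t)snprintf(out + pos, len - pos, "%s%02x",
					i ? " " : "", buf[i]);
	return (int)pos;
}

/* Pretty printer for the assumed MSI layout. */
static int decode_msi(const unsigned char *buf, size_t n, char *out, size_t len)
{
	struct msi_blob msg;

	if (n != sizeof(msg))
		return hexdump(buf, n, out, len);	/* unexpected size */

	memcpy(&msg, buf, sizeof(msg));
	return snprintf(out, len, "msi addr_hi=0x%08x addr_lo=0x%08x data=0x%08x",
			(unsigned int)msg.address_hi,
			(unsigned int)msg.address_lo,
			(unsigned int)msg.data);
}

/* Dispatch on the domain name; anything unknown is hexdumped. */
static int format_irq_data(const char *domain, const unsigned char *buf,
			   size_t n, char *out, size_t len)
{
	if (strcmp(domain, "msi") == 0)
		return decode_msi(buf, n, out, len);
	return hexdump(buf, n, out, len);
}
```

With something along these lines the kernel side only has to copy the
raw bytes into the event, and the decoding can be updated without
touching the kernel binary.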

-- Steve

> - trace_irqdomain_free(virq, domain->id)
> - trace_irqdomain_hw_access(irqdata)
> Same "data" and pretty printing argument as for
> trace_irqdomain_alloc()
> The obvious place to put such a trace point is
> e.g. irq_chip_write_msi_msg() where the callback records the
> currently written msi msg.