Re: [PATCH v3] trace: Add x86 irq vector entry/exit tracepoints

From: David Sharp
Date: Thu Jul 07 2011 - 20:58:21 EST


On Thu, Jul 7, 2011 at 4:34 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> On Tue, Jun 21, 2011 at 11:45:03AM -0700, Vaibhav Nagarnaik wrote:
>> From: Jiaying Zhang <jiayingz@xxxxxxxxxx>
>>
>> The current interrupt trace of irq_handler_entry and irq_handler_exit
>> give traces of when an interrupt is handled. They provide good data
>> about when the system is running in kernel space and how it affects the
>> currently running applications.
>>
>> Apart from this, they are IRQ vectors which trigger the system into
>> kernel space. Tracing such events gives us the trace of where the system
>> is spending its time. We want to know which cores are handling
>> interrupts and how they are affecting other processes in the system.
>> Also, the trace provides information about when the cores are idle and
>> which interrupts are changing that state.
>>
>> The following patch adds the event definition and trace instrumentation
>> for interrupt vectors. A lookup table is provided to print out readable
>> IRQ vector names. Apart from the IRQ vectors handled in the generic
>> kernel code, some x86 specific IRQ vectors are also traced. The lookup
>> table can be extended by adding other arch-specific IRQ vectors.
>>
>> Changelog:
>> v2-v3
>> * Move the irq_vector_{entry|exit} tracepoints back under irq/ sub
>> Â folder
>> * Trace the common IRQs in generic kernel code so that all archs can
>> Â benefit.
>>
>> v1-v2
>> * Move the irq_vector_{entry|exit} tracepoints under irq_vectors/ sub
>> Â folder
>>
>> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@xxxxxxxxxx>
>> ---
>> Âarch/x86/kernel/apic/apic.c       Â|  Â4 ++
>> Âarch/x86/kernel/cpu/mcheck/therm_throt.c | Â Â2 +
>> Âarch/x86/kernel/cpu/mcheck/threshold.c  |  Â2 +
>> Âarch/x86/kernel/irq.c          Â|  Â4 ++
>> Âarch/x86/kernel/time.c          |  16 ++++--
>> Âarch/x86/kernel/traps.c         Â|  Â3 +
>> Âarch/x86/mm/tlb.c            Â|  Â2 +
>> Âinclude/trace/events/irq.h        |  84 ++++++++++++++++++++++++++++++
>> Âkernel/hrtimer.c             |  Â6 ++
>> Âkernel/irq_work.c            Â|  Â4 ++
>> Âkernel/sched.c              |  Â4 ++
>> Âkernel/smp.c               |  Â5 ++
>> Âkernel/time/tick-broadcast.c       |  16 +++++-
>> Âkernel/time/tick-common.c        Â|  Â8 +++-
>> Âkernel/time/tick-sched.c         |  Â4 ++
>> Â15 files changed, 155 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>> index b9338b8..159851c 100644
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -1792,6 +1792,7 @@ void smp_spurious_interrupt(struct pt_regs *regs)
>>
>> Â Â Â exit_idle();
>> Â Â Â irq_enter();
>> + Â Â trace_irq_vector_entry(TRACE_SPURIOUS_APIC_VECTOR);
>> Â Â Â /*
>> Â Â Â Â* Check if this really is a spurious interrupt and ACK it
>> Â Â Â Â* if it is a vectored one. ÂJust in case...
>> @@ -1806,6 +1807,7 @@ void smp_spurious_interrupt(struct pt_regs *regs)
>> Â Â Â /* see sw-dev-man vol 3, chapter 7.4.13.5 */
>> Â Â Â pr_info("spurious APIC interrupt on CPU#%d, "
>> Â Â Â Â Â Â Â "should never happen.\n", smp_processor_id());
>> + Â Â trace_irq_vector_exit(TRACE_SPURIOUS_APIC_VECTOR);
>> Â Â Â irq_exit();
>> Â}
>>
>
> Seems a lot of these tracepoints fit under that pattern:
>
> Â Â Â Â Â Â Â Âirq_enter();
> Â Â Â Â+ Â Â Â trace_irq_entry_entry(foo)
>
> Â Â Â Â Â Â Â Âdo things
>
> Â Â Â Â+ Â Â Â trace_irq_exit(foo)
> Â Â Â Â Â Â Â Âirq_exit();
>
>
> What if we instead had trace_irq_enter() inside irq_enter() and
> trace_irq_exit() inside irq_exit() ?
>
> Then your specific irq tracepoints would just be one thing that inform us
> about the nature of the interrupt:
>
> Â Â Â Â Â Â Â Âirq_enter();
> Â Â Â Â+ Â Â Â trace_irq_vector_entry(foo);
>
> Â Â Â Â Â Â Â Âdo things
>
> Â Â Â Â Â Â Â Âirq_exit();
>
> Those who are only interested in knowing when we have irqs can just turn
> on trace_irq_enter() and trace_irq_exit().
>
> Those who want more details about the nature of these interrupts can turn
> on trace_irq_vector() and trace_irq_exit().
>
> So you don't need the trace_irq_vector_exit anymore.
>
> How does that sound?

That roughly doubles the overhead of hitting these irqs, and since
irqs can be nested, it's not guaranteed that the "irq_vector" event
immediately follows "irq_enter", so I think the trace could be
confusing.

I also think it's far more useful to know which interrupt occurred
than "some" interrupt occurred, and I don't see a case where one would
not want that info.

Moving the tracepoints into irq_enter() and irq_exit() might be
sensible, but I'd want to pass down that info.

>
>
>> diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
>> index 00cbb27..9179dca 100644
>> --- a/arch/x86/kernel/time.c
>> +++ b/arch/x86/kernel/time.c
>> @@ -13,6 +13,7 @@
>> Â#include <linux/interrupt.h>
>> Â#include <linux/time.h>
>> Â#include <linux/mca.h>
>> +#include <trace/events/irq.h>
>>
>> Â#include <asm/vsyscall.h>
>> Â#include <asm/x86_init.h>
>> @@ -51,6 +52,13 @@ unsigned long profile_pc(struct pt_regs *regs)
>> Â}
>> ÂEXPORT_SYMBOL(profile_pc);
>>
>> +static irqreturn_t timer_interrupt(int irq, void *dev_id);
>> +static struct irqaction irq0 Â= {
>> + Â Â .handler = timer_interrupt,
>> + Â Â .flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL | IRQF_TIMER,
>> + Â Â .name = "timer"
>> +};
>> +
>> Â/*
>> Â * Default timer interrupt handler for PIT/HPET
>> Â */
>> @@ -59,7 +67,9 @@ static irqreturn_t timer_interrupt(int irq, void *dev_id)
>> Â Â Â /* Keep nmi watchdog up to date */
>> Â Â Â inc_irq_stat(irq0_irqs);
>>
>> + Â Â trace_irq_handler_entry(irq, &irq0);
>> Â Â Â global_clock_event->event_handler(global_clock_event);
>> + Â Â trace_irq_handler_exit(irq, &irq0, 1);
>
> Seems it should go to generic event handlers in kernel/time ?

(same response as hrtimers, below...)

>
>> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
>> index 4fc1205..d2f3f1f 100644
>> --- a/include/trace/events/irq.h
>> +++ b/include/trace/events/irq.h
>> @@ -138,6 +138,90 @@ DEFINE_EVENT(softirq, softirq_raise,
>> Â Â Â TP_ARGS(vec_nr)
>> Â);
>>
>> +#if !defined(_TRACE_IRQ_VECTOR_ENUM)
>> +#define _TRACE_IRQ_VECTOR_ENUM
>> +enum {
>> + Â Â TRACE_NMI_VECTOR,
>> + Â Â TRACE_NOHZ_TIMER_VECTOR,
>> + Â Â TRACE_HRTIMER_VECTOR,
>> + Â Â TRACE_ONESHOT_TIMER_VECTOR,
>> + Â Â TRACE_PERIODIC_TIMER_BROADCAST_VECTOR,
>> + Â Â TRACE_PERIODIC_TIMER_VECTOR,
>> + Â Â TRACE_ERROR_APIC_VECTOR,
>> + Â Â TRACE_RESCHEDULE_VECTOR,
>> + Â Â TRACE_CALL_FUNCTION_VECTOR,
>> + Â Â TRACE_CALL_FUNCTION_SINGLE_VECTOR,
>> + Â Â TRACE_THERMAL_APIC_VECTOR,
>> + Â Â TRACE_THRESHOLD_APIC_VECTOR,
>> + Â Â TRACE_REBOOT_VECTOR,
>> + Â Â TRACE_SPURIOUS_APIC_VECTOR,
>> + Â Â TRACE_IRQ_WORK_VECTOR,
>> + Â Â TRACE_X86_PLATFORM_IPI_VECTOR,
>> + Â Â TRACE_INVALIDATE_TLB_VECTOR,
>> +};
>> +#endif
>
> Pure x86 things shouldn't be in a generic header.

Other platforms have IPI vector interrupts, so we can just remove
"X86_" from the name. Would that work?

>
> include/trace/event/irq.h should contain generic things. And I'm not sure
> we want some generalized vector namespace for that but rather single contained
> tracepoint names that make sense:
>
> trace_ipi_function(), trace_irq_reschedule()...

I don't see how that can fit with putting the tracepoints in
irq_enter() and irq_exit()...

Also I'm not sure it makes sense to bloat the trace events with
individual tracepoints for interrupts. Some of these events are
relatively rare in archs: RESCHEDULE_VECTOR is only in x86 and ia64,
and IPI will not be in any unicore machine, for example. They may not
warrant having their own events instead of an enum entry.

>
> But vector namespace makes sense for archs yeah.
>
>> Â/* This part must be outside protection */
>> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
>> index a9205e3..84a0c70 100644
>> --- a/kernel/hrtimer.c
>> +++ b/kernel/hrtimer.c
>> @@ -48,6 +48,7 @@
>>
>> Â#include <asm/uaccess.h>
>>
>> +#include <trace/events/irq.h>
>> Â#include <trace/events/timer.h>
>>
>> Â/*
>> @@ -1244,6 +1245,8 @@ void hrtimer_interrupt(struct clock_event_device *dev)
>> Â Â Â ktime_t expires_next, now, entry_time, delta;
>> Â Â Â int i, retries = 0;
>>
>> + Â Â trace_irq_vector_entry(TRACE_HRTIMER_VECTOR);
>> +
>> Â Â Â BUG_ON(!cpu_base->hres_active);
>> Â Â Â cpu_base->nr_events++;
>> Â Â Â dev->next_event.tv64 = KTIME_MAX;
>> @@ -1316,6 +1319,7 @@ retry:
>> Â Â Â if (expires_next.tv64 == KTIME_MAX ||
>> Â Â Â Â Â !tick_program_event(expires_next, 0)) {
>> Â Â Â Â Â Â Â cpu_base->hang_detected = 0;
>> + Â Â Â Â Â Â trace_irq_vector_exit(TRACE_HRTIMER_VECTOR);
>> Â Â Â Â Â Â Â return;
>> Â Â Â }
>>
>> @@ -1355,6 +1359,8 @@ retry:
>> Â Â Â tick_program_event(expires_next, 1);
>> Â Â Â printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
>> Â Â Â Â Â Â Â Â Â ktime_to_ns(delta));
>> +
>> + Â Â trace_irq_vector_exit(TRACE_HRTIMER_VECTOR);
>
> We already have hrtimer tracepoints.
>

Yes, but those events are about hrtimer events and buried deep in
hrtimer code. If I enable all events in the "irq" trace event
subsystem, I want to see every interrupt. I don't want to remember "oh
yeah, I need to enable x,y,z events too." Putting the event in the
"irq" subsystem and in the interrupt handler makes this crystal clear.
And if the hrtimer code is (or is ever) such that an interrupt path
doesn't reach the hrtimer tracepoints, I still want to know that the
interrupt occurred.

Here's our motivation: We use interrupt events, along with traps,
syscall, and sched_switch events to know when kernel code is running
instead of user code. So, being clear that we got all interrupts (and
which interrupt) is important.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/