Re: [RFC PATCH 6/7] core/metricfs: expose x86-specific irq information through metricfs

From: Thomas Gleixner
Date: Thu Aug 13 2020 - 06:11:39 EST


Jonathan Adams <jwadams@xxxxxxxxxx> writes:

How is that related to core? The x86 subsys prefix is 'x86' and for this
particular thing it's 'x86/irq:'. That applies to the rest of the series
as well.

> Add metricfs support for displaying percpu irq counters for x86.
> The top directory is /sys/kernel/debug/metricfs/irq_x86.
> Then there is a subdirectory for each x86-specific irq counter.
> For example:
>
> cat /sys/kernel/debug/metricfs/irq_x86/TLB/values

What is 'TLB'? I'm not aware of any vector which is named TLB.

The changelog is pretty useless in providing any form of rationale for
this. It tells the WHAT, but not the WHY.

Also what does this file contain? Aggregates, one line per CPU or the
value of the random CPU of the day? I'm not going to dive into the macro
zoo to figure that out.

> jwadams@xxxxxxxxxx: rebased to 5.8-pre6
> This is work originally done by another engineer at
> google, who would rather not have their name associated with
> this patchset. They're okay with me sending it under my name.

I can understand why they won't have their name associated with this.

> +#ifdef CONFIG_METRICFS
> +METRICFS_ITEM(NMI, __nmi_count, "Non-maskable interrupts");
> +#ifdef CONFIG_X86_LOCAL_APIC
> +METRICFS_ITEM(LOC, apic_timer_irqs, "Local timer interrupts");
> +METRICFS_ITEM(SPU, irq_spurious_count, "Spurious interrupts");
> +METRICFS_ITEM(PMI, apic_perf_irqs, "Performance monitoring interrupts");
> +METRICFS_ITEM(IWI, apic_irq_work_irqs, "IRQ work interrupts");
> +METRICFS_ITEM(RTR, icr_read_retry_count, "APIC ICR read retries");
> +#endif
....

So you are adding NR_CPUS * NR_DIRECT_VECTORS debugfs files which show
exactly the same information as /proc/interrupts, right?

Aside from that _all_ of this information is available via tracepoints as
well.

That's NR_CPUS * 15 and incomplete because x86 has 23 of those directly
handled vectors which do not go through the irq core. So with just 15
and 256 CPUs that's 3840 files.
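
To make the arithmetic concrete, here is a minimal sketch of the file
count, assuming one debugfs file per (CPU, counter) pair. The helper name
metricfs_irq_file_count() is hypothetical and not part of the patch:

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: number of debugfs files
 * if every counter gets one file per CPU.
 */
static size_t metricfs_irq_file_count(size_t nr_cpus, size_t nr_counters)
{
	return nr_cpus * nr_counters;
}
```

With 256 CPUs, the 15 counters give 3840 files and the full set of 23
directly handled vectors would give 5888.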

Impressive number especially without any information why this is useful
and provides value over the existing mechanisms to retrieve exactly the
same information.
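
For comparison, a rough userspace sketch of pulling one of these counters
out of a /proc/interrupts-style line by summing its per-CPU columns. The
function name sum_irq_line() is made up for this example, and the parsing
assumes the usual "<label>: <count per CPU>... <description>" layout:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU columns of one /proc/interrupts-style line, e.g.
 *   "NMI:   3   5   Non-maskable interrupts"
 * Returns 0 and stores the total in *sum on success, or -1 if the line
 * does not start with "<label>:". Illustrative sketch only.
 */
static int sum_irq_line(const char *line, const char *label,
			unsigned long long *sum)
{
	size_t len = strlen(label);
	const char *p = line;
	char *end;

	/* Skip leading whitespace, then match "<label>:". */
	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, label, len) != 0 || p[len] != ':')
		return -1;
	p += len + 1;

	/* Add up numeric columns until the description text starts. */
	*sum = 0;
	for (;;) {
		while (isspace((unsigned char)*p))
			p++;
		if (!isdigit((unsigned char)*p))
			break;
		*sum += strtoull(p, &end, 10);
		p = end;
	}
	return 0;
}
```

Feeding it each line read from /proc/interrupts with a label such as
"NMI" or "LOC" yields the system-wide total for that counter.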

The cover letter talks a lot about how Google finds this useful, but
that's not really a convincing argument for this metric failsystem
addon.

Thanks,

tglx