Re: [patch V6 00/16] Improve /proc/interrupts further

From: Shrikanth Hegde

Date: Thu May 21 2026 - 11:05:33 EST

On 5/21/26 1:23 PM, Thomas Gleixner wrote:

Shrikanth!

On Thu, May 21 2026 at 10:04, Shrikanth Hegde wrote:

On 5/20/26 8:57 PM, Thomas Gleixner wrote:

Can you redirect it to /dev/null instead to take the file operations out
of the picture?

Yes. I ran "perf stat -r 1000 cat /proc/interrupts > /dev/null".
With the series, it shows a larger improvement than when writing to a file.

Unsurprisingly :)

0.000490211 +- 0.000000992 seconds time elapsed ( +- 0.20% ) <<< 3-4% improvements.

Again IPC drops ....

Yes, the IPC drop is consistent. I see the same trend in the numbers in PATCH 1/16 of the series.
Copying that snippet below.

Before:
8,932,242 instructions # 1.66 insn per cycle ( +- 0.34% )
After:
7,020,982 instructions # 1.30 insn per cycle ( +- 0.52% )

So it might be a common pattern across architectures. Maybe the overhead of perf stat
itself is large enough that it doesn't show the absolute benefit.

The problem is that the overhead of starting and tearing down 'cat' is
accounted for as well. That's constant, obviously.

But for use cases like irqbalance or similar, there is no
startup/teardown cost involved. The process is up and running and it
cares about the actual read performance.

True.

This is clearly observable when comparing the perf data with the read
loop timing data:

        Baseline    v6
Perf    3072.21 us  1564.40 us
Loop    1310.36 us   209.90 us

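It doesn't add up completely, but the trend is there. And you can trick
perf into revealing the startup/teardown overhead by comparing the two
commands below (-c -0 prints all but the last zero bytes, i.e. the whole
file, while -c 0 prints nothing):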

perf stat -r 1000 head -q -c -0 /proc/interrupts >/dev/null
perf stat -r 1000 head -q -c 0 /proc/interrupts >/dev/null

I tried it, but it doesn't make much of a difference.

In addition, I ran "perf stat -a -r 1000 cat /proc/interrupts > /dev/null".
It is now 10x slower, the IPC is the same with the series and the improvement vanishes.
So the heavier the infrastructure used for testing, the smaller the visible gains, I guess.

As often :)

But I don't see any regression.

As you said in the cover letter, the micro loops you ran may be the best way to evaluate this.
If you have the code in a shareable form, I can give it a try.

See below. I thought I would get around to using perf directly in the
test program some day, but that never happened due to -ENOTIME.

Thanks,

tglx
---
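/* Link with -lm for sqrt() */
#include <fcntl.h>
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Large enough to pull the whole file in with very few read() calls */
static char buf[1024*1024];

#define NSECS_PER_SEC	(1000L * 1000L * 1000L)

#define LOOPS		1000

/* Per-iteration duration in nanoseconds, used for the deviation */
static float td[LOOPS];

int main(int argc, char *argv[])
{
	int fd = open("/proc/interrupts", O_RDONLY);
	long tsum = 0, rs = 0;

	if (fd < 0) {
		perror("open /proc/interrupts");
		return 1;
	}

	/* Warm up: read the whole file LOOPS times without timing it */
	for (int i = 0; i < LOOPS; i++) {
		long r;

		do {
			r = read(fd, buf, sizeof(buf));
		} while (r > 0);
		lseek(fd, 0, SEEK_SET);
	}

	/* Timed runs: each iteration reads the complete file once */
	for (int i = 0; i < LOOPS; i++) {
		struct timespec t0, t1;
		unsigned long delta;
		long r;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		do {
			r = read(fd, buf, sizeof(buf));
			if (r > 0)
				rs += r;
		} while (r > 0);
		clock_gettime(CLOCK_MONOTONIC, &t1);

		delta = t1.tv_nsec + t1.tv_sec * NSECS_PER_SEC;
		delta -= t0.tv_nsec + t0.tv_sec * NSECS_PER_SEC;
		tsum += delta;
		td[i] = delta * 1.0;

		lseek(fd, 0, SEEK_SET);
	}

	/* Mean and standard deviation of the per-iteration times */
	float mean = (float)tsum / LOOPS;
	float calc = 0;

	for (int i = 0; i < LOOPS; i++) {
		float tmp = td[i] - mean;

		calc += tmp * tmp;
	}

	calc /= LOOPS;

	float std = sqrt(calc * 1.0);

	/* Output: mean ns per full read, bytes per read, stddev in % of mean */
	printf("%ld %ld %5.3f\n", tsum / LOOPS, rs / LOOPS, (std / mean) * 100.0);
	close(fd);
	return 0;
}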

This shows real benefits indeed.

base    v6     v6+ppc_hack
101us   65us   57us

So a proper powerpc fix would indeed make sense.
I think it is going to be similar. Let me go and read your
series again.

For the genirq bits of the series, please consider the tag below, if applicable:

Tested-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>