Re: [PATCH V7] irq: Track the interrupt timings

From: Thomas Gleixner
Date: Thu Jun 23 2016 - 06:14:58 EST


On Thu, 23 Jun 2016, Daniel Lezcano wrote:
> On 06/23/2016 10:41 AM, Thomas Gleixner wrote:
> > Is it really required to do this per interrupt rather than providing per cpu
> > statistics of interrupts which arrived in the last X seconds or whatever
> > timeframe is relevant for this?
>
> Perhaps I am misunderstanding but if the statistics are done per cpu without
> tracking per irq timings, it is not possible to extract a repeating pattern
> for each irq and have an accurate prediction.

I don't see why you need a repeating pattern for each irq. All you want to
know is whether there are repeating patterns of interrupts on a particular
cpu.

	struct per_cpu_stat {
		u32	irq;
		u64	ts;
	};

Storing 32 entries of the above should give you enough information about
patterns etc. If you have a high rate of interrupts on that cpu it does not
matter at all whether that's from one or several devices. If you have only a
few then this storage is sufficient to get the desired information.
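
A minimal userspace sketch of such a per cpu store, to make the idea concrete.
It is an illustration under assumptions, not code from the patch: the ring
wrapper, the helper names (ring_record, ring_mean_interval) and the use of
stdint types in place of u32/u64 are all hypothetical, and timestamps are
assumed to be monotonic. It keeps the last 32 events regardless of which irq
raised them and derives the mean arrival interval on that cpu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STAT_ENTRIES 32	/* power of two, so the index can be masked */

struct per_cpu_stat {
	uint32_t irq;	/* interrupt number */
	uint64_t ts;	/* arrival timestamp, e.g. in nanoseconds */
};

/* Hypothetical wrapper: one instance per cpu. */
struct per_cpu_ring {
	struct per_cpu_stat entries[STAT_ENTRIES];
	uint64_t count;	/* total number of events recorded so far */
};

/* Record one interrupt, overwriting the oldest entry once the ring is full. */
static void ring_record(struct per_cpu_ring *r, uint32_t irq, uint64_t ts)
{
	struct per_cpu_stat *e;

	assert(r != NULL);
	e = &r->entries[r->count & (STAT_ENTRIES - 1)];
	e->irq = irq;
	e->ts = ts;
	r->count++;
}

/*
 * Mean interval between the stored events on this cpu, irrespective of
 * the irq they came from. Returns 0 if fewer than two events are stored.
 */
static uint64_t ring_mean_interval(const struct per_cpu_ring *r)
{
	uint64_t stored = r->count < STAT_ENTRIES ? r->count : STAT_ENTRIES;
	uint64_t newest, oldest;

	if (stored < 2)
		return 0;

	newest = r->entries[(r->count - 1) & (STAT_ENTRIES - 1)].ts;
	oldest = r->entries[(r->count - stored) & (STAT_ENTRIES - 1)].ts;

	return (newest - oldest) / (stored - 1);
}
```

Recording touches a single per cpu structure per interrupt, and the reader
only has to look at one ring per cpu instead of walking every interrupt.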

> If we step back and look at the potential users of this framework, we have:
>
> - mobile: by nature the interrupt line number is small and the devices are
> "slow"
>
> - desktop and laptop : only a few interrupts really interest us, ethernet
> and ssd (the other ones are rare, or ignored like timers or IPI)

You still walk ALL interrupts. On my laptop that's 22 of them. And you touch
every single per cpu storage of each interrupt.

> - server : the interrupt line number is bigger, but not so much.

Not so much? 158 interrupts on one of my larger machines.

> Usually, server and super sized system want full performance and low latency.
> For this reason the kernel is configured with periodic tick and that makes the
> next prediction algorithm superfluous, especially when the latency is set to
> 0. So I don't think the irq timings + next irq event code path will ever be
> used in this case.

Well, if such a machine runs with NOHZ=n then fine. But there are enough
machines where NOHZ is enabled so you get better power savings during times
where the machine is not loaded, but you want to have performance and low
latency if there is work to do.

> As you mentioned, there are some parts we can evolve and optimize, like
> avoiding the lookup of a cpu with no irq events.

You better think about this now.

Thanks,

tglx