Re: 2.6.17-mm6

From: Eric W. Biederman
Date: Thu Jul 06 2006 - 03:17:25 EST


Andrew Morton <akpm@xxxxxxxx> writes:

> On Wed, 5 Jul 2006 22:59:05 -0700
> Andrew Morton <akpm@xxxxxxxx> wrote:
>
>> On Wed, 05 Jul 2006 23:42:00 -0600
>> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) wrote:
>>
>> > So I suspect that we need to de-percpuify kernel_stat.irqs.
>>
>> I think so.
>
> Maybe not. If we do this, we lose the pretty CPUn columns in
> /proc/interrupts. That /proc/interrupts display requires that we maintain
> NR_CPUS*NR_IRQS counters.

Yes. Although at least part of that display is per architecture
so we may be able to get away with a little more.

> Given that a large NR_IRQs space will be sparsely populated, we should
> dynamically allocate the NR_CPUS storage for each active IRQ, as you say.

To some extent. Although with MSI-X there is the potential we could use
all of them. (But probably not).

Natalie has a firmer grasp on what is common that I do.
But I think the big Unisys boxes were running 1-2 active irqs per
cpu, and I believe some large IBM boxes were worse.

What is scary is that at 1K cpus if we wind up using all of the irqs
we start consuming 1Gig of RAM. At only 128 cpus we are still in the 2M-15M
territory, so that isn't too scary. The point is that after a certain
put the memory usage for all of those counters goes insane.

So I'm not certain how safe it is to leave something that explodes that badly
in there.

We at least need a big fat scary comment and probably something like
#if (NR_IRQS*NR_CPUS) > 1000000
#error ouch! NR_IRQS*NR_CPUS too large.
#endif

> That involves putting it into the irq_desc (as good a place as any). And a
> rather large number of trivial edits. I guess we do this only for genirq?

Probably. The potential has always been there, but it clearly wasn't
unlocked until just recently.

If we find a cheap solution then we can do something.

As a short term thing I am tempted to say just:
#define NR_IRQS (256 + (NR_CPUS*8))

Which would reduce kstat_irqs to 5k, and would relieve the immediate
symptoms.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/