Re: [PATCH 00/16] dyn_array and nr_irqs support v2

From: Eric W. Biederman
Date: Sat Aug 02 2008 - 16:23:20 EST


"H. Peter Anvin" <hpa@xxxxxxxxx> writes:

> Eric W. Biederman wrote:
>>
>> Yes. I want the option of using those bits. It might not be smart to
>> use them to encode a physical location and the irq number but just
>> having the option would be nice.
>>
>
> Urk! First of all, there isn't enough space as we have already proven (on the
> machines where it actually matters there just aren't enough bits), but doing
> this kind of stuff *optionally* is going to hurt even worse.

With respect to space, what we have shown is that we create many more
irq_desc entries than we use in practice. That hurts us when it comes
to space, especially when compiling a single kernel for a wide range
of machines.

Which is why I ultimately want a list or a tree data structure holding
irq_desc entries instead of an array. Arrays must be statically
oversized, wasting space and reducing our flexibility in dealing with
irqs at run time.

To me that says the low-level architecture code, which actually knows
at run time how many irqs there are, should do the allocation of
irq_desc entries, and should allocate them on the appropriate NUMA node.

All of which should yield no fixed cap on the irq number at run time,
short of its 32 bits. Not having an arbitrarily low cap is what I mean
by having the option of a sparsely allocated irq number. If we have a
nice data structure, that is a side effect that comes essentially for
free.
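To make the idea concrete, here is a minimal userspace sketch of irq_desc
entries held in a tree keyed by a full 32-bit irq number, each tagged with
the NUMA node it was allocated for. Everything here is hypothetical: the
struct is a heavily simplified stand-in for the kernel's irq_desc, the
function names are invented for illustration, and a plain unbalanced binary
search tree is used only for brevity (a real implementation would want a
balanced or radix tree). The point it shows is that memory scales with the
number of descriptors actually allocated, not with the largest irq number.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for the kernel's irq_desc. */
struct irq_desc {
	uint32_t irq;                  /* full 32-bit irq number, no fixed cap */
	int node;                      /* NUMA node the entry was allocated for */
	struct irq_desc *left, *right; /* binary search tree links keyed on irq */
};

/* Root of the (hypothetical) sparse irq tree; empty until irqs are allocated. */
static struct irq_desc *irq_root;

/* Return the descriptor for irq, or NULL if none was ever allocated. */
static struct irq_desc *sparse_irq_lookup(uint32_t irq)
{
	struct irq_desc *d = irq_root;

	while (d && d->irq != irq)
		d = irq < d->irq ? d->left : d->right;
	return d;
}

/*
 * Allocate the descriptor for irq, or return the existing one.
 * In the scheme described above, the arch code would call this once it
 * knows at run time that the irq exists and which node it belongs to.
 */
static struct irq_desc *sparse_irq_alloc(uint32_t irq, int node)
{
	struct irq_desc **link = &irq_root;

	while (*link) {
		if ((*link)->irq == irq)
			return *link;
		link = irq < (*link)->irq ? &(*link)->left : &(*link)->right;
	}
	*link = calloc(1, sizeof(**link));
	if (*link) {
		(*link)->irq = irq;
		(*link)->node = node;
	}
	return *link;
}

/* Count allocated descriptors: storage grows with this, not with max irq. */
static size_t sparse_irq_count(const struct irq_desc *d)
{
	return d ? 1 + sparse_irq_count(d->left) + sparse_irq_count(d->right) : 0;
}
```

Allocating irqs 0, 17 and 4000000000 costs three descriptors; nothing is
reserved for the numbers in between.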

Apart from upgrading the genirq code to pass things around internally,
and to the arch code, in terms of irq_desc * entries, this should be
very little change from where we are today.
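A rough sketch of what passing irq_desc * around could look like: the arch
code resolves its hardware source to a descriptor once, and from then on
the generic flow handler works only on the pointer, so nothing needs to
turn a number back into an array index. The type and function names
(flow_handler_fn, handle_simple, arch_do_irq) and the struct layout are
hypothetical illustrations, not the genirq API.

```c
#include <stddef.h>

struct irq_desc;

/* Hypothetical flow-handler type: it receives the descriptor, not a number. */
typedef void (*flow_handler_fn)(struct irq_desc *desc);

/* Hypothetical, simplified descriptor. */
struct irq_desc {
	unsigned int irq;           /* kept only for printing/debugging */
	flow_handler_fn handle_irq; /* generic flow handler for this irq */
	unsigned long count;        /* how many times this irq has fired */
};

/* Hypothetical generic flow handler: operates purely on the descriptor. */
static void handle_simple(struct irq_desc *desc)
{
	desc->count++;
}

/*
 * Hypothetical arch entry point: the arch code has already mapped its
 * hardware vector to a descriptor and hands genirq the pointer directly.
 */
static void arch_do_irq(struct irq_desc *desc)
{
	if (desc && desc->handle_irq)
		desc->handle_irq(desc);
}
```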

> Furthermore, this crap will break anyway the *next* time someone comes up with a
> new clever way to do interrupts -- and to truly get stable identifiers, we can't
> treat HyperTransport MSI as APICs anymore, yadda, yadda...

Yes, there are those kinds of issues. I don't think I have come up with
a usable stable mapping for msi interrupts yet, just something close.

I expect what is most likely to work is, after allocating the fixed irqs, to scan the
pci busses and, for each pci device, reserve 1 irq number if msi is supported and
4096 irq numbers if msi-X is supported. If ht-irqs are supported, reserve
1 irq for each irq number. Hot plug slots that can ultimately have pci busses
plugged into them are going to be interesting. But I think if we make an
effort, msi irq numbers will stop flapping in the breeze, are likely to
remain the same, and will fit in the number of bits we have, all while still not
requiring us to allocate storage for them. Potentially we can even treat
GSIs the same way: if we know that an ioapic line is simply not connected,
we can reserve an irq number for it at boot but never allocate an irq_desc
structure for it.
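The reservation pass above can be sketched as a walk over devices in bus
scan order that only hands out ranges of irq numbers; no irq_desc is
allocated, so the reservation itself costs no storage. As long as the
hardware and scan order stay the same, each device gets the same numbers
on every boot. Assumptions: struct pci_dev_caps and reserve_irq_numbers are
hypothetical, the reservations for msi and msi-X are summed when a device
supports both, the ht-irq rule is read as one reserved number per ht irq
the device exposes, and 32-bit overflow checking is omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per the proposal: a whole block of numbers for any msi-X capable device. */
#define MSIX_IRQS_PER_DEV 4096u

/* Hypothetical summary of what the scan learns about one pci device. */
struct pci_dev_caps {
	bool msi;             /* device has an msi capability */
	bool msix;            /* device has an msi-X capability */
	unsigned int nr_ht;   /* number of ht irqs it exposes (assumed meaning) */
	uint32_t irq_base;    /* output: first irq number reserved for it */
	uint32_t nr_reserved; /* output: how many numbers were reserved */
};

/*
 * Reserve irq numbers for devs[0..n) in scan order, starting at first_free
 * (the first number after the fixed irqs). Only numbering is decided here.
 * Returns the next unreserved irq number.
 */
static uint32_t reserve_irq_numbers(struct pci_dev_caps *devs, size_t n,
				    uint32_t first_free)
{
	uint32_t next = first_free;

	for (size_t i = 0; i < n; i++) {
		uint32_t want = 0;

		if (devs[i].msi)
			want += 1;
		if (devs[i].msix)
			want += MSIX_IRQS_PER_DEV;
		want += devs[i].nr_ht;

		devs[i].irq_base = next;
		devs[i].nr_reserved = want;
		next += want;
	}
	return next;
}
```

For example, with the fixed irqs ending at 255, an msi-only device gets 256,
a device with both msi and msi-X gets 257 through 4353, and a device with
two ht irqs gets 4354 and 4355.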

What I mean by having the option to do a stable mapping is that we don't build
in unnecessary a priori limits on the maximum irq number. Irq numbers have
always been sparsely allocated. It was a rare ISA system that used all 16
of its irqs. It was an even rarer ioapic-based system that used all of its ioapic
inputs, but we have always reserved irq numbers for all of those potential irqs.

So I ask for a data structure that can potentially span the entire 32-bit
range of irq numbers, so that instead of a dense, sparsely used array
we keep just the irq_desc entries that we need.

The only compile-time option would be whether this architecture has switched over
to a sparse irq array data structure.

Eric