Re: [RFC] killing the NR_IRQS arrays.
From: Benjamin Herrenschmidt
Date: Sat Feb 17 2007 - 16:04:57 EST
> No. I don't think we should make your irq_hwnumber_t thingy general
> because it is not general. I don't understand why you need it to be
> an unsigned long, that still puzzles me. But for the rest it actually
> appears that ppc has a simpler model to deal with.
I think you might have misunderstood becaues I do beleive it's actually
very general :-)
Let me explain below.
> I don't think I actually can describe x86 hardware in you hwnumber_t
> world. Although I can approximate.
And I think it fits well...
> In non-legacy mode at the top of the tree I have a network cooperating
> irq controllers. For each cpu there is an lapic next to each cpu that
> catches interrupt packets and below that I have interrupt controllers
> that throw interrupt packets. In the network of cooperating interrupt
> controllers a interrupt packet has a destination address that looks
> like (cpu#, vector#) where cpu# is currently at 8 bits and slowly
> growing and the vector# is a fixed 8 bits.
>
> The interrupt controllers that throw those packets have a fixed
> number of irq slots usually 24 or so. Each slot (referred to in the
> code as a pin) can be programmed which (cpu#, vector#) packet it
> throws when an interrupt occurs. Including an option to vary the cpu#
> between a set of cpus.
>
> So to be frank to handle this model properly I need to deal with
> this properly I need.
> #define NR_IRQS (NR_CPUS*256)
>
> There is enough flexibility in this model that hardware vectors
> have not found a need to cascade interrupt controllers.
This is roughly similar to the cell "toplevel" model where interrupt
messages encode the source unit/node, target and class. The chip has an
interrupt "controller" (receiver of those messages) for each thread. In
the kernel, I use a "flat" model, that is I create one host for all of
them and my hardware numbers are mode of a similar bit encoding of those
"routing" infos.
That is, with a remapping model like mine, the x86 non-legacy situation
could be easily expressed by having one domain (I call them hosts in the
code) covering the whole fabric and the hw number be your (CPU << 16) |
vector thing.
In addition, but you don't need that on x86, cell has an external
controller cascaded on one of those interrupt, I use a separate domain
for it.
The reason my hwnumber thingy is a generic type is that i provide
generic functions to create a linux interrupt for a domain/number pair
and generic mecanism to do the reverse mapping. That's where I think my
code might be of some use as with the "numbers" going away, pretty
everybody will need a wat to reverse map from HW numbers back to
irq_desc *.
I use an unsigned long because I needed to choose a type that would fit
the biggest number potentially used by an interrupt controller, and that
can be real big with some hypervisors for which those are "tokens" which
are potentially 64 bits.
> Ben I have no problem with a number that is specific to an irq
> controller for dealing with the internal irq controller
> implementations, heck I think everyone has that to some degree
>
> The linux irq number will remain an arbitrary software number for
> use by the linux system for talking about the source of the
> interrupt.
So you do intend to keep the linux number which is what I call the
"virtual interrupt" number on powerpc... I wouldn't have thought that to
be necessary except as a special case of an array of 16 entries for ISA
interrupts...
> Why in a sparse address space you would find it hard to allocate a
> range of numbers to an irq controller that only has a fixed number of
> irqs it can deal with is something I don't understand and I think
> it is does a disservice to your users. But that is all it is
> a quality of implementation issue. ia64 does the same foolish
> thing.
It would be fairly easy to change my powerpc code to pre-allocate a full
range for a given domain/pic when initializing it instead of doing
"lazy" scattered allocation like I do, though it won't bring much I
think. It's not possible for all PICs though, for example, the pSeries
needs to use the radix tree reverse mapper because of how large HW
interrupt numbers can be.
I chose not to do it. In the long run, the only remotely meaningful way
to expose interrupt to users would be to -add- columns
to /proc/interrupts that provide the "host" and the HW number on that
host, though I'm not sure that wouldn't break some userland tools.
> The only time it really makes sense to me to let the irq number vary
> arbitrary are when things are truly dynamic, like with MSI, a
> hypervisor, or hot plug interrupt controllers.
I don't understand why you would go to all that lenght to replace irq
numbers with irq_desc * and ... keep then numbers :-)
But again, as I said, this is in no way a fundamental limitation of the
powerpc code. It could be modified easily to allocate the whole range of
a given PIC that uses the "linear" remapping. It makes no sense for PICs
that use the "radix tree" remapping though.
> Sure, and I have the same issue with a big "DESIGNED FOR ppc" in the middle,
> or "DESIGNED FOR arch/x". However the unfortunate truth is that the x86
> has enough volume that frequently other architectures use some x86
> hardware and thus get some of x86's warts. So anything that doesn't
> cope with the x86's warts is frequently doomed to failure.
I fait to see how what I described would not apply nicely to x86 ..
Ben.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/