Re: [RFC] killing the NR_IRQS arrays.

From: Eric W. Biederman
Date: Sat Feb 17 2007 - 23:59:30 EST


Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> writes:

>> The only time it really makes sense to me to let the irq number vary
>> arbitrary are when things are truly dynamic, like with MSI, a
>> hypervisor, or hot plug interrupt controllers.
>
> I don't understand why you would go to all that lenght to replace irq
> numbers with irq_desc * and ... keep then numbers :-)

Because I don't have something better to replace them with.

We need names for irqs, currently the kernel/user space interface
is a unsigned number.

Printing out a pointer where we currently have an integer in:
/proc/interrupts
/proc/irq/N/...
/sys/devices/pci0000:00/0000:00:0e.0/irq
is a bad practice, and if I don't retain the number that is
my only choice.

I similar problem exists in all of the initialization messages from
device drivers that display their irq number.

Plus I think there are also a few ioctls that return the linux
irq number.

Now it may make sense to replace my irq_nr() with irq_name(), and
return a string that can be used instead, but fixing the kernel
user space interface is a third step that is a lot more delicate
and will require more thinking. So I would prefer to put that
off until all of the internal users are using a pointer. Then
we can grep for irq_nr and see how many places we actually export
the irq number to user space.

The fact that the user space has been put in charge of when to
migrate an irq from cpu to another makes this double delicate.

>> Sure, and I have the same issue with a big "DESIGNED FOR ppc" in the middle,
>> or "DESIGNED FOR arch/x". However the unfortunate truth is that the x86
>> has enough volume that frequently other architectures use some x86
>> hardware and thus get some of x86's warts. So anything that doesn't
>> cope with the x86's warts is frequently doomed to failure.
>
> I fait to see how what I described would not apply nicely to x86 ..

The model can be made to work if you force it but it isn't really
a good fit.

I can't really use the (cpu#, vector#) tuple as hw number as it
varies at runtime, and a single interrupt can send different (cpu#,
vector#) tuples from one interrupt message to the next without being
reprogrammed. At least I don't have the impression that you support
multiple hardware numbers going to the same linux irq. But this
really is the layer where I need the reverse mapping. However I can
optimize the reverse mapping by taking advantage of the per cpu
nature.

Currently the hardware number that I use is the pin number on
the ioapic. And to form the linux irq I just add the number
of pins of all previous ioapics together and then add my pin number.
Fairly simple.

Doing the above gives me stable names that are the same from one boot
to the next if someone doesn't change how the hardware is put
together. It looks to me that if I adapt the ppc scheme my irq
numbers will change from one boot to the next one kernel to the next,
almost at random. Depending on driver initialization order and
similar things. Having names that change all of the time is confusing
and not very useful. The fact that in the process of making my names
stable it actually happens to reflect part of the irq hardware
topology is incidental.

Giving up stable names is not something I want to do.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/