[PATCH] irq: sparse irq_desc[] support

From: Yinghai Lu
Date: Sat Nov 29 2008 - 02:13:03 EST


Impact: new CONFIG_SPARSE_IRQ feature, which makes irq_desc[] a sparse array

To support kernels with very large NR_CPUS and NR_IRQS settings,
we need to reduce the size of irq_desc[]. On x86, when NR_CPUS is
set to 4096, the irq_desc[] array wastes megabytes of RAM,
which is not an acceptable overhead for generic distro kernels.

In v2.6.28 we already introduced a generic API to make access to
the irq_desc[] array more abstract - and to allow a different
data structure to underlie it. This patch finishes that process.

Core kernel changes:

- fix missing sparseirq API changes in various bits of core kernel code
(missing for_irq_desc primitives, missing checks for !desc, etc.)

- introduce a new data type in the IRQ code: irq_desc_ptrs[] and its
handling in the core IRQ code

- detach the IRQ statistics counters from kernel_stat and
attach them to irq_desc->kstat_irqs[], a dynamically allocated
array of pointers. (This can use percpu_alloc() in the
future, once percpu_alloc() becomes generic enough.)

- detach the NR_IRQS array in random.c.

- interrupt remapping: when moving an IRQ on NUMA, reallocate the irq
descriptor so that we get proper NUMA-local memory for the descriptor,
for the irq_cfg entry and for the kstat_irqs array.
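
As a rough illustration of the core changes listed above, the sparse
model can be sketched in plain userspace C: a table of descriptor
pointers that stay NULL until an IRQ is first used, per-descriptor
counters allocated together with the descriptor, and an iteration
that skips holes by checking for !desc. This is only a model under
stated assumptions, not the code in the patch: every model_* name and
both MODEL_* constants are hypothetical, and calloc() stands in for
the kernel's allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_IRQS 4096	/* hypothetical upper bound for this model */
#define MODEL_NR_CPUS 4		/* hypothetical CPU count for this model */

struct model_irq_desc {
	unsigned int irq;
	unsigned int *kstat_irqs;	/* per-CPU counters, owned by the descriptor */
};

/* Sparse table: one pointer per possible IRQ, NULL until first use. */
static struct model_irq_desc *model_desc_ptrs[MODEL_NR_IRQS];

/* Lookup only; returns NULL if no descriptor was ever allocated. */
static struct model_irq_desc *model_irq_to_desc(unsigned int irq)
{
	return irq < MODEL_NR_IRQS ? model_desc_ptrs[irq] : NULL;
}

/* Lookup that allocates the descriptor and its counters on first use. */
static struct model_irq_desc *model_irq_to_desc_alloc(unsigned int irq)
{
	struct model_irq_desc *desc;

	if (irq >= MODEL_NR_IRQS)
		return NULL;

	desc = model_desc_ptrs[irq];
	if (desc)
		return desc;

	desc = calloc(1, sizeof(*desc));
	if (!desc)
		return NULL;

	desc->kstat_irqs = calloc(MODEL_NR_CPUS, sizeof(*desc->kstat_irqs));
	if (!desc->kstat_irqs) {
		free(desc);
		return NULL;
	}

	desc->irq = irq;
	model_desc_ptrs[irq] = desc;
	return desc;
}

/* Walk every IRQ number and skip holes: the "check for !desc" pattern. */
static unsigned long model_total_interrupts(void)
{
	unsigned long sum = 0;
	unsigned int irq, cpu;

	for (irq = 0; irq < MODEL_NR_IRQS; irq++) {
		struct model_irq_desc *desc = model_irq_to_desc(irq);

		if (!desc)
			continue;
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			sum += desc->kstat_irqs[cpu];
	}
	return sum;
}
```

In this model the fixed cost is one pointer per possible IRQ, and the
descriptor plus its counters are paid for only by IRQs that are in use.
Callers that merely look up a descriptor have to tolerate a NULL result.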

Architectures can enable this by setting the CONFIG_SPARSE_IRQ
config switch. The x86 architecture is extended/fixed to deal
with such an irq_desc[] model:

- io_apic irq_cfg[NR_IRQS] array is re-attached to desc->irq_chip

- MSI virtual IRQ numbering is sanitized to go from the max upper
end of the physical IRQ range up towards NR_IRQS - instead of
coming down from the end of NR_IRQS.

- the max NR_IRQS calculations are re-tuned
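
The MSI numbering change in the list above can be pictured with a small
userspace C model: virtual IRQ numbers are handed out by scanning upward
from the end of the physical IRQ range, rather than downward from the
top of NR_IRQS. This is a sketch under assumptions, not the x86 code in
the patch: the model_* names, the MODEL_* constants and the simple
in-use bitmap are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_IRQS      64	/* hypothetical size of the IRQ number space */
#define MODEL_NR_PHYS_IRQS 24	/* hypothetical count of physical (pin) IRQs */

static bool model_irq_in_use[MODEL_NR_IRQS];

/*
 * Hand out the lowest free number at or above the physical range.
 * Returns -1 when the number space is exhausted.
 */
static int model_create_virtual_irq(void)
{
	int irq;

	for (irq = MODEL_NR_PHYS_IRQS; irq < MODEL_NR_IRQS; irq++) {
		if (!model_irq_in_use[irq]) {
			model_irq_in_use[irq] = true;
			return irq;
		}
	}
	return -1;
}

/* Release a virtual IRQ number so that it can be handed out again. */
static void model_destroy_virtual_irq(int irq)
{
	if (irq >= MODEL_NR_PHYS_IRQS && irq < MODEL_NR_IRQS)
		model_irq_in_use[irq] = false;
}
```

Allocating upward keeps the numbers in use clustered just above the
physical range, so the IRQ numbers actually handed out do not depend on
how large NR_IRQS is configured.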

Architectures that do not specify CONFIG_SPARSE_IRQ do not need
to change anything - this is a transparent feature that is not
supposed to break any existing code.

Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>