Re: [PATCH 1/2] irq: sparseirq enabling

From: Ingo Molnar
Date: Mon Nov 24 2008 - 09:40:35 EST



* Yinghai Lu <yinghai@xxxxxxxxxx> wrote:

> +/*
> + * Protect the sparse_irqs_free freelist:
> + */
> +static DEFINE_SPINLOCK(sparse_irq_lock);
> +LIST_HEAD(sparse_irqs_head);
> +
> +/*
> + * The sparse irqs are in a hash-table as well, for fast lookup:
> + */
> +#define SPARSEIRQHASH_BITS (13 - 1)
> +#define SPARSEIRQHASH_SIZE (1UL << SPARSEIRQHASH_BITS)
> +#define __sparseirqhashfn(key) hash_long((unsigned long)key, SPARSEIRQHASH_BITS)
> +#define sparseirqhashentry(key) (sparseirqhash_table + __sparseirqhashfn((key)))
> +
> +static struct list_head sparseirqhash_table[SPARSEIRQHASH_SIZE];
> +

> +struct irq_desc *irq_to_desc(unsigned int irq)
> +{
> +	struct irq_desc *desc;
> +	struct list_head *hash_head;
> +
> +	hash_head = sparseirqhashentry(irq);
> +
> +	/*
> +	 * We can walk the hash lockfree, because the hash only
> +	 * grows, and we are careful when adding entries to the end:
> +	 */
> +	list_for_each_entry(desc, hash_head, hash_entry) {
> +		if (desc->irq == irq)
> +			return desc;
> +	}
> +
> +	return NULL;
> +}

I have talked to Thomas about the current design of sparseirq, and the
current code looks pretty good already, but we'd like to suggest one
more refinement to the hashing scheme:

Please simplify it by using a static array of sparse pointers instead
of a full hash table, i.e. do it as:

struct irq_desc *irq_desc_ptrs[NR_IRQS] __read_mostly;

This can scale up just fine to 4096 CPUs, without measurable impact
on the kernel image size of x86 distro kernels.
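
The lookup side then becomes a trivial, lockless dereference - a
minimal sketch (untested) of what we mean:

	static inline struct irq_desc *irq_to_desc(unsigned int irq)
	{
		if (irq >= NR_IRQS)
			return NULL;

		return irq_desc_ptrs[irq];
	}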

Data structure rules:

1) allocation: an entry is allocated at IRQ number creation time -
never at any other time. A plain lookup of an IRQ that has no
descriptor yet returns NULL - it's atomic and lockless. (See the
allocation sketch after this list.)

2) freeing: we never free them. If a system allocates an irq_desc,
that signals that it uses it. This makes all access lockless.

3) lookup: irq_to_desc() just does a simple irq_desc_ptrs[irq]
dereference (inlined, as in the sketch above) - no locking or hash
lookup needed. Since irq_desc_ptrs[] is __read_mostly, this will
scale really well even on NUMA.

4) iterators: we still keep NR_IRQS as the size limit of the
irq_desc_ptrs[] array - but it is never used directly by any
iteration loop in generic code.

5) bootup, pre-kmalloc irq_desc access: on x86 we should preallocate
a pool of ~32 irq_desc entries for allocation use. This can be a
simple static array of struct irq_desc that is fed into the first
32 (legacy) irq_desc_ptrs[] slots or so - see the allocation
sketch after this list. No bootmem-alloc and no after_bootmem
flaggery needed. All SMP init and secondary CPU irq desc
allocation happens after kmalloc is already active - which cleans
up the allocation path.

6) limits on x86: please scale NR_IRQS according to this rule:

max(32*MAX_IO_APICS, 8*NR_CPUS)

That gives a minimum of 4096 IRQs on 64-bit systems (where
MAX_IO_APICS is 128). On even larger systems we scale linearly, up
to 32K IRQs on 4K CPUs. That should be more than enough (and it can
be increased if really needed). A preprocessor sketch of this rule
follows below.
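
For the allocation side, something like this should do it - an
untested sketch of rules 1), 2) and 5); irq_to_desc_alloc() and
legacy_irq_descs[] are made-up names, just for illustration:

	/*
	 * Static pool for the first 32 (legacy) IRQs - usable
	 * before kmalloc is up:
	 */
	static struct irq_desc legacy_irq_descs[32];

	struct irq_desc *irq_to_desc_alloc(unsigned int irq)
	{
		struct irq_desc *desc = irq_to_desc(irq);

		if (desc)
			return desc;

		if (irq < ARRAY_SIZE(legacy_irq_descs))
			desc = legacy_irq_descs + irq;
		else
			desc = kzalloc(sizeof(*desc), GFP_KERNEL);

		if (!desc)
			return NULL;

		desc->irq = irq;

		/*
		 * Publish the entry - it is never freed afterwards,
		 * so all readers stay lockless. (Concurrent
		 * allocators still need sparse_irq_lock around this,
		 * not shown here.)
		 */
		irq_desc_ptrs[irq] = desc;

		return desc;
	}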
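
And since max() does not work in preprocessor arithmetic, the sizing
rule of 6) would look something like this on x86 (again just a
sketch):

	#if (32 * MAX_IO_APICS) > (8 * NR_CPUS)
	# define NR_IRQS	(32 * MAX_IO_APICS)
	#else
	# define NR_IRQS	(8 * NR_CPUS)
	#endif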

Please keep all the irq_desc[] abstraction APIs and cleanups you did
so far - they are great. In the far future we can still turn irq_desc
into a full hash if needed - but right now we'll be just fine with
this simpler scheme, which scales fine up to 16K CPUs or so.

This should simplify quite a few things in the sparseirq code.

Ingo