Re: new IRQ scalability changes in 2.3.48

From: Andrea Arcangeli (andrea@suse.de)
Date: Wed Mar 08 2000 - 22:09:45 EST


On Wed, 8 Mar 2000, Richard Henderson wrote:

>On Wed, Mar 08, 2000 at 06:02:26PM -0700, yodaiken@fsmlabs.com wrote:
>> > if different IRQs are delivered to different CPUs, then there is no global
>> > spinlock connection between them. Also see /proc/irq/*/smp_affinity. Eg.
>>
>> I've been thinking about this change and still don't see what good it
>> does.
>> On a UP -- no change except code is more complex
>> On a SMP box performance loss without using affinity.
>> Take two spinlocks instead of one, more cache bouncing etc.
>
>Actually, it does make sense on Alpha. We don't really take advantage
>of it at the moment but --

It makes sense on x86 too, and we are taking advantage of it right now;
see my other email.

>We don't need any spinlocks at all.

I don't think so.

>Each cpu has its own interrupt mask register, so the actual interrupt
>handler needn't worry about communicating with other processors.

Yes, but we definitely need the per-irq-desc locking to avoid running the
irq handler on two CPUs at the same time.
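
Roughly, that serialization looks like this (a minimal sketch in the
2.3.48 style; the exact field and flag names are from memory and may
differ):

	/*
	 * The irq_desc spinlock plus an IRQ_INPROGRESS status flag
	 * keep a second CPU from entering the handler for the same irq.
	 */
	spin_lock(&desc->lock);
	desc->handler->ack(irq);
	if (!(desc->status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
		desc->status |= IRQ_INPROGRESS;
		spin_unlock(&desc->lock);
		/* the handler itself runs with the desc lock dropped */
		handle_IRQ_event(irq, regs, desc->action);
		spin_lock(&desc->lock);
		desc->status &= ~IRQ_INPROGRESS;
	}
	desc->handler->end(irq);
	spin_unlock(&desc->lock);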

I had a strong reason for adding the per-controller lock too (at first I
took the wrong lock-less approach of only replacing the ~&| operations
with atomic set_bit/clear_bit). Without it, this is what can happen:

        cpu0 running irq0                       cpu1 running irq1
        -----------------                       -----------------
        handler->ack(0) -> disable_irq(0)
                                                handler->ack(1) -> disable_irq(1)
                                                mask = cached_irq_mask & ~(1UL<<1)
                                                cached_irq_mask = mask
        mask = cached_irq_mask & ~(1UL<<0)
        cached_irq_mask = mask
        sync(mask)
                                                sync(mask)  irq0 was still enabled
                                                            in this mask!!!!

        returns thinking irq0 is
        disabled but it isn't (lockup)

With the additional lock, the set/clear-bit plus the flush of the new mask
to the hardware becomes one atomic unit, and the above SMP race can't
happen anymore.
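
Concretely, the fix looks something like this (a sketch only; the lock
and sync function names are illustrative, not necessarily the identifiers
used in 2.3.48):

	static spinlock_t controller_lock = SPIN_LOCK_UNLOCKED;
	static unsigned long cached_irq_mask = ~0UL; /* set bit == enabled,
							matching the trace above */

	static void mask_irq(unsigned int irq)
	{
		unsigned long flags;

		spin_lock_irqsave(&controller_lock, flags);
		/*
		 * The read-modify-write of the cached mask and the flush
		 * to the hardware are now one atomic unit.
		 */
		cached_irq_mask &= ~(1UL << irq);
		sync_irq_mask(cached_irq_mask); /* write new mask to the PIC */
		spin_unlock_irqrestore(&controller_lock, flags);
	}

This way a second CPU can't sync a stale mask between our update of
cached_irq_mask and our write to the hardware.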

>an interprocessor interrupt. This is more heavy-weight than a spinlock,
>but it ought to be much less frequent.

Hug, handler->enable_irq()/handler->disable_irq() are called for _each_
irq.

And the real enable_irq()/disable_irq() calls are no less frequent than
interrupts in the 3c509 case (it may be the only operation performed while
transmitting UDP data, and that's an issue at least for multicasting). I
think using IPIs for synchronization is a loss even if it's a possible
thing to do; enable_irq/disable_irq should be really fast. (And it's not
fully clear to me how an IPI would solve the above race, btw :).

>What we ought to do is initialize the affinity mask with some initial
>static distribution, [..]

Agreed; if there are platforms supporting 8/16-way SMP, that is a very
good idea. It can be done as a per-platform init step called from the
alpha_mv.init_irq callback.
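
Something along these lines (illustrative only; irq_affinity here mirrors
what /proc/irq/*/smp_affinity exposes, and the exact array name may
differ):

	static void __init init_irq_affinity(void)
	{
		int irq;

		/* static round-robin distribution across the online CPUs */
		for (irq = 0; irq < NR_IRQS; irq++)
			irq_affinity[irq] = 1UL << (irq % smp_num_cpus);
	}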

The current default for dp264 should be ok at least for 2-way, and maybe
ok for 4-way too (the maximum number of CPUs supported by dp264).

Andrea



