Re: [SMP patch] io-apic-patch-2.1.101-F (new-one)

Edgar Toernig (froese@gmx.de)
Thu, 14 May 1998 04:25:06 +0200


Results with 101-I

MOLNAR Ingo wrote:
>
> On Thu, 14 May 1998, Edgar Toernig wrote:
>
> > Sorry, your patch doesn't work. Some comments:
> >
> > 1. The thing with the spurious interrupt vector is good. On my
> > motherboard (ASUS dual P2) the vector of the second CPU is set
> > to 0x0f! (boot CPU is ok, 0xff)
>
> i don't think it makes any difference, i just added it because i felt
> uneasy about us using the BIOS-provided default. We should not see any
> APIC-spurious IRQs, should we?

We shouldn't lose interrupts, either ;-)

> > 3. I made the following test with a Realtek 8029 ethernet card.
> > Two simultaneous ping -f to another machine and the usual
> > message arrives (tx timed out) and no more interrupts are
> > delivered to the card. The IO-APIC register shows that the
> > /delivery-status/ and the /remote-irr/ bits are stuck at 1 for
> > this interrupt.
>
> does this happen with io-apic-patch-2.1.101-I? It has so far survived ~30
> million NE2000 IRQs on my box.

Well, on my box it got stuck within a second...
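
For anyone who wants to check the same bits on their own box, here is a
small sketch (Python, not kernel code) that decodes the low 32 bits of an
IO-APIC redirection table entry. The bit positions follow the Intel
82093AA layout; the function names, the example value 0xF059, and the
idea of calling an entry "stuck" when both bits are set on a
level-triggered pin are my own assumptions for illustration.

```python
# Toy decoder for the low 32 bits of an IO-APIC redirection table entry.
# Bit positions follow the Intel 82093AA layout; names are hypothetical.

def decode_redir_low(word):
    """Split the low word of a redirection entry into named fields."""
    return {
        "vector": word & 0xFF,                           # bits 0-7
        "delivery_mode": (word >> 8) & 0x7,              # bits 8-10
        "dest_mode_logical": bool(word & (1 << 11)),     # bit 11
        "delivery_status_pending": bool(word & (1 << 12)),  # bit 12 (read-only)
        "polarity_active_low": bool(word & (1 << 13)),   # bit 13
        "remote_irr": bool(word & (1 << 14)),            # bit 14 (read-only)
        "level_triggered": bool(word & (1 << 15)),       # bit 15
        "masked": bool(word & (1 << 16)),                # bit 16
    }

def looks_stuck(word):
    """Assumed heuristic: level-triggered pin with both status bits set."""
    e = decode_redir_low(word)
    return (e["level_triggered"]
            and e["delivery_status_pending"]
            and e["remote_irr"])

if __name__ == "__main__":
    # Hypothetical register value: vector 0x59, level, active low,
    # delivery-status and remote-irr both set.
    print(decode_redir_low(0x0000F059))
    print(looks_stuck(0x0000F059))
```

Both status bits are read-only, so software cannot clear them directly;
remote-irr is only cleared when an EOI for the vector reaches the IO-APIC.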

> > 4. Every interrupt is delivered twice! Setting ei_debug in 8390.c
> > to 4 shows that after _every_ normal interrupt a second one is
> > delivered with the interrupt status register set to 0 (which
> > means: no interrupt condition present).
> > Digging deeper into this, I noticed that the interrupt counters
> > of both CPUs are incremented, but the interrupt routine is
> > called by the same CPU. That would mean that a second interrupt
> > is raised, while the first is still running and is later delivered
> > by a self_IPI.
>
> hm. This could be the pin<->idx bug, disable_IO_APIC_irq() has a chance to
> disable the wrong pin, thus resulting in IRQ storms. Does this still
> happen with io-apic-patch-2.1.101-I? It doesn't happen here, i see exactly as
> many IRQs as should happen.

Still, every interrupt (only NE2000 checked) comes twice, the second
with ISR=0.

> the point is: we do not want to know about all events. When we are running
> in a driver, we want to disable all events from that source (in whatever
> way).

OK!

> the other problem with 'late ACKs' is that an unacked vector blocks _all_
> smaller vectors. This might work with BSDs, but Linux can mix IRQs freely,
> and also it generates unfair problems for RT-Linux.

OK, too ;-)
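
The blocking effect of a late ACK can be sketched with a toy model of the
local APIC priority check (Python, not kernel code). It assumes the usual
rule that a pending vector is only dispatched if its priority class
(vector >> 4) is higher than that of the highest in-service vector and of
the task priority. The function names and the vector numbers are
hypothetical.

```python
# Toy model of local APIC dispatch priority. An in-service (un-EOI'd)
# vector blocks every pending vector of the same or a lower priority class.

def priority_class(vector):
    """Priority class is the upper nibble of the 8-bit vector."""
    return vector >> 4

def deliverable(pending_vector, in_service, task_priority=0):
    """Return True if pending_vector could be dispatched right now.

    in_service: list of vectors that were dispatched but not yet EOI'd.
    task_priority: 8-bit TPR value (only its upper nibble matters here).
    """
    blocking = max([priority_class(v) for v in in_service]
                   + [priority_class(task_priority)])
    return priority_class(pending_vector) > blocking

if __name__ == "__main__":
    # Hypothetical vectors: 0x61 is being handled and not yet acked.
    print(deliverable(0x51, [0x61]))  # lower class: blocked
    print(deliverable(0x71, [0x61]))  # higher class: gets through
    print(deliverable(0x51, []))      # nothing in service: gets through
```

With an early ACK the in-service list is empty again while the handler
runs, which is why masking the source becomes necessary instead.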

> > When an interrupt arrives it is masked in the IO-APIC. Then it is
> > acked. Now, the normal interrupt processing is done. At the end,
> > the interrupt is unmasked (if not disabled). No self_IPI!
> > As soon as the next sti/iret is executed, the next interrupt will
> > be raised by the CPU.
>
> the problem with this is that we lose IRQs which arrive while they are
> masked in the IO-APIC. So if a driver (say eth0) handles one specific
> frame, and a status flag says there are no more frames, and the driver
> clears the IRQ, and is on its way back to the higher-level IRQ code,
> _and_ we get another frame in this window, then we will lose this IRQ.

1. Why should we lose it? It's level triggered! As soon as the APIC
is unmasked/reenabled the IRQ comes through.

2. We are/I'm *not* losing interrupts. The interrupts stick in the
IO-APIC (see delivery-status and remote-irr flags). They wait to
be acked. Losing interrupts would slow down the 8390 driver
(it resets the card and discards the packet when it detects a lost
int) but wouldn't kill it.

> Enabling a pin for a raised level-triggered IRQ does _not_ generate an
> IRQ as far as i've checked. Yes it sucks.

Unmasking an already active level-triggered pin does of course generate
an IRQ. Your statement is right for edge-triggered interrupts; the
IO-APIC wants to /see/ the edge while unmasked but for level-
triggered interrupts the level alone is enough (else they would be
edge-triggered ints, too, right? :-)
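
To make the argument concrete, here is a toy model (Python, not kernel
code) of one IO-APIC pin going through the mask / ack / handle / unmask
sequence, with a new frame arriving while the pin is masked. It encodes
my reading of the hardware as described above, not a verified
description of any chip: a level-triggered pin fires whenever it is
unmasked, the line is asserted and remote-irr is clear, while an
edge-triggered pin only fires on a rising edge seen while unmasked. All
class and function names are hypothetical.

```python
# Toy model of one IO-APIC input pin. Assumption: a masked pin ignores
# edges completely, while a level-triggered pin is re-evaluated on unmask.

class Pin:
    def __init__(self, level_triggered):
        self.level_triggered = level_triggered
        self.masked = False
        self.line = False        # current state of the interrupt line
        self.remote_irr = False  # set on delivery, cleared by EOI (level only)
        self.delivered = 0       # interrupts sent to the CPU so far

    def _evaluate(self, edge=False):
        if self.masked:
            return
        if self.level_triggered:
            if self.line and not self.remote_irr:
                self.remote_irr = True
                self.delivered += 1
        elif edge:
            self.delivered += 1

    def set_line(self, asserted):
        edge = asserted and not self.line  # rising edge?
        self.line = asserted
        self._evaluate(edge)

    def mask(self):
        self.masked = True

    def unmask(self):
        self.masked = False
        self._evaluate(edge=False)  # no new edge happens on unmask

    def eoi(self):
        self.remote_irr = False
        self._evaluate()

def race_window(level_triggered):
    """Run the mask/ack/handle/unmask sequence with a late second frame."""
    pin = Pin(level_triggered)
    pin.set_line(True)   # first frame raises the IRQ
    pin.mask()           # mask in the IO-APIC
    pin.eoi()            # ack early
    pin.set_line(False)  # driver services the card, line drops
    pin.set_line(True)   # second frame arrives while still masked
    pin.unmask()         # end of interrupt processing
    return pin.delivered

if __name__ == "__main__":
    print("level:", race_window(True))
    print("edge: ", race_window(False))
```

In this model the level-triggered pin delivers the second interrupt on
unmask and the edge-triggered pin does not, which is the whole
disagreement in a nutshell.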

I don't know why you don't get these faults. My hardware:
ASUS dual-P2 mb, 2xP2 266, onboard aic7880, 128mb sdram,
3xRealtek 8029, 2xIBM dcas 4gb raid.
The other machine answering the pings is a K6-200 with another PCI-NE2k.

The only working patch up to now is the one from me without sending
the self_IPIs.

4am -- time for dinner :-)

Ciao, ET

PS: FYI, 2.1.100 with NOT_BROKEN set to 1 hangs my aic-driver, too, not
only the BT-driver.
