Re: rt20 patch question

From: Steven Rostedt
Date: Fri May 12 2006 - 02:47:29 EST



On Thu, 11 May 2006, Mark Hounschell wrote:

>
> Here is a detailed list of the RT tasks running with prios, cpu masks
> etc. There are 3 nics. eth1 is the nic being used by the emulation. eth2
> is currently unused.

>
> pid SCHED PRIO CPUM TASK
> --- ---- ---- ---- ----

This being an SMP machine, pids 2 and 3 must be the migration threads.

> 2 FIFO 99 1 (unknown)
> 3 FIFO 99 1 (unknown)

> 4 FIFO 1 1 (unknown)
> 5 FIFO 1 1 (unknown)
> 6 FIFO 1 1 (unknown)
> 7 FIFO 1 1 (unknown)
> 8 FIFO 1 1 (unknown)
> 9 FIFO 1 1 (unknown)
> 10 FIFO 1 1 (unknown)

Do you know what these processes are (12 and 13)?

> 12 FIFO 99 2 (unknown)
> 13 FIFO 99 2 (unknown)
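
If your tool can't resolve those names, something like this (an untested
sketch, not from the original mail) pulls them straight out of /proc:

/*
 * Untested sketch: print the Name: line from /proc/<pid>/status for each
 * pid given on the command line, to put names to the "(unknown)" entries.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	FILE *f;
	int i;

	for (i = 1; i < argc; i++) {
		snprintf(path, sizeof(path), "/proc/%s/status", argv[i]);
		f = fopen(path, "r");
		if (!f) {
			perror(path);
			continue;
		}
		if (fgets(line, sizeof(line), f))
			printf("%s: %s", argv[i], line);	/* "Name:\t<comm>" */
		fclose(f);
	}
	return 0;
}

(Run it as, say, ./whatpid 12 13 -- the name is just an example.)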

[...]

> 39 FIFO acpi 49 [IRQ 9] 1 (unknown)
> 1129 FIFO rtc 48 [IRQ 8] 1 (unknown)
> 1135 FIFO i8042 47 [IRQ 12] 1 (unknown)
> 1145 FIFO floppy 46 [IRQ 6] 1 (unknown)
> 1178 FIFO i8042 45 [IRQ 1] 1 (unknown)
> 1268 FIFO ide0 44 [IRQ 14] 1 (unknown)
> 1313 FIFO ide1 43 [IRQ 15] 1 (unknown)
>

FYI, the above are all of higher priority than those below.

> 1362 FIFO 42 [IRQ 169] 1 (unknown)
> ide2, aic7xxx, aic7xxx, eth1, eth2,
> gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm

Wow! That's a lot on a shared IRQ. Is ide2 actually being used? If one
of these handlers were to spin for a while, then everything below it would
freeze. Them being preempted would also be a problem. Perhaps you want to
raise the priority of this interrupt thread.
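
Something like this (a rough, untested sketch -- the pid is whatever the
IRQ-169 thread happens to be, 1362 in your dump, and priority 50 is only
an example) would bump it from user space:

/*
 * Untested sketch: raise an IRQ thread's priority from user space with
 * sched_setscheduler(2).
 */
#include <sys/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	struct sched_param sp = { .sched_priority = 50 };	/* example prio */
	pid_t pid;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <irq-thread-pid>\n", argv[0]);
		return 1;
	}
	pid = atoi(argv[1]);

	if (sched_setscheduler(pid, SCHED_FIFO, &sp) < 0) {
		perror("sched_setscheduler");
		return 1;
	}
	return 0;
}

chrt -f -p <prio> <pid> (from schedutils) should do the same thing from
the shell.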

>
> 2663 FIFO ??? 41 [IRQ 4] 1 (unknown)
> 2667 FIFO ??? 40 [IRQ 3] 1 (unknown)
> 3420 FIFO 82801BA 39 [IRQ 177] 1 (unknown)
> 5788 FIFO eth0 38 [IRQ 185] 1 (unknown)
> 8036 FIFO rtom 37 [IRQ 193] 2 (unknown)
> 10338 FIFO EMU-CPU 33 2 ./vrsx

[...]

> > What seems to be happening is that the vortex_timer is going off while the
> > interrupt is running. Hence the disable_irq fails and schedules.
> >
> > Perhaps the interrupt thread has been preempted by some high priority task
> > and causes it to lose a connection.
> >
> > Yeah that task output would be helpful to see if you can get it to work.
>
> Ok I have this but it is 2000+ lines. I probably don't want to put it on
> the list. Should I send it to you directly?

Yes please (compress it as well). With so much shared on that IRQ while you
are disabling it, you might see some large timeouts. With hardirqs as
threads, disable_irq() sleeps (that's where you hit the bug), whereas
otherwise it just spins and waits. So it can be a timing issue.
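
For reference, the pattern in question looks roughly like this (a sketch,
not the actual 3c59x code):

/*
 * Sketch of the disable_irq() pattern being discussed (not the actual
 * 3c59x code).  With hardirqs running as threads, disable_irq() can end
 * up sleeping until the threaded handler finishes; in the non-threaded
 * case it just spins waiting for the handler to return.
 */
#include <linux/interrupt.h>
#include <linux/netdevice.h>

static void media_check_timer_sketch(unsigned long data)
{
	struct net_device *dev = (struct net_device *)data;

	disable_irq(dev->irq);	/* sleep (threaded) or spin (non-threaded)
				 * until any running handler is done */

	/* ... check the media / link state with the handler excluded ... */

	enable_irq(dev->irq);
}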

Could you also try running the RT kernel without hardirqs as threads, to
see if it works fine then?

>
> > Also can you show us the output of /proc/interrupts so we know which
> > threads are associated to the network card interrupt, and see where they
> > are.
> >
>
> harley:/home/markh/work/lcrs-linux # cat /proc/interrupts
> CPU0 CPU1
> 0: 450333 0 IO-APIC-edge [........N/ 0] pit
> 1: 4288 0 IO-APIC-edge [........./ 1] i8042
> 8: 2 0 IO-APIC-edge [........./ 0] rtc
> 9: 0 0 IO-APIC-level [........./ 0] acpi
> 12: 66129 0 IO-APIC-edge [........./ 1] i8042
> 14: 3523 0 IO-APIC-edge [........./ 0] ide0
> 15: 65675 0 IO-APIC-edge [........./ 0] ide1
> 169: 219209 0 IO-APIC-level [........./ 0] ide2,
> aic7xxx, aic7xxx, eth1, eth2, gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
> 177: 1821 0 IO-APIC-level [........./ 0] Intel
> 82801BA-ICH2
> 185: 185550 0 IO-APIC-level [........./ 0] eth0
> 193: 0 76740 IO-APIC-level [........./ 0] rtom
> NMI: 0 0
> LOC: 2657906 587751
> ERR: 0
> MIS: 0

I see you are pinning all the IRQs to CPU0 (everything except rtom, which
is on CPU1).
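
(For what it's worth, that pinning is just the /proc/irq/<N>/smp_affinity
mask. A quick untested sketch, with the IRQ number and mask purely as
examples:)

/*
 * Untested sketch: change an IRQ's cpu mask by writing
 * /proc/irq/<irq>/smp_affinity.  IRQ 169 and mask 2 (CPU1) are only
 * examples, not a recommendation.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/irq/169/smp_affinity", "w");

	if (!f) {
		perror("/proc/irq/169/smp_affinity");
		return 1;
	}
	fprintf(f, "2\n");	/* hex cpu mask: bit 1 set -> CPU1 only */
	fclose(f);
	return 0;
}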

>
> The aic7xxx controllers are both connected to external legacy scsi
> racks. eth1, eth2, and the aix7xxx cards are in an SBS pci expansion
> chassis. The 3 gpiohsd and the 1 eprm cards are also in the expansion
> rack but are not being used at all in this.

So all but the 3 gpiohsd and the eprm are being used? Still, that seems
like a lot. But anyway, send me the compressed task dump, and I'll take a
look. Maybe it will shed some light.

-- Steve
