Common IRQ pitfall results in lockup.

Rogier Wolff (R.E.Wolff@BitWizard.nl)
Fri, 5 Nov 1999 13:23:47 +0100 (MET)


Hi Alan, Linus,

Could you include the following patch that explains a common (see the
list of affected drivers in the patch) IRQ pitfall. It also provides a
platform for driver writers to "publish" pitfalls they encountered
which may be important for others.

Todays pitfall results in that a complete lockup can occur if a PCI
interrupt is shared between two devices, and one of them decides to
unload or just give up the interrupt.

Alan, I'd like to discuss with you and Ted what we'll do about the 2.2
serial driver: Leave as is (please, no!), upgrade to Teds newest
driver with PCI serial card support (with fix), or just a minimal fix.

Although the current 2.2. driver doesn't automatically detect any PCI
serial cards, you can use setserial to tell it that a serial port is
present. This can result in the serial driver triggering the bug (in
combination with one of the other drivers that has the problem).

A client reports that this problem occurred (and that my fix fixed
it), when his SCSI card was sharing the interrupt. The cards driver
however isn't listed as "misbehaving", so I'm not so sure that the
problem ONLY occurs when both drivers are "misbehaving". If that's the
case, the bug isn't (in genetic terms) "recessive" (i.e. happens only
when BOTH drivers misbehave) as my analysis would make you believe,
but "dominant" (i.e. can happen when either driver misbehaves).

Regards,

Roger Wolff.

-------------------------------------------------------------------------

--- linux-2.2.13.clean/Documentation/driver_writers_readme Fri Nov 5 11:20:55 1999
+++ linux/Documentation/driver_writers_readme Fri Nov 5 11:20:16 1999
@@ -0,0 +1,93 @@
+
+What's this file?
+-----------------
+
+This file tries to explain some of the pitfalls that driver writers
+may encounter. All you driver writers, feel free to contribute by
+adding stuff that you encountered, and that you think might be
+interesting to prevent others from making the same mistake again.
+
+I think it's best that this is not officially "maintained" by me, as
+only additions/corrections are acceptable, and it would be easiest to
+just send Linus patches.
+
+
+The IRQ pitfall.
+----------------
+
+Older Linux kernels called the interrupt handler with the IRQ
+number. The driver then contained an IRQ number -> devinfo table.
+
+The newer kernels provide a void* argument to the function where you
+can put the devinfo pointer, so that interrupts can be shared and
+things like that. Passing "NULL" there was done to many drivers as a
+compatibility measure. This is however a VERY temporary (*) measure
+that should be eradicated ASAP:
+
+Interrupts can be shared, the dev pointer is used as an "identifier".
+Suppose two drivers with the old "NULL" are using the same (PCI, level
+triggered) interrupt. First driver allocates interrupt, second driver
+allocates interrupt. All is fine now. Interrupt comes, both interrupt
+routines get called, handlers tickle their device to make the
+interrupt stop and control is returned to whatever was interrupted.
+
+Then the second driver is unloaded/doesn't need the interrupt
+anymore. It then tries to free its own interrupt routine, but the
+kernel frees the first matching interrupt on the IRQ chain for that
+interrupt. This happens to be the first drivers interrupt routine. Now
+when some activity happens on the first device, it will interrupt, but
+the second driver's interrupt routine, still on the chain will get
+called. It cannot cause the first device to stop interrupting, so an
+interrupt storm occurs. As soon as interrupts are re-enabled, the CPU
+is interrupted again. Clients report this as a complete lockup.
+
+At the moment, the following drivers request an interrupt with NULL as
+the last argument:
+
+net: hp.c e2100.c
+hamradio: scc.c yam.c
+block: hd.c xd.c amiflop.c
+char: busmouse.c msbusmouse.c tpqic02.c wdt.c stallion.c rtc.c
+ riscom8.c h8.c qmouse.c amikeyb.c dn_keyb.c isicom.c
+scsi: aha1542.c NCR53c406a.c ultrastor.c pas16.c ncr5380.c aha1740.c
+ t128.c 53c7,8xx.c in2000.c g_NCR5380.c eata_pio.c wd7000.c
+ qlogicfas.c tmscim.c psi240i.c atp870u.c ini9100u.c inia100.c
+ sym53c416.c
+sound: maui.c uart6850.c
+cdrom: cdu31a.c cm206 mcd mcdx
+isdn: act2000_isa
+ap1000: ddv
+macintosh:mediabay
+nubus: nubus?
+acorn: cumana_1 ecoscsi oak mfmhd keyb_ps2
+usb: mouse?
+
+Some of the listed drivers may only support ISA devices (act2000_isa
+comes to mind). ISA normally only supports edge triggered interrupts.
+I recommend the policy: "Don't count on it". (e.g. PNP devices have
+a setup option to use level or edge triggered interrupts. I don't know
+wether that's for real... )
+
+The best way to correct this is to pass along the "dev" structure that
+describes your device. If (like the serial driver) you don't request
+the IRQ multiple times for multiple devices hanging on one interrupt,
+I recommend that you pass IRQ_table + IRQnum. This is an address
+inside your drivers "address space", and should be very unique. If all
+else fails (e.g. your driver may support just one of your devices) you
+could pass in the address of your interrupt routine or something like
+that.
+
+(*) Ok, it's been going on for several years now, but lets treat it
+with some urgency, before things really start blowing up in our faces:
+with more and more devices in computers people are going to see shared
+interrupts more and more...
+
+--------------------------------------------------------------------------
+
+Created on november 5th 1999 by R.E.Wolff@BitWizard.
+Added: What's this file? -- REW
+Added: IRQ request hints. -- REW
+
+Changed: ....
+Added: .....
+

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
 "I didn't say it was your fault. I said I was going to blame it on you."

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/