[patch] IDE problems on SMP, fixed?

MOLNAR Ingo (mingo@chiara.csoma.elte.hu)
Wed, 29 Jul 1998 15:35:40 +0200 (CEST)

> under very heavy (artificial) load, i can reliably make my system fail, it
> repeatedly fails to release a very important irq spinlock, causing a hard
> hang. The case where this happens is _always_ when an (arbitrary)
> interrupt hits us after ide.c's ide__sti() in ide.c:start_request().

in an interesting turn of events, my system is now very stable with the
attached patch applied.

could any IDE-expert (Gadi, Mark?) verify why this small fix makes a
difference on this SMP board? That sti() has been in ide_do_request() for
ages. The fix is arbitrary. i have a RAM module that's known to be bad
(although i never have any problems with it on a 66MHz system bus, except
this single lockup). I've traced down the lockup and have 'fixed' the case
by moving the sti(), but no other thinking was behind this change ...

i do not really understand though why this sti() is considered safe on
SMP: we have dropped the io_request_lock already, and we have dropped
the hwif->lock too, so we are just asking for trouble on another CPU. is
my thinking correct that at this point another CPU could add a request to
this hwif? Or is some other lock (hwif->busy?) handling this case already?
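
(to make the question concrete, here is a minimal user-space sketch of the
kind of protection a "busy" flag could give once the lock has been dropped.
This is NOT code from ide.c: struct model_hwif and the model_*() functions
are hypothetical names, a pthread mutex stands in for the spinlock, and
whether hwif->busy really plays this role in the driver is exactly the open
question above.)

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an IDE interface; not the kernel's ide_hwif_t. */
struct model_hwif {
    pthread_mutex_t lock;  /* plays the role of the spinlock */
    bool busy;             /* set while some CPU owns the interface */
    int queued;            /* requests accepted but not yet finished */
    int started;           /* requests handed to the "hardware" */
};

void model_init(struct model_hwif *h)
{
    pthread_mutex_init(&h->lock, NULL);
    h->busy = false;
    h->queued = 0;
    h->started = 0;
}

/*
 * Queue one request. Returns true if the caller became the owner of the
 * interface and must start the request itself; returns false if another
 * CPU already owns it, in which case the request just stays queued.
 */
bool model_add_request(struct model_hwif *h)
{
    bool owner = false;

    pthread_mutex_lock(&h->lock);
    h->queued++;
    if (!h->busy) {
        h->busy = true;
        owner = true;
    }
    pthread_mutex_unlock(&h->lock);
    return owner;
}

/*
 * Runs with the lock dropped (the window discussed above). Only the
 * owner may get here, so the busy flag must still be set.
 */
void model_start_request(struct model_hwif *h)
{
    assert(h->busy);
    h->started++;
}

/*
 * Finish the current request. Returns true if more requests are queued,
 * in which case the caller keeps ownership and should start the next one.
 */
bool model_finish(struct model_hwif *h)
{
    bool more;

    pthread_mutex_lock(&h->lock);
    h->queued--;
    more = h->queued > 0;
    if (!more)
        h->busy = false;
    pthread_mutex_unlock(&h->lock);
    return more;
}
```

(in this model a second model_add_request() arriving inside the unlocked
window returns false, so the second CPU never touches the "hardware"; the
owner picks the request up when model_finish() reports more work. If the
real driver has no equivalent of this, the race i describe would be open.)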

to be clear, this is a fairly standard configuration: good ol'
Quantum FB 1280ATA, PIIX4. The lockup was independent of _any_ BIOS
setting (Passive Release, etc.). Maybe this is an IDE problem after all?

(while writing this email, the stress-test was still running. This is with
a DMA-enabled kernel, and the _very same_ hardware and software
configuration caused a lockup after 20 seconds without the patch
installed. With the patch installed i got no lockup after 30 minutes of
uptime. It still might be a hardware problem, although i have 3 fans and no
other problem has occurred on this system so far, only this IDE lockup)

-- mingo

--- linux/drivers/block/ide.c.orig Wed Jul 29 15:09:18 1998
+++ linux/drivers/block/ide.c Wed Jul 29 15:09:44 1998
@@ -961,7 +961,6 @@
unsigned int minor = MINOR(rq->rq_dev), unit = minor >> PARTN_BITS;
ide_hwif_t *hwif = HWIF(drive);

- ide__sti(); /* local CPU only */
#ifdef DEBUG
printk("%s: start_request: current=0x%08lx\n", hwif->name, (unsigned long) rq);
@@ -988,6 +987,7 @@
block = 1; /* redirect MBR access to EZ-Drive partn table */
+ ide__sti(); /* local CPU only */
while ((read_timer() - hwif->last_time) < DISK_RECOVERY_TIME);
SELECT_DRIVE(hwif, drive);
