Re: no DRQ after issuing WRITE was Re: 2.4.23-uv3 patch set released
From: Daniel Tram Lux
Date: Sat Jan 03 2004 - 06:24:47 EST
Rob Love wrote:
> On Tue, 2003-12-30 at 17:54, Linus Torvalds wrote:
>> Interrupts are _not_ disabled here, very much on purpose. If they were,
>> then "jiffies" wouldn't update, and the timeouts wouldn't work.
>> This is what that _stupid_ "local_irq_set()" function does: it saves the
>> old irq masking state, and then it enables it.
>> The whole concept doesn't make any sense. If you enable interrupts, there
>> is little point in saving the callers irq mask, since it already got
>> deflated.
> Ah, OK. local_irq_set() is worthless, then.
> Curious to see the results of upping the timeout.
> Rob Love
I tried raising the timeout as a first fix. It decreased the frequency
of the error, but it did not get rid of it.
I used:
#define WAIT_DRQ (10*HZ/100) /* 100msec - spec allows up to 20ms */
instead of:
#define WAIT_DRQ (5*HZ/100) /* 50msec - spec allows up to 20ms */
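
For reference, the two values can be worked out in ticks and
milliseconds. This is only a small sketch: it assumes HZ is 100 (the
usual value on i386 in 2.4 kernels), and the names WAIT_DRQ_OLD,
WAIT_DRQ_NEW and ticks_to_ms() are made up for the illustration and do
not exist in the kernel.

```c
/* Assumption: HZ is 100, so one jiffy is 10 ms. */
#define HZ 100

/* Hypothetical names for the two WAIT_DRQ values quoted above. */
#define WAIT_DRQ_OLD (5*HZ/100)   /* original value */
#define WAIT_DRQ_NEW (10*HZ/100)  /* raised value */

/* Convert a timeout expressed in jiffies to milliseconds. */
static unsigned int ticks_to_ms(unsigned int ticks)
{
    return ticks * 1000 / HZ;
}
```

Under that assumption the original timeout is 5 jiffies (50 ms) and the
raised one is 10 jiffies (100 ms), both well above the 20 ms the spec
allows the device.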
The device the error occurs with is a CF card. The error also occurs
much more frequently in 2.4.23 than in 2.4.20 (but it can be provoked
in 2.4.20). Neither kernel uses the preemption patch and both are from
kernel.org. The platform is based on an AMD Elan processor, which is a
486-compatible processor running at 133 MHz. The IDE subsystem does not
use any extra drivers and is not a PCI IDE chipset.
The test I use to provoke the error is moving a directory tree from hdc
(a normal hard disk) to hda (the CF card), removing the directory on
hdc, copying it back from hda to hdc, removing it from hda, and then
starting all over.
While doing this the machine is running a flood ping and is being flood
pinged itself, and there is traffic on three serial ports (RS485).
The way the code works right now, there is no way to tell how much time
has passed since the status register was last read, because an
interrupt may have been serviced in between. So when I made the patch I
saw two possibilities. One is to disable interrupts while reading the
status and checking the timeout, and to enable them again afterwards.
The other is to make one extra status check after the timeout has
expired, because that is cheaper than returning, failing and then
resetting the drive.
After I applied my patch (using the 5*HZ/100 timeout) my test ran for a
full weekend without giving the timeout error. Before the patch the
error would occur about every 3 minutes with 2.4.23 and every couple of
hours with 2.4.20. (I didn't try to patch 2.4.20.)
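
The race and the "one extra check" idea can be illustrated with a small
user-space simulation. This is a sketch under stated assumptions, not
the actual patch and not the kernel's code: struct sim, read_status(),
elapse() and wait_drq() are all hypothetical names, and the tick counter
and device are simulated. Only the DRQ bit value (0x08) is taken from
the ATA status register layout.

```c
#include <stdbool.h>

#define DRQ_STAT 0x08  /* DRQ bit of the ATA status register */

/* Hypothetical simulation state; none of this is kernel code. */
struct sim {
    unsigned long now;        /* simulated jiffies */
    unsigned long ready_at;   /* tick at which the device raises DRQ */
    unsigned long stall_at;   /* tick at which interrupts delay the poller */
    unsigned long stall_len;  /* length of that delay in ticks */
    bool stalled;             /* true once the delay has happened */
};

/* Simulated status register read. */
static unsigned char read_status(const struct sim *s)
{
    return s->now >= s->ready_at ? DRQ_STAT : 0x00;
}

/*
 * Time passing between the status read and the timeout check.
 * Normally one tick, but once (at stall_at) interrupt handling
 * keeps the poller away for stall_len ticks.
 */
static void elapse(struct sim *s)
{
    if (!s->stalled && s->now >= s->stall_at) {
        s->now += s->stall_len;
        s->stalled = true;
    } else {
        s->now += 1;
    }
}

/*
 * Poll for DRQ. Returns 0 when DRQ was seen, -1 on timeout.
 * With recheck set, the status is read one more time after the
 * deadline has passed, so a stale "not ready" reading taken before
 * a long interruption is not reported as a device timeout.
 */
static int wait_drq(struct sim *s, unsigned long timeout, bool recheck)
{
    unsigned long deadline = s->now + timeout;

    while (!(read_status(s) & DRQ_STAT)) {
        elapse(s);  /* interrupts may be serviced here */
        if (s->now > deadline) {
            if (recheck && (read_status(s) & DRQ_STAT))
                return 0;
            return -1;
        }
    }
    return 0;
}
```

With a device that becomes ready at tick 2, a 10-tick stall starting at
tick 1 and a 5-tick timeout, wait_drq() without the recheck returns -1
even though the device was ready in time. With the recheck it returns
0. A device that never raises DRQ still times out either way.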
The IDE standard gives a timeout of 20 ms for the busy wait, which
should not be exceeded, and the documentation from SanDisk (the CF card
is from SanDisk) claims to conform to this.
If anybody has any other suggestions or tests, I can try them out on
Monday when I am back at work.
Regards
Daniel Tram Lux