Thanks, Alan for your reply to this,...just a couple clarifications
On Thu, 27 Sep 2001, Alan Cox wrote:
> > 2.4.7-ac2 patch level and not having any lock-ups with ultraDMA.
> > My question--there was a new file "serverworks.c" inserted in the
> > 2.4.6 ac patches. Does anyone know if that made it into the kernel
> > and supposedly fixed the problem? (My guess is that it has *not*
> Short answer is "it should"
OK--"it should" fix the bug--my question is, how far has the
serverworks.c which was in the 2.4.6-ac series now propagated
into Linus' tree (and eventually into the RedHat tree?) Also,
are the later 2.4.7 and higher -ac patches significantly
different from that which was found in the 2.4.6-ac patches?
> Long answer "I have been chasing a specific problem with OSB4, seagate
> drives and UDMA corruption. We can reliably reproduce it and see it on one
> set of machines. Serverworks cannot reproduce it elsewhere"
Too bad we didn't know this a month ago... we had 136 of these machines
and were seeing it all the time, could have given you a perfect test
bed...we found the quickest way
to reproduce it was just to do ls -R /usr | grep Input/output error
and managed to make 80 of the 136 machines misbehave within a couple
of days time. As a result of that the vendor swapped all the drives
to western digital and left us with a bug that has only happened on
10 machines over the course of 2 months--where the DMA timeouts
will hang the machine but not corrupt the data.
> So unless your box when running current -ac starts spewing messages about
> DMA completions seeming broken - it should work. I only mention this
> because you write:
> > We have seen quite a difference on systems that are otherwise
> > the same (Supermicro 370DLE w/serverworks OSB4 LE chipset) by swapping
> > different models of hard disk drives. With some types of drive
> > (Seagate) we
> > observe massive corruption of the file system but nothing reported
> > in /var/log/messages or on the console. Currently we see hda
> > timeouts (but only on about 10 systems over the course of 2 months)
> > which hang the machine but after a reboot things are fine (Western
> > Digital).
> I would thus be very interested if the current -ac "hardware just did
> something impossibly stupid" trap is hit.
Just tell me what you think your latest and greatest patch is that
you would want to see tested and I'll be glad to give it a test on this
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sun Sep 30 2001 - 21:01:02 EST