Re: 2.1.111: IDE DMA disabled...BLAH...BLAH...

Kristofer T. Karas
Wed, 29 Jul 1998 14:47:47 -0400

Mark Lord writes:
>4. If for some reason PIO works, but DMA produces corruption...
> then perhaps the only possible way this might occur would be due
> to the IDE DMA controller performing burst PCI accesses during the
> data transfer. Thus, the error would be on the PCI bus...

This is not merely possible but quite likely, as weaknesses in hardware
(rather than outright failures) will often allow a piece of circuitry
to pass the GenRad and other QA tests that are normally performed on
the production line. I had a Supermicro P6DLE motherboard that locked
up solidly after every hour or so of moderate IDE activity, with some
corruption showing up under fsck, as people have reported here. Upping
the ante by adding an AGP graphics board (which is a bus-master device)
made the system so unreliable that it failed to finish booting
windoze95 in 2 out of every 3 attempts, though Linux lasted longer before

Not all hardware bugs are caught by the EEs, either. Years ago, I was
given a VME-interface board to debug, whose original designer had tried
to fix the bugs but lost interest in the subtler ones. It worked
flawlessly in every application except that of a single customer,
whose program generated precisely the timing necessary to make it
break. Now that timing on PC motherboards is fast enough that the
distinction between traces and transmission lines is blurred, ringing
and other abnormalities are likely to occur, just as they did in the
case of my P6DLE.

>Quoted from linux-2.1.111:
> "# Either the DMA code is buggy or the DMA hardware is unreliable.
>"We" do not have any such reports yet.

What you and Linus are arguing about are really two similar but not
identical issues. You are saying the DMA mechanism should work, as
the timing is the same as on PIO, and the UDMA is superior due to
CRC16; fine, that's quite true. Linus is taking a higher-level view:
whether it's a design flaw with the intermediate PCI bus, or a timing
race in SMP kernels, or a weakness in a particular user's hardware
(and it's not just the bad SIMMs/DIMMs either), the end result is the
same - corrupted data. And it's the end-result that matters to the
user, particularly the user who is not hardware-literate enough to
effect a cure.
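
To make the CRC16 point concrete, here is a minimal sketch of how a
16-bit CRC lets the receiving end notice a flipped bit. It uses the
CCITT polynomial x^16 + x^12 + x^5 + 1; the 0xFFFF seed, the byte-wise
processing and the function names are assumptions chosen for
illustration, not the exact procedure an Ultra DMA controller follows.
Note that a check like this only covers the leg over which it is
computed (the IDE cable), so corruption introduced later on the PCI
bus would pass unnoticed.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 over a byte buffer, CCITT polynomial 0x1021.
 * The 0xFFFF seed is an illustrative assumption, not a value
 * taken from the ATA specification. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        /* Fold the next byte into the high half of the register. */
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical helper: returns 1 if the CRC of what was received
 * matches the CRC of what was sent, 0 otherwise. */
int transfer_ok(const uint8_t *sent, const uint8_t *received, size_t len)
{
    return crc16_ccitt(sent, len) == crc16_ccitt(received, len);
}
```

Any single-bit error in the buffer changes the CRC, so transfer_ok()
returns 0 and the transfer can be retried instead of silently landing
on disk.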

It might be worth noting that, in windoze95/98, the settings tab for
each hard drive has an enable-DMA checkbox that (at least in my setup,
with two high-capacity recent-vintage SMART/DMA capable drives from
different manufacturers) is unchecked by default. Bad hardware may be
more widespread than we assume. :-/

>Too bad, you used to be a nice guy.

It's too bad this has to degenerate into a pissing match; the bazaar
model of programming is supposed to be decorated with lots of kudos
and pats on the back for volunteered programming, which I note are
conspicuous in their absence from this list. But that's another
story... For our collective purposes, why not simply agree to
disagree? You evidently can't compensate automatically for all the
anomalies with just one paradigm, so make a config option ("enable DMA
upon boot-up if supported") that the user can set. Leave it off by
default, as windoze seems to do in my case. Savvy users can enable it
if they believe their system to be capable. This reduces the risk you
mention of enabling DMA when it is initially off, though users can
still do so with hdparm while deciding. And Linus seems likely to
accept it.
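
A rough sketch of the policy I have in mind, in plain C rather than
real driver code. Everything here is hypothetical: the
CONFIG_IDE_DMA_ON_BOOT name, the drive_state structure and both
functions are invented for illustration and are not taken from the
2.1.111 IDE driver.

```c
#include <stdbool.h>

/* Hypothetical build-time switch (name invented for this sketch).
 * Defaults to 0, i.e. DMA stays off unless the user asks for it. */
#ifndef CONFIG_IDE_DMA_ON_BOOT
#define CONFIG_IDE_DMA_ON_BOOT 0
#endif

/* Hypothetical per-drive state, not the kernel's real structure. */
struct drive_state {
    bool dma_capable;  /* drive and chipset both report DMA support */
    bool using_dma;    /* current transfer mode: true = DMA, false = PIO */
};

/* Boot-time policy: use DMA only if the option is on AND the
 * hardware claims to support it; otherwise stay with PIO. */
void drive_init_dma(struct drive_state *d, bool enable_on_boot)
{
    d->using_dma = enable_on_boot && d->dma_capable;
}

/* Runtime override, in the spirit of what hdparm does after boot.
 * Returns 0 on success, -1 if DMA is requested on incapable hardware. */
int drive_set_dma(struct drive_state *d, bool on)
{
    if (on && !d->dma_capable)
        return -1;
    d->using_dma = on;
    return 0;
}
```

With the option left at its default, a capable drive still comes up in
PIO, and the user flips it later (for example with "hdparm -d1
/dev/hda", the device name being just an example) once satisfied that
the hardware is sound.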


Kristofer Karas                           *    Senior systems engineer/SysAdmin
AMA/CCS DoD RF900RR HawkGT !car           * BI Deaconess Medical Center, Boston
"Build a system that even a fool can use, *
 and only a fool will want to use it."    *  Will design LISP machines for food
