Re: 2.1.111: IDE DMA disabled?

Linus Torvalds (torvalds@transmeta.com)
Sun, 26 Jul 1998 13:30:37 -0700 (PDT)


On Sun, 26 Jul 1998, Alan Cox wrote:
>
> What I'm questioning is whether you are fixing a single person's dodgy
> machine, a real problem in the IDE layer, or a real hardware problem.

To me, the end result is pretty much the same. I wouldn't be in the least
surprised if it is a hardware problem, and I've tried to be careful not to
blame driver bugs in my emails by saying so.

In fact, my personal belief is that it _is_ a hardware problem, with too
tight timings making it appear. And I would also believe Gadi and Mark
almost implicitly if they tell me that the driver is "within spec".

My personal suspicion is that we're _just_ aggressive enough to make a few
machines unhappy. Very obviously the IDE driver works beautifully on 99% of
all machines, which is why I don't think there is anything major wrong
with the thing ("major wrong" would be to start the DMA on a write, and
release the request before the full write has been done, for example).

It's just that I personally think that even if on just 1% of machines we
have occasional problems because 1% of all machines are sub-spec, then we
shouldn't default to being as aggressive as we are.

(And I actually suspect that in the PC world, the "1%" is closer to 10% or
something, it's just that a few machines are _so_ much out of spec that it
then shows up).

And THAT is why I think we should default to something slower and safer,
and then continue to look into the problem (because we'd still like to
know why it happens, just in case..)

For example, what makes me suspicious is that for the person I was talking
to, the problems showed up between 102 and 103. I've now looked at the 102
patch too, and there is absolutely nothing there related to filesystems
that looks even slightly suspicious.

The IDE driver itself didn't change very much, but what _did_ change was:
 - the Cyrix code was removed. That code would accidentally overwrite the
   DMA registers on BX motherboards, and it is just possible that by
   mistake it made DMA work better (slowing down the timings or
   something)
 - the interrupt code path for SMP was sped up and simplified (and
   finally fixed - this was the first release that got the pending
   interrupts right)

Essentially, everything I found looked more like timing changes, not like
some fundamental bug. Which is why I'm suspicious of timing and hardware
issues.

Yes, timing changes can make a latent fundamental bug show up, no question
about that. But at least to me it just looks more like a "simple" hardware
timing thing than anything else right now.

Linus
