Re: IDE + SMP Lockup (no OOPS) in 2.2.12, 2.2.10

Andre Hedrick (andre@suse.com)
Thu, 30 Sep 1999 03:06:46 -0700 (PDT)


Yeah this one is close to being lick.
The sequential access is better.

Alan, that little proggy you sent me eventually segfaults and locks the
cdrom. I assume a wrap around issue in the user-space code.

Linux-IDE:/src # smp-ide
Segmentation faults a DATA cd

this is after a half of a short int of banging.

Here is a three way access:

./hdparm -t /dev/hda & ./hdparm -t /dev/hdc & ./hdparm -t /dev/hde &
[1] 218
[2] 219
[3] 220
Timing buffered disk reads: 64 MB in 8.35 seconds = 7.66 MB/sec
Timing buffered disk reads: 64 MB in 8.45 seconds = 7.57 MB/sec
Timing buffered disk reads: 64 MB in 26.53 seconds = 2.41 MB/sec

./hdparm -i /dev/hda /dev/hdc /dev/hde

/dev/hda:

Model=Maxtor 90845D4, FwRev=GAS54112, SerialNo=A40APN8C
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=29
BuffType=3(DualPortCache), BuffSize=256kB, MaxMultSect=16, MultSect=off
DblWordIO=no, maxPIO=2(fast), DMA=yes, maxDMA=2(fast)
CHS=1027/255/63 LBA Native, LBAsects=16514064
tDMA={min:120,rec:120}, DMA modes: mword0 mword1 mword2
IORDY=on/off, tPIO={min:120,w/IORDY:120}, PIO modes: mode3 mode4
UDMA modes: mode0 mode1 *mode2
Drive Supports : ATA/ATAPI-4 T13 1153D revision 17 : ATA-1 ATA-2 ATA-3 ATA-4

/dev/hdc:

Model=Maxtor 90845D4, FwRev=GAS54112, SerialNo=A40EJEPC
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=29
BuffType=3(DualPortCache), BuffSize=256kB, MaxMultSect=16, MultSect=off
DblWordIO=no, maxPIO=2(fast), DMA=yes, maxDMA=2(fast)
CHS=1027/255/63 LBA Native, LBAsects=16514064
tDMA={min:120,rec:120}, DMA modes: mword0 mword1 mword2
IORDY=on/off, tPIO={min:120,w/IORDY:120}, PIO modes: mode3 mode4
UDMA modes: mode0 mode1 *mode2
Drive Supports : ATA/ATAPI-4 T13 1153D revision 17 : ATA-1 ATA-2 ATA-3 ATA-4

/dev/hde:

Model=IDE/ATAPI CD-ROM 36X, FwRev=T6C4, SerialNo=
Config={ Fixed Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
BuffType=0(?), BuffSize=0kB, MaxMultSect=0
DblWordIO=no, maxPIO=4(ata), DMA=yes, maxDMA=2(fast)
(maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
tDMA={min:120,rec:150}, DMA modes: mword0 mword1 *mword2
IORDY=yes, tPIO={min:227,w/IORDY:120}, PIO modes: mode3 mode4
UDMA modes: mode0 mode1 mode2

Andre Hedrick
The Linux IDE guy

On Thu, 30 Sep 1999, Rogier Wolff wrote:

> Jes Sorensen wrote:
> > >>>>> "Rogier" == Rogier Wolff <R.E.Wolff@BitWizard.nl> writes:
> >
> > Rogier> Jes Sorensen wrote:
> > >> Most of the interrupt sharing problems I have seen have been due to
> > >> driver writers forgetting to check whether an interrupt actually
> > >> came from the device before they start processing it.
> >
> > Rogier> To get some extra speed out of a drive, some manufacturers
> > Rogier> give the interrupt a few microseconds earlier than when the
> > Rogier> data is fully available. With faster and faster hardware, it
> > Rogier> becomes possible to empty the buffer before it is fully
> > Rogier> available.... Data corruption. Now by sharing the interrupt,
> > Rogier> the OTHER drive may be interrupting, and the processor gets
> > Rogier> that few microsecond lead....
> >
> > Well it is supposedly the job of the driver to check that with the
> > hardware that data is actually available before taking it for
> > granted. This may for various reasons be problematic if the cheap
> > designers did stupid things etc.
>
> Right. If you really want to interrupt before the data is really
> available, you end up marking "data ready" in het control register
> too. It's easier that way. If you're cutting corners cut them well.
>
> Oh, the reason that I'm pretty sure that the PCI interrupt sharing
> for IDE works is that otherwise it wouldn't work at all. You always
> have to check for data.
>
> One way to optimize a driver is to say that IF the check for "did you
> interrupt me" is just as long as "do you have data for me", then you
> can forget the "did you interrupt me" and simply ask if it has any
> data. I mean if it didn't interrupt but it does have data, why not
> handle it anyway. Saves the extra interrupt overhead.
>
> That's a trick that is completely according to the rules. But if the
> NT driver happens not to do it that way, do you think it has been
> tested? Anyway, Andre Hedrick said he was able to reproduce the
> problem, and that means that the solution is in sight. It is very slow
> debugging a driver with a client 1000 miles away saying: "no, it still
> crashed".
>
> > Rogier> Tell me. Is the Linux serial driver reliable? I'm seeing
> > Rogier> datacorruption. That's for sure. Ok. There is a neato, new PCI
> > Rogier> serial chip involved. The chip looks reliable to me. The
> > Rogier> driver looks reliable to me. Where is the data getting
> > Rogier> corrupted? Anyway: Something to do for today...
> >
> > Well serial is a medium that is almost guaranteed to lose data once in
> > a while ;-)
>
> Ok. Tell me: Why is the PCI serial chip with 128 byte buffer
> experiencing overruns, while the chip soldered to my motherboard
> (probably in one of those SMC multi-io chips (*)) with only 16 byte
> buffer is NOT dropping characters?
>
> What the hell is going on?
>
> I'll be checking that the 128byte buffer is indeed enabled, and after
> that I don't know what to do anymore....
>
> Oh. I was suspecting the transmit path, but have now narrowed it down
> to the recieve path...
>
> Roger.
>
>
> (*) It's effectively on the ISA bus!
>
> --
> ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
> *-- BitWizard writes Linux device drivers for any device you may have! --*
> ------ Microsoft SELLS you Windows, Linux GIVES you the whole house ------
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/