Re: 2.0.34pre breaks my system badly

Glenn Lamb (mumford@verio.com)
Sun, 17 May 1998 01:24:57 -0700 (PDT)


On Sun, 17 May 1998, Alan Cox wrote:

> Date: Sun, 17 May 1998 01:40:44 +0100 (BST)
> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> To: linux-kernel@vger.rutgers.edu
> Subject: Re: 2.0.34pre breaks my system badly
>
> > From: Glenn Lamb <mumford@verio.com>
> >
> > Other problems that show themselves:
> > 1) my floppy drive won't work. The activity light will come on, but that's
> > all.
>
> The floppy driver hasn't changed between 2.0.33 and 2.0.34. I imagine you
> are using a gcc newer than 2.7.2.3; if so, tough, it doesn't work. Use gcc
> 2.7.2.3. 2.8.x optimises out a ton of things it shouldn't and may well be
> causing millions of bugs. Sure, it's probably Linux not saying 'erm, don't
> optimise me' that's the problem, not gcc 2.8.1, but it changes too much to
> avoid fixing it.

$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/specs
gcc version 2.7.2.3

Assume nothing.

> > 2) my network card will not work. The system sees it and configures it
> > properly (same configuration as 2.0.33), but will not communicate with
> > it. I get a lot of timeout errors if I enable debugging on the driver
> > module.
> >
> > <RANT>
> > Whose idea was it to add so many new features to what should be a stable
> > kernel release? I remember seeing around 3 new network drivers, and a ton
> > of new language stuff in the msdos filesystem area. Aren't we supposed to
> > *not* be adding new features to the "stable" kernel for this exact reason:
> > that it would become unstable?
> > </RANT>
>
> <RANT>
>
> 1. Diff the kernel except drivers/* between 2.0.33 and 2.0.34pre14 - shock
> horror, my what a small diff. Also all 2.0.33/2.0.34 drivers are
> interchangeable precisely so I can back drivers in and out.

Even 2.0.34pre12 with the 2.0.33 ethernet drivers failed. I fought with
this problem for a week before I posted. I assure you I checked all of
the obvious things.

> 2. The driver updates are in many cases ones vendors have been shipping for
> months.

Vendor? You mean Digital is supplying Linux with tulip drivers now?

> 3. The FAT32 update is one some vendors have been shipping for months and has
> a big userbase. We chose not to make changes to improve on it because
> of this.

Maybe it should not have been added, since it does nothing to add to
stability.

> 4. Adding new drivers DOES NOT make a kernel less stable. It makes it usable
> to those with the cards in question.

I made no claim that adding the new drivers caused the instability. The
problem is not caused by the new network or FAT32 code at all: I am quite
capable of running (and am currently running) a 2.0.33 kernel with the
newest network drivers and the FAT32 patches. As far as I can determine,
it is in fact a DMA problem. More on that below.

All I did was question the logic of adding *NEW* features to what is, by
definition, a kernel that is not supposed to have new features added.
Whether or not they work, or whether or not you've received the new
features from their authors, is not the issue.

> 5. If you are going to say "My network card will not work", get a small
> clue and say what sort of card please.
> </RANT>

<RANT>
I posted *TWICE* two weeks ago about problems with my network card. I
gave full details about all four network cards I tried and what versions
of the drivers were being used. I posted all debugging information I could
glean from the kernel, including the timeout messages that would come up
when I tried pinging the outside world, and the interrupt messages that
would appear when someone tried to ping me. I also gave detailed info
about what kernel version and version of gcc I was using. I even posted
the debug information for the network cards to the individual drivers'
mailing lists *and* to the linux-net mailing list. In all cases I was
totally ignored.

Funny how when I ask for help I get ignored, but when I criticize I get
flamed.

I assure you that, even though I am not a kernel hacker, I am certainly
not without a clue.
</RANT>

Just for reference:

1 Netgear FA310TX PCI card, DEC Tulip 21140 based, driver tulip.c
    v0.79 (ships with 2.0.33) works in 2.0.33.
    v0.88 (ships with 2.0.34) does not work in 2.0.34.
    Tried v0.79 (from 2.0.33) with 2.0.34: did not work.
    Tried v0.88 (from 2.0.34) with 2.0.33: did work.
    Tried v0.89B (newest, downloaded from the web page) with 2.0.33: did work.
    Tried v0.89B (newest, downloaded from the web page) with 2.0.34: did not work.
1 SMC 9432 PCI card, driver epic100.c
    Tried the version that ships with 2.0.34 in 2.0.34: did not work.
    The driver doesn't ship with 2.0.33, so I didn't test it there.
1 3c509 EISA card
    Tried the version that ships with 2.0.34: did not work.
1 NE2000 clone EISA card
    Tried the version that ships with 2.0.34: did not work.

All kernels were compiled, without modification, with gcc 2.7.2.3. The
problem was that the card would be configured correctly (the same
configuration files were run under 2.0.33 and 2.0.34) and report absolutely
no errors, but it would not send or receive traffic. Nothing sent from my
computer would make the card's activity light come on. Anything sent to my
computer would light it and would also trigger interrupts, but the kernel
would not respond. This happened with all four cards.

Another symptom of the problem was that the floppy drive would not work in
2.0.34pre*. I could issue floppy commands, and the floppy drive light would
come on, but nothing would happen (the drive wouldn't even spin). I could
not abort the disk write with ^C or by sending a signal to the process. The
only way to abort it was to eject the disk, after which the kernel would
usually print an "I/O error ... sector 0" message. 2.0.33 (compiled with the
same options, minus all the new language crap) had no problems with the
same floppy drive, the same floppy disk, and the same commands. The test
command simply created a minix filesystem on the floppy:
# mkfs.minix /dev/fd0
The exact error after ejecting the floppy is:

---------------------------------------------------------------------------
rook:0:/root# mkfs.minix /dev/fd0
floppy0: disk removed during i/o
end_request: I/O error, dev 02:00, sector 0
end_request: I/O error, dev 02:00, sector 0
end_request: I/O error, dev 02:00, sector 0
Usage: mkfs.minix [-c | -l filename] [-nXX] [-iXX] /dev/name [blocks]
---------------------------------------------------------------------------
I ejected the floppy after a minute or so; the drive should certainly have
gotten past sector 0 within a minute. Again, the same floppy works when I
boot back into 2.0.33.

Yet another symptom was the hda IRQ timeout and "disabled DMA" messages:

---------------------------------------------------------------------------
2.0.33:
ide: i82371 PIIX (Triton) on PCI bus 0 function 57
ide0: BM-DMA at 0xffa0-0xffa7
ide1: BM-DMA at 0xffa8-0xffaf
hda: Maxtor 83500D4, 3339MB w/256kB Cache, CHS=848/128/63
hdb: Maxtor 71260 AT, 1204MB w/256kB Cache, CHS=612/64/63
hdc: HP CD-Writer+ 7200, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Partition check:
hda: hda1 hda2 hda3
hdb: hdb1 hdb2

---------------------------------------------------------------------------
2.0.34pre12:
ide: i82371 PIIX (Triton) on PCI bus 0 function 57
ide0: BM-DMA at 0xffa0-0xffa7
ide1: BM-DMA at 0xffa8-0xffaf
hda: Maxtor 83500D4, 3339MB w/256kB Cache, CHS=848/128/63, UDMA
hdb: Maxtor 71260 AT, 1204MB w/256kB Cache, CHS=612/64/63, DMA
hdc: HP CD-Writer+ 7200, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Partition check:
hda:hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
hda: disabled DMA
hdb: disabled DMA
ide0: reset: success
hda1 hda2 hda3
hdb: hdb1 hdb2

---------------------------------------------------------------------------
2.0.34pre13:
ide: i82371 PIIX (Triton) on PCI bus 0 function 57
ide0: BM-DMA at 0xffa0-0xffa7
ide1: BM-DMA at 0xffa8-0xffaf
hda: Maxtor 83500D4, 3339MB w/256kB Cache, CHS=848/128/63, UDMA
hdb: Maxtor 71260 AT, 1204MB w/256kB Cache, CHS=612/64/63, DMA
hdc: HP CD-Writer+ 7200, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Partition check:
hda:hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
hda: disabled DMA
hdb: disabled DMA
ide0: reset: success
hda1 hda2 hda3
hdb: hdb1 hdb2

---------------------------------------------------------------------------
2.0.34pre14:
ide: i82371 PIIX (Triton) on PCI bus 0 function 57
ide0: BM-DMA at 0xffa0-0xffa7
ide1: BM-DMA at 0xffa8-0xffaf
hda: Maxtor 83500D4, 3339MB w/256kB Cache, CHS=848/128/63, UDMA
hdb: Maxtor 71260 AT, 1204MB w/256kB Cache, CHS=612/64/63, DMA
hdc: HP CD-Writer+ 7200, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Partition check:
hda:hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
hda: disabled DMA
hdb: disabled DMA
ide0: reset: success
hda1 hda2 hda3
hdb: hdb1 hdb2

---------------------------------------------------------------------------

As you can see, in all of the recent 2.0.34pre kernels, DMA gets disabled
on the hard drives after an IRQ timeout on hda. I suspect (though I have no
evidence for this) that DMA is being disabled on the network cards as well,
and that this is what's causing the problems.

By the way, in case it matters, this machine is a P-II 400 with a 440BX
chipset. It shouldn't matter, though, since the PPro 200 (Orion chipset)
sitting about 5 meters away has exactly the same problems with the same
network cards.

Just for reference: the .config file from 2.0.33 was used to generate the
.config file for 2.0.34pre12. No extra drivers were selected except for
the language stuff (and that was compiled as a module, so it would not
contribute to kernel IRQ timeouts during boot). The .config files for pre13
and pre14 were generated from the pre12 .config without changes.

> Thank you. PS: linux-kernel appears to have unsubscribed me, so any reports
> from the past week and a half that went to linux-kernel about 2.0.34preX
> kernels are lost, gone and dust.
>
> Please try pre14 (pre15 tomorrow). Build it with gcc 2.7.2.*. Then, people
> who are seeing bugs, please send me reports (preferably directly, or cc me,
> as linux-kernel may well drop me again for all I know).

Try offering some suggestions that you wouldn't offer to a first-time
Linuxer.
