Re: could someone plz explain those ext3/hard disk errors

From: Bill Davidsen
Date: Wed Feb 18 2004 - 13:49:26 EST


John Bradford wrote:
Quote from Bill Davidsen <davidsen@xxxxxxx>:

John Bradford wrote:

now...hm, it all started when i upgraded from kernel 2.4.19 to 2.6.0
in late decemeber, the system worked very fine for a week or so
(having great response times!) but then all of a sudden the problems
started. 2 disks died. then my gigabit network card was only able to
transmit 200kb/s (but this was really a hardware problem, a new card
is working fine again, well...). a week later the next disks are
having problems and i have yet to RMA three disks. and now the next
two disks..., i'm getting insane ;) i can't see any EXT3 error anymore
*g* the next disks will be reiserfs only to see other error messages
;) well, but that doesn't solve the problem of 6 disks within 2
months...this is so unlikely.


Please read the FAQ, fix your mail application - you are sending long
lines, and don't break the CC list.

As to your problem, look at the LBA sector addresses in the error
message:

280923064991615

is your drive really over 100 EB? No...

I think we could assume that (a) he never told the kernel the disk was "over 100 EB" and (b) the kernel was trying to use that LBA anyway. Which could be due to either a kernel bug or memory corruption (or CPU problems, but unlikely).


What I was trying to point out is that the error message is clearly
the result of a problem elsewhere. Unless the drive firmware is
buggy, or something very strange is going on inside the drive, (bad
internal RAM or something like that), then the kernel did send a
request for a sector which is well out of range. What caused that
request we don't know - quite possibly corruption of some filesystem
structure on the disk caused that request, but it's important to be
clear that the error message is the expected response to a request for
such a high block number, and doesn't within itself indicate a problem
with the disk.

I.E. Even though there is every chance that the drive is faulty, the
posted error message doesn't indicate a drive failiure in itself, and
you should look elsewhere.

Yes, I think we are saying the same thing, and by now hopefully the O.P. has gotten that. I suspect non-disk cause, just my guess.

--
bill davidsen <davidsen@xxxxxxx>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/