Re: Laptop mode causing writes to wrong sectors?
From: Jan Niehusmann
Date: Wed Nov 16 2005 - 16:42:18 EST
On Wed, Nov 16, 2005 at 09:06:03PM +0100, Bart Samwel wrote:
> First of all, you having resized your fs is a smoking gun, if you ask
> me. Your fs is dead/dying, and you know you've recently been tinkering
> with it. It's the most probable cause.
Well, it would be nice if the explanation was that easy - but the recent
corruption (with the man page) was on my root partition, which is not on
LVM and has never been resized.
> Secondly, I think that your resize sequence is missing an e2fsck -f
> after resize2fs.
Be assured that, at least after the first corruption I observed, I ran a
forced e2fsck on all partitions. No errors were found.
> after the resize2fs -- seeing as all the subsequent fscks were probably
> done by journal.
What do you mean by 'by journal'? The filesystems were unmounted (or
remounted r/o), so the journal should have been committed and empty at
e2fsck time.
> >But now, I got another hint pointing to a possible cause of this
> >problem: I found a file - /usr/lib/libatlas.so.3.0 - which was corrupted
> >by 4k of it being overwritten by a different file, which I recognized.
> >And that file happened to be an uncompressed manual page.
>
> This sounds like your filesystem's block bitmaps are "fscked up". These
> problems can definitely cause "creeping corruption" when undetected,
But this should definitely have been detected by an fsck, right?
> (especially if your filesystem has a relatively large amount of free
> space, as it probably does because you just resized it)
Root fs is 94% full, and during apt-get upgrade sometimes becomes
completely full. (Which I don't like, because it probably causes bad
fragmentation...)
> About the laptop mode hypothesis: I think it's just a coincidence. If
> it's not, then it could be a "sync-time-only" problem (because what
> laptop mode does before spindown is a sync), but not a specific laptop
> mode problem -- laptop mode doesn't influence block numbers whatsoever.
> But if it were a sync problem, we would be seeing a lot more reports
> of corruption. For now my vote is with the resize. :)
I agree it's probably not a sync problem, and therefore probably not
really a laptop-mode bug, even if laptop mode triggered the corruption.
I suspected the hard drive of messing up write requests during spin-up,
or perhaps of returning some kind of error that triggers a bug in a
rarely tested error-handling path in the kernel. But the fact that you
have never received similar reports makes this less likely. In the end,
I have to consider that there may be some bad hardware in my laptop. (I
have already run memtest86, of course.)
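For anyone who wants to check for the same kind of damage (a 4k chunk of
one file overwritten by another), here is a minimal sketch of how the
affected blocks could be located. It assumes a known-good copy of the
file is available, for example one extracted from the distribution
package, and it assumes a 4k filesystem block size. The function name
and the block size constant are made up for illustration only.

```python
BLOCK_SIZE = 4096  # assumption: 4k filesystem blocks, as in the report above


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the indices of the block_size-sized blocks that differ.

    The two files are read in lockstep. A block counts as different if
    its bytes differ or if one file ends before the other.
    """
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                diffs.append(index)
            index += 1
    return diffs
```

Multiplying a returned index by the block size gives the byte offset of
the damaged region within the file. A single isolated index would match
the "one 4k block overwritten" pattern described above.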
Jan