Re: Laptop mode causing writes to wrong sectors?

From: Jan Niehusmann
Date: Wed Nov 16 2005 - 16:42:18 EST

On Wed, Nov 16, 2005 at 09:06:03PM +0100, Bart Samwel wrote:
> First of all, you having resized your fs is a smoking gun, if you ask
> me. Your fs is dead/dying, and you know you've recently been tinkering
> with it. It's the most probable cause.

Well, it would be nice if the explanation was that easy - but the recent
corruption (with the man page) was on my root partition, which is not on
LVM and has never been resized.

> Secondly, I think that your resize sequence is missing an e2fsck -f
> after resize2fs.

Be assured that at least after the first corruption I observed, I did
forced e2fsck on all partitions. Without any errors found.

> after the resize2fs -- seeing as all the subsequent fscks were probably
> done by journal.

What do you mean with 'by journal'? The filesystems were unmounted (or
remounted r/o), so the journal should have been committed and empty at
e2fsck time.

> >But now, I got another hint pointing to a possible cause of this
> >problem: I found a file - /usr/lib/ - which was corrupted
> >by 4k of it being overwritten by a different file, which I recognized.
> >And that file happened to be an uncompressed manual page.
> This sounds like your filesystem's block bitmaps are "fscked up". These
> problems can definitely cause "creeping corruption" when undetected,

But this should definitely have been detected by an fsck, right?

> (especially if your filesystem has a relatively large amount of free
> space, as it probably does because you just resized it)

Root fs is 94% full, and during apt-get upgrade sometimes becomes
completely full. (Which I don't like, because it probably causes bad

> About the laptop mode hypothesis: I think it's just a coincidence. If
> it's not, then it could be a "sync-time-only" problem (because what
> laptop mode does before spindown is a sync), but not a specific laptop
> mode problem -- laptop mode doesn't influence block numbers whatsoever.
> But if it were a sync problem, we would be seeing a lot more reports
> of corruption. For now my vote is with the resize. :)

I agree it's probably not a sync problem. And therefore, probably not
really a laptop-mode bug, even if laptop-mode triggered the corruption.
I suspected the hard drive to mess up write requests during spin-up. Or
perhaps giving some kind of error message, which could trigger a bug in
a rarely tested error-handling path in the kernel. But the fact that you
never got similar reports makes this less likely. In the end, I have to
consider there may be some bad hardware in my laptop. (Already did a
memtest86, of course.)


Attachment: signature.asc
Description: Digital signature