Re: jbd bug(s) (?)

From: Jakob Oestergaard (jakob@unthought.net)
Date: Thu Sep 26 2002 - 07:56:47 EST


On Thu, Sep 26, 2002 at 01:27:23PM +0100, Stephen C. Tweedie wrote:
> Hi,

Hi !

...
>
> The index is only updated when we purge stuff out of the journal.

Yes, of course.

> That can still be quite frequent on a really busy journal, but it's
> definitely not a required part of a transaction.

No, but the correct writing of your index is a required part for the
successful re-play at recovery.

You can't re-play if you can't find the beginning of your journal :)

Just *imagine* that half your index was written successfully to disk and
then the power failed. That was what I imagined could happen.

Quite some people have pointed out to me by now, that I was wrong - so
don't worry about it :)

>
> That's deliberate --- the ext3 journal is designed to be written as
> sequentially as possible, so seeking to the index block is an expense
> which we try to avoid.

Of course - any sane journal should be designed that way.

>
> > RAID wouldn't save me in the case where the journal index is screwed due
> > to a partial sector write and a power loss.
>
> A partial sector write is essentially impossible. It's unlikely that
> the data on disk would be synchronised beyond the point at which the
> write stopped, and even if it was, the CRC would be invalid, so you'd
> get a bad sector error return on subsequent attempts to read that data
> --- you'd not be given silently corrupt data.

I know. What I imagined was, that there were disks out there which
*internally* worked with smaller sector sizes, and merely presented a
512 byte sector to the outside world.

That way, it would be perfecly possible for a disk to write 2 bytes of
your index pointer along with it's error correction codes and what not.
And it would be perfectly possible for it to return an invalid index
pointer (2 bytes of your new pointer, the remaining bytes from the old
pointer) - without returning a read error.

Let's hope that none of the partitioning formats or LVM projects out
there will misalign the filesystem so that your index actually *does*
cross a 512 byte boundary ;)

> Making parts of the disk suddenly unreadable on power-fail is
> generally considered a bad thing, though, so modern disks go to great
> lengths to ensure the write finishes.

Lucky us :)

 Cheers,

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Sep 30 2002 - 22:00:26 EST