Re: ext3 badness in 2.6.0-test2

From: Andrew Morton (akpm@osdl.org)
Date: Wed Aug 06 2003 - 01:57:35 EST


Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> > Could have been an IO error, or the block/MD/device layer returned
> > incorrect data. ext3 used to go BUG a lot in the latter case, but nowadays
> > we try to abort the journal and go read-only.
> >
> > Without the initial message we do not know.
>
> Can I add a "me too".....

No. Go away.

> First, I'm using data=journal - is that supposed to work in 2.6 yet?
>

I think so. It's much less tested than ordered mode, but some people have
beat upon it.

> I have a raid5 array across a bunch of SCSI drives and a separate scsi
> drive with boot, swap, and a journal partition.
> I have an ext3 filesystem on the raid5 array with an external journal
> on the journal partition.

oh. Good to hear that external journals still work.

> The raid5 was rebuilding a spare and I was pounding the filesystem
> over NFS using the SPEC SFS benchmark program (of course the raid5
> rebuild killed the performance reported by SFS, but I expected that).
>
> Shortly after the rebuild finished, I got an ext3 error (see log
> below) and the journal aborted, and then nfsd Oopsed inside ext3.

> ...
> Aug 6 15:22:05 adams kernel: EXT3-fs error (device md1): ext3_add_entry: bad entry in directory #41009295: rec_len is smaller than minimal - offset=0, inode=3265411686, rec_len=0, name_len=0

It looks like we had a block full of zeroes come back from the device
driver. I find it distinctly fishy how this happens so much with
ext3-on-md, and so little with ext3-on-just-a-disk.
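To make that reasoning concrete: the message comes from ext3's directory-entry sanity check (ext3_check_dir_entry()), which compares rec_len against the minimum record length before it looks at anything else, so a zeroed rec_len fails the first test whatever the inode field holds. The following is a simplified userspace C model of that check, for illustration only. The enum, function and helper names are hypothetical. The model assumes the ext3 on-disk entry header (32-bit inode, 16-bit rec_len, 8-bit name_len and file_type, little-endian), records rounded to 4 bytes, and checks applied in the order shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Results of the checks, in the order they are applied. The comments give
 * the corresponding ext3 error strings. */
enum dirent_err {
    DE_OK = 0,
    DE_REC_LEN_MIN,   /* "rec_len is smaller than minimal" */
    DE_REC_LEN_ALIGN, /* "rec_len % 4 != 0" */
    DE_REC_LEN_NAME,  /* "rec_len is too small for name_len" */
    DE_ACROSS_BLOCKS, /* "directory entry across blocks" */
    DE_INODE_RANGE    /* "inode out of bounds" */
};

/* Smallest record that can hold a name of name_len bytes: an 8-byte header
 * plus the name, rounded up to a multiple of 4 (like EXT3_DIR_REC_LEN()). */
static unsigned dir_rec_len(unsigned name_len)
{
    return (name_len + 8 + 3) & ~3u;
}

/* On-disk fields are little-endian; decode them byte by byte. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static unsigned get_le16(const unsigned char *p)
{
    return (unsigned)p[0] | (unsigned)p[1] << 8;
}

/* Validate the directory entry at 'offset' within one directory block.
 * Header layout: inode (4 bytes), rec_len (2), name_len (1), file_type (1). */
static enum dirent_err check_dir_entry(const unsigned char *block,
                                       size_t blocksize, size_t offset,
                                       uint32_t inodes_count)
{
    const unsigned char *de = block + offset;
    uint32_t inode;
    unsigned rec_len, name_len;

    assert(offset + 8 <= blocksize); /* caller must pass a whole header */
    inode = get_le32(de);
    rec_len = get_le16(de + 4);
    name_len = de[6];

    if (rec_len < dir_rec_len(1))
        return DE_REC_LEN_MIN;   /* an all-zero header fails here first */
    if (rec_len % 4 != 0)
        return DE_REC_LEN_ALIGN;
    if (rec_len < dir_rec_len(name_len))
        return DE_REC_LEN_NAME;
    if (offset + rec_len > blocksize)
        return DE_ACROSS_BLOCKS;
    if (inode > inodes_count)
        return DE_INODE_RANGE;
    return DE_OK;
}
```

In this model a zero-filled 1024-byte block yields DE_REC_LEN_MIN at offset 0. So does a header carrying inode=3265411686 with rec_len=0, as in the log line, because the inode bounds check is never reached.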

> Aug 6 15:22:05 adams kernel: Remounting filesystem read-only
> Aug 6 15:22:05 adams kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000

Now that's an ext3 bug. Something like this...

 fs/jbd/transaction.c | 10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff -puN fs/jbd/transaction.c~ext3-aborted-journal-fix fs/jbd/transaction.c
--- 25/fs/jbd/transaction.c~ext3-aborted-journal-fix 2003-08-05 23:53:16.000000000 -0700
+++ 25-akpm/fs/jbd/transaction.c 2003-08-05 23:56:47.000000000 -0700
@@ -525,12 +525,18 @@ do_get_write_access(handle_t *handle, st
                         int force_copy, int *credits)
 {
         struct buffer_head *bh;
-        transaction_t *transaction = handle->h_transaction;
-        journal_t *journal = transaction->t_journal;
+        transaction_t *transaction;
+        journal_t *journal;
         int error;
         char *frozen_buffer = NULL;
         int need_copy = 0;
 
+        if (is_handle_aborted(handle))
+                return -EROFS;
+
+        transaction = handle->h_transaction;
+        journal = transaction->t_journal;
+
         jbd_debug(5, "buffer_head %p, force_copy %d\n", jh, force_copy);
 
         JBUFFER_TRACE(jh, "entry");

_
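The patch turns on ordering: it tests whether the handle is aborted before following handle->h_transaction. The toy model below illustrates only that ordering. Every type and function in it (toy_journal, toy_transaction, toy_handle, toy_handle_aborted(), toy_get_write_access()) is hypothetical. The abort test is assumed to behave like jbd's is_handle_aborted() by consulting the handle's own h_aborted flag before touching the transaction. The model also assumes that an aborted handle may carry a NULL transaction pointer, which matches the NULL dereference in the Oops above but is not established by this thread. None of this is the jbd code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for jbd's journal_t, transaction_t and handle_t,
 * reduced to the fields this illustration needs. */
struct toy_journal { int j_aborted; };
struct toy_transaction { struct toy_journal *t_journal; };
struct toy_handle {
    struct toy_transaction *h_transaction; /* assumed possibly NULL */
    int h_aborted;
};

/* Checks the handle's own flag before following any pointer, so an
 * aborted handle never touches h_transaction here. */
static int toy_handle_aborted(const struct toy_handle *handle)
{
    if (handle->h_aborted)
        return 1;
    return handle->h_transaction->t_journal->j_aborted;
}

/* Same shape as the patched do_get_write_access(): return -EROFS first,
 * and only then load transaction and journal. Returns 0 on success. */
static int toy_get_write_access(const struct toy_handle *handle)
{
    const struct toy_transaction *transaction;
    const struct toy_journal *journal;

    assert(handle != NULL); /* a NULL handle is a caller bug */
    if (toy_handle_aborted(handle))
        return -EROFS;

    transaction = handle->h_transaction;
    journal = transaction->t_journal;
    (void)journal; /* the real function goes on to journal the buffer */
    return 0;
}
```

With the loads of transaction and journal placed before the abort check, as in the pre-patch code, an aborted handle with a NULL h_transaction would be dereferenced before the check could run.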

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Aug 07 2003 - 22:00:32 EST