Re: Linux 2.4.30-rc3 md/ext3 problems (ext3 gurus : please check)

From: Marcelo Tosatti
Date: Wed Mar 30 2005 - 12:03:27 EST


On Wed, Mar 30, 2005 at 02:06:39PM +1000, Neil Brown wrote:
> On Tuesday March 29, marcelo.tosatti@xxxxxxxxxxxx wrote:
> >
> > Attached is the backout patch, for convenience.
>
> Thanks. I had another look, and think I may be able to see the
> problem. If I'm right, it is a problem with this patch.
>
> > diff -Nru a/fs/jbd/commit.c b/fs/jbd/commit.c
> > --- a/fs/jbd/commit.c 2005-03-29 18:50:55 -03:00
> > +++ b/fs/jbd/commit.c 2005-03-29 18:50:55 -03:00
> > @@ -92,7 +92,7 @@
> > struct buffer_head *wbuf[64];
> > int bufs;
> > int flags;
> > - int err = 0;
> > + int err;
> > unsigned long blocknr;
> > char *tagp = NULL;
> > journal_header_t *header;
> > @@ -299,8 +299,6 @@
> > spin_unlock(&journal_datalist_lock);
> > unlock_journal(journal);
> > wait_on_buffer(bh);
> > - if (unlikely(!buffer_uptodate(bh)))
> > - err = -EIO;
> > /* the journal_head may have been removed now */
> > lock_journal(journal);
> > goto write_out_data;
>
>
> I think the "!buffer_update(bh)" test is not safe at this point as,
> after the wait_on_buffer which could cause a schedule, the bh may
> no longer exist, or be for the same block. There doesn't seem to be
> any locking or refcounting that would keep it valid.
>
> Note the comment "the journal_head may have been removed now".
> If the journal_head is gone, the associated buffer_head is likely gone
> as well.

Seems to be possible, yes.

> I'm not certain that this is right, but it seems possible and would
> explain the symptoms. Maybe Stephen or Andrew could comments?

Andrew, Stephen?

> > --- a/mm/filemap.c 2005-03-29 18:50:55 -03:00
> > +++ b/mm/filemap.c 2005-03-29 18:50:55 -03:00
> > @@ -3261,12 +3261,7 @@
> > status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);
> > }
> >
> > - /*
> > - * generic_osync_inode always returns 0 or negative value.
> > - * So 'status < written' is always true, and written should
> > - * be returned if status >= 0.
> > - */
> > - err = (status < 0) ? status : written;
> > + err = written ? written : status;
> > out:
> >
> > return err;
>
> As an aside, this looks extremely dubious to me.
>
> There is a loop earlier in this routine (do_generic_file_write) that
> passes a piece-at-a-time of the write request to prepare_write /
> commit_write.
> Successes are counted in 'written'. A failure causes the loop to
> abort with a status in 'status'.
>
> If some of the write succeeded and some failed, then I believe the
> correct behaviour is to return the number of bytes that succeeded.
> However this change to the return status (remember the above patch is
> a reversal) causes any failure to over-ride any success. This, I
> think, is wrong.

Yeap, that part also looks wrong.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/