Re: [Vserver] PROBLEM: Oops in log_do_checkpoint, using vserver

From: Jan Kara
Date: Thu Oct 21 2004 - 04:27:33 EST


Hello,

> * Herbert Poetzl (herbert@xxxxxxxxxxxx) wrote:
> > On Tue, Oct 19, 2004 at 06:01:00PM -0400, Stephen Frost wrote:
> > > Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361:
> > > "drop_count != 0 || cleanup_ret != 0"
> >
> > you can split up this assertion into
> >
> > - drop_count != 0
> > - cleanup_ret != 0
> >
> > and fail on that (or just output those values
> > before you panic) ... this might give some
> > deeper insight into the issue ...
>
> Hmm, that's a good thought, though I have to say I'd really like to get
> a comment from the ext3 folks. This is also a production server, so I'd
> kind of like to minimize the downtime. :)
>
I've been looking through the code and I think there might be a
following race (but it looks unlikely):
Proc 1 Proc 2
log_do_checkpoint()
scans the list for buffers to flush
and flushes everything
scan again and throw out flushed buffers
lock_bh_state()
on the last buffer fails jbd_trylock_bh_state()
so we retry
unlock_bh_state()
lock_buffer()
scanning again but now buffer is buffer_locked()
so we cannot throw it out
mark_buffer_jbddirty()
unlock_buffer()
__cleanup_transaction() called
It finds nothing wrong with the buffer (and
there is only one) => return 0
So we have drop_count==0, cleanup_ret==0 => assertion failure

But in this case IMHO nothing bad happened so maybe the assertion is
just the problem but probably someone with more knowledge of this code should
decide (that's why I CC'd Andrew ;).

Honza

--
Jan Kara <jack@xxxxxxx>
SuSE CR Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/