Re: [PATCH] ext3: wait on all pending commits in ext3_sync_fs

From: Eric Sandeen
Date: Tue Jan 13 2009 - 17:14:35 EST


Jan Kara wrote:

> From 0a578ba1b56fe655570ee6dad41748863a120dbc Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@xxxxxxx>
> Date: Fri, 19 Dec 2008 00:05:34 +0100
> Subject: [PATCH] jbd: Fix return value of journal_start_commit()
>
> journal_start_commit() returns 1 if either a transaction is committing or the
> function has queued a transaction commit. But it returns 0 if we raced with
> somebody queueing the transaction commit as well. This resulted in
> ext3_sync_fs() not functioning correctly (description from Arthur Jones):
> In the case of a data=ordered umount with pending long symlinks which are
> delayed due to a long list of other I/O on the backing block device, this
> causes the buffer associated with the long symlinks to not be moved to the
> inode dirty list in the second phase of fsync_super. Then, before they can be
> dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are
> never written to the backing block device, causing long symlink corruption and
> exposing new or previously freed block data to userspace.

This looks sane to me, and it does fix the below testcase.

Care to formally propose it?

Thanks,
-Eric

> This can be reproduced with a script created by Eric Sandeen
> <sandeen@xxxxxxxxxx>:
>
> #!/bin/bash
>
> umount /mnt/test2
> mount /dev/sdb4 /mnt/test2
> rm -f /mnt/test2/*
> dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
> touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
> ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
> /mnt/test2/link
> umount /mnt/test2
> mount /dev/sdb4 /mnt/test2
> ls /mnt/test2/
>
> This patch fixes journal_start_commit() to always return 1 when there's
> a transaction committing or queued for commit.
>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
> fs/jbd/journal.c | 17 +++++++++++------
> 1 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
> index 9e4fa52..e79c078 100644
> --- a/fs/jbd/journal.c
> +++ b/fs/jbd/journal.c
> @@ -427,7 +427,7 @@ int __log_space_left(journal_t *journal)
> }
>
> /*
> - * Called under j_state_lock. Returns true if a transaction was started.
> + * Called under j_state_lock. Returns true if a transaction commit was started.
> */
> int __log_start_commit(journal_t *journal, tid_t target)
> {
> @@ -495,7 +495,8 @@ int journal_force_commit_nested(journal_t *journal)
>
> /*
> * Start a commit of the current running transaction (if any). Returns true
> - * if a transaction was started, and fills its tid in at *ptid
> + * if a transaction is going to be committed (or is currently already
> + * committing), and fills its tid in at *ptid
> */
> int journal_start_commit(journal_t *journal, tid_t *ptid)
> {
> @@ -505,15 +506,19 @@ int journal_start_commit(journal_t *journal, tid_t *ptid)
> if (journal->j_running_transaction) {
> tid_t tid = journal->j_running_transaction->t_tid;
>
> - ret = __log_start_commit(journal, tid);
> - if (ret && ptid)
> + __log_start_commit(journal, tid);
> + /* There's a running transaction and we've just made sure
> + * it's commit has been scheduled. */
> + if (ptid)
> *ptid = tid;
> - } else if (journal->j_committing_transaction && ptid) {
> + ret = 1;
> + } else if (journal->j_committing_transaction) {
> /*
> * If ext3_write_super() recently started a commit, then we
> * have to wait for completion of that transaction
> */
> - *ptid = journal->j_committing_transaction->t_tid;
> + if (ptid)
> + *ptid = journal->j_committing_transaction->t_tid;
> ret = 1;
> }
> spin_unlock(&journal->j_state_lock);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/