Re: [PATCH -next] jbd2: discard last transaction when commit block checksum broken in v2v3

From: Jan Kara
Date: Mon Nov 22 2021 - 13:22:11 EST


On Wed 29-09-21 11:55:28, Ye Bin wrote:
> We hit an issue where the commit block has a broken checksum after a cold
> reboot of the device, which causes the mount to fail.
> The likely reason is that only some sectors of the block reached the disk
> before the device powered off, while the checksum is calculated over the
> whole logical block. The disk only guarantees atomicity at sector level.
> At that point we have already replayed the previous transactions, so we can
> just discard the last transaction. Descriptor, revocation, commit and
> superblock blocks now each have their own checksum.
>
> Fixes: 80b3767fbe15 ("jbd2: don't wipe the journal on a failed journal checksum")
> Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
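
To make the failure mode described above concrete, here is a minimal userspace
sketch of a torn block write. It is an illustration only, not jbd2 code: the
512-byte sector and 4096-byte block sizes are assumptions, toy_csum() is a
plain additive checksum standing in for the real journal checksum, and all
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u   /* assumed unit the device writes atomically */
#define BLOCK_SIZE  4096u  /* assumed journal block size */

/* Toy additive checksum over a buffer; a stand-in, not the jbd2 checksum. */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/*
 * Simulate a power cut: only the first sectors_written sectors of new_block
 * reach the disk, the remaining sectors keep their old contents.
 */
static void torn_write(uint8_t *disk, const uint8_t *new_block,
		       size_t sectors_written)
{
	assert(sectors_written <= BLOCK_SIZE / SECTOR_SIZE);
	memcpy(disk, new_block, sectors_written * SECTOR_SIZE);
}

/* Return 1 if the on-disk block matches the checksum of the intended block. */
static int block_csum_ok(const uint8_t *disk, uint32_t expected)
{
	return toy_csum(disk, BLOCK_SIZE) == expected;
}
```

Each sector is either old or new, but the checksum covers all eight of them,
so a block that is only partly written fails verification even though no
individual sector is damaged.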

Thanks for the patch. It seems to have fallen through the cracks. Sorry for
that.

> ---
> fs/jbd2/journal.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 35302bc192eb..a3dd7b757b3d 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2080,7 +2080,7 @@ int jbd2_journal_load(journal_t *journal)
>  	if (jbd2_journal_recover(journal))
>  		goto recovery_error;
>
> -	if (journal->j_failed_commit) {
> +	if (journal->j_failed_commit && !jbd2_journal_has_csum_v2or3(journal)) {

I guess this decision is somewhat questionable. If the failed commit was
indeed the last one, I guess losing the last transaction as you suggest is
a sensible thing to do. However, if the checksum failed somewhere in the
middle of the journal because of a bitflip or something like that, we
probably don't want to lose that many transactions and would rather run
fsck and try to recover as much data as possible... What do others think?
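
The distinction above could be sketched roughly as follows. This is a
standalone illustration, not a proposed patch: the enum, the function name and
the later_transactions_exist flag are all hypothetical, and it assumes the
caller has some way to tell whether valid transactions follow the failed one,
which the quoted hunk does not provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes for a journal load after recovery has run. */
enum recovery_action {
	RECOVERY_OK,           /* no commit block failed its checksum */
	RECOVERY_DISCARD_TAIL, /* only the final transaction is bad: drop it */
	RECOVERY_NEEDS_FSCK,   /* corruption mid-journal: fail load, run fsck */
};

/*
 * failed_tid: ID of the transaction whose commit block failed its checksum,
 *             or 0 if none failed (mirroring how j_failed_commit is tested
 *             in the quoted hunk).
 * later_transactions_exist: assumed input, true if valid transactions were
 *             found after the failed one.
 */
static enum recovery_action classify_failed_commit(uint32_t failed_tid,
						   bool later_transactions_exist)
{
	if (failed_tid == 0)
		return RECOVERY_OK;
	if (!later_transactions_exist)
		return RECOVERY_DISCARD_TAIL;
	return RECOVERY_NEEDS_FSCK;
}
```

With something like this, only the torn-tail case would be silently dropped,
and a bad checksum in the middle of the journal would still fail the load.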

Honza

>  		printk(KERN_ERR "JBD2: journal transaction %u on %s "
>  		       "is corrupt.\n", journal->j_failed_commit,
>  		       journal->j_devname);
> --
> 2.31.1
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR