Re: ext3/ext4 filesystem corruption under post 5.1.0 kernels

From: Theodore Ts'o
Date: Sat May 11 2019 - 18:08:42 EST


On Sat, May 11, 2019 at 02:43:16PM +0200, Richard Weinberger wrote:
> [CC'in linux-ext4]
>
> On Sat, May 11, 2019 at 1:47 PM Arthur Marsh
> <arthur.marsh@xxxxxxxxxxxxxxxx> wrote:
> >
> >
> > The filesystem with the kernel source tree is the root file system, ext3, mounted as:
> >
> > /dev/sdb7 on / type ext3 (rw,relatime,errors=remount-ro)
> >
> > After the "Compressing objects" stage, the following appears in dmesg:
> >
> > [ 848.968550] EXT4-fs error (device sdb7): ext4_get_branch:171: inode #8: block 30343695: comm jbd2/sdb7-8: invalid block
> > [ 849.077426] Aborting journal on device sdb7-8.
> > [ 849.100963] EXT4-fs (sdb7): Remounting filesystem read-only
> > [ 849.100976] jbd2_journal_bmap: journal block not found at offset 989 on sdb7-8

This indicates that the extent tree blocks for the journal was found
to be corrupt; so the journal couldn't be found.

> > # fsck -yv
> > fsck from util-linux 2.33.1
> > e2fsck 1.45.0 (6-Mar-2019)
> > /dev/sdb7: recovering journal
> > /dev/sdb7 contains a file system with errors, check forced.

But e2fsck had no problem finding the journal.

> > Pass 1: Checking inodes, blocks, and sizes
> > Pass 2: Checking directory structure
> > Pass 3: Checking directory connectivity
> > Pass 4: Checking reference counts
> > Pass 5: Checking group summary information
> > Free blocks count wrong (4619656, counted=4619444).
> > Fix? yes
> >
> > Free inodes count wrong (15884075, counted=15884058).
> > Fix? yes

And no other significant problems were found. (Ext4 never updates or
relies on the summary number of free blocks and free inodes, since
updating it is a scalability bottleneck and these values can be
calculated from the per block group free block/inodes count. So the
fact that e2fsck needed to update them is not an issue.)

So that implies that we got one set of values when we read the journal
inode when attempting to mount the file system, and a *different* set
of values when e2fsck was run. Which makes means that we need
consider the possibility that the problem is below the file system
layer (e.g., the block layer, device drivers, etc.).


> > /dev/sdb7: ***** FILE SYSTEM WAS MODIFIED *****
> >
> > Other times, I have gotten:
> >
> > "Inodes that were part of a corrupted orphan linked list found."
> > "Block bitmap differences:"
> > "Free blocks sound wrong for group"
> >

This variety of issues also implies that the issue may be in the data
read by the file system, as opposed to an issue in the file system.

Arthur, can you give us the full details of your hardware
configuration and your kernel config file? Also, what kernel git
commit ID were you testing?

- Ted