Re: ext3/ext4 filesystem corruption under post 5.1.0 kernels

From: Geert Uytterhoeven
Date: Fri May 17 2019 - 05:25:50 EST


Hi Ted,

On Sun, May 12, 2019 at 12:07 AM Theodore Ts'o <tytso@xxxxxxx> wrote:
> On Sat, May 11, 2019 at 02:43:16PM +0200, Richard Weinberger wrote:
> > [CC'ing linux-ext4]
> >
> > On Sat, May 11, 2019 at 1:47 PM Arthur Marsh
> > <arthur.marsh@xxxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > The filesystem with the kernel source tree is the root file system, ext3, mounted as:
> > >
> > > /dev/sdb7 on / type ext3 (rw,relatime,errors=remount-ro)
> > >
> > > After the "Compressing objects" stage, the following appears in dmesg:
> > >
> > > [ 848.968550] EXT4-fs error (device sdb7): ext4_get_branch:171: inode #8: block 30343695: comm jbd2/sdb7-8: invalid block
> > > [ 849.077426] Aborting journal on device sdb7-8.
> > > [ 849.100963] EXT4-fs (sdb7): Remounting filesystem read-only
> > > [ 849.100976] jbd2_journal_bmap: journal block not found at offset 989 on sdb7-8
>
> This indicates that the extent tree blocks for the journal were found
> to be corrupt; so the journal couldn't be found.
>
> > > # fsck -yv
> > > fsck from util-linux 2.33.1
> > > e2fsck 1.45.0 (6-Mar-2019)
> > > /dev/sdb7: recovering journal
> > > /dev/sdb7 contains a file system with errors, check forced.
>
> But e2fsck had no problem finding the journal.
>
> > > Pass 1: Checking inodes, blocks, and sizes
> > > Pass 2: Checking directory structure
> > > Pass 3: Checking directory connectivity
> > > Pass 4: Checking reference counts
> > > Pass 5: Checking group summary information
> > > Free blocks count wrong (4619656, counted=4619444).
> > > Fix? yes
> > >
> > > Free inodes count wrong (15884075, counted=15884058).
> > > Fix? yes
>
> And no other significant problems were found. (Ext4 never updates or
> relies on the summary number of free blocks and free inodes, since
> updating it is a scalability bottleneck and these values can be
> calculated from the per block group free block/inodes count. So the
> fact that e2fsck needed to update them is not an issue.)
>
> So that implies that we got one set of values when we read the journal
> inode when attempting to mount the file system, and a *different* set
> of values when e2fsck was run. Which means that we need to
> consider the possibility that the problem is below the file system
> layer (e.g., the block layer, device drivers, etc.).
>
>
> > > /dev/sdb7: ***** FILE SYSTEM WAS MODIFIED *****
> > >
> > > Other times, I have gotten:
> > >
> > > "Inodes that were part of a corrupted orphan linked list found."
> > > "Block bitmap differences:"
> > > "Free blocks count wrong for group"
> > >
>
> This variety of issues also implies that the issue may be in the data
> read by the file system, as opposed to an issue in the file system.
>
> Arthur, can you give us the full details of your hardware
> configuration and your kernel config file? Also, what kernel git
> commit ID were you testing?

I'm seeing similar things running post v5.1 on ARAnyM (Atari emulator):

EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
...
EXT4-fs error (device sda1): ext4_get_branch:171: inode #1980: block 27550: comm jbd2/sda1-1980: invalid block

and userspace hung somewhere during initial system startup, so I had to
kill the instance.

-----

EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
EXT4-fs (sda1): INFO: recovery required on readonly filesystem
EXT4-fs (sda1): write access will be enabled during recovery
EXT4-fs warning (device sda1): ext4_clear_journal_err:5078: Filesystem error recorded from previous mount: IO failure
EXT4-fs warning (device sda1): ext4_clear_journal_err:5079: Marking fs in need of filesystem check.
EXT4-fs (sda1): recovery complete
EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext3 filesystem) readonly on device 8:1.
...
Run /sbin/init as init process
random: fast init done
EXT4-fs (sda1): re-mounted. Opts:
random: crng init done
EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
EXT4-fs (sda1): error count since last fsck: 1
EXT4-fs (sda1): initial error at time 1557931133: ext4_get_branch:171: inode 1980: block 27550
EXT4-fs (sda1): last error at time 1557931133: ext4_get_branch:171: inode 1980: block 27550

-----

EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext3 filesystem) readonly on device 8:1.
...
Run /sbin/init as init process
random: fast init done
EXT4-fs (sda1): re-mounted. Opts:
EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
random: crng init done
EXT4-fs error (device sda1): ext4_get_branch:171: inode #1980: block 27550: comm jbd2/sda1-1980: invalid block
Aborting journal on device sda1-1980.
EXT4-fs (sda1): Remounting filesystem read-only
jbd2_journal_bmap: journal block not found at offset 426 on sda1-1980
EXT4-fs error (device sda1): ext4_journal_check_start:61: Detected aborted journal
EXT4-fs (sda1): error count since last fsck: 3
EXT4-fs (sda1): initial error at time 1557931133: ext4_get_branch:171: inode 1980: block 27550
EXT4-fs (sda1): last error at time 1558083596: ext4_journal_check_start:61: inode 1980: block 27550
EXT4-fs error (device sda1): ext4_remount:5328: Abort forced by user

-----

EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
EXT4-fs (sda1): INFO: recovery required on readonly filesystem
EXT4-fs (sda1): write access will be enabled during recovery
random: fast init done
EXT4-fs warning (device sda1): ext4_clear_journal_err:5078: Filesystem error recorded from previous mount: IO failure
EXT4-fs warning (device sda1): ext4_clear_journal_err:5079: Marking fs in need of filesystem check.
EXT4-fs (sda1): recovery complete
EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
...
Run /sbin/init as init process
random: crng init done
EXT4-fs (sda1): re-mounted. Opts:
EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
EXT4-fs (sda1): error count since last fsck: 4
EXT4-fs (sda1): initial error at time 1557931133: ext4_get_branch:171: inode 1980: block 27550
EXT4-fs (sda1): last error at time 1558083665: ext4_remount:5328: inode 1980: block 27550
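
The "initial error" and "last error" times in these logs are seconds since the Unix epoch. A minimal Python sketch for turning them into UTC dates, and for measuring the gap between the first recorded error and a later one, follows; the helper name `to_utc` is only illustrative.

```python
from datetime import datetime, timezone

def to_utc(epoch_seconds):
    """Convert a kernel-reported epoch timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Values copied from the log excerpts above.
initial_error = to_utc(1557931133)
later_error = to_utc(1558083596)

print("initial error:", initial_error.isoformat())
print("later error:  ", later_error.isoformat())
print("gap:          ", later_error - initial_error)
```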

Notes:
  - It's always the same block,
  - Block device is an image file, accessed using
    arch/m68k/emu/nfblock.c, which did not receive any recent (bvec)
    updates,
  - There are no reported errors for the device containing the image
    file on the host,
  - Given Arthur sees the issue on a different class of machines, it's
    unlikely the issue is related to a problem with the block device
    (driver). It may still be an issue with the block layer, though,
  - Both Arthur and I are mounting an ext3 file system using the ext4
    subsystem.
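
One way to double-check the "always the same block" observation against a saved console log is sketched below in Python. The regular expression is derived only from the error lines quoted in this mail, not from any documented format, and `count_bad_blocks` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the "EXT4-fs error" lines quoted in this thread.
ERR_RE = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+):\s+block (?P<block>\d+)"
)

def count_bad_blocks(log_text):
    """Count (device, inode, block) triples reported in EXT4-fs error lines."""
    # Collapse all whitespace first so entries wrapped by a mailer still match.
    flat = " ".join(log_text.split())
    return Counter(
        (m["dev"], int(m["inode"]), int(m["block"]))
        for m in ERR_RE.finditer(flat)
    )
```

If every error in the log maps to a single key, the failures all point at the same inode and block; several distinct keys would instead suggest more scattered corruption.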

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds