Re: ext3/ext4 filesystem corruption under post 5.1.0 kernels

From: Arthur Marsh
Date: Mon May 13 2019 - 03:48:19 EST




On 12 May 2019 7:36:59 am ACST, Theodore Ts'o <tytso@xxxxxxx> wrote:
>On Sat, May 11, 2019 at 02:43:16PM +0200, Richard Weinberger wrote:
>> [CC'in linux-ext4]
>>
>> On Sat, May 11, 2019 at 1:47 PM Arthur Marsh
>> <arthur.marsh@xxxxxxxxxxxxxxxx> wrote:
>> >
>> >
>> > The filesystem with the kernel source tree is the root file system,
>> > ext3, mounted as:
>> >
>> > /dev/sdb7 on / type ext3 (rw,relatime,errors=remount-ro)
>> >
>> > After the "Compressing objects" stage, the following appears in
>> > dmesg:
>> >
>> > [ 848.968550] EXT4-fs error (device sdb7): ext4_get_branch:171: inode #8: block 30343695: comm jbd2/sdb7-8: invalid block
>> > [ 849.077426] Aborting journal on device sdb7-8.
>> > [ 849.100963] EXT4-fs (sdb7): Remounting filesystem read-only
>> > [ 849.100976] jbd2_journal_bmap: journal block not found at offset 989 on sdb7-8
>
>This indicates that the extent tree blocks for the journal were found
>to be corrupt, so the journal couldn't be found.
>
>> > # fsck -yv
>> > fsck from util-linux 2.33.1
>> > e2fsck 1.45.0 (6-Mar-2019)
>> > /dev/sdb7: recovering journal
>> > /dev/sdb7 contains a file system with errors, check forced.
>
>But e2fsck had no problem finding the journal.
>
>> > Pass 1: Checking inodes, blocks, and sizes
>> > Pass 2: Checking directory structure
>> > Pass 3: Checking directory connectivity
>> > Pass 4: Checking reference counts
>> > Pass 5: Checking group summary information
>> > Free blocks count wrong (4619656, counted=4619444).
>> > Fix? yes
>> >
>> > Free inodes count wrong (15884075, counted=15884058).
>> > Fix? yes
>
>And no other significant problems were found. (Ext4 never updates or
>relies on the summary number of free blocks and free inodes, since
>updating it is a scalability bottleneck and these values can be
>calculated from the per block group free block/inodes count. So the
>fact that e2fsck needed to update them is not an issue.)
>
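
To illustrate the point about the summary counts: a minimal sketch, assuming a simplified in-memory view of the per-block-group counters. The `GroupDesc` class and the numbers are hypothetical and only show the arithmetic; this is not how e2fsck or the kernel reads the descriptors.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class GroupDesc:
    """Hypothetical stand-in for one block group's free counters."""
    free_blocks: int
    free_inodes: int


def summarize(groups: Iterable[GroupDesc]) -> Tuple[int, int]:
    """Recompute filesystem-wide free counts by summing per-group counts."""
    total_blocks = 0
    total_inodes = 0
    for g in groups:
        total_blocks += g.free_blocks
        total_inodes += g.free_inodes
    return total_blocks, total_inodes


# Illustrative values only; NOT taken from /dev/sdb7.
groups = [GroupDesc(1200, 300), GroupDesc(800, 150), GroupDesc(0, 42)]
counted_blocks, counted_inodes = summarize(groups)

# A stale summary value just disagrees with the recomputed total;
# the per-group counts remain the source of truth.
stale_summary_blocks = 2100
drift = stale_summary_blocks - counted_blocks
print(counted_blocks, counted_inodes, drift)
```

The same subtraction applied to the fsck output above gives 4619656 - 4619444 = 212 blocks of drift in the summary field.
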
>So that implies that we got one set of values when we read the journal
>inode when attempting to mount the file system, and a *different* set
>of values when e2fsck was run. Which means that we need to
>consider the possibility that the problem is below the file system
>layer (e.g., the block layer, device drivers, etc.).
>
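
One rough way to probe the "different values on different reads" hypothesis is to read the same byte range several times and compare digests. This is only a sketch under assumptions: the helper names are made up, `os.posix_fadvise(..., POSIX_FADV_DONTNEED)` is merely a hint to drop cached pages (the kernel may ignore it), and the comparison is only meaningful if nothing writes to the range between reads.

```python
import hashlib
import os


def read_digest(path: str, offset: int, length: int) -> str:
    """Return the SHA-256 of `length` bytes at `offset` in `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Hint that cached pages for this range can be dropped, so the
        # next read is more likely to hit the device. Not guaranteed.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
        h = hashlib.sha256()
        pos = offset
        remaining = length
        while remaining > 0:
            chunk = os.pread(fd, min(remaining, 1 << 20), pos)
            if not chunk:  # hit end of file/device early
                break
            h.update(chunk)
            pos += len(chunk)
            remaining -= len(chunk)
        return h.hexdigest()
    finally:
        os.close(fd)


def reads_agree(path: str, offset: int, length: int, passes: int = 3) -> bool:
    """True if every pass over the same range produced the same digest."""
    digests = {read_digest(path, offset, length) for _ in range(passes)}
    return len(digests) == 1
```

For example, assuming a 4096-byte block size (not confirmed anywhere in this thread), the block from the dmesg error could be checked as root with `reads_agree("/dev/sdb7", 30343695 * 4096, 4096)`, preferably with the filesystem unmounted or mounted read-only.
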
>
>> > /dev/sdb7: ***** FILE SYSTEM WAS MODIFIED *****
>> >
>> > Other times, I have gotten:
>> >
>> > "Inodes that were part of a corrupted orphan linked list found."
>> > "Block bitmap differences:"
>> > "Free blocks count wrong for group"
>> >
>
>This variety of issues also implies that the issue may be in the data
>read by the file system, as opposed to an issue in the file system.
>
>Arthur, can you give us the full details of your hardware
>configuration and your kernel config file? Also, what kernel git
>commit ID were you testing?
>
> - Ted

Please see the attachments. This machine has a Pentium D CPU and a PATA drive for the root filesystem, and runs a 32-bit kernel. I also experienced the same problem on an Athlon II X4 640 machine with a SATA drive for the root filesystem and a 64-bit kernel.

Arthur.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Attachment: 20190513gitbisect.log
Description: Binary data

Attachment: 20190513.config
Description: Binary data

Attachment: 20190513dmesg.log
Description: Binary data

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) D CPU 3.00GHz
stepping : 4
microcode : 0x6
cpu MHz : 2800.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts cpuid pni dtes64 monitor ds_cpl est cid cx16 xtpr
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 6029.07
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) D CPU 3.00GHz
stepping : 4
microcode : 0x6
cpu MHz : 2800.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts cpuid pni dtes64 monitor ds_cpl est cid cx16 xtpr
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 6029.07
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management: