Re: ext4 file system corruption with v4.19.3 / v4.19.4

From: Andrey Melnikov
Date: Wed Nov 28 2018 - 16:13:39 EST


Wed, 28 Nov 2018 at 18:55, Rainer Fiebig <jrf@xxxxxxxxxxx> wrote:
>
> On Wednesday, 28 November 2018, 13:02:56, Andrey Jr. Melnikov wrote:
> > In gmane.comp.file-systems.ext4 Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> > > On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
> > > > The corrupted inodes are always directories that have not been
> > > > written to for at least a year. Is something going wrong when
> > > > updating atime?
> > >
> > > We're not sure. The frustrating thing is that it's not reproducing
> > > for me. I run extensive regression tests, and I'm using 4.19 on my
> > > development laptop without noticing any problems. If I could reproduce
> > > it, I could debug it, but since I can't, I need to rely on those who
> > > are seeing the problem to help pinpoint the problem.
> >
> > My workstation hits this bug every time after boot. If you have an idea,
> > I can test it.
> >
> > > I'm trying to figure out common factors from those people who are
> > > reporting problems.
> > >
> > > (a) What distribution are you running (it appears that many people
> > > reporting problems are running Ubuntu, but this may be a sampling
> > > issue; lots of people run Ubuntu)? (For the record, I'm using Debian
> > > Testing.)
> >
> > Debian sid, but with a self-built kernel from the Ubuntu mainline-ppa.
>
> You could try a vanilla 4.19.5 from https://www.kernel.org/
> and compile it with your current .config.

The mainline-ppa uses the vanilla kernel. Its patches only add
Debian-specific build infrastructure.

> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
>
> In addition, if you still see the errors:
>
> - back up your .config in a *different* folder (so that you can later re-use
> it)
> - do a "make mrproper" (deletes the .config, see above)
> - do a "make defconfig"
> - and compile the kernel with that new .config
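
The steps quoted above could be scripted roughly as follows. This is only a
sketch: the function name, the backup file name, and the MAKE override are
assumptions made for illustration (the override exists only so the sequence
can be dry-run with a stub), and a defconfig kernel may lack drivers that a
particular machine needs.

```shell
#!/bin/sh
# Sketch: rebuild a kernel with the default config while keeping the old
# .config. All names below are hypothetical, not taken from the thread.
set -eu

rebuild_with_defconfig() {
    src_dir=$1              # kernel source tree holding the current .config
    backup_dir=$2           # must be OUTSIDE src_dir: mrproper deletes .config
    make_cmd=${MAKE:-make}  # override only for dry runs

    # Step 1: back up the current .config in a different folder.
    mkdir -p "$backup_dir"
    cp "$src_dir/.config" "$backup_dir/config.backup"

    # Step 2: clean the tree (this also removes .config).
    "$make_cmd" -C "$src_dir" mrproper

    # Step 3: generate the default configuration for this architecture.
    "$make_cmd" -C "$src_dir" defconfig

    # Step 4: compile the kernel with the new .config.
    "$make_cmd" -C "$src_dir" -j"$(getconf _NPROCESSORS_ONLN)"
}
```

It would be called with the source tree and a backup directory, for example
`rebuild_with_defconfig ~/src/linux-4.19.5 ~/kernel-config-backup` (both paths
are placeholders).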

defconfig is great - for abstract hardware in a vacuum.

> If you still have the problem after that, you may want to learn how to bisect.
> ;)
I already know how to bisect. Since the kernel 2.0 era. Without git ;)

This problem is simply not bisectable when the same kernel corrupts the
FS on my workstation but works normally on other servers.
And now the FS is corrupted again, with CONFIG_EXT4_ENCRYPTION disabled. Great.

> So long!
>
> Rainer Fiebig
>
>
> >
> > > (b) What hardware are you using? (SSD? SATA-attached?
> > > NVMe-attached?)
> >
> > SATA HDD WDC WD20EZRZ-00Z5HB0.
> >
> > > (c) Are you using LVM? LUKS (e.g., disk encrypted)?
> >
> > No and no. Plain ext4.
> > -- cut --
> > debugfs: features
> > Filesystem features: has_journal ext_attr resize_inode dir_index filetype
> > needs_recovery extent 64bit flex_bg sparse_super large_file huge_file
> > dir_nlink extra_isize metadata_csum
> > -- cut --
> >
> > > (d) are you using discard? One theory is a recent discard change may
> > > be in play. How do you use discard? (mount option, fstrim, etc.)
> >
> > no
>
> --
> The truth always turns out to be simpler than you thought.
> Richard Feynman