Re: Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard

From: Holger Kiehl
Date: Thu May 21 2015 - 02:44:37 EST


On Thu, 21 May 2015, NeilBrown wrote:

> On Thu, 21 May 2015 01:32:13 +0500 Roman Mamedov <rm@xxxxxxxxxxx> wrote:
>
> > On Wed, 20 May 2015 20:12:31 +0000 (UTC)
> > Holger Kiehl <Holger.Kiehl@xxxxxx> wrote:
> >
> > > The kernel I was running when I discovered the
> > > problem was 4.0.2 from kernel.org. However, after reinstalling from DVD
> > > I updated to Fedora's latest kernel, which was 3.19.? (I do not remember
> > > the last numbers). So that kernel also seems to be affected, but I assume it
> > > contains many 'fixes' from 4.0.x. As filesystem I use ext4, distribution
> > > is Fedora 21 and hardware is: Xeon E3-1275, 16GB ECC RAM.
> > >
> > > My system now seems to have been running stable for some days with kernel.org
> > > kernel 4.0.3 and with discard DISABLED. But I am still unsure what could
> > > be the real cause.
> >
> > It is a bug in the 4.0.2 kernel, fixed in 4.0.3.
> >
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785672
> > https://bbs.archlinux.org/viewtopic.php?id=197400
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/d2dc317d564a46dfc683978a2e5a4f91434e9711
>
> I suspect that is a different bug.
> I think this one is
> https://bugzilla.kernel.org/show_bug.cgi?id=98501

Should there not be a big fat warning going around telling users to disable
discard on Raid 0 until this is fixed? This breaks the filesystem completely
and I believe there is absolutely no way to get the data back.
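As a rough sketch of how such a check might look (this is not from the bug reports above), the hypothetical helper md_mounts_with_discard below scans text in /proc/mounts format and lists md devices mounted with the discard option. It assumes the arrays appear under /dev/md* names in /proc/mounts. It only inspects the mount option. It does not tell you whether an array is RAID 0 (check /proc/mdstat for that) or whether fstrim is being run periodically.

```python
def md_mounts_with_discard(mounts_text):
    """Return (device, mountpoint) pairs for md devices mounted with the
    'discard' option, given text in /proc/mounts format.

    Hypothetical helper: assumes md arrays show up as /dev/md* in the
    device column, and only checks the mount options, not the RAID level.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        # Compare whole comma-separated options so 'nodiscard' does not match.
        if device.startswith("/dev/md") and "discard" in options.split(","):
            hits.append((device, mountpoint))
    return hits
```

For example, md_mounts_with_discard(open("/proc/mounts").read()) returns the affected mounts on the running system. Any device it reports is still mounted with online discard.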

Is this fixed in 4.0.4? And which kernels are affected? There could be many
people running affected systems who have not noticed this and do not know how
dangerous it is for them to delete data.

Regards,
Holger