Re: Kernel-4.14: With ubuntu-18.04 building rootfs images and booting gives SQUASHFS error: xz decompression failed, data probably corrupt

From: Pintu Agarwal
Date: Mon Nov 15 2021 - 01:06:43 EST


On Mon, 15 Nov 2021 at 00:40, Phillip Lougher <phillip@xxxxxxxxxxxxxxx> wrote:
>
> On 14/11/2021 07:06, Pintu Agarwal wrote:
> > + Adding squashfs-devel to get opinion from squashfs side.
> >
> > On Fri, 12 Nov 2021 at 12:48, Pintu Agarwal <pintu.ping@xxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> On Tue, 9 Nov 2021 at 21:04, Pintu Agarwal <pintu.ping@xxxxxxxxx> wrote:
> >>
> >>>>> We only get these squashfs errors flooded in the boot logs:
> >>>>> {{{
> >>>>> ....
> >>>>> [ 5.153479] device-mapper: init: dm-0 is ready
> >>>>> [ 5.334282] VFS: Mounted root (squashfs filesystem) readonly on device 253:0.
> >>>>> ....
> >>>>> [ 8.954120] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>>> [ 8.954153] SQUASHFS error: squashfs_read_data failed to read block 0x1106
> >>>>> [ 8.970316] SQUASHFS error: Unable to read data cache entry [1106]
> >>>>> [ 8.970349] SQUASHFS error: Unable to read page, block 1106, size 776c
> >>>>> [ 8.980298] SQUASHFS error: Unable to read data cache entry [1106]
> >>>>> [ 8.981911] SQUASHFS error: Unable to read page, block 1106, size 776c
> >>>>> [ 8.988280] SQUASHFS error: Unable to read data cache entry [1106]
> >>>>> ....
> >>>>> }}}
> >>>>>
> >>
> >> One more observation:
> >> When I disable FEC flag in bootloader, I see the below error:
> >> [ 8.360791] device-mapper: verity: 253:0: data block 2 is corrupted
> >> [ 8.361134] device-mapper: verity: 253:0: data block 3 is corrupted
> >> [ 8.366016] SQUASHFS error: squashfs_read_data failed to read block 0x1106
> >> [ 8.379652] SQUASHFS error: Unable to read data cache entry [1106]
> >> [ 8.379680] SQUASHFS error: Unable to read page, block 1106, size 7770
> >>
> >> Also, now I see that the decompress error is gone, but the read error
> >> is still there.
> >>
> >> It seems to me that dm-verity detects some corrupted blocks and, with
> >> FEC enabled, corrects them itself. However, when dm-verity corrects the
> >> data, the squashfs decompression still fails on the corrected blocks.
> >>
> >> So, it seems like there is some mis-match between the way FEC
> >> correction and the squashfs decompression happens ?
> >>
> >> Is this issue seen by anybody else here ?
> >>
> >
> > The squashfs version used by Kernel:
> > [ 0.355958] squashfs: version 4.0 (2009/01/31) Phillip Lougher
> >
> > The squashfs version available on Ubuntu:
> > mksquashfs version 4.3-git (2014/06/09)
> >
> > The squashfs version used by Yocto 2.6:
> > squashfs-tools/0001-squashfs-tools-Allow-setting-selinux-xattrs-through-.patch:61:
> > printf("mksquashfs version 4.3-git (2014/09/12)\n");
> >
> > We create dm-verity squashfs image using version 4.3 whereas, the
> > kernel uses 4.0 version to decompress it.
> > Is there something missing here?
> >
> > When FEC (Forward Error Correction) comes into picture, then squashfs
> > decompress fails.
> > When we remove FEC flag from dm-verity then decompress works but read
> > error still occurs.
> > It looks as if something is missing either in the FEC handling or
> > in the squashfs decompress logic.
> >
> > Just wanted to know if there are any fixes already available in the
> > mainline for this ?
> >
> >
>
> As Squashfs maintainer I want you to stop randomly blaming anything and
> everything here. You won't fix anything doing that.
>
> In a previous email you stated
>
>
> >
> > One quick observation:
> > This issue is seen only when we enable dm-verity in our bootloader and
> > cross-building the bootloader/kernel (with Yocto 2.6 toolchain
> > arm-oe-linux-gnueabi-) on Ubuntu 18.04.
> > The issue is *NOT* seen (on the same device) when building the
> > dm-verity enabled kernel on Ubuntu 16.04.
> >
> > Is it something to do with the cross-toolchain difference between
> > Ubuntu 16 and 18 ?
> >
>
> If that is the case, then it is not an issue with Squashfs or any
> kernel code, it is a build time issue and *that* is where you should
> be concentrating your efforts. Find out what differences are there.
>
> You don't seem to understand that a Squashfs filesystem generated
> by any Mksquashfs 4.X is mountable *without* errors on any kernel
> since 2.6.29 (January 2009). Looking for mismatches between
> Mksquashfs and/or kernel version and blaming that for the above
> different behaviour is a complete waste of time.
>

I am sorry, but I am not blaming anybody here.
I am just sharing my observations and trying to find out whether
anyone else has seen a similar issue.
On the toolchain side, it appears to be the same in both cases, since
it comes from Yocto itself.

It seems there is some relation between dm-verity FEC correction and
squashfs decompression.
So I was looking for some clues from both sides.
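
To make the relation concrete, here is a rough sketch that maps the failing
squashfs extent from the logs onto dm-verity data block numbers. It assumes a
4096-byte dm-verity data block size, that the squashfs image starts at offset
0 of the verity device, and that the "block" and "size" values in the squashfs
messages are hexadecimal byte values. The function name is only for
illustration.

```python
VERITY_BLOCK_SIZE = 4096  # assumption: data block size used when creating the verity image


def verity_blocks_for_extent(offset, size, block_size=VERITY_BLOCK_SIZE):
    """Return the dm-verity data block indices covered by a byte extent."""
    if size <= 0:
        return []
    first = offset // block_size
    last = (offset + size - 1) // block_size  # index of the block holding the last byte
    return list(range(first, last + 1))


if __name__ == "__main__":
    # Values taken from the boot log: "block 1106, size 776c" (treated as hex).
    blocks = verity_blocks_for_extent(0x1106, 0x776C)
    print("squashfs extent covers verity data blocks:", blocks)
```

Under these assumptions the extent covers data blocks 1 through 8, which
includes blocks 2 and 3 that dm-verity reports as corrupted when FEC is
disabled.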

Anyway, thank you very much for your suggestion.
Yes, we are also analysing the Yocto build differences between
Ubuntu 16.04 and 18.04.
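
As part of that analysis, one possible approach is to compare the images
produced on the two hosts block by block. The following is only a sketch: the
4096-byte block size is an assumption, the file names in the usage comment are
hypothetical, and some differences (for example embedded timestamps) are to be
expected even between good builds.

```python
BLOCK_SIZE = 4096  # assumption: same as the dm-verity data block size


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the indices of the blocks whose contents differ between two files.

    If one file is shorter, its missing blocks count as different.
    """
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files are exhausted
            if a != b:
                diffs.append(index)
            index += 1
    return diffs


# Example usage (hypothetical file names):
#   print(differing_blocks("rootfs-ubuntu16.squashfs", "rootfs-ubuntu18.squashfs"))
```

The same function could also be used to compare the image that was built with
the data read back from the flashed partition on the device.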

Thank you!
Pintu