Re: PROBLEM: zstd bzImage decompression fails for some x86_32 config on 5.9-rc1

From: Feng Tang
Date: Tue Sep 29 2020 - 01:50:09 EST


On Tue, Sep 29, 2020 at 05:15:38AM +0000, Nick Terrell wrote:
>
>
> > On Sep 28, 2020, at 11:02 AM, Nick Terrell <terrelln@xxxxxx> wrote:
> >
> >
> >
> >> On Sep 28, 2020, at 1:55 AM, Feng Tang <feng.tang@xxxxxxxxx> wrote:
> >>
> >> Hi Nick,
> >>
> >> 0day has found some kernel decompression failures since 5.9-rc1 (X86_32
> >> build), and they could be related to the ZSTD code, though initially we bisected
> >> to some other commits.
> >>
> >> The error messages are:
> >> Decompressing Linux...
> >>
> >> ZSTD-compressed data is corrupt
> >>
> >> This can be reproduced by compiling the kernel with the attached config
> >> and booting it with QEMU.
> >>
> >> We suspect it could be related to the kernel size, as we only see
> >> it on big kernels. Some more info:
> >>
> >> * If we remove a lot of kernel config to build a much smaller kernel,
> >> it will boot fine
> >>
> >> * If we change the zstd algorithm from zstd22 to zstd19, the kernel will
> >> boot fine with below patch
> >>
> >> Please let me know if you need more info, and sorry for the late report,
> >> as we have only just tracked it down to this point.
> >
> > Thanks for the report, I will look into it today.
>
> CC: Petr Malat
>
> I’ve successfully reproduced the failure and found the issue. It turns out that this
> patch [0] from Petr Malat fixes it. As I mentioned in that thread, his
> fix corresponds to this upstream commit [1].

Glad to know there is already a fix.

> Can we get Petr's patch merged into v5.9?
>
> This bug only happens when the window size is > 8 MB. A non-kernel workaround
> would be to compress the kernel at level 19, which uses an 8 MB window size,
> instead of level 22, which uses a 128 MB window size.
>
> The reason it only shows up for large kernels is that the code is only buggy
> when an offset > 8 MB is used, so a kernel <= 8 MB can't trigger the bug.
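
The two conditions above (window size > 8 MB and offset > 8 MB) can be
sketched as a small check. This is only an illustration of the reasoning,
not code from zstd or the kernel: the helper names are hypothetical, the
sizes are the ones quoted in this thread, and MB is assumed to mean 2^20
bytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Figures quoted in this thread (assuming MB means 2^20 bytes). */
#define BUGGY_OFFSET_LIMIT (8 * MIB)   /* offsets above this hit the bug */
#define WINDOW_LEVEL_19    (8 * MIB)   /* window size stated for level 19 */
#define WINDOW_LEVEL_22    (128 * MIB) /* window size stated for level 22 */

/*
 * Hypothetical helper: upper bound on a match offset. A match can only
 * point back into data already produced, and no further than the window.
 */
static uint64_t max_match_offset(uint64_t kernel_size, uint64_t window_size)
{
	return kernel_size < window_size ? kernel_size : window_size;
}

/* Hypothetical helper: can this combination contain an offset > 8 MB? */
static bool can_trigger_bug(uint64_t kernel_size, uint64_t window_size)
{
	return max_match_offset(kernel_size, window_size) > BUGGY_OFFSET_LIMIT;
}
```

Under these assumptions, a 40 MB image compressed at level 22 can contain
an offset that triggers the bug, while the same image at level 19, or any
image of 8 MB or less at either level, cannot.
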
>
> Best,
> Nick
>
> [0] https://lkml.org/lkml/2020/9/14/94

With this patch, all the failing cases on my side boot fine.

Tested-by: Feng Tang <feng.tang@xxxxxxxxx>

Thanks,
Feng

> [1] https://github.com/facebook/zstd/commit/8a5c0c98ae5a7884694589d7a69bc99011add94d