Re: PROBLEM: zstd bzImage decompression fails for some x86_32 config on 5.9-rc1

From: Sedat Dilek
Date: Sat Oct 03 2020 - 14:50:07 EST


On Tue, Sep 29, 2020 at 7:47 AM Feng Tang <feng.tang@xxxxxxxxx> wrote:
>
> On Tue, Sep 29, 2020 at 05:15:38AM +0000, Nick Terrell wrote:
> >
> >
> > > On Sep 28, 2020, at 11:02 AM, Nick Terrell <terrelln@xxxxxx> wrote:
> > >
> > >
> > >
> > >> On Sep 28, 2020, at 1:55 AM, Feng Tang <feng.tang@xxxxxxxxx> wrote:
> > >>
> > >> Hi Nick,
> > >>
> > >> 0day has found kernel decompression failures since 5.9-rc1 (X86_32
> > >> builds), and they could be related to the ZSTD code, though initially we
> > >> bisected to some other commits.
> > >>
> > >> The error messages are:
> > >> Decompressing Linux...
> > >>
> > >> ZSTD-compressed data is corrupt
> > >>
> > >> This can be reproduced by compiling the kernel with the attached config
> > >> and booting it in QEMU.
> > >>
> > >> We suspect it is related to the kernel size, as we only see it on big
> > >> kernels. Some more info:
> > >>
> > >> * If we remove a lot of kernel config options to build a much smaller
> > >> kernel, it boots fine
> > >>
> > >> * If we change the zstd algorithm from zstd22 to zstd19, the kernel
> > >> boots fine with the below patch
> > >>
> > >> Please let me know if you need more info, and sorry for the late report,
> > >> as we have only just tracked it down to this point.
> > >
> > > Thanks for the report, I will look into it today.
> >
> > CC: Petr Malat
> >
> > I’ve successfully reproduced the failure and found the issue. It turns out
> > that this patch [0] from Petr Malat fixes it. As I mentioned in that thread,
> > his fix corresponds to this upstream commit [1].
>
> Glad to know there is already a fix.
>
> > Can we get Petr's patch merged into v5.9?
> >
> > This bug only happens when the window size is > 8 MB. A non-kernel workaround
> > would be to compress the kernel at level 19, which uses an 8 MB window size,
> > instead of level 22, which uses a 128 MB window size.
> >
> > The reason it only shows up for large kernels is that the code is only buggy
> > when an offset > 8 MB is used, so a kernel <= 8 MB can't trigger the bug.
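
To check which window size a given zstd stream declares (the "> 8 MB"
condition described above), the frame header can be read directly. This is
only a rough sketch: it assumes a standalone zstd file whose frame starts at
byte 0 (the payload embedded in a bzImage would first have to be located),
and the names zstd_window_size and exceeds_8mb_window are made up for
illustration. The field layout follows the zstd frame format (RFC 8878).

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian
EIGHT_MIB = 8 << 20


def zstd_window_size(data: bytes) -> int:
    """Return the window size in bytes declared by the first frame header."""
    if data[:4] != ZSTD_MAGIC:
        raise ValueError("data does not start with a zstd frame")
    if len(data) < 6:
        raise ValueError("truncated frame header")

    descriptor = data[4]
    fcs_flag = descriptor >> 6             # Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1  # Single_Segment_flag
    dict_id_flag = descriptor & 3           # Dictionary_ID_flag

    if not single_segment:
        # Window_Descriptor: 5-bit exponent, 3-bit mantissa.
        window_descriptor = data[5]
        exponent = window_descriptor >> 3
        mantissa = window_descriptor & 7
        base = 1 << (10 + exponent)
        return base + (base // 8) * mantissa

    # Single-segment frames have no Window_Descriptor; the window size
    # equals the Frame_Content_Size, which follows the optional Dictionary_ID.
    pos = 5 + (0, 1, 2, 4)[dict_id_flag]
    size = (1, 2, 4, 8)[fcs_flag]
    field = data[pos:pos + size]
    if len(field) < size:
        raise ValueError("truncated frame header")
    value = int.from_bytes(field, "little")
    # The 2-byte encoding stores the content size minus 256.
    return value + 256 if size == 2 else value


def exceeds_8mb_window(data: bytes) -> bool:
    """True if the declared window is larger than 8 MiB."""
    return zstd_window_size(data) > EIGHT_MIB
```

For example, exceeds_8mb_window(open(path, "rb").read(18)) on a
zstd-compressed file reports whether that stream could contain offsets
beyond 8 MB at all; 18 bytes covers the largest possible frame header.
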
> >
> > Best,
> > Nick
> >
> > [0] https://lkml.org/lkml/2020/9/14/94
>
> With this patch, all the failed cases on my side could boot fine.
>
> Tested-by: Feng Tang <feng.tang@xxxxxxxxx>
>

I applied this patch to check that it is OK on x86 64-bit as well: yes, it is.

Feel free to add my:

Tested-by: Sedat Dilek <sedat.dilek@xxxxxxxxx>

- Sedat -

> Thanks,
> Feng
>
> > [1] https://github.com/facebook/zstd/commit/8a5c0c98ae5a7884694589d7a69bc99011add94d
>
>