Re: [PATCH v2 2/3] lib: zstd: Don't inline functions in zstd_opt.c
From: Geert Uytterhoeven
Date: Thu Nov 18 2021 - 02:53:19 EST
Hi Nick,
On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <nickrterrell@xxxxxxxxx> wrote:
> `zstd_opt.c` contains the match finder for the highest compression
> levels. These levels are already very slow, and are unlikely to be used
> in the kernel. If they are used, they shouldn't be used in latency
> sensitive workloads, so slowing them down shouldn't be a big deal.
>
> This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0].
> I've also opened an issue upstream [1] so that we can properly tackle
> the code size issue in `zstd_opt.c` for all users, and can hopefully
> remove this hack in the next zstd version we import.
>
> Bloat-o-meter output on x86-64:
>
> ```
> > ../scripts/bloat-o-meter vmlinux.old vmlinux
> add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266)
> Function old new delta
> ZSTD_compressBlock_opt_generic.constprop - 7559 +7559
> ZSTD_insertBtAndGetAllMatches - 6304 +6304
> ZSTD_insertBt1 - 1731 +1731
> ZSTD_storeSeq - 693 +693
> ZSTD_BtGetAllMatches - 255 +255
> ZSTD_updateRep - 128 +128
> ZSTD_updateTree 96 99 +3
> ZSTD_insertAndFindFirstIndexHash3 81 - -81
> ZSTD_setBasePrices.constprop 98 - -98
> ZSTD_litLengthPrice.constprop 138 - -138
> ZSTD_count 362 181 -181
> ZSTD_count_2segments 1407 938 -469
> ZSTD_insertBt1.constprop 2689 - -2689
> ZSTD_compressBlock_btultra2 19990 423 -19567
> ZSTD_compressBlock_btultra 19633 15 -19618
> ZSTD_initStats_ultra 19825 - -19825
> ZSTD_compressBlock_btopt 20374 12 -20362
> ZSTD_compressBlock_btopt_extDict 29984 12 -29972
> ZSTD_compressBlock_btultra_extDict 30718 15 -30703
> ZSTD_compressBlock_btopt_dictMatchState 32689 12 -32677
> ZSTD_compressBlock_btultra_dictMatchState 33574 15 -33559
> Total: Before=6611828, After=6418562, chg -2.92%
> ```
>
> [0] https://lkml.org/lkml/2021/11/14/189
> [1] https://github.com/facebook/zstd/issues/2862
>
> Reported-by: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> Signed-off-by: Nick Terrell <terrelln@xxxxxx>
Thanks for your patch!
Impact on lib/zstd/zstd_compress.ko for atari_defconfig:
add/remove: 5/4 grow/shrink: 1/9 up/down: 15392/-167214 (-151822)
Nice!
Tested-by: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> --- a/lib/zstd/compress/zstd_opt.c
> +++ b/lib/zstd/compress/zstd_opt.c
> @@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_
> */
> U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock;
> ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot);
> - }
> + }
> ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes);
> }
> ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);
This change is unrelated. With that removed:
Reviewed-by: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds