Re: [PATCH] LoongArch: vDSO: Tune the chacha20 implementation
From: Jason A. Donenfeld
Date: Fri Sep 20 2024 - 11:11:21 EST
On Thu, Sep 19, 2024 at 05:13:59PM +0800, Xi Ruoyao wrote:
> As Christophe pointed out, tuning the chacha20 implementation by
> scheduling the instructions the way GCC does can improve the
> performance.
>
> The tuning does not introduce too much complexity (basically it's just
> reordering some instructions). And the tuning does not hurt readability
> too much: actually the tuned code looks even more similar to a
> textbook-style implementation based on 128-bit vectors. So overall it's
> a good trade-off to me.
>
> Tested with vdso_test_getchacha and benched with vdso_test_getrandom.
> On an LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64,
> which have a lower issue rate.
>
> Suggested-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
> Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@xxxxxxxxxx/
> Signed-off-by: Xi Ruoyao <xry111@xxxxxxxxxxx>

That seems like a reasonable optimization to me. I'll queue it up in
random.git and send it in my pull next week.
Thanks.
Jason
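The patch description says the tuned code looks like "a textbook-style implementation based on 128-bit vectors". As a rough illustration of that formulation only (this is not the LoongArch vDSO assembly and not code from the patch), here is a portable C sketch of the RFC 8439 ChaCha20 block function in which each row of the 4x4 state stands in for one 128-bit vector register. All names (row_t, vadd, vxor_rotl, vshuf, vquarter, chacha20_block) are hypothetical, and the instruction scheduling the patch performs is not modeled here.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t x, int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Hypothetical: one "row" stands in for a 128-bit vector of four lanes. */
typedef struct { uint32_t w[4]; } row_t;

/* Lane-wise add; one vector instruction in a SIMD implementation. */
static inline void vadd(row_t *x, const row_t *y)
{
	for (int i = 0; i < 4; i++)
		x->w[i] += y->w[i];
}

/* Lane-wise x = rotl(x ^ y, n); typically a vector xor plus a rotate. */
static inline void vxor_rotl(row_t *x, const row_t *y, int n)
{
	for (int i = 0; i < 4; i++)
		x->w[i] = rotl32(x->w[i] ^ y->w[i], n);
}

/* Rotate the lanes of a row left by k positions (a lane shuffle). */
static inline void vshuf(row_t *x, int k)
{
	row_t t = *x;
	for (int i = 0; i < 4; i++)
		x->w[i] = t.w[(i + k) & 3];
}

/* Four ChaCha quarter-rounds at once, one per lane. */
static void vquarter(row_t *a, row_t *b, row_t *c, row_t *d)
{
	vadd(a, b); vxor_rotl(d, a, 16);
	vadd(c, d); vxor_rotl(b, c, 12);
	vadd(a, b); vxor_rotl(d, a, 8);
	vadd(c, d); vxor_rotl(b, c, 7);
}

/* ChaCha20 block function: 10 double rounds, then add the input state. */
void chacha20_block(uint32_t out[16], const uint32_t in[16])
{
	row_t r[4];

	for (int i = 0; i < 16; i++)
		r[i / 4].w[i % 4] = in[i];

	for (int i = 0; i < 10; i++) {
		/* Column round: lane i works on words i, 4+i, 8+i, 12+i. */
		vquarter(&r[0], &r[1], &r[2], &r[3]);
		/* Shuffle rows so each lane now holds one diagonal. */
		vshuf(&r[1], 1); vshuf(&r[2], 2); vshuf(&r[3], 3);
		/* Diagonal round. */
		vquarter(&r[0], &r[1], &r[2], &r[3]);
		/* Undo the shuffle to restore row order. */
		vshuf(&r[1], 3); vshuf(&r[2], 2); vshuf(&r[3], 1);
	}

	for (int i = 0; i < 16; i++)
		out[i] = r[i / 4].w[i % 4] + in[i];
}
```

The sketch can be checked against the block-function test vector in RFC 8439 section 2.3.2. In a real vector implementation, the two vquarter calls per double round are the long dependency chains, and independent work around them is what instruction scheduling can interleave.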