Re: [PATCH 2/3] arm64: lib: improve copy performance when size is ge 128 bytes

From: Will Deacon
Date: Tue Mar 23 2021 - 09:33:14 EST


On Tue, Mar 23, 2021 at 12:08:56PM +0000, Robin Murphy wrote:
> On 2021-03-23 07:34, Yang Yingliang wrote:
> > When copying over 128 bytes, src/dst is updated after
> > each ldp/stp instruction, which costs more time.
> > To improve this, we only update src/dst after loading
> > or storing 64 bytes.
>
> This breaks the required behaviour for copy_*_user(), since the fault
> handler expects the base address to be up-to-date at all times. Say you're
> copying 128 bytes and fault on the 4th store, it should return 80 bytes not
> copied; the code below would return 128 bytes not copied, even though 48
> bytes have actually been written to the destination.
>
> We've had a couple of tries at updating this code (because the whole
> template is frankly a bit terrible, and a long way from the well-optimised
> code it was derived from), but getting the fault-handling behaviour right
> without making the handler itself ludicrously complex has proven tricky. And
> then it got bumped down the priority list while the uaccess behaviour in
> general was in flux - now that the dust has largely settled on that I should
> probably try to find time to pick this up again...
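
For anyone following along, the arithmetic in Robin's example can be
checked with a small user-space model. This is only a sketch under
assumptions that are not taken from the patch itself: each stp writes
16 bytes, the fixup reports "bytes not copied" as the total minus however
far dst has been advanced, and the helper names are made up for
illustration.

```c
#include <stddef.h>

#define PAIR_BYTES	16	/* assumed: one stp stores 2 x 8 bytes */
#define BLOCK_BYTES	64	/* the patch advances src/dst once per 64 bytes */

/* Bytes that reached the destination before the faulting store (1-based). */
static size_t bytes_written(size_t faulting_store)
{
	return (faulting_store - 1) * PAIR_BYTES;
}

/*
 * Hypothetical model of the existing behaviour: dst is advanced after
 * every stp, so the fixup sees exactly what has been written.
 */
static size_t not_copied_per_store(size_t total, size_t faulting_store)
{
	return total - bytes_written(faulting_store);
}

/*
 * Hypothetical model of the patched behaviour: dst is advanced only after
 * a full 64-byte block, so the fixup sees a stale base address.
 */
static size_t not_copied_per_block(size_t total, size_t faulting_store)
{
	size_t done = bytes_written(faulting_store);
	size_t dst_advance = (done / BLOCK_BYTES) * BLOCK_BYTES;

	return total - dst_advance;
}
```

With total = 128 and a fault on the 4th store, not_copied_per_store()
gives 80 and not_copied_per_block() gives 128, although 48 bytes have
already landed in the destination in both cases.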

I think the v5 from Oli was pretty close, but it didn't get any review:

https://lore.kernel.org/r/20200914151800.2270-1-oli.swede@xxxxxxx

He also included tests:

https://lore.kernel.org/r/20200916104636.19172-1-oli.swede@xxxxxxx

It would be great if you or somebody else has time to revive those!

Will