Re: [PATCH v4 00/14] arm64: Optimise and update memcpy, user copy and string routines

From: Will Deacon
Date: Mon Sep 07 2020 - 06:10:11 EST


Hi Oli,

Thanks for this. Just a few high-level comments below.

On Wed, Jul 01, 2020 at 09:12:49AM +0100, Oli Swede wrote:
> > Version 3 addressed this but I later found some issues with the fixup
> > correctness after further testing, and have partially re-written them
> > here, and addressed some other behaviours of the copy algorithm.

[...]

> I am waiting on access to the relevant machine before posting the benchmark
> results for this optimized memcpy, but Sam reported the following with the
> similar (but now slightly older) cortex-strings version:
> * copy_from_user: 13.17%
> * copy_to_user: 4.8%
> * memcpy: 27.88%
> * copy_in_user: Didn't appear in the test results.
> This machine will also be used to check the fixups are accurate on a system
> with UAO - they appear to be exact on a non-UAO system with PAN that I've
> been working on locally.

I'm inclined to say that cortex-strings is probably not a good basis for
our uaccess routines. The code needs to be adapted in a non-straightforward
way so that we lose pretty much all of the benefits we'd usually get from
adopting an existing implementation: we can't pull in fixes or improvements
without a lot of manual effort, we can't reuse existing testing infrastructure
(see below) and we end up being a "second-class" user of the routines
because of the discrepancies in implementation.

So why don't we use cortex-strings as a basis for the in-kernel routines
only, preferably in a form where the code can be used directly and updated
with a script (e.g. similar to how we pull in arch/arm64/crypto routines
from OpenSSL)? We can then roll our own uaccess routines, using a slightly
more straightforward implementation which is more amenable to handling
user faults and doesn't do things like over-copying.
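
To make that a bit more concrete, here is a rough user-space model of the
shape I have in mind. It is only a sketch: model_copy_from_user() and the
src_valid parameter (the offset of the first unreadable source byte, standing
in for a real fault) are made up for illustration, and the real thing would
obviously be assembly with exception table entries. The point is simply that
the copy moves strictly forwards, never touches a byte it does not go on to
report as copied, and so the return value falls out without any fixup
arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model, not kernel code: source bytes at offsets
 * >= src_valid are treated as unmapped, i.e. reading them "faults".
 * Returns the number of bytes NOT copied (0 on complete success),
 * following the usual uaccess convention.
 */
static size_t model_copy_from_user(unsigned char *dst,
				   const unsigned char *src,
				   size_t n, size_t src_valid)
{
	size_t done = 0;

	assert(dst != NULL && src != NULL);

	/* Bulk phase: 8-byte chunks, only while the whole chunk is readable. */
	while (n - done >= 8 && done + 8 <= src_valid) {
		memcpy(dst + done, src + done, 8);
		done += 8;
	}

	/* Tail phase: one byte at a time, stopping at the first bad byte. */
	while (done < n && done < src_valid) {
		dst[done] = src[done];
		done++;
	}

	/* Everything before 'done' was stored, nothing at or after it was. */
	return n - done;
}
```

With a 32-byte request and src_valid == 13, this stores exactly 13 bytes and
returns 19, and the destination is untouched from offset 13 onwards.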

> I should also mention that the correctness of these routines were tested
> using a selftest test module akin to lib/test_user_copy.c (whose usercopy
> functionality checks these patches do pass) but which is more specific to
> the fixup accuracy, in that it compares the return value with the true
> number of bytes remaining in the destination buffer at the point of a fault.

Can we put this test module into the kernel source tree, please, maybe as
part of lkdtm? Given the control flow of these optimised functions, I think
we absolutely need targeted testing to make sure we're getting complete
coverage.
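
For the avoidance of doubt, the kind of check I mean looks roughly like the
user-space sketch below. This is not your module and not lib/test_user_copy.c;
every name in it (copy_fn, fixup_is_exact(), the fault_at parameter that
simulates the faulting offset) is invented for illustration. It poisons the
destination, runs a copy that "faults" at a chosen offset, counts how many
bytes really landed, and compares that with the return value. The second copy
routine is deliberately sloppy so that the check has something to catch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN	64
#define POISON	0x00	/* never appears in the source pattern below */

/* Hypothetical signature: returns bytes not copied; faults at fault_at. */
typedef size_t (*copy_fn)(unsigned char *dst, const unsigned char *src,
			  size_t n, size_t fault_at);

/* Byte-wise copy whose return value is exact by construction. */
static size_t copy_bytewise(unsigned char *dst, const unsigned char *src,
			    size_t n, size_t fault_at)
{
	size_t i;

	for (i = 0; i < n && i < fault_at; i++)
		dst[i] = src[i];
	return n - i;
}

/*
 * Chunked copy with a sloppy fixup: on a fault in the middle of a chunk
 * it has already stored the readable part, but reports the remainder
 * from the start of that chunk.
 */
static size_t copy_chunked_sloppy(unsigned char *dst, const unsigned char *src,
				  size_t n, size_t fault_at)
{
	size_t done = 0;

	while (done < n) {
		size_t chunk = n - done < 8 ? n - done : 8;
		size_t ok = chunk;

		if (done + chunk > fault_at)
			ok = fault_at > done ? fault_at - done : 0;
		memcpy(dst + done, src + done, ok);
		if (ok < chunk)
			return n - done;	/* forgets the 'ok' bytes */
		done += chunk;
	}
	return 0;
}

/* Returns 1 if fn's return value matches what really reached dst. */
static int fixup_is_exact(copy_fn fn, size_t n, size_t fault_at)
{
	unsigned char src[BUF_LEN], dst[BUF_LEN];
	size_t i, written = 0, ret;

	assert(n <= BUF_LEN);
	for (i = 0; i < BUF_LEN; i++)
		src[i] = (unsigned char)(i % 251 + 1);	/* 1..251, no POISON */
	memset(dst, POISON, sizeof(dst));

	ret = fn(dst, src, n, fault_at);

	/* The un-poisoned prefix is what was actually written. */
	while (written < BUF_LEN && dst[written] != POISON)
		written++;
	if (written > n || memcmp(dst, src, written) != 0)
		return 0;
	return ret == n - written;
}
```

Sweeping fault_at over every offset in the buffer, for a range of sizes and
alignments, is what gives the coverage; in this model the sloppy routine only
gets caught when the fault lands part-way through a chunk.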

Will