[PATCH v3 00/13] arm64: Optimise and update memcpy, user copy and string routines

From: Oliver Swede
Date: Thu May 14 2020 - 10:32:49 EST


Hi,

I have been working on fixes for this series. The initial version
of this patchset imported new optimizations from Linaro's
cortex-strings project, and converted the instructions in the
usercopy template to their macro equivalents, so that each of the
usercopy helper functions can use expansions that insert exception
table entries for exactly those instructions that could fault
during a copy.
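As a loose illustration of the template idea (in C, though the real
code is arm64 assembly, and all names below are mine, not the
kernel's): one shared copy body is built once per variant, with an
accessor macro redefined so that only the usercopy build pairs its
accesses with fixup bookkeeping.

    #include <stddef.h>
    #include <stdio.h>

    /* Stand-in for emitting an exception table entry next to an
     * instruction that is allowed to fault. */
    #define MARK_MAY_FAULT(what) printf("extable entry for a %s\n", (what))

    /* "memcpy" build: plain loads, no fixup bookkeeping. */
    #define LOAD1(p) (*(p))

    static void plain_copy(char *dst, const char *src, size_t n)
    {
        while (n--)
            *dst++ = LOAD1(src++);
    }

    #undef LOAD1
    /* "copy_from_user" build: the same body, but every load of user
     * memory is marked as a potential fault site. (In the real series
     * the body lives once, in copy_template.S, and is included by
     * each variant rather than duplicated.) */
    #define LOAD1(p) (MARK_MAY_FAULT("load"), *(p))

    static void user_copy(char *dst, const char *src, size_t n)
    {
        while (n--)
            *dst++ = LOAD1(src++);
    }

    int main(void)
    {
        char buf[4];
        plain_copy(buf, "ok!", 4);
        user_copy(buf, "ok!", 4);
        return 0;
    }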

The 'cortex-strings' repository has since been renamed to
'optimized-routines' (public on GitHub at
https://github.com/ARM-software/optimized-routines) and has been
updated with further optimizations to various library functions,
including a newer memcpy implementation.

The address of a page fault is exposed to the faulting instruction's
corresponding fixup code. In v2, a fixup routine matching the
previous copy algorithm used the fault address to efficiently
calculate the number of bytes that failed to copy, returning the
distance between the fault and the end of the buffer. However, this
fixup in [PATCH v2 2/14] had issues with the out-of-order nature of
the copy algorithm, and was flagged by the LTP test case for the
preadv2() syscall (which makes multiple calls to copy_to_user()):
preadv2() reported SUCCESS for an invalid (NULL) destination address
where EFAULT was expected, because the fault did not occur at the
start of the buffer and the fixup therefore calculated a nonzero
number of copied bytes.
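For concreteness, here is a user-space sketch (my own names, not the
kernel's) of the v2 arithmetic and why an out-of-order copy defeats
it:

    #include <stddef.h>
    #include <stdint.h>

    /* v2-style fixup: "bytes not copied" is the distance from the
     * faulting address to the end of the destination buffer. */
    static size_t v2_bytes_not_copied(uintptr_t dst, size_t count,
                                      uintptr_t fault)
    {
        return dst + count - fault;
    }

    /* An out-of-order copy may touch dst + 16 first, so for dst == 0
     * (NULL) the first fault yields count - 16: the caller wrongly
     * concludes 16 bytes were copied although none were. */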

In this version I have imported the very latest optimized memcpy
routine, and rewritten the fixup as multiple routines that
encapsulate various properties of the algorithm (this is explained
in more detail in patches 11/13, 12/13 and 13/13). The aim is to
return the exact number of bytes that have not been copied when a
fault occurs in copy_{to,in,from}_user(), and to make the fixups
modular so that they can be rewritten without too much trouble if
the copy algorithm is updated again.
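To sketch the shape of this (purely illustrative C; the names,
threshold and chunk size below are hypothetical, not those of the
actual routines): the preserved arguments plus the fault address
select one fixup per algorithmic regime, each of which knows enough
about that regime's access order to compute the exact remainder.

    #include <stddef.h>
    #include <stdint.h>

    struct copy_args {            /* preserved on the stack at entry */
        uintptr_t dst, src;
        size_t count;
    };

    /* Small copies are in-order in this sketch: everything before
     * the faulting address has been written. */
    static size_t fixup_small(const struct copy_args *a, uintptr_t fault)
    {
        return a->dst + a->count - fault;
    }

    /* Large copies may write out of order: credit only the 16-byte
     * chunks known to be complete before the faulting one. */
    static size_t fixup_large(const struct copy_args *a, uintptr_t fault)
    {
        size_t done = (size_t)((fault - a->dst) & ~(uintptr_t)15);
        return a->count - done;
    }

    static size_t bytes_not_copied(const struct copy_args *a,
                                   uintptr_t fault)
    {
        return a->count <= 128 ? fixup_small(a, fault)   /* hypothetical
                                                            threshold */
                               : fixup_large(a, fault);
    }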

Initial testing indicates that the backtracking performed in the
fixup routines is accurate. I am working on a separate patchset
containing more concise selftests that indirectly call the usercopy
functions via read()/write(), which will make it easier to verify
the expected behaviour; an example of the approach follows.
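This is my own sketch, not one of those selftests: read() into a
buffer whose tail page is unmapped, so copy_to_user() faults partway
through and the syscall's return value exposes the fixup's
arithmetic directly.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);

        /* Map two pages and unmap the second, so buf + page faults. */
        char *buf = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;
        munmap(buf + page, page);

        int fd = open("/dev/zero", O_RDONLY);

        /* 64 valid bytes, then a fault: an exact fixup makes this
         * return 64, not 128 and not an error. */
        ssize_t n = read(fd, buf + page - 64, 128);
        printf("read() returned %zd (expect 64)\n", n);

        close(fd);
        return 0;
    }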

I am going to post updated benchmark results, as the ~27% speed
increase Sam measured with the previous 'cortex-strings' memcpy no
longer applies now that it has been replaced with the more recent
version from 'optimized-routines', which should be even more
efficient and improve on this further.

v1: https://lore.kernel.org/linux-arm-kernel/cover.1571073960.git.robin.murphy@xxxxxxx/
v2: https://lore.kernel.org/linux-arm-kernel/cover.1571421836.git.robin.murphy@xxxxxxx/

Changes since v2:

* Adds Robin's separate patch fixing a compilation issue with the
kprobes fixup [1]
* Imports the most recent memcpy implementation by updating Sam's patch
(and moves this patch after the cortex-strings imports so that it sits
closer to the patches containing its corresponding fixups)
* Uses the stack to preserve the initial arguments to copy_*_user()
* Replaces the single usercopy fixup routine in v2 with multiple longer
fixups that each use the fault address to return the exact number of
bytes that have not yet been copied.

[1] https://lore.kernel.org/linux-arm-kernel/e70f7b9de7e601b9e4a6fedad8eaf64d304b1637.1571326276.git.robin.murphy@xxxxxxx/

Many thanks,
Oliver

Oliver Swede (4):
arm64: Store the arguments to copy_*_user on the stack
arm64: Use additional memcpy macros and fixups
arm64: Add fixup routines for usercopy load exceptions
arm64: Add fixup routines for usercopy store exceptions

Robin Murphy (2):
arm64: Tidy up _asm_extable_faultaddr usage
arm64: kprobes: Drop open-coded exception fixup

Sam Tebbs (7):
arm64: Allow passing fault address to fixup handlers
arm64: Import latest optimization of memcpy
arm64: Import latest version of Cortex Strings' memcmp
arm64: Import latest version of Cortex Strings' memmove
arm64: Import latest version of Cortex Strings' strcmp
arm64: Import latest version of Cortex Strings' strlen
arm64: Import latest version of Cortex Strings' strncmp

arch/arm64/include/asm/alternative.h | 36 ---
arch/arm64/include/asm/assembler.h | 13 +
arch/arm64/include/asm/extable.h | 10 +-
arch/arm64/kernel/probes/kprobes.c | 7 -
arch/arm64/lib/copy_from_user.S | 272 +++++++++++++++++--
arch/arm64/lib/copy_in_user.S | 287 ++++++++++++++++++--
arch/arm64/lib/copy_template.S | 377 +++++++++++++++------------
arch/arm64/lib/copy_template_user.S | 50 ++++
arch/arm64/lib/copy_to_user.S | 273 +++++++++++++++++--
arch/arm64/lib/copy_user_fixup.S | 277 ++++++++++++++++++++
arch/arm64/lib/memcmp.S | 333 +++++++++--------------
arch/arm64/lib/memcpy.S | 128 +++++++--
arch/arm64/lib/memmove.S | 232 ++++++-----------
arch/arm64/lib/strcmp.S | 272 ++++++++-----------
arch/arm64/lib/strlen.S | 247 ++++++++++++------
arch/arm64/lib/strncmp.S | 363 ++++++++++++--------------
arch/arm64/mm/extable.c | 13 +-
arch/arm64/mm/fault.c | 2 +-
18 files changed, 2073 insertions(+), 1119 deletions(-)
create mode 100644 arch/arm64/lib/copy_template_user.S
create mode 100644 arch/arm64/lib/copy_user_fixup.S

--
2.17.1