[RFC PATCH v3 0/4] nolibc x86-64 string functions

From: Ammar Faizi
Date: Sat Sep 02 2023 - 09:35:22 EST


Hi Willy,

This is an RFC patchset v3 for nolibc x86-64 string functions.

There are 4 patches in this series:

## Patch 1-2: Use `rep movsb`, `rep stosb` for:
- memcpy() and memmove()
- memset()
respectively. They can simplify the generated ASM code.

Patch 3 and 4 are not related, just a small cleanup.

## Patch 3: Remove the `_nolibc_memcpy_down()` function
This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today the memmove() has its own
implementation.

## Patch 4: Remove the `_nolibc_memcpy_up()` function
This function is only called by memcpy(), there is no real reason to
have this wrapper. Delete this function and move the code to memcpy()
directly.

Before this series:
```
00000000004013aa <memmove>:
4013aa: f3 0f 1e fa endbr64
4013ae: 48 39 f7 cmpq %rsi,%rdi
4013b1: 48 c7 c1 ff ff ff ff movq $0xffffffffffffffff,%rcx
4013b8: 48 89 f8 movq %rdi,%rax
4013bb: 48 0f 43 ca cmovaeq %rdx,%rcx
4013bf: 48 19 ff sbbq %rdi,%rdi
4013c2: 83 e7 02 andl $0x2,%edi
4013c5: 48 ff cf decq %rdi
4013c8: 48 85 d2 testq %rdx,%rdx
4013cb: 74 10 je 4013dd <memmove+0x33>
4013cd: 48 01 f9 addq %rdi,%rcx
4013d0: 48 ff ca decq %rdx
4013d3: 44 8a 04 0e movb (%rsi,%rcx,1),%r8b
4013d7: 44 88 04 08 movb %r8b,(%rax,%rcx,1)
4013db: eb eb jmp 4013c8 <memmove+0x1e>
4013dd: c3 retq

00000000004013de <memcpy>:
4013de: f3 0f 1e fa endbr64
4013e2: 48 89 f8 movq %rdi,%rax
4013e5: 31 c9 xorl %ecx,%ecx
4013e7: 48 39 ca cmpq %rcx,%rdx
4013ea: 74 0d je 4013f9 <memcpy+0x1b>
4013ec: 40 8a 3c 0e movb (%rsi,%rcx,1),%dil
4013f0: 40 88 3c 08 movb %dil,(%rax,%rcx,1)
4013f4: 48 ff c1 incq %rcx
4013f7: eb ee jmp 4013e7 <memcpy+0x9>
4013f9: c3 retq

00000000004013fa <memset>:
4013fa: f3 0f 1e fa endbr64
4013fe: 48 89 f8 movq %rdi,%rax
401401: 31 c9 xorl %ecx,%ecx
401403: 48 39 ca cmpq %rcx,%rdx
401406: 74 09 je 401411 <memset+0x17>
401408: 40 88 34 08 movb %sil,(%rax,%rcx,1)
40140c: 48 ff c1 incq %rcx
40140f: eb f2 jmp 401403 <memset+0x9>
401411: c3 retq
```

After this series:
```
// `memmove` is an alias for `memcpy`
000000000040149c <memcpy>:
40149c: 48 89 d1 movq %rdx,%rcx
40149f: 48 89 f8 movq %rdi,%rax
4014a2: 48 89 fa movq %rdi,%rdx
4014a5: 48 29 f2 subq %rsi,%rdx
4014a8: 48 39 ca cmpq %rcx,%rdx
4014ab: 72 03 jb 4014b0 <memcpy+0x14>
4014ad: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
4014af: c3 retq
4014b0: 48 8d 7c 0f ff leaq -0x1(%rdi,%rcx,1),%rdi
4014b5: 48 8d 74 0e ff leaq -0x1(%rsi,%rcx,1),%rsi
4014ba: fd std
4014bb: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
4014bd: fc cld
4014be: c3 retq

00000000004014bf <memset>:
4014bf: 96 xchgl %eax,%esi
4014c0: 48 89 d1 movq %rdx,%rcx
4014c3: 57 pushq %rdi
4014c4: f3 aa rep stosb %al,%es:(%rdi)
4014c6: 58 popq %rax
4014c7: c3 retq
```

## Changelog

Changes in v3:
- Make memmove as an alias for memcpy (Willy).
- Make the forward copy the likely case (Alviro).

Changes in v2:
- Shrink the memset code size:
- Use pushq %rdi / popq %rax (Alviro).
- Use xchg %eax, %esi (Willy).
- Drop the memcmp patch (need more pondering).
- Fix the broken memmove implementation (David).

Signed-off-by: Ammar Faizi <ammarfaizi2@xxxxxxxxxxx>
---

Ammar Faizi (4):
tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
tools/nolibc: x86-64: Use `rep stosb` for `memset()`
tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function

tools/include/nolibc/arch-x86_64.h | 42 ++++++++++++++++++++++++++++++
tools/include/nolibc/string.h | 36 +++++++++----------------
2 files changed, 55 insertions(+), 23 deletions(-)


base-commit: 3c9b7c4a228bf8cca2f92abb65575cdd54065302
--
Ammar Faizi