[RFC PATCH v2 0/4] nolibc x86-64 string functions

From: Ammar Faizi
Date: Sat Sep 02 2023 - 01:55:52 EST


Hi Willy,

This is an RFC patchset v2 for nolibc x86-64 string functions.

Changes in v2:
- Shrink the memset code size:
- Use pushq %rdi / popq %rax (Alviro).
- Use xchg %eax, %esi (Willy).
- Drop the memcmp patch (need more pondering).
- Fix the broken memmove implementation (David).

There are 4 patches in this series:

## Patch 1-2: Use `rep movsb`, `rep stosb` for:
- memcpy() and memmove()
- memset()
respectively. They can simplify the generated ASM code.

Patch 3 and 4 are not related, just a small cleanup.

## Patch 3: Remove the `_nolibc_memcpy_down()` function
This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today the memmove() has its own
implementation.

## Patch 4: Remove the `_nolibc_memcpy_up()` function
This function is only called by memcpy(), there is no real reason to
have this wrapper. Delete this function and move the code to memcpy()
directly.

Before this series:
```
0000000000401058 <memmove>:
401058: 48 89 f8 movq %rdi,%rax
40105b: 31 c9 xorl %ecx,%ecx
40105d: 48 39 f7 cmpq %rsi,%rdi
401060: 48 83 d1 ff adcq $0xffffffffffffffff,%rcx
401064: 48 85 d2 testq %rdx,%rdx
401067: 74 25 je 40108e <memmove+0x36>
401069: 48 83 c9 01 orq $0x1,%rcx
40106d: 48 39 f0 cmpq %rsi,%rax
401070: 48 c7 c7 ff ff ff ff movq $0xffffffffffffffff,%rdi
401077: 48 0f 43 fa cmovaeq %rdx,%rdi
40107b: 48 01 cf addq %rcx,%rdi
40107e: 44 8a 04 3e movb (%rsi,%rdi,1),%r8b
401082: 44 88 04 38 movb %r8b,(%rax,%rdi,1)
401086: 48 01 cf addq %rcx,%rdi
401089: 48 ff ca decq %rdx
40108c: 75 f0 jne 40107e <memmove+0x26>
40108e: c3 retq

000000000040108f <memcpy>:
40108f: 48 89 f8 movq %rdi,%rax
401092: 48 85 d2 testq %rdx,%rdx
401095: 74 12 je 4010a9 <memcpy+0x1a>
401097: 31 c9 xorl %ecx,%ecx
401099: 40 8a 3c 0e movb (%rsi,%rcx,1),%dil
40109d: 40 88 3c 08 movb %dil,(%rax,%rcx,1)
4010a1: 48 ff c1 incq %rcx
4010a4: 48 39 ca cmpq %rcx,%rdx
4010a7: 75 f0 jne 401099 <memcpy+0xa>
4010a9: c3 retq

00000000004010aa <memset>:
4010aa: 48 89 f8 movq %rdi,%rax
4010ad: 48 85 d2 testq %rdx,%rdx
4010b0: 74 0e je 4010c0 <memset+0x16>
4010b2: 31 c9 xorl %ecx,%ecx
4010b4: 40 88 34 08 movb %sil,(%rax,%rcx,1)
4010b8: 48 ff c1 incq %rcx
4010bb: 48 39 ca cmpq %rcx,%rdx
4010be: 75 f4 jne 4010b4 <memset+0xa>
4010c0: c3 retq
```

After this series:
```
0000000000401040 <memmove>:
401040: 48 89 d1 movq %rdx,%rcx
401043: 48 89 fa movq %rdi,%rdx
401046: 48 89 f8 movq %rdi,%rax
401049: 48 29 f2 subq %rsi,%rdx
40104c: 48 39 ca cmpq %rcx,%rdx
40104f: 73 0f jae 401060 <memmove+0x20>
401051: 48 8d 7c 0f ff leaq -0x1(%rdi,%rcx,1),%rdi
401056: 48 8d 74 0e ff leaq -0x1(%rsi,%rcx,1),%rsi
40105b: fd std
40105c: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
40105e: fc cld
40105f: c3 retq
401060: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
401062: c3 retq

0000000000401063 <memcpy>:
401063: 48 89 f8 movq %rdi,%rax
401066: 48 89 d1 movq %rdx,%rcx
401069: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
40106b: c3 retq

000000000040106c <memset>:
40106c: 96 xchgl %eax,%esi
40106d: 48 89 d1 movq %rdx,%rcx
401070: 57 pushq %rdi
401071: f3 aa rep stosb %al,%es:(%rdi)
401073: 58 popq %rax
401074: c3 retq
```

Signed-off-by: Ammar Faizi <ammarfaizi2@xxxxxxxxxxx>
---

Ammar Faizi (4):
tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
tools/nolibc: x86-64: Use `rep stosb` for `memset()`
tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function

tools/include/nolibc/arch-x86_64.h | 48 ++++++++++++++++++++++++++++++
tools/include/nolibc/string.h | 36 ++++++++--------------
2 files changed, 61 insertions(+), 23 deletions(-)


base-commit: 3c9b7c4a228bf8cca2f92abb65575cdd54065302
--
Ammar Faizi