Re: [PATCH 2/2] lib/string.c: Optimize memchr()

From: Andy Shevchenko
Date: Tue Jul 12 2022 - 11:15:58 EST


On Tue, Jul 12, 2022 at 4:58 PM Yu-Jen Chang <arthurchang09@xxxxxxxxx> wrote:
> Andrey Semashev <andrey.semashev@xxxxxxxxx> 於 2022年7月11日 週一 晚上11:00寫道:
> > On 7/11/22 17:52, Yu-Jen Chang wrote:
> > > Andrey Semashev <andrey.semashev@xxxxxxxxx> 於 2022年7月11日 週一 凌晨4:01寫道:
> > >> On 7/10/22 17:28, Yu-Jen Chang wrote:

...

> > >>> + for (; p <= end - 8; p += 8) {
> > >>> + val = *(u64 *)p ^ mask;
> > >>
> > >> What if p is not aligned to 8 (or 4 on 32-bit targets) bytes? Not all
> > >> targets support (efficient) unaligned loads, do they?
> > >
> > > I think it works if p is not aligned to 8 or 4 bytes.
> > >
> > > Let's say the string is 10 bytes. The for loop here will search the first
> > > 8 bytes. If the target character is in the last 2 bytes, the second for
> > > loop will find it. It also work like this on 32-bit machine.
> >
> > I think you're missing the point. Loads at unaligned addresses may not
> > be allowed by hardware using conventional load instructions or may be
> > inefficient. Given that this memchr implementation is used as a fallback
> > when no hardware-specific version is available, you should be
> > conservative wrt. hardware capabilities and behavior. You should
> > probably have a pre-alignment loop.
>
> Got it. I add pre-alignment loop. It aligns the address to 8 or 4bytes.

Still far from what can be accepted. Have you had a chance to read how
strscpy() is implemented? Do you understand why it's done that way?

> void *memchr(const void *p, int c, size_t length)
> {
> u64 mask, val;
> const void *end = p + length;
> c &= 0xff;

> while ((long ) p & (sizeof(long) - 1)) {
> if (p >= end)
> return NULL;
> if (*(unsigned char *)p == c)
> return (void *) p;
> p++;
> }
> if (p <= end - 8) {
> mask = c;
> MEMCHR_MASK_GEN(mask);
>
> for (; p <= end - 8; p += 8) {

Why you decided that this code will be run explicitly on 64-bit arch?

> val = *(u64*)p ^ mask;
> if ((val + 0xfefefefefefefeffull)
> & (~val & 0x8080808080808080ull))
> break;
> }
> }
>
> for (; p < end; p++)
> if (*(unsigned char *)p == c)
> return (void *)p;
>
> return NULL;
> }

--
With Best Regards,
Andy Shevchenko