Re: [PATCH v3 1/3] riscv: optimized memcpy
From: Matteo Croce
Date: Tue Jun 22 2021 - 19:36:39 EST
On Tue, Jun 22, 2021 at 2:15 AM Nick Kossifidis <mick@xxxxxxxxxxxx> wrote:
>
> Hello Matteo and thanks for the patch,
>
> On 2021-06-17 18:27, Matteo Croce wrote:
> >
> > @@ -0,0 +1,91 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * String functions optimized for hardware which doesn't
> > + * handle unaligned memory accesses efficiently.
> > + *
> > + * Copyright (C) 2021 Matteo Croce
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/module.h>
> > +
> > +/* Minimum size for a word copy to be convenient */
> > +#define MIN_THRESHOLD (BITS_PER_LONG / 8 * 2)
> > +
> > +/* convenience union to avoid cast between different pointer types */
> > +union types {
> > + u8 *u8;
>
> You are using a type as a name; I'd go with as_bytes/as_ulong/as_uptr,
> which makes it easier for the reader to understand what you are trying
> to do.
>
Makes sense
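Something like this then, as an untested sketch with the suggested names
(I'd also add the missing const to the ulong member of const_types):

    union types {
            u8 *as_bytes;
            unsigned long *as_ulong;
            uintptr_t as_uptr;
    };

    union const_types {
            const u8 *as_bytes;
            const unsigned long *as_ulong;
    };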
> > + unsigned long *ulong;
> > + uintptr_t uptr;
> > +};
> > +
> > +union const_types {
> > + const u8 *u8;
> > + unsigned long *ulong;
> > +};
> > +
>
> I suggest you define those unions inside the function body, no one else
> is using them.
>
They will be used in memset() in patch 3/3.
> > +void *__memcpy(void *dest, const void *src, size_t count)
> > +{
> > + const int bytes_long = BITS_PER_LONG / 8;
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > + const int mask = bytes_long - 1;
> > + const int distance = (src - dest) & mask;
>
> Why not unsigned ints ?
>
Ok.
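For the record, the change would just be:

    const unsigned int mask = bytes_long - 1;
    const unsigned int distance = (src - dest) & mask;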
> > +#endif
> > + union const_types s = { .u8 = src };
> > + union types d = { .u8 = dest };
> > +
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>
> If you want to be compliant with memcpy, you should check for overlapping
> regions here, since "The memory areas must not overlap", and do nothing
> about it, because according to POSIX this leads to undefined behavior.
> That's why recent libc implementations use memmove in any case (memcpy
> is an alias to memmove), which is the suggested approach.
>
Mmm, which memcpy arch implementation does this check?
I guess that no one is currently doing it.
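For completeness, if we really wanted to handle it, a fallback at the top
of __memcpy could look more or less like this (untested sketch, assuming
memmove is available here):

    /* the areas overlap iff |dest - src| < count; unsigned
     * wraparound makes these two checks cover both orderings
     */
    if ((uintptr_t)dest - (uintptr_t)src < count ||
        (uintptr_t)src - (uintptr_t)dest < count)
            return memmove(dest, src, count);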
> > + if (count < MIN_THRESHOLD)
> > + goto copy_remainder;
> > +
> > + /* copy a byte at time until destination is aligned */
> > + for (; count && d.uptr & mask; count--)
> > + *d.u8++ = *s.u8++;
> > +
>
> You should check for !IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) here.
>
I thought that only Little Endian RISC-V machines were supported in Linux.
Should I add a BUILD_BUG_ON()?
Anyway, if this is going in generic lib/, I'll take care of the endianness.
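If it ends up in generic lib/, the endian-dependent part is only the
shift direction in the merge below; an untested sketch of what that
would look like:

    /* on Big Endian the bytes at the lowest address are the most
     * significant ones, so the two shifts swap direction
     */
    if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
            d.ulong[0] = last << (distance * 8) |
                         next >> ((bytes_long - distance) * 8);
    else
            d.ulong[0] = last >> (distance * 8) |
                         next << ((bytes_long - distance) * 8);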
> > + if (distance) {
> > + unsigned long last, next;
> > +
> > + /* move s backward to the previous alignment boundary */
> > + s.u8 -= distance;
>
> It'd help here to explain that since s is distance bytes ahead relative
> to d, and d reached the alignment boundary above, s is now aligned but
> the data needs to be shifted to compensate for distance, in order to do
> a word-by-word copy.
>
> > +
> > + /* 32/64 bit wide copy from s to d.
> > + * d is aligned now but s is not, so read s alignment wise,
> > + * and do proper shift to get the right value.
> > + * Works only on Little Endian machines.
> > + */
>
> This comment is misleading because s is aligned, or else s.ulong[0]/[1]
> below would result in an unaligned access.
>
Yes, those two comments should be rephrased, merged and put above.
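Something along these lines, as a draft:

    /*
     * s is distance bytes ahead of d, and d just reached the alignment
     * boundary. Move s backward to the previous boundary so that it is
     * aligned too, then read two consecutive words from s and shift/merge
     * them to rebuild each unaligned value to store at d.
     * Works only on Little Endian machines.
     */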
> > + for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) {
> > + last = next;
> > + next = s.ulong[1];
> > +
> > + d.ulong[0] = last >> (distance * 8) |
> > + next << ((bytes_long - distance) * 8);
> > +
> > + d.ulong++;
> > + s.ulong++;
> > + }
> > +
> > + /* restore s with the original offset */
> > + s.u8 += distance;
> > + } else
> > +#endif
> > + {
> > + /* if the source and dest lower bits are the same, do a simple
> > + * 32/64 bit wide copy.
> > + */
>
> A while() loop would make more sense here.
>
Ok.
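That is, same behavior, just restructured:

    while (count >= bytes_long) {
            *d.ulong++ = *s.ulong++;
            count -= bytes_long;
    }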
> > + for (; count >= bytes_long; count -= bytes_long)
> > + *d.ulong++ = *s.ulong++;
> > + }
> > +
> > + /* suppress warning when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y */
> > + goto copy_remainder;
> > +
> > +copy_remainder:
> > + while (count--)
> > + *d.u8++ = *s.u8++;
> > +
> > + return dest;
> > +}
> > +EXPORT_SYMBOL(__memcpy);
> > +
> > +void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
> > +EXPORT_SYMBOL(memcpy);
Regards,
--
per aspera ad upstream