Maybe something like:
------
extern inline void * __memcpy_c(void * to, const void * from, size_t n)
{
	switch (n) {
	case 0:
		return to;
	/* maybe more small-size cases, like in include/asm-i386/string.h */
	}
	switch (n % 4) {
	case 0:  return __memcpy_by4(to, from, n);
	case 2:  return __memcpy_by2(to, from, n);
	default: return __memcpy_g(to, from, n);	/* n % 4 == 1 or 3 */
	}
}
#define __HAVE_ARCH_MEMCPY
#define memcpy(d,s,count) \
(__builtin_constant_p(count) ? \
 __memcpy_c((d),(s),(count)) : \
 __memcpy_g((d),(s),(count)))
------
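For anyone who wants to try the dispatch outside the kernel, here is a portable user-space sketch of the same idea. It assumes GCC or Clang (for __builtin_constant_p). The helpers copy_g, copy_by2, copy_by4, copy_c and the macro my_memcpy are hypothetical stand-ins for __memcpy_g, __memcpy_by2, __memcpy_by4, __memcpy_c and memcpy, written in plain C instead of i386 inline assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __memcpy_g: byte-at-a-time copy, any n. */
static inline void *copy_g(void *to, const void *from, size_t n)
{
	unsigned char *d = to;
	const unsigned char *s = from;

	while (n--)
		*d++ = *s++;
	return to;
}

/* Hypothetical stand-in for __memcpy_by2: caller guarantees n % 2 == 0. */
static inline void *copy_by2(void *to, const void *from, size_t n)
{
	unsigned char *d = to;
	const unsigned char *s = from;
	uint16_t w;

	for (size_t i = 0; i < n; i += 2) {
		memcpy(&w, s + i, 2);	/* fixed-size memcpy: alignment-safe */
		memcpy(d + i, &w, 2);
	}
	return to;
}

/* Hypothetical stand-in for __memcpy_by4: caller guarantees n % 4 == 0. */
static inline void *copy_by4(void *to, const void *from, size_t n)
{
	unsigned char *d = to;
	const unsigned char *s = from;
	uint32_t w;

	for (size_t i = 0; i < n; i += 4) {
		memcpy(&w, s + i, 4);	/* fixed-size memcpy: alignment-safe */
		memcpy(d + i, &w, 4);
	}
	return to;
}

/* Dispatch on the size remainder, mirroring __memcpy_c. */
static inline void *copy_c(void *to, const void *from, size_t n)
{
	switch (n % 4) {
	case 0:  return copy_by4(to, from, n);
	case 2:  return copy_by2(to, from, n);
	default: return copy_g(to, from, n);	/* n % 4 is 1 or 3 */
	}
}

/* Constant sizes take the specialised path, others the generic one. */
#define my_memcpy(d, s, count)			\
	(__builtin_constant_p(count) ?		\
	 copy_c((d), (s), (count)) :		\
	 copy_g((d), (s), (count)))
```

Both arms of the macro copy correctly, so the result of __builtin_constant_p only decides which code gets emitted, never what ends up in the destination.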
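... and similarly for memset. The remaining question is whether the compiler
removes the unneeded code, i.e. whether, for a constant count, it folds both
switch statements down to the single call that applies.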
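The memset counterpart could follow the same pattern. Below is a similarly hypothetical sketch: fill_g, fill_by4, fill_c and my_memset are illustrative names, not kernel functions. As for the optimization question, when count is a literal and the inline function is actually inlined, GCC normally propagates the constant into n % 4 and discards the unused arms; compiling with gcc -O2 -S and reading the assembly is a quick way to confirm this for a given compiler version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic fill: one byte per iteration, any n. */
static inline void *fill_g(void *s, int c, size_t n)
{
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;
}

/* Hypothetical fill for n % 4 == 0: replicate the byte into a 32-bit word. */
static inline void *fill_by4(void *s, int c, size_t n)
{
	unsigned char *p = s;
	uint32_t w = 0x01010101u * (unsigned char)c;	/* every byte == c */

	for (size_t i = 0; i < n; i += 4)
		memcpy(p + i, &w, 4);	/* fixed-size memcpy: alignment-safe */
	return s;
}

/* Constant-size dispatch, analogous to __memcpy_c. */
static inline void *fill_c(void *s, int c, size_t n)
{
	if (n % 4 == 0)
		return fill_by4(s, c, n);
	return fill_g(s, c, n);
}

#define my_memset(s, c, count)			\
	(__builtin_constant_p(count) ?		\
	 fill_c((s), (c), (count)) :		\
	 fill_g((s), (c), (count)))
```

Replicating the byte with a multiply by 0x01010101 gives the same in-memory pattern on little- and big-endian machines, so the word store needs no byte-order handling.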
>
> The rest looks ok, but I'll continue to consider string-i486 totally
> broken,
Werner