Re: [patch 1/13] Qsort
From: Willy Tarreau
Date: Sun Jan 23 2005 - 01:01:21 EST
Hi,
On Sun, Jan 23, 2005 at 03:03:32AM +0100, Felipe Alfaro Solana wrote:
> On 22 Jan 2005, at 22:00, vlobanov wrote:
> >#define SWAP(a, b, size) \
> > do { \
> > register size_t __size = (size); \
> > register char * __a = (a), * __b = (b); \
> > do { \
> > *__a ^= *__b; \
> > *__b ^= *__a; \
> > *__a ^= *__b; \
> > __a++; \
> > __b++; \
> > } while ((--__size) > 0); \
> > } while (0)
> >
> >What do you think? :)
>
> AFAIK, XOR is quite expensive on IA32 when compared to simple MOV
> operatings. Also, since the original patch uses 3 MOVs to perform the
> swapping, and your version uses 3 XOR operations, I don't see any
> gains.
It will even be worse because we are accessing memory, and most architectures
will not be able to use a memory reference for both operands of the XOR.
Basically, what will be generated will look like this :
tmp = *b
*a ^= tmp
tmp ^= *a
*b = tmp
*a ^= tmp
which is 5 cycles, or 4 if the two last instructions get merged. And there's
3 memory reads + 3 memory writes (assuming that the CPU will be smart enough
to reuse *a without accessing memory at instruction 3).
The move is quite faster :
tmp1 = *a
tmp2 = *b
*a = tmp2
*b = tmp1
This is 4 cycles on simple CPUs, or even 2 cycles on most of todays CPUs
which can do the first two fetches at once, and the last two writes at once.
And there are only two reads and two writes.
Clearly this one is better.
Regards,
Willy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/