[PATCH] x86: atomic64_t should be 8 bytes aligned

From: Eric Dumazet
Date: Thu Jul 02 2009 - 18:09:18 EST


Linus Torvalds a écrit :
>
> On Thu, 2 Jul 2009, Eric Dumazet wrote:
>> Using a fixed initial value (instead of __atomic64_read()) is even faster,
>> it apparently permits cpu to use an appropriate bus transaction.
>
> Yeah, I guess it does a "read-for-write-ownership" and allows the thing to
> be done as a single cache transaction.
>
> If we read it first, it will first get the cacheline for shared-read, and
> then the cmpxchg8b will need to turn it from shared to exclusive.
>
> Of course, the _optimal_ situation would be if the cmpxchg8b didn't
> actually do the write at all when the value matches (and all cores could
> just keep it shared), but I guess that's not going to happen.
>
> Too bad there is no pure 8-byte read op. Using MMX has too many downsides.
>
> Btw, your numbers imply that for the atomic64_add_return(), we really
> would be much better off not reading the original value at all. Again, in
> that case, we really do want the "read-for-write-ownership" cache
> transaction, not a read.
>

I forgot to mention that if atomic64_t uses to cache lines, my test
program is 10x slower.

So we probably need an __attribute__((aligned(8)) on atomic64_t definition
as well, since alignof(atomic64_t) is 4, not 8 (!!!)

#include <stdio.h>

typedef struct {
unsigned long long counter;
} atomic64_t;

typedef struct {
unsigned long long __attribute__((aligned(8))) counter;
} atomic64_ta;

struct {
int a;
atomic64_t counter;
} s __attribute__((aligned(64)));

struct {
int a;
atomic64_ta counter;
} sa __attribute__((aligned(64)));

int main()
{
printf("alignof(atomic64_t)=%d\n", __alignof__(atomic64_t));
printf("alignof(atomic64_ta)=%d\n", __alignof__(atomic64_t));
printf("alignof(unsigned long long)=%d\n", __alignof__(unsigned long long));
printf("&s.counter=%p\n", &s.counter);
printf("&sa.counter=%p\n", &sa.counter);
return 0;
}

$ gcc -O2 -o test test.c ; ./test
alignof(atomic64_t)=4
alignof(atomic64_ta)=4
alignof(unsigned long long)=8
&s.counter=0x80496c4
&sa.counter=0x8049708
$ gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4.3.3/configure --enable-languages=c,c++ --prefix=/usr
Thread model: posix
gcc version 4.3.3 (GCC)


[PATCH] x86: atomic64_t should be 8 bytes aligned

LOCKED instructions on two cache lines are painful.
Make sure an atomic64_t is 8bytes aligned.

Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index 2503d4e..1927a56 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -250,7 +250,7 @@ static inline int atomic_add_unless(atomic_t *v, int a, int u)
/* An 64bit atomic type */

typedef struct {
- unsigned long long counter;
+ unsigned long long __attribute__((__aligned__(8))) counter;
} atomic64_t;

#define ATOMIC64_INIT(val) { (val) }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/