Re: raid6 badness

From: H. Peter Anvin
Date: Sat Jan 31 2004 - 00:55:47 EST


Andi Kleen wrote:
"H. Peter Anvin" <hpa@xxxxxxxxx> writes:

I don't know what would cause the stack to be misaligned, however.

x86-64 kernel doesn't guarantee the stack to be 16 byte aligned
(although it usually is). If you need 16 byte alignment you have to align yourself.


OK, that's unfortunate... per our discussion I really think this is a bug, since the compiler still does 16-byte alignment, and thus we're taking the cost without the benefit.

I'll send in the attached patch for now, but at some point I'd like to fix this. Unfortunately I still don't have an x86-64 machine that I can actually compile and install kernels on; I only have access to an x86-64 userspace, so I'm a bit limited in what I can test.

Michael: Perhaps you could apply this patch and test it out for me?

-hpa ===================================================================
RCS file: /home/hpa/kernel/bkcvs/linux-2.5/drivers/md/raid6x86.h,v
retrieving revision 1.3
diff -u -r1.3 raid6x86.h
--- linux-2.5/drivers/md/raid6x86.h 22 Jan 2004 16:15:09 -0000 1.3
+++ linux-2.5/drivers/md/raid6x86.h 31 Jan 2004 05:41:49 -0000
@@ -32,18 +32,20 @@
/* N.B.: For SSE we only save %xmm0-%xmm7 even for x86-64, since
the code doesn't know about the additional x86-64 registers */
typedef struct {
- unsigned int sarea[8*4];
- unsigned int cr0;
+ unsigned int sarea[8*4+2];
+ unsigned long cr0;
} raid6_sse_save_t __attribute__((aligned(16)));

/* This is for x86-64-specific code which uses all 16 XMM registers */
typedef struct {
- unsigned int sarea[16*4];
+ unsigned int sarea[16*4+2];
unsigned long cr0;
} raid6_sse16_save_t __attribute__((aligned(16)));

-/* On x86-64 the stack is 16-byte aligned */
-#define SAREA(x) (x->sarea)
+/* On x86-64 the stack *SHOULD* be 16-byte aligned, but currently this
+ is buggy in the kernel and it's only 8-byte aligned in places, so
+ we need to do this anyway. Sigh. */
+#define SAREA(x) ((unsigned int *)((((unsigned long)&(x)->sarea)+15) & ~15))

#else /* __i386__ */

@@ -60,6 +62,7 @@
unsigned long cr0;
} raid6_sse_save_t;

+/* Find the 16-byte aligned save area */
#define SAREA(x) ((unsigned int *)((((unsigned long)&(x)->sarea)+15) & ~15))

#endif