Re: [PATCH 09/10] x86-32: use SSE for atomic64_read/set if available
From: Luca Barbieri
Date: Thu Feb 18 2010 - 13:42:20 EST
> We already do that kind of stuff, using
> kernel_fpu_begin()..kernel_fpu_end(). We went through some pain a bit
> ago to clean up "private hacks" that complicated things substantially.
But that saves the whole FPU state on first use, and also triggers a
fault when userspace next attempts to use the FPU.
Additionally it does a clts/stts every time, which is slow for small
algorithms (like the atomic64 routines).
The first issue can be solved by using SSE and saving only the used
registers, and the second with lazy TS flag restoring.
How about something like:
static inline unsigned long kernel_sse_begin(void)
{
	struct thread_info *me = current_thread_info();

	preempt_disable();
	if (unlikely(!(me->status & TS_USEDFPU))) {
		unsigned long cr0 = read_cr0();

		if (unlikely(cr0 & X86_CR0_TS)) {
			clts();
			return cr0;
		}
	}
	return 0;
}

static inline void kernel_sse_end(unsigned long cr0)
{
	if (unlikely(cr0))
		write_cr0(cr0);
	preempt_enable();
}
to be improved with lazy TS restoring instead of the read_cr0/write_cr0?
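
To make the intended control flow easier to check, here is a rough userspace model of the begin/end pair above. It is only a sketch: struct model_cpu, MODEL_CR0_TS, model_sse_begin() and model_sse_end() are made-up names that stand in for CR0, X86_CR0_TS, TS_USEDFPU and the preempt count, and nothing in it touches real hardware state or kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for X86_CR0_TS (bit 3 of CR0). */
#define MODEL_CR0_TS 0x8UL

/* Hypothetical model of the per-CPU/per-thread state the sketch reads. */
struct model_cpu {
	unsigned long cr0;	/* modelled CR0 register */
	bool used_fpu;		/* modelled TS_USEDFPU status bit */
	int preempt_count;	/* modelled preemption depth */
	int cr0_writes;		/* how many times CR0 was written back */
};

/*
 * Models kernel_sse_begin(): disable preemption and, if the task does
 * not own the FPU and TS is set, clear TS and return the old CR0 so the
 * caller can restore it. Returns 0 when nothing needs restoring.
 */
static unsigned long model_sse_begin(struct model_cpu *cpu)
{
	cpu->preempt_count++;
	if (!cpu->used_fpu && (cpu->cr0 & MODEL_CR0_TS)) {
		unsigned long saved = cpu->cr0;

		cpu->cr0 &= ~MODEL_CR0_TS;	/* models clts() */
		return saved;
	}
	return 0;
}

/*
 * Models kernel_sse_end(): write CR0 back only if begin cleared TS,
 * then re-enable preemption.
 */
static void model_sse_end(struct model_cpu *cpu, unsigned long saved)
{
	assert(cpu->preempt_count > 0);
	if (saved) {
		cpu->cr0 = saved;	/* models write_cr0() */
		cpu->cr0_writes++;
	}
	cpu->preempt_count--;
}
```

In this model the CR0 write-back only happens on the path where TS was set and the task did not own the FPU; when the task already owns the FPU, begin returns 0 and end does no CR0 write at all. The remaining cost on the slow path is the read_cr0/write_cr0 pair, which is what lazy TS restoring would be meant to avoid.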