Re: [PATCH 09/10] x86-32: use SSE for atomic64_read/set if available

From: H. Peter Anvin
Date: Wed Feb 17 2010 - 19:47:45 EST


On 02/17/2010 04:41 PM, Luca Barbieri wrote:
>> I'm a bit unhappy about this patch. It seems to violate the assumption
>> that we only ever use the FPU state guarded by
>> kernel_fpu_begin()..kernel_fpu_end() and instead it uses a local hack,
>> which seems like a recipe for all kinds of very subtle problems down the
>> line.
>
> kernel_fpu_begin saves the whole FPU state, but to use SSE we don't
> really need that, since we can just save the %xmm registers we need,
> which is much faster.
> This is why SSE is used instead of just using an FPU double read.
> We could however add a kernel_sse_begin_nosave/kernel_sse_end_nosave to do this.
>

We could, and that would definitely be better than open-coding the operation.

>> Unless the performance advantage is provably very compelling, I'm
>> inclined to say that this is not worth it.
> There is the advantage of not taking the cacheline for writing in atomic64_read.
> Also locked cmpxchg8b is slow and if we were to restore the TS flag
> lazily on userspace return, it would significantly improve the
> function in all cases (with the current code, it depends on how fast
> the architecture does clts/stts vs lock cmpxchg8b).
> Of course the big-picture impact depends on the users of the interface.

It does, and I would prefer to not take it until there is a user of the
interface which motivates the performance. Ingo, do you have a feel for
how performance-critical this actually is?
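
For readers following the thread, the two read strategies being compared
can be sketched roughly as below. This is a user-space illustration only,
not the code from the patch: the names read64_cmpxchg and read64_sse are
hypothetical, the GCC/Clang __sync builtin stands in for a hand-written
lock cmpxchg8b, and the SSE2 intrinsics stand in for an inline movq. A
kernel version would additionally have to preserve the %xmm register it
clobbers and deal with the TS flag, which is exactly the part under
discussion above. Note also that C gives no formal guarantee that the
compiler emits a single 64-bit load for the intrinsic.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadl_epi64, _mm_storel_epi64 */
#endif

/*
 * Read a 64-bit value with a locked compare-and-swap.
 * Comparing against 0 and swapping in 0 leaves memory unchanged in
 * either case, and the builtin returns the value that was found.
 * The cost: the CPU must take the cacheline for writing.
 */
static uint64_t read64_cmpxchg(uint64_t *p)
{
    return __sync_val_compare_and_swap(p, (uint64_t)0, (uint64_t)0);
}

/*
 * Read a 64-bit value through an SSE register (assumes p is 8-byte
 * aligned). Only the low 64 bits of the register are loaded, so the
 * cacheline is only needed for reading.
 */
static uint64_t read64_sse(const uint64_t *p)
{
#if defined(__SSE2__)
    uint64_t out;
    __m128i v = _mm_loadl_epi64((const __m128i *)p); /* typically movq */
    _mm_storel_epi64((__m128i *)&out, v);
    return out;
#else
    /* Assumption: without SSE2, fall back to a plain atomic load. */
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
#endif
}
```

On a 32-bit build the first function needs a target with cmpxchg8b
(for example -march=i586 or later); on x86-64 both compile as-is.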

-hpa
