Re: [PATCH 09/10] x86-32: use SSE for atomic64_read/set if available
From: Luca Barbieri
Date: Wed Feb 17 2010 - 19:41:22 EST
> I'm a bit unhappy about this patch. It seems to violate the assumption
> that we only ever use the FPU state guarded by
> kernel_fpu_begin()..kernel_fpu_end() and instead it uses a local hack,
> which seems like a recipe for all kinds of very subtle problems down the
> line.
kernel_fpu_begin() saves the whole FPU state, but to use SSE we don't
really need that: we can save just the %xmm registers we actually use,
which is much faster.
That is why the patch uses SSE instead of a plain FPU double read.
We could, however, add a kernel_sse_begin_nosave()/kernel_sse_end_nosave()
pair to do this.
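
To illustrate the idea outside the kernel, here is a rough userspace
sketch of a 64-bit read/set done through one %xmm register. The names
read64_sse() and set64_sse() are made up for this example and are not
the patch's code. The in-kernel version would additionally have to
clear CR0.TS and save/restore the one %xmm register it clobbers, which
cannot be shown in userspace. Note also that intrinsics leave the
instruction choice to the compiler; real code would use inline asm to
force a single movq.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical userspace analogue of an SSE-based 64-bit read.
 * An aligned 8-byte SSE load is a single memory access on x86.
 * In the kernel this would be bracketed by clts/stts and a
 * save/restore of the clobbered %xmm register (not shown here).
 */
static uint64_t read64_sse(const uint64_t *p)
{
    assert(((uintptr_t)p & 7) == 0);    /* must be 8-byte aligned */
#if defined(__SSE2__)
    uint64_t out;
    __m128i v = _mm_loadl_epi64((const __m128i *)p);  /* load low 64 bits */
    _mm_storel_epi64((__m128i *)&out, v);             /* copy to a local */
    return out;
#else
    /* Assumption: fall back to the compiler builtin without SSE2. */
    return __atomic_load_n(p, __ATOMIC_RELAXED);
#endif
}

/* Hypothetical counterpart for the store side. */
static void set64_sse(uint64_t *p, uint64_t val)
{
    assert(((uintptr_t)p & 7) == 0);    /* must be 8-byte aligned */
#if defined(__SSE2__)
    __m128i v = _mm_loadl_epi64((const __m128i *)&val);
    _mm_storel_epi64((__m128i *)p, v);  /* one 8-byte store */
#else
    __atomic_store_n(p, val, __ATOMIC_RELAXED);
#endif
}
```

Neither function issues a locked instruction, so a reader does not
need the cacheline in exclusive state.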
> Unless the performance advantage is provably very compelling, I'm
> inclined to say that this is not worth it.
There is the advantage of not taking the cacheline for writing in
atomic64_read().
Also, locked cmpxchg8b is slow, and if we were to restore the TS flag
lazily on return to userspace, the SSE version would be significantly
faster in all cases (with the current code, it depends on how fast
the architecture executes clts/stts compared with lock cmpxchg8b).
Of course the big-picture impact depends on the users of the interface.
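
For comparison, this is roughly what a compare-and-swap based read
looks like, written with C11 atomics instead of the kernel's inline
asm. read64_cmpxchg() is a made-up name for this sketch. On x86 a
locked cmpxchg always performs a write cycle on the destination, even
when the comparison fails, which is why this kind of read pulls the
cacheline in for writing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a cmpxchg-style 64-bit read.
 * We try to swap 0 for 0: if *p is 0 nothing visibly changes,
 * otherwise the exchange fails and 'expected' receives the
 * current value of *p. Either way we end up with the value.
 */
static uint64_t read64_cmpxchg(_Atomic uint64_t *p)
{
    uint64_t expected = 0;

    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}
```

The value stored in *p is never changed by this function, but the
hardware still treats the access as a write.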
Anyway, feel free to ignore this patch for now (and the next one as well).