> > Unless the performance advantage is provably very compelling, I'm
> > inclined to say that this is not worth it.
>
> There is the advantage of not taking the cacheline for writing in
> atomic64_read. Also, locked cmpxchg8b is slow and if we were to restore
> the TS flag lazily on userspace return, it would significantly improve
> the function in all cases (with the current code, it depends on how fast
> the architecture does clts/stts vs lock cmpxchg8b).
> Of course the big-picture impact depends on the users of the interface.

It does, and I would prefer to not take it until there is a user of the
interface which motivates the performance. Ingo, do you have a feel for
how performance-critical this actually is?
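
For anyone following along, here is a rough userspace sketch of why a cmpxchg8b-style atomic64_read takes the cacheline for writing. It is not the kernel code: read64_via_cas is a hypothetical helper name, and it assumes GCC or Clang with the __sync builtins (on 32-bit x86 this needs -march=i586 or later so that the 8-byte compare-and-swap is available). The read is done as a compare-and-swap whose expected and new values are the same, so memory never changes, but a locked read-modify-write still needs exclusive ownership of the line.

```c
#include <stdint.h>

/*
 * Hypothetical helper for illustration only, not the kernel's
 * atomic64_read(). It reads a 64-bit value atomically by issuing a
 * compare-and-swap with expected == new == 0:
 *   - if *p is 0, the CAS "succeeds" and stores 0 again (no visible change)
 *   - if *p is non-zero, the CAS fails and stores nothing new
 * In both cases the builtin returns the old contents of *p.
 * On 32-bit x86 compilers typically emit lock cmpxchg8b for this.
 */
static uint64_t read64_via_cas(uint64_t *p)
{
    return __sync_val_compare_and_swap(p, (uint64_t)0, (uint64_t)0);
}
```

A caller would simply do `uint64_t snapshot = read64_via_cas(&counter);`. The pointer cannot be const here, which is the same point made above: even a pure read goes through a locked write cycle.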