This cannot be stated categorically without precise measurements ofIt can be nice to code an absolute worst-case microbenchmark too.
known-good, known-bad, average FPU usage and average CPU usage scenarios. All
these workloads have different characteristics.
I can imagine bad effects across all sorts of workloads: tcpbench, AIM7,
various lmbench components, X benchmarks, tiobench - you name it. Combined
with the fact that most micro-benchmarks wont be using the FPU, while in the
long run most processes will be using the FPU due to SIMM instructions. So
even a positive result might be skewed in practice. Has to be measured
carefully IMO - and i havent seen a _single_ performance measurement in the
submission mail. This is really essential.
Task migration can actually be very important to the point of being
almost a fastpath in some workloads where threads are oversubscribed to
CPUs and blocking on some contented resource (IO or mutex or whatever).
I suspect the main issues in that case is the actual context switching
and contention, but it would be nice to see just how much slower it
could get.