I think the cost of moving to another CPU depends heavily on the CPU type. On a P4+HT the caches are shared, so moving costs almost nothing in cache hits, while on CPUs with other cache layouts the migration cost is higher. Obviously multi-core should be cheaper than multi-socket, since it avoids the system memory bus, but it can still get ugly.

However, I fail to understand the goal of the reproducer. Granted it shows
irregularities in the scheduler under such conditions, but what *real*
workload would spend its time sequentially creating then immediately killing
threads, never using more than 2 at a time?
If this could be turned into a DoS, I could understand, but here it looks
a bit pointless :-/
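
For anyone who wants to picture the workload being discussed, here is a rough
sketch of the pattern (create a thread, let it finish, create the next one,
never more than 2 alive at once). It is not the original reproducer: the names
`churn` and `work_seconds` are made up for illustration, the caller simply
waits in `join()` rather than staying busy, and since CPython's GIL serializes
Python threads it only shows the shape of the workload, not the scheduler
behaviour itself.

```python
import threading
import time


def churn(iterations, work_seconds=0.0):
    """Create and join one short-lived worker thread at a time.

    Returns (threads_created, peak_live_threads), where the peak
    includes the calling thread. Hypothetical helper, for illustration.
    """
    lock = threading.Lock()
    live = 1  # the calling thread counts as one
    peak = 1

    def worker():
        nonlocal live, peak
        with lock:
            live += 1
            peak = max(peak, live)
        # Assumed stand-in for the brief work a short-lived thread does.
        deadline = time.perf_counter() + work_seconds
        while time.perf_counter() < deadline:
            pass
        with lock:
            live -= 1

    created = 0
    for _ in range(iterations):
        t = threading.Thread(target=worker)
        t.start()
        t.join()  # wait for it to exit before creating the next one
        created += 1
    return created, peak


if __name__ == "__main__":
    start = time.perf_counter()
    created, peak = churn(1000)
    elapsed = time.perf_counter() - start
    print(f"{created} threads, peak {peak} alive, {elapsed:.3f}s")
```

To actually observe migration cost, something equivalent would have to be
written with native threads and timed with and without CPU pinning.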

It seems generally unfortunate that it takes longer for a new thread to
move over to the second cpu even when the first is busy with the original
thread. I can certainly see cases where this causes suboptimal overall
system behaviour.