Hello,
I will try to take a look at this on Friday.
However, even if I manage to reproduce it on one of
the systems I have access to, I'm still not sure how
exactly we would root cause the issue.
Is it due to select_idle_sibling() doing a little bit
more work?
Is it because we invoke test_idle_cores() a little
earlier, widening the race window with CPUs going idle,
causing select_idle_cpu() to do a lot more work?
Is it a locality thing where random placement on any
core in the LLC is somehow better than placement on
the same core as "prev" when there is no idle core?
Is it tbench running faster when the woken-up task is
placed on the runqueue behind the current task on the
"target" CPU, even though that CPU isn't currently
idle, because tbench happens to go to sleep fast?
In other words, I'm not quite sure whether this is
specific to tbench (and similar benchmarks), or a
kernel thing, or what instrumentation we would want
in select_idle_sibling() / select_idle_cpu() for us
to root cause issues like this more easily in the
future...