On Mon, Apr 23, 2018 at 05:41:14PM -0700, subhra mazumdar wrote:
select_idle_core() can potentially search all cpus to find the fully idle
core even if there is one such core. Removing this is necessary to achieve
scalability in the fast path.
So this removes the whole core awareness from the wakeup path; this
needs far more justification.
The only justification I have is the benchmarks I ran almost all
In general running on pure cores is much faster than running on threads.
If you plot performance numbers there's almost always a fairly
significant drop in slope at the moment when we run out of cores and
start using threads.
Also, depending on cpu enumeration, your next patch might not even leave
the core scanning for idle CPUs.
Again this doesn't matter for the benchmarks I ran. Most are happy to make
Now, typically on Intel systems, we first enumerate cores and then
siblings, but I've seen Intel systems that don't do this and enumerate
all threads together. Also other architectures are known to iterate full
cores together, both s390 and Power for example do this.
So by only doing a linear scan on CPU number you will actually fill
cores instead of equally spreading across cores. Worse still, by
limiting the scan to _4_ you only barely even get onto a next core for
SMT4 hardware, never mind SMT8.
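To make the enumeration point concrete, here is a rough Python sketch (a toy model, not kernel code) that counts how many distinct cores a linear scan of 4 CPU numbers reaches under the two enumeration orders described above. The function names, the 8-core machine size and the wrap-around scan are all assumptions made for illustration; only the limit of 4 comes from the discussion.

```python
def core_first_layout(n_cores, smt):
    """CPU -> core map when cores are enumerated first, then siblings.

    CPU n and CPU n + n_cores are threads of the same core.
    """
    return [cpu % n_cores for cpu in range(n_cores * smt)]


def threads_together_layout(n_cores, smt):
    """CPU -> core map when all threads of a core get consecutive numbers."""
    return [cpu // smt for cpu in range(n_cores * smt)]


def cores_reached(cpu_to_core, start, limit):
    """Distinct cores seen by a linear scan of `limit` CPU numbers.

    The scan starts at CPU `start` and wraps around (an assumption of
    this toy model).
    """
    n_cpus = len(cpu_to_core)
    return {cpu_to_core[(start + i) % n_cpus] for i in range(limit)}


SCAN_LIMIT = 4   # the scan limit discussed above
N_CORES = 8      # hypothetical machine size

if __name__ == "__main__":
    for smt in (2, 4, 8):
        for name, layout in (("core-first", core_first_layout),
                             ("threads-together", threads_together_layout)):
            cpu_to_core = layout(N_CORES, smt)
            counts = [len(cores_reached(cpu_to_core, s, SCAN_LIMIT))
                      for s in range(len(cpu_to_core))]
            print(f"SMT{smt} {name}: scan reaches "
                  f"{min(counts)}..{max(counts)} cores")
```

In this model the core-first layout always lands on 4 different cores, while the threads-together layout stays within one or two cores, so the scan piles work onto siblings before it ever sees another core.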
Can we have a config or a way for enabling/disabling select_idle_core?
So while I'm not averse to limiting the empty core search; I do feel it
is important to have. Overloading cores when you don't have to is not