Re: [PATCH v4 0/2] Improving topology_span_sane

From: K Prateek Nayak
Date: Thu Mar 06 2025 - 01:47:19 EST


Hello Steve,

On 3/4/2025 9:38 PM, Steve Wahl wrote:
toplogy_span_sane() has an O(N^2) algorithm that takes an inordinate
amount of time on systems with a large number of cpus.

The first patch in this series replaces the algorithm used with a O(N)
method that should exactly duplicate the previous code's results.

The second patch simplifies the first, taking a similar amount of time
to run, but potentially has different results than previous code under
situations believed to not truly exist, like a CPU not being included
in its own span.

I've tested Patch 1 individually and the whole series as is on top of
tip:sched/core and I haven't run into any issues with the optimization
on my 3rd Generation EPYC system.

Please feel free to include:

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

--
Thanks and Regards,
Prateek


Version 1:
* Original patch

Version 2:

* Adopted simplifications from K Prateek Nayak,and fixed use of
num_possible_cpus().

Version 3:

* Undid the simplifications from version 2 when noticed that results
could differ from original code; kept num_possible_cpus() fix.

Version 4:

* Turned the patch into a series of 2, the second re-introduces the
simplifications, and includes further simplification suggested by
Valentin Schneider in the discussion for Version 2.

Steve Wahl (2):
sched/topology: improve topology_span_sane speed
sched/topology: Refinement to topology_span_sane speedup

kernel/sched/topology.c | 73 +++++++++++++++++++++++++++--------------
1 file changed, 48 insertions(+), 25 deletions(-)