Re: [PATCH] RISC-V: Dynamically allocate cpumasks and further increase range and default value of NR_CPUS

From: Palmer Dabbelt
Date: Tue Sep 17 2024 - 10:23:28 EST


On Mon, 05 Aug 2024 01:58:54 PDT (-0700), liuyuntao12@xxxxxxxxxx wrote:

Gentle ping

I think we just need to see some results from real hardware, as QEMU isn't meaningful for benchmarks like this. My guess is we're not going to have an answer for a while: RISC-V really isn't anywhere close to having systems of this complexity yet. So for now I think we should just leave the defaults alone. If hardware shows up where it makes sense to start changing things, then we can take a look again.

On 2024/6/26 20:41, liuyuntao (F) wrote:

On 2024/6/25 19:44, liuyuntao (F) wrote:

On 2024/6/25 19:11, Andrew Jones wrote:

On Fri, Jun 14, 2024 at 07:53:06AM GMT, Yuntao Liu wrote:

Currently the default NR_CPUS is 64 for riscv64. Since the latest QEMU
virt machine supports up to 512 CPUs, set the default NR_CPUS to 512
for riscv64.

With the backing of RISC-V International and related chip
manufacturers, RISC-V has also begun to enter the server market, which
demands higher performance. Other major architectures (such as ARM64,
x86_64, MIPS, etc.) already support a higher range, so further increase
this range up to 4096 for riscv64.

Increasing NR_CPUS enlarges cpumasks, which raises the concern that
stack usage could grow significantly, especially in code that allocates
cpumasks on the stack. To address this, we can enable CPUMASK_OFFSTACK,
which prevents cpumasks from being allocated on the stack. We enable
this feature only when NR_CPUS is greater than 512. Why 512? Because
beyond that point the kernel image built with offstack is smaller.
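
For a sense of scale: struct cpumask is a bitmap of NR_CPUS bits stored in unsigned longs, so each on-stack mask costs roughly NR_CPUS/8 bytes. A minimal sketch of that arithmetic follows, assuming a 64-bit kernel (8-byte unsigned long); cpumask_bytes() is a made-up helper name for illustration, not a kernel function.

```python
# Sketch: size of one struct cpumask for a given NR_CPUS.
# Assumption: 64-bit kernel, so an unsigned long holds 64 bits.
# cpumask_bytes() is a hypothetical helper, not part of the kernel.

def cpumask_bytes(nr_cpus, bits_per_long=64):
    """Bytes occupied by a bitmap of nr_cpus bits, rounded up to whole longs."""
    longs = (nr_cpus + bits_per_long - 1) // bits_per_long  # mirrors BITS_TO_LONGS()
    return longs * (bits_per_long // 8)

for nr_cpus in (64, 512, 4096):
    print(f"NR_CPUS={nr_cpus:5d}: {cpumask_bytes(nr_cpus):4d} bytes per on-stack cpumask")
```

With CPUMASK_OFFSTACK, cpumask_var_t becomes a pointer and the bitmap is allocated separately (for example via alloc_cpumask_var()), so the per-variable stack cost drops to the size of a pointer.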

This isn't the reason why Arm decided to start at 512, afaict. The
reason for Arm was because hackbench did better with onstack for 256.
What are the hackbench results for riscv?

Okay, I will add the hackbench test results soon.

Benchmark results using hackbench, averaged over 5 runs of
./hackbench -s 512 -l 20 -g 10 -f 50 -P
on QEMU.

NR_CPUS      64      128     256     512     1024    2048
onstack/s    6.9992  6.6112  6.7834  6.6578  6.6646  6.8692
offstack/s   6.5616  6.95    6.5698  6.91    6.663   6.8202
difference   -6.25%  +5.12%  -3.15%  +3.79%  -0.02%  -0.71%

With larger NR_CPUS values the fluctuation is minimal, which suggests
that the performance gap shrinks as NR_CPUS grows.
Since I don't have a RISC-V single-board computer, these are the results
I obtained from testing in QEMU, which may differ from the actual
situation. Perhaps someone could help with the testing.
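
The difference row in the hackbench table can be recomputed as the relative change of offstack against onstack. A small sketch follows, assuming the two rows are elapsed seconds (as the "/s" suffix suggests), so a negative value means offstack finished faster; rel_diff_pct() is a name chosen here for illustration.

```python
# Sketch: recompute the "difference" row of the hackbench table.
# Assumption: both rows are elapsed times in seconds (lower is better).

NR_CPUS  = [64, 128, 256, 512, 1024, 2048]
onstack  = [6.9992, 6.6112, 6.7834, 6.6578, 6.6646, 6.8692]
offstack = [6.5616, 6.95, 6.5698, 6.91, 6.663, 6.8202]

def rel_diff_pct(base, new):
    """Relative change of new versus base, in percent, rounded to 2 places."""
    return round((new - base) / base * 100, 2)

diffs = [rel_diff_pct(on, off) for on, off in zip(onstack, offstack)]

for n, d in zip(NR_CPUS, diffs):
    print(f"NR_CPUS={n:5d}: {d:+.2f}%")
```

The sign flips from one column to the next for the smaller configurations, which is consistent with run-to-run noise rather than a clear trend.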

Thanks,
Yuntao

vmlinux size comparison (difference relative to the
vmlinux_onstack_NR_CPUS baseline):

NR_CPUS     256         512         1024        2048        4096
onstack     19814536    19840760    19880584    19969672    20141704
offstack    19819144    19840936    19880480    19968544    20135456
difference  +0.023%     +0.001%     -0.001%     -0.006%     -0.031%
is_smaller  n           n           y           y           y

Since the savings are almost nothing we must not have too many global
cpumasks. But I'm in favor of ensuring stack depths stay under control,
so turning on CPUMASK_OFFSTACK sounds good to me in general.
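
To put "almost nothing" into absolute terms, the vmlinux sizes quoted above can be turned into byte deltas. This is only a sketch over the numbers in the table; the dictionary layout and variable names are chosen here for illustration.

```python
# Sketch: absolute vmlinux size change (offstack minus onstack), in bytes,
# using the sizes quoted in the table above.

sizes = {  # NR_CPUS: (onstack_bytes, offstack_bytes)
    256:  (19814536, 19819144),
    512:  (19840760, 19840936),
    1024: (19880584, 19880480),
    2048: (19969672, 19968544),
    4096: (20141704, 20135456),
}

deltas = {n: off - on for n, (on, off) in sizes.items()}

for n, delta in deltas.items():
    verdict = "smaller" if delta < 0 else "larger"
    print(f"NR_CPUS={n:5d}: offstack is {abs(delta):5d} bytes {verdict}")
```

Even at NR_CPUS=4096 the change is a few kilobytes on an image of roughly 20 MB.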

Signed-off-by: Yuntao Liu <liuyuntao12@xxxxxxxxxx>
---
arch/riscv/Kconfig | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 0525ee2d63c7..5960713b3bf9 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -77,6 +77,7 @@ config RISCV
      select CLINT_TIMER if RISCV_M_MODE
      select CLONE_BACKWARDS
      select COMMON_CLK
+    select CPUMASK_OFFSTACK if NR_CPUS > 512
      select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
      select EDAC_SUPPORT
      select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
@@ -428,11 +429,11 @@ config SCHED_MC
config NR_CPUS
      int "Maximum number of CPUs (2-512)"
      depends on SMP
-    range 2 512 if !RISCV_SBI_V01
+    range 2 4096 if !RISCV_SBI_V01
      range 2 32 if RISCV_SBI_V01 && 32BIT
      range 2 64 if RISCV_SBI_V01 && 64BIT
      default "32" if 32BIT
-    default "64" if 64BIT
+    default "512" if 64BIT

This is somewhat reasonable, even if nothing is going to use this for
quite a while, since it'll help avoid bugs popping up when NR_CPUS gets
bumped later, but it feels excessive right now for riscv, so I'm a bit
on the fence about it. Maybe if hackbench doesn't show any issues we
could turn CPUMASK_OFFSTACK on for a smaller NR_CPUS and also select
a smaller default?

It seems that hackbench performs better when NR_CPUS is larger. Which
NR_CPUS value would you prefer?

Thanks,
drew

config HOTPLUG_CPU
      bool "Support for hot-pluggable CPUs"
--
2.34.1

_______________________________________________
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv
