Re: [PATCH 0/2] Skip offline cores when enabling SMT on PowerPC

From: Nysal Jan K.A.
Date: Thu Jun 13 2024 - 23:55:17 EST


On Thu, Jun 13, 2024 at 09:34:10PM GMT, Michael Ellerman wrote:
> "Nysal Jan K.A." <nysal@xxxxxxxxxxxxx> writes:
> > From: "Nysal Jan K.A" <nysal@xxxxxxxxxxxxx>
> >
> > After the addition of HOTPLUG_SMT support for PowerPC [1] there was a
> > regression reported [2] when enabling SMT.
>
> This implies it was a kernel regression. But it can't be a kernel
> regression because previously there was no support at all for the sysfs
> interface on powerpc.
>
> IIUC the regression was in the ppc64_cpu userspace tool, which switched
> to using the new kernel interface without taking into account the way it
> behaves.
>
> Or are you saying the kernel behaviour changed on x86 after the powerpc
> HOTPLUG_SMT was added?
>

The regression is in ppc64_cpu, not in the kernel. To restore the older
behaviour, though, we will need this or an equivalent change in the kernel;
fixing it efficiently in userspace might be difficult.

> > On a system with at least
> > one offline core, when enabling SMT, the expectation is that no CPUs
> > of offline cores are made online.
> >
> > On a POWER9 system with 4 cores in SMT4 mode:
> > $ ppc64_cpu --info
> > Core 0: 0* 1* 2* 3*
> > Core 1: 4* 5* 6* 7*
> > Core 2: 8* 9* 10* 11*
> > Core 3: 12* 13* 14* 15*
> >
> > Turn only one core on:
> > $ ppc64_cpu --cores-on=1
> > $ ppc64_cpu --info
> > Core 0: 0* 1* 2* 3*
> > Core 1: 4 5 6 7
> > Core 2: 8 9 10 11
> > Core 3: 12 13 14 15
> >
> > Change the SMT level to 2:
> > $ ppc64_cpu --smt=2
> > $ ppc64_cpu --info
> > Core 0: 0* 1* 2 3
> > Core 1: 4 5 6 7
> > Core 2: 8 9 10 11
> > Core 3: 12 13 14 15
> >
> > As expected, only two CPUs of core 0 are online.
> >
> > Change the SMT level to 4:
> > $ ppc64_cpu --smt=4
> > $ ppc64_cpu --info
> > Core 0: 0* 1* 2* 3*
> > Core 1: 4* 5* 6* 7*
> > Core 2: 8* 9* 10* 11*
> > Core 3: 12* 13* 14* 15*
> >
> > The CPUs of offline cores are made online. If a core is offline then
> > enabling SMT should not online CPUs of this core.
>
> That's the way the ppc64_cpu tool behaves, but it's not necessarily what
> other arches want.
>

True, but from a user's perspective it seems logical. I think one can make
a case for either behaviour.
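
To make the two behaviours concrete, here is a rough userspace toy model of the
4-core SMT4 example from the cover letter. It is only a sketch and not the
kernel code: the array, the helper names (core_is_online(), set_cores_on(),
smt_lower(), smt_raise()) and the skip_offline_cores flag are all made up for
illustration, and it assumes that lowering the SMT level only offlines threads
while raising it only onlines them.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CORES         4   /* POWER9 example from the cover letter */
#define THREADS_PER_CORE 4   /* SMT4 */
#define NR_CPUS          (NR_CORES * THREADS_PER_CORE)

/* Toy state: cpu_online[n] is true when CPU n is online. */
static bool cpu_online[NR_CPUS];

/* Hypothetical helper: a core is "online" if any of its threads is. */
static bool core_is_online(int core)
{
	assert(core >= 0 && core < NR_CORES);
	for (int t = 0; t < THREADS_PER_CORE; t++)
		if (cpu_online[core * THREADS_PER_CORE + t])
			return true;
	return false;
}

/* Model of "ppc64_cpu --cores-on=n": first n cores fully on, rest off. */
static void set_cores_on(int n)
{
	assert(n >= 0 && n <= NR_CORES);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_online[cpu] = (cpu / THREADS_PER_CORE) < n;
}

/* Lowering the SMT level: offline every thread at index >= level. */
static void smt_lower(int level)
{
	assert(level >= 1 && level <= THREADS_PER_CORE);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu % THREADS_PER_CORE >= level)
			cpu_online[cpu] = false;
}

/*
 * Raising the SMT level: online every thread at index < level.
 * With skip_offline_cores set, cores with no online thread are left
 * alone, which is the behaviour this series proposes.
 */
static void smt_raise(int level, bool skip_offline_cores)
{
	assert(level >= 1 && level <= THREADS_PER_CORE);
	for (int core = 0; core < NR_CORES; core++) {
		/* Decide before touching any thread of this core. */
		if (skip_offline_cores && !core_is_online(core))
			continue;
		for (int t = 0; t < level; t++)
			cpu_online[core * THREADS_PER_CORE + t] = true;
	}
}

/* Number of CPUs currently online in the toy model. */
static int count_online(void)
{
	int n = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		n += cpu_online[cpu];
	return n;
}
```

Starting from set_cores_on(1) followed by smt_lower(2), calling
smt_raise(4, false) brings all 16 CPUs online in this model, while
smt_raise(4, true) leaves only the 4 CPUs of core 0 online.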

> > An arch specific
> > function topology_is_core_online() is proposed to address this.
> > Another approach is to check the topology_sibling_cpumask() for any
> > online siblings. This avoids the need for an arch specific function
> > but is less efficient and more importantly this introduces a change
> > in existing behaviour on other architectures.
>
> It's only x86 and powerpc right?
>
> Having different behaviour on the only two arches that support the
> interface does not seem like a good result.
>

Agreed. I was originally thinking of sending out a patch changing this for both
architectures, but was unsure whether there might be users who now depend on
this behaviour on x86.
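
For reference, the sibling-mask alternative from the cover letter boils down to
an intersection test between a CPU's sibling mask and the online mask. The
sketch below models that with plain 64-bit masks in userspace. It does not use
the kernel cpumask API; toy_cpumask_t, toy_sibling_mask() and
toy_core_has_online_sibling() are hypothetical names, and it assumes siblings
are numbered contiguously, as in the POWER9 example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 16

/* Toy stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t toy_cpumask_t;

/*
 * Hypothetical helper: mask of all threads sharing a core with 'cpu',
 * assuming the threads of a core are numbered contiguously.
 */
static toy_cpumask_t toy_sibling_mask(unsigned int cpu,
				      unsigned int threads_per_core)
{
	assert(cpu < MODEL_NR_CPUS);
	assert(threads_per_core >= 1 && threads_per_core <= MODEL_NR_CPUS);

	unsigned int first = cpu - (cpu % threads_per_core);
	toy_cpumask_t threads = (UINT64_C(1) << threads_per_core) - 1;

	return threads << first;
}

/* A core counts as online if any sibling of 'cpu' is in the online mask. */
static bool toy_core_has_online_sibling(toy_cpumask_t online,
					unsigned int cpu,
					unsigned int threads_per_core)
{
	return (toy_sibling_mask(cpu, threads_per_core) & online) != 0;
}
```

With only core 0 online (online mask 0xF, SMT4), the test is true for CPU 2 and
false for CPU 5, so a generic check of this kind would also skip offline cores
without needing an arch hook.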

> > What is the expected behaviour on x86 when enabling SMT and certain cores
> > are offline?
>
> AFAIK no one really touches SMT on x86 other than to turn it off for
> security reasons.
>
> cheers
>

Thanks for your comments. It would be good to hear whether changing this
behaviour for both x86 and PowerPC might be an acceptable path forward.

Regards
--Nysal

> > [1] https://lore.kernel.org/lkml/20230705145143.40545-1-ldufour@xxxxxxxxxxxxx/
> > [2] https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ
> >
> > Nysal Jan K.A (2):
> > cpu/SMT: Enable SMT only if a core is online
> > powerpc/topology: Check if a core is online
> >
> > arch/powerpc/include/asm/topology.h | 13 +++++++++++++
> > kernel/cpu.c | 12 +++++++++++-
> > 2 files changed, 24 insertions(+), 1 deletion(-)
> >
> >
> > base-commit: c760b3725e52403dc1b28644fb09c47a83cacea6
> > --
> > 2.35.3