Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

From: Laurent Dufour
Date: Fri Mar 31 2023 - 11:11:42 EST


On 30/03/2023 18:19:38, Michal Suchánek wrote:
> On Thu, Mar 30, 2023 at 05:51:57PM +0200, Laurent Dufour wrote:
>> On 13/02/2023 16:40:50, Nathan Lynch wrote:
>>> Michal Suchánek <msuchanek@xxxxxxx> writes:
>>>> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
>>>>> Laurent Dufour <ldufour@xxxxxxxxxxxxx> writes:
>>>>>> When a new CPU is added, the kernel is activating all its threads. This
>>>>>> leads to weird, but functional, result when adding CPU on a SMT 4 system
>>>>>> for instance.
>>>>>>
>>>>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads
>>>>>> active (system has been booted with the 'smt-enabled=4' kernel option):
>>>>>>
>>>>>> ltcden3-lp12:~ # ppc64_cpu --info
>>>>>> Core 0: 0* 1* 2* 3* 4 5 6 7
>>>>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
>>>>>>
>>>>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR
>>>>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest.
>
>> Indeed, that's not so easy. There are multiple ways for the SMT level to be
>> impacted:
>> - smt-enabled kernel option
>> - smtstate systemctl service (if activated), saving SMT level at shutdown
>> time to restore it a boot time
>> - pseries-energyd daemon (if activated) could turn off threads
>> - ppc64_cpu --smt=x user command
>> - sysfs direct writing to turn off/on specific threads.
>>
>> There is no SMT level saved, on "disk" or in the kernel, and any of these
>> options can interact in parallel. So from the user space point of view, the
>> best we could do is looking for the SMT current values, there could be
>> multiple values in the case of a mixed SMT state, peek one value and apply it.
>>
>> Extending the drmgr's hook is still valid, and I sent a patch series on the
>> powerpc-utils mailing list to achieve that. However, changing the SMT level
>> in that hook means that newly added CPU will be first turn on and there is
>> a window where this threads could be seen active. Not a big deal but not
>> turning on these extra threads looks better to me.
>
> Which means
>
> 1) add an option to not onlince hotplugged CPUs by default

After discussing this with Srikar, it happens that exposing the smt-enabled
value set a boot time (or not) in SYS FS and to update it when SMT level is
changed using ppc64_cpu will be better. This will aslo allow the energy
daemon to take this value in account.

> 2) when a tool that wants to manage CPU onlining is active it can set
> the option so that no threads are onlined automatically, and online the
> desired threads

When new CPU are added, the kernel will automatically online the right
number of threads based on that recorded SMT level.

>
> 3) when no such tool is active the default should be to online all
> threeads to preserve compatibility with existing behavior

I don't think we should keep the existing behavior, customers are confused
and some user space tools like lparstart have difficulties to handled mixed
SMT levels.

I'll submit a new series exposing the SMT level and propose a patch for
ppc64_cpu too.

>
>> That's being said, I can't see any benefit of a user space implementation
>> compared to the option I'm proposing in that patch.
>
> The userspace implementation can implement arbitrily complex policy,
> that's not something that belongs into the kernel.
>
> Thanks
>
> Michal