Re: [PATCH 0/2 v3] cpu hotplug: Preserve topology directory after soft remove event

From: Prarit Bhargava
Date: Mon Sep 26 2016 - 07:45:50 EST

On 09/22/2016 08:10 AM, Borislav Petkov wrote:
> On Thu, Sep 22, 2016 at 07:59:08AM -0400, Prarit Bhargava wrote:
>> The system (usually) boots with 2 threads/core. Some performance users want
>> one thread per core. Since there is no "noht" option anymore, users use /sys to
>> disable a thread on each core.
>
> I see.
>
>> core_siblings and thread_siblings are the online thread's sibling cores and
>> threads that are available to the scheduler
>
> Hmm, I see something else:
>
> <Documentation/cputopology.txt>:
> 7) /sys/devices/system/cpu/cpuX/topology/core_siblings:
>
> internal kernel map of cpuX's hardware threads within the same
> physical_package_id.

Thanks. I'll send a patch to update Documentation/cputopology.txt, which is
out of date.

>
>> and should be 0 when the thread is offline. That comes directly from
>> reading the code.
>
> But then code which reads those will have to *know* that those cores are
> offline - otherwise it would be confused by what it is reading there.

When a thread is offline, /sys/devices/system/cpu/cpuX/online is 0. The problem
is that when online is 0, the topology directory disappears, so there is no way
to determine _the location_ of the offlined thread.
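
As a rough illustration of what userspace sees, here is a minimal Python sketch
that probes one CPU's sysfs directory. The helper names (read_int, probe_cpu)
and the overridable root argument are assumptions made for this example; only
the online, core_id and physical_package_id file names come from the
discussion above.

```python
import os


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is missing."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


def probe_cpu(cpu, root="/sys/devices/system/cpu"):
    """Collect the online state and location of one CPU (hypothetical helper).

    Any value is None when the corresponding file does not exist, which is
    what happens to core_id and physical_package_id once the topology
    directory has been removed for an offlined thread.
    """
    base = os.path.join(root, f"cpu{cpu}")
    return {
        "online": read_int(os.path.join(base, "online")),
        "core_id": read_int(os.path.join(base, "topology", "core_id")),
        "physical_package_id": read_int(
            os.path.join(base, "topology", "physical_package_id")
        ),
    }
```

For an offlined thread this returns online == 0 with both location fields set
to None: the state is known, the location is not.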

>
> For example, the core siblings of an offlined core are still the same,
> they don't change. It is just the core that is offline.

Please see (in latest linux.git)

arch/x86/kernel/smpboot.c:1493 function remove_siblinginfo()

Specifically,

cpumask_clear(topology_sibling_cpumask(cpu));
cpumask_clear(topology_core_cpumask(cpu));
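
The effect of those two calls can be pictured with a toy Python model. This is
not the kernel code: the masks are plain dicts of sets, the function name is
made up for the example, and the step that drops the CPU from its siblings'
masks is an assumption about what the rest of remove_siblinginfo() does.

```python
def remove_sibling_info(cpu, thread_siblings, core_siblings):
    """Toy model of offlining `cpu`; each mask maps a CPU number to a set."""
    for mask in (thread_siblings, core_siblings):
        # Assumed step: the other CPUs stop listing the offlined CPU.
        for sibling in mask[cpu]:
            if sibling != cpu:
                mask[sibling].discard(cpu)
        # Counterpart of cpumask_clear(): the offlined CPU's own mask is emptied.
        mask[cpu].clear()


# One package, two cores, two threads per core (CPUs 0/2 and 1/3 are siblings).
thread_siblings = {0: {0, 2}, 1: {1, 3}, 2: {0, 2}, 3: {1, 3}}
core_siblings = {c: {0, 1, 2, 3} for c in range(4)}

remove_sibling_info(2, thread_siblings, core_siblings)
```

After the call, CPU 2's masks are both empty, so they no longer say which core
or package it belongs to.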

>
>> See commit 20102ac5bee3 ("cpupower: cpupower monitor reports uninitialized
>> values for offline cpus"). That patch papers over the bug of not being able to
>> find core_id and physical_package_id for an offline thread.
>
> Right, and this is *exactly* the *right* thing to do - tools should
> handle the case gracefully when cores are offline.

cpupower should still print out all asterisks for down'd threads. It does not,
because the topology directory is incorrectly removed.
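
To make the expected behaviour concrete, here is a small Python sketch of how
a monitoring tool might format one row per CPU. The column layout and the
format_row name are hypothetical and do not reproduce cpupower's actual
output; the only point is that an offline thread can still be placed in the
table if its package and core are known.

```python
def format_row(cpu, online, package_id, core_id, values):
    """Format one monitor row (hypothetical layout, not cpupower's)."""
    # Without the topology directory the location columns cannot be filled in.
    pkg = "?" if package_id is None else str(package_id)
    core = "?" if core_id is None else str(core_id)
    if online:
        cols = [f"{v:.2f}" for v in values]
    else:
        # Offline threads have no counters, so print asterisks instead.
        cols = ["****" for _ in values]
    return " | ".join([f"{pkg:>3}", f"{core:>4}", f"{cpu:>3}"] + cols)
```

With the location preserved, an offline thread yields a row with its package
and core filled in and asterisks for the counters; without it, the location
columns degrade to "?".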

IOW how does userspace know the _location_ of the thread? The topology
directory no longer exists when the thread is downed, so core_id and
physical_package_id (both of which would be effectively static) do not exist.
The whole point of this patchset is to know where the offline'd thread actually is.

P.