Re: [PATCH 1/2] cpu hotplug, add CONFIG_PERMANENT_CPU_TOPOLOGY and keep topology directory for lifetime of CPU

From: Prarit Bhargava
Date: Mon Oct 19 2015 - 18:26:42 EST




On 10/19/2015 11:04 AM, Prarit Bhargava wrote:
> The information in /sys/devices/system/cpu/cpuX/topology
> directory is useful for userspace monitoring applications and in-tree
> utilities like cpupower & turbostat.
>
> When down'ing a CPU the /sys/devices/system/cpu/cpuX/topology directory is
> removed during the CPU_DEAD hotplug callback in the kernel. The problem
> with this model is that the CPU has not been physically removed and the
> data in the topology directory is still valid.
>
> This patch adds CONFIG_PERMANENT_CPU_TOPOLOGY, and is Y by default for
> x86, an N for all other arches. When enabled the kernel is modified so
> that the topology directory is added to the core cpu sysfs files so that
> the topology directory exists for the lifetime of the CPU. When
> disabled, the behavior of the current kernel is maintained (that is, the
> topology directory is removed on a down and added on an up). Adding
> CONFIG_PERMANENT_CPU_TOPOLOGY may require additional architecture so that
> the cpumask data the CPU's topology is not cleared during a CPU down.
>
> This patch combines drivers/base/topology.c and drivers/base/cpu.c to
> implement CONFIG_PERMANENT_CPU_TOPOLOGY, and leaves all arches except
> x86 with the current behavior.
>
> Before patch:
>
> [root@prarit cpu143]# ls
> cache crash_notes firmware_node online thermal_throttle
> cpufreq crash_notes_size microcode power topology
> cpuidle driver node3 subsystem uevent
>
> Down a cpu
>
> [root@prarit cpu143]# echo 0 > online
>
> [root@intel-brickland-05 cpu143]# ls
> cpuidle crash_notes_size firmware_node online subsystem
> crash_notes driver node3 power uevent
> [root@intel-brickland-05 cpu143]# ls -l topology
> ls: cannot access topology: No such file or directory
>
> After patch:
>
> [root@prarit cpu143]# ls
> cache crash_notes firmware_node online thermal_throttle
> cpufreq crash_notes_size microcode power topology
> cpuidle driver node3 subsystem uevent
> [root@prarit cpu143]# cat topology/*
> 27
> ffff,c0000000,000000ff,ffc00000,00000000
> 54-71,126-143
> 3
> 8000,00000000,00000080,00000000,00000000
> 71,143
>
> Down a cpu
>
> [root@prarit cpu143]# echo 0 > online
> [root@intel-brickland-05 cpu143]# ls
> cpuidle crash_notes_size firmware_node online subsystem uevent
> crash_notes driver node3 power topology
> [root@prarit cpu143]# cat topology/*
> 27
> ffff,c0000000,000000ff,ffc00000,00000000
> 54-71,126-143
> 3
> 8000,00000000,00000080,00000000,00000000
> 71,143
>
> I did some light testing with and without BOOTPARAM_HOTPLUG_CPU0 enabled,
> and up'd and down'd CPUs in sequence, randomly, by thread group, by
> socket group and didn't see any issues.
>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Len Brown <len.brown@xxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Cc: Zhu Guihua <zhugh.fnst@xxxxxxxxxxxxxx>
> Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
> Cc: "Jan H. SchÃnherr" <jschoenh@xxxxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Cc: Prarit Bhargava <prarit@xxxxxxxxxx>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> Cc: Thomas Renninger <trenn@xxxxxxx>
> Signed-off-by: Prarit Bhargava <prarit@xxxxxxxxxx>
> ---
> arch/x86/kernel/smpboot.c | 28 --------
> drivers/base/Kconfig | 13 ++++
> drivers/base/Makefile | 2 +-
> drivers/base/cpu.c | 135 +++++++++++++++++++++++++++++++++++++++
> drivers/base/topology.c | 155 ---------------------------------------------
> 5 files changed, 149 insertions(+), 184 deletions(-)
> delete mode 100644 drivers/base/topology.c
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index e0c198e..19082c7 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1322,32 +1322,6 @@ __init void prefill_possible_map(void)
>
> #ifdef CONFIG_HOTPLUG_CPU
>
> -static void remove_siblinginfo(int cpu)
> -{
> - int sibling;
> - struct cpuinfo_x86 *c = &cpu_data(cpu);
> -
> - for_each_cpu(sibling, topology_core_cpumask(cpu)) {
> - cpumask_clear_cpu(cpu, topology_core_cpumask(sibling));
> - /*/
> - * last thread sibling in this cpu core going down
> - */
> - if (cpumask_weight(topology_sibling_cpumask(cpu)) == 1)
> - cpu_data(sibling).booted_cores--;
> - }
> -
> - for_each_cpu(sibling, topology_sibling_cpumask(cpu))
> - cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
> - for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
> - cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
> - cpumask_clear(cpu_llc_shared_mask(cpu));
> - cpumask_clear(topology_sibling_cpumask(cpu));
> - cpumask_clear(topology_core_cpumask(cpu));


It was just pointed out to me that the these masks tend to be used in
the kernel as "sibling & online mask", and "core & online_mask". The other
values are indeed static and are safe to use.

I'll "self-nak" this and come up with a slightly better patchset.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/