Re: [PATCH] powerpc: Drop lockdep_assert_cpus_held call from arch_update_cpu_topology

From: Thiago Jung Bauermann
Date: Thu Oct 05 2017 - 14:26:16 EST



Hello Thomas,

Thanks for your comments.

Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:

> On Wed, 4 Oct 2017, Thiago Jung Bauermann wrote:
>
>> It turns out that not all paths calling arch_update_cpu_topology hold
>> cpu_hotplug_lock, but that's ok because those paths aren't supposed to race
>> with any concurrent hotplug events.
>>
>> Callers of arch_update_cpu_topology are expected to know what they are
>> doing when they call the function without holding the lock, so remove the
>> lockdep warning.
>
> "Expected to know what they are doing" is not really a good approach as
> it's way too simple to screw things up.

I agree.

> You might consider to have two functions where one does the check and the
> other does not, but I leave that to the PPC maintainers.

It doesn't look like powerpc uses arch_update_cpu_topology differently
than other arches. Are you saying that all callers of the function
should be holding cpu_hotplug_lock? I came to the conclusion that this
isn't the case because of two things:

1. This comment in sched_init_smp:

/*
* There's no userspace yet to cause hotplug operations; hence all the
* CPU masks are stable and all blatant races in the below code cannot
* happen.
*/
mutex_lock(&sched_domains_mutex);
sched_init_domains(cpu_active_mask);
cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
if (cpumask_empty(non_isolated_cpus))
cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
mutex_unlock(&sched_domains_mutex);

This is relevant for the following call trace:

[c000003ff6107b50] [c00000000010ac70] lockdep_assert_cpus_held+0x50/0x70 (unreliable)
[c000003ff6107b70] [c0000000000802c0] arch_update_cpu_topology+0x20/0x40
[c000003ff6107b90] [c000000000182ec4] sched_init_domains+0x74/0x100
[c000003ff6107bd0] [c000000000ed5c78] sched_init_smp+0x58/0x168
[c000003ff6107d00] [c000000000eb460c] kernel_init_freeable+0x1fc/0x3ac
[c000003ff6107dc0] [c00000000000dc2c] kernel_init+0x2c/0x150
[c000003ff6107e30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70

2. Your comment on patch "lockup_detector: Cleanup hotplug locking mess"Â:

"All watchdog thread related functions are delegated to the smpboot thread
infrastructure, which handles serialization against CPU hotplug correctly."

Though TBH I'm still grasping the smpboot thread infrastructure and
I'm not sure how it handles serialization against CPU hotplug.

This is relevant for the following call trace:

[c000001ff257ba00] [c0000000000f8618] lockdep_assert_cpus_held+0x48/0x80 (unreliable)
[c000001ff257ba20] [c00000000007a848] arch_update_cpu_topology+0x18/0x30
[c000001ff257ba40] [c00000000016bd0c] partition_sched_domains+0x8c/0x4e0
[c000001ff257baf0] [c00000000020a914] cpuset_update_active_cpus+0x24/0x60
[c000001ff257bb10] [c0000000001449bc] sched_cpu_deactivate+0xfc/0x1a0
[c000001ff257bc20] [c0000000000f5a8c] cpuhp_invoke_callback+0x19c/0xe00
[c000001ff257bcb0] [c0000000000f6868] cpuhp_down_callbacks+0x78/0xf0
[c000001ff257bd00] [c0000000000f6c1c] cpuhp_thread_fun+0x1fc/0x210
[c000001ff257bd50] [c000000000132f4c] smpboot_thread_fn+0x2fc/0x370
[c000001ff257bdc0] [c00000000012ca24] kthread+0x1b4/0x1c0
[c000001ff257be30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70

>From what you said though it looks like the lockdep assertion shouldn't
have been triggered in either case. I'll keep digging.

--
Thiago Jung Bauermann
IBM Linux Technology Center


 https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-September/163477.html