Re: [PATCH v4 1/5] sched/topology: Add check to backup comment about hotplug lock
From: Juri Lelli
Date: Thu Jun 14 2018 - 10:30:47 EST
On 14/06/18 15:18, Quentin Perret wrote:
> On Thursday 14 Jun 2018 at 16:11:18 (+0200), Juri Lelli wrote:
> > On 14/06/18 14:58, Quentin Perret wrote:
> >
> > [...]
> >
> > > Hmm not sure if this can help but I think that rebuild_sched_domains()
> > > does _not_ take the hotplug lock before calling partition_sched_domains()
> > > when CONFIG_CPUSETS=n. But it does take it for CONFIG_CPUSETS=y.
> >
> > Did you mean cpuset_mutex?
>
> Nope, I really meant the cpu_hotplug_lock!
>
> With CONFIG_CPUSETS=n, rebuild_sched_domains() calls
> partition_sched_domains() directly:
>
> https://elixir.bootlin.com/linux/latest/source/include/linux/cpuset.h#L255
>
> But with CONFIG_CPUSETS=y, rebuild_sched_domains() calls
> rebuild_sched_domains_locked(), which calls get_online_cpus(), which
> calls cpus_read_lock(), which does percpu_down_read(&cpu_hotplug_lock).
> And all that happens before calling partition_sched_domains().
Ah, right!
> So yeah, the point I was trying to make is that there is an inconsistency
> here, maybe for a good reason? Maybe related to the issue you're seeing?
The config that came with the 0day splat was indeed CONFIG_CPUSETS=n.
So, in this case IIUC we hit the !doms_new branch of
partition_sched_domains(), which uses cpu_active_mask (and
cpu_possible_mask indirectly). Should this still be protected by the
hotplug lock then?
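
To make the asymmetry concrete, here is a minimal user-space sketch of the
two call paths described above. It is not kernel code: every model_* name is
hypothetical, the read side of cpu_hotplug_lock is reduced to a plain counter,
and the check only loosely imitates what a lockdep-style assertion in
partition_sched_domains() would report.

```c
#include <assert.h>

/* Stand-in for the read-side depth of cpu_hotplug_lock (assumption:
 * a plain counter is enough to model "held" vs "not held"). */
int hotplug_readers;

/* Counts calls that reached the partition step without the lock held. */
int unlocked_calls;

void model_cpus_read_lock(void)
{
	hotplug_readers++;
}

void model_cpus_read_unlock(void)
{
	/* Catch unbalanced unlocks in the model itself. */
	assert(hotplug_readers > 0);
	hotplug_readers--;
}

/* Models partition_sched_domains(): records a violation whenever it is
 * entered without the (modelled) hotplug lock held. */
void model_partition_sched_domains(void)
{
	if (hotplug_readers == 0)
		unlocked_calls++;
}

/* Models the CONFIG_CPUSETS=n path: direct call, no hotplug lock taken. */
void model_rebuild_cpusets_off(void)
{
	model_partition_sched_domains();
}

/* Models the CONFIG_CPUSETS=y path: lock taken around the partition call. */
void model_rebuild_cpusets_on(void)
{
	model_cpus_read_lock();
	model_partition_sched_domains();
	model_cpus_read_unlock();
}
```

In this model, only model_rebuild_cpusets_off() bumps unlocked_calls, which
would be consistent with the splat showing up on a CONFIG_CPUSETS=n config.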