Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Mel Gorman
Date: Tue Feb 07 2017 - 08:58:54 EST


On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote:
> > You cannot put a sleepable lock inside the preempt disabled section...
> > We can make it a spinlock right?
>
> Scratch that! For some reason I thought that cpu notifiers are run in an
> atomic context. Now that I am checking the code again it turns out I was
> wrong. __cpu_notify uses __raw_notifier_call_chain so this is not an
> atomic context.

Indeed.

> Anyway, shouldn't it be sufficient to disable preemption
> on drain_local_pages_wq?

That would be sufficient for a hot-removed CPU moving the drain request
to another CPU and avoiding any scheduling events.

> The CPU hotplug callback will not preempt us
> and so we cannot work on the same cpus, right?
>

I don't see a specific guarantee that it cannot be preempted, and it
would depend on the exact cpu hotplug implementation, which is subject
to quite a lot of change. Hence, the mutex provides a guarantee that the
hot-removed CPU teardown cannot run on the same CPU as a workqueue drain
running on a CPU it was not originally scheduled for.
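
As a rough userspace illustration of that guarantee (a sketch only:
pthreads stand in for the workqueue and hotplug contexts, and
pcpu_drain_mutex, drain_pcp_list and run_demo are made-up names, not
the ones used in the patch), two contexts draining the same list under
one mutex never free the same entry twice:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_PAGES 100000

/* Hypothetical stand-in for one CPU's per-cpu page list. */
static int pcp_count;

/* Hypothetical mutex serializing the two drain paths. */
static pthread_mutex_t pcpu_drain_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Free entries one at a time, always under the mutex. */
static void *drain_pcp_list(void *counter)
{
	int *freed = counter;

	for (;;) {
		pthread_mutex_lock(&pcpu_drain_mutex);
		if (pcp_count == 0) {
			pthread_mutex_unlock(&pcpu_drain_mutex);
			break;
		}
		pcp_count--;
		(*freed)++;
		pthread_mutex_unlock(&pcpu_drain_mutex);
	}
	return NULL;
}

/* Returns the total number of entries freed, or -1 on thread failure. */
int run_demo(void)
{
	pthread_t wq_drain, hotplug_teardown;
	int freed_by_wq = 0, freed_by_teardown = 0;

	pcp_count = NR_PAGES;

	if (pthread_create(&wq_drain, NULL, drain_pcp_list, &freed_by_wq))
		return -1;
	if (pthread_create(&hotplug_teardown, NULL, drain_pcp_list,
			   &freed_by_teardown)) {
		pthread_join(wq_drain, NULL);
		return -1;
	}

	pthread_join(wq_drain, NULL);
	pthread_join(hotplug_teardown, NULL);

	/* Both contexts are done, so the list must be empty. */
	assert(pcp_count == 0);
	return freed_by_wq + freed_by_teardown;
}
```

Calling run_demo() from main() and building with -pthread is enough to
try it. Without the mutex the two counters could sum to more than
NR_PAGES, which is the analogue of the drain and the teardown working on
the same lists at once.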

--
Mel Gorman
SUSE Labs