Re: mm: deadlock between get_online_cpus/pcpu_alloc
From: Mel Gorman
Date: Tue Feb 07 2017 - 04:43:16 EST
On Tue, Feb 07, 2017 at 09:48:56AM +0100, Michal Hocko wrote:
> > +
> > + /*
> > + * Only drain from contexts allocating on behalf of userspace.
> > + * Kernel allocations could be holding a CPU hotplug-related
> > + * mutex, particularly hot-add allocating per-cpu structures
> > + * while hotplug-related mutexes are held, which would prevent
> > + * get_online_cpus from ever returning.
> > + */
> > + if (gfp_mask & __GFP_HARDWALL)
> > + drain_all_pages(NULL);
> > +
>
> This wouldn't work AFAICS. If you look at the lockdep splat, the path
> which reverses the locking order (takes pcpu_alloc_mutex prior to
> cpu_hotplug.lock) is bpf_array_alloc_percpu which is GFP_USER and thus
> __GFP_HARDWALL.
>
You're right, I looked at the wrong caller.
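For reference, GFP_USER already includes __GFP_HARDWALL, so the check above
does nothing to exclude the BPF per-cpu allocation. From include/linux/gfp.h:

#define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_USER	(__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)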
> I believe we shouldn't pull any dependency on the hotplug locks inside
> the allocator. This is just too fragile! Can we simply drop the
> get_online_cpus()? Why do we need it, anyway?
To stop the CPU being queued on from going offline. Another alternative is
to bring back try_get_online_cpus(), which Peter pointed out was removed in
commit 02ef3c4a2aae65a1632b27770bfea3f83ca06772, although it would be a
shame to revert that for just this one case.
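To illustrate the shape of the dependency, a rough sketch only -- pcpu_drain,
drain_local_pages_wq and drain_all_pages_remote are placeholder names, not the
actual patch. The per-cpu queue and flush is bracketed by get_online_cpus(),
and that bracket is what creates the cpu_hotplug.lock vs pcpu_alloc_mutex
ordering in the first place:

	static DEFINE_PER_CPU(struct work_struct, pcpu_drain);

	/* Work function sketched further down */
	static void drain_local_pages_wq(struct work_struct *work);

	static void drain_all_pages_remote(void)
	{
		int cpu;

		/* Pins hotplug so a queued CPU cannot vanish under us */
		get_online_cpus();
		for_each_online_cpu(cpu) {
			struct work_struct *work = per_cpu_ptr(&pcpu_drain, cpu);

			INIT_WORK(work, drain_local_pages_wq);
			schedule_work_on(cpu, work);
		}
		/* Wait for every queued drain to complete */
		for_each_online_cpu(cpu)
			flush_work(per_cpu_ptr(&pcpu_drain, cpu));
		put_online_cpus();
	}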
> Say we are racing with the
> cpu offlining. I have to check the code but my impression was that WQ
> code will ignore the cpu requested by the work item when the cpu is
> going offline. If the offline happens while the worker function already
> executes then it has to wait as we run with preemption disabled so we
> should be safe here. Or am I missing something obvious?
It may be safe, but I'd prefer to get confirmation from Tejun. If a drain on
a particular CPU gets missed in this path, it's not the end of the world, as
CPU offlining drains the pages via another path, so nothing is lost.
It would also be acceptable if a CPU was drained twice because the work item
was unbound from a CPU being hot-removed and migrated during the hotplug
operation.
If I'm reading this right, a hot-remove will mark the pool POOL_DISASSOCIATED
and unbind its workers. A work item queued for draining gets migrated during
hot-remove, and a drain operation may then execute twice on a CPU -- once for
what was queued on it and a second time for the item migrated off the CPU
being removed. It should still work with flush_work, which does not appear to
block forever if an item has been migrated to another worker pool. The actual
drain workqueue function uses the CPU ID it is currently running on, so it
should not get confused.
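In other words, something like the following, again only a sketch with a
placeholder name, so whichever CPU the item ends up running on, it only
drains its own pcp lists:

	static void drain_local_pages_wq(struct work_struct *work)
	{
		/*
		 * drain_local_pages(NULL) acts on the pcp lists of the CPU
		 * the worker is currently running on; disabling preemption
		 * keeps smp_processor_id() stable for the duration.
		 */
		preempt_disable();
		drain_local_pages(NULL);
		preempt_enable();
	}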
Tejun, did I miss anything? Does a workqueue item queued on a CPU that is
going offline get unbound, and can a caller still flush it safely? In this
specific case, it is ok if the workqueue item does not run on the CPU it was
queued on.
--
Mel Gorman
SUSE Labs