Re: mm: deadlock between get_online_cpus/pcpu_alloc
From: Mel Gorman
Date: Wed Feb 08 2017 - 09:05:08 EST

On Wed, Feb 08, 2017 at 02:23:19PM +0100, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Mel Gorman wrote:
> > It may be worth noting that patches in Andrew's tree no longer disable
> > interrupts in the per-cpu allocator and now per-cpu draining will
> > be from workqueue context. The reasoning was due to the overhead of
> > the page allocator with figures included. Interrupts will bypass the
> > per-cpu allocator and use the irq-safe zone->lock to allocate from
> > the core. It'll collide with the RT patch. Primary patch of interest is
> > http://www.ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-only-use-per-cpu-allocator-for-irq-safe-requests.patch
>
> Yeah, we'll sort that out once it hits Linus tree and we move RT forward.
> Though I have one complaint right away:
>
> + preempt_enable_no_resched();
>
> This is a no-no, even in mainline. You effectively disable a preemption
> point.
>
This came up during review: should it be a preemption point or not?
Initially it was preempt_enable(), but no preemption point existed there
before and the reviewer pushed for keeping it that way. As this is the
allocator fast path and is unlikely to need a reschedule or preemption,
I made the change. I can alter it before it hits mainline if you say RT
is going to have an issue with it.

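
To make the difference concrete, here is a rough userspace model of the two
calls. It is only a sketch: every model_* name is made up for illustration,
none of this is kernel code, and it assumes a fully preemptible kernel where
preempt_enable() checks for a pending reschedule once the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_task {
	int  preempt_count;	/* > 0 means preemption is disabled */
	bool need_resched;	/* a wakeup asked for a reschedule */
	int  reschedules;	/* times the model "entered the scheduler" */
};

static void model_preempt_disable(struct model_task *t)
{
	t->preempt_count++;
}

/* Models preempt_enable(): reaching zero acts as a preemption point. */
static void model_preempt_enable(struct model_task *t)
{
	assert(t->preempt_count > 0);
	if (--t->preempt_count == 0 && t->need_resched) {
		t->need_resched = false;
		t->reschedules++;
	}
}

/* Models preempt_enable_no_resched(): the count drops, nothing else. */
static void model_preempt_enable_no_resched(struct model_task *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

/*
 * Run one critical section during which a wakeup arrives and report
 * whether the reschedule request is still pending afterwards.
 */
static bool model_wakeup_left_pending(bool use_no_resched)
{
	struct model_task t = { 0, false, 0 };

	model_preempt_disable(&t);
	t.need_resched = true;	/* wakeup arrives while preemption is off */
	if (use_no_resched)
		model_preempt_enable_no_resched(&t);
	else
		model_preempt_enable(&t);

	return t.need_resched;
}
```

In this model the no_resched variant leaves the request pending until some
later preemption point, which is the behaviour being objected to above.
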
> > The draining from workqueue context may be a problem for RT but one
> > option would be to move the drain to only drain for high-order pages
> > after direct reclaim combined with only draining for order-0 if
> > __alloc_pages_may_oom is about to be called.
>
> Why would the draining from workqueue context be an issue on RT?
>
It probably isn't. The latency of the operation is likely higher than it
was with an IPI but, given the context it occurs in, I seriously doubt it
matters. I couldn't think of a reason why it would matter to RT but there
was no harm in double checking.
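
For reference, the alternative drain policy quoted above could be read as
the decision below. This is only a sketch of one literal reading of that
sentence; the enum and function names are hypothetical and do not exist in
the page allocator.

```c
#include <stdbool.h>

/* Hypothetical stages of a failing allocation slow path. */
enum model_stage {
	MODEL_AFTER_DIRECT_RECLAIM,	/* direct reclaim has just run */
	MODEL_BEFORE_OOM,		/* __alloc_pages_may_oom is next */
};

/*
 * Decide whether to drain the per-cpu lists:
 * high-order requests drain only after direct reclaim,
 * order-0 requests drain only right before the OOM path.
 */
static bool model_should_drain(unsigned int order, enum model_stage stage)
{
	if (order > 0)
		return stage == MODEL_AFTER_DIRECT_RECLAIM;

	return stage == MODEL_BEFORE_OOM;
}
```
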
--
Mel Gorman
SUSE Labs