Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated

From: Vladimir Davydov
Date: Fri Nov 27 2015 - 08:40:36 EST


On Fri, Nov 27, 2015 at 01:50:05PM +0100, Michal Hocko wrote:
> On Thu 26-11-15 11:16:24, Vladimir Davydov wrote:
> > On Wed, Nov 25, 2015 at 07:27:57PM +0300, Vladimir Davydov wrote:
> > > On Wed, Nov 25, 2015 at 04:45:13PM +0100, Vlastimil Babka wrote:
> > > > On 11/25/2015 04:36 PM, Vladimir Davydov wrote:
> > > > > Block device drivers often hand off io request processing to kernel
> > > > > threads (example: device mapper). If such a thread calls kmalloc, it can
> > > > > dive into direct reclaim path and end up waiting for too_many_isolated
> > > > > to return false, blocking writeback. This can lead to a deadlock if the
> > > >
> > > > Shouldn't such allocation lack __GFP_IO to prevent this and other kinds of
> > > > deadlocks? And/or have mempools?
> > >
> > > Not necessarily. loopback is an example: it can call
> > > grab_cache_write_begin -> add_to_page_cache_lru with GFP_KERNEL.
>
> AFAIR loop driver reduces the gfp_mask via inode mapping.

Yeah, it does. I missed that, thanks for pointing it out. But it doesn't
really make much difference, because the thread can still get stuck in
too_many_isolated, although the reduced mask does make this less likely.
When I hit it, the DMA zone had only 3 inactive file pages and 68 isolated
file pages, as I mentioned in the comment to the patch, so even the >> 3
wouldn't save us.
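
To make the numbers concrete, here is a rough user-space model of the
comparison too_many_isolated does. It is a sketch, not the kernel code:
the function name and the boolean parameter are made up for illustration,
the kswapd and memcg special cases are left out, and it assumes the >> 3
is applied to the inactive count only for callers that allow both IO and
FS.

```c
#include <stdbool.h>

/*
 * Simplified, hypothetical model of the too_many_isolated comparison.
 * may_io_and_fs stands for a gfp mask with both __GFP_IO and __GFP_FS set.
 */
bool model_too_many_isolated(unsigned long inactive, unsigned long isolated,
			     bool may_io_and_fs)
{
	/*
	 * Assumption: unrestricted callers are compared against a
	 * threshold that is 8 times smaller, so NOIO/NOFS callers may
	 * isolate more pages before they are throttled.
	 */
	if (may_io_and_fs)
		inactive >>= 3;

	return isolated > inactive;
}
```

With inactive = 3 and isolated = 68 the model reports "too many" for both
kinds of caller, so a reduced gfp mask does not help in that situation.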

>
> > Anyway, kthreads that use GFP_NOIO and/or mempool aren't safe either,
> > because it isn't an allocation context problem: the reclaimer locks up
> > not because it tries to take an fs/io lock the caller holds, but because
> > it waits for isolated pages to be put back, which will never happen,
> > since processes that isolated them depend on the kthread making
> > progress. This is purely a reclaimer heuristic, which kmalloc users are
> > not aware of.
> >
> > My point is that, in contrast to userspace processes, it is dangerous to
> > throttle kthreads in the reclaimer, because they might be responsible
> > for reclaimer progress (e.g. performing writeback).
>
> Wouldn't it be better if your writeback kthread did PF_MEMALLOC/__GFP_MEMALLOC
> instead because it is in fact a reclaimer so it even get to the reclaim.

The driver we use is similar to loop. It works as a proxy to the fs it
sits on top of. Allowing it to access emergency reserves would deplete
them quickly, just like in the case of plain loop.

The problem is not about our driver, in fact. I'm pretty sure one can
hit it when using memcg along with loop or dm-crypt for instance.

>
> There are way too many allocations done from the kernel thread context to be
> not throttled (just look at worker threads).

What about throttling them only once then?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 97ba9e1cde09..9253f4531b9c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1578,6 +1578,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		/* We are about to die and free our memory. Return now. */
 		if (fatal_signal_pending(current))
 			return SWAP_CLUSTER_MAX;
+
+		if (current->flags & PF_KTHREAD)
+			break;
 	}
 
 	lru_add_drain();
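
The intended effect of the hunk above can be sketched in user space as
follows. This is only a model under stated assumptions: struct task_model
and throttle_rounds are hypothetical names, stuck_rounds stands for how
many consecutive times the check would keep reporting too many isolated
pages, and each loop iteration stands for one congestion_wait sleep.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task state the loop looks at. */
struct task_model {
	bool is_kthread;	/* models current->flags & PF_KTHREAD */
	bool fatal_signal;	/* models fatal_signal_pending(current) */
};

/*
 * Returns how many sleeps the task takes before leaving the throttle
 * loop, given that the check would stay true for stuck_rounds rounds.
 */
unsigned int throttle_rounds(const struct task_model *t,
			     unsigned int stuck_rounds)
{
	unsigned int waits = 0;

	while (waits < stuck_rounds) {	/* models the too_many_isolated loop */
		waits++;		/* models one congestion_wait sleep */

		if (t->fatal_signal)	/* the kernel returns early here */
			break;

		if (t->is_kthread)	/* proposed: throttle kthreads once */
			break;
	}

	return waits;
}
```

In this model a kthread sleeps at most once no matter how long the
condition persists, while a userspace task keeps sleeping until the
condition clears or a fatal signal arrives.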