Re: deadlock during writeback when using f2fs filesystem

From: Michal Hocko
Date: Fri Jun 01 2018 - 07:27:51 EST


On Fri 01-06-18 16:50:50, Sahitya Tummala wrote:
> On Fri, Jun 01, 2018 at 12:26:09PM +0200, Michal Hocko wrote:
> > On Fri 01-06-18 15:02:35, Sahitya Tummala wrote:
> > > Hi,
> > >
> > > We are observing a deadlock scenario during FS writeback under low-memory
> > > conditions with the F2FS filesystem.
> > >
> > > Here is the callstack of this scenario -
> > >
> > > shrink_inactive_list()
> > > shrink_node_memcg.isra.74()
> > > shrink_node()
> > > shrink_zones(inline)
> > > do_try_to_free_pages(inline)
> > > try_to_free_pages()
> > > __perform_reclaim(inline)
> > > __alloc_pages_direct_reclaim(inline)
> > > __alloc_pages_slowpath(inline)
> > > no_zone()
> > > __alloc_pages(inline)
> > > __alloc_pages_node(inline)
> > > alloc_pages_node(inline)
> > > __page_cache_alloc(inline)
> > > pagecache_get_page()
> > > find_or_create_page(inline)
> > > grab_cache_page(inline)
> > > f2fs_grab_cache_page(inline)
> > > __get_node_page.part.32()
> > > __get_node_page(inline)
> > > get_node_page()
> > > update_inode_page()
> > > f2fs_write_inode()
> > > write_inode(inline)
> > > __writeback_single_inode()
> > > writeback_sb_inodes()
> > > __writeback_inodes_wb()
> > > wb_writeback()
> > > wb_do_writeback(inline)
> > > wb_workfn()
> > >
> > > The writeback thread enters the direct reclaim path due to low memory and gets
> > > stuck in shrink_inactive_list(), which is in turn waiting for writeback of the
> > > dirty pages on the inactive list.
> >
> > shrink_page_list waits for writeback pages only when we are in memcg
> > reclaim. The above seems to be global reclaim, though. Moreover,
> > GFP_F2FS_ZERO is GFP_NOFS, so we are not waiting for writeback pages at
> > all. Are you sure the above is really a deadlock?
> >
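
The rule described in the reply above can be modeled as a small predicate.
This is only a simplified sketch, not the kernel's code: every sk_* name and
flag value is hypothetical, and it ignores kswapd and swap-cache pages. It
assumes that reclaim blocks on a writeback page only in memcg reclaim and
only when the allocation context may enter the filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfp bits; the values are illustrative only. */
#define SK_GFP_IO     0x1u
#define SK_GFP_FS     0x2u
#define SK_GFP_NOFS   (SK_GFP_IO)             /* may do I/O, may not enter the FS */
#define SK_GFP_KERNEL (SK_GFP_IO | SK_GFP_FS)

/* Hypothetical, reduced view of the reclaim context. */
struct sk_scan {
    unsigned int gfp_mask;
    bool memcg_reclaim;   /* false models global reclaim */
};

/* Hypothetical, reduced view of a page on the LRU. */
struct sk_page {
    bool under_writeback;
    bool marked_for_reclaim;  /* seen under writeback by reclaim before */
};

static bool sk_may_enter_fs(const struct sk_scan *sc)
{
    return (sc->gfp_mask & SK_GFP_FS) != 0;
}

/* Returns true only when this model says reclaim would block on the page. */
static bool sk_reclaim_waits_on_writeback(const struct sk_scan *sc,
                                          const struct sk_page *page)
{
    assert(sc != NULL && page != NULL);

    if (!page->under_writeback)
        return false;            /* nothing to wait for */
    if (!sc->memcg_reclaim)
        return false;            /* global reclaim does not wait */
    if (!page->marked_for_reclaim)
        return false;            /* first encounter: mark it and move on */
    if (!sk_may_enter_fs(sc))
        return false;            /* GFP_NOFS context must not wait */
    return true;
}
```

For the reported case (global reclaim with a GFP_NOFS mask) the predicate
returns false on both grounds, which is why a true deadlock looks unlikely.
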
>
> Let me correct my statement. It could be more of a livelock scenario.
>
> The direct reclaim path is not doing any writeback here, so GFP_NOFS doesn't
> make any difference. In this case, direct reclaim has to reclaim ~32 pages,
> which it picks from the tail of the list. All of those tail pages are dirty,
> and since the direct reclaim path can't do any writeback, it just loops, picking
> and skipping them.
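
The scan-and-skip behaviour described above can be illustrated with a toy
model. This is a sketch under stated assumptions, not kernel code: the sk_*
names are hypothetical, the LRU is a plain array whose highest index is the
tail, and the sketch deliberately ignores what happens to skipped pages
after the pass (in the kernel they do not simply stay at the tail).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_BATCH 32   /* the "~32 pages" mentioned in the report */

/* Hypothetical, reduced view of a page on the inactive list. */
struct sk_lru_page {
    bool dirty;
};

/*
 * Scan up to SK_BATCH pages starting from the tail of lru[0..n).
 * Clean pages count as reclaimed; dirty pages are skipped because this
 * context is assumed unable to write them back. Returns the number of
 * pages reclaimed and stores the number skipped in *skipped.
 */
static size_t sk_scan_tail(const struct sk_lru_page *lru, size_t n,
                           size_t *skipped)
{
    size_t reclaimed = 0;
    size_t scanned = 0;

    assert(lru != NULL && skipped != NULL);
    *skipped = 0;

    while (n > 0 && scanned < SK_BATCH) {
        const struct sk_lru_page *page = &lru[--n];

        scanned++;
        if (page->dirty)
            (*skipped)++;   /* cannot write back here, so pass over it */
        else
            reclaimed++;
    }
    return reclaimed;
}
```

With a list whose last 32 entries are all dirty, one pass reclaims nothing
and skips the whole batch, even if clean pages sit further from the tail.
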

But there are surely other pages on the LRU list, aren't there?
--
Michal Hocko
SUSE Labs