Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap

From: Michal Hocko
Date: Tue Jan 14 2025 - 12:01:11 EST


On Tue 14-01-25 11:51:18, Rik van Riel wrote:
> On Tue, 2025-01-14 at 17:46 +0100, Michal Hocko wrote:
> > On Tue 14-01-25 11:09:55, Johannes Weiner wrote:
> >
> > >
> > > We managed to extract a stack trace of the livelocked task:
> > >
> > > obj_cgroup_may_swap
> > > zswap_store
> > > swap_writepage
> > > shrink_folio_list
> > > shrink_lruvec
> > > shrink_node
> > > do_try_to_free_pages
> > > try_to_free_mem_cgroup_pages
> >
> > OK, so this is the reclaim path and it fails due to reasons you
> > mention
> > below. This will retry several times until it hits mem_cgroup_oom
> > which
> > will bail in mem_cgroup_out_of_memory because of task_is_dying
> > (returns
> > true) and retry the charge + reclaim (as the oom killer hasn't done
> > anything) with passed_oom = true this time and eventually got to
> > nomem
> > path and returns ENOMEM. This should propaged -ENOMEM down the path
> >
> > > charge_memcg
> > > mem_cgroup_swapin_charge_folio
> > > __read_swap_cache_async
> > > swapin_readahead
> > > do_swap_page
> > > handle_mm_fault
> > > do_user_addr_fault
> > > exc_page_fault
> > > asm_exc_page_fault
> > > __get_user
> >
> > All the way here and return the failure to futex_cleanup which
> > doesn't
> > retry __get_user on the failure AFAICS (exit_robust_list). But I
> > might
> > be missing something, it's been quite some time since I've looked
> > into
> > futex code.
>
> Can you explain how -ENOMEM would get propagated down
> past the page fault handler?
>
> This isn't get_user_pages(), which can just pass
> -ENOMEM on to the caller.
>
> If there is code to pass -ENOMEM on past the page
> fault exception handler, I have not been able to
> find it. How does this work?

This might be me misunderstading get_user machinery but doesn't it
return a failure on PF handler returing ENOMEM?
--
Michal Hocko
SUSE Labs