Re: MGLRU premature memcg OOM on slow writes

From: Chris Down
Date: Thu Feb 29 2024 - 19:31:08 EST


Axel Rasmussen writes:
A couple of dumb questions. In your test, do you have any of the following
configured / enabled?

/proc/sys/vm/laptop_mode
memory.low
memory.min

None of these are enabled. The issue is trivially reproducible by writing to any slow device with memory.max enabled, but from the code it looks like MGLRU is also susceptible to this on global reclaim (although it's less likely due to page diversity).

Besides that, it looks like the place non-MGLRU reclaim wakes up the
flushers is in shrink_inactive_list() (which calls wakeup_flusher_threads()).
Since MGLRU calls shrink_folio_list() directly (from evict_folios()), I agree it
looks like it simply will not do this.

Yosry pointed out [1], where MGLRU used to call this but stopped doing that. It
makes sense to me at least that doing writeback every time we age is too
aggressive, but doing it in evict_folios() makes some sense to me, basically to
copy the behavior the non-MGLRU path (shrink_inactive_list()) has.

Thanks! We may also need reclaim_throttle(), depending on how you implement it. Current non-MGLRU behaviour on slow storage is also highly suspect in terms of (lack of) throttling after moving away from VMSCAN_THROTTLE_WRITEBACK, but one thing at a time :-)