Re: [PATCH] mm/mglru: fix cgroup OOM during MGLRU state switching

From: Yuanchu Xie

Date: Mon Mar 02 2026 - 12:52:15 EST

Hi Yafang,

On Mon, Mar 2, 2026 at 8:36 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Mon, Mar 2, 2026 at 5:48 PM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> >
> > On Mon, Mar 2, 2026 at 5:20 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> > >
> > > On Mon, Mar 2, 2026 at 4:25 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > >
> > > > The challenge we're currently facing is that we don't yet know which
> > > > workloads would benefit from it ;)
> > > > We do want to enable mglru on our production servers, but first we
> > > > need to address the risk of OOM during the switch—that's exactly why
> > > > we're proposing this patch.
> > >
> > > Nobody objects to your intention to fix it. I’m curious: to what
> > > extent do we want to fix it? Do we aim to merely reduce the probability
> > > of OOM and other mistakes, or do we want a complete fix that makes
> > > the dynamic on/off fully safe?
> >
> > Yeah, I'm glad that more people are trying MGLRU and improving it.
> >
> > We also have a downstream fix for the OOM on switch issue, but that's
> > mostly as a fallback in case MGLRU doesn't work well, our goal is
> > still to try to enable MGLRU as much as possible,
>
> Our goals are aligned.
> Before enabling mglru, we must first ensure it won't cause OOM errors
> across multiple servers. We propose fixing this because, during our
> previous mglru enablement, many instances of a single service OOM'd
> simultaneously—potentially leading to data loss for that service.

Would it be possible to drain the jobs away from the machine before
switching LRUs? The MGLRU kill-switch could be improved, but making
the switch more or less "hitless" would require significant work. Is
the use case a one-time switch from active/inactive to MGLRU?
I do want to note that data loss following an OOM kill is not really
the kernel's fault.
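
To make the drain-then-switch idea concrete, here is a rough sketch, not a
tested procedure. It assumes the documented sysfs knob
/sys/kernel/mm/lru_gen/enabled (reads back a hex bitmask, accepts y/n on
write). The running_jobs count is hypothetical and would come from whatever
scheduler manages the machine.

```python
from pathlib import Path

# Documented MGLRU switch; reading it returns a bitmask such as "0x0007".
LRU_GEN_ENABLED = Path("/sys/kernel/mm/lru_gen/enabled")


def parse_lru_gen_enabled(text: str) -> int:
    """Parse the hex bitmask read from lru_gen/enabled."""
    return int(text.strip(), 16)


def mglru_is_on(mask: int) -> bool:
    """Bit 0 (0x0001) is the main switch."""
    return bool(mask & 0x0001)


def switch_mglru(enable: bool, running_jobs: int,
                 path: Path = LRU_GEN_ENABLED) -> bool:
    """Flip the switch only once the machine is drained.

    running_jobs is a hypothetical count supplied by the caller's
    scheduler. Returns True if the knob was written, False if the
    switch was skipped because jobs are still running.
    """
    if running_jobs > 0:
        return False
    path.write_text("y" if enable else "n")
    return True
```

Something like switch_mglru(True, running_jobs=0) would then run as root
from the drain tooling, after the last job has left the machine.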

Thanks,
Yuanchu