Re: Machine lockups on extreme memory pressure

From: Shakeel Butt
Date: Tue Sep 22 2020 - 12:30:03 EST


On Tue, Sep 22, 2020 at 8:16 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 22-09-20 06:37:02, Shakeel Butt wrote:
> [...]
> > > I would recommend focusing on tracking down who is blocking
> > > further progress.
> >
> > I was able to find the CPU next in line for the list_lock from the
> > dump. I don't think anyone is blocking the progress as such but more
> > like the spinlock in the irq context is starving the spinlock in the
> > process context. This is a high traffic machine and there are tens of
> > thousands of potential network ACKs on the queue.
>
> So there is forward progress, but it is too slow for userspace to make
> any reasonable progress?

Yes.

>
> > I talked about this problem with Johannes at LPC 2019 and I think we
> > talked about two potential solutions. The first was to somehow give
> > memory reserves to oomd and the second was an in-kernel PSI-based
> > oom-killer. I am not sure the first one will work in this situation
> > but the second one might help.
>
> Why does your oomd depend on memory allocation?
>

It does not, but I think my concern was the potential allocations
during syscalls. Anyway, what do you think of the in-kernel PSI-based
oom-kill trigger? I think Johannes had a prototype as well.
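
For illustration only, the kind of decision a PSI-based trigger makes
can be sketched in userspace. This is not Johannes's prototype and not
in-kernel code. It only parses text in the format of
/proc/pressure/memory and compares the "full" avg10 value against a
threshold. The function names and the 40.0 threshold are hypothetical.

```python
def parse_psi(text):
    """Parse text in the /proc/pressure/memory format into nested dicts."""
    result = {}
    for line in text.strip().splitlines():
        # Each line looks like: "full avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result


def should_trigger_oom_kill(text, full_avg10_threshold=40.0):
    """Return True when the 'full' 10s stall average reaches the threshold.

    The threshold is an assumption for this sketch, not a recommended value.
    """
    psi = parse_psi(text)
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_threshold


if __name__ == "__main__":
    with open("/proc/pressure/memory") as f:
        print(should_trigger_oom_kill(f.read()))
```

A userspace check like this still runs in process context and makes
syscalls, so it remains exposed to the stalls discussed above. That is
the motivation for doing the equivalent check inside the kernel.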