Re: [PATCH v3] memcg: fix soft lockup in the OOM process

From: Andrew Morton
Date: Mon Jan 13 2025 - 22:46:25 EST


On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote:

>
>
> On 2025/1/6 16:45, Vlastimil Babka wrote:
> > On 12/24/24 03:52, Chen Ridong wrote:
> >> From: Chen Ridong <chenridong@xxxxxxxxxx>
> >
> > +CC RCU
> >
> >> A soft lockup was found in a product with about 56,000 tasks in the
> >> OOM cgroup; the soft lockup was triggered while those tasks were
> >> being traversed.
> >>
>
> ...
>
> >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> >>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> >>  	else {
> >>  		struct task_struct *p;
> >> +		int i = 0;
> >>
> >>  		rcu_read_lock();
> >> -		for_each_process(p)
> >> +		for_each_process(p) {
> >> +			/* Avoid potential softlockup warning */
> >> +			if ((++i & 1023) == 0)
> >> +				touch_softlockup_watchdog();
> >
> > This might suppress the soft lockup, but won't an RCU stall still be detected?
>
> Yes, an RCU stall was still detected.
> For global OOM, the system is likely to struggle anyway. Do we have to do
> some work to suppress RCU stall detection?

rcu_cpu_stall_reset()?
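
Roughly, the idea would be to reset the RCU stall state at the same point where the patch already touches the softlockup watchdog. Below is a minimal userspace sketch of that pattern, not a tested kernel patch. The two helpers are stand-ins that only count calls (the real ones live in the kernel), dump_tasks_sketch() is a hypothetical name, and the plain counted loop replaces for_each_process(). Whether resetting stall detection inside the rcu_read_lock() section is acceptable here is a question for the RCU folks.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel helpers; they only count calls. */
static unsigned long softlockup_touches;
static unsigned long rcu_stall_resets;

static void touch_softlockup_watchdog(void) { softlockup_touches++; }
static void rcu_cpu_stall_reset(void) { rcu_stall_resets++; }

/*
 * Hypothetical helper: walk nr_tasks entries and poke both detectors
 * once every 1024 iterations, as the (++i & 1023) == 0 test does.
 */
static void dump_tasks_sketch(size_t nr_tasks)
{
	unsigned int i = 0;
	size_t n;

	for (n = 0; n < nr_tasks; n++) {
		/* dump_task(p, oc) would run here in the real loop */
		if ((++i & 1023) == 0) {
			touch_softlockup_watchdog();
			rcu_cpu_stall_reset();
		}
	}
}
```

With the roughly 56,000 tasks from the report, both helpers would be called a few dozen times over the whole walk, so the extra cost per task is negligible.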