Re: pids.current with invalid value for hours [5.0.0 rc3 git]

From: Roman Gushchin
Date: Fri Jan 25 2019 - 20:41:48 EST


On Fri, Jan 25, 2019 at 08:47:57PM +0100, Arkadiusz Miśkiewicz wrote:
> On 25/01/2019 17:37, Tejun Heo wrote:
> > On Fri, Jan 25, 2019 at 08:52:11AM +0100, Arkadiusz Miśkiewicz wrote:
> >> On 24/01/2019 12:21, Arkadiusz Miśkiewicz wrote:
> >>> On 17/01/2019 14:17, Arkadiusz Miśkiewicz wrote:
> >>>> On 17/01/2019 13:25, Aleksa Sarai wrote:
> >>>>> On 2019-01-17, Arkadiusz Miśkiewicz <a.miskiewicz@xxxxxxxxx> wrote:
> >>>>>> Using kernel 4.19.13.
> >>>>>>
> >>>>>> For one cgroup I noticed weird behaviour:
> >>>>>>
> >>>>>> # cat pids.current
> >>>>>> 60
> >>>>>> # cat cgroup.procs
> >>>>>> #
> >>>>>
> >>>>> Are there any zombies in the cgroup? pids.current is linked up directly
> >>>>> to __put_task_struct (so exit(2) won't decrease it, only the task_struct
> >>>>> actually being freed will decrease it).
> >>>>>
> >>>>
> >>>> There are no zombie processes.
> >>>>
> >>>> In the meantime the problem has shown up on multiple servers, and so
> >>>> far I have seen it only in cgroups that were OOMed.
> >>>>
> >>>> What has changed on these servers (yesterday) is turning on
> >>>> memory.oom.group=1 for all cgroups and changing memory.high from 1G to
> >>>> "max" (leaving memory.max=2G limit only).
> >>>>
> >>>> Previously there was no such problem.
> >>>>
> >>>
> >>> I'm attaching a reproducer. This time I tried it on a different
> >>> distribution kernel (Arch Linux).
> >>>
> >>> After 60s pids.current still shows 37 processes even if there are no
> >>> processes running (according to ps aux).
> >>
> >>
> >> The same test on 5.0.0-rc3-00104-gc04e2a780caf reproduces the bug
> >> easily. No processes in cgroup but pids.current reports 91.
> >
> > Can you please see whether the problem can be reproduced on the
> > current linux-next?
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>
> I can reproduce on next (5.0.0-rc3-next-20190125), too:

How reliably can you reproduce it? I've tried running your reproducer
several times with different parameters, but haven't had any luck so far.
How many CPUs and how much total RAM does your machine have?
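
To put a number on how often a run reproduces, something like the following
could be pointed at the test cgroup after each run. It is only a rough sketch,
not part of the attached reproducer: the function name is made up for
illustration, and it assumes a cgroup v2 directory that exposes both
pids.current and cgroup.procs. Note that pids.current counts tasks (threads
included) in the cgroup and its descendants, while cgroup.procs lists only
processes in the cgroup itself, so a nonzero difference is suspicious only
when the cgroup is expected to be empty, as in the report above.

```python
from pathlib import Path


def pids_discrepancy(cgroup_dir):
    """Compare pids.current with the number of entries in cgroup.procs.

    Hypothetical helper: returns (pids_current, listed_procs, difference)
    for a single cgroup v2 directory.
    """
    cg = Path(cgroup_dir)
    # pids.current holds a single integer.
    current = int((cg / "pids.current").read_text().strip())
    # cgroup.procs holds one PID per line; an empty file means no processes.
    listed = len((cg / "cgroup.procs").read_text().split())
    return current, listed, current - listed
```

For the cgroup in the original report (pids.current of 60, empty
cgroup.procs) this would return (60, 0, 60).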

Can you please provide the corresponding dmesg output?

I've checked the code again, and my wild guess is that these missing
tasks are waiting (maybe hopelessly) for the OOM reaper. Dmesg output
might be very useful here.
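
If the full log is too large to send, a coarse filter along these lines could
narrow it down to the OOM-related lines. This is just a sketch under
assumptions: the function name and the pattern are mine, the match is
deliberately broad (any line containing "oom" or "out of memory", in any
case), and it may well pick up unrelated lines or miss relevant context, so
the unfiltered output is still preferable.

```python
import re

# Assumed, deliberately broad pattern; not an exhaustive list of OOM messages.
OOM_PATTERN = re.compile(r"oom|out of memory", re.IGNORECASE)


def oom_lines(dmesg_text):
    """Return the lines of a dmesg dump that mention the OOM killer or reaper."""
    return [line for line in dmesg_text.splitlines() if OOM_PATTERN.search(line)]
```

The input would be the saved text of a dmesg run, passed in as one string.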

Thanks!