Re: [BUG/RFC] mm/memcg: Possible cgroup migrate/signal deadlock
From: Roman Gushchin
Date: Tue Feb 01 2022 - 17:12:10 EST
On Tue, Feb 01, 2022 at 02:56:23PM -0600, Jeremy Linton wrote:
> With CONFIG_MEMCG_KMEM and CONFIG_PROVE_LOCKING enabled (Fedora
> rawhide kernel), running a simple podman test tosses a circular
> locking dependency warning. The podman container in question simply
> contains the echo command and the libc/ld-linux needed to run it. The
> warning can be duplicated with just a single `podman build --network
> host --layers=false -t localhost/echo .` command, although the exact
> sequence that triggers the warning also needs the task to be changing
> its frozen state. So, it's easier to duplicate with a slightly
> longer test case.
>
> I've attempted to trigger the actual deadlock with some standalone
> code and been unsuccessful, but looking at the code it appears to be a
> legitimate deadlock if a signal is being sent to the process from
> another thread while the task is migrating between cgroups.
>
> Attached is a fix which I'm confident fixes the problem, but I'm not
> really that confident in the fix since I don't fully understand all
> the possible states in the cgroup code. The fix avoids the deadlock by
> shifting the objcg->list manipulation to another spinlock and then
> using list_del_rcu in obj_cgroup_release.
>
> There is a bit more information in the actual BZ
> https://bugzilla.redhat.com/show_bug.cgi?id=2033016 including a shell
> script with the podman test/etc.
Hi Jeremy!
Thank you for the report and the patch!
We've discussed this issue some time ago and I posted a very similar patch:
https://marc.info/?l=linux-cgroups&m=164221633621286&w=2 .
Also, I resent the latest version a few hours ago, but somehow the
mail didn't make it to the mailing lists. Anyway, I've added you
explicitly to Cc and just resent it.
Thanks!