Re: linux 5.14.3: free_user_ns causes NULL pointer dereference

From: Jordan Glover
Date: Wed Sep 29 2021 - 17:39:17 EST


On Wednesday, September 29th, 2021 at 5:36 PM, Alexey Gladkov <legion@xxxxxxxxxx> wrote:

> On Tue, Sep 28, 2021 at 01:40:48PM +0000, Jordan Glover wrote:
>
> > On Thursday, September 16th, 2021 at 5:30 PM, ebiederm@xxxxxxxxxxxx wrote:
> >
> > > Jordan Glover <Golden_Miller83@xxxxxxxxxxxxx> writes:
> > >
> > > > On Wednesday, September 15th, 2021 at 10:42 PM, Jordan Glover <Golden_Miller83@xxxxxxxxxxxxx> wrote:
> > > >
> > > > > I had about 2 containerized (flatpak/bubblewrap) apps (browser + music player) running. I quickly closed them, intending to shut down the system, but instead got a freeze and had to use magic SysRq to reboot. The system logs end with what I posted, and there is nothing suspicious before that.
> > > > >
> > > > > Maybe it's some random fluke. I'll reply if I hit it again.
> > > >
> > > > Heh, it just happened again. This time closing firefox alone had such an
> > > > effect:
> > >
> > > Ok. It looks like we have a couple of folks seeing issues here.
> > >
> > > I thought we had all of the issues sorted out for the release of v5.14,
> > > but it looks like there is still some little bug left.
> > >
> > > If Alex doesn't beat me to it I will see if I can come up with a
> > > debugging patch to make it easy to help track down where the reference
> > > count is going wrong. It will be a little bit as my brain is mush at
> > > the moment.
> > >
> > > Eric
> >
> > As the issue persists in 5.14.7 I would be very interested in such a patch.
> >
> > For now the thing is mostly reproducible when I close several tabs in ff then
> > close the browser in a short period of time. When I close tabs then wait
> > a bit then close the browser it doesn't happen so I guess some interrupted
> > cleanup triggers it.
>
> I'm still investigating, but I would like to rule out one option.
>
> Could you check out the patch?


Thx, I added it to my kernel and will report back in a few days.
Does this patch try to fix the issue or make it easier to track?

Jordan

> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index bb51849e6375..f23f906f4f62 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -201,11 +201,14 @@ void put_ucounts(struct ucounts *ucounts)
>  {
>  	unsigned long flags;
>  
> -	if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
> +	spin_lock_irqsave(&ucounts_lock, flags);
> +	if (atomic_dec_and_test(&ucounts->count)) {
>  		hlist_del_init(&ucounts->node);
>  		spin_unlock_irqrestore(&ucounts_lock, flags);
>  		kfree(ucounts);
> +		return;
>  	}
> +	spin_unlock_irqrestore(&ucounts_lock, flags);
>  }
>  
>  static inline bool atomic_long_inc_below(atomic_long_t *v, int u)
>
> -- 
> Rgrds, legion
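
For readers following the patch: it replaces atomic_dec_and_lock_irqsave(), which only takes the lock when the decrement would bring the count to zero, with an unconditional spin_lock_irqsave() followed by atomic_dec_and_test(). Below is a minimal userspace sketch of the two patterns, assuming a pthread mutex in place of ucounts_lock and C11 atomics in place of atomic_t. The names struct obj, dec_unless_one, dec_and_lock, put_dec_and_lock and put_lock_first are hypothetical and exist only for illustration; this is not kernel code and says nothing about where the reported reference count goes wrong.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for ucounts_lock (assumption: a plain mutex, no IRQ handling). */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical refcounted object; "unlinked" stands in for hlist_del_init(). */
struct obj {
	atomic_int count;
	bool unlinked;
};

/* Lockless fast path: decrement unless the current value is 1. */
static bool dec_unless_one(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 1) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

/*
 * Models the dec-and-lock idea: returns true with the lock held
 * only if this call dropped the count to zero.
 */
static bool dec_and_lock(atomic_int *v, pthread_mutex_t *lock)
{
	if (dec_unless_one(v))
		return false;

	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(v, 1) == 1)
		return true;
	pthread_mutex_unlock(lock);
	return false;
}

/* Pattern before the patch: lock is taken only for the final put. */
static bool put_dec_and_lock(struct obj *o)
{
	if (dec_and_lock(&o->count, &table_lock)) {
		o->unlinked = true;
		pthread_mutex_unlock(&table_lock);
		return true;	/* caller would free the object here */
	}
	return false;
}

/* Pattern after the patch: every decrement happens under the lock. */
static bool put_lock_first(struct obj *o)
{
	pthread_mutex_lock(&table_lock);
	if (atomic_fetch_sub(&o->count, 1) == 1) {
		o->unlinked = true;
		pthread_mutex_unlock(&table_lock);
		return true;	/* caller would free the object here */
	}
	pthread_mutex_unlock(&table_lock);
	return false;
}
```

With an object whose count is initialised to 2 via atomic_init(), calling either put function twice returns false and then true. The two variants differ only in whether the non-final decrement is serialised by the lock.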