Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers

From: Minchan Kim
Date: Fri Apr 22 2022 - 14:49:56 EST


On Thu, Apr 21, 2022 at 06:47:41AM -1000, Tejun Heo wrote:
> Sorry about the late reply.
>
> On Wed, Apr 20, 2022 at 10:02:20AM +0200, Jirka Hladky wrote:
> > > Based on your report, the kernel crashed because kn_mondata was NULL:
> > >
> > > rdt_kill_sb
> > >   rmdir_all_sub
> > >     ..
> > >     kernfs_remove(kn_mondata);
> > >       struct kernfs_root *root = kernfs_root(kn); <-- crashed
> > >
> > >
> > > Before my patch [1], it worked like this:
> > >
> > > rdt_kill_sb
> > >   rmdir_all_sub
> > >     ..
> > >     kernfs_remove(kn_mondata);
> > >       down_write(&kernfs_rwsem);
> > >       if (!kn)
> > >         return;
> > >       up_write(&kernfs_rwsem);
> > >
> > > IOW, kernfs_remove() used to handle a NULL argument by simply bailing out,
> > > but with my patch [1] it no longer does.
> > >
> > > This raises questions for the kernfs maintainers:
> > >
> > > Should the kernfs_remove() API support a NULL parameter? If so, can we
> > > support it atomically without the old global kernfs_rwsem?
> > >
> > > [1] 393c3714081a, kernfs: switch global kernfs_rwsem lock to per-fs lock
>
> Yes, I mean, kernfs_remove() used to support NULL arg, so it should do the
> same after the locking change too. Can you send a patch?

Thanks for checking, Tejun.

Jirka, could you test the patch? Once it's confirmed, I need to resend
it with stable Cc'ed.
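
For anyone following the thread, the idea can be sketched in user space
roughly as below. This is only an illustrative model, not the patch itself:
the model_* types and functions are hypothetical stand-ins, and the lock is
reduced to a flag. The point it shows is that, with a per-fs lock, the NULL
check has to come before the node is dereferenced to find that lock.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real kernfs structs. */
struct model_root {
	int write_locked;	/* 1 while the per-fs lock is "held" */
	int removals;		/* nodes removed while holding the lock */
};

struct model_node {
	struct model_root *root;
};

/* Plays the role of kernfs_root(): dereferences kn, so kn must not be NULL. */
static struct model_root *model_root_of(struct model_node *kn)
{
	return kn->root;
}

/* Returns 1 if a node was removed, 0 if the call was a NULL no-op. */
static int model_remove(struct model_node *kn)
{
	struct model_root *root;

	if (!kn)		/* bail out before looking up the lock */
		return 0;

	root = model_root_of(kn);
	root->write_locked = 1;	/* stands in for taking the per-fs lock */
	root->removals++;	/* stands in for the actual removal */
	root->write_locked = 0;	/* stands in for releasing the lock */
	return 1;
}
```

With this ordering, model_remove(NULL) returns without touching any root,
which matches the old behaviour callers such as rmdir_all_sub relied on.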

Thanks.