Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers

From: Jirka Hladky
Date: Wed Apr 20 2022 - 04:02:45 EST


Hi Minchan,

Have you heard back from the kernfs maintainers?

Thank you!
Jirka


On Mon, Apr 4, 2022 at 7:41 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> On Fri, Apr 01, 2022 at 02:04:03PM +0200, Jirka Hladky wrote:
> > > Could you decode exact source code line from the oops?
> >
> > Yes - please see below [1].
>
> Thanks.
>
> >
> > > I think it's fine to attach in the reply because kernel test bot
> >
> > OK. The reproducer is attached. Please unpack it and follow the
> > instructions in the README file. [2]
>
> Unfortunately, I was unable to run the script on my machine.
>
> >
> > Thanks a lot for looking into it!
> > Jirka
> >
> > [1]
> > =============================================
> > Source code line numbers for the Oops message
> > =============================================
> >
> > 1) RIP: 0010:kernfs_remove+0x8/0x50:
> > (gdb) l *kernfs_remove+0x8
> > 0xffffffff81418588 is in kernfs_remove (fs/kernfs/kernfs-internal.h:48).
> > 43 * Return the kernfs_root @kn belongs to.
> > 44 */
> > 45 static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
> > 46 {
> > 47 /* if parent exists, it's always a dir; otherwise, @sd is a dir */
> > 48 if (kn->parent)
> > 49 kn = kn->parent;
> > 50 return kn->dir.root;
> > 51 }
> >
> > And here are the source code lines for the first five functions in the call trace:
> > [ 8563.366280] Call Trace:
> > [ 8563.366280] <TASK>
> > [ 8563.366280] rdt_kill_sb+0x29d/0x350
> > [ 8563.366280] deactivate_locked_super+0x36/0xa0
> > [ 8563.366280] cleanup_mnt+0x131/0x190
> > [ 8563.366280] task_work_run+0x5c/0x90
> > [ 8563.366280] exit_to_user_mode_prepare+0x229/0x230
> > [ 8563.366280] syscall_exit_to_user_mode+0x18/0x40
> > [ 8563.366280] do_syscall_64+0x48/0x90
> > [ 8563.366280] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > 2)(gdb) l *rdt_kill_sb+0x29d
> > 0xffffffff810506bd is in rdt_kill_sb (arch/x86/kernel/cpu/resctrl/rdtgroup.c:2442).
> > 2437 /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
> > 2438 update_closid_rmid(cpu_online_mask, &rdtgroup_default);
> > 2439
> > 2440 kernfs_remove(kn_info);
> > 2441 kernfs_remove(kn_mongrp);
> > 2442 kernfs_remove(kn_mondata);
> > 2443 }
> >
> > 3)(gdb) l *deactivate_locked_super+0x36
> > 0xffffffff813650f6 is in deactivate_locked_super (fs/super.c:342).
> > 337 /*
> > 338 * Since list_lru_destroy() may sleep, we cannot call it from
> > 339 * put_super(), where we hold the sb_lock. Therefore we destroy
> > 340 * the lru lists right now.
> > 341 */
> > 342 list_lru_destroy(&s->s_dentry_lru);
> > 343 list_lru_destroy(&s->s_inode_lru);
> > 344
> > 345 put_filesystem(fs);
> > 346 put_super(s);
> >
> > 4) (gdb) l *cleanup_mnt+0x131
> > 0xffffffff813890a1 is in cleanup_mnt (fs/namespace.c:137).
> > 132 return 0;
> > 133 }
> > 134
> > 135 static void mnt_free_id(struct mount *mnt)
> > 136 {
> > 137 ida_free(&mnt_id_ida, mnt->mnt_id);
> > 138 }
> >
> > 5) (gdb) l *task_work_run+0x5c
> > 0xffffffff8110620c is in task_work_run (./include/linux/sched.h:2017).
> > 2012
> > 2013 DECLARE_STATIC_CALL(cond_resched, __cond_resched);
> > 2014
> > 2015 static __always_inline int _cond_resched(void)
> > 2016 {
> > 2017 return static_call_mod(cond_resched)();
> > 2018 }
> >
> > 6) (gdb) l *exit_to_user_mode_prepare+0x229
> > 0xffffffff81176d19 is in exit_to_user_mode_prepare (./include/linux/tracehook.h:189).
> > 184 * This barrier pairs with task_work_add()->set_notify_resume() after
> > 185 * hlist_add_head(task->task_works);
> > 186 */
> > 187 smp_mb__after_atomic();
> > 188 if (unlikely(current->task_works))
> > 189 task_work_run();
> > 190
> > 191 #ifdef CONFIG_KEYS_REQUEST_CACHE
> > 192 if (unlikely(current->cached_requested_key)) {
> > 193 key_put(current->cached_requested_key);
> >
> > [2]
> > =============================================
> > Reproducer - README
> > =============================================
> >
> > 1) HW
> > This issue seems to be platform specific. I was not able to reproduce
> > it on AMD Zen or on the Intel Ice Lake platform.
> > I see the issue on dual socket Intel Skylake systems. Reproduced on a
> > Supermicro Super Server/X11DDW-L with 2x Xeon Gold 6126 CPU.
>
> Based on your report, the kernel crashed because kn_mondata was NULL:
>
> rdt_kill_sb
> rmdir_all_sub
> ..
> kernfs_remove(kn_mondata);
> struct kernfs_root *root = kernfs_root(kn); <-- crashed
>
>
> Before my patch [1], it worked like this:
>
> rdt_kill_sb
> rmdir_all_sub
> ..
> kernfs_remove(kn_mondata);
> down_write(&kernfs_rwsem);
> if (!kn)
> return;
> up_write(&kernfs_rwsem);
>
> IOW, kernfs_remove() used to accept a NULL argument and simply bail out,
> but with my patch [1] it no longer does.
>
> This raises a question for the kernfs maintainers:
>
> Should the kernfs_remove() API support a NULL parameter? If so, can we support
> it atomically without the old global kernfs_rwsem?
>
> [1] 393c3714081a, kernfs: switch global kernfs_rwsem lock to per-fs lock
>
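
To make the question above concrete, here is a minimal user-space sketch of
the ordering problem. It is only a model, not the kernel code: the structures
are simplified stand-ins, the "root" field stands in for kn->dir.root, and
kernfs_remove_model() is a hypothetical name. It assumes the per-fs lock lives
in the root, so the root has to be looked up through kn before the lock can be
taken; the NULL check therefore has to come before that lookup rather than
under the lock.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption, not the real layout). */
struct kernfs_root {
	int removed;			/* counts removals, for illustration only */
};

struct kernfs_node {
	struct kernfs_node *parent;
	struct kernfs_root *root;	/* stands in for kn->dir.root */
};

/* Mirrors the helper quoted in the oops: it dereferences kn unconditionally. */
static struct kernfs_root *kernfs_root(struct kernfs_node *kn)
{
	if (kn->parent)			/* faults here when kn is NULL */
		kn = kn->parent;
	return kn->root;
}

/*
 * Hypothetical NULL-tolerant variant: bail out before touching kn.
 * Returns 1 if something was removed, 0 if kn was NULL.
 */
static int kernfs_remove_model(struct kernfs_node *kn)
{
	struct kernfs_root *root;

	if (!kn)
		return 0;		/* nothing to remove, no lock needed */

	root = kernfs_root(kn);
	/* the real code would take the per-fs lock found via root here */
	root->removed++;
	return 1;
}
```

With this ordering, calling the model with NULL is a no-op, and calling it with
a valid child node resolves the root through its parent. Whether a check like
this is acceptable in the real kernfs_remove() is still a question for the
kernfs maintainers.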


--
-Jirka