RE: [PATCH 2/2] NFSD: fix race between nfsd registration and exports_proc

From: Maninder Singh
Date: Thu Mar 06 2025 - 23:00:11 EST


Hi,

> > As of now nfsd calls create_proc_exports_entry() at start of init_nfsd
> > and cleanup by remove_proc_entry() at last of exit_nfsd.
> >
> > Which causes kernel OOPs if there is race between below 2 operations:
> > (i) exportfs -r
> > (ii) mount -t nfsd none /proc/fs/nfsd
> >
> > for 5.4 kernel ARM64:
> >
> > CPU 1:
> > el1_irq+0xbc/0x180
> > arch_counter_get_cntvct+0x14/0x18
> > running_clock+0xc/0x18
> > preempt_count_add+0x88/0x110
> > prep_new_page+0xb0/0x220
> > get_page_from_freelist+0x2d8/0x1778
> > __alloc_pages_nodemask+0x15c/0xef0
> > __vmalloc_node_range+0x28c/0x478
> > __vmalloc_node_flags_caller+0x8c/0xb0
> > kvmalloc_node+0x88/0xe0
> > nfsd_init_net+0x6c/0x108 [nfsd]
> > ops_init+0x44/0x170
> > register_pernet_operations+0x114/0x270
> > register_pernet_subsys+0x34/0x50
> > init_nfsd+0xa8/0x718 [nfsd]
> > do_one_initcall+0x54/0x2e0
> >
> > CPU 2 :
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> >
> > PC is at : exports_net_open+0x50/0x68 [nfsd]
> >
> > Call trace:
> > exports_net_open+0x50/0x68 [nfsd]
> > exports_proc_open+0x2c/0x38 [nfsd]
> > proc_reg_open+0xb8/0x198
> > do_dentry_open+0x1c4/0x418
> > vfs_open+0x38/0x48
> > path_openat+0x28c/0xf18
> > do_filp_open+0x70/0xe8
> > do_sys_open+0x154/0x248



> To make sure I understand, the race is that sometimes the exports
> interface gets created before the net namespace is set up, and then
> that causes GPFs when exports_net_open tries to access the nfsd_net?
>


Yes. Sometimes it happens at module init time, as shown by the state of the 2 CPUs
I shared from the time of the crash, and sometimes it occurs while the module is
unloading and user space is still accessing the interface.

So I thought the interface to user space should be exported late during init and
cleaned up early during exit. I was not sure what the exact position should be, so I
moved it to the last step of init and the first step of cleanup.
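
To illustrate the ordering, here is a minimal user-space sketch. It is not the
nfsd code: every fake_* name, the two return codes and the single global state
are hypothetical stand-ins, and the real per-net setup is far more involved. It
only models the window in which the entry is visible but the per-net state is
still NULL, and how publishing last and hiding first closes that window.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-net nfsd state; not kernel code. */
struct fake_nfsd_net {
	int export_count;
};

static struct fake_nfsd_net net_storage;
static struct fake_nfsd_net *net_state;	/* NULL until per-net init has run */
static bool exports_visible;		/* models the exports proc entry */

/* Models per-net initialisation (what register_pernet_subsys() triggers). */
static void fake_setup_net(void)
{
	net_storage.export_count = 0;
	net_state = &net_storage;
}

/* Models making the exports entry visible to user space. */
static void fake_publish_entry(void)
{
	exports_visible = true;
}

/*
 * Models a user-space open of the exports entry.
 * Returns -1 if the entry is not visible (open fails cleanly),
 * -2 if the entry is visible but the state is missing (the kernel
 * would dereference NULL here), else the export count.
 */
static int fake_exports_open(void)
{
	if (!exports_visible)
		return -1;
	if (net_state == NULL)
		return -2;
	return net_state->export_count;
}

/* Proposed ordering: set up state first, publish the entry last. */
static void fake_init(void)
{
	fake_setup_net();
	fake_publish_entry();
}

/* Proposed ordering: hide the entry first, then tear down state. */
static void fake_exit(void)
{
	exports_visible = false;
	net_state = NULL;
}
```

Calling fake_publish_entry() before fake_setup_net() reproduces the bad window
(an open returns -2), while with fake_init()/fake_exit() an open either fails
cleanly or sees fully initialised state.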

Originally, as of the 4.13 kernel, this cleanup was the first thing done at exit time,
but its position changed over time as other issues were fixed:

https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=027690c75e8fd91b60a634d31c4891a6e39d45bd
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=bd5ae9288d6451bd346a1b4a59d4fe7e62ba29b7
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=6f6f84aa215f7b6665ccbb937db50860f9ec2989

I think that change in ordering is what caused this kernel oops.

Thanks,
Maninder Singh