Re: Regression: SUNRPC: Use poll() to fix up the socket requeue races

From: Trond Myklebust
Date: Mon Feb 25 2019 - 17:42:00 EST


On Mon, 2019-02-25 at 22:27 +0000, Jon Hunter wrote:
> On 25/02/2019 21:03, Trond Myklebust wrote:
> > On Mon, 2019-02-25 at 20:25 +0000, Jon Hunter wrote:
> > > Hi Trond,
> > >
> > > Starting in next-20190222 I have observed a regression with NFS
> > > causing some of our boards to fail to boot. Bisect points to your
> > > commit ...
> > >
> > > commit 0ffe86f48026b7f34db22d1004bc9992f0db8b33
> > > Author: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > Date: Wed Jan 30 14:51:26 2019 -0500
> > >
> > > SUNRPC: Use poll() to fix up the socket requeue races
> > >
> > >
> > > After reverting this on top of -next I no longer see the problem.
> > > I have not had a chance to look any closer, but wanted to see if
> > > you had any ideas what might be the problem.
> > >
> > > Cheers
> > > Jon
> >
> > What kind of boot is this? UDP or TCP? nfsroot? NFSv3 or NFSv4?
>
> This is nfsroot. I don't specify any particular NFS version from
> the kernel cmdline, but this is seen with ARM kernel configs
> tegra_defconfig and multi_v7_defconfig.
>
> Looking at the logs I am seeing the following crash which appears
> to point to UDP ...
>
> [ 8.032956] Unable to handle kernel NULL pointer dereference at virtual address 00000024
> [ 8.041137] pgd = (ptrval)
> [ 8.043858] [00000024] *pgd=00000000
> [ 8.047437] Internal error: Oops: 5 [#1] SMP ARM
> [ 8.052049] Modules linked in:
> [ 8.055104] CPU: 1 PID: 100 Comm: kworker/u9:2 Not tainted 5.0.0-rc7-next-20190222-g94a4752 #1
> [ 8.063699] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> [ 8.069960] Workqueue: xprtiod xs_udp_data_receive_workfn
> [ 8.075353] PC is at udp_poll+0x30/0x64
> [ 8.079178] LR is at udp_poll+0x10/0x64

Thanks! I see what the issue is now and I'll be fixing it ASAP.

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx