Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

From: Dave Chinner
Date: Thu Dec 22 2016 - 15:42:49 EST


On Thu, Dec 22, 2016 at 09:24:12AM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2016 at 10:28 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > This sort of thing is normally indicative of a memory reclaim or
> > lock contention problem. Profile showed unusual spinlock contention,
> > but then I realised there was only one kswapd thread running.
> > Yup, sure enough, it's caused by a major change in memory reclaim
> > behaviour:
> >
> > [ 0.000000] Zone ranges:
> > [ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
> > [ 0.000000] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
> > [ 0.000000] Normal [mem 0x0000000100000000-0x000000083fffffff]
> > [ 0.000000] Movable zone start for each node
> > [ 0.000000] Early memory node ranges
> > [ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
> > [ 0.000000] node 0: [mem 0x0000000000100000-0x00000000bffdefff]
> > [ 0.000000] node 0: [mem 0x0000000100000000-0x00000003bfffffff]
> > [ 0.000000] node 0: [mem 0x00000005c0000000-0x00000005ffffffff]
> > [ 0.000000] node 0: [mem 0x0000000800000000-0x000000083fffffff]
> > [ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000083fffffff]
> >
> > the numa=fake=4 CLI option is broken.
>
> Ok, I think that is independent of anything else. Removing block
> people and adding the x86 people.
>
> I'm not seeing anything at all that would change the fake numa stuff,
> but maybe the cpu hotplug changes?
>
> Thomas/Ingo/Peter - Dave is going away for several months, so you
> won't get feedback from him, but can you look at this? Or maybe point
> me towards the right people - I'm seeing no possible relevant changes
> at all for x86 numa since 4.9, so it must be some indirect breakage.
>
> Dave is using fake-numa to do performance testing in a VM, and it's a
> big deal for the node optimizations for writeback etc. Do you have any
> ideas?
>
> Dave, if you're still around, can you send out the kernel config file
> you used...

Looking at this fresh this morning (i.e. not pissed off by having
everything I tried to do fail in different ways all afternoon) I
found this:

$ grep NUMA .config
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
# CONFIG_NUMA is not set
$

The .config I was using for 4.9 was upgraded with 'make oldconfig',
and looking at the result, a bunch of options that I know were set
have been turned off:

# CONFIG_EXPERT is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
# CONFIG_COMPACTION is not set

and options I never use, and so never set, were enabled: kernel crash
dump, a bunch of stuff for AMD CPUs, suspend/resume and power
management debug, every partition type and filesystem under the sun,
heaps of network devices, etc.

So it looks like the problem has occurred during oldconfig, meaning
I have no idea exactly WTF I was testing. Rebuilding now with a
saner config to see what happens.
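
For anyone wanting to check for this kind of silent config drift, a
minimal sketch follows. It assumes a copy of the pre-oldconfig .config
is still around, and config_changes is a hypothetical helper name, not
something from the kernel tree. It treats "# CONFIG_FOO is not set"
and a missing option as CONFIG_FOO=n and prints every option whose
value differs between the two files. The two paths must be different
files.

```shell
# Hypothetical helper (not part of the kernel tree): print every option
# whose value differs between two .config files.
# Usage: config_changes OLD_CONFIG NEW_CONFIG
config_changes() {
    awk '
        {
            # Reduce each option line to a name and a value. A line of
            # the form "# CONFIG_FOO is not set" counts as CONFIG_FOO=n.
            if ($0 ~ /^# CONFIG_[A-Za-z0-9_]+ is not set$/) {
                name = $2
                val = "n"
            } else if ($0 ~ /^CONFIG_[A-Za-z0-9_]+=/) {
                i = index($0, "=")
                name = substr($0, 1, i - 1)
                val = substr($0, i + 1)
            } else {
                next    # plain comments and blank lines
            }
            # The first file operand is the old config.
            if (FILENAME == ARGV[1])
                old[name] = val
            else
                new[name] = val
        }
        END {
            # Options changed or newly enabled in the new config.
            for (n in new) {
                o = (n in old) ? old[n] : "n"
                if (o != new[n])
                    print n ": " o " -> " new[n]
            }
            # Options that were enabled before and have vanished.
            for (n in old)
                if (!(n in new) && old[n] != "n")
                    print n ": " old[n] " -> n"
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

Run as "config_changes config-4.9 .config" (file names here are only
examples), a dropped CONFIG_NUMA would show up as
"CONFIG_NUMA: y -> n". The kernel tree's scripts/diffconfig covers the
same ground more thoroughly; this is just the idea in a few lines.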

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx