Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

From: Will Deacon
Date: Wed Feb 08 2017 - 04:47:27 EST


On Mon, Feb 06, 2017 at 06:49:42PM +0000, Russell King - ARM Linux wrote:
> On Mon, Feb 06, 2017 at 06:46:19PM +0000, Will Deacon wrote:
> > Converting the smc91x driver over to NAPI would probably solve this problem,
> > but given the "vintage" of this code, I'd be more tempted by a simpler
> > point fix if only I could think of one.
>
> I'm not sure if converting it to NAPI would solve it, or just move
> the problem elsewhere - IOW, move it from "we need to drop the packet
> because we couldn't allocate a skb" to "the hardware dropped the packed
> because the FIFO was full."

That's quite possible. I did a quick hack using a threaded irq handler,
with the thread basically running a modified version of smc_rcv using
GFP_KERNEL allocations. Whilst this improves things significantly, I do
still see rx drops, probably for the reason you mention above.

Still, NAPI should be better than what mainline is currently doing because
it won't continuously interrupt ksoftirqd when in polling mode. It's all
a rather delicate balancing act and getting back to the old behaviour might
not be possible after 4cd13c21b207.

> Yes, I'm intending to give it a go, once I've a spare moment to build
> a kernel for the platform etc. It runs root NFS, so should be a good
> test for it.

Thanks, that would be interesting. We resurrected one of our realview-eb
machines with this NIC, but I think it's all on an FPGA, so the relative
speed of the NIC vs the CPU isn't different enough for us to see the problem.

Will