Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

From: Brian Starkey
Date: Thu Nov 17 2016 - 12:07:27 EST


Hi Eric,

On Thu, Nov 17, 2016 at 07:29:14AM -0800, Eric Dumazet wrote:
On Wed, Nov 16, 2016 at 1:01 PM, Brian Starkey <brian.starkey@xxxxxxx> wrote:
On Wed, Nov 16, 2016 at 10:49:06AM -0800, Eric Dumazet wrote:

On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <brian.starkey@xxxxxxx> wrote:


The smc91x driver does seem to have some trickiness around softirqs.
I'm not familiar with net drivers, but I'll see if I can figure
anything out there.


Oh this code looks ugly :(

Do you have CONFIG_SMP=y or not ?


Yeah CONFIG_SMP=y (and CONFIG_PREEMPT=y too, fwiw).

I did try forcing it into the no-op locking (as though CONFIG_SMP
wasn't set), but it didn't help (and it doesn't look like that would
be safe with CONFIG_PREEMPT=y either).

Looking at this bit in smc_hardware_send_pkt, it seems that skipping
softirq invocation when there's already one running wouldn't give the
same behaviour as before:

	if (!smc_special_trylock(&lp->lock, flags)) {
		netif_stop_queue(dev);
		tasklet_schedule(&lp->tx_task);
		return;
	}

... that said, I've no idea if that matters.
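
To make the concern concrete, here is a rough userspace sketch of the
trylock-or-defer shape in that snippet. It is not the driver's code:
struct fake_dev and all the fake_* helpers are hypothetical names, a C11
atomic_flag stands in for the spinlock, and two booleans stand in for
netif_stop_queue() and tasklet_schedule(). The point it tries to show is
that once the lock is contended, the queue stays stopped until the
deferred handler actually gets to run.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_dev {
	atomic_flag lock;	/* models lp->lock */
	bool queue_stopped;	/* models the effect of netif_stop_queue() */
	bool tx_deferred;	/* models a pending tasklet_schedule() */
	int sent;		/* packets handed to the "hardware" */
};

static void fake_dev_init(struct fake_dev *d)
{
	atomic_flag_clear(&d->lock);
	d->queue_stopped = false;
	d->tx_deferred = false;
	d->sent = 0;
}

/* Returns true if the packet went out now, false if it was deferred. */
static bool fake_send_pkt(struct fake_dev *d)
{
	/* test_and_set returns the old value: true means already locked. */
	if (atomic_flag_test_and_set(&d->lock)) {
		d->queue_stopped = true;
		d->tx_deferred = true;
		return false;
	}
	d->sent++;
	atomic_flag_clear(&d->lock);
	return true;
}

/*
 * Models the deferred handler. Until something calls this, the queue
 * stays stopped, which is why the timing of deferred work matters.
 */
static void fake_run_deferred(struct fake_dev *d)
{
	if (!d->tx_deferred)
		return;
	while (atomic_flag_test_and_set(&d->lock))
		;	/* spin until the lock holder lets go */
	d->tx_deferred = false;
	d->sent++;
	atomic_flag_clear(&d->lock);
	d->queue_stopped = false;
}
```

In this model, a caller that hits the contended path sees nothing sent
and a stopped queue until fake_run_deferred() is invoked, however long
that takes.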

Of course I also don't know if the network driver is even to blame :-(


I believe the problem is in SMC_WAIT_MMU_BUSY()

Could you try this patch ? (inlined and attached)


No joy with this patch :-(

I had to add an ioaddr argument because apparently that macro depends
on local context (yuck...), but it doesn't help my issue.
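
For anyone wondering what "depends on local context" means here, a
small illustration with made-up names (FAKE_READ_IMPLICIT,
FAKE_READ_EXPLICIT and fake_regs are hypothetical, not the driver's
macros): the first macro only compiles where a variable called ioaddr
happens to be in scope, while the second takes the base as an argument.

```c
#include <stdint.h>

/* Hypothetical register file standing in for the device's I/O window. */
static uint16_t fake_regs[4];

/* Implicit form: silently expands to a use of a caller-local `ioaddr`. */
#define FAKE_READ_IMPLICIT(reg)		(ioaddr[(reg)])

/* Explicit form: the base address is passed in, nothing is assumed. */
#define FAKE_READ_EXPLICIT(base, reg)	((base)[(reg)])

static uint16_t read_with_local(int reg)
{
	/* The local must be named exactly `ioaddr` or the macro won't build. */
	uint16_t *ioaddr = fake_regs;

	return FAKE_READ_IMPLICIT(reg);
}

static uint16_t read_with_arg(uint16_t *base, int reg)
{
	return FAKE_READ_EXPLICIT(base, reg);
}
```

Both functions read the same register; the difference is only in which
one can be called from a function that has no ioaddr local.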

FWIW I don't see any timeouts, either with or without the patch.
(I don't know for sure, but I would guess that the model of the
network card doesn't emulate whatever stall that loop is checking for.
It probably just completes all MMU operations immediately.)
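
As a sketch of why a model like that would never trip the timeout, here
is a bounded busy-wait in plain userspace C. Everything in it is an
assumption for illustration: FAKE_MC_BUSY, wait_not_busy() and the two
read callbacks are hypothetical, and a poll count stands in for the
time-based limit a real driver would use.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_MC_BUSY 0x0001u	/* hypothetical "MMU busy" bit */

typedef uint16_t (*read_cmd_fn)(void *ctx);

/*
 * Poll until the busy bit clears or max_polls busy reads have been seen.
 * Returns true on timeout; *polls_used reports how many busy reads occurred.
 */
static bool wait_not_busy(read_cmd_fn read_cmd, void *ctx,
			  unsigned max_polls, unsigned *polls_used)
{
	unsigned n = 0;

	while (read_cmd(ctx) & FAKE_MC_BUSY) {
		if (++n >= max_polls) {
			*polls_used = n;
			return true;
		}
	}
	*polls_used = n;
	return false;
}

/* A device model that completes every operation immediately. */
static uint16_t read_never_busy(void *ctx)
{
	(void)ctx;
	return 0;
}

/* A device model that reports busy for the first *ctx reads. */
static uint16_t read_busy_for(void *ctx)
{
	unsigned *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return FAKE_MC_BUSY;
	}
	return 0;
}
```

With read_never_busy() the loop body never executes, so neither the
wait nor its timeout path can be observed, which would match seeing no
timeouts with or without the patch.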

-Brian