Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)
From: Erhard Furtner
Date: Mon May 06 2024 - 20:43:21 EST
On Mon, 6 May 2024 07:26:45 -0700
Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> On Sun, 5 May 2024 23:27:13 +0200 Erhard Furtner wrote:
> > > On Sun, 28 Apr 2024 12:53:06 +0200 Erhard Furtner wrote:
> > > > With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at
> > > > net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING:
> > > > CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10
> > > > warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP.
> > > > Happens more often than not (6-7 out of 10 times booting):
> > >
> > > Could you try with LOCKDEP enabled?
> > > I wonder if irqs_disabled() behaves differently than we expect.
> >
> > Ok, after a few tries I got a "BUG: spinlock wrong CPU on CPU#0, swapper/0/1" LOCKDEP hit. But this does not happen every time when I get the netpoll_send WARNING:
>
> Oh, can you try deleting the gem_poll_controller() function?
> Unhook it from ndo_poll_controller and remove it completely.
Ok, this is the resulting diff:
diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 9bd1df8308d2..d3a2fbb14140 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -949,17 +949,6 @@ static irqreturn_t gem_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-#ifdef CONFIG_NET_POLL_CONTROLLER
-static void gem_poll_controller(struct net_device *dev)
-{
-	struct gem *gp = netdev_priv(dev);
-
-	disable_irq(gp->pdev->irq);
-	gem_interrupt(gp->pdev->irq, dev);
-	enable_irq(gp->pdev->irq);
-}
-#endif
-
 static void gem_tx_timeout(struct net_device *dev, unsigned int txqueue)
 {
 	struct gem *gp = netdev_priv(dev);
@@ -2839,9 +2828,6 @@ static const struct net_device_ops gem_netdev_ops = {
 	.ndo_change_mtu = gem_change_mtu,
 	.ndo_validate_addr = eth_validate_addr,
 	.ndo_set_mac_address = gem_set_mac_address,
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	.ndo_poll_controller = gem_poll_controller,
-#endif
 };
 
 static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
And indeed, without gem_poll_controller() I no longer hit the "WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING: CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x44" or the corresponding lockdep bug at boot!
Rebooted the machine about 20 times without anything suspicious showing up in dmesg. With the unpatched kernel I got the WARNING on the 2nd reboot.
What I still get with 'modprobe -v dev_addr_lists_test', even with gem_poll_controller() removed, is:
[...]
KTAP version 1
1..1
KTAP version 1
# Subtest: dev-addr-list-test
# module: dev_addr_lists_test
1..6
====================================
WARNING: kunit_try_catch/1770 still has locks held!
6.9.0-rc6-PMacG4-dirty #5 Tainted: G W N
------------------------------------
1 lock held by kunit_try_catch/1770:
#0: c0dbfce4 (rtnl_mutex){....}-{3:3}, at: dev_addr_test_init+0xbc/0xc8 [dev_addr_lists_test]
stack backtrace:
CPU: 0 PID: 1770 Comm: kunit_try_catch Tainted: G W N 6.9.0-rc6-PMacG4-dirty #5
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[f3749ef0] [c07c2bec] dump_stack_lvl+0x80/0xac (unreliable)
[f3749f10] [c004fe64] do_exit+0x5b4/0x834
[f3749f60] [c006d848] kthread_complete_and_exit+0x0/0x28
[f3749f80] [c006d870] kthread+0x0/0xe8
[f3749fa0] [bebf0cf4] kunit_try_catch_run+0x0/0x15c [kunit]
[f3749fc0] [c006d954] kthread+0xe4/0xe8
[f3749ff0] [c0015304] start_kernel_thread+0x10/0x14
ok 1 dev_addr_test_basic
ok 2 dev_addr_test_sync_one
ok 3 dev_addr_test_add_del
ok 4 dev_addr_test_del_main
ok 5 dev_addr_test_add_set
ok 6 dev_addr_test_add_excl
# dev-addr-list-test: pass:6 fail:0 skip:0 total:6
# Totals: pass:6 fail:0 skip:0 total:6
ok 1 dev-addr-list-test
[...]
Regards,
Erhard