Re: sungem hangs in atomic if netconsole enabled but no carrier

From: Francois Romieu
Date: Tue Dec 20 2005 - 17:19:43 EST


Johannes Berg <johannes@xxxxxxxxxxxxxxxx> :
[...]
> think it should not hang the system completely. So far I haven't been
> able to figure out where it actually hangs and don't even know how to do
> so -- I'm open for suggestions on how to find out why/where it hangs or
> even fixes.

See the thread "Netconsole violates dev->hard_start_xmit synch rules",
started on 06/09/2005 on netdev@xxxxxxxxxxxxxxx, for some interesting
background.

(the innocent hero slowly fades into the swamps of netpolling...)

Still with us?

Were you using sundance.c, you would probably bug on the first transmit
timeout, since both paths below take dev->xmit_lock:

[net/sched/sch_generic.c]
static void dev_watchdog(unsigned long arg)
{
        struct net_device *dev = (struct net_device *)arg;

        spin_lock(&dev->xmit_lock);
        ^^^^^^^^^
        if (dev->qdisc != &noop_qdisc) {
                if (netif_device_present(dev) &&
                    netif_running(dev) &&
                    netif_carrier_ok(dev)) {
                        if (netif_queue_stopped(dev) &&
                            (jiffies - dev->trans_start) > dev->watchdog_timeo) {
                                printk(KERN_INFO "NETDEV WATCHDOG: %s: transmit timed out\n", dev->name);
                                dev->tx_timeout(dev);
                                ^^^^^^^^^^^^^^^

[net/core/netpoll.c]
static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
{
        int status;
        struct netpoll_info *npinfo;

        if (!np || !np->dev || !netif_running(np->dev)) {
                __kfree_skb(skb);
                return;
        }

        npinfo = np->dev->npinfo;

        /* avoid recursion */
        if (npinfo->poll_owner == smp_processor_id() ||
            np->dev->xmit_lock_owner == smp_processor_id()) {
                if (np->drop)
                        np->drop(skb);
                else
                        __kfree_skb(skb);
                return;
        }

        do {
                npinfo->tries--;
                spin_lock(&np->dev->xmit_lock);
                ^^^^^^^^^

A quick glance shows no netif_carrier_{on/off} in the sundance driver.
It would be a good candidate.
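
For those who prefer to see it fail in user space, here is a minimal sketch
of that scenario. It is not kernel code: toy_dev, toy_watchdog() and
toy_netpoll_send() are hypothetical names, the lock is a plain flag, and a
"would deadlock" return value stands in for the spin so the program reports
the problem instead of hanging. It assumes, as the excerpts above suggest,
that the watchdog takes xmit_lock without recording itself in
xmit_lock_owner, so the recursion check in netpoll_send_skb() does not fire.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for the struct net_device fields involved. */
struct toy_dev {
        bool xmit_locked;       /* models dev->xmit_lock */
        int xmit_lock_owner;    /* models dev->xmit_lock_owner */
        bool carrier_ok;        /* models netif_carrier_ok(dev) */
};

enum toy_result { TOY_IDLE, TOY_SENT, TOY_DROPPED, TOY_WOULD_DEADLOCK };

/* Models netpoll_send_skb(): recursion check first, then take the lock. */
static enum toy_result toy_netpoll_send(struct toy_dev *dev, int cpu)
{
        if (dev->xmit_lock_owner == cpu)
                return TOY_DROPPED;     /* the "avoid recursion" branch */
        /*
         * Single-threaded model: if the lock is held here, the holder is
         * further up our own call stack, so a real spin_lock() never returns.
         */
        if (dev->xmit_locked)
                return TOY_WOULD_DEADLOCK;
        dev->xmit_locked = true;
        dev->xmit_lock_owner = cpu;
        /* hard_start_xmit() would run here */
        dev->xmit_lock_owner = NO_OWNER;
        dev->xmit_locked = false;
        return TOY_SENT;
}

/*
 * Models dev_watchdog() with the queue-stopped and timeout tests assumed
 * true. record_owner = false mirrors the excerpt; true is a hypothetical
 * variant that also sets the owner field.
 */
static enum toy_result toy_watchdog(struct toy_dev *dev, int cpu,
                                    bool record_owner)
{
        enum toy_result res = TOY_IDLE;

        assert(!dev->xmit_locked);
        dev->xmit_locked = true;
        if (record_owner)
                dev->xmit_lock_owner = cpu;
        /* tx_timeout() -> printk -> netconsole -> netpoll_send_skb() */
        if (dev->carrier_ok)
                res = toy_netpoll_send(dev, cpu);
        dev->xmit_lock_owner = NO_OWNER;
        dev->xmit_locked = false;
        return res;
}
```

With the carrier reported up and no owner recorded, toy_watchdog() returns
TOY_WOULD_DEADLOCK. With the carrier off it returns TOY_IDLE, which is the
protection a driver gets from calling netif_carrier_off().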

However you are using sungem.c and despite the fact that I should really
have something for dinner *now*, you are protected by netif_carrier_off.

But (drums roll):

[drivers/net/sungem.c]
#define DEFAULT_MSG (NETIF_MSG_DRV | \
NETIF_MSG_PROBE | \
NETIF_MSG_LINK)

Thus gem_link_timer() will periodically complain that the link is down.

With netconsole enabled, each of those messages goes out through the
device itself, so gem_start_xmit() is called.

Repeat until the TX ring is full: netif_stop_queue() is called.

gem_link_timer() printks again.

net/core/netpoll.c::netpoll_send_skb() notices that the queue is stopped
and decides to try the usual NAPI poll(). A few function calls later, the
driver ends up in drivers/net/sungem.c::gem_poll(), where it takes so many
(irq-)locks that I do not even want to verify that it has a chance
to play nice with the pending gem_link_timer().
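
The first half of that sequence, up to the stopped queue, can be sketched in
user space. This is a toy model under stated assumptions, not sungem code:
toy_ring, toy_start_xmit(), toy_tx_complete(), toy_link_timer_ticks() and
TOY_RING_SIZE are all hypothetical, and the model assumes that TX
descriptors are never reclaimed while the link is down.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE 4         /* hypothetical; a real ring is larger */

struct toy_ring {
        int in_flight;          /* descriptors queued, not yet reclaimed */
        bool queue_stopped;     /* models netif_queue_stopped() */
        bool carrier_ok;        /* models netif_carrier_ok() */
};

/* Models the driver's xmit routine: true if the packet was queued. */
static bool toy_start_xmit(struct toy_ring *r)
{
        if (r->queue_stopped)
                return false;
        assert(r->in_flight < TOY_RING_SIZE);
        r->in_flight++;
        if (r->in_flight == TOY_RING_SIZE)
                r->queue_stopped = true;        /* netif_stop_queue() */
        return true;
}

/* Models TX completion, assumed to make progress only with a link. */
static void toy_tx_complete(struct toy_ring *r)
{
        if (!r->carrier_ok || r->in_flight == 0)
                return;
        r->in_flight--;
        r->queue_stopped = false;               /* netif_wake_queue() */
}

/*
 * Models the link timer firing 'ticks' times, each tick producing one
 * netconsole packet. Returns how many messages found the queue stopped.
 */
static int toy_link_timer_ticks(struct toy_ring *r, int ticks)
{
        int blocked = 0;

        for (int i = 0; i < ticks; i++) {
                toy_tx_complete(r);
                if (!toy_start_xmit(r))
                        blocked++;
        }
        return blocked;
}
```

With carrier_ok false, the first TOY_RING_SIZE ticks fill the ring and every
later message finds the queue stopped; that is the point where the real
netpoll_send_skb() falls back to polling the driver.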

--
Ueimor