Re: warn: Turn the netdev timeout WARN_ON() into a WARN()

From: Jeff Garzik
Date: Tue Sep 16 2008 - 23:27:19 EST


On Wed, Sep 17, 2008 at 02:59:12AM +0000, Linux Kernel Mailing List wrote:
>
> this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(),
> so that the device and driver names are inside the warning message.
> This helps automated tools like kerneloops.org to collect the data
> and do statistics, as well as making it more likely that humans
> cut-n-paste the important message as part of a bugreport.
>
> Signed-off-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>
> +#define WARN_ONCE(condition, format...) ({ \
> + static int __warned; \
> + int __ret_warn_once = !!(condition); \
> + \
> + if (unlikely(__ret_warn_once)) \
> + if (WARN(!__warned, format)) \
> + __warned = 1; \
> + unlikely(__ret_warn_once); \
> +})
> +
> #define WARN_ON_RATELIMIT(condition, state) \
> WARN_ON((condition) && __ratelimit(state))
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 9634091..ec0a083 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -215,10 +215,9 @@ static void dev_watchdog(unsigned long arg)
> time_after(jiffies, (dev->trans_start +
> dev->watchdog_timeo))) {
> char drivername[64];
> - printk(KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
> + WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
> dev->name, netdev_drivername(dev, drivername, 64));
> dev->tx_timeout(dev);
> - WARN_ON_ONCE(1);


hrm, am I misunderstanding?

AFAICS, this change means the user is no longer notified [after
the first time] of a condition they really need to know about --
a hardware or driver bug.
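
To make the concern concrete, here is a minimal userspace sketch of the
once-only behaviour. It is not the kernel code. Assumptions: WARN() is
replaced by a stand-in that only prints to stderr and counts messages
(no backtrace, no KERN_INFO prefix), and the helper names, "eth0" and
"exampledrv" are hypothetical. It uses the GCC statement-expression
extension, as the quoted macro does.

```c
#include <assert.h>
#include <stdio.h>

/* Counts messages that actually reach the "log" (stderr here). */
static int messages_logged;

/* Stand-in for the kernel's WARN(): print the message when the condition
 * is true and evaluate to the condition. The real macro also dumps a
 * backtrace, which is left out of this sketch. */
#define WARN(condition, ...) ({                     \
    int ret_warn = !!(condition);                   \
    if (ret_warn) {                                 \
        messages_logged++;                          \
        fprintf(stderr, __VA_ARGS__);               \
    }                                               \
    ret_warn;                                       \
})

/* Same shape as the WARN_ONCE() in the quoted patch: the static flag is
 * per call site, so the message is emitted on the first hit only. */
#define WARN_ONCE(condition, ...) ({                \
    static int warned;                              \
    int ret_warn_once = !!(condition);              \
    if (ret_warn_once)                              \
        if (WARN(!warned, __VA_ARGS__))             \
            warned = 1;                             \
    ret_warn_once;                                  \
})

/* Hypothetical helper: simulate n watchdog timeouts going through the
 * patched call site and return how many messages were logged. */
static int timeouts_logged_after_patch(int n)
{
    int before = messages_logged;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        (void)WARN_ONCE(1, "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
                        "eth0", "exampledrv");
    return messages_logged - before;
}

/* Hypothetical helper: the pre-patch shape, where the message is a plain
 * print on every timeout (the separate once-only backtrace is omitted). */
static int timeouts_logged_before_patch(int n)
{
    int logged = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        fprintf(stderr, "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
                "eth0", "exampledrv");
        logged++;
    }
    return logged;
}
```

In this sketch, timeouts_logged_after_patch(3) logs one message on its
first call and none on any later call, because the static flag persists
for the lifetime of the program. timeouts_logged_before_patch(3) logs
three.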

These conditions can occur many hours or days apart, and the admin
needs to know EACH time it occurs, because it is a major networking
event, generally leading to a complete reset of the entire hardware.

And quite honestly, the backtrace is not useful (yes, even the one
that existed previously)... THINK for a second. The backtrace
is going to look exactly the same, since it is a timer-triggered
dev_watchdog() call.

NETDEV WATCHDOG timeouts are not easily fixable errors like lockdep
warnings, and the admin really does need to see each one.

Unless I am missing something, (1) this patch should be reverted,
and in addition, (2) I recommend removing the WARN_ON_ONCE()
because the backtrace is not helpful.

Jeff


