Re: [PATCH] bonding: replace system timer with work queue

From: Jay Vosburgh
Date: Thu Mar 01 2007 - 12:01:00 EST


Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

>On Wed, 28 Feb 2007 10:12:01 +0100 (CET) Jaroslav Kysela <perex@xxxxxxx> wrote:
>> ==================
>> bonding: replace system timer with work queue
>>
>> This patch replaces system timer with work queue in monitor functions.
>> The reason for this change is that bonding handlers calls various
>> sleeping functions from the timer handler which is not allowed.
>
>Which sleeping functions? I'd have expected the kernel to spew runtime
>warnings when this happens, but I don't recall any such reports.

This affects one specific mode (balance-alb) in one specific
case (moving MAC addresses around, which happens during failover or
initialization), and a full fix is more complicated than just a switch
to work queues, although that is part of the full fix. There are three
things going on: calls to sleeping functions with locks held, the same
calls from the timer context, and rtnl hold issues.

The actual functions affected are various things called by
notifier NETDEV_CHANGEADDR callbacks started by dev_set_mac_address() as
well as some of the driver level set_mac_address functions that may
sleep.

Andy Gospodarek <andy@xxxxxxxxxxxxx> and I have been working
jointly on a two phased fix for these problems: he's working up the
short term fix, which includes the changeover to workqueues, and I've
been working on the long term fix, which involves refactoring the
bonding link monitoring and failover system. Jaroslav's patch looks to
be a subset of the patch Andy is working on.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@xxxxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/