Re: [ 072/143] tipc: fix lockdep warning during bearer initialization

From: Jon Maloy
Date: Mon May 12 2014 - 12:19:19 EST


This one is obsolete.
tipc_net_lock does not exist in the current code. It was removed in commit
7216cd949c9bd56a4ccd952c624ab68f8c9aa0a4 ("tipc: purge tipc_net_lock lock").

Regards
///jon

On 05/11/2014 08:33 PM, Willy Tarreau wrote:
> 2.6.32-longterm review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Ying Xue <ying.xue@xxxxxxxxxxxxx>
>
> [ Upstream commit 4225a398c1352a7a5c14dc07277cb5cc4473983b ]
>
> When the lockdep validator is enabled, it will report the below
> warning when we enable a TIPC bearer:
>
> [ INFO: possible irq lock inversion dependency detected ]
> ---------------------------------------------------------
> Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(ptype_lock);
>                                local_irq_disable();
>                                lock(tipc_net_lock);
>                                lock(ptype_lock);
>   <Interrupt>
>     lock(tipc_net_lock);
>
> *** DEADLOCK ***
>
> the shortest dependencies between 2nd lock and 1st lock:
> -> (ptype_lock){+.+...} ops: 10 {
> [...]
> SOFTIRQ-ON-W at:
> [<c1089418>] __lock_acquire+0x528/0x13e0
> [<c108a360>] lock_acquire+0x90/0x100
> [<c1553c38>] _raw_spin_lock+0x38/0x50
> [<c14651ca>] dev_add_pack+0x3a/0x60
> [<c182da75>] arp_init+0x1a/0x48
> [<c182dce5>] inet_init+0x181/0x27e
> [<c1001114>] do_one_initcall+0x34/0x170
> [<c17f7329>] kernel_init+0x110/0x1b2
> [<c155b6a2>] kernel_thread_helper+0x6/0x10
> [...]
> ... key at: [<c17e4b10>] ptype_lock+0x10/0x20
> ... acquired at:
> [<c108a360>] lock_acquire+0x90/0x100
> [<c1553c38>] _raw_spin_lock+0x38/0x50
> [<c14651ca>] dev_add_pack+0x3a/0x60
> [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
> [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
> [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
> [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
> [<c148e802>] genl_rcv_msg+0x232/0x280
> [<c148d3f6>] netlink_rcv_skb+0x86/0xb0
> [<c148e5bc>] genl_rcv+0x1c/0x30
> [<c148d144>] netlink_unicast+0x174/0x1f0
> [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
> [<c1456bc1>] sock_aio_write+0x161/0x170
> [<c1135a7c>] do_sync_write+0xac/0xf0
> [<c11360f6>] vfs_write+0x156/0x170
> [<c11361e2>] sys_write+0x42/0x70
> [<c155b0df>] sysenter_do_call+0x12/0x38
> [...]
> }
> -> (tipc_net_lock){+..-..} ops: 4 {
> [...]
> IN-SOFTIRQ-R at:
> [<c108953a>] __lock_acquire+0x64a/0x13e0
> [<c108a360>] lock_acquire+0x90/0x100
> [<c15541cd>] _raw_read_lock_bh+0x3d/0x50
> [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
> [<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
> [<c146a5fa>] __netif_receive_skb+0x22a/0x590
> [<c146ab0b>] netif_receive_skb+0x2b/0xf0
> [<c13c43d2>] pcnet32_poll+0x292/0x780
> [<c146b00a>] net_rx_action+0xfa/0x1e0
> [<c103a4be>] __do_softirq+0xae/0x1e0
> [...]
> }
>
> From the log, we can see three different call chains between
> CPU0 and CPU1:
>
> Time 0 on CPU0:
>
> kernel_init()->inet_init()->dev_add_pack()
>
> At time 0, the ptype_lock is held by CPU0 in dev_add_pack();
>
> Time 1 on CPU1:
>
> tipc_enable_bearer()->enable_bearer()->dev_add_pack()
>
> At time 1, tipc_enable_bearer() first takes tipc_net_lock, and then
> wants to take ptype_lock in order to register the TIPC protocol
> handler with the networking stack. But ptype_lock is already held by
> dev_add_pack() on CPU0, so the dev_add_pack() call running on CPU1
> has to spin, waiting for the lock.
>
> Time 2 on CPU0:
>
> netif_receive_skb()->recv_msg()->tipc_recv_msg()
>
> At time 2, an incoming TIPC packet arrives on CPU0, so
> tipc_recv_msg() is invoked. tipc_recv_msg() first wants to take
> tipc_net_lock, and at that moment the following scenario unfolds:
>
> On CPU0, the sequence of lock acquisition is:
>
> lock(ptype_lock)->lock(tipc_net_lock)
>
> On CPU1, the sequence of lock acquisition is:
>
> lock(tipc_net_lock)->lock(ptype_lock)
>
> Obviously, a deadlock may happen in this case.
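
[To make the inversion concrete, here is a minimal user-space sketch of the
same AB-BA ordering, with two pthread mutexes standing in for ptype_lock and
tipc_net_lock. It is only an analogy -- all names are made up and it cannot
show the softirq side of the real scenario -- but the two threads can deadlock
for exactly the ordering reason described above.]

/* Build with: gcc -pthread abba.c -o abba */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* "ptype_lock"    */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* "tipc_net_lock" */

/* Like the receive path: ptype_lock first, then tipc_net_lock. */
static void *cpu0_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock_a);
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&lock_b);
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

/* Like enable_bearer(): tipc_net_lock first, then ptype_lock. */
static void *cpu1_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock_b);
	usleep(1000);
	pthread_mutex_lock(&lock_a);
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, cpu0_path, NULL);
	pthread_create(&t1, NULL, cpu1_path, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	printf("no deadlock this time\n");
	return 0;
}
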
>
> Note that the deadlock cannot actually occur while the first TIPC
> bearer is being enabled: until enable_bearer() -- running on CPU1 --
> has completed dev_add_pack(), the TIPC receive handler (i.e.
> recv_msg()) is not yet registered, so tipc_recv_msg() cannot be
> invoked on CPU0 even if a TIPC packet arrives. But once a second
> TIPC bearer is registered while the first one is already receiving
> packets, the deadlock can really happen.
>
> To fix it, we push the work of registering the TIPC protocol
> handler into workqueue context. After the change, dev_add_pack() is
> never called while tipc_net_lock is held, and both paths taking
> ptype_lock run in process context, so the deadlock can no longer
> occur.
>
> Signed-off-by: Ying Xue <ying.xue@xxxxxxxxxxxxx>
> Signed-off-by: Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
> Signed-off-by: Willy Tarreau <w@xxxxxx>
> ---
> net/tipc/eth_media.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
> index 524ba56..22453a8 100644
> --- a/net/tipc/eth_media.c
> +++ b/net/tipc/eth_media.c
> @@ -56,6 +56,7 @@ struct eth_bearer {
>  	struct tipc_bearer *bearer;
>  	struct net_device *dev;
>  	struct packet_type tipc_packet_type;
> +	struct work_struct setup;
>  };
>  
>  static struct eth_bearer eth_bearers[MAX_ETH_BEARERS];
> @@ -122,6 +123,17 @@ static int recv_msg(struct sk_buff *buf, struct net_device *dev,
>  }
>  
>  /**
> + * setup_bearer - setup association between Ethernet bearer and interface
> + */
> +static void setup_bearer(struct work_struct *work)
> +{
> +	struct eth_bearer *eb_ptr =
> +		container_of(work, struct eth_bearer, setup);
> +
> +	dev_add_pack(&eb_ptr->tipc_packet_type);
> +}
> +
> +/**
>   * enable_bearer - attach TIPC bearer to an Ethernet interface
>   */
>  
> @@ -157,7 +169,8 @@ static int enable_bearer(struct tipc_bearer *tb_ptr)
>  	eb_ptr->tipc_packet_type.af_packet_priv = eb_ptr;
>  	INIT_LIST_HEAD(&(eb_ptr->tipc_packet_type.list));
>  	dev_hold(dev);
> -	dev_add_pack(&eb_ptr->tipc_packet_type);
> +	INIT_WORK(&eb_ptr->setup, setup_bearer);
> +	schedule_work(&eb_ptr->setup);
>  }
>  
>  /* Associate TIPC bearer with Ethernet bearer */
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/