Re: [PATCH net/next] bridge:Add rcu read lock when delete br port

From: Michael S. Tsirkin
Date: Tue Aug 05 2014 - 06:12:55 EST


On Tue, Aug 05, 2014 at 12:43:09AM +0000, Lichunhe wrote:
> >-----Original Message-----
> >From: Michael S. Tsirkin [mailto:mst@xxxxxxxxxx]
> >Sent: Monday, August 04, 2014 8:41 PM
> >To: Lichunhe
> >Cc: vyasevic@xxxxxxxxxx; xiyou.wangcong@xxxxxxxxx;
> >makita.toshiaki@xxxxxxxxxxxxx; stephen@xxxxxxxxxxxxxxxxxx;
> >ebiederm@xxxxxxxxxxxx; f.fainelli@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >Wuyunfei; Qianhuibin (Huibin QIAN, Euler)
> >Subject: Re: [PATCH net/next] bridge:Add rcu read lock when delete br port
> >
> >On Mon, Aug 04, 2014 at 11:37:56AM +0800, lichunhe@xxxxxxxxxx wrote:
> >> From: Chunhe Li <lichunhe@xxxxxxxxxx>
> >>
> >> In the br_hanle_frame function has a bug, when the bridge receive
> >> packets which go througth the br_handle_frame, get the net_bridge_port
> >> pointer "p", but don't check NULL pointer to use it. If somebody
> >> delete the bridge port at the same time, will call a NULL pointer,
> >> trigger kernel panic. I see the del_nbp comments, call del_nbp should via RCU,
> >but the caller don't do this.
> >
> >I don't see such a comment there.
> >
> >Are you talking about this line:
> > p = br_port_get_rcu(skb->dev);
> >
>
> Yes, this var "p" is NULL when the bug happened.
>
> >this is actually rx_handler_data.
> >The reason it should not be NULL is
> >explained here:
> >
> >void netdev_rx_handler_unregister(struct net_device *dev) {
> >
> > ASSERT_RTNL();
> > RCU_INIT_POINTER(dev->rx_handler, NULL);
> > /* a reader seeing a non NULL rx_handler in a rcu_read_lock()
> > * section has a guarantee to see a non NULL rx_handler_data
> > * as well.
> > */
> > synchronize_net();
> > RCU_INIT_POINTER(dev->rx_handler_data, NULL); }
> >EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
> >
> >
> >
> >> following steps will make bug happened 1.start vm and add the vm
> >> interface to a bridge br0,for example, brctl addbr br0 tap0
> >>
> >> 2.configuer vm interface and br0 same ip subnet, vm ping br0.
> >>
> >> 3.add and delete the vm interface port for endless loop.
> >>
> >> Signed-off-by: Chunhe Li <lichunhe@xxxxxxxxxx>
> >
> >OK but apparently something else triggered the bug here.
> >It might be a good idea to enable lockdep and rcu checks see if anything
> >suspicious is reported.
> >
> >
> >> ---
> >> net/bridge/br_if.c | 4 ++++
> >> 1 file changed, 4 insertions(+)
> >>
> >> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index
> >> 3eca3fd..91c611d 100644
> >> --- a/net/bridge/br_if.c
> >> +++ b/net/bridge/br_if.c
> >> @@ -274,9 +274,11 @@ void br_dev_delete(struct net_device *dev, struct
> >list_head *head)
> >> struct net_bridge *br = netdev_priv(dev);
> >> struct net_bridge_port *p, *n;
> >>
> >> + rcu_read_lock();
> >> list_for_each_entry_safe(p, n, &br->port_list, list) {
> >> del_nbp(p);
> >> }
> >> + rcu_read_unlock();
> >>
> >> br_fdb_delete_by_port(br, NULL, 1);
> >>
> >> @@ -550,7 +552,9 @@ int br_del_if(struct net_bridge *br, struct net_device
> >*dev)
> >> * there still maybe an alternate path for netconsole to use;
> >> * therefore there is no reason for a NETDEV_RELEASE event.
> >> */
> >> + rcu_read_lock();
> >> del_nbp(p);
> >> + rcu_read_unlock();
> >>
> >> spin_lock_bh(&br->lock);
> >> changed_addr = br_stp_recalculate_bridge_id(br);
> >
> >
> >Does the problem disappear with this applied?
> >I don't see how this would help. rcu locks do not synchronize against other
> >readers.
> >
> >
>
> Maybe I understand wrong, please ignore this patch, do you have better way to solve this problem?

As I wrote above, need to understand the problem first, we can't fix it
otherwise.

It seems possible that what you are seeing is use after free or an RCU
error. Enable as many debugging options in kernel build (under kernel
hacking) as you can and see if anything comes up in kernel log.

> >> --
> >> 1.9.2.0
> >>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/