Re: net: kernel BUG() in net/netns/generic.h:45

From: Eric W. Biederman
Date: Thu Apr 05 2012 - 19:49:37 EST


Sasha Levin <levinsasha928@xxxxxxxxx> writes:

> Hi all,
>
> When an initialization of a network namespace in setup_net() fails, we
> try to undo everything by executing each of the exit callbacks of every
> namespace in the network.
>
> The problem is, it might be possible that the net_generic array wasn't
> initialized before we fail and try to undo everything. At that point,
> some of the networks assume that since we're already calling the exit
> callback, the net_generic structure is initialized and we hit the BUG()
> in net/netns/generic.h:45 .
>
> I'm not quite sure whether the right fix from the following three
> options is, and would be happy to figure it out before fixing it:
>
> 1. Don't assume net_generic was initialized in the exit callback, which
> is a bit problematic since we can't query that nicely anyway (a
> sub-option here would be adding an API to query whether the net_generic
> structure is initialized.
>
> 2. Remove the BUG(), switch it to a WARN() and let each subsystem
> handle the case of NULL on it's own. While it sounds a bit wrong, it's
> worth mentioning that that BUG() was initially added in an attempt to
> fix an issue in CAIF, which was fixed in a completely different way
> afterwards, so it's not strictly necessary here.
>
> 3. Only call the exit callback for subsystems we have called the init
> callback for.

Your option 3 only calling the exit callbacks for subsystems we have
initialized should be what is implemented. Certainly it looks like we
are attempting to only call the exit callbacks for code whose init
callback has succeeded.

What problem are you seeing?

This smells suspiciously like a problem we had a little while ago caif
was registering as a pernet device instead of a pernet subsystem,
and because of that we had packets flying around after it had been
unregistered and was trying access it's net_generic data.

commit 8a8ee9aff6c3077dd9c2c7a77478e8ed362b96c6
Author: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
Date: Thu Jan 26 14:04:53 2012 +0000

net caif: Register properly as a pernet subsystem.

caif is a subsystem and as such it needs to register with
register_pernet_subsys instead of register_pernet_device.

Among other problems using register_pernet_device was resulting in
net_generic being called before the caif_net structure was allocated.
Which has been causing net_generic to fail with either BUG_ON's or by
return NULL pointers.

A more ugly problem that could be caused is packets in flight why the
subsystem is shutting down.

To remove confusion also remove the cruft cause by inappropriately
trying to fix this bug.

With the aid of the previous patch I have tested this patch and
confirmed that using register_pernet_subsys makes the failure go away as
it should.

Signed-off-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
Acked-by: Sjur BrÃndeland <sjur.brandeland@xxxxxxxxxxxxxx>
Tested-by: Sasha Levin <levinsasha928@xxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/