Re: net_ns cleanup / RCU overhead

From: Eric W. Biederman
Date: Fri Aug 29 2014 - 17:57:21 EST


Julian Anastasov <ja@xxxxxx> writes:

> Hello,
>
> On Thu, 28 Aug 2014, Simon Kirby wrote:
>
>> I noticed that [kworker/u16:0]'s stack is often:
>>
>> [<ffffffff810942a6>] wait_rcu_gp+0x46/0x50
>> [<ffffffff8109607e>] synchronize_sched+0x2e/0x50
>> [<ffffffffa00385ac>] nf_nat_net_exit+0x2c/0x50 [nf_nat]
>
> I guess the problem is in nf_nat_net_exit,
> may be other nf exit handlers too. pernet-exit handlers
> should avoid synchronize_rcu and rcu_barrier.
> A RCU callback and rcu_barrier in module-exit is the way
> to go. cleanup_net includes rcu_barrier, so pernet-exit
> does not need such calls.

In principle I agree, however in this particular case it looks a bit
tricky because a separate hash table is used to track NAT state per
network namespace.

At the same time, all of the packets should be drained before
we get to nf_nat_net_exit, so it doesn't look like the synchronize_rcu
in nf_nat_net_exit is actually protecting anything.

Further, calling an RCU delay function in net_exit methods largely
destroys the batched cleanup of network namespaces, because every
namespace in the batch then waits for a grace period of its own, so
it is very unpleasant.

Could someone who knows nf_nat_core.c better than I do look and
see if we can just remove the synchronize_rcu in nf_nat_net_exit?

>> [<ffffffff81720339>] ops_exit_list.isra.4+0x39/0x60
>> [<ffffffff817209e0>] cleanup_net+0xf0/0x1a0
>> [<ffffffff81062997>] process_one_work+0x157/0x440
>> [<ffffffff81063303>] worker_thread+0x63/0x520
>> [<ffffffff81068b96>] kthread+0xd6/0xf0
>> [<ffffffff818d412c>] ret_from_fork+0x7c/0xb0
>> [<ffffffffffffffff>] 0xffffffffffffffff

Eric