Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU

From: Stephen Hemminger
Date: Tue Apr 14 2009 - 13:19:58 EST


On Tue, 14 Apr 2009 17:49:57 +0200
Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:

> Stephen Hemminger a Ãcrit :
> > On Tue, 14 Apr 2009 16:23:33 +0200
> > Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:
> >
> >> Patrick McHardy a Ãcrit :
> >>> Stephen Hemminger wrote:
> >>>> This is an alternative version of ip/ip6/arp tables locking using
> >>>> per-cpu locks. This avoids the overhead of synchronize_net() during
> >>>> update but still removes the expensive rwlock in earlier versions.
> >>>>
> >>>> The idea for this came from an earlier version done by Eric Duzamet.
> >>>> Locking is done per-cpu, the fast path locks on the current cpu
> >>>> and updates counters. The slow case involves acquiring the locks on
> >>>> all cpu's.
> >>>>
> >>>> The mutex that was added for 2.6.30 in xt_table is unnecessary since
> >>>> there already is a mutex for xt[af].mutex that is held.
> >>>>
> >>>> Tested basic functionality (add/remove/list), but don't have test cases
> >>>> for stress, ip6tables or arptables.
> >>> Thanks Stephen, I'll do some testing with ip6tables.
> >> Here is the patch I cooked on top of Stephen one to get proper locking.
> >
> > I see no demonstrated problem with locking in my version.
>
> Yes, I did not crash any machine around there, should we wait for a bug report ? :)
>
> > The reader/writer race is already handled. On replace the race of
> >
> > CPU 0 CPU 1
> > lock (iptables(1))
> > refer to oldinfo
> > swap in new info
> > foreach CPU
> > lock iptables(i)
> > (spin) unlock(iptables(1))
> > read oldinfo
> > unlock
> > ...
> >
> > The point is my locking works, you just seem to feel more comfortable with
> > a global "stop all CPU's" solution.
>
> Oh right, I missed that xt_replace_table() was *followed* by a get_counters()
> call, but I am pretty sure something is needed in xt_replace_table().
>
> A memory barrier at least (smp_wmb())
>
> As soon as we do "table->private = newinfo;", other cpus might fetch incorrect
> values for newinfo->fields.
>
> In the past, we had a write_lock_bh()/write_unlock_bh() pair that was
> doing this for us.
> Then we had rcu_assign_pointer() that also had this memory barrier implied.
>
> Even if vmalloc() calls we do before calling xt_replace_table() probably
> already force barriers, add one for reference, just in case we change callers
> logic to call kmalloc() instead of vmalloc() or whatever...
>

You are right, doing something with barrier would be safer there.
How about using xchg?

@@ -682,26 +668,19 @@ xt_replace_table(struct xt_table *table,
struct xt_table_info *newinfo,
int *error)
{
- struct xt_table_info *oldinfo, *private;
+ struct xt_table_info *private;

/* Do the substitution. */
- mutex_lock(&table->lock);
private = table->private;
/* Check inside lock: is the old number correct? */
if (num_counters != private->number) {
duprintf("num_counters != table->private->number (%u/%u)\n",
num_counters, private->number);
- mutex_unlock(&table->lock);
*error = -EAGAIN;
return NULL;
}
- oldinfo = private;
- rcu_assign_pointer(table->private, newinfo);
- newinfo->initial_entries = oldinfo->initial_entries;
- mutex_unlock(&table->lock);
-
- synchronize_net();
- return oldinfo;
+ newinfo->initial_entries = private->initial_entries;
+ return xchg(&table->private, newinfo);
}
EXPORT_SYMBOL_GPL(xt_replace_table);

P.s: we all missed the ordering bug in the RCU version??

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/