RE: netfilter: iptables-restore: setsockopt(3, SOL_IP,IPT_SO_SET_REPLACE, "security...", ...) return -EAGAIN

From: Dexuan Cui
Date: Thu May 13 2021 - 00:19:15 EST

> From: Dexuan Cui
> Sent: Wednesday, May 12, 2021 3:17 PM
> ...
> It looks like there is a race condition between iptables-restore calls
> getsockopt() to get the number of table entries and iptables call
> setsockopt() to replace the entries? Looks like some other program is
> concurrently calling getsockopt()/setsockopt() -- but it looks like this is
> not the case according to the messages I print via trace_printk() around
> do_replace() in do_ipt_set_ctl(): when the -EAGAIN error happens, there is
> no other program calling do_replace(); the table entry number was changed
> to 5 by another program 'iptables' about 1.3 milliseconds ago, and then
> this program 'iptables-restore' calls setsockopt() and the kernel sees
> 'num_counters' being 5 and the 'private->number' being 6 (how can this
> happen??); the next setsockopt() call for the same 'security' table
> happens in about 1 minute with both the numbers being 6.

Well, after tracing the code more closely, it turns out that the program
'iptables' indeed concurrently changes the number from 5 to 6, and this
causes the 'iptable-restores' program to get EAGAIN:

1. iptables-906 (906 is the process ID) calls IPT_SO_GET_ENTRIES and the
current num_entries is 4; the process calls IPT_SO_SET_REPLACE and the
private->number becomes 5.

2. iptables-restor-895 calls IPT_SO_GET_ENTRIES, and gets num_entries==5.

3. iptables-906 calls IPT_SO_GET_ENTRIES again, and also gets num_entries==5;
the process calls IPT_SO_SET_REPLACE and the private->number becomes 6.

4. iptables-restor-895 calls IPT_SO_SET_REPLACE and the kernel function
xt_replace_table() returns -EAGAIN due to num_counters == 5 and
private->number == 6.

I think the latest mainline kernel should also have the same race.
It looks like this by-design race exists since day one?

> BTW, iptables does have a retry mechanism for getsockopt():
> 2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.")
> (
> &ignorews=0&dt=0)
> But it looks like this is enough? e.g. here getsockopt() returns 0, but
> setsockopt() returns -EAGAIN.

It looks like we need to add a retry mechanism to the iptables-restore
program: iptables/iptables-restore.c: ip46tables_restore_main():

if cb->ops->commit(handle) -> ... -> setsockopt(...,IPT_SO_SET_REPLACE, ...)
fails due to EAGAIN, it should start over from the very begining, i.e. call
create_handle() -> handle = cb->ops->init(tablename) again to get the new
num_entries, and retry the commit op. But I'm not sure how to easily
re-create the context associated with the old handle (should/can it re-parse
the rules?), as I'm not really familiar with iptables.

Or, is it possible to fix the race condition from the netfilter module?

-- Dexuan