RE: netfilter: iptables-restore: setsockopt(3, SOL_IP,IPT_SO_SET_REPLACE, "security...", ...) return -EAGAIN
From: Dexuan Cui
Date: Thu May 13 2021 - 00:19:15 EST
> From: Dexuan Cui
> Sent: Wednesday, May 12, 2021 3:17 PM
> ...
> It looks like there is a race condition between iptables-restore calls
> getsockopt() to get the number of table entries and iptables call
> setsockopt() to replace the entries? Looks like some other program is
> concurrently calling getsockopt()/setsockopt() -- but it looks like this is
> not the case according to the messages I print via trace_printk() around
> do_replace() in do_ipt_set_ctl(): when the -EAGAIN error happens, there is
> no other program calling do_replace(); the table entry number was changed
> to 5 by another program 'iptables' about 1.3 milliseconds ago, and then
> this program 'iptables-restore' calls setsockopt() and the kernel sees
> 'num_counters' being 5 and the 'private->number' being 6 (how can this
> happen??); the next setsockopt() call for the same 'security' table
> happens in about 1 minute with both the numbers being 6.
Well, after tracing the code more closely, it turns out that the program
'iptables' indeed concurrently changes the number from 5 to 6, and this
causes the 'iptable-restores' program to get EAGAIN:
1. iptables-906 (906 is the process ID) calls IPT_SO_GET_ENTRIES and the
current num_entries is 4; the process calls IPT_SO_SET_REPLACE and the
private->number becomes 5.
2. iptables-restor-895 calls IPT_SO_GET_ENTRIES, and gets num_entries==5.
3. iptables-906 calls IPT_SO_GET_ENTRIES again, and also gets num_entries==5;
the process calls IPT_SO_SET_REPLACE and the private->number becomes 6.
4. iptables-restor-895 calls IPT_SO_SET_REPLACE and the kernel function
xt_replace_table() returns -EAGAIN due to num_counters == 5 and
private->number == 6.
I think the latest mainline kernel should also have the same race.
It looks like this by-design race exists since day one?
> BTW, iptables does have a retry mechanism for getsockopt():
> 2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.")
> (https://git.netfilter.org/iptables/commit/libiptc?id=2f93205b375e&context=10
> &ignorews=0&dt=0)
>
> But it looks like this is enough? e.g. here getsockopt() returns 0, but
> setsockopt() returns -EAGAIN.
It looks like we need to add a retry mechanism to the iptables-restore
program: iptables/iptables-restore.c: ip46tables_restore_main():
if cb->ops->commit(handle) -> ... -> setsockopt(...,IPT_SO_SET_REPLACE, ...)
fails due to EAGAIN, it should start over from the very begining, i.e. call
create_handle() -> handle = cb->ops->init(tablename) again to get the new
num_entries, and retry the commit op. But I'm not sure how to easily
re-create the context associated with the old handle (should/can it re-parse
the rules?), as I'm not really familiar with iptables.
Or, is it possible to fix the race condition from the netfilter module?
Thanks,
-- Dexuan