Re: userns, netns, and quick physical memory consumption by unprivileged user

From: Florian Westphal
Date: Fri Mar 11 2016 - 10:34:15 EST

Yuriy M. Kaminskiy <yumkam@xxxxxxxxx> wrote:
> BTW, all those hash/conntrack/etc default sizes were calculated from
> physical memory size on the assumption that there would be only *one*
> instance of those tables. Obviously, the introduction of network
> namespaces (and especially unprivileged user-ns) threw this assumption
> out the window
> (and here comes that "falling back to vmalloc" message again; in pre-netns
> world, those tables were allocated *once* on early system startup, with
> typically plenty of free and unfragmented memory).
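
To put rough numbers on that multiplication, here is a back-of-the-envelope
sketch. It is only loosely modelled on how a default table size can be
derived from RAM; every constant (bucket size, entry size, cap, ratio) and
both function names are assumptions picked for illustration, not values
taken from this thread or from any particular kernel.

```python
# Illustrative constants only; none of them come from this thread.
BUCKET_BYTES = 8          # assumed: one pointer-sized bucket head (64-bit)
ENTRY_BYTES = 320         # assumed: size of one conntrack entry
MAX_BUCKETS = 16384       # assumed: cap on the default bucket count
ENTRIES_PER_BUCKET = 8    # assumed: max entries allowed per bucket
RAM_DIVISOR = 16384       # assumed: fraction of RAM used for buckets


def default_buckets(ram_bytes):
    """Bucket count derived from RAM size, as if the table were global."""
    buckets = ram_bytes // RAM_DIVISOR // BUCKET_BYTES
    return max(32, min(buckets, MAX_BUCKETS))


def worst_case_bytes(ram_bytes, namespaces):
    """Memory pinned if every namespace fills its own full-size table."""
    buckets = default_buckets(ram_bytes)
    per_ns = buckets * BUCKET_BYTES + buckets * ENTRIES_PER_BUCKET * ENTRY_BYTES
    return per_ns * namespaces


if __name__ == "__main__":
    ram = 4 * 1024**3  # a 4 GiB machine
    for count in (1, 100, 1000):
        mib = worst_case_bytes(ram, count) // 2**20
        print(count, "namespaces:", mib, "MiB")
```

The point is only that a size that is harmless once grows linearly with
the number of namespaces an unprivileged user can create.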

No idea how to fix this except by removing conntrack support in net
namespaces completely.

I'd disallow all write accesses to skb->nfct (NAT, CONNMARK,
CONNSECMARK, ...) and then no longer clear skb->nfct when forwarding
a packet from init_ns to a container.

Containers could then still test conntrack state as seen from the init
namespace's point of view in PREROUTING/FORWARD/INPUT (but not OUTPUT,
obviously).

[ OUTPUT *might* be doable as well by allowing NEW creation in output
but skipping nat and deferring the confirmation/commit of the new
entry to the table until skb leaves initns ]

We could key conntrack entries into the initns conntrack table
instead of adding one new table per netns, but it seems like this only
replaces one problem with a new one (filling/blocking the initns table
from another netns).
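
A toy model of that second problem, purely for illustration: the class,
its method and the limit of 4 entries are all hypothetical and have
nothing to do with the real conntrack code. It only shows that with one
bounded table keyed by (netns, flow), whoever fills it first blocks
everyone else.

```python
class SharedTable:
    """One bounded table for all namespaces, keyed by (netns, flow)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = set()

    def insert(self, netns, flow):
        """Return True if the flow is tracked, False if the table is full."""
        key = (netns, flow)
        if key in self.entries:
            return True
        if len(self.entries) >= self.max_entries:
            return False  # table full: the new flow cannot be tracked
        self.entries.add(key)
        return True


table = SharedTable(max_entries=4)

# A container opens flows until the shared limit is reached.
for flow in range(10):
    table.insert("container", flow)

# The init namespace now fails to get an entry for its own new flow.
blocked = not table.insert("init", "ssh")
print("init namespace blocked:", blocked)
```

Per-netns quotas inside the shared table would avoid this, but that is
bookkeeping beyond what this sketch models.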

Maybe we could go with a compromise and skip/disallow conntrack in
unpriv userns only?