Re: [PATCH 4.9 50/71] inet: frags: use rhashtables for reassembly units

From: Greg Kroah-Hartman
Date: Thu Nov 29 2018 - 07:54:27 EST


On Fri, Oct 26, 2018 at 03:39:47PM +0200, Stefan Schmidt wrote:
> Hello Greg.
>
> [Hope I am not too late for this]
>
> On 16/10/2018 19:09, Greg Kroah-Hartman wrote:
> > 4.9-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Eric Dumazet <edumazet@xxxxxxxxxx>
> >
> > Some applications still rely on IP fragmentation, and to be fair, the Linux
> > reassembly unit does not work under any serious load.
> >
> > It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
> >
> > A work queue is supposed to garbage collect items when the host is under memory
> > pressure, and to do a hash rebuild, changing the seed used in hash computations.
> >
> > This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
> > occurring every 5 seconds if host is under fire.
> >
> > Then there is the problem of sharing this hash table for all netns.
> >
> > It is time to switch to rhashtables, and allocate one of them per netns
> > to speed up netns dismantle, since this is a critical metric these days.
> >
> > Lookup is now using RCU. A followup patch will even remove
> > the refcount hold/release left from prior implementation and save
> > a couple of atomic operations.
> >
> > Before this patch, 16 cpus (16 RX queue NIC) could not handle more
> > than 1 Mpps frags DDOS.
> >
> > After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
> > of storage for the fragments (exact number depends on frags being evicted
> > after timeout)
> >
> > $ grep FRAG /proc/net/sockstat
> > FRAG: inuse 1966916 memory 2140004608
> >
> > A followup patch will change the limits for 64bit arches.
> >
> > Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Cc: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
> > Cc: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> > Cc: Florian Westphal <fw@xxxxxxxxx>
> > Cc: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> > Cc: Alexander Aring <alex.aring@xxxxxxxxx>
> > Cc: Stefan Schmidt <stefan@xxxxxxxxxxxxxxx>
> > Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
> > (cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
> > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > ---
> >  Documentation/networking/ip-sysctl.txt  |   7
> >  include/net/inet_frag.h                 |  81 +++----
> >  include/net/ipv6.h                      |  16 -
> >  net/ieee802154/6lowpan/6lowpan_i.h      |  26 --
> >  net/ieee802154/6lowpan/reassembly.c     |  91 +++-----
> >  net/ipv4/inet_fragment.c                | 349 ++++++--------------------------
> >  net/ipv4/ip_fragment.c                  | 112 ++++------
> >  net/ipv6/netfilter/nf_conntrack_reasm.c |  51 +---
> >  net/ipv6/reassembly.c                   | 110 ++++------
> > 9 files changed, 267 insertions(+), 576 deletions(-)
> >
>
> When this patch hit master a while back we had to address a regression
> in the ieee802154 6lowpan layer. It seems this fix is missing in the
> backport series (only looking at your patchset here, not the full tree).
>
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=f18fa5de5ba7f1d6650951502bb96a6e4715a948
>
> I would appreciate it if you could pull this into this series as well.

Now queued up for 4.14 and 4.9 as well, thanks.

greg k-h