Re: [PATCH nft] nft: memcg accounting for dynamically allocated objects

From: Florian Westphal
Date: Fri Apr 01 2022 - 08:03:53 EST


Vasily Averin <vasily.averin@xxxxxxxxx> wrote:
> nft_*.c files whose NFT_EXPR_STATEFUL flag is set on need to
> use __GFP_ACCOUNT flag for objects that are dynamically
> allocated from the packet path.
>
> Such objects are allocated inside .init() or .clone() callbacks
> of struct nft_expr_ops executed in task context while processing
> netlink messages.

They can also be called from packet path.

> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 04be94236a34..e01241151ef7 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -5447,7 +5447,7 @@ int nft_set_elem_expr_clone(const struct nft_ctx *ctx, struct nft_set *set,
> int err, i, k;
>
> for (i = 0; i < set->num_exprs; i++) {
> - expr = kzalloc(set->exprs[i]->ops->size, GFP_KERNEL);
> + expr = kzalloc(set->exprs[i]->ops->size, GFP_KERNEL_ACCOUNT);
> if (!expr)
> goto err_expr;

This is ok.

> diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
> index 3362417ebfdb..9c2146aac59e 100644
> --- a/net/netfilter/nft_connlimit.c
> +++ b/net/netfilter/nft_connlimit.c
> @@ -77,7 +77,7 @@ static int nft_connlimit_do_init(const struct nft_ctx *ctx,
> invert = true;
> }
>
> - priv->list = kmalloc(sizeof(*priv->list), GFP_KERNEL);
> + priv->list = kmalloc(sizeof(*priv->list), GFP_KERNEL_ACCOUNT);
> if (!priv->list)
> return -ENOMEM;

Same.

> @@ -214,7 +214,7 @@ static int nft_connlimit_clone(struct nft_expr *dst, const struct nft_expr *src)
> struct nft_connlimit *priv_dst = nft_expr_priv(dst);
> struct nft_connlimit *priv_src = nft_expr_priv(src);
>
> - priv_dst->list = kmalloc(sizeof(*priv_dst->list), GFP_ATOMIC);
> + priv_dst->list = kmalloc(sizeof(*priv_dst->list), GFP_ATOMIC | __GFP_ACCOUNT);

This can be called from packet path, via nft_dynset.c.

nft_do_chain -> nft_dynset_eval -> nft_dynset_new ->
nft_dynset_expr_setup -> nft_expr_clone -> src->ops->clone()

> diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
> index f179e8c3b0ca..040a697d96b3 100644
> --- a/net/netfilter/nft_counter.c
> +++ b/net/netfilter/nft_counter.c
> @@ -62,7 +62,7 @@ static int nft_counter_do_init(const struct nlattr * const tb[],
> struct nft_counter __percpu *cpu_stats;
> struct nft_counter *this_cpu;
>
> - cpu_stats = alloc_percpu(struct nft_counter);
> + cpu_stats = alloc_percpu_gfp(struct nft_counter, GFP_KERNEL_ACCOUNT);

This is ok.

> @@ -235,7 +235,7 @@ static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src)
>
> nft_counter_fetch(priv, &total);
>
> - cpu_stats = alloc_percpu_gfp(struct nft_counter, GFP_ATOMIC);
> + cpu_stats = alloc_percpu_gfp(struct nft_counter, GFP_ATOMIC | __GFP_ACCOUNT);
> if (cpu_stats == NULL)
> return -ENOMEM;

Same problem as connlimit, can be called from packet path.
Basically all GFP_ATOMIC are suspicious.

Not sure how to resolve this, similar mechanics in iptables world (e.g.
connlimit or SET target) don't use memcg accounting.

Perhaps for now resend with only the GFP_KERNEL parts converted?
Those are safe.

Insertion from packet path is limited by set->size (element count) only
at this time.