Re: [PATCH net v2 0/5] Make TCP-MD5-diag slightly less broken
From: Dmitry Safonov
Date: Fri Nov 15 2024 - 22:53:12 EST
On Sat, 16 Nov 2024 at 01:58, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Sat, 16 Nov 2024 00:48:17 +0000 Dmitry Safonov wrote:
> > On Sat, 16 Nov 2024 at 00:08, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > > Would it be too ugly if we simply retried with a 32kB skb if the initial
> > > dump failed with EMSGSIZE?
> >
> > Yeah, I'm not sure. I thought of keeping it simple and just marking
> > the nlmsg "inconsistent". This is arguably a change of meaning for
> > NLM_F_DUMP_INTR because previously, it meant that the multi-message
> > dump became inconsistent between recvmsg() calls. And now, it is also
> > utilized in the "do" version if it raced with the socket setsockopts()
> > in another thread.
>
> NLM_F_DUMP_INTR is an interesting idea, but exactly as you say NLM_F_DUMP_INTR
> was a workaround for consistency of the dump as a whole. Single message
> we can re-generate quite easily in the kernel, so forcing the user to
> handle INTR and retry seems unnecessarily cruel ;)
Kind of agree. But then, it seems to be quite rare. Even on a
purposely created selftest it fires not each time (maybe I'm not
skilful enough). Yet somewhat sceptical about a re-try in the kernel:
the need for it is caused by another thread manipulating keys, so we
may need another re-try after the first re-try... So, then we would
have to introduce a limit on retries :D
Hmm, what do you think about a kind of middle-ground/compromise
solution: keeping this NLM_F_DUMP_INTR flag and logic, but making it
hardly ever/never happen by purposely allocating larger skb. I don't
want to set some value in stone as one day it might become not enough
for all different socket infos, but maybe just add 4kB more to the
initial allocation? So, for it to reproduce, another thread would have
to add 4kB/sizeof(tcp_diag_md5sig) = 4kB/100 ~= 40 MD5 keys on the
socket between this thread's skb allocation and filling of the info
array. I'd call it "attempting to be nice to a user, but not at their
busylooping expense".
> > > Just putting the same attribute type multiple times is preferable
> > > to array types.
> >
> > Cool. I didn't know that. I think I was confused by iproute way of
> > parsing [which I read very briefly, so might have misunderstood]:
> > : while (RTA_OK(rta, len)) {
> > : type = rta->rta_type & ~flags;
> > : if ((type <= max) && (!tb[type]))
> > : tb[type] = rta;
> > : rta = RTA_NEXT(rta, len);
> > : }
> > https://github.com/iproute2/iproute2/blob/main/lib/libnetlink.c#L1526
> >
> > which seems like it will just ignore duplicate attributes.
> >
> > That doesn't mean iproute has to dictate new code in kernel, for sure.
>
> Right, the table based parsing doesn't work well with multi-attr,
> but other table formats aren't fundamentally better. Or at least
> I never came up with a good way of solving this. And the multi-attr
> at least doesn't suffer from the u16 problem.
Yeah, also an array of structs that makes it impossible to extend such
an ABI with new members.
And with regards to u16, I was thinking of this diff for net-next, but
was not sure if it's worth it:
diff --git a/lib/nlattr.c b/lib/nlattr.c
index be9c576b6e2d..01c5a49ffa34 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -903,6 +903,9 @@ struct nlattr *__nla_reserve(struct sk_buff *skb,
int attrtype, int attrlen)
{
struct nlattr *nla;
+ DEBUG_NET_WARN_ONCE(attrlen >= U16_MAX,
+ "requested nlattr::nla_len %d >= U16_MAX", attrlen);
+
nla = skb_put(skb, nla_total_size(attrlen));
nla->nla_type = attrtype;
nla->nla_len = nla_attr_size(attrlen);
--->8---
Thanks,
Dmitry