Re: [RFCv2 1/9] tcp: authopt: Initial support and key management

From: David Ahern
Date: Wed Aug 11 2021 - 13:16:02 EST


On 8/11/21 8:31 AM, Dmitry Safonov wrote:
> On 8/11/21 9:29 AM, Leonard Crestez wrote:
>> On 8/10/21 11:41 PM, Dmitry Safonov wrote:
> [..]
>>>> +       u32 flags;
>>>> +       /* Wire identifiers */
>>>> +       u8 send_id, recv_id;
>>>> +       u8 alg_id;
>>>> +       u8 keylen;
>>>> +       u8 key[TCP_AUTHOPT_MAXKEYLEN];
>>>> +       struct rcu_head rcu;
>>>
>>> This is unaligned and will add padding.
>>
>> Not clear padding matters. Moving rcu_head higher would avoid it, is
>> that what you're suggesting.
>
> Yes.
>
>>> I wonder if it's also worth saving some bytes by something like
>>> struct tcp_ao_key *key;
>>>
>>> With
>>> struct tcp_ao_key {
>>>        u8 keylen;
>>>        u8 key[0];
>>> };
>>>
>>> Hmm?
>>
>> This increases memory management complexity for very minor gains. Very
>> few tcp_authopt_key will ever be created.
>
> The change doesn't seem to be big, like:
> --- a/net/ipv4/tcp_authopt.c
> +++ b/net/ipv4/tcp_authopt.c
> @@ -422,8 +422,16 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t
> optval, unsig>
> key_info = __tcp_authopt_key_info_lookup(sk, info, opt.local_id);
> if (key_info)
> tcp_authopt_key_del(sk, info, key_info);
> +
> + key = sock_kmalloc(sk, sizeof(*key) + opt.keylen, GFP_KERNEL |
> __GFP_ZERO);
> + if (!key) {
> + tcp_authopt_alg_release(alg);
> + return -ENOMEM;
> + }
> +
> key_info = sock_kmalloc(sk, sizeof(*key_info), GFP_KERNEL |
> __GFP_ZERO);
> if (!key_info) {
> + sock_kfree_s(sk, key, sizeof(*key) + opt.keylen);
> tcp_authopt_alg_release(alg);
> return -ENOMEM;
> }
>
> I don't know, probably it'll be enough for every user to limit their
> keys by length of 80, but if one day it won't be enough - this ABI will
> be painful to extend.
>
> [..]
>>>> +#define TCP_AUTHOPT                    38      /* TCP Authentication
>>>> Option (RFC2385) */
>>>> +#define TCP_AUTHOPT_KEY                39      /* TCP Authentication
>>>> Option update key (RFC2385) */
>>>
>>> RFC2385 is md5 one.
>>> Also, should there be TCP_AUTHOPT_ADD_KEY, TCP_AUTHOPT_DELTE_KEY
>>> instead of using flags inside setsocketopt()? (no hard feelings)
>>
>> Fixed RFC reference.
>>
>> TCP_AUTHOPT_DELETE_KEY could be clearer, I just wanted to avoid bloating
>> the sockopt space. But there's not any scarcity.
>>
>> For reference tcp_md5 handles key deletion based on keylen == 0. This
>> seems wrong to me, an empty key is in fact valid though not realistic.
>>
>> If local_id is removed in favor of "full match on id and binding" then
>> the delete sockopt would still need most of a full struct
>> tcp_authopt_key anyway.
>
> Sounds like a plan.
>
> [..]>> I'm not sure what's the use of enum here, probably: > #define
>>> TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED BIT(2)
>>
>> This is an enum because it looks nice in kernel-doc. I couldn't find a
>> way to attach docs to a macro and include it somewhere else.
>
> Yeah, ok, seems like good justification.
>
>> BTW, the enum gains more members later.
>>
>> As for BIT() it doesn't see to be allowed in uapi and there were recent
>> changes removing such usage.
>
> Ok, I just saw it's still used in include/uapi, but not aware of the
> removal.
>
>>
>>> [..]
>>>> +struct tcp_authopt_key {
>>>> +       /** @flags: Combination of &enum tcp_authopt_key_flag */
>>>> +       __u32   flags;
>>>> +       /** @local_id: Local identifier */
>>>> +       __u32   local_id;
>>>> +       /** @send_id: keyid value for send */
>>>> +       __u8    send_id;
>>>> +       /** @recv_id: keyid value for receive */
>>>> +       __u8    recv_id;
>>>> +       /** @alg: One of &enum tcp_authopt_alg */
>>>> +       __u8    alg;
>>>> +       /** @keylen: Length of the key buffer */
>>>> +       __u8    keylen;
>>>> +       /** @key: Secret key */
>>>> +       __u8    key[TCP_AUTHOPT_MAXKEYLEN];
>>>> +       /**
>>>> +        * @addr: Key is only valid for this address
>>>> +        *
>>>> +        * Ignored unless TCP_AUTHOPT_KEY_ADDR_BIND flag is set
>>>> +        */
>>>> +       struct __kernel_sockaddr_storage addr;
>>>> +};
>>>
>>> It'll be an ABI if this is accepted. As it is - it can't support
>>> RFC5925 fully.
>>> Extending syscall ABI is painful. I think, even the initial ABI *must*
>>> support
>>> all possible features of the RFC.
>>> In other words, there must be src_addr, src_port, dst_addr and dst_port.
>>> I can see from docs you've written you don't want to support a mix of
>>> different
>>> addr/port MKTs, so you can return -EINVAL or -ENOTSUPP for any value
>>> but zero.
>>> This will create an ABI that can be later extended to fully support RFC.
>>
>> RFC states that MKT connection identifiers can be specified using ranges
>> and wildcards and the details are up to the implementation. Keys are
>> *NOT* just bound to a classical TCP 4-tuple.
>>
>> * src_addr and src_port is implicit in socket binding. Maybe in theory
>> src_addr they could apply for a server socket bound to 0.0.0.0:PORT but
>> userspace can just open more sockets.
>> * dst_port is not supported by MD5 and I can't think of any useful
>> usecase. This is either well known (179 for BGP) or auto-allocated.
>> * tcp_md5 was recently enhanced allow a "prefixlen" for addr and
>> "l3mdev" ifindex binding.
>>
>> This last point shows that the binding features people require can't be
>> easily predicted in advance so it's better to allow the rules to be
>> extended.
>
> Yeah, I see both changes you mention went on easy way as they reused
> existing paddings in the ABI structures.
> Ok, if you don't want to reserve src_addr/src_port/dst_addr/dst_port,
> than how about reserving some space for those instead?
>
>>> The same is about key: I don't think you need to define/use
>>> tcp_authopt_alg.
>>> Just use algo name - that way TCP-AO will automatically be able to use
>>> any algo supported by crypto engine.
>>> See how xfrm does it, e.g.:
>>> struct xfrm_algo_auth {
>>>      char        alg_name[64];
>>>      unsigned int    alg_key_len;    /* in bits */
>>>      unsigned int    alg_trunc_len;  /* in bits */
>>>      char        alg_key[0];
>>> };
>>>
>>> So you can let a user chose maclen instead of hard-coding it.
>>> Much more extendable than what you propose.
>>
>> This complicates ABI and implementation with features that are not
>> needed. I'd much rather only expose an enum of real-world tcp-ao
>> algorithms.
>
> I see it exactly the opposite way: a new enum unnecessary complicates
> ABI, instead of passing alg_name[] to crypto engine. No need to add any
> support in tcp-ao as the algorithms are already provided by kernel.
> That is how it transparently works for ipsec, why not for tcp-ao?
>
>>
>>> [..]
>>>> +#ifdef CONFIG_TCP_AUTHOPT
>>>> +       case TCP_AUTHOPT: {
>>>> +               struct tcp_authopt info;
>>>> +
>>>> +               if (get_user(len, optlen))
>>>> +                       return -EFAULT;
>>>> +
>>>> +               lock_sock(sk);
>>>> +               tcp_get_authopt_val(sk, &info);
>>>> +               release_sock(sk);
>>>> +
>>>> +               len = min_t(unsigned int, len, sizeof(info));
>>>> +               if (put_user(len, optlen))
>>>> +                       return -EFAULT;
>>>> +               if (copy_to_user(optval, &info, len))
>>>> +                       return -EFAULT;
>>>> +               return 0;
>>>> +       }
>>>
>>> I'm pretty sure it's not a good choice to write partly tcp_authopt.
>>> And a user has no way to check what's the correct len on this kernel.
>>> Instead of len = min_t(unsigned int, len, sizeof(info)), it should be
>>> if (len != sizeof(info))
>>>      return -EINVAL;
>>
>> Purpose is to allow sockopts to grow as md5 has grown.
>
> md5 has not grown. See above.

MD5 uapi has - e.g., 8917a777be3ba and 6b102db50cdde. We want similar
capabilities for growth with this API.

>
> Another issue with your approach
>
> + /* If userspace optlen is too short fill the rest with zeros */
> + if (optlen > sizeof(opt))
> + return -EINVAL;
> + memset(&opt, 0, sizeof(opt));
> + if (copy_from_sockptr(&opt, optval, optlen))
> + return -EFAULT;
>
> is that userspace compiled with updated/grew structure will fail on
> older kernel. So, no extension without breaking something is possible.
> Which also reminds me that currently you don't validate (opt.flags) for
> unknown by kernel flags.
>
> Extending syscalls is impossible without breaking userspace if ABI is
> not designed with extensibility in mind. That was quite a big problem,
> and still is. Please, read this carefully:
> https://lwn.net/Articles/830666/
>
> That is why I'm suggesting you all those changes that will be harder to
> fix when/if your patches get accepted.
> As an example how it should work see in copy_clone_args_from_user().
>

Look at how TCP_ZEROCOPY_RECEIVE has grown over releases as an example
of how to properly handle this.