Re: [RFCv3 07/15] tcp: authopt: Hook into tcp core

From: Leonard Crestez
Date: Wed Aug 25 2021 - 12:32:57 EST

On 25.08.2021 01:59, Eric Dumazet wrote:
On 8/24/21 2:34 PM, Leonard Crestez wrote:
The tcp_authopt features exposes a minimal interface to the rest of the
TCP stack. Only a few functions are exposed and if the feature is
disabled they return neutral values, avoiding ifdefs in the rest of the

Add calls into tcp authopt from send, receive and accept code.

Signed-off-by: Leonard Crestez <cdleonard@xxxxxxxxx>
include/net/tcp_authopt.h | 56 +++++++++
net/ipv4/tcp_authopt.c | 246 ++++++++++++++++++++++++++++++++++++++
net/ipv4/tcp_input.c | 17 +++
net/ipv4/tcp_ipv4.c | 3 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 74 +++++++++++-
net/ipv6/tcp_ipv6.c | 4 +
7 files changed, 401 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h
index c9ee2059b442..61db268f36f8 100644
--- a/include/net/tcp_authopt.h
+++ b/include/net/tcp_authopt.h
@@ -21,10 +21,11 @@ struct tcp_authopt_key_info {
/* Wire identifiers */
u8 send_id, recv_id;
u8 alg_id;
u8 keylen;
+ u8 maclen;

I do not see maclen being enforced to 12, or a multiple of 4 ?

For both current algorithms the maclen value is 12. I just implemented RFC5926, there is no way to control this from userspace.

This means that later [2], tcp_authopt_hash() will leave up to 3
unitialized bytes in the TCP options, sent to the wire.

This is a security issue, since we will leak kernel memory.

Filling the remainder with zeroes does make sense, or at least WARN_ON(maclen != 4) so that it's obvious to anyone who attempts to extend the algorithms.

+struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct tcp_authopt_info *info,
+ const struct sock *addr_sk,
+ int send_id)
+ struct tcp_authopt_key_info *result = NULL;
+ struct tcp_authopt_key_info *key;
+ hlist_for_each_entry_rcu(key, &info->head, node, 0) {
+ if (send_id >= 0 && key->send_id != send_id)
+ continue;
+ if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND) {
+ if (addr_sk->sk_family == AF_INET) {
+ struct sockaddr_in *key_addr = (struct sockaddr_in *)&key->addr;
+ const struct in_addr *daddr =
+ (const struct in_addr *)&addr_sk->sk_daddr;

Why a cast is needed ? sk_daddr is a __be32, no need to cast it to in_addr
+ if (WARN_ON(key_addr->sin_family != AF_INET))

Why a WARN_ON() is used ? If we expect this to trigger, then at minimumum WARN_ON_ONCE() please.

+ continue;
+ if (memcmp(daddr, &key_addr->sin_addr, sizeof(*daddr)))
+ continue;

Using memcmp() to compare two __be32 is overkill.

+ }
+ if (addr_sk->sk_family == AF_INET6) {
+ struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr;
+ const struct in6_addr *daddr = &addr_sk->sk_v6_daddr;

Not sure why a variable is used, you need it once.

+ if (WARN_ON(key_addr->sin6_family != AF_INET6))
+ continue;
+ if (memcmp(daddr, &key_addr->sin6_addr, sizeof(*daddr)))

ipv6_addr_equal() should be faster.

OK, I will replace the comparisons.

Checking address family is mostly paranoia on my part, I don't know if a real scenario exists for AF mismatch. Still need to check ipv4-mapped ipv6 addresses, not sure if those can receive ipv4 skbs on an ipv6 socket.

+struct tcp_authopt_key_info *tcp_authopt_select_key(const struct sock *sk,
+ const struct sock *addr_sk,
+ u8 *rnextkeyid)
+ struct tcp_authopt_info *info;
+ info = rcu_dereference(tcp_sk(sk)->authopt_info);

distro kernels will have CONFIG_TCP_AUTHOPT set, meaning
that we will add a cache line miss for every incoming TCP packet
even on hosts not using any RFC5925 TCP flow.

For TCP MD5 we are using a static key, to avoid this extra cost.

OK, will add a static_key.

The check for "does socket have tcp_authopt" also belongs in an inline wrapper, similar to inbound check

+int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct request_sock *req)
+ struct tcp_authopt_info *old_info;
+ struct tcp_authopt_info *new_info;
+ int err;
+ old_info = rcu_dereference(tcp_sk(oldsk)->authopt_info);
+ if (!old_info)
+ return 0;
+ new_info = kmalloc(sizeof(*new_info), GFP_ATOMIC | __GFP_ZERO);

kzalloc() is your friend. (same remark for your other patches, where you are using __GFP_ZERO)
Also see additional comment [1]


+ if (!new_info)
+ return -ENOMEM;
+ sk_nocaps_add(newsk, NETIF_F_GSO_MASK);
+ new_info->src_isn = tcp_rsk(req)->snt_isn;
+ new_info->dst_isn = tcp_rsk(req)->rcv_isn;
+ INIT_HLIST_HEAD(&new_info->head);
+ err = tcp_authopt_clone_keys(newsk, oldsk, new_info, old_info);
+ if (err) {
+ __tcp_authopt_info_free(newsk, new_info);

Are we leaving in place old value of newsk->authopt_info ?
If this is copied from the listener, I think you need
to add a tcp_sk(newsk)->authopt_info = NULL;
before the kzalloc() call done above.

Yes, authopt_info should be set to NULL on error because keeping the listen socket's value is wrong and dangerous (double free).

Leaving authopt_info NULL or malloc failure is still possible dangerous because it means all keys are ignored and accepted. Not clear how we could cause tcp_create_openreq_child to fail instead.

This is a problem in a few other parts: if cryptography fails the outbound MAC is filled with zeros because there's not obvious way to make TX fail at that point.

+ err = __tcp_authopt_calc_mac(sk, skb, key, false, macbuf);
+ if (err) {
+ /* If mac calculation fails and caller doesn't handle the error
+ * try to make it obvious inside the packet.
+ */
+ memset(hash_location, 0, key->maclen);
+ return err;
+ }
+ memcpy(hash_location, macbuf, key->maclen);

This is the place were we do not make sure to clear the padding bytes
(if key->maclen is not a multiple of 4)

Yes. It might make sense to fix in caller because it's the caller which decides to align options.