Re: [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets

From: Jiayuan Chen

Date: Tue May 26 2026 - 02:46:32 EST



On 5/26/26 10:51 AM, Christopher Lusk wrote:
The kTLS TX path can hand an open record to a sk_msg verdict
program before encryption. If the verdict applies fewer bytes
than the open record contains, tls_push_record() splits
ctx->open_rec into the record being encrypted and a remainder.
The synchronous path reattaches that remainder before continuing.

With an async AEAD provider, crypto_aead_encrypt() can return
-EINPROGRESS after ctx->open_rec has been unhooked but before the
split remainder is reattached. The remainder is no longer
reachable through ctx->open_rec or ctx->tx_list, silently dropping
transmitted data and leaking the unreachable tls_rec. The same
composition also entangles the user-page zerocopy lifetime rules
with an async completion path.

A sockmap cannot be attached to a socket after an inet ULP is
installed: sk_psock_init() returns -EINVAL when
inet_csk_has_ulp() is true. So the supported ordering for
sockmap + kTLS TX is sockmap first, TLS_TX setup second. When
TLS_TX setup sees an existing sk_psock, allocate the AEAD with
CRYPTO_ALG_ASYNC masked out and latch the TX zerocopy gate
(sw_ctx_tx->async_capable) so the buggy composition becomes
structurally unreachable. Ordinary kTLS sockets without sk_msg
BPF attached are unaffected and continue to use async-capable
providers.

Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Cc: stable@xxxxxxxxxxxxxxx # 4.20+
Signed-off-by: Christopher Lusk <clusk@xxxxxxxxxxxxx>
Assisted-by: Codex:gpt-5.5
Assisted-by: Claude:claude-opus-4-7
---

Changes since v2 [1]:
- Per netdev maintainer guidance [2], replace the Option-C
drain-on-error fix with a setup-time surface narrowing in
tls_set_sw_offload(): when a sockmap is already attached at
TLS_TX setup, request a synchronous AEAD (CRYPTO_ALG_ASYNC in
the allocation mask) and set sw_ctx_tx->async_capable = 1.
Both moves are needed: latching async_capable alone disables
zerocopy but tls_do_encryption() can still return -EINPROGRESS
on the copy path; selecting a sync provider removes that return
path for sk_msg-attached sockets.
- Drop the selftest from the series per Jakub's note that the
existing sockmap + TLS coverage at
tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c exercises
this configuration [3]. That suite covers sockmap + kTLS
policy paths broadly; the specific async-pcrypt pass-then-drop
failure mode from the v2 reproducer was validated for v3 on
QEMU/KVM with a KASAN+LOCKDEP-instrumented kernel against net
base 2156a29aecff before send.
- Single-patch series.

Changes since v1:
- v1's remainder-rooting fix was incomplete; Sashiko AI review
surfaced a real UAF in the v2 follow-up that John Fastabend
endorsed on the v1 thread [4]. The surface-narrowing approach
in v3 makes both failure modes unreachable by avoiding the
async + sk_msg composition entirely rather than patching each
continuation point.

[1] https://lore.kernel.org/all/20260521025840.976378-1-clusk@xxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/20260525133028.58494274@xxxxxxxxxx/
[3] https://lore.kernel.org/all/20260525133048.2dc6d8d3@xxxxxxxxxx/
[4] https://lore.kernel.org/all/huduxtn6parzgiaf5cyiyrrvjjvx6jsdedowvrd4nkwmuyeind@j6migjgofh2i/

net/tls/tls_sw.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 964ebc268..0000000 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2867,7 +2867,20 @@ int tls_set_sw_offload(struct sock *sk, int tx,
rec_seq = crypto_info_rec_seq(src_crypto_info, cipher_desc);

if (!*aead) {
- *aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, 0);
+ u32 mask = 0;
+
+ if (tx) {
+ struct sk_psock *psock;
+
+ psock = sk_psock_get(sk);
+ if (psock) {
+ mask = CRYPTO_ALG_ASYNC;
+ sw_ctx_tx->async_capable = 1;
+ sk_psock_put(sk, psock);
+ }
+ }
+
+ *aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, mask);
if (IS_ERR(*aead)) {
rc = PTR_ERR(*aead);
*aead = NULL;
--
2.54.0

If async_capable is set to 1, the zerocopy path in tls_sw_sendmsg() is skipped.
Unfortunately, kTLS with bpf_msg_pop_data() does not work correctly on the
copy path that is taken instead.

tls_clone_plaintext_msg() aliases msg_pl onto msg_en's plaintext area (in-place encryption).

BPF runs bpf_msg_pop_data(msg, 0, 2). This shifts msg_pl's SG entry forward by 2 bytes.
The two SGs now point to the same page at different offsets. The physical memory overlaps, but the
start addresses differ.
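
To make the aliasing concrete, here is a rough userspace model of the two SG entries, not kernel code. struct sg_model and the model_* helpers are made-up names for illustration only; the real structures are struct scatterlist and struct sk_msg, and the real pop logic is more involved than a front-only trim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single SG entry (not struct scatterlist). */
struct sg_model {
	unsigned char *page;	/* backing buffer, stands in for the page */
	size_t offset;		/* start offset within that buffer */
	size_t length;		/* number of bytes covered */
};

/* Model of popping 'len' bytes from the front of an entry. */
static void model_pop_front(struct sg_model *sg, size_t len)
{
	if (len > sg->length)
		len = sg->length;
	sg->offset += len;
	sg->length -= len;
}

/* True when both entries start at exactly the same byte. */
static int model_same_start(const struct sg_model *a, const struct sg_model *b)
{
	return a->page == b->page && a->offset == b->offset;
}

/* True when both entries share at least one byte of the same buffer. */
static int model_overlaps(const struct sg_model *a, const struct sg_model *b)
{
	if (a->page != b->page)
		return 0;
	return a->offset < b->offset + b->length &&
	       b->offset < a->offset + a->length;
}

/* Walks through the scenario: alias first, then pop 2 bytes from msg_pl. */
static void model_demo(void)
{
	static unsigned char page[64];
	struct sg_model msg_en = { page, 0, 16 };
	struct sg_model msg_pl = msg_en;	/* in-place: same start */

	assert(model_same_start(&msg_pl, &msg_en));
	model_pop_front(&msg_pl, 2);
	/* Still the same memory, but no longer the same start address. */
	assert(model_overlaps(&msg_pl, &msg_en));
	assert(!model_same_start(&msg_pl, &msg_en));
}
```

After the pop, the source and destination in this model are neither identical nor disjoint, which is the state described above.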

I think selecting a sync provider via mask = CRYPTO_ALG_ASYNC is sufficient to
remove the -EINPROGRESS return path.
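
For reference, my understanding of why type = 0 with mask = CRYPTO_ALG_ASYNC selects only sync providers, written as a small userspace model. The helper name is made up, and the flag value is copied from include/linux/crypto.h as I remember it, so treat both as assumptions rather than the actual lookup code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror CRYPTO_ALG_ASYNC in include/linux/crypto.h. */
#define MODEL_CRYPTO_ALG_ASYNC 0x00000080u

/*
 * Model of the lookup rule: an algorithm is accepted only if every flag
 * bit selected by 'mask' has the same value in 'alg_flags' and 'type'.
 */
static int model_alg_matches(uint32_t alg_flags, uint32_t type, uint32_t mask)
{
	return ((alg_flags ^ type) & mask) == 0;
}

/* type = 0, mask = ASYNC: the ASYNC bit must be clear in the provider. */
static void model_mask_demo(void)
{
	assert(model_alg_matches(0, 0, MODEL_CRYPTO_ALG_ASYNC));
	assert(!model_alg_matches(MODEL_CRYPTO_ALG_ASYNC, 0,
				  MODEL_CRYPTO_ALG_ASYNC));
	/* mask = 0 (the old call): both sync and async providers match. */
	assert(model_alg_matches(MODEL_CRYPTO_ALG_ASYNC, 0, 0));
}
```

If that reading is right, the mask change by itself already keeps async providers away from sk_msg-attached sockets.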

Maybe it is time to remove skmsg from ktls? (Disable it by default first, and allow re-enabling it via a new ktls module_param?)