[PATCH net v4] net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg

From: Duoming Zhou
Date: Fri Jun 10 2022 - 08:33:51 EST


The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
and block until it receives a packet from the remote. If the client
doesn`t connect to server and calls read() directly, it will not
receive any packets forever. As a result, the deadlock will happen.

The fail log caused by deadlock is shown below:

[ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
[ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.613058] Call Trace:
[ 369.613315] <TASK>
[ 369.614072] __schedule+0x2f9/0xb20
[ 369.615029] schedule+0x49/0xb0
[ 369.615734] __lock_sock+0x92/0x100
[ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20
[ 369.617941] lock_sock_nested+0x6e/0x70
[ 369.618809] ax25_bind+0xaa/0x210
[ 369.619736] __sys_bind+0xca/0xf0
[ 369.620039] ? do_futex+0xae/0x1b0
[ 369.620387] ? __x64_sys_futex+0x7c/0x1c0
[ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40
[ 369.620613] __x64_sys_bind+0x11/0x20
[ 369.621791] do_syscall_64+0x3b/0x90
[ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 369.623319] RIP: 0033:0x7f43c8aa8af7
[ 369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[ 369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7
[ 369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005
[ 369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700
[ 369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe
[ 369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700

This patch replaces skb_recv_datagram() with an open-coded variant of it
releasing the socket lock before the __skb_wait_for_more_packets() call
and re-acquiring it after such call in order that other functions that
need socket lock could be executed.

what's more, the socket lock will be released only when recvmsg() will
block and that should produce nicer overall behavior.

Fixes: 40d0a923f55a ("Implement locking of internal data for NET/ROM and ROSE.")
Suggested-by: Thomas Osterried <thomas@xxxxxxxxxxxx>
Signed-off-by: Duoming Zhou <duoming@xxxxxxxxxx>
Reported-by: Thomas Habets <thomas@@habets.se>
---
Changes in v4:
- Replaces skb_recv_datagram() with an open-coded variant of it.

net/ax25/af_ax25.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 95393bb2760..4c7030ed8d3 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1661,9 +1661,12 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
int flags)
{
struct sock *sk = sock->sk;
- struct sk_buff *skb;
+ struct sk_buff *skb, *last;
+ struct sk_buff_head *sk_queue;
int copied;
int err = 0;
+ int off = 0;
+ long timeo;

lock_sock(sk);
/*
@@ -1675,10 +1678,29 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
goto out;
}

- /* Now we can treat all alike */
- skb = skb_recv_datagram(sk, flags, &err);
- if (skb == NULL)
- goto out;
+ /* We need support for non-blocking reads. */
+ sk_queue = &sk->sk_receive_queue;
+ skb = __skb_try_recv_datagram(sk, sk_queue, flags, &off, &err, &last);
+ /* If no packet is available, release_sock(sk) and try again. */
+ if (!skb) {
+ if (err != -EAGAIN)
+ goto out;
+ release_sock(sk);
+ timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+ while (timeo && !__skb_wait_for_more_packets(sk, sk_queue, &err,
+ &timeo, last)) {
+ skb = __skb_try_recv_datagram(sk, sk_queue, flags, &off,
+ &err, &last);
+ if (skb)
+ break;
+
+ if (err != -EAGAIN)
+ goto done;
+ }
+ if (!skb)
+ goto done;
+ lock_sock(sk);
+ }

if (!sk_to_ax25(sk)->pidincl)
skb_pull(skb, 1); /* Remove PID */
@@ -1725,6 +1747,7 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
out:
release_sock(sk);

+done:
return err;
}

--
2.17.1