Bug: Race condition in AX25 device cleanup routine

From: Hanjie Wu
Date: Wed Oct 20 2021 - 13:19:26 EST


Hello there,

Our team (BlockSec) found a race-condition vulnerability in Linux's AX.25 packet radio protocol implementation. It may cause null pointer dereference and use-after-free bugs and allow attackers to adopt further exploitations.

=*=*=*=*=*=*=*=*=  BUG DETAILS  =*=*=*=*=*=*=*=*=

Linux implements AX25 protocol through socket API and AF_AX25 address family. An AX25 socket stores a pointer to the AX25 device it is bound to in its ax25_cb struct. Various system socket calls, for example, sendmsg() and getsockopt(), retrieve information about the AX25 device through this pointer.

When an AX25 device is unregistered from the system, the notifier layer will call ax25_kill_by_device() to dissociate the bound sockets from this device, which is achieved by setting the ax25_dev pointer field of the ax25_cb struct to NULL. The memory block of ax25_dev struct is then released by calling kfree() in ax25_dev_device_down().

However, the current implementation fails to consider the situation that a pending socket request on another kthread may hold a pointer to the ax25_dev and after the device is unregistered and deallocated, this kthread may still access the invalidated pointer, causing null pointer dereference or use-after-free bugs.

=*=*=*=*=*=*=*=*=  BUG EFFECTS  =*=*=*=*=*=*=*=*=

These functions access the data members of the ax25_dev struct and thus have concurrency issues with ax25_kill_by_device():

* ax25_getsockopt()
* ax25_release()
* ax25_getname()
* ax25_sendmsg()
* ax25_info_show()

Take ax25_sendmsg() as an example:

static int ax25_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
ax25_cb *ax25;
...
ax25 = sk_to_ax25(sk);
if (ax25->ax25_dev == NULL) { // <1>
err = -ENETUNREACH;
goto out;
}
... // <2>
ax25_queue_xmit(skb, ax25->ax25_dev->dev); // <3>
...
}

Although <1> checks the validity of the device the socket is bound to, a device unregister event may happen at <2>, and ax25->ax25_dev becomes NULL, causing null pointer dereference at <3>. Backtrace of this bug:

[ 11.490378] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 11.498583] #PF: supervisor read access in kernel mode
[ 11.503179] #PF: error_code(0x0000) - not-present page
[ 11.506628] PGD 0 P4D 0
[ 11.508762] Oops: 0000 [#1] SMP PTI
[ 11.511299] CPU: 0 PID: 116 Comm: ax25 Not tainted 5.15.0-rc6+ #2
[ 11.516545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 11.526496] RIP: 0010:ax25_sendmsg+0x2e4/0x410
[ 11.530360] Code: c0 00 00 00 4c 89 f7 41 89 c0 49 8b 86 c8 00 00 00 48 29 d0 44 01 c0 66 41 89 86 b2 00 00 00 0f b7 c0 c6 04 02 03 49 8b 47 28 <48> 8b 70 08 e8 13 c9 ff ff 44 89 6c 24 10 e9 c2 fe ff ff 41 0f b7
[ 11.544634] RSP: 0018:ffffb123001e7d48 EFLAGS: 00010216
[ 11.549297] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 11.555120] RDX: ffffa0f882b0b600 RSI: ffffa0f882b1a010 RDI: ffffa0f881207000
[ 11.559853] RBP: ffffb123001e7e30 R08: 000000000000000e R09: 000000000000000e
[ 11.566376] R10: 0000000000000001 R11: 0000000000000001 R12: ffffa0f882b19800
[ 11.572161] R13: 0000000000000064 R14: ffffa0f881207000 R15: ffffa0f882b1a000
[ 11.578678] FS: 00007fe456ad7740(0000) GS:ffffa0f884600000(0000) knlGS:0000000000000000
[ 11.585086] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.589845] CR2: 0000000000000008 CR3: 0000000002afc000 CR4: 00000000000006f0
[ 11.594916] Call Trace:
[ 11.597163] sock_sendmsg+0x59/0x60
[ 11.599641] __sys_sendto+0xec/0x160
[ 11.602429] ? lockdep_hardirqs_on_prepare+0xd4/0x180
[ 11.606920] __x64_sys_sendto+0x20/0x30
[ 11.609782] do_syscall_64+0x3b/0x90
[ 11.612575] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 11.616159] RIP: 0033:0x7fe456ca7f64
[ 11.618701] Code: 89 4c 24 1c e8 cd f8 ff ff 44 8b 54 24 1c 8b 3c 24 45 31 c9 89 c5 48 8b 54 24 10 48 8b 74 24 08 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 30 89 ef 48 89 04 24 e8 f9 f8 ff ff 48 8b 04
[ 11.632162] RSP: 002b:00007ffef630f950 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 11.638681] RAX: ffffffffffffffda RBX: 000055918d85ad60 RCX: 00007fe456ca7f64
[ 11.644352] RDX: 0000000000000064 RSI: 00007fe456ce1000 RDI: 0000000000000006
[ 11.649375] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 11.654651] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffef630f990
[ 11.661407] R13: 00007ffef630fab0 R14: 0000000000000000 R15: 0000000000000000
[ 11.666645] Modules linked in:
[ 11.669274] CR2: 0000000000000008
[ 11.673051] ---[ end trace 883d122fc536f4b2 ]---

Moreover, if we take a look at ax25->ax25_dev->dev at <3>, we can find that it contains two pointer dereference operations, the first is reading ax25_dev from ax25_cb struct, the second is reading dev from ax25_dev. If the device unregister event happens between these two operations, the second read operation becomes a use-after-free bug.

=*=*=*=*=*=*=*=*=  BUG REPRODUCE  =*=*=*=*=*=*=*=*=

The ax25_sendmsg() copies the message from the userspace buffer by calling memcpy_from_msg(), which gives us a chance to pause this kthread using userfaultfd. When memcpy_from_msg() reads data from userspace pages, our page fault handler is called and we unregister the AX25 device in the handler function. After returning from page fault, the ax25->ax25_dev is set to NULL, and a null pointer dereference will be triggered.

A proof-of-concept code to reproduce the aforementioned bug is provided in the attachment.

Unfortunately, we are not able to pause other functions, so the bugs in these functions cannot be stably reproduced. The race condition bug can still happen in these functions, but it requires specific execution order and this depends on the kernel scheduler.

=*=*=*=*=*=*=*=*=  SUGGESTED FIX  =*=*=*=*=*=*=*=*=

From the discussion above, we conclude that the lack of mutual exclusion between ax25_kill_by_device() and other AX25 socket functions is the root cause of this bug. A possible fix is to introduce lock_sock() into the dissociation process in ax25_kill_by_device(), and this can guarantee that the unregister routine cannot proceed when another socket request is pending.

Below is our suggested patch:

From 01637996a51269ff566a104879a539ff45e6e782 Mon Sep 17 00:00:00 2001
From: Hanjie Wu <nagi@xxxxxxxxxx>
Date: Thu, 21 Oct 2021 00:09:30 +0800
Subject: [PATCH] ax25: fix race condition in AX25 device unregister routine

The ax25_kill_by_device() function in the unregister routine has
concurrency issues with other AX25 socket functions. The ax25_dev
pointer field of ax25_cb is set to NULL and the ax25_dev struct is then
deallocated by ax25_rt_device_down(). However, other socket functions
like ax25_sendmsg() may still access the invalidated pointer.

This patch introduce lock_sock() into ax25_kill_by_device(), in order to
guarantee that the unregister routine cannot proceed when another socket
request is pending.

Signed-off-by: Hanjie Wu <nagi@xxxxxxxxxx>
---
net/ax25/af_ax25.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 2631efc6e..aa7785687 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -85,8 +85,10 @@ static void ax25_kill_by_device(struct net_device *dev)
again:
ax25_for_each(s, &ax25_list) {
if (s->ax25_dev == ax25_dev) {
- s->ax25_dev = NULL;
spin_unlock_bh(&ax25_list_lock);
+ lock_sock(s->sk);
+ s->ax25_dev = NULL;
+ release_sock(s->sk);
ax25_disconnect(s, ENETUNREACH);
spin_lock_bh(&ax25_list_lock);

--
2.25.1

Best Regards

Hanjie Wu
From 01637996a51269ff566a104879a539ff45e6e782 Mon Sep 17 00:00:00 2001
From: Hanjie Wu <nagi@xxxxxxxxxx>
Date: Thu, 21 Oct 2021 00:09:30 +0800
Subject: [PATCH] ax25: fix race condition in AX25 device unregister routine

The ax25_kill_by_device() function in the unregister routine has
concurrency issues with other AX25 socket functions. The ax25_dev
pointer field of ax25_cb is set to NULL and the ax25_dev struct is then
deallocated by ax25_rt_device_down(). However, other socket functions
like ax25_sendmsg() may still access the invalidated pointer.

This patch introduce lock_sock() into ax25_kill_by_device(), in order to
guarantee that the unregister routine cannot proceed when another socket
request is pending.

Signed-off-by: Hanjie Wu <nagi@xxxxxxxxxx>
---
net/ax25/af_ax25.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 2631efc6e..aa7785687 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -85,8 +85,10 @@ static void ax25_kill_by_device(struct net_device *dev)
again:
ax25_for_each(s, &ax25_list) {
if (s->ax25_dev == ax25_dev) {
- s->ax25_dev = NULL;
spin_unlock_bh(&ax25_list_lock);
+ lock_sock(s->sk);
+ s->ax25_dev = NULL;
+ release_sock(s->sk);
ax25_disconnect(s, ENETUNREACH);
spin_lock_bh(&ax25_list_lock);

--
2.25.1

Attachment: ax25_poc.zip
Description: Zip archive