Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod

From: Wang Zhaolong
Date: Wed Apr 02 2025 - 00:50:06 EST

Next message: Gregory Price: "Re: [LSFMM] Updated: Linux Management of Volatile CXL Memory Devices"
Previous message: Joe Perches: "Re: [PATCH v2 10/12] checkpatch: Deprecate srcu_read_lock_lite() and srcu_read_unlock_lite()"
In reply to: Kuniyuki Iwashima: "Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod"
Next in thread: Greg Kroah-Hartman: "Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Yes, it seems the previous description might not have been entirely clear.
I need to clearly point out that this patch, intended as the fix for CVE-2024-54680,
does not actually address any real issues. It also fails to resolve the null pointer
dereference problem within lockdep. On top of that, it has caused a series of
subsequent leakage issues.

We will indeed need the CNA team to update the description once the correct fix
is merged.

Best regards,
Wang Zhaolong

From: Wang Zhaolong <wangzhaolong1@xxxxxxxxxx>
Date: Tue, 1 Apr 2025 21:54:47 +0800

Hi.

My colleagues and I have been investigating the issue addressed by this patch
and have discovered some significant concerns that require broader discussion.

### Socket Leak Issue

After testing this patch extensively, I've confirmed it introduces a socket leak
when TCP connections don't complete proper termination (e.g., when FIN packets
are dropped). The leak manifests as a continuous increase in TCP slab usage:

I've documented this issue with a reproducer in Bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=219972#c0

The key issue appears to stem from the interaction between the SMB client and TCP
socket lifecycle management:

1. Removing `sk->sk_net_refcnt = 1` causes TCP timers to be terminated early in
`tcp_close()` via the `inet_csk_clear_xmit_timers_sync()` call when
`!sk->sk_net_refcnt`
2. This early timer termination prevents proper reference counting resolution
when connections don't complete the 4-way TCP termination handshake
3. The resulting socket references are never fully released, leading to a leak

#### Timeline of Related Changes

1. v4.2-rc1 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets")
- Added `sk_net_refcnt` field to distinguish user sockets (=1) from kernel sockets (=0)
- Kernel sockets don't hold netns references, which can lead to potential UAF issues

2. v6.9-rc2 151c9c724d05: ("tcp: properly terminate timers for kernel sockets")
- Modified `tcp_close()` to check `sk->sk_net_refcnt` and explicitly terminate timers for kernel sockets (=0)
- This prevents UAF when netns is destroyed before socket timers complete
- **Key change**: If `!sk->sk_net_refcnt`, call `inet_csk_clear_xmit_timers_sync()`

3. v6.12-rc7 ef7134c7fc48: ("smb: client: Fix use-after-free of network namespace")
- Fixed netns UAF in CIFS by manually setting `sk->sk_net_refcnt = 1`
- Also called `maybe_get_net()` to maintain netns references
- This effectively made kernel sockets behave like user sockets for reference counting

4. v6.13-rc4 e9f2517a3e18: ("smb: client: fix TCP timers deadlock after rmmod")
- Problem commit: Removed `sk->sk_net_refcnt = 1` setting
- Changed to using explicit `get_net()/put_net()` at CIFS layer
- This change leads to socket leaks because timers are terminated early

### Lockdep Warning Analysis

I've also investigated the lockdep warning mentioned in the patch. My analysis
suggests it may be a false positive rather than an actual deadlock. The crash
actually occurs in the lockdep subsystem itself (null pointer dereference in
`check_wait_context()`), not in the CIFS or networking code directly.

The procedure for the null pointer dereference is as follows:

When lockdep is enabled, the lock class "slock-AF_INET-CIFS" is set when a socket
connection is established.

```
cifs_do_mount
cifs_mount
mount_get_conns
cifs_get_tcp_session
ip_connect
generic_ip_connect
cifs_reclassify_socket4
sock_lock_init_class_and_name(sk, "slock-AF_INET-CIFS",
lockdep_init_map
lockdep_init_map_wait
lockdep_init_map_type
lockdep_init_map_type
register_lock_class
__set_bit(class - lock_classes, lock_classes_in_use);
```

When the module is unloaded, the lock class is cleaned up.

```
free_mod_mem
lockdep_free_key_range
__lockdep_free_key_range
zap_class
__clear_bit(class - lock_classes, lock_classes_in_use);
```

After the module is uninstalled and the network connection is restored, the
timer is woken up.

```
run_timer_softirq
run_timer_base
__run_timers
call_timer_fn
tcp_write_timer
bh_lock_sock
spin_lock(&((__sk)->sk_lock.slock))
_raw_spin_lock
lock_acquire
__lock_acquire
check_wait_context
hlock_class
if (!test_bit(class_idx, lock_classes_in_use)) {
return NULL;
hlock_class(next)->wait_type_inner; // Null pointer dereference
```

The problem lies within lockdep, as Kuniyuki says:

I tried the repro and confirmed it triggers null deref.

It happens in LOCKDEP internal, so for me it looks like a problem in
LOCKDEP rather than CIFS or TCP.

I think LOCKDEP should hold a module reference and prevent related
modules from being unloaded.

Regarding the deadlock issue, it is clear that the locks mentioned in the deadlock warning
do not belong to the same lock instance. A deadlock should not occur.

### Discussion Points

1. API Design Question: Is this fundamentally an issue with how CIFS uses the socket
API, or is it a networking layer issue that should handle socket cleanup differently?

2. Approach to Resolution: Would it be better to:
- Revert to the original solution (setting `sk->sk_net_refcnt = 1`) from ef7134c7fc48?
- Work with networking subsystem maintainers on a more comprehensive solution that
handles socket cleanup properly?

3. CVE Process Question: Given that CVE-2024-54680 appears to "fix" a non-existent issue
while introducing an actual vulnerability, what's the appropriate way to address this?

I tested on 6.14 and e9f2517a3e18, but the issue still reproduces,
so e9f2517a3e18 is a bogus fix, and we will need to ask the CNA team
to update the description once the correct fix is merged.

Next message: Gregory Price: "Re: [LSFMM] Updated: Linux Management of Volatile CXL Memory Devices"
Previous message: Joe Perches: "Re: [PATCH v2 10/12] checkpatch: Deprecate srcu_read_lock_lite() and srcu_read_unlock_lite()"
In reply to: Kuniyuki Iwashima: "Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod"
Next in thread: Greg Kroah-Hartman: "Re: Fwd: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]