Re: [PATCH net-next v2] vsock/test: Add test for null ptr deref when transport changes

From: Stefano Garzarella
Date: Fri Mar 28 2025 - 09:58:20 EST


On Wed, Mar 26, 2025 at 05:21:03PM +0100, Stefano Garzarella wrote:
On Wed, Mar 26, 2025 at 04:14:20PM +0100, Luigi Leonardi wrote:
Hi Michal,

On Wed, Mar 19, 2025 at 01:27:35AM +0100, Michal Luczaj wrote:
On 3/14/25 10:27, Luigi Leonardi wrote:
Add a new test to ensure that when the transport changes a null pointer
dereference does not occur[1].

Note that this test does not fail, but it may hang on the client side if
it triggers a kernel oops.

This works by creating a socket, trying to connect to a server, and then
executing a second connect operation on the same socket but to a
different CID (0). This triggers a transport change. If the connect
operation is interrupted by a signal, this could cause a null-ptr-deref.

Just to be clear: that's the splat, right?

Oops: general protection fault, probably for non-canonical address 0xdffffc000000000c: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_stream_has_data+0x44/0x70
Call Trace:
virtio_transport_do_close+0x68/0x1a0
virtio_transport_recv_pkt+0x1045/0x2ae4
vsock_loopback_work+0x27d/0x3f0
process_one_work+0x846/0x1420
worker_thread+0x5b3/0xf80
kthread+0x35a/0x700
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30


Yep! I'll add it to the commit message in v3.
...
+static void test_stream_transport_change_client(const struct test_opts *opts)
+{
+ __sighandler_t old_handler;
+ pid_t pid = getpid();
+ pthread_t thread_id;
+ time_t tout;
+
+ old_handler = signal(SIGUSR1, test_transport_change_signal_handler);
+ if (old_handler == SIG_ERR) {
+ perror("signal");
+ exit(EXIT_FAILURE);
+ }
+
+ if (pthread_create(&thread_id, NULL, test_stream_transport_change_thread, &pid)) {
+ perror("pthread_create");

Does pthread_create() set errno on failure?
It does not, very good catch!

+ exit(EXIT_FAILURE);
+ }
+
+ tout = current_nsec() + TIMEOUT * NSEC_PER_SEC;

Isn't 10 seconds a bit excessive? I see the oops pretty much immediately.
Yeah it's probably excessive. I used because it's the default timeout value.

+ do {
+ struct sockaddr_vm sa = {
+ .svm_family = AF_VSOCK,
+ .svm_cid = opts->peer_cid,
+ .svm_port = opts->peer_port,
+ };
+ int s;
+
+ s = socket(AF_VSOCK, SOCK_STREAM, 0);
+ if (s < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ connect(s, (struct sockaddr *)&sa, sizeof(sa));
+
+ /* Set CID to 0 cause a transport change. */
+ sa.svm_cid = 0;
+ connect(s, (struct sockaddr *)&sa, sizeof(sa));
+
+ close(s);
+ } while (current_nsec() < tout);
+
+ if (pthread_cancel(thread_id)) {
+ perror("pthread_cancel");

And errno here.

+ exit(EXIT_FAILURE);
+ }
+
+ /* Wait for the thread to terminate */
+ if (pthread_join(thread_id, NULL)) {
+ perror("pthread_join");

And here.
Aaand I've realized I've made exactly the same mistake elsewhere :)

...
+static void test_stream_transport_change_server(const struct test_opts *opts)
+{
+ time_t tout = current_nsec() + TIMEOUT * NSEC_PER_SEC;
+
+ do {
+ int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
+
+ close(s);
+ } while (current_nsec() < tout);
+}

I'm not certain you need to re-create the listener or measure the time
here. What about something like

int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
control_expectln("DONE");
close(s);

Just tried and it triggers the oops :)

If this works (as I also initially thought), we should check the result of the first connect() in the client code. It can succeed or fail with -EINTR, in other cases we should report an error because it is not expected.

And we should check also the second connect(), it should always fail, right?

For this I think you need another sync point to be sure the server is listening before try to connect the first time:

client:
// pthread_create, etc.

control_expectln("LISTENING");

do {
...
} while();

control_writeln("DONE");

server:
int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
control_writeln("LISTENING");

We found that this needed to be extended by adding an accept() loop to avoid filling up the backlog of the listening socket.
But by doing accept() and close() back to back, we found a problem in AF_VSOCK, where connect() in some cases would get stuck until the timeout (default: 2 seconds) returning -ETIMEDOUT.

Fix is coming.

Thanks,
Stefano

control_expectln("DONE");
close(s);

Thanks,
Stefano