Re: [net,v4] serial: caif: hold tty->link reference in ldisc_open and ser_release

From: Jakub Kicinski

Date: Tue Mar 03 2026 - 20:26:28 EST


This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
serial: caif: hold tty->link reference in ldisc_open and ser_release

This commit addresses a KASAN slab-use-after-free in pty_write_room()
triggered when caif_serial's TX path calls tty_write_room(). The fix
holds an extra reference on tty->link throughout the lifetime of the
caif_serial line discipline.

Link: https://lore.kernel.org/all/20260228094741.1e248271@xxxxxxxxxx/
Link: https://gist.github.com/shuangpengbai/c898debad6bdf170a84be7e6b3d8707f

The Fixes tag points to commit 308e7e4d0a84 ("serial: caif: fix
use-after-free in caif_serial ldisc_close()"), but the regression was
actually introduced by commit e31d5a05948e ("caif: tty's are kref
objects so take a reference"). That commit added tty_kref_get(tty) to
hold a reference on the tty, but failed to also hold a reference on
tty->link, which is accessed via tty_write_room() in handle_tx().

Commit 308e7e4d0a84 merely exposed this existing regression by changing
the timing.

Should the Fixes tag be updated to:

Fixes: e31d5a05948e ("caif: tty's are kref objects so take a reference")

> diff --git a/drivers/net/caif/caif_serial.c b/drivers/net/caif/caif_serial.c
> index b90890030751..1873d8287bb9 100644
> --- a/drivers/net/caif/caif_serial.c
> +++ b/drivers/net/caif/caif_serial.c
> @@ -297,6 +297,7 @@ static void ser_release(struct work_struct *work)
> dev_close(ser->dev);
> unregister_netdevice(ser->dev);
> debugfs_deinit(ser);
> + tty_kref_put(tty->link);
^^^^^^^^^^^^^^^^^^^^

Can this dereference NULL? The tty->link field is only set for PTY
(pseudo-terminal) pairs. For regular serial ports and other non-PTY
devices, tty->link is NULL.

The CAIF line discipline can be attached to any TTY device via the
TIOCSETD ioctl. When ser_release() is called to clean up a ser_device
that was created for a non-PTY TTY device, tty->link would be NULL.

Should this check for NULL before calling tty_kref_put()?

> tty_kref_put(tty);
> }
> rtnl_unlock();
> @@ -331,6 +332,7 @@ static int ldisc_open(struct tty_struct *tty)
>
> ser = netdev_priv(dev);
> ser->tty = tty_kref_get(tty);
> + tty_kref_get(tty->link);
^^^^^^^^^^^^^^^^^^^^^^^^

Can this dereference NULL? When a user attaches the CAIF line discipline
to a non-PTY TTY device via TIOCSETD, ldisc_open() is called with a tty
where tty->link is NULL.

While ldisc_open() performs capability checks (CAP_SYS_ADMIN), there is
no code restricting this to PTY devices only.

Should this check for NULL before calling tty_kref_get()?

> ser->dev = dev;
> debugfs_init(ser, tty);
> tty->receive_room = 4096;
> @@ -339,6 +341,7 @@ static int ldisc_open(struct tty_struct *tty)
> rtnl_lock();
> result = register_netdevice(dev);
> if (result) {
> + tty_kref_put(tty->link);
^^^^^^^^^^^^^^^^^^^^^^^^

Can this dereference NULL in the error path? If register_netdevice()
fails after the CAIF line discipline was attached to a non-PTY TTY
device, the error path would attempt to release a reference on
tty->link when it is NULL.

Should this check for NULL before calling tty_kref_put()?

> tty_kref_put(tty);
> rtnl_unlock();
> free_netdev(dev);

Simon Horman raised a concern in v2 about a NULL pointer dereference in
update_tty_status(). The v4 patch does not use the spinlock approach
from v2, and does not address this race condition.

After ser_release() sets ser->tty = NULL, a concurrent handle_tx() that
already passed the NULL check can reach update_tty_status(ser).

Looking at update_tty_status():

drivers/net/caif/caif_serial.c:update_tty_status() {
inline_decode(ser->common.flowctrl,
ser->tty->flow.stopped << 5 |
ser->tty->flow.tco_stopped << 3 |
ser->tty->ctrl.packet << 2);
}

It dereferences ser->tty directly (ser->tty->flow.stopped,
ser->tty->flow.tco_stopped, ser->tty->ctrl.packet).

Can this cause a NULL pointer dereference?

Simon also raised a concern about potential deadlock with spinlock usage
in v2. The v4 patch removes the spinlock approach entirely.

handle_tx() is called from caif_xmit(), which is the ndo_start_xmit
handler. This can run in softirq context (NET_TX softirq). ser_release()
runs in process context (workqueue).

How does the current synchronization between handle_tx() and
ser_release() prevent races?
--
pw-bot: cr