Re: KCSAN: data-race in desc_read / prb_reserve_in_last
From: Petr Mladek
Date: Wed Mar 11 2026 - 09:34:27 EST
On Wed 2026-03-11 11:31:08, Jianzhou Zhao wrote:
>
>
> Subject: [BUG] printk: KCSAN: data-race in desc_read / prb_reserve_in_last
>
> Dear Maintainers,
>
> We are writing to report a KCSAN-detected data-race vulnerability in the Linux kernel. This bug was found by our custom fuzzing tool, RacePilot. The bug occurs in the printk ringbuffer system, dealing with concurrent modifications and unannotated reads of the data block logical limits. We observed this on the Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.
>
> Call Trace & Context
> ==================================================================
> BUG: KCSAN: data-race in desc_read / prb_reserve_in_last
>
> write to 0xffffffff869276a8 of 8 bytes by task 14248 on cpu 0:
> data_realloc kernel/printk/printk_ringbuffer.c:1252 [inline]
> prb_reserve_in_last+0x831/0xb20 kernel/printk/printk_ringbuffer.c:1529
> vprintk_store+0x603/0x980 kernel/printk/printk.c:2283
> vprintk_emit+0xfd/0x540 kernel/printk/printk.c:2412
> vprintk_default+0x26/0x30 kernel/printk/printk.c:2451
> vprintk+0x1d/0x30 kernel/printk/printk_safe.c:82
> _printk+0x63/0x90 kernel/printk/printk.c:2461
> disk_unlock_native_capacity block/partitions/core.c:520 [inline]
> blk_add_partition block/partitions/core.c:543 [inline]
> blk_add_partitions block/partitions/core.c:633 [inline]
> bdev_disk_changed block/partitions/core.c:693 [inline]
> bdev_disk_changed+0xae3/0xeb0 block/partitions/core.c:642
> loop_reread_partitions+0x44/0xc0 drivers/block/loop.c:449
> loop_set_status+0x41c/0x580 drivers/block/loop.c:1278
> loop_set_status64 drivers/block/loop.c:1374 [inline]
> lo_ioctl+0xf0/0x1170 drivers/block/loop.c:1560
> blkdev_ioctl+0x377/0x420 block/ioctl.c:707
> vfs_ioctl fs/ioctl.c:52 [inline]
> __do_sys_ioctl fs/ioctl.c:605 [inline]
> __se_sys_ioctl fs/ioctl.c:591 [inline]
> __x64_sys_ioctl+0x121/0x170 fs/ioctl.c:591
> x64_sys_call+0xc3a/0x2030 arch/x86/include/generated/asm/syscalls_64.h:17
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xae/0x2c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> read to 0xffffffff869276a0 of 16 bytes by task 3004 on cpu 1:
> desc_read+0x115/0x250 kernel/printk/printk_ringbuffer.c:499
> desc_read_finalized_seq+0x40/0x140 kernel/printk/printk_ringbuffer.c:1972
> prb_read kernel/printk/printk_ringbuffer.c:2018 [inline]
> _prb_read_valid+0xc1/0x550 kernel/printk/printk_ringbuffer.c:2213
> prb_read_valid_info+0x74/0xa0 kernel/printk/printk_ringbuffer.c:2321
> devkmsg_poll+0xa1/0x120 kernel/printk/printk.c:906
> vfs_poll include/linux/poll.h:82 [inline]
> ep_item_poll.isra.0+0xb0/0x110 fs/eventpoll.c:1059
> ep_send_events+0x231/0x670 fs/eventpoll.c:1818
> ep_try_send_events fs/eventpoll.c:1905 [inline]
> ep_poll fs/eventpoll.c:1970 [inline]
> do_epoll_wait+0x2a8/0x9c0 fs/eventpoll.c:2461
> __do_sys_epoll_wait fs/eventpoll.c:2469 [inline]
> __se_sys_epoll_wait fs/eventpoll.c:2464 [inline]
> __x64_sys_epoll_wait+0xcb/0x190 fs/eventpoll.c:2464
> x64_sys_call+0x194e/0x2030 arch/x86/include/generated/asm/syscalls_64.h:233
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xae/0x2c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 UID: 0 PID: 3004 Comm: systemd-journal Not tainted 6.18.0-08691-g2061f18ad76e-dirty #50 PREEMPT(voluntary)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> ==================================================================
>
> Execution Flow & Code Context
> The writer task executes `prb_reserve_in_last()`, which concurrently reallocates and extends space for the newest record data by calling `data_realloc()`. Inside `data_realloc()`, it updates `blk_lpos->next` (which points to `desc->text_blk_lpos.next`) with a plain write:
> ```c
> // kernel/printk/printk_ringbuffer.c
> static char *data_realloc(struct printk_ringbuffer *rb, unsigned int size,
> struct prb_data_blk_lpos *blk_lpos, unsigned long id)
> {
> ...
> blk_lpos->next = next_lpos; // <-- Write
> return &blk->data[0];
> }
> ```
>
> Concurrently, another task acting as a reader accesses the ringbuffer via `desc_read()` to copy the descriptor structure to a local copy `desc_out`. It uses an unsynchronized `memcpy()` to read the `text_blk_lpos` structure which includes the 16 bytes representing both `begin` and `next` positions:
> ```c
> // kernel/printk/printk_ringbuffer.c
> static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
> unsigned long id, struct prb_desc *desc_out,
> u64 *seq_out, u32 *caller_id_out)
> {
> ...
> if (desc_out) {
> memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
> sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */ // <-- Lockless Read
> }
The race is actually expected, see the comment above this code:
/*
* Copy the descriptor data. The data is not valid until the
* state has been re-checked. A memcpy() for all of @desc
* cannot be used because of the atomic_t @state_var field.
*/
if (desc_out) {
memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
}
The data are copied. But the reader is allowed to use
them only when the state of the block is marked as finalized.
And there is a long way between data_realloc() and
setting the finalized state:
1. Writer calls prb_reserve_in_last() -> data_realloc().
   Note that the reserved size can only get increased here.
2. If prb_reserve_in_last() succeeded, the writer copies the
   new data into the extended buffer.
3. The record is finalized in one of three situations:
   a) The appended text ends with a newline '\n' and the
      writer calls prb_final_commit().
   b) The writer calls prb_commit() and the next record is
      already reserved.
   c) Another writer calls prb_reserve() and the last
      (extended) record has already been committed.
So, the race should be safe.
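The reader-side protocol behind this reasoning resembles a seqlock-style
check: copy the data with a plain (racy) memcpy(), then look at the state
and discard the copy unless the record is finalized. A minimal userspace
sketch with C11 atomics (all names here are hypothetical stand-ins, not
kernel code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical mirror of the descriptor layout: atomic state + payload. */
enum rec_state { REC_RESERVED, REC_COMMITTED, REC_FINALIZED };

struct lpos { unsigned long begin, next; };

struct desc {
	_Atomic int state;
	struct lpos text_blk_lpos;
};

/*
 * Reader-side protocol: copy the payload with a plain (possibly torn)
 * memcpy(), then check the state. The copy may only be used when the
 * record is finalized; any other state means the copy must be discarded.
 */
static int read_lpos(struct desc *d, struct lpos *out)
{
	memcpy(out, &d->text_blk_lpos, sizeof(*out)); /* racy copy */
	if (atomic_load(&d->state) != REC_FINALIZED)
		return -1;                            /* discard the copy */
	return 0;
}
```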
But I agree that some race is there. The reader might copy
an inconsistent descriptor, get delayed by a long NMI, and
then see an already finalized state.
I mean the following:
CPU0                                  CPU1

prb_read()
  desc_read()
                                      prb_reserve_in_last()
                                        data_realloc()
    memcpy(text_blk_lpos)                 blk_lpos->next = next_lpos

    Warning: memcpy() copies an inconsistent lpos !!!

  NMI enter
                                      # Writer appends data and finalizes
                                      # the record
                                      prb_final_commit()
  NMI handler ends

  state_val = atomic_long_read(state_var);
  d_state = get_desc_state(id, state_val);

BANG: The reader gets an inconsistent blk_lpos and sees the
      finalized state.
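The interleaving can be replayed deterministically as sequential steps:
the reader copies the lpos half-way, the writer moves and finalizes the
record in the middle, and a single state check at the end alone would
accept the torn copy. Comparing the state read before the copy against
the state read after it catches the inconsistency. A userspace model
(all names hypothetical, not kernel code):

```c
#include <assert.h>
#include <stdatomic.h>

enum rec_state { REC_RESERVED, REC_COMMITTED, REC_FINALIZED };

struct lpos { unsigned long begin, next; };

struct desc {
	_Atomic int state;
	struct lpos text_blk_lpos;
};

/*
 * Replays the race above step by step. Returns -1 when the before/after
 * state comparison detects that the copied lpos may be torn.
 */
static int race_replay(struct lpos *copy)
{
	struct desc d = {
		.state = REC_COMMITTED,
		.text_blk_lpos = { 0, 64 },
	};
	int before, after;

	before = atomic_load(&d.state);      /* ~ desc_read:A */
	copy->begin = d.text_blk_lpos.begin; /* first half of the memcpy() */

	/* "NMI window": writer moves the block and finalizes the record. */
	d.text_blk_lpos.begin = 4096;
	d.text_blk_lpos.next = 4160;
	atomic_store(&d.state, REC_FINALIZED);

	copy->next = d.text_blk_lpos.next;   /* second half: torn copy */

	after = atomic_load(&d.state);       /* ~ desc_read:E */
	if (after != before)
		return -1;                   /* states differ: discard copy */
	return 0;
}
```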
On the positive side:
---------------------
There are still some sanity checks, for example:
static const char *get_data(struct prb_data_ring *data_ring,
struct prb_data_blk_lpos *blk_lpos,
unsigned int *data_size)
{
[...]
*data_size = blk_lpos->next - blk_lpos->begin;
[...]
if (WARN_ON_ONCE(!data_check_size(data_ring, *data_size) || !*data_size))
return NULL;
...
They might catch races where *data_size ends up as a negative
number (which wraps to a huge unsigned value). As a result,
I am not sure whether any race could even produce a bigger
*data_size which would point behind the log buffer.
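The wraparound behaviour behind that reasoning can be shown in
isolation. A hypothetical userspace sketch (the names loosely mirror
get_data(), but nothing here is kernel code):

```c
#include <assert.h>

/* Hypothetical stand-in for DATA_SIZE(data_ring). */
#define RING_DATA_SIZE (1UL << 12)

/*
 * Mirrors *data_size = blk_lpos->next - blk_lpos->begin from get_data().
 * With unsigned lpos values, a torn read where next < begin cannot give
 * a negative size: the subtraction wraps to a huge value, which the
 * maximum-size check then rejects.
 */
static int data_size_ok(unsigned long begin, unsigned long next,
			unsigned long *data_size)
{
	*data_size = next - begin;
	if (*data_size == 0 || *data_size > RING_DATA_SIZE)
		return 0; /* rejected, as data_check_size()/WARN_ON_ONCE() would */
	return 1;
}
```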
Catch the race:
---------------
We should still try to catch the race.
For example, desc_read() reads the state before and after
reading the data. It might invalidate them when the states
differ. Something like:
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 56c8e3d031f4..a5a456904b78 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -454,13 +454,13 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
struct printk_info *info = to_info(desc_ring, id);
struct prb_desc *desc = to_desc(desc_ring, id);
atomic_long_t *state_var = &desc->state_var;
- enum desc_state d_state;
+ enum desc_state d_state_before, d_state_after;
unsigned long state_val;
/* Check the descriptor state. */
state_val = atomic_long_read(state_var); /* LMM(desc_read:A) */
- d_state = get_desc_state(id, state_val);
- if (d_state == desc_miss || d_state == desc_reserved) {
+ d_state_before = get_desc_state(id, state_val);
+ if (d_state_before == desc_miss || d_state_before == desc_reserved) {
/*
* The descriptor is in an inconsistent state. Set at least
* @state_var so that the caller can see the details of
@@ -546,11 +546,27 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
* which may have changed since the load above.
*/
state_val = atomic_long_read(state_var); /* LMM(desc_read:E) */
- d_state = get_desc_state(id, state_val);
+ d_state_after = get_desc_state(id, state_val);
+
+ /*
+ * The copied output data might be inconsistent when the state
+ * has changed in the meantime. Make the caller aware that
+ * they are not final.
+ *
+ * FIXME: We need to double check all callers ignore the copied
+ * data when desc_reserved is returned!
+ *
+ * A cleaner solution might be introducing another state
+ * for this.
+ *
+ * Also we might want to check/modify @state_val instead.
+ */
+ if (d_state_after != d_state_before)
+ d_state_after = desc_reserved;
out:
if (desc_out)
atomic_long_set(&desc_out->state_var, state_val);
- return d_state;
+ return d_state_after;
}
/*
Also we might add more consistency checks when reading the data.
For example:
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 56c8e3d031f4..e252678337f6 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1319,6 +1335,12 @@ static const char *get_data(struct prb_data_ring *data_ring,
/* Subtract block ID space from size to reflect data size. */
*data_size -= sizeof(db->id);
+ /* Sanity check that we stay inside the printk data buffer */
+ if (WARN_ON_ONCE(&db->data[0] < data_ring->data ||
+ &db->data[0] + *data_size >= data_ring->data + DATA_SIZE(data_ring)))
+ return NULL;
+
+
return &db->data[0];
}
Hmm, I have tried to run the printk_ringbuffer_kunit_test
and I have triggered the WARN_ON_ONCE() added in get_data():
[ 269.954151] KTAP version 1
[ 269.954601] 1..1
[ 269.955248] KTAP version 1
[ 269.955718] # Subtest: printk-ringbuffer
[ 269.956287] # module: printk_ringbuffer_kunit_test
[ 269.956294] 1..1
[ 269.957591] # test_readerwriter: running for 10000 ms
[ 269.958731] # test_readerwriter: start thread 001 (writer)
[ 269.958827] # test_readerwriter: start thread 002 (writer)
[ 269.959031] # test_readerwriter: start thread 003 (writer)
[ 269.959287] # test_readerwriter: start thread 004 (writer)
[ 269.959550] # test_readerwriter: start thread 005 (writer)
[ 269.959649] # test_readerwriter: start thread 006 (writer)
[ 269.959800] # test_readerwriter: start thread 007 (writer)
[ 269.959888] # test_readerwriter: start thread 008 (writer)
[ 269.963125] # test_readerwriter: start thread 009 (writer)
[ 269.963205] # test_readerwriter: start thread 010 (writer)
[ 269.963275] # test_readerwriter: starting test
[ 269.963284] # test_readerwriter: start thread 011 (writer)
[ 269.963317] # test_readerwriter: start reader
[ 269.963919] ------------[ cut here ]------------
[ 269.972098] &db->data[0] < data_ring->data || &db->data[0] + *data_size >= data_ring->data + DATA_SIZE(data_ring)
[ 269.972101] WARNING: kernel/printk/printk_ringbuffer.c:1339 at get_data+0x13a/0x150, CPU#0: kunit_try_catch/4052
[ 269.972112] Modules linked in: printk_ringbuffer_kunit_test(E)
[ 269.972122] CPU: 0 UID: 0 PID: 4052 Comm: kunit_try_catch Kdump: loaded Tainted: G E N 7.0.0-rc3-default-dirty #562 PREEMPT(full) 60283ab8e9196bbd61ecb2de6cdd826bbd2749b4
[ 269.972128] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
[ 269.972130] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-2-g4f253b9b-prebuilt.qemu.org 04/01/2014
[ 269.972133] RIP: 0010:get_data+0x13a/0x150
[ 269.972138] Code: 44 29 d2 48 f7 d0 4c 21 d0 48 03 47 08 41 89 10 85 d2 0f 85 66 ff ff ff 0f 0b 31 c0 e9 0a f2 cd ff 0f 0b 31 c0 e9 01 f2 cd ff <0f> 0b 31 c0 e9 f8 f1 cd ff 0f 0b eb e3 0f 0b eb df 0f 0b eb db 90
[ 269.972141] RSP: 0018:ffffd084024d3b98 EFLAGS: 00010246
[ 269.972145] RAX: ffffffffc028cfc0 RBX: 0000000000002b57 RCX: 000000000000000d
[ 269.972148] RDX: ffffffffc028d040 RSI: ffffffffc028d040 RDI: 0000000000002000
[ 269.972151] RBP: 3fffffffffffffff R08: ffffd084024d3bdc R09: 00000000000d7f78
[ 269.972153] R10: 00000000000d7f78 R11: 0000000000002000 R12: 000000000000007c
[ 269.972156] R13: 00000000000004f8 R14: ffffffffc0295860 R15: 0000000000002610
[ 269.972160] FS: 0000000000000000(0000) GS:ffff894007af0000(0000) knlGS:0000000000000000
[ 269.972163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 269.972166] CR2: 00007fc8e1e68758 CR3: 0000000142458000 CR4: 0000000000750ef0
[ 269.972171] PKRU: 55555554
[ 269.972173] Call Trace:
[ 269.972177] <TASK>
[ 269.972179] _prb_read_valid+0x179/0x370
[ 269.972210] prb_read_valid+0x1b/0x30
[ 269.997890] prbtest_reader.isra.0+0xec/0x304 [printk_ringbuffer_kunit_test 67c42ea3f25a5751580d5cf445aab4427a263db6]
[ 269.997905] ? __pfx_autoremove_wake_function+0x10/0x10
[ 270.000468] ? __pfx_prbtest_wakeup_callback+0x10/0x10 [printk_ringbuffer_kunit_test 67c42ea3f25a5751580d5cf445aab4427a263db6]
[ 270.002296] test_readerwriter.cold+0x62/0x6bc [printk_ringbuffer_kunit_test 67c42ea3f25a5751580d5cf445aab4427a263db6]
[ 270.003999] ? srso_alias_return_thunk+0x5/0xfbef5
[ 270.004006] ? lockdep_hardirqs_on+0x8c/0x130
[ 270.004019] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[ 270.004026] kunit_try_run_case+0x72/0xf0
[ 270.004033] ? srso_alias_return_thunk+0x5/0xfbef5
[ 270.004037] ? srso_alias_return_thunk+0x5/0xfbef5
[ 270.004040] ? lockdep_hardirqs_on+0x8c/0x130
[ 270.004043] ? srso_alias_return_thunk+0x5/0xfbef5
[ 270.004052] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[ 270.004057] kunit_generic_run_threadfn_adapter+0x21/0x40
[ 270.004063] kthread+0x103/0x140
[ 270.004068] ? __pfx_kthread+0x10/0x10
[ 270.004079] ret_from_fork+0x276/0x3a0
[ 270.004088] ? __pfx_kthread+0x10/0x10
[ 270.004092] ? __pfx_kthread+0x10/0x10
[ 270.004100] ret_from_fork_asm+0x1a/0x30
[ 270.004135] </TASK>
[ 270.004137] irq event stamp: 879
[ 270.004139] hardirqs last enabled at (885): [<ffffffff8d41e9b5>] console_trylock_spinning+0x1c5/0x1f0
[ 270.004144] hardirqs last disabled at (890): [<ffffffff8d41e968>] console_trylock_spinning+0x178/0x1f0
[ 270.004148] softirqs last enabled at (458): [<ffffffff8d36c7c0>] handle_softirqs+0x380/0x450
[ 270.004154] softirqs last disabled at (449): [<ffffffff8d36c979>] __irq_exit_rcu+0xd9/0x150
[ 270.004158] ---[ end trace 0000000000000000 ]---
[ 280.158960] # test_readerwriter: end reader: read=17402256 seq=17524469
[ 280.160183] # test_readerwriter: completed test
[ 280.163200] # test_readerwriter: end thread 011: wrote=1619327
[ 280.164338] # test_readerwriter: end thread 010: wrote=1590337
[ 280.165441] # test_readerwriter: end thread 009: wrote=1625807
[ 280.166375] # test_readerwriter: end thread 008: wrote=1592085
[ 280.167406] # test_readerwriter: end thread 007: wrote=1593299
[ 280.168341] # test_readerwriter: end thread 006: wrote=1591461
[ 280.169378] # test_readerwriter: end thread 005: wrote=1593876
[ 280.170253] # test_readerwriter: end thread 004: wrote=1608503
[ 280.171123] # test_readerwriter: end thread 003: wrote=1620013
[ 280.171970] # test_readerwriter: end thread 002: wrote=1616222
[ 280.172798] # test_readerwriter: end thread 001: wrote=1595192
[ 280.173652] # test_readerwriter.speed: slow
[ 280.173660] ok 1 test_readerwriter
[ 280.174266] ok 1 printk-ringbuffer
I have to think more about it.
Anyway, I wonder whether to discuss this in public. I believe
that even if there were a race, it would be very hard to misuse
in practice.
Best Regards,
Petr