Re: [BUG] KCSAN: data-race in process_one_work / process_one_work

From: Mirsad Todorovac
Date: Fri Aug 18 2023 - 08:00:00 EST


On 8/18/23 10:10, Mirsad Todorovac wrote:


On 8/17/23 23:59, Tejun Heo wrote:
On Thu, Aug 17, 2023 at 09:24:28PM +0200, Mirsad Todorovac wrote:
Hi,

This is your friendly bug reporter.

The environment is vanilla torvalds tree kernel on Ubuntu 22.04 LTS and on a Ryzen 7950X
assembled box.

The kernel reports G taint "properietary module loaded", but I know of no module since
the boot state when the status was "Not tainted".

Please find the complete dmesg output (or what's left in the ring buffer) and lshw output
attached.

Here is the dmesg output excerpt:

[ 6849.221584] ==================================================================
[ 6849.221607] BUG: KCSAN: data-race in process_one_work / process_one_work

[ 6849.221635] write to 0xffff9b7440151398 of 8 bytes by task 6364 on cpu 7:
[ 6849.221647]  process_one_work+0x504/0x930
[ 6849.221660]  worker_thread+0x311/0x7e0
[ 6849.221673]  kthread+0x18b/0x1d0
[ 6849.221683]  ret_from_fork+0x43/0x70
[ 6849.221695]  ret_from_fork_asm+0x1b/0x30

[ 6849.221712] read to 0xffff9b7440151398 of 8 bytes by task 5586 on cpu 28:
[ 6849.221724]  process_one_work+0x4e8/0x930
[ 6849.221737]  worker_thread+0x519/0x7e0
[ 6849.221749]  kthread+0x18b/0x1d0
[ 6849.221759]  ret_from_fork+0x43/0x70
[ 6849.221770]  ret_from_fork_asm+0x1b/0x30

[ 6849.221786] value changed: 0x00000000000154d0 -> 0x00000000000154d1

[ 6849.221802] Reported by Kernel Concurrency Sanitizer on:
[ 6849.221811] CPU: 28 PID: 5586 Comm: kworker/u64:0 Tainted: G             L     6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650 #35
[ 6849.221825] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 6849.221833] Workqueue: events_unbound wq_barrier_func
[ 6849.221846] ==================================================================

That looks like the PENDING bit. The setting is atomic. Maybe there's a raw
read? Can you map the addresses to lines?

Thanks.

Good news:

[ 1863.554079] ==================================================================
[ 1863.554118] BUG: KCSAN: data-race in process_one_work / process_one_work

[ 1863.554142] write to 0xffff963d99d79998 of 8 bytes by task 5394 on cpu 27:
[ 1863.554154] process_one_work (kernel/workqueue.c:2598)
[ 1863.554166] worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2752)
[ 1863.554177] kthread (kernel/kthread.c:389)
[ 1863.554186] ret_from_fork (arch/x86/kernel/process.c:145)
[ 1863.554197] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)

[ 1863.554213] read to 0xffff963d99d79998 of 8 bytes by task 5450 on cpu 12:
[ 1863.554224] process_one_work (kernel/workqueue.c:2598)
[ 1863.554235] worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2752)
[ 1863.554247] kthread (kernel/kthread.c:389)
[ 1863.554255] ret_from_fork (arch/x86/kernel/process.c:145)
[ 1863.554266] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)

[ 1863.554280] value changed: 0x0000000000001766 -> 0x000000000000176a

[ 1863.554295] Reported by Kernel Concurrency Sanitizer on:
[ 1863.554303] CPU: 12 PID: 5450 Comm: kworker/u64:1 Tainted: G             L     6.5.0-rc6+ #44
[ 1863.554314] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 1863.554322] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[ 1863.554941] ==================================================================


    lockdep_invariant_state(true);
:    pwq->stats[PWQ_STAT_STARTED]++;
    trace_workqueue_execute_start(work);
    worker->current_func(work);

Just as an exercise, this seems to work, but it is probably not very clean and not very scalable:

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 800b4208dba9..237e5800fbad 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2595,7 +2595,7 @@ __acquires(&pool->lock)
* workqueues), so hiding them isn't a problem.
*/
lockdep_invariant_state(true);
- pwq->stats[PWQ_STAT_STARTED]++;
+ atomic_inc((atomic_t *) &pwq->stats[PWQ_STAT_STARTED]);
trace_workqueue_execute_start(work);
worker->current_func(work);
/*

--
Best regards,
Mirsad Todorovac