KCSAN: data-race in scsi_block_when_processing_errors / scsi_host_set_state

From: Jianzhou Zhao

Date: Wed Mar 11 2026 - 04:10:19 EST




Subject: [BUG] scsi: core: KCSAN: data-race in scsi_block_when_processing_errors / scsi_host_set_state

Dear Maintainers,

We are writing to report a data race detected by KCSAN in the SCSI core (`drivers/scsi/hosts.c` and `include/scsi/scsi_host.h`). The bug was found by our custom fuzzing tool, RacePilot. The race occurs when the host state changes while another task is waiting for error recovery to finish: `scsi_host_set_state()` writes `shost->shost_state` while `scsi_block_when_processing_errors()` reads the same field without synchronization through `scsi_host_in_recovery()`. We observed this bug on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.

Call Trace & Context
==================================================================
BUG: KCSAN: data-race in scsi_block_when_processing_errors / scsi_host_set_state

write to 0xffff888009ff4280 of 4 bytes by task 307 on cpu 1:
scsi_host_set_state+0x92/0x180 drivers/scsi/hosts.c:148
scsi_restart_operations drivers/scsi/scsi_error.c:2162 [inline]
scsi_error_handler+0x269/0x840 drivers/scsi/scsi_error.c:2372
...

read to 0xffff888009ff4280 of 4 bytes by task 22653 on cpu 0:
scsi_host_in_recovery include/scsi/scsi_host.h:754 [inline]
scsi_block_when_processing_errors+0x41/0x240 drivers/scsi/scsi_error.c:388
sr_open+0x2e/0x60 drivers/scsi/sr.c:609
cdrom_open+0xbc/0xec0 drivers/cdrom/cdrom.c:1154
sr_block_open+0x9b/0x120 drivers/scsi/sr.c:512
blkdev_get_whole+0x55/0x1f0 block/bdev.c:758
...
__x64_sys_openat+0xc2/0x130 fs/open.c:1447

value changed: 0x00000005 -> 0x00000002

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 22653 Comm: syz.1.672 Not tainted 6.18.0-08691-g2061f18ad76e-dirty #42 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================
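
To make the `value changed` line easier to read, the raw values can be mapped back to state names. The following is a minimal user-space sketch, not kernel code: the enum is assumed to mirror `enum scsi_host_state` in `include/scsi/scsi_host.h` for the tree under test, and `shost_state_name()` and `print_value_change()` are hypothetical helpers.

```c
#include <stdio.h>

/*
 * Assumed to match the ordering of enum scsi_host_state in
 * include/scsi/scsi_host.h; verify against the tree under test.
 */
enum scsi_host_state {
	SHOST_CREATED = 1,
	SHOST_RUNNING,
	SHOST_CANCEL,
	SHOST_DEL,
	SHOST_RECOVERY,
	SHOST_CANCEL_RECOVERY,
	SHOST_DEL_RECOVERY,
};

/* Hypothetical helper: map a raw value from a KCSAN report to a state name. */
static const char *shost_state_name(unsigned int raw)
{
	switch (raw) {
	case SHOST_CREATED:		return "SHOST_CREATED";
	case SHOST_RUNNING:		return "SHOST_RUNNING";
	case SHOST_CANCEL:		return "SHOST_CANCEL";
	case SHOST_DEL:			return "SHOST_DEL";
	case SHOST_RECOVERY:		return "SHOST_RECOVERY";
	case SHOST_CANCEL_RECOVERY:	return "SHOST_CANCEL_RECOVERY";
	case SHOST_DEL_RECOVERY:	return "SHOST_DEL_RECOVERY";
	default:			return "unknown";
	}
}

/* Hypothetical helper: print one "value changed" line in symbolic form. */
static void print_value_change(unsigned int before, unsigned int after)
{
	printf("%s -> %s\n", shost_state_name(before), shost_state_name(after));
}
```

Under that assumption, `print_value_change(0x5, 0x2)` prints `SHOST_RECOVERY -> SHOST_RUNNING`, which corresponds to the error handler returning the host to normal operation.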

Execution Flow & Code Context
When the SCSI error handler has finished recovery, it calls `scsi_restart_operations()`, which moves the host back to `SHOST_RUNNING` through `scsi_host_set_state()`. The new state is stored with a plain assignment, which gives no guarantee to readers that access the field without synchronization:
```c
// drivers/scsi/hosts.c
int scsi_host_set_state(struct Scsi_Host *shost, enum scsi_host_state state)
{
	...
	shost->shost_state = state; // <-- Plain concurrent 4-byte write
	return 0;
	...
}
```

Concurrently, a separate task opening the SCSI CD-ROM device calls `scsi_block_when_processing_errors()`, which issues `wait_event(sdev->host->host_wait, !scsi_host_in_recovery(sdev->host));`. `wait_event()` re-evaluates its condition before sleeping and after every wakeup, so `scsi_host_in_recovery()` reads `shost_state` repeatedly without holding a lock:
```c
// include/scsi/scsi_host.h
static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
{
	return shost->shost_state == SHOST_RECOVERY || // <-- Plain concurrent 4-byte read
		shost->shost_state == SHOST_CANCEL_RECOVERY ||
		shost->shost_state == SHOST_DEL_RECOVERY ||
		shost->tmf_in_progress;
}
```

Root Cause Analysis
KCSAN reports the race because `scsi_host_set_state()` updates `shost->shost_state` with a plain store while the `wait_event()` condition in `scsi_block_when_processing_errors()` reads it with plain loads, and the two sides share no lock. In the trace the value changes from `0x00000005` (`SHOST_RECOVERY`) to `0x00000002` (`SHOST_RUNNING`). Because both accesses are plain, the compiler may assume that the field does not change underneath the reader: it is allowed to tear, fuse, or repeat the loads, and `scsi_host_in_recovery()` names the field up to three times in a single evaluation. Marked accesses rule these transformations out and tell KCSAN that the concurrent access is intentional.

Unfortunately, we were unable to generate a reproducer for this bug.
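
The access pattern can be modelled in user space. The sketch below is an illustration only: `READ_ONCE_MODEL()` and `WRITE_ONCE_MODEL()` are simplified stand-ins for the kernel macros (a volatile cast, relying on the GCC/Clang `__typeof__` extension), `struct host_model` is a hypothetical reduction of `struct Scsi_Host`, and the enum is assumed to match `enum scsi_host_state`.

```c
#include <stdbool.h>

/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE(); the kernel macros do more. */
#define READ_ONCE_MODEL(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_MODEL(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Assumed to match enum scsi_host_state in include/scsi/scsi_host.h. */
enum scsi_host_state {
	SHOST_CREATED = 1,
	SHOST_RUNNING,
	SHOST_CANCEL,
	SHOST_DEL,
	SHOST_RECOVERY,
	SHOST_CANCEL_RECOVERY,
	SHOST_DEL_RECOVERY,
};

/* Hypothetical reduction of struct Scsi_Host to the two fields used here. */
struct host_model {
	enum scsi_host_state shost_state;
	unsigned int tmf_in_progress;	/* modelled as a plain field */
};

/* Writer side: a single marked store of the new state. */
static void host_model_set_state(struct host_model *h, enum scsi_host_state s)
{
	WRITE_ONCE_MODEL(h->shost_state, s);
}

/* Reader side: take one snapshot, then compare the local copy. */
static bool host_model_in_recovery(struct host_model *h)
{
	enum scsi_host_state state = READ_ONCE_MODEL(h->shost_state);

	return state == SHOST_RECOVERY ||
		state == SHOST_CANCEL_RECOVERY ||
		state == SHOST_DEL_RECOVERY ||
		h->tmf_in_progress;
}
```

The reader takes one snapshot of the state and compares the local copy, so all three comparisons see the same value even if a writer changes the field in between. The sketch does not model `wait_event()` or the wakeup path.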

Potential Impact
The most visible effect is that KCSAN keeps reporting this race whenever the SCSI open path runs during error recovery, which adds noise and can hide other reports. In principle, the compiler may also tear or refetch the plain accesses on some architectures, so that the `wait_event()` condition is evaluated against an inconsistent value of `shost_state` and the waiter behaves unpredictably.

Proposed Fix
We propose using `WRITE_ONCE()` for the store in `scsi_host_set_state()` and `READ_ONCE()` for the load in `scsi_host_in_recovery()`, so that `shost_state` is accessed exactly once per store or load and the intentional concurrency is visible to both the compiler and KCSAN.

```diff
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -145,7 +145,7 @@ int scsi_host_set_state(struct Scsi_Host *shost, enum scsi_host_state state)
 		}
 		break;
 	}
-	shost->shost_state = state;
+	WRITE_ONCE(shost->shost_state, state);
 	return 0;

  illegal:
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -751,9 +751,11 @@ static inline struct Scsi_Host *dev_to_shost(struct device *dev)

 static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
 {
-	return shost->shost_state == SHOST_RECOVERY ||
-		shost->shost_state == SHOST_CANCEL_RECOVERY ||
-		shost->shost_state == SHOST_DEL_RECOVERY ||
+	enum scsi_host_state state = READ_ONCE(shost->shost_state);
+
+	return state == SHOST_RECOVERY ||
+		state == SHOST_CANCEL_RECOVERY ||
+		state == SHOST_DEL_RECOVERY ||
 		shost->tmf_in_progress;
 }
```

We hope this report is helpful.

Best regards,
RacePilot Team