[PATCH AUTOSEL 6.19-6.12] scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
From: Sasha Levin
Date: Tue Mar 10 2026 - 05:09:45 EST
From: Ranjan Kumar <ranjan.kumar@xxxxxxxxxxxx>
[ Upstream commit dbd53975ed4132d161b6a97ebe785a262380182d ]
The driver retains reset history even after the IOC has successfully
reached the READY state. That leaves stale reset information active during
normal operation and can mislead recovery and diagnostics. In addition, if
the IOC becomes READY just as the ready timeout loop exits, the driver
still follows the failure path and may retry or report failure incorrectly.
Clear reset history once READY is confirmed so driver state matches actual
IOC status. After the timeout loop, recheck the IOC state and treat READY
as success instead of failing.
Signed-off-by: Ranjan Kumar <ranjan.kumar@xxxxxxxxxxxx>
Link: https://patch.msgid.link/20260225082622.82588-1-ranjan.kumar@xxxxxxxxxxxx
Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
LLM Generated explanations, may be completely bogus:
Now I have a clear picture of the changes. Let me finalize my analysis.
## Analysis
### What the commit does
This commit makes two small changes to `mpi3mr_bring_ioc_ready()` in the
mpi3mr SCSI driver:
1. **Clears reset history on READY** (line 1533): When the IOC
transitions to READY state inside the polling loop, it now calls
`mpi3mr_clear_reset_history()` before returning success. Without
this, stale reset history bits remain set, which can confuse
subsequent recovery logic that checks
`MPI3_SYSIF_IOC_STATUS_RESET_HISTORY`.
2. **Rechecks state after timeout** (lines 1553-1560): After the
`do...while` timeout loop exits, the code does one final check to see
if the IOC became READY right at the boundary. Previously, if the IOC
transitioned to READY during the last `msleep(100)` (after the
elapsed time already exceeded the timeout), the driver would fall
through to `out_failed` and potentially retry unnecessarily or report
failure.
### Bug assessment
**Race at timeout boundary (bug #2):** This is a real timing bug. The
loop checks `elapsed_time_sec < mrioc->ready_timeout` *after*
`msleep(100)`. If the IOC becomes READY during that last sleep,
`elapsed_time_sec` has already exceeded the timeout, so the loop exits.
Without the post-loop recheck, the driver goes to `out_failed`, which
triggers up to 2 retries (unnecessary soft resets) or reports
initialization failure. This can cause:
- Unnecessary controller resets during probe/resume
- Spurious initialization failures if retries are exhausted
- I/O failures during reset recovery (since `mpi3mr_reinit_ioc` fails)
**Stale reset history (bug #1):** If reset history is not cleared when
the IOC reaches READY, subsequent iterations of the loop in future calls
to this function will see `MPI3_SYSIF_IOC_STATUS_RESET_HISTORY` set and
jump to `out_failed` (line 1537-1540), causing spurious failure.
### Stable kernel criteria
- **Fixes a real bug:** Yes - timing race causes false initialization
failures; stale state causes spurious failures in subsequent
operations.
- **Obviously correct:** Yes - `mpi3mr_clear_reset_history()` is already
called elsewhere in the same function (at line 1510 after soft reset).
The post-timeout recheck follows the same pattern as the in-loop
check.
- **Small and contained:** Yes - 10 lines added, single file, single
function.
- **No new features:** Correct - pure bug fix.
- **Risk:** Very low. The added code only adds a success path where
there was none (post-timeout READY check) and clears a status bit that
should be cleared (reset history). No existing success/failure paths
are altered.
### Impact
This function is called on three critical paths: PCI probe, controller
reset recovery, and system resume. A false failure here means the SCSI
controller doesn't initialize, which means loss of access to all
attached storage devices. This affects real users with Broadcom MPI3
(SAS/NVMe) storage controllers.
### Verification
- Confirmed `mpi3mr_clear_reset_history` is a trivial inline that clears
the RESET_HISTORY bit in IOC status register (lines 1256-1263).
- Confirmed the function is called from probe (`mpi3mr_init_ioc`), reset
(`mpi3mr_reinit_ioc`), and resume paths.
- Confirmed that `mpi3mr_clear_reset_history` is already used elsewhere
in the same function, so the pattern is established.
- Confirmed the timeout race: the `do...while` loop exits when
`elapsed_time_sec >= ready_timeout`, but the IOC could have become
READY during the final `msleep(100)`, which is not checked without
this patch.
- Confirmed that falling through to `out_failed` triggers retries (up to
2) or returns failure (line 1564-1577).
- The commit is authored by Broadcom (the hardware vendor), accepted by
the SCSI maintainer Martin Petersen, indicating domain expertise.
- Could NOT verify specific user reports of this race triggering in the
field (commit message implies it was found via code
review/diagnostics), but the race window is real and the consequences
are severe.
**YES**
drivers/scsi/mpi3mr/mpi3mr_fw.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c
index 8c4bb7169a87c..6d36575997871 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_fw.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c
@@ -1530,6 +1530,7 @@ static int mpi3mr_bring_ioc_ready(struct mpi3mr_ioc *mrioc)
ioc_info(mrioc,
"successfully transitioned to %s state\n",
mpi3mr_iocstate_name(ioc_state));
+ mpi3mr_clear_reset_history(mrioc);
return 0;
}
ioc_status = readl(&mrioc->sysif_regs->ioc_status);
@@ -1549,6 +1550,15 @@ static int mpi3mr_bring_ioc_ready(struct mpi3mr_ioc *mrioc)
elapsed_time_sec = jiffies_to_msecs(jiffies - start_time)/1000;
} while (elapsed_time_sec < mrioc->ready_timeout);
+ ioc_state = mpi3mr_get_iocstate(mrioc);
+ if (ioc_state == MRIOC_STATE_READY) {
+ ioc_info(mrioc,
+ "successfully transitioned to %s state after %llu seconds\n",
+ mpi3mr_iocstate_name(ioc_state), elapsed_time_sec);
+ mpi3mr_clear_reset_history(mrioc);
+ return 0;
+ }
+
out_failed:
elapsed_time_sec = jiffies_to_msecs(jiffies - start_time)/1000;
if ((retry < 2) && (elapsed_time_sec < (mrioc->ready_timeout - 60))) {
--
2.51.0