Re: [REGRESSION?] scsi: sas: wildcard user scan may iterate over huge max_id
From: Li Lingfeng
Date: Mon Mar 30 2026 - 22:35:28 EST
On 2026/3/30 20:18, James Bottomley wrote:
On Sat, 2026-03-28 at 10:28 +0800, Li Lingfeng wrote:
Hi James,
Hi,
In the case of smartpqi, it isn't designed to be user scanned, I think.
I think commit 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle
wildcard and multi-channel scans") may introduce a regression for
wildcard scans on some SAS hosts.
Userspace trigger:
echo "- - -" > /sys/class/scsi_host/host0/scan
results in:
channel = SCAN_WILD_CARD
id = SCAN_WILD_CARD
lun = SCAN_WILD_CARD
Before this commit, sas_user_scan() iterated sas_host->rphy_list and
called scsi_scan_target() for matching rphys. In effect, scanning was
limited to channel 0 and to target ids present in
sas_host->rphy_list.
After this commit, sas_user_scan() does:
- scan channel 0 via scan_channel_zero()
- scan channels 1..shost->max_channel via
scsi_scan_host_selected()
When id == SCAN_WILD_CARD, the latter path goes through
scsi_scan_channel(), which iterates ids from 0 to shost->max_id.
This looks problematic for drivers that use a very large max_id. For
example, smartpqi sets:
shost->max_id = ~0;
In that case, a wildcard scan may end up iterating from id 0 to ~0 in
scsi_scan_channel(). In my testing/analysis, this makes the scan take
a very long time, and the id-space walk itself does not seem
meaningful for this SAS transport scan path.
So while the commit fixes incomplete wildcard channel handling, it
also appears to expand the id scan range from:
sas_host->rphy_list target ids
to:
0..shost->max_id
for the additional channels.
It seems to me that wildcard SAS scans should probably remain bounded
by transport-discovered SAS targets, instead of falling back to a
host-wide id enumeration for the extra channels. One possible
direction may be to avoid calling scsi_scan_host_selected() with id
== SCAN_WILD_CARD from sas_user_scan(), or otherwise constrain the id
range in a transport-aware way.
Am I understanding this correctly? If so, what would be the preferred
way to address this? I would appreciate feedback on whether this is
considered a real regression, and on the best fix direction.
So, as you say, it would take a long time to scan one channel. Since
it sets max_channels to 3, it would only take 4 times longer which
hardly constitutes a regression.
Doing serial scans is very scsi-2 so most discoverable device fabrics
don't bother and get the default settings for the scan max_channels
(which is zero). The only devices that seem to care about this at all
are fat firmware devices that bundle RAID or other capabilities by re-
purposing channels and they seem to be the ones that want this
behaviour:
https://lore.kernel.org/linux-scsi/CAFdVvOwjy+2ORJ6uJkspiLTPF05481U7gcS4QohFOFGPqAs8ig@xxxxxxxxxxxxxx/
Regards,
James
Thank you very much for the reply and for the additional background.
I would like to clarify one point about the performance regression I was
trying to describe.
I was not referring to the change from scanning one channel to scanning
multiple channels. My concern was about the change in the target ID scan
range within a single channel.
Before commit 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle wildcard
and multi-channel scans"), the SAS path was effectively bounded by
rphy->scsi_target_id values discovered by the transport. After that change,
for the additional channels, the scan may go through scsi_scan_channel()
and iterate IDs in the range 0..shost->max_id when id == SCAN_WILD_CARD.
So the performance concern I had in mind was not really:
"one channel" -> "multiple channels"
but rather:
"scan transport-discovered IDs" -> "scan 0..max_id within a channel"
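To make the difference concrete, here is a rough userspace model of the two
behaviours. It is only a sketch, not kernel code: the function names and
MODEL_WILD_CARD are made up for illustration, the rphy list is reduced to an
array of target ids on channel 0, and I am assuming the wildcard walk probes
every id below max_id once per extra channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for SCAN_WILD_CARD; not the kernel definition. */
#define MODEL_WILD_CARD (~0u)

/*
 * Model of the old path: one probe per transport-discovered target id
 * that matches the request. All discovered targets live on channel 0.
 */
static uint64_t probes_rphy_bounded(const unsigned int *ids, size_t n,
                                    unsigned int channel, unsigned int id)
{
    uint64_t probes = 0;

    assert(ids != NULL || n == 0);

    if (channel != MODEL_WILD_CARD && channel != 0)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (id == MODEL_WILD_CARD || id == ids[i])
            probes++;
    }
    return probes;
}

/*
 * Model of the new path for channels 1..max_channel: a wildcard id is
 * assumed to walk every id below max_id, a specific id costs one probe.
 */
static uint64_t probes_id_walk(unsigned int max_channel, unsigned int max_id,
                               unsigned int id)
{
    uint64_t per_channel = (id == MODEL_WILD_CARD) ? max_id : 1;

    return (uint64_t)max_channel * per_channel;
}
```

With three discovered targets, a full wildcard request costs 3 probes in the
first model. With max_channel = 3 and max_id = ~0, the second model gives
3 * 4294967295 = 12884901885 probes for the extra channels, regardless of how
many targets actually exist.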
That said, after reading your reply, my current understanding is that the
motivation for 37c4e72b0651 is mainly to support controllers such as
mpt3sas and mpi3mr, where non-zero channels are meaningful and expected.
From that perspective, it seems to me that for scenarios that do not
involve mpt3sas/mpi3mr-like usage, one option would be to simply not take
37c4e72b0651, while if we do take it, we should accept that it may bring
this kind of scan-time performance regression on some hosts.
Does that sound like a reasonable way to look at it?
Thanks again for the clarification.
Regards,
Lingfeng.