Re: [PATCH] scsi: libsas: Fix disk not being scanned in after being removed

From: yangxingui
Date: Tue Mar 05 2024 - 06:29:11 EST



Hi John,
On 2024/3/5 18:15, John Garry wrote:
On 05/03/2024 02:56, Jason Yan wrote:
On 2024/3/4 20:50, yangxingui wrote:
Hi Jason,

On 2024/3/1 9:55, Jason Yan wrote:
On 2024/2/29 2:13, John Garry wrote:
On 21/02/2024 07:31, Xingui Yang wrote:
As of commit d8649fc1c5e4 ("scsi: libsas: Do discovery on empty PHY to
update PHY info"), discovery on an empty PHY sends a new SMP_DISCOVER and
updates phy->phy_change_count. We found that if the disk is reconnected and
phy_change_count changes at this time, the disk scanning process is not
triggered.

So update the PHY info with the last query results.

Fixes: d8649fc1c5e4 ("scsi: libsas: Do discovery on empty PHY to update PHY info")
Signed-off-by: Xingui Yang <yangxingui@xxxxxxxxxx>
---
  drivers/scsi/libsas/sas_expander.c | 9 ++++-----
  1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index a2204674b680..9563f5589948 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1681,6 +1681,10 @@ int sas_get_phy_attached_dev(struct domain_device *dev, int phy_id,
          if (*type == 0)
              memset(sas_addr, 0, SAS_ADDR_SIZE);
      }
+
+    if ((SAS_ADDR(sas_addr) == 0) || (res == -ECOMM))

It's odd to call sas_set_ex_phy() if we got res == -ECOMM. I mean, in this case disc_resp is not filled in as the command did not execute, right? I know that is what the current code does, but it is strange.

The current code actually resends the SMP command and updates the PHY status only when the SMP command gets a correct response.

Xingui, can you please fix this and send v3?
The PHY information cannot be updated directly at this location. The previous PHY information is used later, and the previous SAS address is compared with the newly queried SAS address. For now, after many days of testing, v2 is the more suitable approach.

I don't understand this. Where is the previous SAS address compared to the current SAS address?

Could this work:

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index a2204674b680..e190038ba7bd 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1675,11 +1675,13 @@ int sas_get_phy_attached_dev(struct domain_device *dev, int phy_id,

        res = sas_get_phy_discover(dev, phy_id, disc_resp);
        if (res == 0) {
-               memcpy(sas_addr, disc_resp->disc.attached_sas_addr,
-                      SAS_ADDR_SIZE);
                *type = to_dev_type(&disc_resp->disc);
-               if (*type == 0)
+               if (*type == SAS_PHY_UNUSED)
                        memset(sas_addr, 0, SAS_ADDR_SIZE);
+               else
+                       memcpy(sas_addr, disc_resp->disc.attached_sas_addr,
+                      SAS_ADDR_SIZE);
+               sas_set_ex_phy(dev, phy_id, disc_resp);
        }
        kfree(disc_resp);
        return res;

It's like the change in this patch.
This doesn't work properly. After sas_rediscover_dev() calls sas_get_phy_attached_dev(), the previous SAS address is compared with the newly queried SAS address, and the previous PHY information is also used when sas_unregister_devs_sas_addr() is called. Therefore, it is more appropriate to update the PHY information after the device is unregistered, as follows:
static int sas_rediscover_dev(struct domain_device *dev, int phy_id,
			      bool last, int sibling)
{
	...
	res = sas_get_phy_attached_dev(dev, phy_id, sas_addr, &type);
	switch (res) {
	case SMP_RESP_NO_PHY:
		phy->phy_state = PHY_NOT_PRESENT;
		sas_unregister_devs_sas_addr(dev, phy_id, last);
		return res;
	case SMP_RESP_PHY_VACANT:
		phy->phy_state = PHY_VACANT;
		sas_unregister_devs_sas_addr(dev, phy_id, last);
		return res;
	case SMP_RESP_FUNC_ACC:
		break;
	case -ECOMM:
		break;
	default:
		return res;
	}

	if ((SAS_ADDR(sas_addr) == 0) || (res == -ECOMM)) {
		phy->phy_state = PHY_EMPTY;
		sas_unregister_devs_sas_addr(dev, phy_id, last);
		/*
		 * Even though the PHY is empty, for convenience we discover
		 * the PHY to update the PHY info, like negotiated linkrate.
		 */
		sas_ex_phy_discover(dev, phy_id);
		return res;
	} else if (SAS_ADDR(sas_addr) == SAS_ADDR(phy->attached_sas_addr) && // <=== Compare the previous sas address with the current sas address
		   dev_type_flutter(type, phy->attached_dev_type)) {
		struct domain_device *ata_dev = sas_ex_to_ata(dev, phy_id);
		char *action = "";

		sas_ex_phy_discover(dev, phy_id);

		if (ata_dev && phy->attached_dev_type == SAS_SATA_PENDING)
			action = ", needs recovery";
		pr_debug("ex %016llx phy%02d broadcast flutter%s\n",
			 SAS_ADDR(dev->sas_addr), phy_id, action);
		return res;
	}
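
To make the ordering concern above concrete, here is a minimal user-space sketch in C. It is a toy model, not libsas code: struct toy_phy, enum toy_action and the toy_* functions are all hypothetical names invented for illustration, and the real flutter check additionally requires dev_type_flutter() to hold. The only point is that refreshing the cached address before the comparison makes a changed device look like a flutter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an expander PHY (hypothetical, not struct ex_phy). */
struct toy_phy {
	uint64_t attached_sas_addr;	/* address cached by the previous discover */
};

/* Simplified outcomes of a rediscover pass (hypothetical names). */
enum toy_action {
	TOY_UNREGISTER_EMPTY,	/* PHY is now empty: unregister the old device */
	TOY_FLUTTER,		/* same address as before: nothing to rescan */
	TOY_REPLACE_DEVICE,	/* address changed: unregister old, scan new */
};

/* Classify the queried address against whatever is cached in the PHY. */
enum toy_action toy_classify(const struct toy_phy *phy, uint64_t queried_addr)
{
	assert(phy != NULL);

	if (queried_addr == 0)
		return TOY_UNREGISTER_EMPTY;
	if (queried_addr == phy->attached_sas_addr)
		return TOY_FLUTTER;
	return TOY_REPLACE_DEVICE;
}

/*
 * Ordering A: the cache is refreshed before the comparison, so any
 * non-zero queried address trivially matches the cached one.
 */
enum toy_action toy_rediscover_update_early(struct toy_phy *phy,
					    uint64_t queried_addr)
{
	phy->attached_sas_addr = queried_addr;
	return toy_classify(phy, queried_addr);
}

/*
 * Ordering B: compare against the previous address first, and refresh
 * the cache only after the decision has been made.
 */
enum toy_action toy_rediscover_update_late(struct toy_phy *phy,
					   uint64_t queried_addr)
{
	enum toy_action action = toy_classify(phy, queried_addr);

	phy->attached_sas_addr = queried_addr;
	return action;
}
```

In this toy model, a PHY that previously cached one address and is then queried with a different non-zero address is classified as a flutter under ordering A and as a device replacement under ordering B, which is the behaviour difference described above.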




OK, so let me have a closer look at v2.

I have to say that v2 is quite complicated...
Yes, but it works.

Thanks,
Xingui