Re: [PATCH v4] scsi: libsas: Fix the failure of adding phy with zero-address to port

From: John Garry
Date: Wed Nov 29 2023 - 07:55:05 EST


On 28/11/2023 03:45, yangxingui wrote:

On 2023/11/28 3:28, John Garry wrote:
On 24/11/2023 02:27, yangxingui wrote:
We already do this in sas_ex_join_wide_port(), right?
No, If the addr of ex_phy matches dev->parent, sas_ex_join_wide_port() will not be called, but sas_add_parent_port() will be called  as follows:
static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
{
         struct expander_device *ex = &dev->ex_dev;
         struct ex_phy *ex_phy = &ex->ex_phy[phy_id];
         struct domain_device *child = NULL;
         int res = 0;

     <...>
         /* Parent and domain coherency */
         if (!dev->parent && sas_phy_match_port_addr(dev->port, ex_phy)) {
                 sas_add_parent_port(dev, phy_id);
                 return 0;
         }
         if (dev->parent && sas_phy_match_dev_addr(dev->parent, ex_phy)) {
                 sas_add_parent_port(dev, phy_id);
                 if (ex_phy->routing_attr == TABLE_ROUTING)
                         sas_configure_phy(dev, phy_id, dev->port->sas_addr, 1);
                 return 0;
         }
     <...>
}


I am not saying that what we do now does not have a problem - I am just trying to understand what currently happens

ok, because ex_phy->port is not set when calling sas_add_parent_port(), when deleting phy from the parent wide port, it is not removed from the phy_list of the parent wide port as follows:
static void sas_unregister_devs_sas_addr(struct domain_device *parent,
                                          int phy_id, bool last)
{
     <...>
     // Since ex_phy->port is not set, this branch will not be enter

But then how does this ever work? It is because we follow path sas_rediscover_dev() -> sas_discover_new() -> sas_ex_discover_devices() -> sas_ex_discover_dev() -> sas_add_parent_port(), and not sas_rediscover_dev() -> sas_discover_new() -> sas_ex_join_wide_port()? If so, is that because ephy->sas_attached_phy == 0 in sas_discover_new() -> sas_ex_join_wide_port() and it fails?

BTW, about something mentioned earlier - adding the phy19 with SAS_ADDR

Yes,
For phy19, when the phy is attached and added to the parent wide port, the path is:
sas_rediscover()
    ->sas_discover_new()
        ->sas_ex_discover_devices()
            ->sas_ex_discover_dev()
                -> sas_add_parent_port().

ok, so then the change to set ex_phy->port = ex->parent_port looks ok. Maybe we can put this in a helper with the sas_port_add_phy() call, as it is duplicated in sas_ex_join_wide_port()

Do we also need to set ex_phy->phy_state (like sas_ex_join_wide_port())?

And the path called when it is removed from parent wide port is:
sas_rediscover()
    ->sas_unregister_devs_sas_addr() // The sas address of phy19 becomes 0. Since ex_phy->port is NULL, phy19 is not removed from the parent wide port's phy_list.

For phy0, it is connected to a new sata device.
sas_rediscover()
    ->sas_discover_new()->sas_ex_phy_discover()
                            ->sas_ex_phy_discover_helper()
                                ->sas_set_ex_phy() // The device type is stp. Since the linkrate is 5 and less than 1.5G, sas_address is set to 0.

Then when we get the proper linkrate later, will we then rediscover and set the proper SAS address? I am just wondering if this change is really required?

BTW, Even with the change to set ex_phy->port = ex->parent_port, are we still joining the host-attached expander phy (19) to a port with SAS address == 0?

                        ->sas_ex_discover_devices()
                            ->sas_ex_discover_dev()
                                ->sas_ex_discover_end_dev()
                                    ->sas_port_alloc() // Create port-7:7:0
                                    ->sas_ex_get_linkrate()
                                        ->sas_port_add_phy() // Try adding phy19 to port->7:7:0, triggering BUG()

Thanks,
John