Re: [RESEND PATCH net-next v3 2/4] net: phy: aquantia: wait for FW reset before checking the vendor ID

From: Vladimir Oltean
Date: Tue Aug 06 2024 - 07:28:14 EST


Hi Jon,

On Tue, Jul 30, 2024 at 10:59:59AM +0100, Jon Hunter wrote:
> Hi Bartosz,
>
> On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>
> >
> > Checking the firmware register before it complete the boot process makes
> > no sense, it will report 0 even if FW is available from internal memory.
> > Always wait for FW to boot before continuing or we'll unnecessarily try
> > to load it from nvmem/filesystem and fail.
> >
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>
> > ---
> > drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> > index 0c9640ef153b..524627a36c6f 100644
> > --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> > @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev)
> > {
> > int ret;
> > + ret = aqr_wait_reset_complete(phydev);
> > + if (ret)
> > + return ret;
> > +
> > /* Check if the firmware is not already loaded by pooling
> > * the current version returned by the PHY. If 0 is returned,
> > * no firmware is loaded.
>
>
> Although this fixed another issue we were seeing with this driver, we have
> been reviewing this change and have a question about it.
>
> According to the description for the function aqr_wait_reset_complete() this
> function is intended to give the device time to load firmware and check
> there is a valid firmware ID.
>
> If a valid firmware ID (non-zero) is detected, then
> aqr_wait_reset_complete() will return 0 (because phy_read_mmd_poll_timeout()
> returns 0 on success and -ETIMEDOUT upon a timeout).
>
> If it times out, then it would appear that with the above code we don't
> attempt to load the firmware by any other means?
>
> Hence, I was wondering if we want this ...
>
> diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c
> b/drivers/net/phy/aquantia/aquantia_firmware.c
> index 524627a36c6f..a167f42ae36b 100644
> --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> @@ -353,16 +353,12 @@ int aqr_firmware_load(struct phy_device *phydev)
> {
> int ret;
>
> - ret = aqr_wait_reset_complete(phydev);
> - if (ret)
> - return ret;
> -
> - /* Check if the firmware is not already loaded by pooling
> + /* Check if the firmware is not already loaded by polling
> * the current version returned by the PHY. If 0 is returned,
> - * no firmware is loaded.
> + * firmware is loaded.
> */
> - ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
> - if (ret > 0)
> + ret = aqr_wait_reset_complete(phydev);
> + if (!ret)
> goto exit;
>
> ret = aqr_firmware_load_nvmem(phydev);

I agree with your analysis and we also noticed this.

But actually, you wouldn't want to ignore other return codes from
phy_read_mmd_poll_timeout() like real errors from phy_read_mmd():
-ENODEV, -ENXIO etc.

I found that the logic is more readable with a switch/case statement as below.

diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
index 524627a36c6f..d839f64471bd 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -353,26 +353,33 @@ int aqr_firmware_load(struct phy_device *phydev)
{
int ret;

- ret = aqr_wait_reset_complete(phydev);
- if (ret)
- return ret;
-
- /* Check if the firmware is not already loaded by pooling
- * the current version returned by the PHY. If 0 is returned,
- * no firmware is loaded.
+ /* Check if the firmware is not already loaded by polling
+ * the current version returned by the PHY.
*/
- ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
- if (ret > 0)
- goto exit;
+ ret = aqr_wait_reset_complete(phydev);
+ switch (ret) {
+ case 0:
+ /* Some firmware is loaded => do nothing */
+ return 0;
+ case -ETIMEDOUT:
+ /* VEND1_GLOBAL_FW_ID still reads 0 after 2 seconds of polling.
+ * We don't have full confidence that no firmware is loaded (in
+ * theory it might just not have loaded yet), but we will
+ * assume that, and load a new image.
+ */
+ ret = aqr_firmware_load_nvmem(phydev);
+ if (!ret)
+ goto exit;

- ret = aqr_firmware_load_nvmem(phydev);
- if (!ret)
- goto exit;
+ ret = aqr_firmware_load_fs(phydev);
+ if (ret)
+ return ret;

- ret = aqr_firmware_load_fs(phydev);
- if (ret)
+ break;
+ default:
+ /* PHY read error, propagate it to the caller */
return ret;
+ }

-exit:
return 0;
}