RE: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic

From: Wu, Hao
Date: Thu Sep 17 2020 - 22:52:52 EST


> Subject: Re: [PATCH v2 1/1] fpga: dfl: afu: harden port enable logic
>
> On Thu, Sep 17, 2020 at 01:28:22PM -0700, Tom Rix wrote:
> >
> > On 9/17/20 11:32 AM, Russ Weight wrote:
> > > Port enable is not complete until ACK = 0. Change
> > > __afu_port_enable() to guarantee that the enable process
> > > is complete by polling for ACK == 0.
> > >
> > > Signed-off-by: Russ Weight <russell.h.weight@xxxxxxxxx>
> General note: Please keep a changelog if you send updated versions of a
> patch. This can be added here with an extra '---' + text between the
> Signed-off-by and the diffstat:
>
> ---
> Changes from v1:
> - Foo
> - Bar
> > > ---
> > > drivers/fpga/dfl-afu-error.c | 2 +-
> > > drivers/fpga/dfl-afu-main.c | 29 +++++++++++++++++++++--------
> > > drivers/fpga/dfl-afu.h | 2 +-
> > > 3 files changed, 23 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> > > index c4691187cca9..0806532a3e9f 100644
> > > --- a/drivers/fpga/dfl-afu-error.c
> > > +++ b/drivers/fpga/dfl-afu-error.c
> > > @@ -103,7 +103,7 @@ static int afu_port_err_clear(struct device *dev, u64 err)
> > > __afu_port_err_mask(dev, false);
> > >
> >
> > There is an earlier bit that sets ret = -EINVAL.
> >
> > This error will be lost or not handled well.
> >
> > Right now it doesn't seem to be handled.
>
> Ultimately you'd want to report *at least* one of them. The current code
> seems to continue and enable the port in either case. Is that what it
> should be doing?

To clear errors, we first have to put the port into reset, and then take it
out of reset (enable it) once error clearing is done. Even if error clearing
fails, we still want to get the port back to work at least. As we know, if the
port is still in reset, the accelerator connected to the port won't work.
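
The flow described here can be sketched in user-space C. This is only an
illustration, not the driver code: sim_port_disable(), sim_port_enable(),
sim_err_clear() and the two globals are hypothetical stand-ins, and returning
the enable error in preference to the earlier -EINVAL is one possible policy,
chosen here because a port stuck in reset is the more severe failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simulated state; none of this is the real driver. */
static bool sim_in_reset;       /* true while the port is held in reset */
static int sim_enable_result;   /* value the simulated enable returns */

static int sim_port_disable(void)
{
	sim_in_reset = true;
	return 0;
}

static int sim_port_enable(void)
{
	if (sim_enable_result)
		return sim_enable_result;   /* e.g. -ETIMEDOUT, port stays in reset */
	sim_in_reset = false;
	return 0;
}

/*
 * Put the port into reset, try to clear errors, then always try to bring
 * the port back. err_matches models the check that can yield -EINVAL.
 */
static int sim_err_clear(bool err_matches)
{
	int ret, enable_ret;

	ret = sim_port_disable();
	if (ret)
		return ret;

	if (!err_matches)
		ret = -EINVAL;
	/* else: the error registers would be cleared here */

	/* Re-enable regardless of ret so the accelerator stays usable. */
	enable_ret = sim_port_enable();

	/* Assumed policy: an enable failure takes precedence over -EINVAL. */
	return enable_ret ? enable_ret : ret;
}
```

With this shape, a failed error clear still leaves the port enabled, and the
caller sees -EINVAL only when the enable itself succeeded.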

>
> Is the timeout more severe than the invalid value? Do you want to print
> a warning?

Yes. It's a very bad case if the port cannot be enabled any more (the
accelerator may not be accessible any more). The hardware should already be in
an error state, so it's better to have some warning messages here.

>
> Either way a comment explaining why this is ok would be appreciated :)
> >
> > > /* Enable the Port by clear the reset */
> > > - __afu_port_enable(pdev);
> > > + ret = __afu_port_enable(pdev);
> > >
> > > done:
> > > mutex_unlock(&pdata->lock);
> > > diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> > > index 753cda4b2568..f73b06cdf13c 100644
> > > --- a/drivers/fpga/dfl-afu-main.c
> > > +++ b/drivers/fpga/dfl-afu-main.c
> > > @@ -21,6 +21,9 @@
> > >
> > > #include "dfl-afu.h"
> > >
> > > +#define RST_POLL_INVL 10 /* us */
> > > +#define RST_POLL_TIMEOUT 1000 /* us */
> > > +
> > > /**
> > > * __afu_port_enable - enable a port by clear reset
> > > * @pdev: port platform device.
> > > @@ -32,7 +35,7 @@
> > > *
> > > * The caller needs to hold lock for protection.
> > > */
> > > -void __afu_port_enable(struct platform_device *pdev)
> > > +int __afu_port_enable(struct platform_device *pdev)
> > > {
> > > struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> > > void __iomem *base;
> > > @@ -41,7 +44,7 @@ void __afu_port_enable(struct platform_device *pdev)
> > > WARN_ON(!pdata->disable_count);
> > >
> > > if (--pdata->disable_count != 0)
> > > - return;
> > > + return 0;
> > Is this really a success ? Maybe -EBUSY ?
> Seems like if it's severe enough for a warning you'd probably want to
> return an error.

As Yilun mentioned, this is just a reference count operation, so we don't
need to return an error code.
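
A small user-space sketch of the counting behaviour, assuming balanced
disable/enable calls. struct refcnt_port, refcnt_disable() and refcnt_enable()
are hypothetical names for illustration, not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a port with nested disable requests. */
struct refcnt_port {
	unsigned int disable_count;   /* outstanding disable requests */
	bool in_reset;
};

static int refcnt_disable(struct refcnt_port *p)
{
	/* Only the first disable actually puts the port into reset. */
	if (p->disable_count++ != 0)
		return 0;
	p->in_reset = true;
	return 0;
}

static int refcnt_enable(struct refcnt_port *p)
{
	assert(p->disable_count > 0);   /* unbalanced enable is a caller bug */

	/* Another caller still needs the port in reset: not an error. */
	if (--p->disable_count != 0)
		return 0;

	/* Last reference dropped: take the port out of reset. */
	p->in_reset = false;
	return 0;
}
```

Returning 0 from a nested enable only means "count dropped, someone else still
holds the port in reset", which is why an error code such as -EBUSY would be
misleading there.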

> > >
> > > base = dfl_get_feature_ioaddr_by_id(&pdev->dev, PORT_FEATURE_ID_HEADER);
> > >
> > > @@ -49,10 +52,20 @@ void __afu_port_enable(struct platform_device *pdev)
> > > v = readq(base + PORT_HDR_CTRL);
> > > v &= ~PORT_CTRL_SFTRST;
> > > writeq(v, base + PORT_HDR_CTRL);
> > > -}
> > >
> > > -#define RST_POLL_INVL 10 /* us */
> > > -#define RST_POLL_TIMEOUT 1000 /* us */
> > > + /*
> > > + * HW clears the ack bit to indicate that the port is fully out
> > > + * of reset.
> > > + */
> > > + if (readq_poll_timeout(base + PORT_HDR_CTRL, v,
> > > + !(v & PORT_CTRL_SFTRST_ACK),
> > > + RST_POLL_INVL, RST_POLL_TIMEOUT)) {
> > > + dev_err(&pdev->dev, "timeout, failure to enable device\n");
> > > + return -ETIMEDOUT;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > >
> > > /**
> > > * __afu_port_disable - disable a port by hold reset
> > > @@ -111,7 +124,7 @@ static int __port_reset(struct platform_device *pdev)
> > >
> > > ret = __afu_port_disable(pdev);
> > > if (!ret)
> > > - __afu_port_enable(pdev);
> > > + ret = __afu_port_enable(pdev);
> > >
> > > return ret;
> > > }
> > > @@ -872,11 +885,11 @@ static int afu_dev_destroy(struct platform_device *pdev)
> > > static int port_enable_set(struct platform_device *pdev, bool enable)
> > > {
> > > struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> > > - int ret = 0;
> > > + int ret;
> > >
> > > mutex_lock(&pdata->lock);
> > > if (enable)
> > > - __afu_port_enable(pdev);
> > > + ret = __afu_port_enable(pdev);
> > > else
> > > ret = __afu_port_disable(pdev);
> > > mutex_unlock(&pdata->lock);
> > > diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
> > > index 576e94960086..e5020e2b1f3d 100644
> > > --- a/drivers/fpga/dfl-afu.h
> > > +++ b/drivers/fpga/dfl-afu.h
> > > @@ -80,7 +80,7 @@ struct dfl_afu {
> > > };
> > >
> > > /* hold pdata->lock when call __afu_port_enable/disable */
> > > -void __afu_port_enable(struct platform_device *pdev);
> > > +int __afu_port_enable(struct platform_device *pdev);
> > > int __afu_port_disable(struct platform_device *pdev);
> >
> > The other functions in this file have afu_*. Since __afu_port_enable/disable
> > are used in other places, would it make sense to remove the '__' prefix?
>
> The idea on those is to indicate that the caller needs to be cautious
> (often a lock / mutex is required). I think keeping them as is is fine.

Yes. That's why we add the prefix for these functions.

Thanks
Hao

>
> >
> > If you think so, maybe a cleanup patch later.
> >
> > Tom
> >
> > >
> > > void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
> >
>
> Thanks,
> Moritz