Re: [PATCH v4 4/5] phy: ti: Add a new SERDES driver for TI's AM654x SoC

From: Kishon Vijay Abraham I
Date: Thu Jun 13 2019 - 12:52:20 EST


Hi Peter,

On 13/06/19 4:20 AM, Peter Rosin wrote:
> Hi!
>
> [I know this has already been merged upstream, but I only just
> now noticed the code and went to the archives to find the
> originating mail. I hope I managed to set in-reply-to correctly...]
>
> The mux handling is problematic and does not follow the rules.
> It needs to be fixed, or you may face deadlocks. See below.
>
> On 2019-04-05 11:08, Kishon Vijay Abraham I wrote:
>> Add a new SERDES driver for TI's AM654x SoC which configures
>> the SERDES only for PCIe. Support fo USB3 will be added later.
>>
>> SERDES in am654x has three input clocks (left input, externel reference
>> clock and right input) and two output clocks (left output and right
>> output) in addition to a PLL mux clock which the SERDES uses for Clock
>> Multiplier Unit (CMU refclock).
>>
>> The PLL mux clock can select from one of the three input clocks.
>> The right output can select between left input and external reference
>> clock while the left output can select between the right input and
>> external reference clock.
>>
>> The driver has support to select PLL mux and left/right output mux as
>> specified in device tree.
>>
>> [rogerq@xxxxxx: Fix boot lockup caused by accessing a structure member
>> (hw->init) allocated in stack of probe() and accessed in get_parent]
>> [rogerq@xxxxxx: Fix "Failed to find the parent" warnings]
>> Signed-off-by: Roger Quadros <rogerq@xxxxxx>
>> Signed-off-by: Kishon Vijay Abraham I <kishon@xxxxxx>
>> ---
>> drivers/phy/ti/Kconfig | 12 +
>> drivers/phy/ti/Makefile | 1 +
>> drivers/phy/ti/phy-am654-serdes.c | 624 ++++++++++++++++++++++++++++++
>> 3 files changed, 637 insertions(+)
>> create mode 100644 drivers/phy/ti/phy-am654-serdes.c
>>
>> diff --git a/drivers/phy/ti/Kconfig b/drivers/phy/ti/Kconfig
>> index 7cdc35f8c862..6931c87235b9 100644
>> --- a/drivers/phy/ti/Kconfig
>> +++ b/drivers/phy/ti/Kconfig
>> @@ -20,6 +20,18 @@ config PHY_DM816X_USB
>> help
>> Enable this for dm816x USB to work.
>>
>> +config PHY_AM654_SERDES
>> + tristate "TI AM654 SERDES support"
>> + depends on OF && ARCH_K3 || COMPILE_TEST
>> + depends on COMMON_CLK
>> + select GENERIC_PHY
>> + select MULTIPLEXER
>> + select REGMAP_MMIO
>> + select MUX_MMIO
>> + help
>> + This option enables support for TI AM654 SerDes PHY used for
>> + PCIe.
>> +
>> config OMAP_CONTROL_PHY
>> tristate "OMAP CONTROL PHY Driver"
>> depends on ARCH_OMAP2PLUS || COMPILE_TEST
>> diff --git a/drivers/phy/ti/Makefile b/drivers/phy/ti/Makefile
>> index 368de8e1548d..601bbd88f35e 100644
>> --- a/drivers/phy/ti/Makefile
>> +++ b/drivers/phy/ti/Makefile
>> @@ -6,5 +6,6 @@ obj-$(CONFIG_OMAP_USB2) += phy-omap-usb2.o
>> obj-$(CONFIG_TI_PIPE3) += phy-ti-pipe3.o
>> obj-$(CONFIG_PHY_TUSB1210) += phy-tusb1210.o
>> obj-$(CONFIG_TWL4030_USB) += phy-twl4030-usb.o
>> +obj-$(CONFIG_PHY_AM654_SERDES) += phy-am654-serdes.o
>> obj-$(CONFIG_PHY_TI_GMII_SEL) += phy-gmii-sel.o
>> obj-$(CONFIG_PHY_TI_KEYSTONE_SERDES) += phy-keystone-serdes.o
>> diff --git a/drivers/phy/ti/phy-am654-serdes.c b/drivers/phy/ti/phy-am654-serdes.c
>> new file mode 100644
>> index 000000000000..4817c67abbbb
>> --- /dev/null
>> +++ b/drivers/phy/ti/phy-am654-serdes.c
>> @@ -0,0 +1,624 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/**
>> + * PCIe SERDES driver for AM654x SoC
>> + *
>> + * Copyright (C) 2018 - 2019 Texas Instruments Incorporated - http://www.ti.com/
>> + * Author: Kishon Vijay Abraham I <kishon@xxxxxx>
>> + */
>> +
>> +#include <dt-bindings/phy/phy.h>
>> +#include <linux/clk.h>
>> +#include <linux/clk-provider.h>
>> +#include <linux/delay.h>
>> +#include <linux/module.h>
>> +#include <linux/mfd/syscon.h>
>> +#include <linux/mux/consumer.h>
>> +#include <linux/of_address.h>
>> +#include <linux/phy/phy.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/regmap.h>
>> +
>> +#define CMU_R07C 0x7c
>> +
>> +#define COMLANE_R138 0xb38
>> +#define VERSION 0x70
>> +
>> +#define COMLANE_R190 0xb90
>> +
>> +#define COMLANE_R194 0xb94
>> +
>> +#define SERDES_CTRL 0x1fd0
>> +
>> +#define WIZ_LANEXCTL_STS 0x1fe0
>> +#define TX0_DISABLE_STATE 0x4
>> +#define TX0_SLEEP_STATE 0x5
>> +#define TX0_SNOOZE_STATE 0x6
>> +#define TX0_ENABLE_STATE 0x7
>> +
>> +#define RX0_DISABLE_STATE 0x4
>> +#define RX0_SLEEP_STATE 0x5
>> +#define RX0_SNOOZE_STATE 0x6
>> +#define RX0_ENABLE_STATE 0x7
>> +
>> +#define WIZ_PLL_CTRL 0x1ff4
>> +#define PLL_DISABLE_STATE 0x4
>> +#define PLL_SLEEP_STATE 0x5
>> +#define PLL_SNOOZE_STATE 0x6
>> +#define PLL_ENABLE_STATE 0x7
>> +
>> +#define PLL_LOCK_TIME 100000 /* in microseconds */
>> +#define SLEEP_TIME 100 /* in microseconds */
>> +
>> +#define LANE_USB3 0x0
>> +#define LANE_PCIE0_LANE0 0x1
>> +
>> +#define LANE_PCIE1_LANE0 0x0
>> +#define LANE_PCIE0_LANE1 0x1
>> +
>> +#define SERDES_NUM_CLOCKS 3
>> +
>> +struct serdes_am654_clk_mux {
>> + struct clk_hw hw;
>> + struct regmap *regmap;
>> + unsigned int reg;
>> + int *table;
>> + u32 mask;
>> + u8 shift;
>> + struct clk_init_data clk_data;
>> +};
>> +
>> +#define to_serdes_am654_clk_mux(_hw) \
>> + container_of(_hw, struct serdes_am654_clk_mux, hw)
>> +
>> +static struct regmap_config serdes_am654_regmap_config = {
>> + .reg_bits = 32,
>> + .val_bits = 32,
>> + .reg_stride = 4,
>> + .fast_io = true,
>> +};
>> +
>> +static const struct reg_field cmu_master_cdn_o = REG_FIELD(CMU_R07C, 24, 24);
>> +static const struct reg_field config_version = REG_FIELD(COMLANE_R138, 16, 23);
>> +static const struct reg_field l1_master_cdn_o = REG_FIELD(COMLANE_R190, 9, 9);
>> +static const struct reg_field cmu_ok_i_0 = REG_FIELD(COMLANE_R194, 19, 19);
>> +static const struct reg_field por_en = REG_FIELD(SERDES_CTRL, 29, 29);
>> +static const struct reg_field tx0_enable = REG_FIELD(WIZ_LANEXCTL_STS, 29, 31);
>> +static const struct reg_field rx0_enable = REG_FIELD(WIZ_LANEXCTL_STS, 13, 15);
>> +static const struct reg_field pll_enable = REG_FIELD(WIZ_PLL_CTRL, 29, 31);
>> +static const struct reg_field pll_ok = REG_FIELD(WIZ_PLL_CTRL, 28, 28);
>> +
>> +struct serdes_am654 {
>> + struct regmap *regmap;
>> + struct regmap_field *cmu_master_cdn_o;
>> + struct regmap_field *config_version;
>> + struct regmap_field *l1_master_cdn_o;
>> + struct regmap_field *cmu_ok_i_0;
>> + struct regmap_field *por_en;
>> + struct regmap_field *tx0_enable;
>> + struct regmap_field *rx0_enable;
>> + struct regmap_field *pll_enable;
>> + struct regmap_field *pll_ok;
>> +
>> + struct device *dev;
>> + struct mux_control *control;
>> + bool busy;
>> + u32 type;
>> + struct device_node *of_node;
>> + struct clk_onecell_data clk_data;
>> + struct clk *clks[SERDES_NUM_CLOCKS];
>> +};
>> +
>> +static int serdes_am654_enable_pll(struct serdes_am654 *phy)
>> +{
>> + int ret;
>> + u32 val;
>> +
>> + ret = regmap_field_write(phy->pll_enable, PLL_ENABLE_STATE);
>> + if (ret)
>> + return ret;
>> +
>> + return regmap_field_read_poll_timeout(phy->pll_ok, val, val, 1000,
>> + PLL_LOCK_TIME);
>> +}
>> +
>> +static void serdes_am654_disable_pll(struct serdes_am654 *phy)
>> +{
>> + struct device *dev = phy->dev;
>> + int ret;
>> +
>> + ret = regmap_field_write(phy->pll_enable, PLL_DISABLE_STATE);
>> + if (ret)
>> + dev_err(dev, "Failed to disable PLL\n");
>> +}
>> +
>> +static int serdes_am654_enable_txrx(struct serdes_am654 *phy)
>> +{
>> + int ret;
>> +
>> + /* Enable TX */
>> + ret = regmap_field_write(phy->tx0_enable, TX0_ENABLE_STATE);
>> + if (ret)
>> + return ret;
>> +
>> + /* Enable RX */
>> + ret = regmap_field_write(phy->rx0_enable, RX0_ENABLE_STATE);
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static int serdes_am654_disable_txrx(struct serdes_am654 *phy)
>> +{
>> + int ret;
>> +
>> + /* Disable TX */
>> + ret = regmap_field_write(phy->tx0_enable, TX0_DISABLE_STATE);
>> + if (ret)
>> + return ret;
>> +
>> + /* Disable RX */
>> + ret = regmap_field_write(phy->rx0_enable, RX0_DISABLE_STATE);
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static int serdes_am654_power_on(struct phy *x)
>> +{
>> + struct serdes_am654 *phy = phy_get_drvdata(x);
>> + struct device *dev = phy->dev;
>> + int ret;
>> + u32 val;
>> +
>> + ret = serdes_am654_enable_pll(phy);
>> + if (ret) {
>> + dev_err(dev, "Failed to enable PLL\n");
>> + return ret;
>> + }
>> +
>> + ret = serdes_am654_enable_txrx(phy);
>> + if (ret) {
>> + dev_err(dev, "Failed to enable TX RX\n");
>> + return ret;
>> + }
>> +
>> + return regmap_field_read_poll_timeout(phy->cmu_ok_i_0, val, val,
>> + SLEEP_TIME, PLL_LOCK_TIME);
>> +}
>> +
>> +static int serdes_am654_power_off(struct phy *x)
>> +{
>> + struct serdes_am654 *phy = phy_get_drvdata(x);
>> +
>> + serdes_am654_disable_txrx(phy);
>> + serdes_am654_disable_pll(phy);
>> +
>> + return 0;
>> +}
>> +
>> +static int serdes_am654_init(struct phy *x)
>> +{
>> + struct serdes_am654 *phy = phy_get_drvdata(x);
>> + int ret;
>> +
>> + ret = regmap_field_write(phy->config_version, VERSION);
>> + if (ret)
>> + return ret;
>> +
>> + ret = regmap_field_write(phy->cmu_master_cdn_o, 0x1);
>> + if (ret)
>> + return ret;
>> +
>> + ret = regmap_field_write(phy->l1_master_cdn_o, 0x1);
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static int serdes_am654_reset(struct phy *x)
>> +{
>> + struct serdes_am654 *phy = phy_get_drvdata(x);
>> + int ret;
>> +
>> + ret = regmap_field_write(phy->por_en, 0x1);
>> + if (ret)
>> + return ret;
>> +
>> + mdelay(1);
>> +
>> + ret = regmap_field_write(phy->por_en, 0x0);
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static void serdes_am654_release(struct phy *x)
>> +{
>> + struct serdes_am654 *phy = phy_get_drvdata(x);
>> +
>> + phy->type = PHY_NONE;
>> + phy->busy = false;
>> + mux_control_deselect(phy->control);
>
> Here you unconditionally deselect the mux, and that seems
> dangerous. Are you *sure* that ->release may not be called
> without a successful xlate call?

Yeah, without a successful xlate(), the consumer will never get a reference to
the PHY and the ->release() is invoked only from phy_put() which needs a
reference to the PHY.
>
> I'm not 100% sure of that, but I have not looked at the phy
> code before today, so it may very well be the case that this
> is safe...
>
>> +}
>> +
>> +struct phy *serdes_am654_xlate(struct device *dev, struct of_phandle_args
>> + *args)
>> +{
>> + struct serdes_am654 *am654_phy;
>> + struct phy *phy;
>> + int ret;
>> +
>> + phy = of_phy_simple_xlate(dev, args);
>> + if (IS_ERR(phy))
>> + return phy;
>> +
>> + am654_phy = phy_get_drvdata(phy);
>> + if (am654_phy->busy)
>> + return ERR_PTR(-EBUSY);
>> +
>> + ret = mux_control_select(am654_phy->control, args->args[1]);
>> + if (ret) {
>> + dev_err(dev, "Failed to select SERDES Lane Function\n");
>> + return ERR_PTR(ret);
>> + }
>
> *However*
>
> Here you select the mux as the last action, good, but, a mux must
> be handled with that same care as a locking primitive, i.e.
> successful selects must be perfectly balanced with deselects. I
> see no guarantee of that here, since there are other failures
> possible after the xlate call. So, being last in the function
> does not really help. If I read the code correctly, the
> phy core may fail if try_module_get fails in phy_get(). If that
> ever happens, a successful call to mux_control_select is simply
> forgotten, and the mux will be locked indefinitely.

Good catch. While adding ->release() ops which is only invoked from phy_put,
perhaps this was missed. Ideally it should be invoked from other places where
there is a failure after phy_get.
>
> am654_phy->busy will also be set indefinitely, so you will get
> -EBUSY and not a hard deadlock. At least here, but if the now
> locked mux control happens to also control some other muxes
> (probably unlikely, but if), then their consumers will potentially
> deadlock hard. But that's just after a cursory reading, so I may
> completely miss something...

The ->busy here should prevent two consumers trying to control the same mux.
>
> Thinking some more, the above mentioned forgotten phy problem
> in fact looks like a generic leak in phy_get. It will simply
> leak any and all phys where the owner module is not yet
> available. Or, am I missing something?

yes you are right. Ideally we should have invoked ->release() after phy_get()
failures.
>
> Perhaps fixing that is all that is needed to make the mux
> select/deselect calls guaranteed to be balanced?

That's correct. Thanks for reporting this.

Cheers
Kishon