Re: [PATCH v3 3/4] thermal: qcom: tsens: Add driver support for re-initialization quirk

From: Bjorn Andersson
Date: Mon Aug 29 2022 - 18:14:35 EST


On Thu, Aug 04, 2022 at 11:16:37AM +0530, Bhupesh Sharma wrote:
> Since for some Qualcomm tsens controllers, its suggested to
> monitor the controller health periodically and in case an
> issue is detected, to re-initialize the tsens controller
> via trustzone, add the support for the same in the
> qcom tsens driver.
>
> Note that once the tsens controller is reset using scm call,
> all SROT and TM region registers will enter the reset mode.
>
> While all the SROT registers will be re-programmed and
> re-enabled in trustzone prior to the scm call exit, the TM
> region registers will not re-initialized in trustzone and thus
> need to be handled by the tsens driver.
>
> Cc: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx>
> Cc: Amit Kucheria <amitk@xxxxxxxxxx>
> Cc: Thara Gopinath <thara.gopinath@xxxxxxxxx>
> Cc: linux-pm@xxxxxxxxxxxxxxx
> Cc: linux-arm-msm@xxxxxxxxxxxxxxx
> Signed-off-by: Bhupesh Sharma <bhupesh.sharma@xxxxxxxxxx>
> ---
> drivers/thermal/qcom/tsens-v2.c | 3 +
> drivers/thermal/qcom/tsens.c | 197 ++++++++++++++++++++++++++++++++
> drivers/thermal/qcom/tsens.h | 12 ++
> 3 files changed, 212 insertions(+)
>
> diff --git a/drivers/thermal/qcom/tsens-v2.c b/drivers/thermal/qcom/tsens-v2.c
> index b293ed32174b..f521e4479cc5 100644
> --- a/drivers/thermal/qcom/tsens-v2.c
> +++ b/drivers/thermal/qcom/tsens-v2.c
> @@ -88,6 +88,9 @@ static const struct reg_field tsens_v2_regfields[MAX_REGFIELDS] = {
>
> /* TRDY: 1=ready, 0=in progress */
> [TRDY] = REG_FIELD(TM_TRDY_OFF, 0, 0),
> +
> + /* FIRST_ROUND_COMPLETE: 1=complete, 0=not complete */
> + [FIRST_ROUND_COMPLETE] = REG_FIELD(TM_TRDY_OFF, 3, 3),
> };
>
> static const struct tsens_ops ops_generic_v2 = {
> diff --git a/drivers/thermal/qcom/tsens.c b/drivers/thermal/qcom/tsens.c
> index e49f58e83513..c2d085fb5447 100644
> --- a/drivers/thermal/qcom/tsens.c
> +++ b/drivers/thermal/qcom/tsens.c
> @@ -7,6 +7,7 @@
> #include <linux/debugfs.h>
> #include <linux/err.h>
> #include <linux/io.h>
> +#include <linux/qcom_scm.h>
> #include <linux/module.h>
> #include <linux/nvmem-consumer.h>
> #include <linux/of.h>
> @@ -594,6 +595,113 @@ static void tsens_disable_irq(struct tsens_priv *priv)
> regmap_field_write(priv->rf[INT_EN], 0);
> }
>
> +static int tsens_reenable_hw_after_scm(struct tsens_priv *priv)

As written, this always returns 0, so it could be a void function.

> +{
> + /*
> + * Re-enable watchdog, unmask the bark and
> + * disable cycle completion monitoring.
> + */
> + regmap_field_write(priv->rf[WDOG_BARK_CLEAR], 1);
> + regmap_field_write(priv->rf[WDOG_BARK_CLEAR], 0);
> + regmap_field_write(priv->rf[WDOG_BARK_MASK], 0);
> + regmap_field_write(priv->rf[CC_MON_MASK], 1);
> +
> + /* Re-enable interrupts */
> + tsens_enable_irq(priv);
> +
> + return 0;
> +}
> +
> +static int tsens_health_check_and_reinit(struct tsens_priv *priv,
> + int hw_id)
> +{
> + int ret, trdy, first_round, sw_reg;
> + unsigned long timeout;
> +
> + /* First check if TRDY is SET */
> + ret = regmap_field_read(priv->rf[TRDY], &trdy);
> + if (ret)
> + goto err;
> +
> + if (!trdy) {

	if (trdy)
		return 0;

Would save you one level of indentation.

> + ret = regmap_field_read(priv->rf[FIRST_ROUND_COMPLETE], &first_round);
> + if (ret)
> + goto err;
> +
> + if (!first_round) {

	if (first_round)
		return 0;

Would save you another level of indentation.

> + WARN_ON(!mutex_is_locked(&priv->reinit_mutex));

At least for now the function is only called within a small locked
region, so it's going to be locked here. But I'm wondering if there's
any relationship between the lock state of reinit_mutex and the values
of TRDY and FIRST_ROUND_COMPLETE.

Seems like it's possible to hit this function repeatedly and have it
exit early because of the TRDY and FIRST_ROUND_COMPLETE values, and then
one day it will reach here and trip.

So how about starting the function with this check, to make it more
likely to be hit in our testing?
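
Something along these lines, as a standalone userspace sketch rather
than driver code. struct fake_tsens, its fields and health_check() are
made-up stand-ins for the driver state; the warnings counter only
models where the WARN_ON() would fire:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state, not the real tsens_priv. */
struct fake_tsens {
	bool reinit_mutex_locked;	/* models mutex_is_locked() */
	unsigned int trdy;		/* models the TRDY field */
	unsigned int first_round;	/* models FIRST_ROUND_COMPLETE */
	int warnings;			/* counts would-be WARN_ON() hits */
};

static int health_check(struct fake_tsens *p, bool *needs_reinit)
{
	assert(p && needs_reinit);
	*needs_reinit = false;

	/* Check the locking rule first, so every call exercises it. */
	if (!p->reinit_mutex_locked)
		p->warnings++;

	/* Early returns keep the recovery path at one indentation level. */
	if (p->trdy)
		return 0;

	if (p->first_round)
		return 0;

	*needs_reinit = true;
	return 0;
}
```

With the check at the top, a caller that forgets the lock is flagged
even when TRDY is set and the function would otherwise return early.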

> +
> + /* Wait for 2 ms for tsens controller to recover */
> + timeout = jiffies + msecs_to_jiffies(RESET_TIMEOUT_MS);
> + do {
> + ret = regmap_field_read(priv->rf[FIRST_ROUND_COMPLETE],
> + &first_round);
> + if (ret)
> + goto err;
> +
> + if (first_round) {
> + dev_dbg(priv->dev, "tsens controller recovered\n");
> + return 0; /* success */
> + }
> + } while (time_before(jiffies, timeout));

I see no delays in this loop, so we're presumably going to spin here
tightly for 2ms.

I think you could write this loop as:

	ret = regmap_field_read_poll_timeout(priv->rf[FIRST_ROUND_COMPLETE],
					     first_round, first_round,
					     100, 2000);
	if (ret == 0) {
		dev_dbg(priv->dev, "tsens controller recovered\n");
		return 0;
	}
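
For reference, the behaviour I'm after is roughly "read, test, sleep,
repeat until a deadline". A userspace sketch of that shape follows. It
is not the kernel helper itself: struct fake_field, fake_field_read()
and poll_field_timeout() are made-up names, and nanosleep() stands in
for the kernel's sleep:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical register field whose bit gets set after a few reads. */
struct fake_field {
	int reads_until_set;
	int reads;
};

static int fake_field_read(struct fake_field *f, unsigned int *val)
{
	f->reads++;
	*val = (f->reads >= f->reads_until_set) ? 1 : 0;
	return 0;
}

static long long now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

static void sleep_us(long us)
{
	struct timespec ts = {
		.tv_sec = us / 1000000,
		.tv_nsec = (us % 1000000) * 1000,
	};

	nanosleep(&ts, NULL);
}

/* Poll until the field reads non-zero, sleeping between the reads. */
static int poll_field_timeout(struct fake_field *f, unsigned int *val,
			      long delay_us, long timeout_us)
{
	long long deadline;
	int ret;

	assert(delay_us > 0 && delay_us <= timeout_us);
	deadline = now_us() + timeout_us;

	for (;;) {
		ret = fake_field_read(f, val);
		if (ret)
			return ret;	/* read error */
		if (*val)
			return 0;	/* condition met */
		if (now_us() > deadline)
			return -ETIMEDOUT;
		sleep_us(delay_us);	/* yield instead of spinning */
	}
}
```

The point is the sleep_us() call: the CPU is given up between reads
rather than hammering the register for the whole 2 ms.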

> +
> + spin_lock(&priv->reinit_lock);
> +
> + /*
> + * Invoke SCM call only if SW register write is
> + * reflecting in controller. Try it for 2 ms.
> + * In case that fails mark the tsens controller
> + * as unrecoverable.
> + */
> + timeout = jiffies + msecs_to_jiffies(RESET_TIMEOUT_MS);
> + do {
> + ret = regmap_field_write(priv->rf[INT_EN], CRITICAL_INT_EN);
> + if (ret)
> + goto err;

You're holding reinit_lock here.

> +
> + ret = regmap_field_read(priv->rf[INT_EN], &sw_reg);
> + if (ret)
> + goto err;

And here.

> + } while ((sw_reg & CRITICAL_INT_EN) && (time_before(jiffies, timeout)));

And again, this is a tight loop. Please add a usleep_range(100, 1000),
perhaps in between the write and the read?

> +
> + if (!(sw_reg & CRITICAL_INT_EN)) {
> + ret = -ENOTRECOVERABLE;
> + goto err;

Again, reinit_lock is held here.

> + }
> +
> + /*
> + * tsens controller did not recover,
> + * proceed with SCM call to re-init it.
> + */
> + ret = qcom_scm_tsens_reinit();
> + if (ret) {
> + dev_err(priv->dev, "tsens reinit scm call failed (%d)\n", ret);
> + goto err;

And here.

> + }
> +
> + /*
> + * After the SCM call, we need to re-enable
> + * the interrupts and also set active threshold
> + * for each sensor.
> + */
> + ret = tsens_reenable_hw_after_scm(priv);

As written, tsens_reenable_hw_after_scm() never returns anything but 0,
so skip the error handling.

> + if (ret) {
> + dev_err(priv->dev,
> + "tsens re-enable after scm call failed (%d)\n", ret);
> + goto err;

And here...
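
One way to cover all of these is a single unlock label that every exit
from the locked region passes through. A standalone userspace sketch of
that shape, where struct fake_lock, struct fake_hw and
reinit_locked_section() are made up for illustration and the *_err
fields just simulate a failing step:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock that only tracks whether it is held. */
struct fake_lock {
	bool held;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical hardware model; the *_err fields inject failures. */
struct fake_hw {
	struct fake_lock reinit_lock;
	int write_err;		/* result of the register write step */
	int scm_err;		/* result of the reinit call step */
	bool work_queued;
};

static int reinit_locked_section(struct fake_hw *hw)
{
	int ret;

	fake_lock_acquire(&hw->reinit_lock);

	ret = hw->write_err;
	if (ret)
		goto out_unlock;

	ret = hw->scm_err;
	if (ret)
		goto out_unlock;

	hw->work_queued = true;

out_unlock:
	/* Success and failure both drop the lock here. */
	fake_lock_release(&hw->reinit_lock);
	return ret;
}
```

With that structure, adding another failing step later can't leak the
lock as long as it jumps to out_unlock.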

> + }
> +
> + /* Notify reinit wa worker */
> + queue_work(system_highpri_wq, &priv->reinit_wa_notify);
> +
> + spin_unlock(&priv->reinit_lock);
> + }
> + }
> +
> +err:
> + return ret;
> +}
> +
> int get_temp_tsens_valid(const struct tsens_sensor *s, int *temp)
> {
> struct tsens_priv *priv = s->priv;
> @@ -607,6 +715,21 @@ int get_temp_tsens_valid(const struct tsens_sensor *s, int *temp)
> if (tsens_version(priv) == VER_0)
> goto get_temp;
>
> + /*
> + * For some tsens controllers, its suggested to
> + * monitor the controller health periodically
> + * and in case an issue is detected to reinit
> + * tsens controller via trustzone.

Please use your 80 chars.

Regards,
Bjorn