Re: [PATCH] soc: qcom: rpmh-rsc: Don't use ktime for timeout in write_tcs_reg_sync()

From: Doug Anderson
Date: Fri May 29 2020 - 13:01:16 EST


Hi,

On Thu, May 28, 2020 at 3:44 PM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
>
> Quoting Douglas Anderson (2020-05-28 07:48:34)
> > The write_tcs_reg_sync() may be called after timekeeping is suspended
> > so it's not OK to use ktime. The readl_poll_timeout_atomic() macro
> > implicitly uses ktime. This was causing a warning at suspend time.
> >
> > Change to just loop 1000000 times with a delay of 1 us between loops.
> > This may give a timeout of more than 1 second but never less and is
> > safe even if timekeeping is suspended.
> >
> > NOTE: I don't have any actual evidence that we need to loop here.
> > It's possible that all we really need to do is just read the value
> > back to ensure that the pipes are cleaned and the looping/comparing is
> > totally not needed. I never saw the loop being needed in my tests.
> > However, the loop shouldn't hurt.
> >
> > Fixes: 91160150aba0 ("soc: qcom: rpmh-rsc: Timeout after 1 second in write_tcs_reg_sync()")
> > Reported-by: Maulik Shah <mkshah@xxxxxxxxxxxxxx>
> > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > ---
>
> Reviewed-by: Stephen Boyd <sboyd@xxxxxxxxxx>

Thanks!
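
For anyone skimming the thread, here is a rough userspace sketch of the
counted-loop idea from the commit message: poll a register up to a fixed
number of times with a short delay between reads, so no clock is consulted
at all. This is not the patch itself. fake_reg, fake_readl(), fake_udelay()
and poll_counted() are hypothetical stand-ins I made up for illustration;
the real code reads an RSC register and calls udelay().

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped register (not the RSC layout). */
struct fake_reg {
	uint32_t value;                   /* what a read currently returns */
	uint32_t pending;                 /* value that shows up once the write lands */
	unsigned int reads_until_visible; /* 0 means the write never lands */
};

/* Simulated register read: the pending value appears after N reads. */
static uint32_t fake_readl(struct fake_reg *r)
{
	if (r->reads_until_visible > 0 && --r->reads_until_visible == 0)
		r->value = r->pending;
	return r->value;
}

/* Placeholder for a busy-wait delay; a no-op in this sketch. */
static void fake_udelay(unsigned int usecs)
{
	(void)usecs;
}

/*
 * Poll until the register reads back 'want' or max_polls reads have been
 * made. Returns the index of the matching read, or -1 on timeout. The
 * timeout is bounded by the loop count alone, so it does not depend on
 * timekeeping being alive.
 */
static long poll_counted(struct fake_reg *r, uint32_t want, long max_polls)
{
	long i;

	for (i = 0; i < max_polls; i++) {
		if (fake_readl(r) == want)
			return i;
		fake_udelay(1);
	}
	return -1;
}
```

With max_polls = 1000000 and a 1 us delay the wait is at least a second in
total, and longer in practice since each read takes time too, which is the
"more than 1 second but never less" behavior described above.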


> Although I don't think ktime_get() inside of readl_poll_timeout_atomic()
> is correct. The timekeeping base won't be able to update when a loop is
> spinning in an irq disabled region. We need the tick interrupt to come
> in and update the base.

Is this really a problem? I'm not totally familiar with the
timekeeping code, but I know I've used ktime to time things while
interrupts are disabled in the past. It looks as if things are OK as
long as the base is updated every once in a while, since a read just
computes a delta from that base...
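
To make the "deltas from a base" idea concrete, here is a toy model. It is
only my mental picture, not the actual timekeeping code: toy_timekeeper,
toy_get_ns() and toy_update() are names I invented, and the fixed
ns_per_cycle factor is a simplifying assumption.

```c
#include <stdint.h>

/* Toy model of "base plus delta" time reads; all names are hypothetical. */
struct toy_timekeeper {
	uint64_t base_ns;      /* time recorded at the last update */
	uint64_t cycle_last;   /* counter value at the last update */
	uint64_t counter_mask; /* counter width, e.g. 0xff for 8 bits */
	uint64_t ns_per_cycle; /* assumed fixed conversion factor */
};

/*
 * Read the time: base plus the cycles elapsed since the last update.
 * No interrupt is needed for this, only a free-running counter.
 */
static uint64_t toy_get_ns(const struct toy_timekeeper *tk, uint64_t cycle_now)
{
	uint64_t delta = (cycle_now - tk->cycle_last) & tk->counter_mask;

	return tk->base_ns + delta * tk->ns_per_cycle;
}

/* Periodic update (the tick's job in this model): fold the delta into the base. */
static void toy_update(struct toy_timekeeper *tk, uint64_t cycle_now)
{
	tk->base_ns = toy_get_ns(tk, cycle_now);
	tk->cycle_last = cycle_now & tk->counter_mask;
}
```

In this model a read stays correct with the update blocked as long as the
counter has not gone all the way around since the last update. Once it has,
the masked delta silently loses a full counter period, which I assume is
the kind of failure the tick is there to prevent.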


> Spinning for a second with irqs disabled is also
> insane for realtime so there's that problem too.

Yeah. I just arbitrarily picked 1 second originally so we didn't loop
infinitely. The expectation is that we'd never actually hit this
timeout. If we do then there's (presumably) some type of serious
problem that needs to be debugged.


-Doug