Re: [Freedreno] [PATCH] drm/msm/dpu: ensure device suspend happens during PM sleep

From: Doug Anderson
Date: Wed May 27 2020 - 18:11:17 EST


Hi,

On Fri, May 15, 2020 at 9:37 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Fri, May 15, 2020 at 5:06 AM <kalyan_t@xxxxxxxxxxxxxx> wrote:
> >
> > On 2020-05-14 21:47, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Fri, May 1, 2020 at 6:31 AM Kalyan Thota <kalyan_t@xxxxxxxxxxxxxx>
> > > wrote:
> > >>
> > >> "The PM core always increments the runtime usage counter
> > >> before calling the ->suspend() callback and decrements it
> > >> after calling the ->resume() callback"
> > >>
> > >> DPU and DSI are managed as runtime devices. When
> > >> suspend is triggered, PM core adds a refcount on all the
> > >> devices and calls device suspend, since usage count is
> > >> already incremented, runtime suspend was not getting called
> > >> and it kept the clocks on which resulted in target not
> > >> entering into XO shutdown.
> > >>
> > >> Add changes to force suspend on runtime devices during pm sleep.
> > >>
> > >> Changes in v1:
> > >> - Remove unnecessary checks in the function
> > >> _dpu_kms_disable_dpu (Rob Clark).
> > >>
> > >> Changes in v2:
> > >> - Avoid using suspend_late to reset the usagecount
> > >> as suspend_late might not be called during suspend
> > >> call failures (Doug).
> > >>
> > >> Changes in v3:
> > >> - Use force suspend instead of managing device usage_count
> > >> via runtime put and get API's to trigger callbacks (Doug).
> > >>
> > >> Changes in v4:
> > >> - Check the return values of pm_runtime_force_suspend and
> > >> pm_runtime_force_resume API's and pass appropriately (Doug).
> > >>
> > >> Changes in v5:
> > >
> > > Can you please put the version number properly in your subject? It's
> > > really hard to tell one version of your patch from another.
> > >
> > >
> > >> - With v4 patch, test cycle has uncovered issues in device resume.
> > >>
> > >> On bubs: cmd tx failures were seen as SW is sending panel off
> > >> commands when the dsi resources are turned off.
> > >>
> > >> Upon suspend, DRM driver will issue a NULL composition to the
> > >> dpu, followed by turning off all the HW blocks.
> > >>
> > >> v5 changes will serialize the NULL commit and resource unwinding
> > >> by handling them under PM prepare and PM complete phases there by
> > >> ensuring that clks are on when panel off commands are being
> > >> processed.
> > >
> > > I'm still most definitely not an expert in how all the DRM pieces all
> > > hook up together, but the solution you have in this patch seems wrong
> > > to me. As far as I can tell the "prepare" state isn't supposed to be
> > > actually doing the suspend work and here that's exactly what you're
> > > doing. I think you should find a different solution to ensure
> > > ordering is correct.
> > >
> > > -Doug
> > >
> >
> > Hi,
>
> Quite honestly I'm probably not the right person to be reviewing this
> code. I mostly just noticed one of your early patches and it looked
> strange to me. Hopefully someone with actual experience in how all
> the DRM components work together can actually review and see if this
> makes sense. Maybe Sean would know better?
>
> That being said, let me at least look at what you're saying...
>
>
> > Prepare and Complete are callbacks defined as part of Sleep and Resume
> > sequence
> >
> > Entering PM SUSPEND the phases are : prepare --> suspend -->
> > suspend_late --> suspend_noirq.
> > While leaving PM SUSPEND the phases are: resume_noirq --> resume_early
> > --> resume --> complete.
>
> Sure, it's part of the sequence. It's also documented in pm.h as:
>
> * The principal role of this callback is to prevent new children of
> * the device from being registered after it has returned (the driver's
> * subsystem and generally the rest of the kernel is supposed to prevent
> * new calls to the probe method from being made too once @prepare() has
> * succeeded).
>
> It does not feel like that matches your usage of this call.
>
>
> > The reason to push drm suspend handling to PM prepare phase is that
> > parent here will trigger a modeset to turn off the timing and
> > subsequently the panel.
> > the child devices should not turn of their clocks before parent unwinds
> > the composition. Hence they are serialized as per the sequence mentioned
> > above.
>
> So the general model in Linux is that children suspend before their
> parents, right? So you're saying that, in this case, the parent needs
> to act on the child before the child suspends. Is that correct?
>
> Rather than hijacking the prepare/complete, I'd be at least slightly
> inclined to move the other driver to turn off its clocks in
> suspend_late and to turn them back on in resume_early? That seems to
> be what was done in "analogix_dp-rockchip.c" to solve a similar
> problem.
>
>
> > A similar approach is taken by other driver that use drm framework. In
> > this driver, the device registers for prepare and complete callbacks to
> > handle drm_suspend and drm_resume.
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/exynos/exynos_drm_drv.c#L163
>
> OK, if there is another driver in DRM then I guess I won't object too
> strongly. Note that when searching for other drivers I noticed this
> bit in todo.rst:
>
> * Most drivers (except i915 and nouveau) that use
> * drm_atomic_helper_suspend/resume() can probably be converted to use
> * drm_mode_config_helper_suspend/resume(). Also there's still open-coded version
> * of the atomic suspend/resume code in older atomic modeset drivers.
>
> Does anything get fixed if you do that? It seems like it'd cleanup
> your code a bit so maybe worth doing anyway...
>
> ---
>
> I guess the last question I'd want resolved is why you have this asymmetry:
>
> SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, msm_pm_resume)
>
> Why couldn't you use pm_runtime_force_resume()?

I'm curious if you had answers to any of the questions I posed in my review.

-Doug