Re: [Freedreno] drm/msm: 'pp done time out' errors after async commit changes

From: Rob Clark
Date: Tue Nov 05 2019 - 11:23:46 EST


On Tue, Nov 5, 2019 at 2:08 AM Brian Masney <masneyb@xxxxxxxxxxxxx> wrote:
>
> On Mon, Nov 04, 2019 at 04:19:07PM -0800, Rob Clark wrote:
> > On Mon, Nov 4, 2019 at 4:01 PM Brian Masney <masneyb@xxxxxxxxxxxxx> wrote:
> > >
> > > Hey Rob,
> > >
> > > Since commit 2d99ced787e3 ("drm/msm: async commit support"), the frame
> > > buffer console on my Nexus 5 began throwing these errors:
> > >
> > > msm fd900000.mdss: pp done time out, lm=0
> > >
> > > The display still works.
> > >
> > > I see that mdp5_flush_commit() was introduced in commit 9f6b65642bd2
> > > ("drm/msm: add kms->flush_commit()") with a TODO comment and the commit
> > > description mentions flushing registers. I assume that this is the
> > > proper fix. If so, can you point me to where these registers are
> > > defined and I can work on the mdp5 implementation.
> >
> > See mdp5_ctl_commit(), which writes the CTL_FLUSH registers.. the idea
> > would be to defer writing CTL_FLUSH[ctl_id] = flush_mask until
> > kms->flush() (which happens from a timer shortly before vblank).
> >
> > But I think the async flush case should not come up with fbcon? It
> > was really added to cope with hwcursor updates (and userspace that
> > assumes it can do an unlimited # of cursor updates per frame).. the
> > intention was that nothing should change in the sequence for mdp5 (but
> > I guess that was not the case).
>
> The 'pp done time out' errors go away if I revert the following three
> commits:
>
> cd6d923167b1 ("drm/msm/dpu: async commit support")
> d934a712c5e6 ("drm/msm: add atomic traces")
> 2d99ced787e3 ("drm/msm: async commit support")
>
> I reverted the first one to fix a compiler error, and the second one so
> that the last patch can be reverted without any merge conflicts.
>
> I see that crtc_flush() calls mdp5_ctl_commit(). I tried to use
> crtc_flush_all() in mdp5_flush_commit() and the contents of the frame
> buffer dance around the screen like it's out of sync. I renamed
> crtc_flush_all() to mdp5_crtc_flush_all() and removed the static
> declaration. Here's the relevant part of what I tried:
>
> --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
> +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
> @@ -171,7 +171,15 @@ static void mdp5_prepare_commit(struct msm_kms *kms, struct drm_atomic_state *st
>
>  static void mdp5_flush_commit(struct msm_kms *kms, unsigned crtc_mask)
>  {
> -	/* TODO */
> +	struct mdp5_kms *mdp5_kms = to_mdp5_kms(to_mdp_kms(kms));
> +	struct drm_crtc *crtc;
> +
> +	for_each_crtc_mask(mdp5_kms->dev, crtc, crtc_mask) {
> +		if (!crtc->state->active)
> +			continue;
> +
> +		mdp5_crtc_flush_all(crtc);
> +	}
>  }
>
> Any tips would be appreciated.


I think this is along the lines of what we need to enable async commit
for mdp5 (but also removing the flush from the atomic-commit path)..
the principle behind the async commit is to do all the atomic state
commit normally, but defer writing the flush bits. This way, if you
get another async update before the next vblank, you just apply it
immediately instead of waiting for vblank.
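
To illustrate the deferral, here is a toy userspace model, not driver
code. Every name in it (fake_ctl, fake_commit, fake_flush) is made up
for the example, and flush_reg only stands in for a CTL_FLUSH register.
Commits just accumulate flush bits; the single register write happens
at flush time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CTL block; NOT the real mdp5 structures. */
struct fake_ctl {
	uint32_t pending_mask;	/* flush bits accumulated since the last flush */
	uint32_t flush_reg;	/* models the CTL_FLUSH register */
	unsigned flush_writes;	/* number of times flush_reg was written */
};

/* Commit: state would be programmed as usual, but flush bits are only recorded. */
void fake_commit(struct fake_ctl *ctl, uint32_t flush_mask)
{
	assert(ctl != NULL);
	ctl->pending_mask |= flush_mask;
}

/* Flush: meant to run once, shortly before vblank; writes the accumulated bits. */
void fake_flush(struct fake_ctl *ctl)
{
	assert(ctl != NULL);
	if (!ctl->pending_mask)
		return;		/* nothing was committed since the last flush */

	ctl->flush_reg = ctl->pending_mask;
	ctl->flush_writes++;
	ctl->pending_mask = 0;
}
```

With this model, several fake_commit() calls in one frame (say, repeated
cursor updates) followed by one fake_flush() result in a single write
to flush_reg, and a second fake_flush() with nothing pending is a no-op.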

But I guess you are on a command mode panel, if I remember? Which is
a case I didn't have a way to test. And I'm not entirely sure how
kms_funcs->vsync_time() should be implemented for cmd mode panels.

That all said, I think we should first fix what is broken, before
worrying about extending async commit support to mdp5. As it stands,
mdp5 shouldn't hit the async==true path at all, since it doesn't
implement kms_funcs->vsync_time().

What I think is going on is that, in the cmd mode case,
mdp5_wait_flush() (indirectly) calls mdp5_crtc_wait_for_pp_done(),
which waits for a pp-done irq regardless of whether there is a flush
in progress. Since there is no flush pending, the irq never comes.
But the expectation is that kms_funcs->wait_flush() returns
immediately if there is nothing to wait for.
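
A sketch of the behaviour I'd expect, again as a toy userspace model
with made-up names (fake_crtc, fake_wait_flush, the irq_fires knob). The
real code would wait on the pp-done irq with a timeout; this only models
the decision of whether to wait at all:

```c
#include <stdbool.h>

/* Hypothetical per-crtc state; NOT the real mdp5_crtc. */
struct fake_crtc {
	bool flush_pending;	/* set when flush bits were written, cleared on pp-done */
	bool irq_fires;		/* test knob: does the pp-done irq arrive in time? */
};

enum wait_result {
	WAIT_NOTHING_TO_DO,	/* no flush in progress, returned immediately */
	WAIT_DONE,		/* pp-done irq arrived */
	WAIT_TIMED_OUT,		/* would log "pp done time out" */
};

/* Only wait for pp-done if a flush is actually in flight. */
enum wait_result fake_wait_flush(struct fake_crtc *crtc)
{
	if (!crtc->flush_pending)
		return WAIT_NOTHING_TO_DO;

	if (crtc->irq_fires) {
		crtc->flush_pending = false;
		return WAIT_DONE;
	}

	return WAIT_TIMED_OUT;
}
```

Without the flush_pending check, the no-flush case would fall through
to the wait and end up in WAIT_TIMED_OUT, which matches the symptom
reported here.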

BR,
-R