Re: [PATCH v14 RESEND 5/6] drm/imx: Introduce i.MX8qm/qxp DPU DRM
From: Maxime Ripard
Date: Tue Oct 03 2023 - 07:53:00 EST
On Tue, Sep 26, 2023 at 03:55:35AM +0000, Ying Liu wrote:
> > > > > + cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> > > > > + if (!cf->pec_base)
> > > > > + return -ENOMEM;
> > > > > +
> > > > > + cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> > > > > + if (!cf->base)
> > > > > + return -ENOMEM;
> > > >
> > > > For the same reason, you need to protect any access to a device managed
> > > > resource (so clocks, registers, regulators, etc.) by a call to
> > > > drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug
> > instead
> > > > of drm_dev_unregister.
> > >
> > > That's a good point. I've tried to do that, but it turns out that the
> > > display controller cannot be enabled again after binding the dpu-core
> > > driver manually again. It seems that the display controller requires a
> > > proper disablement procedure, but the "driver instance overview " kdoc
> > > mentions the shortcoming of no proper disablement if drm_dev_unplug()
> > > is used:
> > >
> > > """
> > > * Drivers that want to support device unplugging (USB, DT overlay unload)
> > should
> > > * use drm_dev_unplug() instead of drm_dev_unregister(). The driver must
> > protect
> > > * regions that is accessing device resources to prevent use after they're
> > > * released. This is done using drm_dev_enter() and drm_dev_exit(). There
> > is one
> > > * shortcoming however, drm_dev_unplug() marks the drm_device as
> > unplugged before
> > > * drm_atomic_helper_shutdown() is called. This means that if the disable
> > code
> > > * paths are protected, they will not run on regular driver module unload,
> > > * possibly leaving the hardware enabled.
> > > """
> > >
> > > A DPU reset in dpu_core() might be helpful, but I'm not sure if there is any
> > > reset line provided by the embodying system.
> >
> > Generally speaking, you shouldn't rely on the device being in any
> > particuliar state before your driver loads. So a reset at probe/bind
> > time is a good idea.
>
> Yes. I'll drop the platform device creations for CRTCs from dpu-core.c
> and drop the aggregation of CRTC components from different DPU
> instances into one DRM device. This way, there will be only two CRTCs
> of one DPU in one DRM device.
Ok.
> Then, the driver will be simpler and users cannot unbind the driver of
> one of the two DPU instances,
Uh? They would still be able to do that.
> which means drm_dev_unplug() won't be needed any more(?)
So this would still be needed
> and the reset issue will be gone. The controller will be shutdown
> properly through drm_atomic_helper_shutdown() when the driver module
> is removed.
Again, you shouldn't rely on a particular state at boot. For all you
know, you could have been reset by some watchdog or been kexec'd.
> > > Even if the reset works, the 2nd DPU instance in i.MX8qm would be a
> > > problem, because it won't be reset or properly disabled if the 1st DPU
> > > instance is unbound.
> >
> > Why it wouldn't be reset?
>
> Because dpu_core_remove() is not called for the 2nd DPU instance.
> Anyway, with the above new design, this doesn't seem to be a problem
> any more.
Ok.
> >
> > > Although the two DPU instances could be wrapped by two DRM devices, I
> > > tend not to do that because downstream bridges in future SoCs might be
> > > able to mux to different DPU instances at runtime.
> > >
> > > Due to the disablement issue, can we set drm_dev_enter/exit/unplug
> > > aside first?
> >
> > I'd rather have that figured out prior to merging.
>
> I'm assuming that drm_dev_enter/exit/unplug won't be needed with the above
> new design - one DPU instance wrapped by one DRM device.
I'm not sure why you are making that claim. And again, that's good
practice: it does no harm while preventing unsafe behaviour in the
future.
> > > > > +static void dpu_atomic_put_plane_state(struct drm_atomic_state
> > *state,
> > > > > + struct drm_plane *plane)
> > > > > +{
> > > > > + int index = drm_plane_index(plane);
> > > > > +
> > > > > + plane->funcs->atomic_destroy_state(plane, state-
> > >planes[index].state);
> > > > > + state->planes[index].ptr = NULL;
> > > > > + state->planes[index].state = NULL;
> > > > > + state->planes[index].old_state = NULL;
> > > > > + state->planes[index].new_state = NULL;
> > > > > +
> > > > > + drm_modeset_unlock(&plane->mutex);
> > > > > +
> > > > > + dpu_plane_dbg(plane, "put state\n");
> > > > > +}
> > > > > +
> > > > > +static void dpu_atomic_put_crtc_state(struct drm_atomic_state *state,
> > > > > + struct drm_crtc *crtc)
> > > > > +{
> > > > > + int index = drm_crtc_index(crtc);
> > > > > +
> > > > > + crtc->funcs->atomic_destroy_state(crtc, state->crtcs[index].state);
> > > > > + state->crtcs[index].ptr = NULL;
> > > > > + state->crtcs[index].state = NULL;
> > > > > + state->crtcs[index].old_state = NULL;
> > > > > + state->crtcs[index].new_state = NULL;
> > > > > +
> > > > > + drm_modeset_unlock(&crtc->mutex);
> > > > > +
> > > > > + dpu_crtc_dbg(crtc, "put state\n");
> > > > > +}
> > > > > +
> > > > > +static void
> > > > > +dpu_atomic_put_possible_states_per_crtc(struct drm_crtc_state
> > > > *crtc_state)
> > > > > +{
> > > > > + struct drm_atomic_state *state = crtc_state->state;
> > > > > + struct drm_crtc *crtc = crtc_state->crtc;
> > > > > + struct drm_plane *plane;
> > > > > + struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > > + struct dpu_plane_state *old_dpstate, *new_dpstate;
> > > > > +
> > > > > + drm_atomic_crtc_state_for_each_plane(plane, crtc_state) {
> > > > > + old_plane_state = drm_atomic_get_old_plane_state(state,
> > > > plane);
> > > > > + new_plane_state = drm_atomic_get_new_plane_state(state,
> > > > plane);
> > > > > +
> > > > > + old_dpstate = to_dpu_plane_state(old_plane_state);
> > > > > + new_dpstate = to_dpu_plane_state(new_plane_state);
> > > > > +
> > > > > + /* Should be enough to check the below HW plane resources.
> > > > */
> > > > > + if (old_dpstate->stage.ptr != new_dpstate->stage.ptr ||
> > > > > + old_dpstate->source != new_dpstate->source ||
> > > > > + old_dpstate->blend != new_dpstate->blend)
> > > > > + return;
> > > > > + }
> > > > > +
> > > > > + drm_atomic_crtc_state_for_each_plane(plane, crtc_state)
> > > > > + dpu_atomic_put_plane_state(state, plane);
> > > > > +
> > > > > + dpu_atomic_put_crtc_state(state, crtc);
> > > > > +}
> > > >
> > > > That's super suspicious too. Are you really going around and dropping
> > > > and destroying plane and crtc states in a global state?
> > >
> > > Yes.
> >
> > That's really not a good idea. Adding states are fine, dropping ones
> > aren't.
>
> The idea is to add all active planes on two CRTCs into one commit and
> try to allocate fetchunits for those planes as a whole to best meet user's
> requirements, because ...
>
> >
> > > >
> > > > At the very least, you need to describe what this addresses and why you
> > > > think it's a good solution.
> > >
> > > This is the solution to assign HW resources of a plane group to the two
> > CRTCs
> > > in one DPU or one CRTC group _dynamically_ at runtime. Dpu.h has some
> > > comments which hint this:
> > >
> > > """
> > > /*
> > > * fetchunit/scaler/layerblend resources of a plane group are
> > > * shared by the two CRTCs in a CRTC group
> > > */
> > > """
> > >
> > > I can add a DPU display controller block diagram in dpu_kms.c to tell the
> > HW
> > > architecture and some SW architecture to clarify this more.
> >
> > It's not so much the diagram that I'm looking for, but an accurate
> > description of the problem. What resource is there, why and how does it
> > need to be shared, so we can understand what you are doing there, and
> > possibly suggest other things.
>
> ... the problem is that fetchunits(fetchdecode/fetchlayer/fetchwarp) have
> different capabilities/feature sets and they can be built upon either of the
> two CRTCs through layerblends. The 4 fetchunits for one DPU display
> controller are fetchdecode0, fetchdecode1, fetchlayer0 and fetchwarp2.
> Correspondingly, there are 4 layerblends(0 to 3). Layerblends blend two
> input sources(primary input and secondary input). The primary input can
> be, say, constframe or another layerblend's output. The secondary input
> can be, say, one of those fetchunits. For example, a real use case could
> be:
> - CRTC0:
> Primary plane:
> Layerblend0:
> Primary: constframe4
> Secondary: fetchlayer0
> Overlay plane:
> Layerblend1:
> Primary: Layerblend0 output
> Secondary: fetchdecode1 + vscaler5 + hscaler4
>
> - CRTC1:
> Primary plane:
> Layerblend2:
> Primary: constframe5
> Secondary: fetchwarp2 + fetcheco2
> Overlay plane:
> Layerblend3:
> Primary: Layerblend2 output
> Secondary: fetchdecode0 + fetcheco0 + vscaler4
>
> Some fetchunit features:
> - fetchdecode: Horizontoal/vertical downscaling through video
> processing blocks and YUV fb with two planars(use fetcheco).
> - fetchwarp: YUV fb with two planars(use fetcheco), up to
> 8 sub-layers and warp.
> - fetchlayer: up to 8 sub-layers.
>
> All fetchunits support RGB fb.
>
>
> What I do is:
> - Add all active planes(including primary and overlays) on two CRTCs
> into one commit even if the user only wants to update the plane(s)
> on one CRTC.
> - Those manually added planes/CRTCs are prone to put, because
> the relevant fetchunits and layerblends are likely/eventually
> unchanged after the allocation.
> - Traverse through fetchunit list to try to find an available one to
> meet a plane's requirements(say CSC? Scalers?). Those prone-to-put
> planes are handled first to use the current fetchunits if modeset
> is not allowed, otherwise planes with lower z-positions go first.
> - Do not allow fetchunit hot-migration between two CRTCs.
> - Assign layerblend with the lowest possible index to planes. Planes
> with lower z-positions go first.
> - Put the prone-to-put planes/CRTC if possible to gain some
> performance .
So, you shouldn't do it the way you did it so far, but if I got you
right, this seems similar to what we have on vc4 where all planes go
through another device (called HVS) that we maintain a global state for.
That way, any plane change will pull that global state in, and you are
getting a global view of what resources are being used in the system.
See vc4_pv_muxing_atomic_check() for an example.
Maxime
Attachment:
signature.asc
Description: PGP signature