Re: [RFC PATCH 2/3] media: qcom: camss: Add CAMSS Offline Processing Engine driver
From: Laurent Pinchart
Date: Sun Apr 05 2026 - 16:11:54 EST
On Tue, Mar 24, 2026 at 11:00:21AM +0000, Bryan O'Donoghue wrote:
> On 23/03/2026 15:31, Loic Poulain wrote:
[snip]
> >>> +static void ope_prog_stripe(struct ope_ctx *ctx, struct ope_stripe *stripe)
> >>> +{
> >>> + struct ope_dev *ope = ctx->ope;
> >>> + int i;
> >>> +
> >>> + dev_dbg(ope->dev, "Context %p - Programming S%u\n", ctx, ope_stripe_index(ctx, stripe));
> >>> +
> >>> + /* Fetch Engine */
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_UNPACK_CFG_0, stripe->src.format);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_BUFFER_SIZE,
> >>> + (stripe->src.width << 16) + stripe->src.height);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_ADDR_IMAGE, stripe->src.addr);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_RD_STRIDE, stripe->src.stride);
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CCIF_META_DATA,
> >>> + FIELD_PREP(OPE_BUS_RD_CLIENT_0_CCIF_MD_PIX_PATTERN, stripe->src.pattern));
> >>> + ope_write_rd(ope, OPE_BUS_RD_CLIENT_0_CORE_CFG, OPE_BUS_RD_CLIENT_0_CORE_CFG_EN);
> >>> +
> >>> + /* Write Engines */
> >>> + for (i = 0; i < OPE_WR_CLIENT_MAX; i++) {
> >>> + if (!stripe->dst[i].enabled) {
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i), 0);
> >>> + continue;
> >>> + }
> >>> +
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_ADDR_IMAGE(i), stripe->dst[i].addr);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_0(i),
> >>> + (stripe->dst[i].height << 16) + stripe->dst[i].width);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_1(i), stripe->dst[i].x_init);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_IMAGE_CFG_2(i), stripe->dst[i].stride);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_PACKER_CFG(i), stripe->dst[i].format);
> >>> + ope_write_wr(ope, OPE_BUS_WR_CLIENT_CFG(i),
> >>> + OPE_BUS_WR_CLIENT_CFG_EN + OPE_BUS_WR_CLIENT_CFG_AUTORECOVER);
> >>> + }
> >>> +
> >>> + /* Downscalers */
> >>> + for (i = 0; i < OPE_DS_MAX; i++) {
> >>> + struct ope_dsc_config *dsc = &stripe->dsc[i];
> >>> + u32 base = ope_ds_base[i];
> >>> + u32 cfg = 0;
> >>> +
> >>> + if (dsc->input_width != dsc->output_width) {
> >>> + dsc->phase_step_h |= DS_RESOLUTION(dsc->input_width,
> >>> + dsc->output_width) << 30;
> >>> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_H_SCALE_EN;
> >>> + }
> >>> +
> >>> + if (dsc->input_height != dsc->output_height) {
> >>> + dsc->phase_step_v |= DS_RESOLUTION(dsc->input_height,
> >>> + dsc->output_height) << 30;
> >>> + cfg |= OPE_PP_CLC_DOWNSCALE_MN_DS_CFG_V_SCALE_EN;
> >>> + }
> >>> +
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_CFG(base), cfg);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_IMAGE_SIZE_CFG(base),
> >>> + ((dsc->input_width - 1) << 16) + dsc->input_height - 1);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_H_CFG(base), dsc->phase_step_h);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_DS_MN_V_CFG(base), dsc->phase_step_v);
> >>> + ope_write_pp(ope, OPE_PP_CLC_DOWNSCALE_MN_CFG(base),
> >>> + cfg ? OPE_PP_CLC_DOWNSCALE_MN_CFG_EN : 0);
> >>> + }
> >>> +}
> >>
> >> So - this is where the CDM should be used - so that you don't have to do
> >> all of these MMIO writes inside of your ISR.
> >
> > Indeed, and that also the reason stripes are computed ahead of time,
> > so that they can be further 'queued' in a CDM.
> >
> >> Is that and additional step after the RFC ?
> > The current implementation (without CDM) already provides good results
> > and performance, so CDM can be viewed as a future enhancement.
>
> That's true but then the number of MMIO writes per ISR is pretty small
> right now. You have about 50 writes here.
>
> > As far as I understand, CDM could also be implemented in a generic way
> > within CAMSS, since other CAMSS blocks make use of CDM as well.
> > This is something we should discuss further.
>
> My concern is even conservatively if each module adds another 10 ?
> writes by the time we get to denoising, sharpening, lens shade
> correction, those writes could easily look more like 100.
>
> What user-space should submit is well documented data-structures which
> then get translated into CDM buffers by the OPE and IFE for the various
> bits of the pipeline.
The mali-c55 driver does this, it translates the ISP parameters buffers
to a list of register values in userspace context, when the buffer is
queued. In the IRQ handler, it then either copies those values to
registers with MMIO writes, or use a DMA engine, depending on the
platform. The rppx1 driver does something similar, with a different
format for the buffer containing the register values.
I think this architecture could be replicated here. This translation in
userspace context ensures that work at IRQ time is limited. The driver
can use whatever DMA engine is available depending on the platform, and
we can also force usage of MMIO for debugging or development purpose.
That way, development of ISP features is decoupled from development of
CDM support, enabling parallel development if desired, and faster
plaform enablement that allows starting the userspace side of the work
quicker.
--
Regards,
Laurent Pinchart