Re: [PATCH 02/15] accel/qda: Add QDA driver documentation

From: Dmitry Baryshkov

Date: Wed May 20 2026 - 10:34:57 EST


On Tue, May 19, 2026 at 11:45:52AM +0530, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@xxxxxxxxxxxxxxxx>
>
> Add documentation for the Qualcomm DSP Accelerator (QDA) driver under
> Documentation/accel/qda/. The documentation covers the driver
> architecture, GEM-based buffer management, IOMMU context bank
> isolation, and the RPMsg transport layer.
>
> The user-space API section describes the DRM IOCTLs for session
> management, GEM buffer allocation, and remote procedure invocation via
> the FastRPC protocol, along with a typical application lifecycle
> example. Sections for dynamic debug and basic testing are also
> included.
>
> Wire the new documentation into the Compute Accelerators index at
> Documentation/accel/index.rst.
>
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@xxxxxxxxxxxxxxxx>
> ---
> Documentation/accel/index.rst | 1 +
> Documentation/accel/qda/index.rst | 13 ++++
> Documentation/accel/qda/qda.rst | 146 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 160 insertions(+)
>
> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
> index cbc7d4c3876a..5901ea7f784c 100644
> --- a/Documentation/accel/index.rst
> +++ b/Documentation/accel/index.rst
> @@ -10,4 +10,5 @@ Compute Accelerators
> introduction
> amdxdna/index
> qaic/index
> + qda/index
> rocket/index
> diff --git a/Documentation/accel/qda/index.rst b/Documentation/accel/qda/index.rst
> new file mode 100644
> index 000000000000..013400cf9c25
> --- /dev/null
> +++ b/Documentation/accel/qda/index.rst
> @@ -0,0 +1,13 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +==================================
> +accel/qda Qualcomm DSP Accelerator
> +==================================
> +
> +The QDA driver provides a DRM accel based interface for Qualcomm DSP offload.
> +It uses the FastRPC protocol and integrates with DRM and GEM infrastructure
> +for device and buffer management.
> +
> +.. toctree::
> +
> + qda
> diff --git a/Documentation/accel/qda/qda.rst b/Documentation/accel/qda/qda.rst
> new file mode 100644
> index 000000000000..9f49af6e6acc
> --- /dev/null
> +++ b/Documentation/accel/qda/qda.rst
> @@ -0,0 +1,146 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +=====================================
> +Qualcomm DSP Accelerator (QDA) Driver
> +=====================================
> +
> +Introduction
> +============
> +
> +The QDA driver is a DRM accel driver for Qualcomm's DSPs. It provides a
> +DRM accel based interface for Qualcomm DSP offload, supporting workloads
> +such as AI inference, computer vision, audio processing, and sensor offload
> +on Qualcomm SoCs. It uses the FastRPC protocol and integrates with DRM and
> +GEM infrastructure for device and buffer management.
> +
> +Key Features
> +============
> +
> +* **DRM accel Interface**: Exposes a standard character device node
> + (e.g., ``/dev/accel/accel0``) via the DRM accel subsystem.
> +* **FastRPC Protocol**: Implements the FastRPC protocol for communication
> + between the application processor and the DSP.
> +* **GEM Buffer Management**: Uses the DRM GEM interface for buffer
> + allocation, lifecycle management, and DMA-BUF import/export.
> +* **IOMMU Isolation**: Uses IOMMU context banks to enforce memory isolation
> + between different DSP user sessions.
> +* **Modular Design**: Clean separation between the core DRM logic, the
> + memory manager, and the RPMsg-based transport layer.
> +
> +Architecture
> +============
> +
> +The QDA driver consists of several functional blocks:
> +
> +1. **Core Driver (``qda_drv``)**: Manages device registration, file operations,
> + and DRM accel integration.
> +2. **Memory Manager (``qda_memory_manager``)**: A flexible memory management
> + layer that handles IOMMU context banks. It supports pluggable backends
> + (such as DMA-coherent) to adapt to different SoC memory architectures.
> +3. **GEM Subsystem**: Implements the DRM GEM interface for buffer management:
> +
> + * **``qda_gem``**: Core GEM object management, including allocation, mmap
> + operations, and buffer lifecycle management.
> + * **``qda_prime``**: PRIME import functionality for DMA-BUF interoperability
> + with other kernel subsystems.
> +
> +4. **Transport Layer (``qda_rpmsg``)**: Abstraction over the RPMsg framework
> + to handle low-level message passing with the DSP firmware.
> +5. **Compute Bus (``qda_compute_bus``)**: A custom virtual bus used to
> + enumerate and manage the specific compute context banks defined in the
> + device tree. The bus was introduced because IOMMU context banks (CBs) are
> + synthetic constructs — not real platform devices — making a platform driver
> + an incorrect abstraction for them. The earlier platform-driver approach also
> + had a race condition: device nodes were created before the RPMsg channel
> + resources were fully initialized, and because ``probe`` runs asynchronously,
> + applications could open a CB device and attempt to start a session before
> + the underlying transport was ready. The compute bus makes CB lifetime
> + explicitly subordinate to the parent QDA device, closing that window.
> +6. **FastRPC Core (``qda_fastrpc``)**: Implements the protocol logic for
> + marshalling arguments and handling remote invocations.
> +
> +User-Space API
> +==============
> +
> +The driver exposes a set of DRM-compliant IOCTLs:
> +
> +* ``DRM_IOCTL_QDA_QUERY``: Query DSP type (e.g., "cdsp", "adsp")
> + and capabilities.
> +* ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE``: Initialize a new process context
> + on the DSP.
> +* ``DRM_IOCTL_QDA_REMOTE_INVOKE``: Submit a remote method invocation (the
> + primary execution unit).
> +* ``DRM_IOCTL_QDA_GEM_CREATE``: Allocate a GEM buffer object for DSP usage.
> +* ``DRM_IOCTL_QDA_GEM_MMAP_OFFSET``: Retrieve mmap offsets for memory mapping.
> +* ``DRM_IOCTL_QDA_REMOTE_MAP`` / ``DRM_IOCTL_QDA_REMOTE_MUNMAP``: Map or unmap
> + buffers into the DSP's virtual address space. Each accepts a ``request``
> + field selecting between a legacy operation (``QDA_MAP_REQUEST_LEGACY`` /
> + ``QDA_MUNMAP_REQUEST_LEGACY``) and an attribute-based operation
> + (``QDA_MAP_REQUEST_ATTR`` / ``QDA_MUNMAP_REQUEST_ATTR``).

Please explain what happens if users don't map the buffers into the DSP
address space. Will DRM_IOCTL_QDA_REMOTE_INVOKE handle the mapping or
not? What is the difference between the legacy and the attribute-based
modes?

Would the driver benefit from using GPUVM?

> +
> +Usage Example
> +=============
> +
> +A typical lifecycle for a user-space application:
> +
> +1. **Discovery**: Open ``/dev/accel/accel*`` and use
> + ``DRM_IOCTL_QDA_QUERY`` to identify the DSP domain served by that
> + device node.
> +2. **Initialization**: Call ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE`` to
> + establish a session and create a process context on the DSP.
> +3. **Memory**: Allocate buffers via ``DRM_IOCTL_QDA_GEM_CREATE`` or import
> + DMA-BUFs (PRIME fd) from other drivers using ``DRM_IOCTL_PRIME_FD_TO_HANDLE``.
> +4. **Execution**: Use ``DRM_IOCTL_QDA_REMOTE_INVOKE`` to pass arguments and
> + execute functions on the DSP.
> +5. **Cleanup**: Close file descriptors to automatically release resources and
> + detach the session.

I'd have expected a description of an actual example, i.e. clone the
app from https://the.addr, prepare clang >= NN.MM, QAIC (https://foo),
run make, run the app, check the results. As a reminder, DRM Accel has
a very specific requirement that a working toolchain be available as
open source.

> +
> +Internal Implementation
> +=======================
> +
> +Memory Management
> +-----------------
> +The driver's memory manager creates virtual "IOMMU devices" that map to
> +hardware context banks. This allows the driver to manage multiple isolated
> +address spaces. The implementation uses a DMA-coherent backend to ensure data consistency
> +between the CPU and DSP without manual cache maintenance in most cases.

What about GEM usage? This section doesn't describe it.

> +
> +Debugging
> +=========
> +The driver includes extensive dynamic debug support. Enable it via the
> +kernel's dynamic debug control:
> +
> +.. code-block:: bash
> +
> + echo "file drivers/accel/qda/* +p" > /sys/kernel/debug/dynamic_debug/control
> +
> +Testing
> +=======
> +The QDA driver can be exercised using the ``fastrpc_test`` utility from the
> +FastRPC userspace library. Run the test application:

Please add a pointer to where the library and the test utility can be
found.

> +
> +.. code-block:: bash
> +
> + fastrpc_test -d 3 -U 1 -t linux -a v68
> +
> +**Options**
> +
> +``-d domain``
> + Select the DSP domain to run on:
> +
> + * ``0`` — ADSP
> + * ``1`` — MDSP
> + * ``2`` — SDSP
> + * ``3`` — CDSP *(default on targets with CDSP)*
> +
> +``-U unsigned_PD``
> + Select signed or unsigned protection domain:
> +
> + * ``0`` — signed PD
> + * ``1`` — unsigned PD *(default)*
> +
> +``-t target``
> + Target platform: ``android`` or ``linux`` *(default: linux)*
> +
> +``-a arch_version``
> + DSP architecture version, e.g. ``v68``, ``v75`` *(default: v68)*
>
> --
> 2.34.1
>
>

--
With best wishes
Dmitry