Re: [PATCH RFC 00/18] accel/qda: Introduce Qualcomm DSP Accelerator driver

From: Ekansh Gupta

Date: Mon Mar 02 2026 - 03:55:09 EST




On 2/24/2026 3:33 AM, Bjorn Andersson wrote:
> On Tue, Feb 24, 2026 at 12:38:54AM +0530, Ekansh Gupta wrote:
>> This patch series introduces the Qualcomm DSP Accelerator (QDA) driver,
>> a modern DRM-based accelerator implementation for Qualcomm Hexagon DSPs.
>> The driver provides a standardized interface for offloading computational
>> tasks to DSPs found on Qualcomm SoCs, supporting all DSP domains (ADSP,
>> CDSP, SDSP, GDSP).
>>
>> The QDA driver is designed as an alternative for the FastRPC driver
>> in drivers/misc/, offering improved resource management, better integration
>> with standard kernel subsystems, and alignment with the Linux kernel's
>> Compute Accelerators framework.
>>
> If I understand correctly, this is just the same FastRPC protocol but
> in the accel framework, and hence with a new userspace ABI?
>
> I don't fancy the name "QDA" as an acronym for "FastRPC Accel".
>
> I would much prefer to see this living in drivers/accel/fastrpc and be
> named some variation of "fastrpc" (e.g. fastrpc_accel). (Driver name can
> be "fastrpc" as the other one apparently is named "qcom,fastrpc").
I plan to stick with the QDA name, since future plans may have the driver use a
mechanism other than FastRPC (signalling).
>
>> User-space staging branch
>> ============
>> https://github.com/qualcomm/fastrpc/tree/accel/staging
>>
>> Key Features
>> ============
>>
>> * Standard DRM accelerator interface via /dev/accel/accelN
>> * GEM-based buffer management with DMA-BUF import/export support
>> * IOMMU-based memory isolation using per-process context banks
>> * FastRPC protocol implementation for DSP communication
>> * RPMsg transport layer for reliable message passing
>> * Support for all DSP domains (ADSP, CDSP, SDSP, GDSP)
>> * Comprehensive IOCTL interface for DSP operations
>>
>> High-Level Architecture Differences with Existing FastRPC Driver
>> =================================================================
>>
>> The QDA driver represents a significant architectural departure from the
>> existing FastRPC driver (drivers/misc/fastrpc.c), addressing several key
>> limitations while maintaining protocol compatibility:
>>
>> 1. DRM Accelerator Framework Integration
>> - FastRPC: Custom character device (/dev/fastrpc-*)
>> - QDA: Standard DRM accel device (/dev/accel/accelN)
>> - Benefit: Leverages established DRM infrastructure for device
>> management.
>>
>> 2. Memory Management
>> - FastRPC: Custom memory allocator with ION/DMA-BUF integration
>> - QDA: Native GEM objects with full PRIME support
>> - Benefit: Seamless buffer sharing using standard DRM mechanisms
>>
>> 3. IOMMU Context Bank Management
>> - FastRPC: Direct IOMMU domain manipulation, limited isolation
>> - QDA: Custom compute bus (qda_cb_bus_type) with proper device model
>> - Benefit: Each CB device is a proper struct device with IOMMU group
>> support, enabling better isolation and resource tracking.
>> - https://lore.kernel.org/all/245d602f-3037-4ae3-9af9-d98f37258aae@xxxxxxxxxxxxxxxx/
>>
>> 4. Memory Manager Architecture
>> - FastRPC: Monolithic allocator
>> - QDA: Pluggable memory manager with backend abstraction
>> - Benefit: Currently uses DMA-coherent backend, easily extensible for
>> future memory types (e.g., carveout, CMA)
>>
>> 5. Transport Layer
>> - FastRPC: Direct RPMsg integration in core driver
>> - QDA: Abstracted transport layer (qda_rpmsg.c)
>> - Benefit: Clean separation of concerns, easier to add alternative
>> transports if needed
>>
>> 8. Code Organization
>> - FastRPC: ~3000 lines in single file
>> - QDA: Modular design across multiple files (~4600 lines total)
> "Now 50% more LOC and you need 6 tabs open in your IDE!"
>
> Might be better, but in itself it provides no immediate value.
I listed this because I think splitting logically separate parts into their own files
might improve readability and maintainability. If that doesn't make sense, I can
drop this point.

https://lore.kernel.org/all/c007308b-4641-44a5-9e64-fb085cced2b0@xxxxxxxxxx/
>
>> * qda_drv.c: Core driver and DRM integration
>> * qda_gem.c: GEM object management
>> * qda_memory_manager.c: Memory and IOMMU management
>> * qda_fastrpc.c: FastRPC protocol implementation
>> * qda_rpmsg.c: Transport layer
>> * qda_cb.c: Context bank device management
>> - Benefit: Better maintainability, clearer separation of concerns
>>
>> 9. UAPI Design
>> - FastRPC: Custom IOCTL interface
>> - QDA: DRM-style IOCTLs with proper versioning support
>> - Benefit: Follows DRM conventions, easier userspace integration
>>
>> 10. Documentation
>> - FastRPC: Minimal in-tree documentation
>> - QDA: Comprehensive documentation in Documentation/accel/qda/
>> - Benefit: Better developer experience, clearer API contracts
>>
>> 11. Buffer Reference Mechanism
>> - FastRPC: Uses buffer file descriptors (FDs) for all book-keeping
>> in both kernel and DSP
>> - QDA: Uses GEM handles for kernel-side management, providing better
>> integration with DRM subsystem
>> - Benefit: Leverages DRM GEM infrastructure for reference counting,
>> lifetime management, and integration with other DRM components
>>
> This is all good, but what is the plan regarding /dev/fastrpc-*?
>
> The idea here clearly is to provide an alternative implementation, and
> they seem to bind to the same toplevel compatible - so you can only
> compile one into your kernel at any point in time.
>
> So if I understand correctly, at some point in time we need to say
> CONFIG_DRM_ACCEL_QDA=m and CONFIG_QCOM_FASTRPC=n, which will break all
> existing user space applications? That's not acceptable.
>
>
> Would it be possible to have a final driver that is implemented as an
> accel, but provides wrappers for the legacy misc and ioctl interface to
> the applications?
As per the discussion on the other thread, I believe a compat driver is the way to
go here. When I send the actual driver changes, I can include the compat driver in
the series as well.

I'm assuming the compat driver will live in the same QDA directory and will translate
misc/fastrpc calls into accel/qda calls when QDA is enabled.
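
To make that assumption a bit more concrete, here is a rough userspace model of
what the translation step might look like. None of this is from the series:
struct legacy_invoke, struct accel_invoke, struct compat_client and the helper
names are hypothetical, and the fd-to-handle table only stands in for what would
presumably be a PRIME import on the kernel side.

```c
/*
 * Userspace model of a compat translation step. All struct and function
 * names are hypothetical; this is not the real FastRPC or QDA UAPI.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define COMPAT_MAX_BUFS 8

/* Hypothetical legacy-style request: buffers are referenced by dma-buf fd. */
struct legacy_invoke {
	uint32_t method;
	uint32_t nbufs;
	int32_t fds[COMPAT_MAX_BUFS];
};

/* Hypothetical accel-style request: buffers are referenced by GEM handle. */
struct accel_invoke {
	uint32_t method;
	uint32_t nbufs;
	uint32_t handles[COMPAT_MAX_BUFS];
};

/* Per-client cache of fd -> handle pairs (stand-in for a PRIME import). */
struct compat_client {
	int32_t fds[COMPAT_MAX_BUFS];
	uint32_t handles[COMPAT_MAX_BUFS];
	size_t count;
	uint32_t next_handle;
};

static void compat_client_init(struct compat_client *c)
{
	c->count = 0;
	c->next_handle = 1; /* handle 0 is treated as invalid in this model */
}

/* Return the handle for fd, "importing" it on first use. 0 means failure. */
static uint32_t compat_fd_to_handle(struct compat_client *c, int32_t fd)
{
	assert(c->count <= COMPAT_MAX_BUFS);

	if (fd < 0)
		return 0;

	/* Reuse the handle if this fd was already seen for this client. */
	for (size_t i = 0; i < c->count; i++)
		if (c->fds[i] == fd)
			return c->handles[i];

	if (c->count == COMPAT_MAX_BUFS)
		return 0;

	c->fds[c->count] = fd;
	c->handles[c->count] = c->next_handle++;
	return c->handles[c->count++];
}

/* Translate a legacy request into an accel request; 0 or negative errno. */
static int compat_translate(struct compat_client *c,
			    const struct legacy_invoke *in,
			    struct accel_invoke *out)
{
	if (in->nbufs > COMPAT_MAX_BUFS)
		return -EINVAL;

	out->method = in->method;
	out->nbufs = in->nbufs;

	for (uint32_t i = 0; i < in->nbufs; i++) {
		out->handles[i] = compat_fd_to_handle(c, in->fds[i]);
		if (!out->handles[i])
			return -EBADF;
	}

	return 0;
}
```

The point of the per-client table is that a legacy application passing the same
fd twice would end up with the same handle, so the accel side keeps a single
reference per buffer. How lifetime and cleanup would work in a real compat
driver is still open.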
>
> Regards,
> Bjorn
>
>> Key Technical Improvements
>> ===========================
>>
>> * Proper device model: CB devices are real struct device instances on a
>> custom bus, enabling proper IOMMU group management and power management
>> integration
>>
>> * Reference-counted IOMMU devices: Multiple file descriptors from the same
>> process share a single IOMMU device, reducing overhead
>>
>> * GEM-based buffer lifecycle: Automatic cleanup via DRM GEM reference
>> counting, eliminating many resource leak scenarios
>>
>> * Modular memory backends: The memory manager supports pluggable backends,
>> currently implementing DMA-coherent allocations with SID-prefixed
>> addresses for DSP firmware
>>
>> * Context-based invocation tracking: XArray-based context management with
>> proper synchronization and cleanup
>>
>> Patch Series Organization
>> ==========================
>>
>> Patches 1-2: Driver skeleton and documentation
>> Patches 3-6: RPMsg transport and IOMMU/CB infrastructure
>> Patches 7-9: DRM device registration and basic IOCTL
>> Patches 10-12: GEM buffer management and PRIME support
>> Patches 13-17: FastRPC protocol implementation (attach, invoke, create,
>> map/unmap)
>> Patch 18: MAINTAINERS entry
>>
>> Open Items
>> ===========
>>
>> The following items are identified as open items:
>>
>> 1. Privilege Level Management
>> - Currently, daemon processes and user processes have the same access
>> level as both use the same accel device node. This needs to be
>> addressed as daemons attach to privileged DSP PDs and require
>> higher privilege levels for system-level operations
>> - Seeking guidance on the best approach: separate device nodes,
>> capability-based checks, or DRM master/authentication mechanisms
>>
>> 2. UAPI Compatibility Layer
>> - Add UAPI compat layer to facilitate migration of client applications
>> from existing FastRPC UAPI to the new QDA accel driver UAPI,
>> ensuring smooth transition for existing userspace code
>> - Seeking guidance on implementation approach: in-kernel translation
>> layer, userspace wrapper library, or hybrid solution
>>
>> 3. Documentation Improvements
>> - Add detailed IOCTL usage examples
>> - Document DSP firmware interface requirements
>> - Create migration guide from existing FastRPC
>>
>> 4. Per-Domain Memory Allocation
>> - Develop new userspace API to support memory allocation on a per
>> domain basis, enabling domain-specific memory management and
>> optimization
>>
>> 5. Audio and Sensors PD Support
>> - The current patch series does not handle Audio PD and Sensors PD
>> functionalities. These specialized protection domains require
>> additional support for real-time constraints and power management
>>
>> Interface Compatibility
>> ========================
>>
>> The QDA driver maintains compatibility with existing FastRPC infrastructure:
>>
>> * Device Tree Bindings: The driver uses the same device tree bindings as
>> the existing FastRPC driver, ensuring no changes are required to device
>> tree sources. The "qcom,fastrpc" compatible string and child node
>> structure remain unchanged.
>>
>> * Userspace Interface: While the driver provides a new DRM-based UAPI,
>> the underlying FastRPC protocol and DSP firmware interface remain
>> compatible. This ensures that DSP firmware and libraries continue to
>> work without modification.
>>
>> * Migration Path: The modular design allows for gradual migration, where
>> both drivers can coexist during the transition period. Applications can
>> be migrated incrementally to the new UAPI with the help of the planned
>> compatibility layer.
>>
>> References
>> ==========
>>
>> Previous discussions on this migration:
>> - https://lkml.org/lkml/2024/6/24/479
>> - https://lkml.org/lkml/2024/6/21/1252
>>
>> Testing
>> =======
>>
>> The driver has been tested on Qualcomm platforms with:
>> - Basic FastRPC attach/release operations
>> - DSP process creation and initialization
>> - Memory mapping/unmapping operations
>> - Dynamic invocation with various buffer types
>> - GEM buffer allocation and mmap
>> - PRIME buffer import from other subsystems
>>
>> Signed-off-by: Ekansh Gupta <ekansh.gupta@xxxxxxxxxxxxxxxx>
>> ---
>> Ekansh Gupta (18):
>> accel/qda: Add Qualcomm QDA DSP accelerator driver docs
>> accel/qda: Add Qualcomm DSP accelerator driver skeleton
>> accel/qda: Add RPMsg transport for Qualcomm DSP accelerator
>> accel/qda: Add built-in compute CB bus for QDA and integrate with IOMMU
>> accel/qda: Create compute CB devices on QDA compute bus
>> accel/qda: Add memory manager for CB devices
>> accel/qda: Add DRM accel device registration for QDA driver
>> accel/qda: Add per-file DRM context and open/close handling
>> accel/qda: Add QUERY IOCTL and basic QDA UAPI header
>> accel/qda: Add DMA-backed GEM objects and memory manager integration
>> accel/qda: Add GEM_CREATE and GEM_MMAP_OFFSET IOCTLs
>> accel/qda: Add PRIME dma-buf import support
>> accel/qda: Add initial FastRPC attach and release support
>> accel/qda: Add FastRPC dynamic invocation support
>> accel/qda: Add FastRPC DSP process creation support
>> accel/qda: Add FastRPC-based DSP memory mapping support
>> accel/qda: Add FastRPC-based DSP memory unmapping support
>> MAINTAINERS: Add MAINTAINERS entry for QDA driver
>>
>> Documentation/accel/index.rst | 1 +
>> Documentation/accel/qda/index.rst | 14 +
>> Documentation/accel/qda/qda.rst | 129 ++++
>> MAINTAINERS | 9 +
>> arch/arm64/configs/defconfig | 2 +
>> drivers/accel/Kconfig | 1 +
>> drivers/accel/Makefile | 2 +
>> drivers/accel/qda/Kconfig | 35 ++
>> drivers/accel/qda/Makefile | 19 +
>> drivers/accel/qda/qda_cb.c | 182 ++++++
>> drivers/accel/qda/qda_cb.h | 26 +
>> drivers/accel/qda/qda_compute_bus.c | 23 +
>> drivers/accel/qda/qda_drv.c | 375 ++++++++++++
>> drivers/accel/qda/qda_drv.h | 171 ++++++
>> drivers/accel/qda/qda_fastrpc.c | 1002 ++++++++++++++++++++++++++++++++
>> drivers/accel/qda/qda_fastrpc.h | 433 ++++++++++++++
>> drivers/accel/qda/qda_gem.c | 211 +++++++
>> drivers/accel/qda/qda_gem.h | 103 ++++
>> drivers/accel/qda/qda_ioctl.c | 271 +++++++++
>> drivers/accel/qda/qda_ioctl.h | 118 ++++
>> drivers/accel/qda/qda_memory_dma.c | 91 +++
>> drivers/accel/qda/qda_memory_dma.h | 46 ++
>> drivers/accel/qda/qda_memory_manager.c | 382 ++++++++++++
>> drivers/accel/qda/qda_memory_manager.h | 148 +++++
>> drivers/accel/qda/qda_prime.c | 194 +++++++
>> drivers/accel/qda/qda_prime.h | 43 ++
>> drivers/accel/qda/qda_rpmsg.c | 327 +++++++++++
>> drivers/accel/qda/qda_rpmsg.h | 57 ++
>> drivers/iommu/iommu.c | 4 +
>> include/linux/qda_compute_bus.h | 22 +
>> include/uapi/drm/qda_accel.h | 224 +++++++
>> 31 files changed, 4665 insertions(+)
>> ---
>> base-commit: d4906ae14a5f136ceb671bb14cedbf13fa560da6
>> change-id: 20260223-qda-firstpost-4ab05249e2cc
>>
>> Best regards,
>> --
>> Ekansh Gupta <ekansh.gupta@xxxxxxxxxxxxxxxx>
>>
>>