Re: dma: idxd: TLB translation fetch operation not supported despite PCI ATS enabled

From: Karim Manaouil

Date: Thu Mar 05 2026 - 15:10:47 EST


On Thu, Mar 05, 2026 at 11:51:19AM -0700, Dave Jiang wrote:
>
>
> On 3/5/26 11:32 AM, Karim Manaouil wrote:
> > Hi Vinicius, Dave and others,
> >
> > I am working on a potential mm feature leveraging the Intel DSA
> > accelerator. I am trying to get the TLB translation fetch operation
> > to work (described in section 8.3.11 in the spec [0]). However, I keep
> > getting error code 0x10, which corresponds to "operation not supported",
> > even though PCIe ATS is enabled.
>
> Have you checked the op_cap sysfs attribute (OPCAP register) to see whether the platform you are using supports the opcode? From the error code you are reporting, it appears the CPU you are using does not support that op. You may need a Granite Rapids (or equivalent gen) platform for that to be available. I don't recall that op being part of DSA v1.0, which is what your SPR and EMR platforms have.
>

You're right. I just checked op_cap: translation fetch is not supported on SPR or EMR. Thank you.
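
For anyone hitting the same error, here is a minimal sketch of that check in
userspace C. It assumes op_cap is a 256-bit mask printed as eight
comma-separated 32-bit hex words, most significant word first (the layout in
the accel-config output quoted below), and that translation fetch is opcode
0x0a (the value in include/uapi/linux/idxd.h, which also matches the top byte
of desc[0] in the dump below). The names opcap_parse(), opcap_supports() and
OPC_TRANSL_FETCH are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define OPCAP_WORDS 8          /* 256 opcode bits, 32 bits per word */
#define OPC_TRANSL_FETCH 0x0a  /* assumed: DSA_OPCODE_TRANSL_FETCH in the uapi header */

/*
 * Parse "w7,w6,...,w0" (hex, most significant word first) so that
 * words[0] ends up holding opcode bits 0-31.
 * Returns false if the string is not exactly eight 32-bit hex words.
 */
static bool opcap_parse(const char *s, uint32_t words[OPCAP_WORDS])
{
    uint32_t tmp[OPCAP_WORDS];

    assert(s != NULL);
    for (int n = 0; n < OPCAP_WORDS; n++) {
        char *end;
        unsigned long v = strtoul(s, &end, 16);

        if (end == s || v > UINT32_MAX)
            return false;
        tmp[n] = (uint32_t)v;
        s = end;
        if (n < OPCAP_WORDS - 1) {
            if (*s != ',')
                return false;
            s++;
        }
    }
    /* Allow the trailing newline a sysfs read would include. */
    if (*s != '\0' && *s != '\n')
        return false;

    /* Reverse so that the index grows with the opcode number. */
    for (int i = 0; i < OPCAP_WORDS; i++)
        words[i] = tmp[OPCAP_WORDS - 1 - i];
    return true;
}

/* Test the capability bit for one opcode. */
static bool opcap_supports(const uint32_t words[OPCAP_WORDS], unsigned int opcode)
{
    if (opcode >= OPCAP_WORDS * 32)
        return false;
    return (words[opcode / 32] >> (opcode % 32)) & 1u;
}
```

Fed the op_cap string from the EMR box, opcap_supports(words, OPC_TRANSL_FETCH)
returns false while opcode 3 (memmove) returns true: the low word 0x003f027d
has bit 3 set and bit 10 clear.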

> DJ
>
> >
> > I used idxd-config [1] and dsa-perf-micros [2] for testing as well as a
> > kernel module. This happened both on shared and dedicated queues.
> >
> > An example with [2] is given below
> >
> > With [2], create a shared queue with a single engine:
> >
> > # ./scripts/setup_dsa.sh -d dsa0 -w 1 -m s -e 1
> >
> > Then run a 4KiB memmove job (opcode 3) with translation fetching (-X).
> >
> > # dsa_perf_micros -n10 -s4k -i1 -k5 -w1 -o3 -X1
> > dsa_perf_micros -n10 -s4k -i1 -k5 -w1 -o3 -X1
> > blen 4096
> > bstride 4096
> > bstride 4096
> > nb_bufs 10
> > pg_size 0
> > wq_type 1
> > batch_sz 1
> > iter 1
> > nb_cpus 1
> > var_mmio 0
> > dma 1
> > verify 1
> > misc_flags 0
> > access_op[0] Write
> > access_op[1] Write
> > place_op[0] Memory
> > place_op[1] Memory
> > flags_cmask ffffffff
> > flags_smask 0
> > flags_nth_desc 1
> > nb_numa_node 1
> > cpu_desc_work 0
> > Memory affinity
> > CPUs in node 0: -1 -1
> > Buffer Offsets 0 0
> > dsa_perf_micros: check_comp: desc[0] error
> > desc addr: 0x7adf31986000
> > desc[0]: 0x0a00000c00000000
> > desc[1]: 0x00007adf31987000
> > desc[2]: 0x0000650dabb381c0
> > desc[3]: 0x0000000000000000
> > desc[4]: 0x000000000000a000
> > desc[5]: 0x0000000000000000
> > desc[6]: 0x0000000000000000
> > desc[7]: 0x0000000000000000
> > dsa_perf_micros: print_status: Comp status 0x10
> > dsa_perf_micros: main: test run failed
> >
> > As you can see, it fails with completion status 0x10.
> >
> > I also tried submitting this directly from the kernel on top of idxd. I
> > prepared and submitted a dsa_hw_desc as follows (roughly)
> >
> > struct dsa_hw_desc *hw;
> >
> > hw->flags = 0;
> > hw->opcode = DSA_OPCODE_TRANSL_FETCH;
> > hw->transl_fetch_addr = sg_dma_address(&sg[i]);
> > hw->region_size = sg_dma_len(&sg[i]);
> > hw->region_stride = 4096;
> > hw->priv = 0;
> >
> > rc = idxd_submit_desc(wq, desc);
> >
> > and I got the same error code 0x10 in the completion record.
> >
> > I am on Linux kernel 6.17 on Ubuntu and I tried this on two different
> > systems
> >
> > 1) Dual socket Intel Sapphire Rapids 4th Gen Xeon(R) Gold 5418N
> > 2) Single socket Intel Emerald Rapids 5th Gen Xeon(R) Gold 5512U
> >
> > I had the same error on both.
> >
> > lspci shows that the PCI ATS feature is enabled for DSA on both. As an
> > example, on system (2):
> >
> > # lspci -vvv -s 0000:f2:01.0
> > Capabilities: [220 v1] Address Translation Service (ATS)
> > ATSCap: Invalidate Queue Depth: 00
> > ATSCtl: Enable+, Smallest Translation Unit: 00
> > Capabilities: [230 v1] Process Address Space ID (PASID)
> > PASIDCap: Exec- Priv+, Max PASID Width: 14
> > PASIDCtl: Enable+ Exec- Priv+
> > Capabilities: [240 v1] Page Request Interface (PRI)
> > PRICtl: Enable+ Reset-
> > PRISta: RF- UPRGI- Stopped+
> > Page Request Capacity: 00000200, Page Request Allocation: 00000200
> > Kernel driver in use: idxd
> > Kernel modules: idxd
> >
> > And this is the output of accel-config list on system (2) for the example above
> >
> > [
> > {
> > "dev":"dsa0",
> > "read_buffer_limit":0,
> > "max_groups":4,
> > "max_work_queues":8,
> > "max_engines":4,
> > "work_queue_size":128,
> > "numa_node":0,
> > "op_cap":"00000000,00000000,00000000,00000000,00000000,00000000,00000001,003f027d",
> > "gen_cap":"0x40915f0107",
> > "version":"0x100",
> > "state":"enabled",
> > "max_read_buffers":96,
> > "max_batch_size":1024,
> > "configurable":1,
> > "pasid_enabled":1,
> > "cdev_major":235,
> > "clients":0,
> > "groups":[
> > {
> > "dev":"group0.0",
> > "read_buffers_reserved":0,
> > "use_read_buffer_limit":0,
> > "read_buffers_allowed":96,
> > "grouped_workqueues":[
> > {
> > "dev":"wq0.0",
> > "mode":"shared",
> > "size":128,
> > "group_id":0,
> > "priority":10,
> > "block_on_fault":0,
> > "max_batch_size":512,
> > "max_transfer_size":2097152,
> > "cdev_minor":0,
> > "type":"user",
> > "name":"app0",
> > "driver_name":"user",
> > "threshold":128,
> > "ats_disable":0,
> > "state":"enabled",
> > "clients":0
> > }
> > }
> > ],
> > "grouped_engines":[
> > {
> > "dev":"engine0.0",
> > "group_id":0
> > }
> > ]
> > },
> > {
> > "dev":"group0.1",
> > "read_buffers_reserved":0,
> > "use_read_buffer_limit":0,
> > "read_buffers_allowed":96
> > },
> > {
> > "dev":"group0.2",
> > "read_buffers_reserved":0,
> > "use_read_buffer_limit":0,
> > "read_buffers_allowed":96
> > },
> > {
> > "dev":"group0.3",
> > "read_buffers_reserved":0,
> > "use_read_buffer_limit":0,
> > "read_buffers_allowed":96
> > }
> > ],
> > "ungrouped_engines":[
> > {
> > "dev":"engine0.1"
> > },
> > {
> > "dev":"engine0.2"
> > },
> > {
> > "dev":"engine0.3"
> > }
> > ]
> > }
> > ]
> >
> > I debugged the kernel a bit, and the IOMMU code successfully calls
> > pci_enable_ats() at initialisation time. So I assume nothing is wrong with
> > the IOMMU, the PCI root complex, or the BIOS options.
> >
> > Do you have any clue how to get this to work? Or will it ever work in
> > the first place on these systems? This is not documented anywhere.
> >
> > Cheers
> >
> > [0] https://cdrdv2-public.intel.com/857060/341204-006-intel-data-streaming-accelerator-spec.pdf
> > [1] https://github.com/intel/idxd-config
> > [2] https://github.com/intel/dsa-perf-micros/tree/main
>

--
~karim