dma: idxd: TLB translation fetch operation not supported despite PCI ATS enabled
From: Karim Manaouil
Date: Thu Mar 05 2026 - 13:32:35 EST
Hi Vinicius, Dave and others,
I am working on a potential mm feature that leverages the Intel DSA
accelerator, and I am trying to get the TLB translation fetch operation
(described in section 8.3.11 of the spec [0]) to work. However, I keep
getting error code 0x10, which corresponds to "operation not supported",
even though PCIe ATS is enabled.
For testing I used idxd-config [1], dsa-perf-micros [2] and my own kernel
module. The error occurs with both shared and dedicated work queues.
Here is an example using [2]. First, create a shared work queue with a
single engine:
# ./scripts/setup_dsa.sh -d dsa0 -w 1 -m s -e 1
Then run a 4 KiB memmove job (opcode 3) with translation fetch enabled (-X1):
# dsa_perf_micros -n10 -s4k -i1 -k5 -w1 -o3 -X1
dsa_perf_micros -n10 -s4k -i1 -k5 -w1 -o3 -X1
blen 4096
bstride 4096
bstride 4096
nb_bufs 10
pg_size 0
wq_type 1
batch_sz 1
iter 1
nb_cpus 1
var_mmio 0
dma 1
verify 1
misc_flags 0
access_op[0] Write
access_op[1] Write
place_op[0] Memory
place_op[1] Memory
flags_cmask ffffffff
flags_smask 0
flags_nth_desc 1
nb_numa_node 1
cpu_desc_work 0
Memory affinity
CPUs in node 0: -1 -1
Buffer Offsets 0 0
dsa_perf_micros: check_comp: desc[0] error
desc addr: 0x7adf31986000
desc[0]: 0x0a00000c00000000
desc[1]: 0x00007adf31987000
desc[2]: 0x0000650dabb381c0
desc[3]: 0x0000000000000000
desc[4]: 0x000000000000a000
desc[5]: 0x0000000000000000
desc[6]: 0x0000000000000000
desc[7]: 0x0000000000000000
dsa_perf_micros: print_status: Comp status 0x10
dsa_perf_micros: main: test run failed
As you can see, it fails with completion status 0x10.
I also tried submitting the descriptor directly from a kernel module on
top of idxd. I prepared and submitted a dsa_hw_desc roughly as follows:
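struct idxd_desc *desc;
struct dsa_hw_desc *hw;
int rc;
desc = idxd_alloc_desc(wq, IDXD_OP_BLOCK); /* descriptor slot from the wq */
if (IS_ERR(desc))
	return PTR_ERR(desc);
hw = desc->hw;
hw->flags = IDXD_OP_FLAG_CRAV | IDXD_OP_FLAG_RCR; /* request a completion record */
hw->opcode = DSA_OPCODE_TRANSL_FETCH;
hw->completion_addr = desc->compl_dma;
hw->transl_fetch_addr = sg_dma_address(&sg[i]);
hw->region_size = sg_dma_len(&sg[i]);
hw->region_stride = 4096;
hw->priv = 0;
rc = idxd_submit_desc(wq, desc);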
and I got the same error code 0x10 in the completion record.
I am running Linux kernel 6.17 on Ubuntu and tried this on two different
systems:
1) Dual-socket Intel Sapphire Rapids 4th Gen Xeon(R) Gold 5418N
2) Single-socket Intel Emerald Rapids 5th Gen Xeon(R) Gold 5512U
Both fail with the same error.
lspci shows that the PCIe ATS capability is enabled for DSA on both
systems. For example, on system (2):
# lspci -vvv -s 0000:f2:01.0
Capabilities: [220 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [230 v1] Process Address Space ID (PASID)
PASIDCap: Exec- Priv+, Max PASID Width: 14
PASIDCtl: Enable+ Exec- Priv+
Capabilities: [240 v1] Page Request Interface (PRI)
PRICtl: Enable+ Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00000200, Page Request Allocation: 00000200
Kernel driver in use: idxd
Kernel modules: idxd
And here is the output of accel-config list on system (2) for the example above:
[
  {
    "dev":"dsa0",
    "read_buffer_limit":0,
    "max_groups":4,
    "max_work_queues":8,
    "max_engines":4,
    "work_queue_size":128,
    "numa_node":0,
    "op_cap":"00000000,00000000,00000000,00000000,00000000,00000000,00000001,003f027d",
    "gen_cap":"0x40915f0107",
    "version":"0x100",
    "state":"enabled",
    "max_read_buffers":96,
    "max_batch_size":1024,
    "configurable":1,
    "pasid_enabled":1,
    "cdev_major":235,
    "clients":0,
    "groups":[
      {
        "dev":"group0.0",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96,
        "grouped_workqueues":[
          {
            "dev":"wq0.0",
            "mode":"shared",
            "size":128,
            "group_id":0,
            "priority":10,
            "block_on_fault":0,
            "max_batch_size":512,
            "max_transfer_size":2097152,
            "cdev_minor":0,
            "type":"user",
            "name":"app0",
            "driver_name":"user",
            "threshold":128,
            "ats_disable":0,
            "state":"enabled",
            "clients":0
          }
        ],
        "grouped_engines":[
          {
            "dev":"engine0.0",
            "group_id":0
          }
        ]
      },
      {
        "dev":"group0.1",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group0.2",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      },
      {
        "dev":"group0.3",
        "read_buffers_reserved":0,
        "use_read_buffer_limit":0,
        "read_buffers_allowed":96
      }
    ],
    "ungrouped_engines":[
      {
        "dev":"engine0.1"
      },
      {
        "dev":"engine0.2"
      },
      {
        "dev":"engine0.3"
      }
    ]
  }
]
I did some debugging in the kernel: the IOMMU code successfully calls
pci_enable_ats() for the device at initialisation time, so I assume the
IOMMU, PCIe root complex and BIOS settings are fine.
Do you have any idea how to get this working? Or is it not expected to
work on these systems at all? I could not find this documented anywhere.
Cheers
[0] https://cdrdv2-public.intel.com/857060/341204-006-intel-data-streaming-accelerator-spec.pdf
[1] https://github.com/intel/idxd-config
[2] https://github.com/intel/dsa-perf-micros/tree/main
--
~karim