Here's my take on tying all the threads together. There are
four alignment combinations:
1. alloc_align_mask: zero; min_align_mask: zero
2. alloc_align_mask: zero; min_align_mask: non-zero
3. alloc_align_mask: non-zero; min_align_mask: zero/ignored
4. alloc_align_mask: non-zero; min_align_mask: non-zero
What does "min_align_mask: zero/ignored" mean? Under which
circumstances should a non-zero min_align_mask be ignored?
xen_swiotlb_map_page() and dma_direct_map_page() are #1 or #2,
via swiotlb_map() and swiotlb_tbl_map_single().
iommu_dma_map_page() is #3 or #4, via swiotlb_tbl_map_single().
swiotlb_alloc() is #3, going directly to swiotlb_find_slots().
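To make the dispatch concrete, here's a tiny standalone sketch
(plain userspace C, not kernel code; classify() is a made-up
helper) mapping the two masks that reach swiotlb_find_slots()
onto the four combinations above:

#include <stdio.h>

/* which of the four combinations applies for a given mask pair */
static int classify(unsigned long alloc_align_mask,
                    unsigned long min_align_mask)
{
        if (!alloc_align_mask)
                return min_align_mask ? 2 : 1;
        return min_align_mask ? 4 : 3;
}

int main(void)
{
        printf("dma_direct, no min_align_mask: #%d\n", classify(0, 0));
        printf("dma_direct, 4K min_align_mask: #%d\n", classify(0, 0xfff));
        printf("iommu_dma,  no min_align_mask: #%d\n", classify(0xfff, 0));
        printf("iommu_dma,  4K min_align_mask: #%d\n", classify(0xfff, 0xfff));
        return 0;
}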
For #1, the returned physical address has no constraints if
the requested size is less than a page. At page size or
greater, the historical requirement for page alignment
discussed earlier applies.
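In code form (assuming 4K pages; case1_ok() is a hypothetical
name, not a kernel helper), the rule for #1 is just:

#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* case #1: no constraint below a page, page alignment otherwise */
bool case1_ok(unsigned long tlb_addr, size_t size)
{
        if (size < PAGE_SIZE)
                return true;
        return (tlb_addr & (PAGE_SIZE - 1)) == 0;
}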
For #2, min_align_mask governs the bits of the returned
physical address that must match the original address. When
needed, swiotlb must also allocate pre-padding, aligned to
IO_TLB_SIZE, that precedes the returned physical address. A
request of size <= swiotlb_max_mapping_size() will not exceed
IO_TLB_SEGSIZE slots even with the padding. The historical
requirement for page alignment does not apply because the
driver has explicitly used the newer min_align_mask feature.
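As a sanity check on the arithmetic, here is a minimal model of
case #2 (userspace C with made-up names; it assumes the
allocator can hand back a slot start aligned to
min_align_mask + 1, which is stricter than plain IO_TLB_SIZE
alignment whenever the mask covers more than IO_TLB_SIZE - 1):

#include <assert.h>
#include <stdio.h>

/* everything between slot_start and the returned address is the
 * pre-padding */
unsigned long map_case2(unsigned long slot_start,
                        unsigned long orig_addr,
                        unsigned long min_align_mask)
{
        assert((slot_start & min_align_mask) == 0);
        return slot_start + (orig_addr & min_align_mask);
}

int main(void)
{
        unsigned long orig = 0x12345678;
        unsigned long tlb = map_case2(0x100000, orig, 0xfff);

        /* the min_align_mask bits of the original address survive */
        assert((tlb & 0xfff) == (orig & 0xfff));
        printf("tlb_addr %#lx, pre-padding %lu bytes\n",
               tlb, tlb - 0x100000);
        return 0;
}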
What is the idea here? Is it the assumption that only old drivers rely
on page alignment, so if a driver uses min_align_mask, that proves it
is new and does not rely on page alignment?
For #3, alloc_align_mask specifies the required alignment. No
pre-padding is needed. Per earlier comments from Robin[1],
it's reasonable to assume the granule (i.e., alloc_align_mask
+ 1) is >= IO_TLB_SIZE. The original address is not relevant
in determining the alignment, and the historical page
alignment requirement does not apply since alloc_align_mask
explicitly states the alignment.
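Case #3 in code form is a plain alignment test; note that
orig_addr does not appear at all (case3_ok() is again a
hypothetical name):

#include <stdbool.h>

/* case #3: the granule (alloc_align_mask + 1) is the only
 * constraint */
bool case3_ok(unsigned long tlb_addr, unsigned long alloc_align_mask)
{
        return (tlb_addr & alloc_align_mask) == 0;
}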
For #4, the returned physical address must match the bits
in the original address specified by min_align_mask. swiotlb
must also allocate pre-padding aligned to alloc_align_mask
that precedes the returned physical address. Also per
Robin[1], assume alloc_align_mask is >= min_align_mask, which
solves the conflicting-alignment problem pointed out by
Petr[2]. Perhaps we should add a
"WARN_ON(alloc_align_mask < min_align_mask)" rather than
failing in a way that depends on which bits of the original
address happen to be set. Again, the historical requirement
for page alignment does not apply.
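A sketch of #4 under Robin's assumption, with the proposed
check up front (userspace asserts standing in for WARN_ON;
map_case4() is a made-up helper, not kernel code):

#include <assert.h>

unsigned long map_case4(unsigned long slot_start,
                        unsigned long orig_addr,
                        unsigned long alloc_align_mask,
                        unsigned long min_align_mask)
{
        unsigned long tlb_addr;

        /* the conflict Petr pointed out cannot arise when
         * min_align_mask's bits are a subset of alloc_align_mask's */
        assert((min_align_mask & ~alloc_align_mask) == 0);
        assert((slot_start & alloc_align_mask) == 0);

        tlb_addr = slot_start + (orig_addr & min_align_mask);

        /* both constraints hold: the padding starts alloc_align_mask
         * aligned, and the min_align_mask bits match the original */
        assert((tlb_addr & min_align_mask) ==
               (orig_addr & min_align_mask));
        return tlb_addr;
}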
AFAICS the only reason this works in practice is that there are only
two in-tree users of min_align_mask: NVMe and Hyper-V. Both use a mask
of 12 bits, and the IOVA granule size is never smaller than 4K.
If we want to rely on this, then I propose making it a BUG_ON() rather
than a WARN_ON().