Re: [PATCH v2] dma-debug: suppress cacheline overlap warning when arch has no DMA alignment requirement
From: Marek Szyprowski
Date: Thu Apr 02 2026 - 07:41:01 EST
On 01.04.2026 17:26, Robin Murphy wrote:
> On 2026-04-01 2:25 pm, Andy Shevchenko wrote:
>> On Wed, Apr 1, 2026 at 3:11 PM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>>> On 2026-03-30 8:44 am, Marek Szyprowski wrote:
>>>> On 27.03.2026 13:41, Mikhail Gavrilov wrote:
>>
>> ...
>>
>>> TBH I'd be inclined to have CONFIG_DMA_DEBUG raise ARCH_DMA_MINALIGN as
>>> appropriate such that genuine false-positives can't happen, rather than
>>> effectively defeat the whole check,
>>
>> I dunno if you read the v1 thread, where I proposed to unroll the check
>> and use pr_debug_once() for the cases which we expect not to panic,
>> but which would be good to keep track of.
>
> I had not seen v1, as I took the last 3 days off and hadn't got that far up my inbox yet - I guess it's at least reassuring to have reached similar conclusions independently :)
>
> The fundamental issue here is that dma-debug doesn't realistically have a way to know whether the thing being mapped is intentionally a whole dedicated kmalloc allocation - where we can trust SLUB (and DMA_BOUNCE_UNALIGNED_KMALLOC if appropriate) to do the right thing across different systems - or just something which might happen to line up by coincidence on someone's development machine, but for portability they definitely do still need to take explicit care about (e.g. struct devres::data).
Right, the debug code cannot distinguish them. I'm still convinced that increasing
ARCH_DMA_MINALIGN when CONFIG_DMA_DEBUG is enabled is not a good idea. I also agree
that the current exceptions for DMA_BOUNCE_UNALIGNED_KMALLOC and
dma_get_cache_alignment() < L1_CACHE_BYTES are wider than really needed. What can
we do about it?
For the (dma_get_cache_alignment() < L1_CACHE_BYTES) case, I came up with the idea
of reducing the check to dma_get_cache_alignment()-sized granularity:
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 86f87e43438c..9bf4b5c368f9 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -418,13 +418,13 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC | __GFP_NOWARN);
static DEFINE_SPINLOCK(radix_lock);
#define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
-#define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
-#define CACHELINES_PER_PAGE (1 << CACHELINE_PER_PAGE_SHIFT)
static phys_addr_t to_cacheline_number(struct dma_debug_entry *entry)
{
- return ((entry->paddr >> PAGE_SHIFT) << CACHELINE_PER_PAGE_SHIFT) +
- (offset_in_page(entry->paddr) >> L1_CACHE_SHIFT);
+ if (dma_get_cache_alignment() >= L1_CACHE_BYTES)
+ return entry->paddr >> L1_CACHE_SHIFT;
+ else
+ return entry->paddr >> ilog2(dma_get_cache_alignment());
}
static int active_cacheline_read_overlap(phys_addr_t cln)
For the remaining cases (mainly DMA_BOUNCE_UNALIGNED_KMALLOC), based on Catalin's
suggestion, I came up with the following idea:
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 0677918f06a8..f9b6a989ac15 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -18,6 +18,7 @@
#include <linux/uaccess.h>
#include <linux/export.h>
#include <linux/device.h>
+#include <linux/dma-direct.h>
#include <linux/types.h>
#include <linux/sched.h>
#include <linux/ctype.h>
@@ -590,6 +591,14 @@ static int dump_show(struct seq_file *seq, void *v)
}
DEFINE_SHOW_ATTRIBUTE(dump);
+static inline void dma_debug_fixup_bounced(struct dma_debug_entry *entry)
+{
+ if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
+ dma_kmalloc_needs_bounce(entry->dev, entry->size, entry->direction) &&
+ is_swiotlb_active(entry->dev))
+ entry->paddr = dma_to_phys(entry->dev, entry->dev_addr);
+}
+
/*
* Wrapper function for adding an entry to the hash.
* This function takes care of locking itself.
@@ -601,6 +610,8 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
unsigned long flags;
int rc;
+ dma_debug_fixup_bounced(entry);
+
entry->is_cache_clean = attrs & (DMA_ATTR_DEBUGGING_IGNORE_CACHELINES |
DMA_ATTR_REQUIRE_COHERENT);
@@ -614,9 +625,7 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
global_disable = true;
} else if (rc == -EEXIST &&
!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- !(entry->is_cache_clean && overlap_cache_clean) &&
- !(IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
- is_swiotlb_active(entry->dev))) {
+ !(entry->is_cache_clean && overlap_cache_clean)) {
err_printk(entry->dev, entry,
"cacheline tracking EEXIST, overlapping mappings aren't supported\n");
}
@@ -981,6 +990,8 @@ static void check_unmap(struct dma_debug_entry *ref)
struct hash_bucket *bucket;
unsigned long flags;
+ dma_debug_fixup_bounced(ref);
+
bucket = get_hash_bucket(ref, &flags);
entry = bucket_find_exact(bucket, ref);
What do you think about it? At the least, both allow detecting errors like multiple,
incompatible mappings of the same memory.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland