Re: [PATCH v2] dma-debug: suppress cacheline overlap warning when arch has no DMA alignment requirement

From: Marek Szyprowski

Date: Thu Apr 02 2026 - 07:41:01 EST


On 01.04.2026 17:26, Robin Murphy wrote:
> On 2026-04-01 2:25 pm, Andy Shevchenko wrote:
>> On Wed, Apr 1, 2026 at 3:11 PM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>>> On 2026-03-30 8:44 am, Marek Szyprowski wrote:
>>>> On 27.03.2026 13:41, Mikhail Gavrilov wrote:
>>
>> ...
>>
>>> TBH I'd be inclined to have CONFIG_DMA_DEBUG raise ARCH_DMA_MINALIGN as
>>> appropriate such that genuine false-positives can't happen, rather than
>>> effectively defeat the whole check,
>>
>> I dunno if you read v1 thread, where I proposed to unroll the check
>> and use pr_debug_once() for the cases which we expect not to panic,
>> but would be good to have a track of.
>
> I had not seen v1, as I took the last 3 days off and hadn't got that far up my inbox yet - I guess it's at least reassuring to have reached similar conclusions independently :)
>
> The fundamental issue here is that dma-debug doesn't realistically have a way to know whether the thing being mapped is intentionally a whole dedicated kmalloc allocation - where we can trust SLUB (and DMA_BOUNCE_UNALIGNED_KMALLOC if appropriate) to do the right thing across different systems - or just something which might happen to line up by coincidence on someone's development machine, but for portability they definitely do still need to take explicit care about (e.g. struct devres::data).


Right, the debug code cannot distinguish them. I'm still convinced that increasing
ARCH_DMA_MINALIGN when DMA_DEBUG is enabled is not a good idea. I also agree that the
current exceptions for DMA_BOUNCE_UNALIGNED_KMALLOC and dma_get_cache_alignment() <
L1_CACHE_BYTES are wider than really needed. What can we do about it?

For the (dma_get_cache_alignment() < L1_CACHE_BYTES) case, my idea is to narrow the
check so that overlap is tracked at dma_get_cache_alignment() granularity instead:


diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 86f87e43438c..9bf4b5c368f9 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -418,13 +418,13 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
 static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC | __GFP_NOWARN);
 static DEFINE_SPINLOCK(radix_lock);
 #define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
-#define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
-#define CACHELINES_PER_PAGE (1 << CACHELINE_PER_PAGE_SHIFT)

 static phys_addr_t to_cacheline_number(struct dma_debug_entry *entry)
 {
-       return ((entry->paddr >> PAGE_SHIFT) << CACHELINE_PER_PAGE_SHIFT) +
-               (offset_in_page(entry->paddr) >> L1_CACHE_SHIFT);
+       if (dma_get_cache_alignment() >= L1_CACHE_BYTES)
+               return entry->paddr >> L1_CACHE_SHIFT;
+       else
+               return entry->paddr >> ilog2(dma_get_cache_alignment());
 }

 static int active_cacheline_read_overlap(phys_addr_t cln)


For the remaining cases (mainly DMA_BOUNCE_UNALIGNED_KMALLOC), based on Catalin's
suggestion, I came up with the following:

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 0677918f06a8..f9b6a989ac15 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>
 #include <linux/export.h>
 #include <linux/device.h>
+#include <linux/dma-direct.h>
 #include <linux/types.h>
 #include <linux/sched.h>
 #include <linux/ctype.h>
@@ -590,6 +591,14 @@ static int dump_show(struct seq_file *seq, void *v)
 }
 DEFINE_SHOW_ATTRIBUTE(dump);

+static inline void dma_debug_fixup_bounced(struct dma_debug_entry *entry)
+{
+       if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
+           dma_kmalloc_needs_bounce(entry->dev, entry->size, entry->direction) &&
+           is_swiotlb_active(entry->dev))
+               entry->paddr = dma_to_phys(entry->dev, entry->dev_addr);
+}
+
 /*
  * Wrapper function for adding an entry to the hash.
  * This function takes care of locking itself.
@@ -601,6 +610,8 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
        unsigned long flags;
        int rc;

+       dma_debug_fixup_bounced(entry);
+
        entry->is_cache_clean = attrs & (DMA_ATTR_DEBUGGING_IGNORE_CACHELINES |
                                         DMA_ATTR_REQUIRE_COHERENT);

@@ -614,9 +625,7 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
                global_disable = true;
        } else if (rc == -EEXIST &&
                   !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-                  !(entry->is_cache_clean && overlap_cache_clean) &&
-                  !(IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
-                    is_swiotlb_active(entry->dev))) {
+                  !(entry->is_cache_clean && overlap_cache_clean)) {
                err_printk(entry->dev, entry,
                        "cacheline tracking EEXIST, overlapping mappings aren't supported\n");
        }
@@ -981,6 +990,8 @@ static void check_unmap(struct dma_debug_entry *ref)
        struct hash_bucket *bucket;
        unsigned long flags;

+       dma_debug_fixup_bounced(ref);
+
        bucket = get_hash_bucket(ref, &flags);
        entry = bucket_find_exact(bucket, ref);


What do you think about it? Both at least allow detecting errors like multiple
incompatible mappings of the same memory.


Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland