Re: Bug in add_dma_entry()'s debugging code

From: Ferry Toth
Date: Fri Dec 01 2023 - 07:19:01 EST


Hi

On 01-12-2023 12:08, Catalin Marinas wrote:
On Thu, Nov 30, 2023 at 09:08:23PM +0100, Ferry Toth wrote:
Op 28-11-2023 om 18:44 schreef Catalin Marinas:
Or just force the kmalloc() min align to cache_line_size(), something
like:

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4a658de44ee9..3ece20367636 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -543,6 +543,8 @@ static inline int dma_get_cache_alignment(void)
 #ifdef ARCH_HAS_DMA_MINALIGN
         return ARCH_DMA_MINALIGN;
 #endif
+        if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
+                return cache_line_size();
         return 1;
 }
 #endif
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8d431193c273..d0b21d6e9328 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -879,7 +879,7 @@ static unsigned int __kmalloc_minalign(void)
         unsigned int minalign = dma_get_cache_alignment();
 
         if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
-            is_swiotlb_allocated())
+            is_swiotlb_allocated() && !IS_ENABLED(CONFIG_DMA_API_DEBUG))
                 minalign = ARCH_KMALLOC_MINALIGN;
 
         return max(minalign, arch_slab_minalign());
With the above suggestion ("force the kmalloc() min align to cache_line_size()") plus
Alan's debug code:

root@yuna:~# journalctl -k | grep hub
kernel: usbcore: registered new interface driver hub
kernel: hub 1-0:1.0: USB hub found
kernel: usb usb1: hub buffer at 71c7180, status at 71c71c0
kernel: hub 1-0:1.0: 1 port detected
kernel: hub 2-0:1.0: USB hub found
kernel: usb usb2: hub buffer at 71c79c0, status at 71c7a00
kernel: hub 2-0:1.0: 1 port detected
kernel: hub 1-1:1.0: USB hub found
kernel: usb 1-1: hub buffer at 65b36c0, status at 6639340
kernel: hub 1-1:1.0: 7 ports detected

and the stack trace indeed goes away.

IOW, the 2 root hub kmalloc() allocations are now also aligned to the cache line
size, even though those never triggered the stack trace. Strange: the last hub status
is allocated far away from its hub buffer, kmalloc() mysteries.
They are 64 bytes apart in most cases, other than the last one, where I
guess the status had to go to a different slab page.
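
(Checking the addresses in the log above: 0x71c71c0 - 0x71c7180 = 0x40 and 0x71c7a00 - 0x71c79c0 = 0x40, i.e. exactly 64 bytes, while 0x6639340 - 0x65b36c0 = 0x85c80, so the last status buffer indeed ended up far away from its hub buffer.)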

This still hasn't quite landed for me: are we detecting a false alarm here, given that the 2
DMA operations can never end up on the same cache line on non-cache-coherent
platforms? If so, shouldn't we fix up the dma debug code so that it does not raise a
false alarm, instead of changing the alignment?
It's a false alarm indeed on this hardware since the DMA is
cache-coherent. I think Christoph mentioned earlier in this thread that
he'd like the DMA API debug to report potential problems even if the
hardware it is running on is safe.
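
For reference, this is roughly how I read the tracking in kernel/dma/debug.c (paraphrased from memory, so the details may be off): each active mapping is keyed by its cache line number in a radix tree, so a second mapping that falls into an already-tracked line gets -EEXIST no matter whether the hardware happens to be coherent:

/* paraphrased sketch of kernel/dma/debug.c, not a verbatim copy */
static phys_addr_t to_cacheline_number(struct dma_debug_entry *entry)
{
        return (entry->pfn << CACHELINE_PER_PAGE_SHIFT) +
                (entry->offset >> L1_CACHE_SHIFT);
}

static int active_cacheline_insert(struct dma_debug_entry *entry)
{
        phys_addr_t cln = to_cacheline_number(entry);
        unsigned long flags;
        int rc;

        /* DMA_TO_DEVICE cannot make the CPU consume stale data */
        if (entry->direction == DMA_TO_DEVICE)
                return 0;

        spin_lock_irqsave(&radix_lock, flags);
        rc = radix_tree_insert(&dma_active_cacheline, cln, entry);
        if (rc == -EEXIST)
                active_cacheline_inc_overlap(cln);
        spin_unlock_irqrestore(&radix_lock, flags);

        return rc;      /* -EEXIST is what makes add_dma_entry() warn */
}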

I agree with Christoph. So my question is whether we are detecting a false alarm in general, not just "a false alarm indeed on this hardware".

Or is this a bona fide warning (for non-cache-coherent platforms)? Then we
should not silence it by forcing the alignment, but instead issue a more useful WARN
on a cache-coherent platform (i.e. that this is not an overlap but
a shared cache line). On a non-cache-coherent platform something stronger
than a WARN might be appropriate?
A non-cache coherent platform would either have the kmalloc()
allocations aligned or it would bounce those small, unaligned buffers.
It would? Or it always has?
So it doesn't seem right to issue a warning on x86 where kmalloc()
minimum alignment is 8 and not getting the warning on a non-coherent
platform which forces the kmalloc() alignment.

If *all* non-coherent platforms implement either correct alignment or bounce buffering, and on (coherent) x86 we get a WARN, then it is a false alarm, right?

That is exactly my question (because I have no idea which platforms have non-coherent caches).
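
For concreteness, the situation we are discussing is roughly this hypothetical pattern (my own illustration, assuming 64-byte cache lines and x86-64's 8-byte kmalloc() minimum alignment, not the actual hub driver code):

buf    = kmalloc(32, GFP_KERNEL);   /* may land at offset ...180 */
status = kmalloc(32, GFP_KERNEL);   /* may land at ...1a0, same 64-byte line */

dma_map_single(dev, buf, 32, DMA_FROM_DEVICE);
dma_map_single(dev, status, 32, DMA_FROM_DEVICE);
/* dma-debug sees two active mappings in one cache line and warns,
 * even though coherent x86 hardware never actually corrupts anything */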

If we consider the kmalloc() aspect already covered by bouncing or force
alignment, the DMA API debug code can still detect other cache line
sharing situations.

Consider? Or know?

If we know, and we cannot easily prevent false WARNs on x86, it could be sufficient to change the message to "possible cacheline sharing detected" and add a line saying that DMA_API_DEBUG should be disabled on production systems.
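
Something along these lines in add_dma_entry(), purely as a sketch of what I mean (the surrounding code and exact strings are paraphrased from memory and may not match current mainline):

/* sketch only: reword the EEXIST report instead of dropping it */
rc = active_cacheline_insert(entry);
if (rc == -EEXIST && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
        err_printk(entry->dev, entry,
                   "possible cacheline sharing detected - harmless on "
                   "cache-coherent hardware, may corrupt data on "
                   "non-coherent platforms\n");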

And while we're at it, we might be able to fix the real cacheline sharing that is currently missed, as mentioned by Alan:

 "As a separate issue, active_cacheline_insert() fails to detect
overlap in situations where a mapping occupies more than one cache line.
For example, if mapping A uses lines N and N+1 and mapping B uses line
N+1, no overlap will be detected because the radix-tree keys for A and B
will be different (N vs. N+1)."
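
A possible direction, as an untested sketch on top of the existing radix-tree tracking (field names taken from my reading of kernel/dma/debug.c and possibly off), would be to insert every cache line the mapping touches instead of only the first one:

/* untested sketch: track all lines covered by the mapping */
static int active_cacheline_insert(struct dma_debug_entry *entry)
{
        phys_addr_t first = to_cacheline_number(entry);
        phys_addr_t last = first +
                ((entry->offset + entry->size - 1) >> L1_CACHE_SHIFT) -
                (entry->offset >> L1_CACHE_SHIFT);
        phys_addr_t cln;
        unsigned long flags;
        int rc = 0;

        if (entry->direction == DMA_TO_DEVICE)
                return 0;

        spin_lock_irqsave(&radix_lock, flags);
        for (cln = first; cln <= last; cln++) {
                rc = radix_tree_insert(&dma_active_cacheline, cln, entry);
                if (rc == -EEXIST)
                        active_cacheline_inc_overlap(cln);
        }
        spin_unlock_irqrestore(&radix_lock, flags);

        return rc;
}

The matching removal path would of course need to walk the same range of lines.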