Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors

From: Shuah Khan
Date: Tue Sep 18 2012 - 15:42:46 EST


On Tue, 2012-09-18 at 15:34 +0200, Joerg Roedel wrote:
> On Mon, Sep 17, 2012 at 04:45:15PM -0600, Shuah Khan wrote:
> > Yeah. I will firm up my ideas a bit and summarize in a day or two. Would
> > like to hear your ideas as well at that time, so we can pick the one
> > that works the best.
>
> I think the best approach for this functionality is to add a flag to
> 'struct dma_debug_entry' which tells whether the address has been
> checked with dma_mapping error or not. On unmap or driver unload you can
> then check for that flag and print a warning when an unchecked address
> is detected.

Was hoping to get comments from you as well. You are the original author of
this dma-debug module.

Are you ok with the system wide and per device error counts I added? Any
comments on the overall approach?

The approach you suggested will cover the cases where drivers fail to
check good map cases. We won't be able to catch failed maps that get used
without checks. Are you not concerned about these cases? These could
cause a silent error with wild writes or could bring the system down. Or
are you recommending changing the infrastructure to track failed maps as
well?
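
To make the gap concrete, here is a rough user-space model of the
flag-only scheme. It is not kernel code: every model_* name is made up
for illustration, and it assumes a failed map never gets an entry in the
table, so there is nothing to hang the flag on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real dma-debug definitions. */
#define MODEL_MAP_ERROR   UINT64_MAX	/* value returned by a failed map */
#define MODEL_MAX_ENTRIES 16

struct model_entry {
	uint64_t dma_addr;
	bool in_use;
	bool checked;	/* set once the driver checks the address */
};

static struct model_entry model_entries[MODEL_MAX_ENTRIES];

struct model_entry *model_find(uint64_t dma_addr)
{
	for (size_t i = 0; i < MODEL_MAX_ENTRIES; i++)
		if (model_entries[i].in_use &&
		    model_entries[i].dma_addr == dma_addr)
			return &model_entries[i];
	return NULL;
}

/* Map path: only successful maps are tracked (assumption of this model). */
void model_map(uint64_t dma_addr)
{
	if (dma_addr == MODEL_MAP_ERROR)
		return;
	for (size_t i = 0; i < MODEL_MAX_ENTRIES; i++) {
		if (!model_entries[i].in_use) {
			model_entries[i].dma_addr = dma_addr;
			model_entries[i].in_use = true;
			model_entries[i].checked = false;
			return;
		}
	}
	assert(0 && "model table full");
}

/* Driver checked the returned address for a mapping error. */
void model_mapping_error(uint64_t dma_addr)
{
	struct model_entry *e = model_find(dma_addr);

	if (e)
		e->checked = true;
}

/*
 * Unmap path: returns 1 if a warning is due (address never checked),
 * 0 if the address was checked, -1 if no entry exists at all.
 */
int model_unmap(uint64_t dma_addr)
{
	struct model_entry *e = model_find(dma_addr);
	int warn;

	if (!e)
		return -1;
	warn = e->checked ? 0 : 1;
	e->in_use = false;
	return warn;
}
```

In this model a good map that is unmapped without a check gets the
warning, but a failed map that the driver goes on to use never has an
entry, so model_unmap() has no flag to look at.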

I am still pursuing a way to track failed map cases. I combined the flag
idea with one of the ideas I am looking into. Details below: (if this
sounds like a reasonable approach, I can do v2 patch and we can discuss
the code)

. Add new fields dma_map_errors, dma_map_errors_not_checked,
dma_unmap_errors, iotlb_overflow_cnt, and flag to struct
dma_debug_entry. Maybe flag is not even needed if
dma_map_errors_not_checked can double as status.

. Enhance dma_debug_init() to create a second table to track failed maps
with PREALLOC_DMA_DEBUG_ENTRIES/64 = 64. 64 devices probably is good
enough.

. Entries are added to this new table when debug_dma_map_page() detects a
mapping error for a device for the first time. Subsequent errors will
increment dma_map_errors and dma_map_errors_not_checked for the device
that is tracked by this entry. Note: paddr field could work as an index
into this table (existing table uses dma_addr)

. Decrement dma_map_errors_not_checked from debug_dma_mapping_error() and
clear the flag.

. check_unmap(), when it detects a mapping error, checks the flag (status)
and prints a warning message.
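
The steps above could look roughly like the following user-space sketch.
It only models the second table and its counters. The err_* names, the
use of a device name string in place of struct device, and the linear
lookup are all assumptions for illustration; the field names and the
table size of 64 come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ERR_TABLE_SIZE 64	/* one slot per device, per the proposal */

struct map_err_entry {
	const char *dev_name;	/* hypothetical stand-in for struct device * */
	unsigned int dma_map_errors;
	unsigned int dma_map_errors_not_checked;
};

static struct map_err_entry err_table[ERR_TABLE_SIZE];
static size_t err_table_used;

struct map_err_entry *err_lookup(const char *dev_name)
{
	assert(dev_name != NULL);
	for (size_t i = 0; i < err_table_used; i++)
		if (strcmp(err_table[i].dev_name, dev_name) == 0)
			return &err_table[i];
	return NULL;
}

/*
 * Map path saw a failed mapping: add the device on its first error,
 * then bump both counters. Returns NULL if the table is full.
 */
struct map_err_entry *err_record_failed_map(const char *dev_name)
{
	struct map_err_entry *e = err_lookup(dev_name);

	if (!e) {
		if (err_table_used == ERR_TABLE_SIZE)
			return NULL;
		e = &err_table[err_table_used++];
		e->dev_name = dev_name;
		e->dma_map_errors = 0;
		e->dma_map_errors_not_checked = 0;
	}
	e->dma_map_errors++;
	e->dma_map_errors_not_checked++;
	return e;
}

/* Driver checked for a mapping error: one fewer unchecked failure. */
void err_mark_checked(const char *dev_name)
{
	struct map_err_entry *e = err_lookup(dev_name);

	if (e && e->dma_map_errors_not_checked > 0)
		e->dma_map_errors_not_checked--;
}

/* Unmap path: nonzero means unchecked failed maps remain, so warn. */
int err_should_warn(const char *dev_name)
{
	struct map_err_entry *e = err_lookup(dev_name);

	return e && e->dma_map_errors_not_checked > 0;
}
```

Here dma_map_errors_not_checked doubles as the status, so no separate
flag is kept; dma_map_errors keeps the running total for the device.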

-- Shuah
