Re: [PATCH] dma-debug: fix debug_dma_assert_idle(), use rcu_read_lock()

From: Linus Torvalds
Date: Thu Aug 13 2020 - 15:03:04 EST

On Wed, Aug 12, 2020 at 8:17 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> Since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
> improved unlock_page(), it has become more noticeable how cow_user_page()
> in a kernel with CONFIG_DMA_API_DEBUG=y can create and suffer from heavy
> contention on DMA debug's radix_lock in debug_dma_assert_idle().


Yeah, that's ridiculously expensive, and serializes things for no good reason.

Your patch looks obviously correct to me (Christoph?), but it also
makes me go "why are we doing this in the first place"?

Because it looks to me like
 (a) the debug check is wrong, and
 (b) this is left over from early debugging.

In particular, I don't see why we couldn't do a COW on a page that is
under writeback at the same time. We're not changing the page that is
doing DMA.
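
To make that concrete, here is a toy user-space sketch of the argument. It is not kernel code: toy_page, toy_mapping and toy_cow_break() are hypothetical names made up for illustration. Breaking COW only reads the old page to fill a fresh copy, and later writes land in the copy, so a device still doing DMA from the old page keeps seeing the same bytes.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a page frame; not a kernel structure. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
};

/* Hypothetical stand-in for a mapping: it just points at one page. */
struct toy_mapping {
	struct toy_page *page;
};

/*
 * Toy COW break: allocate a new page, copy the old contents into it
 * (the old page is only read), and repoint the mapping at the copy.
 * Returns the old page, or NULL if the allocation failed.
 */
static struct toy_page *toy_cow_break(struct toy_mapping *m)
{
	struct toy_page *old = m->page;
	struct toy_page *copy = malloc(sizeof(*copy));

	if (!copy)
		return NULL;
	memcpy(copy->data, old->data, TOY_PAGE_SIZE);
	m->page = copy;
	return old;
}

/*
 * Returns 1 if the original page is byte-for-byte unchanged after a
 * COW break followed by a write through the mapping, 0 otherwise.
 */
static int toy_cow_leaves_original_intact(void)
{
	static unsigned char snapshot[TOY_PAGE_SIZE];
	struct toy_page *orig = malloc(sizeof(*orig));
	struct toy_mapping m;
	int intact = 0;

	if (!orig)
		return 0;

	/* Pretend a device is doing DMA from this page (e.g. writeback). */
	memset(orig->data, 0xAA, TOY_PAGE_SIZE);
	memcpy(snapshot, orig->data, TOY_PAGE_SIZE);

	m.page = orig;
	if (toy_cow_break(&m)) {
		/* The write fault is satisfied by the copy, not the original. */
		m.page->data[0] = 0x55;

		intact = m.page != orig &&
			 m.page->data[0] == 0x55 &&
			 memcmp(orig->data, snapshot, TOY_PAGE_SIZE) == 0;
		free(m.page);
	}
	free(orig);
	return intact;
}
```

Calling toy_cow_leaves_original_intact() from a small main() should return 1. The model deliberately ignores which side of the COW ends up owning the original page, so it says nothing about the "ambiguous COW" case.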

In fact, the whole "COW with DMA" makes me feel like the real bug may
have been due to that whole "ambiguous COW" thing, which was fixed in
17839856fd58 ("gup: document and work around "COW can break either
way" issue").

That debug thing goes back almost seven years, and I don't think it has
caught anything in all that time, but I could be wrong.

The commit that added it does talk about a bug, but that code was
eventually removed entirely. And Google shows no hits for
debug_dma_assert_idle() since - until your email.

So my gut feel is that we should remove the check entirely, although
your patch does seem like a big improvement.


(And Dan too, of course, in case he happens to be relaxing in front of
the computer away from a newborn baby ;)