Re: [PATCH v2] dmaengine: check device and channel list for empty

From: Dave Jiang
Date: Thu Jul 09 2020 - 11:23:59 EST




On 7/8/2020 10:35 PM, Jiri Slaby wrote:
On 07. 07. 20, 17:42, Dave Jiang wrote:
On 7/6/2020 11:05 PM, Jiri Slaby wrote:
On 26. 06. 20, 20:09, Dave Jiang wrote:
Check dma device list and channel list for empty before iterate as the
iteration function assume the list to be not empty. With devices and
channels now being hot pluggable this is a condition that needs to be
checked. Otherwise it can cause the iterator to spin forever.

Could you be a little bit more specific how this can spin forever? I.e.
can you attach a stacktrace of such a behaviour?

I can't seem to find the original splat that lead me to the conclusion
of it's spinning forever. As I recall, the issue seems to produce
different splats and not always consistent in being reproduced. Here's a
partial splat that was tracked by the internal bug database. Since with
the dma device and channel list being are hot added and removed, the
device and channel lists can be empty. The list_entry() and friends
expect the list to not be empty (according to header comment), I added
the check to ensure that isn't the case before using them in dmaengine.

Yes, the comment states that as it is true: you receive a
wild/non-checkable pointer if you do list_entry on an empty list. BUT
have you actually read what I wrote:

As in the empty case, "&pos->member" is "head" (look into
list_for_each_entry) and the for loop should loop exactly zero times.

HERE ^^^^

With the fix, we can no longer produce any of the splats. So maybe the
above was a bad description of the issue.

No, not only the description, worse, the patch proper looks wrong.

[ 4216.048375]Â ? dma_channel_rebalance+0x7b/0x250
[ 4216.056360]Â dma_async_device_register+0x349/0x3a0
[ 4216.064604]Â idxd_register_dma_device+0x90/0xc0 [idxd]
[ 4216.073175]Â idxd_config_bus_probe.cold+0x7d/0x1fc [idxd]

So, the good part in the patch is the fixed locking in
dma_async_device_register. Otherwise it adds nonsense checks. So you
fixed the issue only by a chance, by a side effect as Peter pointed out.
Leaving aside that you broke dma_request_chan -- that could happen to
anybody.

Vinod, please drop/revert this patch. Then start over only with
dma_async_device_register fixed locking.

I'll start on the proper fix.


thanks,