Re: [PATCH v2] dmaengine: check device and channel list for empty

From: Jiri Slaby
Date: Thu Jul 09 2020 - 01:35:45 EST


On 07. 07. 20, 17:42, Dave Jiang wrote:
> On 7/6/2020 11:05 PM, Jiri Slaby wrote:
>> On 26. 06. 20, 20:09, Dave Jiang wrote:
>>> Check dma device list and channel list for empty before iterate as the
>>> iteration function assume the list to be not empty. With devices and
>>> channels now being hot pluggable this is a condition that needs to be
>>> checked. Otherwise it can cause the iterator to spin forever.
>>
>> Could you be a little bit more specific how this can spin forever? I.e.
>> can you attach a stacktrace of such a behaviour?
>
> I can't seem to find the original splat that lead me to the conclusion
> of it's spinning forever. As I recall, the issue seems to produce
> different splats and not always consistent in being reproduced. Here's a
> partial splat that was tracked by the internal bug database. Since with
> the dma device and channel list being are hot added and removed, the
> device and channel lists can be empty. The list_entry() and friends
> expect the list to not be empty (according to header comment), I added
> the check to ensure that isn't the case before using them in dmaengine.

Yes, the comment states that as it is true: you receive a
wild/non-checkable pointer if you do list_entry on an empty list. BUT
have you actually read what I wrote:

>> As in the empty case, "&pos->member" is "head" (look into
>> list_for_each_entry) and the for loop should loop exactly zero times.

HERE ^^^^

> With the fix, we can no longer produce any of the splats. So maybe the
> above was a bad description of the issue.

No, not only the description, worse, the patch proper looks wrong.

> [ 4216.048375]Â ? dma_channel_rebalance+0x7b/0x250
> [ 4216.056360]Â dma_async_device_register+0x349/0x3a0
> [ 4216.064604]Â idxd_register_dma_device+0x90/0xc0 [idxd]
> [ 4216.073175]Â idxd_config_bus_probe.cold+0x7d/0x1fc [idxd]

So, the good part in the patch is the fixed locking in
dma_async_device_register. Otherwise it adds nonsense checks. So you
fixed the issue only by a chance, by a side effect as Peter pointed out.
Leaving aside that you broke dma_request_chan -- that could happen to
anybody.

Vinod, please drop/revert this patch. Then start over only with
dma_async_device_register fixed locking.

thanks,
--
js