Re: 5.10 regression, many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller

From: Hans de Goede
Date: Fri Nov 27 2020 - 06:41:58 EST


Hi,

On 11/24/20 11:27 AM, Christoph Hellwig wrote:
> On Mon, Nov 23, 2020 at 03:49:09PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> +Cc Christoph Hellwig <hch@xxxxxx>
>>
>> Christoph, this is still an issue, so I've been looking around a bit and think this
>> might have something to do with the dma-mapping-5.10 changes.
>>
>> Do you have any suggestions to debug this, or is it time to do a git bisect
>> on this before 5.10 ships with regression?
>
> Given that DMAR prefix this seems to be about using intel-iommu + bounce
> buffering for external devices. I can't really think of anything specific
> in 5.10 related to that, so maybe you'll need to bisect.
>
> I doub this means we are actually leaking swiotlb buffers, so while
> I'm pretty sure we broke something in lower layers this also means
> xhci doesn't handle swiotlb operation very gracefully in general.

I've done a git bisect, and the result is somewhat surprising. The git-bisect
points to:

commit 558033c2828f ("uas: fix sdev->host->dma_dev")

Use scsi_add_host_with_dma() instead of scsi_add_host().

When the scsi request queue is initialized/allocated, hw_max_sectors is clamped
to the dma max mapping size. Therefore, the correct device that should be used
for the clamping needs to be set.

The same clamping is still needed in uas as hw_max_sectors could be changed
there. The original clamping would be invalidated in such cases.

I do have an UAS drive connected to the thunderbolt-dock, so I guess that this
change is causing the UAS driver to gobble all all available swiotlb space.

Regards,

Hans