Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound

From: Yunsheng Lin
Date: Wed Dec 04 2024 - 06:18:05 EST


On 2024/11/28 0:27, Alexander Duyck wrote:

...

>
> My general thought would be to see if there is anything we could
> explore within the DMA API itself to optimize the handling for this
> sort of bulk unmap request. If not we could fall back to an approach
> that requires more overhead and invalidation of individual pages.
>
> You could think of it like the approach that has been taken with
> DEFINED_DMA_UNMAP_ADDR/LEN. Basically there are cases where this can
> be done much more quickly and it is likely we can clean up large
> swaths in one go. So why not expose a function that might be able to
> take advantage of that for exception cases like this surprise device
> removal.

I am not sure I understand the 'surprise device removal' part. The
problem seems to be about calling the DMA API after the driver has
already unbound, which, as I understand it, includes normal driver
unloading too.

For the dma sync API, there is already an existing API to check
whether dma sync is needed for a specific device:
dma_dev_need_sync(). But that API does not seem to be reliable, as it
might return a different value during the lifetime of a driver
instance, see dma_reset_need_sync() called in swiotlb_tbl_map_single().
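
As a rough illustration of that concern, here is a simplified user-space
model, not the kernel code itself. struct model_dev and the model_*()
helpers are made up for this sketch; they only mimic the dma_skip_sync
flag being cleared once a mapping has to bounce, and leave out details
such as the DMA debug config handling:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device; only the one flag is modelled. */
struct model_dev {
	bool dma_skip_sync;
};

/* Models dma_dev_need_sync(): syncs are needed unless the flag says skip. */
static bool model_need_sync(const struct model_dev *dev)
{
	return !dev->dma_skip_sync;
}

/* Models dma_reset_need_sync(): once cleared, the skip hint stays cleared. */
static void model_reset_need_sync(struct model_dev *dev)
{
	if (dev->dma_skip_sync)
		dev->dma_skip_sync = false;
}

/* Models a map call; 'bounced' stands for the swiotlb bounce-buffer path. */
static void model_map_single(struct model_dev *dev, bool bounced)
{
	if (bounced)
		model_reset_need_sync(dev);
}

/*
 * Returns true if an answer cached at setup time no longer matches the
 * answer after a later mapping went through the bounce path.
 */
static bool cached_decision_goes_stale(void)
{
	struct model_dev dev = { .dma_skip_sync = true };
	bool cached = model_need_sync(&dev);	/* "no sync needed" */

	/* A direct mapping leaves the answer unchanged. */
	model_map_single(&dev, false);
	assert(model_need_sync(&dev) == cached);

	/* A single bounced mapping flips it for the rest of the lifetime. */
	model_map_single(&dev, true);
	return model_need_sync(&dev) != cached;
}
```

In this model cached_decision_goes_stale() returns true, which is the
reason a check done once (for example at page_pool creation time) could
not be trusted later when deciding whether the DMA API can be skipped.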

For the dma unmap API, the patch below [1] implements something similar
to check whether the dma unmap API is needed for a specific device. It
seems to be unreliable in the same way as dma_dev_need_sync(), as they
both depend on dev->dma_skip_sync.

Even if there is a reliable way to do the checking, it seems the
complexity might still be needed for the case where the DMA API
cannot be skipped.

As the main concerns seem to be about supporting unlimited inflight
pages and the performance overhead, if there is no better idea that
avoids tracking the inflight pages, perhaps it is better to go back to
tracking the inflight pages, while supporting unlimited inflight
pages and avoiding performance overhead as much as possible.

1. https://lore.kernel.org/linux-pci/b912495d307d92ac7071553db99b3badc477fb12.1731244445.git.leon@xxxxxxxxxx/

>