Re: [PATCH RFC v4 3/3] page_pool: skip dma sync operation for inflight pages

From: Yunsheng Lin
Date: Fri Nov 22 2024 - 02:21:00 EST


On 2024/11/21 21:44, Robin Murphy wrote:
> On 21/11/2024 8:04 am, Yunsheng Lin wrote:
>> On 2024/11/21 0:17, Robin Murphy wrote:
>>> On 20/11/2024 10:34 am, Yunsheng Lin wrote:
>>>> Skip dma sync operation for inflight pages before the
>>>> page_pool_destroy() returns to the driver as DMA API
>>>> expects to be called with a valid device bound to a
>>>> driver as mentioned in [1].
>>>>
>>>> After page_pool_destroy() is called, the page is not
>>>> expected to be recycled back to pool->alloc cache and
>>>> dma sync operation is not needed when the page is not
>>>> recyclable or pool->ring is full, so only skip the dma
>>>> sync operation for the inflight pages by clearing the
>>>> pool->dma_sync under protection of rcu lock when page
>>>> is recycled to pool->ring to ensure that there is no
>>>> dma sync operation called after page_pool_destroy() is
>>>> returned.
>>>
>>> Something feels off here - either this is a micro-optimisation which I wouldn't really expect to be meaningful, or it means patch #2 doesn't actually do what it claims. If it really is possible to attempt to dma_sync a page *after* page_pool_inflight_unmap() has already reclaimed and unmapped it, that represents yet another DMA API lifecycle issue, which as well as being even more obviously incorrect usage-wise, could also still lead to the same crash (if the device is non-coherent).
>>
>> For a page_pool owned page, it mostly goes through the below steps:
>> 1. page_pool calls buddy allocator API to allocate a page, call DMA mapping
>>     and sync_for_device API for it if its pool is empty. Or reuse the page in
>>     pool.
>>
>> 2. Driver calls the page_pool API to allocate the page, and pass the page
>>     to network stack after packet is dma'ed into the page and the sync_for_cpu
>>     API is called.
>>
>> 3. Network stack is done with page and called page_pool API to free the page.
>>
>> 4. page_pool releases the page back to buddy allocator if the page is not
>>     recyclable before doing the dma unmapping. Or do the sync_for_device
>>     and put the page in its pool, the page might go through step 1
>>     again if the driver calls the page_pool allocate API.
>>
>> The calling of dma mapping and dma sync API is controlled by pool->dma_map
>> and pool->dma_sync respectively. The previous patch only clears pool->dma_map
>> after doing the dma unmapping. This patch ensures that there is no dma_sync
>> for the recycle case of step 4 by clearing pool->dma_sync.
>
> But *why* does it want to ensure that? Is there some possible race where one thread can attempt to sync and recycle a page while another thread is attempting to unmap and free it, such that you can't guarantee the correctness of dma_sync calls after page_pool_inflight_unmap() has started, and skipping them is a workaround for that? If so, then frankly I think that would want solving properly, but at the very least this change would need to come before patch #2.

The race window is something like the below. page_pool_destroy() and
page_pool_put_page() can be called concurrently. Patch 2 only uses
a spinlock to synchronise page_pool_inflight_unmap() with
page_pool_return_page() called by page_pool_put_page() to avoid
concurrent dma unmapping; there is no synchronization between
page_pool_destroy() and page_pool_dma_sync_for_device() called
by page_pool_put_page():
CPU0                            CPU1
.                               .
page_pool_destroy()             page_pool_put_page()
.                               .
synchronize_rcu()               .
.                               .
page_pool_inflight_unmap()      .
.                               .
.                               __page_pool_put_page()
.                               .
.                               page_pool_dma_sync_for_device()
.                               .

After this patch, page_pool_dma_sync_for_device() is protected by
the rcu read lock, pool->dma_sync is cleared before synchronize_rcu(),
and page_pool_inflight_unmap() is called after synchronize_rcu(), which
ensures that page_pool_dma_sync_for_device() will not call the dma sync
API after synchronize_rcu() returns:

CPU0                            CPU1
.                               .
page_pool_destroy()             page_pool_put_page()
.                               .
pool->dma_sync = false          .
.                               .
synchronize_rcu()               .
.                               .
page_pool_inflight_unmap()      .
.                               .
.                               page_pool_recycle_in_ring()
.                               .
.                               rcu_read_lock()
.                               page_pool_dma_sync_for_device()
.                               rcu_read_unlock()

Previously patches 2 and 3 were combined as one patch; this version splits
them to make them easier to review.
I am not sure the patch order matters that much, as the fix is not
complete unless both patches are included.