Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound

From: Jesper Dangaard Brouer
Date: Tue Nov 26 2024 - 05:24:20 EST

Next message: Neil Armstrong: "[PATCH 1/3] dt-bindings: PCI: qcom,pcie-sm8550: document 'global' interrupt"
Previous message: Neil Armstrong: "[PATCH 2/3] arm64: dts: qcom: sm8550: Add 'global' interrupt to the PCIe RC nodes"
In reply to: Yunsheng Lin: "Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound"
Next in thread: Yunsheng Lin: "Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 26/11/2024 09.22, Yunsheng Lin wrote:

On 2024/11/25 23:25, Jesper Dangaard Brouer wrote:

...

+
   void page_pool_destroy(struct page_pool *pool)
   {
       if (!pool)
@@ -1139,6 +1206,8 @@ void page_pool_destroy(struct page_pool *pool)
        */
       synchronize_rcu();
   +    page_pool_inflight_unmap(pool);
+

Reaching here means we have detected in-flight packets/pages.

In "page_pool_inflight_unmap" we scan and find those in-flight pages to
DMA unmap them. Then below we wait for these in-flight pages again.
Why don't we just "release" (page_pool_release_page) those in-flight
pages from belonging to the page_pool, when we found them during scanning?

If doing so, we can hopefully remove the periodic checking code below.

I thought about that too, but it means more complicated work than just
calling the page_pool_release_page() as page->pp_ref_count need to be
converted into page->_refcount for the above to work, it seems hard to
do that with least performance degradation as the racing against
page_pool_put_page() being called concurrently.

Maybe we can have a design that avoid/reduce concurrency. Can we
convert the suggested pool->destroy_lock into an atomic?
(Doing an *atomic* READ in page_pool_return_page, should be fast if we
keep this cache in in (cache coherence) Shared state).

In your new/proposed page_pool_return_page() when we see the
"destroy_cnt" (now atomic READ) bigger than zero, then we can do nothing
(or maybe we need decrement page-refcnt?), as we know the destroy code

Is it valid to have a page->_refcount of zero when page_pool still own
the page if we only decrement page->_refcount and not clear page->pp_magic?

No, page_pool keeps page->_refcount equal 1 (for "release") and also clears page->pp_magic.

What happens if put_page() is called from other subsystem for a page_pool
owned page, isn't that mean the page might be returned to buddy page
allocator, causing use-after-free problem?

Notice that page_pool_release_page() didn't decrement page refcnt.
It disconnects a page (from a page_pool). To allow it to be used as
a regular page (that will eventually be returned to the normal
page-allocator via put_page).

will be taking care of "releasing" the pages from the page pool.

If page->_refcount is not decremented in page_pool_return_page(), how
does page_pool_destroy() know if a specific page have been called with
page_pool_return_page()? Does an extra state is needed to indicate that?

Good point. In page_pool_return_page(), we will need to handle the two cases. (1) page still belongs to a page_pool, (2) page have been released to look like a normal page. For (2) we do need to call put_page(). For (1) yes we would either need some extra state, such that page_pool_destroy() knows, or a lock like this patchset.

And there might still be concurrency between checking/handling of the extra
state in page_pool_destroy() and the setting of extra state in
page_pool_return_page(), something like lock might still be needed to avoid
the above concurrency.

I agree, we (likely) cannot avoid this lock.

Once the a page is release from a page pool it becomes a normal page,
that adhere to normal page refcnt'ing. That is how it worked before with
page_pool_release_page().
The later extensions with page fragment support and devmem might have
complicated this code path.

As page_pool_return_page() and page_pool_destroy() both try to "release"
the page concurrently for a specific page, I am not sure how using some
simple *atomic* can avoid this kind of concurrency even before page
fragment and devmem are supported, it would be good to be more specific
about that by using some pseudocode.

Okay, some my simple atomic idea will not work.

NEW IDEA:

So, the my concern in this patchset is that BH-disabling spin_lock pool->destroy_lock is held in the outer loop of page_pool_inflight_unmap() that scans all pages. Disabling BH for this long have nasty side-effects.

Will it be enough to grab the pool->destroy_lock only when we detect a page that belongs to our page pool? Of-cause after obtaining the lock. the code need to recheck if the page still belongs to the pool.

I looked at it more closely, previously page_pool_put_page() seemed to
not be allowed to be called after page_pool_release_page() had been
called for a specific page mainly because of concurrently checking/handlig
and clearing of page->pp_magic if I understand it correctly:
https://elixir.bootlin.com/linux/v5.16.20/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L5316

As I said above
The page_pool_release_page() didn't decrement page refcnt.
It disconnects a page (from a page_pool). To allow it to be used as
a regular page (that will eventually be returned to the normal
page-allocator via put_page).

--Jesper

Next message: Neil Armstrong: "[PATCH 1/3] dt-bindings: PCI: qcom,pcie-sm8550: document 'global' interrupt"
Previous message: Neil Armstrong: "[PATCH 2/3] arm64: dts: qcom: sm8550: Add 'global' interrupt to the PCIe RC nodes"
In reply to: Yunsheng Lin: "Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound"
Next in thread: Yunsheng Lin: "Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]