Re: [PATCH net v2 1/2] page_pool: fix timing for checking and disabling napi_local
From: Yunsheng Lin
Date: Tue Oct 01 2024 - 21:53:18 EST
On 10/1/2024 7:30 PM, Paolo Abeni wrote:
> ...
>> @@ -828,6 +837,9 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>> 		recycle_stat_inc(pool, ring_full);
>> 		page_pool_return_page(pool, netmem);
>> 	}
>> +
>> +	if (!allow_direct_orig)
>> +		rcu_read_unlock();
>
> What about always acquiring the rcu lock? Would that impact performance
> negatively?
>
> If not, I think it's preferable, as it would make the static checker happy.
As mentioned in the cover letter, the overhead is about ~2ns.

I guess it is the 'if' check before rcu_read_unlock() that the static
checker is not happy about; there is also an 'if' check before the
'destroy_lock' introduced in patch 2. Maybe '__cond_acquires' can be
used to make the static checker happy?
>> }
>> EXPORT_SYMBOL(page_pool_put_unrefed_netmem);
> [...]
>> @@ -1121,6 +1140,12 @@ void page_pool_destroy(struct page_pool *pool)
>> 		return;
>>
>> 	page_pool_disable_direct_recycling(pool);
>> +
>> +	/* Wait for the freeing side to see the direct recycling being
>> +	 * disabled, to avoid concurrent access to the pool->alloc cache.
>> +	 */
>> +	synchronize_rcu();
>
> When turning on/off a device with a lot of queues, the above could
> introduce a lot of long waits under the RTNL lock, right?
>
> What about moving the tail of this function into a separate helper and
> using call_rcu() instead?
For this patch, yes, that can be done.

But patch 2 also relies on the RCU lock introduced in this patch to
ensure that the freeing side is synchronized with the destroy path, and
the dma unmapping done for the inflight pages in page_pool_item_uninit()
cannot be deferred to call_rcu(), as the driver might have been unbound
by the time the RCU callback runs, which would defeat the purpose of
patch 2.

Maybe an optimization here is to only call synchronize_rcu() when there
are inflight pages and pool->dma_map is set.