Re: [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver has already unbound

From: Jesper Dangaard Brouer
Date: Fri Jan 17 2025 - 11:56:54 EST




On 17/01/2025 12.56, Yunsheng Lin wrote:
On 2025/1/17 0:09, Jesper Dangaard Brouer wrote:

...

Mainly because there is no space available for keeping tracking of inflight
pages, using page->pp can only find the page_pool owning the page, but page_pool
is not able to keep track of the inflight page when the page is handled by
networking stack.

By using page_pool_item as below, the state is used to tell if a specific
item is being used/dma mapped or not by scanning all the item blocks in
pool->item_blocks. If a specific item is used by a page, then 'pp_netmem'
will point to that page so that dma unmapping can be done for that page
when page_pool_destroy() is called, otherwise free items sit in the
pool->hold_items or pool->release_items by using 'lentry':

struct page_pool_item {
    unsigned long state;
    union {
        netmem_ref pp_netmem;
        struct llist_node lentry;
    };
};

pahole  -C page_pool_item vmlinux
struct page_pool_item {
    /* An 'encoded_next' is a pointer to next item, lower 2 bits is used to
     * indicate the state of current item.
     */
    long unsigned int          encoded_next;     /*     0     8 */
    union {
        netmem_ref         pp_netmem;        /*     8     8 */
        struct llist_node  lentry;           /*     8     8 */
    };                                           /*     8     8 */

    /* size: 16, cachelines: 1, members: 2 */
    /* last cacheline: 16 bytes */
};


When a page is added to the page_pool, a item is deleted from pool->hold_items
or pool->release_items and set the 'pp_netmem' pointing to that page and set
'state' accordingly in order to keep track of that page.

When a page is deleted from the page_pool, it is able to tell which page_pool
this page belong to by using the below function, and after clearing the 'state',
the item is added back to pool->release_items so that the item is reused for new
pages.


To understand below, I'm listing struct page_pool_item_block for other
reviewers:

pahole  -C page_pool_item_block vmlinux
struct page_pool_item_block {
    struct page_pool *         pp;               /*     0     8 */
    struct list_head           list;             /*     8    16 */
    unsigned int               flags;            /*    24     4 */
    refcount_t                 ref;              /*    28     4 */
    struct page_pool_item      items[];          /*    32     0 */

    /* size: 32, cachelines: 1, members: 5 */
    /* last cacheline: 32 bytes */
};

static inline struct page_pool_item_block *
page_pool_item_to_block(struct page_pool_item *item)
{
    return (struct page_pool_item_block *)((unsigned long)item & PAGE_MASK);

This trick requires some comments explaining what is going on!
Please correct me if I'm wrong: Here you a masking off the lower bits of
the pointer to page_pool_item *item, as you know that a struct
page_pool_item_block is stored in the top of a struct page.  This trick
is like a "container_of" for going from page_pool_item to
page_pool_item_block, right?

Yes, you are right.


I do notice that you have a comment above struct page_pool_item_block
(that says "item_block is always PAGE_SIZE"), which is nice, but to be
more explicit/clear:
 I want a big comment block (placed above the main code here) that
explains the design and intention behind this newly invented
"item-block" scheme, like e.g. the connection between
page_pool_item_block and page_pool_item. Like the advantage/trick that
allows page->pp pointer to be an "item" and be mapped back to a "block"
to find the page_pool object it belongs to.  Don't write *what* the code
does, but write about the intended purpose and design reasons behind the
code.

The comment for page_pool_item_block is below, it seems I also wrote about
intended purpose and design reasons here.

/* The size of item_block is always PAGE_SIZE, so that the address of item_block
* for a specific item can be calculated using 'item & PAGE_MASK'
*/

Anyway, If putting something like above for page_pool_item_to_block() does
make it clearer, will add some comment for page_pool_item_to_block() too.



}

  static inline struct page_pool *page_pool_get_pp(struct page *page)
  {
       return page_pool_item_to_block(page->pp_item)->pp;
  }



+static void __page_pool_item_init(struct page_pool *pool, struct page *page)
+{

Function name is confusing.  First I though this was init'ing a single
item, but looking at the code it is iterating over ITEMS_PER_PAGE.

Maybe it should be called page_pool_item_block_init ?

The __page_pool_item_init() is added to make the below
page_pool_item_init() function more readable or maintainable, changing
it to page_pool_item_block_init doesn't seems consistent?

You (of-cause) also have to rename the other function, I though that was
implicitly understood.

BUT does my suggested rename make sense?  What I'm seeing is that all
the *items* in the "block" is getting inited. But we are also setting up
the "block" (e.g.  "block->pp=pool").

I am not really sure about that, as using the PAGE_SIZE block to hold the
item seems like a implementation detail, which might change in the future,
renaming other function to something like that doesn't seem right to me IMHO.

Also the next patch will add page_pool_item_blk_add() to support unlimited
inflight pages, it seems a better name is needed for that too, perheps rename
page_pool_item_blk_add() to page_pool_dynamic_item_add()?


Hmmm... not sure about this.
I think I prefer page_pool_item_blk_add() over page_pool_dynamic_item_add().

For __page_pool_item_init(), perhaps just inline it back to page_pool_item_init()
as __page_pool_item_init() is only used by page_pool_item_init(), and both of them
are not really large function.

I like that you had a helper function. So, don't merge __page_pool_item_init() into page_pool_item_init() just to avoid naming it differently.

Let me be more explicit what I'm asking for:

IMHO you should rename:
- __page_pool_item_init() to __page_pool_item_block_init()
and rename:
- page_pool_item_init() to page_pool_item_block_init()

I hope this make it more clear what I'm saying.

--Jesper