Re: [PATCH net-next] memory-provider: fix compilation issue without SYSFS
From: Mina Almasry
Date: Thu Sep 12 2024 - 14:27:28 EST
On Thu, Sep 12, 2024 at 8:26 AM Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>
> Hi Mina,
>
> Thank you for your reply!
>
> On 12/09/2024 14:49, Mina Almasry wrote:
> > On Thu, Sep 12, 2024 at 3:25 AM Matthieu Baerts (NGI0)
> > <matttbe@xxxxxxxxxx> wrote:
> >>
> >> When CONFIG_SYSFS is not set, the kernel fails to compile:
> >>
> >> net/core/page_pool_user.c:368:45: error: implicit declaration of function 'get_netdev_rx_queue_index' [-Werror=implicit-function-declaration]
> >> 368 | if (pool->slow.queue_idx == get_netdev_rx_queue_index(rxq)) {
> >> | ^~~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> When CONFIG_SYSFS is not set, get_netdev_rx_queue_index() is not defined
> >> either. In this case, page_pool_check_memory_provider() cannot check
> >> the memory provider, and a success answer can be returned instead.
> >>
> >
> > Thanks Matthieu, and sorry about that.
> >
> > I have reproduced the build error and the fix resolves it. But...
> >
> >> Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
> >> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@xxxxxxxxxx>
> >> ---
> >> net/core/page_pool_user.c | 4 ++++
> >> 1 file changed, 4 insertions(+)
> >>
> >> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> >> index 48335766c1bf..a98c0a76b33f 100644
> >> --- a/net/core/page_pool_user.c
> >> +++ b/net/core/page_pool_user.c
> >> @@ -353,6 +353,7 @@ void page_pool_unlist(struct page_pool *pool)
> >> int page_pool_check_memory_provider(struct net_device *dev,
> >> struct netdev_rx_queue *rxq)
> >> {
> >> +#ifdef CONFIG_SYSFS
> >> struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
> >> struct page_pool *pool;
> >> struct hlist_node *n;
> >> @@ -372,6 +373,9 @@ int page_pool_check_memory_provider(struct net_device *dev,
> >> }
> >> mutex_unlock(&page_pools_lock);
> >> return -ENODATA;
> >> +#else
> >> + return 0;
> >
> > ...we can't assume success when we cannot check the memory provider.
> > The memory provider check is somewhat critical; we rely on it to
> > detect that the driver does not support memory providers or is not
> > doing the right thing, and report that to the user. I don't think we
> > can silently disable the check when the CONFIG_SYSFS is disabled.
> > Please return -ENODATA or some other error here.
>
> I initially returned 0 to have the same behaviour as when
> CONFIG_PAGE_POOL is not defined. But thanks to your explanations, I
> understand it seems better to return -ENODATA here. Or another errno, to
> let userspace understand that this is a different error? I can send a
> v2 after the 24h rate-limit period if you are OK with that.
>
Yes, -EOPNOTSUPP would be my preference here. I think it makes sense:
we should not support memory providers on configs where core can't
verify that the driver did the right thing.
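
To illustrate the shape I have in mind, here is a minimal userspace
sketch, not kernel code. SKETCH_HAVE_SYSFS is a hypothetical stand-in
for CONFIG_SYSFS, and the struct and function names are made up for
the example; only the error codes match what is discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_SYSFS. Leave it undefined to model
 * a build without SYSFS; define it to model a build with SYSFS.
 */
/* #define SKETCH_HAVE_SYSFS 1 */

/* Illustrative stub, not the kernel's struct netdev_rx_queue. */
struct sketch_rx_queue {
	int pool_uses_provider;	/* models "a matching page pool was found" */
};

static int sketch_check_memory_provider(const struct sketch_rx_queue *rxq)
{
	assert(rxq != NULL);
#ifdef SKETCH_HAVE_SYSFS
	/* The check can run: succeed only if a matching pool exists. */
	return rxq->pool_uses_provider ? 0 : -ENODATA;
#else
	/* The check cannot run, so refuse instead of assuming success. */
	(void)rxq;
	return -EOPNOTSUPP;
#endif
}
```

With the macro left undefined, the function always reports
-EOPNOTSUPP, so a caller can tell "not supported on this config"
apart from "the driver did not set up the provider" (-ENODATA).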
[...]
> > However, I'm looking at the definition of get_netdev_rx_queue_index()
> > and at first glance I don't see anything there that is actually
> > dependent on CONFIG_SYSFS. Can we do this instead? I have build-tested
> > it and it resolves the build issue as well:
> >
> > ```
> > diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> > index ac34f5fb4f71..596836abf7bf 100644
> > --- a/include/net/netdev_rx_queue.h
> > +++ b/include/net/netdev_rx_queue.h
> > @@ -45,7 +45,6 @@ __netif_get_rx_queue(struct net_device *dev, unsigned int rxq)
> > return dev->_rx + rxq;
> > }
> >
> > -#ifdef CONFIG_SYSFS
> > static inline unsigned int
> > get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
> > {
> > @@ -55,7 +54,6 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
> > BUG_ON(index >= dev->num_rx_queues);
> > return index;
> > }
> > -#endif
> > ```
>
> I briefly looked at taking this path when I saw what this helper was
> doing, but then I saw all operations related to the received queues were
> enabled only when CONFIG_SYSFS is set, see commit a953be53ce40
> ("net-sysfs: add support for device-specific rx queue sysfs
> attributes"). I understood from that it is better not to look at
> dev->_rx or dev->num_rx_queues when CONFIG_SYSFS is not set. I'm not
> very familiar with that part of the code, but it feels like removing this
> #ifdef might be similar to the "return 0" I suggested: silently
> disabling the check, no?
>
> I *think* it might be clearer to return an error when SYSFS is not set.
>
FWIW it looks like commit e817f85652c1 ("xdp: generic XDP handling of
xdp_rxq_info") reverted almost all the CONFIG_SYSFS checks introduced by
commit a953be53ce40 ("net-sysfs: add support for device-specific rx
queue sysfs attributes"), at least from a quick look.

But I understand your CI is probably very annoyed by the build
failure. I would be happy to give my Reviewed-by to a patch with just
the change to the error return value, and I can look into making this
work with CONFIG_SYSFS disabled after the merge window.
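
For context on why the helper does not look SYSFS-dependent to me:
from the lines visible in the diff above, it appears to only recover
the queue's position in dev->_rx and bounds-check it against
dev->num_rx_queues. A userspace sketch of that idea follows. The
sketch_* names and struct layouts are hypothetical, assert() stands in
for BUG_ON(), and the pointer subtraction is my reading of the helper
rather than a copy of it.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_dev;

/* Illustrative stubs, not the kernel structures. */
struct sketch_rx_queue {
	struct sketch_dev *dev;		/* back-pointer to the owning device */
};

struct sketch_dev {
	struct sketch_rx_queue *rx;	/* models dev->_rx */
	unsigned int num_rx_queues;	/* models dev->num_rx_queues */
};

/* Index to queue pointer, like __netif_get_rx_queue(). */
static struct sketch_rx_queue *
sketch_get_rx_queue(struct sketch_dev *dev, unsigned int rxq)
{
	return dev->rx + rxq;
}

/* Queue pointer back to index: offset of the queue within the array. */
static unsigned int
sketch_rx_queue_index(const struct sketch_rx_queue *queue)
{
	const struct sketch_dev *dev = queue->dev;
	ptrdiff_t index = queue - dev->rx;

	/* Stand-in for BUG_ON(index >= dev->num_rx_queues). */
	assert(index >= 0 && (size_t)index < dev->num_rx_queues);
	return (unsigned int)index;
}
```

Nothing in this round trip touches sysfs state, which is why dropping
the #ifdef built fine for me. Whether dev->_rx is always populated
without CONFIG_SYSFS is the part I still need to verify.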
--
Thanks,
Mina