Re: Re: [PATCH] xsk: Fix race condition in AF_XDP generic RX path

From: e.kubanski
Date: Wed Apr 09 2025 - 10:20:58 EST

Next message: Huacai Chen: "Re: [PATCH 1/1] LoongArch: Introduce the numa_memblks conversion"
Previous message: Deepak Gupta: "Re: [PATCH v12 10/28] riscv/mm: Implement map_shadow_stack() syscall"
In reply to: Magnus Karlsson: "Re: [PATCH] xsk: Fix race condition in AF_XDP generic RX path"
Next in thread: Magnus Karlsson: "Re: Re: [PATCH] xsk: Fix race condition in AF_XDP generic RX path"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> I do not fully understand what you are doing in user space. Could you
> please provide a user-space code example that will trigger this
> problem?

We want to scale a single-hardware-queue AF_XDP setup so that
packets are received on multiple threads through RPS.
The problem arises when RPS is enabled in the kernel.
In that case, traffic from a single hardware queue can be
processed on multiple CPU cores. We then do XDP/eBPF load-balancing
across multiple sockets, keyed on the ID of the CPU that runs the
XDP program.

Every socket is bound to queue number 0; the device has a single queue.

The user-space socket setup looks more or less like this (with libxdp):
```
// FILL and COMP rings, shared by every socket on this UMEM.
xsk_ring_prod fq{};
xsk_ring_cons cq{};

xsk_umem_config umem_cfg{ ... };
xsk_umem* umem;
auto umem_result = xsk_umem__create(&umem, umem_memory, pool_size_bytes, &fq, &cq, &umem_cfg);

...

xsk_socket_config xsk_cfg{
    ...
    .xdp_flags = XDP_FLAGS_SKB_MODE,  // generic (SKB) mode
    ...
};

// First socket: queue 0, private RX/TX rings, shared UMEM and FILL/COMP.
xsk_socket* sock1{nullptr};
xsk_ring_cons rq1{};
xsk_ring_prod tq1{};
auto sock1_result = xsk_socket__create_shared(
    &sock1,
    device_name,
    0,
    umem,
    &rq1,
    &tq1,
    &fq,
    &cq,
    &xsk_cfg
);

// Second socket: same queue 0, same UMEM and FILL/COMP rings.
xsk_socket* sock2{nullptr};
xsk_ring_cons rq2{};
xsk_ring_prod tq2{};
auto sock2_result = xsk_socket__create_shared(
    &sock2,
    device_name,
    0,
    umem,
    &rq2,
    &tq2,
    &fq,
    &cq,
    &xsk_cfg
);

...
```

We're working on cloud-native deployments, where
it's not possible to scale RX through RSS alone.

That's why we want to use RPS to scale not only
user-space processing but also XDP processing.

This patch effectively allows us to use RPS to scale XDP
in generic mode.

The same applies with RPS disabled, where we use a MACVLAN
child device attached to a parent device with multiple queues.
In this situation MACVLAN allows multi-core kernel-side
processing, but xsk_buff_pool isn't protected.

We can't do any passthrough in this situation; we must rely
on MACVLAN with a single RX/TX queue pair.

Of course, this is not a problem when every packet from the
device is processed on a single core.

> Please note that if you share an Rx ring or the fill ring between
> processes/threads, then you have to take care about mutual exclusion
> in user space.

Of course. RX/TX/FILL/COMP are SPSC queues, so we added mutual
exclusion for FILL/COMP only; each RX/TX pair is accessed by a single
thread. I'm doing a single-process deployment with multiple threads,
where every thread has its own AF_XDP socket and the pool is shared
across threads.
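
To make the user-space side concrete, here is a minimal, self-contained
sketch of the locking pattern described above. It is only an
illustration, not our actual code: MockFillRing, SharedFillRing and
run_workers are hypothetical names, and the mock ring is a plain vector
standing in for the shared xsk_ring_prod, so no libxdp calls are made.
```
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the shared FILL ring. NOT a libxdp type: a real version
// would wrap xsk_ring_prod and the libxdp producer helpers instead.
struct MockFillRing {
    std::vector<std::uint64_t> slots;  // frame addresses handed back
};

// Hypothetical wrapper: one lock serializes all user-space producers, so
// the ring only ever sees one producer at a time (the SPSC contract).
class SharedFillRing {
public:
    // Return a batch of frame addresses to the FILL ring.
    void refill(const std::vector<std::uint64_t>& frame_addrs) {
        std::lock_guard<std::mutex> guard(lock_);
        // With libxdp, the reserve/fill/submit sequence would go here.
        ring_.slots.insert(ring_.slots.end(),
                           frame_addrs.begin(), frame_addrs.end());
    }

    // Number of frame addresses currently queued.
    std::size_t size() {
        std::lock_guard<std::mutex> guard(lock_);
        return ring_.slots.size();
    }

private:
    std::mutex lock_;
    MockFillRing ring_;
};

// Each worker models one thread that owns its own socket (private RX/TX
// rings, no lock needed) but returns frames to the shared FILL ring.
std::size_t run_workers(unsigned n_threads, unsigned frames_per_thread) {
    SharedFillRing fill;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&fill, t, frames_per_thread] {
            for (unsigned i = 0; i < frames_per_thread; ++i) {
                // Fake frame address, unique per thread and iteration.
                std::uint64_t addr =
                    (static_cast<std::uint64_t>(t) * frames_per_thread + i) * 4096;
                fill.refill({addr});
            }
        });
    }
    for (auto& w : workers) w.join();
    return fill.size();
}
```
Calling run_workers(4, 1000) runs four such threads against one shared
ring. The lock only covers the user-space producers; it says nothing
about what the kernel does on its side of the pool.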

> If you really want to do this, it is usually a better
> idea to use the other shared umem mode in which each process gets its
> own rx and fill ring, removing the need for mutual exclusion.

If I understand the AF_XDP architecture correctly, that's not possible
for a single-queue deployment, or maybe I'm missing something? We need
to maintain a single FILL/COMP pair per device queue.
