Re: [PATCH] net: tuntap: add ioctl() TUNGETQUEUEINDX to fetch queue index

From: Randy Li
Date: Wed Aug 07 2024 - 14:54:31 EST


Hello Willem,

On 2024/8/2 23:10, Willem de Bruijn wrote:
Randy Li wrote:
On 2024/8/1 22:17, Willem de Bruijn wrote:
Randy Li wrote:
On 2024/8/1 21:04, Willem de Bruijn wrote:
Randy Li wrote:
On 2024/8/1 05:57, Willem de Bruijn wrote:
nits:

- INDX->INDEX. It's correct in the code
- prefix networking patches with the target tree: PATCH net-next
I see.
Randy Li wrote:
On 2024/7/31 22:12, Willem de Bruijn wrote:
Randy Li wrote:
We need the queue index in the qdisc mapping rule. There is currently
no way to fetch it.
In which command exactly?
That is for sch_multiq; here is an example:

tc qdisc add dev tun0 root handle 1: multiq

tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst 172.16.10.1 action skbedit queue_mapping 0
tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst 172.16.10.20 action skbedit queue_mapping 1
tc filter add dev tun0 parent 1: protocol ip prio 1 u32 match ip dst 172.16.10.10 action skbedit queue_mapping 2
If using an IFF_MULTI_QUEUE tun device, packets are automatically
load balanced across the multiple queues, in tun_select_queue.

If you want more explicit queue selection than by rxhash, tun
supports TUNSETSTEERINGEBPF.
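
A minimal sketch of such a steering program (illustrative only: it
assumes an IFF_TUN device, so the packet starts at the IPv4 header,
and the dst_to_queue map name and layout are made up for the example;
they are not part of any existing API):

/* steer.bpf.c - return value is used as the queue index */
#include <stddef.h>
#include <linux/bpf.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 256);
        __type(key, __u32);     /* IPv4 daddr, network byte order */
        __type(value, __u32);   /* queue index */
} dst_to_queue SEC(".maps");

SEC("socket")
int tun_steer(struct __sk_buff *skb)
{
        __u32 daddr;

        /* IFF_TUN: no Ethernet header, IP header is at offset 0 */
        if (bpf_skb_load_bytes(skb, offsetof(struct iphdr, daddr),
                               &daddr, sizeof(daddr)))
                return 0;

        __u32 *q = bpf_map_lookup_elem(&dst_to_queue, &daddr);
        return q ? *q : 0;      /* tun reduces this mod numqueues */
}

char _license[] SEC("license") = "GPL";

Userspace then updates dst_to_queue at runtime as peers come and go,
instead of inserting one tc filter per peer.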
I know about this eBPF feature, but I am a newbie to eBPF and have not
figured out how to configure it dynamically.
Lack of experience with an existing interface is insufficient reason
to introduce another interface, of course.
tc(8) is an old interface, but it does not have sufficient information
here to complete its work.
tc is maintained.

I think eBPF doesn't work on all platforms? JIT doesn't sound like a
good solution for embedded platforms.

Another problem is that some VPS providers don't offer a kernel new
enough to support eBPF; it is far easier to just patch an old kernel
with this.
We don't add duplicative features because they are easier to
cherry-pick to old kernels.
I was trying to say that the tc(8) or netlink solution sounds more
suitable for general deployment. Anyway, I will look into it, though I
will still send out the v2 of this patch, and I will figure out
whether eBPF can solve the whole problem here.
Most importantly, why do you need a fixed mapping of IP address to
queue? Can you explain why relying on the standard rx_hash based
mapping is not sufficient for your workload?
Server
  |
  |------ tun subnet (e.g. 172.16.10.0/24) ------- peer A (172.16.10.1)
                                          |------- peer B (172.16.10.3)
                                          |------- peer C (172.16.10.20)

I am not even sure rx_hash could work here: the server acts as a
router or gateway, and I don't know how to filter connections from the
external interface based on rx_hash. Besides, the VPN application
doesn't operate on the sockets itself.

I think the question here is really why I do the filtering in the
kernel rather than in userspace.

It is much easier to do the dispatch work in the kernel; I only need
to watch the established peers with the help of epoll(), and the
kernel can drop all the unwanted packets. Besides, if I did the
filter/dispatch work in userspace, every packet would have to be
copied to userspace first, even when its fate can be decided by
reading a few bytes at a known offset. I think we can avoid that cost.
A custom mapping function is exactly the purpose of TUNSETSTEERINGEBPF.

Please take a look at that. It's a lot more elegant than going through
userspace and then inserting individual tc skbedit filters.
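
Attaching such a program is a single ioctl on the tun fd. A sketch,
assuming the program has already been loaded (e.g. via libbpf) and a
kernel that defines TUNSETSTEERINGEBPF:

#include <sys/ioctl.h>
#include <linux/if_tun.h>

static int attach_steering(int tun_fd, int prog_fd)
{
        /* TUNSETSTEERINGEBPF takes a pointer to the program fd */
        return ioctl(tun_fd, TUNSETSTEERINGEBPF, &prog_fd);
}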

I checked how this socket filter works; I think we still need this series of patches.

If I am right, this eBPF program doesn't work like a regular socket filter: its return value is the target queue index, not the number of bytes to keep from the sk_buff.

Besides, according to https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_SOCKET_FILTER/, I think the eBPF program here can modify neither the queue_mapping field nor the hash field.

See SKF_AD_QUEUE for classic BPF and __sk_buff queue_mapping for eBPF.
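
For illustration (a standalone sketch, not from the patch): classic
BPF reads the queue index with an ancillary load,

        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_QUEUE)

while in eBPF, queue_mapping is a readable field of struct __sk_buff:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("socket")
int read_queue(struct __sk_buff *skb)
{
        return skb->queue_mapping;
}

char _license[] SEC("license") = "GPL";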

Is that a map of type BPF_MAP_TYPE_QUEUE?

Besides, I think the eBPF program set via TUNSETSTEERINGEBPF would NOT use queue_mapping.

If I want to drop packets for unwanted destinations, I think TUNSETFILTEREBPF is what I need?

That would mean looking up the same mapping table twice; is there a better way, for the sake of the CPU cache?
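
For concreteness, a sketch of what I have in mind for the filter side
(illustrative only; it reuses the made-up dst_to_queue layout from the
steering sketch earlier in the thread, and assumes I read tun.c's
run_ebpf_filter() correctly, i.e. the return value is the number of
bytes to keep and 0 drops the packet):

#include <stddef.h>
#include <linux/bpf.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 256);
        __type(key, __u32);     /* IPv4 daddr, network byte order */
        __type(value, __u32);   /* queue index */
} dst_to_queue SEC(".maps");

SEC("socket")
int tun_filter(struct __sk_buff *skb)
{
        __u32 daddr;

        if (bpf_skb_load_bytes(skb, offsetof(struct iphdr, daddr),
                               &daddr, sizeof(daddr)))
                return 0;               /* drop what we cannot parse */

        if (!bpf_map_lookup_elem(&dst_to_queue, &daddr))
                return 0;               /* drop unknown destinations */

        return skb->len;                /* keep the whole packet */
}

char _license[] SEC("license") = "GPL";

With libbpf map pinning (e.g. LIBBPF_PIN_BY_NAME) the steering and
filter programs could share one table, so the second lookup would at
least hit the same cache-hot map, though it would not remove the
lookup itself.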