Re: [PATCH] net: tuntap: add ioctl() TUNGETQUEUEINDX to fetch queue index

From: Willem de Bruijn
Date: Wed Aug 07 2024 - 22:11:47 EST


> >> I think this question is about why I do the filter in the kernel not the
> >> userspace?
> >>
> >> It would be much more easy to the dispatch work in kernel, I only need
> >> to watch the established peer with the help of epoll(). Kernel could
> >> drop all the unwanted packets. Besides, if I do the filter/dispatcher
> >> work in the userspace, it would need to copy the packet's data to the
> >> userspace first, even decide its fate by reading a few bytes from its
> >> beginning offset. I think we can avoid such a cost.
> > A custom mapping function is exactly the purpose of TUNSETSTEERINGEBPF.
> >
> > Please take a look at that. It's a lot more elegant than going through
> > userspace and then inserting individual tc skbedit filters.
>
> I checked how this socket filter works, I think we still need this
> serial of patch.
>
> If I was right, this eBPF doesn't work like a regular socket filter. The
> eBPF's return value here means the target queue index not the size of
> the data that we want to keep from the sk_buf parameter's buf.

TUNSETSTEERINGEBPF is a queue selection mechanism for multi-queue tun devices.

It replaces the need to set skb->queue_mapping.

See also

"
commit 96f84061620c6325a2ca9a9a05b410e6461d03c3
Author: Jason Wang <jasowang@xxxxxxxxxx>
Date: Mon Dec 4 17:31:23 2017 +0800

tun: add eBPF based queue selection method
"

> Besides, according to
> https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_SOCKET_FILTER/
>
> I think the eBPF here can modify neither queue_mapping field nor hash
> field here.
>
> > See SKF_AD_QUEUE for classic BPF and __sk_buff queue_mapping for eBPF.
>
> Is it a map type BPF_MAP_TYPE_QUEUE?
>
> Besides, I think the eBPF in TUNSETSTEERINGEBPF would NOT take
> queue_mapping.

It obviates the need.

It sounds like you want to both filter and steer to a specific queue. Why?

In that case, a tc egress tc_bpf program may be able to do both.
Again, by writing to __sk_buff queue_mapping. Instead of u32 +
skbedit.

See also

"
commit 74e31ca850c1cddeca03503171dd145b6ce293b6
Author: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
Date: Tue Feb 19 19:53:02 2019 +0100

bpf: add skb->queue_mapping write access from tc clsact
"

But I suppose you could prefer u32 + skbedit.

Either way, the pertinent point is that you want to map some flow
match to a specific queue id.

This is straightforward if all queues are opened and none are closed.
But it is not if queues can get detached and attached dynamically.
Which I guess you encounter in practice?

I'm actually not sure how the current `tfile->queue_index =
tun->numqueues;` works in that case. As __tun_detach will do decrement
`--tun->numqueues;`. So multiple tfiles could end up with the same
queue_index. Unless dynamic detach + attach is not possible. But it
seems it is. Jason, if you're following, do you know this?

> If I want to drop packets for unwanted destination, I think
> TUNSETFILTEREBPF is what I need?
>
> That would lead to lookup the same mapping table twice, is there a
> better way for the CPU cache?

I agree that two programs should not be needed.