Re: [PATCH] net: tuntap: add ioctl() TUNGETQUEUEINDX to fetch queue index
From: Jason Wang
Date: Wed Aug 07 2024 - 22:50:18 EST
On Thu, Aug 8, 2024 at 10:11 AM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> > >> I think this question is about why I do the filter in the kernel not the
> > >> userspace?
> > >>
> > >> It would be much more easy to the dispatch work in kernel, I only need
> > >> to watch the established peer with the help of epoll(). Kernel could
> > >> drop all the unwanted packets. Besides, if I do the filter/dispatcher
> > >> work in the userspace, it would need to copy the packet's data to the
> > >> userspace first, even decide its fate by reading a few bytes from its
> > >> beginning offset. I think we can avoid such a cost.
> > > A custom mapping function is exactly the purpose of TUNSETSTEERINGEBPF.
> > >
> > > Please take a look at that. It's a lot more elegant than going through
> > > userspace and then inserting individual tc skbedit filters.
> >
> > I checked how this socket filter works, I think we still need this
> > serial of patch.
> >
> > If I was right, this eBPF doesn't work like a regular socket filter. The
> > eBPF's return value here means the target queue index not the size of
> > the data that we want to keep from the sk_buf parameter's buf.
>
> TUNSETSTEERINGEBPF is a queue selection mechanism for multi-queue tun devices.
>
> It replaces the need to set skb->queue_mapping.
>
> See also
>
> "
> commit 96f84061620c6325a2ca9a9a05b410e6461d03c3
> Author: Jason Wang <jasowang@xxxxxxxxxx>
> Date: Mon Dec 4 17:31:23 2017 +0800
>
> tun: add eBPF based queue selection method
> "
Exactly.
>
> > Besides, according to
> > https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_SOCKET_FILTER/
> >
> > I think the eBPF here can modify neither queue_mapping field nor hash
> > field here.
> >
> > > See SKF_AD_QUEUE for classic BPF and __sk_buff queue_mapping for eBPF.
> >
> > Is it a map type BPF_MAP_TYPE_QUEUE?
> >
> > Besides, I think the eBPF in TUNSETSTEERINGEBPF would NOT take
> > queue_mapping.
>
> It obviates the need.
>
> It sounds like you want to both filter and steer to a specific queue. Why?
>
> In that case, a tc egress tc_bpf program may be able to do both.
> Again, by writing to __sk_buff queue_mapping. Instead of u32 +
> skbedit.
>
> See also
>
> "
> commit 74e31ca850c1cddeca03503171dd145b6ce293b6
> Author: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> Date: Tue Feb 19 19:53:02 2019 +0100
>
> bpf: add skb->queue_mapping write access from tc clsact
> "
>
> But I suppose you could prefer u32 + skbedit.
>
> Either way, the pertinent point is that you want to map some flow
> match to a specific queue id.
>
> This is straightforward if all queues are opened and none are closed.
> But it is not if queues can get detached and attached dynamically.
> Which I guess you encounter in practice?
>
> I'm actually not sure how the current `tfile->queue_index =
> tun->numqueues;` works in that case. As __tun_detach will do decrement
> `--tun->numqueues;`. So multiple tfiles could end up with the same
> queue_index. Unless dynamic detach + attach is not possible.
It is expected to work, otherwise there should be a bug.
> But it
> seems it is. Jason, if you're following, do you know this?
__tun_detach() will move the last tfile in the tfiles[] array to the
current tfile->queue_index, and modify its queue_index:
rcu_assign_pointer(tun->tfiles[index],
tun->tfiles[tun->numqueues - 1]);
ntfile = rtnl_dereference(tun->tfiles[index]);
ntfile->queue_index = index;
rcu_assign_pointer(tun->tfiles[tun->numqueues - 1],
NULL);
--tun->numqueues;
tun_attach() will move the detached tfile to the end of the tfiles[]
array and enable it:
tfile->queue_index = tun->numqueues;
....
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
tun->numqueues++;
>
> > If I want to drop packets for unwanted destination, I think
> > TUNSETFILTEREBPF is what I need?
> >
> > That would lead to lookup the same mapping table twice, is there a
> > better way for the CPU cache?
>
> I agree that two programs should not be needed.
>
Thanks