Re: [PATCH] net: tuntap: add ioctl() TUNGETQUEUEINDX to fetch queue index
From: Jason Wang
Date: Wed Aug 07 2024 - 23:37:19 EST
On Thu, Aug 8, 2024 at 11:12 AM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> > > In that case, a tc egress tc_bpf program may be able to do both.
> > > Again, by writing to __sk_buff queue_mapping. Instead of u32 +
> > > skbedit.
> > >
> > > See also
> > >
> > > "
> > > commit 74e31ca850c1cddeca03503171dd145b6ce293b6
> > > Author: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> > > Date: Tue Feb 19 19:53:02 2019 +0100
> > >
> > > bpf: add skb->queue_mapping write access from tc clsact
> > > "
> > >
> > > But I suppose you could prefer u32 + skbedit.
> > >
> > > Either way, the pertinent point is that you want to map some flow
> > > match to a specific queue id.
> > >
> > > This is straightforward if all queues are opened and none are closed.
> > > But it is not if queues can get detached and attached dynamically.
> > > Which I guess you encounter in practice?
> > >
> > > I'm actually not sure how the current `tfile->queue_index =
> > > tun->numqueues;` works in that case. As __tun_detach will do decrement
> > > `--tun->numqueues;`. So multiple tfiles could end up with the same
> > > queue_index. Unless dynamic detach + attach is not possible.
> >
> > It is expected to work, otherwise there should be a bug.
> >
> > > But it
> > > seems it is. Jason, if you're following, do you know this?
> >
> > __tun_detach() will move the last tfile in the tfiles[] array to the
> > current tfile->queue_index, and modify its queue_index:
> >
> > rcu_assign_pointer(tun->tfiles[index],
> > tun->tfiles[tun->numqueues - 1]);
> > ntfile = rtnl_dereference(tun->tfiles[index]);
> > ntfile->queue_index = index;
> > rcu_assign_pointer(tun->tfiles[tun->numqueues - 1],
> > NULL);
> >
> > --tun->numqueues;
> >
> > tun_attach() will move the detached tfile to the end of the tfiles[]
> > array and enable it:
> >
> >
> > tfile->queue_index = tun->numqueues;
> > ....
> > rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> > tun->numqueues++;
> >
>
> Ah right. Thanks. I had forgotten about that.
>
> So I guess an application that owns all the queues could keep track of
> the queue-id to FD mapping. But it is not trivial, nor defined ABI
> behavior.
Yes and this is because tun allows arbitrary tfile to be detached
while the networking core assumes the queues are continuously. If
application want a stable queue id it needs to "understand" the kernel
behaviour somehow (unfortunately), see below.
>
> Querying the queue_id as in the proposed patch might not solve the
> challenge, though. Since an FD's queue-id may change simply because
> another queue was detached. So this would have to be queried on each
> detach.
Let's say we have 4 tunfd.
tunfd 0 -> queue 0
...
tunfd 3 -> queue 3
To have a stable queue id it needs to detach from the last tunfd and
attach in the reverse order.
E.g if it want to detach 2 tfiles, it needs to
detach tunfd3, detach tunfd2
And it if want to attach 2 tfiles back it needs to
attach tunfd2, attach tunfd3
This is how Qemu works in the case of multiqueue tuntap and RSS. This
allows the qemu doesn't need to deduce if the mapping between
queue_index and tfile changes. But it means if kernel behavior changes
in the future, it may break a lot of usersapce.
>
> I suppose one underlying question is how important is the mapping of
> flows to specific queue-id's?
For example to implement steering offload to kernel. E.g RSS,
userspace needs to know the mapping between queue id and tfile.
> Is it a problem if the destination queue
> for a flow changes mid-stream?
>
I think this is a good question. It's about whether or not we can make
the current kernel behaviour as part of ABI.
If yes, there's no need for the explicit getindex assuming the tunfd
is created by the application itself , as the queue_index could be
deduced from the behaviour of the application itself. But it might not
be all, if tuntap fd were passed through unix domain socket, the
application that receives those file descriptors might need to know
the queue_index mappings.
If no, allow such query might be useful. It might give a chance for
the user to know the mapping as you pointed out. But this patch needs
some tweak as when the tfile is detached, the queue_index is
meaningless in that case.
Thanks