Re: [PATCH net-next V2 3/3] tun: add eBPF based queue selection method

From: Michael S. Tsirkin
Date: Wed Nov 08 2017 - 00:44:00 EST


On Wed, Nov 08, 2017 at 02:28:53PM +0900, Jason Wang wrote:
>
>
> On 2017-11-04 08:56, Willem de Bruijn wrote:
> > On Fri, Nov 3, 2017 at 5:56 PM, Willem de Bruijn
> > <willemdebruijn.kernel@xxxxxxxxx> wrote:
> > > On Tue, Oct 31, 2017 at 7:32 PM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > This patch introduces an eBPF based queue selection method based on
> > > > the flow steering policy ops. Userspace could load an eBPF program
> > > > through TUNSETSTEERINGEBPF. This gives much more flexibility compared
> > > > to a simple but hard coded policy in the kernel.
> > > >
> > > > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> > > > ---
> > > > +static int tun_set_steering_ebpf(struct tun_struct *tun, void __user *data)
> > > > +{
> > > > +	struct bpf_prog *prog;
> > > > +	u32 fd;
> > > > +
> > > > +	if (copy_from_user(&fd, data, sizeof(fd)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > > If the idea is to allow guests to pass BPF programs down to the host,
> > > you may want to define a new program type that is more restrictive than
> > > socket filter.
> > >
> > > The external functions allowed for socket filters (sk_filter_func_proto)
> > > are relatively few (compared to, say, clsact), but may still leak host
> > > information to a guest. More importantly, guest security considerations
> > > > limit how we can extend socket filters later.
> > > Unless the idea is for the hypervisor to prepare the BPF based on a
> > limited set of well defined modes that the guest can configure. Then
> > socket filters are fine, as the BPF is prepared by a regular host process.
>
> Yes, I think the idea is to let qemu build a BPF program for now.
>
> Passing an eBPF program from guest to host is interesting, but an obvious
> issue is how to deal with map access.
>
> Thanks

Fundamentally, I suspect the way to solve it is to allow
the program to specify "should be offloaded to host".

And then it would access the host map rather than the guest map.

Then add some control path API for guest to poke at the host map.

It's not that there's anything special about the host map:
it's just separate from the guest's. So if we wanted
something that also works on bare metal, we could add
something like a namespace and put all host maps there.
But I'm not sure it's worth the complexity.
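
To make the idea concrete, here is a rough userspace model of it,
not kernel code and not a proposed ABI. Every name in it (map_ns,
steer_prog, offload_to_host, guest_set_host_map, and so on) is made
up for illustration, and the bucket and queue counts are arbitrary.
It only shows the shape: a per-program "offloaded" hint picks the
map namespace, and a separate control-path call lets the guest
update the host-side map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */
enum map_ns { MAP_NS_GUEST, MAP_NS_HOST, MAP_NS_MAX };

#define NBUCKETS 4	/* flow-hash buckets per map (arbitrary) */
#define NQUEUES  2	/* tun queues (arbitrary) */

struct steer_map {
	uint32_t queue_of_bucket[NBUCKETS];	/* bucket -> queue index */
};

struct steer_prog {
	int offload_to_host;	/* the "should be offloaded to host" hint */
};

/* One map per namespace; zero-initialized, so every bucket starts at queue 0. */
static struct steer_map maps[MAP_NS_MAX];

/* An offloaded program resolves its map in the host namespace. */
static struct steer_map *prog_map(const struct steer_prog *prog)
{
	assert(prog != NULL);
	return &maps[prog->offload_to_host ? MAP_NS_HOST : MAP_NS_GUEST];
}

/* Control-path call the guest would use to poke at the host map. */
static int guest_set_host_map(uint32_t bucket, uint32_t queue)
{
	if (bucket >= NBUCKETS || queue >= NQUEUES)
		return -1;	/* reject anything out of range */
	maps[MAP_NS_HOST].queue_of_bucket[bucket] = queue;
	return 0;
}

/* What the steering program itself would compute for a packet. */
static uint32_t select_queue(const struct steer_prog *prog, uint32_t hash)
{
	return prog_map(prog)->queue_of_bucket[hash % NBUCKETS];
}
```

In this model a guest would mark its program as offloaded and then
update a bucket through guest_set_host_map(); packets hashing to
that bucket land on the chosen queue, while a non-offloaded program
keeps reading the untouched guest map.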

Cc Aaron who wanted to look at this.

--
MST