Re: [PATCH net-next 2/2] tun: allow to attach ebpf socket filter
From: Willem de Bruijn
Date: Sun Dec 31 2017 - 05:15:11 EST
On Fri, Dec 29, 2017 at 3:44 AM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> This patch allows userspace to attach eBPF filter to tun. This will
> allow to implement VM dataplane filtering in a more efficient way
> compared to cBPF filter.
Is the idea to allow the trusted hypervisor to install these programs,
or the untrusted guests?
eBPF privilege escalations like those recently described in
https://lwn.net/Articles/742170/ would give me pause to expose
this to guests.
> Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> ---
> drivers/net/tun.c | 26 ++++++++++++++++++++++++++
> include/uapi/linux/if_tun.h | 1 +
> 2 files changed, 27 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 0853829..6e9452b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -238,6 +238,7 @@ struct tun_struct {
> struct tun_pcpu_stats __percpu *pcpu_stats;
> struct bpf_prog __rcu *xdp_prog;
> struct tun_prog __rcu *steering_prog;
> + struct tun_prog __rcu *filter_prog;
> };
>
> static int tun_napi_receive(struct napi_struct *napi, int budget)
> @@ -984,12 +985,25 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
> #endif
> }
>
> +static unsigned int run_ebpf_filter(struct tun_struct *tun,
> + struct sk_buff *skb,
> + int len)
> +{
> + struct tun_prog *prog = rcu_dereference(tun->filter_prog);
> +
> + if (prog)
> + len = bpf_prog_run_clear_cb(prog->prog, skb);
> +
> + return len;
> +}
> +
> /* Net device start xmit */
> static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> struct tun_struct *tun = netdev_priv(dev);
> int txq = skb->queue_mapping;
> struct tun_file *tfile;
> + int len = skb->len;
>
> rcu_read_lock();
> tfile = rcu_dereference(tun->tfiles[txq]);
> @@ -1015,9 +1029,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> sk_filter(tfile->socket.sk, skb))
> goto drop;
>
> + len = run_ebpf_filter(tun, skb, len);
> + if (!len)
> + goto drop;
> +
This adds a second filter step independent of the sk_filter call above.
Perhaps the two filter interfaces can map onto to the same instance.
I imagine that qemu never programs SO_ATTACH_FILTER.
More importantly, should this program just return a boolean pass or
drop. Taking a length and trimming may introduce bugs later on if the
stack parses the packet unconditionally, expecting a minimum size
to be present.
This was the reason for introducing sk_filter_trim_cap and using that
in other sk_filter sites.
A quick scan shows that tun_put_user expects a full vlan tag to exist
if skb_vlan_tag_present(skb), for instance. If trimmed to below this
length the final call to skb_copy_datagram_iter may have negative
length.
This is an issue with the existing sk_filter call as much as with the
new run_ebpf_filter call.
> if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
> goto drop;
>
> + if (pskb_trim(skb, len))
> + goto drop;
> +
> skb_tx_timestamp(skb);