On Fri, Dec 29, 2017 at 3:44 AM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
This patch allows userspace to attach eBPF filter to tun. This willIs the idea to allow the trusted hypervisor to install these programs,
allow to implement VM dataplane filtering in a more efficient way
compared to cBPF filter.
or the untrusted guests?
eBPF privilege escalations like those recently described in
https://lwn.net/Articles/742170/ would give me pause to expose
this to guests.
Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>This adds a second filter step independent of the sk_filter call above.
---
drivers/net/tun.c | 26 ++++++++++++++++++++++++++
include/uapi/linux/if_tun.h | 1 +
2 files changed, 27 insertions(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0853829..6e9452b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -238,6 +238,7 @@ struct tun_struct {
struct tun_pcpu_stats __percpu *pcpu_stats;
struct bpf_prog __rcu *xdp_prog;
struct tun_prog __rcu *steering_prog;
+ struct tun_prog __rcu *filter_prog;
};
static int tun_napi_receive(struct napi_struct *napi, int budget)
@@ -984,12 +985,25 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
#endif
}
+static unsigned int run_ebpf_filter(struct tun_struct *tun,
+ struct sk_buff *skb,
+ int len)
+{
+ struct tun_prog *prog = rcu_dereference(tun->filter_prog);
+
+ if (prog)
+ len = bpf_prog_run_clear_cb(prog->prog, skb);
+
+ return len;
+}
+
/* Net device start xmit */
static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct tun_struct *tun = netdev_priv(dev);
int txq = skb->queue_mapping;
struct tun_file *tfile;
+ int len = skb->len;
rcu_read_lock();
tfile = rcu_dereference(tun->tfiles[txq]);
@@ -1015,9 +1029,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
sk_filter(tfile->socket.sk, skb))
goto drop;
+ len = run_ebpf_filter(tun, skb, len);
+ if (!len)
+ goto drop;
+
Perhaps the two filter interfaces can map onto to the same instance.
I imagine that qemu never programs SO_ATTACH_FILTER.
More importantly, should this program just return a boolean pass or
drop. Taking a length and trimming may introduce bugs later on if the
stack parses the packet unconditionally, expecting a minimum size
to be present.
This was the reason for introducing sk_filter_trim_cap and using that
in other sk_filter sites.
A quick scan shows that tun_put_user expects a full vlan tag to exist
if skb_vlan_tag_present(skb), for instance. If trimmed to below this
length the final call to skb_copy_datagram_iter may have negative
length.
This is an issue with the existing sk_filter call as much as with the
new run_ebpf_filter call.
if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
goto drop;
+ if (pskb_trim(skb, len))
+ goto drop;
+
skb_tx_timestamp(skb);