Re: [PATCH net] vhost_net: fix possible infinite loop

From: Jason Wang
Date: Mon May 13 2019 - 01:44:15 EST



On 2019/5/13 1:10, Michael S. Tsirkin wrote:
On Sun, May 05, 2019 at 12:20:24PM +0800, Jason Wang wrote:
On 2019/4/26 3:35, Jason Wang wrote:
On 2019/4/26 1:52, Michael S. Tsirkin wrote:
On Thu, Apr 25, 2019 at 03:33:19AM -0400, Jason Wang wrote:
When the rx buffer is too small for a packet, we will discard the vq
descriptor and retry it for the next packet:

while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
					      &busyloop_intr))) {
...
	/* On overrun, truncate and discard */
	if (unlikely(headcount > UIO_MAXIOV)) {
		iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
		err = sock->ops->recvmsg(sock, &msg,
					 1, MSG_DONTWAIT | MSG_TRUNC);
		pr_debug("Discarded rx packet: len %zd\n", sock_len);
		continue;
	}
...
}

This makes it possible to trigger an infinite while..continue loop
through the cooperation of two VMs, like:

1) Malicious VM1 allocates 1-byte rx buffers and tries to slow down the
   vhost process as much as possible, e.g. by using indirect descriptors
   or other means.
2) Malicious VM2 generates packets to VM1 as fast as possible.

Fix this by checking against the weight at the end of the RX and TX
loops. This also eliminates other similar cases:

- userspace is consuming the packets in the meanwhile
- a theoretical TOCTOU attack, where the guest moves the avail index
  back and forth to hit the continue after vhost finds the guest has
  just added new buffers

This addresses CVE-2019-3900.
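
For reference, the weight check used below is the pre-existing helper in
drivers/vhost/net.c. A sketch of its shape at the time of this patch
(verify the exact form and constants against the tree):

static bool vhost_exceeds_weight(int pkts, size_t total_len)
{
	/* Bound the work done in one handler invocation, by bytes
	 * and by packets, so the worker cannot monopolize the CPU. */
	return total_len >= VHOST_NET_WEIGHT ||
	       pkts >= VHOST_NET_PKT_WEIGHT;
}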

Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short")
I agree this is the real issue.

Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
This is just a red herring imho. We can stick this on any vhost patch :)

Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
---
 drivers/vhost/net.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index df51a35..fb46e6b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -778,8 +778,9 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 	int err;
 	int sent_pkts = 0;
 	bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
+	bool next_round = false;

-	for (;;) {
+	do {
 		bool busyloop_intr = false;

 		if (nvq->done_idx == VHOST_NET_BATCH)
@@ -845,11 +846,10 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
 		vq->heads[nvq->done_idx].len = 0;
 		++nvq->done_idx;
-		if (vhost_exceeds_weight(++sent_pkts, total_len)) {
-			vhost_poll_queue(&vq->poll);
-			break;
-		}
-	}
+	} while (!(next_round = vhost_exceeds_weight(++sent_pkts, total_len)));
+
+	if (next_round)
+		vhost_poll_queue(&vq->poll);

 	vhost_tx_batch(net, nvq, sock, &msg);
 }
@@ -873,8 +873,9 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy_used;
 	int sent_pkts = 0;
+	bool next_round = false;

-	for (;;) {
+	do {
 		bool busyloop_intr;

 		/* Release DMAs done buffers first */
@@ -951,11 +952,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
-		if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) {
-			vhost_poll_queue(&vq->poll);
-			break;
-		}
-	}
+	} while (!(next_round = vhost_exceeds_weight(++sent_pkts, total_len)));
+
+	if (next_round)
+		vhost_poll_queue(&vq->poll);
 }

 /* Expects to be always run from workqueue - which acts as
@@ -1134,6 +1134,7 @@ static void handle_rx(struct vhost_net *net)
 	struct iov_iter fixup;
 	__virtio16 num_buffers;
 	int recv_pkts = 0;
+	bool next_round = false;

 	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
 	sock = vq->private_data;
@@ -1153,8 +1154,11 @@ static void handle_rx(struct vhost_net *net)
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);

-	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
-						      &busyloop_intr))) {
+	do {
+		sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
+						      &busyloop_intr);
+		if (!sock_len)
+			break;
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
 		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
@@ -1239,12 +1243,9 @@ static void handle_rx(struct vhost_net *net)
 			vhost_log_write(vq, vq_log, log, vhost_len,
 					vq->iov, in);
 		total_len += vhost_len;
-		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
-			vhost_poll_queue(&vq->poll);
-			goto out;
-		}
-	}
-	if (unlikely(busyloop_intr))
+	} while (!(next_round = vhost_exceeds_weight(++recv_pkts, total_len)));
+
+	if (unlikely(busyloop_intr || next_round))
 		vhost_poll_queue(&vq->poll);
 	else
 		vhost_net_enable_vq(net, vq);
I'm afraid that with this addition the code is too much like spaghetti. What
does next_round mean? Just that we are breaking out of the loop?

Yes, we can have a better name, of course.


That is
what goto is for... Either let's have for (;;) with goto/break to get
outside, or a while loop with a condition. Mixing both is just unreadable.

All these checks in the three places are exactly the same on all paths, and
they are on the slow path. Why don't we put this in a function?

The point, I think, is that we want the weight to be checked on both the
fast path and the slow path.
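
Roughly the shape I'm arguing for, as a sketch (slow_path_event,
handle_slow_path and process_one_packet are made-up names, not actual
kernel code):

do {
	if (slow_path_event) {		/* e.g. empty ring or overrun */
		handle_slow_path();
		continue;		/* still falls through to the weight check */
	}
	process_one_packet();		/* fast path */
} while (!vhost_exceeds_weight(++pkts, total_len));

Here the do/while condition is the fast-path check, and every continue
from a slow path re-evaluates that same condition.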


E.g. like the below.
Warning: completely untested.

Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>

---

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index df51a35cf537..a0f89a504cd9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -761,6 +761,23 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	return 0;
 }

+/* Returns true if caller needs to go back and re-read the ring. */
+static bool empty_ring(struct vhost_net *net, struct vhost_virtqueue *vq,
+		       int pkts, size_t total_len, bool busyloop_intr)
+{
+	if (unlikely(busyloop_intr)) {
+		vhost_poll_queue(&vq->poll);
+	} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+		/* They have slipped one in meanwhile: check again. */
+		vhost_disable_notify(&net->dev, vq);
+		if (!vhost_exceeds_weight(pkts, total_len))
+			return true;
+		vhost_poll_queue(&vq->poll);
+	}
+	/* Nothing new? Wait for eventfd to tell us they refilled. */
+	return false;
+}

An empty ring is not the only place that needs care. E.g. for RX, we need
to care about overrun, and about userspace consuming the packets at the
same time. There's no need to toggle vq notification in those two cases.
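
E.g. the overrun path quoted at the top of the thread would, under your
scheme, still need a weight check of its own but must not enable
notification, since the ring is not empty there; the buffers are just too
small. A hypothetical sketch:

	if (unlikely(headcount > UIO_MAXIOV)) {
		/* truncate and discard as before ... */
		if (vhost_exceeds_weight(++recv_pkts, total_len)) {
			vhost_poll_queue(&vq->poll);
			goto out;
		}
		continue;
	}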
Well, I just factored out code that looked exactly the same.
You can add more common code and rename the function
if it turns out to be worthwhile.



+
 static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
@@ -790,15 +807,10 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		/* On error, stop handling until the next kick. */
 		if (unlikely(head < 0))
 			break;
-		/* Nothing new? Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
-			if (unlikely(busyloop_intr)) {
-				vhost_poll_queue(&vq->poll);
-			} else if (unlikely(vhost_enable_notify(&net->dev,
-								vq))) {
-				vhost_disable_notify(&net->dev, vq);
+			if (unlikely(empty_ring(net, vq, ++sent_pkts,
+						total_len, busyloop_intr)))
 				continue;
-			}
 			break;
 		}
@@ -886,14 +898,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		/* On error, stop handling until the next kick. */
 		if (unlikely(head < 0))
 			break;
-		/* Nothing new? Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
-			if (unlikely(busyloop_intr)) {
-				vhost_poll_queue(&vq->poll);
-			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-				vhost_disable_notify(&net->dev, vq);
+			if (unlikely(empty_ring(net, vq, ++sent_pkts,
+						total_len, busyloop_intr)))
 				continue;
-			}
 			break;
 		}
@@ -1163,18 +1171,10 @@ static void handle_rx(struct vhost_net *net)
 		/* On error, stop handling until the next kick. */
 		if (unlikely(headcount < 0))
 			goto out;
-		/* OK, now we need to know about added descriptors. */
 		if (!headcount) {
-			if (unlikely(busyloop_intr)) {
-				vhost_poll_queue(&vq->poll);
-			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-				/* They have slipped one in as we were
-				 * doing that: check again. */
-				vhost_disable_notify(&net->dev, vq);
-				continue;
-			}
-			/* Nothing new? Wait for eventfd to tell us
-			 * they refilled. */
+			if (unlikely(empty_ring(net, vq, ++recv_pkts,
+						total_len, busyloop_intr)))
+				continue;
 			goto out;
 		}
 		busyloop_intr = false;
The patch misses several other continues that need care, and there's
another call of vhost_exceeds_weight() at the end of the loop.

So instead of duplicating the check everywhere, like:

for (;;) {
	if (condition_x) {
		if (empty_ring())
			continue;
		break;
	}
	if (condition_y) {
		if (empty_ring())
			continue;
		break;
	}
	if (condition_z) {
		if (empty_ring())
			continue;
		break;
	}
}

What this patch did:

do {
	if (condition_x)
		continue;
	if (condition_y)
		continue;
	if (condition_z)
		continue;
} while (!need_break());

is much more compact and easier to read, no?
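
Note that in C a continue inside a do/while jumps straight to the
controlling expression, so every continue above still evaluates
need_break(). A minimal standalone illustration:

#include <stdio.h>

int main(void)
{
	int i = 0;

	do {
		if (i % 2)
			continue;	/* skips the printf, NOT the condition */
		printf("%d\n", i);
	} while (++i < 5);		/* evaluated on every iteration: prints 0 2 4 */

	return 0;
}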

Thanks

Hi Michael:

Any more comments?

Thanks
Jason, the actual code in e.g. handle_tx_copy is nowhere close to the
neat example you provide below. Yes, the new parts are like this, but we
kept all the old code around, and that works differently.


Look at handle_tx_copy for example.
With your patch applied, one can exit the loop:

- with a break
- with a continue and the condition false
- by reaching the end of the loop with the condition false

and we have a goto there which can now get us to the end of the loop and
then exit.


For the goto case, there's no functional change. In either case, there will be a weight check at the end of the loop. Isn't that right?



Previously, at least, we would only exit on a break.


Actually, the only difference in handle_tx_copy() is the handling of 'continue'. Without the patch, we won't check the weight. With the patch, we will check it and exit the loop if we exceed the weight. Did I miss anything obvious?
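
I.e., with the patch applied, handle_tx_copy() has roughly this shape (a
sketch with details elided; 'retry' stands in for the notify-race
handling and is not a real variable):

do {
	...
	if (unlikely(head < 0))
		break;			/* error: stop until the next kick */
	if (head == vq->num) {
		if (retry)		/* buffers were re-added meanwhile */
			continue;	/* continue still hits the weight check */
		break;			/* ring empty: wait for a kick */
	}
	/* copy and send one packet */
	...
} while (!(next_round = vhost_exceeds_weight(++sent_pkts, total_len)));

if (next_round)
	vhost_poll_queue(&vq->poll);	/* reschedule instead of spinning */

So the weight bounds every path around the loop; only the explicit breaks
skip it.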

Thanks



Frankly, trying to review it I get lost now.
I also think the repeated checking of empty_ring is not that
problematic.
But I don't insist on this specific split-up;
just please make the code regular by
moving stuff into a sub-function.