On Wed, Apr 02, 2025 at 12:14:24PM -0400, Stefan Hajnoczi wrote:
On Tue, Apr 01, 2025 at 08:13:49PM +0000, Alexander Graf wrote:
Ever since the introduction of the virtio vsock driver, it included
pushback logic that blocks it from taking any new RX packets until the
TX queue backlog becomes shallower than the virtqueue size.
This logic works fine when you connect a user space application on the
hypervisor with a virtio-vsock target, because the guest will stop
receiving data until the host pulled all outstanding data from the VM.
With Nitro Enclaves however, we connect 2 VMs directly via vsock:
  Parent      Enclave
    RX -------- TX
    TX -------- RX
This means we now have 2 virtio-vsock backends that both have the pushback
logic. If the parent's TX queue runs full at the same time as the
Enclave's, both virtio-vsock drivers fall into the pushback path and
no longer accept RX traffic. However, that RX traffic is TX traffic on
the other side which blocks that driver from making any forward
progress. We're now in a deadlock.
To resolve this, let's remove that pushback logic altogether and rely on
higher levels (like credits) to ensure we do not consume unbounded
memory.
The reason for queued_replies is that rx packet processing may emit tx
packets. Therefore tx virtqueue space is required in order to process
the rx virtqueue.
queued_replies puts a bound on the amount of tx packets that can be
queued in memory so the other side cannot consume unlimited memory. Once
that bound has been reached, rx processing stops until the other side
frees up tx virtqueue space.
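To make the interaction between the bound and the two-VM setup concrete, here is a minimal user-space model in C. It is only a sketch, not the kernel code: `struct peer` and all function names are hypothetical, and it assumes the bound equals the tx virtqueue size, as the description above suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one virtio-vsock endpoint; not the kernel's struct. */
struct peer {
    size_t tx_ring_used;   /* tx descriptors the other side has not consumed */
    size_t tx_ring_size;   /* tx virtqueue size, also used as the reply bound */
    size_t queued_replies; /* replies held in memory, waiting for tx space */
};

/* Pushback rule: rx is processed only while the in-memory backlog of
 * replies is below the bound. */
static bool peer_can_process_rx(const struct peer *p)
{
    return p->queued_replies < p->tx_ring_size;
}

/* Handle one rx packet that requires a reply.
 * Returns false if rx processing is stalled by the pushback rule. */
static bool peer_rx_needs_reply(struct peer *p)
{
    if (!peer_can_process_rx(p))
        return false;
    if (p->tx_ring_used < p->tx_ring_size)
        p->tx_ring_used++;   /* reply fits into the tx virtqueue */
    else
        p->queued_replies++; /* tx virtqueue full, keep reply in memory */
    assert(p->tx_ring_used <= p->tx_ring_size);
    return true;
}

/* The other side consumed one of our tx descriptors, which it can only
 * do while it is still processing its own rx virtqueue. */
static void peer_tx_consumed(struct peer *p)
{
    if (p->tx_ring_used == 0)
        return;
    p->tx_ring_used--;
    if (p->queued_replies > 0) {
        /* Move one backlogged reply into the freed slot. */
        p->queued_replies--;
        p->tx_ring_used++;
    }
}

/* Two directly connected peers are stuck once both have stopped rx:
 * each waits for the other to drain tx, and draining tx is the other
 * side's rx processing. */
static bool peers_deadlocked(const struct peer *a, const struct peer *b)
{
    return !peer_can_process_rx(a) && !peer_can_process_rx(b);
}
```

With a host-side user space application as the consumer, `peer_tx_consumed()` keeps getting called regardless of the guest's state, so the stall is temporary. In this model, with two such peers facing each other, nothing calls it once `peers_deadlocked()` is true.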
It's been a while since I looked at this problem, so I don't have a
solution ready. In fact, last time I thought about it I wondered if the
design of virtio-vsock fundamentally suffers from deadlocks.
I don't think removing queued_replies is possible without a replacement
for the bounded memory and virtqueue exhaustion issue though. Credits
are not a solution - they are about socket buffer space, not about
virtqueue space, which includes control packets that are not accounted
for by socket buffer space.
Hmm.
Actually, let's think which packets require a response.
VIRTIO_VSOCK_OP_REQUEST
VIRTIO_VSOCK_OP_SHUTDOWN
VIRTIO_VSOCK_OP_CREDIT_REQUEST
The response to each of these always reports the state of an existing
socket, and only one type of response is relevant for each socket.
So here's my suggestion:
Stop queueing replies on the vsock device. Instead, simply store the
response on the socket, and keep a list of sockets that have replies to
be transmitted.
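A rough user-space sketch of that idea in C, to show the shape of the data structure. Everything here is hypothetical (`struct sock_model`, `reply_set()`, `reply_flush()` and the enum names are made up for illustration), and it assumes a newer reply for a socket can simply replace an older one that has not been sent yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reply kinds, one pending per socket at most. */
enum pending_reply {
    REPLY_NONE = 0,
    REPLY_RESPONSE,      /* answer to a connection request */
    REPLY_RST,           /* connection refused or torn down */
    REPLY_CREDIT_UPDATE, /* answer to a credit request */
};

/* Hypothetical per-socket state: a single reply slot plus list linkage. */
struct sock_model {
    enum pending_reply pending;
    struct sock_model *next;
    bool on_list;
};

/* FIFO of sockets that have a reply waiting for tx virtqueue space. */
struct reply_list {
    struct sock_model *head;
    struct sock_model *tail;
};

/* Record a reply for a socket. A newer reply overwrites an unsent one,
 * so memory use is bounded by the number of sockets, not by the number
 * of rx packets processed. */
static void reply_set(struct reply_list *l, struct sock_model *s,
                      enum pending_reply r)
{
    assert(r != REPLY_NONE);
    s->pending = r;
    if (s->on_list)
        return; /* already queued once, nothing more to allocate */
    s->next = NULL;
    s->on_list = true;
    if (l->tail)
        l->tail->next = s;
    else
        l->head = s;
    l->tail = s;
}

/* Send as many pending replies as there are free tx slots.
 * Returns the number of replies sent. */
static size_t reply_flush(struct reply_list *l, size_t free_tx_slots)
{
    size_t sent = 0;

    while (l->head && sent < free_tx_slots) {
        struct sock_model *s = l->head;

        l->head = s->next;
        if (!l->head)
            l->tail = NULL;
        s->next = NULL;
        s->on_list = false;
        /* A real driver would build the packet from s->pending here. */
        s->pending = REPLY_NONE;
        sent++;
    }
    return sent;
}
```

In this model rx processing never has to stop for lack of tx space: `reply_set()` always succeeds without allocating, and `reply_flush()` runs whenever the other side frees tx descriptors.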
WDYT?