On Mon, Jan 20, 2014 at 09:24:28PM +0000, Zoltan Kiss wrote:I don't think it should be that perfect. This is just a best effort estimation, if someone changes the vif queue length and see this message because of that, nothing very drastic will happen. It is just a rate limited warning message. Well, it is marked as error, because it is a serious condition.@@ -557,12 +577,25 @@ void xenvif_disconnect(struct xenvif *vif)You beat me to this. Was about to reply to your other email. :-)
void xenvif_free(struct xenvif *vif)
{
int i, unmap_timeout = 0;
+ /* Here we want to avoid timeout messages if an skb can be legitimatly
+ * stucked somewhere else. Realisticly this could be an another vif's
+ * internal or QDisc queue. That another vif also has this
+ * rx_drain_timeout_msecs timeout, but the timer only ditches the
+ * internal queue. After that, the QDisc queue can put in worst case
+ * XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS skbs into that another vif's
+ * internal queue, so we need several rounds of such timeouts until we
+ * can be sure that no another vif should have skb's from us. We are
+ * not sending more skb's, so newly stucked packets are not interesting
+ * for us here.
+ */
It's also worth mentioning that DIV_ROUND_UP part is merely estimation,
as you cannot possible know the maximum / miminum queue length of all
other vifs (as they can be changed during runtime). In practice most
users will stick with the default, but some advanced users might want to
tune this value for individual vif (whether that's a good idea or not is
another topic).
So, in order to convince myself this is safe. I also did some analysis
on the impact of having queue length other than default value. If
queue_len < XENVIF_QUEUE_LENGTH, that means you can queue less packets
in qdisc than default and drain it faster than calculated, which is
safe. On the other hand if queue_len > XENVIF_QUEUE_LENGTH, it means
actually you need more time than calculated. I'm in two minded here. The
default value seems sensible to me but I'm still a bit worried about the
queue_len > XENVIF_QUEUE_LENGTH case.
An idea is to book-keep maximum tx queue len among all vifs and use that
to calculate worst scenario.