Re: [PATCH net-next 0/2] xen-netback: Changes around carrier handling
From: Zoltan Kiss
Date: Wed Aug 06 2014 - 15:21:24 EST
On 06/08/14 00:07, David Miller wrote:
From: Zoltan Kiss <zoltan.kiss@xxxxxxxxxx>
Date: Mon, 4 Aug 2014 16:20:56 +0100
This series starts using carrier off as a way to purge packets when the guest is
not able (or willing) to receive them. It is a much faster way to get rid of
packets waiting for an overwhelmed guest.
The first patch changes current netback code where it relies currently on
netif_carrier_ok.
The second turns off the carrier if the guest times out on a queue, and only
turn it on again if that queue (or queues) resurrects.
Signed-off-by: Zoltan Kiss <zoltan.kiss@xxxxxxxxxx>
Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
Applied, but I have some reservations:
1) This is starting to bleed what is normally qdisc type policy into the
driver.
Yeah. The fundamental problem with netback that start_xmit place the
packet into an internal queue, and then the thread does the actual
transmission from that queue, but it doesn't know whether it will
succeed in a finite time period.
There is slot estimation in start_xmit to prevent QDisc pouring more
packets into the internal queue when it seems it won't fit. When that
happens, we stop QDisc and start a timer, and when the timer fires,
before this patch we just ditched the internal queue and started QDisc
again. If the frontend were dead, it lead to a quite long delay until
packets destined to it were freed.
I just noticed a possible problem with start_xmit: if this slot
estimation fails, and the packet is likely to be stalled at least for a
while, it still place the skb into the internal queue and return
NETDEV_TX_OK. Shouldn't we return NETDEV_TX_BUSY and not placing the
packet into the internal queue? It will be requeued later, or dropped by
QDisc. I think it will be more natural. But it can decrease performance
if these "not enough slot" situations are very frequent and short
living, and by the time the RX thread wakes up
My long term idea is to move part of the thread's work into start_xmit,
so it can set up the grant operations and if there isn't enough slot it
can return the skb to QDisc with NETDEV_TX_BUSY for requeueing. Then the
thread can only do the batching of the grant copy operations and
releasing the skbs. And we can ditch a good part of the code ...
2) There are other drivers that could run into this kind of situation and
have similar concerns, therefore we should make sure we have a consistent
approach that such entities use to handle this problem.
Part of the problem is that netif_carrier_off() only partially mimicks
the situation. It expresses the "transmitter is down so packets
aren't going onto the wire" part, which keeps the watchdog from
spitting out log messages ever time it fires. But it doesn't deal
with packet freeing policy meanwhile, which I guess is the part that
this patch series is largely trying to address.
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/