Re: [PATCH v2] net/sched: netem: account for backlog updates from child qdisc
From: Jamal Hadi Salim
Date: Thu Dec 05 2024 - 07:40:43 EST
On Wed, Dec 4, 2024 at 7:29 AM Martin Ottens <martin.ottens@xxxxxx> wrote:
>
> In general, 'qlen' of any classful qdisc should keep track of the
> number of packets that the qdisc itself and all of its children hold.
> In case of netem, 'qlen' only accounts for the packets in its internal
> tfifo. When netem is used with a child qdisc, the child qdisc can use
> 'qdisc_tree_reduce_backlog' to inform its parent, netem, about created
> or dropped SKBs. This function updates 'qlen' and the backlog statistics
> of netem, but netem does not account for changes made by a child qdisc.
> 'qlen' then indicates the wrong number of packets in the tfifo.
> If a child qdisc creates new SKBs during enqueue and informs its parent
> about this, netem's 'qlen' value is increased. When netem dequeues the
> newly created SKBs from the child, the 'qlen' in netem is not updated.
> If 'qlen' reaches the configured sch->limit, the enqueue function drops
> all newly arriving packets, even though the tfifo is not full.
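For context: the path that adjusts netem's counters from below is
qdisc_tree_reduce_backlog() in net/sched/sch_api.c. A heavily
simplified sketch of my reading of it (not the exact kernel code;
lookup_parent() is a hypothetical stand-in for the real parent
lookup):

static void tree_reduce_backlog_sketch(struct Qdisc *sch, int n, int len)
{
	/* Walk from the reporting child up towards the root. When tbf
	 * segments one GSO skb into nsegs pieces, it reports
	 * n = 1 - nsegs (negative), so every ancestor's qlen grows,
	 * including netem's, although its tfifo did not change.
	 */
	while ((sch = lookup_parent(sch)) != NULL) {
		sch->q.qlen -= n;
		sch->qstats.backlog -= len;
	}
}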
>
> Reproduce the bug:
> Ensure that the sender machine has GSO enabled. Configure netem as root
> qdisc and tbf as its child on the outgoing interface of the machine
> as follows:
> $ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
> $ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
>
> Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
> client on the machine. Check the qdisc statistics:
> $ tc -s qdisc show dev <oif>
>
> tbf segments the GSO SKBs (tbf_segment) and updates netem's 'qlen'.
> The interface stops transferring packets entirely and "locks up". In this case,
> the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at
> its limit and no more packets are accepted.
>
It would be nice to see the before and after (with your change) output
of the stats to illustrate this.
> This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
> only decreased when a packet is returned by its dequeue function, and not
> during enqueuing into the child qdisc. External updates to 'qlen' are thus
> accounted for and only the behavior of the backlog statistics changes. As
> in other qdiscs, 'qlen' then keeps track of how many packets are held in
> netem and all of its children. As before, sch->limit remains the
> maximum number of packets in the tfifo. The same applies to netem's
> backlog statistics.
> (Note: the 'backlog' byte-statistic of netem is incorrect in the example
> above even after the patch is applied due to a bug in tbf. See my
> previous patch ([PATCH] net/sched: tbf: correct backlog statistic for
> GSO packets)).
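If I read the patch right, the invariant it establishes is roughly the
following (netem_qlen_invariant() is a hypothetical helper written only
to spell the accounting out; it is not part of the patch):

static void netem_qlen_invariant(struct Qdisc *sch)
{
	struct netem_sched_data *q = qdisc_priv(sch);

	/* t_len counts only the tfifo and is what sch->limit bounds;
	 * sch->q.qlen now covers netem plus its child, as in other
	 * classful qdiscs.
	 */
	WARN_ON(sch->q.qlen != q->t_len +
		(q->qdisc ? q->qdisc->q.qlen : 0));
}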
>
> Signed-off-by: Martin Ottens <martin.ottens@xxxxxx>
The Fixes tag is missing - probably broken since 2.6.x. Stephen?
Your fix seems reasonable, but I am curious: does this only happen
with TCP? If yes, perhaps the GSO handling is contributing?
Can you run iperf with UDP and see if the issue shows up again? Or
ping -f with size 1024. You can slow down tbf, for example, with
something like: tbf rate 20kbit buffer 1600 limit 3000
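i.e. something along these lines (untested):

$ tc qdisc change dev <oif> parent 1:0 tbf rate 20kbit buffer 1600 limit 3000
$ iperf3 -c <server> -u -b 1M
$ ping -f -s 1024 <server>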
cheers,
jamal
> ---
> net/sched/sch_netem.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fe6fed291a7b..71ec9986ed37 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -79,6 +79,8 @@ struct netem_sched_data {
> struct sk_buff *t_head;
> struct sk_buff *t_tail;
>
> + u32 t_len;
> +
> /* optional qdisc for classful handling (NULL at netem init) */
> struct Qdisc *qdisc;
>
> @@ -383,6 +385,7 @@ static void tfifo_reset(struct Qdisc *sch)
> rtnl_kfree_skbs(q->t_head, q->t_tail);
> q->t_head = NULL;
> q->t_tail = NULL;
> + q->t_len = 0;
> }
>
> static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
> @@ -412,6 +415,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
> rb_link_node(&nskb->rbnode, parent, p);
> rb_insert_color(&nskb->rbnode, &q->t_root);
> }
> + q->t_len++;
> sch->q.qlen++;
> }
>
> @@ -518,7 +522,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> 1<<get_random_u32_below(8);
> }
>
> - if (unlikely(sch->q.qlen >= sch->limit)) {
> + if (unlikely(q->t_len >= sch->limit)) {
> /* re-link segs, so that qdisc_drop_all() frees them all */
> skb->next = segs;
> qdisc_drop_all(skb, sch, to_free);
> @@ -702,8 +706,8 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
> tfifo_dequeue:
> skb = __qdisc_dequeue_head(&sch->q);
> if (skb) {
> - qdisc_qstats_backlog_dec(sch, skb);
> deliver:
> + qdisc_qstats_backlog_dec(sch, skb);
> qdisc_bstats_update(sch, skb);
> return skb;
> }
> @@ -719,8 +723,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
>
> if (time_to_send <= now && q->slot.slot_next <= now) {
> netem_erase_head(q, skb);
> - sch->q.qlen--;
> - qdisc_qstats_backlog_dec(sch, skb);
> + q->t_len--;
> skb->next = NULL;
> skb->prev = NULL;
> /* skb->dev shares skb->rbnode area,
> @@ -747,16 +750,21 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
> if (net_xmit_drop_count(err))
> qdisc_qstats_drop(sch);
> qdisc_tree_reduce_backlog(sch, 1, pkt_len);
> + sch->qstats.backlog -= pkt_len;
> + sch->q.qlen--;
> }
> goto tfifo_dequeue;
> }
> + sch->q.qlen--;
> goto deliver;
> }
>
> if (q->qdisc) {
> skb = q->qdisc->ops->dequeue(q->qdisc);
> - if (skb)
> + if (skb) {
> + sch->q.qlen--;
> goto deliver;
> + }
> }
>
> qdisc_watchdog_schedule_ns(&q->watchdog,
> @@ -766,8 +774,10 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
>
> if (q->qdisc) {
> skb = q->qdisc->ops->dequeue(q->qdisc);
> - if (skb)
> + if (skb) {
> + sch->q.qlen--;
> goto deliver;
> + }
> }
> return NULL;
> }
> --
> 2.39.5
>