Re: [PATCH net] openvswitch: meter: remove rate from the bucket size calculation

From: Ilya Maximets
Date: Thu Apr 29 2021 - 17:12:25 EST

Next message: Pavel Machek: "Re: [PATCH v6 2/4] leds: Add driver for Qualcomm LPG"
Previous message: Don Hiatt: "Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock"
In reply to: Ilya Maximets: "Re: [PATCH net] openvswitch: meter: remove rate from the bucket size calculation"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 4/28/21 1:22 PM, Ilya Maximets wrote:
> On 4/28/21 8:45 AM, Jean Tourrilhes wrote:
>> On Wed, Apr 28, 2021 at 02:24:10PM +0800, Tonghao Zhang wrote:
>>> Hi Ilya
>>> If we set the burst size too small, the meters of ovs don't work.
>>
>> Most likely, you need to set the burst size larger.
>> A quick Google on finding a good burst size :
>> https://www.juniper.net/documentation/us/en/software/junos/routing-policy/topics/concept/policer-mx-m120-m320-burstsize-determining.html
>
> +1.
> Tonghao, If you're configuring burst size too low, meter will not pass
> packets. That's expected behavior. In your example with 1400B packets
> and 1500B (12 kbit) burst size there is a very high probability that a
> lot of packets will be dropped and not pass the meter unless you're
> sending them in a very precise points in time. I don't think that anyone
> will recommend setting burst size so close to the MTU. The article above
> suggests using 10x MTU value, but I don't know if that will be enough
> with high speed devices.
>
>>
>> Now, the interesting question, is the behaviour of OVS
>> different from a standard token bucket, such as a kernel policer ?
>
> I didn't test it, but I looked at the implementation in
> net/sched/act_police.c and net/sched/sch_tbf.c, and they should work
> in a same way as this patch, i.e. it's a classic token bucket where
> burst is a burst and nothing else. These implementations uses burst
> in nanoseconds instead of bytes, but that doesn't matter (nanoseconds
> calculated from the rate and burst in bytes specified by user).
> For example, net/sched/act_police.c works like this:
>
> toks = min_t(s64, now - police->tcfp_t_c, p->tcfp_burst);
> ^---- calculating how many tokens needs to be added
> toks += police->tcfp_toks; <-- also adding all existing tokens
> if (toks > p->tcfp_burst)
> toks = p->tcfp_burst; <-- hard limit of tokens by the burst size
> toks -= (s64)psched_l2t_ns(&p->rate, qdisc_pkt_len(skb));
> ^-- spending tokens to pass the packet
> if (toks >= 0) { <-- Did we have enough tokens?
> /* Packet passed. */
> police->tcfp_t_c = now;
> police->tcfp_toks = toks;
> }
>
> net/sched/sch_tbf.c works in almost exactly same way. So, there is
> *no algorithmic difference* here.
>
> ---
>
> There is one difference though. I said that it doesn't matter that
> tc uses time instead of bytes as a measure for tokens, but it actually
> does matter because time is calculated based on the configured rate,
> but applied to the actual rate. Let me explain:
>
> Assuming configuration "rate 200mbit burst 20K" as in example below.
> iproute2 will calculate burst using tc_calc_xmittime function:
> https://github.com/shemminger/iproute2/blob/9f366536edb5158343152604e82b968be46dbf26/tc/tc_core.c#L60
>
> So the burst configuration passed to kernel will be:
>
> TIME_UNITS_PER_SEC(1000000) * (20 * 1024) / (200 * 1024*1024/8) = 781 usec
> 10^-6 bytes bytes/sec
>
> That means that burst is not 20K bytes as configured, but any number of
> bytes in 781 usec window regardless of a line rate.

OK. I found my mistake here. Even though the burst size is in units of
time, it doesn't matter because, when tokens are consumed, algorithm
subtracts time needed to pass a packet with a configured rate (see
psched_l2t_ns() function). This evens out the difference.

So, everything is perfectly fine here. :)

Sorry for the noise.

> For example, if traffic goes from 10 Gbps interface, effective burst size
> will be 10^9 / 8 * 781 * 10^-6 = 97K which is almost 5 times higher than
> the configured value. And the difference scales linearly with the increase
> of the line rate speed. For 100G interface it will be 970K.
>
> It might be much more noticeable with lower configured rate.
> For "rate 10mbit burst 20K", real burst interval will be 15.6 msec, which
> will translate into 1.9M burst size for a 10G line rate, which is almost
> 100 times larger than configured 20K. And it will be 19M for a 100Gbps
> interface, making the average rate triple as high as configured for a
> policer.
>
> All in all this looks more like an issue of TC and iproute implementation.
> IMHO, tc command should not allow configuration of burst in bytes just
> because it can not configure that in kernel and therefore can not guarantee
> that behavior. Configuration should be in micro/nanoseconds instead.
>
> CC: Cong, Davide
> Maybe someone from the TC side can comment on that?
>
> We can try to mimic this behavior in OVS, but I'm not sure if it's correct.
> Current OVS implementation, unlike TC, guarantees the burst size in bytes.
> And it's also a completely different kind of difference with OVS meters, so
> unrelated to the current patch.
>
> Best regards, Ilya Maximets.
>
>> Here is how to set up a kernel policer :
>> ----------------------------------------------------------
>> # Create a dummy classful discipline to attach filter
>> tc qdisc del dev eth6 root
>> tc qdisc add dev eth6 root handle 1: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> tc qdisc add dev eth6 parent 1:1 handle 10: pfifo limit 1000
>> tc qdisc add dev eth6 parent 1:2 handle 20: pfifo limit 1000
>> tc -s qdisc show dev eth6
>> tc -s class show dev eth6
>>
>> # Filter to do hard rate limiting
>> tc filter del dev eth6 parent 1: protocol all prio 1 handle 800::100 u32
>> tc filter add dev eth6 parent 1: protocol all prio 1 handle 800::100 u32 match u32 0 0 police rate 200mbit burst 20K mtu 10000 drop
>> tc -s filter show dev eth6
>> tc filter change dev eth6 parent 1: protocol all prio 1 handle 800::100 u32 match u32 0 0 police rate 200mbit burst 50K mtu 10000 drop
>> ----------------------------------------------------------
>>
>> Regards,
>>
>> Jean
>>
>

Next message: Pavel Machek: "Re: [PATCH v6 2/4] leds: Add driver for Qualcomm LPG"
Previous message: Don Hiatt: "Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock"
In reply to: Ilya Maximets: "Re: [PATCH net] openvswitch: meter: remove rate from the bucket size calculation"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]