Re: high number of dropped packets/rx_missed_errors from 4.17 kernel

From: Andrei Popa
Date: Fri Dec 04 2020 - 01:46:43 EST


Hi,

I’ve applied your patch on kernel 4.17.0 and dropped packets and rx_missed_errors are still present, though they are increasing at a much lower rate (rx_missed_errors grew by at most 298 per 60-second interval in the run below).

root@shaper:~# ./test
rx_missed_errors: 2135
RX errors 0 dropped 2155 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2433
RX errors 0 dropped 2459 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2433
RX errors 0 dropped 2465 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2526
RX errors 0 dropped 2564 overruns 0 frame 0
sleeping 60 seconds
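
To put these readings on the same scale as the earlier vanilla 4.17 run, the counter increases can be normalised to a per-minute rate. This is only a rough sketch: the helper name per_minute_deltas is made up for illustration, and it assumes the old `sleep 1` loop sampled about once per second, ignoring the time ethtool and ifconfig take to run.

```shell
#!/bin/bash
# Print the increase between consecutive counter samples, scaled to a
# per-minute rate. The first argument is the sampling interval in seconds
# (an assumption supplied by the caller); the remaining arguments are the
# counter readings in the order they were taken.
per_minute_deltas() {
    local interval=$1
    shift
    local prev=$1
    shift
    local cur
    for cur in "$@"; do
        # Integer arithmetic is enough for a rough comparison.
        echo $(( (cur - prev) * 60 / interval ))
        prev=$cur
    done
}

# rx_missed_errors with the patch applied, sampled every 60 seconds
per_minute_deltas 60 2135 2433 2433 2526

# rx_missed_errors on vanilla 4.17, assumed to be sampled about once per second
per_minute_deltas 1 2418845 2426175 2431910 2437266
```

With the readings from this thread, the first call prints 298, 0 and 93, while the second prints values in the hundreds of thousands per minute.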


> On 3 Dec 2020, at 21:43, Andrei Popa <andreipopad@xxxxxxxxx> wrote:
>
> Hi,
>
> On which kernel version should I try the patch? I tried it on 5.9 and it doesn't build.
>
>> On 18 Nov 2020, at 20:47, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
>>
>> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>>> Hello,
>>>>
>>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic we experience, on a number of servers, a very high number of rx_missed_errors and dropped packets only on the uplink 10G interface. We have another 10G downlink interface with no problems.
>>>>
>>>> The affected servers have the following mainboards:
>>>> S5520HC ver E26045-455
>>>> S5520UR ver E22554-751
>>>> S5520UR ver E22554-753
>>>> S5000VSA
>>>>
>>>> On 30 other servers with similar mainboards and/or configs there are no dropped packets with vmlinuz-5.4.0-37-generic.
>>>>
>>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>>>
>>>> root@shaper:~# cat test
>>>> #!/bin/bash
>>>> while true
>>>> do
>>>> ethtool -S ens6f1|grep "missed_errors"
>>>> ifconfig ens6f1|grep RX|grep dropped
>>>> sleep 1
>>>> done
>>>>
>>>> root@shaper:~# ./test
>>>> rx_missed_errors: 2418845
>>>> RX errors 0 dropped 2418888 overruns 0 frame 0
>>>> rx_missed_errors: 2426175
>>>> RX errors 0 dropped 2426218 overruns 0 frame 0
>>>> rx_missed_errors: 2431910
>>>> RX errors 0 dropped 2431953 overruns 0 frame 0
>>>> rx_missed_errors: 2437266
>>>> RX errors 0 dropped 2437309 overruns 0 frame 0
>>>> rx_missed_errors: 2443305
>>>> RX errors 0 dropped 2443348 overruns 0 frame 0
>>>> rx_missed_errors: 2448357
>>>> RX errors 0 dropped 2448400 overruns 0 frame 0
>>>> rx_missed_errors: 2452539
>>>> RX errors 0 dropped 2452582 overruns 0 frame 0
>>>>
>>>> We did a git bisect and found that the following commit introduces the high number of dropped packets:
>>>>
>>>> Author: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>>> Date: Thu Apr 5 19:12:43 2018 +0200
>>>> cpuidle: menu: Avoid selecting shallow states with stopped tick
>>>> If the scheduler tick has been stopped already and the governor
>>>> selects a shallow idle state, the CPU can spend a long time in that
>>>> state if the selection is based on an inaccurate prediction of idle
>>>> time. That effect turns out to be relevant, so it needs to be
>>>> mitigated.
>>>> To that end, modify the menu governor to discard the result of the
>>>> idle time prediction if the tick is stopped and the predicted idle
>>>> time is less than the tick period length, unless the tick timer is
>>>> going to expire soon.
>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>>> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>>> index 267982e471e0..1bfe03ceb236 100644
>>>> --- a/drivers/cpuidle/governors/menu.c
>>>> +++ b/drivers/cpuidle/governors/menu.c
>>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>>> */
>>>> data->predicted_us = min(data->predicted_us, expected_interval);
>>>> - /*
>>>> - * Use the performance multiplier and the user-configurable
>>>> - * latency_req to determine the maximum exit latency.
>>>> - */
>>>> - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
>>>> - if (latency_req > interactivity_req)
>>>> - latency_req = interactivity_req;
>>>
>>> The tick_nohz_tick_stopped() check may be done after the above and it
>>> may be reworked a bit.
>>>
>>> I'll send a test patch to you shortly.
>>
>> The patch is appended, but please note that it has been rebased by hand and
>> not tested.
>>
>> Please let me know if it makes any difference.
>>
>> And in the future please avoid pasting the entire kernel config to your
>> reports, that's problematic.
>>
>> ---
>> drivers/cpuidle/governors/menu.c | 23 ++++++++++++-----------
>> 1 file changed, 12 insertions(+), 11 deletions(-)
>>
>> Index: linux-pm/drivers/cpuidle/governors/menu.c
>> ===================================================================
>> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
>> +++ linux-pm/drivers/cpuidle/governors/menu.c
>> @@ -308,18 +308,18 @@ static int menu_select(struct cpuidle_dr
>> get_typical_interval(data, predicted_us)) *
>> NSEC_PER_USEC;
>>
>> - if (tick_nohz_tick_stopped()) {
>> - /*
>> - * If the tick is already stopped, the cost of possible short
>> - * idle duration misprediction is much higher, because the CPU
>> - * may be stuck in a shallow idle state for a long time as a
>> - * result of it. In that case say we might mispredict and use
>> - * the known time till the closest timer event for the idle
>> - * state selection.
>> - */
>> - if (data->predicted_us < TICK_USEC)
>> - data->predicted_us = min_t(unsigned int, TICK_USEC,
>> - ktime_to_us(delta_next));
>> + /*
>> + * If the tick is already stopped, the cost of possible short idle
>> + * duration misprediction is much higher, because the CPU may be stuck
>> + * in a shallow idle state for a long time as a result of it. In that
>> + * case, say we might mispredict and use the known time till the closest
>> + * timer event for the idle state selection, unless that event is going
>> + * to occur within the tick time frame (in which case the CPU will be
>> + * woken up from whatever idle state it gets into soon enough anyway).
>> + */
>> + if (tick_nohz_tick_stopped() && data->predicted_us < TICK_USEC &&
>> + delta_next >= TICK_NSEC) {
>> + data->predicted_us = ktime_to_us(delta_next);
>> } else {
>> /*
>> * Use the performance multiplier and the user-configurable
>