Re: INFO: rcu detected stall in br_handle_frame (2)

From: Eric Dumazet
Date: Sat Dec 28 2019 - 10:01:56 EST




On 12/28/19 3:15 AM, Florian Westphal wrote:
> syzbot <syzbot+dc9071cc5a85950bdfce@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> [ CC Eric, fq related ]
>
>> syzbot found the following crash on:
>>
>> HEAD commit: 7e0165b2 Merge branch 'akpm' (patches from Andrew)
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=116ec09ee00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=1b59a3066828ac4c
>> dashboard link: https://syzkaller.appspot.com/bug?extid=dc9071cc5a85950bdfce
>> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=159182c1e00000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1221218ee00000
>>
>> Bisection is inconclusive: the bug happens on the oldest tested release.
>>
>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=158224c1e00000
>> final crash: https://syzkaller.appspot.com/x/report.txt?x=178224c1e00000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=138224c1e00000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+dc9071cc5a85950bdfce@xxxxxxxxxxxxxxxxxxxxxxxxx
>>
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> (detected by 0, t=10502 jiffies, g=10149, q=201)
>> rcu: All QSes seen, last rcu_preempt kthread activity 10502
>> (4294978441-4294967939), jiffies_till_next_fqs=1, root ->qsmask 0x0
>> sshd R running task 26584 10034 9965 0x00000008
>> Call Trace:
>> <IRQ>
>> sched_show_task kernel/sched/core.c:5954 [inline]
> [..]
>
> The reproducer sets up 'fq' sched with TCA_FQ_QUANTUM == 0x80000000
>
> This causes infinite loop in fq_dequeue:
>
> 	if (f->credit <= 0) {
> 		f->credit += q->quantum;
> 		goto begin;
> 	}
>
> ... because f->credit is either 0 or -2147483648.
>
> Eric, what is a 'sane' ->quantum value?
>
> One could simply add a 'quantum > 0 && quantum < INT_MAX'
> constraint afaics.
>
> If you don't have a better idea/suggestion for an upperlimit INT_MAX
> would be enough to prevent perpetual <= 0 condition.
>

Thanks Florian for the analysis.

I guess we could use a conservative upper bound of (1 << 20) bytes
(1 MB, about sixteen 64KB packets), and apply the same check to
TCA_FQ_INITIAL_QUANTUM.
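
To illustrate why the loop never terminates with quantum == 0x80000000,
here is a minimal user-space sketch. It is not kernel code: the helper
names refill_makes_progress() and quantum_is_sane() are hypothetical, and
it assumes a compiler such as gcc or clang where converting an
out-of-range unsigned value to int32_t wraps around.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of sch_fq.c: replays the refill step
 * "credit += quantum" from fq_dequeue() at most max_rounds times and
 * reports whether credit ever becomes positive.
 */
static bool refill_makes_progress(int32_t credit, uint32_t quantum,
				  int max_rounds)
{
	for (int i = 0; i < max_rounds; i++) {
		if (credit > 0)
			return true;
		/*
		 * Add in unsigned arithmetic so the wraparound is well
		 * defined, then convert back (assumed to wrap), mirroring
		 * an "int += u32" statement.
		 */
		credit = (int32_t)((uint32_t)credit + quantum);
	}
	return credit > 0;
}

/* The range check from the proposed patch, as a standalone predicate. */
static bool quantum_is_sane(uint32_t quantum)
{
	return quantum > 0 && quantum <= (1u << 20);
}
```

With quantum == 0x80000000, credit only alternates between 0 and
INT32_MIN, so refill_makes_progress(0, 0x80000000u, 1000) returns false.
With quantum == (1 << 20), the first refill already makes credit
positive, and quantum_is_sane() accepts that value while rejecting both
0 and 0x80000000.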

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index ff4c5e9d0d7778d86f20f4bd67cc627eed0713d9..12f1d1c6044fac9db987f7ce3a50a7e2c711358b 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -786,15 +786,20 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
 	if (tb[TCA_FQ_QUANTUM]) {
 		u32 quantum = nla_get_u32(tb[TCA_FQ_QUANTUM]);

-		if (quantum > 0)
+		if (quantum > 0 && quantum <= (1 << 20))
 			q->quantum = quantum;
 		else
 			err = -EINVAL;
 	}

-	if (tb[TCA_FQ_INITIAL_QUANTUM])
-		q->initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
+	if (tb[TCA_FQ_INITIAL_QUANTUM]) {
+		u32 quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);

+		if (quantum > 0 && quantum <= (1 << 20))
+			q->initial_quantum = quantum;
+		else
+			err = -EINVAL;
+	}
 	if (tb[TCA_FQ_FLOW_DEFAULT_RATE])
 		pr_warn_ratelimited("sch_fq: defrate %u ignored.\n",
 				    nla_get_u32(tb[TCA_FQ_FLOW_DEFAULT_RATE]));