refcount_warn_saturate WARNING (was: Re: cls_flower: Fix the behavior using port ranges with hw-offload)

From: Geert Uytterhoeven
Date: Thu Dec 19 2019 - 05:12:28 EST


Hi Komachi-san,

On Sun, Dec 8, 2019 at 10:40 PM Linux Kernel Mailing List
<linux-kernel@xxxxxxxxxxxxxxx> wrote:
> Commit: 8ffb055beae58574d3e77b4bf9d4d15eace1ca27
> Parent: 2f23cd42e19c22c24ff0e221089b7b6123b117c5
> Refname: refs/heads/master
> Web: https://git.kernel.org/torvalds/c/8ffb055beae58574d3e77b4bf9d4d15eace1ca27
> Author: Yoshiki Komachi <komachi.yoshiki@xxxxxxxxx>
> AuthorDate: Tue Dec 3 19:40:12 2019 +0900
> Committer: David S. Miller <davem@xxxxxxxxxxxxx>
> CommitDate: Tue Dec 3 11:55:46 2019 -0800
>
> cls_flower: Fix the behavior using port ranges with hw-offload
>
> The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify
> packets using port ranges") had added filtering based on port ranges
> to tc flower. However the commit missed necessary changes in hw-offload
> code, so the feature gave rise to generating incorrect offloaded flow
> keys in NIC.
>
> One more detailed example is below:
>
> $ tc qdisc add dev eth0 ingress
> $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
> dst_port 100-200 action drop
>
> With the setup above, an exact match filter with dst_port == 0 will be
> installed in NIC by hw-offload. IOW, the NIC will have a rule which is
> equivalent to the following one.
>
> $ tc qdisc add dev eth0 ingress
> $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
> dst_port 0 action drop
>
> The behavior was caused by the flow dissector which extracts packet
> data into the flow key in the tc flower. More specifically, regardless
> of exact match or specified port ranges, fl_init_dissector() set the
> FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port
> numbers from skb in skb_flow_dissect() called by fl_classify(). Note
> that device drivers received the same struct flow_dissector object as
> used in skb_flow_dissect(). Thus, offloaded drivers could not identify
> which of these is used because the FLOW_DISSECTOR_KEY_PORTS flag was
> set to struct flow_dissector in either case.
>
> This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
> tp_range field in struct fl_flow_key to recognize which filters are applied
> to offloaded drivers. At this point, when filters based on port ranges
> passed to drivers, drivers return the EOPNOTSUPP error because they do
> not support the feature (the newly created FLOW_DISSECTOR_KEY_PORTS_RANGE
> flag).
>
> Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
> Signed-off-by: Yoshiki Komachi <komachi.yoshiki@xxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

I still see the below warning on m68k/ARAnyM during boot with v5.5-rc2
and next-20191219.
Reverting commit 8ffb055beae58574 ("cls_flower: Fix the behavior using
port ranges with hw-offload") fixes that.

As this is networking, perhaps this is seen on big-endian only?
Or !CONFIG_SMP?

Do you have a clue?
I'm especially worried as this commit is already being backported to stable.
Thanks!

EXT4-fs (sda1): re-mounted. Opts:
EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
ext3 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x54/0x100
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.5.0-rc2-next-20191219-atari #246
Stack from 00c31e88:
00c31e88 00387ebc 00023d96 0039b5b0 0000001c 00000009 00a6d680 00023dda
0039b5b0 0000001c 001a9658 00000009 00000000 00c31ed0 00000001 00000000
04208040 0000000a 0039b5eb 00c31ef0 00c30000 001a9658 0039b5b0 0000001c
00000009 0039b5eb 0027071c 00326d1c 00000003 00326cd8 00271840 00000001
00326d1c 00000000 00a6d680 0024339c 00a6d680 00000000 00000200 000ab5e6
00048632 00a6d680 0000000a 00400d78 003fbc00 002f88a6 00400d78 002f8b5e
Call Trace: [<00023d96>] __warn+0xb2/0xb4
[<00023dda>] warn_slowpath_fmt+0x42/0x64
[<001a9658>] refcount_warn_saturate+0x54/0x100
[<001a9658>] refcount_warn_saturate+0x54/0x100
[<0027071c>] refcount_sub_and_test.constprop.77+0x38/0x3e
[<00271840>] ipv4_dst_destroy+0x24/0x42
[<0024339c>] dst_destroy+0x40/0xae
[<000ab5e6>] kfree+0x0/0x3e
[<00048632>] rcu_process_callbacks+0x9a/0x9c
[<002f88a6>] __do_softirq+0x146/0x182
[<002f8b5e>] schedule+0x0/0xb4
[<00035f3e>] kthread_parkme+0x0/0x10
[<00035a86>] __init_completion+0x0/0x20
[<0003836c>] smpboot_thread_fn+0x0/0x100
[<000360a2>] kthread_should_stop+0x0/0x12
[<00036096>] kthread_should_park+0x0/0xc
[<00025c28>] run_ksoftirqd+0x12/0x20
[<00038466>] smpboot_thread_fn+0xfa/0x100
[<00024914>] do_exit+0x0/0x6d4
[<0003f620>] complete+0x0/0x34
[<0003665c>] kthread+0xb2/0xbc
[<00035a86>] __init_completion+0x0/0x20
[<000365aa>] kthread+0x0/0xbc
[<00002ab8>] ret_from_kernel_thread+0xc/0x14
---[ end trace cddc6a39eb5bb237 ]---

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds