Re: [PATCH v14 net-next 08/23] net/tcp: Add AO sign to RST packets

From: Oliver Sang
Date: Wed Oct 18 2023 - 03:32:34 EST


hi, Dmitry Safonov,

On Wed, Oct 18, 2023 at 12:19:56AM +0100, Dmitry Safonov wrote:
> On 10/17/23 09:37, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_net/core/sock.c" on:
> >
> > commit: df13d11e6a2a3cc5f973aca36f68f880fa42d55f ("[PATCH v14 net-next 08/23] net/tcp: Add AO sign to RST packets")
> > url: https://github.com/intel-lab-lkp/linux/commits/Dmitry-Safonov/net-tcp-Prepare-tcp_md5sig_pool-for-TCP-AO/20231010-071412
> > patch link: https://lore.kernel.org/all/20231009230722.76268-9-dima@xxxxxxxxxx/
> > patch subject: [PATCH v14 net-next 08/23] net/tcp: Add AO sign to RST packets
> >
> > in testcase: trinity
> > version: trinity-i386-abe9de86-1_20230429
> > with following parameters:
> >
> > runtime: 300s
> > group: group-02
> > nr_groups: 5
> >
> > test-description: Trinity is a linux system call fuzz tester.
> > test-url: http://codemonkey.org.uk/projects/trinity/
> >
> >
> > compiler: gcc-12
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > | Closes: https://lore.kernel.org/oe-lkp/202310171606.30e15ebe-oliver.sang@xxxxxxxxx
> >
> >
> > [ 221.348247][ T7133] BUG: sleeping function called from invalid context at net/core/sock.c:2978
> > [ 221.349875][ T7133] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 7133, name: trinity-c4
> > [ 221.351666][ T7133] preempt_count: 0, expected: 0
> > [ 221.352614][ T7133] RCU nest depth: 1, expected: 0
> > [ 221.353518][ T7133] 2 locks held by trinity-c4/7133:
> > [ 221.354530][ T7133] #0: ed8b5660 (sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_sendmsg (net/ipv4/tcp.c:1336)
> > [ 221.374314][ T7133] #1: c27dbb18 (rcu_read_lock){....}-{1:2}, at: inet6_csk_xmit (include/linux/rcupdate.h:747 net/ipv6/inet6_connection_sock.c:129)
> > [ 221.375906][ T7133] CPU: 1 PID: 7133 Comm: trinity-c4 Tainted: G W TN 6.6.0-rc4-01105-gdf13d11e6a2a #1
>
> Quite puzzling. It doesn't seem that it can happen to inet6_csk_xmit():
>
> : rcu_read_lock();
> : skb_dst_set_noref(skb, dst);
> :
> : /* Restore final destination back after routing done */
> : fl6.daddr = sk->sk_v6_daddr;
> :
> : res = ip6_xmit(sk, skb, &fl6, sk->sk_mark, rcu_dereference(np->opt),
> : np->tclass, sk->sk_priority);
> : rcu_read_unlock();
>
> So, I presumed the calltrace was for nested rcu_read_lock() case.
> Then I've looked at all return/goto cases, I couldn't find any
> unbalanced rcu_read_{,un}lock().
>
> Is this reproducible by any chance?

do you mean how often it could be reproduced?

we run the tests upon this commit and parent both up to 15 times.
from below, parent shows quite clean, but there are various issues while
running tests upon this patch.

8468a6f4f3143 net/tcp: Add tcp_parse_auth_options()
df13d11e6a2a3 net/tcp: Add AO sign to RST packets

8468a6f4f3143ba2 df13d11e6a2a3cc5f973aca36f6
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:15 87% 13:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_arch/x86/entry/common.c
:15 33% 5:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/dcache.c
:15 13% 2:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h
:15 60% 9:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
:15 7% 1:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
:15 7% 1:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c
:15 7% 1:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/sched/completion.c
:15 40% 6:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/task_work.c
:15 33% 5:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/iov_iter.c
:15 33% 5:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/strncpy_from_user.c
:15 20% 3:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/usercopy.c
:15 13% 2:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/vmscan.c
:15 100% 15:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_net/core/sock.c
:15 100% 15:15 dmesg.EIP:rcu_note_context_switch
:15 100% 15:15 dmesg.WARNING:at_kernel/rcu/tree_plugin.h:#rcu_note_context_switch
:15 100% 15:15 dmesg.WARNING:lock_held_when_returning_to_user_space
:15 100% 15:15 dmesg.is_leaving_the_kernel_with_locks_still_held


>
>
> > [ 221.377820][ T7133] Call Trace:
> > [ 221.378447][ T7133] dump_stack_lvl (lib/dump_stack.c:107)
> > [ 221.379373][ T7133] dump_stack (lib/dump_stack.c:114)
> > [ 221.380186][ T7133] __might_resched (kernel/sched/core.c:10188)
> > [ 221.381100][ T7133] __release_sock (include/linux/sched.h:2097 net/core/sock.c:2978)
> > [ 221.381960][ T7133] release_sock (net/core/sock.c:3520)
> > [ 221.382784][ T7133] inet_wait_for_connect (net/ipv4/af_inet.c:609)
> > [ 221.383763][ T7133] ? autoremove_wake_function (kernel/sched/wait.c:479)
> > [ 221.384757][ T7133] __inet_stream_connect (net/ipv4/af_inet.c:701 (discriminator 1))
> > [ 221.385741][ T7133] ? kmalloc_node_trace (mm/slab_common.c:1133)
> > [ 221.386702][ T7133] tcp_sendmsg_fastopen (net/ipv4/tcp.c:1026)
> > [ 221.387685][ T7133] tcp_sendmsg_locked (net/ipv4/tcp.c:1073)
> > [ 221.388642][ T7133] ? find_held_lock (kernel/locking/lockdep.c:5243)
> > [ 221.389536][ T7133] ? mark_held_locks (kernel/locking/lockdep.c:4273)
> > [ 221.390437][ T7133] ? lock_sock_nested (net/core/sock.c:3511)
> > [ 221.391359][ T7133] ? lock_sock_nested (net/core/sock.c:3511)
> > [ 221.392335][ T7133] tcp_sendmsg (net/ipv4/tcp.c:1336)
> > [ 221.393153][ T7133] inet6_sendmsg (net/ipv6/af_inet6.c:658 (discriminator 2))
> > [ 221.394010][ T7133] ____sys_sendmsg (net/socket.c:730 net/socket.c:745 net/socket.c:2558)
> > [ 221.394927][ T7133] ___sys_sendmsg (net/socket.c:2612)
> > [ 221.395844][ T7133] __sys_sendmsg (net/socket.c:2641)
> > [ 221.396671][ T7133] __ia32_sys_sendmsg (net/socket.c:2648)
> > [ 221.397562][ T7133] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178)
> > [ 221.398485][ T7133] do_fast_syscall_32 (arch/x86/entry/common.c:203)
> > [ 221.399401][ T7133] do_SYSENTER_32 (arch/x86/entry/common.c:247)
> > [ 221.404363][ T7133] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:840)
> > [ 221.405255][ T7133] EIP: 0xb7f59579
> > [ 221.405931][ T7133] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> > All code
> > ========
> > 0: b8 01 10 06 03 mov $0x3061001,%eax
> > 5: 74 b4 je 0xffffffffffffffbb
> > 7: 01 10 add %edx,(%rax)
> > 9: 07 (bad)
> > a: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
> > e: 10 08 adc %cl,(%rax)
> > 10: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
> > ...
> > 20: 00 51 52 add %dl,0x52(%rcx)
> > 23: 55 push %rbp
> > 24:* 89 e5 mov %esp,%ebp <-- trapping instruction
> > 26: 0f 34 sysenter
> > 28: cd 80 int $0x80
> > 2a: 5d pop %rbp
> > 2b: 5a pop %rdx
> > 2c: 59 pop %rcx
> > 2d: c3 ret
> > 2e: 90 nop
> > 2f: 90 nop
> > 30: 90 nop
> > 31: 90 nop
> > 32: 8d 76 00 lea 0x0(%rsi),%esi
> > 35: 58 pop %rax
> > 36: b8 77 00 00 00 mov $0x77,%eax
> > 3b: cd 80 int $0x80
> > 3d: 90 nop
> > 3e: 8d .byte 0x8d
> > 3f: 76 .byte 0x76
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 5d pop %rbp
> > 1: 5a pop %rdx
> > 2: 59 pop %rcx
> > 3: c3 ret
> > 4: 90 nop
> > 5: 90 nop
> > 6: 90 nop
> > 7: 90 nop
> > 8: 8d 76 00 lea 0x0(%rsi),%esi
> > b: 58 pop %rax
> > c: b8 77 00 00 00 mov $0x77,%eax
> > 11: cd 80 int $0x80
> > 13: 90 nop
> > 14: 8d .byte 0x8d
> > 15: 76 .byte 0x76
> > [ 221.409527][ T7133] EAX: ffffffda EBX: 00000137 ECX: 01ce1580 EDX: 240449b4
> > [ 221.410805][ T7133] ESI: 000000b1 EDI: 8b8b8b8b EBP: 08080808 ESP: bfc205fc
> > [ 221.412147][ T7133] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> > [ 221.481648][ T7133]
> > [ 221.482194][ T7133] ================================================
> > [ 221.483377][ T7133] WARNING: lock held when returning to user space!
> > [ 221.484574][ T7133] 6.6.0-rc4-01105-gdf13d11e6a2a #1 Tainted: G W TN
> > [ 221.485904][ T7133] ------------------------------------------------
> > [ 221.487044][ T7133] trinity-c4/7133 is leaving the kernel with locks still held!
> > [ 221.488448][ T7133] 1 lock held by trinity-c4/7133:
> > [ 221.489401][ T7133] #0: c27dbb18 (rcu_read_lock){....}-{1:2}, at: inet6_csk_xmit (include/linux/rcupdate.h:747 net/ipv6/inet6_connection_sock.c:129)
> > [ 221.491125][ T7133] ------------[ cut here ]------------
> > [ 221.501170][ T7133] Voluntary context switch within RCU read-side critical section!
> > [ 221.501214][ T7133] WARNING: CPU: 1 PID: 7133 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch (kernel/rcu/tree_plugin.h:320 (discriminator 11))
> > [ 221.504458][ T7133] Modules linked in: ipmi_msghandler uio_pdrv_genirq uio rtc_cmos processor fuse drm drm_panel_orientation_quirks configfs
> > [ 221.506701][ T7133] CPU: 1 PID: 7133 Comm: trinity-c4 Tainted: G W TN 6.6.0-rc4-01105-gdf13d11e6a2a #1
> > [ 221.508634][ T7133] EIP: rcu_note_context_switch (kernel/rcu/tree_plugin.h:320 (discriminator 11))
> > [ 221.509684][ T7133] Code: e9 87 fe ff ff 8d 74 26 00 8b 41 2c 89 45 ec e9 16 ff ff ff 8d 74 26 00 90 c6 05 09 88 94 c2 01 68 04 84 32 c2 e8 47 14 f5 ff <0f> 0b 5a e9 b0 fd ff ff 8d b4 26 00 00 00 00 81 e2 ff ff ff 7f 0f
> > All code
> > ========
> > 0: e9 87 fe ff ff jmp 0xfffffffffffffe8c
> > 5: 8d 74 26 00 lea 0x0(%rsi,%riz,1),%esi
> > 9: 8b 41 2c mov 0x2c(%rcx),%eax
> > c: 89 45 ec mov %eax,-0x14(%rbp)
> > f: e9 16 ff ff ff jmp 0xffffffffffffff2a
> > 14: 8d 74 26 00 lea 0x0(%rsi,%riz,1),%esi
> > 18: 90 nop
> > 19: c6 05 09 88 94 c2 01 movb $0x1,-0x3d6b77f7(%rip) # 0xffffffffc2948829
> > 20: 68 04 84 32 c2 push $0xffffffffc2328404
> > 25: e8 47 14 f5 ff call 0xfffffffffff51471
> > 2a:* 0f 0b ud2 <-- trapping instruction
> > 2c: 5a pop %rdx
> > 2d: e9 b0 fd ff ff jmp 0xfffffffffffffde2
> > 32: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
> > 39: 81 e2 ff ff ff 7f and $0x7fffffff,%edx
> > 3f: 0f .byte 0xf
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 0f 0b ud2
> > 2: 5a pop %rdx
> > 3: e9 b0 fd ff ff jmp 0xfffffffffffffdb8
> > 8: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
> > f: 81 e2 ff ff ff 7f and $0x7fffffff,%edx
> > 15: 0f .byte 0xf
> > [ 221.513148][ T7133] EAX: 00000000 EBX: e52d7c40 ECX: 00000000 EDX: 00000000
> > [ 221.514443][ T7133] ESI: 00000000 EDI: eadb5640 EBP: ead7df28 ESP: ead7df10
> > [ 221.515747][ T7133] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010046
> > [ 221.517184][ T7133] CR0: 80050033 CR2: 01ce68cc CR3: 2ae6f000 CR4: 00000690
> > [ 221.518449][ T7133] Call Trace:
> > [ 221.519074][ T7133] ? show_regs (arch/x86/kernel/dumpstack.c:478)
> > [ 221.519826][ T7133] ? rcu_note_context_switch (kernel/rcu/tree_plugin.h:320 (discriminator 11))
> > [ 221.520887][ T7133] ? __warn (kernel/panic.c:673)
> > [ 221.521663][ T7133] ? rcu_note_context_switch (kernel/rcu/tree_plugin.h:320 (discriminator 11))
> > [ 221.522722][ T7133] ? report_bug (lib/bug.c:201 lib/bug.c:219)
> > [ 221.523615][ T7133] ? exc_overflow (arch/x86/kernel/traps.c:250)
> > [ 221.524517][ T7133] ? handle_bug (arch/x86/kernel/traps.c:216)
> > [ 221.525240][ T7133] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
> > [ 221.526115][ T7133] ? handle_exception (arch/x86/entry/entry_32.S:1056)
> > [ 221.527035][ T7133] ? exc_overflow (arch/x86/kernel/traps.c:250)
> > [ 221.527882][ T7133] ? rcu_note_context_switch (kernel/rcu/tree_plugin.h:320 (discriminator 11))
> > [ 221.528945][ T7133] ? exc_overflow (arch/x86/kernel/traps.c:250)
> > [ 221.529743][ T7133] ? rcu_note_context_switch (kernel/rcu/tree_plugin.h:320 (discriminator 11))
> > [ 221.530784][ T7133] __schedule (arch/x86/include/asm/preempt.h:80 (discriminator 3) kernel/sched/core.c:556 (discriminator 3) kernel/sched/sched.h:1372 (discriminator 3) kernel/sched/sched.h:1681 (discriminator 3) kernel/sched/core.c:6612 (discriminator 3))
> > [ 221.531573][ T7133] ? exit_to_user_mode_prepare (kernel/entry/common.c:158 kernel/entry/common.c:204)
> > [ 221.532627][ T7133] schedule (arch/x86/include/asm/preempt.h:85 (discriminator 1) kernel/sched/core.c:6772 (discriminator 1))
> > [ 221.533358][ T7133] exit_to_user_mode_prepare (kernel/entry/common.c:161 kernel/entry/common.c:204)
> > [ 221.534382][ T7133] ? sysvec_reboot (arch/x86/kernel/smp.c:269)
> > [ 221.535255][ T7133] irqentry_exit_to_user_mode (kernel/entry/common.c:130 kernel/entry/common.c:311)
> > [ 221.536302][ T7133] irqentry_exit (kernel/entry/common.c:445)
> > [ 221.537101][ T7133] sysvec_reschedule_ipi (arch/x86/kernel/smp.c:269)
> > [ 221.538079][ T7133] handle_exception (arch/x86/entry/entry_32.S:1056)
> > [ 221.538990][ T7133] EIP: 0xb7f59579
> > [ 221.539674][ T7133] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> > All code
> > ========
> > 0: b8 01 10 06 03 mov $0x3061001,%eax
> > 5: 74 b4 je 0xffffffffffffffbb
> > 7: 01 10 add %edx,(%rax)
> > 9: 07 (bad)
> > a: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
> > e: 10 08 adc %cl,(%rax)
> > 10: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
> > ...
> > 20: 00 51 52 add %dl,0x52(%rcx)
> > 23: 55 push %rbp
> > 24:* 89 e5 mov %esp,%ebp <-- trapping instruction
> > 26: 0f 34 sysenter
> > 28: cd 80 int $0x80
> > 2a: 5d pop %rbp
> > 2b: 5a pop %rdx
> > 2c: 59 pop %rcx
> > 2d: c3 ret
> > 2e: 90 nop
> > 2f: 90 nop
> > 30: 90 nop
> > 31: 90 nop
> > 32: 8d 76 00 lea 0x0(%rsi),%esi
> > 35: 58 pop %rax
> > 36: b8 77 00 00 00 mov $0x77,%eax
> > 3b: cd 80 int $0x80
> > 3d: 90 nop
> > 3e: 8d .byte 0x8d
> > 3f: 76 .byte 0x76
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 5d pop %rbp
> > 1: 5a pop %rdx
> > 2: 59 pop %rcx
> > 3: c3 ret
> > 4: 90 nop
> > 5: 90 nop
> > 6: 90 nop
> > 7: 90 nop
> > 8: 8d 76 00 lea 0x0(%rsi),%esi
> > b: 58 pop %rax
> > c: b8 77 00 00 00 mov $0x77,%eax
> > 11: cd 80 int $0x80
> > 13: 90 nop
> > 14: 8d .byte 0x8d
> > 15: 76 .byte 0x76
> > [ 221.543249][ T7133] EAX: ffffff91 EBX: 00000137 ECX: bfc205fc EDX: b7f59579
> > [ 221.544585][ T7133] ESI: 000000b1 EDI: 8b8b8b8b EBP: 08080808 ESP: bfc205fc
> > [ 221.545872][ T7133] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> > [ 221.547230][ T7133] ? sysvec_reboot (arch/x86/kernel/smp.c:269)
> > [ 221.548133][ T7133] irq event stamp: 10576
> > [ 221.548909][ T7133] hardirqs last enabled at (10575): _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202)
> > [ 221.550576][ T7133] hardirqs last disabled at (10576): exit_to_user_mode_prepare (include/linux/entry-common.h:191 kernel/entry/common.c:181 kernel/entry/common.c:204)
> > [ 221.552434][ T7133] softirqs last enabled at (10570): release_sock (net/core/sock.c:3528)
> > [ 221.553958][ T7133] softirqs last disabled at (10568): release_sock (net/core/sock.c:3517)
> > [ 221.555538][ T7133] ---[ end trace 0000000000000000 ]---
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20231017/202310171606.30e15ebe-oliver.sang@xxxxxxxxx
> >
> >
> >
>
> Thanks,
> Dmitry
>