[linus:master] [sched] 851a723e45: WARNING:possible_circular_locking_dependency_detected

From: kernel test robot
Date: Mon Oct 30 2023 - 11:40:23 EST




Hello,

kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:

commit: 851a723e45d1c4c8f6f7b0d2cfbc5f53690bb4e9 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 56567a20b22bdbf85c3e55eee3bf2bd23fa2f108]
[test failed on linux-next/master 66f1e1ea3548378ff6387b1ce0b40955d54e86aa]

in testcase: rcutorture
version:
with following parameters:

runtime: 300s
test: cpuhotplug
torture_type: trivial



compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


please be aware that the issue doesn't always happen, ~50% rate:

da019032819a1f09 851a723e45d1c4c8f6f7b0d2cfb
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:44 45% 20:44 dmesg.WARNING:possible_circular_locking_dependency_detected
:44 45% 20:44 dmesg.WARNING:possible_circular_locking_dependency_detected_migration_is_trying_to_acquire_lock:at:rcu_nocb_try_bypass_but_task_is_already_holding_lock:at:__balance_push_cpu_stop/0x



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@xxxxxxxxx



[ 708.890347][ T22]
[ 708.890648][ T22] ======================================================
[ 708.891276][ T22] WARNING: possible circular locking dependency detected
[ 708.891882][ T22] 6.1.0-rc2-00015-g851a723e45d1 #1 Tainted: G W
[ 708.892549][ T22] ------------------------------------------------------
[ 708.893234][ T22] migration/1/22 is trying to acquire lock:
[ 708.893792][ T22] ffff88839d3f2478 (&rdp->nocb_lock){-.-.}-{2:2}, at: rcu_nocb_try_bypass+0x71b/0x990
[ 708.894716][ T22]
[ 708.894716][ T22] but task is already holding lock:
[ 708.895414][ T22] ffff88839d3f1518 (&rq->__lock){-.-.}-{2:2}, at: __balance_push_cpu_stop+0xa0/0x470
[ 708.896311][ T22]
[ 708.896311][ T22] which lock already depends on the new lock.
[ 708.896311][ T22]
[ 708.897274][ T22]
[ 708.897274][ T22] the existing dependency chain (in reverse order) is:
[ 708.898122][ T22]
[ 708.898122][ T22] -> #2 (&rq->__lock){-.-.}-{2:2}:
[ 708.898816][ T22] __lock_acquire+0x9d7/0x1330
[ 708.899332][ T22] lock_acquire+0x193/0x500
[ 708.899802][ T22] _raw_spin_lock_nested+0x2e/0x70
[ 708.900317][ T22] resched_cpu+0x77/0x150
[ 708.900802][ T22] rcu_implicit_dynticks_qs+0x5bc/0x850
[ 708.901391][ T22] force_qs_rnp+0x40b/0x790
[ 708.901873][ T22] rcu_gp_fqs_loop+0x63a/0xa40
[ 708.902393][ T22] rcu_gp_kthread+0x23b/0x380
[ 708.902886][ T22] kthread+0x246/0x2d0
[ 708.903320][ T22] ret_from_fork+0x1f/0x30
[ 708.903777][ T22]
[ 708.903777][ T22] -> #1 (rcu_node_0){-.-.}-{2:2}:
[ 708.904456][ T22] __lock_acquire+0x9d7/0x1330
[ 708.904971][ T22] lock_acquire+0x193/0x500
[ 708.905470][ T22] _raw_spin_lock+0x30/0x70
[ 708.905961][ T22] nocb_gp_wait+0x349/0x1d50
[ 708.906460][ T22] rcu_nocb_gp_kthread+0xa3/0x230
[ 708.907000][ T22] kthread+0x246/0x2d0
[ 708.907448][ T22] ret_from_fork+0x1f/0x30
[ 708.907923][ T22]
[ 708.907923][ T22] -> #0 (&rdp->nocb_lock){-.-.}-{2:2}:
[ 708.908664][ T22] check_prev_add+0x94/0xbc0
[ 708.909167][ T22] validate_chain+0x701/0x950
[ 708.909678][ T22] __lock_acquire+0x9d7/0x1330
[ 708.910197][ T22] lock_acquire+0x193/0x500
[ 708.910682][ T22] _raw_spin_lock+0x30/0x70
[ 708.911174][ T22] rcu_nocb_try_bypass+0x71b/0x990
[ 708.911715][ T22] call_rcu+0x22d/0x840
[ 708.912172][ T22] __kmem_cache_free+0x1c9/0x460
[ 708.912691][ T22] select_fallback_rq+0x546/0x690
[ 708.913226][ T22] __balance_push_cpu_stop+0x21d/0x470
[ 708.913791][ T22] cpu_stopper_thread+0x206/0x450
[ 708.914328][ T22] smpboot_thread_fn+0x38a/0x870
[ 708.914852][ T22] kthread+0x246/0x2d0
[ 708.915297][ T22] ret_from_fork+0x1f/0x30
[ 708.915769][ T22]
[ 708.915769][ T22] other info that might help us debug this:
[ 708.915769][ T22]
[ 708.916718][ T22] Chain exists of:
[ 708.916718][ T22] &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
[ 708.916718][ T22]
[ 708.917838][ T22] Possible unsafe locking scenario:
[ 708.917838][ T22]
[ 708.918541][ T22] CPU0 CPU1
[ 708.919055][ T22] ---- ----
[ 708.919543][ T22] lock(&rq->__lock);
[ 708.919927][ T22] lock(rcu_node_0);
[ 708.920535][ T22] lock(&rq->__lock);
[ 708.921146][ T22] lock(&rdp->nocb_lock);
[ 708.921576][ T22]
[ 708.921576][ T22] *** DEADLOCK ***
[ 708.921576][ T22]
[ 708.922329][ T22] 2 locks held by migration/1/22:
[ 708.922796][ T22] #0: ffff88812e1ec000 (&p->pi_lock){-.-.}-{2:2}, at: __balance_push_cpu_stop+0x8b/0x470
[ 708.923764][ T22] #1: ffff88839d3f1518 (&rq->__lock){-.-.}-{2:2}, at: __balance_push_cpu_stop+0xa0/0x470
[ 708.924711][ T22]
[ 708.924711][ T22] stack backtrace:
[ 708.925280][ T22] CPU: 1 PID: 22 Comm: migration/1 Tainted: G W 6.1.0-rc2-00015-g851a723e45d1 #1
[ 708.926248][ T22] Stopper: __balance_push_cpu_stop+0x0/0x470 <- balance_push+0x218/0x360
[ 708.927040][ T22] Call Trace:
[ 708.927351][ T22] <TASK>
[ 708.927632][ T22] dump_stack_lvl+0x9c/0xea
[ 708.928053][ T22] check_noncircular+0x28e/0x330
[ 708.928517][ T22] ? kvm_sched_clock_read+0x14/0x30
[ 708.929024][ T22] ? print_circular_bug+0x470/0x470
[ 708.929546][ T22] ? housekeeping_affine+0xa0/0xa0
[ 708.930071][ T22] ? alloc_chain_hlocks+0x21f/0x6f0
[ 708.930572][ T22] check_prev_add+0x94/0xbc0
[ 708.931053][ T22] ? lockdep_lock+0xa8/0x190
[ 708.931491][ T22] ? usage_skip+0x170/0x170
[ 708.931936][ T22] validate_chain+0x701/0x950
[ 708.932378][ T22] ? check_prev_add+0xbc0/0xbc0
[ 708.932842][ T22] ? mark_lock+0x36/0x3c0
[ 708.933282][ T22] __lock_acquire+0x9d7/0x1330
[ 708.933749][ T22] lock_acquire+0x193/0x500
[ 708.934198][ T22] ? rcu_nocb_try_bypass+0x71b/0x990
[ 708.934717][ T22] ? lock_downgrade+0x120/0x120
[ 708.935188][ T22] _raw_spin_lock+0x30/0x70
[ 708.935631][ T22] ? rcu_nocb_try_bypass+0x71b/0x990
[ 708.936143][ T22] rcu_nocb_try_bypass+0x71b/0x990
[ 708.936641][ T22] ? rcu_implicit_dynticks_qs+0x850/0x850
[ 708.937195][ T22] ? find_and_remove_object+0x1b0/0x1b0
[ 708.937700][ T22] call_rcu+0x22d/0x840
[ 708.938095][ T22] ? rcu_nocb_try_bypass+0x990/0x990
[ 708.938591][ T22] ? select_fallback_rq+0x546/0x690
[ 708.939097][ T22] __kmem_cache_free+0x1c9/0x460
[ 708.939580][ T22] select_fallback_rq+0x546/0x690
[ 708.940067][ T22] ? __do_set_cpus_allowed+0x730/0x730
[ 708.940593][ T22] ? update_rq_clock+0xd8/0x890
[ 708.941060][ T22] __balance_push_cpu_stop+0x21d/0x470
[ 708.941605][ T22] ? tracer_hardirqs_on+0x3b/0x3f0
[ 708.942101][ T22] ? migration_cpu_stop+0xb60/0xb60
[ 708.942582][ T22] ? _raw_spin_unlock_irq+0x24/0x40
[ 708.943068][ T22] ? migration_cpu_stop+0xb60/0xb60
[ 708.943566][ T22] cpu_stopper_thread+0x206/0x450
[ 708.944084][ T22] ? stop_core_cpuslocked+0x1c0/0x1c0
[ 708.944611][ T22] ? smpboot_thread_fn+0x6f/0x870
[ 708.945096][ T22] smpboot_thread_fn+0x38a/0x870
[ 708.945604][ T22] ? find_next_bit+0x30/0x30
[ 708.946058][ T22] kthread+0x246/0x2d0
[ 708.946467][ T22] ? kthread_complete_and_exit+0x40/0x40
[ 708.947015][ T22] ret_from_fork+0x1f/0x30
[ 708.947449][ T22] </TASK>
[ 708.974879][ T3910] smpboot: CPU 1 is now offline
[ 709.052971][ T3910] x86: Booting SMP configuration:
[ 709.053504][ T3910] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 709.065107][ T0] masked ExtINT on CPU#1
[ 709.177297][ T3910] smpboot: CPU 1 is now offline
[ 709.236384][ T3910] x86: Booting SMP configuration:
[ 709.236914][ T3910] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 709.248562][ T0] masked ExtINT on CPU#1
[ 709.357577][ T3910] smpboot: CPU 1 is now offline



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231030/202310302207.a25f1a30-oliver.sang@xxxxxxxxx



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki