Re: [PATCH -next] mm/hotplug: silence a lockdep splat with printk()

From: Qian Cai
Date: Wed Jan 15 2020 - 12:16:25 EST




> On Jan 15, 2020, at 12:02 PM, Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Wed 2020-01-15 06:49:03, Qian Cai wrote:
>>
>>
>>> On Jan 15, 2020, at 4:52 AM, Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>
>>> I could understand that Michal is against hack in -mm code that
>>> would just hide a false positive warning.
>>
>> Well, I don't have any confidence to say everything this patch is
>> trying to fix is false positives.
>
> You are looking at this from the wrong angle. AFAIK, all lockdep reports
> pasted in the thread mentioned below were false positives. Now, this patch
> complicates already complicated -mm code to hide the warning
> and fix theoretical problems.

What makes you say all of those are false positives?

[12471.671123] WARNING: possible circular locking dependency detected
[12471.677995] 5.4.0-rc6-next-20191111+ #2 Tainted: G W L
[12471.684950] ------------------------------------------------------
[12471.691819] read_all/70259 is trying to acquire lock:
[12471.697559] ffff00977b407290 (&(&zone->lock)->rlock){..-.}, at: rmqueue+0xf1c/0x2050
[12471.706005]
but task is already holding lock:
[12471.713219] 69ff00082000fd18 (&(&n->list_lock)->rlock){-.-.}, at: list_locations+0x104/0x4b4
[12471.722364]
which lock already depends on the new lock.

[12471.732617]
the existing dependency chain (in reverse order) is:
[12471.741480]
-> #4 (&(&n->list_lock)->rlock){-.-.}:
[12471.749150] lock_acquire+0x320/0x360
[12471.754028] _raw_spin_lock+0x64/0x80
[12471.758903] get_partial_node+0x48/0x208
[12471.764037] ___slab_alloc+0x1b8/0x640
[12471.768997] __kmalloc+0x3c4/0x490
[12471.773623] __tty_buffer_request_room+0x118/0x1f8
[12471.779627] tty_insert_flip_string_fixed_flag+0x6c/0x144
[12471.786240] pty_write+0x80/0xd0
[12471.790680] n_tty_write+0x450/0x60c
[12471.795466] tty_write+0x338/0x474
[12471.800082] __vfs_write+0x88/0x214
[12471.804780] vfs_write+0x12c/0x1a4
[12471.809393] redirected_tty_write+0x90/0xdc
[12471.814790] do_loop_readv_writev+0x140/0x180
[12471.820357] do_iter_write+0xe0/0x10c
[12471.825230] vfs_writev+0x134/0x1cc
[12471.829929] do_writev+0xbc/0x130
[12471.834455] __arm64_sys_writev+0x58/0x8c
[12471.839688] el0_svc_handler+0x170/0x240
[12471.844829] el0_sync_handler+0x150/0x250
[12471.850049] el0_sync+0x164/0x180
[12471.854572]
-> #3 (&(&port->lock)->rlock){-.-.}:
[12471.862057] lock_acquire+0x320/0x360
[12471.866930] _raw_spin_lock_irqsave+0x7c/0x9c
[12471.872498] tty_port_tty_get+0x24/0x60
[12471.877545] tty_port_default_wakeup+0x1c/0x3c
[12471.883199] tty_port_tty_wakeup+0x34/0x40
[12471.888510] uart_write_wakeup+0x28/0x44
[12471.893644] pl011_tx_chars+0x1b8/0x270
[12471.898696] pl011_start_tx+0x24/0x70
[12471.903570] __uart_start+0x5c/0x68
[12471.908269] uart_write+0x164/0x1c8
[12471.912969] do_output_char+0x33c/0x348
[12471.918016] n_tty_write+0x4bc/0x60c
[12471.922802] tty_write+0x338/0x474
[12471.927414] redirected_tty_write+0xc0/0xdc
[12471.932808] do_loop_readv_writev+0x140/0x180
[12471.938375] do_iter_write+0xe0/0x10c
[12471.943248] vfs_writev+0x134/0x1cc
[12471.947950] do_writev+0xbc/0x130
[12471.952478] __arm64_sys_writev+0x58/0x8c
[12471.957700] el0_svc_handler+0x170/0x240
[12471.962833] el0_sync_handler+0x150/0x250
[12471.968053] el0_sync+0x164/0x180
[12471.972576]
-> #2 (&port_lock_key){-.-.}:
[12471.979453] lock_acquire+0x320/0x360
[12471.984326] _raw_spin_lock+0x64/0x80
[12471.989200] pl011_console_write+0xec/0x2cc
[12471.994595] console_unlock+0x794/0x96c
[12471.999641] vprintk_emit+0x260/0x31c
[12472.004513] vprintk_default+0x54/0x7c
[12472.009475] vprintk_func+0x218/0x254
[12472.014358] printk+0x7c/0xa4
[12472.018536] register_console+0x734/0x7b0
[12472.023757] uart_add_one_port+0x734/0x834
[12472.029065] pl011_register_port+0x6c/0xac
[12472.034372] sbsa_uart_probe+0x234/0x2ec
[12472.039508] platform_drv_probe+0xd4/0x124
[12472.044821] really_probe+0x250/0x71c
[12472.049694] driver_probe_device+0xb4/0x200
[12472.055090] __device_attach_driver+0xd8/0x188
[12472.060744] bus_for_each_drv+0xbc/0x110
[12472.065878] __device_attach+0x120/0x220
[12472.071012] device_initial_probe+0x20/0x2c
[12472.076405] bus_probe_device+0x54/0x100
[12472.081539] device_add+0xae8/0xc2c
[12472.086242] platform_device_add+0x278/0x3b8
[12472.091725] platform_device_register_full+0x238/0x2ac
[12472.098079] acpi_create_platform_device+0x2dc/0x3a8
[12472.104263] acpi_bus_attach+0x390/0x3cc
[12472.109397] acpi_bus_attach+0x108/0x3cc
[12472.114531] acpi_bus_attach+0x108/0x3cc
[12472.119664] acpi_bus_attach+0x108/0x3cc
[12472.124798] acpi_bus_scan+0x7c/0xb0
[12472.129588] acpi_scan_init+0xe4/0x304
[12472.134548] acpi_init+0x100/0x114
[12472.139160] do_one_initcall+0x348/0x6a0
[12472.144299] do_initcall_level+0x190/0x1fc
[12472.149606] do_basic_setup+0x34/0x4c
[12472.154479] kernel_init_freeable+0x19c/0x260
[12472.160051] kernel_init+0x18/0x338
[12472.164751] ret_from_fork+0x10/0x18
[12472.169534]
-> #1 (console_owner){-...}:
[12472.176323] lock_acquire+0x320/0x360
[12472.181196] console_lock_spinning_enable+0x6c/0x7c
[12472.187284] console_unlock+0x4f8/0x96c
[12472.192330] vprintk_emit+0x260/0x31c
[12472.197202] vprintk_default+0x54/0x7c
[12472.202162] vprintk_func+0x218/0x254
[12472.207035] printk+0x7c/0xa4
[12472.211218] get_random_u64+0x1c4/0x1dc
[12472.216266] shuffle_pick_tail+0x40/0xac
[12472.221408] __free_one_page+0x424/0x710
[12472.226541] free_one_page+0x70/0x120
[12472.231415] __free_pages_ok+0x61c/0xa94
[12472.236550] __free_pages_core+0x1bc/0x294
[12472.241861] memblock_free_pages+0x38/0x48
[12472.247171] __free_pages_memory+0xcc/0xfc
[12472.252478] __free_memory_core+0x70/0x78
[12472.257699] free_low_memory_core_early+0x148/0x18c
[12472.263787] memblock_free_all+0x18/0x54
[12472.268921] mem_init+0xb4/0x17c
[12472.273360] mm_init+0x14/0x38
[12472.277625] start_kernel+0x19c/0x530
[12472.282495]
-> #0 (&(&zone->lock)->rlock){..-.}:
[12472.289977] validate_chain+0xf6c/0x2e2c
[12472.295111] __lock_acquire+0x868/0xc2c
[12472.300159] lock_acquire+0x320/0x360
[12472.305032] _raw_spin_lock_irqsave+0x7c/0x9c
[12472.310599] rmqueue+0xf1c/0x2050
[12472.315128] get_page_from_freelist+0x474/0x688
[12472.320869] __alloc_pages_nodemask+0x3b4/0x18dc
[12472.326707] alloc_pages_current+0xd0/0xe0
[12472.332014] __get_free_pages+0x24/0x6c
[12472.337061] alloc_loc_track+0x38/0x80
[12472.342022] process_slab+0x228/0x544
[12472.346895] list_locations+0x158/0x4b4
[12472.351942] alloc_calls_show+0x38/0x48
[12472.356991] slab_attr_show+0x38/0x54
[12472.361876] sysfs_kf_seq_show+0x198/0x2d4
[12472.367184] kernfs_seq_show+0xa4/0xcc
[12472.372150] seq_read+0x394/0x918
[12472.376676] kernfs_fop_read+0xa8/0x334
[12472.381722] __vfs_read+0x88/0x20c
[12472.386334] vfs_read+0xdc/0x110
[12472.390773] ksys_read+0xb0/0x120
[12472.395298] __arm64_sys_read+0x54/0x88
[12472.400345] el0_svc_handler+0x170/0x240
[12472.405479] el0_sync_handler+0x150/0x250
[12472.410699] el0_sync+0x164/0x180
[12472.415223]
other info that might help us debug this:

[12472.425304] Chain exists of:
&(&zone->lock)->rlock --> &(&port->lock)->rlock --> &(&n->list_lock)->rlock

[12472.439914] Possible unsafe locking scenario:

[12472.447216] CPU0 CPU1
[12472.452434] ---- ----
[12472.457650] lock(&(&n->list_lock)->rlock);
[12472.462610] lock(&(&port->lock)->rlock);
[12472.469914] lock(&(&n->list_lock)->rlock);
[12472.477390] lock(&(&zone->lock)->rlock);
[12472.482175]
*** DEADLOCK ***

[12472.490172] 4 locks held by read_all/70259:
[12472.495041] #0: 33ff00947d9881e0 (&p->lock){+.+.}, at: seq_read+0x50/0x918
[12472.502701] #1: f9ff0095cb6e2680 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
[12472.511141] #2: b8ff00083dc2dd08 (kn->count#48){++++}, at: kernfs_seq_start+0x44/0xf0
[12472.519756] #3: 69ff00082000fd18 (&(&n->list_lock)->rlock){-.-.}, at: list_locations+0x104/0x4b4
[12472.529325]
stack backtrace:
[12472.535069] CPU: 236 PID: 70259 Comm: read_all Tainted: G W L 5.4.0-rc6-next-20191111+ #2
[12472.545062] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
[12472.555489] Call trace:
[12472.558626] dump_backtrace+0x0/0x248
[12472.562977] show_stack+0x20/0x2c
[12472.566992] dump_stack+0xe8/0x150
[12472.571084] print_circular_bug+0x368/0x380
[12472.575957] check_noncircular+0x28c/0x294
[12472.580742] validate_chain+0xf6c/0x2e2c
[12472.585355] __lock_acquire+0x868/0xc2c
[12472.589882] lock_acquire+0x320/0x360
[12472.594234] _raw_spin_lock_irqsave+0x7c/0x9c
[12472.599280] rmqueue+0xf1c/0x2050
[12472.603286] get_page_from_freelist+0x474/0x688
[12472.608506] __alloc_pages_nodemask+0x3b4/0x18dc
[12472.613813] alloc_pages_current+0xd0/0xe0
[12472.618600] __get_free_pages+0x24/0x6c
[12472.623126] alloc_loc_track+0x38/0x80
[12472.627565] process_slab+0x228/0x544
[12472.631917] list_locations+0x158/0x4b4
[12472.636444] alloc_calls_show+0x38/0x48
[12472.640969] slab_attr_show+0x38/0x54
[12472.645322] sysfs_kf_seq_show+0x198/0x2d4
[12472.650108] kernfs_seq_show+0xa4/0xcc
[12472.654547] seq_read+0x394/0x918
[12472.658552] kernfs_fop_read+0xa8/0x334
[12472.663078] __vfs_read+0x88/0x20c
[12472.667169] vfs_read+0xdc/0x110
[12472.671087] ksys_read+0xb0/0x120
[12472.675091] __arm64_sys_read+0x54/0x88
[12472.679618] el0_svc_handler+0x170/0x240
[12472.684231] el0_sync_handler+0x150/0x250
[12472.688929] el0_sync+0x164/0x180


the existing dependency chain (in reverse order) is:

-> #4 (&pool->lock/1){-.-.}:
lock_acquire+0x320/0x360
_raw_spin_lock+0x64/0x80
__queue_work+0x4b4/0xa10
queue_work_on+0xac/0x11c
tty_schedule_flip+0x84/0xbc
tty_flip_buffer_push+0x1c/0x28
pty_write+0x98/0xd0
n_tty_write+0x450/0x60c
tty_write+0x338/0x474
__vfs_write+0x88/0x214
vfs_write+0x12c/0x1a4
redirected_tty_write+0x90/0xdc
do_loop_readv_writev+0x140/0x180
do_iter_write+0xe0/0x10c
vfs_writev+0x134/0x1cc
do_writev+0xbc/0x130
__arm64_sys_writev+0x58/0x8c
el0_svc_handler+0x170/0x240
el0_sync_handler+0x150/0x250
el0_sync+0x164/0x180

-> #3 (&(&port->lock)->rlock){-.-.}:
lock_acquire+0x320/0x360
_raw_spin_lock_irqsave+0x7c/0x9c
tty_port_tty_get+0x24/0x60
tty_port_default_wakeup+0x1c/0x3c
tty_port_tty_wakeup+0x34/0x40
uart_write_wakeup+0x28/0x44
pl011_tx_chars+0x1b8/0x270
pl011_start_tx+0x24/0x70
__uart_start+0x5c/0x68
uart_write+0x164/0x1c8
do_output_char+0x33c/0x348
n_tty_write+0x4bc/0x60c
tty_write+0x338/0x474
redirected_tty_write+0xc0/0xdc
do_loop_readv_writev+0x140/0x180
do_iter_write+0xe0/0x10c
vfs_writev+0x134/0x1cc
do_writev+0xbc/0x130
__arm64_sys_writev+0x58/0x8c
el0_svc_handler+0x170/0x240
el0_sync_handler+0x150/0x250
el0_sync+0x164/0x180

-> #2 (&port_lock_key){-.-.}:
lock_acquire+0x320/0x360
_raw_spin_lock+0x64/0x80
pl011_console_write+0xec/0x2cc
console_unlock+0x794/0x96c
vprintk_emit+0x260/0x31c
vprintk_default+0x54/0x7c
vprintk_func+0x218/0x254
printk+0x7c/0xa4
register_console+0x734/0x7b0
uart_add_one_port+0x734/0x834
pl011_register_port+0x6c/0xac
sbsa_uart_probe+0x234/0x2ec
platform_drv_probe+0xd4/0x124
really_probe+0x250/0x71c
driver_probe_device+0xb4/0x200
__device_attach_driver+0xd8/0x188
bus_for_each_drv+0xbc/0x110
__device_attach+0x120/0x220
device_initial_probe+0x20/0x2c
bus_probe_device+0x54/0x100
device_add+0xae8/0xc2c
platform_device_add+0x278/0x3b8
platform_device_register_full+0x238/0x2ac
acpi_create_platform_device+0x2dc/0x3a8
acpi_bus_attach+0x390/0x3cc
acpi_bus_attach+0x108/0x3cc
acpi_bus_attach+0x108/0x3cc
acpi_bus_attach+0x108/0x3cc
acpi_bus_scan+0x7c/0xb0
acpi_scan_init+0xe4/0x304
acpi_init+0x100/0x114
do_one_initcall+0x348/0x6a0
do_initcall_level+0x190/0x1fc
do_basic_setup+0x34/0x4c
kernel_init_freeable+0x19c/0x260
kernel_init+0x18/0x338
ret_from_fork+0x10/0x18

-> #1 (console_owner){-...}:
lock_acquire+0x320/0x360
console_lock_spinning_enable+0x6c/0x7c
console_unlock+0x4f8/0x96c
vprintk_emit+0x260/0x31c
vprintk_default+0x54/0x7c
vprintk_func+0x218/0x254
printk+0x7c/0xa4
get_random_u64+0x1c4/0x1dc
shuffle_pick_tail+0x40/0xac
__free_one_page+0x424/0x710
free_one_page+0x70/0x120
__free_pages_ok+0x61c/0xa94
__free_pages_core+0x1bc/0x294
memblock_free_pages+0x38/0x48
__free_pages_memory+0xcc/0xfc
__free_memory_core+0x70/0x78
free_low_memory_core_early+0x148/0x18c
memblock_free_all+0x18/0x54
mem_init+0xb4/0x17c
mm_init+0x14/0x38
start_kernel+0x19c/0x530

-> #0 (&(&zone->lock)->rlock){..-.}:
validate_chain+0xf6c/0x2e2c
__lock_acquire+0x868/0xc2c
lock_acquire+0x320/0x360
_raw_spin_lock+0x64/0x80
rmqueue+0x138/0x2050
get_page_from_freelist+0x474/0x688
__alloc_pages_nodemask+0x3b4/0x18dc
alloc_pages_current+0xd0/0xe0
alloc_slab_page+0x2b4/0x5e0
new_slab+0xc8/0x6bc
___slab_alloc+0x3b8/0x640
kmem_cache_alloc+0x4b4/0x588
__debug_object_init+0x778/0x8b4
debug_object_init_on_stack+0x40/0x50
start_flush_work+0x16c/0x3f0
__flush_work+0xb8/0x124
flush_work+0x20/0x30
xlog_cil_force_lsn+0x88/0x204 [xfs]
xfs_log_force_lsn+0x128/0x1b8 [xfs]
xfs_file_fsync+0x3c4/0x488 [xfs]
vfs_fsync_range+0xb0/0xd0
generic_write_sync+0x80/0xa0 [xfs]
xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs]
xfs_file_write_iter+0x1a0/0x218 [xfs]
__vfs_write+0x1cc/0x214
vfs_write+0x12c/0x1a4
ksys_write+0xb0/0x120
__arm64_sys_write+0x54/0x88
el0_svc_handler+0x170/0x240
el0_sync_handler+0x150/0x250
el0_sync+0x164/0x180

other info that might help us debug this:

Chain exists of:
&(&zone->lock)->rlock --> &(&port->lock)->rlock --> &pool->lock/1

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&pool->lock/1);
lock(&(&port->lock)->rlock);
lock(&pool->lock/1);
lock(&(&zone->lock)->rlock);

*** DEADLOCK ***

4 locks held by doio/49441:
#0: a0ff00886fc27408 (sb_writers#8){.+.+}, at: vfs_write+0x118/0x1a4
#1: 8fff00080810dfe0 (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0x2a8/0x300 [xfs]
#2: ffff9000129f2390 (rcu_read_lock){....}, at: rcu_lock_acquire+0x8/0x38
#3: 60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0

stack backtrace:
CPU: 48 PID: 49441 Comm: doio Tainted: G W
Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
Call trace:
dump_backtrace+0x0/0x248
show_stack+0x20/0x2c
dump_stack+0xe8/0x150
print_circular_bug+0x368/0x380
check_noncircular+0x28c/0x294
validate_chain+0xf6c/0x2e2c
__lock_acquire+0x868/0xc2c
lock_acquire+0x320/0x360
_raw_spin_lock+0x64/0x80
rmqueue+0x138/0x2050
get_page_from_freelist+0x474/0x688
__alloc_pages_nodemask+0x3b4/0x18dc
alloc_pages_current+0xd0/0xe0
alloc_slab_page+0x2b4/0x5e0
new_slab+0xc8/0x6bc
___slab_alloc+0x3b8/0x640
kmem_cache_alloc+0x4b4/0x588
__debug_object_init+0x778/0x8b4
debug_object_init_on_stack+0x40/0x50
start_flush_work+0x16c/0x3f0
__flush_work+0xb8/0x124
flush_work+0x20/0x30
xlog_cil_force_lsn+0x88/0x204 [xfs]
xfs_log_force_lsn+0x128/0x1b8 [xfs]
xfs_file_fsync+0x3c4/0x488 [xfs]
vfs_fsync_range+0xb0/0xd0
generic_write_sync+0x80/0xa0 [xfs]
xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs]
xfs_file_write_iter+0x1a0/0x218 [xfs]
__vfs_write+0x1cc/0x214
vfs_write+0x12c/0x1a4
ksys_write+0xb0/0x120
__arm64_sys_write+0x54/0x88
el0_svc_handler+0x170/0x240
el0_sync_handler+0x150/0x250
el0_sync+0x164/0x180

WARNING: possible circular locking dependency detected
5.3.0-next-20190917 #8 Not tainted
------------------------------------------------------
test.sh/8653 is trying to acquire lock:
ffffffff865a4460 (console_owner){-.-.}, at: console_unlock+0x207/0x750

but task is already holding lock:
ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at: __offline_isolated_pages+0x179/0x3e0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&(&zone->lock)->rlock){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock+0x2f/0x40
rmqueue_bulk.constprop.21+0xb6/0x1160
get_page_from_freelist+0x898/0x22c0
__alloc_pages_nodemask+0x2f3/0x1cd0
alloc_pages_current+0x9c/0x110
allocate_slab+0x4c6/0x19c0
new_slab+0x46/0x70
___slab_alloc+0x58b/0x960
__slab_alloc+0x43/0x70
__kmalloc+0x3ad/0x4b0
__tty_buffer_request_room+0x100/0x250
tty_insert_flip_string_fixed_flag+0x67/0x110
pty_write+0xa2/0xf0
n_tty_write+0x36b/0x7b0
tty_write+0x284/0x4c0
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
redirected_tty_write+0x6a/0xc0
do_iter_write+0x248/0x2a0
vfs_writev+0x106/0x1e0
do_writev+0xd4/0x180
__x64_sys_writev+0x45/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #2 (&(&port->lock)->rlock){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock_irqsave+0x3a/0x50
tty_port_tty_get+0x20/0x60
tty_port_default_wakeup+0xf/0x30
tty_port_tty_wakeup+0x39/0x40
uart_write_wakeup+0x2a/0x40
serial8250_tx_chars+0x22e/0x440
serial8250_handle_irq.part.8+0x14a/0x170
serial8250_default_handle_irq+0x5c/0x90
serial8250_interrupt+0xa6/0x130
__handle_irq_event_percpu+0x78/0x4f0
handle_irq_event_percpu+0x70/0x100
handle_irq_event+0x5a/0x8b
handle_edge_irq+0x117/0x370
do_IRQ+0x9e/0x1e0
ret_from_intr+0x0/0x2a
cpuidle_enter_state+0x156/0x8e0
cpuidle_enter+0x41/0x70
call_cpuidle+0x5e/0x90
do_idle+0x333/0x370
cpu_startup_entry+0x1d/0x1f
start_secondary+0x290/0x330
secondary_startup_64+0xb6/0xc0

-> #1 (&port_lock_key){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock_irqsave+0x3a/0x50
serial8250_console_write+0x3e4/0x450
univ8250_console_write+0x4b/0x60
console_unlock+0x501/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5

-> #0 (console_owner){-.-.}:
check_prev_add+0x107/0xea0
validate_chain+0x8fc/0x1200
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
console_unlock+0x269/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
__offline_isolated_pages.cold.52+0x2f/0x30a
offline_isolated_pages_cb+0x17/0x30
walk_system_ram_range+0xda/0x160
__offline_pages+0x79c/0xa10
offline_pages+0x11/0x20
memory_subsys_offline+0x7e/0xc0
device_offline+0xd5/0x110
state_store+0xc6/0xe0
dev_attr_store+0x3f/0x60
sysfs_kf_write+0x89/0xb0
kernfs_fop_write+0x188/0x240
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
ksys_write+0xc6/0x160
__x64_sys_write+0x43/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock


Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&(&zone->lock)->rlock);
lock(&(&port->lock)->rlock);
lock(&(&zone->lock)->rlock);
lock(console_owner);

*** DEADLOCK ***

9 locks held by test.sh/8653:
#0: ffff88839ba7d408 (sb_writers#4){.+.+}, at: vfs_write+0x25f/0x290
#1: ffff888277618880 (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240
#2: ffff8898131fc218 (kn->count#115){.+.+}, at: kernfs_fop_write+0x138/0x240
#3: ffffffff86962a80 (device_hotplug_lock){+.+.}, at: lock_device_hotplug_sysfs+0x16/0x50
#4: ffff8884374f4990 (&dev->mutex){....}, at: device_offline+0x70/0x110
#5: ffffffff86515250 (cpu_hotplug_lock.rw_sem){++++}, at: __offline_pages+0xbf/0xa10
#6: ffffffff867405f0 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x87/0x2f0
#7: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at: __offline_isolated_pages+0x179/0x3e0
#8: ffffffff865a4920 (console_lock){+.+.}, at: vprintk_emit+0x100/0x340

stack backtrace:
CPU: 1 PID: 8653 Comm: test.sh Not tainted 5.3.0-next-20190917 #8
Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 05/21/2019
Call Trace:
dump_stack+0x86/0xca
print_circular_bug.cold.31+0x243/0x26e
check_noncircular+0x29e/0x2e0
check_prev_add+0x107/0xea0
validate_chain+0x8fc/0x1200
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
console_unlock+0x269/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
__offline_isolated_pages.cold.52+0x2f/0x30a
offline_isolated_pages_cb+0x17/0x30
walk_system_ram_range+0xda/0x160
__offline_pages+0x79c/0xa10
offline_pages+0x11/0x20
memory_subsys_offline+0x7e/0xc0
device_offline+0xd5/0x110
state_store+0xc6/0xe0
dev_attr_store+0x3f/0x60
sysfs_kf_write+0x89/0xb0
kernfs_fop_write+0x188/0x240
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
ksys_write+0xc6/0x160
__x64_sys_write+0x43/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe
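
For readers following the chains above, here is a rough userspace sketch of
the kind of check that produces the first report: each "B taken while A is
held" is recorded as an edge A -> B, and a new edge is refused if it would
close a cycle. This is only an illustration, not lockdep's actual
implementation. The enum names, struct lock_graph, and the helper functions
are all made up for this example; only the lock class names and their
ordering come from the first report.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical IDs named after the lock classes in the first report. */
enum lock_class {
	ZONE_LOCK,	/* &(&zone->lock)->rlock */
	CONSOLE_OWNER,	/* console_owner */
	PORT_LOCK_KEY,	/* &port_lock_key */
	PORT_LOCK,	/* &(&port->lock)->rlock */
	LIST_LOCK,	/* &(&n->list_lock)->rlock */
	NR_CLASSES
};

struct lock_graph {
	/* edge[a][b] is true if b was ever acquired while a was held */
	bool edge[NR_CLASSES][NR_CLASSES];
};

void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "taken was acquired while held was held".
 * Returns false, and records nothing, if 'held' is already reachable
 * from 'taken', i.e. the new edge would close a cycle.
 */
bool add_dependency(struct lock_graph *g, int held, int taken)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(taken >= 0 && taken < NR_CLASSES);
	if (reachable(g, taken, held, seen))
		return false;
	g->edge[held][taken] = true;
	return true;
}

/* Replay chain entries #1..#4 of the first report, then attempt #0. */
bool first_report_is_circular(void)
{
	struct lock_graph g;

	graph_init(&g);
	bool ok = add_dependency(&g, ZONE_LOCK, CONSOLE_OWNER) &&     /* #1 */
		  add_dependency(&g, CONSOLE_OWNER, PORT_LOCK_KEY) && /* #2 */
		  add_dependency(&g, PORT_LOCK_KEY, PORT_LOCK) &&     /* #3 */
		  add_dependency(&g, PORT_LOCK, LIST_LOCK);           /* #4 */

	/* #0: rmqueue() wants zone->lock while list_locations() holds list_lock */
	return ok && !add_dependency(&g, LIST_LOCK, ZONE_LOCK);
}
```

Calling first_report_is_circular() replays the four existing dependencies and
then the new one from list_locations(); the last edge is the one that gets
rejected. Note that a check like this only says the ordering is inconsistent
across all recorded call paths, including boot-time ones. It cannot say
whether those paths can really run concurrently.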


>
> I suggest disabling lockdep around the safe allocation in the console
> initialization code. Then we will see if there are other locations
> that trigger this lockdep warning. It is trivial and will not
> complicate the code because of false positives.
>
>
>> I have spent the last few months researching this, so
>> I don't feel like doing this again.
>>
>> https://lore.kernel.org/linux-mm/1570633715.5937.10.camel@xxxxxx/
>
> Have you tried to disable lockdep around the problematic allocation?
>
> Have you seen other lockdep reports caused by exactly this printk()
> in the allocator code?
>
> My big problem with this patch is that the commit message does not
> contain any lockdep report. That will complicate removing the hack
> when it is no longer needed. Nobody will know what the exact
> problem was and whether it is safe to remove. I believe that printk()
> will offload console handling sooner rather than later and this
> extra logic will not be necessary.
>
> Best Regards,
> Petr