Re: lockdep warning: console vs. mem hotplug
From: Sergey Senozhatsky
Date: Tue Mar 21 2017 - 00:47:31 EST
Cc: Michal, Linus, Peter, Thomas, Andrew, Martin
// I kept only lockdep splats from the original report and cut off .configs
// full version:
// lkml.kernel.org/r/alpine.LFD.2.20.1703201736070.1753@schleppi
Sebastian, thanks.
On (03/20/17 17:43), Sebastian Ott wrote:
[..]
> Since commit f975237b7682 ("printk: use printk_safe buffers in printk")
> I observe lockdep warnings on s390 when doing memory hotplug:
>
> [ 132.379220] ======================================================
> [ 132.379220] [ INFO: possible circular locking dependency detected ]
> [ 132.379221] 4.11.0-rc3 #330 Not tainted
> [ 132.379221] -------------------------------------------------------
> [ 132.379222] sh/708 is trying to acquire lock:
> [ 132.379222] (&(&sclp_con_lock)->rlock){-.-...}, at: [<000000000074524a>] sclp_console_write+0x4a/0x2b8
> [ 132.379224]
> [ 132.379224] but task is already holding lock:
> [ 132.379224] (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [ 132.379226]
> [ 132.379227] which lock already depends on the new lock.
> [ 132.379227]
> [ 132.379227]
> [ 132.379228] the existing dependency chain (in reverse order) is:
> [ 132.379228]
> [ 132.379229] -> #2 (&(&zone->lock)->rlock){..-...}:
> [ 132.379230] validate_chain.isra.22+0xb36/0xd18
> [ 132.379231] __lock_acquire+0x650/0x880
> [ 132.379231] lock_acquire+0x24e/0x2b0
> [ 132.379232] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379232] get_page_from_freelist+0x548/0xf30
> [ 132.379233] __alloc_pages_nodemask+0x1c8/0x14c0
> [ 132.379233] allocate_slab+0xde/0x668
> [ 132.379233] new_slab+0x94/0xa8
> [ 132.379234] ___slab_alloc.constprop.66+0x556/0x578
> [ 132.379234] __slab_alloc.isra.62.constprop.65+0x74/0xa8
> [ 132.379235] kmem_cache_alloc+0x136/0x4b8
> [ 132.379235] __debug_object_init+0xaa/0x448
> [ 132.379236] debug_object_activate+0x182/0x238
> [ 132.379236] mod_timer+0xe6/0x450
> [ 132.379236] prandom_reseed+0x44/0x50
> [ 132.379237] do_one_initcall+0xa2/0x150
> [ 132.379237] kernel_init_freeable+0x228/0x2d8
> [ 132.379238] kernel_init+0x2a/0x138
> [ 132.379238] kernel_thread_starter+0x6/0xc
> [ 132.379239] kernel_thread_starter+0x0/0xc
> [ 132.379239]
> [ 132.379239] -> #1 (&(&base->lock)->rlock){-.-.-.}:
> [ 132.379241] validate_chain.isra.22+0xb36/0xd18
> [ 132.379242] __lock_acquire+0x650/0x880
> [ 132.379242] lock_acquire+0x24e/0x2b0
> [ 132.379242] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379243] lock_timer_base+0x78/0xa8
> [ 132.379243] mod_timer+0xba/0x450
> [ 132.379244] sclp_console_write+0x262/0x2b8
> [ 132.379244] console_unlock+0x4a6/0x658
> [ 132.379245] register_console+0x31c/0x430
> [ 132.379245] sclp_console_init+0x152/0x170
> [ 132.379245] console_init+0x3a/0x58
> [ 132.379246] start_kernel+0x350/0x480
> [ 132.379246] _stext+0x20/0x80
> [ 132.379246]
> [ 132.379247] -> #0 (&(&sclp_con_lock)->rlock){-.-...}:
> [ 132.379249] check_prev_add+0x150/0x648
> [ 132.379249] validate_chain.isra.22+0xb36/0xd18
> [ 132.379249] __lock_acquire+0x650/0x880
> [ 132.379250] lock_acquire+0x24e/0x2b0
> [ 132.379250] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379251] sclp_console_write+0x4a/0x2b8
> [ 132.379251] console_unlock+0x4a6/0x658
> [ 132.379252] vprintk_emit+0x33e/0x350
> [ 132.379252] vprintk_default+0x44/0x58
> [ 132.379252] printk+0x4e/0x58
> [ 132.379253] __offline_isolated_pages+0x1a0/0x340
> [ 132.379253] offline_isolated_pages_cb+0x2e/0x40
> [ 132.379254] walk_system_ram_range+0x96/0xe8
> [ 132.379254] __offline_pages.constprop.43+0x6da/0x918
> [ 132.379255] memory_subsys_offline+0x6a/0xa0
> [ 132.379255] device_offline+0x84/0xd8
> [ 132.379255] store_mem_state+0xfe/0x120
> [ 132.379256] kernfs_fop_write+0x126/0x1d0
> [ 132.379256] __vfs_write+0x46/0x140
> [ 132.379257] vfs_write+0xb8/0x1a8
> [ 132.379257] SyS_write+0x66/0xc0
> [ 132.379257] system_call+0xc4/0x240
> [ 132.379258]
> [ 132.379258] other info that might help us debug this:
> [ 132.379258]
> [ 132.379259] Chain exists of:
> [ 132.379259] &(&sclp_con_lock)->rlock --> &(&base->lock)->rlock --> &(&zone->lock)->rlock
> [ 132.379262]
> [ 132.379262] Possible unsafe locking scenario:
> [ 132.379262]
> [ 132.379263] CPU0 CPU1
> [ 132.379263] ---- ----
> [ 132.379264] lock(&(&zone->lock)->rlock);
> [ 132.379265] lock(&(&base->lock)->rlock);
> [ 132.379266] lock(&(&zone->lock)->rlock);
> [ 132.379267] lock(&(&sclp_con_lock)->rlock);
After a rather quick look: as far as I understand it, the problem here
is that we have the following lock dependency chains
a) mod_timer() -> __debug_object_init() -> __alloc_pages_nodemask()
   takes zone->lock under the timer base->lock,
   so this adds a "base->lock -> zone->lock" dependency
b) console_unlock() -> sclp_console_write() -> lock_timer_base(),
   because sclp_console_write() calls add_timer() under the sclp
   console spin_lock.
   this one adds an "sclp_con_lock -> base->lock" dependency
c) __offline_isolated_pages() -> printk() -> sclp_console_write()
   calls printk() under zone->lock:
      pr_info("remove from free list %lx %d %lx\n", pfn, 1 << order, end_pfn);
   so it's a "zone->lock -> sclp_con_lock -> base->lock" dependency,
   which conflicts with (a).
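to make the inversion concrete, here is a minimal sketch of the two
paths (paraphrased from drivers/s390/char/sclp_con.c and
mm/page_alloc.c; simplified, not the exact upstream code):

	/* (a)+(b): console write path. sclp_con_lock is held across
	 * add_timer(), which takes base->lock; and, per chain (a),
	 * the debugobjects allocation in the timer code can take
	 * zone->lock. so: sclp_con_lock -> base->lock -> zone->lock
	 */
	static void sclp_console_write(struct console *console,
				       const char *message,
				       unsigned int count)
	{
		unsigned long flags;

		spin_lock_irqsave(&sclp_con_lock, flags);
		/* ... copy the message into the console buffer ... */
		if (!timer_pending(&sclp_con_timer))
			add_timer(&sclp_con_timer); /* -> lock_timer_base() */
		spin_unlock_irqrestore(&sclp_con_lock, flags);
	}

	/* (c): memory offlining prints while zone->lock is held, and
	 * the console driver then takes sclp_con_lock.
	 * so: zone->lock -> sclp_con_lock -> base->lock
	 */
	spin_lock_irqsave(&zone->lock, flags);
	/* ... */
	pr_info("remove from free list %lx %d %lx\n",
		pfn, 1 << order, end_pfn);
	spin_unlock_irqrestore(&zone->lock, flags);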
a1) as an additional complication, (a) can also printk() from
__debug_object_init(), for example via debug_print_object(), which
probably can result in
   mod_timer() -> printk() -> mod_timer() -> base->lock   *** deadlock
so maybe we can consider switching to printk_safe in
debug_print_object() (sketch below). but this is not directly related
to the lockdep splat above.
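something along these lines, perhaps (a sketch only; it assumes we
would make printk_safe_enter_irqsave()/printk_safe_exit_irqrestore(),
which currently live in kernel/printk/internal.h, available to
lib/debugobjects.c):

	/* lib/debugobjects.c, sketch */
	static void debug_print_object(struct debug_obj *obj, char *msg)
	{
		unsigned long flags;

		/* redirect the warning to the per-CPU printk_safe
		 * buffer, so we never recurse into the timer code
		 * from the debugobjects path */
		printk_safe_enter_irqsave(flags);
		WARN(1, KERN_ERR "ODEBUG: %s %s object type: %s\n",
		     msg, obj_states[obj->state], obj->descr->name);
		printk_safe_exit_irqrestore(flags);
	}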
so we probably can
1) move pr_info() out of zone->lock in __offline_isolated_pages().
meh...
2) switch to printk_deferred() in __offline_isolated_pages() (see the
sketch after this list).
meh... there might be a bunch of other printks done from under zone->lock.
3) move add_timer() out of the sclp_con_lock section in sclp_console_write().
well, there can be other consoles that do something similar.
4) ... something smart.
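for (2), the change would be something like this (sketch only; the
point being that printk_deferred() stores the message without calling
into console drivers, so no console lock is taken while we hold
zone->lock):

	/* mm/page_alloc.c :: __offline_isolated_pages(), sketch */
	spin_lock_irqsave(&zone->lock, flags);
	/* ... */
	printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
			pfn, 1 << order, end_pfn);
	/* ... */
	spin_unlock_irqrestore(&zone->lock, flags);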
I need to look more at the second lockdep splat, but it seems more or
less similar.
-ss
> [ 132.379268] *** DEADLOCK ***
> [ 132.379269]
> [ 132.379269] 10 locks held by sh/708:
> [ 132.379269] #0: (sb_writers#4){.+.+.+}, at: [<0000000000345d4a>] vfs_write+0xa2/0x1a8
> [ 132.379272] #1: (&of->mutex){+.+.+.}, at: [<00000000003e5e60>] kernfs_fop_write+0xd8/0x1d0
> [ 132.379273] #2: (s_active#180){.+.+.+}, at: [<00000000003e5e6c>] kernfs_fop_write+0xe4/0x1d0
> [ 132.379275] #3: (device_hotplug_lock){+.+...}, at: [<000000000069bb58>] lock_device_hotplug_sysfs+0x30/0x68
> [ 132.379277] #4: (memory_add_remove_lock){+.+...}, at: [<000000000031cfe6>] mem_hotplug_begin+0x3e/0xd0
> [ 132.379279] #5: (mem_hotplug.lock){++++.+}, at: [<000000000031cfa8>] mem_hotplug_begin+0x0/0xd0
> [ 132.379281] #6: (mem_hotplug.lock#2){+.+...}, at: [<000000000031d02c>] mem_hotplug_begin+0x84/0xd0
> [ 132.379283] #7: (&dev->mutex){......}, at: [<000000000069d55c>] device_offline+0x5c/0xd8
> [ 132.379285] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [ 132.379287] #9: (console_lock){+.+...}, at: [<00000000001bdc02>] vprintk_emit+0x332/0x350
>
> Here's another one using a different console:
>
> [ 66.522890] ======================================================
> [ 66.522890] [ INFO: possible circular locking dependency detected ]
> [ 66.522891] 4.11.0-rc2-00237-g2e287c0 #119 Not tainted
> [ 66.522891] -------------------------------------------------------
> [ 66.522892] sh/822 is trying to acquire lock:
> [ 66.522892] (&console_sch_key){-.-...}, at: [<00000000007573f0>] raw3215_write+0x58/0x208
> [ 66.522894] but task is already holding lock:
> [ 66.522894] (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
> [ 66.522897] which lock already depends on the new lock.
> [ 66.522898] the existing dependency chain (in reverse order) is:
> [ 66.522899] -> #4 (&(&zone->lock)->rlock){..-...}:
> [ 66.522901] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522901] __lock_acquire+0x62c/0x850
> [ 66.522902] lock_acquire+0x254/0x2b8
> [ 66.522902] _raw_spin_lock_irqsave+0x70/0xb8
> [ 66.522903] get_page_from_freelist+0x4e2/0xf28
> [ 66.522903] __alloc_pages_nodemask+0x1fa/0x1528
> [ 66.522904] allocate_slab+0xf2/0x6c0
> [ 66.522904] new_slab+0x94/0xa8
> [ 66.522905] ___slab_alloc.constprop.22+0x570/0x590
> [ 66.522905] __slab_alloc.isra.16.constprop.21+0x74/0xa8
> [ 66.522906] kmem_cache_alloc+0x13c/0x4b0
> [ 66.522906] __debug_object_init+0x5c/0x468
> [ 66.522907] hrtimer_init+0x42/0x1d8
> [ 66.522907] init_dl_task_timer+0x3a/0x58
> [ 66.522908] __sched_fork.isra.5+0x7e/0xe8
> [ 66.522908] init_idle+0x7a/0x278
> [ 66.522908] fork_idle+0xa4/0xb8
> [ 66.522909] idle_threads_init+0x64/0xc8
> [ 66.522909] smp_init+0x2e/0x110
> [ 66.522910] kernel_init_freeable+0x16c/0x2e0
> [ 66.522910] kernel_init+0x2a/0x148
> [ 66.522911] kernel_thread_starter+0x6/0xc
> [ 66.522911] kernel_thread_starter+0x0/0xc
> [ 66.522912] -> #3 (&rq->lock){-.-.-.}:
> [ 66.522914] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522914] __lock_acquire+0x62c/0x850
> [ 66.522915] lock_acquire+0x254/0x2b8
> [ 66.522915] _raw_spin_lock+0x60/0xa0
> [ 66.522916] task_fork_fair+0x54/0x118
> [ 66.522916] sched_fork+0x13e/0x2a0
> [ 66.522916] copy_process+0x642/0x1e80
> [ 66.522917] _do_fork+0xc2/0x6d0
> [ 66.522917] kernel_thread+0x4e/0x60
> [ 66.522918] rest_init+0x42/0x170
> [ 66.522918] start_kernel+0x464/0x478
> [ 66.522918] _stext+0x20/0x80
> [ 66.522919] -> #2 (&p->pi_lock){-.-.-.}:
> [ 66.522921] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522922] __lock_acquire+0x62c/0x850
> [ 66.522922] lock_acquire+0x254/0x2b8
> [ 66.522922] _raw_spin_lock_irqsave+0x70/0xb8
> [ 66.522923] try_to_wake_up+0x4a/0x5f8
> [ 66.522924] autoremove_wake_function+0x2e/0x88
> [ 66.522924] __wake_up_common+0x76/0xc0
> [ 66.522924] __wake_up+0x54/0x68
> [ 66.522925] ccw_device_verify_done+0xae/0x268
> [ 66.522925] ccw_request_handler+0x422/0x560
> [ 66.522926] do_cio_interrupt+0x224/0x2a0
> [ 66.522926] __handle_irq_event_percpu+0x1a6/0x440
> [ 66.522927] handle_irq_event_percpu+0x38/0x88
> [ 66.522927] handle_percpu_irq+0x84/0xb0
> [ 66.522928] generic_handle_irq+0x42/0x60
> [ 66.522928] do_IRQ+0x86/0xc8
> [ 66.522929] io_int_handler+0x104/0x294
> [ 66.522929] enabled_wait+0x72/0x140
> [ 66.522930] enabled_wait+0x5a/0x140
> [ 66.522930] arch_cpu_idle+0x32/0x50
> [ 66.522930] default_idle_call+0x52/0x68
> [ 66.522931] do_idle+0x10e/0x160
> [ 66.522931] cpu_startup_entry+0x3e/0x48
> [ 66.522932] start_kernel+0x464/0x478
> [ 66.522932] _stext+0x20/0x80
> [ 66.522933] -> #1 (&priv->wait_q){-.-...}:
> [ 66.522935] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522935] __lock_acquire+0x62c/0x850
> [ 66.522935] lock_acquire+0x254/0x2b8
> [ 66.522936] _raw_spin_lock_irqsave+0x70/0xb8
> [ 66.522936] __wake_up+0x3a/0x68
> [ 66.522937] ccw_device_recog_done+0x28e/0x2c8
> [ 66.522937] snsid_callback+0x324/0x390
> [ 66.522938] ccw_request_handler+0x480/0x560
> [ 66.522938] do_cio_interrupt+0x224/0x2a0
> [ 66.522939] __handle_irq_event_percpu+0x1a6/0x440
> [ 66.522939] handle_irq_event_percpu+0x38/0x88
> [ 66.522940] handle_percpu_irq+0x84/0xb0
> [ 66.522940] generic_handle_irq+0x42/0x60
> [ 66.522940] do_IRQ+0x86/0xc8
> [ 66.522941] io_int_handler+0x104/0x294
> [ 66.522941] _raw_spin_unlock_irq+0x4e/0x78
> [ 66.522942] _raw_spin_unlock_irq+0x4a/0x78
> [ 66.522942] ccw_device_enable_console+0x9a/0x180
> [ 66.522943] con3215_init+0x110/0x1b8
> [ 66.522943] console_init+0x3a/0x58
> [ 66.522944] start_kernel+0x33a/0x478
> [ 66.522944] _stext+0x20/0x80
> [ 66.522945] -> #0 (&console_sch_key){-.-...}:
> [ 66.522947] check_prev_add+0x160/0x6e8
> [ 66.522947] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522948] __lock_acquire+0x62c/0x850
> [ 66.522948] lock_acquire+0x254/0x2b8
> [ 66.522948] _raw_spin_lock_irqsave+0x70/0xb8
> [ 66.522949] raw3215_write+0x58/0x208
> [ 66.522949] con3215_write+0x8e/0xf8
> [ 66.522950] console_unlock+0x49a/0x670
> [ 66.522950] vprintk_emit+0x308/0x378
> [ 66.522951] vprintk_default+0x44/0x58
> [ 66.522951] printk+0x4e/0x60
> [ 66.522952] __offline_isolated_pages+0x168/0x370
> [ 66.522952] offline_isolated_pages_cb+0x2e/0x40
> [ 66.522953] walk_system_ram_range+0x92/0xf0
> [ 66.522953] __offline_pages.constprop.9+0x74a/0x988
> [ 66.522954] memory_subsys_offline+0x6c/0xa0
> [ 66.522954] device_offline+0x84/0xe0
> [ 66.522954] store_mem_state+0xfe/0x120
> [ 66.522955] kernfs_fop_write+0x124/0x1c8
> [ 66.522955] __vfs_write+0x36/0x140
> [ 66.522956] vfs_write+0xb8/0x1a0
> [ 66.522956] SyS_write+0x66/0xc0
> [ 66.522957] system_call+0xc4/0x240
>
> [ 66.522957] other info that might help us debug this:
> [ 66.522958] Chain exists of:
> [ 66.522958] &console_sch_key --> &rq->lock --> &(&zone->lock)->rlock
> [ 66.522961] Possible unsafe locking scenario:
> [ 66.522962] CPU0 CPU1
> [ 66.522962] ---- ----
> [ 66.522963] lock(&(&zone->lock)->rlock);
> [ 66.522964] lock(&rq->lock);
> [ 66.522965] lock(&(&zone->lock)->rlock);
> [ 66.522966] lock(&console_sch_key);
>
> [ 66.522967] *** DEADLOCK ***
>
> [ 66.522968] 10 locks held by sh/822:
> [ 66.522968] #0: (sb_writers#5){.+.+.+}, at: [<000000000034f9c2>] vfs_write+0xa2/0x1a0
> [ 66.522970] #1: (&of->mutex){+.+.+.}, at: [<00000000003f1afe>] kernfs_fop_write+0xd6/0x1c8
> [ 66.522972] #2: (s_active#53){.+.+.+}, at: [<00000000003f1b0a>] kernfs_fop_write+0xe2/0x1c8
> [ 66.522974] #3: (device_hotplug_lock){+.+...}, at: [<00000000006adb70>] lock_device_hotplug_sysfs+0x30/0x70
> [ 66.522976] #4: (memory_add_remove_lock){+.+...}, at: [<0000000000328e16>] mem_hotplug_begin+0x3e/0xd0
> [ 66.522978] #5: (mem_hotplug.lock){++++.+}, at: [<0000000000328dd8>] mem_hotplug_begin+0x0/0xd0
> [ 66.522980] #6: (mem_hotplug.lock#2){+.+...}, at: [<0000000000328e5c>] mem_hotplug_begin+0x84/0xd0
> [ 66.522982] #7: (&dev->mutex){......}, at: [<00000000006af41c>] device_offline+0x5c/0xe0
> [ 66.522984] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
> [ 66.522986] #9: (console_lock){+.+...}, at: [<00000000001c2c84>] vprintk_emit+0x2fc/0x378