Re: lockdep warning: console vs. mem hotplug

From: Steven Rostedt
Date: Fri Mar 24 2017 - 12:39:49 EST


On Mon, 20 Mar 2017 17:43:23 +0100 (CET)
Sebastian Ott <sebott@xxxxxxxxxxxxxxxxxx> wrote:

> Since commit f975237b7682 ("printk: use printk_safe buffers in printk")
> I observe lockdep warnings on s390 when doing memory hotplug:
>
> [ 132.379220] ======================================================
> [ 132.379220] [ INFO: possible circular locking dependency detected ]
> [ 132.379221] 4.11.0-rc3 #330 Not tainted
> [ 132.379221] -------------------------------------------------------
> [ 132.379222] sh/708 is trying to acquire lock:
> [ 132.379222] (&(&sclp_con_lock)->rlock){-.-...}, at: [<000000000074524a>] sclp_console_write+0x4a/0x2b8
> [ 132.379224]
> [ 132.379224] but task is already holding lock:
> [ 132.379224] (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [ 132.379226]
> [ 132.379227] which lock already depends on the new lock.
> [ 132.379227]
> [ 132.379227]
> [ 132.379228] the existing dependency chain (in reverse order) is:
> [ 132.379228]
> [ 132.379229] -> #2 (&(&zone->lock)->rlock){..-...}:
> [ 132.379230] validate_chain.isra.22+0xb36/0xd18
> [ 132.379231] __lock_acquire+0x650/0x880
> [ 132.379231] lock_acquire+0x24e/0x2b0
> [ 132.379232] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379232] get_page_from_freelist+0x548/0xf30
> [ 132.379233] __alloc_pages_nodemask+0x1c8/0x14c0
> [ 132.379233] allocate_slab+0xde/0x668
> [ 132.379233] new_slab+0x94/0xa8
> [ 132.379234] ___slab_alloc.constprop.66+0x556/0x578
> [ 132.379234] __slab_alloc.isra.62.constprop.65+0x74/0xa8
> [ 132.379235] kmem_cache_alloc+0x136/0x4b8
> [ 132.379235] __debug_object_init+0xaa/0x448
> [ 132.379236] debug_object_activate+0x182/0x238
> [ 132.379236] mod_timer+0xe6/0x450
> [ 132.379236] prandom_reseed+0x44/0x50
> [ 132.379237] do_one_initcall+0xa2/0x150
> [ 132.379237] kernel_init_freeable+0x228/0x2d8
> [ 132.379238] kernel_init+0x2a/0x138
> [ 132.379238] kernel_thread_starter+0x6/0xc
> [ 132.379239] kernel_thread_starter+0x0/0xc
> [ 132.379239]
> [ 132.379239] -> #1 (&(&base->lock)->rlock){-.-.-.}:
> [ 132.379241] validate_chain.isra.22+0xb36/0xd18
> [ 132.379242] __lock_acquire+0x650/0x880
> [ 132.379242] lock_acquire+0x24e/0x2b0
> [ 132.379242] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379243] lock_timer_base+0x78/0xa8
> [ 132.379243] mod_timer+0xba/0x450
> [ 132.379244] sclp_console_write+0x262/0x2b8
> [ 132.379244] console_unlock+0x4a6/0x658
> [ 132.379245] register_console+0x31c/0x430
> [ 132.379245] sclp_console_init+0x152/0x170
> [ 132.379245] console_init+0x3a/0x58
> [ 132.379246] start_kernel+0x350/0x480
> [ 132.379246] _stext+0x20/0x80
> [ 132.379246]
> [ 132.379247] -> #0 (&(&sclp_con_lock)->rlock){-.-...}:
> [ 132.379249] check_prev_add+0x150/0x648
> [ 132.379249] validate_chain.isra.22+0xb36/0xd18
> [ 132.379249] __lock_acquire+0x650/0x880
> [ 132.379250] lock_acquire+0x24e/0x2b0
> [ 132.379250] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379251] sclp_console_write+0x4a/0x2b8
> [ 132.379251] console_unlock+0x4a6/0x658
> [ 132.379252] vprintk_emit+0x33e/0x350
> [ 132.379252] vprintk_default+0x44/0x58
> [ 132.379252] printk+0x4e/0x58
> [ 132.379253] __offline_isolated_pages+0x1a0/0x340
> [ 132.379253] offline_isolated_pages_cb+0x2e/0x40
> [ 132.379254] walk_system_ram_range+0x96/0xe8
> [ 132.379254] __offline_pages.constprop.43+0x6da/0x918
> [ 132.379255] memory_subsys_offline+0x6a/0xa0
> [ 132.379255] device_offline+0x84/0xd8
> [ 132.379255] store_mem_state+0xfe/0x120
> [ 132.379256] kernfs_fop_write+0x126/0x1d0
> [ 132.379256] __vfs_write+0x46/0x140
> [ 132.379257] vfs_write+0xb8/0x1a8
> [ 132.379257] SyS_write+0x66/0xc0
> [ 132.379257] system_call+0xc4/0x240
> [ 132.379258]
> [ 132.379258] other info that might help us debug this:
> [ 132.379258]
> [ 132.379259] Chain exists of:
> [ 132.379259] &(&sclp_con_lock)->rlock --> &(&base->lock)->rlock --> &(&zone->lock)->rlock
> [ 132.379262]
> [ 132.379262] Possible unsafe locking scenario:
> [ 132.379262]
> [ 132.379263] CPU0 CPU1
> [ 132.379263] ---- ----
> [ 132.379264] lock(&(&zone->lock)->rlock);
> [ 132.379265] lock(&(&base->lock)->rlock);
> [ 132.379266] lock(&(&zone->lock)->rlock);
> [ 132.379267] lock(&(&sclp_con_lock)->rlock);
> [ 132.379268]
> [ 132.379268] *** DEADLOCK ***
> [ 132.379269]
> [ 132.379269] 10 locks held by sh/708:
> [ 132.379269] #0: (sb_writers#4){.+.+.+}, at: [<0000000000345d4a>] vfs_write+0xa2/0x1a8
> [ 132.379272] #1: (&of->mutex){+.+.+.}, at: [<00000000003e5e60>] kernfs_fop_write+0xd8/0x1d0
> [ 132.379273] #2: (s_active#180){.+.+.+}, at: [<00000000003e5e6c>] kernfs_fop_write+0xe4/0x1d0
> [ 132.379275] #3: (device_hotplug_lock){+.+...}, at: [<000000000069bb58>] lock_device_hotplug_sysfs+0x30/0x68
> [ 132.379277] #4: (memory_add_remove_lock){+.+...}, at: [<000000000031cfe6>] mem_hotplug_begin+0x3e/0xd0
> [ 132.379279] #5: (mem_hotplug.lock){++++.+}, at: [<000000000031cfa8>] mem_hotplug_begin+0x0/0xd0
> [ 132.379281] #6: (mem_hotplug.lock#2){+.+...}, at: [<000000000031d02c>] mem_hotplug_begin+0x84/0xd0
> [ 132.379283] #7: (&dev->mutex){......}, at: [<000000000069d55c>] device_offline+0x5c/0xd8
> [ 132.379285] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [ 132.379287] #9: (console_lock){+.+...}, at: [<00000000001bdc02>] vprintk_emit+0x332/0x350

Is there a stack trace of where the lockdep dump happened? That would be
useful too. Otherwise we can't see where the inverse lock ordering was taken.

-- Steve
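
For reference, the cycle lockdep describes above is a classic AB-BA
inversion: the memory-offline path holds zone->lock and then printk()s
into the SCLP console driver (taking sclp_con_lock), while the console
path established the opposite ordering at boot via mod_timer(), the
timer base lock, and a page allocation. The following is only a minimal
user-space sketch of that inversion, with pthread mutexes standing in
for the kernel spinlocks and purely illustrative names; it is not the
kernel code, just the shape of the problem (the real cycle runs through
the timer base lock as a third link):

/*
 * Sketch of the AB-BA inversion from the report.  zone_lock and
 * sclp_con_lock are illustrative stand-ins, not the kernel objects.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t zone_lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sclp_con_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models the memory-offline path: zone->lock is held, then printk()
 * reaches the SCLP console driver, which takes sclp_con_lock. */
static void *offline_path(void *arg)
{
	pthread_mutex_lock(&zone_lock);
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&sclp_con_lock);
	puts("offline path: got both locks");
	pthread_mutex_unlock(&sclp_con_lock);
	pthread_mutex_unlock(&zone_lock);
	return NULL;
}

/* Models the console-write path: sclp_con_lock is held, and (via the
 * timer base lock and a page allocation in the real trace) zone->lock
 * is eventually acquired underneath it. */
static void *console_path(void *arg)
{
	pthread_mutex_lock(&sclp_con_lock);
	usleep(1000);
	pthread_mutex_lock(&zone_lock);
	puts("console path: got both locks");
	pthread_mutex_unlock(&zone_lock);
	pthread_mutex_unlock(&sclp_con_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, offline_path, NULL);
	pthread_create(&b, NULL, console_path, NULL);

	/* If each thread wins its first lock, neither can take the
	 * second: that is the cycle lockdep flags before it deadlocks. */
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}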