Re: [PATCH V1] x86, espfix: postpone the initialization of espfix stack for AP

From: H. Peter Anvin
Date: Wed Jun 17 2015 - 03:27:47 EST


On 06/04/2015 02:45 AM, Gu Zheng wrote:
> The following lockdep warning occurs when running with the latest kernel:
> [ 3.178000] ------------[ cut here ]------------
> [ 3.183000] WARNING: CPU: 128 PID: 0 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0xdd/0xe0()
> [ 3.193000] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> [ 3.199000] Modules linked in:
>
> [ 3.203000] CPU: 128 PID: 0 Comm: swapper/128 Not tainted 4.1.0-rc3 #70
> [ 3.221000] 0000000000000000 2d6601fb3e6d4e4c ffff88086fd5fc38 ffffffff81773f0a
> [ 3.230000] 0000000000000000 ffff88086fd5fc90 ffff88086fd5fc78 ffffffff8108c85a
> [ 3.238000] ffff88086fd60000 0000000000000092 ffff88086fd60000 00000000000000d0
> [ 3.246000] Call Trace:
> [ 3.249000] [<ffffffff81773f0a>] dump_stack+0x4c/0x65
> [ 3.255000] [<ffffffff8108c85a>] warn_slowpath_common+0x8a/0xc0
> [ 3.261000] [<ffffffff8108c8e5>] warn_slowpath_fmt+0x55/0x70
> [ 3.268000] [<ffffffff810ee24d>] lockdep_trace_alloc+0xdd/0xe0
> [ 3.274000] [<ffffffff811cda0d>] __alloc_pages_nodemask+0xad/0xca0
> [ 3.281000] [<ffffffff810ec7ad>] ? __lock_acquire+0xf6d/0x1560
> [ 3.288000] [<ffffffff81219c8a>] alloc_page_interleave+0x3a/0x90
> [ 3.295000] [<ffffffff8121b32d>] alloc_pages_current+0x17d/0x1a0
> [ 3.301000] [<ffffffff811c869e>] ? __get_free_pages+0xe/0x50
> [ 3.308000] [<ffffffff811c869e>] __get_free_pages+0xe/0x50
> [ 3.314000] [<ffffffff8102640b>] init_espfix_ap+0x17b/0x320
> [ 3.320000] [<ffffffff8105c691>] start_secondary+0xf1/0x1f0
> [ 3.327000] ---[ end trace 1b3327d9d6a1d62c ]---
>
> As we alloc pages with GFP_KERNEL in init_espfix_ap(), which is called
> before local irqs are enabled, the lockdep sub-system considers this
> behaviour as allocating memory with GFP_FS with local irqs disabled
> and triggers the warning mentioned above.
>
> Though we could allocate them on the boot CPU side and hand them over to
> the secondary CPU, that seems a bit wasteful if some of the cpus are offline.
> As there is no need for these pages (espfix stack) until we try to run user
> code, we postpone the initialization of the espfix stack and let the boot
> up routine init the espfix stack for the target cpu after it has booted, to
> avoid the noise.
>

It isn't *at all* obvious, to me at least, that if the GFP_KERNEL
allocation fails we won't get rescheduled on another CPU and/or get stuck.

I'm starting to think that the right thing to do is to allocate these on
the CPU that is bringing up the other CPU, at the same time we allocate
the percpu area. This won't affect offline CPUs.
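
As a rough userspace sketch of that ordering (not kernel code): the
initiating CPU does the sleeping allocation up front, and the new CPU only
picks up a pointer while its interrupts are still off. The names
espfix_page[], prepare_cpu() and start_secondary_sketch() are hypothetical,
and calloc() merely stands in for a GFP_KERNEL page allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS   4
#define PAGE_SIZE 4096

/* Hypothetical per-CPU slot, filled in by the CPU doing the bring-up. */
static void *espfix_page[NR_CPUS];

/*
 * Runs on the initiating CPU, in a context where a sleeping allocation
 * is fine.  Nothing is allocated for CPUs that are never brought up.
 */
static int prepare_cpu(int cpu)
{
	if (espfix_page[cpu])
		return 0;	/* reuse the page from an earlier bring-up */

	espfix_page[cpu] = calloc(1, PAGE_SIZE);
	return espfix_page[cpu] ? 0 : -1;
}

/*
 * Runs on the new CPU with interrupts still disabled: it must not
 * allocate, so it only consumes what prepare_cpu() handed over.
 */
static void *start_secondary_sketch(int cpu)
{
	assert(espfix_page[cpu] != NULL);
	return espfix_page[cpu];
}
```

In this sketch the caller would check prepare_cpu(cpu) before kicking the
target CPU, so an allocation failure is reported on the initiating side
instead of on a CPU that cannot sleep.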

-hpa
