Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

From: Yinghai Lu
Date: Mon Feb 25 2013 - 21:06:45 EST


[ Adding Martin's new address ]

On Mon, Feb 25, 2013 at 4:35 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
> On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>> On Mon, Feb 25, 2013 at 1:27 PM, Don Morris <don.morris@xxxxxx> wrote:
>>> On 02/25/2013 10:32 AM, Tim Gardner wrote:
>>>> On 02/25/2013 08:02 AM, Tim Gardner wrote:
>>>>> Is this an expected warning? I'll boot a vanilla kernel just to be sure.
>>>>>
>>>>> rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo:
>>>>>
>>>>
>>>> Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft
>>>> is having an impact:
>>>
>>> Reproduced on an HP z620 workstation (E5-2620 instead of E5-2680, but
>>> still Sandy Bridge, though I don't think that matters).
>>>
>>> Bisection leads to:
>>> # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug: parse SRAT before memblock is ready
>>>
>>> Nothing terribly obvious leaps out as to *why* that reshuffling messes
>>> up the cpu<-->node bindings, but I wanted to put this out there while
>>> I poke around further. [Note that the SRAT: PXM -> APIC -> Node
>>> printouts during boot are the same either way -- if you look at the APIC
>>> numbers of the processors (from /proc/cpuinfo), the processors should
>>> be assigned to the correct node, but they aren't.] cc'ing Tang Chen
>>> in case this is obvious to him or he's already fixed it somewhere not
>>> on Linus's tree yet.
>>>
>>> Don Morris
>>>
>>>>
>>>> [ 0.170435] ------------[ cut here ]------------
>>>> [ 0.170450] WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x71/0x84()
>>>> [ 0.170452] Hardware name: S2600CP
>>>> [ 0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
>>>> [ 0.156000] smpboot: Booting Node 1, Processors #1
>>>> [ 0.170455] Modules linked in:
>>>> [ 0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
>>>> [ 0.170461] Call Trace:
>>>> [ 0.170466] [<ffffffff810597bf>] warn_slowpath_common+0x7f/0xc0
>>>> [ 0.170473] [<ffffffff810598b6>] warn_slowpath_fmt+0x46/0x50
>>>> [ 0.170477] [<ffffffff816cc752>] topology_sane.isra.2+0x71/0x84
>>>> [ 0.170482] [<ffffffff816cc9de>] set_cpu_sibling_map+0x23f/0x436
>>>> [ 0.170487] [<ffffffff816ccd0c>] start_secondary+0x137/0x201
>>>> [ 0.170502] ---[ end trace 09222f596307ca1d ]---
>>
>> That commit is totally broken and should be reverted.
>>
>> 1. numa_init is called several times, NOT just for SRAT, so
>>      nodes_clear(numa_nodes_parsed);
>>      memset(&numa_meminfo, 0, sizeof(numa_meminfo));
>> cannot simply be removed.
>> Please consider the sequence: numaq, srat, amd, dummy.
>> You need to keep the fallback path working!
>>
>> 2. It simply splits part of acpi_numa_init out into early_parse_srat.
>> a. early_parse_srat is NOT called for ia64, so it breaks ia64.
>> b. The loop
>>      for (i = 0; i < MAX_LOCAL_APIC; i++)
>>              set_apicid_to_node(i, NUMA_NO_NODE);
>> is still left in numa_init, so it just clears the result of early_parse_srat.
>> It should be moved before that....
>>
>> 3. The patch TITLE is totally misleading: there is NO x86 in the title,
>> but it changes x86 code.
>>
>> 4. It does not CC TJ and the other NUMA guys...
>
> The attached patch works around the problem for now,
> but it assumes that NUMAQ does not have an SRAT table.
>

Martin, can you confirm that NUMAQ does not have an SRAT?

Thanks

Yinghai

Attachment: x.patch
Description: Binary data