Re: Panic on 8-node system in memblock_virt_alloc_try_nid()

From: Yinghai Lu
Date: Fri Jan 24 2014 - 01:57:14 EST


On Thu, Jan 23, 2014 at 10:38 PM, Santosh Shilimkar
<santosh.shilimkar@xxxxxx> wrote:
> Yinghai,
>
> On Friday 24 January 2014 12:55 AM, Yinghai Lu wrote:
>> On Thu, Jan 23, 2014 at 2:49 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>>> > Linus's current tree doesn't boot on an 8-node/1TB NUMA system that I
>>> > have. Its reboots are *LONG*, so I haven't fully bisected it, but it's
>>> > down to just a few commits, most of which are changes to the memblock
>>> > code. Since the panic is in the memblock code, it looks like a
>>> > no-brainer. It's almost certainly the code from Santosh or Grygorii
>>> > that's triggering this.
>>> >
>>> > Config and good/bad dmesg with memblock=debug are here:
>>> >
>>> > http://sr71.net/~dave/intel/3.13/
>>> >
>>> > Please let me know if you need it bisected further than this.
>> Please check attached patch, and it should fix the problem.
>>
>
> [...]
>
>>
>> Subject: [PATCH] x86: Fix numa with reverting wrong memblock setting.
>>
>> Dave reported that NUMA on x86 is broken on a system with 1T of memory.
>>
>> It turns out
>> | commit 5b6e529521d35e1bcaa0fe43456d1bbb335cae5d
>> | Author: Santosh Shilimkar <santosh.shilimkar@xxxxxx>
>> | Date: Tue Jan 21 15:50:03 2014 -0800
>> |
>> | x86: memblock: set current limit to max low memory address
>>
>> wrongly sets the limit to low memory.
>>
>> max_low_pfn_mapped is different from max_pfn_mapped.
>> max_low_pfn_mapped is always under 4G.
>>
>> That makes all memblock_alloc_nid allocations go under 4G.
>>
>> Revert that offending patch.
>>
>> Reported-by: Dave Hansen <dave.hansen@xxxxxxxxx>
>> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>>
>>
> This will mostly fix the $subject issue, but the regression
> reported by Andrew [1] will surface with the revert. It's clear
> now that even though the commit fixed that issue, it wasn't the right fix.
>
> Would be great if you can have a look at the thread.

> [1] http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/03770.html
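
To illustrate the point in the patch description about the limit being
under 4G, here is a rough userspace toy model, not kernel code. It assumes
eight equal 128 GiB nodes with no holes, a simplified top-down search, and
a fallback from the requested node to any node. The helper names
alloc_top_down() and alloc_try_nid() are made up for this sketch and only
loosely mirror what memblock does.

```python
GIB = 1 << 30

def alloc_top_down(free_ranges, size, limit):
    """Highest start address of a size-byte block ending at or below limit.

    free_ranges is a list of (start, end) tuples; returns None if nothing fits.
    """
    best = None
    for start, end in free_ranges:
        end = min(end, limit)          # clamp the range to the current limit
        if end - start >= size:
            candidate = end - size     # top-down: take the highest fit
            if best is None or candidate > best:
                best = candidate
    return best

def alloc_try_nid(node_ranges, all_ranges, size, limit):
    """Try the requested node first, then fall back to any node (hypothetical)."""
    addr = alloc_top_down(node_ranges, size, limit)
    if addr is None:
        addr = alloc_top_down(all_ranges, size, limit)
    return addr

# Assumed layout: 8 nodes x 128 GiB = 1 TiB, node i covers [i*128G, (i+1)*128G).
nodes = [(i * 128 * GIB, (i + 1) * 128 * GIB) for i in range(8)]

low_limit = 4 * GIB        # stand-in for a limit derived from max_low_pfn_mapped
full_limit = 1024 * GIB    # stand-in for a limit derived from max_pfn_mapped

node7 = [nodes[7]]
print(alloc_try_nid(node7, nodes, GIB, low_limit))       # node-7 request, low limit
print(alloc_try_nid(node7, nodes, GIB, full_limit))      # node-7 request, full limit
print(alloc_try_nid(node7, nodes, 8 * GIB, low_limit))   # large request, low limit
```

Under these assumptions, the 1 GiB request for node 7 lands at 3 GiB (on
node 0) when the limit is 4 GiB, and at 1023 GiB (node-local) when the
limit covers all mapped memory. The 8 GiB request cannot fit below 4 GiB at
all and returns None. Whether that last case is what actually triggers the
panic on Dave's machine is not something this model can confirm.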

Andrew,

Did you bisect which patch in that 23-patch series causes the problem on
your system?

Thanks

Yinghai