Re: [lkp-robot] [x86/cpu_entry_area] 10043e02db: kernel_BUG_at_arch/x86/mm/physaddr.c
From: Dmitry Vyukov
Date: Wed Jan 10 2018 - 04:36:44 EST
On Thu, Dec 28, 2017 at 5:18 PM, Andrey Ryabinin
<aryabinin@xxxxxxxxxxxxx> wrote:
>
>
> On 12/28/2017 02:54 PM, Dmitry Vyukov wrote:
>> On Thu, Dec 28, 2017 at 12:51 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>> On Wed, 27 Dec 2017, Dmitry Vyukov wrote:
>>>> On Wed, Dec 27, 2017 at 7:05 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>>>> So this dies simply because kasan_populate_shadow() runs out of memory and
>>>>> has no sanity check whatsoever.
>>>>>
>>>>> static __init void *early_alloc(size_t size, int nid)
>>>>> {
>>>>>         return memblock_virt_alloc_try_nid_nopanic(size, size,
>>>>>                 __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid);
>>>>> }
>>>>>
>>>>> kasan_populate_pmd()
>>>>> {
>>>>>         .....
>>>>>
>>>>>         p = early_alloc(PAGE_SIZE, nid);
>>>>>         entry = pfn_pte(PFN_DOWN(__pa(p)), PAGE_KERNEL);
>>>>>
>>>>> I've instrumented the whole thing and early_alloc() returns NULL at some
>>>>> point and then __pa(NULL) dies in the VIRTUAL_DEBUG code. Well, it would
>>>>> die with VIRTUAL_DEBUG=n as well at some other place.
>>>>>
>>>>> Not really a problem caused by the patch above, it's merely exposing a code
>>>>> path which relies blindly on "enough memory available" assumptions.
>>>>>
>>>>> Throwing more memory at the VM makes the problem go away...
>>>>
>>>> Hi Thomas,
>>>>
>>>> We just need a check inside of early_alloc() to properly diagnose such
>>>> a situation, right?
>>>
>>> At least you want to panic with a proper out of memory message. But letting
>>> the thing die at a random place is a bad idea.
>>
>> Thanks. I will cook a patch (if Andrey doesn't beat me to it).
>>
>
> We should probably panic only if a PAGE_SIZE allocation fails. PUD_SIZE and PMD_SIZE
> allocations have a failure fallback. I would suggest adding a 'bool panic' param to
> early_alloc() and calling memblock_virt_alloc_try_nid() if it's true.
FTR, I filed https://bugzilla.kernel.org/show_bug.cgi?id=198427 for
this so that it doesn't get lost.
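
For illustration, here is a rough userspace sketch of the shape Andrey suggests above. It is not the actual kernel patch. The two stub_* allocators and the pool_left counter are hypothetical stand-ins for memblock_virt_alloc_try_nid() and memblock_virt_alloc_try_nid_nopanic(), the nid argument is dropped for brevity, and SKETCH_PAGE_SIZE is an assumed value. The point is only that the caller picks, per allocation, between "fail loudly with an out-of-memory message" and "return NULL so the caller can fall back".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the limited early boot memory pool. */
static size_t pool_left = 2 * SKETCH_PAGE_SIZE;

/* Models the _nopanic variant: returns NULL once the pool is exhausted. */
static void *stub_alloc_nopanic(size_t size)
{
	if (size > pool_left)
		return NULL;
	pool_left -= size;
	return calloc(1, size);
}

/* Models the panicking variant: prints a clear message and stops. */
static void *stub_alloc_panic(size_t size)
{
	void *p = stub_alloc_nopanic(size);

	if (!p) {
		fprintf(stderr, "early_alloc: out of memory (%zu bytes)\n", size);
		abort();
	}
	return p;
}

/*
 * Sketch of early_alloc() with the suggested 'bool panic' parameter:
 * panic == true  -> allocation failure is fatal and clearly reported,
 * panic == false -> NULL is returned and the caller must handle it.
 */
static void *early_alloc(size_t size, bool panic)
{
	return panic ? stub_alloc_panic(size) : stub_alloc_nopanic(size);
}
```

In this shape, a PAGE_SIZE caller would pass true, while the PUD_SIZE and PMD_SIZE callers would pass false and drop to the smaller mapping size when they get NULL back.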