Re: [PATCH] lib/genalloc.c: Avoid de-referencing NULL pool

From: Florian Fainelli
Date: Mon Jun 10 2019 - 18:34:43 EST


On 6/10/19 3:03 PM, Florian Fainelli wrote:
> On 6/10/19 2:51 PM, Andrew Morton wrote:
>> On Fri, 7 Jun 2019 16:43:31 -0700 Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>
>>> With architectures allowing the kernel to be placed almost arbitrarily
>>> in memory (e.g.: ARM64), it is possible to have the kernel resides at
>>> physical addresses above 4GB, resulting in neither the default CMA area,
>>> nor the atomic pool from successfully allocating. This does not prevent
>>> specific peripherals from working though, one example is XHCI, which
>>> still operates correctly.
>>>
>>> Trouble comes when the XHCI driver gets suspended and resumed, since we
>>> can now trigger the following NPD:
>>>
>>> ...
>>>
>>> [ 13.327884] f8c0: 0000000000000030 ffffffffffffffff
>>> [ 13.332835] [<ffffff80083c0df8>] addr_in_gen_pool+0x4/0x48
>>> [ 13.338398] [<ffffff80086004d0>] xhci_mem_cleanup+0xc8/0x51c
>>> [ 13.344137] [<ffffff80085f9250>] xhci_resume+0x308/0x65c
>>> [ 13.349524] [<ffffff80085e3de8>] xhci_brcm_resume+0x84/0x8c
>>> [ 13.355174] [<ffffff80084ad040>] platform_pm_resume+0x3c/0x64
>>> [ 13.360997] [<ffffff80084b91b4>] dpm_run_callback+0x5c/0x15c
>>> [ 13.366732] [<ffffff80084b96bc>] device_resume+0xc0/0x190
>>> [ 13.372205] [<ffffff80084baa70>] dpm_resume+0x144/0x2cc
>>> [ 13.377504] [<ffffff80084bafbc>] dpm_resume_end+0x20/0x34
>>> [ 13.382980] [<ffffff80080e0d88>] suspend_devices_and_enter+0x104/0x704
>>> [ 13.389585] [<ffffff80080e16a8>] pm_suspend+0x320/0x53c
>>> [ 13.394881] [<ffffff80080dfd08>] state_store+0xbc/0xe0
>>> [ 13.400094] [<ffffff80083a89d4>] kobj_attr_store+0x14/0x24
>>> [ 13.405655] [<ffffff800822a614>] sysfs_kf_write+0x60/0x70
>>> [ 13.411128] [<ffffff80082295d4>] kernfs_fop_write+0x130/0x194
>>> [ 13.416954] [<ffffff80081b5d10>] __vfs_write+0x60/0x150
>>> [ 13.422254] [<ffffff80081b6b20>] vfs_write+0xc8/0x164
>>> [ 13.427376] [<ffffff80081b7dd8>] SyS_write+0x70/0xc8
>>> [ 13.432412] [<ffffff8008083180>] el0_svc_naked+0x34/0x38
>>> [ 13.437800] Code: 92800173 97f6fb9e 17fffff5 d1000442 (f8408c03)
>>> [ 13.444033] ---[ end trace 2effe12f909ce205 ]---
>>>
>>> The call path leading to this problem is xhci_mem_cleanup() ->
>>> dma_free_coherent() -> dma_free_from_pool() -> addr_in_gen_pool. If the
>>> atomic_pool is NULL, we can't possibly have the address in the atomic
>>> pool anyway, so guard against that.
>>>
>>
>> Arguably the caller shouldn't be pasing in a NULL pointer. Perhaps we
>> couild do this as a convenience thing if addr_in_gen_pool(NULL) makes
>> some sort of semantic sense, but I'm having trouble convincing myself
>> that it does.
>
> That is absolutely true, part of the problem is that there is a context
> imbalance here going on, which is why this condition can be triggered.
> The first time the XHCI descriptor memory is allocated, we are in
> sleepable context, but when we resume from system sleep, we are not. The
> allocation is checked properly against a NULL pool, but not the freeing.

On second thought it's probably fine to put the check in
dma_in_atomic_pool() to match what other parts of kernel/dma/remap.c do,
I will go ahead and do that.
--
Florian