Re: [PATCH] lib/genalloc.c: Avoid de-referencing NULL pool
From: Florian Fainelli
Date: Mon Jun 10 2019 - 18:08:16 EST
On 6/10/19 2:51 PM, Andrew Morton wrote:
> On Fri, 7 Jun 2019 16:43:31 -0700 Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>
>> With architectures allowing the kernel to be placed almost arbitrarily
>> in memory (e.g.: ARM64), it is possible to have the kernel resides at
>> physical addresses above 4GB, resulting in neither the default CMA area,
>> nor the atomic pool from successfully allocating. This does not prevent
>> specific peripherals from working though, one example is XHCI, which
>> still operates correctly.
>>
>> Trouble comes when the XHCI driver gets suspended and resumed, since we
>> can now trigger the following NPD:
>>
>> ...
>>
>> [ 13.327884] f8c0: 0000000000000030 ffffffffffffffff
>> [ 13.332835] [<ffffff80083c0df8>] addr_in_gen_pool+0x4/0x48
>> [ 13.338398] [<ffffff80086004d0>] xhci_mem_cleanup+0xc8/0x51c
>> [ 13.344137] [<ffffff80085f9250>] xhci_resume+0x308/0x65c
>> [ 13.349524] [<ffffff80085e3de8>] xhci_brcm_resume+0x84/0x8c
>> [ 13.355174] [<ffffff80084ad040>] platform_pm_resume+0x3c/0x64
>> [ 13.360997] [<ffffff80084b91b4>] dpm_run_callback+0x5c/0x15c
>> [ 13.366732] [<ffffff80084b96bc>] device_resume+0xc0/0x190
>> [ 13.372205] [<ffffff80084baa70>] dpm_resume+0x144/0x2cc
>> [ 13.377504] [<ffffff80084bafbc>] dpm_resume_end+0x20/0x34
>> [ 13.382980] [<ffffff80080e0d88>] suspend_devices_and_enter+0x104/0x704
>> [ 13.389585] [<ffffff80080e16a8>] pm_suspend+0x320/0x53c
>> [ 13.394881] [<ffffff80080dfd08>] state_store+0xbc/0xe0
>> [ 13.400094] [<ffffff80083a89d4>] kobj_attr_store+0x14/0x24
>> [ 13.405655] [<ffffff800822a614>] sysfs_kf_write+0x60/0x70
>> [ 13.411128] [<ffffff80082295d4>] kernfs_fop_write+0x130/0x194
>> [ 13.416954] [<ffffff80081b5d10>] __vfs_write+0x60/0x150
>> [ 13.422254] [<ffffff80081b6b20>] vfs_write+0xc8/0x164
>> [ 13.427376] [<ffffff80081b7dd8>] SyS_write+0x70/0xc8
>> [ 13.432412] [<ffffff8008083180>] el0_svc_naked+0x34/0x38
>> [ 13.437800] Code: 92800173 97f6fb9e 17fffff5 d1000442 (f8408c03)
>> [ 13.444033] ---[ end trace 2effe12f909ce205 ]---
>>
>> The call path leading to this problem is xhci_mem_cleanup() ->
>> dma_free_coherent() -> dma_free_from_pool() -> addr_in_gen_pool. If the
>> atomic_pool is NULL, we can't possibly have the address in the atomic
>> pool anyway, so guard against that.
>>
>
> Arguably the caller shouldn't be pasing in a NULL pointer. Perhaps we
> couild do this as a convenience thing if addr_in_gen_pool(NULL) makes
> some sort of semantic sense, but I'm having trouble convincing myself
> that it does.
That is absolutely true, part of the problem is that there is a context
imbalance here going on, which is why this condition can be triggered.
The first time the XHCI descriptor memory is allocated, we are in
sleepable context, but when we resume from system sleep, we are not. The
allocation is checked properly against a NULL pool, but not the freeing.
The reason why I went with a change in lib/genalloc.c instead of at the
caller level under kernel/dma/remap.c is because lib/genalloc.c already
has a number of those checks against a NULL pool argument, and this one
was missing because arguably the condition is hard to hit, but can be
hit with the right conditions.
>
> So I'm somewhat inclined to think that going oops was the appropriate
> response to this input...
>
>> --- a/lib/genalloc.c
>> +++ b/lib/genalloc.c
>> @@ -439,6 +439,9 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
>> unsigned long end = start + size - 1;
>> struct gen_pool_chunk *chunk;
>>
>> + if (unlikely(!pool))
>> + return found;
>
> I think it would be clearer to use "return false" here, so the reader
> doesn't have to go find and out what value `found' has.
Sounds fair, the diff did not make it clear because it literally just
the line that is not included in the diff output by default :)
v2 coming, thanks for taking a look.
--
Florian