Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely

From: Tejun Heo
Date: Thu Oct 13 2016 - 20:34:01 EST


On Tue, Oct 11, 2016 at 10:00:28PM +0800, zijun_hu wrote:
> From: zijun_hu <zijun_hu@xxxxxxx>
>
> As shown by pcpu_build_alloc_info(), the number of units within a percpu
> group is deduced by rounding up the number of CPUs within the group to
> the @upa boundary. Therefore, the number of CPUs normally isn't equal to
> the number of units if it isn't aligned to @upa. However,
> pcpu_page_first_chunk() uses BUG_ON() to assert that the two numbers are
> equal, so the BUG_ON() may falsely trigger a panic.
>
> To fix this issue, the number of CPUs is rounded up before it is compared
> with the number of units, and the BUG_ON() is replaced by a warning and an
> error return to keep the system alive as much as possible.

I really can't decode what the actual issue is here. Can you please
give an example of a concrete case?
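
For reference, the arithmetic the description seems to rely on can be
sketched as below. This is only an illustration, not kernel code: the
helper names units_for_group() and strict_check_passes() are hypothetical,
and the CPU count and @upa values are made up. Whether such a mismatch can
actually be reached in pcpu_page_first_chunk() is exactly what a concrete
case would need to show.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): round the number of CPUs
 * in a group up to the next multiple of upa, as the patch description
 * says pcpu_build_alloc_info() does for the unit count.
 */
unsigned int units_for_group(unsigned int nr_cpus, unsigned int upa)
{
	assert(upa > 0);
	return (nr_cpus + upa - 1) / upa * upa;
}

/*
 * Hypothetical helper: nonzero when a strict "units == CPUs" comparison
 * would hold, i.e. when nr_cpus is already a multiple of upa.
 */
int strict_check_passes(unsigned int nr_cpus, unsigned int upa)
{
	return units_for_group(nr_cpus, upa) == nr_cpus;
}
```

With the made-up values nr_cpus = 6 and upa = 4, units_for_group() gives 8,
so a strict equality check would fail, while nr_cpus = 8 with the same upa
would pass.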

> @@ -2113,21 +2120,22 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
>
> /* allocate pages */
> j = 0;
> - for (unit = 0; unit < num_possible_cpus(); unit++)
> + for (unit = 0; unit < num_possible_cpus(); unit++) {
> + unsigned int cpu = ai->groups[0].cpu_map[unit];
> for (i = 0; i < unit_pages; i++) {
> - unsigned int cpu = ai->groups[0].cpu_map[unit];
> void *ptr;
>
> ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE);
> if (!ptr) {
> pr_warn("failed to allocate %s page for cpu%u\n",
> - psize_str, cpu);
> + psize_str, cpu);

And stop making gratuitous changes?

Thanks.

--
tejun