Re: [PATCH] random: Fix kernel panic due to system_wq use before init

From: Linus Torvalds
Date: Wed Sep 14 2016 - 15:14:37 EST


Ugh, I detest this patch.

My gut feeling is that a driver (even a fairly core one like the
random code) should not have to know these kinds of details like
"schedule_work() needs system_wq to have been initialized".

I'm wondering if we couldn't just initialize "system_wq" earlier.
Right now init_workqueues() is an "early_initcall()", so it's at the
same priority as a number of other random early initcalls. My gut
feeling is that it should be initialized even earlier, with the
scheduler.

Because dammit, we use "schedule_work()" as if it was a pretty core
scheduler thing, and having to have some odd knowledge of system_wq
initialization details in the rest of the kernel sounds really really
wrong.

I don't think the random code is at all special in maybe wanting to
schedule some work despite being an "early" initcall.

Adding Tejun to the cc, and quoting the whole email.

Tejun, comments?

Linus

On Wed, Sep 14, 2016 at 12:03 PM, Waiman Long <Waiman.Long@xxxxxxx> wrote:
> While booting a 4.8-rc6 kernel on a 16-socket 768-thread Broadwell-EX
> system, the kernel panic'ed with the following log:
>
> [ 51.837010] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
> [ 51.845635] IP: [<ffffffff810a49d2>] __queue_work+0x32/0x420
> [ 52.004366] Call Trace:
> [ 52.007053] <IRQ>
> [ 52.009171] [<ffffffff810a4de7>] queue_work_on+0x27/0x40
> [ 52.015306] [<ffffffff8146ebd7>] credit_entropy_bits+0x1d7/0x2a0
> [ 52.022002] [<ffffffff8146f339>] ? add_interrupt_randomness+0x1b9/0x210
> [ 52.029366] [<ffffffff8146f339>] add_interrupt_randomness+0x1b9/0x210
> [ 52.036544] [<ffffffff810ea670>] handle_irq_event_percpu+0x40/0x80
> [ 52.043430] [<ffffffff810ea6eb>] handle_irq_event+0x3b/0x60
> [ 52.049655] [<ffffffff810ede48>] handle_level_irq+0x88/0x100
> [ 52.055968] [<ffffffff8103004b>] handle_irq+0xab/0x130
> [ 52.061708] [<ffffffff81092861>] ? _local_bh_enable+0x21/0x50
> [ 52.068125] [<ffffffff81037bf5>] ? __exit_idle+0x5/0x30
> [ 52.073965] [<ffffffff816fc9ed>] do_IRQ+0x4d/0xd0
> [ 52.079229] [<ffffffff816fa88c>] common_interrupt+0x8c/0x8c
> [ 52.085444] <EOI>
> [ 52.087568] [<ffffffff8106bea9>] ? try_to_free_pmd_page+0x9/0x40
> [ 52.094462] [<ffffffff8106bea5>] ? try_to_free_pmd_page+0x5/0x40
> [ 52.101157] [<ffffffff8106bf2a>] ? __unmap_pmd_range.part.5+0x4a/0x70
> [ 52.108330] [<ffffffff8106c330>] unmap_pmd_range+0x130/0x250
> [ 52.114644] [<ffffffff8106cb7b>] __cpa_process_fault+0x47b/0x5a0
> [ 52.121339] [<ffffffff8106d7cb>] __change_page_attr+0x78b/0x9e0
> [ 52.127946] [<ffffffff81065e45>] ? __raw_callee_save___native_queued_spin_unlock+0x15/0x30
> [ 52.137124] [<ffffffff8106da98>] __change_page_attr_set_clr+0x78/0x300
> [ 52.144404] [<ffffffff81229180>] ? __slab_alloc+0x4d/0x5c
> [ 52.150436] [<ffffffff8106f15f>] kernel_map_pages_in_pgd+0x8f/0xd0
> [ 52.157333] [<ffffffff81dc0fb0>] efi_setup_page_tables+0xcc/0x1d9
> [ 52.164124] [<ffffffff81dc0a8d>] efi_enter_virtual_mode+0x35e/0x4af
> [ 52.171117] [<ffffffff81d9e109>] start_kernel+0x41f/0x4c8
> [ 52.177142] [<ffffffff81d9dad8>] ? set_init_arg+0x55/0x55
> [ 52.183168] [<ffffffff81d9d120>] ? early_idt_handler_array+0x120/0x120
> [ 52.190440] [<ffffffff81d9d5d6>] x86_64_start_reservations+0x2a/0x2c
> [ 52.197516] [<ffffffff81d9d724>] x86_64_start_kernel+0x14c/0x16f
> [ 52.204214] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 a0 a7 c2 81 f6 c4 02 0f 85 0d 03 00 00 <41> f6 86 02 01 00 00 01 0f 85 ae 02 00 00 49 c7 c7 18 41 01 00
> [ 52.225516] RIP [<ffffffff810a49d2>] __queue_work+0x32/0x420
> [ 52.231838] RSP <ffff88bd7f403de8>
> [ 52.235667] CR2: 0000000000000102
> [ 52.239667] ---[ end trace 2ee7ea9d2908eb72 ]---
> [ 52.244743] Kernel panic - not syncing: Fatal exception in interrupt
>
> Looking at the panic'ed instruction indicated that system_wq was
> used by credit_entropy_bits() before it it was initialized in an
> early_initcall.
>
> This patch prevents the schedule_work() call from being made before
> system_wq is initialized.
>
> Signed-off-by: Waiman Long <Waiman.Long@xxxxxxx>
> ---
> drivers/char/random.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 3efb3bf..3afc519 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -730,8 +730,12 @@ retry:
> r->entropy_total >= 2*random_read_wakeup_bits) {
> struct entropy_store *other = &blocking_pool;
>
> - if (other->entropy_count <=
> - 3 * other->poolinfo->poolfracbits / 4) {
> + /*
> + * We cannot call schedule_work() before system_wq
> + * is initialized.
> + */
> + if (system_wq && (other->entropy_count <=
> + 3 * other->poolinfo->poolfracbits / 4)) {
> schedule_work(&other->push_work);
> r->entropy_total = 0;
> }
> --
> 1.7.1
>