Re: [PATCH v1] xen: avoid crash in disable_hotplug_cpu

From: Juergen Gross
Date: Wed Sep 05 2018 - 11:14:54 EST


On 05/09/18 16:47, Olaf Hering wrote:
> On Wed, 5 Sep 2018 12:55:58 +0200
> Juergen Gross <jgross@xxxxxxxx> wrote:
>
>> Instead of trying to fight the symptoms, I think it would make more
>> sense to avoid offlining the last cpu.
>
> Well, apparently the fix is to leave cpu#0 online because of a backtrace like this:

I'd go with testing whether cpu_is_hotpluggable(cpu) returns true.
By default this will return false for cpu 0.

Additionally I'd really like a test for num_online_cpus() > 1.
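
To illustrate the two checks, here is a rough user-space sketch, not the
actual patch. Everything prefixed with model_ is a hypothetical stand-in:
the arrays replace real kernel state, and the two helpers only mimic what
cpu_is_hotpluggable() and num_online_cpus() would report. It assumes
cpu 0 is the only non-hotpluggable cpu.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only: these arrays stand in for kernel cpu state. */
#define MODEL_NR_CPUS 4

static bool model_online[MODEL_NR_CPUS] = { true, true, false, false };
/* Assumption: cpu 0 is not hotpluggable, every other cpu is. */
static bool model_hotpluggable[MODEL_NR_CPUS] = { false, true, true, true };

/* Stand-in for cpu_is_hotpluggable(cpu). */
static bool model_cpu_is_hotpluggable(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return model_hotpluggable[cpu];
}

/* Stand-in for num_online_cpus(): count the cpus marked online. */
static unsigned int model_num_online_cpus(void)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++)
		if (model_online[i])
			n++;
	return n;
}

/* Hypothetical guard: returns true only if offlining this cpu is allowed. */
static bool model_may_offline(unsigned int cpu)
{
	/* Never touch a cpu that is not hotpluggable (cpu 0 by default). */
	if (!model_cpu_is_hotpluggable(cpu))
		return false;
	/* Never offline the last online cpu. */
	if (model_num_online_cpus() <= 1)
		return false;
	return true;
}
```

In the real function the idea would be to return early whenever such a
guard says no, before any attempt to offline the cpu.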

BTW: I'm not sure this WARN triggers because it is cpu#0. Are you sure
the tested cpu in that WARN was 0? After all, the test is just running
on cpu#0 and I don't think it can be offline already.


Juergen

>
> WARNING: CPU: 0 PID: 83 at kernel/sched/cpudeadline.c:159 cpudl_clear+0xa5/0xb0
> Workqueue: events cpuset_hotplug_workfn
> RIP: e030:cpudl_clear+0xa5/0xb0
> Code: 8b 43 48 c7 44 28 0c ff ff ff ff e8 d5 fd ff ff 48 8d 43 08 f0 4c 0f ab 20 4c 89 ee 48 89 df 5b 5d 41 5c 41 5d e9 0b 3b 79 00 <0f> 0b e9 76 ff ff ff 0f 1f 40 00 66 66 66 66 90 41 56 49 89 d6 41
> RSP: e02b:ffffc900411cbc40 EFLAGS: 00010086
> RAX: ffffffff810d09a0 RBX: ffff880106f1a100 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880106f1a100
> RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8801068989b0
> R10: ffff8801068989d0 R11: 0000000000000008 R12: 0000000000000000
> R13: ffff8801f3800200 R14: 0000000000000001 R15: ffff8801f3823240
> FS: 00007fd40d7f08c0(0000) GS:ffff8801f3800000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055eff60fe098 CR3: 00000001edf24000 CR4: 0000000000002660
> Call Trace:
> rq_offline_dl+0x36/0x80
> set_rq_offline+0x31/0x60
> rq_attach_root+0x98/0xc0
> cpu_attach_domain+0x107/0x320
> partition_sched_domains+0x117/0x347
> ? cpus_read_lock+0x2d/0x50
> rebuild_sched_domains_locked+0xe4/0x4e0
> ? __switch_to_asm+0x40/0x70
> ? xen_mc_flush+0x102/0x210
> rebuild_sched_domains+0x16/0x30
> cpuset_hotplug_workfn+0x45e/0xef0
> ? _raw_spin_unlock_irq+0x22/0x40
> ? finish_task_switch+0x75/0x250
> process_one_work+0x1fd/0x3e0
> worker_thread+0x2d/0x3d0
> ? rescuer_thread+0x340/0x340
> kthread+0x112/0x130
> ? kthread_create_worker_on_cpu+0x40/0x40
> ret_from_fork+0x3a/0x50
>
> Initially I did not spot it because the kernel was booted with 'quiet'.
>
> Olaf
>