Re: hid-related 5.2-rc1 boot hang
From: Benjamin Tissoires
Date: Tue Jun 04 2019 - 08:29:18 EST
On Tue, Jun 4, 2019 at 12:50 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>
> Hi,
>
> On 04-06-19 12:08, Benjamin Tissoires wrote:
> > On Tue, Jun 4, 2019 at 9:51 AM Benjamin Tissoires
> > <benjamin.tissoires@xxxxxxxxxx> wrote:
> >>
> >> On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 03-06-19 15:55, Benjamin Tissoires wrote:
> >>>> On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> Hi Again,
> >>>>>
> >>>>> On 03-06-19 11:11, Hans de Goede wrote:
> >>>>> <snip>
> >>>>>
> >>>>>>> not sure about the rest of logitech issues yet) next week.
> >>>>>>
> >>>>>> The main problem seems to be the request_module patches. Although I also
> >>>>
> >>>> Can't we use request_module_nowait() instead, and set a reasonable
> >>>> timeout that we detect only once to check if userspace is compatible:
> >>>>
> >>>> In pseudo-code:
> >>>> if (!request_module_checked) {
> >>>> request_module_nowait(name);
> >>>> use_request_module = wait_event_timeout(wq,
> >>>> first_module_loaded, 10 seconds in jiffies);
> >>>> request_module_checked = true;
> >>>> } else if (use_request_module) {
> >>>> request_module(name);
> >>>> }
> >>>
> >>> Well looking at the just attached dmesg , the modprobe
> >>> when triggered by udev from userspace succeeds in about
> >>> 0.5 seconds, so it seems that the modprobe hangs happens
> >>> when called from within the kernel rather then from within
> >>> userspace.
> >>>
> >>> What I do not know if is the hang is inside userspace, or
> >>> maybe it happens when modprobe calls back into the kernel,
> >>> if the hang happens when modprobe calls back into the kernel,
> >>> then other modprobes (done from udev) likely will hang too
> >>> since I think only 1 modprobe can happen at a time.
> >>>
> >>> I really wish we knew what distinguished working systems
> >>> from non working systems :|
> >>>
> >>> I cannot find a common denominator; other then the systems
> >>> are not running Fedora. So far we've reports from both Ubuntu 16.04
> >>> and Tumbleweed, so software version wise these 2 are wide apart.
> >>
> >> I am trying to reproduce the lock locally, and installed an opensuse
> >> Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
> >> do not see the lock with either my vanilla compiled kernel and the rpm
> >> found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/
> >>
> >> Next step is install Tumbleweed on bare metal, but I do not see how
> >> this could introduce a difference (maybe USB2 vs 3).
> >
> > Making progress here.
> >
> > The difference between Ubuntu/Tumbleweed and Fedora: usbhid is shipped
> > as a module while in Fedora usbhid is included in the kernel.
> >
> > If I rmmod hid_* and usbhid, then modprobe usbhid, the command hangs
> > for 3 minutes.
> > If usbhid is already loaded, inserting a receiver is immediate
> > regarding the loading of the external modules.
> >
> > So my assumption is that when the device gets detected at boot, usbhid
> > gets loaded by the kernel event, which in turns attempts to call
> > __request_module, but the modprobe can't be fulfilled because it's
> > already waiting for the initial usbhid modprobe to finish.
> >
> > Still don't know how to solve that, but I thought I should share.
>
> Hmm, we may be hitting the scenario described in the big comment
> around line 3500 of kernel/module.c.
Well, we are not locking during do_init_module(), but in waiting for
the completion of request_module(). So as I read the trace, we wait
for userspace to call/terminate modprobe.
>
> But I'm not sure that is what is happening here.
>
> Maybe you can put a WARN_ON(1) in request_module and look at the
> backtrace ? That may help to figure out what is going on; or
> alternatively it might help to find some way to detect this and
> if it happens skip the request_module...
Ftrace is much easier to deal with:
---
/sys/kernel/debug/tracing # cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 4/4 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
modprobe-23236 [003] .... 9191.880917: __request_module
<-hid_add_device
modprobe-23236 [003] .... 9191.880937: <stack trace>
=> 0xffffffffc0851061
=> __request_module
=> hid_add_device
=> usbhid_probe
=> usb_probe_interface
=> really_probe
=> driver_probe_device
=> device_driver_attach
=> __driver_attach
=> bus_for_each_dev
=> bus_add_driver
=> driver_register
=> usb_register_driver
=> hid_init
=> do_one_initcall
=> do_init_module
=> load_module
=> __do_sys_finit_module
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
modprobe-23236 [003] .... 9191.892765: __request_module
<-hid_add_device
modprobe-23236 [003] .... 9191.892784: <stack trace>
=> 0xffffffffc0851061
=> __request_module
=> hid_add_device
=> usbhid_probe
=> usb_probe_interface
=> really_probe
=> driver_probe_device
=> device_driver_attach
=> __driver_attach
=> bus_for_each_dev
=> bus_add_driver
=> driver_register
=> usb_register_driver
=> hid_init
=> do_one_initcall
=> do_init_module
=> load_module
=> __do_sys_finit_module
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
---
This is what happen with the logitech receiver plugged in and while
modprobing usbhid. The modprobe command hangs, and I took the trace
here.
If I hit Ctrl-C, the modprobe loading continue properly.
There is nothing special here in the stack trace, except that there
are 2 calls to request_module here (one for the touchscreen I guess
and one for the logitech receiver).
I have attached the call graph of the same sequence:
wait_for_completion_killable() seems to be the culprit, we are waiting
for userspace to notify it has done calling modprobe.
My idea would be to defer any call to hid_add_device() into a
workqueue and see if that unlocks the situation.
Cheers,
Benjamin
/sys/kernel/debug/tracing # cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) | __request_module() {
0) 0.256 us | current_is_async();
0) | security_kernel_module_request() {
0) 0.307 us | integrity_kernel_module_request();
0) 0.625 us | }
0) | kmem_cache_alloc_trace() {
0) | _cond_resched() {
0) 0.123 us | rcu_all_qs();
0) 0.355 us | }
0) 0.122 us | should_failslab();
0) | _cond_resched() {
0) 0.120 us | rcu_all_qs();
0) 0.355 us | }
0) 0.120 us | memcg_kmem_put_cache();
0) 1.563 us | }
0) | kstrdup() {
0) | __kmalloc_track_caller() {
0) 0.115 us | kmalloc_slab();
0) | _cond_resched() {
0) 0.118 us | rcu_all_qs();
0) 0.348 us | }
0) 0.119 us | should_failslab();
0) | _cond_resched() {
0) 0.115 us | rcu_all_qs();
0) 0.345 us | }
0) 0.119 us | memcg_kmem_put_cache();
0) 1.757 us | }
0) 2.040 us | }
0) | call_usermodehelper_setup() {
0) | kmem_cache_alloc_trace() {
0) | _cond_resched() {
0) 0.116 us | rcu_all_qs();
0) 0.345 us | }
0) 0.117 us | should_failslab();
0) | _cond_resched() {
0) 0.118 us | rcu_all_qs();
0) 0.347 us | }
0) 0.123 us | memcg_kmem_put_cache();
0) 1.792 us | }
0) 2.023 us | }
0) | call_usermodehelper_exec() {
0) | queue_work_on() {
0) | __queue_work() {
0) 0.126 us | get_work_pool();
0) 0.183 us | _raw_spin_lock();
0) | insert_work() {
0) 0.121 us | get_pwq.isra.0();
0) | wake_up_process() {
0) | try_to_wake_up() {
0) 0.227 us | _raw_spin_lock_irqsave();
0) | select_task_rq_fair() {
0) 0.133 us | available_idle_cpu();
0) 0.166 us | update_cfs_rq_h_load();
0) | select_idle_sibling() {
0) 0.137 us | available_idle_cpu();
0) 0.371 us | }
0) 1.299 us | }
0) 0.133 us | _raw_spin_lock();
0) 0.148 us | update_rq_clock();
0) | ttwu_do_activate() {
0) | activate_task() {
0) | enqueue_task_fair() {
0) | enqueue_entity() {
0) 0.123 us | update_curr();
0) | __update_load_avg_se() {
0) 0.118 us | __accumulate_pelt_segments();
0) 0.420 us | }
0) 0.125 us | __update_load_avg_cfs_rq();
0) 0.122 us | update_cfs_group();
0) 0.138 us | account_entity_enqueue();
0) 0.120 us | place_entity();
0) 0.119 us | __enqueue_entity();
0) 2.146 us | }
0) 0.115 us | hrtick_update();
0) 2.624 us | }
0) 2.890 us | }
0) | ttwu_do_wakeup() {
0) | check_preempt_curr() {
0) 0.130 us | resched_curr();
0) 0.417 us | }
0) 1.034 us | }
0) 4.269 us | }
0) 0.410 us | _raw_spin_unlock_irqrestore();
0) 7.466 us | }
0) 7.702 us | }
0) 8.164 us | }
0) 9.079 us | }
0) 9.370 us | }
0) | wait_for_completion_killable() {
0) | _cond_resched() {
0) 0.118 us | rcu_all_qs();
0) 0.365 us | }
0) 0.122 us | _raw_spin_lock_irq();
0) | schedule_timeout() {
0) | schedule() {
0) | rcu_note_context_switch() {
0) 0.147 us | rcu_qs();
0) 0.384 us | }
0) 0.124 us | _raw_spin_lock();
0) 0.144 us | update_rq_clock();
0) | deactivate_task() {
0) | dequeue_task_fair() {
0) | dequeue_entity() {
0) | update_curr() {
0) 0.121 us | update_min_vruntime();
0) 0.132 us | cpuacct_charge();
0) | __cgroup_account_cputime() {
0) 0.124 us | cgroup_rstat_updated();
0) 0.354 us | }
0) 1.088 us | }
0) 0.127 us | __update_load_avg_se();
0) 0.136 us | __update_load_avg_cfs_rq();
0) 0.118 us | clear_buddies();
0) 0.124 us | account_entity_dequeue();
0) 0.120 us | update_cfs_group();
0) 0.128 us | update_min_vruntime();
0) 2.780 us | }
0) 0.114 us | hrtick_update();
0) 3.242 us | }
0) 3.503 us | }
0) | pick_next_task_fair() {
0) 0.121 us | __msecs_to_jiffies();
0) 0.389 us | }
0) | pick_next_task_idle() {
0) | put_prev_task_fair() {
0) | put_prev_entity() {
0) 0.122 us | check_cfs_rq_runtime();
0) 0.355 us | }
0) 0.581 us | }
0) 0.123 us | __update_idle_core();
0) 1.055 us | }
0) 0.118 us | enter_lazy_tlb();
2) | finish_task_switch() {
2) | __mmdrop() {
2) | pgd_free() {
2) 0.248 us | _raw_spin_lock();
2) | free_pages() {
2) | free_pages.part.0() {
2) | __free_pages() {
2) | __free_pages_ok() {
2) | free_one_page() {
2) 0.213 us | _raw_spin_lock();
2) 0.219 us | __mod_zone_page_state();
2) 1.102 us | }
2) 1.839 us | }
2) 2.337 us | }
2) 2.672 us | }
2) 2.986 us | }
2) 4.044 us | }
2) 0.218 us | destroy_context_ldt();
2) | kmem_cache_free() {
2) 0.416 us | ___cache_free();
2) 1.051 us | }
2) 6.234 us | }
2) 8.542 us | }
2) * 13308.49 us | } /* schedule */
2) * 13308.85 us | } /* schedule_timeout */
2) 0.235 us | _raw_spin_lock_irq();
2) * 13310.47 us | } /* wait_for_completion_killable */
2) | free_modprobe_argv() {
2) | kfree() {
2) 0.213 us | ___cache_free();
2) 0.657 us | }
2) | kfree() {
2) 0.153 us | ___cache_free();
2) 0.557 us | }
2) 1.732 us | }
2) | kfree() {
2) 0.207 us | ___cache_free();
2) 0.734 us | }
2) | __wake_up() {
2) | __wake_up_common_lock() {
2) 0.265 us | _raw_spin_lock_irqsave();
2) 0.199 us | __wake_up_common();
2) 0.214 us | _raw_spin_unlock_irqrestore();
2) 1.333 us | }
2) 1.648 us | }
2) * 13325.51 us | } /* call_usermodehelper_exec */
2) | __wake_up() {
2) | __wake_up_common_lock() {
2) 0.195 us | _raw_spin_lock_irqsave();
2) 0.200 us | __wake_up_common();
2) 0.189 us | _raw_spin_unlock_irqrestore();
2) 1.274 us | }
2) 1.629 us | }
2) * 13337.37 us | } /* __request_module */
2) | __request_module() {
2) 0.279 us | current_is_async();
2) | security_kernel_module_request() {
2) 0.310 us | integrity_kernel_module_request();
2) 0.799 us | }
2) | kmem_cache_alloc_trace() {
2) | _cond_resched() {
2) 0.209 us | rcu_all_qs();
2) 0.598 us | }
2) 0.208 us | should_failslab();
2) | _cond_resched() {
2) 0.202 us | rcu_all_qs();
2) 0.597 us | }
2) 0.206 us | memcg_kmem_put_cache();
2) 2.596 us | }
2) | kstrdup() {
2) | __kmalloc_track_caller() {
2) 0.209 us | kmalloc_slab();
2) | _cond_resched() {
2) 0.200 us | rcu_all_qs();
2) 0.598 us | }
2) 0.399 us | should_failslab();
2) | _cond_resched() {
2) 0.179 us | rcu_all_qs();
2) 0.491 us | }
2) 0.174 us | memcg_kmem_put_cache();
2) 3.004 us | }
2) 3.424 us | }
2) | call_usermodehelper_setup() {
2) | kmem_cache_alloc_trace() {
2) | _cond_resched() {
2) 0.171 us | rcu_all_qs();
2) 0.816 us | }
2) 0.173 us | should_failslab();
2) | _cond_resched() {
2) 0.188 us | rcu_all_qs();
2) 0.544 us | }
2) 0.205 us | memcg_kmem_put_cache();
2) 2.728 us | }
2) 3.100 us | }
2) | call_usermodehelper_exec() {
2) | queue_work_on() {
2) | __queue_work() {
2) 0.213 us | get_work_pool();
2) 0.206 us | _raw_spin_lock();
2) | insert_work() {
2) 0.194 us | get_pwq.isra.0();
2) | wake_up_process() {
2) | try_to_wake_up() {
2) 0.248 us | _raw_spin_lock_irqsave();
2) | select_task_rq_fair() {
2) 0.205 us | available_idle_cpu();
2) 0.252 us | update_cfs_rq_h_load();
2) | select_idle_sibling() {
2) 0.277 us | available_idle_cpu();
2) 0.666 us | }
2) 2.074 us | }
2) 0.213 us | _raw_spin_lock();
2) 0.240 us | update_rq_clock();
2) | ttwu_do_activate() {
2) | activate_task() {
2) | enqueue_task_fair() {
2) | enqueue_entity() {
2) 0.209 us | update_curr();
2) | __update_load_avg_se() {
2) 0.198 us | __accumulate_pelt_segments();
2) 0.715 us | }
2) 0.221 us | __update_load_avg_cfs_rq();
2) 0.205 us | update_cfs_group();
2) 0.274 us | account_entity_enqueue();
2) 0.209 us | place_entity();
2) 0.211 us | __enqueue_entity();
2) 3.718 us | }
2) 0.208 us | hrtick_update();
2) 4.568 us | }
2) 5.005 us | }
2) | ttwu_do_wakeup() {
2) | check_preempt_curr() {
2) 0.222 us | resched_curr();
2) 0.700 us | }
2) 1.473 us | }
2) 7.098 us | }
2) 0.218 us | _raw_spin_unlock_irqrestore();
2) + 11.605 us | }
2) + 11.947 us | }
2) + 12.730 us | }
2) + 14.091 us | }
2) + 14.527 us | }
2) | wait_for_completion_killable() {
2) | _cond_resched() {
2) 0.210 us | rcu_all_qs();
2) 0.636 us | }
2) 0.241 us | _raw_spin_lock_irq();
2) | schedule_timeout() {
2) | schedule() {
2) | rcu_note_context_switch() {
2) 0.217 us | rcu_qs();
2) 0.618 us | }
2) 0.200 us | _raw_spin_lock();
2) 0.213 us | update_rq_clock();
2) | deactivate_task() {
2) | dequeue_task_fair() {
2) | dequeue_entity() {
2) | update_curr() {
2) 0.196 us | update_min_vruntime();
2) 0.239 us | cpuacct_charge();
2) | __cgroup_account_cputime() {
2) 0.202 us | cgroup_rstat_updated();
2) 0.638 us | }
2) 1.946 us | }
2) 0.210 us | __update_load_avg_se();
2) 0.213 us | __update_load_avg_cfs_rq();
2) 0.200 us | clear_buddies();
2) 0.212 us | account_entity_dequeue();
2) 0.203 us | update_cfs_group();
2) 0.207 us | update_min_vruntime();
2) 5.029 us | }
2) 0.194 us | hrtick_update();
2) 5.793 us | }
2) 6.205 us | }
2) | pick_next_task_fair() {
2) | update_blocked_averages() {
2) 0.149 us | _raw_spin_lock_irqsave();
2) 0.205 us | update_rq_clock();
2) 0.167 us | __update_load_avg_cfs_rq();
2) 0.206 us | update_rt_rq_load_avg();
2) 0.184 us | update_dl_rq_load_avg();
2) 0.183 us | _raw_spin_unlock_irqrestore();
2) 2.510 us | }
2) | load_balance() {
2) | find_busiest_group() {
2) 0.172 us | update_nohz_stats();
2) 0.176 us | idle_cpu();
2) 0.207 us | update_nohz_stats();
2) 1.818 us | }
2) 2.339 us | }
2) 0.171 us | __msecs_to_jiffies();
2) | load_balance() {
2) | find_busiest_group() {
2) 0.211 us | update_nohz_stats();
2) 0.203 us | update_nohz_stats();
2) 0.190 us | idle_cpu();
2) 0.216 us | update_nohz_stats();
2) 0.222 us | update_nohz_stats();
2) 2.711 us | }
2) 0.228 us | _raw_spin_lock_irqsave();
2) 0.262 us | update_rq_clock();
2) 0.233 us | can_migrate_task();
2) 0.322 us | can_migrate_task();
2) 0.222 us | update_cfs_rq_h_load();
2) | deactivate_task() {
2) | dequeue_task_fair() {
2) | dequeue_entity() {
2) | update_curr() {
2) 0.209 us | update_min_vruntime();
2) 0.259 us | cpuacct_charge();
2) | __cgroup_account_cputime() {
2) 0.193 us | cgroup_rstat_updated();
2) 0.738 us | }
2) 2.022 us | }
2) 0.223 us | __update_load_avg_se();
2) 0.213 us | __update_load_avg_cfs_rq();
2) 0.202 us | clear_buddies();
2) 0.215 us | account_entity_dequeue();
2) 0.204 us | update_cfs_group();
2) 0.202 us | update_min_vruntime();
2) 4.891 us | }
2) 0.205 us | hrtick_update();
2) 5.711 us | }
2) 6.143 us | }
2) | set_task_cpu() {
2) | migrate_task_rq_fair() {
2) | detach_entity_cfs_rq() {
2) 0.215 us | __update_load_avg_se();
2) 0.204 us | __update_load_avg_cfs_rq();
2) 0.219 us | propagate_entity_cfs_rq.isra.0();
2) 1.433 us | }
2) 1.814 us | }
2) 0.195 us | set_task_rq_fair();
2) 2.664 us | }
2) 0.202 us | _raw_spin_lock();
2) 0.219 us | update_rq_clock();
2) | attach_task() {
2) | activate_task() {
2) | enqueue_task_fair() {
2) | enqueue_entity() {
2) | update_curr() {
2) 0.225 us | update_min_vruntime();
2) 0.202 us | cpuacct_charge();
2) | __cgroup_account_cputime() {
2) 0.196 us | cgroup_rstat_updated();
2) 0.569 us | }
2) 2.120 us | }
2) 0.208 us | __update_load_avg_cfs_rq();
2) 0.219 us | attach_entity_load_avg();
2) 0.203 us | update_cfs_group();
2) 0.184 us | account_entity_enqueue();
2) 0.217 us | __enqueue_entity();
2) 4.678 us | }
2) 0.177 us | hrtick_update();
2) 5.408 us | }
2) 5.795 us | }
2) | check_preempt_curr() {
2) | check_preempt_wakeup() {
2) 0.205 us | update_curr();
2) 0.201 us | wakeup_preempt_entity.isra.0();
2) 0.970 us | }
2) 1.359 us | }
2) 7.688 us | }
2) + 23.511 us | }
2) 0.201 us | __msecs_to_jiffies();
2) 0.205 us | _raw_spin_lock();
2) 0.197 us | check_cfs_rq_runtime();
2) | pick_next_entity() {
2) 0.198 us | clear_buddies();
2) 0.600 us | }
2) | put_prev_entity() {
2) 0.188 us | check_cfs_rq_runtime();
2) 0.583 us | }
2) | set_next_entity() {
2) 0.209 us | __update_load_avg_se();
2) 0.215 us | __update_load_avg_cfs_rq();
2) 1.053 us | }
2) + 33.704 us | }
2) | switch_mm_irqs_off() {
2) 0.607 us | load_new_mm_cr3();
2) 2.798 us | }
3) 5.150 us | finish_task_switch();
--------------------------------------------------------------------------------
Ctrl-C hit, notice the delay
--------------------------------------------------------------------------------
3) $ 54568382 us | } /* schedule */
3) $ 54568383 us | } /* schedule_timeout */
3) 1.011 us | _raw_spin_lock_irq();
3) $ 54568388 us | } /* wait_for_completion_killable */
3) | __wake_up() {
3) | __wake_up_common_lock() {
3) 0.911 us | _raw_spin_lock_irqsave();
3) 1.015 us | __wake_up_common();
3) 0.958 us | _raw_spin_unlock_irqrestore();
3) 6.186 us | }
3) 7.650 us | }
3) $ 54568414 us | } /* call_usermodehelper_exec */
3) | __wake_up() {
3) | __wake_up_common_lock() {
3) 0.720 us | _raw_spin_lock_irqsave();
3) 0.826 us | __wake_up_common();
3) 0.850 us | _raw_spin_unlock_irqrestore();
3) 5.655 us | }
3) 7.309 us | }
3) $ 54568437 us | } /* __request_module */