Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

From: Takashi Iwai
Date: Sat May 02 2015 - 03:10:25 EST

Next message: Wang Nan: "Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs."
Previous message: Vineet Gupta: "Re: [RFC][PATCH 4/4] perf: Create aliases for PMU events"
In reply to: Laura Abbott: "Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case"
Next in thread: Laura Abbott: "[PATCH] Bluetooth: Make request workqueue freezable"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

At Fri, 01 May 2015 14:35:16 -0700,
Laura Abbott wrote:
>
> On 05/01/2015 01:22 PM, Laura Abbott wrote:
> > On 04/30/2015 07:43 AM, Takashi Iwai wrote:
> >> At Thu, 30 Apr 2015 07:19:47 -0700,
> >> Marcel Holtmann wrote:
> >>>
> >>> Hi Ming,
> >>>
> >>>>>>>> We've received a number of reports of warnings when coming
> >>>>>>>> out of suspend with certain bluetooth firmware configurations:
> >>>>>>>>
> >>>>>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
> >>>>>>>> _request_firmware+0x558/0x810()
> >>>>>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> >>>>>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc
> >>>>>>>> ebtable_filter
> >>>>>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> >>>>>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> >>>>>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> >>>>>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
> >>>>>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
> >>>>>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
> >>>>>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211
> >>>>>>>> iTCO_vendor_support
> >>>>>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
> >>>>>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel
> >>>>>>>> uvcvideo
> >>>>>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm
> >>>>>>>> videobuf2_core
> >>>>>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
> >>>>>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom
> >>>>>>>> mmc_core
> >>>>>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd
> >>>>>>>> mei_me mei
> >>>>>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
> >>>>>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
> >>>>>>>> video
> >>>>>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted
> >>>>>>>> 3.19.3-200.fc21.x86_64
> >>>>>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 )
> >>>>>>>> 10/22/2013
> >>>>>>>> Workqueue: hci0 hci_power_on [bluetooth]
> >>>>>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
> >>>>>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
> >>>>>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
> >>>>>>>> Call Trace:
> >>>>>>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
> >>>>>>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
> >>>>>>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
> >>>>>>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
> >>>>>>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
> >>>>>>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
> >>>>>>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
> >>>>>>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
> >>>>>>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
> >>>>>>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
> >>>>>>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
> >>>>>>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
> >>>>>>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
> >>>>>>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
> >>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> >>>>>>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
> >>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> >>>>>>>>
> >>>>>>>> This occurs after every resume.
> >>>>>>>>
> >>>>>>>> When resuming, the bluetooth driver needs to re-request the
> >>>>>>>> firmware. This re-request is happening before usermodehelper
> >>>>>>>> is fully enabled. If the firmware load succeeded previously, the
> >>>>>>>> caching behavior of the firmware code typically negates the
> >>>>>>>> need to call the usermodehelper code again and the request
> >>>>>>>> succeeds. If the firmware was never loaded because it isn't
> >>>>>>>> actually present in the file system, this results in a call
> >>>>>>>> to usermodehelper and a failure warning every resume. Rather
> >>>>>>>> than have a WARN clogging up the kernel messages each time,
> >>>>>>>> just drop the warn. There is still a dev_err for debugging
> >>>>>>>> purposes.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Laura Abbott <labbott@xxxxxxxxxxxxxxxxx>
> >>>>>>>> ---
> >>>>>>>> This might be papering over a real issue but I'm not
> >>>>>>>> familiar enough with any of suspend/resume, bluetooth,
> >>>>>>>> or firmware loading to identify an alternate fix.
> >>>>>>>> The backtrace is from bcm patchram but the problem
> >>>>>>>> isn't limited to that hardware. Intel also does a
> >>>>>>>> request firmware and I was able to reproduce the
> >>>>>>>> same backtrace on that driver by requesting non-existant
> >>>>>>>> firmware file.
> >>>>>>>
> >>>>>>> so here is the thing with Bluetooth firmware. Some of them
> >>>>>>> are RAM patches to fix the ROM modules. Others are full firmware
> >>>>>>> that are required to be downloaded first.
> >>>>>>>
> >>>>>>> For ROM modules, the RAM patching procedure is optional. So we
> >>>>>>> will proceed even if no firmware is available. This means that
> >>>>>>> the kernel will never cache it (since it is not there in the
> >>>>>>> first place) and also on every resume we have the same issues.
> >>>>>>> So optional firmware is something that happens for Bluetooth USB
> >>>>>>> dongles a lot.
> >>>>>>>
> >>>>>>> In the driver we know which firmwares are optional and which are
> >>>>>>> required. So we could tell the firmware class this if this would
> >>>>>>> make
> >>>>>>> things better and result in clearer errors and warnings. Is that
> >>>>>>> something we want here?
> >>>>>>>
> >>>>>>
> >>>>>> The response on another reply was
> >>>>>>
> >>>>>> "Yes, it is a driver problem, and loading firmware from filesystem
> >>>>>> isn't safe during resume, and that is the purpose of the warning."
> >>>>>>
> >>>>>> It isn't clear if this means request_firmware shouldn't be called
> >>>>>> on resume at all or if request_firmware shouldn't be called unless
> >>>>>> we can guarantee it won't make a call into the file system. I'd
> >>>>
> >>>> If the firmware is cached before resume, it is ok to call
> >>>> request_firmware()
> >>>> during resume. Otherwise it will call filesystem and disks, which may
> >>>> be a problem because the disk may not be ready for completing the
> >>>> request during resume.
> >>>>
> >>>>>> be okay with adding another api (request_optional_firmware?) to
> >>>>>> represent this if the firmware maintainers aren't against the
> >>>>>> concept. If the firmware maintainers are against the concept,
> >>>>>> it seems like the only solution is to rework the bluetooth drivers
> >>>>>> to not request anything on resume.
> >>>>
> >>>> So do you just want to work around the warning by introducing a new
> >>>> API?
> >>>>
> >>>>>
> >>>>> I think request_optional_firmware concept sounds like an useful
> >>>>> addition.
> >>>>>
> >>>>> However the problem here is that the driver does not know that it
> >>>>> is called from resume path. It is easy to say that this is a driver
> >>>>> problem, but the driver does not know it.
> >>>>
> >>>> From USB stack view, one usb driver should know it is in the resume
> >>>> path
> >>>> because the root entry is the .resume() callback of the USB BT driver.
> >>>
> >>> have you actually read drivers/bluetooth/btusb.c before making these
> >>> statements. I explained how and when request_firmware is actually
> >>> called.
> >>>
> >>>>> If the hci_register_dev is called which in fact triggers
> >>>>> hdev->setup to deal with vendor specific firmware path, then it
> >>>>> means the driver just went through its probe() phase again. How
> >>>>> would the driver differentiate this from any hot plug event. So to
> >>>>> say this is a driver problem is just plain stupid. The driver does
> >>>>> not know we are ending up in a reset_resume use case or when
> >>>>> ACPI/BIOS decides to emulate an USB disconnect.
> >>>
> >>> The only time request_firmware is called is from hdev->setup(). And
> >>> that can only be triggered by hci_register_dev(). Which is only
> >>> called from probe() callback of the driver. This has nothing to do
> >>> with the resume() callback.
> >>
> >> FYI, I once fixed a very similar problem in commit
> >> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586
> >> PM / sleep: Fix request_firmware() error at resume
> >>
> >> But the bug seems still alive. The stack trace looks pretty similar
> >> as the original bug report. Hmm.
> >>
> >> My fix above addressed the race between UMH lock changes in
> >> thaw_process(). If the UMH lock still fails, it means that the
> >> task was triggered even before the UMH protection in thaw_process().
> >> That sounds weird, and might be a more serious problem.
> >>
> >> Added Rafael to Cc, in case I overlooked something obvious.
> >>
> >
> > Thanks for pointing that out. It does look like a similar if not
> > the same problem. In this case though we still haven't closed
> > the race:
> >
> > [ 35.725301] PM: resume of devices complete after 2370.964 msecs
> > [ 35.725363] Bluetooth: >>>> Begin testing line 2754 btusb.c
> >
> > [ 35.725875] ------------[ cut here ]------------
> > [ 35.725879] WARNING: CPU: 7 PID: 2542 at
> > drivers/base/firmware_class.c:1118 _request_firmware+0x557/0x810()
> > [ 35.725896] Modules linked in: ccm fuse cmac xt_CHECKSUM
> > ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns
> > nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> > xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
> > ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> > ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> > ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> > nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep
> > arc4 iwlmvm snd_hda_codec_realtek snd_hda_codec_generic
> > snd_hda_codec_hdmi mac80211 snd_hda_intel snd_hda_controller intel_rapl
> > iosf_mbi snd_hda_codec x86_pkg_temp_thermal snd_hda_core iwlwifi
> > coretemp kvm cfg80211 btusb btbcm btintel bluetooth snd_hwdep snd_seq
> > uvcvideo snd_seq_device
> > [ 35.725909] snd_pcm videobuf2_vmalloc videobuf2_core
> > videobuf2_memops iTCO_wdt v4l2_common iTCO_vendor_support thinkpad_acpi
> > videodev mei_me snd_timer rtsx_pci_ms ie31200_edac tpm_tis pcspkr
> > serio_raw joydev memstick media edac_core snd tpm mei rfkill shpchp
> > lpc_ich i2c_i801 soundcore wmi nfsd auth_rpcgss nfs_acl lockd grace
> > sunrpc binfmt_misc dm_crypt i915 i2c_algo_bit drm_kms_helper drm
> > rtsx_pci_sdmmc mmc_core e1000e crct10dif_pclmul crc32_pclmul
> > crc32c_intel rtsx_pci ptp ghash_clmulni_intel pps_core mfd_core video
> > [ 35.725911] CPU: 7 PID: 2542 Comm: kworker/u17:0 Tainted: G W
> > 4.0.0-bt_hackery+ #36
> > [ 35.725911] Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS
> > GMET62WW (2.10 ) 03/19/2014
> > [ 35.725932] Workqueue: hci0 hci_power_on [bluetooth]
> > [ 35.725933] 0000000000000000 00000000923b240b ffff8803fc9d3b48
> > ffffffff8178a76a
> > [ 35.725935] 0000000000000000 0000000000000000 ffff8803fc9d3b88
> > ffffffff8109f5fa
> > [ 35.725936] 0000000000000000 ffffffffa05480a6 00000000fffffff5
> > ffff88040934d8a0
> > [ 35.725936] Call Trace:
> > [ 35.725939] [<ffffffff8178a76a>] dump_stack+0x45/0x57
> > [ 35.725940] [<ffffffff8109f5fa>] warn_slowpath_common+0x8a/0xc0
> > [ 35.725943] [<ffffffff8109f72a>] warn_slowpath_null+0x1a/0x20
> > [ 35.725944] [<ffffffff814ed227>] _request_firmware+0x557/0x810
> > [ 35.725946] [<ffffffff814e5253>] ? rpm_idle+0x23/0x230
> > [ 35.725948] [<ffffffff814ed515>] request_firmware+0x35/0x50
> > [ 35.725951] [<ffffffffa05443d8>] btusb_setup_intel+0x68/0x880 [btusb]
> > [ 35.725952] [<ffffffff814e5306>] ? rpm_idle+0xd6/0x230
> > [ 35.725954] [<ffffffff814e54ca>] ? __pm_runtime_idle+0x6a/0x80
> > [ 35.725960] [<ffffffffa04aa3a1>] hci_dev_do_open+0xe1/0x580 [bluetooth]
> > [ 35.725962] [<ffffffff810c8afd>] ?
> > ttwu_do_activate.constprop.95+0x5d/0x70
> > [ 35.725967] [<ffffffffa04ad440>] hci_power_on+0x40/0x200 [bluetooth]
> > [ 35.725969] [<ffffffff810b7f9b>] process_one_work+0x14b/0x400
> > [ 35.725971] [<ffffffff810b8a33>] worker_thread+0x53/0x490
> > [ 35.725972] [<ffffffff810b89e0>] ? rescuer_thread+0x300/0x300
> > [ 35.725974] [<ffffffff810bddd8>] kthread+0xd8/0xf0
> > [ 35.725976] [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
> > [ 35.725977] [<ffffffff817911a2>] ret_from_fork+0x42/0x70
> > [ 35.725979] [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
> > [ 35.725980] ---[ end trace 27cd183538666cbc ]---
> > [ 35.725981] bluetooth hci0: firmware: THISISASTEST will not be loaded
> > [ 35.726100] PM: Finishing wakeup.
> > [ 35.726101] Restarting tasks ... done.
> >
> > From this, it looks like usermodehelper still isn't set until after
> > we've already tried to load the firmware. Even weirder, it looks like
> > a thread is running before we've thawed the processes? Am I
> > misunderstanding something or is the dmesg lying to me?
> >
>
> Okay I took another look and I noticed hci_register_dev calls
> alloc_workqueue and then queue work to call the power on function.
> Is it possible that the alloc_workqueue/queue_work is getting run
> before thawing is complete? I don't have anymore time today to dig
> into that.

Thinking of this again, I wonder why request_firmware() (no _nowait
version) doesn't wait. Maybe just always waiting the lock would help?
A totally untested patch below.

Takashi

---
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 171841ad1008..de397404a80e 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -1105,21 +1105,12 @@ _request_firmware(const struct firmware **firmware_p, const char *name,

ret = 0;
timeout = firmware_loading_timeout();
- if (opt_flags & FW_OPT_NOWAIT) {
- timeout = usermodehelper_read_lock_wait(timeout);
- if (!timeout) {
- dev_dbg(device, "firmware: %s loading timed out\n",
- name);
- ret = -EBUSY;
- goto out;
- }
- } else {
- ret = usermodehelper_read_trylock();
- if (WARN_ON(ret)) {
- dev_err(device, "firmware: %s will not be loaded\n",
- name);
- goto out;
- }
+ timeout = usermodehelper_read_lock_wait(timeout);
+ if (!timeout) {
+ dev_dbg(device, "firmware: %s loading timed out\n",
+ name);
+ ret = -EBUSY;
+ goto out;
}

ret = fw_get_filesystem_firmware(device, fw->priv);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Wang Nan: "Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs."
Previous message: Vineet Gupta: "Re: [RFC][PATCH 4/4] perf: Create aliases for PMU events"
In reply to: Laura Abbott: "Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case"
Next in thread: Laura Abbott: "[PATCH] Bluetooth: Make request workqueue freezable"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]