Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case
From: Marcel Holtmann
Date: Wed Apr 29 2015 - 19:04:16 EST
Hi Laura,
>>> We've received a number of reports of warnings when coming
>>> out of suspend with certain bluetooth firmware configurations:
>>>
>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>>> _request_firmware+0x558/0x810()
>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>>> video
>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
>>> Workqueue: hci0 hci_power_on [bluetooth]
>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>>> Call Trace:
>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>
>>> This occurs after every resume.
>>>
>>> When resuming, the bluetooth driver needs to re-request the
>>> firmware. This re-request is happening before usermodehelper
>>> is fully enabled. If the firmware load succeeded previously, the
>>> caching behavior of the firmware code typically negates the
>>> need to call the usermodehelper code again and the request
>>> succeeds. If the firmware was never loaded because it isn't
>>> actually present in the file system, this results in a call
>>> to usermodehelper and a failure warning every resume. Rather
>>> than have a WARN clogging up the kernel messages each time,
>>> just drop the warn. There is still a dev_err for debugging
>>> purposes.
>>>
>>> Signed-off-by: Laura Abbott <labbott@xxxxxxxxxxxxxxxxx>
>>> ---
>>> This might be papering over a real issue but I'm not
>>> familiar enough with any of suspend/resume, bluetooth,
>>> or firmware loading to identify an alternate fix.
>>> The backtrace is from bcm patchram but the problem
>>> isn't limited to that hardware. Intel also does a
>>> request firmware and I was able to reproduce the
>>> same backtrace on that driver by requesting non-existant
>>> firmware file.
>>
>> so here is the thing with Bluetooth firmware. Some of them
>> are RAM patches to fix the ROM modules. Others are full firmware
>> that are required to be downloaded first.
>>
>> For ROM modules, the RAM patching procedure is optional. So we
>> will proceed even if no firmware is available. This means that
>> the kernel will never cache it (since it is not there in the
>> first place) and also on every resume we have the same issues.
>> So optional firmware is something that happens for Bluetooth USB
> > dongles a lot.
>>
>> In the driver we know which firmwares are optional and which are
> > required. So we could tell the firmware class this if this would make
> > things better and result in clearer errors and warnings. Is that
> > something we want here?
>>
>
> The response on another reply was
>
> "Yes, it is a driver problem, and loading firmware from filesystem
> isn't safe during resume, and that is the purpose of the warning."
>
> It isn't clear if this means request_firmware shouldn't be called
> on resume at all or if request_firmware shouldn't be called unless
> we can guarantee it won't make a call into the file system. I'd
> be okay with adding another api (request_optional_firmware?) to
> represent this if the firmware maintainers aren't against the
> concept. If the firmware maintainers are against the concept,
> it seems like the only solution is to rework the bluetooth drivers
> to not request anything on resume.
I think request_optional_firmware concept sounds like an useful addition.
However the problem here is that the driver does not know that it is called from resume path. It is easy to say that this is a driver problem, but the driver does not know it.
If the hci_register_dev is called which in fact triggers hdev->setup to deal with vendor specific firmware path, then it means the driver just went through its probe() phase again. How would the driver differentiate this from any hot plug event. So to say this is a driver problem is just plain stupid. The driver does not know we are ending up in a reset_resume use case or when ACPI/BIOS decides to emulate an USB disconnect.
Regards
Marcel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/