[REGRESSION] Kernel panic in firmware_request since v4.9

From: Ben Gamari
Date: Mon Jan 02 2017 - 00:59:53 EST



Hello everyone,

After a recent upgrade to 4.10-rc2 I noticed that my wireless adapter
didn't come up due to a kernel panic in the firmware class induced by
iwlwifi,

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: _request_firmware+0x995/0xa40
PGD 0

Oops: 0000 [#1] SMP
Modules linked in: i915(+) joydev rtsx_pci_sdmmc irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt iTCO_vendor_support uvcvideo dell_smm_hwmon aesni_intel aes_x86_64 crypto_simd videobuf2_vmalloc glue_helper videobuf2_memops cryptd videobuf2_v4l2 videobuf2_core snd_hda_intel videodev coretemp snd_hda_codec tg3 snd_hda_core iwlwifi pcspkr hid_multitouch psmouse drm_kms_helper snd_hwdep drm snd_pcm snd_timer rtsx_pci snd cfg80211 i2c_algo_bit mei_me soundcore fb_sys_fops mei syscopyarea i2c_i801 sysfillrect sg sysimgblt shpchp hci_uart wmi acpi_als video intel_lpss_acpi kfifo_buf intel_lpss industrialio i2c_hid acpi_pad e1000e ptp pps_core e1000 sunrpc parport_pc ppdev lp parport ip_tables x_tables autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq
async_xor xor async_tx btusb btrtl btbcm btintel bluetooth raid6_pq raid1 raid0 multipath linear hid_generic usbkbd usbhid hid ahci serio_raw libahci pinctrl_sunrisepoint pinctrl_intel
CPU: 0 PID: 321 Comm: kworker/0:2 Not tainted 4.9.0+ #4
Hardware name: Dell Inc. Latitude E7470/0T6HHJ, BIOS 1.6.3 06/15/2016
Workqueue: events request_firmware_work_func
task: ffff880824200000 task.stack: ffffc90003a94000
RIP: 0010:_request_firmware+0x995/0xa40
RSP: 0000:ffffc90003a97db8 EFLAGS: 00010246
RAX: 00000000fffffffe RBX: ffff88082441b000 RCX: ffff88082441b028
RDX: ffff880824200000 RSI: 0000000000000287 RDI: 0000000000000000
RBP: ffffc90003a97e30 R08: ffff88084cc104e0 R09: 0000000000000000
R10: 000000017a1c7c3f R11: 00007f894759ed80 R12: ffffc90003a97e40
R13: 0000000000000007 R14: ffff88081c4a4800 R15: 0000000000003a98
FS: 0000000000000000(0000) GS:ffff88084cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 0000000824ef3000 CR4: 00000000003406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
request_firmware_work_func+0x2b/0x60
process_one_work+0x1fc/0x490
worker_thread+0x4b/0x500
kthread+0x101/0x140
? process_one_work+0x490/0x490
? kthread_create_on_node+0x40/0x40
? do_syscall_64+0x6e/0x180
ret_from_fork+0x25/0x30
Code: 9e d0 81 e8 4e 6d fe ff b8 f4 ff ff ff e9 db f9 ff ff 48 c7 c7 60 91 ed 81 89 45 b0 e8 95 a3 29 00 49 8b be 00 03 00 00 8b 45 b0 <83> 7f 38 02 74 08 e8 40 eb ff ff 8b 45 b0 49 c7 86 00 03 00 00
RIP: _request_firmware+0x995/0xa40 RSP: ffffc90003a97db8
CR2: 0000000000000038
---[ end trace 94dd071852e99747 ]---

This occurs while iwlwifi is trying to load a firmware image that
(expectedly) does not exist on my system. In prior kernel versions this
would fail with,

iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-26.ucode failed with error -2
iwlwifi 0000:01:00.0: Falling back to user helper

and continue to load other images until it found an available version
(-22.ucode). Note that this is a Debian Sid system running systemd 232.
I believe this failure is also reproducible in the 4.9 kernel.

A bit of digging revealed that the non-existent firmware image triggers
an unchecked dereference of buf->fw_st in __fw_load_abort. The attached
patch avoids the issue although it's unclear whether this is the correct
fix. It appears there have been a few refactorings in
driver/base/firmware_class.c recently. I've CC'd the authors of these
patches in hopes that this will ring a bell with someone.

Cheers,

- Ben



diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4497d263209f..badc5737bad2 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -546,7 +546,8 @@ static void __fw_load_abort(struct firmware_buf *buf)
* There is a small window in which user can write to 'loading'
* between loading done and disappearance of 'loading'
*/
- if (fw_state_is_done(&buf->fw_st))
+ if (!buf || fw_state_is_done(&buf->fw_st))
return;

list_del_init(&buf->pending_list);
--
2.11.0

Attachment: signature.asc
Description: PGP signature