On 11/09/2017 12:02 PM, Andrew Banman wrote:
On 11/8/17 2:00 PM, Corey Minyard wrote:
On 11/08/2017 11:11 AM, Andrew Banman wrote:My kernel is missing 0944d889a2, wherein the dmi handling moved to a platform
On 11/8/17 11:06 AM, Andrew Banman wrote:This begs the question: How did you produce this? From what I can tell,
If there are uninitialized SMIs in the smi_infos list, i.e. with noI think this points to a broader problem of holding uninitialized smi_info
handlers set, then disable_si_irq() in cleanup_smi_one() will hit a null
pointer dereference when the former attempts to start the check enables
transaction. Thus, we panic during module exit.
structs in smi_infos list. There are many places where handlers and other struct
members are assumed. Maybe a better design would be to remove SMIs from the list
if we have no intention of initializing them?
there is no way you can get to this code if you don't have a working and
initialized smi_info structure, and that's not the only place this would
have to be fixed if it wasn't. So it's not what you assume, I don't think
it's an uninitialized smi_info structure on the list.
device. Without this commit the driver discovers both ACPI and SMBIOS records
but only initializes the former; this is visible in the tracebacks below.
Is there a case for pushing a similar patch to stable releases predating 0944d889a2
to prevent this crash?
It looks like you are using a Redhat 3.10 kernel. Is that right?
-corey
Thank you for your time,
Andrew
As usual with these sorts of things, please send tracebacks and reproduction# dmesg | grep -i ipmi
procedures.
[ÂÂ 14.883554] ipmi message handler version 39.2
[ÂÂ 14.905543] ipmi device interface
[ÂÂ 14.931149] ipmi_si IPI0001:00: ipmi_si: probing via ACPI
[ 14.937254] ipmi_si IPI0001:00: [io 0x0ce4-0x0ce6] regsize 1 spacing 1 irq 6
[ÂÂ 14.946767] ipmi_si: Adding ACPI-specified bt state machine
[ÂÂ 14.954621] IPMI System Interface driver.
[ÂÂ 14.975257] ipmi_si: probing via SMBIOS
[ÂÂ 14.980446] ipmi_si: SMBIOS: mem 0xce4 regsize 1 spacing 1 irq 6
[ÂÂ 14.987163] ipmi_si: Adding SMBIOS-specified bt state machine
[ÂÂ 14.996706] ipmi_si: probing via SPMI
[ÂÂ 15.002351] ipmi_si: SPMI: io 0xce4 regsize 1 spacing 1 irq 6
[ÂÂ 15.010320] ipmi_si: Adding SPMI-specified bt state machine duplicate interface
[ÂÂ 15.020056] ipmi_si: Trying ACPI-specified bt state machine at i/o address 0xce4, slave address 0x0, irq 6
[ÂÂ 15.066306] ipmi_si IPI0001:00: Using irq 6
[ÂÂ 15.077855] IPMI BT: req2rsp=2 secs retries=1
[ÂÂ 15.130045] ipmi_si IPI0001:00: Found new BMC (man_id: 0x000000, prod_id: 0x0101, dev_id: 0x20)
[ÂÂ 15.140472] ipmi_si IPI0001:00: IPMI bt interface initialized
# rmmod ipmi_si
[ÂÂ 85.492578] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ÂÂ 85.501346] IP: [<ffffffffc027840e>] start_check_enables+0x3e/0x80 [ipmi_si]
[ÂÂ 85.509233] PGD 7fb3875067 PUD 7fbc222067 PMD 0
[ÂÂ 85.514421] Oops: 0000 [#1] SMP
[ÂÂ 85.518056] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp vfat intel_rapl fat iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw i40e gf128mul glue_helper ablk_helper cryptd ptp pcspkr pps_core joydev sg i2c_i801 shpchp lpc_ich ipmi_si(-) wmi ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd numatools(OE) grace hwperf(OE) binfmt_misc sunrpc ip_tables sr_mod cdrom xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul crct10dif_common sysimgblt crc32c_intel fb_sys_fops ahci ttm libahci uas drm libata usb_storage i2c_core
[ÂÂ 85.616799] CPU: 241 PID: 5743 Comm: rmmod Tainted: GÂÂÂÂÂÂÂÂÂÂ OEÂ ------------ÂÂ 3.10.0-693.el7.x86_64 #1
[ÂÂ 85.627674] Hardware name: HPE Superdome Flex/Superdome Flex, BIOS IP147.006.000.137.000.1711022001 11/02/2017
[ÂÂ 85.638843] task: ffff887fb8ac6eb0 ti: ffff887ffd30c000 task.ti: ffff887ffd30c000
[ÂÂ 85.647200] RIP: 0010:[<ffffffffc027840e>] [<ffffffffc027840e>] start_check_enables+0x3e/0x80 [ipmi_si]
[ÂÂ 85.657794] RSP: 0018:ffff887ffd30fea8Â EFLAGS: 00010246
[ÂÂ 85.663723] RAX: 0000000000000000 RBX: ffff88016930b000 RCX: 0000000000000006
[ÂÂ 85.671688] RDX: 0000000000000002 RSI: ffff887ffd30feae RDI: 0000000000000000
[ÂÂ 85.679655] RBP: ffff887ffd30fec0 R08: ffff88016930b1a0 R09: 00000001820001fb
[ÂÂ 85.687623] R10: 00000000beb72001 R11: ffffea013efadc80 R12: ffffffffc02822c0
[ÂÂ 85.695589] R13: 0000000000000800 R14: 0000000001f222b0 R15: 0000000001f22010
[ÂÂ 85.703555] FS:Â 00007fb9a5c10740(0000) GS:ffff887fbea40000(0000) knlGS:0000000000000000
[ÂÂ 85.712587] CS:Â 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ÂÂ 85.719002] CR2: 0000000000000010 CR3: 0000007ffd2a5000 CR4: 00000000003407e0
[ÂÂ 85.726968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ÂÂ 85.734936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ÂÂ 85.742904] Stack:
[ÂÂ 85.745150]Â 2f18887ffd30fec0 000000007706aaa0 ffff88016930b000 ffff887ffd30fed8
[ÂÂ 85.753456]Â ffffffffc027ae19 ffffffffc0281ff0 ffff887ffd30fef0 ffffffffc027b9cb
[ÂÂ 85.761762]Â fffffffffffffff5 ffff887ffd30ff78 ffffffff810fe55b 0000000000000000
[ÂÂ 85.770063] Call Trace:
[ÂÂ 85.772797]Â [<ffffffffc027ae19>] cleanup_one_si+0x149/0x180 [ipmi_si]
[ÂÂ 85.780090]Â [<ffffffffc027b9cb>] cleanup_ipmi_si+0x6b/0xc0 [ipmi_si]
[ÂÂ 85.787293]Â [<ffffffff810fe55b>] SyS_delete_module+0x19b/0x300
[ÂÂ 85.793913]Â [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[ÂÂ 85.800621] Code: ee 18 c6 45 ef 2f 65 48 8b 04 25 28 00 00 00 48 89 45 f0 31 c0 40 84 f6 75 33 48 8b 47 18 ba 02 00 00 00 48 8b 7f 10 48 8d 75 ee <ff> 50 10 48 8b 45 f0 65 48 33 04 25 28 00 00 00 c7 43 38 05 00
[ÂÂ 85.822638] RIPÂ [<ffffffffc027840e>] start_check_enables+0x3e/0x80 [ipmi_si]
[ÂÂ 85.830611]Â RSP <ffff887ffd30fea8>
[ÂÂ 85.834504] CR2: 0000000000000010
And with debug printks printing out some contents of the struct at the start of
module exit:
# rmmod ipmi_si; dmesg -c
[ 1083.527776]ÂÂÂÂÂÂÂÂÂÂÂÂ smi_info ffff880fbb38f600
[ 1083.527779]ÂÂÂÂÂÂÂÂÂÂÂÂ intf_num 0
[ 1083.527779]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ intf ffff880fbb800000
[ 1083.527780]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ *si_sm ffff880fbb144400
[ 1083.527780]ÂÂÂÂÂÂÂÂÂÂÂ *handlers ffffffffa051bac0
[ 1083.527780]ÂÂÂÂÂÂÂÂÂÂÂÂÂ si_type 2
[ 1083.527782]ÂÂÂÂÂÂÂÂÂÂÂÂ smi_info ffff880fbb38f200
[ 1083.527783]ÂÂÂÂÂÂÂÂÂÂÂÂ intf_num 0
[ 1083.527783]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ intfÂÂÂÂÂÂÂÂÂÂ (null)
[ 1083.527784]ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ *si_smÂÂÂÂÂÂÂÂÂÂ (null)
[ 1083.527785]ÂÂÂÂÂÂÂÂÂÂÂ *handlersÂÂÂÂÂÂÂÂÂÂ (null)
[ 1083.527786]ÂÂÂÂÂÂÂÂÂÂÂÂÂ si_type 2
@@ -3961,6 +3985,15 @@ static void cleanup_ipmi_si(void)
ÂÂÂÂÂ if (!initialized)
ÂÂÂÂÂÂÂÂÂ return;
 + list_for_each_entry_safe(e, tmp_e, &smi_infos, link) {
+ÂÂÂÂÂÂÂ pr_crit("%20s %p\n", "smi_info",ÂÂÂ e);
+ÂÂÂÂÂÂÂ pr_crit("%20s %d\n", "intf_num",ÂÂÂ e->intf_num);
+ÂÂÂÂÂÂÂ pr_crit("%20s %p\n", "intf",ÂÂÂÂÂÂÂ e->intf);
+ÂÂÂÂÂÂÂ pr_crit("%20s %p\n", "*si_sm",ÂÂÂÂÂÂÂ e->si_sm);
+ÂÂÂÂÂÂÂ pr_crit("%20s %p\n", "*handlers",ÂÂÂ e->handlers);
+ÂÂÂÂÂÂÂ pr_crit("%20s %d\n\n", "si_type",ÂÂÂ e->si_type);
+ÂÂÂ }
+
 #ifdef CONFIG_PCI
ÂÂÂÂÂ if (pci_registered)
ÂÂÂÂÂÂÂÂÂ pci_unregister_driver(&ipmi_pci_driver);
If you are removing the module and this happens, there may be a race
conditions, but this is the wrong fix. More likely that the structure gets
cleaned up and this function is called afterwards.
-corey
Andrew
Avoid panicking when there are uninitialized SMIs by checking for a handler
pointer before starting the check enables transaction.
Signed-off-by: Andrew Banman <abanman@xxxxxxx>
---
ÂÂÂdrivers/char/ipmi/ipmi_si_intf.c | 2 +-
ÂÂÂ1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index cb5719e..6c0b1b3 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -442,7 +442,7 @@ static void start_check_enables(struct smi_info
*smi_info, bool start_timer)
ÂÂÂÂÂÂ if (start_timer)
ÂÂÂÂÂÂÂÂÂÂ start_new_msg(smi_info, msg, 2);
-ÂÂÂ else
+ÂÂÂ else if (smi_info->handlers)
smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2);
ÂÂÂÂÂÂ smi_info->si_state = SI_CHECKING_ENABLES;
ÂÂÂ}