Re: [BUG 6.4-rc3] BUG: kernel NULL pointer dereference in __dev_fwnode

From: Linus Torvalds
Date: Wed May 24 2023 - 14:29:05 EST

On Wed, May 24, 2023 at 10:12 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> I started adding fixes to my urgent branch rebased on top of v6.4-rc3
> and ran my tests. Unfortunately they crashed on unrelated code.
> Here's the dump:
> BUG: kernel NULL pointer dereference, address: 00000000000003e8
> RIP: 0010:__dev_fwnode+0x9/0x2a
> Code: ff 85 c0 78 16 48 8b 3c 24 89 c6 59 e9 e0 f7 ff ff b8 ea ff ff ff c3 cc cc cc cc 5a c3 cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 <48> 8b 87 e8 03 00 00 48 83 c0 18 c3 cc cc cc cc 48

That disassembles to

nopl 0x0(%rax,%rax,1)
mov 0x3e8(%rdi),%rax
add $0x18,%rax

which looks like it must be the

return dev->fwnode;

with a NULL 'dev', which makes sense for __dev_fwnode() with CONFIG_OF
not enabled.

Except I have no idea what that odd 'add $0x18' is all about. Strange.

Anyway, the caller seems to be this code in power_supply_get_battery_info():

	if (psy->of_node) {
		.. presumably not this ..
	} else {
		err = fwnode_property_get_reference_args(
					dev_fwnode(psy->dev.parent),
					"monitored-battery", NULL, 0, 0, &args);

so I suspect psy->dev.parent is NULL.

> I ran a bisect and it found it to be this commit:
> 27a2195efa8d2 ("power: supply: core: auto-exposure of simple-battery data")
> I checked out that commit and tested it, and it crashed. I then
> reverted that commit, and the crash goes away.

At a guess, it's

(a) the new code to expose battery info at registration time:

+	/*
+	 * Expose constant battery info, if it is available. While there are
+	 * some chargers accessing constant battery data, we only want to
+	 * expose battery data to userspace for battery devices.
+	 */
+	if (desc->type == POWER_SUPPLY_TYPE_BATTERY) {
+		rc = power_supply_get_battery_info(psy, &psy->battery_info);
+		if (rc && rc != -ENODEV && rc != -ENOENT)
+			goto check_supplies_failed;
+	}

interacting with

(b) test_power_init(), which does

test_power_supplies[i] = power_supply_register(NULL,

which passes in NULL for the "parent" pointer.

So it looks like a dodgy test that was a bit lazy. But maybe a NULL
parent is supposed to work.