Re: [Linux kernel bug] general protection fault in disable_store

From: Sam Sun
Date: Fri Apr 12 2024 - 09:08:43 EST


On Thu, Apr 11, 2024 at 11:24 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 11, 2024 at 02:52:27PM +0800, Sam Sun wrote:
> > Dear developers and maintainers,
> >
> > We encountered a general protection fault in function disable_store.
> > It is tested against the latest upstream linux (tag 6.9-rc3). C repro
> > and kernel config are attached to this email. Kernel crash log is
> > listed below.
> > ```
> > general protection fault, probably for non-canonical address
> > 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> > CPU: 1 PID: 9459 Comm: syz-executor414 Not tainted 6.7.0-rc7 #2
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > RIP: 0010:disable_store+0xd0/0x3d0 drivers/usb/core/port.c:88
> > Code: 02 00 00 4c 8b 75 40 4d 8d be 58 ff ff ff 4c 89 ff e8 a4 20 fa
> > ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c
> > 02 00 0f 85 b0 02 00 00 48 8b 45 00 48 8d bb 34 05 00 00 48
> > RSP: 0018:ffffc90006e3fc08 EFLAGS: 00010246
> > RAX: dffffc0000000000 RBX: ffff88801d4d4008 RCX: ffffffff86706be8
> > RDX: 0000000000000000 RSI: ffffffff86706c4d RDI: 0000000000000005
> > RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92000dc7f85
> > R13: ffff88810f4bfb18 R14: ffff88801d4d10a8 R15: ffff88801d4d1000
> > FS: 00007fa0af71b640(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fa0af71a4b8 CR3: 0000000022f5f000 CR4: 0000000000750ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
>
> > ----------------
> > Code disassembly (best guess):
> > 0: 02 00 add (%rax),%al
> > 2: 00 4c 8b 75 add %cl,0x75(%rbx,%rcx,4)
> > 6: 40 rex
> > 7: 4d 8d be 58 ff ff ff lea -0xa8(%r14),%r15
> > e: 4c 89 ff mov %r15,%rdi
> > 11: e8 a4 20 fa ff call 0xfffa20ba
> > 16: 48 89 c2 mov %rax,%rdx
> > 19: 48 89 c5 mov %rax,%rbp
> > 1c: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> > 23: fc ff df
> > 26: 48 c1 ea 03 shr $0x3,%rdx
> > * 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <--
> > trapping instruction
> > 2e: 0f 85 b0 02 00 00 jne 0x2e4
> > 34: 48 8b 45 00 mov 0x0(%rbp),%rax
> > 38: 48 8d bb 34 05 00 00 lea 0x534(%rbx),%rdi
> > 3f: 48 rex.W
> > ```
> > We analyzed the root cause of this bug. When calling disable_store()
> > in drivers/usb/core/port.c, if function authorized_store() is calling
> > usb_deauthorized_device() concurrently, the usb_interface will be
> > removed by usb_disable_device. However, in function disable_store,
> > usb_hub_to_struct_hub() would try to deref interface, causing
> > nullptr-deref. We also tested other functions in
> > drivers/usb/core/port.c. So far we haven't found a similar problem.
>
> I don't see how this explanation could be correct. disable_store() is a
> sysfs attribute file for the port device, so when it is called the port
> device structure must still be registered. The interface structure
> doesn't get removed until after usb_disable_device() calls device_del(),
> which won't return until hub_disconnect() returns, which won't happen
> until after the port devices are unregistered, which doesn't happen
> until disable_store() calls sysfs_break_active_protection(), which is
> after the call to usb_hub_to_struct_hub().
>
> Can you do a little extra debugging to find out exactly which C
> statement causes the trap? The disassembly above indicates the trap
> happens during a compare against 0 inside disable_store() -- not inside
> usb_hub_to_struct_hub(). Can you figure out which comparison that is?
>

Sorry for the mistake I made when debugging this bug. I now have more
information about it. The disassembly of disable_store() in the
latest upstream kernel is listed below.
```
Dump of assembler code for function disable_store:
...
0xffffffff86e907eb <+187>: lea -0x8(%r14),%r12
0xffffffff86e907ef <+191>: mov (%rbx),%rax
0xffffffff86e907f2 <+194>: mov %rax,0x20(%rsp)
0xffffffff86e907f7 <+199>: lea -0xa8(%rax),%rdi
0xffffffff86e907fe <+206>: mov %rdi,0x18(%rsp)
0xffffffff86e90803 <+211>: call 0xffffffff86e20220 <usb_hub_to_struct_hub>
0xffffffff86e90808 <+216>: mov %rax,%rbx
0xffffffff86e9080b <+219>: shr $0x3,%rax
0xffffffff86e9080f <+223>: movabs $0xdffffc0000000000,%rcx
0xffffffff86e90819 <+233>: cmpb $0x0,(%rax,%rcx,1)
0xffffffff86e9081d <+237>: je 0xffffffff86e90827 <disable_store+247>
0xffffffff86e9081f <+239>: mov %rbx,%rdi
0xffffffff86e90822 <+242>: call 0xffffffff81eeb0b0 <__asan_report_load8_noabort>
0xffffffff86e90827 <+247>: lea 0x60(%rsp),%rsi
...
```
The cmpb at disable_store()<+233> is generated by KASAN to check the
shadow memory state. If the shadow byte equals 0, the 8-byte load is
valid and the KASAN check passes. This time, however, rax is 0, so the
cmpb itself triggers the general protection fault, because
0xdffffc0000000000 is not a canonical address. rax holds the return
value of usb_hub_to_struct_hub(), which in this case is NULL.
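
To make the arithmetic concrete, here is a small user-space sketch of
the shadow address computation implied by the shr/movabs/cmpb sequence
above. The helper names are hypothetical (they are not kernel
functions), and the canonical check assumes 48-bit virtual addresses.
```c
#include <stdbool.h>
#include <stdint.h>

/* Shadow offset taken from the movabs instruction in the disassembly. */
#define SHADOW_OFFSET      0xdffffc0000000000ULL
/* One shadow byte covers 8 bytes of memory (the "shr $0x3"). */
#define SHADOW_SCALE_SHIFT 3

/* Hypothetical helper: address of the shadow byte that KASAN reads. */
uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/*
 * Hypothetical helper: with 48-bit virtual addresses, an address is
 * canonical only if bits 63..47 are all zeros or all ones.
 */
bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```
For a NULL pointer, shadow_addr(0) is exactly 0xdffffc0000000000, which
is_canonical_48() rejects, so the shadow load itself faults before
KASAN can report anything. For a valid kernel pointer such as the RBX
value in the crash log, the shadow address is canonical.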

In usb_hub_to_struct_hub(), I checked hdev and its fields, and they
are not NULL. Is it possible that usb_deauthorize_device() sets
hdev->actconfig->interface[0]->dev.driver_data to NULL? I cannot
confirm this, since every time I set a breakpoint in the code it
crashes differently.
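
The hypothesis can be modeled in user space roughly as follows. This is
only a sketch: all model_* types and functions are hypothetical,
heavily simplified stand-ins, and it assumes that
usb_hub_to_struct_hub() returns the driver data of interface[0] after
checking hdev, actconfig and maxchild. It is not the kernel code and
not a proposed fix.
```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct model_hub { int unused; };
struct model_intf { void *driver_data; };
struct model_config { struct model_intf *interface[1]; };
struct model_hdev { struct model_config *actconfig; int maxchild; };

/* Assumed shape of the lookup: NULL if the driver data was cleared. */
struct model_hub *model_hub_to_struct_hub(struct model_hdev *hdev)
{
	if (!hdev || !hdev->actconfig || !hdev->maxchild)
		return NULL;
	return hdev->actconfig->interface[0]->driver_data;
}

/*
 * Caller that checks the lookup result before using it.
 * Returns 0 on success, -1 if the hub data is already gone.
 */
int model_disable_store(struct model_hdev *hdev)
{
	struct model_hub *hub = model_hub_to_struct_hub(hdev);

	if (!hub)
		return -1;	/* dereferencing hub here would fault */
	return 0;
}
```
In this model, hdev, actconfig and maxchild can all be valid while the
lookup still returns NULL, which would match what I observed.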

If there is anything else I can help with, please let me know.

Best,
Yue


> Alan Stern
>
> > If you have any questions, please contact us.
> >
> > Reported by Yue Sun <samsun1006219@xxxxxxxxx>
> > Reported by xingwei lee <xrivendell7@xxxxxxxxx>
> >
> > Best Regards,
> > Yue