Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives

From: Guenter Roeck
Date: Thu Jan 16 2020 - 22:53:53 EST


On 1/16/20 5:43 PM, Martin K. Petersen wrote:

Guenter,

Can you by any chance provide a full traceback?

My test machines are tied up with something else right now. This is from
a few days ago (pristine hwmon-next, I believe):

[ 1055.611912] ------------[ cut here ]------------
[ 1055.611922] WARNING: CPU: 3 PID: 3233 at drivers/base/dd.c:519 really_probe+0x436/0x4f0
[ 1055.611925] Modules linked in: sd_mod sg ahci libahci libata drivetemp scsi_mod crc32c_intel igb i2c_algo_bit i2c_core dca hwmon ipv6 nf_defrag_ipv6 crc_ccitt
[ 1055.611955] CPU: 3 PID: 3233 Comm: kworker/u17:1 Tainted: G W 5.5.0-rc1+ #21
[ 1055.611965] Workqueue: events_unbound async_run_entry_fn
[ 1055.611973] RIP: 0010:really_probe+0x436/0x4f0
[ 1055.611979] Code: c7 30 69 f8 82 e8 ba 94 e5 ff e9 60 ff ff ff 48 8d 7b 38 e8 cc d9 b4 ff 48 8b 43 38 48 85 c0 0f 85 41 fd ff ff e9 4f fd ff ff <0f> 0b e9 66 fc ff ff 48 8d 7d 50 e8 aa d9 b4 ff 4c 8b 6d 50 4d 85
[ 1055.611983] RSP: 0018:ffff8881edb77c98 EFLAGS: 00010287
[ 1055.611989] RAX: ffff8881e1f8fb80 RBX: ffffffffa033a000 RCX: ffffffff8182e583
[ 1055.611993] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881dec506a8
[ 1055.611997] RBP: ffff8881dec50238 R08: 0000000000000001 R09: fffffbfff09629ed
[ 1055.612000] R10: fffffbfff09629ec R11: 0000000000000003 R12: 0000000000000000
[ 1055.612004] R13: ffff8881dec506a8 R14: ffffffff8182eca0 R15: 000000000000000b
[ 1055.612009] FS: 0000000000000000(0000) GS:ffff8881f8900000(0000) knlGS:0000000000000000
[ 1055.612013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1055.612017] CR2: 00007f957884a000 CR3: 00000001df5ec003 CR4: 00000000000606e0
[ 1055.612020] Call Trace:
[ 1055.612038] ? driver_probe_device+0x170/0x170
[ 1055.612045] driver_probe_device+0x82/0x170
[ 1055.612058] ? driver_probe_device+0x170/0x170
[ 1055.612064] __driver_attach_async_helper+0xa3/0xe0
[ 1055.612076] async_run_entry_fn+0x68/0x2a0
[ 1055.612094] process_one_work+0x4df/0x990
[ 1055.612121] ? pwq_dec_nr_in_flight+0x110/0x110
[ 1055.612127] ? do_raw_spin_lock+0x113/0x1d0
[ 1055.612161] worker_thread+0x78/0x5c0
[ 1055.612190] ? process_one_work+0x990/0x990
[ 1055.612195] kthread+0x1be/0x1e0
[ 1055.612202] ? kthread_create_worker_on_cpu+0xd0/0xd0
[ 1055.612215] ret_from_fork+0x3a/0x50
[ 1055.612251] irq event stamp: 3512
[ 1055.612259] hardirqs last enabled at (3511): [<ffffffff81d2b874>] _raw_spin_unlock_irq+0x24/0x30
[ 1055.612265] hardirqs last disabled at (3512): [<ffffffff810029c9>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 1055.612272] softirqs last enabled at (3500): [<ffffffff820003a5>] __do_softirq+0x3a5/0x5a8
[ 1055.612281] softirqs last disabled at (3489): [<ffffffff810cec7b>] irq_exit+0xfb/0x100
[ 1055.612284] ---[ end trace f0a8dd9a37bea031 ]---

Either way, I would like to track down how the warning happens, so any
information you can provide that lets me reproduce the problem would be
very helpful.

The three systems that exhibit the problem are stock (2010/2012/2014
vintage) x86_64 servers with onboard AHCI and a variety of 4-6 SATA
drives each.

For the qemu test I didn't have ahci configured, but I had my SCSI temp
patch on top of yours and ran "modprobe drivetemp; modprobe scsi_debug"
to trigger the warnings.


Interesting. Looks like your system performs asynchronous probing.
No idea how that can result in that kind of problem, but who knows.
Can you send me the qemu command line?

Thanks,
Guenter