Re: WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128

From: John David Anglin
Date: Tue Jul 19 2022 - 17:02:19 EST


Hi Helge,

I hit this warning with the patch below applied, while building ghc on mx3210:

mx3210 login: ------------[ cut here ]------------
WARNING: CPU: 2 PID: 32654 at fs/dcache.c:365 dentry_free+0xfc/0x108
Modules linked in: binfmt_misc ext2 ext4 crc16 mbcache jbd2 ipmi_watchdog sg ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 uas usb_storage sr_mod cdrom ohci_pci sym53c8xx pata_cmd64x ehci_pci ohci_hcd libata scsi_transport_spi ehci_hcd tg3 scsi_mod usbcore scsi_common usb_common
CPU: 2 PID: 32654 Comm: cc1 Not tainted 5.18.12+ #2
Hardware name: 9000/800/rp3440

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000110100000001111 Not tainted
r00-03  000000000804680f 00000040ce7fc880 00000000404f2b74 00000040ce7fc920
r04-07  0000000040be4940 000000410f6cd630 00000001413e4068 000000410f6cd688
r08-11  0000000040fd2e60 0000000040bc5020 0000000040c2c940 00000000000800e0
r12-15  0000000040c2c940 0000000000000001 0000000040c2c940 000000410f6cd688
r16-19  00000001f9fe105d 00000040ce7fc1f8 000000000000002f 000000000a0c1000
r20-23  000000000800000f 000000000800000f 000000410f6cd639 000000000800000f
r24-27  0000000000000000 0000000000000385 000000410f6cd630 0000000040be4940
r28-31  0000000041104530 00000040ce7fc8f0 00000040ce7fc9a0 0000000000000000
sr00-03  0000000000a03800 0000000000000000 0000000000000000 0000000000a03800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000404f18bc 00000000404f18c0
 IIR: 03ffe01f    ISR: 0000000010350000  IOR: 00000239ff3fc928
 CPU:        2   CR30: 00000040cadd1380 CR31: ffffffffffffffff
 ORIG_R28: 00000040ce7fcb70
 IAOQ[0]: dentry_free+0xfc/0x108
 IAOQ[1]: dentry_free+0x100/0x108
 RP(r2): __dentry_kill+0x2bc/0x338
Backtrace:
 [<00000000404f2b74>] __dentry_kill+0x2bc/0x338
 [<00000000404f37b8>] dentry_kill+0xb0/0x318
 [<00000000404f3d08>] dput+0x2e8/0x328
 [<00000000404dd7dc>] step_into+0x344/0x390
 [<00000000404dda4c>] walk_component+0xa4/0x310
 [<00000000404df234>] link_path_walk.part.0+0x2ec/0x4b0
 [<00000000404e0000>] path_openat+0xe8/0x348
 [<00000000404e2c58>] do_filp_open+0x98/0x178
 [<00000000404babe8>] do_sys_openat2+0x148/0x288
 [<00000000404bb41c>] compat_sys_openat+0x54/0x98
 [<0000000040203e30>] syscall_exit+0x0/0x10

---[ end trace 0000000000000000 ]---
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cc1:32657]

Regards,
Dave

On 2022-07-19 12:32 p.m., Helge Deller wrote:
Hello Hillf,

On 7/17/22 13:36, Hillf Danton wrote:
On Sun, 17 Jul 2022 11:42:48 +0200
I used WARN_ON() instead of BUG_ON().
With that, both triggered: first the first one, then the second one.
Full log is here:
http://dellerweb.de/testcases/minicom.dcache.crash.6-warn
Given that the first BUG_ON triggered, and that the dentry at that moment is
not supposed to be an alias, check whether it is still in lookup while d_lock
is held. That is the step before de-unioning d_alias from d_in_lookup_hash.

On the other hand, if only the second one had triggered, we should track
DCACHE_DENTRY_KILLED instead, on the assumption that the killed dentry was
used again after the d_lock surrounding the first one was released.
The machine has now been up for 2 days without any issues, under pretty
much the same load as when it was crashing earlier.
So, in summary, I'd assume that your patch below fixes the issue.

I'm now rebooting the machine with a new kernel, where I just changed
if (unlikely(d_in_lookup(dentry)))
to
if (WARN_ON_ONCE(d_in_lookup(dentry)))
in order to see whether this branch is actually taken.

Anyway, I think your patch is good so far.
Would that be the final patch, or should I test some others?

Thanks!
Helge

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -605,8 +605,12 @@ static void __dentry_kill(struct dentry
 		spin_unlock(&parent->d_lock);
 	if (dentry->d_inode)
 		dentry_unlink_inode(dentry);
-	else
+	else {
+		if (unlikely(d_in_lookup(dentry))) {
+			__d_lookup_done(dentry);
+		}
 		spin_unlock(&dentry->d_lock);
+	}
 	this_cpu_dec(nr_dentry);
 	if (dentry->d_op && dentry->d_op->d_release)
 		dentry->d_op->d_release(dentry);


--
John David Anglin dave.anglin@xxxxxxxx