On Tue, Apr 21, 2015 at 04:52:37PM -0400, Waiman Long wrote:
On 04/17/2015 07:45 PM, Dave Chinner wrote:Wait, what?
On Fri, Apr 17, 2015 at 01:38:49PM -0400, Waiman Long wrote:I did the test on a RHEL 6.6 system. The 4.0 kernel is unsigned, but
Hi Dave,First thing you need to do is upgrade xfsprogs so that this message
When I was running the AIM7's disk workload on a 8-socket
Westmere-EX server with 4.0 kernel, the kernel crash. A set of small
ramdisks were created (ramdisk_size=271072). Those ramdisks were
formatted with XFS filesystem before the test began. The kernel log
was:
XFS (ram12): Mounting V4 Filesystem
XFS (ram12): Log size 1424 blocks too small, minimum size is 1596 blocks
XFS (ram12): Log size out of supported range. Continuing onwards,
but if log hangs are
experienced then please report this message in the bug report.
goes away. or use "mkfs.xfs -l size=10m" so that the log is larger
than the minimum.
XFS (ram15): Ending clean mountWhy do you have a mix of signed and unsigned modules loaded?
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff812abd6d>] __memcpy+0xd/0x110
PGD 29f7655f067 PUD 29f75a80067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: xfs exportfs libcrc32c ebtable_nat ebtables
xt_CHECKSUM iptable_mangle bridge stp llc autofs4 ipt_REJECT
nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6
vhost_net macvtap macvlan vhost tun kvm_intel kvm ipmi_si
ipmi_msghandler tpm_infineon iTCO_wdt iTCO_vendor_support wmi
acpi_cpufreq microcode pcspkr serio_raw qlcnic be2net vxlan
udp_tunnel ip6_udp_tunnel ses enclosure igb dca ptp pps_core lpc_ich
mfd_core hpilo hpwdt sg i7core_edac edac_core netxen_nic ext4(E)
jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) lpfc(E) qla2xxx(E)
scsi_transport_fc(E) pata_acpi(E) ata_generic(E) ata_piix(E) hpsa(E)
radeon(E) ttm(E) drm_kms_helper(E) drm(E) i2c_algo_bit(E)
i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
there are some additional RHEL modules loaded at boot up time.
Do you have rhel 6.6 modules loaded into a 4.0 kernel? If so, I'd
suggest you fix things so that doesn't happen before running any
more tests...
In what way?823 case XFS_DINODE_FMT_LOCAL:I have done the bisection and the following commit in 3.13 is the
824 if ((iip->ili_fields& dataflag[whichfork])&&
0x00000000000023c0<+336>: movslq %ecx,%rcx
0x00000000000023c3<+339>: movswl 0x0(%rcx,%rcx,1),%eax
0x00000000000023cb<+347>: test %eax,0x90(%rdx)
0x00000000000023d1<+353>: je 0x2350<xfs_iflush_fork+224>
825 (ifp->if_bytes> 0)) {
0x00000000000023d7<+359>: mov (%r10),%edx
0x00000000000023da<+362>: test %edx,%edx
0x00000000000023dc<+364>: jle 0x2350<xfs_iflush_fork+224>
So the contents of rdx says that the inode fork size is 6 bytes in
local format. The call location also indicates that it is the
attribute fork that is in being flushed. The minimum size of the
attr fork is 3 bytes - an empty header. However, then ext valid size
has a second header that adds 4 bytes to the size, plus the bytes
inteh attr name and value.
Hence a size of 6 bytes is invalid, and probably indicates that
there is some form of memory corruption going on here.
IIRC, we haven't touched this code for a while - can you test 3.19
and see if it has the same problem? If it doesn't have the problem,
and given you can reliably reproduce the crash, can you run a
bisect to find the cause?
one that cause the problem, I think:
f7be2d7f594cbc7a00902b5427332a1ad519a528
xfs: push down inactive transaction mgmt for truncate
I looked at the patch, and it didn't seem quite right,
but I don'tDoesn't actually seem very likely - that's mostly just a factoring
know much about the XFS internal to be sure. Maybe you can take a
look at that.
patch, and it is called on every inode that is reclaimed from
memory, so it's not like that code path doesn't get well tested....
So, I'm confused - I thought you were reporting a recent regression.
Are you actually reporting a regression between a RHEL 6.6 kernel
and the current mainline kernel? Is this the first time you've run
this test on XFS on a kernel more recent than RHEL6.6?
Details, please; they are important.
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
Cheers,
Dave.