RE: 3.19 kernel: BUG: unable to handle kernel NULL pointer dereference [SOLVED (use 3.15 for now)]

From: Justin Piszcz
Date: Fri Mar 06 2015 - 17:25:25 EST


> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz@xxxxxxxxxxxxxxx]
> Sent: Saturday, February 28, 2015 5:57 PM
> To: linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: 3.19 kernel: BUG: unable to handle kernel NULL pointer dereference
>

Removing the card did not fix the issue.

I've gone back to 3.15 in the meantime and, using the following .config, was
able to copy 20TB of test data in a little over 10 hours without any crashes:
https://home.comcast.net/~jpiszcz/20150306/3.15-working.txt

Proof:
p34:/r1# /usr/bin/time cp -r /nfs/atom/r1 .
35.79user 19018.69system 10:07:54elapsed 52%CPU (0avgtext+0avgdata 4484maxresident)k
0inputs+0outputs (0major+1443minor)pagefaults 0swaps
p34:/r1#

$ uname -a
Linux box 3.15.0 #1 SMP Sat Jul 12 09:54:17 EDT 2014 x86_64 GNU/Linux
$ zcat /proc/config.gz > ~/3.15-working.txt

I have not had time to do a git bisect yet, but hopefully this helps whoever
is looking at the DMAR/PTE issue: 3.19 crashes consistently with any of these
configs:
http://home.comcast.net/~jpiszcz/20150306/config-3.19.0-1.txt
http://home.comcast.net/~jpiszcz/20150306/config-3.19.0-2.txt
http://home.comcast.net/~jpiszcz/20150306/config-3.19.0-3.txt
http://home.comcast.net/~jpiszcz/20150306/config-3.19.0-4.txt

$ diff -u 3.15-working.txt config-3.19.0-4.txt | grep -i DMAR

$ grep -i DMAR 3.15-working.txt config-3.19.0-4.txt
3.15-working.txt:CONFIG_DMAR_TABLE=y
config-3.19.0-4.txt:CONFIG_DMAR_TABLE=y
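
The diff | grep -i DMAR above prints nothing because the only matching
option, CONFIG_DMAR_TABLE, is identical in both files. A slightly wider
comparison of the IOMMU-related options could be sketched as below. The
helper name iommu_config_diff and the option pattern are assumptions for
illustration, not something from the original configs:

```shell
#!/bin/sh
# Hypothetical helper: show IOMMU/DMAR-related options that differ between
# two kernel .config files. The option pattern below is an assumption.
iommu_config_diff() {
    old_cfg=$1
    new_cfg=$2
    pattern='CONFIG_(DMAR|INTEL_IOMMU|IOMMU|IRQ_REMAP)'
    tmp_old=$(mktemp) || return 2
    tmp_new=$(mktemp) || return 2
    # Keep matching lines (including "# CONFIG_... is not set") in a fixed order
    grep -E "$pattern" "$old_cfg" | LC_ALL=C sort > "$tmp_old"
    grep -E "$pattern" "$new_cfg" | LC_ALL=C sort > "$tmp_new"
    # "<" = only in the old config, ">" = only in the new config
    diff "$tmp_old" "$tmp_new" | grep '^[<>]'
    rm -f "$tmp_old" "$tmp_new"
}
```

Usage: iommu_config_diff 3.15-working.txt config-3.19.0-4.txt
For a full comparison of every option, the kernel tree's own
scripts/diffconfig can be used instead.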

$ cp dmesg ~/dmesg-3.15.txt
$ cat dmesg.0 > ~/dmesg-3.19.txt

DMESG:
http://home.comcast.net/~jpiszcz/20150306/dmesg-3.15.txt (works 100%!)
http://home.comcast.net/~jpiszcz/20150306/dmesg-3.19.txt (crashes consistently when copying files over NFS)

With the 3.15 kernel:
[ 0.058061] Freeing SMP alternatives memory: 28K (ffffffff81d92000 - ffffffff81d99000)
[ 0.058208] dmar: Host address width 40
[ 0.058272] dmar: DRHD base: 0x000000f8dfe000 flags: 0x0
[ 0.058350] dmar: IOMMU 0: reg_base_addr f8dfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[ 0.058444] dmar: DRHD base: 0x000000fecfe000 flags: 0x1
[ 0.058515] dmar: IOMMU 1: reg_base_addr fecfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[ 0.058599] dmar: RMRR base: 0x000000000ec000 end: 0x000000000effff
[ 0.058667] dmar: RMRR base: 0x000000bf7ec000 end: 0x000000bf7fffff
[ 0.058734] dmar: ATSR flags: 0x0
[ 0.058796] dmar: ATSR flags: 0x0
[ 0.058859] dmar: RHSA base: 0x000000fecfe000 proximity domain: 0x0
[ 0.058927] dmar: RHSA base: 0x000000f8dfe000 proximity domain: 0x1
[ 0.059266] Switched APIC routing to physical flat.

With the 3.19 kernel:
[ 0.055785] Freeing SMP alternatives memory: 32K (ffffffff81faf000 - ffffffff81fb7000)
[ 0.055939] dmar: Host address width 40
[ 0.056003] dmar: DRHD base: 0x000000f8dfe000 flags: 0x0
[ 0.056080] dmar: IOMMU 0: reg_base_addr f8dfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[ 0.056164] dmar: DRHD base: 0x000000fecfe000 flags: 0x1
[ 0.056237] dmar: IOMMU 1: reg_base_addr fecfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[ 0.056321] dmar: RMRR base: 0x000000000ec000 end: 0x000000000effff
[ 0.056389] dmar: RMRR base: 0x000000bf7ec000 end: 0x000000bf7fffff
[ 0.056466] dmar: ATSR flags: 0x0
[ 0.056528] dmar: ATSR flags: 0x0
[ 0.056591] dmar: RHSA base: 0x000000fecfe000 proximity domain: 0x0
[ 0.056658] dmar: RHSA base: 0x000000f8dfe000 proximity domain: 0x1
[ 0.056997] Switched APIC routing to physical flat.
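
Apart from the timestamps, the DMAR lines in the two excerpts above are
identical. A sketch for checking that mechanically against the saved logs,
assuming they are dmesg-3.15.txt and dmesg-3.19.txt as above; the helper
name dmar_dmesg_diff is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: compare the DMAR/IOMMU lines of two saved dmesg logs,
# ignoring the "[    0.058208] " timestamp prefix that always differs.
dmar_dmesg_diff() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || return 2
    # Keep lines mentioning dmar (any case) and strip the leading timestamp
    grep -i 'dmar' "$1" | sed 's/^\[ *[0-9.]*] //' > "$tmp_a"
    grep -i 'dmar' "$2" | sed 's/^\[ *[0-9.]*] //' > "$tmp_b"
    # Exit status 0 = identical, 1 = differences found (printed as a unified diff)
    diff -u "$tmp_a" "$tmp_b"
    rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return $rc
}
```

Usage: dmar_dmesg_diff dmesg-3.15.txt dmesg-3.19.txt
Because it matches every line containing "dmar", any DMAR messages logged
later in a boot (not only the setup lines) would also show up as
differences.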

Justin.
