On Tue, Nov 7, 2017 at 2:26 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
> FYI this happens in v4.14-rc8 -- it's not necessarily a new bug.
.. in fact I don't think it's a bug at all. Not in the kernel, that is.
> [ 186.238181] BUG: unable to handle kernel paging request at ffff880210af6000
> [ 186.257107] IP: slob_free+0x1c4/0x276
This looks like the same bug we saw earlier, which is due to a gcc bug.
The trapping code disassembles to:
0: 8b 45 00 mov 0x0(%rbp),%eax
3: 41 be 01 00 00 00 mov $0x1,%r14d
9: 48 89 ef mov %rbp,%rdi
c: 66 85 c0 test %ax,%ax
and the thing to note is the "test %ax,%ax".
It's testing a 16-bit value, but it *loads* a 32-bit one.
It is supposed to load a 16-bit value from the last two bytes of the page:
RBP: ffff880210af5ffe
but because it has turned the 16-bit load into a 32-bit load, it
faults when accessing the next page.
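The page arithmetic can be checked directly. This is a minimal C sketch, assuming 4 KiB pages (the x86-64 default); the helper names load_crosses_page() and last_page_touched() are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a load of `width` bytes starting at
 * `addr` touches more than one page of size `page_size`. */
static bool load_crosses_page(uint64_t addr, unsigned width, uint64_t page_size)
{
    assert(width > 0);
    /* page_size must be a nonzero power of two for the masking below */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint64_t first_page = addr & ~(page_size - 1);
    uint64_t last_page = (addr + width - 1) & ~(page_size - 1);
    return first_page != last_page;
}

/* Hypothetical helper: base address of the page holding the last
 * byte of the load. */
static uint64_t last_page_touched(uint64_t addr, unsigned width, uint64_t page_size)
{
    return (addr + width - 1) & ~(page_size - 1);
}
```

With RBP = ffff880210af5ffe, a 2-byte load ends at ffff880210af5fff and stays on its page, while a 4-byte load ends at ffff880210af6001, on page ffff880210af6000, which is exactly the address in the oops.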
It's hard to trigger, since you need to have the next page unmapped
due to DEBUG_PAGEALLOC and have just the right allocations etc to make
this happen, but clearly the 0day has gotten pretty good at triggering
it.
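The trigger condition can be loosely imitated in userspace. This is a rough analog only, assuming Linux mmap()/mprotect() semantics: a PROT_NONE page stands in for the page DEBUG_PAGEALLOC has unmapped, and the helpers map_with_guard_page() and read_u16() are hypothetical. A 2-byte read of the last two bytes of the accessible page succeeds; a 4-byte read at the same address would touch the protected page and get SIGSEGV, which the sketch deliberately does not do.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps two pages and makes the second inaccessible, roughly like an
 * unmapped neighbor page. Returns the first page, or NULL on failure. */
static unsigned char *map_with_guard_page(size_t *page_size_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    size_t page_size = (size_t)ps;

    unsigned char *base = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Second page: any access faults. */
    if (mprotect(base + page_size, page_size, PROT_NONE) != 0) {
        munmap(base, 2 * page_size);
        return NULL;
    }
    *page_size_out = page_size;
    return base;
}

/* Reads two bytes at p at the C level. Which instruction actually does
 * the load is up to the compiler, which is where the gcc bug bites. */
static uint16_t read_u16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling read_u16(base + page_size - 2) reads the value just below the guard page; widening that read to four bytes is what the miscompiled slob_free() did.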
Anyway, for now, I'd suggest 0day either:
- upgrade the compiler (this is known to happen with gcc 4.8 and 4.9, but
apparently not with 5.1)
- not use SLOB in the kernel configurations it tests
Honestly, I'd prefer the former, because apparently you use some
ancient Debian gcc version 4.8.4, while gcc these days is at 7.2.
Apparently the ancient gcc version is causing problems with KASAN too.
Anyway, I will be ignoring the slob_free() reports for now, and you
should too until the gcc version is fixed.