Re: arm/ksm: Unable to handle kernel paging request in get_ksm_page() and ksm_scan_thread()

From: Hugh Dickins
Date: Sun Mar 29 2015 - 20:43:37 EST


On Sat, 28 Mar 2015, Xishi Qiu wrote:
> On 2015/3/26 21:23, Xishi Qiu wrote:
>
> > Here are two panic logs from smart phone test, and the kernel version is v3.10.
> >
> > log1 is "Unable to handle kernel paging request at virtual address c0704da020", it should be ffffffc0704da020, right?

That one was an oops at get_ksm_page+0x34/0x150: I'm pretty sure that
comes from the "kpfn = ACCESS_ONCE(stable_node->kpfn)" line, that the
stable_node pointer (in x21 or x22) has upper bits cleared; which
suggests corruption of the rmap_item supposed to point to it.

get_ksm_page() is tricky with ACCESS_ONCEs against page migration,
and the structures tricky with unions; but pointers overlay pointers
in those unions, I don't see any way we might pick up an address with
the upper 24 or 32 bits cleared due to that.

> > and log2 is "Unable to handle kernel paging request at virtual address 1e000796", it should be ffffffc01e000796, right?

And this one was an oops at ksm_scan_thread+0x4ac/0xce0; as is the oops
you posted below. Which contains lots of hex numbers, but very little
info I can work from.

Please make a CONFIG_DEBUG_INFO=y build of one of the kernels you're
hitting this with, then use the disassembler (objdump -ld perhaps) to
identify precisely which line of ksm.c that is oopsing on: the compiler
will have inlined more interesting functions into ksm_scan_thread, so
I haven't a clue where it's actually oopsing.

Maybe we'll find that it's also oopsing on a kernel virtual address
from an rmap_item, maybe we won't.

And I don't read arm64 assembler at all, so I shall be rather limited
in what I can tell you, I'm afraid.

> >
> > I cann't repeat the panic by test, so could anyone tell me this is the
> > bug of ksm or other reason?

I've not heard of any problem like this with KSM on other architectures.
Maybe it is making some assumption which is invalid on arm64, but I'd
have thought we'd have heard about that before now. My guess is that
something in your kernel is stamping on KSM's structures.

A relevant experiment (after identifying the oops line in your current
kernel) might be to switch from CONFIG_SLAB=y to CONFIG_SLUB=y or vice
versa. I doubt SLAB or SLUB is to blame, but changing allocator might
shake things up in a way that either hides the problem, or shifts it
elsewhere.

Hugh

> >
> > Thanks,
> > Xishi Qiu
> >
>
> Here is another one.
>
> [145556.775726s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]Unable to handle kernel paging request at virtual address ff00000000000018
> [145556.775817s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]pgd = ffffffc07f5e4000
> [145556.775817s][2015:03:24 20:07:00][pid:864,cpu0,ksmd][ff00000000000018] *pgd=0000000080808003, *pmd=0000000000000000
> [145556.775878s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]Internal error: Oops: 96000006 [#1] PREEMPT SMP
> [145556.775909s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]Modules linked in:
> [145556.776000s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]CPU: 0 PID: 864 Comm: ksmd Tainted: G W 3.10.61-g2aca0a6-dirty #2
> [145556.776031s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]task: ffffffc0bc06ee00 ti: ffffffc0baae4000 task.ti: ffffffc0baae4000
> [145556.776092s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]PC is at ksm_scan_thread+0x4ac/0xce0
> [145556.776123s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]LR is at ksm_scan_thread+0x49c/0xce0
> [145556.776153s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]pc : [<ffffffc00077a3e4>] lr : [<ffffffc00077a3d4>] pstate: 80000145
> [145556.776153s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]sp : ffffffc0baae7d50
> [145556.776184s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x29: ffffffc0baae7d50 x28: 0000000075a40000
> [145556.776214s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x27: ffffffbc02308260 x26: ffffffc0010ab000
> [145556.776245s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x25: ffffffc0599392a0 x24: ffffffc0baae4000
> [145556.776306s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x23: ffffffc001a0aa90 x22: ffffffc0baae7df8
> [145556.776336s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x21: ffffffc084150080 x20: ff00000000000000
> [145556.776367s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x19: ffffffc0018ddb88 x18: 0000000000000000
> [145556.776397s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x17: 0000007f7f28a974 x16: ffffffc0007ca16c
> [145556.776428s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x15: 0000000000000873 x14: 0000000000000001
> [145556.776458s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x13: 0000000000000001 x12: 0000000000000848
> [145556.776489s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x11: 0000000000000848 x10: 000000006995fcb1
> [145556.776519s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x9 : 00000000c72311f7 x8 : 0000000009050501
> [145556.776550s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x7 : 0000000005aeda8e x6 : 00000000fa9a48df
> [145556.776611s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x5 : ffffffc095e7abb0 x4 : 00000000000bffff
> [145556.776641s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x3 : 0000000000000001 x2 : 0000000000000001
> [145556.776672s][2015:03:24 20:07:00][pid:864,cpu0,ksmd]x1 : 0000000000100051 x0 : ffffffbc02308260
[ remainder snipped ]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/