Re: [GIT PULL] x86/mm changes for v5.9

From: Linus Torvalds
Date: Thu Aug 06 2020 - 15:42:47 EST


On Thu, Aug 6, 2020 at 12:23 PM Joerg Roedel <jroedel@xxxxxxx> wrote:
>
> Yes, that's the best for now. My gut feeling is that the fault Jason is
> seeing didn't happen on a vmalloc address, but I can't prove that yet.

No, it's definitely fairly high in the vmalloc space. Look at the
faulting address:

BUG: unable to handle page fault for address: ffffe8ffffd00608

and the code sequence is this:

> 12: 48 8b 06 mov (%rsi),%rax
> 15: 4c 8b 67 40 mov 0x40(%rdi),%r12
> 19: 49 89 c6 mov %rax,%r14
> 1c: 45 30 f6 xor %r14b,%r14b
> 1f: a8 04 test $0x4,%al
> 21: b8 00 00 00 00 mov $0x0,%eax
> 26: 4c 0f 44 f0 cmove %rax,%r14

That admittedly odd sequence is get_work_pwq(work).
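
For anyone matching the disassembly to source, here is a rough userspace
sketch of what that sequence computes. It is not the kernel code itself:
the constant values are assumptions read off the instructions above
(the "test $0x4" bit and the low byte cleared by "xor %r14b,%r14b"), the
struct layout is illustrative, and the _sketch name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, inferred from the disassembly rather than copied from
 * a particular kernel tree. */
#define WORK_STRUCT_PWQ          ((uintptr_t)0x4)   /* test $0x4,%al */
#define WORK_STRUCT_FLAG_MASK    ((uintptr_t)0xff)  /* xor %r14b,%r14b */
#define WORK_STRUCT_WQ_DATA_MASK (~WORK_STRUCT_FLAG_MASK)

struct workqueue_struct;

/* Illustrative layout: on a 64-bit build the second pointer sits at
 * offset 8, which is the 0x8(%r14) load that traps. */
struct pool_workqueue {
	void *pool;
	struct workqueue_struct *wq;
};

/* Takes the value loaded by "mov (%rsi),%rax", i.e. work->data. */
static struct pool_workqueue *get_work_pwq_sketch(uintptr_t data)
{
	/* Flag set: strip the low flag bits, the rest is the pwq pointer. */
	if (data & WORK_STRUCT_PWQ)
		return (struct pool_workqueue *)(data & WORK_STRUCT_WQ_DATA_MASK);

	/* Flag clear: this is the cmove that zeroes %r14. */
	return NULL;
}
```

The compiler does the masking unconditionally and then picks between the
masked value and zero with cmove, which is why it looks odd.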

And then the faulting instruction is:

> 2a:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction

and this is the "->wq" dereference.

So it's the pwq->wq that traps, with 'pwq' being the trapping base
pointer, and clearly being in the vmalloc space.
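
A quick way to sanity-check the "vmalloc space" claim is below. This is
only a sketch: the bounds are the documented x86-64 4-level-paging
vmalloc/ioremap range, and it assumes the base has not been moved by
memory-layout randomization, which I have not verified for the reporting
machine. The helper names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4-level paging, default (non-randomized) layout. */
#define VMALLOC_START_4L UINT64_C(0xffffc90000000000)
#define VMALLOC_END_4L   UINT64_C(0xffffe8ffffffffff)

/* True if addr lies inside the assumed vmalloc/ioremap window. */
static bool in_vmalloc_range(uint64_t addr)
{
	return addr >= VMALLOC_START_4L && addr <= VMALLOC_END_4L;
}

/* Distance in bytes from addr up to the last byte of the window. */
static uint64_t bytes_below_vmalloc_end(uint64_t addr)
{
	return VMALLOC_END_4L - addr;
}
```

Under those assumptions, in_vmalloc_range(0xffffe8ffffd00608) is true and
the address sits 0x2ff9f7 bytes (about 3 MB) below the top of the range.
Subtracting the 0x8 displacement gives ffffe8ffffd00600 as the pwq base,
with a zero low byte, as the masking in get_work_pwq() would leave it.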

I think pwq may be a percpu allocation, so not _directly_ vmalloc().
Adding Tejun to the cc in case he can clarify ("No, silly Linus, it's
allocated here..").

Linus