Re: [PATCH 2/3] x86/traps: Print non-canonical address on #GP

From: Jann Horn
Date: Thu Nov 14 2019 - 15:03:42 EST


On Thu, Nov 14, 2019 at 7:00 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> On Thu, Nov 14, 2019 at 9:46 AM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> > On Tue, Nov 12, 2019 at 10:10:01PM +0100, Jann Horn wrote:
> > > Memory accesses to non-canonical addresses are a frequent cause of #GP
> > > exceptions. Unlike #PF, #GP doesn't come with a fault address in CR2, so
> > > the kernel doesn't currently print the fault address for #GP.
> > > Luckily, we already have the necessary infrastructure for decoding x86
> > > instructions and computing the memory address that is being accessed;
> > > hook it up to the #GP handler so that we can figure out whether the #GP
> > > looks like it was caused by a non-canonical address, and if so, print
> > > that address.
[...]
> > > +	/*
> > > +	 * If insn_get_addr_ref() failed or we got a canonical address in the
> > > +	 * kernel half, bail out.
> > > +	 */
> > > +	if ((addr_ref | __VIRTUAL_MASK) == ~0UL)
> > > +		return;
> > > +	/*
> > > +	 * For the user half, check against TASK_SIZE_MAX; this way, if the
> > > +	 * access crosses the canonical address boundary, we don't miss it.
> > > +	 */
> > > +	if (addr_ref <= TASK_SIZE_MAX)
> >
> > Any objection to open coding the upper bound instead of using
> > TASK_SIZE_MAX to make the threshold more obvious?
> >
> > > +		return;
> > > +
> > > +	pr_alert("dereferencing non-canonical address 0x%016lx\n", addr_ref);
> >
> > Printing the raw address will confuse users in the case where the access
> > straddles the lower canonical boundary. Maybe combine this with open
> > coding the straddle case? With a rough heuristic to hedge a bit for
> > instructions whose operand size isn't accurately reflected in opnd_bytes.
> >
> > 	if (addr_ref > __VIRTUAL_MASK)
> > 		pr_alert("dereferencing non-canonical address 0x%016lx\n", addr_ref);
> > 	else if ((addr_ref + insn->opnd_bytes - 1) > __VIRTUAL_MASK)
> > 		pr_alert("straddling non-canonical boundary 0x%016lx - 0x%016lx\n",
> > 			 addr_ref, addr_ref + insn->opnd_bytes - 1);
> > 	else if ((addr_ref + PAGE_SIZE - 1) > __VIRTUAL_MASK)
> > 		pr_alert("potentially straddling non-canonical boundary 0x%016lx - 0x%016lx\n",
> > 			 addr_ref, addr_ref + PAGE_SIZE - 1);
>
> This is unnecessarily complicated, and I suspect that Jann had the
> right idea but just didn't quite explain it enough. The secret here
> is that TASK_SIZE_MAX is a full page below the canonical boundary
> (thanks, Intel, for screwing up SYSRET), so, if we get #GP for an
> address above TASK_SIZE_MAX, then it's either a #GP for a different
> reason or it's a genuine non-canonical access.
>
> So I think that just a comment about this would be enough.

Ah, I didn't realize that insn->opnd_bytes exists. Since I already
have that available, I guess using that is cleaner than being clever
with TASK_SIZE_MAX.
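
Roughly, an opnd_bytes-based check could be shaped like the userspace
sketch below. It is only an illustration, not the code from the patch:
SKETCH_VIRTUAL_MASK (which assumes 4-level paging, i.e. 47 usable bits
per address-space half), enum gp_hint and classify_gp_addr() are made-up
names, and the PAGE_SIZE heuristic from the suggestion above is left out.

```c
#include <stdint.h>

/* Assumption: 4-level paging, so each canonical half has 47 usable bits. */
#define SKETCH_VIRTUAL_MASK ((UINT64_C(1) << 47) - 1)

/* Hypothetical result type, for illustration only. */
enum gp_hint {
	GP_NO_HINT,		/* nothing useful to print */
	GP_NON_CANONICAL,	/* first byte is already non-canonical */
	GP_STRADDLE,		/* access starts canonical, ends non-canonical */
};

/* Hypothetical helper; not a kernel function. */
static enum gp_hint classify_gp_addr(uint64_t addr, unsigned int opnd_bytes)
{
	uint64_t last;

	/*
	 * Bits 63..47 all set: a canonical kernel-half address. This also
	 * covers the all-ones value used to signal a decode failure.
	 */
	if ((addr | SKETCH_VIRTUAL_MASK) == UINT64_MAX)
		return GP_NO_HINT;

	/* Above the user half and not canonical kernel: non-canonical. */
	if (addr > SKETCH_VIRTUAL_MASK)
		return GP_NON_CANONICAL;

	/*
	 * addr is in the user half here, so the addition cannot wrap.
	 * Guard against opnd_bytes == 0 so that addr - 1 is never used.
	 */
	last = addr + opnd_bytes - 1;
	if (opnd_bytes && last > SKETCH_VIRTUAL_MASK)
		return GP_STRADDLE;

	return GP_NO_HINT;
}
```

With these assumptions, an 8-byte access at 0x00007ffffffffffc would be
reported as straddling the boundary, while the same access at
0x00007ffffffffff8 ends exactly on the last canonical user byte and
produces no hint.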

> *However*, the printout should at least hedge a bit and say something
> like "probably dereferencing non-canonical address", since there are
> plenty of ways to get #GP with an operand that is nominally
> non-canonical but where the actual cause of #GP is different.

Ah, yeah, I'll change that.

> And I think this code should be skipped entirely if error_code != 0.

Makes sense. As Borislav suggested, I'll add some code to
do_general_protection() to instead print a hint about it being a
segment-related problem.
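
As a rough sketch of that split (again userspace-only and untested; the
helper names and the message wording are placeholders I made up for
illustration, not what do_general_protection() will necessarily print):

```c
#include <stdbool.h>

/*
 * Hypothetical helper: a nonzero #GP error code refers to a segment
 * selector, so decoding the memory operand is only attempted when the
 * error code is zero.
 */
static bool gp_should_decode_address(unsigned long error_code)
{
	return error_code == 0;
}

/* Hypothetical helper: pick a hedged description for the log line. */
static const char *gp_describe(unsigned long error_code)
{
	if (error_code)
		return "segment-related general protection fault";
	return "general protection fault, probably for non-canonical address";
}
```

The "probably" in the second string is deliberate, since a nominally
non-canonical operand does not prove that it was the cause of the #GP.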