Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP
From: Jann Horn
Date: Wed Nov 20 2019 - 09:24:48 EST
On Wed, Nov 20, 2019 at 2:56 PM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
> > Is there a specific concern you have about the instruction decoder? As
> > far as I can tell, all the paths of insn_get_addr_ref() only work if
> > the instruction has a mod R/M byte according to the instruction
> > tables, and then figures out the address based on that. While that
> > means that there's a wide variety of cases in which we won't be able
> > to figure out the address, I'm not aware of anything specific that is
> > likely to lead to false positives.
>
> First there will be a lot of cases you'll just print 0, even
> though 0 is canonical if there is no operand.
Why would I print zeroes if there is no operand? The decoder logic
returns -1 if it can't find a ModR/M byte, which causes the #GP
handler to not print any address at all. Or are you talking about some
odd instruction that takes an operand that is actually ignored, or
something along those lines?
> Then it might be that the address is canonical, but triggers
> #GP anyways (e.g. unaligned SSE)
Which is an argument for printing the address even if it is canonical,
as Ingo suggested, I guess.
> Or it might be the wrong address if there is an operand,
> there are many complex instructions that reference something
> in memory, and usually do canonical checking there.
In which case you'd probably usually see a canonical address in the
instruction's argument, which causes the error message to not appear
(in patch v2/v3) / to be different (in my current draft for patch v4).
And as Ingo said over in the other thread, even if the argument is not
directly the faulting address at all, it might still help with
figuring out what's going on.
> And some other odd cases. For example when the instruction length
> exceeds 15 bytes.
But this is the #GP handler. Don't overlong instructions give you #UD instead?
> I know there is fuzzing for the instruction
> decoder, but it might be worth double checking it handles
> all of that correctly. I'm not sure how good the fuzzer's coverage
> is.
>
> At a minimum you should probably check if the address is
> actually non canonical. Maybe that's simple enough and weeds out
> most cases.
The patch you're commenting on does that already; quoting the patch:
+	/* Bail out if insn_get_addr_ref() failed or we got a kernel address. */
+	if (addr_ref >= ~__VIRTUAL_MASK)
+		return;
+
+	/* Bail out if the entire operand is in the canonical user half. */
+	if (addr_ref + insn.opnd_bytes - 1 <= __VIRTUAL_MASK)
+		return;
But at Ingo's request, I'm planning to change that in the v4 patch;
see <https://lore.kernel.org/lkml/20191120111859.GA115930@xxxxxxxxx/>
and <https://lore.kernel.org/lkml/CAG48ez0Frp4-+xHZ=UhbHh0hC_h-1VtJfwHw=kDo6NahyMv1ig@xxxxxxxxxxxxxx/>.
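
For illustration only, here is a minimal userspace sketch of the two bail-out checks quoted from the patch above. It is not the kernel code: would_report() is a hypothetical helper name, and VIRTUAL_MASK is hard-coded on the assumption of 4-level paging (47 usable bits per half), whereas the kernel derives __VIRTUAL_MASK from its configuration. It also shows why a decoder failure of -1 is caught by the first check, since -1 as an unsigned value compares above ~VIRTUAL_MASK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4-level paging, i.e. 47 usable bits per address-space half. */
#define VIRTUAL_MASK_SHIFT 47
#define VIRTUAL_MASK ((UINT64_C(1) << VIRTUAL_MASK_SHIFT) - 1)

/*
 * Hypothetical helper mirroring the two bail-out checks in the quoted
 * patch: returns true only if neither check bails out, i.e. the operand
 * touches the non-canonical hole and would be reported.
 */
static bool would_report(uint64_t addr_ref, unsigned int opnd_bytes)
{
	/* Decoder failure (-1 as an unsigned value) or a kernel-half address. */
	if (addr_ref >= ~VIRTUAL_MASK)
		return false;

	/* The entire operand lies in the canonical user half. */
	if (addr_ref + opnd_bytes - 1 <= VIRTUAL_MASK)
		return false;

	return true;
}
```

Under these assumptions, would_report(0xdead000000000000, 8) is true, as is an 8-byte access at 0x00007ffffffffff9 that straddles the end of the user half, while would_report((uint64_t)-1, 8) and would_report(0, 8) are both false.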