Re: [PATCH] x86/mm: determine whether the fault address is canonical

From: Ingo Molnar
Date: Mon Oct 07 2019 - 10:33:02 EST

Next message: Guenter Roeck: "Re: [PATCH 5.2 000/137] 5.2.20-stable review"
Previous message: Guenter Roeck: "Re: [PATCH 4.19 000/106] 4.19.78-stable review"
In reply to: Sean Christopherson: "Re: [PATCH] x86/mm: determine whether the fault address is canonical"
Next in thread: Ingo Molnar: "Re: [PATCH] x86/mm: determine whether the fault address is canonical"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:

> On Fri, Oct 04, 2019 at 07:39:08AM -0700, Dave Hansen wrote:
> > On 10/4/19 6:45 AM, Changbin Du wrote:
> > > +static inline bool is_canonical_addr(u64 addr)
> > > +{
> > > +#ifdef CONFIG_X86_64
> > > + int shift = 64 - boot_cpu_data.x86_phys_bits;
> >
> > I think you mean to check the virtual bits member, not "phys_bits".
> >
> > BTW, I also prefer the IS_ENABLED(CONFIG_) checks to explicit #ifdefs.
> > Would one of those work in this case?
> >
> > As for the error message:
> >
> > > {
> > > - WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> > > + WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> > > + is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> >
> > I've always read that as "the GP might have been caused by a
> > non-canonical access". The main nit I'd have with the change is that I
> > don't think all #GP's during user access functions which are given a
> > non-canonical address *necessarily* caused the #GP.
> >
> > There are a billion ways you can get a #GP and I bet canonical
> > violations aren't the only way you can get one in a user copy function.
>
> All the other reasons would require a fairly egregious kernel bug, hence
> the speculation that the #GP is due to a non-canonical address. Something
> like the following would be more precise, though highly unlikely to ever
> be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
> error code that went unnoticed for years.
>
> WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
> (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
> "Segmentation bug");

Instead of guessing at the reason for the #GP (a guess that might be
wrong), please state the cause declaratively only when we are sure that
it is a non-canonical address. Otherwise provide a best guess, but
clearly signal that it's a guess.

I.e., if I understood all the cases correctly, we'd have three types of
messages generated:

!error_code:
"General protection fault in user access, due to non-canonical address."

error_code && !is_canonical_addr(fault_addr):
"General protection fault in user access. Non-canonical address?"

error_code && is_canonical_addr(fault_addr):
"General protection fault in user access. Segmentation bug?"

Only the first one is declarative, because we know we got a #GP with a
zero error code which should denote a non-canonical address access.

The second and third ones are guesses with question marks to communicate
the uncertainty.
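
For illustration, the three cases above could be selected roughly as
follows. This is only a minimal userspace sketch under stated
assumptions, not the patch itself: gp_user_access_msg() and the
virt_bits parameter are hypothetical names, and is_canonical_addr()
here takes the number of virtual address bits explicitly instead of
reading it from boot_cpu_data.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An address is canonical when bits 63..(virt_bits - 1) are all equal,
 * i.e. the upper bits are a sign extension of bit (virt_bits - 1).
 * virt_bits is an assumed parameter (e.g. 48 with 4-level paging) and
 * must be in the range 1..64.
 */
static bool is_canonical_addr(uint64_t addr, unsigned int virt_bits)
{
	uint64_t upper = addr >> (virt_bits - 1);
	uint64_t all_ones = UINT64_MAX >> (virt_bits - 1);

	return upper == 0 || upper == all_ones;
}

/*
 * Hypothetical helper: pick one of the three messages. Only the zero
 * error code case is stated declaratively, the other two are guesses
 * and keep their question marks.
 */
static const char *gp_user_access_msg(unsigned long error_code,
				      uint64_t fault_addr,
				      unsigned int virt_bits)
{
	if (!error_code)
		return "General protection fault in user access, due to non-canonical address.";

	if (!is_canonical_addr(fault_addr, virt_bits))
		return "General protection fault in user access. Non-canonical address?";

	return "General protection fault in user access. Segmentation bug?";
}
```

With virt_bits == 48, 0x00007fffffffffff and 0xffff800000000000 are
canonical while 0x0000800000000000 is not, so a non-zero error code
with the latter address would select the second message.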

This assumes that !error_code always means a non-canonical access. Is
that correct?

And hopefully "!error_code && is_canonical_addr(fault_addr)" is not
possible?

Thanks,

Ingo
