Re: [PATCH 10/26] x86, pkeys: notify userspace about protection key faults

From: Ingo Molnar
Date: Fri Sep 25 2015 - 03:11:43 EST

* Dave Hansen <dave@xxxxxxxx> wrote:

> On 09/24/2015 02:30 AM, Ingo Molnar wrote:
> >> To answer your question in the comment: it looks useful to have some sort of
> >> 'extended page fault error code' information here, which shows why the page fault
> >> happened. With the regular error_code it's easy - with protection keys there's 16
> >> separate keys possible and user-space might not know the actual key value in the
> >> pte.
> >
> > Btw., alternatively we could also say that user-space should know what protection
> > key it used when it created the mapping - there's no need to recover it for every
> > page fault.
>
> That's true. We don't, for instance, tell userspace whether it was a
> write that caused a fault.

I think we do put it into the signal frame, see setup_sigcontext():

	put_user_ex(current->thread.error_code, &sc->err);

and 'error_code & PF_WRITE' tells us whether it's a write fault.

And I'm pretty sure applications like Valgrind rely on this.
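
To illustrate the user-space side, here is a minimal sketch of decoding that
error code. The bit values follow the x86 page fault error code layout; the
helper names fault_was_write() and fault_kind() are hypothetical and not part
of any kernel or libc API. On x86-64 Linux with glibc, a SA_SIGINFO handler
would typically obtain the value from uc_mcontext.gregs[REG_ERR], which is an
assumption about the environment and is left out to keep the sketch portable.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 page fault error code bits (subset relevant to this discussion). */
enum {
	PF_PROT  = 1 << 0,	/* 0: page not present, 1: protection violation */
	PF_WRITE = 1 << 1,	/* 0: read access,      1: write access */
	PF_USER  = 1 << 2,	/* 0: kernel mode,      1: user mode */
};

/* Hypothetical helper: true if the faulting access was a write. */
static bool fault_was_write(unsigned long err)
{
	return (err & PF_WRITE) != 0;
}

/* Hypothetical helper: short description for logging in a SIGSEGV handler. */
static const char *fault_kind(unsigned long err)
{
	if (err & PF_PROT)
		return fault_was_write(err) ? "write to protected page"
					    : "read of protected page";
	return fault_was_write(err) ? "write to unmapped page"
				    : "read of unmapped page";
}
```

A handler would pass the saved sigcontext 'err' value to fault_kind() and log
the result; nothing here tells it which protection key was involved.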

> But, other than smaps we don't have *any* way to tell userspace what protection
> key a page has. I think some mechanism is going to be required for this to be
> reasonably debuggable.

I think it's a conceptual extension of sigcontext::err and we need it for similar
reasons.

> > OTOH, as long as we don't do a separate find_vma(), it looks cheap enough to
> > look up the pkey value of that address and give it to user-space in the signal
> > frame.
>
> I still think that find_vma() in this case is pretty darn cheap, definitely if
> you compare it to the cost of the entire fault path.

So where's the problem? We have already looked up the vma and know whether there's
any vma there or not. Why not pass in that pointer and be done with it? Why
complicate the code by looking up a second time (and exposing us to various

> > Btw., how does pkey support interact with hugepages?
>
> Surprisingly little. I've made sure that everything works with huge pages and
> that the (huge) PTEs and VMAs get set up correctly, but I'm not sure I had to
> touch the huge page code at all. I have test code to ensure that it works the
> same as with small pages, but everything worked pretty naturally.

Yeah, so the reason I'm asking about expectations is that this code:

+	follow_ret = follow_pte(tsk->mm, address, &ptep, &ptl);
+	if (!follow_ret) {
+		/*
+		 * On a successful follow, make sure to
+		 * drop the lock.
+		 */
+		pte = *ptep;
+		pte_unmap_unlock(ptep, ptl);
+		ret = pte_pkey(pte);

is visibly hugepage-unsafe: if a vma is hugepage mapped, there are no ptes, only
pmds - and the protection key index lives in the pmd. We don't seem to recover
that information properly.
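
A rough user-space model of the lookup that would be needed, purely as a
sketch: it treats page table entries as plain 64-bit values and is not kernel
code. The bit positions (present bit 0, page-size bit 7 in a pmd, protection
key in bits 62:59) are the architectural x86-64 ones; the names ENTRY_PRESENT,
ENTRY_PSE, entry_pkey() and lookup_pkey() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_PRESENT	(1ULL << 0)
#define ENTRY_PSE	(1ULL << 7)	/* in a pmd: maps a huge page directly */
#define PKEY_SHIFT	59		/* protection key lives in bits 62:59 */
#define PKEY_MASK	0xfULL		/* 4 bits, i.e. 16 possible keys */

/* Extract the protection key from a leaf entry (pte or huge pmd). */
static unsigned int entry_pkey(uint64_t entry)
{
	return (unsigned int)((entry >> PKEY_SHIFT) & PKEY_MASK);
}

/*
 * Hypothetical lookup: 'pmd' is the pmd entry covering the address and
 * 'pte' is the pte below it (ignored for a huge mapping).
 * Returns the pkey, or -1 if nothing is mapped.
 */
static int lookup_pkey(uint64_t pmd, uint64_t pte)
{
	if (!(pmd & ENTRY_PRESENT))
		return -1;
	if (pmd & ENTRY_PSE)		/* huge mapping: there is no pte level */
		return (int)entry_pkey(pmd);
	if (!(pte & ENTRY_PRESENT))
		return -1;
	return (int)entry_pkey(pte);
}
```

The point of the model is only the middle branch: a pte-only walk never
reaches it, so the key of a hugepage mapping is not reported.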

In any case, please put those hugepage tests into tools/testing/selftests/x86/ as
well, as part of the pkey series.

