Re: [PATCH v8 00/10] Intel MPX support

From: Dave Hansen
Date: Fri Sep 12 2014 - 17:23:44 EST

Next message: Greg KH: "Re: linux-3.16.2 queue (3.16.1+)"
Previous message: John Stultz: "[PATCH 2/4] alarmtimer: Return relative times in timer_gettime"
In reply to: Thomas Gleixner: "Re: [PATCH v8 00/10] Intel MPX support"
Next in thread: Thomas Gleixner: "Re: [PATCH v8 00/10] Intel MPX support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 09/12/2014 12:21 PM, Thomas Gleixner wrote:
> Yes, the most important question is WHY must the kernel handle the
> bound table memory allocation in the first place. The "documentation"
> patch completely fails to tell that.

This will become the description of "patch 04/10". Feel free to wait
until we repost these to read it, but I'm posting it here because it's
going to be a couple of days before we actually get a new set of patches
out.

Any suggestions for how much of this is appropriate for Documentation/
would be much appreciated. I don't have a good feel for it.

---

Subject: x86: mpx: on-demand kernel allocation of bounds tables

MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs
to spill them somewhere. It has two special instructions for
this which allow the bounds to be moved between the bounds
registers and some new "bounds tables".

#BR exceptions are a new class of exceptions just for MPX. They
are conceptually similar to a page fault and are raised by the
MPX hardware both on bounds violations and when the tables are
not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal
process's address space (essentially calling mmap() from inside
the kernel) and then pointing the bounds directory at it.
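
To make the fault-and-allocate flow concrete, here is a rough userspace
model in Python. It is only a sketch, not kernel code and not what the
patch contains. The layout constants (directory indexed by address bits
47:20, tables indexed by bits 19:3, 32-byte table entries) are my reading
of Intel's 64-bit MPX documentation and should be treated as assumptions;
the names BoundsFault, split() and BoundsModel are made up for the
illustration.

```python
# Layout constants for 64-bit MPX. Assumptions taken from Intel's
# documentation, not from the patch description itself.
BD_ENTRY_BYTES = 8      # one bounds-directory entry
BD_INDEX_BITS = 28      # directory indexed by address bits 47:20
BT_ENTRY_BYTES = 32     # one bounds-table entry
BT_INDEX_BITS = 17      # table indexed by address bits 19:3
PTR_SHIFT = 3           # one table entry per 8-byte pointer slot

BD_SIZE = BD_ENTRY_BYTES << BD_INDEX_BITS   # size of the directory in bytes
BT_SIZE = BT_ENTRY_BYTES << BT_INDEX_BITS   # size of one table in bytes


class BoundsFault(Exception):
    """Stands in for a #BR raised because a bounds table is not present."""

    def __init__(self, bd_index):
        super().__init__(f"no bounds table for directory slot {bd_index:#x}")
        self.bd_index = bd_index


def split(ptr_addr):
    """Split the address of a pointer into (directory index, table index)."""
    bt_index = (ptr_addr >> PTR_SHIFT) & ((1 << BT_INDEX_BITS) - 1)
    bd_index = (ptr_addr >> (PTR_SHIFT + BT_INDEX_BITS)) & ((1 << BD_INDEX_BITS) - 1)
    return bd_index, bt_index


class BoundsModel:
    """Hypothetical model: a sparse directory whose tables appear on demand."""

    def __init__(self):
        self.directory = {}        # bd_index -> {bt_index: (lower, upper)}
        self.tables_allocated = 0

    def _store_raw(self, ptr_addr, lower, upper):
        # What the "hardware" does: fault if the table is missing.
        bd_index, bt_index = split(ptr_addr)
        if bd_index not in self.directory:
            raise BoundsFault(bd_index)
        self.directory[bd_index][bt_index] = (lower, upper)

    def handle_fault(self, fault):
        # What the "kernel" does: allocate a table, hook it into the directory.
        self.directory[fault.bd_index] = {}
        self.tables_allocated += 1

    def store_bounds(self, ptr_addr, lower, upper):
        # Fault once, allocate, then retry the same store.
        try:
            self._store_raw(ptr_addr, lower, upper)
        except BoundsFault as fault:
            self.handle_fault(fault)
            self._store_raw(ptr_addr, lower, upper)
```

With these constants, two pointers stored within the same 1MB of address
space share one table, and the first store into a new 1MB region is the
only one that takes the fault path.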

The tables *need* to be accessed and controlled by userspace
because the instructions for moving bounds in and out of them are
extremely frequent. They potentially happen every time a
register points to memory. Any direct kernel involvement (like a
syscall) to access the tables would obviously destroy
performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so
that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
area needs 4*X GB of virtual space, plus 2GB for the bounds
directory. If we were to preallocate them for the 128TB of
user virtual address space, we would need to reserve 512TB+2GB,
which is larger than the entire virtual address space today.
This means they cannot be reserved ahead of time. Also, a
single process's pre-populated bounds directory consumes 2GB
of virtual *AND* physical memory. IOW, it's completely
infeasible to pre-populate bounds directories.
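
The arithmetic behind that answer can be checked in a few lines of
Python. The 4x ratio, the 2GB directory and the 128TB of user address
space come from the text above; the 48-bit total (256TB) is my
assumption for what "the entire virtual address space today" means, and
reservation_bytes() is just a name picked for this sketch.

```python
GB = 1 << 30
TB = 1 << 40

USER_VA = 128 * TB      # user virtual address space, from the text
TABLE_RATIO = 4         # X GB of memory needs 4*X GB of bounds tables
DIRECTORY = 2 * GB      # size of the bounds directory
TOTAL_VA = 1 << 48      # assumption: 48-bit virtual addressing


def reservation_bytes(covered):
    """Virtual space needed to pre-reserve tables covering `covered` bytes."""
    return TABLE_RATIO * covered + DIRECTORY


needed = reservation_bytes(USER_VA)
print(f"needed: {needed // TB}TB + {(needed % TB) // GB}GB")
print(f"fits in the address space: {needed <= TOTAL_VA}")
```

Under that assumption the reservation is twice the size of the whole
address space, before counting the memory the tables are meant to cover.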

Q: Can we preallocate bounds table space at the same time memory
is allocated which might contain pointers that might eventually
need bounds tables?
A: This would work if we could hook the site of each and every
memory allocation syscall. This can be done for small,
constrained applications. But, it isn't practical at a larger
scale since a given app has no way of controlling how all the
parts of the app might allocate memory (think libraries). The
kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables
allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of
async-signal-safe functions, and even if mmap() did work, it
would still require locking or nasty tricks to keep track of
the allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand
in the kernel.

