Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

From: H.J. Lu
Date: Wed Oct 07 2020 - 09:30:46 EST


On Wed, Oct 7, 2020 at 3:47 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
>
> On Tue, Oct 06, 2020 at 10:44:14AM -0700, H.J. Lu wrote:
> > On Tue, Oct 6, 2020 at 9:55 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
> > >
> > > On Tue, Oct 06, 2020 at 08:34:06AM -0700, H.J. Lu wrote:
> > > > On Tue, Oct 6, 2020 at 8:25 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
> > > > >
> > > > > On Tue, Oct 06, 2020 at 05:12:29AM -0700, H.J. Lu wrote:
> > > > > > On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
> > > > > > >
> > > > > > > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > > > > > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <Dave.Martin@xxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > > > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > > > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > > > > > > which has grown over time as new features and larger registers have been
> > > > > > > > > > added to the architecture.
> > > > > > > > > >
> > > > > > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > > > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > > > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > > > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > > > > > > the user stack, [2][3].
> > > > > > > > > >
> > > > > > > > > > However, this constant is much too small and does not reflect recent
> > > > > > > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > > > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > > > > > > >
> > > > > > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > > > > > > cause user stack overflow when delivering a signal.
> > > > > > > > > >
> > > > > > > > > > In this series, we suggest a couple of things:
> > > > > > > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > > > > > > approach to [5]
> > > > > > > > > > 2. Avoid using a too-small alternate stack
> > > > > > > > >
> > > > > > > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > > > > > > series does seem consistent with the way arm64 populates
> > > > > > > > > AT_MINSIGSTKSZ.
> > > > > > > > >
> > > > > > > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > > > > > > this...
> > > > > > > >
> > > > > > > > Here is my proposal for glibc:
> > > > > > > >
> > > > > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> > > > > > >
> > > > > > > Thanks for the link.
> > > > > > >
> > > > > > > Are there patches yet? I already had some hacks in the works, but I can
> > > > > > > drop them if there's something already out there.
> > > > > >
> > > > > > I am working on it.
> > > > >
> > > > > OK. I may post something for discussion, but I'm happy for it to be
> > > > > superseded by someone (i.e., other than me) who actually knows what
> > > > > they're doing...
> > > >
> > > > Please see my previous email for my glibc patch:
> > > >
> > > > https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ
> > > >
> > > > > > >
> > > > > > > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> > > > > > >
> > > > > > > Can we do this? IIUC, this is an ABI break and carries the risk of
> > > > > > > buffer overruns.
> > > > > > >
> > > > > > > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > > > > > > (apart from the fact that it is rarely used, due to glibc's shadowing
> > > > > > > definitions) was that userspace binaries will have baked in the old
> > > > > > > value of the constant and may be making assumptions about it.
> > > > > > >
> > > > > > > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > > > > > > changes. This could be a problem if an newly built library tries to
> > > > > > > memcpy() or dump such an object defined by and old binary.
> > > > > > > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > > > > > > and makecontext() could similarly go wrong.
> > > > > >
> > > > > > With my original proposal:
> > > > > >
> > > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> > > > > >
> > > > > > char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> > > > > > constants:
> > > > > >
> > > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
> > > > >
> > > > > Ah, I see. But both still API and ABI breaks; moreover, declaraing an
> > > > > array with size based on (MIN)SIGSTKSZ is not just reasonable, but the
> > > > > obvious thing to do with this constant in many simple cases. Such usage
> > > > > is widespread, see:
> > > > >
> > > > > * https://codesearch.debian.net/search?q=%5BSIGSTKSZ%5D&literal=1
> > > > >
> > > > >
> > > > > Your two approaches seem to trade off two different sources of buffer
> > > > > overruns: undersized stacks versus ABI breaks across library boundaries.
> > > >
> > > > We can't get everything we want.
> > > >
> > > > > Since undersized stack is by far the more familiar problem and we at
> > > > > least have guard regions to help detect overruns, I'd vote to keep
> > > > > MINSIGSTKSZ and SIGSTKSZ as-is, at least for now.
> > > >
> > > > Agree.
> > > >
> > > > > Or are people reporting real stack overruns on x86 today?
> > > >
> > > > I hope so.
> > > >
> > > > >
> > > > > For arm64, we made large vectors on SVE opt-in, so that oversized signal
> > > > > frames are not seen by default. Would somethine similar be feasible on
> > > > > x86?
> > > > >
> > > > >
> > > > > > > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> > > > > > >
> > > > > > > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > > > > > > discovery method is changing. The meaning of the value is exactly the
> > > > > > > same as before.
> > > > > > >
> > > > > > > If we are going to rename it though, it could make sense to go for
> > > > > > > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> > > > > > >
> > > > > > > The trouble with including "STKSZ" is that is sounds like a
> > > > > > > recommendation for your stack size. While the signal frame size is
> > > > > > > relevant to picking a stack size, it's not the only thing to
> > > > > > > consider.
> > > > > >
> > > > > > The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> > > > > > kernel. The minimum stack size for a signal handler is more likely
> > > > > > AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> > > > > > frame size used by kernel + 6KB for user application.
> > > > >
> > > > > Ack; to be correct, you also need to take into account which signals may
> > > > > be unmasked while running on this stack, and the stack requirements of
> > > > > all their handlers. Unfortunately, that's hard :(
> > > > >
> > > > > What's your view on my naming suggesions?
> > > >
> > > > I used _SC_MINSIGSTKSZ:
> > > >
> > > > https://gitlab.com/x86-glibc/glibc/-/commit/73ca53bfbc1c105bc579f55f15af011a07fcded9
> > >
> > > Apologies, I missed that.
> > >
> > > Otherwise, the changes look much as I would expect, except for the
> > > "6K for user program" thing. This is strictly not included in the
> > > legacy MINSIGSTKSZ.
> > >
> > > >
> > > > >
> > > > > > > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > > > > > > of a "recommended stack size" be abandoned? glibc can at least make a
> > > > > > > slightly more informed guess about suitable stack sizes than the kernel
> > > > > > > (and glibc already has to guess anyway, in order to determine the
> > > > > > > default thread stack size).
> > > > > >
> > > > > > Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> > > > > > available.
> > > > >
> > > > > In my code, I generate _SC_SIGSTKSZ as the equivalent of
> > > > >
> > > > > max(sysconf(_SC_MINSIGSTKSZ) * 4, SIGSTKSZ)
> > > > >
> > > > > which is >= the legacy value, and broadly reperesentative of the
> > > > > relationship between MINSIGSTKSZ and SIGSTKSZ on most arches.
> > > > >
> > > > >
> > > > > What do you think?
> > > >
> > > > sysconf(_SC_MINSIGSTKSZ) should be usable ASIS for most cases.
> > >
> > > Why, though?
> > >
> > > MINSIGSTKSZ is not specified to be usable as-is for any case whatsoever.
> > >
> > >
> > > Software that calculates its own needs to know the actual system values,
> > > not estimates based on guesses about how much stack a typical program
> > > might need if it were recompiled for x86.
> > >
> > > This doesn't mean we can't have a generic suggested value that's suitable
> > > for common scenarios (like SIGSTKSZ), but if we do then I think it
> > > should be a separate constant.
> >
> > I updated my glibc patch to add both _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
> > _SC_MINSIGSTKSZ is the minimum signal stack size from AT_MINSIGSTKSZ,
> > which is the signal frame size used by kernel, and _SC_SIGSTKSZ is the value
> > of sysconf (_SC_MINSIGSTKSZ) + 6KB for user application.
>
> Can I suggest sysconf (_SC_MINSIGSTKSZ) * 4 instead?

Done.

> If the arch has more or bigger registers to save in the signal frame,
> the chances are that they're going to get saved in some userspace stack
> frames too. So I suspect that the user signal handler stack usage may
> scale up to some extent rather than being a constant.
>
>
> To help people migrate without unpleasant surprises, I also figured it
> would be a good idea to make sure that sysconf (_SC_MINSIGSTKSZ) >=
> legacy MINSIGSTKSZ, and sysconf (_SC_SIGSTKSZ) >= legacy SIGSTKSZ.
> This should makes it safer to use sysconf (_SC_MINSIGSTKSZ) as a
> drop-in replacement for MINSIGSTKSZ, etc.
>
> (To explain: AT_MINSIGSTKSZ may actually be < MINSIGSTKSZ on AArch64.
> My idea was that sysconf () should hide this surprise, but people who
> really want to know the kernel value would tolerate some
> nonportability and read AT_MINSIGSTKSZ directly.)
>
>
> So then:
>
> kernel_minsigstksz = getauxval(AT_MINSIGSTKSZ);
> minsigstksz = LEGACY_MINSIGSTKSZ;
> if (kernel_minsigstksz > minsigstksz)
> minsistksz = kernel_minsigstksz;
>
> sigstksz = LEGACY_SIGSTKSZ;
> if (minsigstksz * 4 > sigstksz)
> sigstksz = minsigstksz * 4;

I updated users/hjl/AT_MINSIGSTKSZ branch with

+@item _SC_MINSIGSTKSZ
+@standards{GNU, unistd.h}
+Inquire about the signal stack size used by the kernel.
+
+@item _SC_SIGSTKSZ
+@standards{GNU, unistd.h}
+Inquire about the default signal stack size for a signal handler.

case _SC_MINSIGSTKSZ:
assert (GLRO(dl_minsigstacksize) != 0);
return GLRO(dl_minsigstacksize);

case _SC_SIGSTKSZ:
{
/* Return MAX (MINSIGSTKSZ, sysconf (_SC_MINSIGSTKSZ)) * 4. */
long int minsigstacksize = GLRO(dl_minsigstacksize);
_Static_assert (__builtin_constant_p (MINSIGSTKSZ),
"MINSIGSTKSZ is constant");
if (minsigstacksize < MINSIGSTKSZ)
minsigstacksize = MINSIGSTKSZ;
return minsigstacksize * 4;
}

>
> (Or something like that, unless the architecture provides its own
> definitions. ia64's MINSIGSTKSZ is enormous, so it probably doesn't
> want this.)
>
>
> Also: should all these values be rounded up to a multiple of the
> architecture's preferred stack alignment?

Kernel should provide a properly aligned AT_MINSIGSTKSZ.

> Should the preferred stack alignment also be exposed through sysconf?
> Portable code otherwise has no way to know this, though if the
> preferred alignment is <= the minimum malloc()/alloca() alignment then
> this is generally not an issue.)

Good question. But it is orthogonal to the signal stack size issue.

>
> > >
> > > > > > > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > > > > > > is in use.
> > > > > > >
> > > > > > > Great if we can do it. I was concerned that this might be
> > > > > > > controversial.
> > > > > > >
> > > > > > > Would this just be a recommendation, or can we enforce it somehow?
> > > > > >
> > > > > > It is just an idea. We need to move away from constant SIGSTKSZ and
> > > > > > MINSIGSTKSZ.
> > > > >
> > > > > Totally agree with that.
> > > > >
> > > >
> > > > With my glibc patch, -D_SC_MINSIGSTKSZ_SOURCE will fail to compile
> > > > if the source assumes constant SIGSTKSZ or MINSIGSTKSZ.
> > >
> > > Ah yes, I see. That's a sensible precaution.
> > >
> > > Is it accepted in general that defining different feature test macros
> > > can lead to ABI incompatibilities?
> > >
> > > I have thought that building a shared library with _GNU_SOURCE (say)
> > > doesn't mean that a program that loads that library must also be built
> > > with _GNU_SOURCE. For one thing, that's hard to police.
> > >
> > >
> > > However, there are already combinations that could break, e.g., mixing
> > > -D_FILE_OFFSET_BITS=64 with -D_FILE_OFFSET_BITS=32 would be broken if
> > > this define changes off_t.
> > >
> > >
> > > So, maybe having _SC_MINSIGSTKSZ_SOURCE break things in this way is an
> > > acceptable compromise. Interfaces that depend on the value of
> > > MINSIGSTKSZ or SIGSTKSZ are possible, but probably rare in practice --
> > > I don't know of a specific example.
> > >
> >
> > I changed it to _SC_SIGSTKSZ_SOURCE:
> >
> > https://gitlab.com/x86-glibc/glibc/-/commit/41d5e6b31025721590f12d5aa543eb0bc53ce263
> >
> > #ifdef __USE_SC_SIGSTKSZ
> > # include <unistd.h>
> > /* Minimum stack size for a signal handler: sysconf (SC_SIGSTKSZ). */
> > # undef MINSIGSTKSZ
> > # define MINSIGSTKSZ sysconf (_SC_SIGSTKSZ)
> > /* System default stack size for a signal handler: MINSIGSTKSZ. */
> > # undef SIGSTKSZ
> > # define SIGSTKSZ MINSIGSTKSZ
> > #endif
> >
> > Compilation will fail if the source assumes constant MINSIGSTKSZ or
> > SIGSTKSZ.
>
> I don't understand all the glibc-fu, bit it looks reasonable overall
> (notwithstanding my comments above).
>
> Cheers
> ---Dave



--
H.J.