Re: Documenting sigaltstack SS_AUTODISRM

From: Michael Kerrisk (man-pages)
Date: Mon Oct 30 2017 - 08:38:30 EST


On 05/24/2017 10:12 PM, Stas Sergeev wrote:
> Hello,
>
> 24.05.2017 14:09, Michael Kerrisk (man-pages) ÐÐÑÐÑ:
>>> Could you please point and cite the spec that says
>>> exactly this?
>> I take your point: the text of the spec could be more precise. It
>> does not provide a complete support for my assertion.
>>
>> But, I do think the interpretation I suggest is the more natural one,
>> for many reasons:
> Note: below I am not arguing to whether it is
> good or bad to use SS_ONSTACK. This question
> is closed as soon as you pointed to EINVAL in
> other OSes (it doesn't make it bad immediately,
> but at least the outcome of its usage is now
> pretty clear for me).
> What I still want to point out is the fact that the
> implementation in linux was very unlikely caused
> by a confusion. It was the decision made in agreement
> with the most straight-forward reading of the posix
> spec, just maybe without knowing about this EINVAL
> problem. And it can't create a compatibility problem
> by itself because no one have explicitly promoted its
> use: 0 is still accepted.
> Why it could create a compatibility problem is only
> because people suddenly started to use it, and for
> the very simple reason (judging by myself): I was
> absolutely confused with this '0' in the example, and
> after many attempts to find even a slightest hint in
> the same doc why it is safe (why SS_DISABLE can't
> be 0), I resorted to looking into the kernel and the
> code of other projects. And SS_ONSTACK looked
> like a perfectly valid solution, fully agreeing with
> the text (i.e. SS_ONSTACK != SS_DISABLE).
> So what really creates a compatibility problem here,
> is only a misleading text (or a misleading implementation
> of it in other OSes). And as such, stating that "someone
> was confused" looks more like an unjustified insult to me. :)

See previous mail :-).

Just by the bye, the correct (according to my definition)
way of doing things (ss.ss_flags == 0 to establish an
alternate signal stack) has been documented in the manual
page since 2001. (That page documented the Linux 2.2
behavior.)

>> 1. The field is named '*flags', which commonly means a bit mask.
> Indeed, but the value suggests otherwise.
> The name for a flag could be "SS_DISABLED"
> (with D at the end), while "SS_DISABLE" suggests
> an action. And indeed, it doesn't change a property
> of a performing action the way SS_AUTODISARM
> does. Instead it changes the action itself: the sas
> get cleared instead of being set, and all the other
> arguments are ignored. Eg I would be very surprised
> seeing the "MAP_UNMAP" flag to mmap() that turns
> it into munmap(). This more resembles the "action"
> argument of the sigprocmask() call.
>
>> 2. The example in that you mention that is in the spec is part of
>> the spec. It illustrates the understanding of the developers
>> of the spec about the interface they were specifying.
>> 3. Various existing implementations treat the field as a bit mask
>> for which there is only one valid bit (SS_DISABLE), doing
>> tests of the form (ss.ss_flags & SS_DISABLE).
>> 4. Some implementations (including *all* the ones that I looked at
>> the source code for, that is Illumos, FreeBSD, OpenBSD)
>> explicitly error if ss.ss_flags has bits other than SS_DISABLE set.
> Well, yes, those are the practical arguments that
> can't be ignored... Of course the one can try to
> submit the patch to these projects that nullifies
> this argument. :)
>
>> 6. Before kernel 2.4 (Jan 2001), Linux also used to do 3 & 4!
> Is there a discussion about this change somewhere?
>
>> 8. The standard explicitly mentions that SS_ONSTACK may be returned
>> in old_ss.ss_flags, but makes no mention of the use of SS_ONSTACK
>> in ss.ss_flags. That fact should, IMO, be taken as a strong hint
>> that the standard developers did not believe that SS_ONSTACK was
>> to be used with ss.ss_flags.
> But this doesn't matter as they "seemingly" allow
> any other value than SS_DISABLE, and this one is
> the most reasonable value of all (unless you know
> this is a bitmask, of course).
>
> So to make it clear: I am not arguing to what is
> better or more portable. I am just not satisfied
> with the statement that whoever implemented
> it that way (and whoever uses it, too, obviously),
> was confused or read the standard badly. It is
> not the case. Also I don't agree that this implementation
> can create any (portability or any other) kind of
> problem: it still accepts 0, and its author didn't
> advice anyone to use SS_ONSTACK. So in my eyes
> the implementation is perfectly valid, no ifs no
> buts. :) Man page can warn people to not use
> SS_ONSTACK, but it shouldn't blame the author
> of the kernel code in question IMHO.
>
>>>>>> API history is littered with stories where users found out that
>>>>>> something unforeseen "worked", and so they used it. The question
>>>>>> is: what can go wrong if people do try using this "feature"?
>>>>> It will disappear at the exit from SIGA.
>>>>> To me this is "wrong enough" to not suggest doing so.
>>>> See my comment above. It's weird because it will disappear at exit
>>>> from SIGA, but not "wrong".
>>> What do you mean?
>>> There was no any notion of the "sigaltstack scope",
>>> so with the existing semantic it is wrong, because
>>> currently sigaltstack has no scope and can't change
>> I'm not sure what you mean by "currently" here. I'm pointing
>> out that whereas before one could not change the signal stack
>> while executing a handler that is using a signal stack, now it
>> seems to be possible. But, it's not random: the programmer must
>> explicitly make this happen (and be lucky in the timing of signals).
> By "random points" I meant that sas swaps back
> to the previous one on a sighandler return. This
> sighandler return is just a random point for anyone
> who assumes the global scope of the sas. And the
> scope of sas was always global, so from that POV
> it doesn't work reliably. If you invent some notion
> of the scoped sigaltstacks, then you will turn that
> into a working functionality with new semantic.
> But I don't think you can call this "working" without
> first inventing an adequate semantic for it.
>
>>> at random moments. You can make it "not wrong"
>>> by inventing a new semantic with some notion of
>>> the "sigaltstack scope" though. Whether it worth the
>>> troubles, is what we will see. :)
>> I don't think I'm inventing a new semantic. I think you already
>> did that :-). The question is what we should say about it in the
>> man page. I don't think being silent on this detail is the way to
>> go. Perhaps noting a few details and warning the reader strongly
>> against relying on this "feature" in any way is appropriate?
> This is entirely up to you. My point is just that it is
> not a "few details", but really a new semantic with
> a notion of a sas's scope. I.e. when the control goes
> out of current scope (sighandler return), sas reverts
> to the one of the parent scope. Since this was not
> envisioned and is unlikely needed to anyone, I was
> just suggesting to not do this, but if you want to spec
> this all - why not. :)
>
>>>>> The kernel already has the sigaltstack test-case,
>>>>> so maybe you can add some checks to it from your
>>>>> test-case.
>>>> I must admit I'm still trying to grasp some details of what's
>>>> possible. What tests do you think could be usefully added?
>>> If you are going to add the scoped/nested sigaltstacks,
>>> then perhaps you should add the test that nesting works
>>> correctly (you have that already in your test-case), and
>>> maybe also the direct manipulations to uc_stack, as this
>>> is the only _reliable_ way to set the new sas inside the
>>> sighandler, that I can think of.
>> See above. I'm not sure that we want to specify things to this
>> level. But my point is that in the lack of any text in the man
>> page on the topic, some user-space programmers will discover the
>> feature and perhaps try to use it. The question is what the man
>> page should say to those programmers. Do you see what I mean?
> Yes, but I don't see what the man page should say
> to those programmers. :) Or if I do, the description
> would became too lengthy and complex.

So, the man page text currently says:

If old_ss is not NULL, then it is used to return information about
the alternate signal stack which was in effect prior to the call
to sigaltstack(). The old_ss.ss_sp and old_ss.ss_size fields
return the starting address and size of that stack. The
old_ss.ss_flags may return either of the following values:

SS_ONSTACK
The process is currently executing on the alternate signal
stack. (Note that it is not possible to change the alterâ
nate signal stack if the process is currently executing on
it.)

SS_DISABLE
The alternate signal stack is currently disabled.

Alternatively, this value is returned if the process is
currently executing on an alternate signal stack that was
established using the SS_AUTODISARM flag. In this case, it
is safe to switch away from the signal handler with swapâ
context(3). It is also possible to set up a different
alternative signal stack using a further call to sigaltâ
stack().

So, given the discussion so far, I think that last sentence needs to
be changed. A simple change would be to just remove the sentence.
However, as I've tried to bring out, I think that *not* documenting
something is generally not a winning strategy. Eventually, people
figure out what's possible, and someone may try using the feature.
So, as well as removing that sentence, how about some text something
like the following in NOTES:

When old_ss.ss_flags returns SS_DISABLE, meaning the
process is currently executing a signal handler ("X")
on an alternate signal stack that was established
using the SS_AUTODISARM flag, then it is
possible inside that handler to set up a new
alternative signal stack using a further call to
sigaltstack(). However, this is not recommended. Taking
advantage of this possibility is inherently racy, since
the new alternate signal stack settings will be removed
when signal handler "X" returns.

?

Thanks,

Michael





--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/