Re: [PATCH 1/1] rseq.2: New man page for the rseq(2) API

From: Mathieu Desnoyers
Date: Wed Feb 15 2023 - 12:09:28 EST


On 2023-02-14 20:20, G. Branden Robinson wrote:
[CC list violently trimmed; for those who remain, this is mostly man
page style issues]

[ Gently added linux-man to CC list. ;-) ]


At 2023-02-14T23:29:37+0100, Alejandro Colomar wrote:
On 2/14/23 20:54, Mathieu Desnoyers wrote:
+per-thread data structure shared between kernel and user-space.

This last 'user-space' is not adjectivated, so it should go without
a hyphen, according to common English rules.

+1

done



Also I like your coinage. "Adjectivated yeast" is reflexive and
tautological!

+.RB ( "struct rseq" )

We format types in italics, so this should be '.RI'.

+1

OK, so it's italics for both types and arguments.

I will change all the bold markers for "struct rseq" and "struct rseq_cs" to italics in the description (but not in the synopsis section and not in the code snippets).


+Only one
+.BR rseq ()
+ABI can be registered per thread, so user-space libraries and
+applications must follow a user-space ABI defining how to share this
+resource.

Please use semantic newlines. See man-pages(7):

Use semantic newlines
In the source of a manual page, new sentences should be started on new
lines, long sentences should be split into lines at clause breaks (commas,
semicolons, colons, and so on), and long clauses should be split
at phrase boundaries. This convention, sometimes known as "semantic
newlines", makes it easier to see the effect of patches, which often
operate at the level of individual sentences, clauses, or phrases.

I think I've said this before, but, strictly, commas in particular can
separate things that are not clauses. Clauses have subjects and
predicates.

Might it be better to say simply:

Start each sentence on a new line. Split long sentences where
punctuated by commas, semicolons, and colons.

With this there is not even any need to discuss "phrase boundaries".


I've modified it to:

Only one
.BR rseq ()
ABI can be registered per thread,
so user-space libraries and applications must follow a user-space ABI
defining how to share this resource.

Hopefully that's correct.


In the above lines, that would mean breaking after the comma,
and not leaving resource in a line of its own.

The latter is inevitably going to happen from time to time simply due to
sentence length and structure and the line length used by one's text
editor. I don't think an "orphan word" (what typographers call this) is
symptomatic of anything in *roff source when filling is enabled.

+The ABI defining how to share this resource between applications and
+libraries is defined by the C library.
+Allocation of the per-thread
+.BR rseq ()
+ABI and its registration to the kernel is handled by glibc since version
+2.35.
+.PP
+The
+.BR rseq ()
+ABI per-thread data structure contains a
+.I rseq_cs
+field which points to the currently executing critical section.

currently-executing should probably use a hyphen
(if I understood the line correctly).

This is not the case, according to some style authorities. Dave Kemper
convinced me of this on the groff list.

Here is one resource.

https://www.editorgroup.com/blog/to-hyphenate-or-not-to-hyphenate/

See an interesting discussion in the groff@ mailing list:
<https://lists.gnu.org/archive/html/groff/2022-10/msg00015.html>

That's not _squarely_ on point, as none of "block", "device", or "based"
is an adverb. "Currently" is.

Leaving unchanged based on this discussion.


+For each thread, a single rseq critical section can run at any given
+point.
+Each critical section need to be implemented in assembly.

needs?

+1

done.


+.TP
+.B Structure alignment

Let's remove the bold here. It's not necessary for marking a constant
or something that needs bold. And the indentation is already making
it stand out, so bold is a bit too aggressive for the reader.

I agree; if it wouldn't be styled in running text, it doesn't need
styling as a paragraph tag; it already stands out by dint of its
placement as a tag.

+Its value should always be confirmed by reading the cpu_id field before

cpu_id should be formatted (.I).

+1

done


+user-space performs any side-effect
+(e.g. storing to memory).
+.IP
+This field is always guaranteed to hold a valid CPU number in the range
+[ 0 .. nr_possible_cpus - 1 ].

Please use interval notation:
[0, nr_possible_cpus)
or
[0, nr_possible_cpus - 1]
whichever looks better to you.

We did some consistency fix recently:
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=147a60d792a5db8f3cb93ea16eefb73e16c1fb91>

Also, do we have a more standard way of saying nr_possible_cpus?
Should we say nproc?

nproc(1) means:

Print the number of processing units available to the current
process, which may be less than the number of online processors

Which is the number of cpus currently available (AFAIU the result of the
cpuset and sched affinity).

What I really mean here is the maximum value for possible cpus which can be hotplugged into the system. So it's not the maximum number of possible CPUs per se, but rather the maximum enabled bit in the possible CPUs mask.

Note that we could express this differently as well: rather than saying that it guarantees a value in the range [0, nr_possible_cpus - 1], we could say that the values are guaranteed to be part of the possible cpus mask, which would actually be more accurate in case the possible cpus mask has a hole (it tends to happen with things like lxc containers nowadays).

Do you agree that we should favor expressing this in terms of belonging to the possible cpumask set rather than a range starting from 0?
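
To make the difference between the two wordings concrete, here is a minimal sketch in C. It is not part of the rseq API or of the proposed man page: cpu_list_query() and struct cpu_list_info are hypothetical names, and the input is assumed to be a plain CPU list string such as "0-3,8-11" (comma-separated numbers and ranges), which is the format I would expect to read from /sys/devices/system/cpu/possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type, for illustration only. */
struct cpu_list_info {
	long max_cpu;	/* highest CPU number listed, -1 if the list is empty */
	long nr_cpus;	/* how many CPUs are listed */
	bool contains;	/* whether the queried CPU is listed */
};

/*
 * Hypothetical helper: walk a CPU list string such as "0-3,8-11"
 * (assumed format: comma-separated numbers and inclusive ranges,
 * optionally ending with a newline) and report the highest CPU,
 * the number of CPUs, and whether `cpu` is a member.
 * Returns false if the string does not match the assumed format.
 */
static bool cpu_list_query(const char *list, long cpu,
			   struct cpu_list_info *info)
{
	const char *p = list;

	assert(list != NULL && info != NULL);
	info->max_cpu = -1;
	info->nr_cpus = 0;
	info->contains = false;

	while (*p != '\0' && *p != '\n') {
		char *end;
		long first = strtol(p, &end, 10);
		long last = first;

		if (end == p || first < 0)
			return false;	/* no digits, or a negative number */
		if (*end == '-') {	/* inclusive range "first-last" */
			const char *q = end + 1;

			last = strtol(q, &end, 10);
			if (end == q || last < first)
				return false;
		}
		info->nr_cpus += last - first + 1;
		if (last > info->max_cpu)
			info->max_cpu = last;
		if (cpu >= first && cpu <= last)
			info->contains = true;

		if (*end == ',')
			end++;		/* move on to the next entry */
		else if (*end != '\0' && *end != '\n')
			return false;	/* unexpected character */
		p = end;
	}
	return true;
}
```

With a possible mask of "0-3,8-11", this reports a highest CPU of 11 but only 8 CPUs, and CPU 5 falls inside [0, 11] without being a member of the mask. That gap is exactly what the "part of the possible cpus mask" wording covers and the range wording does not.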

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com