Re: [PATCH] doc: convert printk-formats.txt to rst

From: Kees Cook
Date: Thu Dec 07 2017 - 17:50:49 EST


On Tue, Dec 5, 2017 at 5:45 PM, Tobin C. Harding <me@xxxxxxxx> wrote:
> Documentation/printk-formats.txt is a candidate for conversion to
> ReStructuredText format. Some effort has already been made to do this
> conversion even thought the suffix is currently .txt
>
> Changes required to complete conversion
>
> - Add double backticks where needed.
> - Add entry to Documentation/index.rst
> - Use flat-table instead of ASCII table.
> - Fix minor grammatical errors.
> - Capitalize headers and correctly order heading adornments.
> - Use 'Passed by reference' uniformly.
> - Update pointer documentation around %px specifier.
> - Fix erroneous double backticks (to commas).
> - Simplify documentation for kobject.
> - Convert lib/vsnprintf.c function docs to use kernel-docs and
> include in Documentation/printk-formats.rst

Awesome!

>
> Signed-off-by: Tobin C. Harding <me@xxxxxxxx>
> ---
>
> The last two need special reviewing please. In particular the conversion
> of kernel-docs in vsnprintf.c attempt was made to reduce documentation
> duplication with comments in the source code being simplified in order
> to be suitable for inclusion in Documentation/printk-formats.rst
>
> I used -M when formatting the patch. If you don't like the diff with
> this flag just holla.
>
> thanks,
> Tobin.
>
> Documentation/index.rst | 10 +
> .../{printk-formats.txt => printk-formats.rst} | 295 ++++++++++++---------
> lib/vsprintf.c | 160 +++++------
> 3 files changed, 235 insertions(+), 230 deletions(-)
> rename Documentation/{printk-formats.txt => printk-formats.rst} (61%)
>
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index cb7f1ba5b3b1..83ace60efbe7 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -87,6 +87,16 @@ implementation.
>
> sh/index
>
> +Miscellaneous documentation
> +---------------------------
> +
> +These guides contain general information useful when writing kernel code.
> +
> +.. toctree::
> + :maxdepth: 1
> +
> + printk-formats

I actually think this belongs in the kernel core API documentation
list (Documentation/core-api/index.rst). I defer to Jon's opinion,
though.

> [...]
> +.. kernel-doc:: lib/vsprintf.c
> + :doc: Extended Format Pointer Specifiers

Awesome!

> [...]
> +For printing pointers when you *really* want to print the address. Please
> consider whether or not you are leaking sensitive information about the
> -Kernel layout in memory before printing pointers with %px. %px is
> -functionally equivalent to %lx. %px is preferred to %lx because it is more
> -uniquely grep'able. If, in the future, we need to modify the way the Kernel
> -handles printing pointers it will be nice to be able to find the call
> -sites.
> +kernel memory layout before printing pointers with ``%px``. ``%px`` is
> +functionally equivalent to ``%lx`` (or ``%lu``). ``%px``, however, is
> +preferable because it is more uniquely grep'able. If, in the future, we need
> +to modify the way the Kernel handles printing pointers we will be better
> +equipped to find the call sites.

nit? You de-capitalized Kernel at the start but not at the end. "the
Kernel handles ..."

> [...]
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 01c3957b2de6..f9247b78e8ef 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -1727,115 +1727,73 @@ static char *ptr_to_id(char *buf, char *end, void *ptr, struct printf_spec spec)
> return number(buf, end, hashval, spec);
> }
>
> +/**
> + * DOC: Extended Format Pointer Specifiers
> + *
> + * Briefly we handle the following extensions:
> + *
> + * - ``F`` - For symbolic function descriptor pointers with offset.
> + * - ``f`` - For simple symbolic function names without offset.
> + *
> + * - ``S`` - For symbolic direct pointers with offset.
> + * - ``s`` - For symbolic direct pointers without offset.
> + * - ``[FfSs]R`` - As above with ``__builtin_extract_return_addr()`` translation.
> + * - ``B`` - For backtraced symbolic direct pointers with offset.
> + * - ``R`` - For decoded struct resource, e.g., [mem 0x0-0x1f 64bit pref].
> + * - ``r`` - For raw struct resource, e.g., [mem 0x0-0x1f flags 0x201].
> + * - ``b[l]`` - For a bitmap, the number of bits is determined by the field
> + * width which must be explicitly specified either as part of the format
> + * string ``32b[l]`` or through ``*b[l]``, ``[l]`` selects range-list format
> + * instead of hex format.
> + * - ``M`` - For a 6-byte MAC address, it prints the address in the usual
> + * colon-separated hex notation.
> + * - ``m`` - For a 6-byte MAC address, it prints the hex address without colons.
> + * - ``MF`` - For a 6-byte MAC FDDI address, it prints the address with a
> + * dash-separated hex notation.
> + * - ``[mM]R`` - For a 6-byte MAC address, Reverse order (Bluetooth).
> + * - ``I[46]`` - For IPv4/IPv6 addresses printed in the usual way.
> + * - ``I[S][pfs]`` - For generic IPv4/IPv6 address (struct sockaddr *) that falls
> + * back to ``[4]`` or ``[6]`` and is able to print port ``[p]``,
> + * flowinfo ``[f]``, scope ``[s]``.
> + * - ``i[46]`` - For 'raw' IPv4/IPv6 addresses IPv6 omits the colons (01020304...0f)
> + * IPv4 uses dot-separated decimal with leading 0's (010.123.045.006).
> + * - ``i[S][pfs]`` - For generic IPv4/IPv6 address (struct sockaddr *) that falls back
> + * to ``[4]`` or ``[6]`` (``[pfs]`` as above).
> + * - ``[Ii][4S][hnbl]`` - For IPv4 addresses in host, network, big or little endian order.
> + * - ``I[6S]c`` - For IPv6 addresses printed as per http://tools.ietf.org/html/rfc5952.
> + * - ``E[achnops]`` - For an escaped buffer.
> + * - ``U`` - For a 16 byte UUID/GUID.
> + * - ``V`` - For a ``struct va_format`` which contains a format ``string *``
> + * and ``va_list *``.
> + * - ``K`` - For a kernel pointer that should be hidden from unprivileged users.
> + * - ``NF`` - For a ``netdev_features_t``.
> + * - ``h[CDN]`` - For a variable-length buffer.
> + * - ``a[pd]`` - For address types ``[p] phys_addr_t``, ``[d] dma_addr_t`` and
> + * derivatives.
> + * - ``d[234]`` - For a dentry name (optionally 2-4 last components).
> + * - ``D[234]`` - Same as 'd' but for a struct file.
> + * - ``g`` - For ``block_device`` name (gendisk + partition number).
> + * - ``C[n]`` - For a clock, it prints the name (Common Clock Framework) or
> + * address (legacy clock framework) of the clock. ``[n]`` is optional.
> + * - ``Cr`` - For a clock, it prints the current rate of the clock.
> + * - ``G`` - For flags to be printed as a collection of symbolic strings that
> + * would construct the specific value.
> + * - ``O`` - For a kobject based struct (device node).
> + * - ``x`` - For printing the address. Equivalent to ``%lx``.
> + */
> +
> /*
> * Show a '%p' thing. A kernel extension is that the '%p' is followed
> * by an extra set of alphanumeric characters that are extended format
> * specifiers.
> *
> + * Please see Documentation/printk-formats.rst for fuller description
> + * of specifier extensions. Also please update this file when making
> + * changes.

Perhaps "... please update the kernel-doc above when making changes to
this file." ?

> + *
> * Please update scripts/checkpatch.pl when adding/removing conversion
> * characters. (Search for "check for vsprintf extension").
> *
> - * Right now we handle:
> - *
> - * - 'F' For symbolic function descriptor pointers with offset
> - * - 'f' For simple symbolic function names without offset
> - * - 'S' For symbolic direct pointers with offset
> - * - 's' For symbolic direct pointers without offset
> - * - '[FfSs]R' as above with __builtin_extract_return_addr() translation
> - * - 'B' For backtraced symbolic direct pointers with offset
> - * - 'R' For decoded struct resource, e.g., [mem 0x0-0x1f 64bit pref]
> - * - 'r' For raw struct resource, e.g., [mem 0x0-0x1f flags 0x201]
> - * - 'b[l]' For a bitmap, the number of bits is determined by the field
> - * width which must be explicitly specified either as part of the
> - * format string '%32b[l]' or through '%*b[l]', [l] selects
> - * range-list format instead of hex format
> - * - 'M' For a 6-byte MAC address, it prints the address in the
> - * usual colon-separated hex notation
> - * - 'm' For a 6-byte MAC address, it prints the hex address without colons
> - * - 'MF' For a 6-byte MAC FDDI address, it prints the address
> - * with a dash-separated hex notation
> - * - '[mM]R' For a 6-byte MAC address, Reverse order (Bluetooth)
> - * - 'I' [46] for IPv4/IPv6 addresses printed in the usual way
> - * IPv4 uses dot-separated decimal without leading 0's (1.2.3.4)
> - * IPv6 uses colon separated network-order 16 bit hex with leading 0's
> - * [S][pfs]
> - * Generic IPv4/IPv6 address (struct sockaddr *) that falls back to
> - * [4] or [6] and is able to print port [p], flowinfo [f], scope [s]
> - * - 'i' [46] for 'raw' IPv4/IPv6 addresses
> - * IPv6 omits the colons (01020304...0f)
> - * IPv4 uses dot-separated decimal with leading 0's (010.123.045.006)
> - * [S][pfs]
> - * Generic IPv4/IPv6 address (struct sockaddr *) that falls back to
> - * [4] or [6] and is able to print port [p], flowinfo [f], scope [s]
> - * - '[Ii][4S][hnbl]' IPv4 addresses in host, network, big or little endian order
> - * - 'I[6S]c' for IPv6 addresses printed as specified by
> - * http://tools.ietf.org/html/rfc5952
> - * - 'E[achnops]' For an escaped buffer, where rules are defined by combination
> - * of the following flags (see string_escape_mem() for the
> - * details):
> - * a - ESCAPE_ANY
> - * c - ESCAPE_SPECIAL
> - * h - ESCAPE_HEX
> - * n - ESCAPE_NULL
> - * o - ESCAPE_OCTAL
> - * p - ESCAPE_NP
> - * s - ESCAPE_SPACE
> - * By default ESCAPE_ANY_NP is used.

Is there no way to retain these per-flag details in the kernel-doc?
Seems a shame to split up the docs if we can keep it together in here.

> - * - 'U' For a 16 byte UUID/GUID, it prints the UUID/GUID in the form
> - * "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
> - * Options for %pU are:
> - * b big endian lower case hex (default)
> - * B big endian UPPER case hex
> - * l little endian lower case hex
> - * L little endian UPPER case hex
> - * big endian output byte order is:
> - * [0][1][2][3]-[4][5]-[6][7]-[8][9]-[10][11][12][13][14][15]
> - * little endian output byte order is:
> - * [3][2][1][0]-[5][4]-[7][6]-[8][9]-[10][11][12][13][14][15]
> - * - 'V' For a struct va_format which contains a format string * and va_list *,
> - * call vsnprintf(->format, *->va_list).
> - * Implements a "recursive vsnprintf".
> - * Do not use this feature without some mechanism to verify the
> - * correctness of the format string and va_list arguments.
> - * - 'K' For a kernel pointer that should be hidden from unprivileged users
> - * - 'NF' For a netdev_features_t
> - * - 'h[CDN]' For a variable-length buffer, it prints it as a hex string with
> - * a certain separator (' ' by default):
> - * C colon
> - * D dash
> - * N no separator
> - * The maximum supported length is 64 bytes of the input. Consider
> - * to use print_hex_dump() for the larger input.
> - * - 'a[pd]' For address types [p] phys_addr_t, [d] dma_addr_t and derivatives
> - * (default assumed to be phys_addr_t, passed by reference)
> - * - 'd[234]' For a dentry name (optionally 2-4 last components)
> - * - 'D[234]' Same as 'd' but for a struct file
> - * - 'g' For block_device name (gendisk + partition number)
> - * - 'C' For a clock, it prints the name (Common Clock Framework) or address
> - * (legacy clock framework) of the clock
> - * - 'Cn' For a clock, it prints the name (Common Clock Framework) or address
> - * (legacy clock framework) of the clock
> - * - 'Cr' For a clock, it prints the current rate of the clock
> - * - 'G' For flags to be printed as a collection of symbolic strings that would
> - * construct the specific value. Supported flags given by option:
> - * p page flags (see struct page) given as pointer to unsigned long
> - * g gfp flags (GFP_* and __GFP_*) given as pointer to gfp_t
> - * v vma flags (VM_*) given as pointer to unsigned long
> - * - 'O' For a kobject based struct. Must be one of the following:
> - * - 'OF[fnpPcCF]' For a device tree object
> - * Without any optional arguments prints the full_name
> - * f device node full_name
> - * n device node name
> - * p device node phandle
> - * P device node path spec (name + @unit)
> - * F device node flags
> - * c major compatible string
> - * C full compatible string
> - *
> - * - 'x' For printing the address. Equivalent to "%lx".
> - *
> - * ** Please update also Documentation/printk-formats.txt when making changes **
> - *
> * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
> * function pointers are really function descriptors, which contain a
> * pointer to the real address.
> --
> 2.7.4
>

Nice work!

-Kees

--
Kees Cook
Pixel Security