Re: [PATCH v2 1/2] lib/strtox: introduce kstrtoull_suffix() helper

From: David Disseldorp
Date: Wed Dec 20 2023 - 00:39:24 EST


On Wed, 20 Dec 2023 10:40:00 +1030, Qu Wenruo wrote:

> Just as mentioned in the comment of memparse(), the simple_stroull()
> usage can lead to overflow all by itself.
>
> Furthermore, the suffix calculation is also super overflow prone because
> that some suffix like "E" itself would eat 60bits, leaving only 4 bits
> available.
>
> And that suffix "E" can also lead to confusion since it's using the same
> char of hex Ox'E'.
>
> One simple example to expose all the problem is to use memparse() on
> "25E".
> The correct value should be 28823037615171174400, but the suffix E makes
> it super simple to overflow, resulting the incorrect value
> 10376293541461622784 (9E).
>
> So here we introduce a new helper to address the problem,
> kstrtoull_suffix():
>
> - Enhance _kstrtoull()
> This allow _kstrtoull() to return even if it hits an invalid char, as
> long as the optional parameter @retptr is provided.
>
> If @retptr is provided, _kstrtoull() would try its best to parse the
> valid part, and leave the remaining to be handled by the caller.
>
> If @retptr is not provided, the behavior is not altered.
>
> - New kstrtoull_suffix() helper
> This new helper utilize the new @retptr capability of _kstrtoull(),
> and provides 2 new ability:
>
> * Allow certain suffixes to be chosen
> The recommended suffix list is "KMGTP" (using the new unit_suffix
> enum as a bitmap), excluding the overflow prone "E".
> Undermost cases there is really no need to use "E" suffix anyway.
> And for those who really need that exabytes suffix, they can enable
> that suffix pretty easily.
>
> * Add overflow checks for the suffixes
> If the original number string is fine, but with the extra left
> shift overflow happens, then -EOVERFLOW is returned.
>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Christophe JAILLET <christophe.jaillet@xxxxxxxxxx>
> Cc: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
> Cc: David Laight <David.Laight@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>
> ----
> Changelog:
> v2:
> - Use enum bitmap to describe the suffixes
> This gets rid of the upper/lower case problem, and enum makes it
> a little more readable.
>
> - Fix the suffix overflow detection
>
> - Move the left shift out of the switch block
>
> - Remove the "done" tag
> Since no tailing character can already be handled properly.

nit: git am puts this changelog in the commit message when applied.
Please use `git send-email --annotate` and put it next to the diffstat,
so that it gets discarded.

> ---
> include/linux/kstrtox.h | 20 ++++++++
> lib/kstrtox.c | 108 ++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 123 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/kstrtox.h b/include/linux/kstrtox.h
> index 7fcf29a4e0de..edac52d18a8e 100644
> --- a/include/linux/kstrtox.h
> +++ b/include/linux/kstrtox.h
> @@ -9,6 +9,26 @@
> int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
> int __must_check _kstrtol(const char *s, unsigned int base, long *res);
>
> +/*
> + * The default suffix list would not include "E" since it's too easy to overflow
> + * and not much real world usage.
> + */
> +enum unit_suffix {
> + SUFFIX_K = (1 << 0),
> + SUFFIX_M = (1 << 1),
> + SUFFIX_G = (1 << 2),
> + SUFFIX_T = (1 << 3),
> + SUFFIX_P = (1 << 4),
> + SUFFIX_E = (1 << 5),
> +};
> +
> +/*
> + * The default suffix list would not include "E" since it's too easy to overflow
> + * and not much real world usage.
> + */

^ this comment is a duplicate.

> +#define KSTRTOULL_SUFFIX_DEFAULT (SUFFIX_K | SUFFIX_M | SUFFIX_G | SUFFIX_T | SUFFIX_P)

I think it'd be clearer if you dropped this default and had callers
explicitly provide the desired suffix mask.

> +int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
> + enum unit_suffix suffixes);

__must_check would be consistent with the other helpers...

> int __must_check kstrtoull(const char *s, unsigned int base, unsigned long long *res);
> int __must_check kstrtoll(const char *s, unsigned int base, long long *res);
>
> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
> index d586e6af5e5a..8a2fdd1e3376 100644
> --- a/lib/kstrtox.c
> +++ b/lib/kstrtox.c
> @@ -93,7 +93,8 @@ unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long
> return _parse_integer_limit(s, base, p, INT_MAX);
> }
>
> -static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> +static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res,
> + char **retptr)
> {
> unsigned long long _res;
> unsigned int rv;
> @@ -105,11 +106,19 @@ static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> if (rv == 0)
> return -EINVAL;
> s += rv;
> - if (*s == '\n')
> +
> + /*
> + * If @retptr is provided, caller is responsible to detect
> + * the extra chars, otherwise we can skip one newline.
> + */
> + if (!retptr && *s == '\n')
> s++;
> - if (*s)
> + if (!retptr && *s)
> return -EINVAL;
> +
> *res = _res;
> + if (retptr)
> + *retptr = (char *)s;
> return 0;
> }
>
> @@ -133,10 +142,99 @@ int kstrtoull(const char *s, unsigned int base, unsigned long long *res)
> {
> if (s[0] == '+')
> s++;
> - return _kstrtoull(s, base, res);
> + return _kstrtoull(s, base, res, NULL);
> }
> EXPORT_SYMBOL(kstrtoull);
>
> +/**
> + * kstrtoull_suffix - convert a string to ull with suffixes support
> + * @s: The start of the string. The string must be null-terminated, and may also
> + * include a single newline before its terminating null.
> + * @base: The number base to use. The maximum supported base is 16. If base is
> + * given as 0, then the base of the string is automatically detected with the
> + * conventional semantics - If it begins with 0x the number will be parsed as a
> + * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
> + * parsed as an octal number. Otherwise it will be parsed as a decimal.
> + * @res: Where to write the result of the conversion on success.
> + * @suffixes: bitmap of acceptable suffixes, unknown bits would be ignored.
> + *
> + * Return 0 on success.
> + *
> + * Return -ERANGE on overflow or -EINVAL if invalid chars found.
> + * Return value must be checked.
> + */
> +int kstrtoull_suffix(const char *s, unsigned int base, unsigned long long *res,
> + enum unit_suffix suffixes)
> +{
> + unsigned long long value;
> + int shift = 0;
> + char *endptr;
> + int ret;
> +
> + ret = _kstrtoull(s, base, &value, &endptr);
> + /* Either already overflow or no number string at all. */
> + if (ret < 0)
> + return ret;
> +
> + switch (*endptr) {
> + case 'K':
> + case 'k':
> + if (!(suffixes & SUFFIX_K))
> + return -EINVAL;
> + shift = 10;
> + break;
> + case 'M':
> + case 'm':
> + if (!(suffixes & SUFFIX_M))
> + return -EINVAL;
> + shift = 20;
> + break;
> + case 'G':
> + case 'g':
> + if (!(suffixes & SUFFIX_G))
> + return -EINVAL;
> + shift = 30;
> + break;
> + case 'T':
> + case 't':
> + if (!(suffixes & SUFFIX_T))
> + return -EINVAL;
> + shift = 40;
> + break;
> + case 'P':
> + case 'p':
> + if (!(suffixes & SUFFIX_P))
> + return -EINVAL;
> + shift = 50;
> + break;
> + case 'E':
> + case 'e':
> + if (!(suffixes & SUFFIX_E))
> + return -EINVAL;
> + shift = 60;
> + break;
> + }
> + if (shift)
> + endptr++;
> + if (*endptr == '\n')
> + endptr++;
> + if (*endptr)
> + return -EINVAL;
> +
> + /*
> + * Overflow check.
> + *
> + * Check @shift before doing right shift, as if right shift bit is
> + * greater than or equal to the number of bits, the result is undefined.
> + */
> + if (shift && value >> (64 - shift))
> + return -EOVERFLOW;
> + value <<= shift;

Might as well roll this logic in with the existing "if (shift)"
above, e.g.
if (shift) {
/*
* Overflow check...
*/
if (value >> (64 - shift))
return -EOVERFLOW;
value <<= shift;
endptr++;
}
if (*endptr == '\n')
...

> + *res = value;
> + return 0;
> +}
> +EXPORT_SYMBOL(kstrtoull_suffix);
> +
> /**
> * kstrtoll - convert a string to a long long
> * @s: The start of the string. The string must be null-terminated, and may also
> @@ -159,7 +257,7 @@ int kstrtoll(const char *s, unsigned int base, long long *res)
> int rv;
>
> if (s[0] == '-') {
> - rv = _kstrtoull(s + 1, base, &tmp);
> + rv = _kstrtoull(s + 1, base, &tmp, NULL);
> if (rv < 0)
> return rv;
> if ((long long)-tmp > 0)


With the above changes made, feel free to add
Reviewed-by: David Disseldorp <ddiss@xxxxxxx>

I'll leave the review of patch 2/2 up to others, as I'm still a little
worried about sysfs trailing whitespace regressions.