Re: [PATCH v4 3/4] rust: str: add radix prefixed integer parsing functions

From: Andreas Hindborg
Date: Tue Feb 04 2025 - 04:52:43 EST


Hi Gary,

Sorry, I missed this email when sending v5. Thanks for the comments!

"Gary Guo" <gary@xxxxxxxxxxx> writes:

> On Thu, 09 Jan 2025 11:54:58 +0100
> Andreas Hindborg <a.hindborg@xxxxxxxxxx> wrote:
>
>> Add the trait `ParseInt` for parsing string representations of integers
>> where the string representations are optionally prefixed by a radix
>> specifier. Implement the trait for the primitive integer types.
>>
>> Signed-off-by: Andreas Hindborg <a.hindborg@xxxxxxxxxx>
>> ---
>> rust/kernel/str.rs | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 118 insertions(+)
>>
>> diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
>> index 9c446ff1ad7adba7ca09a5ae9df00fd369a32899..14da40213f9eafa07a104eba3129efe07c8343f3 100644
>> --- a/rust/kernel/str.rs
>> +++ b/rust/kernel/str.rs
>> @@ -914,3 +914,121 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>> macro_rules! fmt {
>> ($($f:tt)*) => ( core::format_args!($($f)*) )
>> }
>> +
>> +pub mod parse_int {
>> + //! Integer parsing functions for parsing signed and unsigned integers
>> + //! potentially prefixed with `0x`, `0o`, or `0b`.
>> +
>> + use crate::alloc::flags;
>> + use crate::prelude::*;
>> + use crate::str::BStr;
>> +
>> + /// Trait that allows parsing a [`&BStr`] to an integer with a radix.
>> + ///
>> + /// [`&BStr`]: kernel::str::BStr
>> + // This is required because the `from_str_radix` function on the primitive
>> + // integer types is not part of any trait.
>> + pub trait FromStrRadix: Sized {
>> + /// Parse `src` to `Self` using radix `radix`.
>> + fn from_str_radix(src: &BStr, radix: u32) -> Result<Self, crate::error::Error>;
>> + }
>> +
>> + /// Extract the radix from an integer literal optionally prefixed with
>> + /// one of `0x`, `0X`, `0o`, `0O`, `0b`, `0B`, `0`.
>> + fn strip_radix(src: &BStr) -> (u32, &BStr) {
>> + if let Some(n) = src.strip_prefix(b_str!("0x")) {
>> + (16, n)
>> + } else if let Some(n) = src.strip_prefix(b_str!("0X")) {
>> + (16, n)
>> + } else if let Some(n) = src.strip_prefix(b_str!("0o")) {
>> + (8, n)
>> + } else if let Some(n) = src.strip_prefix(b_str!("0O")) {
>> + (8, n)
>> + } else if let Some(n) = src.strip_prefix(b_str!("0b")) {
>> + (2, n)
>> + } else if let Some(n) = src.strip_prefix(b_str!("0B")) {
>> + (2, n)
>> + } else if let Some(n) = src.strip_prefix(b_str!("0")) {
>> + (8, n)
>> + } else {
>> + (10, src)
>> + }
>
> This can be done better with a match:
>
> match src.deref() {
> [b'0', b'x' | b'X', ..] => (16, &src[2..]),
> [b'0', b'o' | b'O', ..] => (8, &src[2..]),
> [b'0', b'b' | b'B', ..] => (2, &src[2..]),
> [b'0', ..] => (8, &src[1..]),
> _ => (10, src),
> }

Thanks, will add. I was not aware that matching syntax was this powerful.
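
For reference, a minimal standalone sketch of the same idea using slice
patterns with a `rest @ ..` binding, so no manual indexing is needed. It
assumes plain `&[u8]` and `std` rather than the kernel `BStr` type, so it is
only an illustration and not the code that will go into the patch:

```rust
/// Split an optional radix prefix off `src`, returning `(radix, rest)`.
/// Illustration only: operates on `&[u8]`, not on the kernel's `BStr`.
fn strip_radix(src: &[u8]) -> (u32, &[u8]) {
    match src {
        // `0x` / `0X` prefix: hexadecimal.
        [b'0', b'x' | b'X', rest @ ..] => (16, rest),
        // `0o` / `0O` prefix: octal.
        [b'0', b'o' | b'O', rest @ ..] => (8, rest),
        // `0b` / `0B` prefix: binary.
        [b'0', b'b' | b'B', rest @ ..] => (2, rest),
        // A bare leading `0`: octal, as in the original if/else chain.
        [b'0', rest @ ..] => (8, rest),
        // No prefix: decimal, input returned unchanged.
        _ => (10, src),
    }
}
```

The arms are tried in order, so the two-byte prefixes have to come before the
bare `0` arm.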

>
>> + }
>> +
>> + /// Trait for parsing string representations of integers.
>> + ///
>> + /// Strings beginning with `0x`, `0o`, or `0b` are parsed as hex, octal, or
>> + /// binary respectively. Strings beginning with `0` otherwise are parsed as
>> + /// octal. Anything else is parsed as decimal. A leading `+` or `-` is also
>> + /// permitted. Any string parsed by [`kstrtol()`] or [`kstrtoul()`] will be
>> + /// successfully parsed.
>> + ///
>> + /// [`kstrtol()`]: https://www.kernel.org/doc/html/latest/core-api/kernel-api.html#c.kstrtol
>> + /// [`kstrtoul()`]: https://www.kernel.org/doc/html/latest/core-api/kernel-api.html#c.kstrtoul
>> + ///
>> + /// # Example
>> + /// ```
>> + /// use kernel::str::parse_int::ParseInt;
>> + /// use kernel::b_str;
>> + ///
>> + /// assert_eq!(Ok(0xa2u8), u8::from_str(b_str!("0xa2")));
>> + /// assert_eq!(Ok(-0xa2i32), i32::from_str(b_str!("-0xa2")));
>> + ///
>> + /// assert_eq!(Ok(-0o57i8), i8::from_str(b_str!("-0o57")));
>> + /// assert_eq!(Ok(0o57i8), i8::from_str(b_str!("057")));
>> + ///
>> + /// assert_eq!(Ok(0b1001i16), i16::from_str(b_str!("0b1001")));
>> + /// assert_eq!(Ok(-0b1001i16), i16::from_str(b_str!("-0b1001")));
>> + ///
>> + /// assert_eq!(Ok(127), i8::from_str(b_str!("127")));
>> + /// assert!(i8::from_str(b_str!("128")).is_err());
>> + /// assert_eq!(Ok(-128), i8::from_str(b_str!("-128")));
>> + /// assert!(i8::from_str(b_str!("-129")).is_err());
>> + /// assert_eq!(Ok(255), u8::from_str(b_str!("255")));
>> + /// assert!(u8::from_str(b_str!("256")).is_err());
>> + /// ```
>> + pub trait ParseInt: FromStrRadix {
>> + /// Parse a string according to the description in [`Self`].
>> + fn from_str(src: &BStr) -> Result<Self> {
>> + match src.iter().next() {
>> + None => Err(EINVAL),
>> + Some(sign @ b'-') | Some(sign @ b'+') => {
>> + let (radix, digits) = strip_radix(BStr::from_bytes(&src[1..]));
>> + let mut n_digits: KVec<u8> =
>> + KVec::with_capacity(digits.len() + 1, flags::GFP_KERNEL)?;
>
> I don't think we should allocate for parsing. This can trivially be
> non-allocating. Just check that the next byte is an ASCII digit (reject
> if not, in case people give multiple signs), and then call from_str_radix
> and return the result as is or use `checked_neg`.

The issue with that approach is that two's complement signed integer types
of width `b` range from -2^(b-1) to 2^(b-1)-1. Parsing the magnitude first
and then negating would reject -2^(b-1), because its magnitude 2^(b-1) does
not fit in the signed type.
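
To make the range problem concrete, here is a small sketch for `i8` using
`from_str_radix` and `checked_neg` from core. The helper name
`parse_then_negate` is made up for this illustration and takes `&str`
instead of `&BStr`:

```rust
/// Parse the magnitude as `i8` first, then negate it.
/// Hypothetical helper, only meant to show where this approach breaks down.
fn parse_then_negate(digits: &str) -> Option<i8> {
    // Fails for "128": the magnitude 128 is larger than `i8::MAX` (127).
    let magnitude = i8::from_str_radix(digits, 10).ok()?;
    // Negating a non-negative `i8` never overflows, so the failure for
    // `i8::MIN` already happened on the line above.
    magnitude.checked_neg()
}
```

With this, `parse_then_negate("127")` gives `Some(-127)`, but
`parse_then_negate("128")` gives `None`, even though
`i8::from_str_radix("-128", 10)` succeeds when the sign is part of the input.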

We could parse into an unsigned type, but it gets kind of clunky.

Another option is to stop relying on `from_str_radix` from core and roll
our own that takes sign as a separate function argument.
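
A rough sketch of what such a function could look like for `i8`, assuming
plain `&[u8]` input and `std`. The name `parse_i8_with_sign` and its
signature are hypothetical; a real version would need to cover all integer
types, probably via a macro. Accumulating in the negative direction for
negative input is what lets -2^(b-1) through without allocating:

```rust
/// Parse `digits` in the given `radix`, with the sign passed separately.
/// Hypothetical sketch for `i8` only; not an existing kernel or core API.
fn parse_i8_with_sign(negative: bool, digits: &[u8], radix: u32) -> Option<i8> {
    // `char::to_digit` panics for radix > 36, so reject bad radices up front.
    if !(2..=36).contains(&radix) || digits.is_empty() {
        return None;
    }
    let mut acc: i8 = 0;
    for &b in digits {
        // Any byte that is not a digit in `radix` (including a second
        // sign character) makes the whole parse fail.
        let d = (b as char).to_digit(radix)? as i8;
        acc = acc.checked_mul(radix as i8)?;
        // Build negative values downwards so that `i8::MIN` is reachable.
        acc = if negative {
            acc.checked_sub(d)?
        } else {
            acc.checked_add(d)?
        };
    }
    Some(acc)
}
```

Here `parse_i8_with_sign(true, b"128", 10)` yields `Some(-128)`, while
`parse_i8_with_sign(false, b"128", 10)` and
`parse_i8_with_sign(true, b"129", 10)` both yield `None`.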


Best regards,
Andreas Hindborg