On Tue, 23 Feb 2016 15:38:22 -0500 Jessica Yu <jeyu@xxxxxxxxxx> wrote:
Implement basic character sets for the '%[]' conversion specifier.
The '%[]' conversion specifier matches a nonempty sequence of characters
from the specified set of accepted (or with '^', rejected) characters
between the brackets. The substring matched is to be made up of characters
in (or not in) the set. This implementation differs from its glibc
counterpart in that it does not support character ranges (e.g., 'a-z' or
'0-9'), the hyphen '-' is *not* a special character, and the brackets
themselves cannot be matched.
Signed-off-by: Jessica Yu <jeyu@xxxxxxxxxx>
---
This patch adds support for the '%[' conversion specifier for sscanf().
This is useful in cases where we'd like to match substrings delimited by
something other than spaces. The original motivation for this patch
actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
where we were trying to come up with a clean way to parse symbol names with
substrings delimited by periods and commas.
It would be better to include the justification right here in the
changelog please. Not via some link-to-discussion and definitely not
below the ^--- marker! It's very important.
The deviation from the glibc behaviour is a bit of a worry,
particularly as it is done in a non-back-compat manner: code which
assumes "-" is non-magic might break if someone later adds range
support.
Presumably we can live with that - there won't be many callsites and
they can be grepped for. But please, let's get a description of all
these considerations into the code as a comment. Probably it would be
helpful to include a little usage example in that comment.
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
num++;
}
continue;
+ case '[':
+ {
+ char *s = (char *)va_arg(args, char *);
+ char *set;
+ size_t (*op)(const char *str, const char *set);
+ size_t len = 0;
+ bool negate = (*(fmt) == '^');
+
+ if (field_width == -1)
+ field_width = SHRT_MAX;
+
+ op = negate ? &strcspn : &strspn;
+ if (negate)
+ fmt++;
+
+ len = strcspn(fmt, "]");
+ /* invalid format; stop here */
+ if (!len)
+ return num;
+
+ set = kstrndup(fmt, len, GFP_KERNEL);
Embedding a GFP_KERNEL allocation into vsscanf is problematic - it
limits the situations in which this functionality can be used.
afaict the allocation is there merely so we can null-terminate the
string so we can use existing library functions (strcspn, strspn). Is
that compromise really worth it? We could pretty easily convert
strcspn() into
strcnspn(const char *s, const char *reject, size_t len)
and convert strcspn() to call that (ifndef __HAVE_ARCH_STRCSPN)
In fact I think we could still use strspn() and strcspn() on `fmt'
directly? We just need to check for the return value exceeding `len'
and if so, treat that as a no-match?
+ if (!set)
+ return num;
+
+ /* advance fmt past ']' */
+ fmt += len + 1;
+
+ len = op(str, set);
+ /* no matches */
+ if (!len) {
+ kfree(set);
+ return num;
+ }
+
+ while (len-- && field_width--)
+ *s++ = *str++;
+ *s = '\0';
+ kfree(set);
+ num++;
+ }
+ continue;
case 'o':
base = 8;
break;