Re: [PATCH v3] sscanf: implement basic character sets

From: Rasmus Villemoes
Date: Tue Feb 23 2016 - 17:47:17 EST


On Tue, Feb 23 2016, Jessica Yu <jeyu@xxxxxxxxxx> wrote:

> Implement basic character sets for the '%[]' conversion specifier.
>
>
> lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 525c8e1..983358a 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
> num++;
> }
> continue;
> + case '[':
> + {
> + char *s = (char *)va_arg(args, char *);
> + char *set;
> + size_t (*op)(const char *str, const char *set);
> + size_t len = 0;
> + bool negate = (*(fmt) == '^');
> +
> + if (field_width == -1)
> + field_width = SHRT_MAX;
> +
> + op = negate ? &strcspn : &strspn;
> + if (negate)
> + fmt++;
> +
> + len = strcspn(fmt, "]");
> + /* invalid format; stop here */
> + if (!len)
> + return num;
> +
> + set = kstrndup(fmt, len, GFP_KERNEL);
> + if (!set)
> + return num;
> +
> + /* advance fmt past ']' */
> + fmt += len + 1;
> +
> + len = op(str, set);
> + /* no matches */
> + if (!len) {
> + kfree(set);
> + return num;
> + }
> +
> + while (len-- && field_width--)
> + *s++ = *str++;
> + *s = '\0';
> + kfree(set);
> + num++;
> + }
> + continue;
> case 'o':
> base = 8;
> break;

(1) How do we know that doing a memory allocation would be ok, and then
with GFP_KERNEL? vsnprintf can be called from just about any context, so
I don't think that would fly there. Sooner or later someone is going to
be calling sscanf with a spinlock held, methinks.

(2) I think a field width should be mandatory (so %[ should simply be
regarded as malformed - it should be %*[ or %n[ for some explicit
decimal n). That will allow the compiler or other static analyzers to do
sanity checking, and we'll probably be saved from a few buffer
overflows down the line.

It's a bit sad that the C standard doesn't include the terminating '\0'
in the field width, so one would sometimes have to write
'(int)sizeof(buf)-1', but there's not much to do about that. On that
note, it seems that your field width handling is off-by-one.

To get rid of the allocation, why not use a small bitmap? Something like

{
char *s = (char *)va_arg(args, char *);
DECLARE_BITMAP(map, 256) = {0};
bool negate = false;

/* a field width is required, and must provide room for at least a '\0' */
if (field_width <= 0)
return num;

if (*fmt == '^') {
negate = true;
++fmt;
}
for ( ; *fmt && *fmt != ']'; ++fmt)
set_bit((u8)*fmt, map);
if (!*fmt) // no ], so malformed input
return num;
++fmt;
if (negate) {
bitmap_complement(map, map, 256);
clear_bit(0, map); // this avoids testing *str != '\0' below
}

if (!test_bit((u8)*str, map)) // match must be non-empty
return num;
while (test_bit((u8)*str, map) && --field_width) {
*s++ = *str++;
}
*s = '\0';
++num;
}

Rasmus