Re: [PATCH v2] dm: Add support for escaped characters in str_field_delimit()

From: Benjamin Marzinski
Date: Fri Jun 14 2024 - 16:12:42 EST


On Thu, Jun 13, 2024 at 04:26:32PM +0000, Abhinav Jain wrote:
> Remove all the escape characters that come before separator.
> Tested this code by writing a dummy program containing the two
> functions and testing it on below input, sharing results:
>
> Original string: "field1\,with\,commas,field2\,with\,more\,commas"
> Field: "field1"
> Field: "with"
> Field: "commas"
> Field: "field2"
> Field: "with"
> Field: "more"
> Field: "commas"

But that's not the output that you want here. The purpose of escaping
the separator is so that the separator character remains in the field
without the escape character and without acting as a separator.

The output you would want is:

Field: "field1,with,commas"
Field: "field2,with,more,commas"

>
> Signed-off-by: Abhinav Jain <jain.abhinav177@xxxxxxxxx>
> ---
> PATCH v1:
> https://lore.kernel.org/all/20240609141721.52344-1-jain.abhinav177@xxxxxxxxx/
>
> Changes since v1:
> - Modified the str_field_delimit function as per shared feedback
> - Added remove_escaped_characters function
> ---
> ---
> drivers/md/dm-init.c | 53 +++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 47 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
> index 2a71bcdba92d..0e31ecf1b48e 100644
> --- a/drivers/md/dm-init.c
> +++ b/drivers/md/dm-init.c
> @@ -76,6 +76,24 @@ static void __init dm_setup_cleanup(struct list_head *devices)
> }
> }
>
> +/* Remove escape characters from a given field string. */
> +static void __init remove_escape_characters(char *field)
> +{

This means that there is no way to have the escape character in a field,
which is a valid character in device-mapper names and UUIDs. "bad\name"
is a valid device-mapper name. So is "badname\".

This brings up a different point: both the separator characters and the
escape character aren't valid for udev names. If you want this to work
correctly with udev and let users enter these unfortunate names, there
is more mangling that will need to be done later. I'm not sure all this
is super useful just to let people use poorly chosen device names/uuids.
Is there some other purpose for this work that I'm missing?

Assuming there are reasons to do this work, the only strings that need
to be changed by this function are:

<start_of_string>\<separator><rest_of_string>

which needs to be changed to

<start_of_string><separator><rest_of_string>

and

<start_of_string>\\

which needs to be changed to

<start_of_string>\

This assumes that "\\<separator>" is what you would use to end your
field in a \, escaping the escape so that it doesn't interfere with the
separator.
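
As a rough userspace sketch of those two rewrites (not kernel code, and
not a proposed replacement for the patch): unescape_field() is a
hypothetical name, and passing the separator in as a parameter is an
assumption made for illustration. Every other backslash is left alone,
so a name like "bad\name" survives unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite a single, already-split field in place.
 *   "\<separator>"  becomes "<separator>"
 *   a trailing "\\" becomes "\"
 * Any other backslash is copied through untouched.
 */
static void unescape_field(char *field, char separator)
{
	char *src = field;
	char *dest = field;

	assert(field != NULL);

	while (*src) {
		if (src[0] == '\\' && src[1] == separator) {
			/* Drop the escape, keep the separator. */
			src++;
		} else if (src[0] == '\\' && src[1] == '\\' && src[2] == '\0') {
			/* Escaped escape at the end of the field. */
			src++;
		}
		*dest++ = *src++;
	}
	*dest = '\0';
}
```

With ',' as the separator, this turns "field1\,with\,commas" into
"field1,with,commas" and "badname\\" into "badname\".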

> + char *src = field;
> + char *dest = field;
> +
> + while (*src) {
> + if (*src == '\\') {
> + src++;
> + if (*src)
> + *dest++ = *src++;
> + } else {
> + *dest++ = *src++;
> + }
> + }
> + *dest = '\0';
> +}
> +
> /**
> * str_field_delimit - delimit a string based on a separator char.
> * @str: the pointer to the string to delimit.
> @@ -87,16 +105,39 @@ static void __init dm_setup_cleanup(struct list_head *devices)
> */
> static char __init *str_field_delimit(char **str, char separator)
> {
> - char *s;
> + char *s, *escaped, *field;
>
> - /* TODO: add support for escaped characters */
> *str = skip_spaces(*str);
> s = strchr(*str, separator);
> - /* Delimit the field and remove trailing spaces */
> - if (s)
> +
> + /* Check for escaped character */
> + escaped = strchr(*str, '\\');
> + while (escaped && (s == NULL || escaped < s)) {
> + /*
> + * Move the separator search ahead if escaped
> + * character comes before.
> + */
> + s = strchr(escaped + 1, separator);
> + escaped = strchr(escaped + 1, '\\');
> + }
> +

This code still splits the string at every separator. It should probably
just scan for separators, and split the string when it finds the first
one that does not have exactly one escape character before it.
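
One way that scan could look, as a minimal userspace sketch:
find_split() is a hypothetical name, and counting the run of
backslashes immediately before each separator is just one reading of
"exactly one escape character before it".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer to the first separator that is
 * not preceded by exactly one backslash, or NULL if there is none.
 */
static char *find_split(char *s, char separator)
{
	size_t backslashes = 0;	/* length of the backslash run before *s */

	assert(s != NULL);

	for (; *s; s++) {
		if (*s == separator && backslashes != 1)
			return s;
		backslashes = (*s == '\\') ? backslashes + 1 : 0;
	}
	return NULL;
}
```

Under this reading "a\,b,c" splits only at the second comma, while
"a\\,b" splits at its comma and leaves a field ending in "\\".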

> + /* If we found a separator, we need to handle escape characters */
> + if (s) {
> + *s = '\0';
> +
> + remove_escape_characters(*str);
> + field = *str;
> + *str = s + 1;
> + } else {
> + /* Handle the last field when no separator is present */

If no separator is present, there's nothing to do. strlen() only works
on strings that are already null-terminated, so writing '\0' at
*str + strlen(*str) changes nothing.

> + s = *str + strlen(*str);
> *s = '\0';
> - *str = strim(*str);

Why skip trimming the string?

> - return s ? ++s : NULL;
> +
> + remove_escape_characters(*str);
> + field = *str;
> + *str = s;
> + }

This function is supposed to return the rest of the string after the
separator, and *str is supposed to point to the start of the field
after skipping the initial spaces.
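
For reference, here is a userspace sketch of that contract, mirroring
the lines the patch removes. skip_spaces_user() and strim_user() are
hypothetical stand-ins for the kernel's skip_spaces() and strim(), and
escape handling is deliberately left out to keep the focus on what the
caller sees.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's skip_spaces(): skip leading whitespace. */
static char *skip_spaces_user(char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/* Stand-in for the kernel's strim(): trim both ends in place. */
static char *strim_user(char *s)
{
	size_t len;

	s = skip_spaces_user(s);
	len = strlen(s);
	while (len && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
	return s;
}

/*
 * On return, *str points at the trimmed field and the return value is
 * the rest of the string after the separator, or NULL for the last
 * field.
 */
static char *field_delimit(char **str, char separator)
{
	char *s;

	assert(str != NULL && *str != NULL);

	*str = skip_spaces_user(*str);
	s = strchr(*str, separator);
	if (s)
		*s = '\0';
	*str = strim_user(*str);
	return s ? s + 1 : NULL;
}
```

A caller walks the fields by passing the returned pointer back in as
the next *str until the function returns NULL.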

-Ben

> + return field;
> }
>
> /**
> --
> 2.34.1