Re: [PATCH] udf: rework name conversions to fix multi-bytes characters support
From: Jan Kara
Date: Wed Dec 16 2015 - 06:25:57 EST
On Fri 11-12-15 05:44:32, Andrew Gabbasov wrote:
> Current implementation has several issues in unicode.c, mostly related
> to handling multi-bytes characters in file names:
Thanks for looking into these problems! Did you find them by code
inspection or do you actually have some problematic fs images?
> - loop ending conditions in udf_CS0toUTF8 and udf_CS0toNLS functions do not
> properly catch the end of output buffer in case of multi-bytes characters,
> allowing out-of-bounds writing and memory corruption;
Can you provide concrete example please? Do you mean the problem that ustr
can hold only UDF_NAME_LEN-2 characters and the code seems to assume it can
hold UDF_NAME_LEN characters?
> - udf_UTF8toCS0 and udf_NLStoCS0 do not check the right boundary of output
> buffer at all, also allowing out-of-bounds writing and memory corruption;
Ah, I guess you mean the case when maxval == 0xffff and we can possibly
encode one character of input into two characters of output, right?
> - udf_translate_to_linux does not take into account multi-bytes characters
> at all (although it is called after converting to UTF8 or NLS): maximal
> length of extension is counted as 5 bytes, that may be incorrect with
> multi-bytes characters; when inserting CRC and extension for long names
> (near the end of the buffer), they are inserted at fixed place at the end,
> that can break into the middle of the multi-bytes character;
>
> - when being converted from CS0 to UTF8 (or NLS), the name can be truncated
> (even if the sizes in bytes of input and output buffers are the same),
> but the following translating function does not know about it and does not
> insert CRC, as it is assumed by the specs.
>
> Because of the last item above, it looks like all the checks and
> conversions (re-coding and possible CRC insertions) should be done
> simultaneously in the single function. This means that the listed
> issues can not be fixed independently and separately. So, the whole
> conversion and translation support should be reworked.
>
> The proposed implementation below fixes the listed issues, and also has
> some additional features:
>
> - it gets rid of "struct ustr", since it actually just makes an unneeded
> extra copying of the buffer and does not have any other significant
> advantage;
>
> - it unifies UTF8 and NLS conversions support, since there is no much
> sense to separate these cases;
>
> - UDF_NAME_LEN constant adjusted to better reflect actual restrictions.
So I like the changes you do but this has to be split in several patches
because in current form the patch is very difficult to review. I suggest
something like following steps:
1) get rid of ustr
2) change definition of UDF_NAME_LEN, introduce UDF_NAME_LEN_CS0
3) join functions for NLS & UTF8 conversion
4) fix boundary checks
5) merge udf_translate_to_linux() into conversion from CS0 (frankly, I'm
not very happy with the complexity of the resulting function and would love
to split it up but I don't see any natural split).
Thanks!
Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/