Re: [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support

From: Jan Kara
Date: Fri Jan 22 2016 - 11:41:51 EST


On Fri 15-01-16 02:44:18, Andrew Gabbasov wrote:
> V3:
>
> Patches 1 and 2 are not resent since they have already been accepted
> by the maintainer (patch 2 with some changes compared to V2).
>
> Patches 3 - 5 rebased on top of updated patch 2.
>
> Patch 6: Fixed a mistake in passing parameters to translate_to_linux():
> the third buffer and length, used for CRC calculation, should be
> passed without leading encoding character.

Thanks! For now I've taken patches 3-6 into my tree. I'll have a look at
patch 7 next week since I need a fresh mind for that.

Honza
>
> Patch 7: Main part of body of converting loops extracted to a separate
> helper function. Also, some other modifications addressing maintainer's
> comments to V2.
>
> V2:
>
> The single patch was split into several commits for separate logical
> steps. Also, some minor fixes were done in the code of the patches.
>
> V1:
>
> The current implementation has several issues in unicode.c, mostly related
> to handling multi-byte characters in file names:
>
> - loop ending conditions in udf_CS0toUTF8 and udf_CS0toNLS functions do not
> properly catch the end of the output buffer in case of multi-byte characters,
> allowing out-of-bounds writes and memory corruption;
>
> - udf_UTF8toCS0 and udf_NLStoCS0 do not check the end of the output
> buffer at all, also allowing out-of-bounds writes and memory corruption;
>
> - udf_translate_to_linux does not take multi-byte characters into account
> at all (although it is called after converting to UTF8 or NLS): the maximal
> length of the extension is counted as 5 bytes, which may be incorrect with
> multi-byte characters; when inserting the CRC and extension for long names
> (near the end of the buffer), they are inserted at a fixed place at the end,
> which can fall in the middle of a multi-byte character;
>
> - when being converted from CS0 to UTF8 (or NLS), the name can be truncated
> (even if the sizes in bytes of the input and output buffers are the same),
> but the translating function that follows does not know about it and does
> not insert the CRC, as the specs assume.
>
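
To illustrate the first two items in the list above, here is a minimal
user-space sketch of a conversion loop that checks, before writing, that the
whole multi-byte sequence fits in the output buffer. This is not the code
from the patches: the names put_utf8() and ucs2_to_utf8() are made up for
the example, and the input is assumed to be plain 16-bit code units with no
surrogate handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: append one 16-bit code point as UTF-8 at *pos.
 * Returns 0 on success, -1 if the whole sequence does not fit. On failure
 * nothing is written, so no partial character is left in the buffer.
 */
int put_utf8(uint16_t cp, unsigned char *out, size_t out_size, size_t *pos)
{
	size_t need = cp < 0x80 ? 1 : (cp < 0x800 ? 2 : 3);

	assert(out != NULL && pos != NULL && *pos <= out_size);

	/* Check the full sequence length, not just the first byte. */
	if (need > out_size - *pos)
		return -1;

	if (need == 1) {
		out[(*pos)++] = (unsigned char)cp;
	} else if (need == 2) {
		out[(*pos)++] = (unsigned char)(0xC0 | (cp >> 6));
		out[(*pos)++] = (unsigned char)(0x80 | (cp & 0x3F));
	} else {
		out[(*pos)++] = (unsigned char)(0xE0 | (cp >> 12));
		out[(*pos)++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[(*pos)++] = (unsigned char)(0x80 | (cp & 0x3F));
	}
	return 0;
}

/*
 * Hypothetical conversion loop: returns the number of bytes written and
 * sets *truncated when not all input characters fit.
 */
size_t ucs2_to_utf8(const uint16_t *in, size_t in_len,
		    unsigned char *out, size_t out_size, int *truncated)
{
	size_t pos = 0;
	size_t i;

	*truncated = 0;
	for (i = 0; i < in_len; i++) {
		if (put_utf8(in[i], out, out_size, &pos) < 0) {
			*truncated = 1;	/* caller can react, e.g. add a CRC */
			break;
		}
	}
	return pos;
}
```

Reporting truncation to the caller is the point of the sketch: a later step
can only decide whether to mangle the name if it knows that characters were
dropped.
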
> Because of the last item above, it looks like all the checks and
> conversions (re-coding and possible CRC insertions) should be done
> simultaneously in a single function. This means that the listed
> issues cannot be fixed independently and separately. So, the whole
> conversion and translation support should be reworked.
>
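
As a rough illustration of why truncation and CRC insertion belong together,
the sketch below shortens a UTF-8 name on a character boundary and appends a
'#' plus four hex digits. It is a user-space approximation only: utf8_cut()
and append_crc_tag() are hypothetical names, the CRC value is assumed to be
computed elsewhere, and file extension handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CRC_TAG_LEN 5	/* '#' plus four hex digits */

/*
 * Hypothetical helper: largest n <= max such that cutting s at n does not
 * split a UTF-8 sequence. Continuation bytes have the form 10xxxxxx.
 */
size_t utf8_cut(const unsigned char *s, size_t len, size_t max)
{
	size_t n = max < len ? max : len;

	while (n > 0 && n < len && (s[n] & 0xC0) == 0x80)
		n--;
	return n;
}

/*
 * Hypothetical helper: shorten name[] in place so that the name plus the
 * CRC tag fits in buf_size bytes, then append the tag. Returns the new
 * length. The buffer is assumed to hold at least buf_size bytes.
 */
size_t append_crc_tag(unsigned char *name, size_t len, size_t buf_size,
		      unsigned int crc)
{
	char tag[CRC_TAG_LEN + 1];
	size_t keep;

	assert(name != NULL && len <= buf_size && buf_size >= CRC_TAG_LEN);

	/* Cut on a character boundary instead of at a fixed offset. */
	keep = utf8_cut(name, len, buf_size - CRC_TAG_LEN);
	snprintf(tag, sizeof(tag), "#%04X", crc & 0xFFFFu);
	memcpy(name + keep, tag, CRC_TAG_LEN);
	return keep + CRC_TAG_LEN;
}
```

With a fixed insertion offset the tag could overwrite the tail of a
multi-byte character; backing up to the previous lead byte avoids that at
the cost of a slightly shorter name.
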
> The proposed implementation below fixes the listed issues, and also has
> some additional features:
>
> - it gets rid of "struct ustr", since it just adds an unneeded extra
> copy of the buffer and does not have any other significant
> advantage;
>
> - it unifies UTF8 and NLS conversion support, since there is not much
> sense in separating these cases;
>
> - the UDF_NAME_LEN constant is adjusted to better reflect actual restrictions.
>
>
> Andrew Gabbasov (7):
> udf: Prevent buffer overrun with multi-byte characters
> udf: Check output buffer length when converting name to CS0
> udf: Parameterize output length in udf_put_filename
> udf: Join functions for UTF8 and NLS conversions
> udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
> udf: Remove struct ustr as non-needed intermediate storage
> udf: Merge linux specific translation into CS0 conversion function
>
> fs/udf/namei.c | 16 +-
> fs/udf/super.c | 38 ++--
> fs/udf/udfdecl.h | 21 +-
> fs/udf/unicode.c | 620 ++++++++++++++++++++++---------------------------------
> 4 files changed, 281 insertions(+), 414 deletions(-)
>
> --
> 2.1.0
>
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR