[PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support

From: Andrew Gabbasov
Date: Thu Dec 24 2015 - 11:27:10 EST

V2:

The single patch has been split into several commits, one per logical
step. Some minor fixes have also been made to the code of the patches.

V1:

The current implementation in unicode.c has several issues, mostly related
to handling multi-byte characters in file names:

- loop termination conditions in the udf_CS0toUTF8 and udf_CS0toNLS
functions do not properly detect the end of the output buffer when a
character is multi-byte, allowing out-of-bounds writes and memory corruption;

- udf_UTF8toCS0 and udf_NLStoCS0 do not check the upper bound of the output
buffer at all, which also allows out-of-bounds writes and memory corruption;

- udf_translate_to_linux does not take multi-byte characters into account
at all (although it is called after conversion to UTF8 or NLS): the maximal
length of the extension is counted as 5 bytes, which may be incorrect with
multi-byte characters; and when the CRC and extension are inserted for long
names (near the end of the buffer), they are placed at a fixed offset from
the end, which can fall in the middle of a multi-byte character;

- when converted from CS0 to UTF8 (or NLS), the name can be truncated
(even if the input and output buffers have the same size in bytes),
but the translation function that follows is not aware of this and does
not insert the CRC, as the specs assume it should.
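
To illustrate the first two items, here is a minimal user-space sketch of
a conversion loop that checks the remaining room against the full encoded
length of each character before writing anything. The names utf8_len and
ucs2_to_utf8 are hypothetical and are not taken from unicode.c; the sketch
assumes 16-bit input code units and ignores surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one 16-bit code unit as UTF-8 (no surrogates). */
static size_t utf8_len(uint16_t cp)
{
	if (cp < 0x80)
		return 1;
	if (cp < 0x800)
		return 2;
	return 3;
}

/*
 * Hypothetical helper: convert in_len code units to UTF-8.
 * A character is written only if all of its bytes fit, so the output
 * never overruns and never ends in the middle of a character.
 * Returns the number of bytes written; *truncated is set to 1 if some
 * input did not fit.
 */
static size_t ucs2_to_utf8(const uint16_t *in, size_t in_len,
			   uint8_t *out, size_t out_size, int *truncated)
{
	size_t o = 0;

	*truncated = 0;
	for (size_t i = 0; i < in_len; i++) {
		uint16_t cp = in[i];
		size_t need = utf8_len(cp);

		/* o <= out_size always holds, so this cannot underflow. */
		if (need > out_size - o) {
			*truncated = 1;
			break;
		}
		if (need == 1) {
			out[o++] = (uint8_t)cp;
		} else if (need == 2) {
			out[o++] = (uint8_t)(0xC0 | (cp >> 6));
			out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
		} else {
			out[o++] = (uint8_t)(0xE0 | (cp >> 12));
			out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
			out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
		}
	}
	return o;
}
```

A loop that only tests "o < out_size" before each character would pass the
check with one byte of room left and then write two or three bytes, which
is the kind of overrun described above.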

Because of the last item above, it looks like all the checks and
conversions (re-coding and possible CRC insertion) should be done
together in a single function. This means that the listed
issues cannot be fixed independently of one another. So, the whole
conversion and translation support should be reworked.
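
As a rough illustration of why the suffix insertion has to know about
character boundaries, the following user-space sketch backs up to the
start of a character before placing a suffix near the end of the buffer.
The helper names and the suffix string are hypothetical; the sketch
assumes the name is already valid UTF-8 and is not the code from the
patches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Largest cut position <= limit (and <= len) that does not split a
 * multi-byte UTF-8 character. Continuation bytes have the form 10xxxxxx.
 */
static size_t utf8_safe_cut(const uint8_t *s, size_t len, size_t limit)
{
	size_t cut = limit < len ? limit : len;

	while (cut > 0 && cut < len && (s[cut] & 0xC0) == 0x80)
		cut--;
	return cut;
}

/*
 * Hypothetical helper: shorten name (len bytes, len <= buf_size) so that
 * suffix fits within buf_size bytes, cutting only at a character
 * boundary, then append the suffix. Returns the new length, or 0 if the
 * suffix alone does not fit.
 */
static size_t append_suffix_at_boundary(uint8_t *name, size_t len,
					size_t buf_size, const char *suffix)
{
	size_t slen = strlen(suffix);
	size_t cut;

	if (slen > buf_size)
		return 0;
	cut = utf8_safe_cut(name, len, buf_size - slen);
	memcpy(name + cut, suffix, slen);
	return cut + slen;
}
```

Placing the suffix at the fixed offset buf_size - slen instead would, in
the case where that offset lands on a continuation byte, leave a dangling
lead byte in front of the suffix.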

The proposed implementation below fixes the listed issues, and also has
some additional features:

- it gets rid of "struct ustr", since it only adds an unneeded extra
copy of the buffer and does not have any other significant
advantage;

- it unifies UTF8 and NLS conversion support, since there is not much
sense in keeping these cases separate;

- it adjusts the UDF_NAME_LEN constant to better reflect actual restrictions.
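
One way to picture the unification of the UTF8 and NLS paths is a single
loop that takes the per-character converter as a function pointer. The
sketch below is a user-space illustration under that assumption; the names
char_conv_fn, conv_utf8, conv_ascii and convert_name are hypothetical, and
conv_ascii is only a stand-in for a real single-byte NLS table.

```c
#include <stdint.h>

/*
 * Per-character converter: writes the encoding of cp to out if it fits
 * in room bytes. Returns the number of bytes written, or -1 if there is
 * not enough room.
 */
typedef int (*char_conv_fn)(uint16_t cp, uint8_t *out, int room);

static int conv_utf8(uint16_t cp, uint8_t *out, int room)
{
	if (cp < 0x80) {
		if (room < 1)
			return -1;
		out[0] = (uint8_t)cp;
		return 1;
	}
	if (cp < 0x800) {
		if (room < 2)
			return -1;
		out[0] = (uint8_t)(0xC0 | (cp >> 6));
		out[1] = (uint8_t)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (room < 3)
		return -1;
	out[0] = (uint8_t)(0xE0 | (cp >> 12));
	out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
	out[2] = (uint8_t)(0x80 | (cp & 0x3F));
	return 3;
}

/* Stand-in for a single-byte charset: non-ASCII becomes '?'. */
static int conv_ascii(uint16_t cp, uint8_t *out, int room)
{
	if (room < 1)
		return -1;
	out[0] = cp < 0x80 ? (uint8_t)cp : (uint8_t)'?';
	return 1;
}

/* One shared loop; only the converter differs between charsets. */
static int convert_name(const uint16_t *in, int in_len,
			uint8_t *out, int out_size, char_conv_fn conv)
{
	int o = 0;

	for (int i = 0; i < in_len; i++) {
		int n = conv(in[i], out + o, out_size - o);

		if (n < 0)
			break;	/* output full: stop at a character boundary */
		o += n;
	}
	return o;
}
```

With this shape the bounds check lives in one place per converter, and the
loop itself needs no charset-specific branches.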

Andrew Gabbasov (7):
  udf: Prevent buffer overrun with multi-byte characters
  udf: Check output buffer length when converting name to CS0
  udf: Parameterize output length in udf_put_filename
  udf: Join functions for UTF8 and NLS conversions
  udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
  udf: Remove struct ustr as non-needed intermediate storage
  udf: Merge linux specific translation into CS0 conversion function

 fs/udf/namei.c   |  16 +-
 fs/udf/super.c   |  38 ++--
 fs/udf/udfdecl.h |  21 +-
 fs/udf/unicode.c | 611 ++++++++++++++++++++++---------------------------------
 4 files changed, 274 insertions(+), 412 deletions(-)

--
2.1.0

