RE: [PATCH v2 04/10] fs/ntfs3: Add file operations and implementation
From: Konstantin Komarov
Date: Thu Aug 27 2020 - 12:19:32 EST
From: Pali Rohár <pali@xxxxxxxxxx>
Sent: Sunday, August 23, 2020 12:49 PM
>
> Hello Konstantin!
>
> On Friday 21 August 2020 16:25:15 Konstantin Komarov wrote:
> > diff --git a/fs/ntfs3/dir.c b/fs/ntfs3/dir.c
> > new file mode 100644
> > index 000000000000..5f1105f1283c
> > --- /dev/null
[]
> > +#include <linux/iversion.h>
> > +#include <linux/nls.h>
> > +
> > +#include "debug.h"
> > +#include "ntfs.h"
> > +#include "ntfs_fs.h"
> > +
> > +/*
> > + * Convert little endian Unicode 16 to UTF-8.
>
> I guess that by "Unicode 16" you mean UTF-16, right?
>
> Anyway, comment is incorrect as function does not support UTF-16 nor
> UTF-8. This function works only with UCS-2 encoding (not full UTD-16)
> and converts input buffer to NLS encoding, not UTF-8. Moreover kernel's
> NLS API does not support full UTF-8 and NLS's UTF-8 encoding is semi
> broken and limited to just 3-byte sequences. Which means it does not
> allow to access all UNICODE filenames.
>
> So result is that comment for uni_to_x8 function is incorrect.
>
> I would suggest to not use NLS API for encoding from/to UTF-8, but
> rather use utf16s_to_utf8s() and utf8s_to_utf16s() functions.
>
> See for example how it is implemented in exfat driver:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/exfat/nls.c
> Look for functions exfat_utf16_to_nls() and exfat_nls_to_utf16().
>
Hi Pali! Thank you. In V3 we'll rewrite this part alike to how its done in exfat. We also faced an issue in our encodings support tests once moved to utf8s_to_utf16s(). Had to copy the utf8s_to_utf16s() implementation from the kernel into our code, fix the issues for the problem part (also, fixed numerous sparse warnings for that code) and use the modified version. Could you please take a look there in V3 once posted, and share the feedback on do these changes to utf8s_to_utf16s() worth to be prepared separately as the patch to adapt in kernel?
> Ideally check if you can store character 💩 (Pile of Poo, U+1F4A9, does
> not fit into 3byte UTF-8 sequence) into filename and if there is correct
> interoperability between Windows and this new ntfs3 implementation.
>
This is great test case. We are placing it into our set of compatibility tests.