Re: [PATCHv2] fat: add config option to set UTF-8 mount option by default

From: Maciej S. Szmigiero
Date: Wed Mar 23 2016 - 12:41:32 EST


On 23.03.2016 13:57, Josh Boyer wrote:
> On Wed, Mar 23, 2016 at 8:27 AM, Geert Uytterhoeven
> <geert@xxxxxxxxxxxxxx> wrote:
>> On Wed, Mar 23, 2016 at 12:28 PM, Josh Boyer <jwboyer@xxxxxxxxxxxxxxxxx> wrote:
>>> On Wed, Mar 23, 2016 at 4:17 AM, Geert Uytterhoeven
>>> <geert@xxxxxxxxxxxxxx> wrote:
>>>> On Tue, Mar 8, 2016 at 2:53 PM, Maciej S. Szmigiero
>>>> <mail@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>> FAT has long supported its own default file name encoding
>>>>> config setting, separate from CONFIG_NLS_DEFAULT.
>>>>>
>>>>> However, if UTF-8 encoded file names are desired FAT
>>>>> character set should not be set to utf8 since this would
>>>>> make file names case sensitive even if case insensitive
>>>>> matching is requested.
>>>>> Instead, "utf8" mount options should be provided to enable
>>>>> UTF-8 file names in FAT file system.
>>>>>
>>>>> Unfortunately, there was no possibility to set the default
>>>>> value of this option so on UTF-8 system "utf8" mount option
>>>>> had to be added manually to most FAT mounts.
>>>>>
>>>>> This patch adds config option to set such default value.
>>>>>
>>>>> Signed-off-by: Maciej S. Szmigiero <mail@xxxxxxxxxxxxxxxxxxxxx>
>>>>
>>>>> --- a/fs/fat/Kconfig
>>>>> +++ b/fs/fat/Kconfig
>>>>> @@ -93,8 +93,24 @@ config FAT_DEFAULT_IOCHARSET
>>>>> that most of your FAT filesystems use, and can be overridden
>>>>> with the "iocharset" mount option for FAT filesystems.
>>>>> Note that "utf8" is not recommended for FAT filesystems.
>>>>> - If unsure, you shouldn't set "utf8" here.
>>>>> + If unsure, you shouldn't set "utf8" here - select the next option
>>>>> + instead if you would like to use UTF-8 encoded file names by default.
>>>>> See <file:Documentation/filesystems/vfat.txt> for more information.
>>>>>
>>>>> Enable any character sets you need in File Systems/Native Language
>>>>> Support.
>>>>> +
>>>>> +config FAT_DEFAULT_UTF8
>>>>> + bool "Enable FAT UTF-8 option by default"
>>>>> + depends on VFAT_FS
>>>>> + default n
>>>>> + help
>>>>> + Set this if you would like to have "utf8" mount option set
>>>>> + by default when mounting FAT filesystems.
>>>>> +
>>>>> + Even if you say Y here can always disable UTF-8 for
>>>>> + particular mount by adding "utf8=0" to mount options.
>>>>> +
>>>>> + Say Y if you use UTF-8 encoding for file names, N otherwise.
>>>>> +
>>>>> + See <file:Documentation/filesystems/vfat.txt> for more information.
>>>>
>>>> What's the recommended value of CONFIG_FAT_DEFAULT_UTF8 for
>>>> a (distro) defconfig?
>>>
>>> Yes, I'm curious about this as well. My initial assumption is to
>>> leave it off, given that if you turn it on when it wasn't previously
>>> it will change the behavior. I would also assume that is why it is
>>> marked as default n.
>>
>> "default n" is superfluous, as all options default to "n" in the absence
>> of a default specifier.
>
> Yes, I know that. I meant that I assumed the patch author knows that
> too, and included it anyway as a helpful indicator that it shouldn't
> be turned on in most cases. At any rate, your question still stands
> and it would be nice to get an answer.

The default is 'n' here for compatibility with older .configs,
and for consistency with the main FS NLS option (CONFIG_NLS_DEFAULT),
which also defaults to a non-UTF-8 encoding.

If file names are UTF-8 encoded and FAT filesystems were always
mounted with the "utf8" mount option, or with CONFIG_FAT_DEFAULT_IOCHARSET
or the "iocharset" mount option set to "utf8" (not recommended,
but I've seen, for example, Knoppix doing it), then setting this option
effectively changes nothing.

If file names are UTF-8 encoded but none of the conditions described in
the previous paragraph held, then UTF-8 file names were reinterpreted
as CONFIG_FAT_DEFAULT_IOCHARSET (by default iso8859-1) and then converted
into UTF-16 for storage.

While this usually worked, it wasn't correct: characters outside ASCII
in such file names show up as garbage when the FS is accessed with
UTF-8 correctly enabled (for example with this option set) or on
Windows.
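
To make the mis-conversion concrete, here is a rough userspace model of
it in Python. It is only a sketch, not the kernel code path: the file
name "café.txt" and all variable names are made up for illustration, and
the string codecs stand in for the NLS tables.

```python
# Userspace model of the conversions only; this is not the FAT driver.
name = "café.txt"  # hypothetical file name created on a UTF-8 system

# A UTF-8 userspace hands these bytes to the kernel.
utf8_bytes = name.encode("utf-8")

# Without "utf8", each byte is taken as one iso8859-1 character ...
as_latin1 = utf8_bytes.decode("iso8859-1")

# ... and the result is stored as the UTF-16 long file name.
on_disk = as_latin1.encode("utf-16-le")

# Reading back with the same settings reverses both steps, which is
# why the setup appears to work.
read_back_same = on_disk.decode("utf-16-le").encode("iso8859-1").decode("utf-8")

# Reading with UTF-8 correctly enabled (or on Windows) shows the stored
# characters as they are.
read_back_correct = on_disk.decode("utf-16-le")

print(read_back_same)     # café.txt
print(read_back_correct)  # cafÃ©.txt
```

The second line of output is the "garbage" mentioned above: the two
UTF-8 bytes of "é" were stored as two separate characters.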

However, if such conditions (UTF-8 file names but non-UTF-8 FAT mount
options) were present for a long time, then it has to be taken into
consideration that there are likely at least a few file systems with
file names encoded in this way, and it would be good not to change
this suddenly when people update their kernels.

> josh

Maciej