Re: [RFC v2 03/83] Add super.h.

From: Andreas Dilger
Date: Thu Mar 15 2018 - 16:04:31 EST


On Mar 15, 2018, at 11:51 AM, Andiry Xu <jix024@xxxxxxxxxxxx> wrote:
>
> On Thu, Mar 15, 2018 at 2:05 AM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
>> On Thu, Mar 15, 2018 at 7:11 AM, Andiry Xu <jix024@xxxxxxxxxxxx> wrote:
>>> On Wed, Mar 14, 2018 at 9:54 PM, Darrick J. Wong
>>> <darrick.wong@xxxxxxxxxx> wrote:
>>>> On Sat, Mar 10, 2018 at 10:17:44AM -0800, Andiry Xu wrote:
>>
>>>>> + /* s_mtime and s_wtime should be together and their order should not be
>>>>> + * changed. we use an 8 byte write to update both of them atomically
>>>>> + */
>>>>> + __le32 s_mtime; /* mount time */
>>>>> + __le32 s_wtime; /* write time */
>>>>
>>>> Hmmm, 32-bit timestamps? 2038 isn't that far away...
>>>>
>>>
>>> I will try fixing this in the next version.
>>
>> I would also recommend adding nanosecond-resolution timestamps.
>> In theory, a signed 64-bit nanosecond field is sufficient for each timestamp
>> (it's good for several hundred years), but the more common format uses
>> 64-bit seconds and 32-bit nanoseconds in other file systems.
>>
>> Unfortunately, it looks like you will have to come up with a more
>> sophisticated update method than the one above. Even if you leave out
>> the nanoseconds, you can't easily rely on a 16-byte atomic update across
>> architectures to deal with the two 64-bit timestamps. For the superblock
>> fields, you might be able to get away with using second resolution, and
>> then encoding the timestamps as a signed 64-bit 'mkfs time' along with
>> two unsigned 32-bit times added on top, which gives you a range of 136
>> years to mount a file system after its creation.
>>
>
> I will take a look at other file systems.
>
> Superblock mtime is not a big problem as it is updated rarely. 64-bit
> seconds and 32-bit nanoseconds make the inode and log entry bigger,
> and updating file->atime cannot be done with a single 64-bit update.
> That may be annoying and would require journaling.

If the 64-bit atomicity was really a performance issue, you could do
something like:

	/* cast first so the shifts are done on an unsigned 64-bit value */
	__u32 time_high = (__u64)seconds >> 32;
	__u64 time_low = ((__u64)seconds << 32) | nanoseconds;

and then you only need to update time_high with a journal operation if it
has changed from the current time_high value (once every 2^32 seconds,
about 136 years), and the time_low can be set atomically. It needs a few
extra cycles each time (hidden with an unlikely()) vs. just setting both,
but that is a win if it avoids other CPU or IO overhead.
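
A slightly fuller sketch of the idea, written as plain user-space C so it
can be compiled on its own. The struct and helper names (split_time,
split_time_store() and so on) are made up for illustration and are not
from the patch. The on-media format would presumably use __le32/__le64
fields, and the "rare path" below only marks where a journaled update
would go; it does not implement one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout; field names are illustrative only. */
struct split_time {
	uint32_t time_high;	/* upper 32 bits of the 64-bit seconds */
	uint64_t time_low;	/* lower 32 bits of seconds << 32 | nanoseconds */
};

static uint32_t split_time_high(int64_t seconds)
{
	return (uint32_t)((uint64_t)seconds >> 32);
}

static uint64_t split_time_low(int64_t seconds, uint32_t nanoseconds)
{
	return ((uint64_t)seconds << 32) | nanoseconds;
}

/*
 * Store a new timestamp. Returns true if time_high had to change, which
 * is the case that would need a journaled update in a real file system.
 */
static bool split_time_store(struct split_time *t, int64_t seconds,
			     uint32_t nanoseconds)
{
	uint32_t high = split_time_high(seconds);
	bool high_changed = (high != t->time_high);

	if (high_changed)
		t->time_high = high;	/* rare path: journal this update */

	/* common path: a single aligned 8-byte store */
	t->time_low = split_time_low(seconds, nanoseconds);
	return high_changed;
}

static int64_t split_time_seconds(const struct split_time *t)
{
	return (int64_t)(((uint64_t)t->time_high << 32) | (t->time_low >> 32));
}

static uint32_t split_time_nanoseconds(const struct split_time *t)
{
	return (uint32_t)(t->time_low & 0xffffffffu);
}
```

In the common case split_time_store() returns false and only time_low is
written. In the kernel the high_changed test is where the unlikely() hint
would go.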

Cheers, Andreas




