Re: [RFC] metadata updates vs. fetches (was Re: [PATCH v4] fs: Fix data race in inode_set_ctime_to_ts)

From: Linus Torvalds
Date: Sun Nov 24 2024 - 19:54:07 EST


On Sun, 24 Nov 2024 at 15:53, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> a file time which is newer than the actual time of the file. I tried
> to construct an example, and I couldn't. eg:
>
> A: WRITE_ONCE(inode->sec, 5)
> A: WRITE_ONCE(inode->nsec, 950)
> A: WRITE_ONCE(inode->sec, 6)
> B: READ_ONCE(inode->nsec)
> B: READ_ONCE(inode->sec)
> A: WRITE_ONCE(inode->sec, 170)
> A: WRITE_ONCE(inode->sec, 7)
> A: WRITE_ONCE(inode->sec, 950)
> B: READ_ONCE(inode->nsec)

I assume those WRITE_ONCE(170/190) should be nsec.

Also note that as long as you insist on using READ_ONCE, your example
is completely bogus due to memory ordering issues. It happens to work
on x86 where all reads are ordered, but on other architectures you'd
literally re-order the reads in question completely.

So assuming we make the read_once be "smp_read_acquire()" to make them
ordered wrt each other, and make the writes ordered with smp_wmb() or
smp_store_release(), I think it still can fail.

Look, let's write 5.000950, 6.000150 and 7.000950, while there is a
single reader (and let's assume these are all properly ordered reads
and writes):

W1.s 5
W1.ns 950
W2.s 6
R.ns (950)
R.s (6)
W2.ns 150
W3.s 7
W3.ns 950
R.ns (950)

and look how the reader is happy, because it got the same nanoseconds
twice. But the reader thinks it had a time of 6.000950, and AT NO
POINT was that actually a valid time.

Linus