Re: [GIT PULL] inode->i_version rework for v4.16
From: Jeff Layton
Date: Tue Jan 30 2018 - 07:06:01 EST
On Mon, 2018-01-29 at 13:50 -0800, Linus Torvalds wrote:
> On Mon, Jan 29, 2018 at 4:26 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > This pile of patches is a rework of the inode->i_version field. We have
> > traditionally incremented that field on every inode data or metadata
> > change. Typically this increment needs to be logged on disk even when
> > nothing else has changed, which is rather expensive.
>
> Hmm. I have pulled this, but it is really really broken in one place,
> to the degree that I always went "no, I won't pull this garbage".
>
> But the breakage is potential, not actual, and can be fixed trivially,
> so I'll let it slide - but I do require it to be fixed. And I require
> people to *think* about it.
>
> So what's to horribly horribly wrong?
>
> The inode_cmp_iversion{+raw}() functions are pure and utter crap.
>
> Why?
>
> You say that they return 0/negative/positive, but they do so in a
> completely broken manner. They return that ternary value as the
> sequence number difference in a 's64', which means that if you
> actually care about that ternary value, and do the *sane* thing that
> the kernel-doc of the function implies is the right thing, you would
> do
>
> int cmp = inode_cmp_iversion(inode, old);
>
> if (cmp < 0 ...
>
> and as a result you get code that looks sane, but that doesn't
> actually *WORK* right.
>
My intent here was to have this handle wraparound using the same sort of
method that the time_before/time_after macros do. Obviously, I didn't
document that well.
I want to make sure I understand what's actually broken here thoug. Is
it only broken when the two values are more than 2**63 apart, or is
there something else more fundamentally wrong here?
> To make it even worse, it will actually work in practice by accident
> in 99.99999% of all cases, so now you have
>
> (a) subtly buggy code
> (b) that looks fine
> (c) and that works in testing
>
> which is just about the worst possible case for any code. The
> interface is simply garbage that encourages bugs.
>
> And the bug wouldn't be in the user, the bug would be in this code you
> just sent me. The interface is simply wrong.
>
> So this absolutely needs to be fixed. I see two fixes:
>
> - just return a boolean. That's all that any current user actually
> wants, so the ternary value seems pointless.
>
> - make it return an 'int', and not just any int, but -1/0/1. That way
> there is no worry about uses, and if somebody *really* cares about the
> ternary value, they can now use a "switch" statement to get it
> (alternatively, make it return an enum, but whatever).
>
> That "ternary" function that has 18446744069414584320 incorrect return
> values really is unacceptable.
>
I think I'll just make it return a boolean value like you suggested
first. I'll send a patch to fix it once I've done some basic testing
with it.
Many thanks,
--
Jeff Layton <jlayton@xxxxxxxxxx>