Re: [GIT PULL] please pull file-locking related changes for v3.20
From: Jeff Layton
Date: Mon Feb 16 2015 - 14:59:46 EST
On Mon, 16 Feb 2015 11:24:03 -0800
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Feb 16, 2015 at 10:46 AM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > This code is so broken that my initial reaction is "We need to just
> > revert the crap".
Sure, no problem reverting the lock counters. They were an optional
thing anyway...
>
> How the hell is flock_lock_file() supposed to work at all, btw?
>
> Say we have an existing flock, and now do a new one that conflicts. I
> see what looks like three separate bugs.
>
> - We go through the first loop, find a lock of another type, and
>   delete it in preparation for replacing it
>
> - we *drop* the lock context spinlock.
>
> - BUG #1? So now there is no lock at all, and somebody can come in
>   and see that unlocked state. Is that really valid?
>
> - another thread comes in while the first thread dropped the lock
>   context lock, and wants to add its own lock. It doesn't see the
>   deleted or pending locks, so it just adds it
>
> - the first thread gets the context spinlock again, and adds the lock
>   that replaced the original
>
> - BUG #2? So now there are *two* locks on the thing, and the next
>   time you do an unlock (or when you close the file), it will only
>   remove/replace the first one.
>
> Both of those bugs are due to the whole "drop the lock in the middle",
> which is pretty much always a mistake. BUG#2 could easily explain the
> warning Kirill reports, afaik.
>
Ahh, well spotted.
That drop-the-lock-in-the-middle thing has always looked a little fishy
to me, but I had convinced myself a while back that it was OK. I think
you're correct that it's not, however.
I'll spin up a patch to remove it and we can see if that helps the
problem that Kirill saw.
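To make the interleaving behind BUG #2 concrete, here is a small userspace
model of it. This is only a sketch and not the kernel code: struct model_ctx,
struct model_lock and all of the helper names are hypothetical, the "spinlock"
is implied by the order of the statements, and I'm assuming two threads that
share one struct file, so their requests never conflict with each other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a lock context is a small array of flock entries. */
enum { MAX_LOCKS = 8 };

struct model_lock {
    int owner;      /* stands in for the struct file that holds the lock */
    int exclusive;  /* 1 = LOCK_EX, 0 = LOCK_SH */
};

struct model_ctx {
    struct model_lock locks[MAX_LOCKS];
    size_t count;
};

/* Remove only the first lock held by owner; returns 1 if one was removed. */
static int ctx_remove_first(struct model_ctx *ctx, int owner)
{
    for (size_t i = 0; i < ctx->count; i++) {
        if (ctx->locks[i].owner != owner)
            continue;
        for (size_t j = i + 1; j < ctx->count; j++)
            ctx->locks[j - 1] = ctx->locks[j];
        ctx->count--;
        return 1;
    }
    return 0;
}

static void ctx_add(struct model_ctx *ctx, int owner, int exclusive)
{
    assert(ctx->count < MAX_LOCKS);
    ctx->locks[ctx->count].owner = owner;
    ctx->locks[ctx->count].exclusive = exclusive;
    ctx->count++;
}

/* Assumed rule: locks of the same owner never conflict with each other. */
static int ctx_conflicts(const struct model_ctx *ctx, int owner, int exclusive)
{
    for (size_t i = 0; i < ctx->count; i++) {
        if (ctx->locks[i].owner == owner)
            continue;
        if (exclusive || ctx->locks[i].exclusive)
            return 1;
    }
    return 0;
}

static size_t ctx_count_owner(const struct model_ctx *ctx, int owner)
{
    size_t n = 0;
    for (size_t i = 0; i < ctx->count; i++)
        if (ctx->locks[i].owner == owner)
            n++;
    return n;
}

/*
 * Replays the interleaving by hand. Returns how many locks the file holds
 * after both requests; *after_unlock gets the count left after one unlock.
 */
size_t replay_dropped_lock_race(size_t *after_unlock)
{
    struct model_ctx ctx = { .count = 0 };
    const int filp = 1;

    ctx_add(&ctx, filp, 0);             /* existing LOCK_SH */

    /* Thread A: first loop deletes the old lock, then "drops the spinlock". */
    ctx_remove_first(&ctx, filp);

    /* Thread B runs in the window: nothing to replace, nothing conflicts. */
    ctx_remove_first(&ctx, filp);
    if (!ctx_conflicts(&ctx, filp, 1))
        ctx_add(&ctx, filp, 1);

    /* Thread A resumes: it does not notice B's lock and adds its own. */
    if (!ctx_conflicts(&ctx, filp, 1))
        ctx_add(&ctx, filp, 1);

    size_t held = ctx_count_owner(&ctx, filp);

    /* An unlock only removes the first matching entry. */
    ctx_remove_first(&ctx, filp);
    *after_unlock = ctx_count_owner(&ctx, filp);
    return held;
}
```

Calling replay_dropped_lock_race() from a test harness leaves two entries for
the same file after both requests, and one stale entry after the unlock. If
the delete and the add happen without releasing the lock in between, thread B
cannot run in the window and the duplicate never appears.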
> BUG#3 seems to be independent, and is about somebody replacing an
> existing lock, but the new lock conflicts. Again, the first loop will
> remove the old lock, and then the second loop will see the conflict,
> and return an error (and we may then end up waiting for it for the
> FILE_LOCK_DEFERRED case). Now the original lock is gone. Is that
> really right? That sounds bogus. *Failing* to insert a flock causing
> the old flock to go away?
>
No, I think you're correct here.
The main problem is a LOCK_SH -> LOCK_EX upgrade. If there are other
LOCK_SH locks on the file, then you'll both lose your lock and fail to
get the LOCK_EX lock.
I think that's fixable by ensuring that we don't actually remove the
lock until we're sure that we can replace it. I'll see about fixing
that up as well. Give me a couple of days to get those cleaned up and
tested and I'll post some patches that clean up the holes here.
At that point we can see if that fixes Kirill's problem.
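Roughly what I have in mind for the ordering, again as a userspace sketch
rather than a patch: the struct and function names below are made up for
illustration, and the conflict rule (same owner never conflicts, otherwise
any exclusive lock conflicts) is a simplifying assumption. The first function
models the current order, the second checks for conflicts before touching
the old lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum { MAX_LOCKS = 8 };

struct model_lock { int owner; int exclusive; };
struct model_ctx  { struct model_lock locks[MAX_LOCKS]; size_t count; };

/* Index of the lock held by owner, or -1 if it holds none. */
int find_owner(const struct model_ctx *ctx, int owner)
{
    for (size_t i = 0; i < ctx->count; i++)
        if (ctx->locks[i].owner == owner)
            return (int)i;
    return -1;
}

/* Assumed rule: only locks of other owners can conflict. */
static int conflicts(const struct model_ctx *ctx, int owner, int exclusive)
{
    for (size_t i = 0; i < ctx->count; i++) {
        if (ctx->locks[i].owner == owner)
            continue;
        if (exclusive || ctx->locks[i].exclusive)
            return 1;
    }
    return 0;
}

/* Current order: delete the old lock, then look for conflicts. */
int set_lock_remove_first(struct model_ctx *ctx, int owner, int exclusive)
{
    int old = find_owner(ctx, owner);

    if (old >= 0) {
        ctx->count--;
        ctx->locks[old] = ctx->locks[ctx->count];
    }
    if (conflicts(ctx, owner, exclusive))
        return -EAGAIN;                 /* too late: old lock is gone */

    assert(ctx->count < MAX_LOCKS);
    ctx->locks[ctx->count++] = (struct model_lock){ owner, exclusive };
    return 0;
}

/* Proposed order: look for conflicts first, replace only on success. */
int set_lock_check_first(struct model_ctx *ctx, int owner, int exclusive)
{
    if (conflicts(ctx, owner, exclusive))
        return -EAGAIN;                 /* old lock is left untouched */

    int old = find_owner(ctx, owner);

    if (old >= 0) {
        ctx->locks[old].exclusive = exclusive;
    } else {
        assert(ctx->count < MAX_LOCKS);
        ctx->locks[ctx->count++] = (struct model_lock){ owner, exclusive };
    }
    return 0;
}

/* Scenario from the mail: owners 1 and 2 both hold LOCK_SH. */
struct model_ctx two_shared_holders(void)
{
    struct model_ctx ctx = { .count = 2 };

    ctx.locks[0] = (struct model_lock){ .owner = 1, .exclusive = 0 };
    ctx.locks[1] = (struct model_lock){ .owner = 2, .exclusive = 0 };
    return ctx;
}
```

Starting from two_shared_holders() and asking for an exclusive lock as owner
1, both variants return -EAGAIN, but only set_lock_check_first() leaves owner
1 still holding its shared lock afterwards.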
> Now, flock semantics are pretty much insane, so maybe all these bugs
> except for #2 aren't actually bugs, and are "features" of flock. But
> bug #2 can't be a semantic feature.
>
> Is there something I'm missing here?
>
No, I think they're all bugs. flock semantics aren't great, but I'm
pretty sure you shouldn't lose your lock just because you tried to
upgrade one and failed to do so.
> This was all just looking at a *single* function. Quite frankly, I
> hate how the code also just does
>
>         if (filp->f_op->flock)
>                 filp->f_op->flock(filp, F_SETLKW, &fl);
>         else
>                 flock_lock_file(filp, &fl);
>
> and blithely assumes that some random filesystem will get the flock
> semantics right, when even the core code screwed it up this badly.
>
Sigh, yeah. Most of the filesystems that define ->flock (or ->lock for
that matter) are remote or distributed filesystems, where we leave it
up to the server or lock manager (e.g. DLM) to get it right. I think we
don't have much choice but to assume that they do...
> And maybe I'm wrong, and there's some reason why none of the things
> above can actually happen, but it looks really bad to me.
>
> Linus
Thanks for helping sanity check this stuff.
--
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>