Re: spinlock contention of files->file_lock

From: Linus Torvalds
Date: Mon Sep 30 2013 - 21:45:17 EST


On Mon, Sep 30, 2013 at 6:05 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> Speaking of spinlock contention, the files->file_lock is a good example.
>
> Multi threaded programs hit this spinlock three times per fd :

.. do you actually have programs that see contention?

> alloc_fd() / fd_install() / close()
>
> I think fd_install() could be done without this spinlock.

Yes, I haven't thought much about it, but from a quick look it should
be trivial to do fd_install(). We already free the fdtable using rcu
(because we look things up without locking), so we could just get the
rcu read-lock, do fd_install() without any locking at all, and then
verify that the fd-table is still the same. Rinse and repeat if not.

> - Add a seqcount, and cmpxchg() to synchronize fd_install() with the
> potential resizer. (Might need additional RCU locking)

I don't think you even need that. alloc_fd() has already reserved the
spot, so I think you really can just assign without any fancy cmpxchg
at all. If you write to the old fdtable, who cares? Probably just
something like

do {
fdt = files_fdtable(files);
rcu_assign_pointer(fdt->fd[fd], file);
smp_mb();
} while (fdt != files_fdtable(files));

or similar. Under the RCU lock to guarantee the allocations don't
disappear from under you, but with no other locking.

Maybe I'm missing something, but it really doesn't look that bad.

Now, that only gets rid of fd_install(), but I suspect you could do
something similar for put_unused_fd() (that one does need cmpxchg for
the "next_fd" thing, though). We'd have to replace the non-atomic
bitops on open_fds[] with atomic ones, just to make sure adjacent bit
clearings don't screw up concurrent adjacent bit values, but that
looks fairly straightforward too.

Now, getting rid of the spinlock for alloc_fd() would be more work,
and you'd likely want to keep it for actually resizing (just to keep
that case more straightforward), but you could probably do the
non-resizing cases fairly easily there too without holding the lock.
Scan for a free bit optimistically and without any locking, and then
be somewhat careful with setting that open_fds[] bit - not only using
an atomic test_and_set() for it, but also do the same "let's check
that the fdtable base pointers haven't changed in the meantime and
repeat".

On the whole the fd table looks like if it really is a contention
point, it would be fairly straightforward to fix without any huge
contortions. Sure, lots of care and small details to look out for, but
nothing looks really all that fundamentally difficult.

But is it really a contention point on any real load?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/