Re: [tip:core/locking] x86/smp: Move waiting on contended ticket lock out of line
From: Linus Torvalds
Date: Thu Feb 28 2013 - 16:58:54 EST
On Thu, Feb 28, 2013 at 1:14 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:
>
> I have modified one of the semop tests to use multiple semaphores.
Ooh yeah. This shows contention quite nicely. And it's all from
ipc_lock, and looking at the top-10 offenders of the profile:
43.01% semop-multi [kernel.kallsyms] [k] _raw_spin_lock
...
4.73% semop-multi [kernel.kallsyms] [k] avc_has_perm_flags
4.52% semop-multi [kernel.kallsyms] [k] ipc_has_perm.isra.21
...
2.43% semop-multi [kernel.kallsyms] [k] ipcperms
The 43% isn't actually all that interesting; it just shows that there
is contention and we're waiting for other users. Yes, we waste almost
half the CPU time on locking, but ignore that for a moment.
The "more than 10% of the total time is spent in ipc permission code"
*is* the interesting part. Because that 10%+ is actually more like 20%
if you ignore the "wait for lock" part. And it's all done *inside* the
lock.
In other words, I can pretty much guarantee that the contention will
go down a lot if we just move the security check outside the spinlock.
According to the above numbers, we're currently spending basically
1/5th of our remaining CPU resources serialized for absolutely no good
reason. THAT is the kind of thing we shouldn't do.
The rest of the big offenders seem to be mostly done outside the
spinlock, although it's hard to tell how much of the 10% of
sys_semtimedop() is also under the lock. There's probably other
things there than just the permission checking.
I'm not seeing any real reason the permission checking couldn't be
done just under the RCU lock, before we get the spinlock. Except for
the fact that the "helper" routines in ipc/util.c are written the way
they are, so it's a layering violation. But I really think that would
be a *reasonably* low-hanging fruit thing to do.
Changing the locking itself to be more fine-grained, and doing it
across many different ipc semaphores would be a major pain. So I do
suspect that the work Michel Lespinasse did is probably worth doing
anyway in addition to at least trying to fix the horrible lack of
scalability of the code a bit.
Linus