Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability
From: Linus Torvalds
Date: Mon Mar 02 2015 - 14:04:01 EST
On Mon, Mar 2, 2015 at 10:42 AM, Jason Low <jason.low2@xxxxxx> wrote:
>
> This patch converts the timers to 64-bit atomic variables and uses
> atomic add to update them without a lock. With this patch, the percent
> of total time spent updating thread group cputimer timers was reduced
> from 30% down to less than 1%.
NAK.
Not because I think this is wrong, but because somebody needs to look
at the effects on 32-bit architectures too.
In particular, check out lib/atomic64.c - which uses a hashed array of
16 spinlocks to do 64-bit atomics. That may or may not work ok in
practice, but it does mean that now sample_group_cputimer() and
update_gt_cputime() will take that (it ends up generally being the
same) spinlock three times for the three atomic64_read()'s.
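To make that cost concrete, here is a rough userspace sketch of the hashed-lock scheme. It is not the kernel code: the sketch_* names, the pthread mutexes standing in for spinlocks, the 64-byte cache line shift and the plain modulo hash are all assumptions made for illustration. Only the lock count of 16 is taken from lib/atomic64.c.

```c
#include <pthread.h>
#include <stdint.h>

#define NR_LOCKS 16                     /* lock count as in lib/atomic64.c */
#define M PTHREAD_MUTEX_INITIALIZER

/* Hypothetical 64-bit "atomic": a plain value guarded by a hashed lock. */
typedef struct { int64_t counter; } sketch_atomic64_t;

pthread_mutex_t sketch_locks[NR_LOCKS] = {
    M, M, M, M, M, M, M, M, M, M, M, M, M, M, M, M
};

/* Pick a lock from the variable's address (assumed 64-byte cache lines). */
pthread_mutex_t *sketch_lock_for(const sketch_atomic64_t *v)
{
    uintptr_t addr = (uintptr_t)v >> 6;
    return &sketch_locks[addr % NR_LOCKS];
}

int64_t sketch_atomic64_read(sketch_atomic64_t *v)
{
    pthread_mutex_t *lock = sketch_lock_for(v);
    int64_t val;

    pthread_mutex_lock(lock);
    val = v->counter;
    pthread_mutex_unlock(lock);
    return val;
}

void sketch_atomic64_add(int64_t a, sketch_atomic64_t *v)
{
    pthread_mutex_t *lock = sketch_lock_for(v);

    pthread_mutex_lock(lock);
    v->counter += a;
    pthread_mutex_unlock(lock);
}

/* Three adjacent fields, so they will usually hash to the same lock. */
struct sketch_cputimer {
    sketch_atomic64_t utime, stime, sum_exec_runtime;
};

/* One sample costs three separate lock/unlock round trips. */
void sketch_sample(struct sketch_cputimer *t, int64_t out[3])
{
    out[0] = sketch_atomic64_read(&t->utime);
    out[1] = sketch_atomic64_read(&t->stime);
    out[2] = sketch_atomic64_read(&t->sum_exec_runtime);
}
```

In this sketch the three reads are individually consistent but not consistent with each other, and each one pays for its own lock acquisition, which is the overhead in question.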
Now, I think on x86, we end up using not lib/atomic64.c but our own
versions that use cmpxchg8b, which is probably fine from a performance
standpoint. But I see a lot of "select GENERIC_ATOMIC64" for other
architectures.
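For contrast, a minimal userspace sketch of the lock-free flavour, using C11 <stdatomic.h> rather than the kernel's atomic64 API. The native_* names are made up for illustration. On 32-bit x86 a compiler will typically lower these operations to a cmpxchg8b-based sequence; on targets without a native 64-bit atomic, the toolchain may fall back to a library routine that takes a lock, which is roughly the userspace analogue of GENERIC_ATOMIC64.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter type; not a kernel structure. */
struct native_counter {
    _Atomic int64_t v;
};

/* Add without taking any explicit lock. */
void native_add(struct native_counter *c, int64_t a)
{
    atomic_fetch_add(&c->v, a);
}

/* Read the full 64-bit value in one atomic load. */
int64_t native_read(struct native_counter *c)
{
    return atomic_load(&c->v);
}

/* Nonzero when the target handles this object without a hidden lock. */
int native_is_lock_free(struct native_counter *c)
{
    return atomic_is_lock_free(&c->v);
}
```

Checking native_is_lock_free() on a given target is a quick way to see which of the two worlds you are in.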
Anyway, it is *possible* that even on those 32-bit targets, the
atomic64's aren't any worse than the current spinlock in practice. So
the "NAK" is in no way absolute - but I'd just like to hear that this
is all reasonably fine on 32-bit ARM and powerpc, for example.
Hmm?
Linus