On Sat, Jun 29, 2013 at 2:34 PM, Waiman Long <waiman.long@xxxxxx> wrote:
>
> I think I got it now. For architecture with transactional memory support to
> use an alternative implementation, we will need to use some kind of dynamic
> patching at kernel boot up time as not all CPUs in that architecture will
> have that support. In that case the helper functions have to be real
> functions and cannot be inlined. That means I need to put the implementation
> into a spinlock_refcount.c file with the header file contains structure
> definitions and function prototypes only. Is that what you are looking for?

Yes. Except even more complex: I want the generic fallbacks in
lib/*.c files too.
So we basically have multiple "levels" of specialization:
(a) the purely lock-based model that doesn't do any optimization at
all, because we have lockdep enabled etc, so we *want* things to fall
back to real spinlocks.
(b) the generic cmpxchg approach for the case when that works
(c) the capability for an architecture to make up its own very
specialized version
and while I think in all cases the actual functions are big enough
that you don't ever want to inline them, at least in the case of (c)
it is entirely possible that the architecture actually wants a
particular layout for the spinlock and refcount, so we do want the
architecture to be able to specify the exact data structure in its own
<asm/spinlock-refcount.h> file. In fact, that may well be true of case
(b) too, as Andi already pointed out that on x86-32, an "u64" is not
necessarily sufficiently aligned for efficient cmpxchg (it may *work*,
but cacheline-crossing atomics are very very slow).
Other architectures may have other issues - even with a "generic"
cmpxchg-based library version, they may well want to specify exactly
how to take the lock. So while (a) would be 100% generic, (b) might
need small architecture-specific tweaks, and (c) would be a full
custom implementation.
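
To make levels (a) and (b) concrete, here is a rough user-space sketch in C11. It is not the kernel implementation: the struct name, the bit layout (lock in the low 32 bits, count in the high 32 bits), the helper names and the SKETCH_FORCE_LOCKED switch are all made up for illustration, and <stdatomic.h> stands in for the kernel's cmpxchg and spinlock primitives. The explicit 8-byte alignment reflects the point above about cacheline-crossing atomics being slow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: lock and count share one 8-byte-aligned word. */
struct lockref_sketch {
	_Alignas(8) _Atomic uint64_t lock_count; /* low 32: lock, high 32: count */
};

#define SKETCH_LOCK_MASK 0xffffffffull
#define SKETCH_COUNT_ONE (1ull << 32)

/* Toy spinlock on the low half of the word (0 = free, 1 = held). */
static void sketch_lock(struct lockref_sketch *r)
{
	uint64_t old = atomic_load_explicit(&r->lock_count, memory_order_relaxed);

	for (;;) {
		if (old & SKETCH_LOCK_MASK) {
			/* Held by someone else: reload and spin. */
			old = atomic_load_explicit(&r->lock_count, memory_order_relaxed);
			continue;
		}
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(&r->lock_count, &old, old | 1,
				memory_order_acquire, memory_order_relaxed))
			return;
	}
}

static void sketch_unlock(struct lockref_sketch *r)
{
	/* Clear the lock half, keep the count half. */
	atomic_fetch_and_explicit(&r->lock_count, ~SKETCH_LOCK_MASK, memory_order_release);
}

static uint32_t sketch_count(struct lockref_sketch *r)
{
	return (uint32_t)(atomic_load_explicit(&r->lock_count, memory_order_relaxed) >> 32);
}

/* Level (a): always take the lock to bump the count. */
static void sketch_get_locked(struct lockref_sketch *r)
{
	sketch_lock(r);
	atomic_fetch_add_explicit(&r->lock_count, SKETCH_COUNT_ONE, memory_order_relaxed);
	sketch_unlock(r);
}

/* Level (b): bump the count with cmpxchg, but only while the lock is free. */
static bool sketch_get_fast(struct lockref_sketch *r)
{
	uint64_t old = atomic_load_explicit(&r->lock_count, memory_order_relaxed);

	while (!(old & SKETCH_LOCK_MASK)) {
		if (atomic_compare_exchange_weak_explicit(&r->lock_count, &old,
				old + SKETCH_COUNT_ONE,
				memory_order_relaxed, memory_order_relaxed))
			return true;
	}
	return false; /* lock is held: caller must use the locked path */
}

/* Out-of-line entry point: try (b) unless forced down to (a). */
void sketch_get(struct lockref_sketch *r)
{
#ifndef SKETCH_FORCE_LOCKED /* hypothetical stand-in for "lockdep enabled etc" */
	if (sketch_get_fast(r))
		return;
#endif
	sketch_get_locked(r);
}
```

Level (c) would replace sketch_get() and the struct definition wholesale from an architecture header, which is why the data layout should not be hardcoded in generic code.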
See how we do <asm/word-at-a-time.h> and CONFIG_DCACHE_WORD_ACCESS.
Notice how there is a "generic" <asm-generic/word-at-a-time.h> file
(actually, big-endian only) for reference implementations (used by
sparc, m68k and parisc, for example), and then you have "full custom"
implementations for x86, powerpc, alpha and ARM.
See also lib/strnlen_user.c and CONFIG_GENERIC_STRNLEN_USER as an
example of how architectures may choose to opt in to using generic
library versions - if those work sufficiently well for that
architecture. Again, some architectures may decide to write their own
fully custom strlen_user() function.
Very similar concept.
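
The opt-in pattern looks roughly like the sketch below, again in plain user-space C. The SKETCH_* macros are hypothetical stand-ins for a Kconfig symbol such as CONFIG_GENERIC_STRNLEN_USER, and sketch_strnlen() is just a bounded string length: it does not reproduce the user-access semantics of the real strnlen_user().

```c
#include <stddef.h>

/*
 * Hypothetical opt-in: an "architecture" that defines
 * SKETCH_ARCH_HAS_CUSTOM_STRNLEN supplies its own version,
 * everybody else selects the generic library one.
 */
#ifndef SKETCH_ARCH_HAS_CUSTOM_STRNLEN
#define SKETCH_GENERIC_STRNLEN 1
#endif

#ifdef SKETCH_GENERIC_STRNLEN
/* Generic library version: byte-at-a-time, works everywhere. */
size_t sketch_strnlen(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}
#else
/* Fully custom version, defined elsewhere by the architecture. */
size_t sketch_strnlen(const char *s, size_t max);
#endif
```

Callers only ever see the one prototype, so switching an architecture between the generic and the custom version is a build-time choice with no changes at the call sites.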
Linus