Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for locklessupdate of refcount

From: Waiman Long
Date: Sat Jun 29 2013 - 17:04:15 EST

Next message: Sergey Meirovich: "Re: Linux 3.10-rc7"
Previous message: Sergey Meirovich: "Re: Linux 3.9.8"
In reply to: Thomas Gleixner: "Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for locklessupdate of refcount"
Next in thread: Waiman Long: "Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for locklessupdate of refcount"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 06/27/2013 10:44 AM, Thomas Gleixner wrote:

On Wed, 26 Jun 2013, Waiman Long wrote:
On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
On Wed, 26 Jun 2013, Waiman Long wrote:
This is a complete design disaster. You are violating every single
rule of proper layering. The differentiation of spinlock, raw_spinlock
and arch_spinlock is there for a reason and you just take the current
implementation and map your desired function to it. If we take this,
then we fundamentally ruled out a change to the mappings of spinlock,
raw_spinlock and arch_spinlock. This layering is on purpose and it
took a lot of effort to make it as clean as it is now. Your proposed
change undoes that and completely wreckages preempt-rt.

Would you mind enlighten me how this change will break preempt-rt?
The whole spinlock layering was done for RT. In mainline spinlock is
mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
rtmutex.

So your mixing of the various layers and the assumption of lock plus
count being adjacent, does not work on RT at all.

Thank for the explanation. I had downloaded and looked at the RT patch. My code won't work for the full RT kernel. I guess that is no way to work around this and only logical choice is to disable it for the full RT kernel.

The only architecture that will break, according to data in the
respectively arch/*/include/asm/spinlock_types.h files is PA-RISC
1.x (it is OK in PA-RISC 2.0) whose arch_spinlock type has a size of
16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
not, but we can always disable the optimization for this case.

You have to do that right from the beginning with a proper data
structure and proper accessor functions. Introducing wreckage and then
retroactivly fixing it, is not a really good idea.

For architecture that needs a larger than 32-bit arch_spin_lock type, the optimization code will be disabled just like the non-SMP and full RT cases.

So what you really want is a new data structure, e.g. something like
struct spinlock_refcount() and a proper set of new accessor functions
instead of an adhoc interface which is designed solely for the needs
of dcache improvements.

I had thought about that. The only downside is we need more code changes to
make existing code to use the new infrastructure. One of the drivers in my

That's not a downside. It makes the usage of the infrastructure more
obvious and not hidden behind macro magic. And the changes are trivial
to script.

Making the code from d_lock to lockref.lock, for example, is trivial. However, there are about 40 different files that need to be changed with different maintainers. Do I need to get an ACK from all of them or can I just go ahead with such trivial changes without their ACK? You know it can be hard to get responses from so many maintainers in a timely manner. This is actually my main concern.

design is to minimize change to existing code. Personally, I have no
objection of doing that if others think this is the right way to go.

Definitely. Something like this would be ok:

struct lock_refcnt {
int refcnt;
struct spinlock lock;
};

It does not require a gazillion of ifdefs and works for
UP,SMP,DEBUG....

extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);

You also want something like:

extern void lock_refcnt_disable(void);

So we can runtime disable it e.g. when lock elision is detected and
active. So you can force lock_refcnt_mod() to return false.

static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
{
#ifdef CONFIG_HAVE_LOCK_REFCNT
return lock_refcnt_mod(sr, 1, INTMAX);
#else
return false;
#endif
}

That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.

So we can enable it per architecture and make it depend on SMP. For RT
we simply can force this switch to n.

The other question is the semantic of these refcount functions. From
your description the usage pattern is:

if (refcnt_xxx())
return;
/* Slow path */
spin_lock();
...
spin_unlock();

So it might be sensible to make this explicit:

static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr))
{
#ifdef CONFIG_HAVE_SPINLOCK_REFCNT
if (lock_refcnt_mod(lr, 1, INTMAX))
return true;
#endif
spin_lock(&lr->lock);
return false;
}

Yes, it is a good idea to have a config variable to enable/disable it as long as the default is "y". Of course, an full RT kernel or an non-SMP kernel will have it disabled.

Regards,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Sergey Meirovich: "Re: Linux 3.10-rc7"
Previous message: Sergey Meirovich: "Re: Linux 3.9.8"
In reply to: Thomas Gleixner: "Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for locklessupdate of refcount"
Next in thread: Waiman Long: "Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for locklessupdate of refcount"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]