Re: [PATCH 3/3] CGroups: Add css_tryget()

From: Paul Menage
Date: Wed Dec 17 2008 - 17:48:28 EST


On Wed, Dec 17, 2008 at 2:07 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> /*
> * State maintained by the cgroup system to allow subsystems
> * to be "busy". Should be accessed via css_get(),
> * css_tryget() and css_put().
> */
>
> is conventional/preferred.

Oops, will fix.

>> static inline void css_get(struct cgroup_subsys_state *css)
>> @@ -77,9 +80,32 @@ static inline void css_get(struct cgroup
>> if (!test_bit(CSS_ROOT, &css->flags))
>> atomic_inc(&css->refcnt);
>> }
>> +
>> +static inline bool css_is_removed(struct cgroup_subsys_state *css)
>> +{
>> + return test_bit(CSS_REMOVED, &css->flags);
>> +}
>> +
>> +/*
>> + * Call css_tryget() to take a reference on a css if your existing
>> + * (known-valid) reference isn't already ref-counted. Returns false if
>> + * the css has been destroyed.
>> + */
>> +
>> +static inline bool css_tryget(struct cgroup_subsys_state *css)
>> +{
>> + if (test_bit(CSS_ROOT, &css->flags))
>> + return true;
>> + while (!atomic_inc_not_zero(&css->refcnt)) {
>> + if (test_bit(CSS_REMOVED, &css->flags))
>> + return false;
>> + }
>> + return true;
>> +}
>
> This looks too large to inline.
>
> We should have a cpu_relax() in the loop?

Sounds reasonable.
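
A rough, untested sketch of that (also moved out of line, per your
comment above about it being too large to inline):

static bool css_tryget(struct cgroup_subsys_state *css)
{
	if (test_bit(CSS_ROOT, &css->flags))
		return true;
	while (!atomic_inc_not_zero(&css->refcnt)) {
		if (test_bit(CSS_REMOVED, &css->flags))
			return false;
		cpu_relax();	/* spin briefly while the removal race resolves */
	}
	return true;
}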

>
> And possibly a cond_resched().

No, we don't want to reschedule. These are pseudo spinlocks rather
than pseudo mutexes, and the "hold time" is extremely short.

>
> It would be better if these polling loops didn't exist at all, of
> course. But I guess if you could work out a way of doing that, this
> patch wouldn't exist.

It would certainly be possible to implement it as a spinlock and a
count, and do:

void css_get(struct cgroup_subsys_state *css)
{
	spin_lock(&css->lock);
	css->count++;
	spin_unlock(&css->lock);
}

bool css_tryget(struct cgroup_subsys_state *css)
{
	bool result;

	spin_lock(&css->lock);
	if (css->count > 0) {
		/* Still live - take a new reference. */
		css->count++;
		result = true;
	} else {
		/* count == 0 means the css has been removed. */
		result = false;
	}
	spin_unlock(&css->lock);
	return result;
}

and implement the cgroups side of it as:

bool success = true;

/* Pass 1: lock every css; drop each base reference that is the
   only remaining one, noting failure if any css is still in use. */
for each subsystem {
	spin_lock(&css->lock);
	if (css->count == 1)
		css->count = 0;
	else
		success = false;
}

/* Pass 2: if any css was busy, restore the base references we
   dropped, then release all the locks. */
for each subsystem {
	if (!success && css->count == 0)
		css->count = 1;
	spin_unlock(&css->lock);
}

Functionally that would be identical - the only downside is an extra
atomic operation in the fast path of css_get() and css_tryget(),
which some people objected to in the past when I proposed similar
patches.

Hmm. Thinking about it, this is very similar to the rwlock_t logic,
and I could probably implement css_get() and css_tryget() via
read_lock() and the clear_css_refs() side via write_trylock(), which
would be pretty much the same as the original patch, except using
conventional primitives. The big downside would be that we'd be
limited to RW_LOCK_BIAS references (about 16M), versus the roughly 2
billion we get with regular atomics.
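
As a rough, untested sketch of that rwlock variant (the _rw names and
the ref_lock field are hypothetical, and this ignores lockdep and
preemption interactions):

/* ref_lock would be an rwlock_t in struct cgroup_subsys_state; each
   reference is held as a reader, removal is a write_trylock(). */

static void css_get_rw(struct cgroup_subsys_state *css)
{
	read_lock(&css->ref_lock);	/* held until the matching put */
}

static bool css_tryget_rw(struct cgroup_subsys_state *css)
{
	/* Fails only once removal has taken the lock for writing. */
	return read_trylock(&css->ref_lock);
}

static void css_put_rw(struct cgroup_subsys_state *css)
{
	read_unlock(&css->ref_lock);
}

static bool clear_css_refs_rw(struct cgroup_subsys_state *css)
{
	/* Succeeds only when no references (readers) remain. */
	return write_trylock(&css->ref_lock);
}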

Paul