Re: [RFC][PATCH 1/7] Resource counters
From: Pavel Emelianov
Date: Tue Mar 13 2007 - 05:25:36 EST
Eric W. Biederman wrote:
> Herbert Poetzl <herbert@xxxxxxxxxxxx> writes:
>
>> On Sun, Mar 11, 2007 at 01:00:15PM -0600, Eric W. Biederman wrote:
>>> Herbert Poetzl <herbert@xxxxxxxxxxxx> writes:
>>>
>>>> Linux-VServer does the accounting with atomic counters,
>>>> so that works quite fine, just do the checks at the
>>>> beginning of whatever resource allocation and the
>>>> accounting once the resource is acquired ...
>>> Atomic operations versus locks is only a granularity thing.
>>> You still need the cache line which is the cost on SMP.
>>>
>>> Are you using atomic_add_return or atomic_add_unless or
>>> are you performing your actions in two separate steps
>>> which is racy? What I have seen indicates you are using
>>> a racy two separate operation form.
>> yes, this is the current implementation which
>> is more than sufficient, but I'm aware of the
>> potential issues here, and I have an experimental
>> patch sitting here which removes this race with
>> the following change:
>>
>> - doesn't store the accounted value but
>>   limit - accounted (i.e. the free resource)
>> - uses atomic_add_return()
>> - when negative, an error is returned and
>>   the resource amount is added back
>>
>> changes to the limit have to adjust the 'current'
>> value too, but that is again simple and atomic
>>
>> best,
>> Herbert
>>
>> PS: atomic_add_unless() didn't exist back then
>> (at least I think so) but that might be an option
>> too ...
>
> I think as far as having this discussion if you can remove that race
> people will be more willing to talk about what vserver does.
>
> That said anything that uses locks or atomic operations (finer grained locks)
> because of the cache line ping pong is going to have scaling issues on large
> boxes.
BTW, atomic_add_unless() is essentially a loop, just
like spin_lock() is, so why is one better than the other?
spin_lock() can go to schedule() on preemptible kernels,
thus improving interactivity, while an atomic operation can't.
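
To show what I mean by "a loop", here is a userspace approximation
written with C11 atomics. It is only a sketch of the usual
compare-and-swap retry pattern, not the kernel source, and
charge_below() is a hypothetical helper added to show the same loop
applied to a limit check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue of atomic_add_unless(v, a, u): add 'a' to *v
 * unless *v == u. Returns true if the add happened. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure 'old' is refreshed with the current value
		 * and we go around again: this is the loop. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Hypothetical helper: charge 'amount' only if the result stays
 * within 'limit', using the same retry loop. */
static bool charge_below(atomic_long *usage, long amount, long limit)
{
	long old = atomic_load(usage);

	do {
		if (old + amount > limit)
			return false;
	} while (!atomic_compare_exchange_weak(usage, &old, old + amount));

	return true;
}
```

Under contention each failed compare-and-swap means another pass
through the loop with the cache line bouncing between CPUs, much like
waiters spinning on a lock.
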
> So in that sense anything short of per cpu variables sucks at scale. That said
> I would much rather get a simple correct version without the complexity of
> per cpu counters, before we optimize the counters that much.
>
> Eric
>