Re: [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1)

From: Mathieu Desnoyers
Date: Wed Feb 09 2022 - 13:39:25 EST


----- On Feb 9, 2022, at 1:19 PM, Waiman Long longman@xxxxxxxxxx wrote:

> On 2/9/22 04:09, Peter Zijlstra wrote:
>> On Tue, Feb 08, 2022 at 10:41:56AM -0800, Namhyung Kim wrote:
>>
>>> Eventually I'm mostly interested in the contended locks only and I
>>> want to reduce the overhead in the fast path. By moving that, it'd be
>>> easy to track contended locks with timing by using two tracepoints.
>> So why not put in two new tracepoints and call it a day?
>>
>> Why muck about with all that lockdep stuff just to preserve the name
>> (and in the process continue to blow up data structures etc..). This
>> leaves distros in a bind, will they enable this config and provide
>> tracepoints while bloating the data structures and destroying things
>> like lockref (which relies on sizeof(spinlock_t)), or not provide this
>> at all.
>>
>> Yes, the name is convenient, but it's just not worth it IMO. It makes
>> the whole proposition too much of a trade-off.
>>
>> Would it not be possible to reconstruct enough useful information from
>> the lock callsite?
>>
> I second that as I don't want to see the size of a spinlock exceeds 4
> bytes in a production system.
>
> Instead of storing additional information (e.g. lock name) directly into
> the lock itself. Maybe we can store it elsewhere and use the lock
> address as the key to locate it in a hash table. We can certainly extend
> the various lock init functions to do that. It will be trickier for
> statically initialized locks, but we can probably find a way to do that too.

If we go down that route, it would be nice if we can support a few different
use-cases for various tracers out there.

One use-case (a) requires the ability to query the lock name based on its address as key.
For this a hash table is a good fit. This would allow tracers like ftrace to
output lock names in its human-readable output which is formatted within the kernel.

Another use-case (b) is to be able to "dump" the lock { name, address } tuples
into the trace stream (we call this statedump events in lttng), and do the
translation from address to name at post-processing. This simply requires
that this information is available for iteration for both the core kernel
and module locks, so the tracer can dump this information on trace start
and module load.

Use-case (b) is very similar to what is done for the kernel tracepoints. Based
on this, implementing the init code that iterates on those sections and populates
a hash table for use-case (a) should be easy enough.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com