Re: [PATCH] RFC: debugobjects: capture stack traces at _init() time

From: Daniel Vetter
Date: Fri Mar 23 2018 - 12:26:25 EST


On Tue, Mar 20, 2018 at 8:44 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Tue, 20 Mar 2018, Daniel Vetter wrote:
>
>> Sometimes it's really easy to know which object has gone boom and
>> where the offending code is, and sometimes it's really hard. One case
>> we're trying to hunt down is when module unload catches a live debug
>> object, with a module with lots of them.
>>
>> Capture a full stack trace from debug_object_init() and dump it when
>> there's a problem.
>>
>> FIXME: Should we have a separate Kconfig knob for the backtraces,
>> they're quite expensive? Atm I'm just selecting it for the general
>> debug object stuff.
>
> Just make it boot time enabled. We really want to be able to switch it off.
>
>> +#define ODEBUG_STACKDEPTH 32
>
> Do we really need it that deep? I doubt that.

Entirely arbitrary, I just needed this to start hunting a rare crash
we're seeing in CI and stitched something together. I agree we
probably don't need that much.

>> +static noinline depot_stack_handle_t save_stack(struct debug_obj *obj)
>> +{
>> + unsigned long entries[ODEBUG_STACKDEPTH];
>> + struct stack_trace trace = {
>> + .entries = entries,
>> + .max_entries = ODEBUG_STACKDEPTH,
>> + .skip = 2
>> + };
>> +
>> + save_stack_trace(&trace);
>> + if (trace.nr_entries != 0 &&
>> + trace.entries[trace.nr_entries-1] == ULONG_MAX)
>> + trace.nr_entries--;
>> +
>> + /* May be called under spinlock, so avoid sleeping */
>> + return depot_save_stack(&trace, GFP_NOWAIT);
>
> I really don't like the idea of unconditional allocations in that code
> path. We went great length to make it perform halfways sane with the
> pool. Though we should actually have per cpu pools to avoid the lock
> contention, but thats a different problem.
>
> I'd rather make the storage a fixed size allocation which is appended
> to the debug object. Something like this:
>
> struct debug_obj_trace {
> struct stack_trace trace;
> unsigned long entries[ODEBUG_STACKDEPTH];
> };
>
> struct debug_obj {
> unsigned int astate;
> void *object;
> struct debug_obj_descr *descr;
> struct debug_obj_trace trace[0];
> };
>
> void __init debug_objects_mem_init(void)
> {
> unsigned long objsize = sizeof (struct debug_obj);
>
> if (!debug_objects_enabled)
> return;
>
> if (debug_objects_trace_stack)
> objsize += sizeof(struct debug_obj_trace);
>
> obj_cache = kmem_cache_create("debug_objects_cache", objsize, 0,
> SLAB_DEBUG_OBJECTS, NULL);
>
> __debug_object_init(void *addr, struct debug_obj_descr *descr, int onstack)
> {
> ....
> case ODEBUG_STATE_NONE:
> case ODEBUG_STATE_INIT:
> case ODEBUG_STATE_INACTIVE:
> if (debug_object_trace_stack) {
> obj->trace[0].trace.entries = obj->trace[0].entries;
> save_stack_trace(&trace[0].trace);
> }
>
> That means we need to size the static objects which are used during early
> boot with stack under the assumption that stack tracing is enabled, but
> that's simple enough.
>
> Hmm?

Yeah looks reasonable, and gives us an easy Kconfig/boot-time option
to enable/disable it. I'm a bit swamped, but I'll try to respin.
Thanks very much for the look and suggesting a cleaner integration
approach - the allocations and recursion into the stack depot really
did not result in clean code.

Thanks, Daniel

> Thanks,
>
> tglx



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch