[RESEND] Re: usb/net/p54: trying to register non-static key in p54_unregister_leds

From: Christian Lamparter
Date: Sat Sep 23 2017 - 15:37:54 EST


This got rejected by gmail once. Let's see if it works now.

On Thursday, September 21, 2017 8:22:45 PM CEST Andrey Konovalov wrote:
> On Wed, Sep 20, 2017 at 9:55 PM, Johannes Berg
> <johannes@xxxxxxxxxxxxxxxx> wrote:
> > On Wed, 2017-09-20 at 21:27 +0200, Christian Lamparter wrote:
> >
> >> It seems this is caused as a result of:
> >> -> lock_map_acquire(&work->lockdep_map);
> >> lock_map_release(&work->lockdep_map);
> >>
> >> in flush_work() [0]
> >
> > Agree.
> >
> >> This was added by:
> >>
> >> commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
> >> Author: Stephen Boyd <sboyd@xxxxxxxxxxxxxx>
> >> Date: Fri Apr 20 17:28:50 2012 -0700
> >>
> >> workqueue: Catch more locking problems with flush_work()
> >
> > Yes, but that doesn't matter.
> >
> >> Looking at the Stephen's patch, it's clear that it was made
> >> with "static DECLARE_WORK(work, my_work)" in mind. However
> >> p54's led_work is "per-device", hence it is stored in the
> >> devices context p54_common, which is dynamically allocated.
> >> So, maybe revert Stephen's patch?
> >
> > I disagree - as the lockdep warning says:
> >
> >> > INFO: trying to register non-static key.
> >> > the code is fine but needs lockdep annotation.
> >> > turning off the locking correctness validator.
> >
> > What it needs is to actually correctly go through initializing the work
> > at least once.
> >
> > Without more information, I can't really say what's going on, but I
> > assume that something is failing and p54_unregister_leds() is getting
> > invoked without p54_init_leds() having been invoked, so essentially
> > it's trying to flush a work that was never initialized?
> >
> > INIT_DELAYED_WORK() does, after all, initialize the lockdep map
> > properly via __INIT_WORK().

Ok, thanks. This does indeed explain it.

But this also begs the question: Is this really working then?
>From what I can tell, if CONFIG_LOCKDEP is not set then there's no BUG
no WARN, no other splat or any other odd system behaviour. Does
[cancel | flush]_[delayed_]work[_sync] really "just work" by *accident*,
as long the delayed_work | work_struct is zeroed out?
And should it work in the future as well?

> Since I'm able to reproduce this, please let me know if you need me to
> collect some debug traces to help with the triage.

Do you want to take a shot at making a patch too? At a quick glance, it
should be enough to move the [#ifdef CONFIG_P54_LEDS ... #endif] block
in p54_unregister_common() into the if (priv->registered) { block
(preferably before the ieee80211_unregister_hw(dev).

Regards,
Christian