Re: e1000_netpoll(): disable_irq() triggers might_sleep() on linux-next

From: Thomas Gleixner
Date: Wed Oct 29 2014 - 16:24:15 EST


On Wed, 29 Oct 2014, Thomas Gleixner wrote:
> On Wed, 29 Oct 2014, Peter Zijlstra wrote:
>
> > On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> > > On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> > >
> > > > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > > > Yuck. No. You are just papering over the problem.
> > > > >
> > > > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > > > the interrupt line is shared with a real threaded interrupt user?
> > > > >
> > > > > The proper solution is to have a poll_lock for e1000 which serializes
> > > > > the hardware interrupt against netpoll instead of using
> > > > > disable/enable_irq().
> > > > >
> > > > > In fact that's less expensive than the disable/enable_irq() dance and
> > > > > the chance of contention is pretty low. If done right it will be a
> > > > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > > >
> > > >
> > > > OK a little something like so then I suppose.. But I suspect most all
> > > > the network drivers will need this and maybe more, disable_irq() is a
> > > > popular little thing and we 'just' changed semantics on them.
> > >
> > > We changed that almost 4 years ago :) What we 'just' did was to add a
> > > prominent warning into the code.
> >
> > You know that is the same right... they didn't know it was broken
> > therefore it wasn't :-), but now they need to go actually do stuff about
> > it, an entirely different proposition.
>
> Right, and of course the world and some more has the very same code
> there:
>
> poll_controller()
> {
> 	disable_irq();
> 	dev_interrupt_handler();
> 	enable_irq();
> }
>
> Trying to twist my brain to come up with a solution which avoids the
> spinlock, but I have a hard time to come up with one.
>
> The only thing I came up with so far is to avoid adding locks to every
> driver incarnation and instead put it into struct net_device and
> provide helper functions for the lock/unlock case.
>
> That does not change the fact that we need to deal with that on a per
> driver basis :(

But at least it allows us to mitigate the impact by making the locking
conditional at a central point.

static inline void netpoll_lock(struct net_device *nd)
{
	if (netpoll_active(nd))
		spin_lock(&nd->netpoll_lock);
}

and let the core code make sure that activation/deactivation of
netpoll on a particular interface is serialized against the interrupt
and netpoll calls.
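
To make the idea concrete, here is a rough userspace model of how the
lock/unlock pair could replace the disable_irq()/enable_irq() dance. It is
only a sketch under assumptions: struct model_net_device, model_dev_init(),
the irqs_handled counter and the netpoll_unlock() counterpart are all
hypothetical, a C11 atomic_flag stands in for spinlock_t, and having the
handler itself take the lock is just one possible arrangement. It also
assumes the core never flips netpoll_active between a lock and its matching
unlock, which is the serialization described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields that would live in struct net_device. */
struct model_net_device {
	atomic_bool netpoll_active;	/* assumed to be changed by the core only */
	atomic_flag netpoll_lock;	/* models a spinlock_t */
	int irqs_handled;		/* illustrative driver state */
};

static void model_dev_init(struct model_net_device *nd)
{
	atomic_init(&nd->netpoll_active, false);
	atomic_flag_clear(&nd->netpoll_lock);
	nd->irqs_handled = 0;
}

static inline bool netpoll_active(struct model_net_device *nd)
{
	return atomic_load(&nd->netpoll_active);
}

/* Take the lock only when netpoll is active on this device. */
static inline void netpoll_lock(struct model_net_device *nd)
{
	if (netpoll_active(nd))
		while (atomic_flag_test_and_set_explicit(&nd->netpoll_lock,
							 memory_order_acquire))
			;	/* spin until the holder releases it */
}

/* Hypothetical counterpart; relies on netpoll_active not changing meanwhile. */
static inline void netpoll_unlock(struct model_net_device *nd)
{
	if (netpoll_active(nd))
		atomic_flag_clear_explicit(&nd->netpoll_lock,
					   memory_order_release);
}

/* Models the driver's interrupt handler, serialized by the lock. */
static void dev_interrupt_handler(struct model_net_device *nd)
{
	netpoll_lock(nd);
	nd->irqs_handled++;
	netpoll_unlock(nd);
}

/* Models poll_controller() without disable_irq()/enable_irq(). */
static void poll_controller(struct model_net_device *nd)
{
	dev_interrupt_handler(nd);
}
```

With netpoll inactive the helpers reduce to a flag check, which is the
point of making the locking conditional in one central place.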

Not sure if it's worth the trouble, but at least it allows us to deal
with it in the core instead of dealing with it on a per-driver basis.

Thanks,

tglx