On Fri, Jan 26, 2018 at 11:07:39AM -0500, David Miller wrote:
> > > This is found by a static analysis tool named DCNS written by myself.
> >
> > The trouble is, places like
> > net/atm/raw.c:65:	vcc->send = atm_send_aal0;
> > net/atm/raw.c:74:	vcc->send = vcc->dev->ops->send;
> > net/atm/raw.c:83:	vcc->send = vcc->dev->ops->send;
> > mean extra call chains.  It's *not* just vcc_sendmsg(), and e.g.
> > 	ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
> > 	    ? DROP_PACKET : 1;
> > 	bh_unlock_sock(sk_atm(vcc));
> > in pppoatm_send() definitely is called under a spinlock.
> >
> > Looking through the driver (in advanced bitrot, as usual for drivers/atm),
> > I'd say that submit_queue() is fucked in the head in the "queue full" case.
> > And judging by the history, it had been thus since the original merge...
>
> Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL
> conversions.
>
> Al's analysis above is similar to how things looked for other patches
> you submitted of this nature.
>
> So because of the lack of care and serious auditing you put into these
> conversions, I really have no choice but to drop them from my queue
> because on the whole they are adding bugs rather than improving the
> code.

FWIW, the tool *does* promise to be useful - as far as I understand it, it
looks for places where a GFP_ATOMIC allocation has blocking operations
somewhere on every call chain leading to it.  And that indicates something
fishy going on - either a pointless GFP_ATOMIC (in the benign case), or
something much nastier: a call chain that genuinely requires GFP_ATOMIC.
In that case, whatever blocking operation is found along the way is a bug.
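To make the pattern concrete, here is a minimal sketch (hypothetical names,
not from any real driver) of the kind of thing such a tool would flag - a
GFP_ATOMIC allocation whose only visible callers block anyway:

/* Illustrative sketch only - foo_alloc_buf()/foo_submit() are made up. */
#include <linux/slab.h>
#include <linux/mutex.h>
#include <linux/errno.h>

static DEFINE_MUTEX(foo_lock);

static void *foo_alloc_buf(void)
{
	return kmalloc(128, GFP_ATOMIC);	/* why GFP_ATOMIC? see callers */
}

static int foo_submit(void)
{
	void *buf;

	mutex_lock(&foo_lock);		/* blocking operation on the way in */
	buf = foo_alloc_buf();
	mutex_unlock(&foo_lock);
	if (!buf)
		return -ENOMEM;
	kfree(buf);
	return 0;
}

If every chain looks like foo_submit() - i.e. it already blocks - the
GFP_ATOMIC is pointless; if some other chain really cannot block, then the
mutex_lock() on that chain is the actual bug.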
This time it has, AFAICS, caught a genuine bug in drivers/atm/firestream.c -
static void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe)
{
	u32 wp;
	struct FS_QENTRY *cqe;

	/* XXX Sanity check: the write pointer can be checked to be
	   still the same as the value passed as qe... -- REW */
	/*  udelay (5); */
	while ((wp = read_fs (dev, Q_WP (q->offset))) & Q_FULL) {
		fs_dprintk (FS_DEBUG_TXQ, "Found queue at %x full. Waiting.\n",
			    q->offset);
		schedule ();
	}
...
static void submit_queue (struct fs_dev *dev, struct queue *q,
			  u32 cmd, u32 p1, u32 p2, u32 p3)
{
	struct FS_QENTRY *qe;

	qe = get_qentry (dev, q);
	qe->cmd = cmd;
	qe->p0 = p1;
	qe->p1 = p2;
	qe->p2 = p3;
	submit_qentry (dev, q, qe);
...
static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb)
{
	...
	td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC);
	...
	submit_queue (dev, &dev->hp_txq,
		      QE_TRANSMIT_DE | vcc->channo,
		      virt_to_bus (td), 0,
		      virt_to_bus (td));
...
Either all callchains leading to fs_send() are in non-atomic contexts
(in which case that GFP_ATOMIC would be pointless), or there is one on
which we cannot block - and on any such chain the schedule() loop above
is a potential deadlock.
And the latter appears to be the case - fs_send() is atmdev_ops ->send(),
which can end up in atm_vcc ->send(), which can be called from under
->sk_lock.slock.
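Spelled out as a minimal sketch (hypothetical names; the real chain is
pppoatm_send() -> vcc->send() == fs_send() -> submit_queue() ->
submit_qentry(), under the socket spinlock), the failure mode looks like:

/* Illustrative sketch only - not the actual firestream/pppoatm code. */
#include <linux/spinlock.h>
#include <linux/sched.h>
#include <linux/types.h>

static DEFINE_SPINLOCK(demo_slock);	/* stands in for ->sk_lock.slock */

static bool demo_queue_full(void)
{
	return false;	/* stub; the real driver polls a device register */
}

static void demo_submit(void)
{
	/* same shape as submit_qentry(): wait for room by sleeping */
	while (demo_queue_full())
		schedule();
}

static void demo_send(void)
{
	spin_lock(&demo_slock);	/* like bh_lock_sock() in the ->send() callers */
	demo_submit();		/* may sleep while the spinlock is held:
				 * "scheduling while atomic", and a potential
				 * deadlock if the queue can only drain once
				 * we drop the lock */
	spin_unlock(&demo_slock);
}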
What would be really useful:
* a report along the lines of "here's the list of locations where we do
something blocking; each callchain to this GFP_ATOMIC allocation passes
either upstream of one of those without leaving atomic context in between,
or downstream without entering one".
* after that - backtracking these callchains further, watching
for ones in atomic contexts. Can be done manually, but if that tool
can assist in doing so, all the better. If we do find one, we have
found a deadlock - just take the blocking operation reported for
that callchain and that's it. If it's not an obvious false positive
(e.g.
	if (!foo)
		spin_lock(&splat);
	.....
	if (foo)
		schedule();
), it's worth reporting to maintainers of the code in question.
* if all callchains reach obviously non-atomic contexts
(syscall entry, for example, or a kernel thread payload, or
a function documented to require a mutex held by caller, etc.) -
then a switch to GFP_KERNEL might be appropriate - with the analysis of
those callchains posted along with the conversion (a sketch of what such
a conversion might look like follows this list).
* either way, having the tool print the callchains out
would be a good idea - for examining them, for writing reports,
etc.
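
For that last case, the diff itself is trivial; it is the posted callchain
analysis that makes it reviewable.  A hedged sketch of what a justified
conversion might look like (hypothetical function, not a real patch):

/* Hypothetical example - bar_dev/bar_setup() are made up for illustration. */
#include <linux/slab.h>
#include <linux/errno.h>

struct bar_dev {
	void *buf;
};

/* Only reachable from probe, i.e. process context with no locks held
 * across this call (per the callchain analysis) - so it may sleep.
 */
static int bar_setup(struct bar_dev *dev)
{
	dev->buf = kmalloc(4096, GFP_KERNEL);	/* was GFP_ATOMIC */
	if (!dev->buf)
		return -ENOMEM;
	return 0;
}

The point being that the commit message (or the mail) should carry the
"all callchains end in non-atomic context" argument, not just the
one-line diff.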