Re: [PATCHv3 1/3] NETPOLL: Extend rx_hook support.

From: Jason Wessel
Date: Thu Mar 01 2012 - 17:24:35 EST


On 03/01/2012 03:04 PM, Andrei Warkentin wrote:
> ----- Original Message -----
>> From: "Andrei Warkentin" <awarkentin@xxxxxxxxxx>
>> To: "Jason Wessel" <jason.wessel@xxxxxxxxxxxxx>
>> Cc: netdev@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, "Andrei Warkentin" <andreiw@xxxxxxxxxx>,
>> kgdb-bugreport@xxxxxxxxxxxxxxxxxxxxx, "Matt Mackall" <mpm@xxxxxxxxxxx>, "Andrei Warkentin"
>> <andrey.warkentin@xxxxxxxxx>
>> Sent: Tuesday, February 28, 2012 12:43:52 PM
>> Subject: Re: [PATCHv3 1/3] NETPOLL: Extend rx_hook support.
>>
>>> All that netpoll_poll() did was to call netpoll_poll_dev(). I have
>>> not yet looked at the differences between kgdboe and the netkdb
>>> code
>>> you proposed but I would have suspected it also falls victim to the
>>> ethernet preemption problem which prevented kgdboe from ever being
>>> considered for a mainline merge. Certainly there are ways to fix
>>> this
>>> problem but most involved changes to scheduling, core net code, or
>>> substantial driver specific changes.
>>>
>> I see, I read up on the issues w.r.t. preemption. Could this be
>> worked
>> around by modifying affected drivers to bypass locking if they are
>> used in KDB context? Make some accessor netdev-specific lock/unlocks
>> that won't do anything if running in KDB context.
>>
>>
> By the way, is there a good way to repro the preemption case? Hopefully this doesn't
> involve some crazy hardware...


I have several cases which will usually hang the machine fairly quickly, but they all involve using gdb and a target using SMP. Most often it is as simple as this:

* Use an SMP system with at least 2 cores
* Start two shell loops that rapidly spawn processes:
    while [ 1 ] ; do date > /dev/null ; done &
    while [ 1 ] ; do date > /dev/null ; done &
* Connect with gdb to kgdb and set a breakpoint at do_fork
* Now do "c"
* Now do "c 1000"
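
For anyone who wants to script the load half of the steps above, here is a minimal sketch. The helper names start_load and stop_load are made up for illustration and are not part of any kernel or kgdb tooling. It only wraps the same "date" loops so that they can be started and cleaned up together; the gdb side (breakpoint at do_fork, "c", "c 1000") is still done by hand.

```shell
#!/bin/sh
# Sketch only: start_load/stop_load are hypothetical helper names.
LOAD_PIDS=""

# Start N background loops, each forking a short-lived process
# ("date") as fast as it can, to keep do_fork busy on the target.
start_load() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        ( while :; do date > /dev/null; done ) &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# Kill every loop started by start_load and reap the children.
stop_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null
    done
    wait 2>/dev/null
    LOAD_PIDS=""
}
```

Source the file on the target, run "start_load 2" before attaching gdb, and run "stop_load" afterwards if the machine is still alive.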

Generally the system will hang long before you get 1000 breakpoint hits. The hang happens when a lock is needed to create an skb, or when the ethernet driver or some part of the network stack is preempted (or holding a lock) on the non-master cpu.

There is another condition that is hard to catch that involves a task migrating from one cpu to the next, but we'll stick to the simple test case I described above for now.

I did have a question, because it seems you were using qemu / kvm. I have a number of test cases that use kvm, but netkgdb does not seem to work with nc. My question is: how am I supposed to actually use netkgdb?

Here is what I observe on the target system:

insmod netkgdb.ko netkgdb=@/,@10.0.2.2/
echo g > /proc/sysrq-trigger

On my host system:
nc.traditional -l -u -p 7777

I type help, and then netkgdb is toast. It doesn't seem to respond anymore.

Jason.