Re: linux-next: manual merge of the kgdb tree with Linus' tree

From: Jason Wessel
Date: Mon Aug 09 2010 - 01:13:19 EST


On 08/07/2010 04:17 PM, Paul E. McKenney wrote:
> On Sat, Aug 07, 2010 at 02:05:42PM +1000, Stephen Rothwell wrote:
>
>> Hi Jason,
>>
>> Today's linux-next merge of the kgdb tree got a conflict in
>> include/linux/rcupdate.h between commits
>> 551d55a944b143ef26fbd482d1c463199d6f65cf ("tree/tiny rcu: Add debug RCU
>> head objects") and f5155b33277c9678041a27869165619bb34f722f ("rcu: add an
>> rcu_dereference_index_check()") from Linus' tree and commit
>> 9e213357d0aeaeb81e213cfd3b9415db5fccc1b5 ("rcu,debug_core: allow the
>> kernel debugger to reset the rcu stall timer") from the kgdb tree.
>>
>
> Hello, Jason,
>
> Just trying to make sure I understand this...
>
> This cannot be a "stop the machine" debugger, because otherwise the
> jiffies counter would stop and you would not get RCU CPU stall warnings.
>
> It might be a "stop the machine" debugger, but where the jiffies counter
> catches up quickly as soon as the machine restarts. In this case,
> your patch would be a reasonable approach, but RCU CPU stall warnings
> are going to be the least of your problems.

You should have the patches now, as I posted them to LKML as an RFC.
If there are other problems in this area, I am interested in
understanding what issues have yet to be dealt with.

The general idea is that the kernel can take an exception and execute
for a short period of time with all the processors spinning in a wait
loop and then resume kernel execution. As you might guess, the debugger
is a "multipurpose" tool, and there are quite a few circumstances where
a trip into the debugger is really a one-way trip to a reboot when
you are done inspecting.
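
As a rough userspace sketch of that idea (not the actual patch; every
name below is hypothetical and only models the behavior), a stall
detector keeps a deadline in ticks, and the debugger pushes that
deadline forward when execution resumes so that the time spent stopped
is not reported as a stall:

```c
#include <stdbool.h>

/* Hypothetical timeout, in ticks; not a value taken from the kernel. */
#define STALL_TIMEOUT 10UL

/* Hypothetical stand-in for the per-subsystem stall state. */
struct stall_detector {
	unsigned long deadline;	/* tick at which a stall would be reported */
};

/* Stand-in for the jiffies counter in this model. */
unsigned long ticks;

/* Re-arm the detector relative to the current time. */
void stall_reset(struct stall_detector *sd)
{
	sd->deadline = ticks + STALL_TIMEOUT;
}

/* True once the deadline has passed; the signed cast tolerates wraparound. */
bool stall_check(const struct stall_detector *sd)
{
	return (long)(ticks - sd->deadline) >= 0;
}

/*
 * Model a trip into the debugger: time appears to jump forward by
 * stopped_ticks when the machine resumes. If reset_on_resume is set,
 * the detector is re-armed so the jump is not treated as a stall.
 */
void debugger_session(struct stall_detector *sd,
		      unsigned long stopped_ticks, bool reset_on_resume)
{
	ticks += stopped_ticks;
	if (reset_on_resume)
		stall_reset(sd);
}
```

In this model, calling debugger_session() with reset_on_resume false
leaves stall_check() reporting a stall after a long stop, while the
same call with reset_on_resume true does not.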

> Actually, I have only seen
> one piece of your patch. Could you please send me the rest of it?
>
> If you are permitting some tasks to run while others are halted,
> then the RCU CPU stall is simply a symptom of an underlying problem,
> namely that if you halt a task in an RCU read-side critical section
> for long enough, you will OOM the system.
>
>

We are definitely not "partially running". Picking and choosing threads
to run without a complete integration with the scheduler and all other
related systems like RCU would be a _really_ bad idea. :-)

Jason.