RE: [BUG - HRT patch] disabling timer hangs system when multiple overruns

From: Fleischer, Julie N (julie.n.fleischer@intel.com)
Date: Wed Jan 15 2003 - 19:12:45 EST


> George Anzinger wrote:
> I suspect that you have run into a bug I fixed in the latest
> version having to do with handing off a timer from id lookup
> to the spin lock on the timer. I was releasing the lookup
> lock prior to taking the timer lock, which allowed an
> interrupt to sneak in there and set up a deadlock with the
> interrupt code. Most likely to happen when processing
> overrunning timers.
>
> This is fixed in the latest patch.

George -
Again, sorry about not testing on your latest version (and thanks for the
bk6 patch! :) ). I ran this test again on your latest patch (the
2.5.54-bk6-1.0 patches), and I'm still seeing a hang of the test case.
There is an improvement, though, I think due to your fix. In 2.5.54-bk1,
I had to reboot the system after the hang (or 2/3 times I usually had to).
Now, I do not have to reboot the system (or 3/3 times I don't have to), but
the test case still hangs (i.e., I have to manually kill the session the
test case was started in).

I forgot to mention reproducibility before. The test case hang is always
reproducible (bk6 or bk1 patches). As I mentioned, the system hang no
longer happens with bk6 (probably the issue you fixed), but the system would
hang ~1/3 times with the bk1 patches.

Someone also suggested I use strace to get more output. That helped
pinpoint that the hang occurs in the "timer_settime(<an its with
it_value=0>)" call.

Here's that output:
(...)
write(1, "setup first timer\n", 18setup first timer
) = 18
ipc_subcall(0x8000003, 0, 0xbffff8e0, 0) = 0
write(1, "sleep\n", 6sleep
) = 6
rt_sigprocmask(SIG_BLOCK, [CHLD], [ALRM], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [ALRM], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
write(1, "awoke\n", 6awoke
) = 6
rt_sigaction(SIGALRM, {0x80485c0, [], 0x4000000}, NULL, 8) = 0
write(1, "setup second timer\n", 19setup second timer
) = 19
ipc_subcall(0x8000003, 0, 0xbffff8e0, 0

This is the point where the test case hangs. If I'm using strace, I just do
a Ctrl-C to get out.
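
For readers without the test source, the sequence visible in the trace can
be approximated with the sketch below. It is not the actual test case: the
function name disarm_after_overruns, the empty handler, and the 1 ms interval
are assumptions chosen only so that many expirations accumulate while SIGALRM
is blocked. The final timer_settime() with a zeroed itimerspec corresponds to
the call that does not return in the trace. On older glibc this needs -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Empty handler (assumption): the trace only shows that one is installed. */
static void on_alarm(int sig) { (void)sig; }

/* Returns 0 if the disarming timer_settime() returns successfully,
 * -1 on any failure. On an affected kernel the disarm call would hang. */
int disarm_after_overruns(void)
{
    sigset_t block;
    struct sigevent sev;
    struct itimerspec its;
    struct sigaction sa;
    struct timespec nap = { 1, 0 };   /* nanosleep({1, 0}) as in the trace */
    timer_t tid;
    int rc;

    /* Keep SIGALRM blocked so expirations pile up instead of being delivered. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, NULL) != 0)
        return -1;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGALRM;
    if (timer_create(CLOCK_REALTIME, &sev, &tid) != 0)
        return -1;

    /* First timer_settime: periodic timer, 1 ms interval (assumed value). */
    puts("setup first timer");
    its.it_value.tv_sec = 0;
    its.it_value.tv_nsec = 1000000;
    its.it_interval = its.it_value;
    if (timer_settime(tid, 0, &its, NULL) != 0)
        return -1;

    puts("sleep");
    nanosleep(&nap, NULL);
    puts("awoke");

    /* The trace installs the SIGALRM handler after waking up. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Second timer_settime: it_value = 0 disarms the timer. */
    puts("setup second timer");
    memset(&its, 0, sizeof its);
    rc = timer_settime(tid, 0, &its, NULL);

    timer_delete(tid);
    return rc;
}
```

Calling disarm_after_overruns() from main() and checking for a return value
of 0 is enough to tell whether the disarm call completed.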

I'm using these patches:
  hrtimers-core-2.5.54-bk6-1.0.patch
  hrtimers-hrposix-2.5.54-bk6-1.0.patch
  hrtimers-i386-2.5.54-bk6-1.0.patch
  hrtimers-posix-2.5.54-bk6-1.0.patch
  hrtimers-support-2.5.52-1.0.patch

Thanks.
- Julie

**These views are not necessarily those of my employer.**
 