Re: [PATCH] posix-timers: Do not modify an already queued timer signal

From: Oleg Nesterov
Date: Sun Jul 20 2008 - 07:05:34 EST

On 07/19, Roland McGrath wrote:
> I think the solution we want is to make sure that a timer whose
> expiration signal is still pending never gets a second notification.
> POSIX says, "The effect of disarming or resetting a timer with pending
> expiration notifications is unspecified." This gives us lots of
> latitude. I think the best behavior is the one that comports with the
> general specification, "Only a single signal shall be queued to the
> process for a given timer at any point in time."
> The it_requeue_pending/si_sys_private logic is indeed intended to
> address this case. It predates my involvement of the code, and I
> agree it has always looked oddly hairy. Its plan is to make sure that
> dequeue_signal's call to do_schedule_next_timer does not re-arm the
> timer when an intervening timer_settime call has already done so. As
> Mark found, this is buggy because it's not really safe for the timer
> to go off (call posix_timer_event again) before the dequeue.
> In the signals code, si_sys_private is used purely as a flag that
> dequeue_signal needs to bother with dropping the siglock to call
> do_schedule_next_timer. Its use as a serialization counter between
> timer_settime and do_schedule_next_timer is entirely local to the
> posix-timers code.

Yes, thanks, I see. But does it have any meaning for the user-space?

Let me repeat. Can't we make a simple fix for now for this nasty and
ancient bug, before we do the more clever changes,

--- kernel/posix-timers.c
+++ kernel/posix-timers.c
@@ -298,12 +298,10 @@ void do_schedule_next_timer(struct sigin

int posix_timer_event(struct k_itimer *timr,int si_private)
- memset(&timr->sigq->info, 0, sizeof(siginfo_t));
timr->sigq->info.si_sys_private = si_private;
/* Send signal to the process that owns this timer.*/

timr->sigq->info.si_signo = timr->it_sigev_signo;
- timr->sigq->info.si_errno = 0;
timr->sigq->info.si_code = SI_TIMER;
timr->sigq->info.si_tid = timr->it_id;
timr->sigq->info.si_value = timr->it_sigev_value;
@@ -435,6 +433,7 @@ static struct k_itimer * alloc_posix_tim
kmem_cache_free(posix_timers_cache, tmr);
tmr = NULL;
+ memset(&timr->sigq->info, 0, sizeof(siginfo_t));
return tmr;


Suppose ->sigq is queued. In that case "si_sys_private = si_private"
can race with dequeue_signal/do_schedule_next_timer, but everything
looks OK because the timer is locked.

If dequeue_signal() sees the old value of si_sys_private, it won't
call do_schedule_next_timer(). If it sees the new value, do_schedule_next_timer()
will wait for ->it_lock.

Yes, in both cases we can queue ->sigq again instead of incrementing
info.si_overrun. But this is not worse than we have now, and afaics
this is also possible with timr->fired, please see below.

> What userland should observe is that when the new timer expiration
> time arrives while the previously-fired timer signal is still pending,
> there is no new signal, and the timer accrues an overrun.
> An obvious idea is simply not to re-arm in timer_settime when the
> timer already has a pending notification. When the signal gets
> dequeued, do_schedule_next_timer will arm it then. However, the way
> we dispatch to different clocks' timer_set functions makes it awkward
> to validate and store a new setting and fetch the old value without
> actually arming the timer. Also, the logic for one-shot timers has
> more wrinkles.
> So I think the thing to do is arm the timer in timer_settime even when
> it's already pending (as we do now), but have posix_timer_event detect
> a firing when already queued and just bump the overrun count.
> We can do away with it_requeue_pending, and make si_sys_private purely
> a simple 0/1 flag for whether to call do_schedule_next_timer. To
> optimize the extra checks, we use different bookkeeping fields in
> struct k_itimer.
> Have a flag that means "might be queued". do_schedule_next_timer
> replaces:
> if (timr && timr->it_requeue_pending == info->si_sys_private) {
> ...
> }
> with:
> if (timr && timr->fired) {
> timr->fired = 0;
> ...
> }
> That avoids races with timer_settime, which does (after timer_set
> call, timer still locked):
> if (timr->fired) {
> spin_lock(&current->sighand->siglock);
> if (list_empty(&timr->sigq->list))
> timr->fired = 0;
> spin_unlock(&current->sighand->siglock);
> }
> In posix_timer_event (timer is locked), check:
> if (timr->fired) {
> spin_lock(&current->sighand->siglock);
> if (list_empty(&timr->sigq->list))
> timr->fired = 0;
> spin_unlock(&current->sighand->siglock);

(we can't use current->sighand->siglock, this is irq context, we should
use timr->it_process->group_leader...)

Suppose that the signal was already dequeued but sigq->info wasn't copied
to the user-space. The thread which does dequeue_signal() unlocked siglock
and waits for ->it_lock.

posix_timer_event() sees list_empty(), proceeds and calls send_sigqueue().
Now we requeue this ->sigq instead of incrementing si_overrun.

The thread which does dequeue_signal() continues and re-schedules the
timer while ->sigq is queued. Then it copies sigq->info to the user space.

The same problem as with the simple patch above. Not very bad though,
afaics. The user space just recieves 2 signals instead of one with
the right value of si_overrun.

While we are here, off-topics question. posix_timer_event() does

if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
struct task_struct *leader;
int ret = send_sigqueue(timr->sigq, timr->it_process, 0);

if (likely(ret >= 0))
return ret;

timr->it_sigev_notify = SIGEV_SIGNAL;
leader = timr->it_process->group_leader;
timr->it_process = leader;

return send_sigqueue(timr->sigq, timr->it_process, 1);

Is it really useful to convert the SIGEV_THREAD_ID timer to the group-wide
timer when the thread dies?

This uglifies de_thread(), but more importantly this doesn't work reliably.
Suppose that timr->it_process dies with the pending ->sigq. In that case
the timer stops anyway.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at