Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while

From: Jamie Lokier
Date: Tue Aug 11 2009 - 09:10:28 EST


Oleg Nesterov wrote:
> Agreed, this looks a bit odd. But at least this is documented. From
> man 2 fcntl:
>
> By using F_SETSIG with a nonzero value, and setting SA_SIGINFO
> for the signal handler (see sigaction(2)), extra information
> about I/O events is passed to the handler in a siginfo_t
> structure. If the si_code field indicates the source is
> SI_SIGIO, the si_fd field gives the file descriptor associated
> with the event. Otherwise, there is no indication which file
> descriptors are pending,
>
> Not sure if it is safe to change the historical behaviour.

The change in 2.6.12 breaks some code of mine, which uses RT queued
I/O signals on multiple threads, but as far as I know that code is not
used anywhere now.

In the <= 2.4 era, there were lots of web servers and benchmarks using
queued I/O signals for scalable event-driven I/O, but I don't know of
any implementation that dared do it with multiple threads, except mine.
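
For readers who have not seen the technique, here is a minimal sketch
of how a descriptor is usually switched to queued I/O signals with
fcntl(). It is not the code referred to above; the helper name
enable_queued_io_signal() is made up for illustration, and it assumes
Linux with glibc, where F_SETSIG needs _GNU_SOURCE.

```c
#define _GNU_SOURCE   /* F_SETSIG is a Linux extension, exposed under _GNU_SOURCE */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and shape are illustrative only).
 *
 * Ask the kernel to send `signo` with siginfo_t details whenever I/O
 * becomes possible on `fd`. F_SETOWN is given the process id here;
 * which thread then receives the signal depends on the kernel
 * version, which is the subject of this thread.
 *
 * Returns 0 on success, -1 on error with errno set by fcntl().
 */
static int enable_queued_io_signal(int fd, int signo)
{
    int flags;

    assert(fd >= 0);

    /* Choose the signal to send instead of the default SIGIO. */
    if (fcntl(fd, F_SETSIG, signo) == -1)
        return -1;

    /* Choose who receives the signal: this process. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    /* Turn on signal-driven I/O last, once signal and owner are set. */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_ASYNC) == -1)
        return -1;

    return 0;
}
```

A caller would typically pass a real-time signal such as SIGRTMIN + 1,
so that events are queued, and then either block that signal and
collect events with sigwaitinfo(), or install a handler with
SA_SIGINFO set.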

It was regarded as "beware ye who enter here" territory, which I can
attest to from the long time it took to get it right and the multitude
of kernel bugs and version changes needing to be worked around.

Since 2.6, everyone uses epoll, which is much better, except that
occasionally SIGIO comes in handy when an async notification is
required.

So the change in 2.6.12 does break something that probably isn't much
used, but it's too late now. Occasionally thread-specific SIGIO (or
F_SETSIG) is useful; F_SETOWN_TID makes that nice and clear.

I would drop the pseudo-"bug compatible" behaviour of using negative
tid to mean pid; that's pointless. I'd also make F_GETOWN return an
error when F_SETOWN_TID has been used, and F_GETOWN_TID return an
error when F_SETOWN has been used.

> (the manpage is not exactly right though, and the comment in
> send_sigio_to_task() is not right too: SI_SIGIO (and, btw,
> SI_QUEUE/SI_DETHREAD) is never used).

Ah, there's another historical change you see. It was changed in
2.3.21 from SI_SIGIO to POLL_xxx, and si_band started being set at the
same time. The man page could be updated to reflect that.

(My portable-to-ancient-Linux code checks si_code. If si_code ==
SI_SIGIO (pre-2.3.21), it has the descriptor but doesn't know what
type of event occurred, so it adds the descriptor to a poll() set. If
si_code == POLL_xxx, it ignores the si_code value completely and looks
at si_band for the set of pending events, because some patch that was
never mainlined could result in multiple si_band bits being set.)
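
The check described above might look roughly like the sketch below.
This is an illustration rather than the actual code: the enum, the
function name classify_io_siginfo() and its out-parameters are all
hypothetical, and it assumes the siginfo_t is already known to come
from the I/O signal, since si_code values are only meaningful per
signal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical result type: what the caller should do next. */
enum io_action {
    IO_ADD_TO_POLL_SET, /* fd known, event type unknown: ask poll() */
    IO_USE_BAND,        /* fd known, pending events taken from si_band */
    IO_NOT_IO_SIGNAL    /* si_code is neither SI_SIGIO nor POLL_xxx */
};

/*
 * Hypothetical classifier for a siginfo_t delivered with the I/O
 * signal. On success *fd is the descriptor and *events holds poll()
 * style bits (0 when the event type is unknown).
 */
static enum io_action classify_io_siginfo(const siginfo_t *si,
                                          int *fd, short *events)
{
    assert(si != NULL && fd != NULL && events != NULL);

    /* Old style: the descriptor is given, the kind of event is not. */
    if (si->si_code == SI_SIGIO) {
        *fd = si->si_fd;
        *events = 0;
        return IO_ADD_TO_POLL_SET;
    }

    switch (si->si_code) {
    case POLL_IN:
    case POLL_OUT:
    case POLL_MSG:
    case POLL_ERR:
    case POLL_PRI:
    case POLL_HUP:
        /*
         * Newer style: deliberately ignore which POLL_xxx it was and
         * trust si_band, which may carry more than one event bit.
         */
        *fd = si->si_fd;
        *events = (short)si->si_band;
        return IO_USE_BAND;
    default:
        return IO_NOT_IO_SIGNAL;
    }
}
```

In the IO_ADD_TO_POLL_SET case the caller would add *fd to a pollfd
array and let poll() report the real events; in the IO_USE_BAND case
*events can be handled directly.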

-- Jamie