RE: UDP recvmsg blocks after select(), 2.6 bug?

From: David Schwartz
Date: Fri Oct 15 2004 - 23:25:26 EST



> On Fri, Oct 15, 2004 at 04:33:40PM -0700, David Schwartz wrote:
> > I think it's a really bad idea to make 'select' more
> > complicated by trying
> > to nail down precise semantics for every possible protocol. The 'select'
> > function is supposed to be protocol-neutral and trying to say
> > it guarantees
> > X on protocol Y, where such guarantees constrict what the
> > kernel can do and
> > do not make user code anything but more fragile, doesn't seem
> > to be a good
> > idea to me.

> Are you saying that a minimal operating system should feel free
> to implement
> select() to always return true with all bits set?

Yes, though that would obviously be a very poor implementation.

> > For UDP specifically, datagrams are fundamentally discardable
> > whenever that
> > seems to be a good idea. In general, there are any number of
> > corner cases
> > for various combinations of protocols and situations where a
> > 'select' hit
> > will not be followed by an operation that doesn't block.
>
> Like?

Like a TCP 'write'.

> In the accept() case, you genuinely have a 'if you had substituted the
> select() with an accept(), the accept() would have succeeded.'

While this is true, there is an 'as if' here. There's no difference to the
application between a datagram that dropped and a datagram that got
corrupted in transit.

> In the UDP case, you do NOT have this situation. A recvmesg() in place
> of select() would block, therefore, select() should not block.

But it would block if the UDP datagram had been dropped after the 'select'
hit but before the 'recvmsg'. There is no logical reason there should be a
semantic difference between a UDP packet that was dropped and a UDP packet
that was corrupted.

> > It just happens to be that 'select' works best when it's a hint that
> > something has changed and the operation can/should be re-tried. It works
> > very badly when the results of a 'select' are supposed to
> > change something
> > because you're supposed to be able to 'select' (and then not perform the
> > operation) without affecting things. This is level semantics, not edge.

> We're talking about a packet that was never readable.

There is no *application* difference between a dropped packet and a corrupt
packet. If the packet was dropped, it would have been readable.

> If you had an
> efficient enough check ahead of time (perhaps implemented in
> hardware?), it would never get to the point where select() was in this
> position. The Linux developer of this section of code decided that they
> wanted UDP to be more efficient by delaying the checksum validation until
> the last minute. The cost of this, is that they broke the API with regard
> to select(). This isn't being admitted. Instead, off-topic challenges of
> POSIX, and impractical claims regarding the use of blocking file
> descriptors
> with select() being not recommended have been offered.

> These answers are simply wrong. If the decision is to make select() with
> blocking file descriptors unreliable, than the decision *IS* to recommend
> that select() never be used with blocking file descriptors. What kind of
> operating system developers would recommend the use of select() if it is
> known that the behaviour is unreliable?

I have never seen anyone recommend anything different. Every time this
comes up, it astonishes me that anyone would ever use 'select' on a blocking
file descriptor except in the very special case where you plan to change it
to non-blocking before performing the I/O operation. There are so many well
known corner cases, TCP write being one, UDP discard due to memory pressure
being another, connection abort being yet another.

> > The CAVEAT is that 'select', like every other status
> > information function
> > provided by the kernel, does not guarantee anything about the
> > future. Just
> > like 'stat' does not guarantee that the file size will still be the same
> > later when you call 'read'.
>
> We're not talking about the future. Get off that horse.
>
> We're talking about the present. At the time select() is invoked, there is
> *no* available data. select() is lying.

There is no difference to the application between a UDP packet that was
discarded due to memory pressure in-between a 'select' hit and a packet that
was received with a bad checksum. If you can write an application that can
detect the difference, do so. Then I'll agree with you. Until that time,
this argument fails due to the 'as if' rule. If the API can perform
identically under conditions that do not differ as far as the application is
concerned, then the behavior is legal.

DS


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/